PREMIER TECHNOLOGY DEVELOPMENT AND SERVICE CENTER FOR COCOA BIOTECHNOLOGY

The Applications and Constraints of New Technologies in Plant Breeding
Mike J. Wilkinson
Plant Science Laboratories, The University of Reading, Whiteknights, PO Box 221, Reading RG6 6AS, United Kingdom

Abstract
The usefulness of a range of modern biotechnological and molecular tools is evaluated for the genetic improvement and maintenance of cocoa breeding material. The process of selection during plant breeding inevitably leads to a narrowing of the genetic base of any crop. In turn, this can lead to an increased vulnerability of the crop to pests and disease attack. Cocoa is currently facing threats from several diseases, particularly from pathogenic fungi. In its current form, cocoa is not amenable to widespread use of prophylactic chemicals for disease control and there is limited scope for control by changed agronomic practice. This leaves biological control (which is in its infancy in cocoa) or, more realistically in the short term, internal resistance as the main means of protecting the crop in the field. The maintenance of active and fully characterised germplasm collections represents the starting point for introducing resistance and quality traits into cultivated material. Resources inevitably limit the size of these collections and so it is important that managers have a rational basis upon which to select which clones to maintain and which to discard. The role and limitations of molecular biology in reaching these decisions are discussed. Effective use of germplasm collections also depends on clear and accurate identification of material held, ready access to disease-free stocks and evaluation data on the nature and extent of any resistance genes present. Problems of clone identification can be addressed partly by careful management but ultimately rely on a system for identifying mislabelled stocks. The relative merits of RAPDs, AFLP, ISSR, SSR and locus-specific polymorphisms for addressing this problem are examined. Ultimately, the holding of stocks is only of any practical value if the resistance genes or quality traits they contain can be identified, characterised and quickly transferred into breeding material. Here, QTL analysis, transcriptomics and marker assisted breeding each have potential for accelerating the process of gene transfer. Finally, the benefits, limitations and hazards of using genetic modification for cocoa improvement are examined.
Introduction
Over the past fifty years, new breeding technologies have resulted in huge improvements in the yield and productivity of temperate combinable crops such as wheat, potato and maize. For instance, cereal production has increased at a rate of about 60% per decade from 740 Mt in 1950 to about 1,900 Mt in 1995 and will have to increase by another 690 Mt to meet projected demand in 2020 (Dyson 1996). Approximately half of the increase in productivity is generally attributed to genetic improvement. The cultivation of cocoa has not seen this same improvement in agronomic performance: production rose by around 25% over the last decade, from 2407 thousand tonnes in 1989 to 3003 thousand tonnes in 1999 (Anon. 1999 and 2000), mainly because of an increase in land area. Diseases remain a major threat to the yield of cocoa. This paper examines the applications and constraints of new technologies in modern plant breeding and evaluates their potential for the future improvement of cocoa.
The role of molecular biology in cocoa breeding
The tools available
Conventional plant breeding has enabled significant improvements in the agronomic performance of most crops and will continue to play a central role in the future improvement of cocoa. However, the advent of new technologies greatly enhances the ability of breeders to effect further improvements to crop quality, disease resistance and yield. The use of molecular marker systems aids clone identification and paternity analysis, the assessment of genetic affinity of clones and ultimately will allow gene-based evaluation of the agronomic value or physiology of clones. The emerging field of transcriptomics (the study of global gene expression) seems set to speed the identification of key genes in important pathways and will increase our understanding of how these genes interact. The possibility of subjecting the crop to Genetic Modification may open the way to effecting changes to particular processes and biochemical pathways with greater precision. For a breeder, the starting point for all of this work lies in developing an understanding of the various molecular tools that are available and their limitations.
The type of marker system used and the purpose to which it is applied defines the value of molecular markers for breeding efforts. For global applications, however, it is important that the system used generates differences between closely related individuals but also does so in a way that is easily reproduced and is amenable to database entry. There are many different marker systems currently available to breeders and scientists. These methods can be broadly categorised into two main classes.

Multi-locus systems. These include Amplified Fragment Length Polymorphisms (AFLP), Inter Simple Sequence Repeat (ISSR), Retrotransposon Microsatellite Amplified Polymorphisms (REMAP) and Randomly Amplified Polymorphic DNA (RAPD) analyses. Characteristically, this type of marker system produces large numbers of bands from throughout the genome. In general, the markers tend to arise from non-coding DNA. The presence and absence of bands of the same size provide the polymorphisms that are used as the basis of discrimination. In essence, this is analogous to the comparison of bar code patterns. Individual band differences within the profiles can be converted into simpler locus-specific polymorphisms (single band differences) using Site-Specific Amplified Polymorphisms (SSAP).
Multi-allelic systems. These systems include Simple Sequence Repeat PCR (SSR-PCR) and isozymes, using markers that arise from a single point in the genome. The polymorphism is given by differences in the size of the product amplified, i.e. bands in different positions on a gel.
Multi-locus markers
RAPD analysis was the first multi-locus protocol based on Polymerase Chain Reaction (PCR) to be applied for the genetic characterisation of cocoa (Wilde et al. 1992). This system has the innate advantage of being quick, technically simple and cheap to perform but has developed a reputation for questionable reproducibility between laboratories (Gilmour 1994), for yielding only small numbers of variable markers and for being problematic to score for routine, high throughput applications. Problems with the reliability of RAPD analysis led to the development of more robust protocols. AFLP analysis (Vos et al. 1995) is based on the PCR amplification of a subset of DNA fragments generated by two different restriction (i.e. cutting) enzymes. It is widely regarded as highly reproducible within a laboratory and usually generates around 50-100 DNA fragments of detectable size per amplification reaction. The large number of products produced necessitates the use of DNA sequencing-sized polyacrylamide gels and a suitable detection system (usually radiodetection or through the use of fluorescently labelled nucleotides) or capillary sequencing equipment for electrophoretic separation of the products. In consequence, initial infrastructural costs can be significant for non-specialist laboratories. Furthermore, complexity of the profiles generated and the sensitivity of the technique to even minor changes in the electrophoresis running conditions make data documentation and transferability between institutions an impracticable prospect. Inter Simple Sequence Repeat Polymerase Chain Reaction (ISSR-PCR) generates simpler band profiles (Charters et al. 1996a) that show some evidence of reproducibility between laboratories (Charters et al. 1996b). 
Theoretically therefore, this technique could have potential generic utility although the complexity of the band profiles would cause significant data entry problems (particularly for minor, faint bands) and the protocol used would have to be standardised to a very high degree of accuracy. More recently, further protocols have been developed that utilise the ubiquitous presence of both retrotransposons and microsatellites (Simple Sequence Repeats) in higher organisms. These techniques are REtrotransposon Microsatellite Amplified Polymorphism (REMAP) and Inter Retrotransposon Amplified Polymorphism (IRAP) and like AFLP, these techniques generate very large numbers of amplification products. As yet, the reproducibility of these procedures is uncharacterised although the readiness of retroelements to move in response to environmental stimuli may be a cause for concern. The complexity of band profiles produced, like AFLP and ISSR, also renders data entry problematic.
Multi-allelic markers
Isozymes were the first marker system based on multiple alleles generated at a single site in the genome. The low numbers of different isozymes available, the low numbers of polymorphisms typically generated by each isozyme, difficulty in handling large numbers of samples and the possibility of sensitivity to environmental influences seriously limit the value of isozymes as a universal reference marker system for cocoa. Simple Sequence Repeats (SSRs), also known as microsatellites, consist of tandem arrays of short oligonucleotide sequences 2 to 6 bases in length. SSRs are extremely common throughout the eukaryotic genome (Tautz and Renz 1984) and are highly polymorphic in length (Levinson and Gutman 1987). The variation in length is the most widely exploited means of using SSRs to reveal differences between individual genotypes. It is now the method of choice for human genetics applications (e.g. Zoossmann-Diskin 2000; Service et al. 2001; Ota et al. 2001). SSR-PCR analysis involves the direct amplification of a specific SSR using two primers that specifically bind to regions flanking the target SSR. Differences between the lengths of the amplification products form the basis of the comparison. Thus, once appropriate primers have been generated, SSR-PCR is quick, highly diagnostic and, because the band profile is simple, is relatively easy to automate and generates information that is transferable between laboratories. Minor limitations lie in the presence of stutter bands, which can cause difficulty in the automated scoring of some loci, in the requirement for standardisation of labels and alleles for full transferability and, more importantly, in the need for high resolution polyacrylamide gel electrophoresis (PAGE) or capillary sequencing equipment for band separation. The last point limits the application of the technique to moderately well established laboratories except for the scoring of large allelic differences. 
High mutation rates between the allelic states of SSRs could also be a cause for concern.
Targeted locus-specific systems
Thus far, all markers have been initially generated from random sites within the genome. This is even true for SSR-PCR, where the original isolation of the SSR is usually performed by screening a genomic library for clones that contain a large SSR. An attractive alternative to the random approach is the targeting of particular loci to generate markers. Such loci could be targeted because they represent genes of interest or may simply be a particularly useful band marker from a complex profile (such as an AFLP marker) that is linked to a desirable gene or QTL. Targeted locus-specific markers can be based on relatively large differences between individuals in the DNA sequence of the targeted fragment or can be based on Single Nucleotide Polymorphisms (SNPs). There are numerous approaches used for detecting such differences but those most applicable for modestly equipped laboratories exploit intron size variation, such as Heteroduplex analysis and Cleaved Amplified Polymorphic Sequence (CAPS). In all cases, the polymorphisms are simple to generate and score but require primers that are specific to the targeted locus. These can be generated directly from previously known sequence information or if necessary, a band from a complex multiple band profile, or cDNA clone, can be extracted from a gel, re-amplified by PCR, cloned and its DNA sequence determined. From this information primers specific to the DNA in the band can be designed and used to produce individual copies of the band. Markers generated by this process are called Sequence Characterised Amplified Region (SCAR) markers and can be easily scored on agarose gels stained with ethidium bromide. Accumulation of such markers provides the almost limitless ability to distinguish between genotypes and yet also allows targeted focussing on genes of interest.
Applications of markers for cocoa
Markers that differentiate between individuals offer a wide range of uses for both germplasm curators and breeders. The value of molecular biology for cocoa genebank managers can be broadly broken down into three main applications: (1) clone identification and paternity analysis; (2) the assessment of genetic affinity and diversity of clones held, and (3) the evaluation of the agronomic value or physiology of clones. These applications are explored below.
The needs of germplasm managers
Clone identification
Genebank managers need techniques to allow them to distinguish between genotypes and identify mislabelled accessions. Genetic fingerprinting techniques are generally very useful in this process. However, there are some cocoa accessions, generally siblings produced through self-pollination of a highly homozygous tree, which give identical, or almost identical, marker profiles. In such cases, it can be difficult to establish whether two plants (cuttings or seedlings) should be considered to be representatives of the same genotype. In practical terms, it is probably advisable to adopt a pragmatic approach in which two plants are considered as effectively belonging to the same clone following the failure to detect variation after applying some arbitrary number of markers. Practical constraints (limited resources, laboratory facilities and time or space) pose far greater demands on the technique if it is to have practical utility for germplasm management purposes. Ideally, the protocol should possess the following features:

highly diagnostic,
cheap,
easy to perform,
demand low infrastructure costs,
easy to document into associated databases,
capable of diagnosis at different taxonomic ranks,
reproducible within and between laboratories,
amenable to automation,
adaptable for gel-free analysis, and
provide some indication of phenotype.
Of the techniques that are currently available, SSR-PCR probably represents the system of choice, provided that effort is made to minimise linkage between the SSR loci used for identification purposes and to select SSRs with the highest allelic diversity within the species. SSR-PCR can differentiate between a maximum of around ten alleles per locus. Under optimal circumstances, the frequency of such alleles in a population should be evenly distributed so that about 10% of the population possess each allele. This reduces the number of SSR primer pairs needed to distinguish between most genotypes and provides the highest power of diagnosis. There are nevertheless some practical considerations that may ultimately restrict the value of SSR-PCR analysis for automated or local large-scale applications. Most SSR alleles differ in size by only 2 to 10 bases and so require high resolution PAGE for resolution. In most cases, therefore, specialist molecular laboratories are needed for fractionation and fragment detection. For this reason and logistical difficulties in standardising allele identities and label usage, large-scale genotype identification programmes will probably rely on third party service arrangements rather than in-house assessments.
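The benefit of evenly distributed allele frequencies can be illustrated numerically. The sketch below uses the standard population-genetic "probability of identity" for a single co-dominant locus, which is not taken from this paper and assumes Hardy-Weinberg proportions and unrelated individuals (assumptions that a clonally propagated, partly selfed cocoa collection will not fully meet). The function names and the example frequencies are hypothetical.

```python
from itertools import combinations


def probability_of_identity(freqs):
    """Chance that two unrelated individuals share a genotype at one locus.

    Assumes Hardy-Weinberg proportions (an illustrative assumption).
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Both individuals homozygous for the same allele.
    homozygous = sum(p ** 4 for p in freqs)
    # Both individuals carrying the same heterozygous genotype.
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygous + heterozygous


def combined_probability(per_locus_values):
    """Multiply per-locus values, which is only valid for unlinked loci."""
    result = 1.0
    for value in per_locus_values:
        result *= value
    return result


# Ten alleles at 10% each versus one dominant allele (hypothetical data).
even = [0.1] * 10
skewed = [0.91] + [0.01] * 9

pi_even = probability_of_identity(even)
pi_skewed = probability_of_identity(skewed)
print(f"even frequencies:   {pi_even:.4f}")
print(f"skewed frequencies: {pi_skewed:.4f}")
print(f"five unlinked even loci: {combined_probability([pi_even] * 5):.2e}")
```

Under these assumptions an evenly distributed locus leaves far fewer chance matches than a skewed one, which is why fewer primer pairs are needed when allele frequencies are balanced and the loci are unlinked.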
In the future, therefore, it is probable that simple locus-specific systems will be used such as CAPS, SCAR-based polymorphisms and Single Strand Conformational Polymorphism (SSCP). Such systems will prove to be more useful to the germplasm managers in the long run. Of these, CAPS is the simplest and easiest to use. It is based on the presence or absence of a restriction site at a site of sequence polymorphism (e.g. a SNP) and is codominantly inherited (heterozygotes can be distinguished). In this case, the presence or absence of the restriction site defines the allelic states, whereas in SCAR-based systems this is defined by the presence or absence of an amplicon. The most efficient strategy to use such a two or three allelic system for the identification of genotypes is to select markers that progressively subdivide the total gene pool into groups of approximately equal size. Thus, for a two allele system (SCAR), one marker has two possible groups (band present or absent). Two markers generate four groups (++, +-, -+ and --), i.e. 2^2. It follows that for ten bands, 2^10 categories are created (1024) and for 20 markers, 2^20 possible allele combinations are produced (= 1 048 576). Thus, even when only two allelic states are possible, 20 to 30 bands should be sufficient for diagnostic purposes.
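The doubling argument can be checked with a few lines of arithmetic. This is a minimal sketch of the ideal case only, assuming every marker splits the gene pool into equal-sized groups and that markers are independent; the function names are hypothetical.

```python
def distinguishable_classes(n_markers, states_per_marker=2):
    """Number of marker profiles possible with n independent markers."""
    return states_per_marker ** n_markers


def markers_needed(n_genotypes, states_per_marker=2):
    """Smallest number of markers whose profiles can label every genotype.

    Ideal case: each marker divides the collection into equal groups.
    """
    n = 0
    while states_per_marker ** n < n_genotypes:
        n += 1
    return n


print(distinguishable_classes(2))    # the four groups ++, +-, -+ and --
print(distinguishable_classes(10))
print(distinguishable_classes(20))
print(markers_needed(5000))          # hypothetical collection size
```

In practice markers are rarely this evenly distributed, which is consistent with the text recommending 20 to 30 bands rather than the theoretical minimum.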
Genetic diversity analysis
There are two main applications for genetic diversity analysis in germplasm management. Firstly, in establishing the composition of a core collection, a subset of accessions which will give a good representation of the genetic diversity present in the whole collection. Secondly, genetic distance analysis of the entire collection, or a large portion of it, can identify genetic groups that are under-represented. This type of information can enable priorities to be established for subsequent germplasm collecting expeditions.
A key factor determining the accuracy of genetic distance analysis is the avoidance of bias in sampling strategy (if a subset is used). It is also important that the analysis is not based on a small coverage of the genome, particularly if some of the loci used to generate polymorphisms are tightly linked to genes under strong selection pressure. For this reason, multi-locus techniques such as AFLP and ISSR are probably of most value. Of these, the latter is most suited for use in a modestly equipped laboratory.
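As an illustration of how multi-locus band profiles feed into a genetic distance analysis, the sketch below scores each clone as a list of 1/0 values (band present or absent) and computes the Jaccard similarity between two profiles. The choice of Jaccard is an assumption made for illustration, since the text does not specify a coefficient, and the clone names and band scores are invented.

```python
def jaccard_similarity(profile_a, profile_b):
    """Shared bands divided by bands present in at least one profile."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same band positions")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    if either == 0:
        raise ValueError("similarity is undefined when neither profile has bands")
    return shared / either


def jaccard_distance(profile_a, profile_b):
    """Distance on a 0 to 1 scale, where 0 means identical band profiles."""
    return 1.0 - jaccard_similarity(profile_a, profile_b)


# Hypothetical presence/absence scores at six band positions.
profiles = {
    "clone_A": [1, 1, 0, 1, 0, 1],
    "clone_B": [1, 0, 0, 1, 1, 1],
}

print(jaccard_similarity(profiles["clone_A"], profiles["clone_B"]))
print(jaccard_distance(profiles["clone_A"], profiles["clone_B"]))
```

A full analysis would compute this for every pair of accessions and cluster the resulting distance matrix; shared absences are ignored here because the absence of a band in two clones need not have the same cause.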
Clone evaluation
The relationship between genotype and agronomic performance is still poorly characterised in most crops, including cocoa. Ultimately, however, germplasm managers and breeders will desire knowledge of the identity of genes that are important in controlling agronomically important traits and also of the allelic status of the plant material that is available for improvement programmes. In this way, the breeders will possess some ability to predict the likely performance of the clone and its offspring in relation to the targeted trait. The selection of the most appropriate strategy for identifying and characterising genes involved with the control of important traits depends on the number of genes involved and whether or not their identity is already known.
Where the gene has not been identified, the most commonly used strategy is to use allelic polymorphisms in a tightly linked marker. This has value for predicting the performance of progeny from a characterised parent but is less useful across a wide range of germplasm. This is partly because frequent recombination over many generations breaks up any relationship between marker and gene alleles, but also because alleles on the marker tend to evolve at different rates to alleles on the linked targeted gene. A more direct approach can be used in cases where the identity of the gene is known. Here, the breeder may use a marker located within or very close to the gene (within 1 kb). This overcomes any problems associated with recombination but not with differential rates of evolution. A second, more robust strategy is to develop SNP-based markers that directly distinguish between alleles of the targeted gene.
The needs of breeders
Marker-assisted breeding
Marker assisted selection has played a significant role in the genetic improvement of crops with well-characterised genomes but has yet to play a significant role in the improvement of cocoa. There are essentially three states of knowledge that dictate the approach used.
First and most simply, is the case where the identity and DNA sequence of the gene is known. In these instances, exploitation of the differences in DNA sequence between allelic forms of the gene can allow the development of molecular markers that distinguish between alleles. For breeding purposes, it is most desirable to use technically simple systems that are reliable and capable of high throughput. There are several possible candidate methodologies. These include variation in intron size (SCAR primers in flanking coding regions), CAPS analysis (restriction site on a SNP that distinguishes between alleles), SCAR analysis (3' end of primer(s) sited on a SNP or insertion/deletion (indel)) and heteroduplex analysis (sequence polymorphisms between alleles giving rise to an extra, heteroduplex band).
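The CAPS principle can be sketched in code: a SNP that creates or destroys a restriction site changes the fragment sizes obtained when the amplicon is digested. The example below is a simplified, single-strand illustration. The two allele sequences are invented and far shorter than a real amplicon; the enzyme used is EcoRI, which recognises GAATTC and cuts after the first G.

```python
def digest(amplicon, site, cut_offset):
    """Return fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the number of bases into the recognition site at which
    the enzyme cuts (1 for EcoRI: G^AATTC). Only one strand is modelled.
    """
    amplicon = amplicon.upper()
    site = site.upper()
    cut_positions = []
    start = 0
    while True:
        index = amplicon.find(site, start)
        if index == -1:
            break
        cut_positions.append(index + cut_offset)
        start = index + 1
    bounds = [0] + cut_positions + [len(amplicon)]
    return [right - left for left, right in zip(bounds, bounds[1:]) if right > left]


# Hypothetical alleles differing by a single A/G SNP inside the site.
allele_cut = "ATGCCGTA" + "GAATTC" + "GGCTAAGC"
allele_uncut = "ATGCCGTA" + "GAGTTC" + "GGCTAAGC"

print(digest(allele_cut, "GAATTC", 1))    # two fragments
print(digest(allele_uncut, "GAATTC", 1))  # one intact fragment
```

A heterozygote would show all three bands (the intact product plus the two digestion fragments), which is what makes the marker codominant.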
A more common scenario is where the breeder knows that a trait is controlled by a single gene but does not know the identity of the gene responsible. In these instances, one option open to the breeder is to identify markers that are linked to the gene and use these to assist selection. This can be achieved relatively simply by 'bulked segregant analysis'. A segregating mapping population is divided into two groups based on phenotype. DNA is extracted from all individuals in the population. DNA from plants representing each phenotype is pooled such that there are two pools (one for each allelic state). These pooled DNA samples are then used as a template for a multi-locus marker system such as AFLP or ISSR. Markers that are tightly linked to the gene and polymorphic between the parents will generate bands that are present in one pool and yet absent in the other. These candidate markers are then tested on the original segregating population and those closest to the gene can be used for marker-assisted selection. This approach, although reliable, can have limited generic value since the markers may not be polymorphic in progenies derived from other parents. A more predictive approach is to use information on the behaviour of genes in a model organism as the basis for an informed guesswork approach to infer the possible identity of the gene. This is called the candidate gene approach. Primers specific to the candidate gene are used to amplify all or part of the gene from the parents by PCR and the products are subjected to sequence analysis (to identify sequence differences between the parental forms of the gene). In turn, these differences are applied to the segregating population to test for co-segregation with the variation observed in the phenotype. Transformation experiments can be used to confirm the identity of the gene.
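The two steps of bulked segregant analysis described above can be sketched as follows: first find bands present in one pool but absent from the other, then test a candidate band on individual plants. This is a minimal illustration only. Band sizes, pool labels and plant scores are invented, and real scoring would have to allow for faint or ambiguous bands.

```python
def candidate_markers(pool_with_trait, pool_without_trait):
    """Bands present in exactly one of the two DNA pools.

    Each pool is a set of band identifiers (here, fragment sizes in bp).
    Returns a sorted list of (band, pool_label) tuples.
    """
    with_trait = set(pool_with_trait)
    without_trait = set(pool_without_trait)
    candidates = [(band, "with_trait") for band in with_trait - without_trait]
    candidates += [(band, "without_trait") for band in without_trait - with_trait]
    return sorted(candidates)


def mismatch_fraction(band_present, has_trait):
    """Fraction of plants in which band and phenotype disagree.

    A low value suggests tight linkage between the band and the gene.
    """
    if len(band_present) != len(has_trait) or not band_present:
        raise ValueError("need one band score and one phenotype per plant")
    mismatches = sum(
        1 for band, trait in zip(band_present, has_trait) if bool(band) != bool(trait)
    )
    return mismatches / len(band_present)


# Hypothetical band sizes (bp) scored in each pool.
pool_a = {112, 148, 203, 251, 310}
pool_b = {112, 148, 251, 310, 402}
print(candidate_markers(pool_a, pool_b))

# Hypothetical scores for the 203 bp band in eight individual plants.
band_203 = [1, 1, 1, 0, 0, 0, 1, 0]
phenotype = [1, 1, 1, 0, 0, 0, 0, 0]
print(mismatch_fraction(band_203, phenotype))
```

Candidate bands with the lowest mismatch fraction in the full segregating population would be the ones carried forward for marker-assisted selection.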
The final scenario is where there are several unknown genes controlling the trait of interest. Under these circumstances, Quantitative Trait Loci (QTLs) analysis can be applied to mapped populations that segregate for the trait of interest. Associations are sought between the alleles of markers flanking a gene or gene cluster that is important in controlling the trait. The validity of the strategy depends firstly on the population size being sufficiently large to avoid recombination bias (typically populations of over 200 plants are desirable). Secondly, the map needs to contain a large number of markers evenly distributed over the entire genome such that there are no regions with poor coverage. Finally, it is important that the progeny segregates widely for the trait of interest.
There have been several studies that describe QTLs associated with traits of interest in Theobroma (Crouzillat et al. 1996, 2000a and 2000b), although the markers generated have yet to be applied as part of a breeding programme. The prospect of progressing from the identification of a QTL to the isolation of the gene(s) responsible by map-based cloning is an attractive one but has been achieved so far just once in tomato (Frary et al. 2000). In practical terms, such an approach is costly in time and resources, and requires very large populations to accumulate a suitable number of recombinant individuals (typically in excess of 1000). It is therefore unlikely to have practical value for the routine isolation of agronomically important genes.
QTL analysis has far greater potential for the identification of markers for Marker Assisted Breeding. This requires that the QTL identified is stable over years and sites and that the markers used are closely linked to the QTL (generally less than 2 cM). Ultimately, QTL analysis may equally have potential for the selection of two parents with complementary genetic architectures. Again, cognisance should be taken of the stability of the complementary QTLs over sites and years. Perhaps one of the most powerful applications of QTL analysis, however, will be realised when it is combined with genomics information. For instance, the map position of a candidate gene overlaid onto a QTL analysis of an established mapping population could be used to test whether the gene is responsible for any of the observed variation in a targeted trait.
In the immediate future, there is a clear need to improve the resolution and accuracy of QTL analysis in Theobroma. This requires the replication of large mapping progenies (more than 200 plants) across several geographically dispersed sites and the repetition of phenotypic scoring over several years. For the medium term, Sequence Tagged Sites (STS) anchor points need to be established to enable comparisons with maps of model and intermediate species (synteny mapping). In the medium term, the exploitation of linkage disequilibrium (LD), to identify candidate genes, also has potential utility. LD refers to correlations among neighbouring alleles, reflecting 'haplotypes' descended from single, ancestral chromosomes. However, work on the human genome suggests that different regional groups may exhibit differing patterns of LD associations. Reich et al. (2001) conducted a large-scale experiment using a uniform protocol to examine 19 randomly selected regions in the human genome. LD in a United States population of north-European descent typically extended 60 kb from common alleles, implying that LD mapping is likely to be practical in this population. By contrast, LD in a Nigerian population extended markedly less far. The result was used to suggest that LD in northern Europeans is shaped by a marked demographic event about 27,000-53,000 years ago.
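The "correlation among neighbouring alleles" that defines LD is commonly quantified with the coefficient D and its normalised form r^2. The sketch below uses these standard definitions for two biallelic loci; they are not taken from this paper, and the frequencies in the example are invented.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d, (d * d) / denominator


# Hypothetical case: the A-B haplotype is more common than chance predicts.
d, r_squared = linkage_disequilibrium(p_ab=0.40, p_a=0.5, p_b=0.5)
print(f"D = {d:.3f}, r^2 = {r_squared:.2f}")

# Independent loci give D = 0 because p_ab equals p_a * p_b.
print(linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5))
```

Plotting r^2 against the physical distance between marker pairs is one common way of estimating how far LD extends in a given population.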
The future of Theobroma genomics and post-genomics
The growing volume of DNA sequence information relating to the genomes of model species offers exciting opportunities for the rapid identification of some of the genes involved in key processes. The complete genome of one plant species (Arabidopsis thaliana) has now been sequenced and there are now many plant species for which sequences of expressed genes (Expressed Sequence Tags, EST) are available. The function of many of these genes is already known and the functions of many more will be inferred over the coming years. There is a real opportunity to use important genes from the model species in order to find their equivalent versions in cocoa. This is a moderately challenging process but offers great long-term rewards to the breeder. There are several parts to the process of achieving this aim.
First, the DNA code for one gene can be very similar to that of another, related gene. It is consequently important when developing markers that are specific only to the target gene that only the part of the code that is unique to it is used to produce the marker. Reaching this goal starts with the comparison of regions of the DNA sequence of the target gene from several species with those of related genes. Gene-specific parts of the code are used to produce primers that bind only to these diagnostic parts of the gene code. The primers are then used to amplify the equivalent piece of DNA from cocoa. This can be achieved either directly using genomic DNA from cocoa by PCR or using messenger RNA (mRNA) by reverse transcriptase-PCR (RT-PCR). Full-length coding sequences of the cocoa version of the gene can then be generated by a number of approaches (e.g. 5', 3' rapid amplification of cDNA ends (RACE)). Sequences of the candidate gene from the target species (Theobroma) can then be compared with that from the model organisms. Proof that a candidate is a homologue of one of known function in a model group should not be based entirely on sequence similarity but rests on supportive evidence such as complementation experiments and association/mapping studies. Thus, in order to fully exploit these resources it may be necessary to map the Theobroma genome to establish the extent of synteny (shared gene order) between cocoa and the model crop species. This would require collections of around 200-plant progenies at replicated sites to be screened for anchor points to map against the model species. Information generated from anchor points would allow comparative maps to be established.
The choice of marker system for such anchor points and for future mapping and genomics-based applications is open to some question. SSR-PCR is the method generally favoured for the generation of anchor points for most groups within a genus. However, since DNA sequences from distantly related species are more disparate, the likelihood that the PCR will fail increases when attempting to compare plants from different genera or families. For this reason, markers associated with the more conserved (less variable) DNA sequences within genes probably offer better opportunity for anchor points for maps that span large taxonomic distances.
As more information on candidate gene sequences and their functions becomes available, there is also scope to use such data for the development of markers corresponding to the genes themselves rather than the current reliance on neutral markers from non-coding regions. The targeting of SNPs, indels or SSRs within genes for the production of new markers has attractions far beyond testing the role of candidate genes. They have potential also for use as linked markers in their own right.
There are around 25 000 genes in the best-characterised plant species, Arabidopsis thaliana (Kaul et al. 2000). In higher organisms, genes are distributed unevenly across the genome so that groups of genes will frequently cluster together into the same, very small region of a chromosome. Most of the chromosome is almost invariably composed of DNA that does not code for any gene. By definition then, a marker from one gene within such a cluster is likely to fall close to another within the same cluster. 'Neutral' markers from methods such as AFLP, ISSR or SSR-PCR may be positioned close to or distant from regions containing a gene or gene cluster. Thus, as markers for linkage maps, markers taken directly from the genes themselves should perform no worse and probably better than neutral markers.
There are advantages too in the nature of the markers generated directly from gene sequences. Small changes in sequence between genotypes (e.g. SNPs) are most usefully converted into CAPS markers or indels into SCAR markers. This type of marker produces data of a simple presence/absence type that is both easily scored and amenable for entry into databases. It is conceivable that future research efforts for the generation of markers may place increased emphasis on the generation of markers from within the genes themselves.
The production of gene code sequence data (either from EST or genomic sources) also allows the detailed study of how these genes are expressed and interact within the trees themselves during key developmental stages or in response to infection. This is called transcriptomics. The effective use of transcriptomic approaches also allows for the inference of gene function and the further characterisation of candidate genes. In the early stages, where sequence data is largely missing, Serial Analysis of Gene Expression (SAGE) perhaps offers the most appropriate method of measuring gene expression as it does not depend on prior knowledge of gene code. Data generated from this approach, however, can subsequently be applied for the isolation of full-length code sequence of active genes. This can, in turn, be used to assemble a collection of cocoa genes on micro-arrays. Micro-arrays offer the most effective method for studying the expression of these genes on a large scale. This will ultimately lead to a more comprehensive identification of the gene cascades involved in key agronomic processes.
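As a rough illustration of how SAGE data can be compared without prior sequence knowledge, the sketch below counts short sequence tags in two libraries and reports the tags whose relative frequency has at least doubled. The tag sequences, the library contents and the two-fold threshold are all assumptions made for the example; a real analysis would use far larger libraries and a proper statistical test.

```python
from collections import Counter

# Hypothetical 10-bp SAGE tags from two small libraries (illustration only).
healthy = ["AAACCCGGGT"] * 2 + ["TTTGGGCCCA"] * 6 + ["GATTACAGAT"] * 2
infected = ["AAACCCGGGT"] * 6 + ["TTTGGGCCCA"] * 3 + ["GATTACAGAT"] * 1


def tag_frequencies(tags):
    """Return each tag's share of the total tag count in one library."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def fold_changes(control, treated):
    """Ratio of treated to control frequency, for tags present in both libraries."""
    f_control = tag_frequencies(control)
    f_treated = tag_frequencies(treated)
    shared = f_control.keys() & f_treated.keys()
    return {tag: f_treated[tag] / f_control[tag] for tag in shared}


changes = fold_changes(healthy, infected)
# Tags at least twice as frequent after infection (arbitrary threshold).
induced = sorted(tag for tag, ratio in changes.items() if ratio >= 2.0)
print(induced)
```

Tags flagged in this way would then be the natural starting point for isolating the corresponding full-length sequences.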
The role of genetic modification
Genetic Modification (GM) technology has considerable potential for the cocoa breeder and geneticist in two ways:
as an experimental tool to confirm the function of candidate genes,
for the correction of inherent genetic flaws in the crop by the insertion of novel transgene constructs.
Despite the strength of public opinion against GM technology in some parts of the world, the number of GM crop field trials continues to increase. The OECD database also shows a continual rise in the number of genes available. In 1999, the commercial cultivation of GM crops across the globe included over 20 Genetically Modified (GM) cultivars on about 40 Mha (http://nbiap.biochem.vt.edu; corrected on 22 June 2000). Collectively, these cultivars contain 31 transgenes and the trend is towards proliferation of GM crops and an increase in the type and number of transgenes they contain. For example, the US National Biological Impacts Assessment database (James and Krattiger 1999) lists approved releases for the USA alone of some 60 GM crop species containing about 300 transgenes. Advances in gene isolation technology, in the control of transgene expression and the advent of 'gene shuffling' technology will combine to increase the number of crop-cultivar combinations still further.
The development of efficient transformation systems for cocoa (Perry et al. 2000) offers considerable potential benefits for the crop over the medium to long term. GM technology could lead to a durable solution for many of the important pest and disease problems faced by the crop since resistance to new pathogen strains could be incorporated into existing varieties, as and when required, perhaps more quickly using GM than by using conventional techniques. Consumer resistance to the use of the technology is currently preventing emphasis being placed on research into the generation and commercial cultivation of GM cocoa lines. In the medium term, it may be argued that gradual public acceptance of the technology in other crops through repeated exposure to GM material will allow its ultimate introduction and use in cocoa.
However, it would be very important that the appropriate risk assessment studies on GM cocoa be conducted well in advance of the release of any GM cocoa varieties. The commercial release of any crop involves three stages: (1) small-scale field experiments; (2) replicated field trials; and (3) registration or listing on a national or regional list of approved cultivars. Detailed evaluation of the environmental risks posed by the GM line is required at each stage of this process. This information comprises two parts: one that relates specifically to the transgene and another that relates more generically to the crop itself. In a global context, most regulatory restrictions address risks posed to human health (specific to the transgene) and those posed to the environment (determined by the transgene and the biology of the crop). For most crops, the greatest environmental risks are often seen as the possibility of transgene movement into wild relatives, non-GM crops or possibly other organisms by horizontal gene transfer, and the effect of the GM crop on non-target species. Often overlooked, however, are possible effects arising from changes to farm practice that are evoked by the technology and the possibility that the crop itself may become invasive of natural or semi-natural habitats. I will briefly explore each of these in relation to the possible commercial release of GM Theobroma cacao.
Gene flow
The wide range of transgenes available, coupled with historic records of hybridisation events between crops and wild relatives (Ellstrand et al. 1999), makes some pollen-mediated transgene movement inevitable over the medium to long term. This is not new and has occurred from non-GM cultivars (Scheffler and Dale 1994; Stace 1997; Ellstrand et al. 1999), although in a small number of cases, transgenes will enhance the fitness of certain recipient wild relatives and could change their ecology. This may, in turn, affect the communities in which they live. Such an environmental consequence is unlikely to be of great significance on a global scale for most crops, although over a limited range ecological consequences could be significant, depending on the ecological importance of the recipient species.
Theobroma is endemic to South and Central America and so the risk of pollen-mediated gene flow from the crop to wild relatives is limited to this area. On the other hand, the possibility of transgene movement to stands or plantations of non-GM Theobroma is not discounted by this argument. The ecological effects of such movement are unlikely to be significant, although there may be economic consequences should the recipient stand be organic or supplying seed for a market requiring GM-free seed. This is particularly germane given that cocoa is cultivated for its seeds. The probability of horizontal gene movement to soil or water-borne micro-organisms is uncharacterised but likely to be negligible.
Changed farming practice
The nature of the construct used and the crop into which it is placed are the chief factors that determine this aspect of risk assessment. The introduction of herbicide tolerance into temperate annual crops, for instance, is likely to change the pattern and extent of herbicide application. For Theobroma, it is probable that transgenes that enhance resistance to disease or insect attack will be the first to be introduced. It is difficult to anticipate effects of such genes on the pattern and mode of cocoa cultivation.
Direct and indirect effects on non-target species
Secondary effects, including the impact of toxins on non-target organisms (pollinators, herbivores etc.) or their accumulation at higher trophic levels, have recently attracted a great deal of interest and are currently the subject of significant amounts of research activity in other crops. Here too, data on the identity of those organisms most likely to be affected by the use of selectively toxic GM lines (e.g. lines carrying the Bt gene) are lacking for cocoa.

Conclusions
New technologies have the capacity to impact on the cocoa industry in many ways, although the greatest benefits will probably accrue to curators of germplasm collections and breeders. Managers of germplasm or quarantine facilities face the perpetual problem of mislabelling, erroneous documentation and transcription errors leading to the misidentification of clones held. The problem is compounded by the fact that homozygous, self-compatible trees generate near-identical offspring and colour mutants of pods or seeds are relatively common. In the immediate future, a service-based system of clone assignment or reassignment is to be introduced in which SSR-PCR will be used as the molecular tool for diagnosis. This will provide a snapshot that will enable many of the current errors to be corrected. However, it will not allow verification of all replicate clones in collections or the correction of on-going mistakes arising from the introduction of fresh material or errors during collection maintenance. This type of problem dictates the use of in-house facilities. Requirements for infrastructure and detection systems appropriate for high-resolution electrophoresis make it difficult for SSR-PCR to be adopted routinely by local end users. Furthermore, the need for standardisation of label usage, control samples and allele assignment, problems with distinguishing nulls from heterozygotes and data entry dictate that end users possess moderately good facilities and molecular expertise on site. In the longer term, therefore, there is a requirement for a simpler, more robust system of identification. The most plausible system of those currently possible is one based on the exploitation of SNPs or CAPS. This approach can be applied using simple agarose electrophoretic apparatus and generates unequivocal data sets that are amenable to database entry and completely transferable.
The adoption of such a system in the medium term would provide the curator with the ability to routinely cull the collection of mislabelled samples and in some cases, even to assign paternity to poorly documented samples.
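The two curatorial tasks mentioned here, detecting mislabelled trees and excluding candidate fathers, can be sketched as simple comparisons of multilocus genotypes. The example assumes codominant, SNP-type markers scored as two alleles per locus, with no null alleles or scoring errors; the locus names, genotypes and function names are hypothetical.

```python
# Illustrative sketch only: loci, genotypes and function names are hypothetical.
# Each genotype maps a locus name to a pair of alleles (codominant scoring).


def same_clone(reference, sample):
    """True if the sample matches the reference at every locus scored in both."""
    shared = reference.keys() & sample.keys()
    if not shared:
        return False  # nothing to compare, so identity cannot be confirmed
    return all(sorted(reference[locus]) == sorted(sample[locus]) for locus in shared)


def compatible_father(offspring, mother, candidate):
    """True if, at every locus, one offspring allele can come from the mother
    and the other from the candidate father (simple exclusion analysis)."""
    for locus, (a, b) in offspring.items():
        m, f = mother[locus], candidate[locus]
        if not ((a in m and b in f) or (b in m and a in f)):
            return False
    return True


reference = {"locus1": ("A", "G"), "locus2": ("C", "C")}
field_tree = {"locus1": ("G", "A"), "locus2": ("C", "C")}  # same genotype
suspect = {"locus1": ("A", "A"), "locus2": ("C", "C")}  # differs at locus1

offspring = {"locus1": ("A", "G"), "locus2": ("C", "T")}
mother = {"locus1": ("A", "A"), "locus2": ("C", "C")}
father_1 = {"locus1": ("G", "G"), "locus2": ("C", "T")}
father_2 = {"locus1": ("A", "A"), "locus2": ("T", "T")}

print(same_clone(reference, field_tree), same_clone(reference, suspect))
print(compatible_father(offspring, mother, father_1),
      compatible_father(offspring, mother, father_2))
```

A match at a handful of loci does not prove identity, so in practice the number of loci would need to be large enough to separate closely related clones.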
All germplasm collections are dynamic entities, with new material constantly being introduced and clones lost through age or disease. Given that all facilities have finite size and tend to operate at or near capacity, curators have the responsibility to prioritise the importance of clones within the collection. In this way, unimportant clones can be discarded to make way for new, valuable material. In the absence of comprehensive evaluation data for incoming and existing material, this decision must be based on the relative similarity of clones to others held in a collection. It is more important to retain clones that are genetically distinct than clones that bear a close similarity to others already held. The use of genetic distance analysis is the simplest approach to allow this form of prioritisation. In the short term, agarose-based ISSR-PCR or double-anchored ISSR-PCR (Charters 2001) probably represent the simplest and cheapest of the robust approaches for this to be achieved on site. AFLP (Flament et al. 2001) or ISSR using low temperature precast PAGE gels (Charters et al. 1996a) generate the most comprehensive datasets for analyses conducted in moderately well equipped molecular facilities.
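The prioritisation described above can be sketched as follows. The example assumes band presence/absence scores of the kind produced by ISSR-PCR or AFLP and uses the Jaccard distance, which is one common choice for dominant markers rather than a method prescribed here. The clone names and band scores are hypothetical.

```python
from itertools import combinations

# Hypothetical band scores: 1 = band present, 0 = band absent (illustration only).
profiles = {
    "clone_A": [1, 1, 0, 1, 0, 1],
    "clone_B": [1, 1, 0, 1, 0, 0],
    "clone_C": [0, 0, 1, 0, 1, 1],
}


def jaccard_distance(x, y):
    """1 minus (bands shared by both) / (bands present in at least one)."""
    shared = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return 0.0 if either == 0 else 1.0 - shared / either


def nearest_neighbour_distances(profiles):
    """For each clone, the distance to its most similar clone in the collection."""
    nearest = {name: float("inf") for name in profiles}
    for a, b in combinations(profiles, 2):
        d = jaccard_distance(profiles[a], profiles[b])
        nearest[a] = min(nearest[a], d)
        nearest[b] = min(nearest[b], d)
    return nearest


nearest = nearest_neighbour_distances(profiles)
# Clones listed first are the most redundant; those listed last are the most distinct.
ranking = sorted(nearest.items(), key=lambda item: item[1])
for name, distance in ranking:
    print(name, round(distance, 3))
```

Under these assumptions, a clone with a small nearest-neighbour distance is a candidate for removal, whereas a clone far from all others would be retained.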
The ultimate goal of a curator is to provide comprehensive evaluation information that allows breeders to make effective use of material held. New technologies are still some way off making significant advances in this direction. Nevertheless, the identification of candidate genes for certain traits by sequence searches or by functional transcriptomics (e.g. SAGE), coupled with allele screening and the development of CAPS systems to distinguish alleles of a desirable gene, may allow progress for some targeted traits in the medium term. The same approach, together with QTL analysis, may assist breeders to select parental material. Perhaps the greatest value of QTL analysis in the short to medium term lies in the identification of markers for marker-assisted selection. In the longer term and as more key genes are identified, a gradual movement will be possible towards the use of gene-specific, SNP-based markers. Transcriptomics, QTL analysis, shared ancestry analysis and allele association studies will play an important role in identifying such genes.
The future of GM technology is perhaps the most difficult of all to predict as its implementation is dependent upon economic, social, environmental and political factors. The potential benefits of the technology in terms of introducing generic or targeted resistance to pests and disease are not in doubt, given the resources deployed in other crops to develop appropriate transgene constructs. In the medium term, growth of GM cocoa would almost certainly require a significant movement in public opinion, a greater threat to continued supply such as a major outbreak of a significant disease or pest, or both.
References
Anonymous. 1999. ICCO Quarterly Bulletin of Cocoa Statistics, 25(4). 1998/99.
Anonymous. 2000. ICCO Quarterly Bulletin of Cocoa Statistics, 26(4). 1999/00.
Charters Y.M. 2001. Molecular tools for the management of Theobroma cacao germplasm. Ph.D. thesis, University of Reading, United Kingdom.
Charters Y.M., A. Robertson, M.J. Wilkinson, G. Ramsay. 1996a. PCR analysis of oilseed rape (Brassica napus L. ssp. oleifera) using 5'-anchored simple sequence repeat (SSR) primers. Theor. and Appl. Genet. 92: 442-447.
Charters Y.M., A. Culham, M. End, P. Hadley, M.J. Wilkinson. 1996b. The potential of anchored microsatellite analysis for cocoa germplasm characterization. Pages 301-304 in Proceedings of the 12th International Cocoa Research Conference, November 17-23, 1996, Salvador, Bahia, Brazil.
Crouzillat D., E. Lerceteau, V. Petiard, J. Morera, H. Rodriguez, D. Walker, W. Phillips, C. Ronning, R. Schnell, J. Osei, P. Fritz. 1996. Theobroma cacao L.: A genetic linkage map and quantitative trait loci analysis. Theor. and Appl. Genet. 93: 205-214.
Crouzillat D., B. Menard, A. Mora, W. Phillips, V. Petiard. 2000a. Quantitative trait analysis in Theobroma cacao using molecular markers - Yield QTL detection and stability over 15 years. Euphytica 114: 13-23.
Crouzillat D., W. Phillips, P.J. Fritz, V. Petiard. 2000b. Quantitative trait loci analysis in Theobroma cacao using molecular markers. Inheritance of polygenic resistance to Phytophthora palmivora in two related cacao populations. Euphytica 114: 25-36.
Dyson T. 1996. Population and Food. Routledge, London, United Kingdom.
Ellstrand N.C., H.C. Prentice, J.F. Hancock. 1999. Gene flow and introgression from domesticated plants into their wild relatives. Ann. Rev. Ecol. System. 30: 539-563.
Flament M.H., I. Kebe, D. Clement, I. Pieretti, A.M. Risterucci, J.A.K. N'Goran, C. Cilas, D. Despreaux, C. Lanaud. 2001. Genetic mapping of resistance factors to Phytophthora palmivora in cocoa. Genome 44: 79-85.
Frary A., T.C. Nesbitt, S. Grandillo, E. van der Knaap, B. Cong, J.P. Liu, J. Meller, R. Elber, K.B. Alpert, S.D. Tanksley. 2000. A quantitative trait locus key to the evolution of tomato fruit size. Science 289: 85-88.
Gilmour M. 1995. The BCCCA ringtest on the RAPD analysis of cocoa. Pages 135-138 in Proceedings of the International Workshop on Cocoa Breeding Strategies, Kuala Lumpur, Malaysia. October 19-20, 1994. INGENIC, United Kingdom.
James C. and A. Krattiger. 1999. The role of the private sector: in Biotechnology for developing-country Agriculture: Problems and Opportunities (Persley, G.J., ed.) - 2020 Vision Focus 2, Brief 4 of 10. IFPRI, Washington.
Kaul S., H.L. Koo, J. Jenkins et al. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815.
Lanaud C., O. Sounigo, Y.K. Amefia, D. Paulin, P. Lachenaud, D. Clement. 1987. New data on the mechanisms of incompatibility in cocoa and its consequences on breeding. Cafe Cacao The 31: 267-277.
Levinson G. and G.A. Gutman. 1987. High-frequencies of short frameshifts in poly-CA/TG tandem repeats borne by bacteriophage M13 in Escherichia coli K-12. Nucleic Acids Research 15: 5323-5338. (not yet complete)