Originally published : Thu, September 19, 2024 @ 4:08 PM
By Austin Sheppe PhD, LGC Biosearch Technologies with contributions by Dusty Vyas, Senior Scientist LGC Biosearch Technologies.
Genomic tools are being developed at a rapid pace, and it can be time-consuming to keep up with all the different technologies and their various pros and cons. Depending on the molecular breeding approach of interest, some genotyping techniques are more powerful than others. Scientists must therefore weigh the likelihood of a genotyping approach succeeding against the cost of obtaining the desired outcome. As such, we are often asked: what is the right technology for trait mapping?
Trait mapping
Trait mapping experiments need two major things:
- Consistent phenotypes that are measurable and heritable.
- Genotyping techniques that generate sequencing data representative of the diversity present in a population.
From corn, to raccoons, to pine trees, each genome presents challenges to both needs: phenotypes and genotypes. But how do you know which genotyping technique is best suited to your project? It depends on a variety of factors, including genome size, the amount of repetitive sequence, and ploidy level, but price and effectiveness clearly play the biggest roles. When it comes to population size, several hundred samples are often used, which requires the genotyping technique to capture a lot of genetic diversity at a modest cost.
Overview of trait mapping workflows
Whole genome sequencing
Whole genome sequencing (WGS) covers a lot of the genome and, as the cost of sequencing decreases, it can be a great solution. But if the genome is large, sequencing costs can outweigh the benefits of looking at the entire genome. Repetitive regions are overrepresented and coding regions underrepresented. For polyploids, each read comes from a different copy of the genome, which effectively increases the amount of sequencing needed and therefore the cost. For these reasons, targeted or reduced representations of the genome can be preferable.
- Pros: SNP discovery across entire genome, small sample volume support
- Cons: Accuracy at low depth, read bias towards repetitive elements, cost per sample
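The scaling argument for WGS can be made concrete with a rough back-of-the-envelope calculation. The sketch below is illustrative only: the function names, the 150 bp read length, the genome sizes and the assumption that sequencing demand scales linearly with ploidy (so that each genome copy reaches the target depth) are simplifications introduced here, not figures from any specific project.

```python
import math


def wgs_bases_required(haploid_genome_bp: int, depth_per_copy: float, ploidy: int = 2) -> float:
    """Raw bases needed so that each genome copy averages depth_per_copy.

    Assumption: demand scales linearly with ploidy. Real projects also lose
    reads to duplicates, adapters and unmappable repeats.
    """
    return haploid_genome_bp * depth_per_copy * ploidy


def reads_required(total_bases: float, read_length_bp: int = 150) -> int:
    """Number of reads of a given length needed to deliver total_bases."""
    return math.ceil(total_bases / read_length_bp)


if __name__ == "__main__":
    # Hypothetical genomes, chosen only to show the scaling.
    examples = {
        "small diploid (500 Mbp)": (500_000_000, 2),
        "large diploid (2.5 Gbp)": (2_500_000_000, 2),
        "large tetraploid (2.5 Gbp)": (2_500_000_000, 4),
    }
    for label, (size_bp, ploidy) in examples.items():
        bases = wgs_bases_required(size_bp, depth_per_copy=5, ploidy=ploidy)
        print(f"{label}: {bases / 1e9:.0f} Gbp raw data, {reads_required(bases):,} reads")
```

Under these assumptions the sequencing load per sample grows in direct proportion to both genome size and ploidy, which is why a several-hundred-sample population quickly becomes expensive for large or polyploid genomes.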
nGBS/ddRAD
nGBS, ddRAD and Diversity Arrays Technology (DArT) techniques use restriction enzyme digestion (a single enzyme, or two in the case of ddRAD) to effectively skew the data towards regions of interest. Producing anywhere from 1-5,000 SNPs, they can be a great tool for groups that do not need 100,000s of SNPs to capture the population's genetic diversity. Species such as corn and potato, where heterozygosity is sometimes lacking, often benefit from this approach, but it can be applied across most genomes, including cannabis, a highly variable crop. Depending on how consistently SNPs are called from batch to batch, it can even be viable for applications like genomic selection.
- Pros: No reference genome needed, low cost per sample, bias towards protein coding genes, small sample volume support.
- Cons: SNP calling pipelines can be difficult to establish for certain species
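To illustrate how restriction digestion reduces the portion of the genome that gets sequenced, here is a toy in-silico digest followed by size selection. It is a minimal sketch under stated assumptions: the function names, the toy sequence and the size window are hypothetical, and the site `GAATTC` (the EcoRI recognition sequence, cut after the first G) is used purely as an example, not as the enzyme used in any of the workflows named above.

```python
import re


def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut sequence at every occurrence of site.

    cut_offset is where the cut falls inside the recognition site
    (1 for a G^AATTC-style cut).
    """
    pattern = f"(?={re.escape(site)})"  # lookahead so overlapping sites are found
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, sequence)]
    fragments, start = [], 0
    for pos in cuts:
        fragments.append(sequence[start:pos])
        start = pos
    fragments.append(sequence[start:])
    return fragments


def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep only fragments whose length falls inside the selection window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


if __name__ == "__main__":
    # Toy sequence with two recognition sites (hypothetical, not real genome data).
    toy = "AAA" + "GAATTC" + "C" * 9 + "GAATTC" + "TT"
    fragments = digest(toy, "GAATTC", cut_offset=1)
    kept = size_select(fragments, min_len=10, max_len=20)
    retained = sum(len(f) for f in kept) / len(toy)
    print(f"{len(fragments)} fragments, {len(kept)} kept, {retained:.0%} of sequence retained")
```

Only fragments inside the size window go on to sequencing, so the same sites tend to be sampled in every individual while most of the genome is left out.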
SNP arrays
SNP arrays are one of the oldest genotyping methodologies still in routine use, for many reasons. Specific SNPs are identified through WGS projects and then manufactured into a microarray using allele-specific probes. As DNA samples are applied to the chip, complementary regions hybridise to the attached probes and fluorescent signals indicate whether the surveyed allele is present. Because the targets are fixed, researchers obtain comparable SNPs across datasets for trait mapping. SNPs linked to specific traits can be used by anyone, as several species have arrays that carry plenty of useful information. For example, bovine parentage arrays are attractive because they return information pertinent to cattle breeders with static breeding needs. SNP arrays often have lower sample requirements and can be run without an expensive sequencer; however, fixed targets, high manufacturing fees and long turnaround times are barriers for many groups looking to push the boundary of genetic manipulation. Alternative alleles that were not included in the original chip design can result in considerable amounts of missing genotypes. While some arrays have a long history of use, others are discontinued in favour of more modern techniques, which can leave breeders in the lurch with SNPs that need to be recreated.
- Pros: Small sample volume support, easy to interpret data, community chosen targets
- Cons: Immutable targets, updates are expensive and time consuming
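Missing genotypes on a fixed panel are usually summarised as a per-marker call rate. The sketch below shows one simple way to compute it and to flag markers that fall below a cut-off. The marker names, genotype strings and the 0.9 threshold are hypothetical values chosen for illustration; they are not taken from any real array.

```python
from typing import Optional, Sequence

Genotype = Optional[str]  # e.g. "AA", "AG", "GG"; None marks a missing call


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype for one marker."""
    if not calls:
        raise ValueError("no calls supplied")
    return sum(c is not None for c in calls) / len(calls)


def flag_low_call_rate(matrix: dict[str, Sequence[Genotype]], threshold: float = 0.9) -> list[str]:
    """Return the markers whose call rate is below threshold."""
    return [marker for marker, calls in matrix.items() if call_rate(calls) < threshold]


if __name__ == "__main__":
    # Hypothetical genotype matrix: marker -> one call per sample.
    genotypes = {
        "marker_001": ["AA", "AG", "GG", "AA"],
        "marker_002": ["CC", None, None, "CT"],  # e.g. an allele the probes were not designed for
    }
    for marker, calls in genotypes.items():
        print(marker, f"{call_rate(calls):.2f}")
    print("flagged:", flag_low_call_rate(genotypes))
```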
Amplicon-based targeted GBS
Amplicon-based targeted GBS is a targeted genotyping solution that enables the development of panels of up to thousands of markers. Throughput, workflow simplicity and cost are common reasons to use amplicon-based targeted resequencing. Depending on the amplicon chemistry, 10s of thousands of samples can be processed and sequenced per day. The simplicity of the workflow makes it a primary solution when time and cost efficiency are key requirements. Amplicon-based targeted resequencing does require development of the targeting oligo panel, which can take longer than with other methods. In terms of data quality, much depends on the complexity of the genome and the required target marker types. Where enrichment-based methods adapt well to genome complexity, amplicon-based methods excel in species like maize and soy, where the marker densities required for applications like genomic selection tend to be lower because of the extensive reference sequencing that has typically been done. Even so, in more complex genomes like wheat, more than 95% of the generated data has been observed to map to the reference genome. Similarly high scores have been obtained for % on target and call rates. Increasing the quality and quantity of data by even a few percent can affect how quickly, and with what degree of confidence, lines are selected to take forward in any breeding program. When this involves the mapping and selection of complex traits, the time and cost savings are incredibly important.
How would this make a difference to breeders? The ability to select lines to take forward in the same timeframe as endpoint genotyping, combined with sequence-based data, makes Amp-Seq a game changer for the implementation of high-throughput genomic selection. The consolidation of markers and the cost efficiency of amplicon-based targeted GBS give breeders the opportunity to make critical decisions earlier in any breeding program, something the breeding community has aspired to ever since applying genomic selection became a reality.
- Pros: Cost effective (one of the lowest cost-per-sample methods for targeted GBS), highest throughput (screen 10s of thousands of samples within 24 hours), simple workflow, enables high automatability
- Cons: Limited to thousands of markers, in-house processing requires access to high-throughput NGS sequencing equipment
Loci hybridisation
Loci hybridisation is a newer technology that uses oligo extension in place of PCR to avoid the bias associated with multiple rounds of amplification. This results in high data return, extremely precise start and end locations for sequencing, and scalability to 100,000s of SNPs in highly automated labs (high-throughput genotyping). When run in discovery mode, SNPs are called across each 500 bp locus, with some crops returning multiple SNPs per locus. For example, in highly polyploid species like potato and blueberry, 22K loci result in well over 200,000 SNPs. The technology can target up to 25K loci in total, which covers 12.5 Mbp of sequence space spread evenly throughout the genome. Because targeted genotyping limits the total amount of sequence space, SNPs are called at an average depth of 100x, meaning calls are accurate. For species where coverage is extremely high, fewer sequencing reads can be allocated, resulting in lower sequencing costs. Also, since the system is probe-based, new targets can be added to existing probe sets as new traits and markers are discovered.
- Pros: Reproducible datasets, community chosen targets, high throughput (large volume support), easy to update
- Cons: Requires reference genome, sensitive to crude DNA extractions (requires purification), requires sample volumes above 384 a year
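The figures quoted for loci hybridisation follow from simple arithmetic: 25,000 loci of 500 bp each give 12.5 Mbp of target space. The sketch below reproduces that calculation and estimates a per-sample read budget for a chosen depth. The 150 bp read length and the on-target fraction are assumptions introduced here for illustration, and the function names are hypothetical.

```python
import math


def target_space_bp(n_loci: int, locus_length_bp: int = 500) -> int:
    """Total sequence space covered by the panel."""
    return n_loci * locus_length_bp


def reads_per_sample(
    target_bp: int,
    mean_depth: float,
    read_length_bp: int = 150,        # assumed read length
    on_target_fraction: float = 1.0,  # assumed share of reads landing on target
) -> int:
    """Reads needed per sample to reach mean_depth across the target space."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return math.ceil(target_bp * mean_depth / (read_length_bp * on_target_fraction))


if __name__ == "__main__":
    space = target_space_bp(25_000)  # 25K loci x 500 bp
    print(f"target space: {space / 1e6:.1f} Mbp")
    for depth in (100, 50):
        print(f"{depth}x depth: {reads_per_sample(space, depth):,} reads per sample")
    # 22K loci returning over 200,000 SNPs implies roughly nine SNPs per locus.
    print(f"SNPs per locus (lower bound): {200_000 / 22_000:.1f}")
```

Halving the target depth halves the read budget, which is the saving available when coverage is higher than the calls require.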
Anchored hybrid enrichment
Anchored hybrid enrichment offers the most flexibility in choosing genomic targets, since RNA probe-based chemistry is combined with hybridisation technology in place of PCR, allowing for more mismatches between target and probe sequences. For example, data return is often 99%+ from specific genes or regions of interest; however, the overall reaction efficiency is quite low, as many of the reads come from off-target areas. These probe sets can cover up to 10 Mbp of almost any sequence you want, giving you unprecedented control over what you sequence. Only want specific genes or promoter regions? In some cases, you can only do that with Capture-Seq. After reads are mapped to the loci of interest, the resulting datasets are very consistent, making it an ideal choice for academic collaborators who want to compare SNP data. Several consortium approaches exist in the US and abroad. Tree genomes, for example, can be some of the largest observed in nature, and trees can take years to grow before showing phenotype data of interest. Academics using anchored hybrid enrichment can significantly reduce reads from repetitive elements while focusing reads on areas of interest that contribute to protein-coding genes. In turn, this improves the accuracy of the trait mapping experiment, reducing the need to invest more money and time in genotyping another population.
- Pros: Extremely customisable in choosing targets, easy to update, minimal missing data, community chosen targets
- Cons: Labour intensive, higher cost per sample due to RNA probe chemistry, requires a minimum of 96 samples
Selecting your genotyping tools
Overall, genotyping is a very personal choice, and the reasons behind it go beyond price and utility. Service, history, quality, and experience are all particularly important parts of the decision and should not be ignored. Businesses are on a timeline and need guarantees on delivery dates. Academics have students graduating who may need more support. At Biosearch Technologies, we pride ourselves on going beyond price, and our values represent who we are as a company. Our values of Passion, Curiosity, Integrity, Brilliance and Respect embrace our employees’ commitment and dedication to using science to work for our customers.
More information on our next generation sequencing (NGS) lab services, including Flex-Seq, our innovative loci hybridisation workflow, Capture-Seq, our anchored hybrid enrichment workflow, and our GBS/ddRAD services, can be found here. If you are looking to have NGS in your lab, our Amp-Seq mid-plex targeted genotyping by sequencing workflow is suitable for labs genotyping over 10,000 samples per year.
Further reading