Long read sequencing was described as the “method of the year”[i] in an article published in Nature Methods at the start of 2023. The development of long read sequencing has significantly expanded the possibilities for genomic analysis.
Despite challenges such as the cost and the complexity of data analysis, the technology continues to improve, with increased accuracy, affordability, and accessibility.
This article introduces long read sequencing, highlights some of the key advantages compared with short read sequencing, and gives some key applications for long read sequencing.
Key points:
- Long read sequencing length
- Long read sequencing methods
- Advantages of long read sequencing vs short read sequencing
- Challenges with long read sequencing
- Long read sequencing technology platforms
- DNA extraction for long read sequencing
- Should I use short read or long read sequencing?
Long Read Sequencing Length
Long-read sequencing, sometimes called “third generation sequencing,” is a DNA sequencing technique that enables the sequencing of much longer stretches of DNA, typically ranging from thousands to over a million base pairs. By comparison, traditional short read sequencing typically captures sequences of 100-500 base pairs.
Long Read Sequencing Methods
Long read sequencing can be either “true long read” sequencing or “synthetic long read” sequencing.
“True long read” sequencing directly reads longer fragments of DNA, typically ranging from thousands to over a million base pairs. This method is employed by companies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), enabling the sequencing of long DNA strands in a single continuous process.
“Synthetic” long read sequencing uses short read sequencing data to reconstruct longer stretches of DNA. This method is employed by companies such as Element Biosciences and Illumina (primarily known for short read sequencing). In synthetic long read sequencing, short DNA fragments are barcoded and sequenced using standard short read technology platforms, and computational methods are then used to assemble the short reads into longer sequences based on barcodes and overlaps, effectively creating a “synthetic” long read. This approach provides some of the benefits of “true” long read sequencing, such as better assembly of repetitive regions or complex genomic structures, without needing specialized long read sequencing equipment.
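The barcode step described above can be illustrated with a minimal Python sketch. It is not any vendor's pipeline: the function name group_reads_by_barcode, the barcode labels, and the toy sequences are all hypothetical, and real workflows add error-tolerant barcode matching and a proper assembler. It only shows the idea that reads sharing a barcode are collected so they can later be assembled together.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Collect short reads that share a barcode.

    `reads` is an iterable of (barcode, sequence) pairs. Reads with the same
    barcode are assumed to come from the same original long DNA fragment.
    """
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


# Toy input: barcode labels and sequences are made up for illustration.
reads = [
    ("BC01", "ACGTAC"),
    ("BC02", "TTGACA"),
    ("BC01", "TACGGA"),
]
grouped = group_reads_by_barcode(reads)
for barcode, sequences in grouped.items():
    print(barcode, sequences)
```

Each group would then be passed to an assembly step that joins the short reads by their overlaps to produce one synthetic long read per barcode.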
Both long-read and short-read sequencing have pros and cons. A “hybrid” DNA sequencing approach uses long-read and short-read sequencing technologies together, leveraging the strengths of each while compensating for their weaknesses.
Advantages of Long-Read Sequencing vs Short Read Sequencing
The development of long read sequencing has been driven by the search for more complete and accurate genomic information.
The key limitation of short read sequencing is the inability to sequence long stretches of DNA. If the sequence of a large region of DNA is required, e.g. for a genome assembly, then the DNA first has to be fragmented, then amplified and sequenced. Bioinformatics tools are used to assemble these short sequences into the full-length sequence. However, if there is insufficient overlap between these shorter DNA fragments, there will be gaps or errors in the final sequence. Amplification steps can also introduce sequencing errors, particularly in repetitive regions of the genome.
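To make the overlap problem concrete, here is a toy Python sketch of joining reads by suffix-prefix overlap. It is a deliberately simplified illustration and not how production assemblers work: the function names, the min_overlap threshold of 3, and the example sequences are assumptions, the reads are assumed to be error-free and already in genomic order, and a new contig is started whenever two neighbouring reads do not overlap enough.

```python
def overlap_length(left, right, min_overlap):
    """Longest suffix of `left` equal to a prefix of `right`.

    Returns 0 if the best overlap is shorter than `min_overlap`.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_in_order(reads, min_overlap=3):
    """Merge reads left to right, breaking into a new contig at each gap."""
    if not reads:
        return []
    contigs = [reads[0]]
    for read in reads[1:]:
        size = overlap_length(contigs[-1], read, min_overlap)
        if size:
            # Enough overlap: extend the current contig with the new bases.
            contigs[-1] += read[size:]
        else:
            # Insufficient overlap: the assembly is split at this point.
            contigs.append(read)
    return contigs


overlapping = merge_in_order(["ACGTAC", "TACGGA", "GGATTC"])
gapped = merge_in_order(["ACGTAC", "TACGGA", "CCATTG"])
print(overlapping)  # one contig
print(gapped)       # two contigs: the last read could not be joined
```

In the second call the final read shares no sufficient overlap with the growing contig, so the result is split in two, which is the kind of gap described above.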
Long read sequencing can provide a more comprehensive view of a genome than short read sequencing. It enables better identification of structural variants and repetitive regions, which are often challenging to resolve with short reads because short read data are difficult to reassemble across long stretches of DNA.
Long reads can be particularly useful when:
- Resolving Complex Genomic Regions: Long reads are particularly advantageous when sequencing regions with repetitive elements, structural variants, and complex rearrangements, which are often challenging for short-read technologies.
- Assembling Genomes: Long-read sequencing provides more contiguous and accurate genome assemblies. This is especially important for de novo sequencing, where a reference genome is not available.
- Detecting Structural Variants: Long-read sequencing is useful for detecting large structural variants such as insertions, deletions, inversions, and translocations, which play significant roles in genetic diversity and disease.
- Phasing and Haplotyping: Long reads can span entire genes or large genomic regions, allowing for the accurate phasing of alleles and haplotype reconstruction.
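A simple way to see why read length matters for the cases above is to check whether a single read spans a region of interest, such as a repeat or a structural variant, with some unique sequence on either side. The Python sketch below is illustrative only: the function name spans_region, the flank parameter, and all coordinates are made-up assumptions rather than values from any real dataset or tool.

```python
def spans_region(read_start, read_end, region_start, region_end, flank=0):
    """True if the read covers the whole region plus `flank` bases on each side.

    Coordinates are positions on a reference, with start <= end.
    """
    return (read_start <= region_start - flank
            and read_end >= region_end + flank)


# Hypothetical 5,000 bp repeat located at positions 1,000 to 6,000.
repeat_start, repeat_end = 1_000, 6_000

# A 150 bp short read that falls inside the repeat.
short_read_spans = spans_region(2_000, 2_150, repeat_start, repeat_end, flank=100)

# An 11,500 bp long read that starts before and ends after the repeat.
long_read_spans = spans_region(500, 12_000, repeat_start, repeat_end, flank=100)

print(short_read_spans, long_read_spans)
```

A read that spans the region with flanking sequence can be placed unambiguously, whereas a read contained entirely within a repeat could have come from any copy of it.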
Challenges with Long Read Sequencing
Despite its advantages, long-read sequencing historically faced several challenges, including higher costs and error rates compared to short-read sequencing. However, ongoing technological advancements are rapidly addressing these issues.
- Error Rates: Long read sequencing historically had higher error rates compared to short reads, affecting data accuracy. However, read accuracy is improving.
- Cost: The initial cost of long read sequencing technologies and associated data analysis can be higher than short read sequencing.
- Bioinformatics: Analyzing long read data may require specialized bioinformatics tools and computational resources, due to the unique characteristics of long reads. Data processing can take longer than with short read sequencing.
Long-Read Sequencing Technology Platforms
There are several platforms that facilitate long-read sequencing, with the two most well known being Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT).
Pacific Biosciences (PacBio): PacBio’s Single Molecule Real-Time (SMRT) sequencing technology can generate reads averaging 10,000-15,000 bp. The technology utilizes real-time observation of DNA synthesis, where fluorescently labelled nucleotides are incorporated by DNA polymerase, allowing for the continuous reading of the sequence.
Oxford Nanopore Technologies (ONT): ONT’s nanopore sequencing passes DNA molecules through a nanopore embedded in a membrane. As the DNA translocates through the pore, changes in ionic current are measured and translated into sequence data. ONT platforms can produce ultra-long reads, with read length limited mainly by the length of the input DNA fragments.
DNA Extraction for Long Read Sequencing
Short read sequencing generally requires DNA fragments between 100 and 600 base pairs in length, and therefore, it can tolerate somewhat degraded DNA since the required fragment size is smaller. DNA is often fragmented mechanically (using sonication) or enzymatically during sample preparation.
Long-read sequencing requires high-quality, high-molecular-weight DNA, typically upwards of 10,000 base pairs. Any nicks or breaks in DNA strands can significantly impact the ability to generate long reads. The main limiting factor for ONT read lengths is the DNA extraction; Jain et al. (2018) found that read lengths produced by the MinION[iii] nanopore sequencer were dependent on the input fragment length[ii]. The need for intact DNA often necessitates more careful handling and extraction procedures, and there are several extraction methods and commercially available kits for preparing DNA for long read sequencing.
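Because read length depends on fragment length, it can help to summarize the length distribution of a DNA preparation or a sequencing run. One widely used summary, not discussed above, is the N50: the length such that fragments of that length or longer contain at least half of all bases. The following Python sketch computes it; the function name n50 and the example lengths are illustrative assumptions, not measurements from a real sample.

```python
def n50(lengths):
    """Return the N50 of a collection of fragment or read lengths.

    The N50 is the length L such that fragments of length >= L together
    contain at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up fragment lengths in base pairs.
intact_prep = [8_000, 12_000, 15_000, 30_000, 35_000]
degraded_prep = [500, 800, 1_200, 2_000, 3_000]

intact_n50 = n50(intact_prep)
degraded_n50 = n50(degraded_prep)
print(intact_n50, degraded_n50)
```

A lower N50 for the input DNA would suggest that shorter reads should be expected, whatever the sequencing platform is capable of.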
Should I use Short Read or Long Read Sequencing?
The choice between long-read and short-read sequencing methods will depend on the specific requirements of the research question, including the characteristics of the region to be sequenced, sample type, cost, and accuracy.
Recent improvements in error correction algorithms, cost reduction strategies, and hybrid sequencing approaches that combine long-read and short-read data are paving the way for broader adoption.
[i] Marx, V. Method of the year: long-read sequencing. Nat Methods 20, 6–11 (2023). https://doi.org/10.1038/s41592-022-01730-w
[ii] Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol 36, 338–345 (2018).
[iii] MinION is a trademark of Oxford Nanopore