Long-read sequencing: mechanisms, promises and applications in Healthcare

Published on 19 October 2022

In April 2022, Oxford Nanopore announced the next steps of its collaboration with Genomics England, which is in charge of the 100,000 Genomes Project in the UK. Their main goal? To complete their cancer database so that previously unknown cancer variants in haemato-oncology, sarcoma and brain tumours can be identified quickly and accurately. The database will then help in the diagnosis and treatment of various cancers. This collaboration is far from an isolated case, as long-read sequencing opens new possibilities in healthcare. Today, our team introduces you to this ground-breaking technology: how do the two main technologies on the market work? What are their advantages? What challenges remain? What are the promising applications in the study of cancer and viral genomes?

What is long-read sequencing?

We have come a long way from the first DNA sequencing method, known as Sanger sequencing, to today's Next Generation Sequencing (NGS). Yet we still face problems in genome assembly because of the short length of NGS reads (from 75 to 400 bp). Long-read sequencing (LRS) could tackle these issues. Let's have a look at the two main technologies leading this market.

SMRT long-read sequencing by Pacific Biosciences

The first LRS technology to reach the market was Single Molecule Real Time (SMRT) sequencing, launched by Pacific Biosciences (PacBio) in 2011. How does it work? Double-stranded DNA is first fragmented to the desired size: this technology reads 20 kbp on average and can reach 100 kbp. Adaptors are then added to each end of the fragment to form a closed circle. A polymerase copies this circular template using fluorescently labelled nucleotides from the sequencing solution, each of the four bases carrying a different label. A laser illuminates the reaction, and every time a labelled nucleotide is incorporated it emits a signal that identifies the base that has just been added [1].
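To make the signal-to-base step concrete, here is a minimal Python sketch. It assumes that each of the four nucleotides carries its own dye and that flashes have already been detected and labelled by colour. The dye names, the `Pulse` record and `call_bases` are hypothetical illustrations, not PacBio's software.

```python
from dataclasses import dataclass

# Hypothetical dye-to-base mapping: in SMRT sequencing each nucleotide type
# carries its own fluorescent label, but these dye names are placeholders.
DYE_TO_BASE = {"dye_green": "A", "dye_red": "C", "dye_blue": "G", "dye_yellow": "T"}


@dataclass
class Pulse:
    time_ms: float  # when the flash was recorded
    dye: str        # which fluorescent label was seen


def call_bases(pulses: list[Pulse]) -> str:
    """Turn a series of fluorescent pulses into a base sequence.

    Pulses are sorted by time because the polymerase adds one nucleotide
    after another; unrecognised dyes are reported as 'N'.
    """
    ordered = sorted(pulses, key=lambda p: p.time_ms)
    return "".join(DYE_TO_BASE.get(p.dye, "N") for p in ordered)


if __name__ == "__main__":
    pulses = [
        Pulse(2.0, "dye_red"),
        Pulse(0.5, "dye_green"),
        Pulse(3.1, "dye_yellow"),
        Pulse(4.7, "dye_blue"),
        Pulse(5.2, "dye_purple"),  # unrecognised signal
    ]
    print(call_bases(pulses))  # → ACTGN
```

Sorting by time rather than trusting input order reflects the fact that the sequence is read in the order the polymerase incorporates nucleotides.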

ONT long-read sequencing by Oxford Nanopore Technologies

The second main long-read sequencing technology, from Oxford Nanopore, reached the market in 2014. Here, a single DNA molecule is driven through a nanopore, a tiny pore in a membrane through which an electric current flows. The key is that each nucleotide obstructs the pore differently and therefore has its own resistance pattern. As the DNA passes through the pore, the resulting changes in resistance, seen as changes in the current, are recorded and used to infer the sequence. With this technology, DNA fragments can reach up to 800 kbp! [1]
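The idea of inferring bases from current changes can be sketched in a few lines of Python. This toy version assumes one base per current level and uses invented reference levels (`REFERENCE_LEVELS`, in pA); `segment` and `call_from_levels` are hypothetical helpers. Real nanopore signals reflect several nucleotides sitting in the pore at once, and Oxford Nanopore's basecallers rely on neural networks rather than a nearest-level lookup.

```python
# Hypothetical mean current level (pA) for each base; real values differ
# and depend on the neighbouring bases in the pore.
REFERENCE_LEVELS = {"A": 80.0, "C": 95.0, "G": 70.0, "T": 110.0}


def segment(signal: list[float], jump: float = 5.0) -> list[float]:
    """Split a raw current trace into events wherever the level jumps by more than `jump` pA.

    Returns the mean current of each event.
    """
    if not signal:
        return []
    events: list[float] = []
    current = [signal[0]]
    for sample in signal[1:]:
        if abs(sample - current[-1]) > jump:
            events.append(sum(current) / len(current))  # close the current event
            current = [sample]
        else:
            current.append(sample)
    events.append(sum(current) / len(current))
    return events


def call_from_levels(levels: list[float]) -> str:
    """Assign each event to the base whose reference level is closest."""
    return "".join(
        min(REFERENCE_LEVELS, key=lambda base: abs(REFERENCE_LEVELS[base] - level))
        for level in levels
    )


if __name__ == "__main__":
    raw = [80.5, 79.8, 81.0, 109.7, 110.4, 70.2, 69.9, 95.1, 94.6]
    print(call_from_levels(segment(raw)))  # → ATGC
```

Because `segment` only splits where the level jumps, two identical bases in a row would merge into a single event in this toy model, which illustrates why runs of the same base are hard to count from the signal alone.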

Advantages and challenges of long-read sequencing

These two technologies are very promising and could help us overcome the problems we have faced so far in DNA sequencing. Yet a few challenges still need to be addressed before the technology can be fully deployed.

Advantages of long-read sequencing

Short reads are poorly suited to detecting large sequence changes or resolving repeated regions. With LRS, a single read can cover a long repeated sequence together with what comes before and after it, giving the repeat its context. This helps improve genome assembly and detect variants that short-read sequencing has missed.
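The benefit for repeats can be expressed as a simple coverage condition: a read can only place a repeat correctly if it covers the whole repeat and is anchored in unique sequence on both sides. The sketch below assumes a 500 bp flank, a hypothetical threshold chosen for illustration; the function names and coordinates are also hypothetical, and real assemblers use more elaborate criteria.

```python
def min_spanning_length(repeat_len: int, flank: int = 500) -> int:
    """Shortest read that could cover the repeat plus `flank` bp of unique sequence on each side."""
    return repeat_len + 2 * flank


def read_resolves_repeat(read_start: int, read_len: int,
                         repeat_start: int, repeat_len: int,
                         flank: int = 500) -> bool:
    """True if the read covers the entire repeat and `flank` bp on both sides (0-based coordinates)."""
    read_end = read_start + read_len
    repeat_end = repeat_start + repeat_len
    return read_start <= repeat_start - flank and read_end >= repeat_end + flank


if __name__ == "__main__":
    # Hypothetical 5 kbp repeat located at positions 10,000-15,000 of a genome.
    repeat_start, repeat_len = 10_000, 5_000
    print(min_spanning_length(repeat_len))  # → 6000
    # A 150 bp short read cannot cover the repeat.
    print(read_resolves_repeat(9_900, 150, repeat_start, repeat_len))  # → False
    # A 20 kbp long read starting at 5,000 covers the repeat and both flanks.
    print(read_resolves_repeat(5_000, 20_000, repeat_start, repeat_len))  # → True
```

With any realistic flank, no 75 to 400 bp read can satisfy this condition for a multi-kilobase repeat, whereas a typical long read easily can.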

LRS could also greatly reduce sequencing time. It requires little or no amplification, so PCR is not needed before the sequencing step. Some technologies, such as Oxford Nanopore, do not even need many reads of the same sequence to obtain complete data. As a result, Oxford Nanopore can stream data almost immediately, whereas NGS may sometimes need a day for the longest sequences.

And the advantages of LRS don't stop there: falling sequencing costs and portable devices make it even more promising.

Main challenges to overcome for long-read sequencing

Some may argue that LRS cannot live up to the accuracy standards of NGS (error rate <1%). Indeed, the average LRS error rate is currently about 10%, with the exception of PacBio technology. Still, accuracy is improving rapidly, with error rates down to <1% for SMRT and <5% for Oxford Nanopore [2]. So even if not all LRS technologies reach those levels yet, the day when LRS assemblies outperform NGS assemblies might come sooner than expected.
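A quick calculation shows what these error rates mean for a single long read, and why reading the same sequence several times helps. The sketch below assumes that errors are independent between reads and uses a simple majority vote at each position. PacBio's circular templates, for instance, let the polymerase read the same insert several times, but real consensus methods are more sophisticated, and systematic errors that repeat across reads do not average out this way. The function names are hypothetical.

```python
from math import comb


def expected_errors(read_len: int, error_rate: float) -> float:
    """Average number of miscalled bases in one read at a given per-base error rate."""
    return read_len * error_rate


def majority_vote_failure(n_reads: int, error_rate: float) -> float:
    """Probability that the correct base fails to win a strict majority at one position.

    Assumes an odd number of reads with independent errors: the vote fails
    only if more than half of the reads are wrong at that position.
    """
    if n_reads < 1 or n_reads % 2 == 0:
        raise ValueError("use an odd, positive number of reads")
    k_min = n_reads // 2 + 1
    return sum(
        comb(n_reads, k) * error_rate**k * (1 - error_rate) ** (n_reads - k)
        for k in range(k_min, n_reads + 1)
    )


if __name__ == "__main__":
    # A 20 kbp read at a 10% error rate vs. a 1% error rate.
    print(round(expected_errors(20_000, 0.10)))  # → 2000
    print(round(expected_errors(20_000, 0.01)))  # → 200
    # Five independent reads at 10% error: per-position failure falls to about 0.86%.
    print(round(majority_vote_failure(5, 0.10), 5))  # → 0.00856
```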

Second, LRS technologies must be improved to work with smaller quantities of DNA. This challenge is especially important in cancer diagnosis, where clinical samples for molecular diagnosis are very limited. Developing long-read sequencers that work with very small amounts of DNA will therefore be key [3].
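To see why limited samples matter, it helps to convert a DNA mass into a number of genome copies. This sketch assumes an average of about 650 g/mol per base pair of double-stranded DNA and a haploid human genome of about 3.1 billion bp, which gives roughly 300 haploid copies per nanogram. The input masses are illustrative, not the requirements of any particular instrument, and `genome_copies` is a hypothetical helper.

```python
AVOGADRO = 6.022e23       # molecules per mole
BP_MASS = 650.0           # approximate average mass of one base pair of dsDNA, in g/mol
HUMAN_HAPLOID_BP = 3.1e9  # approximate haploid human genome size, in bp


def genome_copies(dna_ng: float, genome_bp: float = HUMAN_HAPLOID_BP) -> float:
    """Estimate how many genome copies a given mass of DNA (in ng) contains."""
    grams_per_copy = genome_bp * BP_MASS / AVOGADRO
    return dna_ng * 1e-9 / grams_per_copy


if __name__ == "__main__":
    for ng in (1, 10, 1000):
        print(f"{ng} ng ≈ {genome_copies(ng):,.0f} haploid human genome copies")
```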

Finally, even though sequencing costs should drop drastically in the coming years, long-read sequencing remains more expensive than NGS. Sample preparation and automation are potential sources of improvement, along with the size of sequencers.

Applications of long-read sequencing in cancer and viral genomes

Thanks to these advantages, LRS has already contributed significantly to healthcare research, particularly to the analysis of cancer and viral genomes.

Cancer genomes analysis

Long-read sequencers have helped us better understand Structural Variants (SVs), i.e., genomic alterations of at least 50 bp such as deletions, insertions, duplications, inversions and translocations. SVs play a fundamental role in cancer genomes, affecting the function of important genes such as oncogenes (for example ERBB2 in breast cancer or ROS1 in lung cancer) and tumour suppressor genes. SVs can be difficult to detect and characterise with NGS, because short reads are often no longer than the variants themselves. LRS has improved research in this field: since 2016, it has been used to study no fewer than 17 SVs involved in cancer genomes [3]. Some of the results are already promising. For example, a team of researchers focusing on breast cancer used LRS to identify SVs associated with ERBB2, a gene that is particularly active in breast cancer and linked to resistance to specific drugs [3].
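One way long reads reveal SVs is by aligning to the reference in several pieces: a stretch of reference skipped by the read suggests a deletion, while extra read sequence with no reference counterpart suggests an insertion. The Python sketch below applies the 50 bp threshold mentioned above to such split alignments. The `Block` record, `candidate_svs` and all coordinates are hypothetical; the sketch only handles pieces on the same chromosome and strand (inversions and translocations would need strand and chromosome information), and real SV callers combine evidence from many reads.

```python
from typing import NamedTuple

MIN_SV_BP = 50  # size threshold for a structural variant, as in the text


class Block(NamedTuple):
    """One aligned piece of a read (hypothetical representation, 0-based, end-exclusive)."""
    read_start: int
    read_end: int
    ref_start: int
    ref_end: int


def candidate_svs(blocks: list[Block]) -> list[tuple[str, int, int]]:
    """Report (type, reference position, size) for gaps of at least MIN_SV_BP between blocks.

    Assumes all blocks come from one read aligned to the same chromosome and strand.
    """
    ordered = sorted(blocks, key=lambda b: b.read_start)
    svs = []
    for prev, nxt in zip(ordered, ordered[1:]):
        ref_gap = nxt.ref_start - prev.ref_end     # reference bases skipped by the read
        read_gap = nxt.read_start - prev.read_end  # read bases between the two aligned pieces
        size = ref_gap - read_gap
        if size >= MIN_SV_BP:
            svs.append(("DEL", prev.ref_end, size))
        elif -size >= MIN_SV_BP:
            svs.append(("INS", prev.ref_end, -size))
    return svs


if __name__ == "__main__":
    # Hypothetical 12 kbp read that skips 3 kbp of reference: a deletion.
    deletion_read = [Block(0, 5_000, 100_000, 105_000), Block(5_000, 12_000, 108_000, 115_000)]
    print(candidate_svs(deletion_read))  # → [('DEL', 105000, 3000)]
    # Hypothetical read carrying 300 extra bases: an insertion.
    insertion_read = [Block(0, 4_000, 200_000, 204_000), Block(4_300, 9_000, 204_000, 208_700)]
    print(candidate_svs(insertion_read))  # → [('INS', 204000, 300)]
```

A single long read can contain both breakpoints of a multi-kilobase event, which is exactly what a 150 bp read cannot do.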

These results should help us understand differences between patients and, more importantly, enable us to offer new and more effective cancer treatments and diagnostic tools. That is why Genomics England launched the “Cancer 2.0” initiative, which includes long-read sequencing and multimodal data to gain insights into cancer drivers and evolution.


Viral genomes analysis

Sequencing complete viral genomes gives us new opportunities to study viral phylogeny and epidemiology. Viruses have small, compact genomes, ranging from about 2 kb to 1 Mb, and are more prone to mutation than other organisms. NGS has already enabled incredible progress in analysing viral genomes and following their evolution; long-read sequencing technologies could help us go further.

Let's take the example of SARS-CoV-2. To follow the evolution of this virus during the recent pandemic, researchers mainly used short-read sequencing. As highlighted in one of PacBio's webinars, assemblies shared through GISAID in 2021 contained many gaps: up to 40% with Illumina technology, with some gaps reaching up to 1,000 nucleotides [4]. This can be explained by the position of the primers used for sequencing, the number of repeated regions and other factors. PacBio technology overcame these challenges better, reducing the gaps to only a few [4]. Sequencing the full virus with LRS therefore gives us more information on SARS-CoV-2, which will help improve the monitoring of its evolution, stability and mutation rate.
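Gap figures like these come from comparing the regions covered by reads with the full genome length. The sketch below computes the uncovered stretches and the gap fraction from a list of covered intervals. The intervals are invented for illustration, the 30,000 nt genome length is a round approximation of the SARS-CoV-2 genome (about 30 kb), none of it is GISAID data, and the function names are hypothetical.

```python
def coverage_gaps(intervals: list[tuple[int, int]], genome_len: int) -> list[tuple[int, int]]:
    """Return uncovered (start, end) stretches of a genome, 0-based and end-exclusive."""
    gaps = []
    covered_to = 0  # end of the covered region processed so far
    for start, end in sorted(intervals):
        start, end = max(0, start), min(end, genome_len)  # clip to the genome
        if start >= end:
            continue
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    if covered_to < genome_len:
        gaps.append((covered_to, genome_len))
    return gaps


def gap_fraction(intervals: list[tuple[int, int]], genome_len: int) -> float:
    """Share of the genome not covered by any interval."""
    return sum(end - start for start, end in coverage_gaps(intervals, genome_len)) / genome_len


if __name__ == "__main__":
    genome_len = 30_000
    fragmented = [(0, 8_000), (9_000, 20_000), (19_500, 27_000)]  # hypothetical coverage
    print(coverage_gaps(fragmented, genome_len))  # → [(8000, 9000), (27000, 30000)]
    print(f"{gap_fraction(fragmented, genome_len):.1%}")  # → 13.3%
    print(coverage_gaps([(0, 30_000)], genome_len))  # → []
```

A single read spanning the whole genome leaves nothing to stitch together, which is why long reads can close the gaps that fragmented short-read coverage leaves behind.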

Long-read sequencing will not only help track the evolution of SARS-CoV-2 but also help develop relevant therapeutics. By tracking how the virus evolves and analysing its mechanisms in living organisms, we should be able to better anticipate drug resistance, for instance. This could lead to the development of new therapeutics for COVID-19.

The Covid-19 pandemic will not be the only target of LRS technology. It could very well be applied to other viruses such as influenza or HIV. LRS paves the way for the development of new drugs and therapeutics.

Long-read sequencing today offers new possibilities in healthcare, notably for developing new diagnostic tools and treatments. The promise of LRS does not stop at the analysis of cancer and viral genomes, as the technology is used in a growing range of life-science applications. Even though the technology still needs improvement, Alcimed believes there is little doubt that LRS will become essential in several research fields in the coming years.

[1] Pollard, M. O., Gurdasani, D., Mentzer, A. J., Porter, T., & Sandhu, M. S. (2018). Long reads: their purpose and place. Human Molecular Genetics, 27(R2), R234–R241.

[2] Amarasinghe, S. L., Su, S., Dong, X., et al. (2020). Opportunities and challenges in long-read sequencing data analysis. Genome Biology, 21, 30.

[3] Sakamoto, Y., Zaha, S., Suzuki, Y., Seki, M., & Suzuki, A. (2021). Application of long-read sequencing to the detection of structural variants in human cancer genomes. Computational and Structural Biotechnology Journal, 19, 4207–4216.

[4] Ashby, M. (2021, February 5). Opportunities for using PacBio long-read sequencing for COVID-19 research [Webinar]. PacBio. YouTube.

About the authors

Charlotte and Lucia, Consultants, and Quentin, Project Manager, in Alcimed's Healthcare team in Switzerland
