Next Generation Sequencing

About NGS

Until a few decades ago, nucleotide sequences were determined almost exclusively by Maxam-Gilbert sequencing and Sanger sequencing.

The former technique uses radioactive phosphate to label one end of the DNA strand. The labeled DNA is then treated chemically so that it breaks at specific nitrogenous bases, and the resulting fragments are run on a polyacrylamide gel to determine the sequence. In Sanger sequencing (also called dideoxy sequencing), dideoxynucleotide triphosphates (ddNTPs) terminate DNA strand elongation, which makes it possible to identify the base at each position.
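The chain-termination idea can be illustrated with a toy model. The sketch below is only an idealization: it assumes exactly one terminated fragment per position, ignores strand orientation, and the function name sanger_read is made up for this example.

```python
def sanger_read(template):
    """Toy model of dideoxy chain termination on a known template."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    # The strand being synthesised is complementary to the template.
    synthesised = "".join(complement[base] for base in template)

    # Each ddNTP incorporation stops elongation, so an ideal reaction yields
    # one fragment per position, each ending in a labeled terminator base.
    fragments = [synthesised[:length] for length in range(1, len(synthesised) + 1)]

    # Electrophoresis separates fragments by length; reading the terminal
    # base of each fragment from shortest to longest gives the sequence.
    fragments.sort(key=len)
    return "".join(fragment[-1] for fragment in fragments)


print(sanger_read("TACGGA"))  # ATGCCT
```

Sorting by length stands in for the gel: the order of the fragments, not their content, carries the positional information.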

Sanger sequencing now uses fluorescently labeled ddNTPs, so the sequence can be read automatically, with capillary electrophoresis in place of the slab gel. This protocol, however, is limited in its ability to sequence many long strands at once.
In contrast, sequencers that can perform high-speed, large-scale DNA sequencing are called “next generation” sequencers. Compared with conventional sequencers, this technology can read large amounts of sequence data rapidly and at a lower cost.

NGS Experimental Flowchart

  1. Sequencing: Reading nucleotide sequences from a DNA sample using NGS
  2. Primary analysis: Converting image-derived raw data into fragmented sequence data (reads)
  3. Secondary analysis: Organizing the large volume of reads into whole genome data (e.g. assembly, mapping)
  4. Tertiary analysis: Analyzing the data against a whole genome reference (e.g. SNP genotyping)
  5. Acquisition of whole genome sequence data with annotation and other relevant information

NGS Experimental Protocol

The following explains in detail the principles behind the flowchart above, up to the secondary analysis, and introduces notable NGS technologies currently in use.
To obtain whole genome sequence data using NGS, template preparation and sequencing/imaging must be followed by secondary analysis (assembly or mapping) and annotation.

  • Template Preparation

Templates can be prepared in two ways: by amplifying the DNA sample with PCR beforehand, or by skipping PCR to avoid sequence bias. In either case, NGS immobilizes the templates on a solid surface or support, which makes it possible to run several thousand to several billion sequencing reactions at the same time.

Clonally amplified templates (with PCR)

  • Emulsion PCR (emPCR)

Adaptor sequences are ligated to the fragmented DNA, and the fragments are then tethered to beads. The DNA-bead complexes are emulsified with PCR reagents and emulsion oil, and the DNA template is amplified through repeated cycles of denaturation, annealing, and extension. After amplification, the emulsion is broken with isopropanol and detergent buffer to recover the DNA-carrying beads. emPCR is used by Roche/454, Life/APG, and Polonator.

  • Solid-phase amplification

In solid-phase amplification, adaptor-ligated DNA fragments bound to the flow cell are amplified by bridge PCR. When the free adaptor end of a bound fragment (the end not attached to the substrate) hybridizes to a nearby oligonucleotide on the surface, the DNA forms an arch and is copied into a double strand. As the PCR cycles are repeated, clusters form on the substrate. Solid-phase amplification is used by Illumina/Solexa.
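To give a feel for how quickly a cluster builds up from a single template, here is a minimal sketch of idealized cycle-by-cycle amplification. The per-cycle efficiency and the capacity limit (standing in for the finite number of surface oligonucleotides around a template) are illustrative assumptions, not measured values, and the function name is hypothetical.

```python
def copies_after_cycles(initial_copies, cycles, efficiency=1.0, capacity=None):
    """Idealized PCR growth: each cycle multiplies copies by (1 + efficiency).

    capacity is an assumed upper bound on copies per cluster; None means
    unlimited growth.
    """
    copies = float(initial_copies)
    for _ in range(cycles):
        copies *= 1.0 + efficiency
        if capacity is not None and copies > capacity:
            copies = float(capacity)
    return copies


print(copies_after_cycles(1, 10))                                 # 1024.0
print(copies_after_cycles(1, 35, efficiency=0.9, capacity=1000))  # 1000.0
```

With perfect doubling, ten cycles already turn one molecule into about a thousand copies, which is why a local cluster becomes bright enough to image after a modest number of cycles.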

Single molecule templates (without PCR)

Single molecule sequencing needs no amplification of the DNA sample and can work with as little as about 1 μg of material. In general, single molecule templates are immobilized on a solid support before sequencing is carried out. In the Helicos BioSciences HeliScope, the adaptor-ligated DNA is fixed to the support for later hybridization, whereas in the Pacific Biosciences system the polymerase molecule is bound to the support instead, which allows longer sequences to be read.

  • Sequencing:
    There are four different ways in which nucleotide sequences can be read.

・Cyclic reversible termination (CRT)

CRT uses reversible terminators in a cyclic process: in each cycle, the correct fluorescently labeled nucleotide is incorporated into the growing strand and imaged. The nucleotides added to the strand are either 3’-blocked (Illumina/Solexa) or 3’-unblocked (LaserGen, Inc./Helicos BioSciences) terminators. Blocking groups, as in 3’-O-allyl-2’-deoxyribonucleotide triphosphates (dNTPs), prevent further extension by DNA polymerase. After imaging, the fluorophore is removed by cleaving its chemical bond to the nucleobase.
Four differently colored fluorophores correspond to the four kinds of nucleotide. The colors are detected by total internal reflection imaging using two lasers to determine the bases.
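The step from four-color images to bases can be sketched as picking the brightest channel in each cycle. This is a simplified illustration only: the channel order in CHANNEL_TO_BASE and the intensity values are assumptions, and real base callers also correct for effects such as spectral cross-talk between dyes.

```python
# Assumed mapping from detection channel index to base (illustrative only).
CHANNEL_TO_BASE = ("A", "C", "G", "T")


def call_bases(cycles):
    """Call one base per cycle from a 4-tuple of channel intensities."""
    read = []
    for intensities in cycles:
        # The incorporated nucleotide carries one fluorophore, so the
        # brightest of the four channels identifies the base.
        brightest = max(range(4), key=lambda channel: intensities[channel])
        read.append(CHANNEL_TO_BASE[brightest])
    return "".join(read)


cycles = [
    (0.91, 0.05, 0.03, 0.02),  # channel 0 brightest
    (0.04, 0.02, 0.88, 0.06),  # channel 2 brightest
    (0.03, 0.07, 0.05, 0.93),  # channel 3 brightest
]
print(call_bases(cycles))  # AGT
```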

・Sequencing by ligation (SBL)

This technique is based on ligation: a fluorescently labeled probe hybridizes to its complementary sequence and is then ligated. Each probe is an oligonucleotide consisting of 2 fluorescently labeled interrogation bases for imaging, 3 degenerate bases, and 3 universal bases. Each nucleotide is interrogated twice (i.e. imaged twice, at different probe positions) to determine its identity. Thus, for a five-nucleotide interval, the cycle of probe hybridization, ligation, imaging, and probe cleavage is repeated ten times. This process, used by Life/APG (SOLiD) and Polonator, is useful for variant discovery because it provides inherent error correction.
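Why interrogating each base twice helps with error checking can be shown with a small two-base encoding sketch. The XOR color code used here (A=0, C=1, G=2, T=3) is a commonly described form of two-base encoding and is an assumption of this example, not something specified in the text above; the function names are hypothetical.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colors(sequence):
    """One color per adjacent base pair, so every base affects two colors."""
    codes = [BASE_TO_CODE[base] for base in sequence]
    return [left ^ right for left, right in zip(codes, codes[1:])]


def decode_colors(first_base, colors):
    """Recover the bases from a known first base and the color calls."""
    code = BASE_TO_CODE[first_base]
    bases = [first_base]
    for color in colors:
        code ^= color
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


reference = encode_colors("ATGGC")
variant = encode_colors("ATCGC")  # single-base change at the third position
print(reference)                      # [3, 1, 0, 3]
print(variant)                        # [3, 2, 3, 3]
print(decode_colors("A", reference))  # ATGGC
```

In this scheme a true single-base variant changes two adjacent colors, while an isolated measurement error changes only one, which is the basis for telling the two apart.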

・Pyrosequencing

In pyrosequencing, two types of beads are loaded into the wells of the PicoTiterPlate (PTP): beads carrying amplified DNA and smaller beads bound with sulphurylase and luciferase. A single type of nucleotide (dNTP) is added to the wells at a time; when it is incorporated opposite its complementary base, pyrophosphate is released and converted into visible light. The light is detected by a high-resolution charge-coupled device (CCD) camera and recorded as a series of peaks called a flowgram.
This method, offered by Roche/454, produces longer reads that improve genome mapping, but its error rate can be high when the DNA contains highly repetitive sequence.
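How a flowgram turns into a sequence can be sketched as follows. The example assumes a repeating flow order of T, A, C, G and signals already normalized so that one incorporated base gives a peak height of about 1.0; both are assumptions made for illustration, and the function name is hypothetical.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Translate normalized flow signals into a base sequence."""
    read = []
    for index, signal in enumerate(signals):
        base = flow_order[index % len(flow_order)]
        # Peak height is roughly proportional to the number of identical
        # bases incorporated in that flow, so round to the nearest integer.
        run_length = int(signal + 0.5)
        read.append(base * run_length)
    return "".join(read)


signals = [1.02, 0.03, 2.10, 0.95, 0.04, 1.10, 0.02, 0.90]
print(decode_flowgram(signals))  # TCCGAG
```

The rounding step also shows where repeats cause trouble: for a long run of the same base, peaks such as 6.4 and 6.6 are hard to separate, yet they decode to runs of different length.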

・Real-time sequencing

Pacific Biosciences’ real-time sequencing is based on the continuous incorporation of dye-labelled nucleotides during DNA synthesis, which is detected using a platform called the zero-mode waveguide (ZMW) detector. A single DNA polymerase attached to the bottom of each well incorporates phospholinked hexaphosphate nucleotides at the complementary site, and the fluorescent species (pico-/nanomolar concentration range) are subsequently identified from their fluorescent pulses.

Alignment and Assembly

The nucleotide sequences read by NGS can be put together using two kinds of approaches: 1) reference-based alignment, and 2) de novo assembly.

・Alignment

The obtained sequence data are aligned against a reference sequence from an existing database.
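As a rough illustration of reference-based alignment, the sketch below slides a read along a reference and reports the position with the fewest mismatches. It is a brute-force toy under simplifying assumptions (no insertions or deletions, forward strand only); practical aligners use indexed data structures instead, and the names here are hypothetical.

```python
def map_read(reference, read, max_mismatches=1):
    """Return (position, mismatches) of the best placement, or None."""
    best_position, best_mismatches = None, max_mismatches + 1
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        # Count positions where the read disagrees with the reference.
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if mismatches < best_mismatches:
            best_position, best_mismatches = start, mismatches
    if best_position is None:
        return None
    return best_position, best_mismatches


reference = "ACGTTAGCCGATAGGCTA"
print(map_read(reference, "GCCGAT"))  # (6, 0)
print(map_read(reference, "GCCGTT"))  # (6, 1)
print(map_read(reference, "TTTTTT"))  # None
```

Allowing a small number of mismatches is what lets a read that carries a sequencing error or a genuine variant still be placed on the reference.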

・de novo Assembly

If there is no reference data for the target sequence, the new reads are compared with one another to find overlaps and are then assembled into longer sequences.
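The overlap-and-merge idea can be sketched with a greedy strategy: repeatedly merge the two sequences with the longest suffix-prefix overlap. This is a minimal illustration assuming error-free reads from one strand; it is not how any particular assembler works, and the function names and the min_overlap value are hypothetical.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of left that equals a prefix of right."""
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Merge reads pairwise by best overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                length = overlap_length(left, right, min_overlap)
                if length > best[0]:
                    best = (length, i, j)
        length, i, j = best
        if length == 0:
            break  # nothing overlaps any more; leave separate contigs
        merged = contigs[i] + contigs[j][length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs


reads = ["ATGGCGT", "GCGTACC", "TACCGGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCGGA']
```

Greedy merging is easy to follow but can be misled by repeated sequence, since a repeat makes unrelated reads appear to overlap.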

Cost of NGS

The cost of whole genome sequencing per individual by NGS has been decreasing sharply. Yet the cost of large-scale genome analysis studies remains high for many researchers. Various approaches have therefore been studied to further reduce the cost and labor of whole genome sequencing, including “genome enrichment”, which enriches the target sequences of interest. Currently available methods include microdroplet PCR, solid-phase capture, and solution-phase capture.

