Agilent Technologies. Element Biosciences. Illumina. MGI Tech. Oxford Nanopore Technologies. PacBio. Qiagen. Roche. Singular Genomics. Thermo Fisher Scientific. Ultima Genomics. Twist Biosciences.
Single cell sequencing:
10x Genomics (conoply biosciences) (single cell RNA sequencing, kits for B and T cell receptor profiling). ArgenTAG. Fluent Biosciences.
Enzymatic sequencing:
Ansa Biotechnologies. Moligo Technologies. NunaBio.
Introduction:
The most popular second-generation sequencing platforms are the 454 sequencing system (Roche), the SOLiD system (Life Technologies) and the HiSeq and Genome Analyzer platforms (Illumina). More recently, to overcome the limitations that reverse transcription and PCR amplification impose on second-generation sequencing, third-generation sequencing platforms have been developed based on direct single-molecule sequencing. Another benefit of third-generation platforms is the decrease in indirect data: measurements are directly linked to the nucleotide sequence rather than being converted into quantitative data for base calling from captured images. Third-generation sequencing offers the following advantages over second-generation systems: higher throughput, higher fold coverage in minutes, higher consensus accuracy, longer read lengths and the need for smaller amounts of starting material. (Monsuro, “Next generation sequencing: new tools in immunology and hematology” Blood Res 2013, 48: 242-9).
There has been a rapid proliferation in the number of next-generation sequencing (NGS) platforms, including Illumina, the Applied Biosystems SOLiD System, 454 Life Sciences (Roche), Helicos HeliScope, Complete Genomics, Pacific Biosciences PacBio and Life Technologies Ion Torrent. Template preparation consists of building and amplifying a library of nucleic acid (genomic DNA or cDNA). Sequencing libraries are constructed by shearing the DNA sample into fragments of about 500 bp or less and ligating adapter sequences (synthetic oligonucleotides of a known sequence) onto the ends of the DNA fragments. Once constructed, libraries are clonally amplified in preparation for sequencing. The amplification method varies by platform. For instance, the Life Technologies Ion Torrent PGM platform utilises emulsion PCR on the One Touch system to amplify single library fragments onto microbeads, whereas the Illumina MiSeq instrument utilises bridge amplification to form template clusters on a flow cell. (Stambrook, “Next-generation sequencing technologies: breaking the sound barrier of human genetics” Mutagenesis, 2014, 29(5), 303-310).
Next-generation sequencing (NGS), also called massively parallel sequencing, was developed in the last decade and allows simultaneous sequencing of millions of DNA fragments without previous sequence knowledge. This advanced technology has been a true revolution compared with the traditional sequencing methods, in which one or a few relatively short fragments of DNA, previously amplified by PCR, could be sequenced per tube. With NGS, today's promise is that a complete genome can be sequenced in a few days for less than $1,000 per genome. (Kamps, “Next-generation sequencing in oncology: genetic diagnosis, risk prediction and cancer classification” International J. of Molecular Sciences, 2017, 18, 308)
Next generation sequencing (NGS) refers to a procedure similar to capillary electrophoresis based sequencing in which DNA polymerase catalyzes the incorporation of fluorescently labeled deoxyribonucleotide triphosphates (dNTPs) into a DNA template strand during sequential cycles of DNA synthesis. During each cycle, at the point of incorporation, the nucleotides are identified by fluorophore excitation. Instead of sequencing a single DNA fragment, the process extends across millions of fragments in a massively parallel manner. (Maher Albitar, Neogenomics, US Patent No: 10,253,370). In practice, this technology increases sequencing throughput by attaching millions of DNA fragments to a solid surface or support and sequencing all fragments simultaneously, in parallel. Current methods generally involve randomly breaking the sample into fragments and building fragment libraries. The fragment libraries are then prepared for sequencing by ligating specific adaptor oligonucleotides to both ends of each fragment, and the adaptor-ligated fragments are subsequently used as sequencing templates. The typical output of NGS is a list of billions of short sequences (25 to 400 bp), called reads, associated with quality scores. The molecular reliability of NGS data depends on 3 criteria: depth of coverage, heterogeneity, and accuracy of sequencing. Depth of coverage indicates the number of times that a given nucleotide is sequenced (for example, 5x indicates that each nucleotide of the target region was sequenced, on average, 5 times). Heterogeneity is a measure of uneven sequencing depth of coverage along the length of the expressed region. Finally, accuracy of sequencing is indicated by the quality of the base calls, or quality scores (i.e., the quality scores assigned to each base call in an automated sequencer trace, known as phred scores). (Monsuro, “Next generation sequencing: new tools in immunology and hematology” Blood Res 2013, 48: 242-9).
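The three reliability criteria above can be illustrated with a minimal Python sketch. It relies on the standard phred definition (Q = -10 log10 P, where P is the probability that a base call is wrong) and on the common Phred+33 ASCII encoding of FASTQ quality strings. The function names, the half-open read intervals, and the use of the coefficient of variation as a heterogeneity measure are assumptions made for illustration, not taken from the cited sources.

```python
from statistics import mean, pstdev


def phred_to_error_probability(q):
    """Probability that a base call with phred score q is wrong (Q = -10*log10(P))."""
    return 10 ** (-q / 10)


def decode_quality_string(quality, offset=33):
    """Decode a FASTQ quality string into phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def per_base_depth(reads, target_length):
    """Count how many reads cover each position of the target region.

    `reads` holds (start, end) alignment intervals, half-open and 0-based.
    """
    depth = [0] * target_length
    for start, end in reads:
        for pos in range(max(start, 0), min(end, target_length)):
            depth[pos] += 1
    return depth


def coverage_summary(depth):
    """Mean depth of coverage and coefficient of variation.

    The coefficient of variation is used here as one possible (assumed)
    measure of heterogeneity: 0 means perfectly even coverage.
    """
    average = mean(depth)
    variation = pstdev(depth) / average if average else float("nan")
    return average, variation


reads = [(0, 6), (2, 8), (4, 10)]          # three toy 6-base alignments
depth = per_base_depth(reads, target_length=10)
average_depth, heterogeneity = coverage_summary(depth)
```

With these toy intervals the mean depth is 1.8x. A phred score of 20 corresponds to a 1 in 100 chance of a wrong call, and a score of 30 to 1 in 1,000.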
The analysis of the billions of short sequence reads generated by NGS platforms requires powerful computational tools. Such tools must be able to align reads to a reference transcriptome or genome sequence to identify and quantify expressed gene isoforms (transcriptome profiling), and to perform differential expression analysis between specimens (expression quantification). As a rule, the estimation of expression levels in RNA-seq analysis is performed in 2 steps: (1) sequence alignment to a reference genome and (2) quantification of gene isoform expression levels. Since the entire process requires several computer programs (whose parameters must be tuned according to the goal of the study), researchers tend to assemble their own pipeline of programs to analyze RNA-seq samples in an automated and simple manner. (Monsuro, “Next generation sequencing: new tools in immunology and hematology” Blood Res 2013, 48: 242-9).
RNA can be isolated from antibody-secreting cells and libraries prepared by RT-PCR, which can then be sequenced using the Illumina MiSeq platform. The robustness of the antibody repertoire data can then be assessed based on clonal identification, defined by the amino acid sequence of either the full-length VDJ region or the CDR3. (Greiff, “Quantitative assessment of the robustness of next-generation sequencing of antibody variable gene repertoires from immunized mice” (2014) BMC Immunology, 15(40)).
High-throughput DNA sequencing can be used to analyze the VL and VH gene repertoires derived from the mRNA transcripts of fully differentiated mature B cells, the antibody-secreting bone marrow plasma cells (BMPCs), from immunized mice. After bioinformatic analysis, several abundant VL and VH gene sequences can be identified within the repertoire of each immunized mouse. VL and VH genes can be paired according to their relative frequencies within the repertoire. Antibody genes can then be synthesized by oligonucleotide and PCR assembly using automated liquid-handling robots. Recombinant antibodies are then expressed in bacterial and mammalian systems as single-chain variable fragments (scFv) and full-length IgG, respectively. (Reddy, “Monoclonal antibodies isolated without screening by analyzing the variable-gene repertoire of plasma cells” (2010) Nature Biotechnology, 28(9); 965-969).
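The frequency-based pairing described above can be sketched as follows. This is a toy illustration only: the sequence identifiers are hypothetical placeholders, and a real analysis would first filter and cluster the reads, which is not shown here.

```python
from collections import Counter


def rank_by_frequency(gene_calls):
    """Return (gene, relative frequency) pairs, most abundant first."""
    counts = Counter(gene_calls)
    total = sum(counts.values())
    return [(gene, n / total) for gene, n in counts.most_common()]


def pair_by_rank(vh_ranked, vl_ranked, top_n=3):
    """Pair the i-th most abundant VH with the i-th most abundant VL."""
    return [
        (vh_gene, vl_gene)
        for (vh_gene, _), (vl_gene, _) in zip(vh_ranked[:top_n], vl_ranked[:top_n])
    ]


# Hypothetical per-read gene assignments from one immunized animal.
vh_reads = ["VH_A"] * 5 + ["VH_B"] * 3 + ["VH_C"] * 2
vl_reads = ["VL_X"] * 6 + ["VL_Y"] * 3 + ["VL_Z"] * 1

vh_ranked = rank_by_frequency(vh_reads)
vl_ranked = rank_by_frequency(vl_reads)
candidate_pairs = pair_by_rank(vh_ranked, vl_ranked, top_n=2)
```

Rank-based pairing assumes that the most abundant heavy and light chains come from the same expanded plasma cell clones, so each candidate pair still has to be confirmed by expression and binding assays.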
Multiomics uses NGS techniques to collect unbiased data from different biological levels (“omes”) in one experiment: (1) proteomics, (2) transcriptomics, (3) epigenetics and (4) genomics. Doing this allows researchers to examine the same problem from multiple angles. Multiomics has been instrumental in understanding why different patients with the same type of cancer have variable responses to the same treatment. Single omics may not adequately explain these different response rates, given that cancers contain heterogeneous populations of cells which change and evolve in response to treatment. (“Multiomics: An Overview of Useful Methods and Applications” Illumina)
NGS Methods:
The NGS workflow includes the following basic steps: (1) the sequencing library is prepared by random fragmentation of the DNA or cDNA sample, followed by 5′ and 3′ adapter ligation. Alternatively, “tagmentation” combines the fragmentation and ligation reactions into a single step to increase the efficiency of library preparation. Adapter-ligated fragments are then PCR amplified and gel purified; (2) for cluster generation, the library is loaded into a flow cell where fragments are captured on a lawn of surface-bound oligos complementary to the library adapters. Each fragment is then amplified into distinct, clonal clusters through bridge amplification. When cluster generation is completed, the templates are ready for sequencing; (3) sequencing reagents, including fluorescently labeled nucleotides, are added and the first base is incorporated. The flow cell is imaged and the emission from each cluster is recorded. The emission wavelengths and intensities are used to identify the bases; (4) newly identified sequence reads are aligned to a reference genome. After alignment, differences between the reference genome and the newly sequenced reads can be identified. (Maher Albitar, Neogenomics, US Patent No: 10,253,370).
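Steps (3) and (4) can be pictured with a deliberately simplified Python sketch. It assumes one intensity value per base channel per cycle and calls the brightest channel, then compares a read with a reference at a known position. Production base callers and aligners are far more involved (for example, they correct for signal crosstalk and search for the alignment position), so the function names and data below are illustrative assumptions only.

```python
def call_bases(cycles):
    """Return the read implied by per-cycle channel intensities.

    `cycles` is a list with one dict per sequencing cycle, mapping each
    base ("A", "C", "G", "T") to the intensity recorded for one cluster.
    The brightest channel in each cycle is taken as the incorporated base.
    """
    return "".join(max(channels, key=channels.get) for channels in cycles)


def find_differences(read, reference, position):
    """List (reference position, reference base, read base) mismatches
    when `read` is laid against `reference` starting at `position`."""
    window = reference[position:position + len(read)]
    return [
        (position + i, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(window, read))
        if ref_base != read_base
    ]


# Toy intensities for one cluster over four cycles.
cycles = [
    {"A": 0.91, "C": 0.05, "G": 0.02, "T": 0.07},
    {"A": 0.04, "C": 0.08, "G": 0.88, "T": 0.03},
    {"A": 0.06, "C": 0.02, "G": 0.05, "T": 0.93},
    {"A": 0.10, "C": 0.85, "G": 0.07, "T": 0.04},
]
read = call_bases(cycles)
reference = "TTAGACGG"
differences = find_differences(read, reference, position=2)
```

Here the called read is AGTC, and laying it against the toy reference at position 2 reports a single A to T difference at reference position 4.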
Different approaches can be used according to the needs and the questions to be addressed. The initial input material can be genomic DNA (DNA-seq), messenger or non-coding RNA (RNA-seq) or any nucleic/ribonucleic material obtained after specific procedures. The implementation of NGS technology can be visualised as four major blocks: (Kamps, “Next-generation sequencing in oncology: genetic diagnosis, risk prediction and cancer classification” International J. of Molecular Sciences, 2017, 18, 308)
(1) Library preparation or sample processing: the material is first fragmented mechanically or enzymatically to yield fragments whose size is compatible with the sequencer (small fragments of 200-300 nucleotides for short-read sequencing, longer for long-read sequencing). This material can be enriched to analyse a limited number of genetic regions (e.g., disease gene panels, microbes, or all coding exons of the roughly 21,000 genes in the human genome, known as Whole-Exome Sequencing, WES). The complete genomic DNA can also be sequenced (Whole-Genome Sequencing, WGS), which does not require any enrichment step. The regions that are intended to be analysed are defined as regions of interest (ROIs). An amplification step through PCR with 4-12 cycles is performed in most cases. During this step, the proper linkers and barcodes, which are necessary for subsequent analysis by the sequencer, are attached to the DNA fragments. DNA barcodes, which are unique nucleotide tags (6-8 nt), allow samples to be pooled together on one single flowcell for the sequencing reaction.
(2) Sequencing:
For most clinical applications, the use of gene panels to sequence only a discrete number of genes of interest has been the method of choice because of its cost efficiency, and because at the same time it achieves high coverage of ROIs and offers simplicity in the raw and subsequent data analyses. When the number of genes sequenced is restricted to the few already analysed in previous diagnostic tests using traditional methods, this is normally called targeted re-sequencing. Different protocols are available to design and capture panels of genes and other ROIs. In most cases, companies providing the library preparation kits offer user-friendly online tools to design the hybridisation probes or the PCR oligos to enrich the designed ROIs. Enrichment can be obtained via solid-phase hybridisation, in-solution hybridisation (most frequently used) or PCR-based enrichment, and is followed by amplification via multiplex PCR, rolling circle amplification (HaloPlex) or amplicon-based microdroplet PCR (RainDance technology). The latter presents the advantage of simultaneously amplifying a large number of targeted regions in separate microdroplets, thus keeping each amplification separate from the others and limiting the disturbance due to primer pair interactions. A cheap and flexible method to capture small regions of the genome for NGS analyses is the Molecular Inversion Probe.
(3) Initial quality and raw data analyses:
(4) Variant calling and data interpretation. This step is dependent on the specific application.
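The sample pooling enabled by the DNA barcodes in block (1) implies a demultiplexing step before analysis. The sketch below assumes, purely for illustration, that the barcode occupies the first bases of each read and that one mismatch is tolerated. On many instruments the barcode is instead read separately as an index read, and the sample names and sequences used here are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcode_to_sample, max_mismatches=1):
    """Assign each read to a sample according to its leading barcode.

    A read is assigned only if exactly one known barcode lies within
    `max_mismatches` of its first bases; otherwise it is kept, untrimmed,
    in the "undetermined" pool.
    """
    barcode_length = len(next(iter(barcode_to_sample)))
    pools = {sample: [] for sample in barcode_to_sample.values()}
    pools["undetermined"] = []
    for read in reads:
        tag, insert = read[:barcode_length], read[barcode_length:]
        hits = [
            sample
            for barcode, sample in barcode_to_sample.items()
            if len(tag) == barcode_length and hamming(tag, barcode) <= max_mismatches
        ]
        if len(hits) == 1:
            pools[hits[0]].append(insert)   # barcode trimmed off
        else:
            pools["undetermined"].append(read)
    return pools


barcodes = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}   # 6-nt tags
pooled_reads = [
    "ACGTACGGGTTT",   # exact match to sample_1
    "TGCATGAAACCC",   # exact match to sample_2
    "ACGTAAGGGCCC",   # one mismatch to sample_1
    "GGGGGGTTTAAA",   # no barcode within one mismatch
]
pools = demultiplex(pooled_reads, barcodes)
```

Tolerating a mismatch is only safe when every pair of barcodes in the pool differs at more positions than the tolerance allows, which is why barcode sets are designed with a minimum distance between tags.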
Sequencing Before and After Immunization:
Ankeny (US Patent Application 16/060,304, published as US 2020/0407426) discloses that, prior to immunization, the non-immune repertoire was sampled from venous blood from hens. Hens were hyperimmunized (booster) and, 13 days following the primary hyperimmunization, the post-immunization (post-state) repertoire was sampled from venous blood. PBMCs were isolated from anticoagulated blood by density gradient centrifugation. RNA was then extracted from the isolated PBMCs using the mirVana kit from Thermo Fisher: following the PBMC enrichment, cell pellets were lysed, mRNAs were isolated and reverse transcribed, and the variable regions of the H and L chains were amplified, sequenced and cloned. Knowing the repertoire prior to hyperimmunization allows calculation of clonal frequency changes that can be explained by B cell clonal expansion during the adaptive immune response. A variety of sequencing methods can be used. In one embodiment, Ion Torrent sequencing libraries are prepared and sequenced. Other methods include Illumina and 454 pyrosequencing, as well as 3rd generation systems such as single-molecule real-time sequencing. Following amplification of VL and VH genes, purified VL and VH amplicon libraries were prepared using the Ion Torrent Ion Plus Library Preparation Kit. Amplicon libraries received unique sequencing barcodes. Abundant CDRL3 sequences were matched to the candidate VH sequences by relative rank order within the immunized repertoires. Clustering by similarity revealed the dominant CDR3 sequences for each repertoire. The CDR3 lengths of the most abundant sequences in the two immune states can be significantly different.
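The clonal frequency change between the two immune states can be sketched as a fold change per CDR3 clone. The CDR3 strings below are hypothetical placeholders. The small pseudocount, which keeps clones absent from one repertoire from causing a division by zero, is an assumption of this sketch and not something stated in the cited application.

```python
from collections import Counter


def clonal_frequencies(cdr3_reads):
    """Fraction of reads carrying each CDR3 sequence."""
    counts = Counter(cdr3_reads)
    total = sum(counts.values())
    return {cdr3: n / total for cdr3, n in counts.items()}


def fold_changes(pre_reads, post_reads, pseudocount=1e-6):
    """Post/pre frequency ratio for every clone seen in either repertoire."""
    pre = clonal_frequencies(pre_reads)
    post = clonal_frequencies(post_reads)
    return {
        cdr3: (post.get(cdr3, 0.0) + pseudocount) / (pre.get(cdr3, 0.0) + pseudocount)
        for cdr3 in set(pre) | set(post)
    }


# Hypothetical CDR3 calls before and after hyperimmunization.
pre_immune = ["CARDYW"] * 2 + ["CAKGGW"] * 8
post_immune = ["CARDYW"] * 8 + ["CAKGGW"] * 2

changes = fold_changes(pre_immune, post_immune)
expanded = max(changes, key=changes.get)   # clone with the largest expansion
```

In this toy data the clone CARDYW rises from 20% to 80% of reads, a roughly 4-fold expansion, while CAKGGW contracts by the same factor.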
Swinkels (Virology Journal 2013, 10: 206) discloses determining the M2-specific antibody response (from influenza virus) in the serum of chickens before vaccination, 3 weeks after vaccination and two weeks after booster.
Ma (“Characteristics peripheral blood IgG and IgM heavy chain complementarity determining region 3 repertoire before and after immunization with recombinant HBV vaccine” published January 23, 2017) discloses high-throughput sequencing of BCR heavy chain CDR3 repertoires in 3 healthy volunteers before and after the third immunization with recombinant HBV vaccine. The Roche 454 Genome Sequencer FLX system was used to perform a comparative analysis of IgM and IgG H chain CDR3 repertoires. In the procedure, PBMCs were isolated from heparin-treated peripheral blood using density gradient centrifugation. Total RNA was extracted from the PBMCs and reverse transcribed into cDNA, and PCR was performed to amplify the human BCR H chain CDR3 repertoires.
Affinity Purification – Mass Spectrometry – Next-generation DNA sequencing
Cheung (“A proteomics approach for the identification and cloning of monoclonal antibodies from serum” 2012, Nature Biotechnology, 30(5): 447-452) discloses a proteomics approach that identifies antigen-specific antibody sequences directly from circulating polyclonal antibodies in the serum of an immunized animal. The approach includes affinity purification of antibodies with high specific activity, followed by analysis of the digested antibody fractions by MS. High-confidence peptide spectral matches of antibody variable regions are obtained by searching a reference database created by next-generation DNA sequencing of the B-cell immunoglobulin repertoire of the immunized animal. Finally, H and L chain sequences are paired and expressed as recombinant monoclonal antibodies.
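The database lookup at the heart of this approach can be pictured with a simplified Python sketch. It assumes that peptides have already been identified as plain amino acid strings and that the antibodies were digested with trypsin (cleavage after K or R, but not before P). The cited work matches mass spectra rather than strings and does not necessarily use this protease, and the sequences and names below are hypothetical.

```python
def tryptic_peptides(protein):
    """In-silico trypsin digest: cut after K or R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])   # C-terminal remainder
    return peptides


def count_matches(observed_peptides, reference_database):
    """Count, per database entry, how many observed peptides it explains."""
    observed = set(observed_peptides)
    return {
        name: len(observed & set(tryptic_peptides(sequence)))
        for name, sequence in reference_database.items()
    }


# Hypothetical variable-region fragments from the NGS-derived database.
reference_database = {
    "VH_1": "EVQLKASGFTFSR",
    "VH_2": "QVQLKASGYTFTR",
}
observed_peptides = ["ASGFTFSR", "EVQLK"]   # toy MS identifications
matches = count_matches(observed_peptides, reference_database)
```

Entries explained by several observed peptides (here VH_1, with two) would be the candidates carried forward for H and L chain pairing and recombinant expression.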
Sequencing Directly from PCR Products:
Large numbers of templates for DNA sequencing can be produced via PCR directly from plaques, colonies or genomic DNA. Sequencing directly from PCR products has many advantages over subcloning, such as removing the need for template preparation. It is also highly amenable to automation. However, one problem is the purification of the amplified products prior to DNA sequencing. Hawkins (“Solid-phase reversible immobilization for the isolation of PCR products” Nucleic Acids Research, 1995, 23(22)) discloses a method for producing quality DNA sequencing template from PCR products, termed SPRI (“solid-phase reversible immobilization”). SPRI employs a carboxyl-coated magnetic particle which can reversibly bind DNA in the presence of polyethylene glycol (PEG) and salt.
mRNA Sequencing:
Messenger RNA accounts for about 2% of the whole transcriptome and is composed of poly-A-tailed RNA that codes for proteins. mRNA sequencing (mRNA-Seq) provides an unbiased and complete view of the coding transcriptome. Compared to Total RNA-Seq, mRNA-Seq allows researchers to focus on the coding transcriptome, which also means less but more targeted data. Illumina recommends the Illumina Stranded mRNA Prep, a simple, scalable, rapid library preparation solution for analyzing the coding transcriptome with as little as 25 ng of RNA input. Sequencing can be done with either the NextSeq 1000/NextSeq 2000 or the NovaSeq 6000. Data analysis can be performed using the DRAGEN RNA Pipeline or Differential Expression apps to obtain differential expression results at the gene and transcript levels. (“Multiomics: An Overview of Useful Methods and Applications” Illumina)
Whole-Genome Sequencing (WGS):
WGS analyzes the whole genome of a population of cells or of tissue samples. Using WGS, researchers can uncover genetic events that contribute to disease beyond protein-coding variants. Whole-exome sequencing (WES) allows researchers to analyze the portion of the genome responsible for coding proteins (the exome). While the exome represents less than 2% of the entire genome, it accounts for 85% of disease-related variants. Using WES, researchers can study which protein-coding variants contribute to disease or dysfunction. After DNA extraction, libraries for WES can be prepared using the Illumina DNA Prep with Enrichment. Illumina recommends the NextSeq 2000 or NovaSeq 6000 sequencers for WES. Illumina recommends using the DRAGEN platform, either on BaseSpace Sequence Hub or on a DRAGEN server, to obtain data from WES. In BaseSpace Sequence Hub, you can monitor runs in real time while securely streaming data directly from the instruments into the ecosystem. (“Multiomics: An Overview of Useful Methods and Applications” Illumina)
Whole-genome sequencing of bacterial pathogens:
Peacock (“Rapid single-colony whole-genome sequencing of bacterial pathogens” J. Antimicrob Chemother 2014, 69, 1275-1281) discloses that rapid benchtop sequencers can provide multiple pieces of clinically relevant information in a single process. The starting material for bacterial WGS is typically purified DNA extracted from liquid culture, but sequencing can also be done directly from a single bacterial colony on primary isolation plates.
Proteomic Methods:
CITE-Seq uses oligonucleotide-labeled antibodies to measure proteins and RNA in the same experiment. CITE-Seq is a high-throughput multiomic tool that allows researchers to study protein expression and the intricacies of the cellular transcriptome both at the single cell level and for spatial analysis. (“Multiomics: An Overview of Useful Methods and Applications” Illumina)
Epigenomics Methods:
ATAC-Seq is an epigenomic discovery tool for mapping chromatin accessibility across the genome. This approach analyzes DNA accessibility using the Tn5 transposase, which inserts sequencing adapters into open chromatin regions. Researchers can then use sequencing to locate regions of increased chromatin accessibility. ATAC-Seq allows researchers to study how these regions impact gene expression and can be used to study both single cells and cell populations. (“Multiomics: An Overview of Useful Methods and Applications” Illumina)