Cell-free DNA Sequencing


Demo Report

Figure 1. Experimental pipeline for the plasma library

DNA is end-repaired and A-tailed using the polymerase activity of the Klenow fragment. Indexed adapters are then ligated to the DNA fragments by DNA ligase, followed by 10 to 18 cycles of PCR to enrich the adapter-modified DNA fragments. After the libraries are validated by qPCR, Experion, and Qubit, they are sequenced on an Illumina HiSeq™ 2500.

The first step in the trimming process was to convert each quality score (Q) to an error probability. Next, a new value was calculated for every base from its error probability.

This value was negative for low-quality bases, where the error probability was high. For every base, we calculated the running sum of this value; if the sum dropped below zero, it was set to zero. The part of the sequence between the first positive value of the running sum and the highest value of the running sum was retained, and everything before and after this region was trimmed off. In addition, reads shorter than 35 bp were discarded.
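
The formula for the per-base value is not reproduced in this report. The description matches the commonly used modified-Mott trimming scheme, in which the value is a fixed threshold minus the error probability, so the sketch below assumes that form together with the standard Phred relation p = 10^(-Q/10). The function name `trim_read` and the default `limit=0.05` are illustrative assumptions, not values taken from this report; only the 35 bp minimum length comes from the text.

```python
def trim_read(quals, limit=0.05, min_len=35):
    """Return (start, end) of the retained region as a half-open interval,
    or None if the read is discarded.

    quals   : list of Phred quality scores, one per base
    limit   : ASSUMED error-probability threshold (not stated in the report)
    min_len : minimum retained length in bases (35 bp per the report)
    """
    running = 0.0   # running sum, clamped at zero
    best = 0.0      # highest running sum seen so far
    start = None    # first base at which the running sum is positive
    end = None      # base at which the running sum reaches its maximum

    for i, q in enumerate(quals):
        p_error = 10 ** (-q / 10)          # Phred score -> error probability
        value = limit - p_error            # negative for low-quality bases
        running = max(0.0, running + value)
        if running > 0 and start is None:
            start = i
        if running > best:
            best = running
            end = i

    if start is None:                      # no base ever scored positive
        return None
    if end - start + 1 < min_len:          # too short after trimming
        return None
    return start, end + 1


# Five poor bases, forty good bases, five poor bases
example = [2] * 5 + [30] * 40 + [2] * 5
print(trim_read(example))
```

With these assumed settings the low-quality bases at both ends are trimmed and the forty high-quality bases in the middle are kept.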

Table 1. Sequencing summary

                                Biological Sample-1   Biological Sample-2   Biological Sample-3
Total reads                     6386271               15354822              10316842
Original read length (bp)       126                   126                   126
Total bases                     78425645              214845501             132232842
Total reads after QT            6194994               15348146              10311844
Average length after trimming   125.48                125.42                125.86
Total bases after QT            79216731              204527816             130601065
Percentage trimming             99.75%                99.64%                99.89%

 

 

The analysis followed the best practices recommended by the Broad Institute. Read sequences were aligned to the human reference genome (GRCh37) with the BWA aligner (Li and Durbin, 2009). The exome SNP calls were produced using the GATK SNP calling pipeline (DePristo et al., 2011).

 

 

Figure 3. Distribution of mapped reads

 

Table 2. Non-duplicate rate

                     Biological Sample-1   Biological Sample-2   Biological Sample-3
Non-duplicate rate   81%                   79%                   83%
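
The report does not state how duplicates were identified. As a rough illustration only, the sketch below assumes that two mapped reads are duplicates when they share the same chromosome, start position, and strand, and reports the fraction of distinct reads. The function name `non_duplicate_rate` and the tuple layout are hypothetical.

```python
def non_duplicate_rate(alignments):
    """Fraction of mapped reads that are not duplicates.

    alignments: iterable of (chrom, start, strand) tuples, one per mapped read.
    ASSUMPTION: reads with an identical tuple are treated as duplicates.
    """
    total = 0
    unique = set()
    for key in alignments:
        total += 1
        unique.add(key)
    if total == 0:
        raise ValueError("no mapped reads")
    return len(unique) / total


reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),   # duplicate of the first read
    ("chr1", 100, "-"),   # same position, opposite strand: kept
    ("chr2", 5, "+"),
    ("chr2", 5, "+"),     # duplicate
]
print(f"{non_duplicate_rate(reads):.0%}")
```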

 

Variant annotation was carried out using Variant Effect Predictor to add information such as the gene in which each variant lies, the consequence of the mutation (nonsynonymous, nonsense, etc.), and information from tools and databases such as PolyPhen-2, SIFT, dbSNP, and COSMIC (McLaren et al., 2010; Adzhubei et al., 2010; Kumar et al., 2009; Sherry et al., 1999; Forbes et al., 2011).
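
A consequence summary of the kind annotation produces can be tallied with a few lines of code. The sketch below assumes a tab-delimited annotation export with a header line containing a `Consequence` column, optional `##` comment lines, and comma-separated terms when a variant has several consequences. The function name `count_consequences` and the sample rows are hypothetical and are not taken from this report's data.

```python
import io
from collections import Counter


def count_consequences(handle, column="Consequence"):
    """Count consequence terms in a tab-delimited annotation file.

    ASSUMPTIONS: lines starting with '##' are comments, the first remaining
    line is the header (optionally prefixed with '#'), and multiple terms
    for one variant are separated by commas.
    """
    counts = Counter()
    idx = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if idx is None:
            header = [name.lstrip("#") for name in fields]
            idx = header.index(column)     # raises ValueError if absent
            continue
        for term in fields[idx].split(","):
            counts[term] += 1
    return counts


# Made-up rows for illustration only
example = io.StringIO(
    "## illustrative file\n"
    "#Uploaded_variation\tLocation\tConsequence\n"
    "v1\t1:100\tmissense_variant\n"
    "v2\t1:200\tstop_gained,splice_region_variant\n"
    "v3\t2:50\tmissense_variant\n"
)
for term, n in count_consequences(example).most_common():
    print(term, n)
```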

 

Figure 4. Summary of calculated variation consequences

Figure 5. Summary of SIFT prediction

Figure 6. Summary of PolyPhen prediction

  1. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics 25: 1754–1760.
  2. DePristo M, et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics 43: 491–498.
  3. McLaren W, et al. (2010) Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26(16): 2069–2070. doi: 10.1093/bioinformatics/btq330
  4. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, et al. (2010) A method and server for predicting damaging missense mutations. Nature Methods 7: 248–249. doi: 10.1038/nmeth0410-248
  5. Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols 4: 1073–1081. doi: 10.1038/nprot.2009.86
  6. Sherry ST, Ward M, Sirotkin K (1999) dbSNP - Database for Single Nucleotide Polymorphisms and Other Classes of Minor Genetic Variation. Genome Research 9: 677–679.
  7. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, et al. (2011) COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Research 39: D945–D950. doi: 10.1093/nar/gkq929

 

  • analysis/variant_call_add_annotation.xlsx