Small-RNA analysis


Demo Report

 

After the raw data were obtained from the Solexa/Illumina HiSeq 2500 sequencing platform, all reads underwent adapter sequence identification and removal. In short, an extraction method developed at YourGene Bioscience (data unpublished) was used for adapter trimming.

The analysis started from the filtered reads that passed our first criterion, adapter identification. The 5’ adaptor sequence is “GTTCAGAGTTCTACAGTCCGACGATC” and the 3’ adaptor sequence is “TGGAATTCTCGGGTGCCAAGG”. Size selection was restricted to reads between 15 and 27 nt in length. Reads meeting this criterion proceed to the subsequent alignment analysis.
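The trimming and size-selection step can be illustrated with a minimal sketch. This is not the unpublished YourGene method: it assumes a simple exact match of the 3’ adaptor (or of an adaptor prefix at the end of the read), and the function names and the `min_overlap` parameter are hypothetical.

```python
ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"  # 3' adaptor sequence from the report
MIN_LEN, MAX_LEN = 15, 27             # size-selection window from the report


def trim_3p_adapter(read, adapter=ADAPTER_3P, min_overlap=8):
    """Return the insert preceding the 3' adaptor, or None if no adaptor is found.

    Assumption: exact matching only; min_overlap is an illustrative threshold.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # The adaptor may be cut off at the end of the read, so look for the
    # longest adaptor prefix that forms the read suffix.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return None


def passes_size_selection(insert):
    """Keep inserts whose length lies between 15 and 27 nt inclusive."""
    return MIN_LEN <= len(insert) <= MAX_LEN
```

A read is kept only when `trim_3p_adapter` returns an insert and `passes_size_selection` accepts it.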

Table 1. The length distribution of reads in 6 samples.

 Biological Sample-1  Biological Sample-2  Biological Sample-3  Biological Sample-4  Biological Sample-5  Biological Sample-6
Initial reads 5,763,886 5,612,931 5,546,938 5,577,522 5,945,932 5,661,091
Total reads passed filter 5,733,487 5,559,541 5,492,882 5,531,298 5,115,470 5,182,832
Length:15 30,700 29,817 29,883 21,277 44,023 20,129
Length:16 76,792 78,741 73,675 54,696 52,451 50,737
Length:17 69,129 126,132 118,010 72,207 53,022 62,067
Length:18 82,535 189,443 251,027 98,069 41,928 60,419
Length:19 81,618 162,200 87,117 73,349 45,928 53,808
Length:20 378,781 359,516 300,226 282,194 171,518 241,476
Length:21 1,131,059 928,609 838,202 939,205 569,678 860,044
Length:22 2,284,203 1,709,816 2,052,417 2,160,395 1,437,757 2,026,803
Length:23 683,973 582,858 656,831 610,423 468,176 592,855
Length:24 495,629 407,184 451,598 571,893 415,899 482,454
Length:25 84,069 130,186 86,606 89,898 69,985 76,587
Length:26 49,587 106,693 59,969 52,494 41,470 44,551
Length:27 50,443 103,606 56,036 55,533 41,031 43,477
Length 15 - 27 5,498,518 4,914,801 5,061,597 5,081,633 3,452,866 4,615,407
Length < 15 85,836 196,384 225,307 149,623 1,277,484 304,198
Length > 27 149,133 448,356 205,978 300,042 385,120 263,227

 

Table 2. The length distribution (Proportion) of reads in 6 samples.

Biological Sample-1 Biological Sample-2 Biological Sample-3 Biological Sample-4 Biological Sample-5 Biological Sample-6
Length:15 0.54% 0.54% 0.54% 0.38% 0.86% 0.39%
Length:16 1.34% 1.42% 1.34% 0.99% 1.03% 0.98%
Length:17 1.21% 2.27% 2.15% 1.31% 1.04% 1.20%
Length:18 1.44% 3.41% 4.57% 1.77% 0.82% 1.17%
Length:19 1.42% 2.92% 1.59% 1.33% 0.90% 1.04%
Length:20 6.61% 6.47% 5.47% 5.10% 3.35% 4.66%
Length:21 19.73% 16.70% 15.26% 16.98% 11.14% 16.59%
Length:22 39.84% 30.75% 37.37% 39.06% 28.11% 39.11%
Length:23 11.93% 10.48% 11.96% 11.04% 9.15% 11.44%
Length:24 8.64% 7.32% 8.22% 10.34% 8.13% 9.31%
Length:25 1.47% 2.34% 1.58% 1.63% 1.37% 1.48%
Length:26 0.86% 1.92% 1.09% 0.95% 0.81% 0.86%
Length:27 0.88% 1.86% 1.02% 1.00% 0.80% 0.84%
Length 15 - 27 95.90% 88.40% 92.15% 91.87% 67.50% 89.05%
Length < 15 1.50% 3.53% 4.10% 2.71% 24.97% 5.87%
Length > 27 2.60% 8.06% 3.75% 5.42% 7.53% 5.08%
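The proportions in Table 2 appear to be each count from Table 1 divided by the total reads that passed the filter for the same sample. A short sketch with the Biological Sample-1 values shows the arithmetic; the dictionary keys and the function name are illustrative only.

```python
# Biological Sample-1 values copied from Table 1.
TOTAL_PASSED_FILTER = 5_733_487
COUNTS = {
    "length_22": 2_284_203,
    "length_15_to_27": 5_498_518,
    "shorter_than_15": 85_836,
    "longer_than_27": 149_133,
}


def proportion_percent(count, total):
    """Percentage of the filtered reads, rounded to two decimals."""
    return round(100.0 * count / total, 2)


proportions = {
    label: proportion_percent(count, TOTAL_PASSED_FILTER)
    for label, count in COUNTS.items()
}
```

The three size classes (15 to 27, shorter than 15, longer than 27) add up to the total reads that passed the filter, so their percentages sum to 100%.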

 

 

First, the reads were mapped to the reference genome sequence (mm10) with bowtie; all aligned reads were then mapped to the Mus musculus mature miRNAs reported in miRBase v19. To accommodate polymorphisms and RNA editing in miRNA identification, we allowed up to three terminal mismatches. The remaining reads, which did not map to Mus musculus mature miRNAs, were compared with the NONCODE database to identify known non-coding RNAs such as snoRNA, snRNA, piRNA and lncRNA.
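One possible reading of the "up to three terminal mismatches" rule is sketched here. The report does not define the rule precisely, so this is an assumption: read and mature miRNA are compared from the 5’ end, every position outside the last three must match exactly, and a length difference counts toward the terminal mismatches. The function name and the example sequence are hypothetical.

```python
def matches_mature(read, mature, max_terminal_mismatches=3):
    """Return True if read and mature differ only within the 3'-terminal window.

    Assumption: sequences are aligned at the 5' end; missing or extra
    3' bases count as terminal mismatches.
    """
    core_len = max(len(read), len(mature)) - max_terminal_mismatches
    if core_len > min(len(read), len(mature)):
        # Lengths differ by more than the allowed terminal window.
        return False
    # Everything outside the terminal window must match exactly.
    return read[:core_len] == mature[:core_len]


# Hypothetical 22-nt mature sequence used only for illustration.
mature = "TGAGGTAGTAGGTTGTATAGTT"
exact_hit = matches_mature("TGAGGTAGTAGGTTGTATAGTT", mature)
terminal_variant = matches_mature("TGAGGTAGTAGGTTGTATAGAA", mature)
internal_variant = matches_mature("TGAGCTAGTAGGTTGTATAGTT", mature)
```

Under this reading, the exact read and the read with two altered 3’ bases are accepted, while the read with an internal mismatch is rejected.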

1. Mapped to genome

Biological Sample-1 Biological Sample-2 Biological Sample-3 Biological Sample-4 Biological Sample-5 Biological Sample-6
Mapped to genome 5,455,501 4,837,901 5,020,704 5,031,530 3,408,484 4,569,130
Not mapped to genome 43,017 76,900 40,893 50,103 44,382 46,277

 

2. Mapped to miRBase

Biological Sample-1 Biological Sample-2 Biological Sample-3 Biological Sample-4 Biological Sample-5 Biological Sample-6
Mapped to miRBase 3,453,024 2,485,292 2,946,750 3,006,772 1,921,873 2,848,728
Not mapped to miRBase 2,002,477 2,352,609 2,073,954 2,024,758 1,486,611 1,720,402

 

3. Mapped to NONCODE

Biological Sample-1 Biological Sample-2 Biological Sample-3 Biological Sample-4 Biological Sample-5 Biological Sample-6
snoRNA 9,259 21,995 7,392 5,907 5,430 4,987
snRNA 800 8,329 2,428 1,324 4,739 3,914
lncRNA 753,411 645,974 791,460 670,746 547,368 590,737
piRNA 94,688 92,048 73,117 71,862 34,551 54,780
other non-coding RNA 193,696 273,622 179,286 343,228 156,359 246,040
Not mapped to NONCODE 950,623 1,310,641 1,020,271 931,691 738,164 819,944
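The three mapping tables form a cascade: each stage partitions the reads left over from the previous one. The sketch below checks this for Biological Sample-1 using the counts reported above; the data layout and the function name are illustrative assumptions.

```python
# Biological Sample-1 counts copied from the tables above.
SAMPLE_1 = {
    "length_15_to_27": 5_498_518,
    "genome": {"mapped": 5_455_501, "unmapped": 43_017},
    "mirbase": {"mapped": 3_453_024, "unmapped": 2_002_477},
    "noncode": {
        "snoRNA": 9_259,
        "snRNA": 800,
        "lncRNA": 753_411,
        "piRNA": 94_688,
        "other": 193_696,
        "unmapped": 950_623,
    },
}


def stage_is_consistent(parent_total, parts):
    """True if the categories of a stage add up to the reads that entered it."""
    return sum(parts.values()) == parent_total


checks = {
    # Size-selected reads are split into genome-mapped and unmapped.
    "genome": stage_is_consistent(SAMPLE_1["length_15_to_27"], SAMPLE_1["genome"]),
    # Genome-mapped reads are split by miRBase status.
    "mirbase": stage_is_consistent(SAMPLE_1["genome"]["mapped"], SAMPLE_1["mirbase"]),
    # Reads without a miRBase hit are split across the NONCODE classes.
    "noncode": stage_is_consistent(SAMPLE_1["mirbase"]["unmapped"], SAMPLE_1["noncode"]),
}
```

For Biological Sample-1 all three checks hold, which confirms that the reads compared with NONCODE are exactly those not mapped to miRBase.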

 

After the reads were mapped to the Mus musculus mature miRNAs, the mature miRNA expression levels (read counts) are provided in the Excel file (expression.xlsx). Differential expression analysis was performed with DESeq (Anders and Huber, 2010); the results are shown in Table 3 and in the Excel file (Diff.xlsx), with a p-value cut-off of 0.05.
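DESeq is an R/Bioconductor package, and its normalization step estimates one size factor per sample by the median-of-ratios method. The sketch below reproduces only that normalization idea in Python, not the negative binomial test itself; the function name and the toy counts are hypothetical and are not taken from expression.xlsx.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps each miRNA name to a list of raw read counts, one per sample.
    miRNAs with a zero count in any sample are skipped, because their
    geometric mean is zero and the ratio is undefined.
    """
    n_samples = len(next(iter(counts.values())))
    log_geo_means = {
        name: sum(math.log(c) for c in row) / n_samples
        for name, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        # Log ratio of each count to the per-miRNA geometric mean.
        log_ratios = [
            math.log(counts[name][j]) - lgm for name, lgm in log_geo_means.items()
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy counts (hypothetical): the second sample is sequenced twice as deeply.
toy_counts = {"miRNA_a": [10, 20], "miRNA_b": [100, 200], "miRNA_c": [5, 10]}
toy_factors = size_factors(toy_counts)
```

Dividing each raw count by its sample's size factor puts the samples on a common scale before the counts are compared between conditions.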

Table 3. Significant miRNA in differential expression analysis

Sample  Biological Sample-1 vs Biological Sample-2  Biological Sample-1 vs Biological Sample-3  Biological Sample-1 vs Biological Sample-4  Biological Sample-1 vs Biological Sample-5  Biological Sample-1 vs Biological Sample-6
Significant mmu-miR-5115
mmu-miR-5130
mmu-miR-5126
mmu-miR-149-3p
mmu-miR-2861
mmu-miR-323-5p
mmu-miR-1934-5p
mmu-miR-5120
mmu-miR-3104-5p
mmu-miR-328-5p

mmu-miR-542-3p
mmu-miR-6238
mmu-miR-1957b
mmu-miR-1187
mmu-miR-3473e
mmu-miR-3473b
mmu-miR-1965
mmu-miR-149-3p
mmu-miR-137-5p
mmu-miR-302c-3p
mmu-miR-1306-5p
mmu-miR-6244
mmu-miR-6241
mmu-miR-379-3p
mmu-miR-411-3p
mmu-miR-1186b
mmu-miR-98-3p
mmu-miR-770-5p
mmu-miR-6390
mmu-miR-137-5p
mmu-miR-5109
mmu-miR-1199-5p
mmu-miR-6390


Figure 2. Heatmap Analysis

  1. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
  2. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106.
  3. Liu C, et al. NONCODE: an integrated knowledge database of non-coding RNAs. Nucleic Acids Res. 2005;33(Database issue):D112-D115.
  4. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39(Database issue):D152-D157.

 

  • analysis/expression.xlsx
  • analysis/Diff.xlsx