Graduate student: | 黃智沂 Huang, Chih Yi |
---|---|
Thesis title: | 以次世代定序技術為基礎之轉錄體側寫分析軟體系統 Accurate and Efficient Analysis Systems for Next-generation Sequencing Technologies-based Transcriptome Profiling |
Advisor: | 唐傳義 Tang, Chuan Yi |
Oral defense committee: | 唐傳義 韓永楷 謝文萍 林俊淵 蔡七女 |
Degree: | 博士 Doctor |
Department: | 電機資訊學院 - 資訊工程學系 Computer Science |
Year of publication: | 2012 |
Academic year of graduation: | 100 |
Language: | English |
Number of pages: | 62 |
Chinese keywords: | 次世代定序 (next-generation sequencing), 轉錄體 (transcriptome), 排比演算法 (alignment algorithm), 演算法 (algorithm) |
English keywords: | Next Generation Sequencing, RNA-seq, Transcriptomics, Alignment |
RNA-Seq是新近開發出來的方法,用來分析包括基因和小分子RNA的轉錄。RNA-seq 是革命性的工具,改變轉錄體研究的程度和複雜性。研究人員利用RNA-seq的技術,可以有效地測量轉錄體數據的實驗,他們可以得到 如基因融合,差異表達基因,等位基因或小分子RNA,和轉錄後的基因突變存在。要利用這個強大的工具,我們需要新的方法和工具,以克服面臨的挑戰。在這篇論文中,我們提出這整合生物的知識和算法技術的系統,以提高轉錄分析的準確性和效率的方法。
RNA-Seq is a recently developed method for profiling the transcriptome, including mRNAs and small RNAs. This revolutionary tool for transcriptomics has altered our view of the extent and complexity of transcriptomes. Using RNA-seq, researchers can efficiently measure transcriptome data experimentally and obtain information such as gene fusions, differential expression of genes and alleles, and the existence of small RNAs and post-transcriptional mutations. To harness this powerful tool, we need new methods and tools to overcome the challenges that come with RNA-seq. In this thesis, we propose methods and systems that integrate biological knowledge and algorithmic techniques to improve the accuracy and efficiency of transcriptome profiling.