FRESCO: Frequency-based RE-Sequencing tool based on CO-clustering segmentation for short reads - a case study of micro-RNAs

Graduate student:	Huang, Ming-Yuan (黃名遠)
Thesis title:	FRESCO: Frequency-based RE-Sequencing tool based on CO-clustering segmentation for short reads - a case study of micro-RNAs (對短序列基於分群切段與計算頻率的重定序工具)
Advisor:	Tang, Chuan Yi (唐傳義)
Oral examination committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication:	2009
Academic year of graduation:	97
Language:	English
Number of pages:	34
Keywords (Chinese):	重定序演算法、接頭序列、讀數、切割資料、核醣核酸
Keywords (English):	re-sequencing algorithm, adaptor sequence, reads, partition data, RNA

In this thesis, we propose a data-processing pipeline for reads produced by the Solexa machine, together with a re-sequencing algorithm. The Solexa experimental protocol uses an adaptor for RNA sequencing, and the adaptor sequence may contaminate the end of a read. First, we introduce a method for removing the adaptor sequence and compare two variants of it: exact matching of the adaptor and matching that tolerates one mismatch. We then present the re-sequencing algorithm, compare two data-partitioning methods within it, and compare the algorithm with other re-sequencing tools. Finally, we report how chicken Solexa reads are classified according to the Rfam RNA seed type tree.

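This thesis presents a processing pipeline for Solexa data and a re-sequencing algorithm. Under the sequencing protocol of the Solexa machine, an adaptor sequence is used during sequencing; when RNA is sequenced, the adaptor is very likely to be attached to the end of a read and thereby contaminate it. The thesis first introduces a method for removing the adaptor sequence and compares removal with and without error tolerance when matching the adaptor. It then describes the proposed re-sequencing algorithm. Finally, it compares two different data-partitioning methods within the algorithm and compares the algorithm with other existing re-sequencing tools. In addition, Solexa reads from chicken embryo are compared with the known RNAs in Rfam, and the classification results are tallied according to the Rfam RNA classification tree.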
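
The comparison between exact and one-mismatch adaptor matching can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which this record does not describe. The adaptor string, the minimum overlap of 6 bases, and the names `hamming_within` and `trim_adaptor` are all assumptions made for illustration.

```python
def hamming_within(a: str, b: str, max_mismatches: int) -> bool:
    """Return True if equal-length strings a and b differ in at most
    max_mismatches positions (substitutions only, no indels)."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def trim_adaptor(read: str, adaptor: str,
                 max_mismatches: int = 0, min_overlap: int = 6) -> str:
    """Cut the read at the first position where the rest of the read
    looks like the adaptor (or a prefix of it that runs off the read end).

    max_mismatches=0 corresponds to exact matching; max_mismatches=1
    tolerates one substitution. min_overlap is a hypothetical guard
    against trimming on very short, chance matches.
    """
    for start in range(len(read) - min_overlap + 1):
        # Compare as many bases as both the adaptor and the read tail provide.
        overlap = min(len(adaptor), len(read) - start)
        if overlap < min_overlap:
            break
        if hamming_within(read[start:start + overlap],
                          adaptor[:overlap], max_mismatches):
            return read[:start]
    return read  # no adaptor found; leave the read untouched


# Placeholder sequences, chosen only to exercise the function.
ADAPTOR = "TCGTATGCCGTC"
INSERT = "AAAACCCCGGGG"
clean_read = INSERT + ADAPTOR            # adaptor copied without error
noisy_read = INSERT + "TCGTTTGCCGTC"     # one substitution in the adaptor

print(trim_adaptor(clean_read, ADAPTOR, max_mismatches=0))
print(trim_adaptor(noisy_read, ADAPTOR, max_mismatches=0))
print(trim_adaptor(noisy_read, ADAPTOR, max_mismatches=1))
```

With these placeholder inputs, exact matching leaves `noisy_read` untrimmed because its adaptor copy contains a sequencing error, whereas allowing one mismatch recovers the insert. The trade-off is that a looser match also raises the chance of trimming a read that merely resembles the adaptor.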

TABLE OF CONTENTS
Chinese Abstract
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
Chapter 1 - Introduction
Chapter 2 - Material and Method
2.1    Material
2.2    Data Preprocessing stage
2.3    The adaptor removing stage
2.4    Remove known RNA sequences
2.5    Mapping stage
2.6    The FRESCO mapping algorithm
2.7    Post-processing stage
2.8    Error tolerance of adaptor sequence
2.9    Parallelization with OpenMP
2.10    The (14, 2) error
Chapter 3 - Result and discussion
Chapter 4 - Conclusion and Future work
REFERENCES
FIGURES
TABLES

                                

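Full-text release date (campus network): full text not authorized for public release
Full-text release date (off-campus network): full text not authorized for public release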
