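Graduate student: 黃名遠
Huang, Ming-Yuan
Thesis title: FRESCO: Frequency-based RE-Sequencing tool based on CO-clustering segmentation for short reads - a case study of micro-RNAs
對短序列基於分群切段與計算頻率的重定序工具
Advisor: 唐傳義
Tang, Chuan Yi
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: English
Number of pages: 34
Keywords (Chinese): 重定序演算法, 接頭序列, 讀數, 切割資料, 核醣核酸
Keywords (English): re-sequencing algorithm, adaptor sequence, reads, partition data, RNA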
  • In this thesis, we propose a data processing pipeline for Solexa sequencing data and a re-sequencing algorithm. According to the experimental protocol of the Solexa machine, an adaptor sequence is used during sequencing; when RNA is sequenced, the adaptor is very likely to be attached to the end of a read and thus contaminate it. First, we introduce a method for removing the adaptor sequence and compare its results under exact matching and under matching that tolerates one mismatch. Next, we present the proposed re-sequencing algorithm. Finally, we compare two different data partition methods within this algorithm, and we compare the algorithm with other existing re-sequencing tools. In addition, we compare chicken embryo Solexa reads against the known RNAs in Rfam and report the classification results according to the Rfam RNA type tree.
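
    The record does not give the details of the adaptor removal method, so the following is only a minimal sketch of the general idea of trimming a 3' adaptor from a read under exact matching versus one tolerated mismatch. The function name `trim_adaptor`, the `min_overlap` parameter, and the adaptor string are illustrative assumptions and are not taken from the thesis.

```python
def trim_adaptor(read: str, adaptor: str,
                 max_mismatches: int = 0, min_overlap: int = 5) -> str:
    """Return `read` with a trailing adaptor (possibly cut off by the read end) removed.

    Hypothetical helper: scans start positions from left to right and trims at the
    first one where the rest of the read matches a prefix of the adaptor with at
    most `max_mismatches` substitutions. Insertions and deletions are not handled.
    """
    for start in range(len(read) - min_overlap + 1):
        # Compare only the part of the adaptor that fits inside the read.
        tail = read[start:start + len(adaptor)]
        mismatches = sum(1 for a, b in zip(tail, adaptor) if a != b)
        if mismatches <= max_mismatches:
            return read[:start]
    return read  # no adaptor found: leave the read unchanged


# Illustrative data (assumed, not from the thesis).
adaptor = "TCGTATGCCG"
insert = "ACGT" * 5 + "AC"
clean_read = insert + "TCGTATGC"   # adaptor truncated by the read end
noisy_read = insert + "TCATATGC"   # same, with one sequencing error in the adaptor

exact_clean = trim_adaptor(clean_read, adaptor, max_mismatches=0)
exact_noisy = trim_adaptor(noisy_read, adaptor, max_mismatches=0)
fuzzy_noisy = trim_adaptor(noisy_read, adaptor, max_mismatches=1)
```

    In this sketch, exact matching trims the clean read but leaves the read with the erroneous adaptor untouched, while allowing one mismatch trims both. A smaller `min_overlap` trims more aggressively at the cost of more false positives.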

    TABLE OF CONTENTS
    Chinese Abstract
    ABSTRACT
    ACKNOWLEDGEMENTS
    TABLE OF CONTENTS
    Chapter 1 - Introduction
    Chapter 2 - Material and Method
        2.1 Material
        2.2 Data Preprocessing stage
        2.3 The adaptor removing stage
        2.4 Remove known RNA sequences
        2.5 Mapping stage
        2.6 The FRESCO mapping algorithm
        2.7 Post-processing stage
        2.8 Error tolerance of adaptor sequence
        2.9 Parallelization with OpenMP
        2.10 The (14, 2) error
    Chapter 3 - Result and discussion
    Chapter 4 - Conclusion and Future work
    REFERENCES
    FIGURES
    TABLES
