Author: 張光延 (Chang, Guang-Yen)
Thesis title: DOBALI: Domain-based Multiple Sequence Alignments
Advisor: 唐傳義 (Tang, Chuan-Yi)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: English
Number of pages: 25
Keywords: multiple sequence alignment, domain, segmented, preprocess
    Multiple sequence alignment (MSA) is a well-known method for finding relationships among biological sequences. A domain is an evolutionary unit of a protein, and proteins that contain similar domains are generally regarded as homologous. Because of this property, domains are worth taking into account when aligning sequences. However, most MSA tools ignore domain information. This thesis introduces DOBALI, a domain-based multiple sequence alignment tool. DOBALI preprocesses the input before running MSA: it lets users assign the positions of domains, then detects possible domains, and combines and rearranges segmented domains. After that, all domains of the same kind are aligned together by an existing MSA tool chosen by the user. The results show that, in some conditions, the alignments are better than those produced without preprocessing.
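
    The grouping idea described in the abstract can be illustrated with a minimal sketch. It is not DOBALI's implementation, which this record does not describe in detail. The `Domain` record, the function names `group_domains` and `to_fasta`, and the toy sequences are all assumptions made for illustration. The sketch takes user-assigned domain positions, joins the segments of a domain that is split within one sequence, and collects domains of the same kind so that each group could be passed to an existing MSA tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Domain(NamedTuple):
    """A user-assigned domain annotation (hypothetical record layout)."""
    kind: str   # domain label, e.g. "A"
    start: int  # 0-based inclusive start in the sequence
    end: int    # 0-based exclusive end in the sequence


def group_domains(sequences, annotations):
    """Collect the residues of each annotated domain, grouped by domain kind.

    Segments of the same kind within one sequence are concatenated in
    positional order, a simple stand-in for combining a segmented domain.
    """
    groups = defaultdict(dict)
    for seq_id, domains in annotations.items():
        seq = sequences[seq_id]
        for dom in sorted(domains, key=lambda d: d.start):
            piece = seq[dom.start:dom.end]
            groups[dom.kind][seq_id] = groups[dom.kind].get(seq_id, "") + piece
    return {kind: dict(members) for kind, members in groups.items()}


def to_fasta(members):
    """Render one domain group as FASTA text for an external MSA tool."""
    return "".join(f">{seq_id}\n{residues}\n" for seq_id, residues in members.items())


# Toy input (made up): the two sequences carry domains A and B in opposite order.
sequences = {
    "seq1": "MKTAYIAKQRQISFVKSHFSRQ",
    "seq2": "SHFSRQMKTAYIAKQR",
}
annotations = {
    "seq1": [Domain("A", 0, 10), Domain("B", 16, 22)],
    "seq2": [Domain("B", 0, 6), Domain("A", 6, 16)],
}
groups = group_domains(sequences, annotations)
```

    Each entry of `groups` holds only the residues of one domain kind, so `to_fasta(groups["A"])` yields input in which the two sequences line up, even though the domains appear in a different order in the full sequences. Running the aligner itself is left out on purpose, since the choice of tool is up to the user.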

    Abstract
    Acknowledgement
    Table of Contents
    Chapter 1 - Introduction
    Chapter 2 - Method
      2.1 Profile establishment stage
      2.2 Domain detection stage
      2.3 Segmented domain combination and arrangement stage
      2.4 Grouping stage
      2.5 Output stage
    Chapter 3 - Result and Conclusion
    References
    Figures
    Table
    Appendix

    Full text: not authorized for public release (on-campus and off-campus networks)