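Circular Permutation Detection in Proteins Based on Sequence Alignment Approach Combined with Secondary Structural Elements Information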

Author:	Tsai, Zong-Han (蔡宗頷)
Thesis title:	Circular Permutation Detection in Proteins Based on Sequence Alignment Approach Combined with Secondary Structural Elements Information
Advisor:	Tang, Chuan-Yi (唐傳義)
Committee members:	廖崇碩, 林俊淵
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication:	2011
Academic year of graduation:	99 (ROC calendar)
Language:	English
Number of pages:	29
Keywords:	circular permutation, sequence alignment, secondary structure, database search, protein sequence

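In this thesis, we propose a sequence-alignment-based method for detecting circular permutation in proteins. According to previous studies, a pair of circularly permuted proteins has similar three-dimensional structures, while their sequences are circular rearrangements of each other. Most existing methods for detecting this phenomenon require tertiary structure information; however, the number of known protein tertiary structures (about 70,000) is far smaller than the number of known sequences (about 15,000,000). Information about this phenomenon is still scarce, and existing sequence-based methods are not accurate enough. For these reasons, we modified the conventional local sequence alignment method so that it uses only protein sequences together with secondary structural element information (secondary structure predicted from the primary structure). This makes it possible to decide whether a pair of protein sequences is circularly permuted without any tertiary structure. We applied different substitution matrices in the alignment and observed how the accuracy changed. Finally, we examined the detection accuracy at different levels of sequence similarity, and we give examples showing that the proposed method can help biologists quickly and accurately find proteins that may be circularly permuted. In the future, the method could be used to scan protein databases rapidly and to locate possible circular permutation sites, providing more information on circularly permuted proteins.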

A pair of proteins is called a circular permutation (CP) if the two proteins have similar sequence compositions and most likely share the same fold, but the N-terminal and C-terminal regions of the sequence are interchanged. CP is useful in protein engineering; however, the biological function and origin of naturally occurring CP are not clear. Most CP detection methods, such as GANGSTA+ and CPSARST, rely on structural comparison. However, the number of known 3D protein structures (about 70,000) is far smaller than the number of known protein sequences (about 15,000,000). To our knowledge, there are only two sequence-based CP detection methods, developed by Uliel et al. [1] and Weiner et al. [2], and neither provides clear criteria for distinguishing CPs from non-CPs. Here, we propose an efficient and accurate sequence-based detection method built on local sequence alignment combined with secondary structural element information. In this thesis, we evaluate the method with different substitution matrices and with data at various levels of sequence identity. The results show that prediction accuracy is strongly positively correlated with the sequence identity of the protein pair. In the future, the method could help determine possible CP sites and support large-scale protein screening.
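The abstract describes the approach only at a high level, and this record does not include the thesis body. The following is therefore a minimal sketch of the general idea and not the author's implementation. It assumes one common way to expose a circular permutation with local alignment: align the query against the target concatenated with itself and check whether the best local alignment crosses the junction between the two copies. The function names, the uniform match/mismatch/gap scores (standing in for a substitution matrix), the `ss_bonus` term for agreeing secondary structure states, and the `min_coverage` threshold are all hypothetical choices made for illustration.

```python
def align_to_doubled(query, target, ss_query=None, ss_target=None,
                     match=2, mismatch=-1, gap=-2, ss_bonus=1):
    """Smith-Waterman (linear gap penalty) of `query` against `target + target`.

    Returns (score, (q_start, q_end), (t_start, t_end), site). The spans are
    half-open; the target span is in doubled-target coordinates. `site` is the
    query index aligned to the first residue of the second target copy, or
    None if that residue is not aligned to a query residue.
    """
    doubled = target + target
    use_ss = ss_query is not None and ss_target is not None
    ss_doubled = ss_target + ss_target if use_ss else ""

    def pair_score(i, j):
        # Illustrative scoring: residue identity plus an assumed bonus when
        # the per-residue secondary structure states agree.
        s = match if query[i] == doubled[j] else mismatch
        if use_ss and ss_query[i] == ss_doubled[j]:
            s += ss_bonus
        return s

    rows, cols = len(query) + 1, len(doubled) + 1
    H = [[0] * cols for _ in range(rows)]
    best, bi, bj = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + pair_score(i - 1, j - 1),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell to find where the local alignment starts.
    i, j, site = bi, bj, None
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(i - 1, j - 1):
            if j - 1 == len(target):
                site = i - 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, bi), (j, bj), site


def looks_like_cp(query, target, min_coverage=0.8, **scoring):
    """Heuristic judgement: the alignment must cross the copy junction, cover
    most of the query, and span no more than one target length."""
    _, (qs, qe), (ts, te), site = align_to_doubled(query, target, **scoring)
    n = len(target)
    crosses = ts < n < te
    long_enough = (qe - qs) >= min_coverage * len(query)
    no_wrap = (te - ts) <= n
    return crosses and long_enough and no_wrap, site


if __name__ == "__main__":
    target = "MKTAYIAKQR"
    query = "IAKQRMKTAY"   # target with its two halves swapped
    print(looks_like_cp(query, target))   # (True, 5)
    print(looks_like_cp(target, target))  # (False, None)
```

In this toy case the reported site, 5, is the query position where the original N-terminus of the target begins. A realistic version would replace the uniform scores with a substitution matrix and would need a calibrated decision rule, which is the part the abstract says earlier sequence-based methods lacked.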

Chinese Abstract
ABSTRACT
Acknowledgements
CONTENTS
Chapter 1 - Introduction
Chapter 2 - Material and Method
    2.1    Material
    2.2    Workflow
    2.3    Preprocessing
    2.4    Duplicating stage
    2.5    Alignment method
    2.6    Judging stage
    2.7    Time complexity
Chapter 3 - Results and Discussion
    3.1    Results
    3.2    Performance evaluation of the robustness of the proposed method: substitution matrices
    3.3    Predicted secondary structure elements
    3.4    Accuracy of the proposed method on the different sequence identity level
    3.5    Circular permutation site determination
    3.6    Some defects about our method
Chapter 4 - Conclusion and Future work
    4.1    Conclusion
    4.2    Future work
REFERENCES

1. Uliel, S., et al., A simple algorithm for detecting circular permutations in proteins. Bioinformatics, 1999. 15(11): p. 930-6.
2. Weiner, J., 3rd, G. Thomas, and E. Bornberg-Bauer, Rapid motif-based prediction of circular permutations in multi-domain proteins. Bioinformatics, 2005. 21(7): p. 932-7.
3. Holm, L. and C. Sander, Protein structure comparison by alignment of distance matrices. J Mol Biol, 1993. 233(1): p. 123-38.
4. Shindyalov, I.N. and P.E. Bourne, Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng, 1998. 11(9): p. 739-47.
5. Abyzov, A. and V.A. Ilyin, A comprehensive analysis of non-sequential alignments between all protein structures. BMC Struct Biol, 2007. 7: p. 78.
6. Uliel, S., A. Fliess, and R. Unger, Naturally occurring circular permutations in proteins. Protein Eng, 2001. 14(8): p. 533-42.
7. Lo, W.C. and P.C. Lyu, CPSARST: an efficient circular permutation search tool applied to the detection of novel protein structural relationships. Genome Biol, 2008. 9(1): p. R11.
8. Tsai, L.C., et al., Crystal structure of a natural circularly permuted jellyroll protein: 1,3-1,4-beta-D-glucanase from Fibrobacter succinogenes. J Mol Biol, 2003. 330(3): p. 607-20.
9. Ribeiro, E.A., Jr. and C.H. Ramos, Circular permutation and deletion studies of myoglobin indicate that the correct position of its N-terminus is required for native stability and solubility but not for native-like heme binding and folding. Biochemistry, 2005. 44(12): p. 4699-709.
10. Jung, J. and B. Lee, Circularly permuted proteins in the protein structure database. Protein Sci, 2001. 10(9): p. 1881-6.
11. Schmidt-Goenner, T., et al., Circular permuted proteins in the universe of protein folds. Proteins, 2010. 78(7): p. 1618-30.
12. Murzin, A.G., et al., SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol, 1995. 247(4): p. 536-40.
13. Servant, F., et al., ProDom: automated clustering of homologous domains. Brief Bioinform, 2002. 3(3): p. 246-51.
14. Vesterstrom, J. and W.R. Taylor, Flexible secondary structure based protein structure comparison applied to the detection of circular permutation. J Comput Biol, 2006. 13(1): p. 43-63.
15. Guerler, A. and E.W. Knapp, Novel protein folds and their nonsequential structural analogs. Protein Sci, 2008. 17(8): p. 1374-82.
16. Lo, W.C., et al., CPDB: a database of circular permutation in proteins. Nucleic Acids Res, 2009. 37(Database issue): p. D328-32.
17. Jones, D.T., Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol, 1999. 292(2): p. 195-202.
18. Smith, T.F. and M.S. Waterman, Identification of common molecular subsequences. J Mol Biol, 1981. 147(1): p. 195-7.
19. Gotoh, O., An improved algorithm for matching biological sequences. J Mol Biol, 1982. 162(3): p. 705-8.
20. Needleman, S.B. and C.D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol, 1970. 48(3): p. 443-53.
21. Blake, J.D. and F.E. Cohen, Pairwise sequence alignment below the twilight zone. J Mol Biol, 2001. 307(2): p. 721-35.
22. Henikoff, S. and J.G. Henikoff, Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A, 1992. 89(22): p. 10915-9.
23. Kabsch, W. and C. Sander, Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 1983. 22(12): p. 2577-637.
24. Shatsky, M., R. Nussinov, and H.J. Wolfson, A method for simultaneous alignment of multiple protein structures. Proteins, 2004. 56(1): p. 143-56.

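Full-text release date: full text not authorized for public release (campus network)
Full-text release date: full text not authorized for public release (off-campus network)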
