Graduate student: 李昀 (Yun Lee)
Thesis title: Prediction of Protein Secondary Structure with Dependency Graphs and Their Expanded Bayesian Networks (Chinese title: 使用關聯圖及其貝氏網路展開預測蛋白質二級結構)
Advisor: 呂忠津 (Chung-Chin Lu)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar, i.e. 2004/05)
Language: Chinese
Number of pages: 57
Keywords: protein secondary structure prediction, dependency graph, expanded Bayesian network, Viterbi algorithm, sum-product algorithm

    The completion of the Human Genome Project has triggered a wave of research that
    attacks diverse biological problems directly through nucleotide sequences and the amino
    acid sequences derived from them. In particular, the pressing need to predict the
    three-dimensional (tertiary) structure of a protein chain from its amino acid sequence
    alone motivates us to develop a model-based method for predicting the fundamental
    building blocks of that structure, namely its secondary structure. To this end, we first
    represent every eligible secondary structure sequence as a path in a secondary structure
    trellis. We then quantify the relationship between primary and secondary structure with
    dependency graphs and their expanded Bayesian networks. Finally, following a procedure
    analogous to decoding in coding theory, we assign a secondary structure state to each
    amino acid with two decoding algorithms: the Viterbi algorithm and the sum-product
    algorithm. Simulation results show that the proposed method achieves accuracy
    comparable to that of other existing sequence-only methods, and that accuracy improves
    further when the target sequences are restricted to a specific protein fold.
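    The decoding step can be illustrated on a toy chain trellis. The sketch below is not the thesis's model. Its states are three hypothetical labels: H (helix), E (strand) and C (coil). The transition matrix is made up, and a zero entry marks a trellis edge that is not allowed; here, hypothetically, there is no direct helix-to-strand or strand-to-helix switch. The per-residue emission scores stand in for whatever the expanded Bayesian networks would supply. `viterbi` returns the single most probable path. `sum_product` runs the sum-product algorithm on the chain, which reduces to the forward-backward recursion, and labels each residue with its most probable state under the posterior marginals.

```python
import math

# Hypothetical three-state alphabet: helix, strand (extended), coil.
STATES = ("H", "E", "C")


def _log(x):
    """Natural log that maps 0 (a forbidden trellis edge) to -inf."""
    return math.log(x) if x > 0 else -math.inf


def viterbi(init, trans, emit):
    """Most probable state path through a chain trellis (max-product in log space).

    init[s]     -- probability of starting in state s
    trans[s][t] -- probability of moving from state s to state t; 0 = no trellis edge
    emit[i][s]  -- score of residue i under state s (stand-in for model likelihoods)
    """
    k = len(init)
    score = [_log(init[s]) + _log(emit[0][s]) for s in range(k)]
    backptrs = []
    for i in range(1, len(emit)):
        new_score, ptr = [], []
        for t in range(k):
            # best predecessor of state t at position i
            best = max(range(k), key=lambda s: score[s] + _log(trans[s][t]))
            new_score.append(score[best] + _log(trans[best][t]) + _log(emit[i][t]))
            ptr.append(best)
        score = new_score
        backptrs.append(ptr)
    # trace back from the highest-scoring final state
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return "".join(STATES[s] for s in reversed(path))


def _normalize(v):
    z = sum(v)
    return [x / z for x in v]


def sum_product(init, trans, emit):
    """Posterior decoding: sum-product messages on the chain, then per-residue argmax."""
    n, k = len(emit), len(init)
    # forward messages, rescaled at each step to avoid underflow
    fwd = [_normalize([init[s] * emit[0][s] for s in range(k)])]
    for i in range(1, n):
        fwd.append(_normalize([
            sum(fwd[-1][s] * trans[s][t] for s in range(k)) * emit[i][t]
            for t in range(k)
        ]))
    # backward messages, also rescaled (scaling does not change the argmax)
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = _normalize([
            sum(trans[s][t] * emit[i + 1][t] * bwd[i + 1][t] for t in range(k))
            for s in range(k)
        ])
    labels = []
    for i in range(n):
        # the marginal at position i is proportional to fwd[i][s] * bwd[i][s]
        post = [fwd[i][s] * bwd[i][s] for s in range(k)]
        labels.append(STATES[max(range(k), key=lambda s: post[s])])
    return "".join(labels)


# Toy, made-up parameters (not estimated from any data set).
INIT = [1 / 3, 1 / 3, 1 / 3]
TRANS = [
    [0.8, 0.0, 0.2],  # from H: no direct H -> E edge (hypothetical constraint)
    [0.0, 0.8, 0.2],  # from E: no direct E -> H edge
    [0.2, 0.2, 0.6],  # from C
]
EMIT = [
    [0.7, 0.1, 0.2],
    [0.7, 0.1, 0.2],
    [0.6, 0.2, 0.2],
    [0.1, 0.2, 0.7],
    [0.1, 0.7, 0.2],
    [0.1, 0.7, 0.2],
]

print(viterbi(INIT, TRANS, EMIT))      # HHHCEE
print(sum_product(INIT, TRANS, EMIT))  # HHHCEE
```

    On this toy input both decoders return `HHHCEE`. In general they can disagree. Viterbi optimizes the whole path. Labeling each residue by its posterior marginal instead maximizes the expected number of correctly labeled residues, and it can even yield a label sequence that uses a forbidden edge.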

    Abstract
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Background
      2.1 Building Blocks of Protein Structure
      2.2 Protein Secondary Structure
      2.3 Structural Hierarchy of Protein Molecules
    3 Methods
      3.1 Secondary Structure Trellis
      3.2 Dependency Graph
      3.3 Expanded Bayesian Network
      3.4 Hypothesis Test of Thresholds
      3.5 Decoding Algorithms
        3.5.1 Viterbi Algorithm
        3.5.2 Sum-Product Algorithm
    4 Results
      4.1 Protein Secondary Structure Data Set
      4.2 Determination of the Trellis
      4.3 Measures of Prediction Accuracy
        4.3.1 Cross-validation
        4.3.2 Per-residue accuracy
      4.4 Simulation Results
        4.4.1 PDB select25 data set
        4.4.2 TIM beta/alpha-barrel data set
    5 Conclusion
    A Appendix
      A.1 Chi-square Test
      A.2 Factor Graph
    Bibliography
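    The outline pairs a hypothesis test of thresholds (Section 3.4) with a chi-square test (Appendix A.1). One plausible reading, not confirmed by this record, is that a chi-square test of independence decides whether two positions are dependent enough to be joined by an edge of the dependency graph. The same test could instead compare a residue with a structure label. The sketch below computes Pearson's statistic for a contingency table of counts and compares it with a critical value. The function names and the 2x2 example counts are hypothetical. The value 3.841 is the standard 5% critical value of the chi-square distribution with one degree of freedom.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # cells with zero expected count (an empty row or column) are skipped
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1); the critical value must be chosen for this df."""
    return (len(table) - 1) * (len(table[0]) - 1)


def is_dependent(table, critical_value):
    """Hypothetical edge rule: keep an edge when the statistic exceeds the threshold."""
    return chi_square_statistic(table) > critical_value


# Hypothetical pair counts: rows = symbol at position a, columns = symbol at position b.
CRITICAL_1DF_5PCT = 3.841  # 95th percentile of chi-square with 1 degree of freedom
skewed = [[30, 10], [10, 30]]
uniform = [[20, 20], [20, 20]]
print(chi_square_statistic(skewed), is_dependent(skewed, CRITICAL_1DF_5PCT))    # 20.0 True
print(chi_square_statistic(uniform), is_dependent(uniform, CRITICAL_1DF_5PCT))  # 0.0 False
```

    Consider a table of 20 amino acids against three structure states. It has (20 - 1)(3 - 1) = 38 degrees of freedom, so its threshold must be looked up for that value rather than reused from the 2x2 case.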

    Full text: not authorized for public release (on-campus or off-campus network).