Prediction of Protein Secondary Structure with Dependency Graphs and Their Expanded Bayesian Networks


Graduate student:	Yun Lee (李昀)
Thesis title:	Prediction of Protein Secondary Structure with Dependency Graphs and Their Expanded Bayesian Networks
Advisor:	Chung-Chin Lu (呂忠津)
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication:	2005
Academic year of graduation:	93
Language:	Chinese
Number of pages:	57
Keywords:	Protein secondary structure prediction, Dependency graph, Expanded Bayesian network, Viterbi algorithm, Sum-product algorithm

The completion of the Human Genome Project has triggered a wave of research that investigates
various biological problems directly through nucleotide sequences and the amino acid
sequences derived from them. The urgent need to predict the three-dimensional structure of
a protein from its amino acid sequence alone motivates us to develop a model-based method
for predicting the fundamental structural elements, namely the secondary structures, of any
protein chain. To this end, we first represent all eligible secondary structure sequences
as specific paths in a secondary structure trellis. We then employ dependency graphs and
their expanded Bayesian networks to quantify the relationship between primary and secondary
structures. Finally, following a procedure similar to that used in coding theory, we assign
a secondary structure element to each amino acid with two decoding algorithms: the Viterbi
algorithm and the sum-product algorithm. Simulation results show that the proposed method
achieves an accuracy comparable to that of other existing sequence-only methods, and that
better results are obtained when the target sequences are confined to a specific protein fold.
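The trellis-decoding step described in the abstract can be illustrated with a small sketch. This record does not give the details of the thesis's dependency-graph model, so the code below substitutes a plain first-order hidden Markov model over three states (H, E, C) and is not the author's method. The state set, the probability tables, and the names `viterbi`, `to_log`, `LOG_INIT`, `LOG_TRANS`, and `LOG_EMIT` are all assumptions made for illustration. Transitions missing from the table (here H to E and E to H) are treated as edges absent from the trellis.

```python
import math

STATES = ("H", "E", "C")  # assumed three-state alphabet: helix, strand, coil


def to_log(table):
    """Convert a nested dict of probabilities into log-probabilities."""
    return {s: {k: math.log(v) for k, v in row.items()} for s, row in table.items()}


# Toy parameters, invented for illustration only (not taken from the thesis).
LOG_INIT = {s: math.log(1.0 / 3.0) for s in STATES}
LOG_TRANS = to_log({
    "H": {"H": 0.8, "C": 0.2},            # no direct H -> E edge in this toy trellis
    "E": {"E": 0.8, "C": 0.2},            # no direct E -> H edge
    "C": {"H": 0.1, "E": 0.1, "C": 0.8},
})
LOG_EMIT = to_log({
    "H": {"A": 0.8, "G": 0.1, "V": 0.1},
    "E": {"A": 0.1, "G": 0.1, "V": 0.8},
    "C": {"A": 0.1, "G": 0.8, "V": 0.1},
})


def viterbi(residues, log_init, log_trans, log_emit):
    """Return the most probable state path through the trellis for `residues`."""
    if not residues:
        return ""
    ninf = -math.inf
    # score[s]: best log-probability of any path that ends in state s so far
    score = {s: log_init[s] + log_emit[s].get(residues[0], ninf) for s in STATES}
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for aa in residues[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans[p].get(s, ninf))
            pointers[s] = prev
            new_score[s] = (score[prev] + log_trans[prev].get(s, ninf)
                            + log_emit[s].get(aa, ninf))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the whole path.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


print(viterbi("AAAGGVVV", LOG_INIT, LOG_TRANS, LOG_EMIT))
```

The Viterbi algorithm returns the single best path as a whole. The sum-product algorithm instead computes a posterior distribution over states at each position, so the two decoders can assign different labels to the same residue.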

Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Background
2.1 Building Blocks of Protein Structure
2.2 Protein Secondary Structure
2.3 Structural Hierarchy of Protein Molecules
3 Methods
3.1 Secondary Structure Trellis
3.2 Dependency Graph
3.3 Expanded Bayesian Network
3.4 Hypothesis Test of Thresholds
3.5 Decoding Algorithms
3.5.1 Viterbi Algorithm
3.5.2 Sum-Product Algorithm
4 Results
4.1 Protein Secondary Structure Data Set
4.2 Determination of the Trellis
4.3 Measures of Prediction Accuracy
4.3.1 Cross-validation
4.3.2 Per-residue accuracy
4.4 Simulation Results
4.4.1 PDB select25 data set
4.4.2 TIM beta/alpha-barrel data set
5 Conclusion
A Appendix
A.1 Chi-square Test
A.2 Factor Graph
Bibliography

                                


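Full-text release date:	This full text is not authorized for public release (campus network)
Full-text release date:	This full text is not authorized for public release (off-campus network)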
