
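Author: 蔡秉翰 (Tsai, Ping Han)
Thesis title: 區塊限制型的序列比對 (Sequence Alignment with Block Constraint)
Advisor: 盧錦隆 (Lu, Chin Lung)
Oral defense committee: 唐傳義, 邱顯泰
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2016
Academic year of graduation: 104
Language: Chinese
Number of pages: 25
Keywords (Chinese): 序列比對, 區塊限制
Keywords (English): sequence alignment, block constraint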
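  • Abstract (translated from the Chinese): Sequence alignment is a commonly used means of assessing how similar two sequences are. In bioinformatics, sequence alignment is usually used to judge the degree of similarity between two DNA, RNA, or protein sequences. The method identifies similar regions between the two sequences, and these regions may be similar because of similar structure, related function, or a close relationship in evolutionary history. Compared with the 20-letter protein alphabet, the 4-letter RNA alphabet is much smaller and carries correspondingly less information. Consequently, when the similarity between two RNA sequences is below 60%, it is hard to infer whether the two sequences are structurally similar. For this reason, many studies that align two RNA molecules consider not only primary sequence information but also secondary or tertiary structure information. iPARTS2, a tool for aligning RNA tertiary structures that our lab published in 2016, considers primary and tertiary structure information together. The basic steps of iPARTS2 are as follows. First, the two pseudo-torsion angles η and θ are used to plot the nucleotides of RNA structures in the PDB database on a two-dimensional plane, giving a Ramachandran-like diagram. Next, the affinity propagation clustering algorithm is applied to the nucleotides on the Ramachandran-like diagram to obtain 23 nucleotide conformations, which are then combined with the primary sequence information carried by the four nucleotides A, U, C, and G to obtain a structural alphabet of 92 elements. The structural alphabet is then used to convert RNA tertiary structures into one-dimensional sequences of structural letters. Finally, a traditional sequence alignment algorithm compares the two structural-alphabet-encoded sequences to determine their structural similarity. However, we found that the optimal result returned by iPARTS2 may align one secondary structure of an RNA (such as a stem or a loop) with two or more secondary structures of the other RNA at the same time, so that the spatial superposition of the two RNAs is dissimilar. We therefore introduce a problem called "sequence alignment with block constraint", which restricts a secondary structure of one RNA (treated as a block) to be aligned with at most one secondary structure of the other RNA, and we design a quadratic-time algorithm to solve it. Finally, our experimental results confirm that the block-constrained sequence alignment algorithm does improve the performance of iPARTS2 in aligning two RNA tertiary structures.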


    To determine whether two sequences are similar, we usually perform a pairwise alignment. In bioinformatics, sequence alignment is an important strategy for determining the identity between two DNA, RNA, or protein sequences. Sequence alignment can identify similar regions that may share a similar structure, function, or evolutionary relationship. Compared with the 20-letter protein alphabet, the 4-letter RNA alphabet is smaller and less informative. As a consequence, when the identity between two RNA sequences is under 60%, it is hard to determine whether the two sequences have similar structures. Thus, to align two RNA molecules, several studies have considered not merely sequence information but also secondary or tertiary structure information. In 2016 our lab developed a tool called iPARTS2 that aligns two RNA 3D structures based on both primary and tertiary structure information. The basic steps of iPARTS2 are as follows. First, a Ramachandran-like diagram of RNAs was derived by plotting the nucleotides of RNA structures in the PDB database on a 2D plane using their two pseudo-torsion angles η and θ. Then, the affinity propagation clustering algorithm was applied to the η-θ plot to obtain 23 nucleotide conformations, which were combined with the RNA 1D sequence information A, U, C, and G to obtain a structural alphabet (SA) of 92 elements (23 conformations × 4 bases). Next, the SA was used to transform RNA 3D structures into 1D sequences of SA letters. Finally, classical sequence alignment methods were applied to two SA-encoded sequences to determine their structural similarity. However, given two RNA molecules 𝑋 and 𝑌, we observe that the optimal result returned by iPARTS2 might align a loop region of 𝑋 with both a loop region and a stem region of 𝑌, which makes the resulting structural superposition of 𝑋 and 𝑌 dissimilar.
Therefore, in this study, we introduce a problem called sequence alignment with block constraint, in which a secondary structure of one RNA (i.e., a block) can be aligned to at most one secondary structure of the other RNA. In addition, we have designed a quadratic-time algorithm to solve this problem. Finally, our experimental results show that the proposed algorithm for sequence alignment with block constraint can indeed improve the performance of iPARTS2 when aligning two RNA 3D structures.
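
The classical alignment step that the abstract builds on can be sketched as follows. This is a minimal illustration only, not the thesis's block-constrained algorithm and not iPARTS2's code: it assumes a hypothetical numbering of the 92 structural letters (`sa_letter`), a toy substitution score (`score`), and a linear gap penalty, none of which are given in this record. The alignment itself is the standard Needleman-Wunsch global recurrence, which also runs in quadratic time.

```python
from typing import List, Sequence, Tuple

BASES = "AUCG"          # primary-sequence letters named in the abstract
NUM_CONFORMATIONS = 23  # number of conformation clusters reported in the abstract


def sa_letter(conformation: int, base: str) -> int:
    """Map a (conformation cluster, base) pair to one of 23 * 4 = 92 letter ids.

    The numbering scheme is hypothetical; the abstract only states the alphabet size.
    """
    if not 0 <= conformation < NUM_CONFORMATIONS:
        raise ValueError("conformation index out of range")
    return conformation * len(BASES) + BASES.index(base)


def score(a: int, b: int) -> int:
    """Toy substitution score (assumption): +2 for identical letters, -1 otherwise."""
    return 2 if a == b else -1


def global_align(
    x: Sequence[int], y: Sequence[int], gap: int = -2
) -> Tuple[int, List[Tuple[int, int]]]:
    """Needleman-Wunsch global alignment in O(len(x) * len(y)) time.

    Returns the optimal score and the aligned index pairs (i, j).
    """
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # align x[i-1] with y[j-1]
                dp[i - 1][j] + gap,                            # x[i-1] against a gap
                dp[i][j - 1] + gap,                            # y[j-1] against a gap
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs
```

Nothing in this recurrence knows about stems or loops, so the aligned pairs of one secondary-structure element may be spread over several elements of the other sequence. That is the behaviour the block constraint described above is meant to rule out.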

    Chinese abstract
    Abstract
    Acknowledgement
    Contents
    List of figures
    Chapter 1 Introduction
    Chapter 2 Notation and definition
    Chapter 3 Algorithm
      3.1 X_s⨁Y_t
      3.2 Statuses
      3.3 Equivalent(b_(i,j),b_(o,p))
      3.4 S(i,j)
      3.5 Recursive Function
        3.5.1 A(i,j)
        3.5.2 B(i,j)
        3.5.3 C(i,j)
    Chapter 4 Experimental results
      4.1 DSSR
      4.2 GSSU
      4.3 Results
        4.3.1 Case 1
          4.3.1.1 1NWX_9 vs 361D_B (length: 118 vs 20)
          4.3.1.2 1UN6_E vs 1M90_B (length: 61 vs 122)
        4.3.2 Case 2
          4.3.2.1 483D_A vs 1I3Y_A (length: 27 vs 19)
          4.3.2.2 1FIR_A vs 1PJY_A (length: 76 vs 22)
    Chapter 5 Conclusion
    References


    Full text (on-campus network): not authorized for public release
    Full text (off-campus network): not authorized for public release
