Efficient Algorithms for Triple-wise Alignment and Its Applications

Author: 洪哲倫 Hung, Che-Lun
Thesis title: Efficient Algorithms for Triple-wise Alignment and Its Applications (有效率的三條序列演算法及其應用)
Advisor: 鍾葉青 Chung, Yeh-Ching
Oral defense committee:
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar)
Language: English
Number of pages: 94
Keywords: sequence alignment, three-way alignment, multiple sequence alignment, coding region alignment, profile alignment, parallel sequence alignment, feature selection

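Sequence alignment is an important tool for analyzing biological sequences. Biologists use it to analyze gene homology, relationships between species, protein structure, and so on. Over the past few decades, much sequence-analysis research has been built on pair-wise alignment. However, a growing number of studies show that three-way alignment can provide more information, or more accurate alignments, than pair-wise alignment. This dissertation focuses on algorithms for three-way alignment and their applications. On the algorithmic side, we propose two efficient methods: a dynamic programming algorithm with variable gap penalties for aligning protein sequences, and an algorithm using a probabilistic filtration model for rapidly aligning gene sequences. On the application side, we carry the idea of three-way alignment over to methods and applications that originally relied on pair-wise alignment, in order to obtain more accurate analyses. We first present a progressive multiple sequence alignment strategy that combines pair-wise and three-way alignment to improve the accuracy of multiple sequence alignment. Likewise, we extend three-way alignment to the alignment of three profiles, whose results provide information different from that of profile-profile alignment. In addition, we propose a parallel algorithm for three-profile alignment to improve computational performance. We further propose a protein function prediction method that combines three-profile alignment with a voting algorithm to predict protein function effectively. The dissertation includes theoretical analysis and experiments for all of the proposed methods. The experimental results agree with the corresponding analyses, and the proposed methods achieve better results.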

Sequence alignment is a scientific method that contributes to DNA homology studies, phylogeny determination, and the identification of conserved motifs. Over the past few decades, pair-wise alignment has become a methodological standard used in many multiple sequence alignment (MSA) methods. However, an increasing number of studies indicate that three-way alignment, the alignment of three sequences, can provide additional information or a more accurate alignment than pair-wise alignment. In this dissertation, we focus on the investigation and application of three-way alignment algorithms. For the investigation, we propose two efficient methods: a dynamic programming-based algorithm with a variable gap penalty strategy for aligning protein sequences, and a linear algorithm adopting a probabilistic filtration model for aligning DNA sequences. For the application, we apply three-way alignment to methods and applications that originally adopted pair-wise alignment. We present a new progressive multiple sequence alignment strategy that combines pair-wise and three-way alignments to align multiple sequences accurately. Similarly, we extend the three-way alignment algorithm to align three profiles, providing a different insight from that of the profile-profile alignment method. In addition, we develop a parallel algorithm for three-profile alignment to reduce the computational cost. Further, we combine the three-profile alignment approach with a voting algorithm to select the functional sites of a target protein by comparing protein superfamilies. Theoretical analysis and extensive experimental tests of the proposed methods are presented in this dissertation. The experimental results are encouraging for the proposed methods for sequence analysis.
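To make the notion of three-way alignment in the abstract concrete, the following is a minimal sketch of the textbook three-dimensional dynamic program for globally aligning three sequences under a sum-of-pairs score. It is not the dissertation's algorithm: the constant gap penalty, the match/mismatch values, and the function names `sp_column_score` and `three_way_score` are all assumptions chosen for illustration, whereas the thesis describes a variable gap penalty strategy that is not reproduced here.

```python
from itertools import product


def sp_column_score(a, b, c, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one alignment column; '-' denotes a gap.

    The scoring values are illustrative assumptions, not the thesis's scheme.
    """
    total = 0
    for x, y in ((a, b), (a, c), (b, c)):
        if x == "-" and y == "-":
            continue  # a gap paired with a gap contributes nothing
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def three_way_score(s1, s2, s3, **scoring):
    """Optimal global sum-of-pairs score of three sequences.

    Uses a full (n1+1) x (n2+1) x (n3+1) table, so time and space are cubic.
    """
    n1, n2, n3 = len(s1), len(s2), len(s3)
    neg_inf = float("-inf")
    table = [[[neg_inf] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    table[0][0][0] = 0
    # Seven possible moves: each sequence either consumes a residue (1) or gets a gap (0).
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue  # predecessor cell lies outside the table
                    a = s1[pi] if di else "-"
                    b = s2[pj] if dj else "-"
                    c = s3[pk] if dk else "-"
                    cand = table[pi][pj][pk] + sp_column_score(a, b, c, **scoring)
                    if cand > table[i][j][k]:
                        table[i][j][k] = cand
    return table[n1][n2][n3]
```

Under these assumed scores, `three_way_score("ACGT", "ACGT", "ACGT")` returns 12: four columns, each contributing three matching pairs. The cubic table is what makes the plain formulation expensive for long sequences, which is the cost that efficient three-way methods aim to reduce.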

Chapter 1 Introduction
1 Motivation of the Dissertation
2 Contribution of the Dissertation
3 Organization of the Dissertation
Chapter 2 Related Work
1 Pair-wise and three-way alignments
2 Multiple sequence alignment
3 Profile-profile alignment
4 Chapter Summary
Chapter 3 Efficient three-way alignment methods for Protein and DNA sequences
1 Three-way alignment with variable gap penalty
1.1 Definitions of Variable Gap Penalty Strategy
1.2 Dynamic programming algorithm
1.3 Time Complexity
1.4 Experimental Results
2 CORAL-T: Heuristic COding Region ALignment Method for Three Genome Sequences
2.1 Probabilistic filtration model
2.2 Shifting mutation
2.3 Experimental Results
2.3.1 Comparisons of the computing time
2.3.2 Comparisons of the performance
3 Chapter Summary
Chapter 4 Progressive multiple sequence alignment strategy: combining pair-wise and three-way alignments
1 Progressive multiple sequence alignment strategy: combining pair-wise and three-way alignments
1.1 Distance matrix and guide tree
1.2 Dynamic programming for pair-wise and three-way alignments
1.3 Progressive alignment
1.4 Complexity
2 Experimental Results
2.1 Alignment with ROSE package
2.2 Alignment with BRaliBASE2
2.3 Computation time
3 Chapter Summary
Chapter 5 Three-profile alignment and its parallelization
1 Dynamic programming-based Three-Profile Alignment algorithm and parallel algorithm
1.1 TPA algorithm
1.2 PTPA algorithm
2 Experiment Results
2.1 Case study: comparison of Three-profile alignment and Profile-Profile alignment in Enterovirus
2.2 Comparison of computation time
3 Chapter Summary
Chapter 6 Feature Amplified Voting Algorithm for Functional Analysis of Protein Superfamily
1 Method
1.1 Imidase and sequence clustering in the amidohydrolase superfamily
1.2 Observation and assumption
1.3 Algorithm
2 Experiment Results
2.1 Voting scores of imidase by FAVAT analysis
2.2 Comparison of FAVAT and MSA results
2.3 The corresponding locations of FAVAT-selected residues in 1GKQ and 1KCX
3 Chapter Summary
Chapter 7 Conclusions
Bibliography

                                


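Full-text release date: the full text is not authorized for public release (on-campus network)
Full-text release date: the full text is not authorized for public release (off-campus network)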
