| Field | Value |
|---|---|
| Student | 梁博程 Bor-Cherng Liang |
| Thesis title | 大尺度基因型資料之單體型解構與重建 (Haplotype Decomposition and Reconstruction from Large Scale Genotype Data) |
| Advisors | 劉庭祿 Tyng-Luh Liu; 陳朝欽 Chaur-Chin Chen |
| Degree | Master's |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science |
| Year of publication | 2004 |
| Academic year of graduation | 92 (ROC calendar, i.e., 2003–2004) |
| Language | English |
| Pages | 59 |
| Keywords | Haplotype, tag SNPs, perfect phylogeny tree, tiling block |
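Chinese abstract (English translation):

This thesis investigates the problem of haplotype decomposition and reconstruction. We propose a method for processing large-scale genotype data that determines the haplotype block partition and reconstructs the haplotype pair of each genotype. For haplotype decomposition, we use a dynamic programming algorithm to determine the optimal block partition. For haplotype reconstruction, we propose a model in which each block contains at least one perfect phylogeny tree, together with tiling blocks composed of tag single-nucleotide polymorphisms (tag SNPs) spanning the blocks, to reconstruct whole haplotypes. Combining these two main components yields an efficient haplotype reconstruction system.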
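Our algorithm takes as its starting point the paper by Eskin et al. presented at RECOMB 2003 [5]. However, whereas Eskin et al. assume a single perfect phylogeny tree within each block, we argue that a block contains at least one, and usually more than one, perfect phylogeny tree. Moreover, for the difficult problem of reconstructing whole haplotypes across blocks, Eskin et al. consider only the relation between two adjacent blocks, whereas we go further and consider the relations among all blocks. This thesis makes four main contributions: (1) an at-least-one perfect-phylogeny-tree model, which better fits real genotype data and improves the accuracy of haplotype reconstruction; (2) an informative score function, which resolves a genotype into the most likely pair of haplotypes; (3) tiling blocks consisting of tag SNPs, which make the relations among all blocks resolvable; and (4) reconstruction of whole haplotypes based on the mutual relation among blocks, which reduces the impact of a few erroneous choices.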
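The difference between a single perfect phylogeny per block and the at-least-one model can be made concrete with the four-gamete test. The sketch below, assuming binary haplotypes stored as tuples of 0/1, checks whether a set of haplotypes fits one perfect phylogeny and then greedily splits a set that does not into several compatible groups. The function names and the greedy grouping rule are hypothetical illustrations, not the procedure used in the thesis.

```python
from itertools import combinations


def fits_one_phylogeny(haplotypes):
    """Four-gamete test on binary haplotypes (tuples of 0/1).

    Under the infinite-sites assumption, a set of binary haplotypes fits a
    single unrooted perfect phylogeny iff no pair of SNP columns contains all
    four gametes 00, 01, 10 and 11.
    """
    if not haplotypes:
        return True
    m = len(haplotypes[0])
    for a, b in combinations(range(m), 2):
        if len({(h[a], h[b]) for h in haplotypes}) == 4:
            return False
    return True


def group_into_phylogenies(haplotypes):
    """Hypothetical greedy grouping into one or more compatible groups.

    Each haplotype joins the first group it stays compatible with; otherwise
    it starts a new group. Being greedy, it does not guarantee the minimum
    number of groups.
    """
    groups = []
    for h in haplotypes:
        for g in groups:
            if fits_one_phylogeny(g + [h]):
                g.append(h)
                break
        else:  # no existing group accepted h
            groups.append([h])
    return groups
```

For example, the four haplotypes 00, 01, 10 and 11 fail the test as a single set, but the greedy rule places them in two groups, {00, 01, 10} and {11}, each of which fits one perfect phylogeny.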
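To verify the efficiency and accuracy of the proposed algorithm, we conducted several experiments. Our system provides accurate and efficient haplotype decomposition and reconstruction. For accuracy, we tested it on the chromosome 5q31 genotype data set of Daly et al. [4] (129 genotypes, each containing 103 SNPs) and obtained an accuracy of 97.9%. On a Pentium 4 3.06 GHz PC, determining the block partition and the haplotypes takes only one minute.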
English abstract:

In this thesis, we address the problem of haplotype decomposition and reconstruction. Focusing on large-scale genotype data, we propose a new framework that determines the haplotype block partition and resolves the haplotype pair of each genotype. For the decomposition scheme, we formulate a dynamic programming algorithm that minimizes the total number of tag SNPs. For the reconstruction method, we introduce an at-least-one perfect-phylogeny-tree model within each block and use tiling blocks consisting of tag SNPs across blocks. The two components turn out to be well coupled, yielding an accurate and efficient haplotype reconstruction system.
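In the spirit of Zhang et al. [18], block partitioning can be written as a dynamic program over block end points that minimizes the total number of tag SNPs. The sketch below assumes the input is already a matrix of phased haplotypes (tuples of 0/1), uses the four-gamete test as a hypothetical block-validity criterion, and computes each block's tag count by brute force. The names `compatible`, `min_tag_snps` and `partition_blocks` are illustrative, not taken from the thesis.

```python
from itertools import combinations


def compatible(block):
    """Hypothetical block criterion: no two columns show all of 00, 01, 10, 11."""
    m = len(block[0])
    for a, b in combinations(range(m), 2):
        if len({(h[a], h[b]) for h in block}) == 4:
            return False
    return True


def min_tag_snps(block):
    """Fewest columns whose values still tell every distinct haplotype apart."""
    distinct = set(block)
    if len(distinct) <= 1:
        return 0
    m = len(block[0])
    for k in range(1, m + 1):
        for cols in combinations(range(m), k):
            if len({tuple(h[c] for c in cols) for h in distinct}) == len(distinct):
                return k
    return m  # not reached: all m columns always separate distinct haplotypes


def partition_blocks(haplotypes):
    """DP minimizing total tag SNPs over block partitions.

    best[j] = min over i < j of best[i] + tags(SNPs i..j-1), restricted to
    blocks that pass `compatible`. Returns (total_tags, [(start, end), ...]).
    """
    m = len(haplotypes[0])
    best = [0] + [float("inf")] * m
    cut = [0] * (m + 1)
    for j in range(1, m + 1):
        for i in range(j):
            block = [h[i:j] for h in haplotypes]
            if not compatible(block):
                continue
            cost = best[i] + min_tag_snps(block)
            if cost < best[j]:
                best[j], cut[j] = cost, i
    # Walk the stored cut points back from the last SNP.
    blocks, j = [], m
    while j > 0:
        blocks.append((cut[j], j))
        j = cut[j]
    return best[m], blocks[::-1]
```

For example, with the four haplotypes 0000, 0011, 1100 and 1111 (SNPs 0 and 1 always agree, SNPs 2 and 3 always agree, and SNPs 1 and 2 show all four gametes), the sketch returns two blocks, (0, 2) and (2, 4), with one tag SNP each. The brute-force tag count is exponential in block width, so a realistic version would cap block length.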
Our approach is closely related to the work of Eskin et al. [5]. However, the perfect phylogeny model in their scheme is restricted to a single perfect phylogeny tree within a block; we instead adopt a more flexible criterion that requires at least one perfect phylogeny tree. Furthermore, for the difficult problem of resolving whole haplotypes across blocks, their work considers only pairs of adjacent blocks, whereas we take all blocks into account. Specifically, the contributions of our work are: (i) an at-least-one perfect-phylogeny-tree model, which fits real genotype data better and improves the accuracy of haplotype resolution within a block; (ii) an informative score function, which resolves a genotype into the most likely pair of haplotypes; (iii) tiling blocks consisting of tag SNPs, which make all of the choices resolvable; and (iv) the mutual relation among blocks, which resolves whole haplotypes by considering all blocks and reduces the effect of a few erroneous choices. We also include experimental results that illustrate the advantages of the proposed method.
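The abstract does not define the informative score function, so the sketch below substitutes a simpler, plainly hypothetical score to show what resolving a genotype into its most likely haplotype pair involves. It assumes genotypes encoded as 0/1 for homozygous sites and 2 for heterozygous sites, haplotype frequencies that are already estimated, and Hardy-Weinberg equilibrium, under which a candidate pair (h1, h2) with h1 ≠ h2 has probability proportional to freq[h1] * freq[h2]. The names `candidate_pairs`, `most_likely_pair`, `freq` and `floor` are illustrative.

```python
from itertools import product


def candidate_pairs(genotype):
    """All unordered haplotype pairs consistent with a genotype.

    Assumed encoding: 0/1 = homozygous for that allele, 2 = heterozygous.
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1 = [g if g != 2 else 0 for g in genotype]
        h2 = list(h1)
        for pos, b in zip(het, bits):
            h1[pos], h2[pos] = b, 1 - b  # the two copies differ at het sites
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def most_likely_pair(genotype, freq, floor=1e-6):
    """Hypothetical score: pick the pair maximizing freq[h1] * freq[h2].

    Haplotypes missing from `freq` get a small `floor` frequency.
    """
    best_pair, best_score = None, -1.0
    for h1, h2 in candidate_pairs(genotype):
        score = freq.get(h1, floor) * freq.get(h2, floor)
        if score > best_score:
            best_pair, best_score = (h1, h2), score
    return best_pair
```

For the genotype (0, 2, 1, 2) there are two candidate pairs. With `freq = {(0, 0, 1, 0): 0.4, (0, 1, 1, 1): 0.3, (0, 0, 1, 1): 0.05, (0, 1, 1, 0): 0.05}`, the sketch picks ((0, 0, 1, 0), (0, 1, 1, 1)), whose score of 0.12 beats 0.0025. A genotype with k heterozygous sites has 2^(k-1) candidate pairs, so exhaustive enumeration is only practical within short blocks.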
Keywords: Haplotype, tag SNPs, perfect phylogeny tree, tiling block
[1] V. Bafna, D. Gusfield, G. Lancia, and S. Yooseph, “Haplotyping as Perfect Phylogeny: A Direct Approach,” Technical Report CSE-2002-21, University of California, Davis, July 2002.
[2] V. Bafna, B. V. Halldorsson, R. Schwartz, A. G. Clark, and S. Istrail, “Haplotypes and Informative SNP Selection Algorithms: Don’t Block out Information,” In Proceedings of the 7th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 19–27, 2003.
[3] A. Clark, “Inference of Haplotypes from PCR-amplified Samples of Diploid Populations,” Molecular Biology and Evolution, vol. 7, no. 2, pp. 111–122, March 1990.
[4] M. J. Daly, J. D. Rioux, S. F. Schaffner, T. J. Hudson, and E. S. Lander, “High-resolution Haplotype Structure in the Human Genome,” Nature Genetics, vol. 29, no. 2, pp. 229–232, October 2001.
[5] E. Eskin, E. Halperin, and R. M. Karp, “Large Scale Reconstruction of Haplotypes from Genotype Data,” In Proceedings of the 7th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 104–113, 2003.
[6] L. Excoffier and M. Slatkin, “Maximum-likelihood Estimation of Molecular Haplotype
Frequencies in a Diploid Population,” Molecular Biology and Evolution, vol. 12,
no. 5, pp. 921–927, September 1995.
[7] G. Greenspan and D. Geiger, “Model-based Inference of Haplotype Block Variation,” In Proceedings of the 7th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 131–137, 2003.
[8] D. Gusfield, “Haplotyping as Perfect Phylogeny: Conceptual Framework and Efficient Solutions,” In Proceedings of the 6th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 166–175, 2002.
[9] E. Halperin and E. Eskin, “Haplotype Reconstruction from Genotype Data using
Imperfect Phylogeny,” To appear in Bioinformatics, 2004.
[10] G. Kimmel and R. Shamir, “Maximum Likelihood Resolution of Multi-block Genotypes,” In Proceedings of the 8th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 2–9, 2004.
[11] M. Koivisto, M. Perola, T. Varilo, W. Hennah, J. Ekelund, M. Lukk, L. Peltonen,
E. Ukkonen, and H. Mannila, “An MDL Method for Finding Haplotype Blocks and
for Estimating the Strength of Haplotype Block Boundaries,” In Proceedings of the
Pacific Symposium on Biocomputing (PSB), vol. 8, pp. 502–513, 2003.
[12] J. Long, R. Williams, and M. Urbanek, “An EM Algorithm and Testing Strategy for Multiple-locus Haplotypes,” American Journal of Human Genetics, vol. 56, no. 3, pp. 799–810, March 1995.
[13] NHGRI, “http://www.genome.gov/10005336,” October 2002.
[14] NHGRI, “http://www.genome.gov/10001772,” February 2004.
[15] N. Patil, A. J. Berno, D. A. Hinds, W. A. Barrett, J. M. Doshi, C. R. Hacker, C. R.
Kautzer, D. H. Lee, C. Marjoribanks, D. P. McDonough, B. T. N. Nguyen, M. C.
Norris, J. B. Sheehan, N. Shen, D. Stern, R. P. Stokowski, D. J. Thomas, M. O.
Trulson, K. R. Vyas, K. A. Frazer, S. P. A. Fodor, and D. R. Cox, “Blocks of limited
haplotype diversity revealed by high-resolution scanning of human chromosome 21,”
Science, vol. 294, no. 5547, pp. 1719–1723, November 2001.
[16] R. Schwartz, B. V. Halldorsson, V. Bafna, A. G. Clark, and S. Istrail, “Robustness of Inference of Haplotype Block Structure,” Journal of Computational Biology, vol. 10, no. 1, pp. 13–19, 2003.
[17] M. Stephens, N. Smith, and P. Donnelly, “A New Statistical Method for Haplotype
Reconstruction from Population Data,” American Journal of Human Genetics, vol.
68, no. 4, pp. 978–989, October 2001.
[18] K. Zhang, M. Deng, T. Chen, M. S. Waterman, and F. Sun, “A Dynamic Programming Algorithm for Haplotype Block Partitioning,” Proceedings of the National Academy of Sciences (PNAS), vol. 99, no. 11, pp. 7335–7339, May 2002.