Graduate student: | 巴達古 Dagoberto Josue Vaquedano Garrido |
---|---|
Thesis title: | A Genetic Algorithm Based On Maximum Likelihood and Normalized Mutual Information to Infer Haplotypes from Genotypes (基於最大可能性與正規化交互資訊之基因演算法來從基因型推論單倍體) |
Advisor: | 蘇豐文 Soo, Von Wun |
Oral defense committee: | 陳朝欽 Chen, Chaur Chin; 陳宜欣 Chen, Yi Shin |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2014 |
Academic year of graduation: | 102 |
Language: | English |
Number of pages: | 54 |
Keywords: | Genetic Algorithm, Haplotype Inference Problem, Hardy Weinberg Equilibrium, Linkage Disequilibrium, Maximum Likelihood Estimates, Normalized Mutual Information |
Haplotypes consist of blocks of single nucleotide polymorphisms (SNPs). As units of inheritance, haplotypes are widely used in association studies and candidate gene studies. However, obtaining these blocks of SNPs through in vitro methods is both time-consuming and expensive, so in silico studies try to infer haplotypes from genotypic data instead. This thesis uses a genetic algorithm (i.e., a heuristic approach) guided by two genetic models, namely the Hardy-Weinberg equilibrium and linkage disequilibrium. These are statistically assessed by maximum likelihood estimates and normalized mutual information, respectively. This technique generates an adequate solution in polynomial time to an inherently NP-hard problem. The results show that our algorithm achieves a better accuracy rate than a genetic algorithm that uses only the Hardy-Weinberg equilibrium.
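Haplotypes contain multiple groups of single nucleotide polymorphisms (SNPs). As a unit of genetic research, haplotypes have been used extensively in association studies and candidate gene studies. However, obtaining SNP information through in vitro experiments is both very time-consuming and very costly. Instead, the author attempts an in silico approach that uses genetic data and inference to recover haplotype information. This study develops a new genetic algorithm guided by two genetic models, the Hardy-Weinberg equilibrium and linkage disequilibrium, which are statistically assessed by maximum likelihood estimates and normalized mutual information, respectively. For this NP-hard problem, the genetic algorithm produces an adequate solution in polynomial time. The final results show that the proposed genetic algorithm achieves higher accuracy than a genetic algorithm that uses only the Hardy-Weinberg equilibrium.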
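The two scoring criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the string encoding of haplotypes, and the choice to normalize mutual information by the geometric mean of the two entropies are all assumptions made for illustration. The sketch scores a candidate phasing (one haplotype pair per individual) by its log-likelihood under Hardy-Weinberg equilibrium, using sample proportions as the maximum likelihood estimates of haplotype frequencies, and measures linkage disequilibrium between two SNP sites with normalized mutual information.

```python
from collections import Counter
from math import log, sqrt


def hwe_log_likelihood(pairs):
    """Log-likelihood of phased haplotype pairs under Hardy-Weinberg equilibrium.

    `pairs` holds one (haplotype, haplotype) tuple per individual.
    Haplotype frequencies are estimated as sample proportions, which are
    the maximum likelihood estimates for the given phasing.
    """
    counts = Counter(h for pair in pairs for h in pair)
    total = sum(counts.values())
    ll = 0.0
    for h1, h2 in pairs:
        p = (counts[h1] / total) * (counts[h2] / total)
        if h1 != h2:
            p *= 2  # a heterozygous pair can arise in two orders
        ll += log(p)
    return ll


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(values)
    return -sum((c / n) * log(c / n, 2) for c in Counter(values).values())


def normalized_mutual_information(x, y):
    """Mutual information of two SNP sites divided by sqrt(H(x) * H(y)).

    The normalization is an assumption; other conventions exist.
    Returns 0.0 when either site is monomorphic (zero entropy).
    """
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))
    return (hx + hy - hxy) / sqrt(hx * hy)


if __name__ == "__main__":
    # Two individuals, both heterozygous at two SNP sites.
    # Hypothetical candidate phasings of the same genotypes:
    phasing_a = [("00", "11"), ("00", "11")]
    phasing_b = [("01", "10"), ("00", "11")]
    for name, phasing in (("a", phasing_a), ("b", phasing_b)):
        haplotypes = [h for pair in phasing for h in pair]
        site0 = [h[0] for h in haplotypes]
        site1 = [h[1] for h in haplotypes]
        print(name, hwe_log_likelihood(phasing),
              normalized_mutual_information(site0, site1))
```

In this toy case both criteria favor the phasing that reuses the same two haplotypes: it has the higher Hardy-Weinberg likelihood and the stronger association between the two sites. How the thesis combines the two scores into a single fitness value is not stated in the abstract, so no weighting is shown here.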