Graduate student: | 林沿妊 |
---|---|
Thesis title: | A. Prediction by zone and its application to egg productivity in chickens B. A simple genotype imputation method and copy number haplotype inference with Hidden Markov Model and localized haplotype clustering |
Advisor: | 唐傳義 (Tang, Chuan Yi) |
Oral defense committee: | 林俊淵, 唐傳義, 蔡七女, 韓永楷, 李御賢, 劉明麗, 謝文萍 |
Degree: | Doctor |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2012 |
Academic year of graduation: | 100 |
Language: | English |
Number of pages: | 97 |
Keywords: | Prediction by zone, egg productivity, chickens, genotype imputation, copy number haplotype, Hidden Markov Model, haplotype |
A. Prediction by zone and its application to egg productivity in chickens
Taiwan red-feathered country chickens (TRFCCs) are one of the main meat resources in Taiwan. Because there has been no systematic breeding program to improve egg productivity, the egg production rate of this breed has gradually decreased. The prediction by zone (PreZone) program was developed to identify chickens with low egg productivity before they reach maturity, so as to improve the egg productivity of TRFCCs. Three groups (A, B and C) of chickens were used in this study. Two approaches were used to identify chickens with low egg productivity. The first approach used predictions based on a single dataset, and the second approach used predictions based on the union of two datasets. The levels of four serum proteins, namely apolipoprotein A-I, vitellogenin, X protein (an IGF-I-like protein) and apo VLDL-II, were measured in chickens that were 8, 14, 22 or 24 weeks old. Total egg numbers were recorded for each individual bird during the egg production period. PreZone analysis was performed using the four serum protein levels as selection parameters, and the results were compared to those obtained using a first-order multiple linear regression method with the same parameters. The PreZone program provides another prediction method that can be used to validate datasets with a low correlation between response and predictors. It can be used to find chickens with low egg productivity at an early stage and to improve egg productivity in TRFCCs by selecting the best chickens before they reach maturity.
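The comparison baseline named above, a first-order multiple linear regression on four predictors, can be sketched as follows. This is only an illustration of the baseline, not of PreZone itself, whose algorithm is not described in this abstract. The function names `fit_linear` and `predict` are hypothetical, and the toy numbers are placeholders rather than measurements from the study.

```python
def fit_linear(rows, targets):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    n = len(X[0])
    # Build A = X^T X and b = X^T y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / A[i][i]
    return coef


def predict(coef, row):
    """Predicted response for one row of predictor values."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


# Placeholder data: four predictor columns standing in for the four serum
# protein levels, with a response built from a known linear rule.
toy_rows = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0),
            (0, 0, 0, 1), (1, 1, 1, 1), (2, 1, 0, 3)]
toy_eggs = [10 + 2 * a - 1 * b + 0.5 * c + 3 * d for a, b, c, d in toy_rows]
coef = fit_linear(toy_rows, toy_eggs)
```

Birds whose predicted value falls below a chosen cutoff would then be flagged as likely low producers; the cutoff itself is a design choice that this abstract does not specify.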
B. A simple genotype imputation method and copy number haplotype inference with Hidden Markov Model and localized haplotype clustering
High-throughput genotyping technology has made genome-wide association studies possible. Single nucleotide polymorphism (SNP) data derived from array-based technology are usually flawed by missing data, although they generally have high call rates and good concordance rates across different genotype calling schemes. Missing SNPs can bias the results of association analyses, and hence loci with missing data are removed in some studies. Imputation is a method of compensating for the missing data by filling in the most probable values. It can increase the power of the association study and does not involve the extra cost of genotyping the missing SNPs. In this article, we propose a simple imputation method (Simpute) that takes advantage of the high resolution of SNPs in either the array platform or the massively parallel sequencing platform. It is based on the linkage disequilibrium (LD) structure of the chromosome, and only two nearby SNPs are needed to fill in the missing data. Simpute does not use any reference data. Simpute provides a simple, accurate and fast solution to whole-genome imputation. We have demonstrated that when the SNPs are densely distributed on the chromosome with high linkage disequilibrium between adjacent loci, there is no need to adopt complicated algorithms. Simpute is suitable for routine screening in large-scale SNP genotyping, especially when the sample size is large and efficiency is a major issue in the workflow.
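The idea of filling a missing genotype from only two nearby SNPs, without reference data, can be illustrated with a minimal sketch. The abstract does not give Simpute's actual rule, so the rule used here is an assumption: take the most common genotype among other samples in the same dataset that carry the same genotypes at the two flanking SNPs. The function name `impute_from_neighbors` and the 0/1/2 genotype coding with `None` for missing calls are likewise hypothetical.

```python
from collections import Counter


def impute_from_neighbors(genotypes):
    """Fill missing calls (None) using the two flanking SNPs of the same sample.

    genotypes: list of samples, each a list of calls coded 0/1/2 or None.
    Assumed rule (not necessarily Simpute's): use the most common call at the
    missing locus among other samples whose flanking calls match; if no sample
    matches, fall back to the most common call at that locus overall.
    """
    n_snps = len(genotypes[0])
    result = [row[:] for row in genotypes]  # leave the input untouched
    for s, row in enumerate(genotypes):
        for j, call in enumerate(row):
            if call is not None:
                continue
            left = row[j - 1] if j > 0 else None
            right = row[j + 1] if j + 1 < n_snps else None
            matched, overall = Counter(), Counter()
            for t, other in enumerate(genotypes):
                if t == s or other[j] is None:
                    continue
                overall[other[j]] += 1
                other_left = other[j - 1] if j > 0 else None
                other_right = other[j + 1] if j + 1 < n_snps else None
                if other_left == left and other_right == right:
                    matched[other[j]] += 1
            pool = matched or overall
            if pool:
                result[s][j] = pool.most_common(1)[0][0]
    return result


# Toy panel in strong LD: the middle SNP always agrees with its neighbors.
panel = [
    [0, 0, 0],
    [0, 0, 0],
    [2, 2, 2],
    [2, 2, 2],
    [2, None, 2],
    [0, None, 0],
]
filled = impute_from_neighbors(panel)
```

Because the rule looks only at adjacent loci within the dataset itself, it relies on the dense SNP spacing and high LD between neighbors that the abstract describes; it would be expected to degrade where adjacent SNPs are weakly correlated.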
Copy number polymorphisms and aberrations can now be studied at high resolution using genome-wide SNP arrays. This paper presents a method based on a Hidden Markov Model to detect parent-specific copy number changes on both chromosomes. A haplotype tree is constructed with dynamic branch merging to model the transition of the copy number status of the two alleles assessed at each SNP locus. The emission models are constructed for the genotypes formed by the two haplotypes. The proposed method can provide the segmentation points of the copy number variation regions as well as the haplotype phasing for the allelic status on each chromosome. The estimated copy numbers are provided as fractional numbers, which can effectively accommodate somatic mutations in cancer specimens, which usually consist of both normal and mutant cells. The algorithm was evaluated on the previously published regions of copy number variation in the 270 HapMap individuals. The results were compared with five popular methods: PennCNV, genoCN, COKGEN, QuantiSNP and cnvHap. In our genome-wide study, the proposed algorithm exhibits sensitivity for CNV regions that is roughly comparable to that of the best algorithm, and it demonstrates the highest detection rate in SNP-dense regions. In addition, it provides better haplotype phasing accuracy than similar approaches.
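As a generic illustration of how a Hidden Markov Model yields segmentation points along a chromosome, the sketch below decodes the most likely sequence of copy number states with the standard Viterbi algorithm. It is not the haplotype-tree model of the thesis: the three total copy number states, the discretized intensity symbols (`"low"`, `"normal"`, `"high"`) and all probabilities are illustrative assumptions.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state path, computed in log space."""
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}
    back = []  # back[i][s] = best predecessor of state s at step i + 1
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + math.log(trans_p[p][s]))
            new_score[s] = (score[prev] + math.log(trans_p[prev][s])
                            + math.log(emit_p[s][obs]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Assumed toy model: hidden states are total copy numbers 1, 2 and 3.
states = [1, 2, 3]
start_p = {1: 0.05, 2: 0.90, 3: 0.05}
trans_p = {a: {b: (0.90 if a == b else 0.05) for b in states} for a in states}
emit_p = {
    1: {"low": 0.80, "normal": 0.15, "high": 0.05},
    2: {"low": 0.10, "normal": 0.80, "high": 0.10},
    3: {"low": 0.05, "normal": 0.15, "high": 0.80},
}
signal = ["normal", "normal", "low", "low", "low", "low", "normal", "normal"]
path = viterbi(signal, states, start_p, trans_p, emit_p)
# path == [2, 2, 1, 1, 1, 1, 2, 2]; the state changes mark the segment boundaries
```

In this toy run the decoded path drops to one copy over the four low-intensity markers, so the two state changes delimit a single deletion segment. The method summarized above additionally tracks the two alleles separately and reports fractional copy numbers, which this sketch does not attempt.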