| Field | Value |
|---|---|
| Student | 鄭琬琪 Wan-Chi Cheng |
| Thesis title | Improved SNP detection with SNPscanner for Affymetrix tiling arrays |
| Advisor | 謝文萍 Wen-Ping Hsieh |
| Degree | Master |
| Department | College of Science, Institute of Statistics |
| Year of publication | 2008 |
| Academic year of graduation | 96 (ROC calendar) |
| Language | English |
| Number of pages | 42 |
| Keywords (Chinese) | biochip, single nucleotide polymorphism |
| Keywords (English) | SNPscanner, SNP, microarray |
Tiling arrays are a new platform designed to detect sequence variation, including novel alleles, as well as gene expression. Gresham et al. (2006) developed a program called SNPscanner that uses tiling array data to detect single nucleotide polymorphisms (SNPs) between two strains. This thesis evaluates its performance and proposes two improvements.
SNPscanner has two main problems in SNP detection: detection bias, and the separation of two closely linked SNPs. The bias arises because SNPscanner fits its model under a symmetric assumption. That assumption is inappropriate, because the observed decrease in probe affinity is not symmetric: it differs depending on which end of the probe the SNP lies near. Refitting the model under asymmetric assumptions removes the detection bias. For closely linked SNPs, the region of positive signal reported by SNPscanner sometimes contains more than one SNP. We propose fitting the signal curve with Gaussian kernels and performing a log-likelihood ratio test to choose between a one-peak and a two-peak model. This approach correctly detects the existing SNPs and improves detection accuracy.
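The one-peak versus two-peak selection can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The function names (`fit_one_peak`, `fit_two_peaks`, `select_peak_count`) are hypothetical, and several choices are assumptions made for illustration: a fixed, known kernel width, an integer grid of candidate peak centers, and independent Gaussian errors with variance estimated by maximum likelihood.

Under those assumptions, twice the log-likelihood ratio reduces to n·log(RSS₁/RSS₂). The sketch compares it with a chi-square critical value on 2 degrees of freedom (one extra center and one extra amplitude). That reference distribution is only approximate here.

```python
import math


def gaussian_kernel(x, center, width):
    """Unnormalized Gaussian bump used as the shape of one SNP peak (assumed shape)."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def fit_one_peak(xs, ys, width):
    """Least-squares fit of y ~ a*K(x; c): grid search over c, closed-form amplitude a."""
    best_rss, best_center = math.inf, None
    for c in xs:
        g = [gaussian_kernel(x, c, width) for x in xs]
        a = sum(gi * yi for gi, yi in zip(g, ys)) / sum(gi * gi for gi in g)
        rss = sum((yi - a * gi) ** 2 for gi, yi in zip(g, ys))
        if rss < best_rss:
            best_rss, best_center = rss, c
    return best_rss, best_center


def fit_two_peaks(xs, ys, width):
    """Least-squares fit of y ~ a1*K(x; c1) + a2*K(x; c2) over grid pairs c1 < c2."""
    kernels = {c: [gaussian_kernel(x, c, width) for x in xs] for c in xs}
    best_rss, best_centers = math.inf, None
    for i, c1 in enumerate(xs):
        for c2 in xs[i + 1:]:
            g1, g2 = kernels[c1], kernels[c2]
            # Solve the 2x2 normal equations for the amplitudes (a1, a2).
            s11 = sum(u * u for u in g1)
            s22 = sum(v * v for v in g2)
            s12 = sum(u * v for u, v in zip(g1, g2))
            b1 = sum(u * y for u, y in zip(g1, ys))
            b2 = sum(v * y for v, y in zip(g2, ys))
            det = s11 * s22 - s12 * s12
            if det <= 1e-12:  # kernels almost identical: amplitudes not identifiable
                continue
            a1 = (b1 * s22 - b2 * s12) / det
            a2 = (s11 * b2 - s12 * b1) / det
            rss = sum((y - a1 * u - a2 * v) ** 2 for u, v, y in zip(g1, g2, ys))
            if rss < best_rss:
                best_rss, best_centers = rss, (c1, c2)
    return best_rss, best_centers


def select_peak_count(xs, ys, width, alpha=0.05):
    """Return (number of peaks, peak centers, LR statistic) via a likelihood ratio test."""
    n = len(xs)
    rss1, center1 = fit_one_peak(xs, ys, width)
    rss2, centers2 = fit_two_peaks(xs, ys, width)
    tiny = 1e-300  # guards against log(0) for a perfect fit
    # Gaussian errors with MLE variance: 2*(logL2 - logL1) = n*log(RSS1/RSS2).
    lr = n * math.log(max(rss1, tiny) / max(rss2, tiny))
    # Upper alpha quantile of chi-square with 2 df is -2*ln(alpha) (approximate reference).
    critical = -2.0 * math.log(alpha)
    if lr > critical:
        return 2, centers2, lr
    return 1, (center1,), lr


if __name__ == "__main__":
    xs = list(range(31))
    noise = [0.05 if x % 2 == 0 else -0.05 for x in xs]  # deterministic stand-in for noise
    one = [2.0 * gaussian_kernel(x, 15, 3.0) + e for x, e in zip(xs, noise)]
    two = [2.0 * gaussian_kernel(x, 10, 3.0) + 1.5 * gaussian_kernel(x, 20, 3.0) + e
           for x, e in zip(xs, noise)]
    print(select_peak_count(xs, one, width=3.0)[:2])  # (1, (15,))
    print(select_peak_count(xs, two, width=3.0)[:2])  # (2, (10, 20))
```

The amplitudes are solved in closed form for each candidate pair of centers, so only the centers need a search. The grid search is exhaustive but quadratic in the number of positions. A real analysis would restrict it to the positive-signal region reported by SNPscanner.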
Although the methods proposed in this study move SNP detection with SNPscanner in the right direction, the software still has room for improvement. For example, the proposed log-likelihood ratio statistic does not strictly follow a chi-square distribution, and the proposed Gaussian kernel does not fit the signal curve well in certain circumstances. In addition, the model must store a large amount of information from the training data, which limits its flexibility when applied to a different species. Further improvements are under study.
Borevitz J.O., Liang D., Plouffe D., Chang H.-S., Zhu T., Weigel D., Berry C.C., Winzeler E. and Chory J. (2003) Large-scale identification of single-feature polymorphisms in complex genomes. Genome Research, 13, 513–523.
Brauer M.J., Christianson C.M., Pai D.A. and Dunham M.J. (2006) Mapping novel traits by array-assisted bulk segregant analysis in Saccharomyces cerevisiae. Genetics, 173, 1813–1816.
Castle J., Garrett-Engele P., Armour C.D., Duenwald S.J., Loerch P.M., Meyer M.R., Schadt E.E., Stoughton R., Parrish M.L., Shoemaker D.D. and Johnson J.M. (2003) Optimization of oligonucleotide arrays and RNA amplification protocols for analysis of transcript structure and alternative splicing. Genome Biology, 4, R66.
David L., Huber W., Granovskaia M., Toedling J., Palm C.J., Bofkin L., Jones T., Davis R.W. and Steinmetz L.M. (2006) A high-resolution map of transcription in the yeast genome. Proc. Natl. Acad. Sci., 103, 5320–5325.
Gerber A.P., Herschlag D. and Brown P.O. (2004) Extensive association of functionally and cytotopically related mRNAs with Puf family RNA-binding proteins in yeast. PLoS Biology, 2, E79.
Gresham D., Ruderfer D.M., Pratt S.C., Schacherer J., Dunham M.J., Botstein D. and Kruglyak L. (2006) Genome-wide detection of polymorphisms at nucleotide resolution with a single DNA microarray. Science, 311, 1932–1936.
Irizarry R.A., Hobbs B., Collin F., Beazer-Barclay Y.D., Antonellis K.J., Scherf U. and Speed T.P. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249–264.
Li C. and Wong W.H. (2001) Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc. Natl. Acad. Sci., 98, 31–36.
Mockler T.C., Chan S., Sundaresan A., Chen H., Jacobsen S.E. and Ecker J.R. (2005) Applications of DNA tiling arrays for whole-genome analysis. Genomics, 85, 1–15.
Naef F. and Magnasco M. (2003) Solving the riddle of the bright mismatches: labeling and effective binding in oligonucleotide arrays. Phys. Rev. E, 68.
Royce T.E., Rozowsky J.S., Bertone P., Samanta M., Stolc V., Weissman S., Snyder M. and Gerstein M. (2005) Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping. Trends in Genetics, 21, 466–475.
Segre A.V., Murray A.W. and Leu J.-Y. (2006) High-resolution mutation mapping reveals parallel experimental evolution in yeast. PLoS Biology, 4, 1372–1385.
Wilson G.M., Flibotte S., Missirlis P.I., Marra M.A., Jones S., Thornton K., Clark A.G. and Holt R.A. (2006) Identification by full-coverage array CGH of human DNA copy number increases relative to chimpanzee and gorilla. Genome Research, 16, 173–181.
Winzeler E.A., Castillo-Davis C.I., Oshiro G., Liang D., Richards D.R., Zhou Y. and Hartl D.L. (2003) Genetic diversity in yeast assessed with whole genome oligonucleotide arrays. Genetics, 163, 79–89.
Winzeler E.A., Richards D.R., Conway A.R., Goldstein A.L., Kalman S., McCullough M.J., McCusker J.H., Stevens D.A., Wodicka L., Lockhart D.J. and Davis R.W. (1998) Direct allelic variation scanning of the yeast genome. Science, 281, 1194–1197.
Wu Z., Irizarry R.A., Gentleman R., Martinez-Murillo F. and Spencer F. (2004) A model-based background adjustment for oligonucleotide expression arrays. Journal of the American Statistical Association, 99, 909–917.