Graduate student: | 李錫諺 Lee, Hsi-Yen |
---|---|
Thesis title: | 利用迭代分群法尋找基因表現分群結構 Iterative clustering of gene expression data in search of subgroups of general population |
Advisor: | 謝文萍 Hsieh, Wen-Ping |
Oral defense committee: | 鍾仁華 Chung, Ren-Hua; 張升懋 Chang, Sheng-Mao |
Degree: | Master |
Department: | College of Science - Institute of Statistics |
Year of publication: | 2018 |
Academic year of graduation: | 106 (ROC calendar) |
Language: | Chinese |
Number of pages: | 50 |
Chinese keywords (translated): | iterative clustering, batch effect, blood gene expression, human constitution |
English keywords: | iterative clustering, batch effect, blood gene expression, sparse k-means, BUS model |
Gene expression matrices are used to address a wide range of problems, from discovering new disease signatures to dissecting cellular composition. Patients with the same disease often respond differently: a person's constitution plays multiple roles in disease, affecting the diagnosis and leading to widely divergent outcomes after treatment.
To understand the natural grouping structure of human constitution from gene expression signals, we propose an iterative clustering method built on two clustering methods, Batch effects correction with Unknown Subtypes (BUS) and sparse k-means. The procedure repeatedly extracts distinct clustering structures from the expression data and, at the same time, selects feature genes for each structure, identifying constitution-related subgroups and the genes that matter for them. We collected a large amount of blood gene expression data, conducted a large-scale exploratory analysis, and finally used gene ontology analysis to explain the roles these genes play in the organism.
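The iterative idea can be illustrated with a minimal sketch. It is not the thesis implementation. The abstract does not say how successive rounds are linked, so removing the selected genes before re-clustering is an assumption made here. Plain k-means with a per-gene between-cluster score stands in for sparse k-means, and BUS is not implemented at all. Every name below (`kmeans`, `gene_scores`, `iterative_clustering`, `make_toy_data`) and the synthetic data are hypothetical.

```python
import numpy as np


def kmeans(X, k=2, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        # Squared distance from every sample to its nearest chosen center.
        nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[nearest.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


def gene_scores(X, labels):
    """Share of each gene's total sum of squares explained by the clusters."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for j in np.unique(labels):
        block = X[labels == j]
        within += ((block - block.mean(axis=0)) ** 2).sum(axis=0)
    return (total - within) / np.maximum(total, 1e-12)


def iterative_clustering(X, n_rounds=2, k=2, n_top=10):
    """Cluster, record the top-scoring genes, drop them, and repeat.

    Assumption: dropping the selected genes is how one round hands over
    to the next; the source abstract does not specify this step.
    """
    remaining = np.arange(X.shape[1])
    rounds = []
    for _ in range(n_rounds):
        if len(remaining) < n_top:
            break
        sub = X[:, remaining]
        labels = kmeans(sub, k)
        order = np.argsort(gene_scores(sub, labels))[::-1]
        top = remaining[order[:n_top]]
        rounds.append((labels, top))
        remaining = np.setdiff1d(remaining, top)
    return rounds


def make_toy_data(seed=0):
    """Synthetic matrix (samples x genes) with two independent groupings."""
    rng = np.random.default_rng(seed)
    n_samples, n_genes = 40, 60
    group_a = (np.arange(n_samples) >= 20).astype(int)  # strong structure
    group_b = np.arange(n_samples) % 2                  # weaker structure
    X = rng.normal(size=(n_samples, n_genes))
    X[:, 0:10] += 8.0 * group_a[:, None]
    X[:, 10:20] += 4.0 * group_b[:, None]
    return X, group_a, group_b


X, group_a, group_b = make_toy_data()
rounds = iterative_clustering(X, n_rounds=2, k=2, n_top=10)
for i, (labels, genes) in enumerate(rounds, start=1):
    print(f"round {i}: genes {sorted(genes.tolist())}, "
          f"cluster sizes {np.bincount(labels).tolist()}")
```

On this toy matrix the first round is expected to pick up the stronger grouping and the second round the weaker one, once the first set of genes has been removed. In practice the stand-in clustering step would be replaced by a real sparse k-means or BUS fit, and the number of clusters would be chosen by a criterion such as the gap statistic rather than fixed at 2.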