
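Student: 彭涵琪
Thesis title (translated from Chinese): Extracting White Blood Cell Gene Expression Profiles from Large-Scale Microarray Data Using Non-Negative Matrix Factorization
English title: Meta-Expression Profile Retrieval for White Blood Cells with Non-Negative Matrix Factorization
Advisor: 謝文萍
Oral defense committee: 莊永仁, 盧鴻興
Degree: Master's
Department: College of Science, Institute of Statistics
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar, i.e., 2011–2012)
Language: Chinese
Number of pages: 38
Keywords: NMF, leukocyte, microarray, meta-profiles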
    Abstract
    The immune system protects the human body against bacteria and viruses. It comprises many types of cells in the blood, among the most important of which are the different kinds of white blood cells. Each kind has been studied extensively on its own with gene expression arrays, but a typical study measures tens of thousands of genes on only a small number of samples. Combining all of these data and comparing the expression profiles side by side should reveal the similarities and differences among the white blood cell types.
    Non-Negative Matrix Factorization (NMF) is a widely used multivariate method for decomposing high-dimensional nonnegative data into a small number of nonnegative components. This study aims to retrieve cell-type-specific meta-profiles of white blood cells from a large dataset pooled across different platforms and experiments. We applied NMF to explore the meta-profiles of four major leukocyte types: T cells, B cells, monocytes, and neutrophils. Array data were collected from the Gene Expression Omnibus (GEO). The meta-profiles derived with NMF are robust across the two commercial platforms, Affymetrix and Illumina. This robustness is well explained by the differences in expression patterns among the cell types being large relative to the differences across platforms or experiments. The minimal restrictions and assumptions of NMF also contribute to the accurate mapping between the meta-profiles and the mean profiles.
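    The decomposition described above can be illustrated with a minimal sketch, assuming the multiplicative update rules of Lee and Seung (2001) for the squared Frobenius error; the abstract does not state which NMF algorithm, rank, or initialization the thesis used, and the names `nmf`, `frobenius_error`, and `match_meta_profiles` are hypothetical. The nonnegative expression matrix V (genes × samples) is approximated as W H, where the columns of W play the role of meta-profiles and H holds each sample's loadings. Each meta-profile is then mapped to the cell type whose mean profile it correlates with most strongly; Pearson correlation is an assumed choice of similarity.

```python
import random


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain-Python matrix product A @ B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def nmf(V, k, iters=1000, seed=0, eps=1e-9):
    """Approximate nonnegative V (genes x samples) as W @ H.

    W (genes x k) holds meta-profiles, H (k x samples) holds loadings.
    Uses Lee-Seung multiplicative updates for ||V - WH||_F^2 (an assumed
    algorithm choice); eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def frobenius_error(V, W, H):
    """Frobenius norm of the residual V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5


def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)


def match_meta_profiles(W, mean_profiles):
    """For each meta-profile (column of W), return the index of the
    cell-type mean profile with the highest Pearson correlation."""
    return [max(range(len(mean_profiles)), key=lambda c: pearson(col, mean_profiles[c]))
            for col in transpose(W)]


if __name__ == "__main__":
    # Toy data (hypothetical): genes 0-2 are high in "type A" samples 0 and 1,
    # genes 3-5 are high in "type B" samples 2 and 3.
    W_true = [[3, .2], [2, .3], [3, .2], [.2, 3], [.3, 2], [.2, 3]]
    H_true = [[2, 1.5, .1, .2], [.1, .2, 2, 1.5]]
    V = matmul(W_true, H_true)
    W, H = nmf(V, k=2)
    means = [[(r[0] + r[1]) / 2 for r in V], [(r[2] + r[3]) / 2 for r in V]]
    print(frobenius_error(V, W, H), match_meta_profiles(W, means))
```

    Plain Python lists keep the sketch dependency-free; for real microarray sizes a numerical library, or the R package by Gaujoux and Seoighe (2010) listed in the references, would be the practical choice. For the four leukocyte types studied here, k = 4 would be the natural rank, though that too is an assumption.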


    Contents
    1. Introduction
    2. Material and Method
       2.1. Array data collected from GEO
       2.2. Method
    3. Results
       3.1. Data Analysis with NMF
          3.1.1. The correlation of data across platforms, cell types and studies
          3.1.2. Training with data from Affymetrix U133A and testing with data from Affymetrix U133Plus2
          3.1.3. Training with data from Affymetrix U133Plus2 and testing with data from Affymetrix U133A
          3.1.4. Training with data from Affymetrix U133 series and testing with data from Affymetrix U95 series
          3.1.5. Training with data from Affymetrix U133A or U133Plus2 and testing with data from Illumina arrays
          3.1.6. Cluster with the raw data vs. cluster with loading matrix
       3.2. Evaluation of the Robustness for Meta-Profile Retrieval with NMF
       3.3. GO Term Analysis
    4. Conclusion and discussion
    5. References
    Appendix
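    Sections 3.1.2 to 3.1.5 train on data from one platform and test on another. One way to carry meta-profiles learned on a training set over to new samples is sketched here, assuming the genes (rows) have already been matched across platforms: W is held fixed, nonnegative loadings for the test samples are fitted with the same multiplicative rule NMF uses for H, and each sample is labeled with its largest-loading meta-profile. This is a swapped-in technique for illustration only; metagene projection in Tamayo et al. (2007) is, to our understanding, based on a generalized inverse of W, and the thesis's exact projection step is not stated in this record. `project_onto_meta_profiles` and `assign_cell_types` are hypothetical names.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain-Python matrix product A @ B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def project_onto_meta_profiles(W, V_test, iters=500, eps=1e-9):
    """Estimate nonnegative loadings H (k x samples) for test samples
    V_test (genes x samples) with meta-profiles W (genes x k) held fixed,
    via the multiplicative update H <- H * (W^T V) / (W^T W H)."""
    k, m = len(W[0]), len(V_test[0])
    Wt = transpose(W)
    WtV, WtW = matmul(Wt, V_test), matmul(Wt, W)  # constant across iterations
    H = [[1.0] * m for _ in range(k)]             # positive start keeps H >= 0
    for _ in range(iters):
        den = matmul(WtW, H)
        H = [[H[i][j] * WtV[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
    return H


def assign_cell_types(H):
    """Label each test sample with the meta-profile carrying its largest loading."""
    return [max(range(len(H)), key=lambda i: H[i][j]) for j in range(len(H[0]))]


if __name__ == "__main__":
    # Hypothetical meta-profiles "learned" on a training platform.
    W = [[3, .2], [2, .3], [3, .2], [.2, 3], [.3, 2], [.2, 3]]
    # Hypothetical test samples from another platform, built from W.
    V_test = matmul(W, [[0.75, 0.05, 1.0], [0.1, 0.9, 0.05]])
    H = project_onto_meta_profiles(W, V_test)
    print(assign_cell_types(H))
```

    Because W is fixed, each test column is an independent nonnegative least-squares fit, so test samples cannot change the meta-profiles learned on the training platform.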

    References
    Anttila, P., P. Paatero, et al. (1995). "Source identification of bulk wet deposition in Finland by positive matrix factorization." Atmospheric Environment 29(14): 1705–1718.
    Beissbarth, T. and T. P. Speed (2004). "GOstat: find statistically overrepresented Gene Ontologies within a group of genes." Bioinformatics 20(9): 1464–1465.
    Ben-Israel, A. and T. N. E. Greville (2003). Generalized Inverses: Theory and Applications, 2nd edition. New York, Springer.
    Brunet, J.-P., P. Tamayo, et al. (2004). "Metagenes and molecular pattern discovery using matrix factorization." PNAS 101(12): 4164–4169.
    Devarajan, K. (2008). "Nonnegative matrix factorization: an analytical and interpretive tool in computational biology." PLoS Computational Biology 4(7): e1000029.
    Frigyesi, A. and M. Höglund (2008). "Non-negative matrix factorization for the analysis of complex gene expression data: identification of clinically relevant tumor subtypes." Cancer Informatics 6: 275–292.
    Gaujoux, R. and C. Seoighe (2010). "A flexible R package for nonnegative matrix factorization." BMC Bioinformatics 11: 367.
    GEO. "GEO website." Available from http://www.ncbi.nlm.nih.gov/geo/.
    Lawton, W. H. and E. A. Sylvestre (1971). "Self modeling curve resolution." Technometrics 13(3): 617+.
    Lee, D. D. and H. S. Seung (1999). "Learning the parts of objects by non-negative matrix factorization." Nature 401(6755): 788–791.
    Lee, D. D. and H. S. Seung (2001). "Algorithms for non-negative matrix factorization." Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference. MIT Press: 556–562.
    Liebermeister, W. (2002). "Linear modes of gene expression determined by independent component analysis." Bioinformatics 18(1): 51–60.
    Paatero, P. and U. Tapper (1994). "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values." Environmetrics 5: 111–126.
    Raychaudhuri, S., J. M. Stuart, et al. (2000). "Principal components analysis to summarize microarray experiments: application to sporulation time series." Pacific Symposium on Biocomputing 5: 452–463.
    Tamayo, P., D. Scanfeld, et al. (2007). "Metagene projection for cross-platform, cross-species characterization of global transcriptional states." PNAS 104(14): 5959–5964.
