
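Graduate student: 吳俊良
Wu, Chun-Liang
Thesis title: 使用線性混和模型分解血球細胞組成
Blood Cell Deconvolution with Linear Mixed Model
Advisor: 謝文萍
Hsieh, Wen-Ping
Committee members: 黃冠華
徐南蓉
Degree: Master
Department: College of Science - Institute of Statistics
Year of publication: 2018
Academic year of graduation: 106
Language: English
Number of pages: 54
Keywords (translated from Chinese): deconvolution, mRNA, microarray, linear
Keywords (English): Deconvolution, microarray, Linear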
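  • Blood is a mixture of different cell types. Each cell type has its own
    gene expression profile, and different cell types are correlated to some
    degree. Using gene expression levels to decompose the mixture into
    individual cell types and obtain their respective proportions is therefore
    a difficult problem.
    Previous models first model the gene expression of the individual cell
    types and then use that model to resolve the proportions of the mixed cells
    in whole blood. We propose a linear mixed model (Linear Mixed Model for
    Deconvolution, LMMD) that models the individual cell types and the mixed
    cells simultaneously, sharing the same parameters and jointly estimating
    the expression of each cell type in the mixture, while also yielding the
    unknown cell proportions. We also establish signature gene selection
    criteria specifically for LMMD. We compare the LMMD analysis pipeline with
    four other models; LMMD performs better when the individual cell type data
    and the mixture data come from different experiments.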


    Gene expression measured in blood is a mixture of contributions from
    different cell types. Each cell type has its own specific profile, and
    different cell types may also be correlated. Hence, decomposing the mixed
    expression profiles into cell type-specific expression profiles and their
    respective cellular proportions is a difficult problem. Previous studies
    usually build models on reference data that provide cell-specific profiles.
    We propose a Linear Mixed Model for Deconvolution (LMMD) that estimates
    cell-specific expression levels by modeling the reference profiles and the
    mixture together in a single framework, and obtains the unknown cellular
    proportions at the same time. We establish signature gene selection
    criteria for LMMD and compare it with four other models. LMMD performs
    better when the reference data and the mixture data come from different
    experiments.
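    As a rough illustration of the reference-based approach the abstract
    contrasts LMMD with, the sketch below fits a mixture profile as a
    non-negative combination of reference profiles and rescales the
    coefficients into proportions. It is not the LMMD model. It is a toy
    non-negative least squares fit of the kind listed among the comparison
    models, solved here by projected gradient descent so that it needs only the
    Python standard library. The function name, the signature matrix, and the
    proportions are all made-up assumptions for illustration.

```python
def nnls_projected_gradient(signature, mixture, n_iter=2000):
    """Minimise ||S c - m||^2 subject to c >= 0 by projected gradient descent.

    signature[g][k]: expression of gene g in pure cell type k (reference data).
    mixture[g]:      expression of gene g in the mixed sample.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Precompute S^T S and S^T m.
    gram = [[sum(signature[g][i] * signature[g][j] for g in range(n_genes))
             for j in range(n_types)] for i in range(n_types)]
    rhs = [sum(signature[g][i] * mixture[g] for g in range(n_genes))
           for i in range(n_types)]
    # The trace of S^T S bounds its largest eigenvalue, so 1/trace is a safe step.
    step = 1.0 / sum(gram[i][i] for i in range(n_types))
    coef = [0.0] * n_types
    for _ in range(n_iter):
        grad = [sum(gram[i][j] * coef[j] for j in range(n_types)) - rhs[i]
                for i in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        coef = [max(0.0, coef[i] - step * grad[i]) for i in range(n_types)]
    return coef


# Hypothetical toy data: 4 signature genes, 2 cell types, noise-free mixture.
signature = [[10.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 12.0]]
true_props = [0.3, 0.7]
mixture = [sum(s * p for s, p in zip(row, true_props)) for row in signature]

coef = nnls_projected_gradient(signature, mixture)
total = sum(coef)
proportions = [c / total for c in coef]  # rescale so the proportions sum to one
print([round(p, 4) for p in proportions])
```

    On this noise-free toy input the fit recovers proportions close to 0.3 and
    0.7. With real data one would more likely call an established solver such
    as `scipy.optimize.nnls`, and the result depends on which signature genes
    are used and on how well the reference and mixture experiments match.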

    1 Introduction
    2 Methods
      2.1 The algorithm of other deconvolution models compared in this study
        2.1.1 Linear least squares regression (LLSR)
        2.1.2 Non-negative least squares model (NNLS)
        2.1.3 Digital Sorting Algorithm (DSA)
        2.1.4 CIBERSORT
      2.2 Linear Mixed Deconvolution Model
        2.2.1 Main model
        2.2.2 Evaluation Criterion
      2.3 Signature gene selection strategy
    3 Result
      3.1 Evaluation through simulation
      3.2 Real data analysis
        3.2.1 Data description
        3.2.2 Effect of log transformation
        3.2.3 Comparison of models with three benchmark data sets
        3.2.4 Effect of gene selection
        3.2.5 When the cell-specific profiles are from different experiments
    4 Conclusion and Discussion
    References

