Graduate student: | 陳威寧 Chen, Wei-Ning |
---|---|
Thesis title: | Predicting the prognosis of non-small cell lung cancer by integrating microarray and clinical data with deep learning |
Advisor: | 林澤 Lin, Che |
Oral defense committee: | 吳賜猛 Wu, Shy-Meeng; 李祈均 Lee, Chi-Chun; 曹昱 Tsao, Yu; 賴昱衡 Lai, Yu-Heng |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2018 |
Academic year of graduation: | 106 (ROC academic year) |
Language: | English |
Number of pages: | 49 |
Chinese keywords: | 非小細胞肺癌、特徵選取、深度類神經網路、結合資料 |
English keywords: | non-small cell lung cancer, feature selection, deep neural network, data integration |
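Lung cancer is the leading cause of death among cancer patients, and non-small cell lung cancer accounts for the majority of lung cancer cases. Many past studies focused on using gene expression levels to predict the prognosis of lung cancer patients. In recent years, studies have indicated that combining different types of data can provide better predictions, and deep learning methods have already been used to combine heterogeneous data such as speech, images, and video for more accurate prediction. Identifying new cancer prognostic markers can help us relate expression levels to survival and choose the appropriate treatment. Here we use deep learning to provide accurate prognosis prediction for patients with non-small cell lung cancer.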
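We used data from 614 lung adenocarcinoma patients and, for each of 7 well-known biomarkers, divided the patients into two subgroups (marker-/marker+). From these subgroups we calculated prognostic protein relevance values (PPRV) and built 7 PPRV lists. We then used the 7 PPRV lists to select 8 additional prognostic markers, and used the 7 well-known markers together with these 8 markers as the patients' gene features to predict 5-year survival. Our results show the predictive performance of the DNN model (AUC = 0.7926; Accuracy = 74.85%). Combining gene expression with clinical data improved the prediction further (AUC = 0.8163; Accuracy = 75.44%). Because the patient labels in the dataset are imbalanced, we also used the Youden index for reclassification, and the reclassified results were better than the previous results in both prediction and survival analysis.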
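The AUC values reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition, not the thesis's evaluation code; the function name `auc_from_scores` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the thesis): 1 = positive class, 0 = negative class.
toy_labels = [1, 1, 1, 0, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.7, 0.6, 0.55]
toy_auc = auc_from_scores(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.4f}")
```

The double loop is quadratic in the number of samples, which is acceptable for a cohort of a few hundred patients; a rank-based formulation would be the usual choice for larger data.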
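In this study, we integrated systems biology and deep learning to design an integrative DNN, whose predictive performance is superior to that of all other existing methods. We believe that such accurate prediction can help oncologists and physicians provide a suitable type of adjuvant therapy for each patient. This research also lays a foundation for personalized and precision therapy.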
Background
According to statistics, over a quarter of cancer deaths are due to lung cancer, and almost 85% of lung cancer patients suffer from non-small cell lung cancer (NSCLC). Accurate prediction of the prognosis of NSCLC patients is hence an important research topic. A recent trend in such prediction is to combine heterogeneous data sources to obtain better results. Deep neural networks (DNNs) have been reported to combine heterogeneous data sources with great success in the signal processing of voice, image, and video. However, there have been limited efforts to apply DNNs to personalized and precision therapy.
Result
Based on microarray data from a cohort of 614 lung adenocarcinoma (ADC) patients, we used 7 well-known NSCLC markers to split patients into marker- and marker+ subgroups, calculated prognostic protein relevance values (PPRV), and built 7 PPRV lists from which we selected 8 additional prognostic gene markers. We used the 7 well-known and 8 additional prognostic gene markers as features and a DNN as the classifier to predict the 5-year survival of ADC patients. The resulting microarray DNN (AUC=0.7926, accuracy=74.85%) was superior to all the other methods in terms of AUC. Furthermore, we combined gene expression and clinical data in an integrative DNN based on the concept of bimodal learning, which improved the prediction results (AUC=0.8163, accuracy=75.44%). Finally, because the class labels in the data are imbalanced, we used the Youden index for reclassification. The results after reclassification were better than the previous results and those of all other existing methods in both prediction and survival analysis. The proposed integrative DNN was also shown to generalize better to an independent validation set.
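The Youden-index reclassification mentioned above can be illustrated as follows. The Youden index is J = sensitivity + specificity - 1, and the cut-off that maximizes J replaces a default threshold such as 0.5. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function name `youden_cutoff`, the convention that a score at or above the threshold is predicted positive, and the toy data are all hypothetical.

```python
def youden_cutoff(labels, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Assumed convention: a sample is predicted positive when score >= threshold.
    Candidate thresholds are the distinct observed scores; on ties in J the
    smallest such threshold is kept.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical imbalanced toy data (not from the thesis): six positives, two negatives.
toy_labels = [1, 1, 1, 1, 1, 1, 0, 0]
toy_scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.65, 0.70, 0.60]
cutoff, j_stat = youden_cutoff(toy_labels, toy_scores)
print(f"cut-off = {cutoff}, J = {j_stat:.4f}")
```

In this toy case every score exceeds 0.5, so a fixed 0.5 threshold would label all samples positive (specificity 0, J = 0), whereas the Youden cut-off trades a small loss in sensitivity for full specificity.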
Conclusion
In this study, we integrated systems biology and deep learning approaches to devise an integrative DNN. The integrative DNN outperforms all the other existing methods in predictive performance. We believe that our accurate prediction can help oncologists and physicians decide on a suitable type of adjuvant treatment for individual patients and can build the foundation of personalized and precision therapy.