Graduate student: | 余觀至 Yu, Kuan-Chih |
---|---|
Thesis title: | 應用連續侷限型波茲曼演算法及資料探勘模型分析電子鼻感測資料以鑑別慢性阻塞性肺疾病患者 Recognition of Patients with Chronic Obstructive Pulmonary Disease by Applying Continuous Restricted Boltzmann Machine and Data-Mining Methods to Sensory Data of E-Nose |
Advisor: | 陳新 Chen, Hsin |
Oral defense committee: | 鄭桂忠 Tang, Kea-Tiong; 李祈均 Lee, Chi-Chun |
Degree: | 碩士 Master |
Department: | 電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
Year of publication: | 2017 |
Academic year of graduation: | 106 |
Language: | Chinese |
Number of pages: | 169 |
Keywords (Chinese): | 機器學習, 連續侷限性波茲曼模型, 慢性阻塞性肺疾病, 指紋辨識 |
Keywords (English): | Machine learning, Continuous Restricted Boltzmann Machine, Chronic Obstructive Pulmonary Disease, Pattern recognition |
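The purpose of this thesis is to use machine-learning algorithms to recognize Chronic Obstructive Pulmonary Disease (COPD). Previous literature shows that patients with COPD exhale specific organic compounds. An electronic nose system can recognize COPD because it combines gas sensors with a microprocessor that stores the recognition method. With different sensors installed, an electronic nose system can be used to recognize different odors.
To reach this goal, the thesis follows a two-stage procedure: (1) data preprocessing and (2) recognition algorithms. For data preprocessing, it proposes the following steps: (1) baseline manipulation, (2) receiver operating characteristic (ROC) curve, and (3) normalization. For recognition, it uses the following methods: (1) support vector machine, (2) linear discriminant analysis, (3) linear programming, and (4) a probabilistic classifier. In addition, because the Continuous Restricted Boltzmann Machine (CRBM) can learn and performs nonlinear computation, the thesis examines how the CRBM affects the linear classifiers. Once a CRBM has learned the data distribution of a given class, the learned distribution is reconstructed after many sampling steps, regardless of the distribution of the input data. Based on this property, the thesis develops a CRBM probability model to serve as the probabilistic classifier.
The results show that none of the recognition methods can effectively recognize unknown data, because the preprocessed COPD data overlap severely and the algorithms cannot tell the classes apart. Examining the preprocessing steps shows that the ROC curve step eliminates important features; without the ROC curve step, the distribution of the COPD data improves.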
The purpose of this thesis is to recognize Chronic Obstructive Pulmonary Disease (COPD) by applying machine-learning algorithms. Previous literature has confirmed that most patients suffering from COPD exhale specific organic compounds. COPD could thus be diagnosed by using machine-learning algorithms to classify the sensory data of an electronic nose. An electronic nose (e-Nose) consists of an array of diverse neuromorphic sensors, each of which exhibits its own characteristic response to different odorants. Therefore, this study aims to identify a machine-learning algorithm able to detect COPD by classifying the sensory data of an e-Nose.
To ease data classification, the following methods are employed to preprocess the e-Nose data: (1) baseline manipulation, (2) sensor selection based on the receiver operating characteristic (ROC) curve, and (3) normalization. For data classification, the performance of the following three linear classifiers is compared: (1) the support vector machine, (2) linear discriminant analysis, and (3) linear programming. In addition, the Continuous Restricted Boltzmann Machine (CRBM) is employed as a nonlinear, probabilistic classifier, and how the CRBM could improve the classification task is further explored. Because the CRBM learns to regenerate its training data, an algorithm for estimating the likelihood of unknown data under a CRBM model is developed. This estimation algorithm enables the CRBM to function reliably as a probabilistic classifier.
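As a rough illustration of the regeneration property described above, the following sketch runs the alternating sampling chain of a CRBM-style network. It assumes the continuous stochastic unit usually associated with Chen and Murray's CRBM formulation, s_j = phi_j(sum_i w_ij s_i + sigma * N(0, 1)), with phi_j a sigmoid of slope a_j bounded by two asymptotes. Bias units are omitted, and the function names, weights, and parameter values are hypothetical. The thesis's likelihood-estimation algorithm is not reproduced here.

```python
import math
import random


def unit(x, slope, theta_l=-1.0, theta_h=1.0):
    """Sigmoid with the given slope, bounded by the asymptotes theta_l and theta_h."""
    return theta_l + (theta_h - theta_l) / (1.0 + math.exp(-slope * x))


def sample_layer(inputs, weights, slopes, sigma, rng):
    """Sample one layer given the other layer's states.

    weights[j] holds the weights feeding unit j; Gaussian noise of
    standard deviation sigma is added before the sigmoid (assumed form).
    """
    states = []
    for w_row, slope in zip(weights, slopes):
        total = sum(w * s for w, s in zip(w_row, inputs))
        states.append(unit(total + sigma * rng.gauss(0.0, 1.0), slope))
    return states


def reconstruct(visible, W, a_hid, a_vis, sigma=0.2, steps=20, seed=0):
    """Alternate hidden and visible sampling for `steps` rounds.

    W has one row per hidden unit and one column per visible unit.
    """
    rng = random.Random(seed)
    W_T = [list(col) for col in zip(*W)]  # one row per visible unit
    v = list(visible)
    for _ in range(steps):
        h = sample_layer(v, W, a_hid, sigma, rng)
        v = sample_layer(h, W_T, a_vis, sigma, rng)
    return v


# Hypothetical, untrained weights: 3 hidden units, 2 visible units.
W_DEMO = [[0.5, -0.5], [0.3, 0.8], [-0.2, 0.1]]
v_final = reconstruct([0.9, -0.9], W_DEMO, a_hid=[1.0, 1.0, 1.0], a_vis=[1.0, 1.0])
```

With untrained weights the chain only shows the mechanics. In a trained model, the visible states after many steps would be expected to resemble the learned training distribution whatever the starting point, which is the property the probabilistic classifier builds on.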
However, our experimental results indicate that none of the algorithms is able to recognize unknown data, because the different classes of pre-processed COPD data overlap significantly. Further analysis indicates that sensor selection based on the ROC curve filters out some important dimensions. Therefore, a better classification result is achieved without the sensor selection.
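One way per-sensor ROC selection can discard useful dimensions is sketched below. The example assumes selection keeps a sensor only if its individual area under the ROC curve (AUC) reaches a threshold. The threshold, the function names, and the toy readings are hypothetical and are not taken from the thesis data.

```python
def auc(pos, neg):
    """AUC of one feature by pairwise comparison; ties count as 0.5."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_sensors(x_pos, x_neg, threshold):
    """Indices of sensors whose individual AUC (in either direction) meets the threshold."""
    kept = []
    for j in range(len(x_pos[0])):
        a = auc([row[j] for row in x_pos], [row[j] for row in x_neg])
        if max(a, 1.0 - a) >= threshold:
            kept.append(j)
    return kept


# Toy readings, one row per sample and one column per sensor.
# Sensors 0 and 1 overlap between the classes when viewed one at a time,
# but their difference is +1 for every positive and -1 for every negative.
X_POS = [[1, 0, 5], [2, 1, 6], [3, 2, 7], [4, 3, 8]]
X_NEG = [[0, 1, 1], [1, 2, 2], [2, 3, 3], [3, 4, 4]]

kept = select_sensors(X_POS, X_NEG, threshold=0.8)
auc_sensor0 = auc([r[0] for r in X_POS], [r[0] for r in X_NEG])
auc_difference = auc([r[0] - r[1] for r in X_POS], [r[0] - r[1] for r in X_NEG])
```

In this toy case only sensor 2 survives the selection. Sensors 0 and 1 are dropped even though a linear combination of them separates the two classes perfectly, which is one possible reason a univariate filter can hurt a downstream linear classifier.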