| Graduate student: | 劉亭利 Liou, Ting-Li |
|---|---|
| Thesis title: | 疾病預測於保險上之應用:以機器學習的方法建立疾病預測模型 Application of the Disease Prediction in Insurance: Disease Prediction Model by Machine Learning |
| Advisor: | 韓傳祥 Han, Chuan-Hsiang |
| Oral defense committee: | 黃能富 Huang, Nen-Fu; 丁台怡 Ding, Tai-Yi |
| Degree: | Master |
| Department: | College of Technology Management - Department of Quantitative Finance |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 |
| Language: | Chinese |
| Number of pages: | 32 |
| Keywords (Chinese): | 機器學習、心臟病預測、保險、監督式學習、分類演算法、特徵重要性 |
| Keywords (English): | Machine learning, heart disease prediction, insurance, supervised learning, classification algorithms, feature importance |
This thesis uses machine learning to build disease prediction models, with heart disease as the target disease. Within the general framework of disease prediction, it builds a short-term and a medium-term prediction model for heart disease and discusses the value of each model, separately and combined, for insurance applications, with the aim of addressing pain points in the insurance industry. Recall is the primary evaluation criterion. In the short-term model, the logistic regression classifier achieves the best recall (85.71%), which rises to 88.09% after feature selection. The most important indicators for short-term prediction are chest pain type, exercise-induced angina, the number of major vessels colored by fluoroscopy, thallium scan defect type, the slope of the peak exercise ST segment, and sex. In the medium-term model, SMOTE combined with a Random Forest classifier achieves the best recall (68.57%), which rises to 70.86% after feature selection. The most important features for medium-term prediction are age, current smoking status, sex, average number of cigarettes smoked per day, prevalent stroke, and prevalent hypertension. The thesis proposes using short-term prediction to address pain points in insurance claims and fraud, and medium-term prediction to optimize and improve sales, underwriting, and claims operations. Finally, combining the two models supports more refined policy planning for customers, which brings added value to insurers and improves the customer experience.
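As a rough illustration of two ideas named in the abstract, the sketch below computes recall from predicted labels and generates synthetic minority-class samples by SMOTE-style interpolation. It is a minimal, standard-library-only sketch and not the thesis's implementation: the function names `recall` and `smote_like`, the neighbour count `k`, and all data values are hypothetical, and the simplified oversampler stands in for whatever SMOTE implementation the author actually used.

```python
import random

def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN): the share of actual positives that were found."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical toy labels, not taken from the thesis.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
recall_value = recall(y_true, y_pred)  # 3 of the 4 actual positives are found

# Hypothetical two-feature minority-class samples.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_like(minority, n_new=6)
print(f"recall = {recall_value:.2f}, synthetic samples = {len(new_points)}")
```

Recall suits a screening setting like this one because it counts how many true cases the model catches, so a missed case (a false negative) lowers the score directly. The oversampler only interpolates between existing minority samples, so every synthetic point stays inside the range spanned by the original minority data.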