An Explainable Hematology Data Analyzer for Predicting Blood Stream Infection Based on Machine Learning

Graduate student:	蔡東霖 Tsai, Tung-Lin
Thesis title:	An explainable hematology data analyzer for predicting blood stream infection based on machine learning (基於機器學習預測血液感染的可解釋血液分析法)
Advisors:	洪健中 Hong, Chien-Chong; 楊晶安 Yang, Chin-An
Oral defense committee:	劉通敏 Liiu, Tong-Miin; 王信堯 Wang, Hsing-Yao
Degree:	Master
Department:	College of Engineering - Department of Power Mechanical Engineering
Year of publication:	2024
Academic year of graduation:	112
Language:	English
Number of pages:	78
Keywords:	Blood stream infection, Hematology analyzer, Machine learning, Impedance histogram signal, Early clinical decision making

Early detection of severe blood stream infection is essential for starting treatment promptly. However, the parameters currently used to identify bacteremia, such as complete blood count (CBC), differential count (DC), changes in blood cell morphology, elevated C-reactive protein (CRP), and positive blood culture, take between 15 minutes and 7 days to obtain.
In this thesis, we developed a blood stream infection prediction system using machine learning methods. The system was built on integrated retrospective data: hematology analyzer impedance histogram signals from CBC tests, blood culture reports, and CRP levels, all measured in the first blood draw of patients visiting the emergency department (ED). To our knowledge, this is the first predictor of blood stream infection based on hematology analyzer impedance histogram signals, and it achieves 70% to 80% sensitivity in detecting blood cell morphologies correlated with active infection and inflammation. Furthermore, positive predictions are correlated with the need for hospital admission for intravenous antibiotics. The proposed approach can assist early clinical decision making and antimicrobial treatment.
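
As a reminder of what the sensitivity figure in the abstract measures, here is a minimal sketch of how sensitivity and specificity are computed from binary predictions. The labels and predictions are made-up illustrative values, not data from the thesis, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 means positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true positives that the classifier flags: TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true negatives that the classifier clears: TN / (TN + FP)."""
    _, fp, _, tn = confusion_counts(y_true, y_pred)
    return tn / (tn + fp) if (tn + fp) else float("nan")


# Illustrative values only (assumption): 1 = positive blood culture, 0 = negative.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

print("sensitivity:", sensitivity(y_true, y_pred))
print("specificity:", specificity(y_true, y_pred))
```

In a screening setting such as early infection detection, sensitivity is usually the headline figure because a missed positive (false negative) delays treatment.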

Chinese Abstract
Abstract
Acknowledgment
Glossary
List of Figures
List of Tables
Chapter 1    Introduction
1 Blood Stream Infection
2 Current Blood Stream Infection Diagnosing Research
2.1 Blood Culture
2.2 Biomarker Analysis
2.3 Blood Stream Infection Patient Record Analysis
3 Diagnostic Methods Using Hematology Analyzers
3.1 Complete Blood Count (CBC)
3.2 Hematology Histograms
4 Machine Learning
4.1 Basics of Machine Learning
4.2 Basics of Deep Learning
4.3 Encoder and Decoder Task
4.4 Deep Learning in Blood Stream Infection Prediction
5 Research Motivation
6 Research Objectives
7 Thesis Organization
Chapter 2    Methods
1 Ensemble Learning
2 Random Forest
3 Extreme Gradient Boosting
4 TabNet
4.1 Encoder
4.2 Decoder
5 Long Short-Term Memory
6 Overall Workflow
6.1 Data Collection and Classifier Establishment
6.2 Environment Configurations
6.3 Training Pipeline
6.4 Preprocess
6.5 Postprocess
Chapter 3    Experimental Results and Discussion
1 Model Establishment
1.1 Random Forest Hyperparameters
1.2 XGBoost Hyperparameters
1.3 TabNet Hyperparameters
1.4 LSTM Hyperparameters
2 Classifiers Performance
2.1 Model Performance of Blood Culture Classifier
2.2 Model Performance of CRP Classifiers
3 Feature Importances
3.1 Statistical Analysis
3.2 Morphology Analysis
3.3 Correlation of Positive Blood Culture Classifier with the Need for Admission in Second Independent Testing
4 Discussion of the Classifiers’ Performance
4.1 Analysis of Confusion Matrix of First Independent Testing
4.2 Analysis of Different Performances of the Models
5 Summary
Chapter 4    Conclusion and Future Works
1 Conclusion
2 Research Contribution
3 Future Works
References
Author Profile
Journal Papers


                                

