
Graduate student: 蔡東霖
Tsai, Tung-Lin
Thesis title: 基於機器學習預測血液感染的可解釋血液分析法
An explainable hematology data analyzer for predicting blood stream infection based on machine learning
Advisors: 洪健中
Hong, Chien-Chong
楊晶安
Yang, Chin-An
Oral defense committee: 劉通敏
Liiu, Tong-Miin
王信堯
Wang, Hsing-Yao
Degree: Master
Department: College of Engineering - Department of Power Mechanical Engineering
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: English
Number of pages: 78
Keywords (Chinese): 血液感染、血液分析儀、機器學習、阻抗直方圖訊號、早期臨床決策
Keywords (English): Blood stream infection, Hematology analyzer, Impedance histogram, Machine learning, Early clinical decision making
    Early detection of severe blood stream infection is essential for starting treatment promptly. However, the parameters currently used to identify bacteremia, such as complete blood count (CBC), differential count (DC), changes in blood cell morphology, elevated C-reactive protein (CRP), and positive blood culture, are time-consuming to obtain, taking from 15 minutes to 7 days.
    In this thesis, we develop a blood stream infection prediction system built with machine learning methods on integrated retrospective data: hematology analyzer impedance histogram signals from the CBC, blood culture reports, and CRP levels, all measured from the first blood draw of patients visiting the emergency department (ED). To our knowledge, this is the first predictor based on hematology analyzer impedance histogram signals, and it achieves 70% to 80% sensitivity in detecting blood cell morphologies correlated with active infection and inflammation. Furthermore, a positive prediction correlates with the need for hospital admission for intravenous antibiotics. The proposed approach can assist early clinical decision making and antimicrobial treatment.

    Chinese Abstract
    Abstract
    Acknowledgment
    Glossary
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Blood Stream Infection
      1.2 Current Blood Stream Infection Diagnosing Research
        1.2.1 Blood Culture
        1.2.2 Biomarker Analysis
        1.2.3 Blood Stream Infection Patient Record Analysis
      1.3 Diagnostic Methods Using Hematology Analyzers
        1.3.1 Complete Blood Count (CBC)
        1.3.2 Hematology Histograms
      1.4 Machine Learning
        1.4.1 Basics of Machine Learning
        1.4.2 Basics of Deep Learning
        1.4.3 Encoder and Decoder Task
        1.4.4 Deep Learning in Blood Stream Infection Prediction
      1.5 Research Motivation
      1.6 Research Objectives
      1.7 Thesis Organization
    Chapter 2 Methods
      2.1 Ensemble Learning
      2.2 Random Forest
      2.3 Extreme Gradient Boosting
      2.4 TabNet
        2.4.1 Encoder
        2.4.2 Decoder
      2.5 Long Short-Term Memory
      2.6 Overall Workflow
        2.6.1 Data Collection and Classifier Establishment
        2.6.2 Environments Configurations
        2.6.3 Training Pipeline
        2.6.4 Preprocess
        2.6.5 Postprocess
    Chapter 3 Experimental Results and Discussion
      3.1 Model Establishment
        3.1.1 Random Forest Hyperparameters
        3.1.1 XGBoost Hyperparameters
        3.1.2 TabNet Hyperparameters
        3.1.3 LSTM Hyperparameters
      3.2 Classifiers Performance
        3.2.1 Model Performance of Blood Culture Classifier
        3.2.2 Model Performance of CRP Classifiers
      3.3 Feature Importances
        3.3.1 Statistical Analysis
        3.3.2 Morphology Analysis
        3.3.3 Correlation of Positive Blood Culture Classifier with the Need for Admission in Second Independent Testing
      3.4 Discussion of the Classifiers' Performance
        3.4.1 Analysis of Confusion Matrix of First Independent Testing
        3.4.2 Analysis of Different Performances of the Models
      3.5 Summary
    Chapter 4 Conclusion and Future Works
      4.1 Conclusion
      4.2 Research Contribution
      4.3 Future Works
    References
    Author Profile
    Journal Papers

