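Graduate student: 余宣汶
Yu, Hsuan-Wen
Thesis title: 基於合成影像與公開資料集用於醫療影像之聯邦學習研究
Dynamically synthetic images with open datasets for Federated learning
Advisors: 銀慶剛
Ing, Ching-Kang
盧鴻興
Lu, Horng-Shing
Oral defense committee: 杜憶萍
Tu, I-Ping
陳素雲
Huang, Su-Yun
Degree: Master
Department: College of Science - Institute of Statistics
Year of publication: 2023
Academic year of graduation: 111
Language: Chinese
Number of pages: 47
Chinese keywords: 深度學習 (Deep learning), 聯邦學習 (Federated learning), 醫學影像 (Medical imaging)
Foreign-language keywords: COVID-19, Deep learning, Federated learning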
  • Deep learning models for medical diagnosis require large amounts of data, which
    are difficult to collect from multiple medical institutions because of privacy
    regulations. Federated learning (FL) offers a solution by enabling models to be
    trained jointly on data distributed across institutions, without centralizing
    the data. However, FL models can be biased towards institutions with larger
    training datasets. Our study introduces a new technique, Dynamically Synthetic
    Images for Federated Learning with Open Datasets (FLOMS), that effectively
    combines information from open datasets and from local medical institutions with
    heterogeneous data types. The key feature of FLOMS is that it dynamically
    generates synthetic images that mimic the local data the current model
    misclassifies, and it draws on information from open datasets to help training
    when only a small number of images is available. This allows the development of
    a global model that better handles the diversity of the data collected by local
    institutions, in particular by training on synthetic images that resemble the
    misclassified cases in local datasets. When evaluating our model, we prioritize
    the accuracy on each client's individual dataset as well as the overall
    accuracy. The experimental results show that FLOMS outperforms the conventional
    FL approach in terms of model accuracy.
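
As a rough illustration of the two ideas the abstract relies on, the sketch below shows (1) federated averaging weighted by local dataset size, which is why a large institution can dominate the global model, and (2) a simplified SMOTE-style interpolation that creates synthetic samples from the cases the current model misclassifies. It is not the thesis's FLOMS implementation: the function names `fedavg` and `synthesize_from_misclassified`, the flat NumPy parameter vectors, the toy data, and the random pairing of same-class misclassified samples (in place of a nearest-neighbour search) are all assumptions made for illustration.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)  # shape: (n_clients, n_params)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()


def synthesize_from_misclassified(x, y_true, y_pred, n_new, rng):
    """Create n_new synthetic samples by interpolating between pairs of
    misclassified samples that share the same true label.

    Simplification (assumption): partners are drawn at random rather than
    from the k nearest neighbours, as full SMOTE would do.
    """
    wrong = np.flatnonzero(y_true != y_pred)  # indices the model got wrong
    if wrong.size == 0 or n_new == 0:
        return x[:0].copy(), y_true[:0].copy()

    new_x, new_y = [], []
    for _ in range(n_new):
        i = rng.choice(wrong)
        same_class = wrong[y_true[wrong] == y_true[i]]
        j = rng.choice(same_class)  # may equal i, which copies the sample
        lam = rng.random()          # interpolation factor in [0, 1)
        new_x.append(x[i] + lam * (x[j] - x[i]))
        new_y.append(y_true[i])
    return np.array(new_x), np.array(new_y)


# Toy example: a client with 900 samples outweighs one with 100 samples.
global_w = fedavg([np.array([0.0, 0.0]), np.array([1.0, 1.0])],
                  client_sizes=[900, 100])
print(global_w)  # [0.1 0.1]

# Toy example: six 4-feature "images"; samples 1, 2 and 4 are misclassified.
rng = np.random.default_rng(0)
x = rng.random((6, 4))
y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0, 1])
new_x, new_y = synthesize_from_misclassified(x, y_true, y_pred, n_new=5, rng=rng)
print(new_x.shape, new_y)
```

In the first toy example the global parameters land much closer to the large client's values, which is the size bias the abstract describes. In the second, the synthetic samples could be appended to a client's local training set before the next communication round, although how and when FLOMS does this is not specified in this record.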

    Contents
    Abstract (Chinese)
    Acknowledgements (Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    List of Algorithms
    1 Introduction
    2 Literature review
    3 Backbone Technique
      3.1 Convolutional neural networks
        3.1.1 Transfer learning
        3.1.2 EfficientNet
      3.2 Federated learning
        3.2.1 Federated averaging algorithm
      3.3 Synthetic Method
        3.3.1 Synthetic Minority Oversampling
        3.3.2 Borderline SMOTE
    4 Methodology
      4.1 Problem statement
      4.2 Method design
        4.2.1 Dynamically synthetic images
        4.2.2 Monitor the best communication round
        4.2.3 Open datasets
    5 Materials and Experimental settings
      5.1 COVID-19 Chest X-ray Image Datasets
        5.1.1 Fundus Image Data
      5.2 Skin datasets
      5.3 Experiment setting of Dynamically synthetic images
      5.4 Experiment setting of open datasets
      5.5 Data Preprocessing and Augmentation
    6 Experiment results
      6.1 Dynamically synthetic images
        6.1.1 COVID-19 datasets
        6.1.2 Fundus images datasets
        6.1.3 Skin datasets
      6.2 Open datasets
        6.2.1 Skin datasets
        6.2.2 COVID datasets
        6.2.3 Fundus images datasets
    7 Conclusion and future work
    Bibliography
