Graduate student: | 余宣汶 Yu, Hsuan-Wen |
---|---|
Thesis title: | 基於合成影像與公開資料集用於醫療影像之聯邦學習研究 Dynamically synthetic images with open datasets for Federated learning |
Advisors: | 銀慶剛 Ing, Ching-Kang; 盧鴻興 Lu, Horng-Shing |
Oral defense committee: | 杜憶萍 Tu, I-Ping; 陳素雲 Huang, Su-Yun |
Degree: | 碩士 Master |
Department: | 理學院 - 統計學研究所 Institute of Statistics |
Year of publication: | 2023 |
Academic year of graduation: | 111 |
Language: | Chinese |
Number of pages: | 47 |
Chinese keywords: | 深度學習 (deep learning), 聯邦學習 (federated learning), 醫學影像 (medical imaging) |
English keywords: | COVID-19, Deep learning, Federated learning |
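Deep learning models for medical diagnosis require large amounts of data, but privacy
regulations make it difficult to collect data from multiple medical institutions.
Federated learning offers a solution: models can be trained jointly on data distributed
across institutions without centralizing the data. However, federated learning models
may be biased toward institutions with larger training datasets, a potential problem
that requires attention. Our study introduces a new technique that combines dynamically
synthesized images for federated learning with open datasets (FLOMS), which effectively
combines information from open datasets and from local medical institutions with
heterogeneous data types. The key feature of FLOMS is its ability to dynamically
generate synthetic images that mimic the local data the current model misclassifies,
together with information from open datasets that helps training when only a few images
are available. This allows the development of a global model that better handles the
diversity of data types collected by local institutions, in particular by training on
synthetic images that resemble the misclassified cases in local datasets. In evaluating
our model, we prioritize accuracy on each client's dataset. Experimental results show
that FLOMS outperforms conventional federated learning in terms of model accuracy.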
Deep learning models for medical diagnosis require large amounts of data, which are
difficult to collect from multiple medical institutions because of privacy regulations.
Federated Learning (FL) offers a solution by enabling joint training of models on
data distributed across various institutions, without requiring the data to be
centralized. However, FL models can be biased towards institutions with larger
training datasets. Our study introduces a new technique, Dynamically Synthetic Images
for Federated Learning with Open Datasets (FLOMS), that effectively combines
information from an open dataset and from local medical institutions with heterogeneous
data types. The key feature of FLOMS is its ability to dynamically generate synthetic
images that mimic the local data the current model misclassifies, and to draw on
information from open datasets that helps training when only a small number of images
is available. This allows the development of a global model that better handles the
diversity of data collected by local institutions, in particular by training on images
that resemble the misclassified cases in local datasets. When evaluating the
performance of our model, we prioritize the accuracy on each client's individual
dataset as well as the overall accuracy. The experimental results show that FLOMS
outperforms the conventional FL approach in terms of model accuracy.
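
The two ideas in the abstract can be illustrated with a small sketch. It is not the
thesis's implementation: the abstract does not specify the aggregation rule or the
image generator, so the code below assumes size-weighted parameter averaging (which
shows why larger institutions dominate the global model) and a simple interpolation
between a misclassified sample and another sample of the same class as a stand-in for
synthetic image generation. The function names `fedavg` and
`synthesize_from_misclassified`, and the use of flat feature vectors instead of
images, are hypothetical choices made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> Vector:
    """Average client parameter vectors, weighted by local dataset size.

    Assumption: size-weighted averaging; the abstract does not name the rule.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def synthesize_from_misclassified(
    samples: Sequence[Vector],
    labels: Sequence[int],
    predict: Callable[[Vector], int],
    n_new: int,
    rng: random.Random,
) -> List[Tuple[Vector, int]]:
    """Create synthetic samples near the ones the current model gets wrong.

    Hypothetical stand-in for the synthesis step: each new sample is a random
    point on the segment between a misclassified sample and another sample of
    the same class, and it inherits that class label.
    """
    same_class = {}
    for x, y in zip(samples, labels):
        same_class.setdefault(y, []).append(x)

    # Samples whose predicted label disagrees with the true label.
    hard = [(x, y) for x, y in zip(samples, labels) if predict(x) != y]
    if not hard:
        return []

    synthetic = []
    for _ in range(n_new):
        x, y = rng.choice(hard)
        z = rng.choice(same_class[y])
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(([a + lam * (b - a) for a, b in zip(x, z)], y))
    return synthetic


if __name__ == "__main__":
    # Two clients with very different dataset sizes: the average is pulled
    # toward the parameters of the larger client.
    print(fedavg([[0.0, 0.0], [1.0, 1.0]], [900, 100]))  # [0.1, 0.1]
```

In this sketch the client holding 900 of the 1,000 samples contributes 90% of the
averaged parameters, which is the kind of imbalance the abstract describes. Adding
synthetic samples around a small client's misclassified cases is one plausible way to
give those cases more weight during local training.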