
Student: Huang, Chia-Hsin (黃家興)
Thesis title: Image Augmentation for Self-Supervised Contrastive Learning on Image Classification (影像擴增在自監督比對學習的影像分類之研究)
Advisor: Ting, Chuan-Kang (丁川康)
Committee members: Wu, Chien-Wei (吳建瑋); Chou, Che-Wei (周哲維); Wen, Yu-Wei (温育瑋)
Degree: Master
Department: Office of Academic Affairs - Cross-College Executive Master's Program in Intelligent Manufacturing (AIMS Fellows)
Year of publication: 2022
Graduation academic year: 111
Language: Chinese
Number of pages: 49
Keywords: artificial intelligence, computer vision, supervised learning, self-supervised learning, contrastive learning, image augmentation
In recent years, artificial intelligence has matured rapidly, with especially notable progress in computer vision: techniques such as image classification, object detection, and image segmentation have moved from experimental research into widespread practical use. Most of these techniques rely on supervised learning, which requires large amounts of manually labeled data to train models; the labeling process, however, consumes considerable time and cost, and in some domains labeled data are difficult to obtain at all. Self-supervised learning, unlike supervised learning, does not need large amounts of manually labeled data; instead, it learns feature representations by contrasting data samples against one another, extracting more information from the data themselves and reducing the need for labels during model training. The goal of this study is to improve downstream image classification accuracy by optimizing the data augmentation used in self-supervised contrastive learning. Different augmentation experiments were conducted with the Momentum Contrast version 2 (MoCo_v2) model and validated on the public datasets STL-10, CIFAR-10, and CIFAR-100. The results show that the proposed augmentation yields higher downstream accuracy than the baseline pretrained model; with the proposed augmentation, fine-tuning the downstream task on only 10% labeled data surpasses the accuracy the baseline attains with 30% labeled data, reducing the need for manually labeled data by 20 percentage points.
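The augmentation pipelines studied in this kind of contrastive pretraining all start from the same idea: generate two randomly transformed "views" of one image. The following is a minimal NumPy sketch of that idea, not the thesis's actual pipeline (real MoCo_v2-style pipelines also apply color jitter, grayscale conversion, and Gaussian blur); the crop size and image size below are illustrative assumptions.

```python
import numpy as np

def random_crop_flip(img, out_size, rng):
    """Randomly crop a square patch and flip it horizontally with
    probability 0.5 -- the basic spatial augmentations used to build
    contrastive view pairs (simplified sketch)."""
    h, w, _ = img.shape
    top = rng.integers(0, h - out_size + 1)
    left = rng.integers(0, w - out_size + 1)
    patch = img[top:top + out_size, left:left + out_size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]          # horizontal flip
    return patch

rng = np.random.default_rng(1)
img = rng.random((96, 96, 3))           # an STL-10-sized dummy image
view1 = random_crop_flip(img, 64, rng)  # two different random views of
view2 = random_crop_flip(img, 64, rng)  # the same image form a positive pair
print(view1.shape, view2.shape)         # → (64, 64, 3) (64, 64, 3)
```

In contrastive pretraining, `view1` and `view2` would be encoded separately and treated as a positive pair, while views of other images serve as negatives.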


    In recent years, artificial intelligence has shown excellent performance on various machine learning tasks, especially supervised learning in computer vision, achieving outstanding results on a wide range of challenging vision tasks such as classification, object detection, and image segmentation. However, supervised learning requires huge amounts of labeled data, which are not easy to obtain in many applications. Self-supervised learning (SSL) on unlabeled data has emerged as an alternative because it needs less manual annotation: it constructs feature representations through pretext tasks that operate without labels. The goal of this study is to find suitable image augmentation methods and parameter settings for improving self-supervised learning performance. We used Momentum Contrast version 2 (MoCo_v2) as the baseline pretext task and evaluated our augmentation method on the STL-10, CIFAR-10, and CIFAR-100 datasets. The results show that our augmentation method improves image classification accuracy; fine-tuning with only 10% labeled data achieves downstream performance comparable to training with 30% labeled data.
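MoCo-style contrastive pretraining, as described above, trains the encoder with an InfoNCE loss: the query embedding is pulled toward the key from the other augmented view of the same image and pushed away from a queue of negative keys. The following is a minimal NumPy sketch of that loss for a single query; the temperature, embedding dimension, and queue size are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

def info_nce_loss(q, k_pos, k_neg, temperature=0.2):
    """InfoNCE loss for one query against its positive key and a queue
    of negative keys. All vectors are L2-normalized, so dot products
    are cosine similarities."""
    q = q / np.linalg.norm(q)
    k_pos = k_pos / np.linalg.norm(k_pos)
    k_neg = k_neg / np.linalg.norm(k_neg, axis=1, keepdims=True)
    # Logits: similarity to the positive first, then to each negative.
    logits = np.concatenate([[q @ k_pos], k_neg @ q]) / temperature
    # Cross-entropy with the positive at index 0.
    logits -= logits.max()                          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])

rng = np.random.default_rng(0)
q = rng.normal(size=128)                            # query embedding
negatives = rng.normal(size=(16, 128))              # small negative queue
loss_match = info_nce_loss(q, q, negatives)         # positive = same view
loss_rand = info_nce_loss(q, rng.normal(size=128), rng.normal(size=(16, 128)))
print(loss_match < loss_rand)                       # → True
```

A well-matched positive key yields a much lower loss than a random one, which is exactly the signal that drives the encoder toward augmentation-invariant representations.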

    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Research Objectives
        1.4 Thesis Organization
    Chapter 2 Literature Review
        2.1 Self-Supervised Learning in Computer Vision
        2.2 Contrastive Learning
        2.3 Pretext Tasks for Self-Supervised Contrastive Learning
        2.4 Contrastive Loss Functions
        2.5 Data Augmentation
        2.6 Encoder
        2.7 Examples of Self-Supervised Contrastive Pretext Tasks in Computer Vision
        2.8 Downstream Tasks in Computer Vision
    Chapter 3 Methodology
        3.1 Research Framework
        3.2 Dataset
        3.3 Hardware and Software Environment
        3.4 Data Augmentation Experiment Conditions
        3.5 Building the Self-Supervised Contrastive Pretext-Task Model
    Chapter 4 Experimental Results and Discussion
        4.1 Baseline Establishment
        4.2 Experimental Design and Parameter Settings
        4.3 Results and Discussion
    Chapter 5 Conclusion
        5.1 Conclusion
        5.2 Future Work
    References

