
Graduate Student: Hsieh, Yu-Hsuan
Thesis Title: CSAD: Unsupervised Component Segmentation for Logical Anomaly Detection
Advisor: Lai, Shang-Hong
Committee Members: Chen, Hwann-Tzong; Liu, Tyng-Luh; Chen, Pei-Chun
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of Publication: 2024
Academic Year: 112
Language: English
Number of Pages: 47
Keywords: Anomaly Detection, Industrial Inspection, Image Segmentation, Unsupervised Learning, Computer Vision
Hits: 78 / Downloads: 0

    Previous works on industrial anomaly detection have achieved excellent performance in detecting surface defects, with state-of-the-art (SOTA) approaches reaching a detection AUROC of 99%. However, these methods often fail to detect logical anomalies, such as incorrect quantities, arrangements, or combinations of object components.

    Recently, some works have integrated segmentation techniques with conventional anomaly detection methods to improve logical anomaly detection. Although these methods are effective, they frequently lead to unsatisfactory segmentation results and require manual annotations. To address these drawbacks, we develop an unsupervised component segmentation technique that leverages foundation models to autonomously generate training label maps for a lightweight segmentation network without human labeling.
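The pseudo-label pipeline described above (segment components with a foundation model, extract a feature per component, then cluster the features into semantic classes) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the greedy distance-threshold grouping stands in for the actual clustering step, and the function name and threshold are assumptions.

```python
import numpy as np

def cluster_components(features, threshold=0.5):
    """Greedily assign each component feature vector to the nearest
    existing cluster centroid if it is within `threshold` (L2 distance);
    otherwise start a new cluster. Returns one class id per component."""
    centroids, labels = [], []
    for f in features:
        if centroids:
            dists = [np.linalg.norm(f - c) for c in centroids]
            best = int(np.argmin(dists))
            if dists[best] < threshold:
                labels.append(best)
                continue
        # No sufficiently close cluster: open a new semantic class.
        centroids.append(f)
        labels.append(len(centroids) - 1)
    return labels

# Two tight groups of dummy component features -> two semantic classes.
feats = [np.array([0.0, 0.0]), np.array([0.1, 0.0]),
         np.array([5.0, 5.0]), np.array([5.1, 5.0])]
print(cluster_components(feats))  # [0, 0, 1, 1]
```

The resulting per-component class ids, painted back onto the component masks, would form the semantic pseudo-label maps used to train the lightweight segmentation network.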

    By integrating this new segmentation technique with our proposed Patch Histogram module and the Local-Global Student-Teacher (LGST) module, our method achieves a detection AUROC of 95.3% on the MVTec LOCO AD dataset, surpassing previous SOTA methods. Furthermore, our proposed method provides lower latency and higher throughput than most existing approaches.
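The Patch Histogram idea can be illustrated with a short sketch: count the class occurrences inside each non-overlapping patch of the predicted segmentation map, and score a test image by how far its per-patch histograms deviate from those of normal training images. This is an assumed simplification of the module (function names, the L1 distance, and the mean reference are illustrative choices, not the thesis's exact formulation).

```python
import numpy as np

def patch_histograms(seg_map, patch, n_classes):
    """Split an integer segmentation map into non-overlapping
    `patch` x `patch` tiles and count each class's pixels per tile."""
    h, w = seg_map.shape
    hists = []
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            tile = seg_map[y:y + patch, x:x + patch]
            hists.append(np.bincount(tile.ravel(), minlength=n_classes))
    return np.stack(hists)  # shape: (num_patches, n_classes)

def histogram_score(test_map, normal_maps, patch, n_classes):
    """Anomaly score: L1 gap between the test image's per-patch class
    histograms and the mean histograms over normal training images."""
    ref = np.mean([patch_histograms(m, patch, n_classes)
                   for m in normal_maps], axis=0)
    return np.abs(patch_histograms(test_map, patch, n_classes) - ref).sum()

normal = [np.zeros((4, 4), dtype=int) for _ in range(3)]  # all class 0
test_map = np.zeros((4, 4), dtype=int)
test_map[0, 0] = 1  # one pixel of an unexpected component class
print(histogram_score(test_map, normal, patch=2, n_classes=2))  # 2.0
```

Because the histogram is position-aware at the patch level, a wrong component count or a misplaced component changes some patch's class counts, which a purely local defect detector could miss.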

    Contents
    1 Introduction
      1.1 Problem Statement
        1.1.1 Anomaly Detection
        1.1.2 Logical Anomaly Detection
        1.1.3 Component Segmentation
      1.2 Motivation
      1.3 Contributions
      1.4 Thesis Organization
    2 Related Work
      2.1 Anomaly Detection Methods
      2.2 Logical Anomaly Detection Methods
        2.2.1 EfficientAD
        2.2.2 ComAD
        2.2.3 PSAD
      2.3 Foundation Models
        2.3.1 Segment Anything Model
        2.3.2 Grounding DINO
        2.3.3 Grounded SAM
        2.3.4 Recognize Anything++
    3 Proposed Method
      3.1 Semantic Pseudo-label Generation
        3.1.1 Component-level Segmentation
        3.1.2 Image Tags Generation
        3.1.3 Mask Refinement
        3.1.4 Component Feature Extraction
        3.1.5 Component Clustering
        3.1.6 Filtering Unreliable Semantic Pseudo-labels
      3.2 Component Segmentation
        3.2.1 Segmentation Network Architecture
        3.2.2 Logical Synthetic Anomalies (LSA)
        3.2.3 Loss Function
      3.3 Patch Histogram
        3.3.1 Class Histogram
        3.3.2 Patch Histogram
      3.4 Local-Global Student-Teacher (LGST)
        3.4.1 Model Architecture
        3.4.2 Difference between LGST and EfficientAD
      3.5 Anomaly Detection and Localization
        3.5.1 Image-level Score Fusion
        3.5.2 Anomaly Localization
    4 Experiments
      4.1 Implementation Details
        4.1.1 Segmentation Network
        4.1.2 LGST
      4.2 Experimental Setup
        4.2.1 Dataset
        4.2.2 Evaluation Metric
      4.3 Experimental Results
        4.3.1 Performance on MVTec LOCO AD
        4.3.2 Latency and Throughput
        4.3.3 Segmentation Branch
        4.3.4 Augmentation of Semantic Pseudo-label Map
        4.3.5 Anomaly Localization
    5 Ablation Study
      5.1 Effectiveness of LSA
      5.2 Patch Size Combinations of Patch Histogram
      5.3 Performance and Speed of Different Branches in CSAD
      5.4 Impact of Different Components and Settings of CSAD
      5.5 Hyperparameter of Component Clustering
    6 Conclusion
      6.1 Limitation
        6.1.1 Limitation of Patch Histogram
        6.1.2 Limitation of Semantic Pseudo-label Generation
      6.2 Conclusion
    References

