
Author: Ho, Cheng-Yu (何承諭)
Thesis Title: InstAD: Instance-aware Segmentation Framework for Zero-shot Multi-instance Anomaly Detection (零樣本多物件異常檢測與分割模型)
Advisor: Lai, Shang-Hong (賴尚宏)
Committee Members: Liu, Tyng-Luh (劉庭祿); Chen, Pei-Chun (陳佩君); Chen, Hwann-Tzong (陳煥宗)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of Publication: 2025
Graduation Academic Year: 113 (ROC calendar)
Language: English
Pages: 41
Keywords (Chinese): 異常偵測, 工業檢測, 電腦視覺, 深度學習, 零樣本學習
Keywords (English): Anomaly Detection, Industrial Inspection, Computer Vision, Deep Learning, Zero-shot Learning


    Multi-instance anomaly detection and segmentation play a crucial role in automated industrial inspection. Previous works mainly focus on single-instance detection tasks that require well-aligned input and extensive training sets. In this work, we introduce InstAD, a zero-shot multi-instance anomaly detection framework that achieves high accuracy on unaligned multi-instance images. Combining segmentation results from the Segment Anything Model and Grounded-SAM, we further refine the segmentation with the proposed Adaptive Bandwidth Segmentation Refinement scheme to achieve accurate multi-instance segmentation. Furthermore, we propose a feature-based instance alignment method that aligns instances and improves performance. Using our instance-aware anomaly detection strategy, we achieve 93.0% image-level and 99.1% pixel-level AUROC under the four-shot setting on the multi-instance classes of the VisA and MPDD datasets. Under the zero-shot setting, we reach 87.2% image-level and 98.4% pixel-level AUROC. Our experiments on public datasets show that the proposed InstAD method significantly outperforms state-of-the-art methods for multi-instance anomaly detection and segmentation.
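The abstract reports results as image-level and pixel-level AUROC. For readers unfamiliar with the metric, the area under the ROC curve can be computed directly from anomaly scores and binary ground-truth labels via the rank-based Mann-Whitney statistic; a self-contained pure-Python sketch (the data here are illustrative, not the thesis's):

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: anomaly scores (higher = more anomalous)
    labels: ground truth, 1 = anomalous, 0 = normal
    """
    # Assign 1-based ranks to the scores, with ties sharing the mean rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # U statistic of the positive class, normalized to [0, 1].
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A score of 1.0 means every anomalous sample outranks every normal one; 0.5 is chance level. At the pixel level the same computation runs over per-pixel scores and the ground-truth defect mask.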

    1 Introduction
      1.1 Problem Statement
      1.2 Motivation
      1.3 Contribution
      1.4 Thesis Organization
    2 Related Work
      2.1 Anomaly Detection Methods
        2.1.1 Embedding-based Methods
        2.1.2 Reconstruction-based Methods
        2.1.3 Normalizing Flow-based Methods
      2.2 Few-shot and Zero-shot Anomaly Detection
      2.3 Foundation Model-based Methods
    3 Methodology
      3.1 Problem Setting
        3.1.1 Few-shot Anomaly Detection and Segmentation
        3.1.2 Zero-shot Anomaly Detection and Segmentation
      3.2 Instance Segmentation
        3.2.1 Instance Segmentation with SAM and Grounded-SAM
        3.2.2 Adaptive Bandwidth Segmentation Refinement
        3.2.3 Feature-based Instance Alignment
      3.3 Instance-level Anomaly Detection and Segmentation
        3.3.1 Few-shot Memory Bank-based Strategy
        3.3.2 Zero-shot K-Nearest Neighbor Strategy
    4 Experiment
      4.1 Experiment Setting
        4.1.1 Datasets and Evaluation Metric
        4.1.2 Implementation Detail
      4.2 Experiment
        4.2.1 Few-Shot Experiment
        4.2.2 Zero-Shot Experiment
      4.3 Visualization
        4.3.1 Four-shot Visualization
        4.3.2 Zero-shot Visualization
    5 Ablation Study
      5.1 Feature Dimensions
      5.2 Selection of k in K-Nearest-Neighbor
      5.3 Selection of Batch Size in K-Nearest Neighbor
      5.4 Distance Metric in K-Nearest Neighbor
      5.5 Feature-based Alignment Strategy
      5.6 Component-wise Analysis
      5.7 Latency of Few-shot Anomaly Detection
      5.8 Merging Framework into SOTA Methods
    6 Conclusion
      6.1 Future Works
      6.2 Limitation
      6.3 Conclusion
    References
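Section 3.3.2 of the outline names a zero-shot K-Nearest Neighbor strategy. The thesis's actual feature extractor and scoring are not reproduced here, but the generic idea of kNN anomaly scoring on patch features can be sketched as follows: a patch is scored by its mean distance to the k closest patches in a reference feature bank (the 2-D features below are hypothetical placeholders):

```python
import math

def knn_anomaly_score(query, bank, k=3):
    """Mean Euclidean distance from `query` to its k nearest
    neighbors in `bank` (higher = more anomalous)."""
    dists = sorted(math.dist(query, ref) for ref in bank)
    return sum(dists[:k]) / min(k, len(dists))

# Hypothetical 2-D patch features: three tightly clustered "normal"
# patches plus one distant patch standing in for a defect.
bank = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0]]
normal_score = knn_anomaly_score([0.05, 0.05], bank)
defect_score = knn_anomaly_score([3.0, 3.0], bank)
```

In a zero-shot setting the bank would hold features of the other instances in the same image, so a defective instance stands out by its large distance to its nearest neighbors.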

