
Author: Ho, Cheng-Yu (何承諭)
Thesis Title: InstAD: Instance-aware Segmentation Framework for Zero-shot Multi-instance Anomaly Detection (零樣本多物件異常檢測與分割模型)
Advisor: Lai, Shang-Hong (賴尚宏)
Committee Members: Liu, Tyng-Luh (劉庭祿); Chen, Pei-Chun (陳佩君); Chen, Hwann-Tzong (陳煥宗)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of Publication: 2025
Graduation Academic Year: 113 (ROC calendar)
Language: English
Pages: 41
Keywords (Chinese): 異常偵測, 工業檢測, 電腦視覺, 深度學習, 零樣本學習
Keywords (English): Anomaly Detection, Industrial Inspection, Computer Vision, Deep Learning, Zero-shot Learning


    Multi-instance anomaly detection and segmentation play a crucial role in automated industrial inspection. Previous works mainly focus on single-instance detection tasks that require well-aligned input and extensive training sets. In this work, we introduce InstAD, a zero-shot multi-instance anomaly detection framework that achieves high accuracy on unaligned multi-instance images. Combining segmentation results from the Segment Anything Model and Grounded-SAM, we further refine the segmentation with the proposed Adaptive Bandwidth Segmentation Refinement scheme to achieve accurate multi-instance segmentation. Furthermore, we propose a feature-based instance alignment method that aligns instances and improves performance. Using our instance-aware anomaly detection strategy, we achieve 93.0% image-level and 99.1% pixel-level AUROC under the four-shot setting on the multi-instance classes of the VisA and MPDD datasets. Under the zero-shot setting, we reach 87.2% image-level and 98.4% pixel-level AUROC. Our experiments on public datasets show that the proposed InstAD method significantly outperforms state-of-the-art methods for multi-instance anomaly detection and segmentation.
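The abstract reports results as image-level and pixel-level AUROC. For readers unfamiliar with the metric, the area under the ROC curve can be computed directly from anomaly scores and binary ground-truth labels via the rank-based Mann-Whitney statistic; a self-contained pure-Python sketch (the data here are illustrative, not the thesis's):

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: anomaly scores (higher = more anomalous)
    labels: ground truth, 1 = anomalous, 0 = normal
    """
    # Assign 1-based ranks to the scores, with ties sharing the mean rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # U statistic of the positive class, normalized to [0, 1].
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A score of 1.0 means every anomalous sample outranks every normal one; 0.5 is chance level. At the pixel level the same computation runs over per-pixel scores and the ground-truth defect mask.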

    1 Introduction
      1.1 Problem Statement
      1.2 Motivation
      1.3 Contribution
      1.4 Thesis Organization
    2 Related Work
      2.1 Anomaly Detection Methods
        2.1.1 Embedding-based Methods
        2.1.2 Reconstruction-based Methods
        2.1.3 Normalizing Flow-based Methods
      2.2 Few-shot and Zero-shot Anomaly Detection
      2.3 Foundation Model-based Methods
    3 Methodology
      3.1 Problem Setting
        3.1.1 Few-shot Anomaly Detection and Segmentation
        3.1.2 Zero-shot Anomaly Detection and Segmentation
      3.2 Instance Segmentation
        3.2.1 Instance Segmentation with SAM and Grounded-SAM
        3.2.2 Adaptive Bandwidth Segmentation Refinement
        3.2.3 Feature-based Instance Alignment
      3.3 Instance-level Anomaly Detection and Segmentation
        3.3.1 Few-shot Memory Bank-based Strategy
        3.3.2 Zero-shot K-Nearest Neighbor Strategy
    4 Experiment
      4.1 Experiment Setting
        4.1.1 Datasets and Evaluation Metric
        4.1.2 Implementation Detail
      4.2 Experiment
        4.2.1 Few-Shot Experiment
        4.2.2 Zero-Shot Experiment
      4.3 Visualization
        4.3.1 Four-shot Visualization
        4.3.2 Zero-shot Visualization
    5 Ablation Study
      5.1 Feature Dimensions
      5.2 Selection of k in K-Nearest-Neighbor
      5.3 Selection of Batch Size in K-Nearest Neighbor
      5.4 Distance Metric in K-Nearest Neighbor
      5.5 Feature-based Alignment Strategy
      5.6 Component-wise Analysis
      5.7 Latency of Few-shot Anomaly Detection
      5.8 Merging Framework into SOTA Methods
    6 Conclusion
      6.1 Future Works
      6.2 Limitation
      6.3 Conclusion
    References
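Section 3.3.2 of the outline names a zero-shot K-Nearest Neighbor strategy. The thesis's actual feature extractor and scoring are not reproduced here, but the generic idea of kNN anomaly scoring on patch features can be sketched as follows: a patch is scored by its mean distance to the k closest patches in a reference feature bank (the 2-D features below are hypothetical placeholders):

```python
import math

def knn_anomaly_score(query, bank, k=3):
    """Mean Euclidean distance from `query` to its k nearest
    neighbors in `bank` (higher = more anomalous)."""
    dists = sorted(math.dist(query, ref) for ref in bank)
    return sum(dists[:k]) / min(k, len(dists))

# Hypothetical 2-D patch features: three tightly clustered "normal"
# patches plus one distant patch standing in for a defect.
bank = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0]]
normal_score = knn_anomaly_score([0.05, 0.05], bank)
defect_score = knn_anomaly_score([3.0, 3.0], bank)
```

In a zero-shot setting the bank would hold features of the other instances in the same image, so a defective instance stands out by its large distance to its nearest neighbors.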

