Learning Diffusion Models for Multi-View Anomaly Detection

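Author:	Liu, Chieh (劉杰)
Thesis Title:	Learning Diffusion Models for Multi-View Anomaly Detection (用於多模態資料異常檢測之擴散模型學習技術)
Advisor:	Chen, Hwann-Tzong (陳煥宗)
Committee Members:	Lai, Shang-Hong (賴尚宏); Hsu, Chiou-Ting (許秋婷); Liu, Tyng-Luh (劉庭祿)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of Publication:	2024
Academic Year of Graduation:	113
Language:	English
Number of Pages:	41
Keywords (translated from Chinese):	anomaly detection, diffusion models, multi-modal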

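This thesis explores a new approach to anomaly detection (AD) that produces multiple observations of the same target object simultaneously and explicitly, addressing the limitation that a single observation may fail to capture latent defects. More specifically, for a given scenario, we associate each target object with seven different data modalities. The first six modalities are images captured by a static camera under six different lighting conditions, and the seventh is 3D normal information. We call this task multi-modal anomaly detection.
To address this problem, our method trains a cross-modal ControlNet that produces consistent feature maps regardless of the data modality. This training strategy lets us mitigate the effect of lighting variation and effectively fuse information from RGB color appearance and 3D normal geometry. In addition, because the diffusion process is not deterministic, we use the DDIM method to improve the applicability of our diffusion-feature-based memory banks to anomaly detection inference. To demonstrate the effectiveness of our method, we conduct extensive ablation experiments on the Eyecandies dataset and present experimental comparisons with state-of-the-art methods.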

This thesis explores an emerging formulation of anomaly detection (AD) in which multiple observations of the same object are produced simultaneously and explicitly, addressing the limitation that a single observation may fail to capture underlying defects. More specifically, we focus on a scenario where each object of interest is associated with seven distinct data views. The first six views are images captured by a stationary camera under six different lighting conditions, and the seventh view is the 3D normal information. We refer to this task as multi-view anomaly detection. To tackle it, we train a view-invariant ControlNet that produces consistent feature maps regardless of the input view. This training strategy mitigates the impact of varying lighting conditions and effectively fuses information from the RGB color appearance and the 3D normal geometry. Moreover, because the diffusion process is not deterministic, we adopt the denoising diffusion implicit model (DDIM) scheme to improve the applicability of our memory banks of diffusion-based features to anomaly detection inference. To demonstrate the efficacy of our approach, we present extensive ablation studies and comparisons with state-of-the-art methods on the Eyecandies dataset.
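
The role of determinism in the memory-bank step can be illustrated with a small sketch. It is not the thesis implementation: the noise predictor `toy_eps_model`, the schedule `ALPHA_BARS`, the two-dimensional "features", and the nearest-neighbor scoring rule are all assumptions made for illustration, and the abstract does not say how the banks are queried. The sketch only shows that a deterministic DDIM update (eta = 0) maps the same input to the same output on every run, whereas stochastic forward noising does not, which is what makes stored features comparable with features computed at inference time.

```python
import math
import random


def stochastic_noising(x0, alpha_bar, rng):
    """Forward diffusion q(x_t | x_0): fresh Gaussian noise on every call."""
    return [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for x in x0]


def toy_eps_model(x_t):
    """Hypothetical stand-in for a trained noise predictor (not the thesis model)."""
    return [0.1 * x for x in x_t]


def ddim_step(x_t, alpha_bar_t, alpha_bar_next):
    """One deterministic DDIM update (eta = 0) between two noise levels."""
    eps = toy_eps_model(x_t)
    out = []
    for x, e in zip(x_t, eps):
        # Predict the clean sample, then re-noise it with the SAME predicted noise.
        x0_pred = (x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
        out.append(math.sqrt(alpha_bar_next) * x0_pred
                   + math.sqrt(1.0 - alpha_bar_next) * e)
    return out


def ddim_encode(x0, alpha_bars):
    """Deterministically push a clean vector through a schedule of noise levels."""
    x = list(x0)
    for a_t, a_next in zip(alpha_bars[:-1], alpha_bars[1:]):
        x = ddim_step(x, a_t, a_next)
    return x


def anomaly_score(feature, memory_bank):
    """Assumed scoring rule: distance to the nearest stored normal feature."""
    return min(math.dist(feature, m) for m in memory_bank)


ALPHA_BARS = [1.0, 0.9, 0.8, 0.7]  # illustrative schedule, not from the thesis

# Build the bank from (toy) anomaly-free samples.
normal_samples = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1]]
memory_bank = [ddim_encode(s, ALPHA_BARS) for s in normal_samples]

# Deterministic encoding: a normal test sample lands exactly on its bank entry.
score_normal = anomaly_score(ddim_encode([1.0, 2.0], ALPHA_BARS), memory_bank)
score_defect = anomaly_score(ddim_encode([3.0, -1.0], ALPHA_BARS), memory_bank)

# Stochastic noising: the same input yields different noisy versions.
rng = random.Random(0)
noisy_a = stochastic_noising([1.0, 2.0], 0.7, rng)
noisy_b = stochastic_noising([1.0, 2.0], 0.7, rng)
```

In this toy setting the normal sample scores zero because its encoding reproduces a bank entry exactly, while the off-distribution sample scores higher. With stochastic noising, even an identical normal sample would sit at a nonzero, run-dependent distance from the bank.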

List of Tables
List of Figures
Abstract (Chinese)
Abstract
Introduction
Related Work
1 Anomaly Detection
2 Diffusion Models
Approach
1 Problem Setting and Notations
2 View-Agnostic Latent Diffusion Model
3 Feature Representation via View-Invariant ControlNet
4 DDIM Memory Banks and Inference
Experiments
1 Experiment Setup
1.1 Dataset
1.2 Diffusion Model Settings
1.3 Inference Details
1.4 Metrics
2 Experimental Results
2.1 Experimental Results of the Eyecandies Dataset
2.2 Experimental Results of the MVTec 3D-AD Dataset
3 Qualitative Results
3.1 Results of Different Modalities
3.2 Comparisons with Other Methods
3.3 More Qualitative Results
4 The Effectiveness of Feature Loss
4.1 The Advantages of Feature Loss
4.2 Applying Feature Loss at Different Layers
5 Evaluating Decoder Block Features Across Layers
6 The Effectiveness of ControlNet
6.1 The Benefits of ControlNet
6.2 Computational Cost of ControlNet
7 Impact of Different Noise Intensity
8 Advantages of Multi-View Anomaly Detection
9 Detailed Hyperparameter Settings
Conclusion
Bibliography
                                

