Moving-Object-Aware Anomaly Detection in Surveillance Videos | National Tsing Hua University Theses and Dissertations Repository


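Graduate student:	楊浚隴 Yang, Chun-Lung
Thesis title:	監控視訊中之運動物體感知異常檢測 Moving-Object-Aware Anomaly Detection in Surveillance Videos
Advisor:	賴尚宏 Lai, Shang-Hong
Committee members:	邱瀞德 Chiu, Ching-Te; 林嘉文 Lin, Chia-Wen; 劉庭祿 Liu, Tyng-Luh
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication:	2021
Academic year of graduation:	109 (ROC calendar)
Language:	English
Number of pages:	45
Chinese keywords:	異常偵測、電腦視覺、深度學習
English keywords:	Anomaly Detection, Computer Vision, Deep Learning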

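Video anomaly detection plays a crucial role in automatically detecting abnormal actions or events in surveillance video, which helps protect public safety. Recently, deep learning models have been widely adopted for anomaly detection and have achieved excellent results. Anomalies in video occur mainly in foreground object regions, yet previous image-generation-based methods did not train their models to exploit this property. Some recent methods apply pre-trained object detection models to supply the anomaly detector with local object information from the scene, but these methods require prior knowledge of the anomaly types in the video, which contradicts the unsupervised learning setting of anomaly detection. In this thesis, we propose a new framework, built on a convolutional autoencoder architecture, that learns to predict the features of moving objects in video. We train the anomaly detector to be aware of moving-object regions in the scene, which adheres more closely to the unsupervised setting and requires no prior knowledge of specific object classes. The appearance and motion features of moving-object regions provide comprehensive information describing moving objects for unsupervised anomaly detection learning. In addition, the proposed latent representation learning strategy encourages the convolutional autoencoder to learn a more convergent latent representation for normal training data, while anomalous data exhibit markedly different representations. Finally, we propose a new anomaly scoring method based on the feature prediction error of moving foreground object regions and the regularity of the latent representation. We evaluate the proposed method on six public datasets for video anomaly detection, and the experimental results show that it achieves highly competitive results compared with state-of-the-art methods.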

Video anomaly detection plays a crucial role in automatically detecting abnormal actions or events in surveillance video, which can help protect public safety. Deep learning techniques have recently been employed extensively and have achieved excellent anomaly detection results. However, previous image-reconstruction-based models did not fully exploit foreground object regions for video anomaly detection. Some recent works applied pre-trained object detectors to provide local context in the video surveillance scenario. Nevertheless, these methods require prior knowledge of the object types involved in anomalies, which is somewhat contradictory to the problem setting of unsupervised anomaly detection. In this thesis, we propose a novel framework that learns moving-object feature prediction with a convolutional autoencoder architecture. We train our anomaly detector to be aware of moving-object regions in a scene without using an object detector or requiring prior knowledge of specific object classes for the anomaly. The appearance and motion features in moving-object regions provide comprehensive information about moving foreground objects for the unsupervised learning of a video anomaly detector. In addition, the proposed latent representation learning scheme encourages the convolutional autoencoder to learn a more convergent latent representation for normal training data, while anomalous data exhibit quite different representations. We also propose a novel anomaly scoring method based on the feature prediction errors of moving foreground object regions and the regularity of the latent representation. Our experimental results demonstrate that the proposed approach achieves competitive results compared with state-of-the-art (SOTA) methods on six public datasets for video anomaly detection.
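The abstract describes the anomaly score only in words: a combination of feature prediction error inside moving foreground regions and the regularity of the latent representation. The following NumPy sketch illustrates one way such a score could be assembled. It is not the thesis's formula. The function names, the distance-to-centre measure of latent regularity, the per-video min-max normalization, and the `weight` parameter are all assumptions made for illustration.

```python
import numpy as np


def masked_prediction_error(pred, target, mask, eps=1e-8):
    """Per-frame mean squared error restricted to moving-object pixels.

    pred, target: arrays of shape (T, H, W, C) holding predicted and true features.
    mask: array of shape (T, H, W), 1 inside moving foreground regions, 0 elsewhere.
    """
    sq_err = ((pred - target) ** 2).mean(axis=-1)           # (T, H, W)
    return (sq_err * mask).sum(axis=(1, 2)) / (mask.sum(axis=(1, 2)) + eps)


def latent_irregularity(latents, normal_center):
    """Assumed regularity measure: Euclidean distance of each frame's latent
    vector (shape (T, D)) from a centre estimated on normal training data."""
    return np.linalg.norm(latents - normal_center, axis=1)


def min_max(x, eps=1e-8):
    """Rescale a 1-D array to approximately [0, 1] within one video."""
    return (x - x.min()) / (x.max() - x.min() + eps)


def anomaly_score(pred, target, mask, latents, normal_center, weight=0.5):
    """Hypothetical combined score: higher means more anomalous."""
    err = min_max(masked_prediction_error(pred, target, mask))
    irr = min_max(latent_irregularity(latents, normal_center))
    return weight * err + (1.0 - weight) * irr


# Toy usage on synthetic data: frame 3 deviates inside the foreground mask.
rng = np.random.default_rng(0)
target = rng.normal(size=(5, 8, 8, 3))
pred = target.copy()
mask = np.zeros((5, 8, 8))
mask[:, 2:5, 2:5] = 1.0
pred[3, 2:5, 2:5, :] += 2.0      # error inside the mask (counted)
pred[1, 6:, 6:, :] += 5.0        # error outside the mask (ignored)
latents = np.zeros((5, 4))
latents[3] = 1.0
scores = anomaly_score(pred, target, mask, latents, np.zeros(4))
print(int(np.argmax(scores)))
```

Masking the error before averaging is what keeps background prediction noise, such as the perturbation in frame 1 above, from raising the score. How the two terms are actually weighted and normalized in the thesis is not stated in the abstract.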

Introduction
1 Problem statement
2 Motivation
3 Contributions
4 Thesis organization
Related Work
1 Image-reconstruction-based methods
2 Semantic-information-based methods
3 Latent-representation-based methods
Proposed Method
1 Moving-foreground-object feature prediction
2 Foreground-object-aware convolutional autoencoder
3 Latent-query-restricting variational autoencoder
4 Anomaly scoring strategy
5 Network structure
Experiments
1 Experimental setting
1.1 Datasets
1.2 Implementation details
2 Experimental comparison
2.1 Evaluation metrics
2.2 Comparison with the state-of-the-art
3 Discussions
3.1 Qualitative results
3.2 Ablation study
3.3 Cross-dataset testing
3.4 Latent representation analysis
3.5 Case discussion
3.6 Memory consumption and inference time
4 Experiments on non-campus datasets
4.1 UR Fall Detection dataset
4.2 Traffic-Train dataset
4.3 Traffic-Belleview dataset
Conclusion
References

