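Graduate student | 何倞 He, Liang
---|---
Thesis title | Tracking by Detection with Convolution Sharing: A Deep Multi-vehicle Tracking System (Chinese title: 基於偵測和深層卷積神經網路特徵共享的即時多車輛追蹤系統)
Advisor | 陳煥宗 Chen, Hwann-Tzong
Oral defense committee | 賴尚宏, 劉庭祿
Degree | Master
Department | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications
Year of publication | 2016
Academic year of graduation | 104 (ROC calendar, i.e. the 2015-2016 academic year)
Language | English
Number of pages | 26
Chinese keywords (translated) | tracking, deep learning, convolutional neural networks, vehicle, detection
English keywords | tracking, deep learning, CNNs, vehicle, detection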
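In this thesis, we propose an innovative, high-performance, real-time multi-vehicle tracking system. The main idea of the system is to track by detection, in other words, detection-based tracking. We treat object tracking as a combination of per-frame object detection, regression of the detection results, metric embedding of the detection results, and association of the final detection results. The core technique for implementing the system is convolution sharing: the same highly expressive features are shared among CNN models designed for different functions.
By combining this main idea and core technique with a training procedure customized for each CNN model, our system achieves strong results on the object tracking category of a very challenging real-world computer vision benchmark dataset. Moreover, even though the proposed system is composed of multiple stages and multiple models, it still tracks in near real time.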
This thesis presents a novel system for high-performance multi-vehicle tracking. The main concept of our system is tracking by detection, in other words, detection-based tracking. We treat the whole tracking problem as a combination of per-frame object detection, regression of the detection results, metric embedding of the detection results, and association of the final detection results. The core implementation technique is convolution sharing: the same rich features are shared among the convolutional neural network (CNN) models designed for different parts of our system.
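The detect, embed, and associate view of tracking can be illustrated with a small sketch. It is not the thesis's implementation. The backbone, the two heads, the cosine-distance threshold `max_dist`, and the greedy matching rule are all hypothetical stand-ins, and the box-regression stage is left out for brevity. What the sketch shows is the shape of convolution sharing: the shared layers run once per frame, both the detection head and the embedding head read from that single result, and detections are then linked to existing tracks by embedding distance.

```python
import math
from itertools import count


def cosine_distance(a, b):
    """Return 1 - cosine similarity; a zero vector is treated as maximally far."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def associate(tracks, embeddings, max_dist=0.3):
    """Greedily link detections to tracks by ascending embedding distance.

    tracks maps track_id -> last embedding; embeddings holds one vector per detection.
    Returns ({detection_index: track_id}, [unmatched detection indices]).
    """
    pairs = sorted(
        (cosine_distance(t_emb, d_emb), tid, j)
        for tid, t_emb in tracks.items()
        for j, d_emb in enumerate(embeddings)
    )
    used_tracks, matches = set(), {}
    for dist, tid, j in pairs:
        if dist > max_dist:
            break  # pairs are sorted, so every remaining pair is farther apart
        if tid in used_tracks or j in matches:
            continue
        matches[j] = tid
        used_tracks.add(tid)
    unmatched = [j for j in range(len(embeddings)) if j not in matches]
    return matches, unmatched


class Tracker:
    """Per-frame tracking by detection on top of one shared feature extractor."""

    def __init__(self, backbone, detect_head, embed_head, max_dist=0.3):
        self.backbone = backbone        # hypothetical shared convolutional layers
        self.detect_head = detect_head  # shared features -> list of boxes
        self.embed_head = embed_head    # (shared features, box) -> embedding
        self.max_dist = max_dist
        self.tracks = {}                # track_id -> latest embedding
        self._next_id = count()

    def step(self, frame):
        shared = self.backbone(frame)   # expensive part: computed once per frame
        boxes = self.detect_head(shared)
        embeddings = [self.embed_head(shared, box) for box in boxes]
        matches, unmatched = associate(self.tracks, embeddings, self.max_dist)
        ids = [matches.get(j) for j in range(len(boxes))]
        for j in unmatched:
            ids[j] = next(self._next_id)  # unmatched detections start new tracks
        for tid, emb in zip(ids, embeddings):
            self.tracks[tid] = emb      # this toy never prunes lost tracks
        return list(zip(ids, boxes))


# Toy stand-ins: a "frame" is a list of (box, appearance vector) pairs.
backbone_calls = []


def toy_backbone(frame):
    backbone_calls.append(1)  # count how often the shared layers run
    return dict(frame)


def toy_detect(shared):
    return list(shared)


def toy_embed(shared, box):
    return shared[box]


tracker = Tracker(toy_backbone, toy_detect, toy_embed)
first = tracker.step([((0, 0, 10, 10), [1.0, 0.0]), ((50, 0, 60, 10), [0.0, 1.0])])
second = tracker.step([((52, 1, 62, 11), [0.1, 1.0]), ((2, 0, 12, 10), [1.0, 0.1])])
print(first)
print(second)
```

In the second frame the detections arrive in the opposite order, but each one keeps its identity because the match is made on appearance rather than list position. Greedy matching only keeps the sketch short. An optimal assignment, such as the Hungarian algorithm, is a common alternative. Because every head only reads `shared`, adding another head (for example, a box-regression head) would not add another backbone pass.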
By integrating this concept and core technique with a properly scheduled training process for each CNN model, our system achieves good performance on the object tracking benchmark of KITTI, a challenging real-world computer vision benchmark suite. Moreover, even though our tracking system is built from multiple stages and multiple CNN models, it runs efficiently in near real time.
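A rough cost model helps explain why sharing convolutions lets a multi-model pipeline stay close to real time. For illustration only, assume the shared convolutional layers cost $B$ ms per frame and each of $k$ task-specific heads costs $h$ ms. These symbols and the example numbers below are hypothetical and are not measurements from the thesis.

```latex
\begin{aligned}
T_{\text{separate}} &= k\,(B + h), \qquad T_{\text{shared}} = B + k\,h,\\
T_{\text{separate}} - T_{\text{shared}} &= (k - 1)\,B,\\
B = 60,\; h = 10,\; k = 3:\quad T_{\text{separate}} &= 210\ \text{ms}\ (\approx 4.8\ \text{fps}),\qquad T_{\text{shared}} = 90\ \text{ms}\ (\approx 11\ \text{fps}).
\end{aligned}
```

When the shared layers dominate the per-frame cost, the savings grow with every additional model that reuses them.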