
Graduate Student: Lu, Ming-En (呂明恩)
Thesis Title: Edge-boxes based Convolutional Neural Networks Approach for Multi-Target Tracking in Traffic Scenes (以Edge-boxes為基礎的卷積神經網路方法應用於交通場景多目標追蹤)
Advisor: Wang, Jia-Shung (王家祥)
Committee Members: Chen, Hwann-Tzong (陳煥宗); Yeh, Mei-Chen (葉梅珍)
Degree: Master
Department:
Publication Year: 2017
Graduation Academic Year: 105 (ROC calendar)
Language: English
Number of Pages: 42
Chinese Keywords: autonomous car driving, visual tracking, Edge-boxes, convolutional neural network, support vector machine
Keywords: Autonomous car driving, Visual tracking, Edge-boxes, Convolutional Neural Network, Support Vector Machine
    Computer vision can help an autonomous car detect or track nearby objects such as people, vehicles, or animals. However, visual tracking involves many difficulties, including illumination changes, deformation, occlusion, and so on. To deal with these complicated problems, an appearance model is often used to describe the features of an object, and a discriminative model is used to classify whether a candidate patch is the target or the background.
    In this thesis, we propose an Edge-boxes-based tracking method that exploits edge information to extract a small but high-quality set of candidate patches. In addition, a pre-trained convolutional neural network is used as a feature extractor to obtain the features of each candidate, and three cost functions (appearance, motion, and size) are computed and fed into an online support vector machine, which acts as a classifier to decide whether a candidate is the target, rather than simply summing the three costs. Finally, to maintain the tracker, we design an updater that refreshes the templates, the predicted state, and the SVM. The experimental results show that the proposed method performs well even under challenging conditions. Because the pre-trained convolutional neural network extracts sufficiently general features from the target, the problems of illumination variation and deformation are handled easily, and the success rate reaches 96% at an overlap threshold of 0.5. Moreover, thanks to Edge-boxes and the templates, tracking failures caused by occlusion are avoided, and the tracking accuracy reaches up to 95.238%.


    Computer vision is important for autonomous cars to detect or track nearby objects such as people, vehicles, or animals. However, there are many problems in visual tracking, including illumination variation, deformation, occlusion, and so on. To deal with these complicated problems, an appearance model is utilized to describe the target, and a discriminative model is adopted to classify whether a candidate is the target or the background.
    In this thesis, we propose a tracking method based on Edge-boxes, which provides a small but high-quality set of proposals derived from edges. A pre-trained Convolutional Neural Network (CNN) is then used to extract features from each image patch, and the cost functions for appearance, motion, and size are computed from them. With these costs, an online Support Vector Machine (SVM) is adopted as a classifier, instead of simply summing the costs. Finally, we maintain the tracker by updating the templates, the predicted state, and the SVM.
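    A minimal sketch of this per-frame decision follows, written in Python with NumPy. The particular cost definitions (a cosine-distance appearance cost against stored templates, a normalized center-distance motion cost, a relative-area size cost) and the helper names (`extract_feature`, an `svm` object exposing an sklearn-style `decision_function`) are illustrative assumptions, not the exact formulations given in Chapter 3.

```python
import numpy as np


def appearance_cost(feat, template_feats):
    # Assumed form: one minus the best cosine similarity between the proposal's
    # CNN feature (shape (D,)) and the stored template features (shape (T, D)).
    sims = template_feats @ feat / (
        np.linalg.norm(template_feats, axis=1) * np.linalg.norm(feat) + 1e-8)
    return 1.0 - float(sims.max())


def motion_cost(box, predicted_box):
    # Assumed form: distance between box centers, normalized by the predicted
    # box size. Boxes are (x, y, w, h).
    cx, cy = box[0] + box[2] / 2.0, box[1] + box[3] / 2.0
    px, py = predicted_box[0] + predicted_box[2] / 2.0, predicted_box[1] + predicted_box[3] / 2.0
    return float(np.hypot(cx - px, cy - py)) / max(predicted_box[2], predicted_box[3])


def size_cost(box, predicted_box):
    # Assumed form: relative change of box area with respect to the prediction.
    pred_area = predicted_box[2] * predicted_box[3]
    return abs(box[2] * box[3] - pred_area) / pred_area


def track_frame(frame, proposals, predicted_box, template_feats, svm, extract_feature):
    """One tracking step: for each Edge-boxes proposal, extract a CNN feature,
    build the (appearance, motion, size) cost vector, and let the online SVM
    score it; the highest-scoring proposal is accepted as the target."""
    best_box, best_score = None, -np.inf
    for box in proposals:
        feat = extract_feature(frame, box)        # pre-trained CNN as feature extractor
        costs = np.array([[appearance_cost(feat, template_feats),
                           motion_cost(box, predicted_box),
                           size_cost(box, predicted_box)]])
        score = svm.decision_function(costs)[0]   # classifier score, not a plain sum
        if score > best_score:
            best_box, best_score = box, score
    return best_box, best_score
```

    The point the sketch illustrates is that the three costs are assembled into a feature vector and scored by the classifier, rather than being added together directly.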
    The experimental results demonstrate that the proposed method performs well on videos containing many challenges. Since the pre-trained CNN extracts general features of targets, the illumination variation and deformation problems can be easily handled, and the success rate reaches 96% at an overlap threshold of 0.5. Tracking failures caused by occlusion can also be avoided, owing to the high-quality proposals generated by Edge-boxes and the templates stored from previous frames, and the MOTA reaches 95.238%.
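    The maintenance step mentioned above (updating the templates, the predicted state, and the SVM) might be organized roughly as below; the bounded template buffer, the constant-velocity prediction, and the use of `SGDClassifier.partial_fit` as an incrementally trained linear SVM are assumptions made for illustration, not the update rules of Section 3.5.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def make_online_svm():
    # Hinge loss makes SGDClassifier a linear SVM that supports incremental
    # updates via partial_fit, standing in here for the thesis's online SVM.
    return SGDClassifier(loss="hinge")


def update_tracker(svm, template_feats, best_feat, prev_box, best_box,
                   pos_costs, neg_costs, max_templates=10):
    # Store the accepted proposal's CNN feature as a new template (bounded buffer).
    template_feats = np.vstack([template_feats, best_feat])[-max_templates:]

    # Constant-velocity prediction of the next state; boxes are (x, y, w, h).
    vx, vy = best_box[0] - prev_box[0], best_box[1] - prev_box[1]
    predicted_box = (best_box[0] + vx, best_box[1] + vy, best_box[2], best_box[3])

    # Refresh the online SVM: the accepted proposal's cost vector is a positive
    # sample, and the cost vectors of low-scoring proposals are negatives.
    X = np.vstack([pos_costs, neg_costs])
    y = np.concatenate([np.ones(len(pos_costs), dtype=int),
                        np.zeros(len(neg_costs), dtype=int)])
    svm.partial_fit(X, y, classes=[0, 1])

    return svm, template_feats, predicted_box
```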

    Acknowledgements i
    Chinese Abstract ii
    ABSTRACT iii
    CONTENTS v
    LIST OF FIGURES viii
    LIST OF TABLES x
    Chapter 1. Introduction 1
    Chapter 2. Related Work 4
    2.1 Object Tracking 4
    2.2 Searching Methods in Tracking 5
    2.2.1 Single Window Search 5
    2.2.2 Particle-based Search 6
    2.2.3 Instance Search 6
    2.3 Convolutional Neural Networks (CNN) 7
    2.4 Support Vector Machine (SVM) 9
    Chapter 3. Proposed Methods 11
    3.1 Initialization 12
    3.2 Edge-boxes 13
    3.3 Pre-trained CNN 15
    3.4 Cost Function and Online SVM 16
    3.5 Updater 18
    3.5.1 Update Templates and Predicted State 18
    3.5.2 Update SVM 20
    Chapter 4. Experimental Results 22
    4.1 Implementation Details 22
    4.2 Evaluation on Driving Recorder Videos 22
    4.3 Evaluation on ALOV300++ 25
    4.4 Analysis of Parameters 29
    4.5 Analysis of Cost Function 32
    4.6 Time Consumption 33
    Chapter 5. Conclusion 35
    REFERENCES 36

