| Graduate student | 施冠宏 Shih, Kuan-Hung |
|---|---|
| Thesis title | 基於剪枝及多特徵連接輔助區域候選網路之即時物件偵測 (Real-Time Object Detection via Pruning and Concatenated Multi-Feature Assisted Region Proposal Network) |
| Advisor | 邱瀞德 Chiu, Ching-Te |
| Committee members | 李政崑 Lee, Jenq-Kuen; 黃朝宗 Huang, Chao-Tsung |
| Degree | 碩士 Master |
| Department | |
| Year of publication | 2018 |
| Academic year of graduation | 107 (ROC calendar) |
| Language | English |
| Pages | 55 |
| Keywords (Chinese) | 物件偵測、卷積神經網路壓縮、剪枝、區域候選網路、特徵連接 |
| Keywords (English) | Object Detection, Convolutional Neural Network Compression, Pruning, Region Proposal Network, Feature Maps Concatenation |
Object detection has long been one of the most important research topics in computer vision. Its goal is to locate and classify all objects in an image. With the rapid development of deep learning in recent years, many studies have applied deep learning to object detection and achieved very successful results. Current deep learning architectures for object detection fall into two categories: one-stage and two-stage. The difference between them is that a one-stage architecture assumes every location may contain an object and directly predicts its class, whereas a two-stage architecture first finds regions that may contain objects and then examines each candidate region to decide whether it contains an object and to classify it. The two also differ in performance characteristics: one-stage architectures are fast but less accurate, while two-stage architectures are slower but more accurate.
This work builds on Faster R-CNN, currently the most widely used two-stage architecture, and aims to raise its running speed to real time without sacrificing accuracy. We first apply pruning to reduce both the number of parameters and the amount of computation; the pruning can be expected to cause a drop in accuracy. We therefore propose a Multi-Feature Assisted Region Proposal Network to restore accuracy. It contains an Assisted Multi-Feature Concatenation module that combines features from different convolutional layers, and the combined features then serve as the input of a Reduced Region Proposal Network. The proposed architecture locates candidate regions more precisely and thereby compensates for the accuracy lost to pruning. Finally, we evaluate the proposed method with ZF-Net and VGG16 on the PASCAL VOC 2007 dataset. The experimental results show that the ZF-Net model can be compressed from 227 MB to 45 MB while saving 66% of the computation, and the VGG16 model from 523 MB to 144 MB while saving 77% of the computation. As a result, ZF-Net runs at 40 frames per second and VGG16 at 27 frames per second. Even after this heavy compression, ZF-Net and VGG16 reach 60.2% mAP and 69.1% mAP, respectively.
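As a concrete illustration of the pruning step, the sketch below removes the convolution filters of a single layer with the smallest L1 norms. This is a minimal sketch assuming PyTorch; the L1-norm ranking criterion, the `prune_conv_filters` helper, and the 50% keep ratio are illustrative assumptions rather than the exact procedure used in the thesis.

```python
import torch
import torch.nn as nn


def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Keep the filters with the largest L1 norms and drop the rest."""
    weight = conv.weight.data                    # (out_ch, in_ch, kH, kW)
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    # Score each output filter by the L1 norm of its weights.
    scores = weight.abs().sum(dim=(1, 2, 3))
    keep_idx = torch.argsort(scores, descending=True)[:n_keep]
    # Build a smaller layer holding only the surviving filters.
    pruned = nn.Conv2d(conv.in_channels, n_keep,
                       kernel_size=conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = weight[keep_idx].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep_idx].clone()
    return pruned


# Example: keep half of the filters of a 3x3 convolution.
conv = nn.Conv2d(64, 96, kernel_size=3, padding=1)
smaller = prune_conv_filters(conv, keep_ratio=0.5)
print(smaller.out_channels)  # 48
```

In a full network, removing output filters from one layer also requires removing the matching input channels of the following layer, and the pruned model is typically fine-tuned afterwards to recover accuracy.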
Object detection is an important research area in computer vision. Its purpose is to find all objects in an image and recognize the class of each object. Since the rise of deep learning, an increasing number of studies have applied deep learning to object detection and achieved successful results. For object detection, there are two types of network architectures: one-stage and two-stage. This study is based on the widely used two-stage architecture Faster R-CNN, and our goal is to reduce the inference time to achieve real-time speed without losing accuracy.
First, we use pruning to reduce the number of parameters and the amount of computation, which is expected to lower accuracy. We therefore propose a multi-feature assisted region proposal network, composed of assisted multi-feature concatenation and a reduced region proposal network, to recover the lost accuracy. Assisted multi-feature concatenation combines feature maps from different convolutional layers and feeds them to the reduced region proposal network. With our proposed method, the network finds regions of interest (ROIs) more accurately and thus compensates for the accuracy lost to pruning. Finally, we use ZF-Net and VGG16 as backbones and test the network on the PASCAL VOC 2007 dataset.
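A minimal sketch of this idea, assuming PyTorch, is given below: feature maps from a shallow and a deep convolutional stage are reduced with 1x1 convolutions, aligned in spatial size, concatenated along the channel axis, and fed to a slim RPN head. The `MultiFeatureRPN` class, the channel counts, and the use of adaptive max pooling for alignment are assumptions for illustration, not the thesis's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiFeatureRPN(nn.Module):
    def __init__(self, ch_shallow: int, ch_deep: int,
                 mid_channels: int = 256, num_anchors: int = 9):
        super().__init__()
        # 1x1 convolutions bring each source to a common channel count.
        self.reduce_shallow = nn.Conv2d(ch_shallow, mid_channels, 1)
        self.reduce_deep = nn.Conv2d(ch_deep, mid_channels, 1)
        # A slim ("reduced") RPN head on the concatenated features.
        self.rpn_conv = nn.Conv2d(2 * mid_channels, mid_channels, 3, padding=1)
        self.cls_score = nn.Conv2d(mid_channels, 2 * num_anchors, 1)  # objectness
        self.bbox_pred = nn.Conv2d(mid_channels, 4 * num_anchors, 1)  # box deltas

    def forward(self, feat_shallow, feat_deep):
        # Downsample the shallow map so both maps share the deep map's size.
        shallow = F.adaptive_max_pool2d(self.reduce_shallow(feat_shallow),
                                        feat_deep.shape[-2:])
        deep = self.reduce_deep(feat_deep)
        fused = torch.cat([shallow, deep], dim=1)  # channel-wise concatenation
        h = F.relu(self.rpn_conv(fused))
        return self.cls_score(h), self.bbox_pred(h)


# Example with VGG16-like shapes: conv3_3 (256 ch) and conv5_3 (512 ch).
rpn = MultiFeatureRPN(ch_shallow=256, ch_deep=512)
scores, boxes = rpn(torch.randn(1, 256, 75, 75), torch.randn(1, 512, 19, 19))
print(scores.shape, boxes.shape)  # (1, 18, 19, 19) (1, 36, 19, 19)
```

The shallow features carry finer spatial detail while the deep features carry stronger semantics, which is why concatenating them can help the proposal network localize objects more accurately.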
The results show that we can compress ZF-Net from 227 MB to 45 MB and save 66% of the computation, and compress VGG16 from 523 MB to 144 MB and save 77% of the computation. Consequently, the inference speed reaches 40 FPS for ZF-Net and 27 FPS for VGG16. Despite these significant compression rates, the accuracy remains 60.2% mean average precision (mAP) for ZF-Net and 69.1% mAP for VGG16.