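Graduate student: | 蔡品儀 Tsai, Pin-Yi |
---|---|
Thesis title: | 以GPU加速Adaboost為基底之多物件辨識系統 A GPU-Accelerated Object Recognition System Using Adaboost Algorithm |
Advisor: | 許雅三 Hsu, Yarsun |
Oral defense committee: | 邱瀞德 Ching-Te Chiu, 李政崑 Jenq-Kuen Lee |
Degree: | 碩士 Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2014 |
Academic year of graduation: | 103 |
Language: | English |
Number of pages: | 58 |
Chinese keywords: | 物件辨識 (object recognition), 圖形處理器 (graphics processing unit), 機器學習 (machine learning) |
English keywords: | object recognition, GPU, adaboost |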
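Since Nvidia released CUDA (Compute Unified Device Architecture), the GPU (Graphics Processing Unit) has developed rapidly in fields outside computer graphics. CUDA lets developers who do not work in computer graphics run computations on the GPU through the functions that CUDA provides. Compared with a CPU, a single GPU executes instructions more slowly, but when the data volume is large enough and the work is arranged appropriately, the GPU's throughput per unit time can clearly exceed that of the CPU. However, CUDA is currently supported only on GPUs developed by Nvidia.
Besides CUDA, OpenCL (Open Computing Language), published by Khronos, also lets developers compute on the GPU without much additional study of computer graphics.
OpenCL is not tied to a single vendor and can be used on GPUs from various manufacturers, so it is more general. Because CUDA is better integrated with Nvidia's hardware, however, CUDA often delivers a more noticeable performance gain when the GPU is from Nvidia.
In recent years, Advanced Driver Assistance Systems (ADAS) technology has matured. Because the system must sense the surroundings and alert or prompt the driver to respond, the accuracy with which it recognizes pedestrians, traffic signs, and vehicles has become increasingly important. Traditional object recognition often uses the Adaboost algorithm for training, but the training stage requires a large amount of complicated computation. This slows down database updates and raises safety concerns on the road. We introduce the GPU to accelerate feature extraction and database training: feature extraction becomes 6.12 times faster, and database training becomes 34.53 times faster than the original implementation using C++ on a single CPU.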
Recently, GPU programming has become a common approach to high-performance computing, and various applications and frameworks have been developed to exploit the power of the GPU. CUDA, introduced by Nvidia, lets programmers who do not specialize in computer graphics benefit from the GPU easily. CUDA is available only on Nvidia GPUs; for other GPUs, OpenCL can handle similar work. OpenCL supports cross-platform programming and can also work together with the CPU.
However, this advantage is evident only when the data set is large. If there is not enough data to amortize the communication cost and the kernel-launch overhead, the GPU may perform worse than a single CPU thread because of the GPU's lower clock rate and the necessary data transfers.
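The trade-off described above can be illustrated with a simple cost model. The following is a minimal sketch in Python, not the thesis's measurement method: it assumes the GPU path pays a fixed launch overhead plus a per-item transfer and compute cost, while the CPU path pays only a per-item compute cost. All function names, parameters, and the numbers in the usage note are hypothetical.

```python
def cpu_time(n_items, cpu_rate):
    """Seconds for one CPU thread to process n_items at cpu_rate items/s."""
    return n_items / cpu_rate


def gpu_time(n_items, gpu_rate, launch_overhead, bytes_per_item, bandwidth):
    """Seconds for the GPU path: fixed launch cost + data transfer + compute."""
    transfer = n_items * bytes_per_item / bandwidth
    return launch_overhead + transfer + n_items / gpu_rate


def break_even_items(cpu_rate, gpu_rate, launch_overhead, bytes_per_item, bandwidth):
    """Workload size (in items) at which the GPU path stops being slower.

    Returns None when the per-item GPU cost (transfer + compute) is not lower
    than the per-item CPU cost, i.e. the GPU never catches up in this model.
    """
    per_item_cpu = 1.0 / cpu_rate
    per_item_gpu = 1.0 / gpu_rate + bytes_per_item / bandwidth
    if per_item_gpu >= per_item_cpu:
        return None
    return launch_overhead / (per_item_cpu - per_item_gpu)
```

For example, with made-up figures of 1e8 items/s on the CPU, 2e9 items/s on the GPU, a 20 microsecond launch overhead, 4 bytes per item, and 6e9 bytes/s of transfer bandwidth, `break_even_items(1e8, 2e9, 20e-6, 4, 6e9)` lands at a little over two thousand items. Below that size the CPU thread wins in this model; well above it the GPU wins.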
ADAS (Advanced Driver Assistance Systems) reminds or helps drivers to act when an emergency occurs in the surroundings. For safety, the ADAS is equipped with a front-view camera that detects nearby cars, pedestrians, and traffic signs. Traditionally, the Adaboost algorithm is often applied to object recognition because it is widely used and yields well-trained results. However, because Adaboost computation is extremely time-consuming, it is difficult to guarantee that the computations reflect the latest information in real time. To ensure safety while the car is moving and the environment keeps changing, we accelerate the original object recognition system with the GPU. In our system, we applied CUDA to accelerate Feature Extraction and Adaboost Training. We do not focus on Adaboost Testing since it is not as complex as Adaboost Training. For these two parts, we adopted different strategies, including how the data is laid out in memory, the number of CUDA streams, the chunk size, and the block size.
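To show why training dominates the cost, here is a minimal CPU-only sketch of discrete AdaBoost in plain Python. It is not the thesis's implementation: the abstract does not specify the weak learner, so decision stumps (one feature, one threshold) are an assumption, and all function names are hypothetical. The exhaustive search in `best_stump` evaluates every candidate independently, which is the kind of loop a GPU version would typically spread across threads.

```python
import math


def stump_predict(row, j, thr, polarity):
    """Decision stump: +polarity if feature j >= thr, otherwise -polarity."""
    return polarity if row[j] >= thr else -polarity


def best_stump(X, y, w):
    """Search every (feature, threshold, polarity) for the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(weight for row, label, weight in zip(X, y, w)
                          if stump_predict(row, j, thr, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def adaboost_train(X, y, rounds):
    """Discrete AdaBoost; labels must be +1 or -1.

    Returns a list of (alpha, feature, threshold, polarity) tuples.
    """
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, polarity = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, polarity))
        # Raise the weight of misclassified samples, lower the rest, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, thr, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, j, thr, p) for a, j, thr, p in model)
    return 1 if score >= 0 else -1
```

Each boosting round rescans all samples for every candidate stump, so the cost per round grows with the product of samples, features, and thresholds. Prediction, by contrast, only evaluates the stumps already selected.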
Finally, our system achieves speedups of 6.12x in Feature Extraction and 34.53x in Adaboost Training on an Nvidia K20c. With GPU-accelerated Adaboost, the ADAS can sense its surroundings more reliably, which improves its accuracy and safety.