| | |
|---|---|
| Graduate Student: | 黃俊揚 (Huang, Chung-Yang) |
| Thesis Title: | 即時人體上半身姿態辨識與肢體動作捕捉 (Real-time Human Upper Body Posture Recognition and Upper Limbs Motion Capturing) |
| Advisors: | 黃仲陵 (Huang, Chung-Lin); 鐘太郎 (Jong, Tai-Lang) |
| Committee Members: | 范國清; 余孝先; 葉家宏 |
| Degree: | Master (碩士) |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering (電機資訊學院 - 電機工程學系) |
| Year of Publication: | 2012 |
| Graduation Academic Year: | 101 |
| Language: | Chinese |
| Pages: | 48 |
| Keywords: | 深度影像 (depth image), 人體辨識 (human body recognition), 姿態辨識 (posture recognition), 即時 (real-time) |
This thesis proposes and implements a real-time system that recognizes human upper-body actions and postures. The intended usage environment is a user sitting in front of a computer, with the depth camera mounted on top of the screen. (Given the camera position when the user interacts in front of the computer, the camera can capture only the upper body.)
The system input is a real-time depth image stream captured with Microsoft's Kinect depth camera, arguably the best consumer depth camera currently available: it is inexpensive, offers millimeter-level (mm) depth precision, and produces depth images in real time.
The system has two outputs: the user's current action (normal sitting, raising the left hand, raising the right hand, raising both hands) and the estimated positions of the user's body parts (face, shoulders, arms, elbows, palms, torso, etc.). The overall system runs at roughly 14 to 18 FPS, depending on the total number of points to be processed in the current frame.
The system architecture consists of two stages:
In the first stage, the depth image is pre-processed, features are extracted (a self-defined descriptor called Depth Context), and the features are fed to an action classifier (Random Forest [16]) to recognize the user's current action; finally, the temporal dependency between actions is used to correct and output the current action.
In the second stage, according to the action recognized in the first stage, an appropriate body-part classifier (pixel-based Random Forest) is selected, and the pre-processed depth image is fed to it to identify the current distribution of the user's body parts. The temporal dependency of each body part is then taken into account to correct parts that may be occluded, and the estimated positions of the user's body parts are output.
We propose a real-time system that recognizes human upper-body posture and predicts the positions of the upper-limb joints. The system is intended for a user sitting in front of a computer, with the camera mounted on top of the screen. Since the user interacts with the system in front of the computer, the camera can capture only the upper body.
The system input is a real-time depth image stream captured by the Microsoft Kinect depth camera.
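The thesis does not name the capture API. As a minimal sketch, assuming the open-source libfreenect Python bindings (the original system may well use the official Kinect SDK or OpenNI instead), a depth frame can be pulled as follows:

```python
# Minimal depth-frame capture sketch. Assumption: libfreenect's Python
# bindings; the thesis does not say which Kinect API it actually uses.
import numpy as np
import freenect

def grab_depth_frame():
    """Blockingly fetch one 640x480 depth frame (11-bit raw values)."""
    depth, _timestamp = freenect.sync_get_depth()
    return np.asarray(depth, dtype=np.uint16)

if __name__ == "__main__":
    frame = grab_depth_frame()
    print(frame.shape, int(frame.min()), int(frame.max()))
```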
The system has two outputs: the user's current action (normal sitting, raising the left hand, raising the right hand, raising both hands) and the estimated locations of body parts (face, shoulders, arms, elbows, palms, torso, etc.). The overall system runs at about 14 to 18 FPS, varying with the total number of pixels that must be processed per frame.
The system architecture can be divided into two stages:
In the first stage, the depth image is pre-processed, features are extracted (the self-defined Depth Context descriptor), and the features are analyzed by an action classifier (Random Forest [16]) to identify the current action type. Temporal dependency is then applied to correct the action type.
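The Depth Context descriptor is the author's own design and is defined only in the thesis body, so the sketch below substitutes a simple radial histogram of depth differences as a stand-in (an assumption, loosely inspired by Shape Context [3]), and uses scikit-learn's RandomForestClassifier plus a sliding-window majority vote for the temporal correction:

```python
# Stage-1 sketch: placeholder feature + Random Forest action classifier
# + temporal majority vote. The real Depth Context feature differs;
# this radial depth-difference histogram is only a stand-in.
from collections import Counter, deque
import numpy as np
from sklearn.ensemble import RandomForestClassifier

ACTIONS = ["normal_sit", "raise_left", "raise_right", "raise_both"]

def placeholder_depth_context(depth, center, radii=(10, 20, 40), bins=8):
    """Stand-in feature: histograms of depth differences sampled on
    rings around `center`."""
    cy, cx = center
    feats = []
    for r in radii:
        angles = np.linspace(0.0, 2 * np.pi, 32, endpoint=False)
        ys = np.clip((cy + r * np.sin(angles)).astype(int), 0, depth.shape[0] - 1)
        xs = np.clip((cx + r * np.cos(angles)).astype(int), 0, depth.shape[1] - 1)
        diffs = depth[ys, xs].astype(np.float32) - float(depth[cy, cx])
        hist, _ = np.histogram(diffs, bins=bins, range=(-500, 500))
        feats.append(hist / max(hist.sum(), 1))  # normalize each ring
    return np.concatenate(feats)

# Action classifier: one feature vector per frame, one of the four labels.
action_clf = RandomForestClassifier(n_estimators=100)

_recent = deque(maxlen=5)

def smooth_action(pred):
    """Temporal correction: majority vote over the last few frames."""
    _recent.append(pred)
    return Counter(_recent).most_common(1)[0][0]
```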
In the second stage, according to the recognized action type, we select an appropriate body-part classifier (pixel-based Random Forest) to classify the pre-processed depth image and identify the distribution of body parts. Considering the temporal dependency of each body part, we then correct parts that may be occluded and determine the estimated positions of the body parts.
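"Pixel based Random Forest" suggests per-pixel classification in the spirit of Shotton et al. [1]. The sketch below uses their depth-difference probe features as a plausible stand-in; the offset scale, part labels, and forest parameters are illustrative assumptions, not the thesis's exact design:

```python
# Stage-2 sketch: per-pixel body-part classification with depth-difference
# probe features in the style of Shotton et al. [1].
import numpy as np
from sklearn.ensemble import RandomForestClassifier

PARTS = ["face", "shoulder", "arm", "elbow", "palm", "torso", "background"]
_rng = np.random.default_rng(0)
# 32 random probe-offset pairs (u, v); offsets are divided by the depth at
# the query pixel so the feature is roughly depth-invariant (as in [1]).
OFFSETS = _rng.integers(-60, 61, size=(32, 2, 2))

def pixel_features(depth, y, x):
    """f(p) = d(p + u/d(p)) - d(p + v/d(p)) for each offset pair (u, v).
    Depth is assumed to be in millimetres; out-of-image probes get a
    large constant so they behave like background."""
    h, w = depth.shape
    d = max(float(depth[y, x]), 1.0)
    feats = np.empty(len(OFFSETS), dtype=np.float32)
    for i, (u, v) in enumerate(OFFSETS):
        uy, ux = int(y + u[0] * 1000.0 / d), int(x + u[1] * 1000.0 / d)
        vy, vx = int(y + v[0] * 1000.0 / d), int(x + v[1] * 1000.0 / d)
        du = float(depth[uy, ux]) if 0 <= uy < h and 0 <= ux < w else 1e4
        dv = float(depth[vy, vx]) if 0 <= vy < h and 0 <= vx < w else 1e4
        feats[i] = du - dv
    return feats

# One forest per action type, since the thesis selects a classifier
# according to the recognized action.
part_clf = RandomForestClassifier(n_estimators=50, max_depth=20)
```

Per-part centroids of the predicted pixel labels (or mean-shift modes, as in [1]) would then give the estimated joint positions.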
[1] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time human pose recognition in parts from single depth images. In Proc. CVPR, 2011.
[2] V. Ganapathi, C. Plagemann, D. Koller, and S. Thrun. Real time motion capture using a single time-of-flight camera. In Proc. CVPR, 2010.
[3] S. Belongie, J. Malik, and J. Puzicha. Shape Matching and Object Recognition Using Shape Contexts. PAMI, 24(4):509–522, 2002.
[4] A. Baak, M. Müller, G. Bharaj, H.-P. Seidel, and C. Theobalt. A data-driven approach for real-time full body pose reconstruction from a depth camera. In Proc. ICCV, 2011.
[5] K. Hara. Real-time Inference of 3D Human Poses by Assembling Local Patches. WACV, 2009.
[6] T. Moeslund, A. Hilton, and V. Krüger. A survey of advances in vision-based human motion capture and analysis. CVIU, 2006.
[7] R. Poppe. Vision-based human motion analysis: An overview. CVIU, 108, 2007.
[8] Microsoft Corp. Redmond WA. Kinect for Xbox 360.
[9] D. Grest, J. Woetzel, and R. Koch. Nonlinear body pose estimation from depth images. In Proc. DAGM, 2005.
[10] S. Knoop, S. Vacek, and R. Dillmann. Sensor fusion for 3D human body tracking with an articulated 3D body model. In Proc. ICRA, 2006.
[11] Y. Zhu and K. Fujimura. Constrained optimization for human pose estimation from depth sequences. In Proc. ACCV, 2007.
[12] M. Siddiqui and G. Medioni. Human pose estimation from a single view point, real-time range sensor. In CVCG at CVPR, 2010.
[13] C. Plagemann, V. Ganapathi, D. Koller, and S. Thrun. Real-time identification and localization of body parts from depth images. In Proc. ICRA, 2010.
[14] G. Mori and J. Malik. Estimating human body configurations using shape context matching. In Proc. ICCV, 2003
[15] L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3D human pose annotations. In Proc. ICCV, 2009.
[16] L. Breiman. Random forests. Mach. Learning, 45(1):5–32, 2001.
[17] V. Lepetit, P. Lagger, and P. Fua. Randomized trees for real-time keypoint recognition. In Proc. CVPR, pages 2:775–781, 2005.
[18] J. Shotton, M. Johnson, and R. Cipolla. Semantic texton forests for image categorization and segmentation. In Proc. CVPR, 2008.
[19] R. Wang and J. Popović. Real-time hand-tracking with a color glove. In Proc. ACM SIGGRAPH, 2009.
[20] C. Bregler and J. Malik. Tracking people with twists and exponential maps. In Proc. CVPR, 1998.
[21] D. Grest, J. Woetzel, and R. Koch. Nonlinear body pose estimation from depth images. In Proc. DAGM, 2005.
[22] L. Sigal, S. Bhatia, S. Roth, M. Black, and M. Isard. Tracking loose-limbed people. In Proc. CVPR, 2004.
[23] T. Sharp. Implementing decision trees and forests on a GPU. In Proc. ECCV, 2008.
[24] T. K. Ho. Random decision forests. In Proc. 3rd International Conference on Document Analysis and Recognition, Montreal, QC, pages 278–282, 1995.