
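Author: 吳銘揚 (Ming-Yang Wu)
Thesis title: HUMOR: a HUman MOtion Retrieval system with multi-modal queries (多模式查詢之人體動作擷取系統)
Advisor: 楊熙年 (Shi-Nine Yang)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: English
Number of pages: 68
Chinese keywords: 內容擷取, 人體動作, 多模式
English keywords: content-based retrieval, human motion, multi-modal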
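  • In this thesis, we present a human motion retrieval system with multi-modal interaction. Besides offering several convenient interactive interfaces, the system helps users retrieve the human motions they need quickly and effectively. For the interface, we designed four intuitive, easy-to-use input modes for queries: keywords, key postures, photographs, and example motion clips. Users can pick a motion clip from the database or from earlier query results and use it as the query example. They can enter keywords to find motion clips that have already been annotated. If they have a posture in mind, they can select postures from the database, or generate them from a 2D photograph, as the starting and ending 3D postures of the desired motion, and then adjust those postures to match their intent more closely before querying. Users can then view the results through two output interfaces: animation and motion images. For the retrieval method, we designed indexing and matching mechanisms around the characteristics of human motion. First, we extract from each posture affine invariant parameters that are unaffected by translation, scaling, and rotation, and we use a self-organizing map, which preserves the original topology of the cluster centers, to generate an index map. To avoid long search times in a high-dimensional space, we replace the high-dimensional parameters of the whole body with the direct sum of two-dimensional parameters of the individual body segments. Next, we convert the starting and ending postures given by the user into their index values, find the possible motion clips between those two positions on the index map, and compute similarity with dynamic time warping. Finally, experimental results demonstrate the usefulness of the multi-modal interactive interface and the effectiveness of the retrieval method.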
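The per-segment index map can be illustrated with a minimal sketch. It assumes each body segment is already described by a 2D feature vector; the thesis's actual affine invariant features are not given in this record, so the numbers, the grid size, and the names `train_som`, `best_matching_unit`, and `index_posture` are all hypothetical. The sketch trains one small self-organizing map per segment and indexes a whole-body posture as one map cell per segment, rather than searching a single high-dimensional space.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def best_matching_unit(weights, x):
    """Return the grid cell (row, col) whose weight vector is closest to x."""
    return min(weights, key=lambda cell: sq_dist(weights[cell], x))


def train_som(samples, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map on 2D feature vectors.

    Illustrative only: the schedule (linear decay of the learning rate and
    of the Gaussian neighbourhood width) is an assumption, not the thesis's.
    """
    rng = random.Random(seed)
    # Initialise every cell from a randomly chosen training sample.
    weights = {(r, c): list(rng.choice(samples))
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            # Pull the winner and its grid neighbours toward the sample,
            # which is what keeps nearby cells similar (topology preserving).
            for cell, w in weights.items():
                grid_d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def index_posture(segment_maps, posture):
    """Map a posture {segment: 2D feature} to {segment: map cell}."""
    return {seg: best_matching_unit(segment_maps[seg], posture[seg])
            for seg in segment_maps}
```

With one map per segment, a query posture is looked up with `index_posture(segment_maps, posture)`, and each lookup only compares 2D vectors against the cells of a small grid.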


    In this thesis, a content-based HUman MOtion Retrieval system, HUMOR, is presented. HUMOR provides multiple convenient and intuitive interfaces for interaction, together with effective indexing and matching algorithms for retrieval. In HUMOR, users can choose among several input modes, including text, stickman, images, and motion clips, to specify their queries. They can then inspect the retrieval results as graphics images or animation video. In addition, a novel retrieval approach, comprising indexing and matching, is devised to facilitate the search for human motion. In indexing, we introduce an affine invariant posture representation and propose a SOM-based index map built according to the distribution of the raw data. To avoid the curse of dimensionality, the high-dimensional feature space of the whole body is decomposed into the direct sum of low-dimensional feature spaces of skeletal segments. In matching, the start frame and the end frame of the user query are used to find candidate clips from the given motion collection. The similarity between the user query and each candidate clip is then computed using a dynamic time warping algorithm. The usability of the multi-modal user interface and the effectiveness of the proposed retrieval approach are demonstrated.
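    The matching step can be sketched as follows. This is a textbook dynamic time warping recurrence under stated assumptions, not the thesis's implementation: frames are plain numeric feature vectors, the per-frame cost is Euclidean distance, and the names `dtw_distance` and `rank_clips` are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two frame feature vectors (assumed cost)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_distance(query, clip):
    """Accumulated dynamic time warping cost between two frame sequences."""
    n, m = len(query), len(clip)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of query[:i] with clip[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], clip[j - 1])
            # Allow a frame to be repeated in either sequence, or both to advance.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def rank_clips(query, candidates):
    """Return (name, cost) pairs for candidate clips, most similar first."""
    scored = [(name, dtw_distance(query, frames))
              for name, frames in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])
```

    Because the warping path may repeat frames, a candidate that performs the same motion more slowly than the query can still score a low cost, which is the reason to prefer this measure over a frame-by-frame comparison of equal-length sequences.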

    Chinese Abstract
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    Chapter 2 Related Work
        2.1 Content-Based Video Retrieval
        2.2 Human Motion Analysis
    Chapter 3 System Overview
    Chapter 4 Multi-modal query interfaces
        4.1 Example Clip Input Interface
        4.2 Text Input Interface
        4.3 Stickman Input Interface
        4.4 Image Input Interface
            4.4.1 Human Posture Reconstruction from a Single Image
            4.4.2 Posture Library Preprocessing
                4.4.2.1 Posture Feature Representation
                4.4.2.2 Posture Table Creation
            4.4.3 Human Posture Reconstruction
                4.4.3.1 Pivotal Posture Retrieval
                4.4.3.2 Constraint-Based Reconstruction
                    4.4.3.2.1 Physical Constraint
                    4.4.3.2.2 Environmental Constraint
            4.4.4 Experimental Results
                4.4.4.1 Performances
                4.4.4.2 Discussion
        4.5 Graphics Images Output Interface
        4.6 Animation Video Output Interface
    Chapter 5 Human Motion Retrieval
        5.1 Indexing
            5.1.1 Index Map Construction
        5.2 Matching
            5.2.1 Candidate Clip Searching
            5.2.2 Dynamic Time Warping
    Chapter 6 Experimental Results
        6.1 Retrieval Scenarios
        6.2 Retrieval Accuracy
        6.3 Retrieval Time
    Chapter 7 Conclusions and Future Work
    References

    List of Figures
    Chapter 2
        Fig. 2.1 Human motion retrieval in a long-length sequence
        Fig. 2.2 Segmented object motion retrieval
    Chapter 3
        Fig. 3.1 System Overview (a) Multi-modal query interfaces (b) Motion retrieval
    Chapter 4
        Fig. 4.1 The reconstruction procedure of the proposed approach
        Fig. 4.2 The hierarchical human model
        Fig. 4.3 Eight projections around a 3D posture of a set of sampling view directions
        Fig. 4.4 The body segment and its projection under scaled orthographic projection
        Fig. 4.5 An example of indexing in a posture table for a body segment
        Fig. 4.6 The 2D human figure in an image (a) Measuring angle and length for each labeled body segment. (b) Estimating the root orientation of the postured character
        Fig. 4.7 Range search in the posture table of a body segment
        Fig. 4.8 Posture reconstruction for the j-th body segment (a) Front view. (b) Top view
        Fig. 4.9 The feet-floor contact constraint (a) Original reconstructed posture. (b) Define the floor and floor fulcrum. (c) Apply the inverse kinematics technique
        Fig. 4.10 A sequence of 2D and 3D key postures of Tai Chi Chuan motion – “Grasp the Swallow’s Tail.”
        Fig. 4.11 Experimental results obtained by applying our reconstruction approach to some images
        Fig. 4.12 Experimental results for some testing photographs
    Chapter 5
        Fig. 5.1 (a) Initial cluster centers; (b) the segment-posture distribution of the left lower arm; (c) the segment-posture distribution of the left lower leg; (d) the index map of the left lower arm; (e) the index map of the left lower leg
        Fig. 5.2 Candidate clip searching
    Chapter 6
        Fig. 6.1 (a) The query example; (b)-(h) the retrieved clips
        Fig. 6.2 (a) The query example; (b)-(f) the retrieved clips
        Fig. 6.3 (a) The query example; (b)-(f) the retrieved clips
        Fig. 6.4 The PR graph for three indexing methods

    List of Tables
    Chapter 4
        Table 4.1 Average RMS errors of pivotal postures and reconstructed postures
        Table 4.2 Average RMS errors of reconstructed postures based on the physical constraint only, and that based on physical and environmental constraints
    Chapter 6
        Table 6.1 The matching time cost for three indexing methods


    Full text: not authorized for public release (on-campus or off-campus network)
