
Graduate Student: Yu, Shin-Han (余信翰)
Thesis Title: Vision-based Continuous Sign Language Recognition using Product Hidden Markov Models (基於視覺利用乘積隱藏式馬可夫模型手語辨識)
Advisor: Huang, Chung-Lin (黃仲陵)
Oral Examination Committee: (not listed)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2010
Graduation Academic Year: 99 (ROC calendar)
Language: English
Number of Pages: 60
Chinese Keywords: 手語辨識 (sign language recognition), 乘積馬可夫模型 (product Markov model)
Foreign Keywords: sign language recognition, product hidden Markov model
Sign language is one of the basic tools of daily communication for the hearing impaired, and this motivates us to design a sign language recognition system. In this thesis, we use vision-based product hidden Markov models to recognize sign language vocabulary. According to linguistic studies of articulation, a gesture in sign language is composed of three kinds of phonemes: hand location, hand shape, and hand movement direction. The system is divided into four parts: feature extraction, model training, sentence segmentation, and recognition. For feature extraction, the signer wears gloves of different colors, and the Continuously Adaptive Mean Shift (CamShift) algorithm is used to track the hands; for both hands we extract the 7 Hu moments and the orientation of the major axis to describe the hand shape. A product hidden Markov model (PHMM) is then trained for each sign word. For sentence segmentation, this thesis proposes a two-layer segmentation of continuous sign language sentences: the first layer uses hand location to roughly segment the sentence, attaching the relevant information to each resulting segment; the second layer uses hand-shape changes to refine the sign-word boundaries within each segment. Finally, for recognition, the observation sequences obtained from this segmentation are scored against the trained PHMMs, and the model with the highest probability is selected as the recognized sign.
In the experiments, we select 40 Taiwan Sign Language sign words as our vocabulary, and videos recorded by each subject serve as our test samples. On average, our system achieves a word recognition rate of 94.04%. In another experiment, we collect three Taiwan Sign Language sentences, each composed of 18~23 sign words on average; the average movement epenthesis (ME) detection recall is 74.5% and the precision is 89%.
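The 7 Hu moment invariants and the major-axis orientation mentioned above can be computed directly from image moments. The sketch below is an illustrative NumPy reconstruction of those standard formulas, not the thesis's own code; function names are hypothetical.

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of a 2-D binary/intensity image (x = column, y = row)."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xbar = (x * img).sum() / m00
    ybar = (y * img).sum() / m00
    return ((x - xbar) ** p * (y - ybar) ** q * img).sum()

def hu_moments(img):
    """The 7 Hu moment invariants (invariant to translation, scale, rotation)."""
    m00 = img.sum()
    def eta(p, q):  # scale-normalized central moment
        return central_moment(img, p, q) / m00 ** ((p + q) / 2 + 1)
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    h3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    h4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    h5 = ((n30 - 3 * n12) * (n30 + n12)
          * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
          + (3 * n21 - n03) * (n21 + n03)
          * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    h6 = ((n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
          + 4 * n11 * (n30 + n12) * (n21 + n03))
    h7 = ((3 * n21 - n03) * (n30 + n12)
          * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
          - (n30 - 3 * n12) * (n21 + n03)
          * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    return np.array([h1, h2, h3, h4, h5, h6, h7])

def major_axis_orientation(img):
    """Orientation (radians) of the shape's major axis from 2nd-order moments."""
    mu20 = central_moment(img, 2, 0)
    mu02 = central_moment(img, 0, 2)
    mu11 = central_moment(img, 1, 1)
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
```

Because the Hu moments are translation invariant, the same hand mask tracked to a different image position yields the same 7-dimensional descriptor, which is what makes it suitable as a per-frame hand-shape feature.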


In this thesis, we introduce a vision-based continuous sign language recognition system that recognizes sign language sentences against a simple background. The system consists of four modules: feature extraction, product hidden Markov model (PHMM) training, continuous sign segmentation, and sign word recognition using the PHMMs. To allow real-time moving-hand tracking and hand shape extraction, the signer wears gloves with different colors. The CamShift algorithm is used to track the moving hands in a video. We apply the 7 Hu moments and the orientation of the major axis to characterize the hand shape. After extracting the features, we train our system using PHMMs. Then, we use the hand location to roughly segment the continuous sign sequence. After rough segmentation, we apply hand-shape-based segmentation to divide the continuous sign image sequence into image sub-sequences, and then use the trained PHMMs to recognize the isolated sign words.
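One common way to build a product HMM from two per-stream models (e.g. a hand-shape HMM and a hand-movement HMM) is to take the Cartesian product of their state spaces, with Kronecker products of the stream parameters. The sketch below illustrates that construction under the assumption of independent discrete-output streams; it is a generic reconstruction, not the exact coupling used in the thesis, and all names are hypothetical.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """P(obs) via the forward algorithm for a discrete-output HMM.
    pi: initial state probs, A: row-stochastic transitions, B[s, o]: emissions."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def product_hmm(pi1, A1, B1, pi2, A2, B2):
    """Couple two stream HMMs into one PHMM whose states are pairs of
    component states; a joint observation is the pair (o1, o2),
    encoded as the single symbol o1 * M2 + o2."""
    pi = np.kron(pi1, pi2)   # P(i, j) = P(i) * P(j)
    A = np.kron(A1, A2)      # independent stream transitions
    n_states = len(pi)
    # Emission over observation pairs: product of the stream emissions.
    B = np.einsum('ik,jl->ijkl', B1, B2).reshape(
        n_states, B1.shape[1] * B2.shape[1])
    return pi, A, B
```

Recognition then selects the sign word whose trained PHMM assigns the highest forward likelihood to the observed sub-sequence. With fully independent streams, the product model's likelihood factorizes into the product of the two stream likelihoods; the practical appeal of PHMMs is that the streams can be decoded jointly even when their state timing is only loosely synchronized.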
In the experiments, we choose 40 Taiwan Sign Language (TSL) sign words and collect sign language videos made by different signers. The experimental results demonstrate that our system achieves a sign-word recognition accuracy of 94.04%. In another experiment, we collect 3 TSL sentences, each consisting of 18~23 sign words. The results show that the average sign spotting recall rate is 74.5% and the precision rate is 89%.
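The hand-location-based rough segmentation stage described above can be sketched as follows, assuming a normalized per-frame hand height and an illustrative rest-position threshold (both the threshold and the minimum segment length are hypothetical choices; the thesis does not state its exact rule here):

```python
def rough_segments(hand_y, rest_y=0.2, min_len=5):
    """Split a continuous signing sequence into candidate sign segments.
    hand_y: per-frame hand height in [0, 1], origin at the bottom of the frame.
    Frames where the hand stays near the rest position (hand_y <= rest_y)
    act as separators; runs of active frames become (start, end) segments."""
    segments, start = [], None
    for t, y in enumerate(hand_y):
        active = y > rest_y
        if active and start is None:
            start = t                      # segment opens when the hand rises
        elif not active and start is not None:
            if t - start >= min_len:       # discard spuriously short runs
                segments.append((start, t))
            start = None
    if start is not None and len(hand_y) - start >= min_len:
        segments.append((start, len(hand_y)))  # sequence ends mid-sign
    return segments
```

Each coarse segment produced this way would then be handed to the hand-shape-based fine segmentation for precise sign-word boundaries.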

Contents

Contents ... i
List of Figures ... iii
List of Tables ... v
CHAPTER 1 INTRODUCTION ... 1
  1.1 Motivation ... 1
  1.2 Related Works ... 2
  1.3 System Overview ... 4
  1.4 Organization of this Thesis ... 6
CHAPTER 2 EXTRACTION OF MANUAL FEATURES ... 7
  2.1 Hand Tracking Using CamShift ... 7
    2.1.1 Object Model ... 8
    2.1.2 Histogram Back-Projection ... 9
  2.2 Hand Locations ... 12
  2.3 Hand Segmentation ... 13
  2.4 Hybrid-Feature Vector Composition ... 14
    2.4.1 7 Hu Moments ... 16
    2.4.2 Orientation of Major Axis ... 18
  2.5 Posture Recognition ... 22
  2.6 Orientation of Hand Trajectory Quantization ... 23
CHAPTER 3 STATISTICAL MODEL ... 26
  3.1 Hidden Markov Models ... 26
  3.2 Classification Using HMM ... 29
  3.3 Hand-Shape-Movement Product Model ... 30
CHAPTER 4 CONTINUOUS SIGN LANGUAGE RECOGNITION ... 34
  4.1 Hand-Location-based Rough Segmentation ... 35
  4.2 Hand-Shape-based Fine Segmentation ... 36
  4.3 Merge of the Fine Segments ... 41
  4.4 Isolated Sign Recognition ... 44
CHAPTER 5 EXPERIMENTAL RESULTS ... 46
  5.1 Experimental Environments ... 46
  5.2 SVM Classification Experiments ... 46
  5.3 Identification of Isolated Signs ... 49
  5.4 Identification of Continuous Sign Language ... 53
CHAPTER 6 CONCLUSION AND FUTURE WORKS ... 57
REFERENCES ... 58


Full-Text Availability: The full text is not authorized for public release (campus and off-campus networks).