
Author: 曾于瑄 (Tseng, Yu-Hsuan)
Title: 利用骨架資料與光流進行羽球動作辨識 (Badminton Action Recognition Using Skeleton Data and Optical Flow)
Advisor: 李哲榮 (Lee, Che-Rung)
Committee Members: 許秋婷 (Hsu, Chiou-Ting), 沈之涯 (Shen, Chih-Ya)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 107 (2018–2019)
Language: English
Pages: 17
Keywords (Chinese): 動作辨識, 光流法, 骨架偵測
Keywords (English): Action Recognition, Optical Flow, Skeleton Detection

Abstract:

The ability to analyze actions in videos is essential for the automatic understanding of sports. With the development of action recognition, analyzing videos precisely has become feasible. Although there is abundant research on action recognition, little of it focuses on badminton or on broadcast sports videos. In this paper, we collect a dataset of broadcast badminton videos and propose a two-stream architecture for classifying badminton actions. First, we use two features, optical flow and skeleton data, as the input of the two-stream architecture; since the movements and skills of badminton are very complicated, we choose features that capture human actions effectively. Then, the extracted features are fed into a VGG network and a bi-directional long short-term memory (Bi-LSTM) model for training and classification. VGG is a convolutional neural network, used here in the optical-flow stream for feature extraction. A Bi-LSTM is used in both streams to capture the dynamic temporal information of the video sequences. The two-stream architecture achieves 94.3% accuracy on our badminton dataset and is readily applicable to the analysis of various broadcast sports videos.
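
To make the described pipeline concrete, below is a minimal PyTorch sketch of a two-stream classifier of this shape. The joint count (18, as produced by OpenPose-style pose estimators [1]), the number of action classes, the hidden size, and the late-fusion step are illustrative assumptions rather than the thesis's exact configuration; the sketch also assumes optical flow is rendered as 3-channel images so the stock VGG-16 input layer applies.

    import torch
    import torch.nn as nn
    from torchvision.models import vgg16

    class TwoStreamActionClassifier(nn.Module):
        """Sketch: VGG + Bi-LSTM on optical flow, Bi-LSTM on skeleton data."""

        def __init__(self, num_classes=10, num_joints=18, hidden=128):
            super().__init__()
            # Optical-flow stream: VGG-16 [2] maps each (3-channel) flow
            # frame to a 512-dimensional feature vector.
            vgg = vgg16(weights=None)
            self.flow_cnn = nn.Sequential(vgg.features,
                                          nn.AdaptiveAvgPool2d(1),
                                          nn.Flatten())
            self.flow_lstm = nn.LSTM(512, hidden, batch_first=True,
                                     bidirectional=True)
            # Skeleton stream: per-frame (x, y) joint coordinates feed a
            # bi-directional LSTM [3] directly.
            self.skel_lstm = nn.LSTM(num_joints * 2, hidden, batch_first=True,
                                     bidirectional=True)
            # Late fusion (an assumption): concatenate the last Bi-LSTM
            # output of each stream before classification.
            self.classifier = nn.Linear(4 * hidden, num_classes)

        def forward(self, flow, skeleton):
            # flow: (batch, time, 3, H, W); skeleton: (batch, time, joints * 2)
            b, t = flow.shape[:2]
            f = self.flow_cnn(flow.flatten(0, 1)).view(b, t, -1)
            f, _ = self.flow_lstm(f)          # (b, t, 2 * hidden)
            s, _ = self.skel_lstm(skeleton)   # (b, t, 2 * hidden)
            fused = torch.cat([f[:, -1], s[:, -1]], dim=-1)
            return self.classifier(fused)     # action-class logits

    # Example: one clip of 16 frames at 224x224 with 18 tracked joints.
    model = TwoStreamActionClassifier()
    logits = model(torch.randn(1, 16, 3, 224, 224), torch.randn(1, 16, 36))

Mirroring the abstract, the Bi-LSTM appears in both streams to capture temporal dynamics, while VGG serves only the optical-flow stream as a per-frame feature extractor.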

Table of Contents:
摘要 (Abstract in Chinese)
Abstract
Contents
1 Introduction
2 Related Work
    2.1 Analysis of Badminton and Broadcast Video
    2.2 Skeleton Detection
    2.3 Action Recognition
3 Methods
    3.1 Skeleton Data
    3.2 Optical Flow
    3.3 VGG
    3.4 Bi-LSTM
4 Experiments
    4.1 Dataset
    4.2 Result of Skeleton Detection
    4.3 Classification Results
5 Conclusion
Bibliography

References:

[1] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-person 2D pose estimation using part affinity fields,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul. 2017, pp. 7291–7299.
[2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014.
[3] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, Nov. 1997.
[4] H. Y. Ting, K. S. Sim, and F. S. Abas, “Kinect-based badminton movement recognition and analysis system,” International Journal of Computer Science in Sport, vol. 14, pp. 25–41, Jan. 2015.
[5] W. F. Wang, C. Y. Yang, and D. Y. Wang, “Analysis of movement effectiveness in badminton strokes with accelerometers,” in Genetic and Evolutionary Computing, 2016, pp. 95–104.
[6] J. Lin, C. Chang, C. Wang, H. Chi, C. Yi, Y. Tseng, and C. Wang, “Design and implement a mobile badminton stroke classification system,” in 2017 19th Asia-Pacific Network Operations and Management Symposium (APNOMS), Sep. 2017, pp. 235–238.
[7] S. Ramasinghe, K. G. M. Chathuramali, and R. Rodrigo, “Recognition of badminton strokes using dense trajectories,” in 7th International Conference on Information and Automation for Sustainability, Dec. 2014, pp. 1–6.
[8] G. Liu, D. Zhang, and H. Li, “Research on action recognition of player in broadcast sports video,” International Journal of Multimedia and Ubiquitous Engineering, vol. 9, no. 10, pp. 297–306, 2014.
[9] A. Ghosh, S. Singh, and C. V. Jawahar, “Towards structured analysis of broadcast badminton videos,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Mar. 2018, pp. 296–304.
[10] A. Toshev and C. Szegedy, “DeepPose: Human pose estimation via deep neural networks,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2014, pp. 1653–1660.
[11] J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler, “Joint training of a convolutional network and a graphical model for human pose estimation,” in Advances in Neural Information Processing Systems 27, 2014, pp. 1799–1807.
[12] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in The IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp. 2961–2969.
[13] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in Advances in Neural Information Processing Systems 27, 2014, pp. 568–576.
[14] C. Feichtenhofer, A. Pinz, and A. Zisserman, “Convolutional two-stream network fusion for video action recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 1933–1941.
[15] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3D convolutional networks,” in The IEEE International Conference on Computer Vision (ICCV), Dec. 2015, pp. 4489–4497.
[16] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool, “Temporal segment networks: Towards good practices for deep action recognition,” in European Conference on Computer Vision (ECCV), 2016, pp. 20–36.
[17] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, “Long-term recurrent convolutional networks for visual recognition and description,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, pp. 2625–2634.
[18] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision (ECCV), 2014, pp. 740–755.
[19] S. S. Beauchemin and J. L. Barron, “The computation of optical flow,” ACM Comput. Surv., vol. 27, no. 3, pp. 433–466, Sep. 1995.
[20] D. Fleet and Y. Weiss, “Optical flow estimation,” in Handbook of Mathematical Models in Computer Vision, N. Paragios, Y. Chen, and O. Faugeras, Eds. Springer US, 2006, pp. 237–257.
[21] H. Sak, A. W. Senior, and F. Beaufays, “Long short-term memory recurrent neural network architectures for large scale acoustic modeling,” in INTERSPEECH, 2014.
[22] H. Wang and C. Schmid, “Action recognition with improved trajectories,” in The IEEE International Conference on Computer Vision (ICCV), Dec. 2013.
