Author: Tseng, Yu-Hsuan (曾于瑄)
Title: Badminton Action Recognition Using Skeleton Data and Optical Flow (利用骨架資料與光流進行羽球動作辨識)
Advisor: Lee, Che-Rung (李哲榮)
Committee: Hsu, Chiou-Ting (許秋婷); Shen, Chih-Ya (沈之涯)
Degree: Master
Department: Computer Science, College of Electrical Engineering and Computer Science
Year of publication: 2019
Graduation academic year: 107 (2018–2019)
Language: English
Pages: 17
Keywords: Action Recognition, Optical Flow, Skeleton Detection
The ability to analyze actions in videos is essential for the automatic understanding of sports. With the development of action recognition, analyzing video actions precisely has become feasible. Although there is abundant research on action recognition, little of it focuses on badminton or broadcast sports videos. In this paper, we collect a dataset of broadcast badminton videos and propose a two-stream architecture for classifying badminton actions. First, we use two features, optical flow and skeleton data, as the inputs of the two-stream architecture; since badminton movements and skills are very complicated, we choose features that capture human actions effectively. The extracted features are then fed into a VGG network and a bi-directional long short-term memory (Bi-LSTM) model for training and classification. VGG is a convolutional neural network, used here for feature extraction in the optical flow stream. The Bi-LSTM, used in both streams, captures the dynamic temporal information of the video sequences. The two-stream architecture achieves 94.3% accuracy on our badminton dataset, which demonstrates its applicability to the analysis of broadcast sports videos.
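The following is a minimal PyTorch sketch of the two-stream design the abstract describes: per-frame VGG features summarized by a Bi-LSTM for the optical flow stream, and raw 2D joint coordinates (e.g., OpenPose-style skeletons [1]) fed to a second Bi-LSTM. The layer sizes, the 18-joint skeleton format, the 10-class output, rendering flow as 3-channel images, and fusion by concatenation are all assumptions for illustration, not details taken from the thesis.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16


class FlowStream(nn.Module):
    """Optical-flow stream: per-frame VGG-16 features summarized by a Bi-LSTM."""

    def __init__(self, hidden=256):
        super().__init__()
        backbone = vgg16()  # VGG-16 [2]; pre-trained weights omitted in this sketch
        self.cnn = nn.Sequential(
            backbone.features,        # conv layers only; 224x224 input -> (N, 512, 7, 7)
            nn.AdaptiveAvgPool2d(1),  # -> (N, 512, 1, 1)
            nn.Flatten(),             # -> (N, 512)
        )
        self.rnn = nn.LSTM(512, hidden, batch_first=True, bidirectional=True)

    def forward(self, flow):          # flow: (B, T, 3, 224, 224), flow as 3-channel images
        b, t = flow.shape[:2]
        feats = self.cnn(flow.flatten(0, 1)).view(b, t, -1)  # (B, T, 512)
        out, _ = self.rnn(feats)      # (B, T, 2 * hidden)
        return out[:, -1]             # last time step as the clip-level summary


class SkeletonStream(nn.Module):
    """Skeleton stream: per-frame 2D joint coordinates fed directly to a Bi-LSTM."""

    def __init__(self, joints=18, hidden=128):  # 18 joints is an assumed format
        super().__init__()
        self.rnn = nn.LSTM(joints * 2, hidden, batch_first=True, bidirectional=True)

    def forward(self, skel):          # skel: (B, T, joints * 2), (x, y) per joint
        out, _ = self.rnn(skel)
        return out[:, -1]             # (B, 2 * hidden)


class TwoStreamClassifier(nn.Module):
    """Fuses both stream summaries by concatenation and classifies the action."""

    def __init__(self, num_actions=10):  # class count is a placeholder
        super().__init__()
        self.flow_stream = FlowStream()
        self.skel_stream = SkeletonStream()
        self.head = nn.Linear(2 * 256 + 2 * 128, num_actions)

    def forward(self, flow, skel):
        fused = torch.cat([self.flow_stream(flow), self.skel_stream(skel)], dim=1)
        return self.head(fused)       # per-class logits


# Smoke test with random tensors: 2 clips of 16 frames each.
model = TwoStreamClassifier()
logits = model(torch.randn(2, 16, 3, 224, 224), torch.randn(2, 16, 36))
print(logits.shape)  # torch.Size([2, 10])
```

Concatenating the two clip-level summaries is only one plausible late-fusion choice; the abstract states just that both streams use a Bi-LSTM, so averaging per-stream scores in the style of the original two-stream network [13] would be an equally reasonable reading.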
[1] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-person 2D pose estimation using part affinity fields,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul. 2017, pp. 7291–7299.
[2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014.
[3] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, Nov. 1997.
[4] H. Y. Ting, K. S. Sim, and F. S. Abas, “Kinect-based badminton movement recognition and analysis system,” International Journal of Computer Science in Sport, vol. 14, pp. 25–41, Jan. 2015.
[5] W. F. Wang, C. Y. Yang, and D. Y. Wang, “Analysis of movement effectiveness in badminton strokes with accelerometers,” in Genetic and Evolutionary Computing, 2016, pp. 95–104.
[6] J. Lin, C. Chang, C. Wang, H. Chi, C. Yi, Y. Tseng, and C. Wang, “Design and implement a mobile badminton stroke classification system,” in 2017 19th Asia-Pacific Network Operations and Management Symposium (APNOMS), Sep. 2017, pp. 235–238.
[7] S. Ramasinghe, K. G. M. Chathuramali, and R. Rodrigo, “Recognition of badminton strokes using dense trajectories,” in 7th International Conference on Information and Automation for Sustainability, Dec. 2014, pp. 1–6.
[8] G. Liu, D. Zhang, and H. Li, “Research on action recognition of player in broadcast sports video,” International Journal of Multimedia and Ubiquitous Engineering, vol. 9, no. 10, pp. 297–306, 2014.
[9] A. Ghosh, S. Singh, and C. V. Jawahar, “Towards structured analysis of broadcast badminton videos,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Mar. 2018, pp. 296–304.
[10] A. Toshev and C. Szegedy, “DeepPose: Human pose estimation via deep neural networks,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2014, pp. 1653–1660.
[11] J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler, “Joint training of a convolutional network and a graphical model for human pose estimation,” in Advances in Neural Information Processing Systems 27, 2014, pp. 1799–1807.
[12] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in The IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp. 2961–2969.
[13] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in Advances in Neural Information Processing Systems 27, 2014, pp. 568–576.
[14] C. Feichtenhofer, A. Pinz, and A. Zisserman, “Convolutional two-stream network fusion for video action recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 1933–1941.
[15] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3D convolutional networks,” in The IEEE International Conference on Computer Vision (ICCV), Dec. 2015, pp. 4489–4497.
[16] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool, “Temporal segment networks: Towards good practices for deep action recognition,” in European Conference on Computer Vision (ECCV), 2016, pp. 20–36.
[17] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, “Long-term recurrent convolutional networks for visual recognition and description,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, pp. 2625–2634.
[18] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision (ECCV), 2014, pp. 740–755.
[19] S. S. Beauchemin and J. L. Barron, “The computation of optical flow,” ACM Comput. Surv., vol. 27, no. 3, pp. 433–466, Sep. 1995.
[20] D. Fleet and Y. Weiss, “Optical flow estimation,” in Handbook of Mathematical Models in Computer Vision, N. Paragios, Y. Chen, and O. Faugeras, Eds. Springer US, 2006, pp. 237–257.
[21] H. Sak, A. W. Senior, and F. Beaufays, “Long short-term memory recurrent neural network architectures for large scale acoustic modeling,” in INTERSPEECH, 2014.
[22] H. Wang and C. Schmid, “Action recognition with improved trajectories,” in The IEEE International Conference on Computer Vision (ICCV), Dec. 2013.