Graduate Student: 俞尚毅 Yu, Shang-Yi
Thesis Title: 探尋影片中的承擔特質 (Affordance Detection in Videos)
Advisor: 陳煥宗 Chen, Hwann-Tzong
Committee Members: 邱維辰 Chiu, Wei-Chen; 胡敏君 Hu, Min-Chun
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of Publication: 2020
Academic Year of Graduation: 108
Language: English
Number of Pages: 27
Chinese Keywords: 影片 (video), 承擔特質 (affordance), 偵測 (detection)
Foreign Keywords: Video, Affordance, Detection
Chinese Abstract: This thesis proposes a new task concerning affordance: locating the regions that carry an affordance and determining whether the affordance exists, for every frame of a video. Previous studies on affordance have focused only on detection in still images. For this new task of detecting affordances in videos, we build a new affordance dataset, the Support Affordance Video (SAV) dataset, which collects videos of support affordance and designs a series of action scenarios so that the existence of the affordance changes with the actions and the surrounding environment. We propose a network architecture that uses two different branches together with a temporal module to predict the affordance attention area, the affordance region, and the affordance existence label in a video. We examine the test results on the SAV dataset to validate the effectiveness of the method.
English Abstract: This thesis proposes a new task on affordance: detecting the affordance region and predicting the existence of the affordance for each frame in a video sequence. Past research on affordance has focused only on detection in single images. For this new task of affordance detection in videos, we build a new affordance dataset, the Support Affordance Video (SAV) dataset. The dataset consists of support affordance videos that exhibit a series of action scenarios, so that the affordance existence status changes as the actions and environments in the scenarios change. We propose a network architecture that uses two different branches and temporal modules to predict the affordance attention area, the affordance region, and the affordance existence label in a video. The experimental results on the SAV dataset provide a baseline for the new task and validate the effectiveness of our method.
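
The abstract describes the architecture only at a high level. To make the two-branch design with a temporal module concrete, below is a minimal PyTorch sketch; the shared backbone, the GRU-based temporal aggregation, the attention gating between the branches, and all layer sizes are illustrative assumptions rather than the network actually used in the thesis.

import torch
import torch.nn as nn

class TwoBranchAffordanceNet(nn.Module):
    """Illustrative sketch only: two branches plus a temporal module,
    predicting an attention map, a region mask, and an existence label
    per frame. Not the thesis's actual architecture."""

    def __init__(self, hidden=64):
        super().__init__()
        # Shared per-frame feature extractor (assumed).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, hidden, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Temporal module: a simple GRU over pooled per-frame features
        # stands in for whatever temporal module the thesis uses.
        self.temporal = nn.GRU(hidden, hidden, batch_first=True)
        # Branch 1: affordance attention area (coarse heatmap logits).
        self.attention_head = nn.Conv2d(hidden, 1, 1)
        # Branch 2: affordance region (per-pixel segmentation logits).
        self.region_head = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, 1),
        )
        # Existence label read out from the temporally aggregated feature.
        self.exist_head = nn.Linear(hidden, 1)

    def forward(self, clip):
        # clip: (B, T, 3, H, W) video frames.
        b, t, c, h, w = clip.shape
        feats = self.backbone(clip.reshape(b * t, c, h, w))      # (B*T, C', H/4, W/4)
        attn = self.attention_head(feats)                        # attention-area logits
        region = self.region_head(feats * torch.sigmoid(attn))   # attention-gated region branch
        pooled = feats.mean(dim=(2, 3)).reshape(b, t, -1)        # (B, T, C')
        temporal_feats, _ = self.temporal(pooled)                 # (B, T, C')
        exist = self.exist_head(temporal_feats)                   # per-frame existence logits
        return (attn.reshape(b, t, 1, *attn.shape[-2:]),
                region.reshape(b, t, 1, *region.shape[-2:]),
                exist)

In this sketch the attention branch gates the region branch and the existence label is predicted from temporally aggregated features, which matches the spirit, though not necessarily the detail, of the description above.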