Graduate Student: 靳文綺 Chin, Wen-Chi
Thesis Title: 利用局部與非局部特徵融合學習稠密對應性 (Learning Dense Correspondences via Local and Non-local Feature Fusion)
Advisor: 陳煥宗 Chen, Hwann-Tzong
Committee Members: 許秋婷 Hsu, Chiu-Ting; 劉庭祿 Liu, Ting-Lu
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of Publication: 2019
Graduation Academic Year: 107 (ROC calendar)
Language: English
Number of Pages: 33
Keywords (Chinese): 稠密對應性, 視覺描述子, 光流, 特徵聚合
Keywords (English): dense correspondence, visual descriptor, optical flow, feature map aggregation
Abstract: We present a learning-based method for extracting distinctive features of objects in video. From the extracted features, we can derive dense correspondences between an object in the current video frame and the same object in a reference template. We train a deep-learning model with non-local blocks to predict dense feature maps that capture long-range dependencies. A new video object correspondence dataset is introduced for both training and evaluation. Furthermore, we propose a feature-aggregation technique that uses the optical flow between consecutive frames to integrate multiple feature maps, reducing the uncertainty of any single map. The optical flow also provides local cues for assessing the reliability of feature matches, so unreliable correspondences can be filtered out. Experimental results show that this fusion of local and non-local information reduces unreliable correspondences and thereby improves matching accuracy.
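The non-local blocks mentioned in the abstract are the attention-style blocks of Wang et al. (CVPR 2018), which let every spatial position attend to every other position so that the predicted feature maps carry global context. Below is a minimal PyTorch sketch of such a block in the embedded-Gaussian form; the halved inner channel count, zero-initialized output projection, and residual placement are common defaults, not necessarily the thesis's exact configuration.

```python
import torch
import torch.nn as nn


class NonLocalBlock2D(nn.Module):
    """Embedded-Gaussian non-local block (after Wang et al., CVPR 2018).

    Every spatial position attends to every other position, so the
    output feature map mixes in long-range, global context.
    """

    def __init__(self, in_channels, inter_channels=None):
        super().__init__()
        self.inter_channels = inter_channels or in_channels // 2
        self.theta = nn.Conv2d(in_channels, self.inter_channels, 1)  # query
        self.phi = nn.Conv2d(in_channels, self.inter_channels, 1)    # key
        self.g = nn.Conv2d(in_channels, self.inter_channels, 1)      # value
        self.out = nn.Conv2d(self.inter_channels, in_channels, 1)
        nn.init.zeros_(self.out.weight)  # block starts as an identity mapping
        nn.init.zeros_(self.out.bias)

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (n, hw, c')
        k = self.phi(x).flatten(2)                    # (n, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (n, hw, c')
        attn = torch.softmax(q @ k, dim=-1)           # (n, hw, hw) affinities
        y = (attn @ v).transpose(1, 2).reshape(n, self.inter_channels, h, w)
        return x + self.out(y)                        # residual connection
```

Such a block is typically dropped into the later stages of a convolutional backbone (e.g., a ResNet), where the hw-by-hw attention matrix is still affordable.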
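The record does not spell out the feature-aggregation rule, only that it integrates multiple feature maps using the optical flow of consecutive frames. A plausible minimal sketch, assuming backward warping of neighboring-frame feature maps onto the current frame followed by uniform averaging, is given below; the warp and aggregate helpers are hypothetical names.

```python
import torch
import torch.nn.functional as F


def warp(feat, flow):
    """Backward-warp a feature map by a dense flow field.

    feat: (n, c, h, w) features from a neighboring frame.
    flow: (n, 2, h, w) flow in pixels, mapping each position in the
    current frame to its location in the neighboring frame.
    """
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij")
    grid_x = xs + flow[:, 0]  # broadcast over the batch dimension
    grid_y = ys + flow[:, 1]
    # grid_sample expects (x, y) coordinates normalized to [-1, 1]
    grid = torch.stack(
        (2 * grid_x / (w - 1) - 1, 2 * grid_y / (h - 1) - 1), dim=-1)
    return F.grid_sample(feat, grid, align_corners=True)


def aggregate(current_feat, neighbor_feats, flows):
    """Average the current feature map with its flow-aligned neighbors."""
    warped = [warp(f, fl) for f, fl in zip(neighbor_feats, flows)]
    return torch.stack([current_feat] + warped).mean(dim=0)
```

Averaging several flow-aligned estimates of the same feature map is what damps the per-frame uncertainty the abstract refers to.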
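Likewise, the abstract says optical flow is used to judge the reliability of feature matches and to discard unreliable points, but the concrete criterion is not given in this record. A common proxy is forward-backward flow consistency, sketched here under that assumption (it reuses the warp helper above; flow_consistency_mask and tol are hypothetical names).

```python
def flow_consistency_mask(flow_fwd, flow_bwd, tol=1.0):
    """Flag pixels whose forward flow is contradicted by the backward flow.

    flow_fwd, flow_bwd: (n, 2, h, w) flows between a frame pair, in pixels.
    A pixel is deemed reliable when following the forward flow and then
    the backward flow (sampled at the forward target) returns close to
    the starting position.
    """
    bwd_at_target = warp(flow_bwd, flow_fwd)              # (n, 2, h, w)
    cycle_error = (flow_fwd + bwd_at_target).norm(dim=1)  # (n, h, w) in pixels
    return cycle_error < tol  # True = keep the correspondence
```

Matches falling outside the mask (e.g., at occlusions, where the flow cycle breaks down) would then be excluded before computing correspondences.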