
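Author: Hsieh, Ting-I (謝廷翊)
Thesis title: One-Shot Object Detection with Co-Attention and Co-Excitation (透過共注意力與共激發來實現單樣本物件偵測)
Advisor: Chen, Hwann-Tzong (陳煥宗)
Committee members: Lin, Yen-Yu (林彥宇); Chen, Chia-Ping (陳嘉平); Liu, Tyng-Luh (劉庭祿)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 30
Keywords (Chinese, translated): co-attention, co-excitation, one-shot object detection, object detection, one-shot object detection with co-attention and co-excitation
Keywords (English): One-Shot Object Detection with Co-Attention and Co-Excitation, One-Shot, Object Detection, Co-Attention, Co-Excitation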
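  • This thesis proposes a method for one-shot object detection based on
    co-attention and co-excitation. In everyday life, humans can detect and
    recognize objects with high accuracy from the visual information in only a
    few examples, but for deep learning models, achieving reliable object
    detection from so few examples is a very difficult challenge. In this
    thesis we study how to strengthen learning from a single example, using
    co-attention and co-excitation to improve the model's learning ability.
    Methodologically, we adopt Faster R-CNN as the base architecture. For each
    feature patch of the target image, we compare its similarity against the
    features of the query example and strengthen the patches that may contain
    the object. Finally, we use the query features to select the most useful
    features, emphasizing useful features and discarding useless ones, which
    makes the similarity judgment more reliable. Our one-shot object detection
    results reach the level of the current state-of-the-art methods, and we
    have open-sourced the code needed for the experiments for use in follow-up
    research.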
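    The similarity matching described above can be illustrated with a minimal sketch in plain Python. It is not the thesis's implementation: the function names `cosine` and `enhance_target`, the choice of cosine similarity, and the `1 + similarity` scaling rule are all assumptions made for illustration. Each target feature cell is compared with a single query feature vector, and cells that resemble the query are amplified.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def enhance_target(target_cells, query_vec):
    """Amplify target cells that look like the query (hypothetical rule).

    target_cells: list of feature vectors, one per spatial location.
    query_vec:    a single feature vector summarizing the query patch.
    Returns (enhanced_cells, similarities).
    """
    enhanced, sims = [], []
    for cell in target_cells:
        # Negative similarities are clipped so dissimilar cells stay unchanged.
        s = max(0.0, cosine(cell, query_vec))
        sims.append(s)
        enhanced.append([x * (1.0 + s) for x in cell])
    return enhanced, sims


if __name__ == "__main__":
    cells = [[1.0, 0.0], [0.0, 1.0]]
    query = [2.0, 0.0]
    print(enhance_target(cells, query))
```

    In this toy setting the first cell points in the same direction as the query and is doubled, while the orthogonal second cell is left as it was.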


    This thesis aims to tackle the challenging problem of one-shot object detection. Given a query image patch whose class label is not included in the training data, the goal of the task is to detect all instances of the same class in a target image. To this end, we develop a novel co-attention and co-excitation (CoAE) framework that makes contributions in three key technical aspects. First, we propose to use the non-local operation to explore the co-attention embodied in each query-target pair and yield region proposals accounting for the one-shot situation. Second, we formulate a squeeze-and-co-excitation scheme that can adaptively emphasize correlated feature channels to help uncover relevant proposals and eventually the target objects. Third, we design a margin-based ranking loss for implicitly learning a metric to predict the similarity of a region proposal to the underlying query, regardless of whether its class label is seen or unseen in training. The resulting model is therefore a two-stage detector that yields a strong baseline on both VOC and MS-COCO under the one-shot setting of detecting objects from both seen and never-seen classes.
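
    The second and third ideas in the abstract can be sketched roughly as follows, assuming feature maps stored as `[channels][positions]` lists. The gate weights `w1` and `w2`, the margin values `m_pos` and `m_neg`, and the simple two-margin hinge form of the loss are hypothetical choices for illustration; the exact formulation in the thesis may differ.

```python
import math


def squeeze(feature_map):
    """Global average pooling: one scalar per channel."""
    return [sum(channel) / len(channel) for channel in feature_map]


def excite(z, w1, w2):
    """Two-layer gate: sigmoid(w2 @ relu(w1 @ z)). Weights are hypothetical."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]


def co_excite(query_map, target_map, w1, w2):
    """Compute channel gates from the query and apply them to BOTH maps."""
    gate = excite(squeeze(query_map), w1, w2)

    def scale(feature_map):
        return [[g * x for x in channel] for g, channel in zip(gate, feature_map)]

    return scale(query_map), scale(target_map), gate


def margin_ranking_loss(scores, labels, m_pos=0.7, m_neg=0.3):
    """Generic two-margin hinge loss (an assumed, simplified form).

    Foreground proposals (label 1) are pushed above m_pos,
    background proposals (label 0) are pushed below m_neg.
    """
    total = 0.0
    for s, y in zip(scores, labels):
        total += max(m_pos - s, 0.0) if y == 1 else max(s - m_neg, 0.0)
    return total / len(scores)


if __name__ == "__main__":
    query_map = [[1.0, 3.0], [0.0, 0.0]]
    target_map = [[2.0, 4.0], [6.0, 8.0]]
    w1 = [[1.0, 0.0]]        # 2 channels -> 1 hidden unit
    w2 = [[0.0], [0.0]]      # 1 hidden unit -> 2 channel gates
    print(co_excite(query_map, target_map, w1, w2))
    print(margin_ranking_loss([0.9, 0.5, 0.2, 0.6], [1, 1, 0, 0]))
```

    With the all-zero `w2` used here, every gate equals 0.5, so both feature maps are simply halved; learned weights would instead emphasize the channels that the query makes relevant.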

    Table of contents: List of Tables; List of Tables; 摘要 (Abstract in Chinese); Abstract; Introduction; Related work; Proposed method; Experiments; Ablation studies; Conclusion; Bibliography

