
Graduate student: Shen, You-An (沈祐安)
Thesis title: Lightweight Convolutional Neural Network with Long Term and Short Term Self-Attention for Instance Segmentation (基於長短期自我注意力的輕量卷積模型的實例分割)
Advisor: Chang, Long-Wen (張隆紋)
Committee members: Chen, Chaur-Chin (陳朝欽); Huang, Chung-Lin (黃仲陵)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: English
Number of pages: 35
Keywords: Computer Vision, Machine Learning, Instance Segmentation
    Object detection, semantic segmentation, and instance segmentation are important techniques in computer vision. Object detection identifies the objects in an image, while semantic segmentation assigns a class to every pixel of the image. Instance segmentation is a hybrid of the two: it identifies object instances and their class labels, and locates the mask of each instance. In this thesis, we propose a new lightweight neural network for instance segmentation, the Long-term and Short-term Self-Attention network (LSSA). The proposed network is suited to devices with little memory, in applications such as edge computing and self-driving cars. It integrates long-term and short-term self-attention to enrich the extracted features, resulting in better performance. We also conduct several ablation experiments to show the effectiveness of the proposed network. On the COCO dataset, it achieves results comparable to state-of-the-art models while using fewer parameters than several of them, striking a good balance between model size and performance.
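    The difference between the output of semantic segmentation and that of instance segmentation can be illustrated with a toy example. The sketch below is not the LSSA network and is not taken from the thesis; it assumes a hand-written class map and uses plain connected-component grouping as a stand-in for a learned model. The names `semantic_map` and `split_into_instances` are hypothetical.

```python
from collections import deque

# Toy semantic segmentation output: one class id per pixel
# (0 = background, 1 = some object class). Values are made up.
semantic_map = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]


def split_into_instances(label_map, background=0):
    """Group same-class, 4-connected pixels into separate instances.

    Returns a list of (class_id, mask) pairs, where mask is a set of
    (row, col) pixel coordinates. This is an illustrative stand-in for
    an instance segmentation model, not the method of the thesis.
    """
    rows, cols = len(label_map), len(label_map[0])
    seen = [[False] * cols for _ in range(rows)]
    instances = []
    for r in range(rows):
        for c in range(cols):
            cls = label_map[r][c]
            if cls == background or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited object pixel.
            mask, queue = set(), deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                mask.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and label_map[ny][nx] == cls:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            instances.append((cls, mask))
    return instances


instances = split_into_instances(semantic_map)
for i, (cls, mask) in enumerate(instances):
    print(f"instance {i}: class={cls}, pixels={len(mask)}")
```

    The semantic map labels both blobs with the same class, whereas the instance-level result keeps one class label and one mask per object. A real instance segmentation model must also separate touching or overlapping objects, which simple connectivity cannot do.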

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Chapter 1. Introduction ---------- 1
    Chapter 2. Related Works --------- 3
    Chapter 3. The Proposed Method --- 5
    Chapter 4. Experiment ------------ 21
    Chapter 5. Conclusion ------------ 32
    References ----------------------- 33

