
Graduate Student: Ko, Tung-Sheng (葛東昇)
Thesis Title: EK-Net: An Efficient Kernel Network for Single Image Instance Segmentation
Advisor: Chang, Long-Wen (張隆紋)
Committee Members: Chen, Chaur-Chin (陳朝欽); Chen, Yung-Chang (陳永昌)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110
Language: English
Number of Pages: 28
Keywords: Computer Vision, Instance Segmentation, Deep Learning


Abstract:
Over the last ten years, image segmentation has become one of the most popular and challenging tasks in computer vision. Due to the rapid development of deep learning, more and more methods have not only improved model accuracy but also reduced training time and model complexity. With the arrival of real-time models and high-speed computation, image segmentation has become a key technology in high-tech products such as self-driving cars and medical image detection, and such products increasingly rely on efficient models. In this thesis, we propose an efficient and lightweight model for single image instance segmentation named Efficient Kernel Network (EK-Net), which is based on K-Net (Kernel Net) [1]. We propose an Efficient Semantic FPN with fewer parameters as our encoder. In the kernel refinement architecture, we add a skip connection to enhance the propagation of kernel features and produce a mask quality score from the kernels. In the inference stage, we recalculate the classification score with the mask quality score to improve performance. To strengthen the mask boundary representation, we also add a boundary mask loss. Our EK-Net beats K-Net [1] by 0.3 AP (Average Precision) with fewer parameters and faster inference speed.
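
The mask quality score and the inference-time rescoring described in the abstract can be made concrete with a short sketch. The thesis code is not reproduced on this page, so the following is a minimal PyTorch sketch under stated assumptions: the module name MaskQualityHead, the layer sizes, and the rescore helper are hypothetical, and multiplying the classification score by the predicted mask quality follows the rescoring idea of the cited Mask Scoring R-CNN [15].

```python
import torch
import torch.nn as nn

class MaskQualityHead(nn.Module):
    """Hypothetical mask-quality branch: a small MLP that maps each
    instance kernel's feature vector to a score in [0, 1], intended to
    approximate the IoU between the predicted and ground-truth mask.
    Layer sizes are illustrative, not the thesis's actual configuration."""

    def __init__(self, kernel_dim: int = 256, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(kernel_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, kernel_feats: torch.Tensor) -> torch.Tensor:
        # kernel_feats: (num_instances, kernel_dim) -> (num_instances,)
        return self.mlp(kernel_feats).squeeze(-1).sigmoid()

def rescore(cls_scores: torch.Tensor, mask_quality: torch.Tensor) -> torch.Tensor:
    """Inference-time rescoring: down-weight instances whose predicted
    masks are judged low quality, so the final ranking reflects both
    class confidence and mask quality."""
    return cls_scores * mask_quality
```

In use, the detector would rank, threshold, and report instances by the rescored value rather than the raw classification score, e.g. final_scores = rescore(cls_scores, quality_head(kernel_feats)).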
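
The boundary mask loss is likewise only named, not specified, in the abstract. A common construction, sketched below under that assumption, extracts soft boundary maps with a morphological gradient (dilation minus erosion, both computed by max pooling) and applies a dice loss on those maps, in the spirit of the cited V-Net dice loss [18] and Boundary-preserving Mask R-CNN [25]; the function names and the 3 x 3 window are illustrative choices, not the thesis's.

```python
import torch
import torch.nn.functional as F

def mask_to_boundary(mask: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Soft morphological gradient of a mask in [0, 1]: dilation minus
    erosion over a k x k window. mask has shape (N, 1, H, W)."""
    pad = k // 2
    dilated = F.max_pool2d(mask, k, stride=1, padding=pad)
    eroded = -F.max_pool2d(-mask, k, stride=1, padding=pad)
    return dilated - eroded

def boundary_dice_loss(pred_mask: torch.Tensor,
                       gt_mask: torch.Tensor,
                       eps: float = 1e-6) -> torch.Tensor:
    """Dice loss restricted to boundary maps, so errors near object
    contours dominate the gradient and sharpen mask borders."""
    pred_b = mask_to_boundary(pred_mask)
    gt_b = mask_to_boundary(gt_mask)
    inter = (pred_b * gt_b).sum(dim=(1, 2, 3))
    denom = pred_b.sum(dim=(1, 2, 3)) + gt_b.sum(dim=(1, 2, 3))
    return (1.0 - (2.0 * inter + eps) / (denom + eps)).mean()
```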

Table of Contents:
Chapter 1. Introduction
Chapter 2. Related Works
Chapter 3. The Proposed Method
Chapter 4. Experiment Results
Chapter 5. Conclusion
References

References:
[1] Zhang, W., Pang, J., Chen, K., & Loy, C.C. (2021). K-Net: Towards Unified
    Image Segmentation. NeurIPS.
    [2] Girshick, R.B., Donahue, J., Darrell, T., & Malik, J. (2014). Rich Feature
    Hierarchies for Accurate Object Detection and Semantic Segmentation. 2014
    IEEE Conference on Computer Vision and Pattern Recognition, 580-587.
    [3] Lin, T., Goyal, P., Girshick, R.B., He, K., & Dollár, P. (2017). Focal Loss for
    Dense Object Detection. 2017 IEEE International Conference on Computer
    Vision (ICCV), 2999-3007.
    [4] He, K., Gkioxari, G., Dollár, P., & Girshick, R.B. (2017). Mask R-CNN. 2017
    IEEE International Conference on Computer Vision (ICCV), 2980-2988.
    [5] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image
    Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition
    (CVPR), 770-778.
    [6] Lin, T., Dollár, P., Girshick, R.B., He, K., Hariharan, B., & Belongie, S.J. (2017).
    Feature Pyramid Networks for Object Detection. 2017 IEEE Conference on
    Computer Vision and Pattern Recognition (CVPR), 936-944.
    [7] Bolya, D., Zhou, C., Xiao, F., & Lee, Y.J. (2019). YOLACT: Real-Time Instance
    Segmentation. 2019 IEEE/CVF International Conference on Computer Vision
    (ICCV), 9156-9165.
    [8] Wang, X., Kong, T., Shen, C., Jiang, Y., & Li, L. (2020). SOLO: Segmenting
    Objects by Locations. ArXiv, abs/1912.04488.
    [9] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S.
    (2020). End-to-End Object Detection with Transformers. ArXiv, abs/2005.12872.
[10] Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., & Girdhar, R. (2021). Masked-attention Mask Transformer for Universal Image Segmentation. ArXiv,
    abs/2112.01527.
    [11] Vaswani, A., Shazeer, N.M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N.,
    Kaiser, L., & Polosukhin, I. (2017). Attention is All you Need. ArXiv,
    abs/1706.03762.
[12] Shelhamer, E., Long, J., & Darrell, T. (2017). Fully Convolutional Networks for
    Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine
    Intelligence, 39, 640-651.
    [13] Kirillov, A., He, K., Girshick, R.B., Rother, C., & Dollár, P. (2019). Panoptic
    Segmentation. 2019 IEEE/CVF Conference on Computer Vision and Pattern
    Recognition (CVPR), 9396-9405.
[14] Cheng, T., Wang, X., Chen, S., Zhang, W., Zhang, Q., Huang, C., Zhang, Z., & Liu,
    W. (2022). Sparse Instance Activation for Real-Time Instance
    Segmentation. ArXiv, abs/2203.12827.
[15] Huang, Z., Huang, L., Gong, Y., Huang, C., & Wang, X. (2019). Mask Scoring R-CNN. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition
    (CVPR), 6402-6411.
    [16] Agarap, A.F. (2018). Deep Learning using Rectified Linear Units (ReLU). ArXiv,
    abs/1803.08375.
    [17] Ba, J., Kiros, J.R., & Hinton, G.E. (2016). Layer Normalization. ArXiv,
    abs/1607.06450.
    [18] Milletari, F., Navab, N., & Ahmadi, S. (2016). V-Net: Fully Convolutional Neural
    Networks for Volumetric Medical Image Segmentation. 2016 Fourth
    International Conference on 3D Vision (3DV), 565-571.
    [19] Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu,
    Z., Xu, J., Zhang, Z., Cheng, D., Zhu, C., Cheng, T., Zhao, Q., Li, B., Lu, X., Zhu,
    R., Wu, Y., Dai, J., Wang, J., Shi, J., Ouyang, W., Loy, C.C., & Lin, D. (2019).
    MMDetection: Open MMLab Detection Toolbox and Benchmark. ArXiv,
    abs/1906.07155.
    [20] Stewart, R., Andriluka, M., & Ng, A. (2016). End-to-End People Detection in
    Crowded Scenes. 2016 IEEE Conference on Computer Vision and Pattern
    Recognition (CVPR), 2325-2333.
    [21] Lin, T., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Dollár, P.,
    & Zitnick, C.L. (2014). Microsoft COCO: Common Objects in Context. ECCV.
    [22] Kirillov, A., Girshick, R.B., He, K., & Dollár, P. (2019). Panoptic Feature Pyramid
    Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern
    Recognition (CVPR), 6392-6401.
    [23] Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid Scene Parsing
    Network. 2017 IEEE Conference on Computer Vision and Pattern Recognition
    (CVPR), 6230-6239.
    [24] Deng, J., Dong, W., Socher, R., Li, L., Li, K., & Fei-Fei, L. (2009). ImageNet: A
    large-scale hierarchical image database. 2009 IEEE Conference on Computer
    Vision and Pattern Recognition, 248-255.
    [25] Cheng, T., Wang, X., Huang, L., & Liu, W. (2020). Boundary-preserving Mask
    R-CNN. ECCV.
    [26] Wang, X., Zhang, R., Kong, T., Li, L., & Shen, C. (2020). SOLOv2: Dynamic,
    Faster and Stronger. ArXiv, abs/2003.10152.
    [27] Loshchilov, I., & Hutter, F. (2017). Fixing Weight Decay Regularization in
Adam. ArXiv, abs/1711.05101.
