
Graduate Student: Ko, Tung-Sheng (葛東昇)
Thesis Title: EK-Net: An Efficient Kernel Network for Single Image Instance Segmentation
Advisor: Chang, Long-Wen (張隆紋)
Committee Members: Chen, Chaur-Chin (陳朝欽); Chen, Yung-Chang (陳永昌)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Graduation Academic Year: 110
Language: English
Number of Pages: 28
Keywords: Computer Vision, Instance Segmentation, Deep Learning


Abstract:
Over the last ten years, image segmentation has become one of the most popular and challenging tasks in computer vision. Due to the rapid development of deep learning, more and more methods have not only improved model accuracy but also reduced training time and model complexity. With the arrival of real-time models and high-speed computation, image segmentation has become a key technology in high-tech products such as self-driving cars and medical image detection, and such products increasingly rely on efficient models. In this thesis, we propose an efficient and lightweight model for single image instance segmentation named Efficient Kernel Network (EK-Net), which is based on K-Net (Kernel Net) [1]. We propose an Efficient Semantic FPN with fewer parameters as our encoder. In the kernel refinement architecture, we add a skip connection to enhance the propagation of kernel features and produce a mask quality score from the kernels. In the inference stage, we recalculate the classification score with the mask quality score to improve performance. To strengthen the mask boundary representation, we also add a boundary mask loss. Our EK-Net beats K-Net [1] by 0.3 AP (Average Precision) with fewer parameters and faster inference speed.
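
The mask quality score and the inference-time rescoring described in the abstract can be made concrete with a short sketch. The thesis code is not reproduced on this page, so the following is a minimal PyTorch sketch under stated assumptions: the module name MaskQualityHead, the layer sizes, and the rescore helper are hypothetical, and multiplying the classification score by the predicted mask quality follows the rescoring idea of the cited Mask Scoring R-CNN [15].

```python
import torch
import torch.nn as nn

class MaskQualityHead(nn.Module):
    """Hypothetical mask-quality branch: a small MLP that maps each
    instance kernel's feature vector to a score in [0, 1], intended to
    approximate the IoU between the predicted and ground-truth mask.
    Layer sizes are illustrative, not the thesis's actual configuration."""

    def __init__(self, kernel_dim: int = 256, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(kernel_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, kernel_feats: torch.Tensor) -> torch.Tensor:
        # kernel_feats: (num_instances, kernel_dim) -> (num_instances,)
        return self.mlp(kernel_feats).squeeze(-1).sigmoid()

def rescore(cls_scores: torch.Tensor, mask_quality: torch.Tensor) -> torch.Tensor:
    """Inference-time rescoring: down-weight instances whose predicted
    masks are judged low quality, so the final ranking reflects both
    class confidence and mask quality."""
    return cls_scores * mask_quality
```

In use, the detector would rank, threshold, and report instances by the rescored value rather than the raw classification score, e.g. final_scores = rescore(cls_scores, quality_head(kernel_feats)).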
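
The boundary mask loss is likewise only named, not specified, in the abstract. A common construction, sketched below under that assumption, extracts soft boundary maps with a morphological gradient (dilation minus erosion, both computed by max pooling) and applies a dice loss on those maps, in the spirit of the cited V-Net dice loss [18] and Boundary-preserving Mask R-CNN [25]; the function names and the 3 x 3 window are illustrative choices, not the thesis's.

```python
import torch
import torch.nn.functional as F

def mask_to_boundary(mask: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Soft morphological gradient of a mask in [0, 1]: dilation minus
    erosion over a k x k window. mask has shape (N, 1, H, W)."""
    pad = k // 2
    dilated = F.max_pool2d(mask, k, stride=1, padding=pad)
    eroded = -F.max_pool2d(-mask, k, stride=1, padding=pad)
    return dilated - eroded

def boundary_dice_loss(pred_mask: torch.Tensor,
                       gt_mask: torch.Tensor,
                       eps: float = 1e-6) -> torch.Tensor:
    """Dice loss restricted to boundary maps, so errors near object
    contours dominate the gradient and sharpen mask borders."""
    pred_b = mask_to_boundary(pred_mask)
    gt_b = mask_to_boundary(gt_mask)
    inter = (pred_b * gt_b).sum(dim=(1, 2, 3))
    denom = pred_b.sum(dim=(1, 2, 3)) + gt_b.sum(dim=(1, 2, 3))
    return (1.0 - (2.0 * inter + eps) / (denom + eps)).mean()
```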

Table of Contents:
Chapter 1. Introduction
Chapter 2. Related Works
Chapter 3. The Proposed Method
Chapter 4. Experiment Results
Chapter 5. Conclusion
References

References:
[1] Zhang, W., Pang, J., Chen, K., & Loy, C.C. (2021). K-Net: Towards Unified
    Image Segmentation. NeurIPS.
    [2] Girshick, R.B., Donahue, J., Darrell, T., & Malik, J. (2014). Rich Feature
    Hierarchies for Accurate Object Detection and Semantic Segmentation. 2014
    IEEE Conference on Computer Vision and Pattern Recognition, 580-587.
    [3] Lin, T., Goyal, P., Girshick, R.B., He, K., & Dollár, P. (2017). Focal Loss for
    Dense Object Detection. 2017 IEEE International Conference on Computer
    Vision (ICCV), 2999-3007.
    [4] He, K., Gkioxari, G., Dollár, P., & Girshick, R.B. (2017). Mask R-CNN. 2017
    IEEE International Conference on Computer Vision (ICCV), 2980-2988.
    [5] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image
    Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition
    (CVPR), 770-778.
    [6] Lin, T., Dollár, P., Girshick, R.B., He, K., Hariharan, B., & Belongie, S.J. (2017).
    Feature Pyramid Networks for Object Detection. 2017 IEEE Conference on
    Computer Vision and Pattern Recognition (CVPR), 936-944.
    [7] Bolya, D., Zhou, C., Xiao, F., & Lee, Y.J. (2019). YOLACT: Real-Time Instance
    Segmentation. 2019 IEEE/CVF International Conference on Computer Vision
    (ICCV), 9156-9165.
    [8] Wang, X., Kong, T., Shen, C., Jiang, Y., & Li, L. (2020). SOLO: Segmenting
    Objects by Locations. ArXiv, abs/1912.04488.
    [9] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S.
    (2020). End-to-End Object Detection with Transformers. ArXiv, abs/2005.12872.
[10] Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., & Girdhar, R. (2021). Masked-attention Mask Transformer for Universal Image Segmentation. ArXiv,
    abs/2112.01527.
    [11] Vaswani, A., Shazeer, N.M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N.,
    Kaiser, L., & Polosukhin, I. (2017). Attention is All you Need. ArXiv,
    abs/1706.03762.
[12] Shelhamer, E., Long, J., & Darrell, T. (2017). Fully Convolutional Networks for
    Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine
    Intelligence, 39, 640-651.
    [13] Kirillov, A., He, K., Girshick, R.B., Rother, C., & Dollár, P. (2019). Panoptic
    Segmentation. 2019 IEEE/CVF Conference on Computer Vision and Pattern
    Recognition (CVPR), 9396-9405.
[14] Cheng, T., Wang, X., Chen, S., Zhang, W., Zhang, Q., Huang, C., Zhang, Z., & Liu,
    W. (2022). Sparse Instance Activation for Real-Time Instance
    Segmentation. ArXiv, abs/2203.12827.
[15] Huang, Z., Huang, L., Gong, Y., Huang, C., & Wang, X. (2019). Mask Scoring R-CNN. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition
    (CVPR), 6402-6411.
    [16] Agarap, A.F. (2018). Deep Learning using Rectified Linear Units (ReLU). ArXiv,
    abs/1803.08375.
    [17] Ba, J., Kiros, J.R., & Hinton, G.E. (2016). Layer Normalization. ArXiv,
    abs/1607.06450.
    [18] Milletari, F., Navab, N., & Ahmadi, S. (2016). V-Net: Fully Convolutional Neural
    Networks for Volumetric Medical Image Segmentation. 2016 Fourth
    International Conference on 3D Vision (3DV), 565-571.
    [19] Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu,
    Z., Xu, J., Zhang, Z., Cheng, D., Zhu, C., Cheng, T., Zhao, Q., Li, B., Lu, X., Zhu,
    R., Wu, Y., Dai, J., Wang, J., Shi, J., Ouyang, W., Loy, C.C., & Lin, D. (2019).
    MMDetection: Open MMLab Detection Toolbox and Benchmark. ArXiv,
    abs/1906.07155.
    [20] Stewart, R., Andriluka, M., & Ng, A. (2016). End-to-End People Detection in
    Crowded Scenes. 2016 IEEE Conference on Computer Vision and Pattern
    Recognition (CVPR), 2325-2333.
    [21] Lin, T., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Dollár, P.,
    & Zitnick, C.L. (2014). Microsoft COCO: Common Objects in Context. ECCV.
    [22] Kirillov, A., Girshick, R.B., He, K., & Dollár, P. (2019). Panoptic Feature Pyramid
    Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern
    Recognition (CVPR), 6392-6401.
    [23] Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid Scene Parsing
    Network. 2017 IEEE Conference on Computer Vision and Pattern Recognition
    (CVPR), 6230-6239.
    [24] Deng, J., Dong, W., Socher, R., Li, L., Li, K., & Fei-Fei, L. (2009). ImageNet: A
    large-scale hierarchical image database. 2009 IEEE Conference on Computer
    Vision and Pattern Recognition, 248-255.
    [25] Cheng, T., Wang, X., Huang, L., & Liu, W. (2020). Boundary-preserving Mask
    R-CNN. ECCV.
    [26] Wang, X., Zhang, R., Kong, T., Li, L., & Shen, C. (2020). SOLOv2: Dynamic,
    Faster and Stronger. ArXiv, abs/2003.10152.
    [27] Loshchilov, I., & Hutter, F. (2017). Fixing Weight Decay Regularization in
Adam. ArXiv, abs/1711.05101.
