
Graduate Student: 楊朝凱 (Yang, Chao-Kai)
Thesis Title: Dual-Scale Fusion Attention Network for Image Super-Resolution (用於影像超分辨率的雙尺度融合注意網絡)
Advisors: 張隆紋 (Chang, Long-Wen); 黃慶育 (Huang, Chin-Yu)
Committee Members: 陳朝欽 (Chen, Chaur-Chin); 陳永昌 (Chen, Yong-Chang)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of Publication: 2023
Graduation Academic Year: 111 (2022-2023)
Language: English
Pages: 42
Chinese Keywords: super-resolution, dual-scale, attention
Foreign Keywords: Dual scale


    Image super-resolution is an important technique widely used in computer vision and medical imaging. In recent years, deep learning has become a common approach to image super-resolution, with Convolutional Neural Networks (CNNs) and Transformers serving as the usual vision backbones. However, both approaches have limitations. To combine the advantages of convolution and self-attention, an attention mechanism called Large Kernel Attention (LKA) was proposed, but LKA is still constrained by its fixed-size convolution kernels. To address this issue, we propose a Dual Large Kernel Attention (DLKA) module, which applies LKA at different scales and combines it with Residual Channel Attention (RCA) to form a novel attention mechanism called the Residual Dual Attention Block (RDAB). Additionally, we introduce a feed-forward network based on a gating mechanism, called AGDFNN, which maintains performance while reducing parameters and computational complexity. Combining the two, we propose a novel lightweight super-resolution neural network called the Dual-Scale Fusion Attention Network (DSFAN), which inherits the advantages of CNN and Transformer networks while mitigating their drawbacks.
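The components named above build on published designs: LKA (from the Visual Attention Network work) decomposes a large-kernel convolution into a depth-wise convolution, a dilated depth-wise convolution, and a pointwise convolution, and uses the result to modulate the input elementwise; gated feed-forward networks follow the GLU pattern. The thesis record gives no code, so the following is a minimal 1-D NumPy sketch of these general ideas, not the DSFAN implementation: the kernel sizes, the sum used to fuse the two branches in `dual_large_kernel_attention`, and all function names are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depthwise_conv1d(x, kernel, dilation=1):
    """Same-padded 1-D depthwise convolution: each channel is filtered
    independently. A 1-D stand-in for the 2-D depthwise convolutions in LKA."""
    c, n = x.shape
    k = kernel.shape[1]
    span = dilation * (k - 1)
    pad = span // 2
    xp = np.pad(x, ((0, 0), (pad, span - pad)))
    out = np.zeros((c, n))
    for i in range(k):
        out += kernel[:, i:i + 1] * xp[:, i * dilation:i * dilation + n]
    return out

def large_kernel_attention(x, k_local, k_dilated, w_point, dilation=3):
    """LKA-style attention: depthwise conv + dilated depthwise conv +
    pointwise (1x1) channel mixing, then elementwise modulation of the input."""
    a = depthwise_conv1d(x, k_local)
    a = depthwise_conv1d(a, k_dilated, dilation=dilation)
    a = w_point @ a          # channel mixing, the "1x1 convolution"
    return a * x             # attention map multiplies the features

def dual_large_kernel_attention(x, branch_small, branch_large):
    """Hypothetical dual-scale fusion: sum two LKA branches with different
    kernel parameters (the fusion rule here is an assumption)."""
    return (large_kernel_attention(x, *branch_small)
            + large_kernel_attention(x, *branch_large))

def gated_feed_forward(x, w_value, w_gate, w_out):
    """GLU-style gated feed-forward: one linear branch gates the other
    elementwise, the general idea behind gated FFNs such as AGDFNN."""
    return w_out @ ((w_value @ x) * sigmoid(w_gate @ x))
```

The elementwise product in `large_kernel_attention` is what makes this an attention mechanism rather than a plain convolution stack: the long-range context gathered by the decomposed large kernel rescales each feature of the input.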
    We compare our proposed DSFAN with state-of-the-art lightweight models. Both the visual results and the PSNR/SSIM scores show that DSFAN surpasses the other models in restoring image details and reproducing fine textures accurately. We attribute this performance to DSFAN's receptive fields at multiple scales, which adapt well to structures of different sizes and allow more precise reconstruction. Notably, our method achieves this with significantly fewer parameters than the other state-of-the-art lightweight models.

    Chapter 1. Introduction
    Chapter 2. Related Works
    Chapter 3. The Proposed Method
    Chapter 4. Experiment Results
    Chapter 5. Conclusion
    References

