| Graduate Student | 林上堯 Lin, Shang-Yao |
|---|---|
| Thesis Title | Hybrid Hierarchical Gate Network for Image Super-Resolution (混合多層門控網絡用於影像超解析度) |
| Advisors | 張隆紋 Chang, Long-Wen; 黃慶育 Huang, Chin-Yu |
| Committee Members | 陳朝欽 Chen, Chaur-Chin; 陳永昌 Chen, Yong-Chang |
| Degree | Master |
| Department | Department of Computer Science, College of Electrical Engineering and Computer Science |
| Year of Publication | 2023 |
| Graduation Academic Year | 111 |
| Language | English |
| Pages | 52 |
| Keywords (Chinese) | 超解析度 (super-resolution) |
| Keywords (English) | Super resolution |
Super-resolution is a crucial task in computer vision that aims to generate high-resolution images from low-resolution inputs. In this thesis, we propose a novel super-resolution model called the Hybrid Hierarchical Gate Network (HHGN). The HHGN model combines shallow feature extraction, a fusion of CNN and transformer architectures, and advanced components such as Residual Gate Blocks (RGB), Unit Gate Blocks (UGB), Refined Filtering Modules (RFM), Channel Attention (CA)[1], and a Swin Transformer[2] backbone.
The proposed model is composed of four essential components: a shallow feature extraction module, a CNN backbone, a transformer backbone, and the PixelShuffle[3] technique for image reconstruction. The low-resolution input image first passes through a shallow convolutional layer for feature extraction, followed by a deep feature extraction module that uses Residual Gate Blocks to extract essential features through a gating mechanism. The RGB blocks and the transformer blocks form the core of the model, combining the strengths of the CNN and transformer architectures for effective feature extraction and global information referencing.
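The abstract names the four stages but not their exact wiring, so the following is a minimal PyTorch sketch of such a pipeline, not the thesis's actual configuration: the `GateBlock` stand-in for a Residual Gate Block, the `nn.Identity` placeholder for the Swin Transformer[2] layers, and all channel widths and block counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GateBlock(nn.Module):
    """Simplified stand-in for a Residual Gate Block: one conv branch produces
    features, the other a sigmoid gate that decides which features pass."""
    def __init__(self, channels):
        super().__init__()
        self.feat = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return x + self.feat(x) * torch.sigmoid(self.gate(x))  # gated residual

class HHGNSketch(nn.Module):
    """Four-stage pipeline: shallow conv -> CNN backbone -> transformer
    backbone -> PixelShuffle reconstruction (hypothetical sizes)."""
    def __init__(self, channels=64, scale=4, num_blocks=4):
        super().__init__()
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)
        self.cnn_backbone = nn.Sequential(*[GateBlock(channels) for _ in range(num_blocks)])
        self.transformer = nn.Identity()  # placeholder for the Swin Transformer[2] layers
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into a scale-times-larger image
        )

    def forward(self, x):
        feat = self.shallow(x)                            # shallow feature extraction
        deep = self.transformer(self.cnn_backbone(feat))  # deep feature extraction
        return self.upsample(deep + feat)                 # global residual, then reconstruction

sr = HHGNSketch(scale=4)
print(sr(torch.randn(1, 3, 48, 48)).shape)  # torch.Size([1, 3, 192, 192])
```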
To reconstruct high-resolution images, the outputs of the gating mechanism are processed using the PixelShuffle[3] method. The RGB blocks incorporate the RFM and UGB, which refine details and edge features while enhancing feature representation and
stability through residual structures and adjustable scaling. The CA mechanism selectively emphasizes informative features and suppresses less relevant ones, enhancing feature representation and capturing crucial information. The Swin Transformer[2] backbone facilitates the extraction of meaningful features through a self-attention mechanism.
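CA[1] refers to the squeeze-and-excitation design, and the "adjustable scaling" suggests a learnable weight on the residual branch. Below is a minimal sketch of both, assuming the standard squeeze-and-excitation formulation; the reduction ratio and initial scale are illustrative defaults, not values taken from the thesis.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation channel attention[1]: pool each channel to a
    descriptor, then learn per-channel weights that rescale the feature map,
    emphasizing informative channels and suppressing less relevant ones."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),  # excitation: per-channel weights in (0, 1)
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))

class ScaledResidual(nn.Module):
    """Residual connection with a learnable scale: one reading of the
    'adjustable scaling' said to stabilize the RGB blocks."""
    def __init__(self, body, init_scale=0.1):
        super().__init__()
        self.body = body
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, x):
        return x + self.scale * self.body(x)
```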
To measure the performance of our model, we conduct experiments on the widely used benchmark datasets Set5[4], Set14[5], BSD100[6], and Urban100[7]. The evaluation is based on the commonly used metrics peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). The results demonstrate the superiority of the proposed HHGN model over existing models, particularly for upscaling factors ×2, ×3, and ×4. The visualizations further illustrate the model's capability in handling lines and faces, with sharper edges and excellent color reproduction.
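For reference, PSNR follows directly from the mean squared error; the short sketch below shows it alongside scikit-image's standard SSIM implementation. Note that SR papers (this thesis possibly included) commonly crop image borders and evaluate on the Y channel of YCbCr, which is omitted here for brevity.

```python
import numpy as np
from skimage.metrics import structural_similarity  # standard SSIM implementation

def psnr(ref: np.ndarray, out: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

# Toy grayscale pair: a reference image and a slightly perturbed reconstruction.
ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
out = np.clip(ref + np.random.randint(-3, 4, ref.shape), 0, 255).astype(np.uint8)
print(psnr(ref, out), structural_similarity(ref, out, data_range=255))
```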
The proposed HHGN model achieves state-of-the-art performance in super-resolution tasks, highlighting its effectiveness and potential for high-quality image generation. Its combination of architectural components and advanced techniques presents a promising approach for advancing super-resolution research and applications in computer vision.
[1] J. Hu, L. Shen, and G. Sun, "Squeeze-and-Excitation Networks," in Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 7132-7141. doi: 10.1109/CVPR.2018.00745.
[2] Z. Liu et al., "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows," in Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 2021, pp. 9992-10002.
[3] W. Shi, J. Caballero, F. Huszár, et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1874-1883.
[4] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, "Low-complexity single-image super-resolution based on nonnegative neighbor embedding," in Proceedings of the British Machine Vision Conference (BMVC), 2012.
[5] R. Zeyde, M. Elad, and M. Protter, "On single image scale-up using sparse representations," in Proceedings of the International Conference on Curves and Surfaces, 2010, pp. 711-730.
[6] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), vol. 2, 2001, pp. 416-423.
[7] J.-B. Huang, A. Singh, and N. Ahuja, "Single image super-resolution from transformed self-exemplars," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5197-5206.
[8] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced deep residual networks for single image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 136-144.
[9] R. Timofte, E. Agustsson, L. Van Gool, M. Yang, and L. Zhang, "NTIRE 2017 challenge on single image super-resolution: Methods and results," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 114-125.
[10] J. Allebach and P. W. Wong, "Edge-directed interpolation," in Proceedings of the 3rd IEEE International Conference on Image Processing (ICIP), 1996, vol. 3, pp. 707-710.
[11] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295-307, 2016.
[12] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.
[13] C. Ledig et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 105-114. doi: 10.1109/CVPR.2017.19.
[14] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced deep residual networks for single image super-resolution," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 1132–1140.
[15] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, "Residual Dense Network for Image Super-Resolution," in Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 2472-2481. doi: 10.1109/CVPR.2018.00262.
[16] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, "Deep Laplacian pyramid networks for fast and accurate super-resolution," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5835-5843.
[17] T. Xiao, M. Singh, E. Mintun, T. Darrell, P. Dollár, and R. Girshick, "Early Convolutions Help Transformers See Better," in Advances in Neural Information Processing Systems, 2021.
[18] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[19] F. Baruchello and A. Foi, "Bicubic Interpolation of Digital Images," Signal Processing, vol. 41, no. 1, pp. 95-109, Jan. 1995.
[20] L. Sun, J. Pan, and J. Tang, "ShuffleMixer: An Efficient ConvNet for Image Super-Resolution," in Advances in Neural Information Processing Systems, 2022.
[21] C. Dong, C. C. Loy, and X. Tang, "Accelerating the super-resolution convolutional neural network," in Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 391–407.
[22] N. Ahn, B. Kang, and K.-A. Sohn, "Fast, accurate, and lightweight super-resolution with cascading residual network," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 252-268.
[23] J. Kim, J. K. Lee, and K. M. Lee, "Deeply-recursive convolutional network for image super-resolution," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1637–1645.
[24] Y. Tai, J. Yang, and X. Liu, "Image super-resolution via deep recursive residual network," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2790-2798.
[25] X. Chu, B. Zhang, H. Ma, R. Xu, and Q. Li, "Fast, accurate and lightweight super-resolution with neural architecture search," in Proceedings of the International Conference on Pattern Recognition (ICPR), 2020.
[26] Y. Tai, J. Yang, X. Liu, and C. Xu, "MemNet: A persistent memory network for image restoration," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 4539-4547.
[27] K. Zhang, W. Zuo, and L. Zhang, "Learning a single convolutional super-resolution network for multiple degradations," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 3262-3271.
[28] Z. Hui, X. Wang, and X. Gao, "Fast and accurate single image super-resolution via information distillation network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 723-731.
[29] W. Li, K. Zhou, L. Qi, N. Jiang, J. Lu, and J. Jia, "LAPAR: Linearly-assembled pixel-adaptive regression network for single image super-resolution and beyond," arXiv preprint arXiv:2105.10422, 2021.
[30] C. Subakan, M. Ravanelli, S. Cornell, M. Bronzi, and J. Zhong, "Attention Is All You Need In Speech Separation," in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 2021, pp. 21-25. doi: 10.1109/ICASSP39728.2021.9413901.
[31] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.