| Graduate Student: | 江愷笙 Chiang, Kai-Sheng |
|---|---|
| Thesis Title: | 基於改進殘差注意模組的U型生成器於生成對抗網路的影像去模糊 (Improved-RAM: UNet Generator with Improved Residual Attention Module for Single Image Deblurring) |
| Advisor: | 張隆紋 Chang, Long-Wen |
| Committee Members: | 邱瀞德 Chiu, Ching-Te; 陳朝欽 Chen, Chaur-Chin |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
| Year of Publication: | 2019 |
| Graduating Academic Year: | 108 (ROC calendar) |
| Language: | English |
| Pages: | 48 |
| Keywords: | deep learning, neural network, image deblurring, attention mechanism, generative adversarial network |
With the development of generative adversarial networks (GANs), many recent works model single image deblurring as a style transfer problem, transferring an image from the blurred domain to the sharp domain. However, GANs are notorious for unstable training and are prone to mode collapse. Meanwhile, attention mechanisms are now widely used in deep learning networks to extract important features and help the network focus on them.
In this work, we propose “DGIRAM”, a generative adversarial network for single image deblurring whose U-shaped generator combines a global skip connection with an improved residual attention module (IRAM) in every residual block. We show that IRAM outperforms previous attention mechanisms, and we verify our assumption with statistics computed over the attention maps. Instead of stabilizing GAN training with a gradient penalty, we apply spectral normalization to both the generator and the discriminator, which enforces the 1-Lipschitz constraint on the discriminator and shortens training.
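To make the architecture concrete, below is a minimal PyTorch sketch of the ingredients the abstract names: a residual block whose residual branch ends in an attention module, spectral-normalized convolutions, and a U-shaped generator with a global skip connection. The attention module here is a generic squeeze-and-excitation-style channel attention standing in for IRAM, whose exact design the abstract does not specify; all layer sizes, the block count, and class names are hypothetical.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention; a stand-in for the
    thesis's IRAM, whose exact design is not given in this abstract."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global average pooling
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                            # per-channel attention map
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))             # reweight feature channels

class ResidualAttentionBlock(nn.Module):
    """Residual block with an attention module on the residual branch.
    Spectral norm wraps the convolutions, matching the abstract's choice of
    spectral normalization over a gradient penalty."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            spectral_norm(nn.Conv2d(channels, channels, 3, padding=1)),
            nn.ReLU(inplace=True),
            spectral_norm(nn.Conv2d(channels, channels, 3, padding=1)),
            ChannelAttention(channels),
        )

    def forward(self, x):
        return x + self.body(x)                      # local skip connection

class UNetGenerator(nn.Module):
    """Toy U-shaped generator with one down/up level and a global skip
    connection (output = input + predicted residual); sizes are illustrative."""
    def __init__(self, base=64):
        super().__init__()
        self.head = nn.Conv2d(3, base, 3, padding=1)
        self.down = nn.Conv2d(base, base * 2, 3, stride=2, padding=1)
        self.blocks = nn.Sequential(*[ResidualAttentionBlock(base * 2) for _ in range(3)])
        self.up = nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1)
        self.tail = nn.Conv2d(base, 3, 3, padding=1)

    def forward(self, x):
        f1 = self.head(x)
        f2 = self.blocks(self.down(f1))
        out = self.tail(self.up(f2) + f1)            # encoder-decoder skip
        return x + out                               # global skip connection

blurred = torch.randn(1, 3, 256, 256)
print(UNetGenerator()(blurred).shape)                # torch.Size([1, 3, 256, 256])
```

In a full DGIRAM setup, spectral normalization would also wrap the discriminator's layers, since the abstract applies it to both networks.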
Our proposed network achieves a PSNR of 29.93 and an SSIM of 0.8922. Qualitatively, it restores sharper edges and finer details than other single-scale image deblurring methods, and by adopting spectral normalization it reaches better results in less training time.
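For context on the reported numbers, PSNR is defined as 10·log₁₀(MAX²/MSE) in decibels, and SSIM compares local luminance, contrast, and structure between the restored image and the reference. A small sketch of how such metrics can be computed, using NumPy and scikit-image's structural_similarity; this is illustrative and not the thesis's evaluation code:

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(reference, restored, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example with random images; real evaluation would use dataset pairs
# of blurred-then-restored images and their sharp ground truths.
ref = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
out = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
print(psnr(ref, out))
print(structural_similarity(ref, out, channel_axis=-1))  # channel_axis requires scikit-image >= 0.19
```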