
Graduate Student: Han, Lei (韓磊)
Thesis Title: Multi-scale GAN for Video Deblurring (多尺度型生成對抗網路於視頻解模糊)
Advisor: Lai, Shang-Hong (賴尚宏)
Committee Members: Chiu, Ching-Te (邱瀞德); Hsu, Chiu-Ting (許秋婷)
Degree: Master
Department:
Year of Publication: 2018
Academic Year of Graduation: 107
Language: English
Number of Pages: 37
Chinese Keywords: generative adversarial network, deblurring
Foreign Keywords: GAN, deblur
    With the rapid development of deep learning, more and more vision problems are being solved by deep models, and several deep models have already been proposed for the task of video deblurring. Meanwhile, generative adversarial networks have been widely applied in many scenarios because of their expressive power. In this thesis, we combine the classical multi-scale structure of traditional methods with a generative adversarial network to solve the video deblurring problem. The experimental results show that our model is more robust than several other deep-model algorithms, both in quantitative metrics and in visual quality. Finally, we also analyze the advantages of our model from different perspectives.


    With the rapid development of deep learning, more and more vision problems can be solved with deep neural network models, and several deep models have been proposed to handle the video deblurring task. At the same time, generative adversarial networks (GANs) have been widely applied to many kinds of problems because of their strength. In this thesis, we combine the classical multi-scale structure of traditional vision methods with a GAN for video deblurring. The quantitative and qualitative results of our experiments demonstrate that the proposed model restores frames more robustly than state-of-the-art deep video deblurring methods. We also justify the superiority of our model from different perspectives.
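The coarse-to-fine multi-scale structure the abstract refers to can be sketched independently of any particular network: the blurry frame is first restored at the coarsest scale, and each finer scale then receives the upsampled coarser estimate as guidance. The sketch below uses NumPy with a placeholder `restore` step standing in for the per-scale generator; all function names are illustrative assumptions, not the thesis's actual code.

```python
import numpy as np

def downsample(img, factor):
    # Naive box-average downsampling by an integer factor (illustrative only).
    h, w = img.shape
    return img[:h - h % factor, :w - w % factor] \
        .reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(img, factor):
    # Nearest-neighbour upsampling by an integer factor.
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def restore(blurry, guidance):
    # Placeholder for the per-scale generator: a real model would map
    # (blurry frame, upsampled coarser estimate) -> sharper estimate.
    return 0.5 * (blurry + guidance)

def multiscale_deblur(frame, num_scales=3):
    # Build an image pyramid of the blurry input.
    pyramid = [frame]
    for _ in range(num_scales - 1):
        pyramid.append(downsample(pyramid[-1], 2))
    # Start from the coarsest scale and refine upward (coarse-to-fine).
    estimate = pyramid[-1]
    for level in reversed(range(num_scales - 1)):
        guidance = upsample(estimate, 2)
        estimate = restore(pyramid[level], guidance)
    return estimate

blurry = np.random.rand(64, 64)
sharp_estimate = multiscale_deblur(blurry)
print(sharp_estimate.shape)  # (64, 64)
```

In the thesis's setting the `restore` step at each scale is a learned generator trained adversarially against a discriminator; this sketch only shows the multi-scale control flow around it.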

    1 Introduction
    1.1 Motivation
    1.2 Problem Description
    1.3 Main Contributions
    2 Related Works
    2.1 Image/Video Deblurring
    2.2 Generative Adversarial Networks
    3 Proposed Model
    3.1 Network Architecture
    3.1.1 Generator
    3.1.2 Discriminator
    3.2 Loss Function
    3.3 Training Details
    4 Experiments
    4.1 Datasets
    4.2 Baselines
    4.3 Comparison
    4.4 Ablation Study
    4.4.1 Loss Function and Hyperparameters
    4.4.2 Network Structure
    5 Discussion
    References

