
Research Student: Chang, Yi-Chuan (張益銓)
Thesis Title: Learning Aligned and Misaligned Image-to-Image Translation (對齊和非對齊之影像轉換學習)
Advisor: Chen, Hwann-Tzong (陳煥宗)
Committee Members: Liu, Tyng-Luh (劉庭祿); Hsu, Chiou-Ting (許秋婷)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Academic Year of Graduation: 107 (2018-2019)
Language: English
Number of Pages: 50
Chinese Keywords (translated): image generation, generative adversarial network, handwriting synthesis
English Keywords: Image-to-Image translation, GAN, Chinese handwriting synthesis
    Chinese Abstract (translated): This thesis studies image-to-image translation for aligned and misaligned image pairs. For misaligned pairs, we pose the corresponding Chinese handwriting synthesis problem. Methodologically, we use a cascaded generative adversarial network to resolve the conflict between misaligned images and the U-shaped generator, and we show that this network also effectively eliminates the mode collapse that misaligned image pairs cause in an ordinary GAN. For aligned image translation, we investigate how a deep learning model should cope after seeing only a single training image: we propose a two-step training scheme that remedies the blurry outputs caused by the shortage of training data, further captures the grouping relations inside an image during feature extraction, and produces synthesized images that people prefer the most.


    Abstract: This research addresses the aligned and the misaligned image-to-image translation problems. For misaligned image-to-image translation, we study the corresponding Chinese handwriting synthesis problem and introduce Cascaded-GAN to handle the incompatibility between U-Net and misaligned training image pairs; Cascaded-GAN also efficiently resolves the resulting mode-collapse problem. For aligned image-to-image translation, we discuss how a deep learning model can tackle the one-shot learning scenario in image translation. We propose a two-step training strategy that remedies the blurry results caused by the lack of training data and, during feature extraction, additionally recovers grouping information within the image. Finally, we show that most people prefer the images synthesized by our model.
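    The abstract names the key components of the misaligned setting, namely U-Net-style generators chained into Cascaded-GAN and an adversarial discriminator (spectral normalization appears in the table of contents, Section 3.1.3), but the record carries no code. The PyTorch sketch below is a minimal illustration under assumed details: the channel widths, the single skip connection standing in for a full U-Net, and the patch-style discriminator are our own choices, not the thesis implementation.

```python
# Minimal sketch of the cascaded-generator idea: G1 produces a coarse
# translation, G2 refines it, and each stage can receive its own
# adversarial loss. All shapes and widths here are illustrative assumptions.
import torch
import torch.nn as nn


class TinyUNet(nn.Module):
    """Stand-in for a U-Net generator, reduced to one skip connection."""

    def __init__(self, ch=3, width=32):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(ch, width, 4, 2, 1), nn.LeakyReLU(0.2))
        self.mid = nn.Sequential(nn.Conv2d(width, width, 3, 1, 1), nn.LeakyReLU(0.2))
        self.up = nn.Sequential(nn.ConvTranspose2d(width * 2, ch, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        d = self.down(x)
        m = self.mid(d)
        return self.up(torch.cat([d, m], dim=1))  # skip connection


class CascadedGenerator(nn.Module):
    """Two U-Net-style generators in series (the 'cascade')."""

    def __init__(self):
        super().__init__()
        self.g1, self.g2 = TinyUNet(), TinyUNet()

    def forward(self, x):
        coarse = self.g1(x)
        return coarse, self.g2(coarse)


class PatchDiscriminator(nn.Module):
    """Patch-style discriminator; spectral normalization (Section 3.1.3)
    constrains each layer's Lipschitz constant to stabilize GAN training."""

    def __init__(self, ch=3, width=32):
        super().__init__()
        sn = nn.utils.spectral_norm
        self.net = nn.Sequential(
            sn(nn.Conv2d(ch, width, 4, 2, 1)), nn.LeakyReLU(0.2),
            sn(nn.Conv2d(width, width * 2, 4, 2, 1)), nn.LeakyReLU(0.2),
            sn(nn.Conv2d(width * 2, 1, 3, 1, 1)),  # per-patch real/fake scores
        )

    def forward(self, x):
        return self.net(x)


coarse, fine = CascadedGenerator()(torch.randn(1, 3, 64, 64))
print(coarse.shape, fine.shape)  # both torch.Size([1, 3, 64, 64])
```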
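    For the aligned one-shot setting, the record states only that training proceeds in two steps (Sections 4.1.1 and 4.1.2) without describing them. Purely as an illustration of such a two-phase schedule, the sketch below warms the generator up on an L1 reconstruction loss before switching on a least-squares adversarial loss; the actual contents of the two steps in the thesis may differ.

```python
# Hypothetical two-phase schedule for a single aligned image pair of shape
# (1, C, H, W). The phase contents (L1 warm-up, then LSGAN fine-tuning) are
# assumptions, not the thesis recipe. `gen` is any generator returning an
# image tensor and `disc` any discriminator, e.g. TinyUNet and
# PatchDiscriminator from the previous sketch.
import torch
import torch.nn.functional as F


def train_two_step(gen, disc, src, tgt, warmup_iters=500, adv_iters=2000):
    g_opt = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
    d_opt = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))

    # Step 1: pure reconstruction, so the single pair anchors the generator
    # before any adversarial gradient can push it toward blurry averages.
    for _ in range(warmup_iters):
        g_opt.zero_grad()
        F.l1_loss(gen(src), tgt).backward()
        g_opt.step()

    # Step 2: adversarial fine-tuning with a least-squares GAN objective,
    # keeping a weighted L1 term so the output stays faithful to the target.
    for _ in range(adv_iters):
        d_opt.zero_grad()
        fake = gen(src).detach()
        d_loss = ((disc(tgt) - 1) ** 2).mean() + (disc(fake) ** 2).mean()
        d_loss.backward()
        d_opt.step()

        g_opt.zero_grad()
        fake = gen(src)
        g_loss = ((disc(fake) - 1) ** 2).mean() + 10.0 * F.l1_loss(fake, tgt)
        g_loss.backward()
        g_opt.step()
```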

    Contents
    Chinese Abstract (摘要)
    Abstract
    1 Introduction
    2 Related Work
      2.1 Generative Adversarial Network (GAN)
      2.2 Style Transfer
      2.3 Image-to-Image Translation
    3 Image-to-Image Translation for Misaligned Image Pairs
      3.1 Approach
        3.1.1 Design of Generator
        3.1.2 Design of Discriminator
        3.1.3 Spectral Normalization
        3.1.4 Cascaded-GAN
      3.2 Experiments
        3.2.1 Datasets
        3.2.2 Implementation Details
        3.2.3 Comparison with Baseline
        3.2.4 Number of Training Data
    4 One-Shot Image-to-Image Translation for Aligned Image Pairs
      4.1 Approach
        4.1.1 First Step of Training
        4.1.2 Second Step of Training
        4.1.3 Design of Discriminator
        4.1.4 Design of Loss
      4.2 Experiments
        4.2.1 Datasets
        4.2.2 Implementation Details
        4.2.3 Comparison against Baseline
        4.2.4 Evaluation Metrics
    5 Conclusion
    6 Bibliography
    A More Comparison and Detail of Network Structures
      A.1 Structure of Networks
      A.2 More Result Comparison with MUNIT

