
Author: Wu, Yen-Yi
Title: Semantic-Aware Interactive Image Manipulation with Conditional Generative Adversarial Networks
Advisor: Lai, Shang-Hong
Committee Members: Liu, Tyng-Luh; Huang, Szu-Hao; Lee, Che-Rung
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Academic Year of Graduation: 108 (ROC calendar)
Language: English
Number of Pages: 37
Keywords: Deep Learning, Generative Adversarial Networks, Conditional Generative Adversarial Networks, Image Manipulation
Abstract:

    Image manipulation is a challenging task because it requires not only an understanding of the semantic content and style of an image but also the ability to keep the modified regions semantically consistent with the unmodified parts. In this thesis, we propose a conditional GAN model that assists users in manipulating complicated images with simple operations such as brushes and erasers.

    Our model has an encoder-decoder structure: the encoder, aided by a segmentation branch, generates high-dimensional feature maps that carry semantic information and can be edited directly by users, while the decoder produces realistic images from the modified feature maps. Experiments on reconstructing the original images from these features demonstrate that our high-dimensional feature maps represent the style of multi-object images better than latent vectors do.
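
    To make this dataflow concrete, below is a minimal PyTorch sketch of an encoder-decoder with a segmentation branch of this kind. It is a hypothetical illustration, not the thesis implementation: the module names (SemanticEncoder, Decoder), layer counts, and channel sizes (FEAT_CH, NUM_CLASSES) are assumptions chosen for brevity.

import torch
import torch.nn as nn

FEAT_CH = 64       # channels of the editable feature map (assumed value)
NUM_CLASSES = 19   # number of semantic classes, e.g. Cityscapes (assumed value)

class SemanticEncoder(nn.Module):
    # Maps an RGB image to a high-dimensional feature map; a side branch
    # predicts per-pixel class logits so the features align with semantics.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, FEAT_CH, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.seg_head = nn.Conv2d(FEAT_CH, NUM_CLASSES, 1)  # segmentation branch

    def forward(self, img):
        feat = self.backbone(img)         # editable feature map (B, FEAT_CH, H/4, W/4)
        seg_logits = self.seg_head(feat)  # supervised by the segmentation loss
        return feat, seg_logits

class Decoder(nn.Module):
    # Generates an RGB image from a (possibly user-edited) feature map.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(FEAT_CH, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, feat):
        return self.net(feat)

if __name__ == "__main__":
    enc, dec = SemanticEncoder(), Decoder()
    img = torch.randn(1, 3, 256, 256)
    feat, seg = enc(img)                # feat: (1, 64, 64, 64), seg: (1, 19, 64, 64)
    feat[..., 16:48, 16:48] = 0.0       # toy "eraser" edit applied to the feature map
    out = dec(feat)
    print(out.shape)                    # torch.Size([1, 3, 256, 256])

    In the full system, an adversarial discriminator and the losses listed in Chapter 3 would drive training; the sketch shows only the forward editing path: encode, edit the feature map, decode.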

    Finally, we build an interactive image editing application based on our approach and compare both the model training process and the editing results with previous works, demonstrating that the proposed method gives superior results for manipulating real images.

Table of Contents

    1 Introduction 1
      1.1 Motivation 1
      1.2 Problem Statement 2
      1.3 Contributions 2
      1.4 Thesis Organization 3
    2 Related Work 4
      2.1 Generative Adversarial Networks 4
      2.2 Conditional GANs 4
      2.3 Photorealistic Image Synthesis 5
        2.3.1 CRN 5
        2.3.2 pix2pixHD 5
        2.3.3 SPADE 6
      2.4 Interactive Image Editing 6
        2.4.1 Locating and ablating units 6
        2.4.2 Encoder-Decoder 7
        2.4.3 Image Translation 7
        2.4.4 Image Completion 8
    3 Method 9
      3.1 Network Architecture 10
        3.1.1 Encoder 10
        3.1.2 Color-wise Averaging 11
        3.1.3 Generator and Discriminator 13
      3.2 Objective Functions 14
        3.2.1 Segmentation Loss 14
        3.2.2 Adversarial Loss 14
        3.2.3 Reconstruction Loss 15
        3.2.4 Perceptual Loss 15
        3.2.5 Full Objective 16
    4 Interactive Editing 17
      4.1 Editing Pipeline 17
      4.2 Datasets 18
      4.3 Edit Transfer 19
      4.4 Interactive Application 20
      4.5 Editing Results 21
    5 Experiments 25
      5.1 Implementation Details 25
      5.2 Editing Comparison 25
        5.2.1 Comparison with SPADE 26
        5.2.2 Comparison with SC-FEGAN 29
      5.3 Comparison of Training and Testing Processes 30
      5.4 Ablation Study 32
    6 Conclusions 35
    References 36
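
    Sections 3.2.1-3.2.5 indicate that the full objective combines a segmentation loss, an adversarial loss, a reconstruction loss, and a perceptual loss. As an illustration only, assuming the weighted-sum formulation common in conditional GAN work (the actual weights and loss definitions are those given in the thesis), the full objective would take the form

    \mathcal{L} = \mathcal{L}_{adv} + \lambda_{seg}\,\mathcal{L}_{seg} + \lambda_{rec}\,\mathcal{L}_{rec} + \lambda_{per}\,\mathcal{L}_{per}

    where each \lambda balances the corresponding term.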

References

    [1] Bau, D., Zhu, J.-Y., Strobelt, H., Zhou, B., Tenenbaum, J. B., Freeman, W. T., and Torralba, A. GAN dissection: Visualizing and understanding generative adversarial networks. In Proceedings of the International Conference on Learning Representations (ICLR) (2019).
    [2] Chen, Q., and Koltun, V. Photographic image synthesis with cascaded refinement networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017).
    [3] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
    [4] Denton, E. L., Chintala, S., Fergus, R., et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In Proceedings of the Neural Information Processing Systems Conference (NIPS) (2015).
    [5] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 2672–2680.
    [6] He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
    [7] Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).
    [8] Jo, Y., and Park, J. SC-FEGAN: Face editing generative adversarial network with user’s sketch and color. arXiv preprint arXiv:1902.06838 (2019).
    [9] Johnson, J., Alahi, A., and Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV) (2016).
    [10] Karras, T., Aila, T., Laine, S., and Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. In Proceedings of the International Conference on Learning Representations (ICLR) (2018).
    [11] Kingma, D. P., and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
    [12] Kingma, D. P., and Welling, M. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR) (2014).
    [13] Lee, C.-H., Liu, Z., Wu, L., and Luo, P. MaskGAN: Towards diverse and interactive facial image manipulation. arXiv preprint arXiv:1907.11922 (2019).
    [14] Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015).
    [15] Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z., and Smolley, S. P. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017).
    [16] Mirza, M., and Osindero, S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
    [17] Park, T., Liu, M.-Y., Wang, T.-C., and Zhu, J.-Y. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
    [18] Reinhard, E., Ashikhmin, M., Gooch, B., and Shirley, P. Color transfer between images. IEEE Computer Graphics and Applications 21, 5 (2001), 34–41.
    [19] Ronneberger, O., Fischer, P., and Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) (2015), Springer, pp. 234–241.
    [20] Simonyan, K., and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
    [21] Tyleček, R., and Šára, R. Spatial pattern templates for recognition of objects with regular structure. In Proceedings of the German Conference on Pattern Recognition (GCPR) (2013).
    [22] Wang, T.-C., Liu, M.-Y., Zhu, J.-Y., Tao, A., Kautz, J., and Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018).
    [23] Yu, F., Koltun, V., and Funkhouser, T. Dilated residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).
    [24] Zhu, J.-Y., Krähenbühl, P., Shechtman, E., and Efros, A. A. Generative visual manipulation on the natural image manifold. In Proceedings of the European Conference on Computer Vision (ECCV) (2016).
    [25] Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017).
