
Author: Liu, Kang-Jun (劉康軍)
Thesis Title: Fast Region-Semantics Preserving Image Synthesis (保留區域語意之快速影像生成)
Advisor: Wu, Shan-Hung (吳尚鴻)
Committee Members: Peng, Wen-Hsiao (彭文孝); Lin, Chia-Wen (林嘉文); Chen, Hwann-Tzong (陳煥宗); Lai, Shang-Hong (賴尚宏)
Degree: Master (碩士)
Department:
Year of Publication: 2018
Graduating Academic Year: 106 (2017–2018)
Language: English
Pages: 14
Chinese Keywords: 影像生成 (image synthesis)
Foreign Keywords: GAN
Abstract (Chinese, translated):
  • We study the problem of region-semantics preserving image synthesis. Given a reference image and a specified region, our goal is to train a model that, conditioned on the semantics of that region, preserves its information while generating diverse and realistic images. This is a challenging problem: first, the model must understand and preserve the marginal semantics of the reference region, i.e., the semantics that remain after excluding the semantics of all subregions; second, the parts of the image outside the reference region must stay compatible with the semantics inside it. In this thesis, we propose a new model, called the fast region-semantics preserver, to solve the region-semantics preserving image synthesis problem. The model generates images using a pre-trained generative adversarial network and a pre-trained deep feature extractor, and requires no additional training time beyond these two models. This makes it well suited to interactive applications. We conduct extensive experiments on real-world datasets, and the results show that our model can efficiently generate realistic and diverse images for the region-semantics preserving image synthesis problem.


    Abstract (English):
    We study the problem of region-semantics preserving (RSP) image synthesis. Given a reference image and a region specification R, our goal is to train a model that is able to generate realistic and diverse images, each preserving the same semantics as that of the reference image within the region R. This problem is challenging because the model needs to (1) understand and preserve the marginal semantics of the reference region, i.e., the semantics excluding that of any subregion; and (2) maintain the compatibility of any synthesized region with the marginal semantics of the reference region. In this paper, we propose a novel model, called the fast region-semantics preserver (Fast-RSPer), for the RSP image synthesis problem. The Fast-RSPer uses a pre-trained GAN generator and a pre-trained deep feature extractor to generate images without undergoing a dedicated training phase. This makes it particularly useful for interactive applications. We conduct extensive experiments on real-world datasets, and the results show that Fast-RSPer can synthesize realistic, diverse RSP images efficiently.
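The abstract describes generating images by reusing a frozen, pre-trained GAN generator and deep feature extractor with no dedicated training phase. A natural reading is that only a latent code is optimized so that the generated image's deep features match the reference image's features inside the region. The toy sketch below illustrates that latent-optimization idea under loud assumptions: the names (`G`, `phi`, `rsp_synthesize`) and the linear stand-ins for the generator and feature extractor are hypothetical, chosen so the sketch runs without a deep-learning framework; the thesis's actual architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
D_Z, D_IMG = 8, 16

G_W = rng.normal(size=(D_IMG, D_Z))   # stand-in for a pre-trained GAN generator
PHI_W = np.eye(D_IMG)                 # stand-in for a pre-trained feature extractor

def G(z):
    """Generator: latent vector -> 'image' (a flat vector in this toy)."""
    return G_W @ z

def phi(x):
    """Feature extractor: 'image' -> deep features."""
    return PHI_W @ x

def rsp_synthesize(x_ref, mask, steps=1000, lr=0.02, noise=0.0):
    """Optimize only the latent z so phi(G(z)) matches phi(x_ref) on the region.

    Nothing is trained; both G and phi stay frozen. `noise` adds Gaussian
    gradient noise, a knob the thesis studies (Sec. 5.5) for diversifying
    the synthesized images.
    """
    z = rng.normal(size=D_Z)
    err0 = np.linalg.norm((phi(G(z)) - phi(x_ref)) * mask)  # initial region error
    for _ in range(steps):
        residual = (phi(G(z)) - phi(x_ref)) * mask
        grad = G_W.T @ PHI_W.T @ residual   # exact gradient of the masked L2 loss
        z -= lr * (grad + noise * rng.normal(size=D_Z))
    return z, err0

x_ref = rng.normal(size=D_IMG)                # reference "image"
mask = (np.arange(D_IMG) < 8).astype(float)   # region R: first half of the features
z, err0 = rsp_synthesize(x_ref, mask)
region_err = np.linalg.norm((phi(G(z)) - phi(x_ref)) * mask)
print(region_err < err0)  # the region error shrinks; z stays free elsewhere
```

Because only `z` is updated, the cost per synthesized image is a short optimization run rather than a training phase, which is why this style of approach suits interactive use; the unmasked dimensions stay unconstrained, which is where the diversity of the outputs comes from.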

    Table of Contents:
    1 Introduction 3
    2 Related Work 4
    3 Fast RSP Image Synthesis 6
    4 Cross-Domain RSP Image Synthesis 7
    5 Experiments 8
      5.1 Results Given Single-Object Regions 9
      5.2 Results Given Complex Regions 9
      5.3 Quantitative Comparison 10
      5.4 Semantics Preserving 10
      5.5 Effect of l and Gradient Noises 10
      5.6 Cross-Domain RSP Image Synthesis 12
    6 Conclusion 13
    7 Reference 13

