Graduate Student: 方姿婷 Fang, Tzu-Ting
Thesis Title: 基於生成對抗網路的不同風格線稿著色 (Stylized Colorization for Line-Art with Generative Adversarial Networks)
Advisor: 賴尚宏 Lai, Shang-Hong
Committee Members: 朱宏國 Chu, Hung-Kuo; 李哲榮 Lee, Che-Rung; 王昱舜 Wang, Yu-Shuen
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Publication Year: 2020
Graduation Academic Year: 108
Language: English
Number of Pages: 43
Keywords (Chinese): 深度學習、生成對抗網路、著色
Keywords (English): Deep Learning, Generative Adversarial Network, Colorization
Coloring style plays an important role in the painting process. Whether readers are looking at picture books, comics, or other forms of artwork, they can sense the different atmospheres the artist wants to convey through the coloring style. With the development of deep network techniques, automatic colorization and image style transfer methods have been proposed one after another. However, none of the existing methods can perform end-to-end colorization in different styles: the network has to learn the two distinct objectives of colorization and style at the same time, and generating image translations across multiple domains is itself a challenging task.
In this thesis, we propose a stylized colorization model based on conditional generative adversarial networks. Our generator consists of an encoder and a decoder. The encoder learns high-level feature representations of the uncolored image, and the decoder uses these features together with the input style condition to generate a colorization result in the desired coloring style. For the discriminator, we propose a two-discriminator architecture: one discriminator judges the colorization of the generated image and the other judges its style, so that the generator can learn the colorization and style objectives more effectively at the same time. Experimental results show that our single model achieves excellent colorization results across multiple domains, and compared with colorizing first and then applying style transfer, our model generates the result directly with a much simpler procedure.
Automatic line-art colorization has improved substantially since researchers proposed applying Generative Adversarial Networks (GANs) to this problem. Colorizing the same line art in different ways produces illustrations in different styles, and different coloring styles suit different situations and scenarios: one can imagine that the styles used for picture books, comics, and animations are all distinct, and each evokes a different feeling in the reader.
In this paper, we focus on the new problem of stylized colorization, which is to colorize an input line-art in a specified coloring style. This problem can be considered a multi-domain image translation problem. We propose an end-to-end adversarial network for stylized colorization in which the model consists of one generator and two discriminators. The generator receives a line-art and a coloring style as input and produces a stylized-colorization image of the line-art. The two discriminators judge the stylized-colorization images in two different aspects: one for colorization and one for coloring style. The generator and the two discriminators are jointly trained in an adversarial, end-to-end manner. Extensive experiments demonstrate that the proposed model achieves superior colorization results compared with previous methods.
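The architecture described above can be made concrete with a minimal PyTorch sketch of the one-generator, two-discriminator layout: an encoder-decoder generator conditioned on a one-hot coloring-style label, one discriminator for colorization and one for coloring style. This is an illustrative sketch under assumed settings (channel widths, a hypothetical NUM_STYLES of 3, PatchGAN-style discriminators), not the thesis's actual implementation.

```python
# Minimal sketch of the one-generator / two-discriminator setup. Layer sizes,
# the number of styles, and the discriminator design are illustrative assumptions.
import torch
import torch.nn as nn

NUM_STYLES = 3  # assumed number of coloring-style domains


class Generator(nn.Module):
    """Encoder-decoder that colorizes a 1-channel line-art given a style label."""

    def __init__(self, num_styles: int = NUM_STYLES):
        super().__init__()
        # Encoder: the one-hot style label is broadcast to a feature map and
        # concatenated with the line-art before feature extraction.
        self.encoder = nn.Sequential(
            nn.Conv2d(1 + num_styles, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        # Decoder: upsamples the features back to a 3-channel colorized image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, line_art, style_onehot):
        b, _, h, w = line_art.shape
        style_map = style_onehot.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.decoder(self.encoder(torch.cat([line_art, style_map], dim=1)))


class PatchDiscriminator(nn.Module):
    """Small PatchGAN-style critic, instantiated twice: once for colorization
    quality and once for coloring style."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, padding=1),  # patch-wise real/fake scores
        )

    def forward(self, x):
        return self.net(x)


G = Generator()
# D_color judges (line-art, colorized image) pairs; D_style additionally sees the
# style condition as extra channels so it can judge style consistency.
D_color = PatchDiscriminator(in_channels=1 + 3)
D_style = PatchDiscriminator(in_channels=1 + 3 + NUM_STYLES)

# Toy forward pass with random data.
line = torch.randn(2, 1, 64, 64)                     # grayscale line-art batch
style = torch.eye(NUM_STYLES)[torch.tensor([0, 2])]  # one-hot style labels
fake = G(line, style)                                 # (2, 3, 64, 64) colorization
style_map = style.view(2, NUM_STYLES, 1, 1).expand(-1, -1, 64, 64)
score_color = D_color(torch.cat([line, fake], dim=1))
score_style = D_style(torch.cat([line, fake, style_map], dim=1))
```

In training, both discriminators would be scored against real stylized illustrations while the generator is updated to fool them simultaneously, which corresponds to the joint adversarial, end-to-end training described in the abstract.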