| | |
|---|---|
| Graduate Student | 陳柏鈞 Chen, Po-Chun |
| Thesis Title | 利用對抗條件自編碼器模型進行似人類草圖生成 (Human-Like Sketch Synthesis by Using Conditional Auto-Encoder-Generative Adversarial Network) |
| Advisor | 鐘太郎 Jong, Tai-Lang |
| Committee Members | 黃裕煒; 謝奇文 Hsieh, Chi-Wen |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Publication Year | 2022 |
| Academic Year | 110 |
| Language | Chinese |
| Pages | 39 |
| Keywords (Chinese) | sketch synthesis, data augmentation, style transfer, deep learning |
| Keywords (English) | sketch synthesis, CAE-GAN |
Content-Based Image Retrieval (CBIR), which retrieves images by keyword, has been practiced for many years, but it cannot fully meet users' needs. One effective alternative is to retrieve images from sketches, an approach called Sketch-Based Image Retrieval (SBIR). In recent years, deep learning has shown great success in a variety of image recognition tasks, and SBIR research has begun to actively exploit deep learning to improve sketch recognition. However, sketch data are difficult to collect, and drawing sketches by hand is more time-consuming and labor-intensive than labeling image content, so deep-learning-based SBIR research usually has to confront a lack of data first, which slows its progress. In view of this situation, this thesis studies deep-learning-based sketch data augmentation and proposes CAE-GAN, an end-to-end model architecture that converts an original photograph directly into a sketch image, to facilitate subsequent SBIR research.
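The end-to-end idea above can be sketched as a conditional auto-encoder generator paired with a discriminator. The code below is only a shape-level illustration of that data flow: the layer sizes, the single-matrix "networks," the class count, and all names are assumptions made here for demonstration; the thesis's actual model uses deep convolutional networks.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG = 64 * 64        # flattened photo / sketch size (assumed)
LATENT = 128         # bottleneck size (assumed)
N_CLASSES = 10       # condition: object category (assumed)

# Generator = conditional auto-encoder: (photo, class label) -> sketch.
W_enc = rng.standard_normal((IMG + N_CLASSES, LATENT)) * 0.01
W_dec = rng.standard_normal((LATENT + N_CLASSES, IMG)) * 0.01
# Discriminator: (sketch, class label) -> real/fake score.
W_dis = rng.standard_normal((IMG + N_CLASSES, 1)) * 0.01

def one_hot(label):
    v = np.zeros(N_CLASSES)
    v[label] = 1.0
    return v

def generator(photo, label):
    c = one_hot(label)
    z = np.tanh(np.concatenate([photo, c]) @ W_enc)   # encode photo + condition
    sketch = np.tanh(np.concatenate([z, c]) @ W_dec)  # decode to a sketch image
    return sketch

def discriminator(sketch, label):
    logit = np.concatenate([sketch, one_hot(label)]) @ W_dis
    return 1.0 / (1.0 + np.exp(-logit[0]))            # sigmoid real/fake score

photo = rng.standard_normal(IMG)
fake_sketch = generator(photo, label=3)
score = discriminator(fake_sketch, label=3)
print(fake_sketch.shape, score)
```

In adversarial training, the discriminator would be updated to separate generated sketches from human-drawn ones, while the generator is updated to fool it; conditioning both networks on the class label is what makes the model "conditional."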
To analyze the generated images objectively, model quality is evaluated with three sketch quality metrics: Top-1 accuracy, Top-5 accuracy, and the Inception Score (IS). The proposed CAE-GAN achieves, on average, 52.28% Top-1 accuracy, 72.29% Top-5 accuracy, and an IS of 75.0948, which confirms that the method is usable for sketch data augmentation and offers a possible augmentation step in the sketch pre-processing stage of SBIR.
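For reference, here is how the three metrics named above are commonly computed from a sketch classifier's softmax outputs. The tiny probability matrix and labels below are made-up illustrations, not the thesis's actual classifier or data.

```python
import numpy as np

def topk_accuracy(probs, labels, k):
    """Fraction of rows whose true label is among the k highest-probability classes."""
    topk = np.argsort(probs, axis=1)[:, -k:]
    return float(np.mean([labels[i] in topk[i] for i in range(len(labels))]))

def inception_score(probs, eps=1e-12):
    """IS = exp( E_x [ KL( p(y|x) || p(y) ) ] )."""
    p_y = probs.mean(axis=0)  # marginal class distribution over all samples
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Three samples, three classes: per-row class probabilities and true labels.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 0])

top1 = topk_accuracy(probs, labels, k=1)  # rows 0 and 1 are correct -> 2/3
top5 = topk_accuracy(probs, labels, k=min(5, probs.shape[1]))
print(top1, top5, inception_score(probs))
```

A higher IS indicates that individual samples are classified confidently while the set as a whole covers many classes, which is why it complements plain accuracy as a generation-quality measure.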
In summary, the proposed method has the following advantages. First, it can augment existing sketch datasets, saving the valuable manpower and time spent collecting sketch data. Second, the adversarial training produces visually more acceptable sketches than autoencoders trained only with a fixed loss function. Third, it maintains stable performance under varying brightness and rotation angles.