| | |
|---|---|
| Graduate Student | 陳柏鈞 Chen, Po-Chun |
| Thesis Title | 利用對抗條件自編碼器模型進行似人類草圖生成 (Human-Like Sketch Synthesis by Using Conditional Auto-Encoder-Generative Adversarial Network) |
| Advisor | 鐘太郎 Jong, Tai-Lang |
| Committee Members | 黃裕煒; 謝奇文 Hsieh, Chi-Wen |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Publication Year | 2022 |
| Academic Year | 110 |
| Language | Chinese |
| Pages | 39 |
| Keywords (Chinese) | sketch synthesis, data augmentation, style transfer, deep learning |
| Keywords (English) | sketch synthesis, CAE-GAN |
Content-Based Image Retrieval (CBIR), which retrieves images by keyword, has been practiced for many years, but it cannot fully meet users' needs. One effective alternative is to retrieve images from sketches, an approach called Sketch-Based Image Retrieval (SBIR). In recent years, deep learning has shown great success in a variety of image recognition tasks, and SBIR research has begun to actively exploit deep learning to improve sketch recognition. However, sketch data are difficult to collect, and drawing sketches by hand is more time-consuming and labor-intensive than labeling image content, so deep-learning-based SBIR research usually has to confront a lack of data first, which slows its progress. In view of this situation, this thesis studies deep-learning-based sketch data augmentation and proposes CAE-GAN, an end-to-end model architecture that converts an original photograph directly into a sketch image, to facilitate subsequent SBIR research.
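The end-to-end idea above can be sketched as a conditional auto-encoder generator paired with a discriminator. The code below is only a shape-level illustration of that data flow: the layer sizes, the single-matrix "networks," the class count, and all names are assumptions made here for demonstration; the thesis's actual model uses deep convolutional networks.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG = 64 * 64        # flattened photo / sketch size (assumed)
LATENT = 128         # bottleneck size (assumed)
N_CLASSES = 10       # condition: object category (assumed)

# Generator = conditional auto-encoder: (photo, class label) -> sketch.
W_enc = rng.standard_normal((IMG + N_CLASSES, LATENT)) * 0.01
W_dec = rng.standard_normal((LATENT + N_CLASSES, IMG)) * 0.01
# Discriminator: (sketch, class label) -> real/fake score.
W_dis = rng.standard_normal((IMG + N_CLASSES, 1)) * 0.01

def one_hot(label):
    v = np.zeros(N_CLASSES)
    v[label] = 1.0
    return v

def generator(photo, label):
    c = one_hot(label)
    z = np.tanh(np.concatenate([photo, c]) @ W_enc)   # encode photo + condition
    sketch = np.tanh(np.concatenate([z, c]) @ W_dec)  # decode to a sketch image
    return sketch

def discriminator(sketch, label):
    logit = np.concatenate([sketch, one_hot(label)]) @ W_dis
    return 1.0 / (1.0 + np.exp(-logit[0]))            # sigmoid real/fake score

photo = rng.standard_normal(IMG)
fake_sketch = generator(photo, label=3)
score = discriminator(fake_sketch, label=3)
print(fake_sketch.shape, score)
```

In adversarial training, the discriminator would be updated to separate generated sketches from human-drawn ones, while the generator is updated to fool it; conditioning both networks on the class label is what makes the model "conditional."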
To analyze the generated images objectively, model quality is evaluated with three sketch quality metrics: Top-1 accuracy, Top-5 accuracy, and the Inception Score (IS). The proposed CAE-GAN achieves, on average, 52.28% Top-1 accuracy, 72.29% Top-5 accuracy, and an IS of 75.0948, which confirms that the method is usable for sketch data augmentation and offers a possible augmentation step in the sketch pre-processing stage of SBIR.
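For reference, here is how the three metrics named above are commonly computed from a sketch classifier's softmax outputs. The tiny probability matrix and labels below are made-up illustrations, not the thesis's actual classifier or data.

```python
import numpy as np

def topk_accuracy(probs, labels, k):
    """Fraction of rows whose true label is among the k highest-probability classes."""
    topk = np.argsort(probs, axis=1)[:, -k:]
    return float(np.mean([labels[i] in topk[i] for i in range(len(labels))]))

def inception_score(probs, eps=1e-12):
    """IS = exp( E_x [ KL( p(y|x) || p(y) ) ] )."""
    p_y = probs.mean(axis=0)  # marginal class distribution over all samples
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Three samples, three classes: per-row class probabilities and true labels.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 0])

top1 = topk_accuracy(probs, labels, k=1)  # rows 0 and 1 are correct -> 2/3
top5 = topk_accuracy(probs, labels, k=min(5, probs.shape[1]))
print(top1, top5, inception_score(probs))
```

A higher IS indicates that individual samples are classified confidently while the set as a whole covers many classes, which is why it complements plain accuracy as a generation-quality measure.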
In summary, the proposed method has the following advantages. First, it can augment existing sketch datasets, saving the valuable manpower and time spent collecting sketch data. Second, the adversarial training produces visually more acceptable sketches than autoencoders trained only with a fixed loss function. Third, it maintains stable performance under varying brightness and rotation angles.