Student: 林哲緯 Lin, Tse-Wei
Thesis Title: 基於廣泛風格與關注特徵進行域泛化的風格擴充方法 (Style Augmentation for Domain Generalization using Diverse Styles and Attention Content Features)
Advisor: 許秋婷 Hsu, Chiou-Ting
Oral Defense Committee: 王聖智 Wang, Sheng-Jyh; 邵皓強 Shao, Hao-Chiang
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2020
Graduation Academic Year: 109 (ROC calendar, 2020-2021)
Language: English
Number of Pages: 34
Keywords: Style Augmentation, Style Transfer, Data Augmentation, Domain Generalization, Attention Module, Diverse Styles
Abstract: Domain generalization aims to generalize to any unseen target domain by adapting the feature representation learned from multiple source domains. Since the target domain is unavailable during the training stage, data augmentation is one of the solutions that helps the neural network generalize to the target domain. In this thesis, we focus on domain generalization for the image classification task and propose a style augmentation method. The proposed method includes three cooperative ideas to generate augmented data. First, we extract diverse styles from multiple source domains to increase the diversity of the augmented data, and we also include an additional style extracted from binary edge images to further improve the generalization capability. Second, to preserve the class-discriminative image content, the proposed method includes an attention module that focuses on the foreground objects of the images. Third, the classification performance and generalization capability are further boosted by jointly training the style augmentation model and the classification network under the guidance of the classifier. Experimental results on several cross-domain benchmark datasets show that the proposed method significantly outperforms previous methods.
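To make the three ideas in the abstract concrete, below is a minimal PyTorch sketch of how they could fit together, not the thesis's actual architecture: the AdaIN-style transfer, the one-layer attention module, the tiny networks, and all names and hyper-parameters (StyleAugmenter, AttentionModule, train_step, channel sizes, 7 classes) are illustrative assumptions. The style image would be drawn from another source domain or be a binary edge map.

```python
# Hedged sketch of the abstract's three ideas; every module here is an
# assumption, not the thesis's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization: re-colour content features with the
    channel-wise mean/std of the style features (one common way to inject a
    'style'; the thesis may use a different transfer mechanism)."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

class AttentionModule(nn.Module):
    """Predicts a spatial mask so stylization is applied mostly to the
    background, preserving the class-discriminative foreground object."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat):
        return torch.sigmoid(self.conv(feat))  # (B, 1, H, W) in [0, 1]

class StyleAugmenter(nn.Module):
    """Idea 1 + 2: stylize with a style from another source domain (or an
    edge map) while the attention mask protects foreground content."""
    def __init__(self, channels=64):
        super().__init__()
        self.encode = nn.Conv2d(3, channels, 3, padding=1)
        self.decode = nn.Conv2d(channels, 3, 3, padding=1)
        self.attention = AttentionModule(channels)

    def forward(self, content_img, style_img):
        c = self.encode(content_img)
        s = self.encode(style_img)
        stylized = adain(c, s)
        a = self.attention(c)               # foreground attention mask
        mixed = a * c + (1 - a) * stylized  # stylize mainly the background
        return self.decode(mixed)

def train_step(augmenter, classifier, optimizer, x, y, style_x):
    """Idea 3: the classifier sees original and augmented images, and its
    loss updates the augmenter and the classifier jointly."""
    x_aug = augmenter(x, style_x)
    logits = classifier(torch.cat([x, x_aug]))
    loss = F.cross_entropy(logits, torch.cat([y, y]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy usage with random tensors; 7 classes is an assumption.
    classifier = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 7))
    augmenter = StyleAugmenter()
    opt = torch.optim.Adam(
        list(augmenter.parameters()) + list(classifier.parameters()), lr=1e-4)
    x = torch.randn(4, 3, 64, 64)
    y = torch.randint(0, 7, (4,))
    style = torch.randn(4, 3, 64, 64)  # stand-in for a cross-domain style image
    print(train_step(augmenter, classifier, opt, x, y, style))
```

One optimizer over both parameter sets is what makes the training joint: the classification loss back-propagates through the augmented images into the augmenter, so the generated styles stay useful for the classifier rather than drifting arbitrarily.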