
Author: 張書榕 (Chang, Shu-Jung)
Thesis title: 基於雙重域擴增進行單一域泛化語意分割
(Single-Domain Generalization for Semantic Segmentation via Dual-Level Domain Augmentation)
Advisor: 許秋婷 (Hsu, Chiou-Ting)
Committee members: 林彥宇 (Lin, Yen-Yu), 王聖智 (Wang, Sheng-Jyh)
Degree: Master
Department: Institute of Information Systems and Applications, College of Electrical Engineering and Computer Science
Year of publication: 2023
Graduation academic year: 111
Language: English
Number of pages: 30
Chinese keywords: 單一域泛化語意分割 (single-domain generalized semantic segmentation)
Foreign-language keywords: single-domain generalization, semantic segmentation
Abstract: The goal of single-domain generalization is to learn a domain-generalized model from only a single source domain. To avoid overfitting to the source domain, recent work has focused on domain augmentation for learning domain-generalized features, so domain diversity is crucial to a model's generalization ability. In this thesis, we propose a novel dual-level domain augmentation framework that enriches domain diversity for single-domain generalized semantic segmentation. Specifically, we devise an Image-Level Augmentation Module (IAM) and a Class-Level Augmentation Module (CAM) to enlarge the diversity of augmented images and of per-class features, respectively. From the original and augmented data, we then design a domain-generalized feature learning scheme in which the segmentation model is regularized by a large-scale pre-trained model to learn representative features. Experimental results on semantic segmentation benchmarks demonstrate that the proposed method is effective and outperforms previous work.
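
The abstract describes the framework only at a high level. Below is a minimal, illustrative PyTorch-style sketch of what one training step of such a dual-level augmentation scheme might look like. The module APIs (seg_model.backbone, seg_model.head), the noise scales, the loss weighting, and the use of a frozen large-scale pre-trained encoder whose pooled embedding matches the backbone feature width are all assumptions made for illustration, not the thesis's actual implementation.

    # Illustrative sketch only: names, noise scales, and loss weights are assumed.
    import torch
    import torch.nn.functional as F

    def image_level_augment(images):
        """Image-level augmentation (IAM-like): perturb global per-channel style
        statistics (mean/std) to synthesize images that mimic unseen domains."""
        mu = images.mean(dim=(2, 3), keepdim=True)
        std = images.std(dim=(2, 3), keepdim=True) + 1e-6
        new_mu = mu + 0.3 * torch.randn_like(mu)            # assumed noise scale
        new_std = std * (1.0 + 0.3 * torch.randn_like(std))  # assumed noise scale
        return (images - mu) / std * new_std + new_mu

    def class_level_augment(features, labels, num_classes):
        """Class-level augmentation (CAM-like): perturb feature statistics
        separately for each semantic class, using the down-sampled label map."""
        b, c, h, w = features.shape
        labels_ds = F.interpolate(labels.unsqueeze(1).float(), size=(h, w),
                                  mode="nearest").squeeze(1).long()
        augmented = features.clone()
        for cls in range(num_classes):
            mask = (labels_ds == cls).unsqueeze(1)                   # (B,1,H,W)
            count = mask.sum(dim=(2, 3), keepdim=True).clamp(min=1)  # pixels per image
            mu = (features * mask).sum(dim=(2, 3), keepdim=True) / count
            noise = 0.1 * torch.randn_like(mu) * mu.abs()            # assumed noise scale
            # shift only the features belonging to this class
            augmented = torch.where(mask, features + noise, augmented)
        return augmented

    def training_step(seg_model, frozen_pretrained, images, labels, num_classes=19):
        # two augmentation levels: image-level on the input, class-level on features
        aug_images = image_level_augment(images)
        feats = seg_model.backbone(images)          # assumed segmentor API
        aug_feats = class_level_augment(seg_model.backbone(aug_images),
                                        labels, num_classes)

        # assume the head returns full-resolution logits (B, num_classes, H, W)
        logits = seg_model.head(feats)
        aug_logits = seg_model.head(aug_feats)
        loss_seg = F.cross_entropy(logits, labels) + F.cross_entropy(aug_logits, labels)

        # domain-generalized feature learning: keep both branches close to the
        # pooled embedding of a frozen large-scale pre-trained model (assumed to
        # have the same dimensionality as the backbone features)
        with torch.no_grad():
            ref = frozen_pretrained(images)
        loss_reg = F.mse_loss(feats.mean(dim=(2, 3)), ref) + \
                   F.mse_loss(aug_feats.mean(dim=(2, 3)), ref)

        return loss_seg + 0.1 * loss_reg             # assumed loss weighting

The key design choice this sketch tries to convey is that the two augmentations act at different granularities: the image-level branch diversifies global appearance before the backbone, while the class-level branch diversifies per-class feature statistics after it, and both branches are anchored to a frozen pre-trained representation.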

Table of Contents:
摘要 (Chinese Abstract)
Abstract
Acknowledgements
1 Introduction
2 Related Work
  2.1 Semantic Segmentation in Domain Generalization
  2.2 Domain Augmentation
  2.3 Consistency Regularization
3 Method
  3.1 Overview of Proposed Framework
  3.2 Image-Level Augmentation Module (IAM)
  3.3 Class-Level Augmentation Module (CAM)
    3.3.1 Non-Learning-Based Class-Level Augmentation Module
    3.3.2 Limitation in Non-Learning-Based Class-Level Augmentation Module
    3.3.3 Learning-Based Class-Level Augmentation Module
    3.3.4 Discussion
  3.4 Domain-Generalized Feature Learning
  3.5 Training Procedure
4 Experiments
  4.1 Datasets
    4.1.1 Synthetic Dataset
    4.1.2 Real-World Datasets
  4.2 Protocols and Evaluation Metric
  4.3 Implementation Details
  4.4 Ablation Study
    4.4.1 Comparison between Different Augmentation Modules and Losses
    4.4.2 Comparison between Different Losses in Learning-Based Class-Level Augmentation Module
    4.4.3 Comparison between Different Class-Level Augmentation Methods
    4.4.4 Comparison between Different Combinations of Hyper-Parameters
  4.5 Experimental Results
    4.5.1 Synthetic-to-Real Protocol
    4.5.2 Real-to-Real Protocol
  4.6 Visualization and Limitation in Adverse Scenarios
5 Conclusion
References

