Graduate Student: 蔡博丞 Tsai, Bo-Cheng
Thesis Title: Cross City Adaptation of Road Scene Segmenters via Adversarial Learning (基於對抗式訓練的跨城市街景分割)
Advisors: 孫民 Sun, Min; 邱瀞德 Chiu, Ching-Te
Oral Defense Committee: 王鈺強 Wang, Yu-Chiang; 賴尚宏 Lai, Shang-Hong; 陳煥宗 Chen, Hwann-Tzong
Degree: Master
Department: Institute of Communications Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2017
Graduation Academic Year: 105 (ROC calendar)
Language: English
Number of Pages: 42
Keywords: Road scene semantic segmentation; Domain adaptation; Adversarial learning
Despite the recent success of deep learning-based semantic segmentation, deploying a pre-trained road scene segmenter in a city whose images are not present in the training set would not achieve satisfactory performance due to dataset bias. Instead of collecting a large number of annotated images for each city of interest to train or refine the segmenter, we propose an unsupervised learning approach that adapts road scene segmenters across different cities. By utilizing Google Street View and its time-machine feature, we can collect unannotated images of each road scene at different times, so that the associated static-object priors can be extracted accordingly. By advancing a joint global and class-specific domain adversarial learning framework, the pre-trained segmenter can be adapted to the city of interest without any user annotation or interaction. We show that our method improves the performance of semantic segmentation in multiple cities across continents, and it performs favorably against state-of-the-art approaches that require annotated training data.
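Two short sketches may help make the abstract concrete. First, a toy version of the static-object prior: given repeated, unannotated views of the same road scene captured at different times, pixels that barely change over time are likely static structures. This is a simplified illustration that assumes the views are already co-registered; the function name and threshold are hypothetical, and the actual pipeline in the thesis is more involved.

```python
# Toy illustration of a static-object prior extracted from repeated,
# unannotated views of one road scene taken at different times (e.g. via
# the Street View time-machine feature). Assumes the views are already
# co-registered; the threshold below is an illustrative assumption.
import numpy as np

def static_object_prior(views: np.ndarray, thresh: float = 15.0) -> np.ndarray:
    """views: (T, H, W, 3) aligned images of one scene across time.
    Returns a boolean (H, W) mask of pixels that barely change over time,
    i.e. likely static structures such as road, buildings, or vegetation."""
    stack = np.asarray(views, dtype=np.float32)
    # Per-pixel standard deviation across time, averaged over RGB channels:
    # dynamic objects (cars, pedestrians) come and go and raise the std.
    temporal_std = stack.std(axis=0).mean(axis=-1)
    return temporal_std < thresh
```

Second, a minimal sketch of the global domain adversarial component, in the spirit of gradient-reversal training (Ganin and Lempitsky, 2015): a per-location discriminator tries to tell source-city features from target-city features, while the reversed gradient pushes the segmenter toward domain-invariant features. The module names, feature sizes, and the assumption that the segmenter returns a (features, logits) pair are illustrative, not the thesis's exact architecture; the class-specific adversarial term and the static-prior guidance mentioned in the abstract are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lambda backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DomainDiscriminator(nn.Module):
    """Per-location classifier: does a feature come from source or target?"""
    def __init__(self, in_ch: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 256, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1))  # one domain logit per location

    def forward(self, feat, lam):
        return self.net(GradReverse.apply(feat, lam))

def adaptation_step(segmenter, discriminator, opt,
                    src_img, src_label, tgt_img, lam=0.1):
    """One update: supervised loss on the labeled source city plus an
    adversarial domain loss on source and unlabeled target images.
    `opt` is assumed to hold the parameters of both networks."""
    opt.zero_grad()
    src_feat, src_logits = segmenter(src_img)   # assumed (features, logits)
    tgt_feat, _ = segmenter(tgt_img)
    seg_loss = F.cross_entropy(src_logits, src_label, ignore_index=255)
    d_src = discriminator(src_feat, lam)        # source locations -> 0
    d_tgt = discriminator(tgt_feat, lam)        # target locations -> 1
    dom_loss = (F.binary_cross_entropy_with_logits(d_src, torch.zeros_like(d_src))
                + F.binary_cross_entropy_with_logits(d_tgt, torch.ones_like(d_tgt)))
    # The reversed gradient trains the discriminator to separate domains
    # while pushing the segmenter toward domain-invariant features.
    (seg_loss + dom_loss).backward()
    opt.step()
    return seg_loss.item(), dom_loss.item()
```

Per the abstract, the full framework couples this global alignment with a class-specific adversarial term and the static-object priors; the sketch above covers only the global part.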