
Graduate student: Lu, Hai-Lun (陸海侖)
Thesis title: Leveraging Motion Information for Improving Domain Adaptation in Human Segmentation (利用動作資訊改善跨域人體目標分割)
Advisor: Sun, Min (孫民)
Oral defense committee: Chiu, Wei-Chen (邱維辰); Chen, Kuan-Wen (陳冠文)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication year: 2019
Graduation academic year: 107 (ROC calendar)
Language: English
Pages: 38
Keywords: Domain Adaptation; Human Segmentation; Motion Information
    摘要 (Chinese abstract, translated): With the continuous development of deep learning, the performance of deep-learning-based human segmentation keeps improving. However, because annotating large-scale datasets is time-consuming and costly, unsupervised domain adaptation methods have emerged to transfer models trained on a labeled source dataset to an unlabeled target dataset. Most of these methods exploit only the appearance information in images. In this thesis, we propose to leverage the motion information of the foreground in videos to improve cross-domain human segmentation, fusing appearance and motion information to address the adaptation problem. In addition, we introduce a loss function based on the intersection-over-union criterion to improve both segmentation and adaptation training. To evaluate the proposed method, we conduct experiments on our newly collected cross-scene, cross-modality video datasets and on a public video dataset. The results show that, with motion information and a more carefully designed training loss, cross-domain human segmentation outperforms unsupervised adaptation methods that use appearance information only. Finally, an ablation study illustrates that the contribution of each component of the proposed method depends on the scene and modality characteristics of the datasets.


    Abstract: With the recent development of deep learning technology, the performance of human segmentation keeps improving. However, since labeling a large-scale dataset is expensive and time-consuming, many unsupervised domain adaptation methods have been proposed to adapt models pretrained on a source-domain dataset to a target-domain dataset. Most of these methods utilize only appearance information. In this thesis, we propose to leverage motion information in videos to improve human segmentation performance; appearance and motion information are fused to perform domain adaptation. In addition, we propose a training loss based on the intersection-over-union criterion for human segmentation and domain adaptation. We conduct experiments on our newly collected cross-scene and cross-modality video datasets and on a public dataset. The results demonstrate that leveraging motion information together with the improved loss design outperforms unsupervised domain adaptation methods that use appearance information only. Finally, we conduct an ablation study and show that the contribution of each component of our method depends on the scene and modality properties of the datasets.
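
    The abstract does not give implementation details, but its two central ideas, fusing an appearance stream with a motion stream and training with an intersection-over-union criterion, can be sketched concretely. The PyTorch snippet below is a minimal illustration under our own assumptions; the toy TwoStreamSegmenter and soft_iou_loss names are hypothetical and only stand in for the thesis's U-Net-based model and IoU loss, not its actual code.

        # Minimal sketch (not the thesis implementation): appearance + motion fusion
        # for binary human segmentation, trained with a differentiable soft IoU loss.
        import torch
        import torch.nn as nn

        def soft_iou_loss(logits, target, eps=1e-6):
            """Soft IoU loss for binary (human vs. background) masks.
            logits: raw network outputs, shape (N, 1, H, W)
            target: ground-truth masks in {0, 1}, same shape
            """
            prob = torch.sigmoid(logits)
            inter = (prob * target).sum(dim=(1, 2, 3))
            union = (prob + target - prob * target).sum(dim=(1, 2, 3))
            return (1.0 - (inter + eps) / (union + eps)).mean()

        class TwoStreamSegmenter(nn.Module):
            """Toy two-stream model: separate RGB and optical-flow encoders whose
            features are concatenated before a shared decoder. The thesis uses a
            U-Net-style backbone; this only illustrates the fusion idea."""
            def __init__(self, feat=16):
                super().__init__()
                self.rgb_enc = nn.Sequential(nn.Conv2d(3, feat, 3, padding=1), nn.ReLU())
                self.flow_enc = nn.Sequential(nn.Conv2d(2, feat, 3, padding=1), nn.ReLU())
                self.decoder = nn.Sequential(
                    nn.Conv2d(2 * feat, feat, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(feat, 1, 1))  # per-pixel human/background logit

            def forward(self, rgb, flow):
                fused = torch.cat([self.rgb_enc(rgb), self.flow_enc(flow)], dim=1)
                return self.decoder(fused)

        # Usage on random tensors (RGB image, 2-channel optical flow, binary mask):
        model = TwoStreamSegmenter()
        rgb = torch.randn(2, 3, 64, 64)
        flow = torch.randn(2, 2, 64, 64)
        mask = torch.randint(0, 2, (2, 1, 64, 64)).float()
        loss = soft_iou_loss(model(rgb, flow), mask)
        loss.backward()

    The soft formulation replaces hard intersection and union counts with sums over predicted probabilities, so the criterion stays differentiable and can be minimized directly by gradient descent alongside the adaptation objectives.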

    Table of Contents
    摘要 (Chinese Abstract); Abstract; 誌謝 (Acknowledgements)
    1 Introduction
        1.1 Motivation and Problem Description
        1.2 Main Contribution
        1.3 Thesis Structure
    2 Related Work
        2.1 Human Segmentation
        2.2 Motion Information Extraction
        2.3 Two Stream Architecture
        2.4 Domain Adaptation
    3 Preliminaries
        3.1 U-Net Architecture for Semantic Segmentation
        3.2 DNN-based Domain Adaptation
            3.2.1 Domain Adversarial Neural Networks
            3.2.2 Output Space Adaptation Network
            3.2.3 Maximum Classifier Discrepancy Network
    4 Dataset
        4.1 Public Datasets
        4.2 Our Datasets
    5 Our Method
        5.1 Appearance and Motion Based Domain Adaptation Method
        5.2 Intersection over Union Loss
    6 Experiments
        6.1 Implementation Details
        6.2 Cross-scenes and Cross-modalities Experiments
        6.3 Ablation Study
    7 Conclusion
    References

