
Author: Huang, Po-Yu (黃柏瑜)
Thesis Title: Efficient Uncertainty Estimation for Semantic Segmentation in Videos (高效率不確定性預測應用於影像語意分割)
Advisor: Sun, Min (孫民)
Committee Members: Wang, Yu-Chiang (王鈺強); Chen, Yun-Nung (陳縕儂)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2018
Graduation Academic Year: 106
Language: English
Pages: 43
Chinese Keywords: 不確定性 (uncertainty), 語意分割 (semantic segmentation), 影像 (video), 高效率 (efficiency)
Foreign-Language Keywords: uncertainty, segmentation, video, efficient
    Model uncertainty has received growing attention in deep learning: if we cannot tell how certain a deep learning model is about its decisions, it is unlikely to be adopted in real-world applications. Much of the literature estimates uncertainty with Bayesian neural networks via Monte Carlo dropout (MC dropout), but MC dropout requires running the model N times (e.g., N = 50), causing an N-fold slowdown. For real-time applications such as self-driving systems, which need predictions and their uncertainty as quickly as possible, MC dropout is not a practical solution. This thesis proposes a region-based temporal aggregation method, abbreviated RTA-MC, which leverages the temporal information in videos to simulate the sampling procedure. RTA-MC accelerates MC dropout by 10.97x while losing only 1.2% in mean IoU accuracy. In addition, RTA-MC's frame-level uncertainty estimation surpasses MC dropout's: RTA-MC retrieves 3% more of the hardest frames than MC dropout. Finally, applying the uncertainty estimates to active learning shows that RTA-MC matches MC dropout's performance and outperforms random selection.
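The MC dropout recipe the abstract refers to is simple to state: run the network N times with dropout left active at test time, average the per-class probabilities, and take the predictive entropy as the per-pixel uncertainty. A minimal NumPy sketch under stated assumptions — `noisy_net` is a hypothetical stand-in for a Bayesian SegNet forward pass with dropout enabled, not the thesis's actual model:

```python
import numpy as np

def mc_dropout_predict(forward_fn, x, n_samples=50):
    """Run N stochastic forward passes (dropout active at test time)
    and average the per-class probabilities -- the MC dropout recipe."""
    probs = np.stack([forward_fn(x) for _ in range(n_samples)])  # (N, H, W, C)
    mean_probs = probs.mean(axis=0)                              # (H, W, C)
    # Predictive entropy of the averaged distribution = uncertainty map.
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
    return mean_probs.argmax(axis=-1), entropy

# Hypothetical stochastic "network": random perturbations of the input
# logits stand in for dropout noise, followed by a softmax.
rng = np.random.default_rng(0)
def noisy_net(x):
    logits = x + rng.normal(scale=0.5, size=x.shape)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = rng.normal(size=(4, 4, 3))  # toy 4x4 "image" with 3 classes
labels, unc = mc_dropout_predict(noisy_net, x, n_samples=50)
print(labels.shape, unc.shape)  # (4, 4) (4, 4)
```

The N-fold cost is visible in the list comprehension: every frame pays for N full forward passes, which is exactly what the temporal aggregation method amortizes across video frames.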


    Uncertainty estimation in deep learning has recently become more important. A deep learning model cannot be deployed in real applications if we do not know whether the model is certain about its decisions. Prior work proposes Bayesian neural networks, which estimate uncertainty via Monte Carlo dropout (MC dropout). However, MC dropout requires forwarding the model N times (e.g., N = 50), resulting in an N-fold slowdown. For real-time applications such as self-driving systems, which need the prediction and its uncertainty as quickly as possible, MC dropout is impractical. In this work, we propose the region-based temporal aggregation (RTA) method, which leverages the temporal information in videos to simulate the sampling procedure. Our RTA method speeds up MC dropout by 10.97x while dropping only 1.2% in mean IoU. Furthermore, the uncertainty estimates obtained by RTA outperform MC dropout at the frame level, retrieving 3% more of the hardest frames. Finally, we apply the uncertainty estimates to active learning: RTA-MC's uncertainty matches the performance of MC dropout and outperforms a random selection strategy.
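The aggregation the abstract describes replaces the N forward passes per frame with a single stochastic pass per frame, whose output is warped into the current frame by optical flow and folded into a running average — the warped history plays the role of the missing dropout samples. A hedged sketch of one such step, using nearest-neighbour warping and a fixed blend weight `alpha` for simplicity (the thesis's region-based variant additionally adapts the weight per region, e.g. by the flow's reconstruction error; the function names here are illustrative, not from the thesis code):

```python
import numpy as np

def warp(prob, flow):
    """Backward-warp a per-pixel probability map with optical flow.
    Nearest-neighbour lookup; real pipelines use bilinear sampling."""
    h, w, _ = prob.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys - flow[..., 1].round().astype(int), 0, h - 1)
    src_x = np.clip(xs - flow[..., 0].round().astype(int), 0, w - 1)
    return prob[src_y, src_x]

def temporal_aggregate(prev_agg, cur_prob, flow, alpha=0.2):
    """One TA-MC-style step: warp the running average from the previous
    frame into the current one and blend in one fresh dropout sample."""
    return (1 - alpha) * warp(prev_agg, flow) + alpha * cur_prob

# Toy 4x4 frame pair with 3 classes and zero motion (static scene).
rng = np.random.default_rng(0)
prev_agg = rng.random((4, 4, 3)); prev_agg /= prev_agg.sum(-1, keepdims=True)
cur_prob = rng.random((4, 4, 3)); cur_prob /= cur_prob.sum(-1, keepdims=True)
flow = np.zeros((4, 4, 2))
agg = temporal_aggregate(prev_agg, cur_prob, flow)
```

Because each frame now costs one forward pass plus a flow-guided warp, the per-frame cost is roughly that of a deterministic network plus FlowNet, which is where the reported ~11x speedup over 50-sample MC dropout comes from.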

    Chinese Abstract (摘要) ii
    Abstract iii
    Acknowledgements (誌謝) iv
    1 Introduction 1
    2 Related Work 4
      2.1 Uncertainty of Deep Learning 4
      2.2 Leverage Temporal Information 5
      2.3 Semantic Segmentation 6
      2.4 Active Learning 7
    3 Preliminary 8
      3.1 Uncertainty 8
        3.1.1 Bayesian Neural Network 8
        3.1.2 MC Dropout 10
      3.2 Semantic Segmentation 11
        3.2.1 SegNet 12
        3.2.2 Bayesian SegNet 13
      3.3 Optical Flow 13
        3.3.1 FlowNet2.0 13
    4 Temporal Aggregation 15
      4.1 Temporal Aggregation MC Dropout (TA-MC) 15
        4.1.1 Notations 15
        4.1.2 Temporal Aggregation 16
      4.2 Region-based Temporal Aggregation MC Dropout (RTA-MC) 19
    5 Setting and Datasets 22
      5.1 Semantic Segmentation Dataset 22
      5.2 Experiment Detail 23
    6 Experiments 24
      6.1 Video Semantic Segmentation 24
      6.2 Uncertainty Evaluation 25
      6.3 Active Learning 30
    7 Conclusion and Future Work 37
      7.1 Conclusion 37
      7.2 Future Work 38
    References 39

