
Graduate student: 簡瑞霆 (Chien, Jui-Ting)
Thesis title: 隱「行」人偵測 (Detecting Nonexistent Pedestrians)
Advisor: 陳煥宗 (Chen, Hwann-Tzong)
Oral examination committee: 賴尚宏 (Lai, Shang-Hong); 劉庭祿 (Liu, Tyng-Luh)
Degree: Master
Department:
Year of publication: 2017
Academic year of graduation: 105
Language: English
Number of pages: 43
Keywords (Chinese): object detection, semantic segmentation, deep learning, convolutional neural network, generative adversarial network
Keywords (English): Object detection, Semantic segmentation, Deep learning, Convolutional neural network, Generative adversarial networks
  • Going beyond standard vision problems such as object detection and semantic segmentation, this thesis explores the possibility of indirectly achieving scene awareness from contextual information by analyzing the probability that a nonexistent pedestrian would appear at a given location in a street scene. Our method is built on an adversarial procedure of generation and discrimination, through which it acquires the perceptual capability of figuring out missing visual information. To produce training data for detecting nonexistent pedestrians, we adopt state-of-the-art image inpainting techniques; even when a designated region of an image contains only background, the learned detector can still predict the probability of observing a pedestrian there. We then insert additional pedestrians into the corresponding images according to these presence probabilities and evaluate the results with a user study that measures whether the synthesized images can be distinguished from real ones. The empirical results show that our method captures the notion of where it is reasonable for pedestrians to walk or stand in a street scene.


    We explore beyond object detection and semantic segmentation, and propose to address
    the problem of estimating the presence probabilities of nonexistent pedestrians in a street
    scene. Our method builds upon a combination of generative and discriminative procedures
    to achieve the perceptual capability of figuring out missing visual information. We adopt
    state-of-the-art inpainting techniques to generate the training data for nonexistent pedestrian
    detection. The learned detector can predict the probability of observing a pedestrian
    at some location in the current image, even if that location exhibits only the background.
    We evaluate our method by inserting pedestrians into the image according to the presence
    probabilities and conducting a user study to distinguish real from synthetic images. The empirical
    results show that our method can capture the idea of where the reasonable places are
    for pedestrians to walk or stand in a street scene.
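
    To make the data-generation step concrete, the following Python sketch shows one plausible way to build training pairs for nonexistent pedestrian detection: annotated pedestrians are removed from a street image by inpainting, their original locations become positive examples, and random background windows become negatives. This is a minimal sketch under our own assumptions, not the thesis implementation; the inpaint function is only a crude stand-in for the state-of-the-art inpainting techniques mentioned in the abstract, and all names (make_training_pairs, num_negatives) and the fixed window size are hypothetical.

import random
import numpy as np

def inpaint(image, mask):
    # Placeholder for an off-the-shelf image-inpainting model.
    # Here the masked region is simply filled with the mean background color
    # so that the sketch runs end to end.
    filled = image.copy()
    filled[mask] = image[~mask].mean(axis=0).astype(image.dtype)
    return filled

def make_training_pairs(image, pedestrian_boxes, num_negatives=5):
    # image: HxWx3 uint8 array; pedestrian_boxes: list of (x0, y0, x1, y1).
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    for x0, y0, x1, y1 in pedestrian_boxes:
        mask[y0:y1, x0:x1] = True
    background = inpaint(image, mask)   # pedestrian-free version of the scene

    pairs = []
    # Locations that did contain a pedestrian become positive examples.
    for box in pedestrian_boxes:
        pairs.append((background, box, 1.0))
    # Random pedestrian-sized windows are treated as negatives in this sketch.
    bw, bh = 40, 80
    for _ in range(num_negatives):
        x0 = random.randint(0, w - bw - 1)
        y0 = random.randint(0, h - bh - 1)
        pairs.append((background, (x0, y0, x0 + bw, y0 + bh), 0.0))
    return pairs

if __name__ == "__main__":
    dummy = np.random.randint(0, 255, (256, 512, 3), dtype=np.uint8)
    samples = make_training_pairs(dummy, [(100, 120, 140, 200)])
    print(len(samples), "training pairs")

    A detector trained on such pairs sees only the inpainted background, so at test time it can assign a presence probability to any region of a new street image, which is the signal later used to place synthetic pedestrians for the user study.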

    1 Introduction  9
    2 Related Work  11
    3 Approach  13
        3.1 Training Data  14
        3.2 Learning  17
        3.3 Synthesis  20
    4 Experiments  25
        4.1 Nonexistent Pedestrian Detection  25
        4.2 Synthetic Images with Rendered Pedestrians  28
    5 Conclusion and Future Work  36

