
Graduate student: 簡瑞霆 (Chien, Jui-Ting)
Thesis title: 隱「行」人偵測 (Detecting Nonexistent Pedestrians)
Advisor: 陳煥宗 (Chen, Hwann-Tzong)
Oral examination committee: 賴尚宏 (Lai, Shang-Hong); 劉庭祿 (Liu, Tyng-Luh)
Degree: Master
Department:
Year of publication: 2017
Academic year of graduation: 105
Language: English
Number of pages: 43
Keywords (Chinese): object detection, semantic segmentation, deep learning, convolutional neural network, generative adversarial network
Keywords (English): Object detection, Semantic segmentation, Deep learning, Convolutional neural network, Generative adversarial networks
  • Going beyond standard vision problems such as object detection and semantic segmentation, this thesis explores the possibility of indirectly achieving scene awareness from contextual information by analyzing the probability that a nonexistent pedestrian would appear at a given location in a street scene. Our method is built on an adversarial procedure of generation and discrimination, through which it acquires the perceptual capability of figuring out missing visual information. To produce training data for detecting nonexistent pedestrians, we adopt state-of-the-art image inpainting techniques; even when a designated region of an image contains only background, the learned detector can still predict the probability of observing a pedestrian there. We then insert additional pedestrians into the corresponding images according to these presence probabilities and evaluate the results with a user study that measures whether the synthesized images can be distinguished from real ones. The empirical results show that our method captures the notion of where it is reasonable for pedestrians to walk or stand in a street scene.


    We explore beyond object detection and semantic segmentation, and propose to address
    the problem of estimating the presence probabilities of nonexistent pedestrians in a street
    scene. Our method builds upon a combination of generative and discriminative procedures
    to achieve the perceptual capability of figuring out missing visual information. We adopt
    state-of-the-art inpainting techniques to generate the training data for nonexistent pedestrian
    detection. The learned detector can predict the probability of observing a pedestrian
    at some location in the current image, even if that location exhibits only the background.
    We evaluate our method by inserting pedestrians into the image according to the presence
    probabilities and conducting a user study to distinguish real from synthetic images. The empirical
    results show that our method can capture the idea of where the reasonable places are
    for pedestrians to walk or stand in a street scene.
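
    To make the data-generation step concrete, the following Python sketch shows one plausible way to build training pairs for nonexistent pedestrian detection: annotated pedestrians are removed from a street image by inpainting, their original locations become positive examples, and random background windows become negatives. This is a minimal sketch under our own assumptions, not the thesis implementation; the inpaint function is only a crude stand-in for the state-of-the-art inpainting techniques mentioned in the abstract, and all names (make_training_pairs, num_negatives) and the fixed window size are hypothetical.

import random
import numpy as np

def inpaint(image, mask):
    # Placeholder for an off-the-shelf image-inpainting model.
    # Here the masked region is simply filled with the mean background color
    # so that the sketch runs end to end.
    filled = image.copy()
    filled[mask] = image[~mask].mean(axis=0).astype(image.dtype)
    return filled

def make_training_pairs(image, pedestrian_boxes, num_negatives=5):
    # image: HxWx3 uint8 array; pedestrian_boxes: list of (x0, y0, x1, y1).
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    for x0, y0, x1, y1 in pedestrian_boxes:
        mask[y0:y1, x0:x1] = True
    background = inpaint(image, mask)   # pedestrian-free version of the scene

    pairs = []
    # Locations that did contain a pedestrian become positive examples.
    for box in pedestrian_boxes:
        pairs.append((background, box, 1.0))
    # Random pedestrian-sized windows are treated as negatives in this sketch.
    bw, bh = 40, 80
    for _ in range(num_negatives):
        x0 = random.randint(0, w - bw - 1)
        y0 = random.randint(0, h - bh - 1)
        pairs.append((background, (x0, y0, x0 + bw, y0 + bh), 0.0))
    return pairs

if __name__ == "__main__":
    dummy = np.random.randint(0, 255, (256, 512, 3), dtype=np.uint8)
    samples = make_training_pairs(dummy, [(100, 120, 140, 200)])
    print(len(samples), "training pairs")

    A detector trained on such pairs sees only the inpainted background, so at test time it can assign a presence probability to any region of a new street image, which is the signal later used to place synthetic pedestrians for the user study.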

    1 Introduction  9
    2 Related Work  11
    3 Approach  13
        3.1 Training Data  14
        3.2 Learning  17
        3.3 Synthesis  20
    4 Experiments  25
        4.1 Nonexistent Pedestrian Detection  25
        4.2 Synthetic Images with Rendered Pedestrians  28
    5 Conclusion and Future Work  36

