
Graduate Student: Luan, Jun (欒俊)
Thesis Title: Hybrid Deep Architecture for Pedestrian Detection (基於混合深度架構的行人偵測)
Advisors: Lai, Shang-Hong (賴尚宏); Liu, Tyng-Luh (劉庭祿)
Committee Members: Sun, Min (孫民); Chen, Hwann-Tzong (陳煥宗)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2015
Academic Year of Graduation: 103 (ROC calendar; 2014-2015)
Language: English
Number of Pages: 41
Chinese Keywords: 行人偵測、卷積神經網路、混合深度架構
Keywords: Pedestrian Detection, CNN, Hybrid Deep Architecture


    In this thesis, we propose a hybrid convolutional neural network (CNN)-Classification Restricted Boltzmann Machine (ClassRBM) model for the task of pedestrian detection. Although deep-network approaches have achieved major breakthroughs in recognition and general object detection, their advantage in pedestrian detection is less clear, and they have not been competitive with the state-of-the-art methods based on feature pools and boosted decision trees. We integrate a pre-trained AlexNet, fine-tuned on pedestrian data, with a carefully trained ClassRBM, and achieve competitive performance on the INRIA and Caltech pedestrian datasets. The model jointly extracts local features and processes them through multiple layers to obtain high-level, global features; the top-layer ClassRBM performs inference on the CNN features and outputs classification results as a probability distribution. An additional bounding-box regression with sampling addresses the localization errors caused by low-quality region proposals. Our experiments demonstrate, from several aspects, the successful application of deep networks to pedestrian detection.
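    The top-layer ClassRBM described in the abstract maps CNN features to a probability distribution over labels. In a Classification RBM (Larochelle and Bengio, 2008), p(y|x) has an exact closed form, so no sampling is needed at test time. The following is a minimal NumPy sketch of that inference step; the shapes, function name, and random weights are illustrative placeholders, not the thesis's trained model:

    ```python
    import numpy as np

    def classrbm_predict_proba(x, W, U, c, d):
        """Exact p(y|x) for a Classification RBM:
        p(y|x) ∝ exp(d_y + sum_j softplus(c_j + U_{jy} + W_j · x))."""
        softplus = lambda z: np.logaddexp(0.0, z)  # numerically stable log(1 + e^z)
        n_classes = d.shape[0]
        # Unnormalized log-score (negative free energy) for each candidate label y.
        scores = np.array([
            d[y] + softplus(c + U[:, y] + W @ x).sum()
            for y in range(n_classes)
        ])
        scores -= scores.max()                     # stabilize before exponentiating
        p = np.exp(scores)
        return p / p.sum()

    rng = np.random.default_rng(0)
    n_hidden, n_visible, n_classes = 64, 4096, 2   # e.g. an fc7-sized CNN feature, 2 classes
    W = rng.normal(0, 0.01, (n_hidden, n_visible)) # hidden-visible weights
    U = rng.normal(0, 0.01, (n_hidden, n_classes)) # hidden-label weights
    c = np.zeros(n_hidden)                         # hidden biases
    d = np.zeros(n_classes)                        # label biases

    x = rng.random(n_visible)                      # stand-in for a CNN feature vector
    p = classrbm_predict_proba(x, W, U, c, d)
    print(p)                                       # a valid distribution over {background, pedestrian}
    ```

    Because the hidden units integrate out analytically, this inference is a single forward pass, which is what makes the ClassRBM practical as a classification head on top of CNN features.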

    1 Introduction
      1.1 Motivation
      1.2 Problem Description
      1.3 Main Contribution
      1.4 Thesis Organization
    2 Previous Works
      2.1 Traditional Methods
      2.2 Deep Network Methods
      2.3 Region Proposals
    3 Proposed Hybrid Deep Architecture
      3.1 Architecture Overview
      3.2 Deep CNN
      3.3 Classification RBM
      3.4 Bounding Box Regression
      3.5 Merge and Sampling
      3.6 Training
    4 Experiments
      4.1 Data Preparation
      4.2 Experiment Settings
      4.3 Different Region Proposals
      4.4 Optimal ClassRBM Parameters
      4.5 Different Layer Features
      4.6 Box Regression and Sampling
      4.7 Fine-tuning the Network
      4.8 Training Data Selection
      4.9 Transfer Learning Ability
    5 Conclusion
    References
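    The bounding-box regression mentioned in the abstract refines low-quality region proposals toward ground-truth boxes. The thesis's exact formulation is not reproduced here; the sketch below uses the standard R-CNN-style box parameterization as an assumed stand-in, with hypothetical function names and hand-picked example boxes:

    ```python
    import numpy as np

    def bbox_targets(proposal, gt):
        """R-CNN-style regression targets (tx, ty, tw, th) that map a
        proposal box to a ground-truth box; boxes are (cx, cy, w, h)."""
        px, py, pw, ph = proposal
        gx, gy, gw, gh = gt
        return np.array([(gx - px) / pw,      # center shift, scale-normalized
                         (gy - py) / ph,
                         np.log(gw / pw),     # log-space size change
                         np.log(gh / ph)])

    def apply_deltas(proposal, t):
        """Invert the transform: refine a proposal with predicted deltas."""
        px, py, pw, ph = proposal
        tx, ty, tw, th = t
        return np.array([px + tx * pw, py + ty * ph,
                         pw * np.exp(tw), ph * np.exp(th)])

    proposal = np.array([50.0, 80.0, 40.0, 100.0])  # a low-quality region proposal
    gt       = np.array([55.0, 75.0, 44.0, 110.0])  # ground-truth pedestrian box
    t = bbox_targets(proposal, gt)
    refined = apply_deltas(proposal, t)             # recovers the ground truth exactly
    print(np.allclose(refined, gt))                 # True
    ```

    In practice a regressor is trained to predict the deltas t from features, so the recovery is only approximate; the round trip above just checks that the parameterization is self-consistent.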


    Full-Text Availability: Not authorized for public release (campus network)
    Full-Text Availability: Not authorized for public release (off-campus network)
    Full-Text Availability: Not authorized for public release (National Central Library: Taiwan NDLTD system)