
Author: Huang, Lien-Chieh (黃廉傑)
Title: Efficient Pedestrian Detection based on Visible Guided Occlusion Handling (基於可見區域引導的遮擋處理之高效行人檢測)
Advisor: Chiu, Ching-Te (邱瀞德)
Committee members: Hsieh, Jun-Wei (謝君偉); Lai, Shang-Hong (賴尚宏)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of publication: 2024
Graduation academic year: 113 (ROC calendar)
Language: English
Pages: 49
Keywords: Pedestrian detection, Object detection, Occluded pedestrian detection, End-to-end detection, Feature fusion, Deep learning
Hits: 87; Downloads: 0


Pedestrian detection is a vital task in computer vision, with wide-ranging applications in autonomous driving, robotics, and surveillance. It focuses on identifying and locating pedestrians in images or videos. As a crucial element of autonomous driving systems, pedestrian detection is essential for collision avoidance, enabling the timely and accurate recognition of pedestrians around the vehicle. However, current pedestrian detectors face three major challenges. First, scale variance among pedestrians affects detection accuracy, because small-scale pedestrians occupy only a few pixels, resulting in lower resolution and less distinctive features. Second, detection becomes more challenging when pedestrians are occluded by other objects or by one another. Lastly, most state-of-the-art (SOTA) methods focus on improving accuracy but often ignore the accompanying increase in computational complexity. To tackle these challenges, we propose the Visible Guided method for Pedestrian Detection (VGPD), which leverages the additional information in visible-region annotations at no extra inference cost. A re-parameterization module, RepC4, is proposed and used as the basic computational unit of the Efficient Layer Aggregation Network (ELAN) to improve feature representation without any additional inference cost. To handle the scale variance problem, we combine Adaptive Spatial Feature Fusion (ASFF) with a Path Aggregation Network (PAN) to further enhance both spatial and semantic information. Additionally, a visible-box-guided auxiliary head is utilized to provide knowledge of visible regions for occlusion handling, and a visible guided label assignment is proposed to select precise positive samples using both visible-region and full-body annotations. Our proposed VGPD outperforms other SOTA methods, achieving an MR−2 of 7.7% versus 8.3%, with the lowest inference time: only 0.1 seconds on CityPersons.
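The RepC4 module mentioned above relies on structural re-parameterization: a block trains with several parallel branches but is folded into a single convolution for deployment, which is why it adds no inference cost. The thesis's exact RepC4 design is not reproduced here; the sketch below only illustrates the generic RepVGG-style idea of merging a parallel 1x1 convolution and an identity branch into one equivalent 3x3 kernel. The function names are illustrative, and biases and BatchNorm are omitted for clarity.

```python
import numpy as np

def conv2d(x, w):
    """Naive 2D convolution: x is (C_in, H, W), w is (C_out, C_in, kH, kW),
    stride 1, 'same' padding so the output keeps the input's H and W."""
    c_out, c_in, kh, kw = w.shape
    pad = kh // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                out[o, i, j] = np.sum(w[o] * xp[:, i:i + kh, j:j + kw])
    return out

def merge_branches(k3, k1):
    """Fold a parallel 1x1 conv and an identity branch into a single
    equivalent 3x3 kernel (biases and BatchNorm omitted)."""
    merged = k3.copy()
    merged[:, :, 1, 1] += k1[:, :, 0, 0]      # 1x1 kernel sits at the centre
    c = k3.shape[0]
    merged[np.arange(c), np.arange(c), 1, 1] += 1.0  # identity branch
    return merged

# The three-branch block and the single merged conv give identical outputs.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 6))
k3 = rng.standard_normal((4, 4, 3, 3))
k1 = rng.standard_normal((4, 4, 1, 1))
y_train = conv2d(x, k3) + conv2d(x, k1) + x    # training-time branches
y_deploy = conv2d(x, merge_branches(k3, k1))   # deployed single conv
assert np.allclose(y_train, y_deploy)
```

Because the merge is performed once, offline, the deployed network pays only for a single 3x3 convolution, which is how re-parameterized modules strengthen feature representation without extra inference cost.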

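The ASFF-style fusion mentioned in the abstract weighs feature maps from different pyramid levels with per-pixel weights that sum to one across levels. In the actual ASFF the weight logits are predicted by small convolutions on each level; the sketch below uses random logits as stand-ins and assumes the levels have already been resized to a common resolution, so everything beyond the softmax-weighted sum is an illustrative assumption.

```python
import numpy as np

def softmax(a, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def asff_fuse(features, logits):
    """features: list of N maps, each (C, H, W), already resized to one scale.
    logits: N per-pixel score maps of shape (H, W) (predicted by 1x1 convs
    in the real ASFF). Returns the softmax-weighted sum of the levels."""
    w = softmax(np.stack(logits), axis=0)   # weights sum to 1 at every pixel
    return sum(w[n] * f for n, f in enumerate(features))

# Fuse three pyramid levels of a small 2-channel feature map.
rng = np.random.default_rng(1)
feats = [rng.standard_normal((2, 4, 4)) for _ in range(3)]
logits = rng.standard_normal((3, 4, 4))
fused = asff_fuse(feats, logits)
assert fused.shape == (2, 4, 4)
```

The per-pixel normalization lets the network emphasize the fine-resolution level for small pedestrians and the coarse, semantically richer level for large ones, which is the motivation for pairing ASFF with PAN against scale variance.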
Abstract
1 Introduction 1
  1.1 Background 1
  1.2 Goal 2
  1.3 Contribution 4
2 Related Work 5
  2.1 Scale Variance Pedestrian Detection 5
  2.2 Occlusion Scenes Pedestrian Detection 6
  2.3 Efficient Pedestrian Detection 6
3 VGPD 9
  3.1 Overall Pipeline 9
  3.2 Network Architecture 10
    3.2.1 Backbone 10
    3.2.2 Feature Fusion 10
    3.2.3 Re-parameterization: RepC4 11
    3.2.4 Efficient Layer Aggregation Network with Re-parameterization 15
    3.2.5 Adaptive Spatial Feature Fusion and Path Aggregation Network 16
  3.3 Visible Guided Pedestrian Detection 19
    3.3.1 Visible Box Guided Auxiliary Head 19
    3.3.2 Visible Guided Label Assignment 21
    3.3.3 NMS-free Method 23
    3.3.4 Detection Head 25
  3.4 Loss Function 26
4 Experiments 29
  4.1 Dataset 29
  4.2 Evaluation Setting 30
  4.3 Comparison with State-of-the-arts 31
    4.3.1 Comparison with SOTAs on CityPersons Dataset 32
    4.3.2 Comparison with SOTAs on Caltech Pedestrian Dataset 32
    4.3.3 Qualitative Comparison 34
  4.4 Ablation Study 36
5 Conclusion 43
References 45

