| Graduate Student: | 廖品捷 Liao, Pin-Jie |
|---|---|
| Thesis Title: | 增強式多目標追蹤運用空間不確定性及注意行人整體特徵 Robust Multi-Object Tracking with Spatial Uncertainty and Transformer Re-ID |
| Advisor: | 賴尚宏 Lai, Shang-Hong |
| Committee Members: | 劉庭祿 Liu, Tyng-Luh; 江振國 Chiang, Chen-Kuo; 陳煥宗 Chen, Hwann-Tzong |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Computer Science |
| Year of Publication: | 2023 |
| Graduation Academic Year: | 111 |
| Language: | English |
| Number of Pages: | 43 |
| Chinese Keywords: | multi-object tracking (多目標追蹤) |
| English Keywords: | MOT |
Most methods address the multi-object tracking (MOT) problem with the tracking-by-detection paradigm, which tracks objects by associating detection boxes whose scores are higher than a given threshold. As a result, the confidence score becomes the only indicator of bounding-box quality when handling complicated cases such as occlusion. However, a high confidence score cannot guarantee that the bounding box has no overlap with nearby objects, especially in crowded scenarios. In this thesis, we propose a reliable occlusion indicator called spatial uncertainty (SU). Statistical analysis shows a high correlation between spatial uncertainty and occlusion ratio, so SU provides a good indication of the occlusion level in the detection results. We further propose the two-stage SSUTracker (Sparse tracker with Spatial Uncertainty), which measures spatial uncertainty in stage one and, in stage two, uses the SU of each object to learn robust tracklet representations and associations. In addition, we propose a Transformer-based ReID model that synergizes with occlusion management to handle occlusions. As a result, our approach achieves very competitive results on the popular MOT16, MOT17, and MOT20 benchmarks compared with state-of-the-art methods.
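The tracking-by-detection loop summarized in the abstract can be sketched in a few lines: detections are first filtered by a confidence threshold, then matched to existing tracks by IoU overlap. This is only an illustration of the generic baseline paradigm, not the proposed SSUTracker; the `associate` function, its thresholds, and the greedy matching strategy are illustrative assumptions (production trackers typically use Hungarian matching, and methods such as ByteTrack also recover low-score boxes in a second association pass).

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def associate(tracks, detections, score_thresh=0.6, iou_thresh=0.3):
    """Greedy IoU association. `tracks` and `detections` are lists of dicts
    {'box': (x1, y1, x2, y2), 'score': float}. Returns a list of
    (track_index, detection_index) matches."""
    # Step 1: keep only detections above the confidence threshold.
    dets = [(j, d) for j, d in enumerate(detections) if d['score'] >= score_thresh]
    # Step 2: rank all track/detection pairs by IoU, highest first.
    pairs = sorted(
        ((iou(t['box'], d['box']), i, j)
         for i, t in enumerate(tracks) for j, d in dets),
        reverse=True)
    # Step 3: greedily accept pairs, each track/detection used at most once.
    matches, used_t, used_d = [], set(), set()
    for overlap, i, j in pairs:
        if overlap < iou_thresh:
            break
        if i in used_t or j in used_d:
            continue
        matches.append((i, j))
        used_t.add(i)
        used_d.add(j)
    return matches
```

The abstract's point is visible in this sketch: once `score_thresh` filters the boxes, the confidence score is the only signal a box carries, which is precisely the gap that the spatial-uncertainty indicator is proposed to fill.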