| Graduate Student: | 廖品捷 Liao, Pin-Jie |
|---|---|
| Thesis Title: | 增強式多目標追蹤運用空間不確定性及注意行人整體特徵 Robust Multi-Object Tracking with Spatial Uncertainty and Transformer Re-ID |
| Advisor: | 賴尚宏 Lai, Shang-Hong |
| Committee Members: | 劉庭祿 Liu, Tyng-Luh; 江振國 Chiang, Chen-Kuo; 陳煥宗 Chen, Hwann-Tzong |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Computer Science |
| Year of Publication: | 2023 |
| Graduation Academic Year: | 111 |
| Language: | English |
| Number of Pages: | 43 |
| Chinese Keywords: | multi-object tracking (多目標追蹤) |
| English Keywords: | MOT |
Most methods address the multi-object tracking (MOT) problem with the tracking-by-detection paradigm, which tracks objects by associating detection boxes whose scores are higher than a given threshold. As a result, the confidence score becomes the only indicator of bounding-box quality when handling complicated cases such as occlusion. However, a high confidence score cannot guarantee that the bounding box has no overlap with nearby objects, especially in crowded scenarios. In this thesis, we propose a reliable occlusion indicator called spatial uncertainty (SU). Statistical analysis shows a high correlation between spatial uncertainty and occlusion ratio, so SU provides a good indication of the occlusion level in the detection results. We further propose the two-stage SSUTracker (Sparse tracker with Spatial Uncertainty), which measures spatial uncertainty in stage one and, in stage two, uses the SU of each object to learn robust tracklet representations and associations. In addition, we propose a Transformer-based ReID model that synergizes with occlusion management to handle occlusions. As a result, our approach achieves very competitive results on the popular MOT16, MOT17, and MOT20 benchmarks compared with state-of-the-art methods.
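The tracking-by-detection loop summarized in the abstract can be sketched in a few lines: detections are first filtered by a confidence threshold, then matched to existing tracks by IoU overlap. This is only an illustration of the generic baseline paradigm, not the proposed SSUTracker; the `associate` function, its thresholds, and the greedy matching strategy are illustrative assumptions (production trackers typically use Hungarian matching, and methods such as ByteTrack also recover low-score boxes in a second association pass).

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def associate(tracks, detections, score_thresh=0.6, iou_thresh=0.3):
    """Greedy IoU association. `tracks` and `detections` are lists of dicts
    {'box': (x1, y1, x2, y2), 'score': float}. Returns a list of
    (track_index, detection_index) matches."""
    # Step 1: keep only detections above the confidence threshold.
    dets = [(j, d) for j, d in enumerate(detections) if d['score'] >= score_thresh]
    # Step 2: rank all track/detection pairs by IoU, highest first.
    pairs = sorted(
        ((iou(t['box'], d['box']), i, j)
         for i, t in enumerate(tracks) for j, d in dets),
        reverse=True)
    # Step 3: greedily accept pairs, each track/detection used at most once.
    matches, used_t, used_d = [], set(), set()
    for overlap, i, j in pairs:
        if overlap < iou_thresh:
            break
        if i in used_t or j in used_d:
            continue
        matches.append((i, j))
        used_t.add(i)
        used_d.add(j)
    return matches
```

The abstract's point is visible in this sketch: once `score_thresh` filters the boxes, the confidence score is the only signal a box carries, which is precisely the gap that the spatial-uncertainty indicator is proposed to fill.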