
Author: Cheng, Cheng-Che (鄭程哲)
Thesis title: ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking (以可重構時空圖模型處理多相機多物件追蹤)
Advisor: Lai, Shang-Hong (賴尚宏)
Committee members: Liu, Tyng-Luh (劉庭祿); Chen, Hwann-Tzong (陳煥宗); Chiang, Chen-Kuo (江振國)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2023
Graduation academic year: 111 (ROC calendar)
Language: Chinese
Number of pages: 40
Chinese keywords: 圖像化神經網路、多相機多物件追蹤、空間與時間感知特徵
English keywords: Graph Neural Network, Multi-Camera Multi-Object Tracking, Spatial-Temporal Feature Representation
    Multi-camera multi-object tracking exploits the rich visual information available from multiple viewpoints to better resolve tracking in occluded or crowded scenes. In recent years, graph-based methods have become increasingly popular in the object-tracking literature, yet they do not effectively exploit the consistency between spatial and temporal information from a multi-camera perspective; instead, they rely on the outputs of single-camera tracking models as input, which leads to many fragmented trajectories and ID-switch errors. In this thesis, we propose a novel reconfigurable spatial-temporal graph model: we first build a spatial graph over the objects detected in all camera views and perform spatial association, then convert it into a temporal graph with a graph-reconfiguration module, and finally perform temporal association to complete tracking. This two-stage association lets us extract spatial and temporal features separately and effectively mitigates trajectory fragmentation. Moreover, our model is designed for online tracking, making it better suited to real-world scenarios. Experimental results show that our model extracts more discriminative features and outperforms other methods on several datasets, achieving state-of-the-art performance.


    Multi-Camera Multi-Object Tracking (MC-MOT) utilizes information from multiple views to better handle occlusion and crowded scenes. Recently, graph-based approaches to tracking have become very popular. However, many current graph-based methods do not effectively exploit spatial and temporal consistency; instead, they rely on the output of single-camera trackers as input, which is prone to fragmentation and ID-switch errors. In this thesis, we propose a novel reconfigurable graph model that first associates all detected objects across cameras spatially and then reconfigures the resulting graph into a temporal graph for temporal association. This two-stage association enables us to extract robust spatial- and temporal-aware features and address the problem of fragmented tracklets. Furthermore, our model is designed for online tracking, making it suitable for real-world applications. Experimental results show that the proposed graph model extracts more discriminative features for object tracking and achieves state-of-the-art performance on several public datasets.
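
    To make the two-stage pipeline above concrete, the following is a minimal sketch, not the thesis implementation: it stands in for the learned graph model with plain cosine similarity and greedy matching, and every name in it (Detection, Track, spatial_associate, temporal_associate) is a hypothetical placeholder. It only illustrates the order of operations, fusing detections across cameras into spatial nodes before linking those fused nodes to tracks online; in the thesis model, the affinities are presumably produced by the graph neural network rather than these heuristics.

```python
# Hypothetical sketch of the two-stage (spatial -> temporal) association.
# Not the thesis code; cosine similarity + greedy matching stand in for the
# learned graph model.
from dataclasses import dataclass
from itertools import count
import numpy as np

@dataclass
class Detection:
    camera_id: int
    feature: np.ndarray        # appearance embedding, assumed to be given

@dataclass
class Track:
    track_id: int
    feature: np.ndarray        # running feature of one global identity

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def spatial_associate(detections, threshold=0.7):
    """Stage 1: greedily group detections of the same person across cameras."""
    groups = []                                  # each group ~ one spatial node cluster
    for det in detections:
        best, best_sim = None, threshold
        for group in groups:
            if any(d.camera_id == det.camera_id for d in group):
                continue                         # never merge two boxes from one camera
            sim = max(cosine(det.feature, d.feature) for d in group)
            if sim > best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append([det])
        else:
            best.append(det)
    return groups

def temporal_associate(groups, tracks, id_gen, threshold=0.6):
    """Stage 2: link the fused per-frame nodes to existing tracks (online)."""
    assignments = []
    for group in groups:
        fused = np.mean([d.feature for d in group], axis=0)   # fused spatial node
        best, best_sim = None, threshold
        for track in tracks:
            sim = cosine(fused, track.feature)
            if sim > best_sim:
                best, best_sim = track, sim
        if best is None:                         # unmatched node starts a new track
            best = Track(next(id_gen), fused)
            tracks.append(best)
        else:                                    # simple exponential feature update
            best.feature = 0.9 * best.feature + 0.1 * fused
        assignments.append((best.track_id, [d.camera_id for d in group]))
    return assignments

# Toy usage: two cameras observe the same two people in one frame.
rng = np.random.default_rng(0)
p1, p2 = rng.normal(size=16), rng.normal(size=16)
frame = [Detection(0, p1), Detection(1, p1 + 0.01), Detection(0, p2), Detection(1, p2 + 0.01)]
tracks, id_gen = [], count(1)
print(temporal_associate(spatial_associate(frame), tracks, id_gen))
# e.g. [(1, [0, 1]), (2, [0, 1])]: each global ID groups one person seen by both cameras
```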

    Abstract
    Table of Contents
    Chapter 1 ..................... 1
    Chapter 2 ..................... 6
    Chapter 3 ..................... 8
    Chapter 4 .................... 21
    Chapter 5 .................... 36
    References ................... 37

