
Graduate Student: Wu, Tong (吳瞳)
Thesis Title: 應用於動態影像空間時間超解析之高效記憶體區塊內採樣可變形捲積之軟硬體協同設計
Memory-Efficient Algorithm-Hardware Co-design of In-Tile-Sampling Deformable Convolution for Space-Time Video Super-Resolution
Advisor: Huang, Chao-Tsung (黃朝宗)
Committee Members: Liu, Ren-Shuo (呂仁碩); Lai, Bo-Cheng (賴伯承)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2025
Graduation Academic Year: 113 (ROC calendar, 2024–2025)
Language: English
Pages: 59
Keywords: Deformable Convolution, Convolutional Neural Network, Space-Time Video Super-Resolution, Digital Circuits, Algorithm-Hardware Co-Design

    Deformable convolution with modulation (DCM) has recently shown exceptional capability in enhancing video quality for CNN-based video restoration tasks such as space-time video super-resolution (STVSR). It captures geometric information and object motion by dynamically adjusting spatial sampling locations (SSLs) with learned offsets and weighting the relative influence of each sample with learned masks. Our experiments show that incorporating DCM improves PSNR by up to 0.52 dB for space-time-interpolated frames and by up to 0.18 dB for entire video sequences. However, supporting diverse SSLs in DCM leads to significant memory overhead in hardware implementations.
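
    For illustration, the per-pixel DCM computation can be sketched as follows: each of the nine 3×3 sampling points is displaced by a learned offset, the input is read at that fractional location with bilinear interpolation, and the sample is scaled by its mask before the kernel weight is applied. Below is a minimal single-channel NumPy sketch of this general DCNv2-style formulation; the function and variable names are ours for illustration, not the thesis's.

        import numpy as np

        def bilinear_sample(x, py, px):
            # Bilinearly interpolate single-channel map x at fractional
            # location (py, px); out-of-bounds pixels are treated as zero.
            h, w = x.shape
            y0, x0 = int(np.floor(py)), int(np.floor(px))
            ay, ax = py - y0, px - x0
            val = 0.0
            for dy, wy in ((0, 1 - ay), (1, ay)):
                for dx, wx in ((0, 1 - ax), (1, ax)):
                    yy, xx = y0 + dy, x0 + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        val += wy * wx * x[yy, xx]
            return val

        def dcm_output_pixel(x, weight, offsets, masks, y, xc):
            # One output value of a 3x3 DCM at location (y, xc).
            #   x:       (H, W) input feature map (one channel for brevity)
            #   weight:  (3, 3) kernel
            #   offsets: (9, 2) learned (dy, dx) per sampling point
            #   masks:   (9,)   learned modulation scalar per sampling point
            grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out = 0.0
            for k, (ky, kx) in enumerate(grid):
                sy = y + ky + offsets[k, 0]   # deform the plain 3x3 SSL
                sx = xc + kx + offsets[k, 1]
                out += weight[ky + 1, kx + 1] * masks[k] \
                       * bilinear_sample(x, sy, sx)
            return out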

    To address this, a previously proposed method known as Offset Confinement (OC) restricts offset values to a fixed range of [-1, +1], thereby limiting the SSLs. However, compared with plain convolution, OC-based DCM (OC-DCM) still enlarges the receptive field: since each of the nine 3×3 sampling points can shift by up to one pixel, the effective footprint grows from 3×3 to 5×5, which approximately doubles the memory required for storing reused feature maps.
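
    Concretely, OC amounts to clamping the learned offsets before sampling. A minimal sketch of the confinement together with the row-buffer arithmetic behind the doubling (the reuse-row counts are our illustrative assumption, not the thesis's exact accounting):

        import numpy as np

        def confine_offsets(offsets, bound=1.0):
            # Offset Confinement: clamp each learned (dy, dx) offset
            # to the fixed range [-bound, +bound].
            return np.clip(offsets, -bound, bound)

        # Vertical reach of one output pixel, in rows beyond its own row:
        reach_plain = 1             # plain 3x3 kernel reads rows y-1 .. y+1
        reach_oc = reach_plain + 1  # confined offsets add one pixel: y-2 .. y+2
        # The effective footprint grows from 3x3 to 5x5, so roughly twice as
        # many rows of features must be buffered for reuse between
        # vertically adjacent computations.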

    To further reduce the memory overhead introduced by OC-DCM, we analyzed the distribution of SSLs in the STVSR model and found two key patterns: (1) most SSLs lie within the receptive field of a 3×3 plain convolution kernel, and (2) SSLs exhibit a greater horizontal than vertical spread. Inspired by CNN accelerators that process data in tiles, and based on these observations, we propose in-tile-sampling DCM (ITS-DCM) for hardware implementation. This approach confines SSLs to a horizontally elongated 3×4 input tile, which covers most SSLs (≥ 98.75%) while eliminating the memory overhead. Furthermore, compared to standard DCM, ITS-DCM incurs only a negligible performance drop (< 0.01 dB).
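
    The in-tile restriction can be sketched as a clamp on the deformed sampling locations themselves rather than on the offsets. How the thesis maps each output pixel to its tile and handles the rare out-of-tile SSLs may differ, so the tile placement and clamping rule below are assumptions:

        import numpy as np

        def in_tile_ssls(y, xc, offsets, tile_top, tile_left,
                         tile_h=3, tile_w=4):
            # Clamp the nine deformed SSLs of output pixel (y, xc) into a
            # fixed, horizontally elongated tile_h x tile_w input tile
            # (assumed to cover the pixel's plain 3x3 receptive field).
            grid = np.array([(dy, dx) for dy in (-1, 0, 1)
                             for dx in (-1, 0, 1)], dtype=float)
            ssls = np.array([y, xc], dtype=float) + grid + offsets
            ssls[:, 0] = np.clip(ssls[:, 0], tile_top, tile_top + tile_h - 1)
            ssls[:, 1] = np.clip(ssls[:, 1], tile_left, tile_left + tile_w - 1)
            return ssls   # every SSL stays in-tile, so no extra reuse rows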

    For hardware performance evaluation, we integrate a DCM engine into an accelerator targeting the STVSR application at 8K resolution and 30 fps. Compared to OC-DCM, ITS-DCM uses only 49% of the memory required for storing reused feature maps in the layer-fusion workflow. Furthermore, synthesis results show that ITS-DCM achieves a 5.7% reduction in area for the DCM inference logic compared to OC-DCM. In summary, the proposed method offers greater memory efficiency while achieving high STVSR quality.
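
    As a rough sanity check on the target throughput and the memory claim (the reuse-row counts are again our illustrative assumption, not the thesis's accounting):

        W, H, FPS = 7680, 4320, 30
        print(f"8K@30fps pixel rate: {W * H * FPS / 1e9:.2f} Gpixel/s")  # ~1.00

        rows_oc, rows_its = 4, 2   # assumed reuse rows (5x5 vs 3x3 reach)
        print(f"ITS-DCM / OC-DCM reuse memory: {rows_its / rows_oc:.0%}")
        # ~50%, in line with the reported 49% requirement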

    1 Introduction
      1.1 Deformable Convolution with Modulation
      1.2 Related Work
        1.2.1 Algorithm-Hardware Co-design for Deformable Convolution
      1.3 Baseline Approach: Offset Confinement
        1.3.1 Offset Confinement Approach
        1.3.2 Deployment of OC-DCM on CNN Accelerators
        1.3.3 Problems of OC-DCM
      1.4 Thesis Organization
    2 Effectiveness of DCM in STVSR
      2.1 Overview of Space-time Video Super-resolution
      2.2 STVSR Model with DCM in Feature Temporal Interpolation
        2.2.1 Experiment Settings
        2.2.2 Quantitative Analysis of Model Performance
        2.2.3 Qualitative Analysis of Model Performance
      2.3 STVSR Model with DCM in Reconstructor
        2.3.1 Quantitative Analysis of Model Performance
        2.3.2 Qualitative Analysis of Model Performance
      2.4 Failure Case Study
    3 Proposed Method
      3.1 Analysis of Spatial Sampling Locations
      3.2 In-Tile-Sampling DCM
      3.3 Performance Comparison
      3.4 More Discussions
        3.4.1 The Effectiveness of ITS-DCM
        3.4.2 OC-DCM with Re-training
    4 Hardware Implementation
      4.1 Target System and Specification
      4.2 Hardware Architecture
        4.2.1 Inference Datapath
        4.2.2 Offset and Mask Generation
        4.2.3 DCM Engine
      4.3 Hardware Implementation Results
      4.4 Model Quantization
    5 Conclusion

