
Graduate Student: Lin, Nian-Hui (林念慧)
Thesis Title: Memory-Efficient Deformable Convolution Engine with Location-Confined for Video Super-Resolution
Advisor: Huang, Chao-Tsung (黃朝宗)
Committee Members: Liu, Ren-Shuo (呂仁碩); Chen, Kun-Chih (陳坤志)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Academic Year of Graduation: 110
Language: Chinese
Number of Pages: 44
Keywords (Chinese): Deformable Convolution Engine
Keywords (English): Deformable Convolution


Abstract

Convolutional neural networks (CNNs) have recently shown prominent superiority in computational imaging. When applied to video super-resolution (VSR), they can deliver higher resolution and better visual consistency, which may be indispensable to the next generation of edge devices such as TVs. One of the biggest challenges in VSR is aligning neighboring frames properly. Recent studies, such as EDVR and TDAN, apply deformable convolution (DC) to learn the alignment adaptively through offsets and achieve better quality in VSR. DC uses input-dependent offsets to obtain new sampling locations away from the regular local neighborhood. However, two main challenges remain when implementing DC on edge devices.
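
To make the sampling rule concrete, here is a minimal single-channel NumPy sketch of a 3x3 deformable convolution (illustrative only; the kernel size, padding, and data layout are assumptions, not the engine's actual configuration). Each kernel tap is displaced from its regular grid position by a learned offset, and the value at the resulting fractional location is fetched by bilinear interpolation:

    import numpy as np

    def bilinear_sample(x, py, px):
        # Bilinearly sample the 2-D map x at fractional location (py, px),
        # treating out-of-range neighbors as zero (zero padding).
        H, W = x.shape
        y0, x0 = int(np.floor(py)), int(np.floor(px))
        dy, dx = py - y0, px - x0
        val = 0.0
        for yy, wy in ((y0, 1 - dy), (y0 + 1, dy)):
            for xx, wx in ((x0, 1 - dx), (x0 + 1, dx)):
                if 0 <= yy < H and 0 <= xx < W:
                    val += wy * wx * x[yy, xx]
        return val

    def deform_conv2d(x, weight, offsets):
        # x: (H, W) input map; weight: (3, 3) kernel;
        # offsets: (H, W, 9, 2) learned per-output, per-tap (dy, dx) offsets.
        H, W = x.shape
        taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
        out = np.zeros((H, W))
        for y in range(H):
            for xc in range(W):
                acc = 0.0
                for k, (ky, kx) in enumerate(taps):
                    dy, dx = offsets[y, xc, k]
                    # regular grid location + learned offset = fractional location
                    acc += weight[ky + 1, kx + 1] * bilinear_sample(
                        x, y + ky + dy, xc + kx + dx)
                out[y, xc] = acc
        return out

Because the offsets are unconstrained, the sampled locations can land anywhere in the input map, which is exactly what forces the whole feature map into on-chip memory in a naive implementation.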

First, DC needs additional on-chip memory to handle the irregular accesses to the input feature maps caused by unconstrained sampling locations, and storing the whole input feature maps on-chip is unaffordable for edge devices. Second, when obtaining a new location, DC uses bilinear interpolation to handle locations that fall on non-integer points. The total computation of bilinear interpolation is proportional to the bit width of the offsets and the size of the input feature maps, which incurs a considerable computational overhead in hardware.
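
To see why the interpolation cost tracks the offset bit width, consider a fixed-point sketch (an assumption for illustration; the thesis's actual datapath may differ) in which each offset carries F fractional bits. The four bilinear weights are products of F-bit terms, so the multiplier widths, and hence the gate count, grow with F:

    def bilinear_fixed_point(x, y0, x0, fy, fx, F):
        # Interpolate x at (y0 + fy / 2**F, x0 + fx / 2**F), where fy and fx
        # are the integer fractional parts of the offsets (0 <= fy, fx < 2**F).
        # In-range integer base coordinates are assumed for brevity.
        one = 1 << F
        w00 = (one - fy) * (one - fx)  # top-left weight
        w01 = (one - fy) * fx          # top-right weight
        w10 = fy * (one - fx)          # bottom-left weight
        w11 = fy * fx                  # bottom-right weight
        acc = (w00 * x[y0][x0] + w01 * x[y0][x0 + 1]
               + w10 * x[y0 + 1][x0] + w11 * x[y0 + 1][x0 + 1])
        return acc >> (2 * F)          # weights sum to 2**(2F); renormalize

Halving F halves the width of every weight term, which is the lever the offset bit-width optimization pulls.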

Therefore, this thesis proposes two methods to address these problems. For the former, we propose location-confined deformable convolution (LCDC), which confines each reference location to within an absolute range of 1 of its regular grid position, reducing the on-chip storage required for fetching new reference locations while maintaining similar quality. For the latter, we further optimize the bit width of the offsets to reduce the computational overhead of bilinear interpolation. Furthermore, we verify the proposed methods against a baseline eCNN engine specified for 4x VSR at FHD resolution and 30 fps in TSMC 40 nm process technology. Applying the LCDC engine reduces the on-chip SRAM from 16.59 MB to 0.07 MB, which is 0.4% of the original DC, with only a 0.01 dB PSNR drop. Moreover, the bilinear interpolation in DC needs an additional 7.18M gates to support 8-bit offsets; by optimizing the offset bit width down to 4 bits, the cost falls to 3.8M gates with only a 0.07 dB PSNR drop. In conclusion, compared with the baseline eCNN engine, the area overhead of our LCDC engine is reduced from 39.9% to 23.7% when the offsets are reduced from 8 bits to 4 bits, while maintaining similar quality.
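
A minimal sketch of the two measures together, under an assumed fixed-point layout of 1 sign bit plus (total_bits - 1) fractional bits (a hypothetical stand-in for the thesis's exact number format): offsets are clamped to [-1, +1] and then rounded to a low-bit code. With a 3x3 kernel and offsets confined to an absolute range of 1, every bilinear sample for output row y falls within input rows y - 2 to y + 3, so only a narrow band of rows must be buffered on-chip rather than the whole feature map:

    import numpy as np

    def confine_and_quantize(offsets, total_bits=4, confine=1.0):
        # Clamp offsets to [-confine, +confine], then round to a signed
        # fixed-point code with (total_bits - 1) fractional bits.
        frac_bits = total_bits - 1
        step = 2.0 ** -frac_bits
        clamped = np.clip(offsets, -confine, confine)
        codes = np.clip(np.round(clamped / step),
                        -(1 << frac_bits), (1 << frac_bits) - 1)
        return codes * step  # dequantized offsets used for sampling

    raw = np.array([-3.7, -0.52, 0.26, 2.1])   # raw learned offsets
    print(confine_and_quantize(raw))           # [-1.    -0.5    0.25   0.875]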

Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Related Work
    1.2.1 Alignment of Neighboring Frames in Video Super-Resolution
    1.2.2 Hardware Accelerators for Convolutional Neural Networks
    1.2.3 Implementation of Deformable Convolution on Hardware
  1.3 Thesis Organization
2 Proposed Modeling of Location-Confined Deformable Convolution
  2.1 Analysis of Memory Overhead on Deformable Convolutional Layer
  2.2 Memory-Efficient Location-Confined Deformable Convolution
  2.3 Evaluation on Video Super-Resolution Model
    2.3.1 Training Setting and Model
    2.3.2 Analysis of Quality
    2.3.3 Quantization and Fine-Tuning
  2.4 Quick Summary
3 Implementation of LCDC Engine
  3.1 Target System and Specification
  3.2 Inference Flow of LCDC Engine
  3.3 Analysis of Computational Overhead on Deformable Convolutional Layer
  3.4 Hardware Optimization on Different Bit Widths of Offsets
  3.5 Implementation Results
    3.5.1 Analysis of Performance
    3.5.2 Quantitative Results
    3.5.3 Synthesis Results
4 Conclusion and Possible Extension
  4.1 Conclusion
  4.2 Possible Extension

References

[1] C.-T. Huang, Y.-C. Ding, H.-C. Wang, C.-W. Weng, K.-P. Lin, L.-W. Wang, and L.-D. Chen, "eCNN: A block-based and highly-parallel CNN accelerator for edge inference," in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019.
[2] J. Caballero, C. Ledig, A. Aitken, A. Acosta, J. Totz, Z. Wang, and W. Shi, "Real-time video super-resolution with spatio-temporal networks and motion compensation," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2848–2857.
[3] X. Wang, K. C. K. Chan, K. Yu, C. Dong, and C. C. Loy, "EDVR: Video restoration with enhanced deformable convolutional networks," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019, pp. 1954–1963.
[4] Y. Tian, Y. Zhang, Y. Fu, and C. Xu, "TDAN: Temporally-deformable alignment network for video super-resolution," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[5] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, "Deformable convolutional networks," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.
[6] S. Ahn, J.-W. Chang, and S.-J. Kang, "An efficient accelerator design methodology for deformable convolutional networks," in 2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 3075–3079.
[7] Q. Huang, D. Wang, Z. Dong, Y. Gao, Y. Cai, T. Li, B. Wu, K. Keutzer, and J. Wawrzynek, "CoDeNet: Efficient deployment of input-adaptive object detection on embedded FPGAs," in 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2021, pp. 206–216.
[8] R. Liao, X. Tao, R. Li, Z. Ma, and J. Jia, "Video super-resolution via deep draft-ensemble learning," in 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 531–539.
[9] A. Kappeler, S. Yoo, Q. Dai, and A. K. Katsaggelos, "Video super-resolution with convolutional neural networks," IEEE Transactions on Computational Imaging, vol. 2, no. 2, pp. 109–122, 2016.
[10] T. Xue, B. Chen, J. Wu, D. Wei, and W. T. Freeman, "Video enhancement with task-oriented flow," International Journal of Computer Vision, vol. 127, no. 8, pp. 1106–1125, 2019.
[11] L. Wang, Y. Guo, L. Liu, Z. Lin, X. Deng, and W. An, "Deep video super-resolution using HR optical flow estimation," IEEE Transactions on Image Processing, vol. 29, pp. 4323–4336, 2020.
[12] H. Wang, D. Su, C. Liu, L. Jin, X. Sun, and X. Peng, "Deformable non-local network for video super-resolution," IEEE Access, vol. 7, pp. 177734–177744, 2019.
[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[14] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, "OverFeat: Integrated recognition, localization and detection using convolutional networks," in 2nd International Conference on Learning Representations (ICLR), 2014.
[15] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand, "Deep joint demosaicking and denoising," ACM Transactions on Graphics, vol. 35, no. 6, 2016.
[16] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
[17] K. Zhang, W. Zuo, and L. Zhang, "FFDNet: Toward a fast and flexible solution for CNN-based image denoising," IEEE Transactions on Image Processing, vol. 27, pp. 4608–4622, 2018.
[18] C. Dong, C. C. Loy, K. He, and X. Tang, "Learning a deep convolutional network for image super-resolution," in European Conference on Computer Vision (ECCV), 2014, pp. 184–199.
[19] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1646–1654.
[20] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, "Photo-realistic single image super-resolution using a generative adversarial network," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 105–114.
[21] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced deep residual networks for single image super-resolution," in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 1132–1140.
[22] A. Chakrabarti, "A neural approach to blind motion deblurring," in European Conference on Computer Vision (ECCV), 2016, pp. 221–235.
[23] S. Zhang, Z. Du, L. Zhang, H. Lan, S. Liu, L. Li, Q. Guo, T. Chen, and Y. Chen, "Cambricon-X: An accelerator for sparse neural networks," in 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016, pp. 1–12.
[24] Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, X. Feng, Y. Chen, and O. Temam, "ShiDianNao: Shifting vision processing closer to the sensor," in 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), 2015, pp. 92–104.
[25] Y.-H. Chen, J. S. Emer, and V. Sze, "Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks," in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), 2016, pp. 367–379.
[26] K. Hegde, R. Agrawal, Y. Yao, and C. W. Fletcher, "Morph: Flexible acceleration for 3D CNN-based video understanding," in 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018, pp. 933–946.
[27] Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, and O. Temam, "DaDianNao: A machine-learning supercomputer," in 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2014, pp. 609–622.
[28] S. Liu, Z. Du, J. Tao, D. Han, T. Luo, Y. Xie, Y. Chen, and T. Chen, "Cambricon: An instruction set architecture for neural networks," in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), 2016, pp. 393–405.
[29] N. P. Jouppi, C. Young, N. Patil, D. A. Patterson, G. Agrawal, R. S. Bajwa, S. Bates, S. Bhatia, N. J. Boden, A. Borchers, R. Boyle, P.-L. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T. V. Ghaemmaghami, R. Gottipati, W. Gulland, R. B. Hagmann, C. R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. A. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, and D. H. Yoon, "In-datacenter performance analysis of a tensor processing unit," in 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), 2017, pp. 1–12.
[30] D. Shin, J. Lee, J. Lee, and H.-J. Yoo, "DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks," in 2017 IEEE International Solid-State Circuits Conference (ISSCC), 2017, pp. 240–241.
[31] G. Bertasius, L. Torresani, and J. Shi, "Object detection in video with spatiotemporal sampling networks," in Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[32] X. Sun, B. Xiao, F. Wei, S. Liang, and Y. Wei, "Integral human pose regression," in Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[33] C. Liu and D. Sun, "On Bayesian adaptive video super resolution," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 2, pp. 346–360, 2014.
[34] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[35] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," in NIPS 2017 Workshop on Autodiff, 2017.
[36] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint, 2017.
[37] J. Qiu, J. Wang, S. Yao, K. Guo, B. Li, E. Zhou, J. Yu, T. Tang, N. Xu, S. Song, Y. Wang, and H. Yang, "Going deeper with embedded FPGA platform for convolutional neural network," in Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2016, pp. 26–35.
[38] S. Anwar, K. Hwang, and W. Sung, "Fixed point optimization of deep convolutional neural networks for object recognition," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 1131–1135.
[39] P. Gysel, M. Motamedi, and S. Ghiasi, "Hardware-oriented approximation of convolutional neural networks," arXiv preprint, 2016.
