| Author: | 劉家漢 Liu, Jia-Han |
|---|---|
| Thesis Title: | Algorithm-Hardware Co-design of Tile-Based Offset-Confined Deformable Convolution for Video Super-Resolution (應用於動態影像超解析之具偏移限制的區塊式可變形卷積之軟硬體協同設計) |
| Advisor: | 黃朝宗 Huang, Chao-Tsung |
| Committee Members: | 呂仁碩 Liu, Ren-Shuo; 賴永康 Lai, Yeong-Kang |
| Degree: | Master |
| Department: | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Publication Year: | 2022 |
| Graduation Academic Year: | 111 (ROC calendar) |
| Language: | English |
| Pages: | 42 |
| Keywords (Chinese): | 可變形卷積、卷積神經網路、動態影像超解析、軟硬體協同設計 |
| Keywords (English): | Deformable Convolution, Convolutional Neural Network, Video Super-Resolution, Algorithm-Hardware Co-Design |
Deformable convolution (DC) has recently shown outstanding performance in aligning multiple frames and is increasingly applied in convolutional neural network (CNN)-based video super-resolution. For hardware implementation, one of the key design challenges is the irregular memory access caused by its dynamically generated offsets. Previous work, location-confined deformable convolution (LCDC), resolved this issue with an offset-confinement method, enabling LCDC to be deployed on CNN accelerators without suffering from memory bottlenecks. However, LCDC still pays the cost of the bilinear interpolation (BLI) used to generate deformed features. Specifically, BLI can occupy a large portion of the computation in LCDC and thus incur considerable circuit overhead.
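To make the BLI cost concrete, here is a minimal NumPy sketch of per-pixel deformable sampling. It illustrates the general technique only, not the thesis's implementation; the `offsets` layout (one (dy, dx) pair per kernel tap per pixel) is assumed for exposition, and offsets are assumed small enough to keep samples in-bounds, as LCDC's confinement guarantees.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinear interpolation (BLI) of a 2-D feature map at the
    fractional location (y, x): four reads plus a weighted sum
    per sampled point."""
    h, w = feat.shape
    y0 = min(max(int(np.floor(y)), 0), h - 2)
    x0 = min(max(int(np.floor(x)), 0), w - 2)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feat[y0, x0] +
            (1 - dy) * dx * feat[y0, x0 + 1] +
            dy * (1 - dx) * feat[y0 + 1, x0] +
            dy * dx * feat[y0 + 1, x0 + 1])

def per_pixel_deformable_sample(feat, offsets, k=3):
    """Plain DC sampling: each output pixel owns its k*k fractional
    offsets, so every kernel tap of every pixel needs its own BLI.
    `offsets` has shape (h, w, k*k, 2)."""
    h, w = feat.shape
    r = k // 2
    deformed = np.zeros((h - 2 * r, w - 2 * r, k * k))
    for i in range(r, h - r):
        for j in range(r, w - r):
            for t in range(k * k):
                ky, kx = t // k - r, t % k - r
                oy, ox = offsets[i, j, t]  # dynamically generated per pixel
                deformed[i - r, j - r, t] = bilinear_sample(
                    feat, i + ky + oy, j + kx + ox)
    return deformed  # roughly h * w * k * k BLI operations in total
```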
In this thesis, we propose an algorithm-hardware co-designed DC, tile-based offset-confined deformable convolution (TODC), to address this issue. Inspired by the fact that CNN accelerators operate with tile-level computational granularity, TODC employs a tile-based sampling method that reduces the number of BLI operations and, at the same time, the number of offsets involved in the computation. Although TODC is more lightweight than LCDC, it introduces some image quality degradation. To compensate for this loss, we introduce another technique, deformable groups (DG), which trades the offset number for image quality without adding BLI computation. Finally, as a case study, we implement LCDC, TODC, and DG on an embedded CNN accelerator, eCNN, targeting Full HD 60 fps, and synthesize the designs with TSMC 40 nm technology.
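The abstract does not spell out the tile-based sampling dataflow, but one plausible reading is sketched below, reusing `bilinear_sample` from the previous snippet: an offset shared by a whole output tile lets the accelerator interpolate a single input window per tile and then apply a regular convolution inside it. The tile shape, window construction, and sharing granularity here are assumptions for illustration, not the thesis's stated scheme.

```python
def tile_based_sample(feat, tile_offset, i0, j0, k=3, th=2, tw=4):
    """Hedged sketch in the spirit of TODC (the thesis's exact dataflow
    may differ): one fractional offset (oy, ox) is shared by an entire
    th x tw output tile whose top-left pixel is (i0, j0), so a single
    (th+k-1) x (tw+k-1) input window is interpolated once and then
    consumed by an ordinary k x k convolution over the tile."""
    oy, ox = tile_offset
    r = k // 2
    win = np.empty((th + k - 1, tw + k - 1))
    for di in range(win.shape[0]):
        for dj in range(win.shape[1]):
            win[di, dj] = bilinear_sample(
                feat, i0 - r + di + oy, j0 - r + dj + ox)
    # (th+k-1)*(tw+k-1) BLI ops per tile, versus th*tw*k*k when every
    # pixel samples its taps independently.
    return win
```

Under this reading, a 4×2 tile with a 3×3 kernel needs (4+2)×(2+2) = 24 interpolations instead of (4×2)×9 = 72, a 67% reduction, which is consistent in magnitude with the 56-75% BLI savings reported below; the exact accounting in the thesis may differ.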
Experimental results show that models equipped with TODC and DG achieve a 0.29–0.33 dB PSNR improvement over models without frame alignment. Compared to their LCDC counterparts with the same number of offsets, they suffer only 0.05–0.09 dB of quality degradation while reducing the BLI computation overhead by 56–75%. Synthesis results show that our TODC design with a 4×2 tile size reduces the area of the BLI circuits by 67% and their power by 68%, and avoids an extra 1.7M gates of control logic.