
Student: Chen, Wei-Chih (陳委志)
Thesis title: VLSI Analysis and Design for High-Performance Complex Steerable Pyramid (高效能複數可控金字塔之硬體分析與設計)
Advisor: Huang, Chao-Tsung (黃朝宗)
Oral examination committee: Lai, Yeong-Kang (賴永康); Liu, Yi-Wen (劉奕汶)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 42
Keywords: phase-based processing, complex steerable pyramid (CSP), VLSI, FPGA

    In recent years, a novel video processing technique called phase-based processing has been proposed. It measures motion using local phase instead of computationally intensive optical flow, and it has been applied to various tasks such as frame interpolation, view synthesis, and video magnification. Phase-based processing is built on the complex steerable pyramid (CSP), whose decomposition and reconstruction account for most of the computational complexity of phase-based algorithms. As image resolution increases, this high computational complexity makes real-time applications difficult. In this thesis, we address this difficulty by providing a detailed design analysis and efficient VLSI implementations of the CSP.
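
    To make "measuring motion by local phase" concrete, the Python sketch below estimates a sub-pixel shift of a 1-D signal from the phase difference between two complex band-pass filter responses. It assumes a single complex Gabor-like filter (a Gaussian envelope times exp(j*omega0*k)) as a stand-in for one CSP sub-band and a pure sinusoid as input; the function names, sigma = 4 and radius = 20 are illustrative choices, not the thesis's filter design. Shifting the signal by d changes the local phase by -omega0*d, so the shift is recovered as -Δφ/omega0.

```python
import cmath
import math


def complex_bandpass_response(x, n, omega0, sigma, radius):
    """Response at sample n of a complex Gabor-like band-pass filter.

    h[k] = exp(-k^2 / (2 sigma^2)) * exp(j * omega0 * k), for |k| <= radius.
    Hypothetical stand-in for one CSP sub-band, not the thesis's filters.
    """
    acc = 0j
    for k in range(-radius, radius + 1):
        h = math.exp(-k * k / (2.0 * sigma * sigma)) * cmath.exp(1j * omega0 * k)
        acc += h * x[n - k]          # convolution: y[n] = sum_k h[k] x[n - k]
    return acc


def estimate_shift(x_ref, x_moved, n, omega0, sigma=4.0, radius=20):
    """Estimate the local displacement at sample n from the phase difference."""
    r_ref = complex_bandpass_response(x_ref, n, omega0, sigma, radius)
    r_mov = complex_bandpass_response(x_moved, n, omega0, sigma, radius)
    dphi = cmath.phase(r_mov / r_ref)   # phase difference wrapped to (-pi, pi]
    # A shift by d delays the local phase by omega0 * d.
    return -dphi / omega0


if __name__ == "__main__":
    omega0 = math.pi / 4
    true_shift = 0.3                    # sub-pixel motion
    ref = [math.cos(omega0 * i + 0.7) for i in range(101)]
    moved = [math.cos(omega0 * (i - true_shift) + 0.7) for i in range(101)]
    print(round(estimate_shift(ref, moved, 50, omega0), 4))  # prints 0.3
```

    No optical-flow search is involved: one filter response per frame and a phase difference suffice, which is why the cost of phase-based methods concentrates in the pyramid decomposition and reconstruction.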

    Conventional hardware implementations of the CSP use the fast Fourier transform (FFT). However, the FFT requires a large amount of on-chip memory and many delay buffers, and the 2-D FFT additionally requires very high DRAM bandwidth. To address these problems, we adopt a computation- and memory-efficient finite impulse response (FIR) implementation. For the 1-D CSP, our design reduces the gate count by 68% and the on-chip memory by 62% compared with an FFT-based CSP. For the 2-D CSP, we target a full-HD (FHD) CSP processing engine running at 60 fps. The design challenge is that the 2-D CSP requires high memory bandwidth and a large on-chip memory. We therefore propose a stripe-based computation flow that reduces the DRAM bandwidth and saves 85% of the on-chip memory compared with a frame-based computation flow.
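
    The memory argument for FIR over FFT filtering can be illustrated in one dimension. In this sketch (a generic illustration, not the thesis's architecture), a direct-form FIR filter keeps only len(taps) past samples in a delay line, while frequency-domain filtering must buffer a whole zero-padded block before transforming it. A naive O(N^2) DFT stands in for the FFT: it yields the same result, only more slowly. The kernel, the signal, and the function names are hypothetical test choices.

```python
import cmath
from collections import deque


def fir_stream(samples, taps):
    """Direct-form FIR filter: y[n] = sum_k taps[k] * x[n - k].

    Only len(taps) past samples are held in the delay line at any time.
    """
    delay = deque([0.0] * len(taps), maxlen=len(taps))
    out = []
    for s in samples:
        delay.appendleft(s)                  # delay[k] now holds x[n - k]
        out.append(sum(t * d for t, d in zip(taps, delay)))
    return out


def dft(x, inverse=False):
    """Naive O(N^2) DFT; numerically the same result an FFT would give."""
    n = len(x)
    sign = 1 if inverse else -1
    res = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in res] if inverse else res


def fft_block_filter(samples, taps):
    """Frequency-domain filtering: the whole zero-padded block is buffered."""
    length = len(samples) + len(taps) - 1    # long enough to avoid circular wrap
    X = dft(list(samples) + [0.0] * (length - len(samples)))
    H = dft(list(taps) + [0.0] * (length - len(taps)))
    y = dft([a * b for a, b in zip(X, H)], inverse=True)
    return [v.real for v in y[:len(samples)]]


if __name__ == "__main__":
    x = [float((3 * i) % 7) - 3.0 for i in range(32)]
    h = [0.25, 0.5, 0.25, -0.1]
    err = max(abs(p - q) for p, q in zip(fir_stream(x, h), fft_block_filter(x, h)))
    print(f"max difference: {err:.2e}")
```

    With 32 input samples and 4 taps, the FIR path stores 4 samples at a time, whereas the frequency-domain path holds a 35-point block (32 + 4 - 1) for each transform; the gap widens as the block grows to a full image line or frame.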

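    A stripe-based flow can be sketched the same way for 2-D filtering. The example below assumes horizontal stripes of stripe_h rows, a square kernel of radius r, and zero padding at the image border. Each stripe is filtered from a buffer holding only stripe_h + 2r rows (the stripe plus a halo above and below), yet the output matches frame-based filtering exactly. The stripe orientation, halo handling, and function names are assumptions for illustration; this abstract does not describe the thesis's actual stripe geometry or buffering scheme.

```python
def filter2d_same(img, kernel):
    """2-D filtering (correlation form) with zero padding; output has img's size."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + r][dx + r] * img[yy][xx]
            out[y][x] = acc
    return out


def filter2d_stripes(img, kernel, stripe_h):
    """Filter the frame stripe by stripe from a small row buffer.

    Returns the filtered frame and the peak number of rows buffered at once.
    """
    h = len(img)
    r = len(kernel) // 2
    out, peak_rows = [], 0
    for s in range(0, h, stripe_h):
        e = min(s + stripe_h, h)
        top, bot = max(s - r, 0), min(e + r, h)   # stripe plus r-row halo
        buf = img[top:bot]                        # stands in for on-chip memory
        peak_rows = max(peak_rows, len(buf))
        res = filter2d_same(buf, kernel)
        out.extend(res[s - top:e - top])          # keep only the stripe's rows
    return out, peak_rows


if __name__ == "__main__":
    frame = [[float((x * 7 + y * 3) % 11) for x in range(12)] for y in range(10)]
    k = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    striped, rows = filter2d_stripes(frame, k, stripe_h=4)
    print(striped == filter2d_same(frame, k), rows)  # prints: True 6
```

    With stripe_h = 4 and r = 1, at most 6 rows are buffered instead of the full 10-row frame; for real frame sizes the saving grows with the ratio of frame height to stripe height plus halo.
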
    We implemented both the 1-D and 2-D CSP circuits in a TSMC 40 nm process. The 1-D CSP circuit uses 483K gates and 85 KB of on-chip memory. Synthesized at 250 MHz, it delivers 314 Mpixel/s, enough to support 4K Ultra-HD displays at 30 fps. The 2-D CSP circuit uses 3.5M gates and 32 KB of on-chip memory. Synthesized at 222 MHz, it delivers 126 Mpixel/s, enough to support FHD video at 60 fps. We also implemented the 2-D CSP on an FPGA; the FPGA demo system operates at 80 MHz and achieves real-time performance for 1024x1024 video at 30 fps.
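
    The throughput figures can be checked against the active pixel rates of the target formats. This sketch assumes 4K UHD means 3840x2160 and FHD means 1920x1080, and it ignores blanking intervals; it also derives the average number of pixels per clock cycle implied by each reported throughput and clock frequency.

```python
def required_rate(width, height, fps):
    """Active-pixel rate in pixels/s (blanking intervals ignored)."""
    return width * height * fps


def pixels_per_cycle(pixel_rate, clock_hz):
    """Average number of pixels that must be processed per clock cycle."""
    return pixel_rate / clock_hz


# (label, width, height, fps, reported Mpixel/s, clock MHz) taken from the abstract;
# the 3840x2160 and 1920x1080 frame sizes are assumed definitions of 4K UHD and FHD.
CASES = [
    ("1-D CSP, 4K UHD @ 30 fps", 3840, 2160, 30, 314, 250),
    ("2-D CSP, FHD @ 60 fps", 1920, 1080, 60, 126, 222),
]

if __name__ == "__main__":
    for label, w, h, fps, mpix, mhz in CASES:
        need = required_rate(w, h, fps) / 1e6
        ppc = pixels_per_cycle(mpix * 1e6, mhz * 1e6)
        print(f"{label}: need {need:.1f} Mpixel/s, reported {mpix} Mpixel/s, "
              f"{ppc:.3f} pixels/cycle")
    # FPGA demo: 1024x1024 @ 30 fps at 80 MHz (no throughput figure is reported).
    fpga = required_rate(1024, 1024, 30)
    print(f"FPGA: need {fpga / 1e6:.1f} Mpixel/s, "
          f"{pixels_per_cycle(fpga, 80e6):.3f} pixels/cycle")
```

    Both reported throughputs exceed the required active pixel rates (about 248.8 and 124.4 Mpixel/s), corresponding to roughly 1.26 pixels/cycle at 250 MHz and 0.57 pixels/cycle at 222 MHz; the FPGA demo needs about 31.5 Mpixel/s, or roughly 0.39 pixels/cycle at 80 MHz.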

    Abstract
    1 Introduction
        1.1 Motivation
        1.2 Thesis Organization
        1.3 Contribution
    2 Related work
        2.1 The Design and Use of Steerable Filters
        2.2 Steerable Pyramid
        2.3 Fast Fourier Transform
            2.3.1 Description of the FFT
            2.3.2 Pipelined Architecture of FFT
        2.4 Phase-based Processing
            2.4.1 Phase-Based Motion Magnification
            2.4.2 Phase-based View Synthesis
    3 VLSI Design of One-Dimensional Complex Steerable Pyramid
        3.1 Overview to Wavelet Decomposition and Reconstruction
        3.2 Analysis of Hardware Complexity
            3.2.1 FFT-based Architecture
            3.2.2 Convolution-based Architecture
            3.2.3 Hardware Complexity Comparison
        3.3 System Architecture of 1-D CSP
            3.3.1 System Overview
            3.3.2 FIR Processing Element
            3.3.3 Precision Analysis
        3.4 Case Study
    4 VLSI Design of Two-Dimensional Complex Steerable Pyramid
        4.1 Hardware Performance Analysis
            4.1.1 Arithmetic Complexity
            4.1.2 Bandwidth Analysis
        4.2 Memory-efficient System Architecture of 2-D CSP
            4.2.1 System Overview
            4.2.2 Stripe-based Data Flow
            4.2.3 Highly-parallel Convolution Engine
            4.2.4 Precision Analysis
    5 Implementation Result of 2-D CSP
        5.1 TSMC 40 nm Synthesis Result
        5.2 FPGA Implementation Result
    6 Conclusion and Future Work

