Author: 鄭安佑
Zheng, An-You
Thesis Title: 應用於光場分解之非負矩陣分解引擎硬體架構
VLSI Architecture of Non-Negative Matrix Factorization Engine for Light Field Factorization
Advisor: 黃朝宗
Huang, Chao-Tsung
Committee Members: 賴永康
Lai, Yeong-Kang
邱瀞德
Chiu, Ching-Te
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2021
Academic Year of Graduation: 109 (ROC calendar)
Language: English
Number of Pages: 45
Keywords: Light Field Display, Light Field Factorization, Non-negative Matrix Factorization, VLSI Design, Fixed-point, Real-time Full HD


    3D displays have been widely studied in recent years, and we are optimistic about their potential in the future market. We therefore target one type of 3D display, the light field display, because of its high resolution and full parallax. We adopt a dual-layer LCD architecture together with a technique called light field factorization, which we implement with the iterative Non-negative Matrix Factorization (NMF) algorithm. However, this algorithm is computationally expensive, and the GPU acceleration reported in other works is not sufficient for our real-time Full HD application. In this thesis, we implement this time-consuming algorithm as an ASIC. Our VLSI architecture achieves the throughput required for real-time Full HD light field factorization.
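    To make the iterative algorithm concrete, the following is a minimal floating-point sketch of NMF with the widely used Lee and Seung multiplicative update rule. It is an illustration only, not the thesis's fixed-point hardware algorithm: the function names, the matrix sizes, the iteration count, and the small constant `eps` that guards against division by zero are all assumptions.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(V, W, H):
    """Frobenius norm of the residual V - W @ H."""
    WH = matmul(W, H)
    return sum((v - p) ** 2 for rv, rp in zip(V, WH) for v, p in zip(rv, rp)) ** 0.5

def nmf_multiplicative(V, rank, iters=100, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (m x n) as W @ H with W, H >= 0."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    errors = [frob_error(V, W, H)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(rh, ra, rd)]
             for rh, ra, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(rw, ra, rd)]
             for rw, ra, rd in zip(W, num, den)]
        errors.append(frob_error(V, W, H))
    return W, H, errors
```

    Each update multiplies the current factor by a ratio of non-negative terms, so the factors stay non-negative without any explicit clamping. The element-wise division in that ratio is what makes the divider a central concern in a hardware implementation.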
    We divide the hardware implementation of NMF into two parts: fixed-point NMF for light field factorization, and the system architecture of the NMF engine. Fixed-point NMF applied to light field factorization has not been discussed before, so we introduce our fixed-point NMF implementation, including fixed-point division and internal quantization. Within the overall computation, division is the most critical operation for hardware implementation. With hardware performance in mind, we design a divider for light field factorization based on conventional reciprocal look-up table division. For the same reason, we quantize the bitwidths of the internal operations according to the light field reconstruction quality in order to reduce hardware complexity. Compared with a full-precision implementation, our implementation improves the area-delay product by 34%. The second part is the system architecture of our NMF engine for high-throughput light field factorization. In this system, multiple multiplicative update units perform the fixed-point NMF computation. To maximize the hardware efficiency of the engine, we propose a factorization control scheme. With this control scheme, our NMF engine reduces computing time by 68% compared with a direct scheme.
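    The general idea behind reciprocal look-up table division can be sketched as follows: normalize the divisor to a mantissa in [1, 2), read an approximate reciprocal of that mantissa from a small table, then multiply and shift. This is a sketch of the conventional technique, not the divider designed in the thesis. The table size `LUT_BITS`, the reciprocal precision `RECIP_FRAC`, the output format `out_frac`, and the function name `lut_divide` are hypothetical choices.

```python
LUT_BITS = 6     # hypothetical: number of table address bits (64 entries)
RECIP_FRAC = 12  # hypothetical: fractional bits stored per reciprocal

# Entry i approximates 2^RECIP_FRAC / m for mantissas m in
# [1 + i/2^LUT_BITS, 1 + (i+1)/2^LUT_BITS), evaluated at the interval midpoint.
RECIP_LUT = [
    round((1 << RECIP_FRAC) / (1 + (i + 0.5) / (1 << LUT_BITS)))
    for i in range(1 << LUT_BITS)
]

def lut_divide(num, den, out_frac=8):
    """Approximate num / den for integers num >= 0, den > 0.

    Returns an integer holding the quotient with out_frac fractional bits.
    """
    if den <= 0:
        raise ValueError("divisor must be positive")
    # Write den = m * 2^msb with mantissa m in [1, 2).
    msb = den.bit_length() - 1
    # Use the LUT_BITS bits just below the leading one as the table address.
    if msb >= LUT_BITS:
        idx = (den >> (msb - LUT_BITS)) & ((1 << LUT_BITS) - 1)
    else:
        idx = (den << (LUT_BITS - msb)) & ((1 << LUT_BITS) - 1)
    recip = RECIP_LUT[idx]  # approximately 2^RECIP_FRAC / m
    # num / den is approximately num * recip / 2^(RECIP_FRAC + msb).
    prod = num * recip
    shift = RECIP_FRAC + msb - out_frac
    return prod >> shift if shift >= 0 else prod << -shift
```

    For example, `lut_divide(6, 3) / 256` gives a value close to 2. The table replaces an iterative divider with one memory read and one multiplication, at the cost of an approximation error that shrinks as the table grows.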
    We implemented a VLSI circuit in TSMC 40 nm technology to support real-time Full HD light field factorization for light field displays. The proposed design, called the NMF engine, uses 6.3M logic gates. It provides a throughput of 36G pixels/s when operating at 200 MHz. Its power consumption is 1.07 W, estimated by time-based power analysis with extracted parasitics.
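    As a quick check on these figures, dividing the stated throughput by the clock frequency gives the average number of pixels the engine must process per clock cycle. Reading 36G as 36 × 10^9 is an assumption.

```latex
\[
\frac{36 \times 10^{9}\ \text{pixels/s}}{200 \times 10^{6}\ \text{cycles/s}}
  = 180\ \text{pixels per cycle}
\]
```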

    Contents
    1 Introduction
      1.1 Motivation
      1.2 Related Work
        1.2.1 Non-negative Matrix Factorization
        1.2.2 Block-based Light Field Factorization
        1.2.3 Update with Sparsified Constraint
      1.3 Thesis Organization
    2 Fixed-point NMF for Light Field Factorization
      2.1 Fixed-point NMF Overview
      2.2 Reciprocal LUT Division
        2.2.1 Lookup Table Decision
        2.2.2 Division Comparison
      2.3 Internal Quantization
        2.3.1 Quantization Decision
        2.3.2 Quantization Analysis
    3 System Architecture of NMF Engine
      3.1 NMF Engine Overview
      3.2 Multiplicative Update Unit
        3.2.1 Hardware Complexity
        3.2.2 Design Pipeline
      3.3 Factorization Control Scheme
        3.3.1 Factorization Control Scheme
        3.3.2 Scheduling Comparison
    4 Implementation Result of NMF Engine for Real-Time Light Field Factorization
      4.1 ASIC Implementation Result
      4.2 FPGA Implementation Result
    5 Conclusion and Future Work
      5.1 Conclusion
      5.2 Future Work

