| Author | Yang, Bo-Hsiang (楊博翔) |
|---|---|
| Title | VLSI Architecture of Weighted Mode Filter for 4K Ultra-HD Depth Upsampling (應用於提升深度解析度至4K Ultra-HD之加權眾數濾波器硬體架構設計) |
| Advisor | Huang, Chao-Tsung (黃朝宗) |
| Committee | Chien, Shao-Yi (簡韶逸); Chiu, Ching-Te (邱瀞德) |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2017 |
| Academic Year | 105 |
| Language | English |
| Pages | 56 |
| Keywords | Depth Upsampling, Weighted Mode Filter, VLSI |
High-quality, high-resolution depth maps are becoming increasingly important as advanced computer vision matures, opening up possibilities for dozens of applications such as robotics, view synthesis, AR/VR display, image refocusing, and 3D reconstruction. Generating high-resolution depth maps in real time is computation-intensive and demands substantial hardware resources. As the required resolution increases, active depth sensors are often limited by area and power constraints, while stereo matching algorithms usually incur high computational complexity. In this thesis, we therefore introduce an upsampling technique, the weighted mode filter, and analyze its pros and cons. To support real-time depth video upsampling, we present a VLSI circuit that applies weighted mode filtering to increase resolution without much hardware overhead. Moreover, the weighted mode filter is an edge-preserving filter, which further enhances depth quality.
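The weighted mode filter selects, for each output pixel, the depth value that maximizes a weighted histogram accumulated over its neighborhood, with weights taken from spatial distance and from similarity in a high-resolution guidance image. A minimal NumPy sketch of this idea follows; it is a simplified software reference model, not the thesis hardware, and the window radius, sigma values, and use of a plain delta histogram are illustrative assumptions:

```python
import numpy as np

def weighted_mode_filter(depth, guide, radius=2, sigma_s=2.0, sigma_r=10.0, levels=256):
    """Per-pixel weighted histogram over depth values; the output is its mode.

    depth : low-quality depth map (uint8), guide : aligned guidance image (uint8).
    A brute-force sketch for clarity, not an efficient implementation.
    """
    h, w = depth.shape
    out = np.zeros_like(depth)
    for y in range(h):
        for x in range(w):
            hist = np.zeros(levels)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        # Spatial weight: Gaussian in pixel distance.
                        ws = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
                        # Range weight: Gaussian in guidance-image difference.
                        diff = float(guide[y, x]) - float(guide[ny, nx])
                        wr = np.exp(-(diff * diff) / (2.0 * sigma_r ** 2))
                        hist[depth[ny, nx]] += ws * wr
            out[y, x] = np.argmax(hist)  # mode of the weighted histogram
    return out
```

Because the output is the mode over actual depth candidates rather than a weighted average, depth values are never blended across object boundaries, which is the edge-preserving property noted above.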
We aim to deliver a 4K Ultra-HD depth upsampling engine running at 30 fps, so the required throughput of this design is extremely high. The design challenges are twofold: the tremendous memory cost of the histograms, and the large gate count caused by the high computational complexity. We present a memory reduction method, a depth-candidate mapping architecture, which reduces memory cost by 46.9%. Furthermore, we devise a binary weight kernel for range weighting to address the complexity issue; it reduces logic gates by 64.3% compared to a Gaussian weight kernel.
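The trade-off between the two range-weight kernels can be illustrated as follows. The sigma and threshold values below are assumed for illustration, since the abstract does not specify the parameters used in the hardware:

```python
import numpy as np

def gaussian_range_weight(diff, sigma_r=10.0):
    """Gaussian kernel: in hardware this needs an exponential (typically a LUT)
    plus a multiplier per histogram update."""
    d = np.asarray(diff, dtype=np.float64)
    return np.exp(-(d * d) / (2.0 * sigma_r * sigma_r))

def binary_range_weight(diff, threshold=16):
    """Binary kernel: weight 1 if the guidance-image difference is small, else 0.
    This reduces the per-pixel weighting to a comparator and the histogram
    update to a conditional increment, which is a plausible source of the
    gate-count reduction reported above (threshold value is illustrative)."""
    d = np.asarray(diff, dtype=np.int64)
    return (np.abs(d) < threshold).astype(np.float64)
```

The binary kernel trades the smooth fall-off of the Gaussian for a hard cut, at a large saving in arithmetic per pixel.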
We implemented the VLSI circuit for 4K Ultra-HD depth video upsampling in a TSMC 40 nm process, with 25.5 KB of on-chip memory and 420K logic gates; the core area is 1.1 × 1.1 mm². Synthesized at 200 MHz, it delivers 320 Mpixel/s, supporting 4K Ultra-HD depth video at 40 fps, and the power consumption is 104 mW in post-layout simulation.