A Low Cost High Throughput Architecture for Multi-standard Video Transforms Using Common Sharing Distributed Arithmetic


Graduate student:	Chen, Jyun-Neng (陳俊能)
Thesis title:	A Low Cost High Throughput Architecture for Multi-standard Video Transforms Using Common Sharing Distributed Arithmetic (一個低面積高輸出效能之多協定影像轉換使用共同因子分散式算術技術)
Advisor:	Chang, Tsin-Yuan (張慶元)
Oral defense committee:	陳竹一, 黃元豪, 謝明得
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication:	2011
Academic year of graduation:	99 (ROC calendar)
Language:	English
Number of pages:	57
Keywords (Chinese):	多協定影像轉換, 分散式算術技術, 共同因子分散式算術技術
Keywords (English):	Multi-standard video transforms, Distributed arithmetic, Common sharing distributed arithmetic

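Video compression standards such as MPEG-1/2/4, H.264, and VC-1 are widely used in image and video applications. This thesis proposes a low-area, high-throughput multi-standard video transform architecture based on common sharing distributed arithmetic (CSDA). It supports the video compression standards in common use today and handles four transform types: 8×8, 4×4, 4×8, and 8×4 matrix transforms. The proposed CSDA strategy increases circuit sharing among the transform coefficients of the different video standards and reduces the number of additions in the adder tree, so the proposed two-dimensional (2-D) architecture requires less hardware area. For data processing, it uses 8 parallel paths and has a maximum operating frequency of 205 MHz. The proposed 2-D transform architecture therefore processes 1.64 G pixels per second with an area of 30K logic gates. Because of this high throughput rate, it can support the digital cinema specification (4298×2048@24Hz) with a 4:4:4 luma/chroma ratio and the 1080p high-definition television specification (HDTV, 1920×1080@60Hz) with a 4:2:0 luma/chroma ratio. The proposed 2-D architecture is implemented in the TSMC 0.18-µm CMOS 1P6M process.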

Video and image compression standards, such as MPEG-1/2/4, H.264, and VC-1, are widely used in video and image applications. In this thesis, the proposed low cost high throughput architecture for multi-standard transforms using common sharing distributed arithmetic (CSDA) can support various video compression standards. Moreover, the proposed architecture can process four transform types: 8 × 8, 4 × 4, 4 × 8, and 8 × 4 transforms. The proposed CSDA strategy increases the circuit sharing capability among the coefficients and reduces the number of adders in the adder tree. Hence, the proposed architecture can be implemented at a lower hardware cost. In terms of data processing rate, the proposed architecture processes eight pixels per cycle and operates at a clock frequency of 205 MHz. Therefore, the proposed two-dimensional (2-D) architecture achieves a throughput rate of 1.64 G-pixels/s with a hardware cost of 30K gates, and this throughput rate meets the specifications of digital cinema (4298 × 2048@24Hz) with 4:4:4 luma/chroma sampling and high definition television (HDTV) 1080p (1920 × 1080@60Hz) with 4:2:0 luma/chroma sampling. For verification, the proposed 2-D architecture is implemented using the TSMC 0.18-µm CMOS 1P6M process.
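
The throughput claim in the abstract can be checked with a few lines of arithmetic. The sketch below is not taken from the thesis. It assumes that the quoted pixel rate counts individual samples, that 4:4:4 video carries 3 samples per pixel, and that 4:2:0 video carries 1.5 samples per pixel. The function and variable names are hypothetical, chosen only for illustration.

```python
# Rough sanity check of the throughput figures quoted in the abstract.
# Assumption (not from the thesis): a "pixel" processed by the transform
# core is one sample, so 4:4:4 needs 3 samples per pixel and 4:2:0 needs 1.5.

PIXELS_PER_CYCLE = 8          # eight pixels per cycle, from the abstract
CLOCK_HZ = 205_000_000        # 205 MHz clock frequency, from the abstract

SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:0": 1.5}  # standard chroma formats


def required_rate(width, height, fps, chroma_format):
    """Samples per second a video format demands (hypothetical helper)."""
    return width * height * fps * SAMPLES_PER_PIXEL[chroma_format]


throughput = PIXELS_PER_CYCLE * CLOCK_HZ

# Frame sizes and rates exactly as quoted in the abstract.
digital_cinema = required_rate(4298, 2048, 24, "4:4:4")
hdtv_1080p = required_rate(1920, 1080, 60, "4:2:0")

print(f"core throughput : {throughput / 1e9:.2f} G samples/s")
print(f"digital cinema  : {digital_cinema / 1e9:.3f} G samples/s")
print(f"HDTV 1080p      : {hdtv_1080p / 1e9:.3f} G samples/s")
print("meets both      :", throughput >= max(digital_cinema, hdtv_1080p))
```

Under these assumptions, eight samples per cycle at 205 MHz gives the 1.64 G figure quoted in the abstract, and both video formats need well under that rate.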

Introduction
  1 Introduction
  2 Previous Work
    2.1 Distributed Arithmetic
    2.2 Factor Share
    2.3 Matrix Decomposition
  3 Motivation
  4 Thesis Organizations
Proposed Common Sharing Distributed Arithmetic Architecture
  1 Recursive Algorithm for Transform
  2 Mathematic Derivation of Distributed Arithmetic
  3 Mathematic Derivation of Factor Sharing
  4 Proposed Mapping Strategy for Common Sharing and Distributed Arithmetic
  5 Proposed Common Sharing Distributed Arithmetic Circuit
  6 Proposed Compensated Strategy and Architecture
  7 Proposed 2-D Transform Circuit
    7.1 Block Diagram of The Proposed 1-D Transform Core
    7.2 Transpose Unit
Result and Comparison
  1 Estimation
  2 Simulation Result
  3 Specification
  4 Comparison
  5 Chip implementation and Characteristics
  6 FPGA Verification
Conclusion and Future Work
  1 Conclusion
  2 Future Work
Bibliography

                                


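Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)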
