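Implementation of Bzip2 Data Compression Algorithm with Parallel Program Based on GPU | National Tsing Hua University Theses and Dissertations Repository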


Author:	吳政倫 Wu, Chen-Lun
Thesis title:	Implementation of Bzip2 Data Compression Algorithm with Parallel Program Based on GPU
Advisor:	石維寬 Shih, Wei-Kuan
Committee members:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Computer Science
Year of publication:	2010
Academic year of graduation:	98 (ROC calendar)
Language:	Chinese
Number of pages:	50
Chinese keywords:	data compression, graphics chip, algorithm, parallel processing, parallel program
English keywords:	GPGPU, Bzip2, BWT, MTF, CUDA, Nvidia, Burrows-Wheeler Transformation, Move-To-Front, Parallel, Data Compression, Huffman code, GPU

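Information technology advances day by day. As network bandwidth and storage devices keep evolving, the demand for storing digital data keeps growing, so data compression has become an important and unavoidable topic. Lossless compression is able to guarantee the correctness of the data; finding a balance among coding efficiency, coding latency, and coding complexity is an important research direction.

This thesis rewrites the lossless Bzip2 compression algorithm for parallel processing. It first introduces the Nvidia CUDA parallel programming environment, which uses the GPU on a graphics card for computation beyond 3D graphics display, generally called GPGPU, and relies on the computing power of the graphics chip to perform the compression coding. Because CUDA supports the C language, the barrier to developing GPU programs is lowered, making it a well-suited experimental environment. Besides describing the evolution of current lossless compression coding, the thesis also introduces the CUDA program architecture and hardware.

To improve lossless coding, this thesis performs the Burrows-Wheeler Transformation and the Move-To-Front transformation before compression coding. This improves the compression ratio of lossless coding, and our parallel program concentrates on these two operations. Besides discussing how the program's work is distributed, the thesis compares and analyzes the execution results of the GPU and CPU Bzip2 programs, and finally discusses the effect that parallelizing the compression algorithm has on the implementation of this system.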
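As an illustration of the Burrows-Wheeler Transformation named in the abstract, the following is a minimal sequential sketch in Python. It is not the thesis's CUDA implementation: the function names `bwt_encode` and `bwt_decode` are hypothetical, and the naive sort of all rotations is chosen only for readability. The sketch follows the usual three steps (form the left rotations, sort them, record the row that holds the original sequence) and then inverts the transform.

```python
from typing import Tuple


def bwt_encode(data: bytes) -> Tuple[bytes, int]:
    """Return (last column, row index of the original input).

    Hypothetical helper for illustration; a naive sort of all rotations.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Rotation i is the input rotated left by i characters.
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    # The last character of rotation i is data[i - 1] (wraps for i == 0).
    last_column = bytes(data[i - 1] for i in order)
    # Row of the sorted table that contains the unrotated input.
    primary_index = order.index(0)
    return last_column, primary_index


def bwt_decode(last_column: bytes, primary_index: int) -> bytes:
    """Invert bwt_encode using a stable sort of the last column."""
    n = len(last_column)
    if n == 0:
        return b""
    # Stable sort by character: next_row[j] is the row whose last character
    # is the same occurrence as the first character of row j.
    next_row = sorted(range(n), key=lambda i: last_column[i])
    out = bytearray()
    row = primary_index
    for _ in range(n):
        row = next_row[row]
        out.append(last_column[row])
    return bytes(out)
```

For the input `b"banana"`, `bwt_encode` returns `(b"nnbaaa", 3)`, and `bwt_decode(b"nnbaaa", 3)` recovers `b"banana"`. The transform groups equal characters together, which is what makes the later coding stages more effective.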

Information technology is developing rapidly. As network bandwidth and storage devices continue to evolve, the demand for digital data storage keeps rising, and data compression has become an important and unavoidable issue. Lossless compression guarantees that the original data can be recovered exactly. How to strike a balance among coding efficiency, coding delay, and coding complexity is therefore an important research direction.

In this thesis, we rewrite the lossless Bzip2 compression algorithm for parallel processing. We first introduce the Nvidia CUDA parallel programming environment, which uses the GPU on a graphics card for computation other than 3D graphics rendering, an approach generally known as GPGPU, and we use the computing power of the graphics chip to carry out the compression. Because CUDA supports the C language, the barrier to developing GPU programs is lower, which makes it a suitable experimental environment. In addition to reviewing the evolution of lossless compression coding, we also introduce the CUDA programming architecture and hardware.

To improve lossless coding, this thesis applies the Burrows-Wheeler Transformation and the Move-To-Front transformation before entropy coding. This approach improves the lossless compression ratio, and our parallel program focuses on these two transforms. In addition to discussing how the work is distributed across the program, we compare the performance of the CUDA GPU program with that of the Bzip2 CPU program and analyze the results. Finally, we discuss the impact of parallelizing the compression algorithm on the implementation of this system.
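The Move-To-Front step mentioned above can be sketched in a few lines of Python. This is a sequential illustration under stated assumptions, not the thesis's CUDA code: the names `mtf_encode` and `mtf_decode` are hypothetical, and the initial table of all 256 byte values in ascending order is an assumption made here for simplicity.

```python
from typing import Iterable, List


def mtf_encode(data: bytes) -> List[int]:
    """Replace each byte by its current rank, then move it to the front."""
    table = list(range(256))  # assumed initial ordering: 0, 1, ..., 255
    ranks = []
    for byte in data:
        rank = table.index(byte)
        ranks.append(rank)
        # Move the symbol just seen to the front of the table.
        table.pop(rank)
        table.insert(0, byte)
    return ranks


def mtf_decode(ranks: Iterable[int]) -> bytes:
    """Invert mtf_encode by replaying the same table updates."""
    table = list(range(256))
    out = bytearray()
    for rank in ranks:
        byte = table.pop(rank)
        out.append(byte)
        table.insert(0, byte)
    return bytes(out)
```

Applied to `b"nnbaaa"`, a string with runs of repeated characters, `mtf_encode` yields `[110, 0, 99, 99, 0, 0]`: each repeat of the previous symbol becomes 0. Output dominated by small values is what lets a following entropy coder such as Huffman coding assign short codes.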

ABSTRACT (CHINESE)
ABSTRACT
ACKNOWLEDGEMENTS
INDEX
FIGURE LIST
TABLE LIST
INTRODUCTION
  1  BACKGROUND
  2  MOTIVATION
  3  PROBLEM
  4  SOLUTION
  5  CONTRIBUTION
RELATED WORK
  1  LOSSLESS COMPRESSION ALGORITHM
  2  PARALLEL LOSSLESS COMPRESSION ALGORITHM
NVIDIA CUDA OVERVIEW
  1  GPGPU INTRODUCTION
  2  CUDA PROGRAMMING MODEL OVERVIEW
    2.1  KERNELS
    2.2  THREAD HIERARCHY
    2.3  HETEROGENEOUS PROGRAMMING
    2.4  COMPILATION WITH NVCC
  3  CUDA HARDWARE ARCHITECTURE
    3.1  STREAMING PROCESSOR
    3.2  SIMT ARCHITECTURE
  4  MEMORY OVERVIEW
    4.1  DEVICE MEMORY
    4.2  REGISTER
    4.3  SHARED MEMORY
    4.4  LOCAL MEMORY
    4.5  GLOBAL MEMORY
    4.6  CONSTANT MEMORY
    4.7  TEXTURE MEMORY
PARALLEL BURROWS-WHEELER TRANSFORMATION IMPLEMENTATION
  1  BURROWS-WHEELER TRANSFORMATION
    1.1  CHARACTER SHIFTING
    1.2  STRING SORTING
    1.3  MOVE-TO-FRONT TRANSFORMATION
    1.4  INVERSE
  2  SYSTEM DESIGN AND ARCHITECTURE
    2.1  BWT ENCODER
      2.1.1  LEFT ROTATION
      2.1.2  SORTING
      2.1.3  INDEX OF ORIGINAL SEQUENCE
      2.1.4  SYMBOL TABLE
    2.2  BWT DECODER
      2.2.1  RESTORATION
      2.2.2  SPEED UP RESTORATION
    2.3  MOVE-TO-FRONT IMPLEMENTATION IN CUDA
EXPERIMENT RESULT AND STATISTIC ANALYSIS
CONCLUSION AND FUTURE WORK
REFERENCE

                                


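Full-text release date: this full text is not authorized for public release (campus network)
Full-text release date: this full text is not authorized for public release (off-campus network)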
