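Author (graduate student): 吳政倫 (Wu, Chen-Lun)
Title: Implementation of Bzip2 Data Compression Algorithm with Parallel Program Based on GPU (在GPU上實作平行處理Bzip2資料壓縮演算法)
Advisor: 石維寬 (Shih, Wei-Kuan)
Oral defense committee: (not listed)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar; 2009-2010)
Language: Chinese
Number of pages: 50
Keywords (translated from Chinese): data compression, graphics chip, algorithm, parallel processing, parallel program
Keywords (English): GPGPU, Bzip2, BWT, MTF, CUDA, Nvidia, Burrows-Wheeler Transformation, Move-To-Front, Parallel, Data Compression, Huffman code, GPU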
      Information technology is advancing rapidly. As network bandwidth and storage devices continue to evolve, the demand for digital data storage keeps rising, which makes data compression an important and unavoidable topic. Lossless compression guarantees that the original data is recovered exactly. Striking a balance among coding efficiency, coding delay, and coding complexity is therefore an important research direction.

      This thesis rewrites the lossless Bzip2 compression algorithm for parallel processing. It first introduces the Nvidia CUDA parallel programming environment, which uses the GPU on a graphics card for computation other than 3D graphics, a practice generally known as GPGPU, and applies the GPU's computing power to compression coding. Because CUDA supports the C language, it lowers the barrier to developing GPU programs and is a well-suited experimental environment. In addition to reviewing the evolution of lossless compression coding, the thesis describes the CUDA programming architecture and hardware.

      To improve lossless coding, the thesis applies the Burrows-Wheeler Transformation (BWT) and the Move-To-Front (MTF) transformation before entropy coding. This improves the lossless compression ratio, and the parallel program focuses on these two transforms. Besides discussing how the work is distributed in the program, the thesis compares and analyzes the results of the GPU (CUDA) and CPU Bzip2 programs, and finally discusses the impact of parallelizing the compression algorithm on the implemented system.
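The two transforms named in the abstract can be illustrated with a short sequential sketch. This is not the thesis's CUDA implementation, which is not shown in this record. It is a naive Python version that assumes the textbook formulation (sort all left rotations, keep the last column and the row index of the original). The function names `bwt_encode`, `bwt_decode`, `mtf_encode`, and `mtf_decode` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def bwt_encode(data: bytes) -> Tuple[bytes, int]:
    """Naive BWT: sort every left rotation, return the last column
    and the row at which the original string appears."""
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data  # rotation i is doubled[i:i + n]
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last_column = bytes(doubled[i + n - 1] for i in order)
    return last_column, order.index(0)


def bwt_decode(last_column: bytes, primary_index: int) -> bytes:
    """Invert the BWT by following the first-column to last-column mapping."""
    n = len(last_column)
    if n == 0:
        return b""
    # A stable sort of positions by symbol pairs each first-column entry
    # with the matching occurrence in the last column.
    order = sorted(range(n), key=lambda i: last_column[i])
    out = bytearray()
    row = primary_index
    for _ in range(n):
        row = order[row]
        out.append(last_column[row])
    return bytes(out)


def mtf_encode(data: bytes) -> List[int]:
    """Move-To-Front: emit each symbol's current rank, then move it to rank 0."""
    table = list(range(256))
    ranks = []
    for symbol in data:
        rank = table.index(symbol)
        ranks.append(rank)
        table.pop(rank)
        table.insert(0, symbol)
    return ranks


def mtf_decode(ranks: List[int]) -> bytes:
    """Invert Move-To-Front by replaying the same table updates."""
    table = list(range(256))
    out = bytearray()
    for rank in ranks:
        symbol = table.pop(rank)
        out.append(symbol)
        table.insert(0, symbol)
    return bytes(out)
```

For the input `b"banana"`, this sketch produces the last column `b"nnbaaa"` with primary index 3. The BWT groups equal symbols into runs, and MTF then turns each repeated symbol into a rank of 0, which is the kind of skewed output an entropy coder such as Huffman coding can compress well.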

    CHINESE ABSTRACT
    ABSTRACT
    ACKNOWLEDGEMENTS
    INDEX
    FIGURE LIST
    TABLE LIST
    1. INTRODUCTION
      1.1 BACKGROUND
      1.2 MOTIVATION
      1.3 PROBLEM
      1.4 SOLUTION
      1.5 CONTRIBUTION
    2. RELATED WORK
      2.1 LOSSLESS COMPRESSION ALGORITHM
      2.2 PARALLEL LOSSLESS COMPRESSION ALGORITHM
    3. NVIDIA CUDA OVERVIEW
      3.1 GPGPU INTRODUCTION
      3.2 CUDA PROGRAMMING MODEL OVERVIEW
        3.2.1 KERNELS
        3.2.2 THREAD HIERARCHY
        3.2.3 HETEROGENEOUS PROGRAMMING
        3.2.4 COMPILATION WITH NVCC
      3.3 CUDA HARDWARE ARCHITECTURE
        3.3.1 STREAMING PROCESSOR
        3.3.2 SIMT ARCHITECTURE
      3.4 MEMORY OVERVIEW
        3.4.1 DEVICE MEMORY
        3.4.2 REGISTER
        3.4.3 SHARED MEMORY
        3.4.4 LOCAL MEMORY
        3.4.5 GLOBAL MEMORY
        3.4.6 CONSTANT MEMORY
        3.4.7 TEXTURE MEMORY
    4. PARALLEL BURROWS-WHEELER TRANSFORMATION IMPLEMENTATION
      4.1 BURROWS-WHEELER TRANSFORMATION
        4.1.1 CHARACTER SHIFTING
        4.1.2 STRING SORTING
        4.1.3 MOVE-TO-FRONT TRANSFORMATION
        4.1.4 INVERSE
      4.2 SYSTEM DESIGN AND ARCHITECTURE
        4.2.1 BWT ENCODER
          4.2.1.1 LEFT ROTATION
          4.2.1.2 SORTING
          4.2.1.3 INDEX OF ORIGINAL SEQUENCE
          4.2.1.4 SYMBOL TABLE
        4.2.2 BWT DECODER
          4.2.2.1 RESTORATION
          4.2.2.2 SPEED UP RESTORATION
        4.2.3 MOVE-TO-FRONT IMPLEMENTATION IN CUDA
    5. EXPERIMENT RESULT AND STATISTIC ANALYSIS
    6. CONCLUSION AND FUTURE WORK
    REFERENCE


    Full-text availability: the full text is not authorized for release (on-campus network or off-campus network).