Graduate student: | 張展榕 Chang, Chan-Jung |
---|---|
Thesis title: | ECS2: 利用直接且平行的I/O 路徑提升使用 GPU 加速糾刪碼的儲存系統效能 / ECS2: A Fast Erasure Coding Library for GPU-Accelerated Storage Systems With Parallel & Direct IO |
Advisor: | 周志遠 Chou, Jerry |
Oral defense committee: | 李哲榮 Lee, Che-Rung; 賴冠州 Lai, Kuan-Chou |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2020 |
Academic year of graduation: | 108 |
Language: | English |
Number of pages: | 29 |
Keywords (Chinese): | Storage device, erasure code, reliability, performance, parallel I/O |
Keywords (English): | Storage system, Erasure Code, Reliability, Performance, Parallel I/O |
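With data volumes growing at high speed, there is an urgent need for reliable,
large-scale, cost-effective storage systems. Erasure coding has drawn increasing
attention because it maintains data reliability at a higher storage cost
efficiency, and it is already widely used in many distributed, large-scale storage
systems, such as Azure cloud storage and HDFS. The price of adopting erasure
coding, however, is higher computational complexity. Many studies have pointed
out that erasure coding computation can be greatly accelerated by GPUs, which in
turn shifts the performance bottleneck to data transfer between storage devices
and the GPU. In this study, we design and implement ECS2, a fast, GPU-accelerated
erasure coding library. Users can strengthen their data reliability protection
through the library, which provides a storage-system-like programming interface.
Using the latest GPUDirect technology provided by Nvidia GPUs, the library lets
the I/O path skip and bypass the CPU and main memory, reducing the time spent on
computation and I/O. Using synthetic I/O traces based on real storage system
traces, we verify that GPUDirect technology reduces I/O latency by 10% to 20%
and that overall throughput can be improved by up to 70%.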
As data volume keeps increasing at a rapid rate, there is an urgent need for large,
reliable, and cost-effective storage systems. Erasure coding has drawn increasing
attention because of its ability to ensure data reliability with higher storage efficiency,
and it has been widely adopted in many distributed and large-scale storage
systems, such as Azure cloud storage and HDFS. However, the storage efficiency
of erasure coding comes at the price of higher computing complexity. While many
studies have shown that the coding computations can be significantly accelerated using
GPUs, the overhead of data transfer between storage devices and GPUs becomes a
new performance bottleneck. In this work, we designed and implemented ECS2,
a fast erasure coding library for GPU-accelerated storage that lets users enhance their
data protection with transparent IO performance and a storage-system-like programming
interface. By taking advantage of the latest GPUDirect technology supported
on Nvidia GPUs, our library is able to remove the CPU and host memory copies from the
IO path, so that both the computing and IO overhead from coding can be minimized.
Using a synthetic IO workload based on real storage system traces, we show that the IO
latency can be reduced by 10% to 20% with GPUDirect technology, and the overall
IO throughput of a storage system can be improved by up to 70%.
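
To make the storage-efficiency argument in the abstract concrete, the following is a minimal sketch of the simplest possible erasure code: k data blocks protected by one XOR parity block. It is an illustration only. The abstract does not describe ECS2's coding scheme or API, so the function names here (`xor_blocks`, `encode`, `recover`, `storage_overhead`) are hypothetical, the code runs on the CPU, and single XOR parity stands in for whatever code the library actually uses.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the stripe: the k data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at lost_index by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


def storage_overhead(k, m):
    """Raw bytes stored per byte of user data for k data + m parity blocks."""
    return (k + m) / k


# Hypothetical example data: k = 4 data blocks of 4 bytes each.
data = [b"ABCD", b"EFGH", b"IJKL", b"MNOP"]
stripe = encode(data)                     # 5 blocks are stored in total
rebuilt = recover(stripe, lost_index=2)   # pretend the third block was lost
```

With k = 4, this stripe survives the loss of any one block at 1.25 times the raw storage, whereas keeping a second full copy to survive one loss costs 2 times. The price is the extra XOR pass on every write and every recovery, which is the kind of coding computation the abstract proposes to offload to the GPU.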