
Author: Wu, Chun Feng (吳俊峯)
Thesis title: Hybrid Mechanisms to Improve Write Scenarios for Cloud Storage Services (基於雲端儲存服務之混合寫入機制)
Advisor: Chung, Yeh-Ching (鍾葉青)
Committee members: Jerry Chou (周志遠); Hsu, Ching Hsien (許慶賢)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar, 2015-2016)
Language: English
Number of pages: 39
Keywords: Hybrid Distributed File System, Cloud Storage, Ceph, HDFS
    Cloud storage services are pervasive nowadays, and many of them use distributed file systems as their back-end storage. Users typically upload documents, images, and music files, which are usually smaller than 10 MB, so most files in cloud storage are small. A small share of files are large, often exceeding 1 GB, such as movies and other video files or operating system images (Ubuntu images, for example). Moreover, prior research shows that file sizes follow a heavy-tailed distribution. We therefore assume that most files in a cloud storage system are small and that only about 10% to 20% are large.
    Based on these observations, this thesis aims to design a distributed file system with data allocation mechanisms that achieve high write throughput for a workload of many small files and a few very large files. It proposes two file allocation mechanisms for a hybrid distributed file system built on Ceph and HDFS. The results show that the hybrid distributed file system with these allocation mechanisms outperforms either HDFS or Ceph alone by around 200% in write throughput. To verify feasibility and evaluate the effect on the overall performance of a cloud storage system, we also integrate the hybrid distributed file system and its file allocation mechanisms into the cloud storage service SSBox.
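
    The abstract does not spell out how the two file allocation mechanisms decide where a file is written. As a rough illustration only, the sketch below routes each file to one of two back ends by comparing its size with a threshold. The names `SizeRouter` and `plan_writes`, the 10 MB threshold, and the mapping of small files to Ceph and large files to HDFS are all assumptions made for this example, not details taken from the thesis.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class SizeRouter:
    """Hypothetical size-based allocation policy (not the thesis's mechanism)."""

    threshold_bytes: int = 10 * MB  # assumed cut-off between "small" and "large"
    small_backend: str = "ceph"     # assumption: small files go to Ceph
    large_backend: str = "hdfs"     # assumption: large files go to HDFS

    def choose(self, size_bytes: int) -> str:
        """Return the name of the back end that should receive the file."""
        if size_bytes < 0:
            raise ValueError("size_bytes must be non-negative")
        if size_bytes < self.threshold_bytes:
            return self.small_backend
        return self.large_backend


def plan_writes(sizes, router):
    """Group a batch of file sizes by the back end chosen for each file."""
    plan = {router.small_backend: [], router.large_backend: []}
    for size in sizes:
        plan[router.choose(size)].append(size)
    return plan


if __name__ == "__main__":
    router = SizeRouter()
    batch = [1 * MB, 3 * MB, 2048 * MB]
    for backend, files in plan_writes(batch, router).items():
        print(backend, len(files), "file(s)")
```

    Keeping the threshold and the back-end names as parameters makes it easy to try the opposite mapping or a different cut-off when measuring write throughput.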

    CHAPTER 1 Introduction
    CHAPTER 2 Background and Related Works
        2.1 File Distributions in Cloud Storage
        2.2 Ceph
        2.3 HDFS
        2.4 Comparison between Ceph and HDFS
        2.5 SSBox Architecture
        2.6 Machine Learning
    CHAPTER 3 Mechanisms and System Configuration
        3.1 Hybrid Distributed File System Mechanism
            3.1.1 File Size Preprocessing
            3.1.2 K-Nearest Neighbors (KNN)
            3.1.3 RAM Disk Cache and Parallel Programming
        3.2 Configuration for Ceph, HDFS and SSBox
    CHAPTER 4 Experiments Design
        4.1 Experiments on Local
        4.2 Experiments on Cloud Storage: SSBox
    CHAPTER 5 Evaluation
        5.1 Evaluation on Local
        5.2 Evaluation on SSBox
    CHAPTER 6 Conclusions and Future Works
    REFERENCE


    Full text: not authorized for public release (on-campus and off-campus networks)
