Graduate student: | 沈芷萱 Shen, Chih-Xuan |
---|---|
Thesis title: | 基於疊瓦式硬碟之半外部圖處理系統的效能改善 Improving Performance for SMR-based Semi-External Graph Processing Systems |
Advisor: | 石維寬 Shih, Wei-Kuan |
Oral defense committee: | 張原豪 Chang, Yuan-Hao; 梁郁珮 Liang, Yu-Pei |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2023 |
Academic year of graduation: | 111 (ROC calendar) |
Language: | English |
Number of pages: | 33 |
Keywords (Chinese): | 疊瓦式磁紀錄, 疊瓦式硬碟, 半外部圖處理系統 |
Keywords (English): | Shingled Magnetic Recording, SMR disk, Semi-external, Graph Processing System |
As graphs grow in scale, the performance of accessing graph data on disk has become a prominent research topic, especially when a large-scale graph cannot fit in memory. This thesis improves the performance of SMR-based semi-external graph processing systems through two main approaches.
The first approach improves the data layout during the preprocessing step by using the breadth-first search (BFS) algorithm. By optimizing the locality of graph data, it substantially reduces I/O time when executing graph algorithms.
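The idea of a BFS-driven layout can be illustrated with a minimal sketch. The abstract does not specify the exact layout scheme, so the code below only assumes the general technique of renumbering vertices in BFS visit order, so that vertices visited close together receive adjacent IDs and could be stored near each other on disk. The function names `bfs_relabel` and `relabel_graph` and the adjacency-dictionary input are hypothetical and chosen for illustration only.

```python
from collections import deque


def bfs_relabel(adj):
    """Map each old vertex ID to a new ID assigned in BFS visit order.

    `adj` is a dict {vertex: [neighbors]}. This is an illustrative
    assumption, not the thesis's actual on-disk format.
    """
    new_id = {}
    # Restart BFS from every unvisited vertex so that disconnected
    # components are also numbered.
    for root in sorted(adj):
        if root in new_id:
            continue
        new_id[root] = len(new_id)
        queue = deque([root])
        while queue:
            u = queue.popleft()
            # Vertices that appear only as neighbors have no entry in adj.
            for v in adj.get(u, ()):
                if v not in new_id:
                    new_id[v] = len(new_id)
                    queue.append(v)
    return new_id


def relabel_graph(adj, new_id):
    """Rewrite the adjacency lists using the new vertex IDs."""
    return {
        new_id[u]: sorted(new_id[v] for v in neighbors)
        for u, neighbors in adj.items()
    }


# Small example graph with scattered vertex IDs.
graph = {0: [5, 9], 5: [3], 9: [], 3: [0]}
mapping = bfs_relabel(graph)
relabeled = relabel_graph(graph, mapping)
```

In this example the neighbors of vertex 0 (originally 5 and 9) receive the consecutive IDs 1 and 2, which is the kind of locality a BFS-ordered layout aims for.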
The second approach tackles the write amplification issue of shingled magnetic recording (SMR) disks when updating graph data. We propose a method that decides between an in-place update and an out-of-place update by considering the track of the write pointer and that of the updated data. This mitigates the severe write amplification caused by relying solely on in-place updates, as well as the significant garbage collection overhead caused by relying solely on out-of-place updates.
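A decision of this kind might look like the following sketch. The abstract does not give the actual rule, so everything here is an assumption: the simplified model treats an in-place update of a track as forcing a rewrite of every written track between it and the write pointer, and the function `choose_update` and its `max_rewrite_tracks` threshold are hypothetical names introduced for illustration.

```python
def choose_update(data_track, write_pointer_track, max_rewrite_tracks=0):
    """Pick an update mode from track positions (hypothetical rule).

    Assumed model: tracks are written sequentially within a band, so
    rewriting `data_track` in place damages all written tracks after it,
    up to the track that holds the write pointer.
    """
    if data_track > write_pointer_track:
        raise ValueError("updated data cannot lie beyond the write pointer")

    # Tracks that would have to be read and rewritten after an
    # in-place update under the assumed model.
    tracks_to_rewrite = write_pointer_track - data_track

    if tracks_to_rewrite <= max_rewrite_tracks:
        # Cheap case: little or no valid data follows the updated track.
        return "in-place"
    # Otherwise append at the write pointer and invalidate the old copy,
    # leaving the stale data for garbage collection.
    return "out-of-place"


same_track = choose_update(data_track=7, write_pointer_track=7)
far_track = choose_update(data_track=2, write_pointer_track=7)
near_track = choose_update(data_track=5, write_pointer_track=7, max_rewrite_tracks=2)
```

With the default threshold, only data on the write pointer's own track is updated in place. Raising `max_rewrite_tracks` trades extra rewrites for fewer stale copies left to garbage collection.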
In our experiments, write amplification during data updates was reduced by up to 58% compared with out-of-place update. In addition, the I/O time of the breadth-first search (BFS) and PageRank algorithms was reduced by approximately 43% and 17%, respectively.