| Author: | 郭柏妤 Kuo, Bo-Yu |
|---|---|
| Thesis title: | DASH: 處理分析隨時間變動之圖型資料的非同步動態分散式計算系統 DASH: A Dynamic and Asynchronized Computing Framework for Processing Time Evolving Graph at Scale using Heterogeneous Resources |
| Advisor: | 周志遠 Chou, Jerry |
| Committee members: | 金仲達, 李哲榮 |
| Degree: | 碩士 Master |
| Department: | |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 |
| Language: | English |
| Number of pages: | 43 |
| Keywords (Chinese): | 圖處理系統、串流圖數據、分散式計算、漸進演算法 |
| Keywords (English): | graph processing system, time-evolving graph, distributed computing, incremental algorithm |
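Traditional graph computing systems are usually designed for static graphs. However, the methods these systems commonly use are no longer suitable for today's time-evolving graph data, where they lead to problems such as load imbalance and higher memory usage. Heterogeneous resources among worker nodes increase the degree of load imbalance and further degrade system performance.
In this thesis, we propose DASH, an asynchronous, dynamic, distributed graph computing system. It adopts dynamic graph data loading to reduce memory usage and avoid the time spent on graph pre-partitioning, uses asynchronous and incremental algorithms to achieve faster algorithm convergence, and uses a workload-aware task scheduling method to produce a more balanced load. Our experimental results show that, compared with GPS, DASH is more than 3 times faster on homogeneous resources and 8 times faster on heterogeneous resources. Besides reducing execution time, DASH also saves 60% of memory usage. The system architecture of DASH and the methods we adopt therefore perform well when processing time-evolving graph data.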
Traditional distributed graph processing systems are generally designed for static graphs. However, the approaches these systems commonly use are no longer suitable for time-evolving graphs, where they lead to problems such as load imbalance and higher memory usage. Moreover, heterogeneous resources among worker nodes increase the degree of workload imbalance, which further degrades system performance.
In this work, we present DASH, a distributed graph processing system that adopts dynamic graph data loading to reduce memory usage and avoid graph pre-partitioning time. We also use asynchronous and incremental computing to achieve faster algorithm convergence, and a workload-aware task scheduling method to balance load across the system. Our evaluations using real-world datasets show that, compared with GPS, another well-designed distributed graph processing system, the architecture of DASH and the approaches we adopt achieve more than 3x speedup on homogeneous resources and 8x speedup on heterogeneous resources. Besides reducing execution time, DASH also saves 60% of memory usage, since it does not need to load the whole graph partition into worker nodes.
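As a rough illustration of what workload-aware scheduling on heterogeneous workers can look like, the sketch below assigns tasks greedily to whichever worker would finish them earliest given its relative speed. This is not DASH's scheduler, which the abstract does not describe in detail. The function name `assign_tasks`, the cost estimates, and the speed factors are all hypothetical.

```python
def assign_tasks(task_costs, worker_speeds):
    """Greedy speed-aware assignment (illustrative only, not DASH's method).

    task_costs:    estimated work per task (hypothetical units)
    worker_speeds: relative processing speed of each worker (hypothetical)
    Returns (assignment, finish), where assignment[w] lists the task ids
    given to worker w and finish[w] is that worker's estimated finish time.
    """
    finish = [0.0] * len(worker_speeds)
    assignment = [[] for _ in worker_speeds]

    # Place the largest tasks first so that small tasks can even out the load.
    order = sorted(range(len(task_costs)), key=lambda t: -task_costs[t])
    for task_id in order:
        cost = task_costs[task_id]
        # Pick the worker whose estimated finish time after taking this task is lowest.
        best = min(
            range(len(worker_speeds)),
            key=lambda w: finish[w] + cost / worker_speeds[w],
        )
        finish[best] += cost / worker_speeds[best]
        assignment[best].append(task_id)
    return assignment, finish


# Example: worker 0 is assumed to be twice as fast as worker 1.
assignment, finish = assign_tasks([8, 4, 4, 2, 2], [2.0, 1.0])
print(assignment, finish)
```

In this example the faster worker receives more of the total work, and the two estimated finish times end up close to each other, which is the effect a workload-aware scheduler aims for.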
[1] Apache giraph. http://giraph.apache.org/.
[2] Bahmani, B., Chowdhury, A., and Goel, A. Fast incremental and personalized pagerank. Proceedings of the VLDB Endowment 4, 3 (2010), 173–184.
[3] Chen, R., Shi, J., Chen, Y., and Chen, H. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. In Proceedings of the Tenth European Conference on Computer Systems (2015), ACM, p. 1.
[4] Cheng, R., Hong, J., Kyrola, A., Miao, Y., Weng, X., Wu, M., Yang, F., Zhou, L., Zhao, F., and Chen, E. Kineograph: taking the pulse of a fast-changing and connected world. In Proceedings of the 7th ACM european conference on Computer Systems (2012), ACM, pp. 85–98.
[5] Ediger, D., McColl, R., Riedy, J., and Bader, D. A. Stinger: High performance data structure for streaming graphs. In High Performance Extreme Computing (HPEC), 2012 IEEE Conference on (2012), IEEE, pp. 1–5.
[6] Gonzalez, J. E., Low, Y., Gu, H., Bickson, D., and Guestrin, C. Powergraph: distributed graph-parallel computation on natural graphs. In OSDI (2012), vol. 12, p. 2.
[7] Iyer, A. P., Li, L. E., Das, T., and Stoica, I. Time-evolving graph processing at scale. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems (2016), ACM, p. 5.
[8] Kao, J.-S., and Chou, J. Distributed incremental pattern matching on streaming graphs. In Proceedings of the ACM Workshop on High Performance Graph Processing (2016), ACM, pp. 43–50.
[9] Karypis, G., and Kumar, V. Multilevel k-way partitioning scheme for irregular graphs. Journal of Parallel and Distributed computing 48, 1 (1998), 96–129.
[10] Karypis, G., and Kumar, V. Parallel multilevel series k-way partitioning scheme for irregular graphs. Siam Review 41, 2 (1999), 278–300.
[11] Kwak, H., Lee, C., Park, H., and Moon, S. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web (2010), ACM, pp. 591–600.
[12] Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., and Hellerstein, J. M. Distributed graphlab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment 5, 8 (2012), 716–727.
[13] Low, Y., Gonzalez, J. E., Kyrola, A., Bickson, D., Guestrin, C. E., and Hellerstein, J. Graphlab: A new framework for parallel machine learning. arXiv preprint arXiv:1408.2041 (2014).
[14] Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, I., Leiser, N., and Czajkowski, G. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data (2010), ACM, pp. 135–146.
[15] Mayer, C., Tariq, M. A., Li, C., and Rothermel, K. Graph: Heterogeneity aware graph computation with adaptive partitioning. In Distributed Computing Systems (ICDCS), 2016 IEEE 36th International Conference on (2016), IEEE, pp. 118–128.
[16] Page, L., Brin, S., Motwani, R., and Winograd, T. The pagerank citation ranking: Bringing order to the web. Tech. rep., Stanford InfoLab, 1999.
[17] Roy, A., Mihailovic, I., and Zwaenepoel, W. X-stream: Edge-centric graph processing using streaming partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (2013), ACM, pp. 472–488.
[18] Salihoglu, S., and Widom, J. Gps: a graph processing system. In Proceedings of the 25th International Conference on Scientific and Statistical Database Management (2013), ACM, p. 22.
[19] Xin, R. S., Gonzalez, J. E., Franklin, M. J., and Stoica, I. Graphx: A resilient distributed graph system on spark. In First International Workshop on Graph Data Management Experiences and Systems (2013), ACM, p. 2.
[20] Zhu, X., Chen, W., Zheng, W., and Ma, X. Gemini: A computation-centric distributed graph processing system.