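Graph Summary: Towards Efficient Graph Algorithms in Distributed-Parallel Model | National Tsing Hua University Theses and Dissertations Repository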

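Graduate student:	Hou, Chia-Hsin (侯佳欣)
Thesis title:	Graph Summary: Towards Efficient Graph Algorithms in Distributed-Parallel Model (使用圖形總結模型加速分散式圖形演算法)
Advisor:	Lee, Che-Rung (李哲榮)
Committee members:	Hon, Wing-Kai (韓永楷); Ho, Jan-Ming (何建明)
Degree:	Master
Department:
Year of publication:	2017
Academic year of graduation:	105 (ROC calendar)
Language:	English
Number of pages:	31
Chinese keywords:	distributed, graph theory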

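Traditional graph algorithms assume that the graph fits entirely on a single machine. However, for ease of data management or for privacy reasons, a graph is sometimes cut into several small pieces or stored separately on different machines. For example, when a graph changes over time and is very large, we would like to cut it into several parts, so that an update only has to modify one small part without affecting the others. Alternatively, several data-management organizations may each control part of the data; these organizations may want to combine their data and run some analysis, but without handing all of it over for each other to inspect. Handling such an already-partitioned graph requires rethinking distributed algorithms so as to reduce communication between machines and thereby lower the overall running time. One approach is to compute a small summary of each part of the graph and then combine these summaries to carry on the subsequent computation. This approach has already been realized for the maximum flow problem. It has several advantages: (1) messages between machines can be batched; (2) the summaries of the individual parts can be computed in parallel; (3) machines with mediocre computing power or limited memory can be used; (4) the data inside each part can be hidden.
In this thesis, we explore applying the graph summary model to fundamental graph problems. We explain how to use the graph summary model to compute connected components, shortest paths, and minimum spanning trees, and we implement a distributed system that can respond to user queries in real time. Experimental results show that our method can be 30 times faster than a traditional master-slave architecture.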

Traditional graph algorithms assume that the input graph resides entirely on a single machine. However, for data management or privacy reasons, a graph may be partitioned and stored on different machines. For instance, when a graph is dynamic and very large, one would hope to partition it so that updating one part does not affect the other parts. Alternatively, there may be multiple administrative parties, each owning only a part of a graph, that from time to time want to work collaboratively on the whole graph (say, to compute some graph statistics) without letting the others access their own part. To handle partitioned graphs, one may resort to designing distributed algorithms that aim to reduce the number of individual messages among the machines as well as the overall running time. An orthogonal approach is to compute a small graph summary (a.k.a. a mimicking network) for each part of the graph and then have a single machine collect these summaries to perform the subsequent computation. This approach is relatively underdeveloped and has so far been applied only to the maximum flow problem. Yet it has multiple advantages: (1) the messages among machines can be batched, (2) the computation of graph summaries can be parallelized, (3) machines with mediocre computing power or limited memory can be utilized, and (4) the privacy of the internal structure of each part of the graph can be preserved. In this thesis, we explore both the theoretical and practical aspects of the graph summary approach on various fundamental graph algorithms. We demonstrate how graph summaries work by designing and analyzing three commonly used graph algorithms (connected components, shortest path, and minimum spanning tree), and we implement a distributed system for answering queries on large, dynamically updated graphs. Experimental results show that our system is almost 30 times faster than a traditional master-slave architecture.
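To make the summary idea concrete, the following is a minimal Python sketch of how connected components might be counted from per-part summaries. It is an illustration under stated assumptions, not the construction used in the thesis, which this record does not reproduce. It assumes the graph is partitioned by vertices, that each worker knows which of its vertices touch a cut edge (its boundary vertices), and that the coordinator knows the cut edges. The names `summarize_part` and `count_components` are hypothetical.

```python
class DisjointSet:
    """Union-find with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def summarize_part(part_id, vertices, internal_edges, boundary):
    """Worker side (hypothetical): reduce one part to labels on its boundary."""
    ds = DisjointSet()
    for v in vertices:
        ds.find(v)  # register isolated vertices too
    for u, v in internal_edges:
        ds.union(u, v)
    # Each boundary vertex is labelled with (part id, local component root).
    labels = {v: (part_id, ds.find(v)) for v in boundary}
    all_roots = {ds.find(v) for v in vertices}
    boundary_roots = {root for _, root in labels.values()}
    # Local components that touch no cut edge are reported only as a count.
    hidden = len(all_roots - boundary_roots)
    return labels, hidden


def count_components(summaries, cut_edges):
    """Coordinator side (hypothetical): merge summaries along the cut edges."""
    ds = DisjointSet()
    label_of = {}
    hidden_total = 0
    for labels, hidden in summaries:
        label_of.update(labels)
        hidden_total += hidden
        for label in labels.values():
            ds.find(label)
    for u, v in cut_edges:
        ds.union(label_of[u], label_of[v])
    merged = {ds.find(label) for label in label_of.values()}
    return len(merged) + hidden_total


# Two parts: A = {1,2,3,4} with edges 1-2 and 3-4; B = {5,6,7} with edge 5-6.
summary_a = summarize_part("A", [1, 2, 3, 4], [(1, 2), (3, 4)], [2, 3])
summary_b = summarize_part("B", [5, 6, 7], [(5, 6)], [5, 6])
print(count_components([summary_a, summary_b], [(2, 5), (3, 6)]))  # prints 2
```

In this sketch each worker sends a single message whose size depends only on the number of boundary vertices, the workers can run independently, and the coordinator never sees the internal edges of any part, which mirrors advantages (1), (2), and (4) listed in the abstract.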

Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
Introduction
Our Model
    1 Computation Model: Distributed-Parallel Model
    2 Data Model: Distributed Graph
Graph Summary
    1 Connected Components
    2 Shortest Path
    3 Minimum Spanning Tree
        3.1 The Revised Summary
Experiments
    1 System Implementation
    2 Experimental Settings
    3 Exp1: Performance Comparison and Memory Usage
        3.1 Connected Component Performance Comparison
        3.2 Minimum Spanning Tree Performance Comparison
        3.3 Shortest Path Performance Comparison
        3.4 Memory Usage
    4 Exp2: Influence of Caching
Related Work
Conclusion
