研究生: |
王閭威 Wang, Lyu Wei |
---|---|
論文名稱: |
BiFennel:針對大數據的快速二部圖劃分算法 BiFennel: Fast Bipartite Graph Partitioning Algorithm for Big Data |
指導教授: |
鍾葉青
Yeh Ching, Chung |
口試委員: |
許慶賢
陳世璋 李哲榮 蕭宏章 |
學位類別: |
碩士 Master |
系所名稱: |
電機資訊學院 - 資訊系統與應用研究所 Institute of Information Systems and Applications |
論文出版年: | 2015 |
畢業學年度: | 103 |
語文別: | 英文 |
論文頁數: | 36 |
中文關鍵詞: | 二部圖 、圖劃分 、圖計算 、PowerGraph |
外文關鍵詞: | Bipartite graph, Graph partitioning, Graph processing, PowerGraph |
相關次數: | 點閱:1 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
在雲端計算被廣泛應用的今日,社交網絡分析、生物信息網絡、語義分析類的諸多應用對於快速處理億萬節點圖的能力有著迫切的需求,這也是分佈式高性能計算領域的研發重點之一。而許多諸如音樂、電影網絡的推薦系統,LDA主題模型等問題都可以通過二部圖的建模、計算進行解決。而圖劃分作為圖計算預處理最重要的環節,雖然已經是一個很成熟的技術,然而許多經典算法因為需要迭代多次,時間複雜度過大,因此應用到大規模圖劃分時時間過長。近年來也湧現出一些速度較快的圖劃分算法,然而這些算法并沒有針對二部圖進行針對性的優化。本文提出的BiFennel算法是一種新二部圖劃分算法,通過降低節點複製率,平衡圖數據負載的方式,進而降低迭代圖計算部分時間以及網路負載。最後,本文將BiFennel算法應用于圖計算平台PowerGraph [10]。實驗證明,相比於目前最好的Aweto算法,使用BiFennel後,圖引擎整體執行時間最高降低1.69倍,網路負載減少最高達到2.32倍。另外,當圖規模增大或是集群中節點數增多時,BiFennel的性能並沒有降低。
Cloud computing is widely utilized in today’s internet service, which severely
requires the ability of processing graphs of billion vertices rapidly in the situation
such as social network analyze, bio-informational network analyze and semantic
processing. Therefore, graph processing has a significant role in the research and
development of high-performance computing. However, many problems such as
music and movie recommendation web, LDA topic model can be solved by modeling
these data into bipartite graph and computing it with graph processing engines. As the
most important step in preprocessing, graph partitioning is a relatively mature
technology. However, most of classic graph partitioning algorithms require iterative
calculation for several times, which causes huge time complexity especially adapting
to big data. There are some rapid algorithms these years, which cannot be used in
bipartite graph directly. Therefore, this thesis proposed a new bipartite graph
partitioning algorithm, BiFennel, which effectively decrease graph processing time
and network loading by reducing vertex replication factor and maintaining work
balance. Afterwards, this thesis applies BiFennel to a popular graph engine called
PowerGraph and proves it gets up to 1.69 times overall runtime speedup of graph
processing and 2.32 times network load reduction comparing to the best competitor,
Aweto. Besides, the algorithm doesn’t meet a degradation when graph scale increases,
so does its scalability.
1. Sergey B., Larry P. "The anatomy of a large-scale hypertextual web search engine. " Computer Networks and ISDN Systems, 30 (1998): 5-20.
2. Yu, Gu. "Large scale graph data processing on cloud computing environments." Chinese Journal of Computers, 34.10 (2010): 1753-1765.
3. Malewicz G., Austern M. H., et al. "Pregel: a system for large-scale graph processing. " In SIGMOD 10(2010): 135–146.
4. http://finance.chinanews.com/it/2014/01-16/5745005.shtml
5. http://tech.qq.com/a/20140725/000288.htm
6. Yucheng L., Bickson D., et al. "Distributed GraphLab: a framework for machine learning and data mining in the cloud." Proceedings of the VLDB Endowment 5.8 (2012): 716-727.
7. Xin, Reynold S., et al. "Graphx: A resilient distributed graph system on spark." First International Workshop on Graph Data Management Experiences and Systems. ACM, 2013.
8. Tsourakakis C. E., Gkantsidis C., et al. "Fennel: Streaming graph partitioning for massive scale graphs. " Tech. Rep. 175918, Microsoft, 2012.
9. Spark. http://spark.apache.org
10. Gonzalez, Joseph E., et al. "PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs." OSDI. Vol. 12. No. 1. 2012.
11. Andreev, Konstantin, and Harald Racke. "Balanced graph partitioning." Theory of Computing Systems 39.6 (2006): 929-939.
12. Chen, Rong, et al. "Bipartite-oriented distributed graph partitioning for big learning." Proceedings of 5th Asia-Pacific Workshop on Systems. ACM, 2014.
13. Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
14. Valiant, Leslie G. "A bridging model for parallel computation." Communications of the ACM 33.8 (1990): 103-111.
15. BSP: https://en.wikipedia.org/wiki/Bulk_synchronous_parallel
16. Apache Giraph: http://giraph.apache.org
17. Xin, Reynold S., et al. "Graphx: A resilient distributed graph system on spark." First International Workshop on Graph Data Management Experiences and Systems. ACM, 2013.
18. Kernighan, Brian W., and Shen Lin. "An efficient heuristic procedure for partitioning graphs." Bell system technical journal 49.2 (1970): 291-307.
19. Bi, T., et al. "Efficient multiway graph partitioning method for fault section estimation in large-scale power networks." IEE Proceedings-Generation, Transmission and Distribution 149.3 (2002): 289-294.
20. Karypis, George, and Vipin Kumar. "Metis-unstructured graph partitioning and sparse matrix ordering system, version 2.0." (1995).
21. KONECT. http://konect.uni-koblenz.de/
22. Bershad, Brian N., Edward D. Lazowska, and Henry M. Levy. "PRESTO: A system for object‐oriented parallel programming." Software: Practice and Experience 18.8 (1988): 713-732.