簡易檢索 / 詳目顯示

研究生: 王閭威
Wang, Lyu Wei
論文名稱: BiFennel:針對大數據的快速二部圖劃分算法
BiFennel: Fast Bipartite Graph Partitioning Algorithm for Big Data
指導教授: 鍾葉青
Yeh Ching, Chung
口試委員: 許慶賢
陳世璋
李哲榮
蕭宏章
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊系統與應用研究所
Institute of Information Systems and Applications
論文出版年: 2015
畢業學年度: 103
語文別: 英文
論文頁數: 36
中文關鍵詞: 二部圖圖劃分圖計算PowerGraph
外文關鍵詞: Bipartite graph, Graph partitioning, Graph processing, PowerGraph
相關次數: 點閱:1下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 在雲端計算被廣泛應用的今日,社交網絡分析、生物信息網絡、語義分析類的諸多應用對於快速處理億萬節點圖的能力有著迫切的需求,這也是分佈式高性能計算領域的研發重點之一。而許多諸如音樂、電影網絡的推薦系統,LDA主題模型等問題都可以通過二部圖的建模、計算進行解決。而圖劃分作為圖計算預處理最重要的環節,雖然已經是一個很成熟的技術,然而許多經典算法因為需要迭代多次,時間複雜度過大,因此應用到大規模圖劃分時時間過長。近年來也湧現出一些速度較快的圖劃分算法,然而這些算法并沒有針對二部圖進行針對性的優化。本文提出的BiFennel算法是一種新二部圖劃分算法,通過降低節點複製率,平衡圖數據負載的方式,進而降低迭代圖計算部分時間以及網路負載。最後,本文將BiFennel算法應用于圖計算平台PowerGraph [10]。實驗證明,相比於目前最好的Aweto算法,使用BiFennel後,圖引擎整體執行時間最高降低1.69倍,網路負載減少最高達到2.32倍。另外,當圖規模增大或是集群中節點數增多時,BiFennel的性能並沒有降低。


    Cloud computing is widely utilized in today’s internet service, which severely
    requires the ability of processing graphs of billion vertices rapidly in the situation
    such as social network analyze, bio-informational network analyze and semantic
    processing. Therefore, graph processing has a significant role in the research and
    development of high-performance computing. However, many problems such as
    music and movie recommendation web, LDA topic model can be solved by modeling
    these data into bipartite graph and computing it with graph processing engines. As the
    most important step in preprocessing, graph partitioning is a relatively mature
    technology. However, most of classic graph partitioning algorithms require iterative
    calculation for several times, which causes huge time complexity especially adapting
    to big data. There are some rapid algorithms these years, which cannot be used in
    bipartite graph directly. Therefore, this thesis proposed a new bipartite graph
    partitioning algorithm, BiFennel, which effectively decrease graph processing time
    and network loading by reducing vertex replication factor and maintaining work
    balance. Afterwards, this thesis applies BiFennel to a popular graph engine called
    PowerGraph and proves it gets up to 1.69 times overall runtime speedup of graph
    processing and 2.32 times network load reduction comparing to the best competitor,
    Aweto. Besides, the algorithm doesn’t meet a degradation when graph scale increases,
    so does its scalability.

    List of Figures 4 List of Algorithm 5 Chapter 1 Introduction 6 Chapter 2 Related Work 11 2.1 Graph Engines 11 2.1.1 Pregel (Apache Giraph) 11 2.1.2 GraphX 13 2.1.3 PowerGraph 14 2.2 Graph Partitioning Algorithms 15 2.2.1 METIS 16 2.2.2 Fennel 18 2.2.3 BiCut 19 2.2.4 Aweto 21 Chapter 3 Proposed Algorithm and Implementation 22 3.1 BiFennel Algorithm 22 3.2 Sample of BiFennel 25 3.3 Implementation in PowerGraph 27 Chapter 4 Performance Evaluation 29 4.1 Environment Settings 29 4.2 Partitioning Performance on Different Graph Data 30 4.3 Performance on Different Algorithms 31 4.4 Scalability 33 Chapter 5 Conclusions and Future Work 34 REFERENCE 35

    1. Sergey B., Larry P. "The anatomy of a large-scale hypertextual web search engine. " Computer Networks and ISDN Systems, 30 (1998): 5-20.
    2. Yu, Gu. "Large scale graph data processing on cloud computing environments." Chinese Journal of Computers, 34.10 (2010): 1753-1765.
    3. Malewicz G., Austern M. H., et al. "Pregel: a system for large-scale graph processing. " In SIGMOD 10(2010): 135–146. 

    4. http://finance.chinanews.com/it/2014/01-16/5745005.shtml
    5. http://tech.qq.com/a/20140725/000288.htm
    6. Yucheng L., Bickson D., et al. "Distributed GraphLab: a framework for machine learning and data mining in the cloud." Proceedings of the VLDB Endowment 5.8 (2012): 716-727.
    7. Xin, Reynold S., et al. "Graphx: A resilient distributed graph system on spark." First International Workshop on Graph Data Management Experiences and Systems. ACM, 2013.
    8. Tsourakakis C. E., Gkantsidis C., et al. "Fennel: Streaming graph partitioning for massive scale graphs. " Tech. Rep. 175918, Microsoft, 2012.
    9. Spark. http://spark.apache.org
    10. Gonzalez, Joseph E., et al. "PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs." OSDI. Vol. 12. No. 1. 2012.
    11. Andreev, Konstantin, and Harald Racke. "Balanced graph partitioning." Theory of Computing Systems 39.6 (2006): 929-939.
    12. Chen, Rong, et al. "Bipartite-oriented distributed graph partitioning for big learning." Proceedings of 5th Asia-Pacific Workshop on Systems. ACM, 2014.
    13. Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
    14. Valiant, Leslie G. "A bridging model for parallel computation." Communications of the ACM 33.8 (1990): 103-111.
    15. BSP: https://en.wikipedia.org/wiki/Bulk_synchronous_parallel
    16. Apache Giraph: http://giraph.apache.org
    17. Xin, Reynold S., et al. "Graphx: A resilient distributed graph system on spark." First International Workshop on Graph Data Management Experiences and Systems. ACM, 2013.
    18. Kernighan, Brian W., and Shen Lin. "An efficient heuristic procedure for partitioning graphs." Bell system technical journal 49.2 (1970): 291-307.
    19. Bi, T., et al. "Efficient multiway graph partitioning method for fault section estimation in large-scale power networks." IEE Proceedings-Generation, Transmission and Distribution 149.3 (2002): 289-294.
    20. Karypis, George, and Vipin Kumar. "Metis-unstructured graph partitioning and sparse matrix ordering system, version 2.0." (1995).
    21. KONECT. http://konect.uni-koblenz.de/
    22. Bershad, Brian N., Edward D. Lazowska, and Henry M. Levy. "PRESTO: A system for object‐oriented parallel programming." Software: Practice and Experience 18.8 (1988): 713-732.

    無法下載圖示 全文公開日期 本全文未授權公開 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)

    QR CODE