
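Graduate student: 劉立恆
Liou, Li-Heng
Thesis title: 近乎線性時間之社群偵測與分群演算法
Nearly-Linear Time Algorithms for Community Detection and Clustering
Advisor: 張正尚
Chang, Cheng-Shang
Oral defense committee: 李端興
Lee, Duan-Shin
林華君
Lin, Hwa-Chun
連卿閔
Lien, Ching-Min
陳震宇
Chen, Cheng-Yu
Degree: 博士
Doctor
Department: College of Electrical Engineering and Computer Science - Institute of Communications Engineering
Communications Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 116
Keywords (Chinese): 社群偵測, 分群, 網路科學, 線性時間演算法
Keywords (English): community detection, clustering, network science, linear time algorithm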
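  • Community detection and clustering, two closely related problems, receive considerable attention in network analysis. As network data grow rapidly in size, the efficiency and scalability of community detection and clustering algorithms become increasingly important. In this thesis, we provide several methods for community detection and clustering on diverse, large-scale network data.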

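    This thesis has three parts. The first part has two goals: (i) to define graph clustering and community detection in directed networks formally and precisely, and (ii) to design and analyze algorithms for directed networks. To this end, we develop a probabilistic framework for directed networks that builds on our earlier framework for undirected networks. Even after relaxing the requirement on the bivariate distribution from "symmetric" to merely "having the same marginal distributions", we can still formally define centrality, relative centrality, community, and modularity in directed networks. Using modularity-preserving and sparsity-preserving transformations, we also extend several community detection algorithms common in undirected networks to directed networks, namely the hierarchical agglomerative algorithm, the partitional algorithm, and the fast unfolding algorithm. The probabilistic framework shows that all three algorithms converge in a finite number of steps. In particular, the partitional algorithm is proved to be a nearly-linear time algorithm, and the outputs of the hierarchical agglomerative algorithm and the fast unfolding algorithm are guaranteed to satisfy the mathematical definition of a community. With minor modifications, these algorithms extend to more general bivariate distributions. We run experiments with bivariate distributions obtained by two methods: PageRank and random walks with backward jumps.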

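    In the second part, we propose a new iterative algorithm, K-sets+, for clustering data in a semi-metric space, where the triangle inequality need not hold. We prove that K-sets+ converges in a finite number of steps and has the same performance as K-sets. We further extend K-sets+ to similarity measures that are only symmetric. This extension greatly reduces the computational complexity: when the similarity matrix is sparse, the algorithm has linear time and memory complexity. We also run experiments to verify the efficiency of K-sets+, using data from the stochastic block model and from the WonderNetwork website.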

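    In the third part, we describe in detail how to build a linear-time fast unfolding algorithm. Because time and memory complexity depend heavily on the data structures, we introduce three data structures that are essential to a linear-time fast unfolding algorithm: the adjacency list, disjoint sets, and the array set. The adjacency list is widely used to store sparse network topologies, while disjoint sets and the array set are data structures we developed to avoid superlinear operations such as sorting or inserting elements into a hash tree (or binary tree). We also verify the efficiency and scalability of our implementation experimentally. In that experiment, our method is 3.6 times as fast as the competitor and handles networks with up to one billion links.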


    Community detection and clustering are two closely related problems that have drawn much attention in network analysis. With the rapid growth of networked data, the efficiency and scalability of community detection and clustering algorithms have become increasingly important. In this thesis, we provide several efficient methods for community detection and clustering that can handle diverse and large-scale data.

    The thesis is organized into three parts. In the first part, we address two major points: (i) a formal and precise definition of the graph clustering and community detection problem in directed networks, and (ii) the design and evaluation of community detection algorithms in directed networks. To address these points, we develop a probabilistic framework for structural analysis and community detection in directed networks, building on our previous work on undirected networks. By relaxing the assumption of symmetric bivariate distributions in that work to bivariate distributions that have the same marginal distributions, we can still formally define various notions for structural analysis in directed networks, including centrality, relative centrality, community, and modularity. We also extend three community detection algorithms commonly used in undirected networks to directed networks: the hierarchical agglomerative algorithm, the partitional algorithm, and the fast unfolding algorithm. These extensions are made possible by two modularity-preserving and sparsity-preserving transformations. Using the probabilistic framework, we show that all three algorithms converge in a finite number of steps. In particular, we show that the partitional algorithm is a nearly-linear time algorithm for large sparse graphs. Moreover, the outputs of the hierarchical agglomerative algorithm and the fast unfolding algorithm are guaranteed to be communities. With minor modifications, the three algorithms also extend to general bivariate distributions. We conduct experiments with two sampling methods in directed networks: (i) PageRank and (ii) random walks with self-loops and backward jumps.
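The sampling idea in the first part can be illustrated with a small sketch. It assumes that PageRank sampling takes p(u, v) = pi(u) P(u, v), where P is the PageRank transition matrix and pi its stationary distribution, and that modularity is the sum over communities of p(u, v) - p_U(u) p_V(v). The abstract does not state these formulas, so they are assumptions here, as are the function names, the damping factor 0.85, and the toy graph. Because pi is stationary, both marginals of p equal pi, which is the "same marginal distributions" condition even though p itself is not symmetric.

```python
def pagerank_chain(adj, damping=0.85):
    """Row-stochastic PageRank transition matrix for a directed graph.

    adj[u][v] = 1 if there is an edge u -> v. Dangling nodes jump uniformly.
    """
    n = len(adj)
    P = []
    for u in range(n):
        out = sum(adj[u])
        row = []
        for v in range(n):
            walk = adj[u][v] / out if out else 1.0 / n
            row.append(damping * walk + (1.0 - damping) / n)
        P.append(row)
    return P


def stationary(P, iters=500):
    """Power iteration for the stationary distribution pi = pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[u] * P[u][v] for u in range(n)) for v in range(n)]
    return pi


def bivariate(P, pi):
    """Assumed sampling distribution p(u, v) = pi[u] * P[u][v]."""
    n = len(P)
    return [[pi[u] * P[u][v] for v in range(n)] for u in range(n)]


def modularity(p, communities):
    """Sum over communities of p(u, v) - p_U(u) * p_V(v) for u, v inside."""
    n = len(p)
    row = [sum(p[u]) for u in range(n)]                       # marginal of U
    col = [sum(p[u][v] for u in range(n)) for v in range(n)]  # marginal of V
    return sum(p[u][v] - row[u] * col[v]
               for c in communities for u in c for v in c)


# Toy graph (made up for illustration): two directed 3-cycles
# joined by the edge pair 2 -> 3 and 3 -> 2.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3), (3, 2)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = 1

P = pagerank_chain(adj)
pi = stationary(P)
p = bivariate(P, pi)
q_good = modularity(p, [[0, 1, 2], [3, 4, 5]])    # partition along the cycles
q_bad = modularity(p, [[0, 3], [1, 4], [2, 5]])   # partition across the cycles
```

Under these assumptions, the partition that follows the two cycles scores a positive modularity, while the partition that cuts across them scores a negative one. The dense n-by-n lists keep the sketch short; a sparse representation would be needed at the scales the thesis targets.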

    In the second part of this thesis, we first propose a new iterative algorithm, called the K-sets+ algorithm, for clustering data points in a semi-metric space, where the distance measure does not necessarily satisfy the triangle inequality. We show that the K-sets+ algorithm converges in a finite number of iterations and retains the same performance guarantee as the K-sets algorithm for clustering data points in a metric space. We then extend the K-sets+ algorithm from data points in a semi-metric space to data points that only have a symmetric similarity measure. This extension greatly reduces the computational complexity. In particular, for an n × n similarity matrix with m nonzero elements, the computational complexity of the K-sets+ algorithm is O((Kn+m)I), where I is the number of iterations, and the memory needed to achieve that complexity is O(Kn+m). As such, both the computational complexity and the memory complexity are linear in n when the n × n similarity matrix is sparse, i.e., m=O(n). We also conduct experiments to show the effectiveness of the K-sets+ algorithm, using a synthetic dataset from the stochastic block model and a real network from the WonderNetwork website.
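To make the O(Kn+m) figure concrete, the sketch below shows one way a per-iteration cost of that order can arise. It is not the K-sets+ update rule, which this record does not spell out. It only accumulates, for every point x and cluster k, the total similarity between x and the points of cluster k, using one pass over the nonzero entries. The function name, the triple-list input format, and the example data are assumptions made for illustration.

```python
def point_to_cluster_similarity(n, K, entries, label):
    """Return table[k][x] = sum of gamma(x, y) over y in cluster k.

    entries: list of (x, y, gamma) for the nonzero similarities, with each
             unordered pair listed once (gamma is assumed symmetric).
    label:   label[x] is the current cluster index of point x.
    """
    table = [[0.0] * n for _ in range(K)]   # O(Kn) memory and setup time
    for x, y, g in entries:                 # O(m) time, one pass
        table[label[y]][x] += g             # y contributes to x's row entry
        if x != y:
            table[label[x]][y] += g         # symmetric contribution
    return table


# Four points, two clusters, three nonzero similarities (toy data).
entries = [(0, 1, 1.0), (2, 3, 1.0), (1, 2, -0.5)]
label = [0, 0, 1, 1]
table = point_to_cluster_similarity(4, 2, entries, label)
```

Setting up the K × n table costs O(Kn) and the pass over the nonzero entries costs O(m), so repeating such a step for I iterations is consistent with the O((Kn+m)I) bound stated above.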

    In the third part of this thesis, we detail an implementation of the fast unfolding algorithm that has nearly-linear time complexity and linear memory complexity. Since the time and memory complexity depend heavily on the data structures, we introduce three essential data structures for the nearly-linear time implementation: (i) adjacency list, (ii) disjoint sets, and (iii) array set. The adjacency list is a commonly used, memory-efficient data structure for storing sparse networks. The disjoint sets and the array set are newly invented data structures that allow us to avoid superlinear operations such as sorting and insertion into a hash (or binary) tree. We also run an experiment to test the efficiency and scalability of our implementation of the fast unfolding algorithm. With these data structures and techniques, our implementation is 3.6 times faster than the competitor and can cope with networks with one billion edges.
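The following sketch illustrates the kind of sorting-free bookkeeping the third part refers to. The thesis's own disjoint sets and array set designs are not described in this record, so the `SparseAccumulator` class below is a generic stand-in with similar goals, not the author's data structure. The packed adjacency arrays, all names, and the toy graph are likewise assumptions. The graph is assumed to be undirected with no self-loops.

```python
def build_adjacency(n, edges):
    """Pack an undirected weighted edge list into offset/neighbor arrays.

    Uses a counting pass instead of sorting, so the cost is O(n + m).
    """
    degree = [0] * n
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    offset = [0] * (n + 1)
    for u in range(n):
        offset[u + 1] = offset[u] + degree[u]
    fill = offset[:n]                # next free slot of each node (a copy)
    nbr = [0] * offset[n]
    wgt = [0.0] * offset[n]
    for u, v, w in edges:
        for a, b in ((u, v), (v, u)):
            nbr[fill[a]] = b
            wgt[fill[a]] = w
            fill[a] += 1
    return offset, nbr, wgt


class SparseAccumulator:
    """Sums indexed by keys in 0..n-1, with O(1) add and O(size) clear."""

    def __init__(self, n):
        self.value = [0.0] * n
        self.present = [False] * n
        self.keys = []               # keys touched since the last clear

    def add(self, key, amount):
        if not self.present[key]:
            self.present[key] = True
            self.keys.append(key)
        self.value[key] += amount

    def clear(self):
        for key in self.keys:        # reset only the touched entries
            self.value[key] = 0.0
            self.present[key] = False
        self.keys.clear()


def neighbor_community_weights(u, offset, nbr, wgt, community, acc):
    """Total edge weight from node u to each neighboring community."""
    acc.clear()
    for i in range(offset[u], offset[u + 1]):
        acc.add(community[nbr[i]], wgt[i])
    return [(c, acc.value[c]) for c in acc.keys]


# Toy graph: 4 nodes, weighted edges, and a community label per node.
edges = [(0, 1, 1.0), (0, 2, 2.0), (0, 3, 1.0), (1, 2, 1.0)]
community = [0, 1, 1, 2]
offset, nbr, wgt = build_adjacency(4, edges)
acc = SparseAccumulator(4)
w0 = neighbor_community_weights(0, offset, nbr, wgt, community, acc)
w1 = neighbor_community_weights(1, offset, nbr, wgt, community, acc)
```

Each call touches only the edges of node u and resets only the entries it used, so a full sweep over all nodes costs time proportional to the number of edges, with no sorting and no tree or hash insertion.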

    1 Introduction
      1.1 Community detection
      1.2 Clustering
      1.3 Overview
    2 A Probabilistic Framework for Structural Analysis and Community Detection in Directed Networks
      2.1 Sampling networks by bivariate distributions with the same marginal distributions
        2.1.1 PageRank
        2.1.2 Random walks with self-loops and backward jumps
      2.2 The framework for directed networks
        2.2.1 Centrality and relative centrality
        2.2.2 Community strength and communities
        2.2.3 Modularity and modularity preserving transformations
      2.3 Community detection
        2.3.1 A hierarchical agglomerative algorithm
        2.3.2 A partitional algorithm
        2.3.3 A fast unfolding algorithm
        2.3.4 Weak communities and outliers
      2.4 General bivariate distribution
      2.5 Experimental results
        2.5.1 The hierarchical agglomerative algorithm
        2.5.2 The fast unfolding algorithm
      2.6 Conclusion
    3 K-sets+: A Linear-time Clustering Algorithm for Data Points with a Sparse Similarity Measure
      3.1 Clustering in a semi-metric space
        3.1.1 Semi-cohesion measure
        3.1.2 Clusters in a semi-metric space
        3.1.3 The K-sets+ algorithm
      3.2 Beyond semi-metric spaces
        3.2.1 Clustering with a symmetric similarity measure
        3.2.2 Computational complexity
      3.3 Experiments
        3.3.1 Community detection of signed networks with two communities
        3.3.2 Clustering of a real network
      3.4 Conclusion
    4 A Nearly-linear Time Implementation of the Fast Unfolding Algorithm
      4.1 Data structures
        4.1.1 Adjacency list
        4.1.2 Disjoint sets
        4.1.3 Array set
        4.1.4 Summary of data structures
      4.2 The implementation details of the fast unfolding algorithm
        4.2.1 The partitional algorithm
        4.2.2 The node aggregation algorithm
        4.2.3 The fast unfolding algorithm
      4.3 Experiments
    5 Conclusions and future studies
      5.1 Parallelization
      5.2 Fuzzy modularity
    6 Appendix
      6.1 Appendix of Chapter 2
        6.1.1 Appendix A
        6.1.2 Appendix B
        6.1.3 Appendix C
        6.1.4 Appendix D
        6.1.5 Appendix E
        6.1.6 Appendix F
        6.1.7 Appendix G
      6.2 Appendix of Chapter 3
        6.2.1 Appendix A
        6.2.2 Appendix B
        6.2.3 Appendix C
      6.3 Appendix of Chapter 4
        6.3.1 Appendix A: Basic operations of the adjacency list
        6.3.2 Appendix B: Basic operations of the disjoint sets
        6.3.3 Appendix C: Basic operations of the array set
    References

