
Graduate student: 柯恩 (Kenn Daryn Slagter)
Thesis title: Load Balancing and Data Placement in MapReduce
Thesis title (Chinese): MapReduce之負載平衡與資料放置策略
Advisor: 鍾葉青 (Professor Yeh-Ching Chung)
Oral defense committee:
許慶賢 (Professor Ching-Hsien (Robert) Hsu)
張西亞 (Dr Hsi-Ya Chang)
李哲榮 (Professor Che-Rung Lee)
周志遠 (Professor Jerry Chou)
陳宜欣 (Professor Yi-Shin Chen)
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 111
Keywords: MapReduce, Big Data, Cloud Computing, Distributed Computing, Heterogeneity, Cluster, Runtime Optimization, Accelerating Data Intensive Computation, Data Center, Resource Management

    In the era of Big Data, huge amounts of structured and unstructured data are being produced daily by a myriad of ubiquitous sources. Big Data is difficult to work with and requires massively parallel software running on a large number of computers. MapReduce is a recent programming model that simplifies writing distributed applications that handle Big Data. MapReduce does this by dividing its workload amongst computers in a network and then processing the work in parallel.
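The divide-and-process-in-parallel idea described above can be illustrated with a minimal, single-process sketch. This is only a toy model of the programming model, not Hadoop or any real MapReduce runtime; the names `map_reduce`, `word_mapper`, and `count_reducer` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer, num_partitions=2):
    """Toy MapReduce: map each record, shuffle by key, then reduce per key."""
    # One bucket per simulated reducer; each bucket groups values by key.
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for record in records:
        for key, value in mapper(record):
            # Shuffle step: a hash partitioner routes each key to one reducer.
            partitions[hash(key) % num_partitions][key].append(value)
    # Reduce step: in a real cluster each partition would run on its own node.
    result = {}
    for partition in partitions:
        for key, values in partition.items():
            result[key] = reducer(key, values)
    return result


def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.split():
        yield word.lower(), 1


def count_reducer(key, values):
    """Sum the counts collected for one word."""
    return sum(values)


counts = map_reduce(["big data is big", "data is everywhere"],
                    word_mapper, count_reducer)
```

Because the hash partitioner ignores how many values each key carries, a few very frequent keys can overload one reducer, which is the kind of imbalance the dissertation addresses.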
    Often intertwined with the concept of MapReduce is Cloud Computing, a method of computing that shares computing resources over the Internet. Cloud Computing is a perfect match for Big Data and the MapReduce model because it provides a hardware platform and framework that is flexible and elastic, together with the resources needed for the data-intensive computation that MapReduce performs.
    This dissertation looks at resource utilization within a cluster or data center and focuses on load balancing and data placement in MapReduce. Load balancing is a method of distributing workload across multiple computers or a computer cluster, and it involves central processing units, disk drives, and other resources. The purpose of load balancing is to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Data placement is closely related to load balancing and refers to where data resides in the network during the lifetime of a job.
    In this dissertation, we investigate load balancing and data placement for MapReduce and how they apply in different situations. We then propose several algorithms that can improve load balancing for MapReduce. First, we introduce an improved sampling method for total order partitioning. Second, we present a multiway join algorithm that uses dynamic data redistribution. Finally, we show how data can be distributed more intelligently within a heterogeneous cloud environment.
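For context, conventional sample-based total order partitioning can be sketched as follows: sort a sample of keys, pick split points at evenly spaced positions, and route each key to a reducer by binary search. This is a minimal sketch of the baseline idea only, not the improved method the dissertation proposes; the function names and the synthetic integer keys are assumptions made for illustration.

```python
import bisect
import random


def choose_split_points(sample, num_reducers):
    """Pick num_reducers - 1 split points at evenly spaced sample positions."""
    ordered = sorted(sample)  # the whole sample is held in memory here
    step = len(ordered) / num_reducers
    return [ordered[int(step * i)] for i in range(1, num_reducers)]


def partition_for(key, split_points):
    """Return the reducer index for a key; larger keys never get a smaller index."""
    return bisect.bisect_right(split_points, key)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
keys = [rng.randrange(1000) for _ in range(10000)]
sample = rng.sample(keys, 200)
splits = choose_split_points(sample, num_reducers=4)

# Count how many keys each of the four reducers would receive.
loads = [0] * 4
for key in keys:
    loads[partition_for(key, splits)] += 1
```

Since reducer `i` only receives keys no larger than those sent to reducer `i + 1`, concatenating the sorted reducer outputs yields a globally sorted result. The quality of the balance in `loads` depends on how representative the sample is.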
    In our first topic, we propose an improved sampling and partitioning method for strings and present the XTrie, ETrie, and ATrie algorithms. The XTrie algorithm uses a fixed memory footprint, unlike the traditional total order partitioning method, which stores every element of the sample set in memory. Furthermore, we show that XTrie performs better, executing 7 times faster on 200,000 samples. The ETrie and ATrie algorithms further reduce the memory requirements of XTrie. ETrie reduces memory consumption to 1/16 of that used by XTrie. Finally, ATrie reduces memory consumption to 1/16384 of that used by XTrie by using an adaptive method on birthdate data and a 2-level trie.
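The abstract does not spell out how these algorithms work, so the following is only a loose illustration of why a prefix-based structure can have a fixed memory footprint: it counts samples in a table indexed by the first two bytes of each key, so memory depends on the prefix depth rather than on the number of samples. It should not be read as the XTrie, ETrie, or ATrie algorithm itself; all names and the two-byte depth are assumptions.

```python
def prefix_code(key, depth=2):
    """Map the first `depth` bytes of a string key to a table index."""
    data = key.encode("utf-8")[:depth].ljust(depth, b"\x00")
    code = 0
    for byte in data:
        code = code * 256 + byte
    return code


def build_prefix_counts(sample, depth=2):
    """Count sampled keys per prefix; table size is 256**depth for any sample size."""
    counts = [0] * (256 ** depth)
    for key in sample:
        counts[prefix_code(key, depth)] += 1
    return counts


def assign_partitions(counts, num_reducers):
    """Give each prefix a reducer index based on the share of samples before it."""
    total = sum(counts)
    table = [0] * len(counts)
    seen = 0
    for code, count in enumerate(counts):
        if total:
            table[code] = min(num_reducers - 1, seen * num_reducers // total)
        seen += count
    return table


sample = ["apple", "apricot", "ant", "banana", "cat", "cherry", "dog", "zebra"]
counts = build_prefix_counts(sample)
table = assign_partitions(counts, num_reducers=4)
partitions = {key: table[prefix_code(key)] for key in sample}
```

Prefixes are visited in ascending order, so the reducer index never decreases as keys grow, which preserves the total order across reducers. Keys that share a prefix always land on the same reducer, which is the price of the coarse, fixed-size table.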
    In our second topic, we propose SmartJoin, a network-aware multiway join for MapReduce that improves performance by considering network traffic when redistributing workload amongst reducers. We show that this can reduce the time required to join multiple datasets. In our evaluation, SmartJoin achieves up to a 39% improvement over the non-redistribution method, a 26.8% improvement over random redistribution, and a 27.6% improvement over WorstJoin redistribution.
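To make the notion of network-aware redistribution concrete, the sketch below greedily assigns each data partition to the reducer with the lowest combined transfer cost and current load. This is a hypothetical heuristic written for illustration and is not the SmartJoin algorithm; the cost model, the `load_weight` parameter, and the node names are all assumptions.

```python
def transfer_cost(fragments, reducer, link_cost):
    """Cost of moving every fragment of one partition to the given reducer."""
    return sum(size * link_cost[node][reducer] for node, size in fragments.items())


def assign_reducers(partitions, nodes, link_cost, load_weight=1.0):
    """Greedy assignment: largest partitions first, cheapest reducer wins."""
    load = {node: 0 for node in nodes}
    assignment = {}
    by_size = sorted(partitions.items(), key=lambda item: -sum(item[1].values()))
    for pid, fragments in by_size:
        best = min(
            nodes,
            key=lambda r: transfer_cost(fragments, r, link_cost) + load_weight * load[r],
        )
        assignment[pid] = best
        load[best] += sum(fragments.values())
    return assignment, load


nodes = ["n1", "n2", "n3"]
# Assumed per-unit link costs: 0 locally, 1 within a rack (n1, n2), 4 across racks.
link_cost = {
    "n1": {"n1": 0, "n2": 1, "n3": 4},
    "n2": {"n1": 1, "n2": 0, "n3": 4},
    "n3": {"n1": 4, "n2": 4, "n3": 0},
}
# For each partition, how much data currently sits on each node.
partitions = {
    "p0": {"n1": 100, "n2": 10},
    "p1": {"n3": 80},
    "p2": {"n1": 50, "n3": 50},
}
assignment, load = assign_reducers(partitions, nodes, link_cost)
```

Raising `load_weight` favors evenly loaded reducers, while lowering it favors keeping data where it already resides, which reflects the trade-off between load balance and network traffic.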
    Finally, in our third topic, we propose a dynamic data partitioner and virtual machine mapper for MapReduce deployed on a heterogeneous cloud environment. Simulation and experimental results show an improvement in MapReduce performance, with data locality improved by 33% and total completion time reduced by 41%. Furthermore, using the Load Aware Virtual Machine Mapper yields an additional 13% improvement in reduce phase completion time.
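One simple way to see why data placement matters on heterogeneous nodes is to split input blocks in proportion to each node's processing speed and compare the estimated finish time with an even split. The sketch below does this with a largest-remainder rounding step. It is an illustrative assumption, not the dynamic data partitioner or the virtual machine mapper proposed in the dissertation, and the node names and speeds are made up.

```python
def proportional_split(num_blocks, speeds):
    """Split blocks across nodes in proportion to their speeds (largest remainder)."""
    total_speed = sum(speeds.values())
    exact = {node: num_blocks * s / total_speed for node, s in speeds.items()}
    shares = {node: int(x) for node, x in exact.items()}
    leftover = num_blocks - sum(shares.values())
    # Hand the remaining blocks to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


def makespan(shares, speeds):
    """Estimated completion time: the slowest node determines when the job ends."""
    return max(shares[node] / speeds[node] for node in shares)


speeds = {"fast": 4.0, "medium": 2.0, "slow": 1.0}  # blocks per time unit (assumed)
proportional = proportional_split(70, speeds)
even = {"fast": 24, "medium": 23, "slow": 23}

proportional_time = makespan(proportional, speeds)
even_time = makespan(even, speeds)
```

With these assumed speeds, the proportional split lets all three nodes finish at about the same time, whereas the even split leaves the fast node idle while the slow node works through its share.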

    Abstract (Chinese)
    Abstract
    Acknowledgments
    Contents
    List of Figures
    List of Tables
    List of Algorithms
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Organization of the Dissertation
    Chapter 2 Preliminaries
      2.1 MapReduce
      2.2 Cloud Computing
      2.3 An Introspective of Covered Topics
    Chapter 3 Sampling and Partitioning
      3.1 Background
        3.1.1 HashCodes
        3.1.2 TeraSort
      3.2 Proposed Techniques and Optimizations
        3.2.1 XTrie
        3.2.2 ETrie
        3.2.3 ATrie
      3.3 Evaluation
        3.3.1 XTrie and ETrie
        3.3.2 ATrie
      3.4 Chapter Summary
    Chapter 4 SmartJoin: A Multiway Join For MapReduce
      4.1 Background
        4.1.1 Network Model
        4.1.2 Join Algorithms
          4.1.2.1 Reduce-side Join
          4.1.2.2 Map-side Join
          4.1.2.3 Hash Join
        4.1.3 Smart Join
      4.2 Evaluation
        4.2.1 Experiment Configuration
        4.2.2 Experiment Results
      4.3 Chapter Summary
    Chapter 5 Virtual Machine Mapper
      5.1 Proposed Techniques and Optimizations
        5.1.1 Research Model
        5.1.2 Dynamic Data Partitioning
        5.1.3 Virtual Machine Mapping
          5.1.3.1 Priority Based Virtual Machine Mapping
          5.1.3.2 Load Aware Virtual Machine Mapping
      5.2 Evaluation
        5.2.1 Data Locality
        5.2.2 Task Execution Time
      5.3 Chapter Summary
    Chapter 6 Related Work
      6.1 Sampling and Partitioning
      6.2 Smart Join
      6.3 Virtual Machine Mapping
    Chapter 7 Conclusions and Future Work
    Bibliography
    VITA

