Author: | 陳怡仁 Chen, Yi Ren |
---|---|
Thesis title: | G-Storm: GPU-Aware Scheduling in Storm (G-Storm: 具 GPU 感知之 Storm 規劃方法) |
Advisor: | 李哲榮 Lee, Che Rung |
Oral defense committee: | 周志遠, 蕭宏章 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2015 |
Academic year of graduation: | 103 |
Language: | English |
Number of pages: | 26 |
Keywords (Chinese): | 大數據, 串流處理, GPU, Storm |
Keywords (English): | big data, stream process, GPU, Storm |
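We are now moving into the era of the data economy, in which the ability to analyze large volumes of data effectively is the key to success. Many systems for processing massive data have been developed, and among them Storm is designed for processing data streams. By default, Storm uses only a fairly simple round-robin policy to schedule work. This policy performs well on homogeneous platforms, but it cannot make effective use of resources in heterogeneous environments.
In this thesis, we design and implement G-Storm, a new scheduling algorithm for Storm that lets Storm effectively evaluate and use GPU cards to accelerate computation. Our experiments show that G-Storm delivers 1.65x the performance of Storm's default scheduler under light workloads, and a speedup of nearly 2.04x under heavy workloads.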
We are shifting toward a data-driven economy, in which the ability to analyze huge amounts of data efficiently and in time is the key to success. Many systems for big data processing have been developed. Storm is one of them, and it targets stream data processing. By default, Storm provides only a very simple round-robin scheduling policy to assign tasks. The default scheduler can provide good performance on homogeneous platforms, but it does not work well in heterogeneous computing environments.
In this thesis, we propose and implement a new Storm scheduling algorithm, named G-Storm, that lets Storm evaluate GPU capacity when scheduling and make more effective use of GPUs to speed up overall performance. The experimental results show that, compared to Storm with the default scheduler, G-Storm achieves speedups of 1.65x on lightly loaded topologies and 2.04x on heavily loaded topologies.