Moving Towards Pure ANSI SQL in NoSQL | National Tsing Hua University Theses and Dissertations Repository

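Author:	薛瑞夫 Sheriffo Ceesay
Thesis title:	Moving Towards Pure ANSI SQL in NoSQL
Advisor:	鍾葉清
Oral defense committee:	徐慰中, 李哲榮
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication:	2013
Academic year of graduation:	101
Language:	English
Number of pages:	60
Keywords (translated from Chinese):	cloud computing, software framework, database, database query language
Keywords (English):	Hadoop, MapReduce, NoSQL, Hive, HBase, ANSI SQL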

Moving Towards Pure ANSI SQL in NoSQL
The main focus of this master's thesis is to narrow the usability gap between
the newer, more distributed data processing platforms (HBase, Cassandra,
MapReduce, etc.) and the traditional, less distributed data processing
platforms (e.g., RDBMSs).
A lot of work has been done in this area, e.g., Hive and Pig, but these tools
are not pure SQL.
Over the past few decades, RDBMSs and data warehouses were the only choice
of data processing platform, offering a rich set of data processing tools such
as SQL. Recently, however, the variety, velocity, and volume of data have made
these traditional platforms less efficient at handling such data, hence the
need for more efficient data stores and processing platforms.
Although NoSQL data stores have lived up to expectations for storing and
processing large datasets, the process may not be as simple and convenient
as in traditional databases. One common drawback of NoSQL databases is the
lack of the much-loved SQL language.
This thesis therefore focuses on this new type of data store, known as
NoSQL. Specifically, we focus on HBase, a column-oriented, BigTable-like
database, as our NoSQL store of choice.
Because NoSQL databases are becoming very popular, we propose data mapping
methods that make migration from relational databases to NoSQL databases
less daunting.
Since this migration starts from relational databases, which have a rich set
of procedures (i.e., SQL) to access and manipulate data, we extend our work to
bridge the gap between SQL and NoSQL by providing methods for using pure ANSI
SQL to manipulate the underlying data stored in our NoSQL store (HBase).

Introduction
1 Background Studies
1.1 Data distribution:
1.2 Data locality:
1.3 Key-Value pair orientation:
2 Related Work
3 Problem Definition
3.1 Size of Data
3.2 Heterogeneous Nature of Data
3.3 Chapter Summary
Data Mapping
1 Data Model
2 Transforming RDBMS into HBase
2.1 Schema Transformation
2.2 Rule of Thumb
3 Chapter Summary
Basic Implementation
1 Architecture
1.1 Life Cycle of MapReduce Query Execution
2 SELECT Statement General Structure
3 WHAT TO SELECT AND KEYWORDS
3.1 Table and Column Aliasing
3.2 ALL
3.3 Projection and Filtering (IN)
3.4 Projection and Filtering (LIKE)
3.5 DISTINCT
4 SUMMARY
Advanced Implementation
1 JackHare
1.1 JackHare and our Work
1.2 JackHare Architecture
2 Advance MapReduce to SQL Implementation
2.1 General Structure
2.2 ORDERING AND SORTING
2.3 Descending and Ascending Ordering:
2.4 Multi-Column Order By:
2.5 AGGREGATION
2.6 NESTED and SUB-QUERIES or COMPOSITION QUERIES
2.7 Model for Composite Queries
2.8 JOIN Implementation and Cross Join Optimization
2.9 SET OPERATIONS and it’s Optimization
3 Summary
Experiments
1 Setup A
2 Benchmark Results of Setup A
2.1 Effect of Data Size
3 Setup B
3.1 Benchmark Results of Setup B
4 Functional Benchmarking
5 Findings
Conclusion and Future Work

[1] David DeWitt and Jim Gray. Parallel Database Systems: The Future of High Performance Database Systems, ACM 1992.
[2] Fay Chang, Jeffrey Dean, et al. BigTable: A Distributed Storage System for Structured Data
[3] HBase: http://hbase.apache.org/
[4] NoSQL: http://nosql-database.org/
[5] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters
[6] Hadoop: http://hadoop.apache.org/
[7] JackHare: http://sourceforge.net/projects/jackhare/
[8] Chongxin Li. RDB to HBase: Transforming Relational Database into HBase: A Case Study
[9] Hung Bing, et al. Structured Data Processing: Structured Data Processing on MapReduce in NoSQL Databases
[10] Meng-Ju Hsieh, et al. SQLMR: A Scalable Database Management System for Cloud Computing
[11] GigaOM 2012 Facebook Data: http://gigaom.com/data/facebook-iscollecting-your-data-500-terabytes-a-day/
[12] Biswapesh, et al. Tenzing: A SQL Implementation on the MapReduce Framework
[13] Number of Column Family recommendations: http://hbase.apache.org/book/number.of.cfs.html
[14] JackHare: http://sourceforge.net/projects/jackhare/
[15] Hive: https://cwiki.apache.org/confluence/display/Hive/Home
[16] Apache Pig: http://pig.apache.org/

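Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)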
