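Graduate student: | 薛瑞夫 Sheriffo Ceesay |
---|---|
Thesis title: | Moving Towards Pure ANSI SQL in NoSQL |
Advisor: | 鍾葉清 |
Oral examination committee: | 徐慰中, 李哲榮 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2013 |
Academic year of graduation: | 101 |
Language: | English |
Number of pages: | 60 |
Keywords (Chinese): | cloud computing, software framework, database, database query language |
Keywords (English): | Hadoop, MapReduce, NoSQL, Hive, HBase, ANSI SQL |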
Moving Towards Pure ANSI SQL in NoSQL
The main focus of this master's thesis is to narrow the usability gap between
the newer, more distributed data processing platforms (HBase, Cassandra,
MapReduce, etc.) and the traditional, less distributed data processing platforms
(e.g. RDBMSs).
A lot of work has been done in this area, e.g. Hive and Pig, but these do not
offer pure SQL.
Over the past few decades, RDBMSs and data warehouses were the only choice
of data processing platform, with a rich set of data processing tools such as
SQL. Recently, however, the variety, velocity and volume of data have made these
traditional platforms less efficient at handling such data, hence the need for
more efficient data stores and processing platforms.
Although NoSQL data stores have lived up to expectations for storing and
processing large datasets, working with them may not be as simple and convenient
as with traditional databases. One common drawback of NoSQL databases is the
lack of the much-loved SQL language.
This thesis therefore focuses on this new type of data store, also called
NoSQL. Specifically, we focus on HBase, a column-oriented, BigTable-like
database, as our choice of NoSQL store.
Because NoSQL databases are becoming very popular, we propose data mapping
methods that can make migration from relational databases to NoSQL databases
less daunting.
Since this migration starts from RDBs, which have a rich set of procedures
(i.e. SQL) to access and manipulate data, we extend our work to bridge the gap
between SQL and NoSQL by providing methods of using pure ANSI SQL to manipulate
the underlying data stored in our NoSQL store (HBase).
[1] David DeWitt and Jim Gray, Parallel Database Systems: The Future of High
Performance Database Systems, ACM 1992.
[2] Fay Chang, Jeffrey Dean et al. BigTable: A Distributed Storage System
for Structured Data
[3] HBase: http://hbase.apache.org/
[4] NoSQL: http://nosql-database.org/
[5] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing
on Large Clusters
[6] Hadoop: http://hadoop.apache.org/
[7] JackHare: http://sourceforge.net/projects/jackhare/
[8] Chongxin Li. RDB to HBase: Transforming Relational Database into
HBase: A Case Study
[9] Hung Bing, et al. Structured Data Processing: Structured Data Processing
on MapReduce in NoSQL Databases
[10] Meng-Ju Hsieh, et al. SQLMR: A Scalable Database Management System
for Cloud Computing
[11] GigaOM 2012 Facebook Data: http://gigaom.com/data/facebook-iscollecting-your-data-500-terabytes-a-day/
[12] Biswapesh, et al. Tenzing: A SQL implementation On The MapReduce
Framework
[13] Number of Column Family recommendations:
http://hbase.apache.org/book/number.of.cfs.html
[14] JackHare: http://sourceforge.net/projects/jackhare/
[15] Hive: https://cwiki.apache.org/confluence/display/Hive/Home
[16] Apache Pig: http://pig.apache.org/