一個為串流資料查詢系統設計之XML路徑選擇性估計方法

簡易檢索 / 詳目顯示

回結果列表

研究生：	黃人傑 Jen-Chieh Huang
論文名稱：	一個為串流資料查詢系統設計之XML路徑選擇性估計方法 An Efficient Method for Estimating XML Path Selectivity in Stream Query Systems
指導教授：	陳良弼 Arbee L. P. Chen
口試委員:
學位類別：	碩士 Master
系所名稱：	電機資訊學院 - 資訊工程學系 Computer Science
論文出版年：	2004
畢業學年度：	92
語文別：	英文
論文頁數：	54
中文關鍵詞：	XML 、選擇性、XPath
外文關鍵詞：	XML, path selectivity, XPath
相關次數：	點閱：2 下載：0
分享至:	分享至facebook 分享至twitter

查詢本校圖書館目錄查詢臺灣博碩士論文知識加值系統勘誤回報

隨著串流資料的日益重要，管理此類之龐大而不可預測之資料成為一急須之問題。許多串流資料管理系統 (DSMS) 被提出來解決不同的環境與資料型態之問題。其中一纇即為針對 XML (eXtensible Markup Language) 串流資料。XML 被廣泛用在資訊交換，與描述性之語言。因此，加速 XML 串流的查詢處理為一重要之課題。
在各種查詢處理的最佳化技巧中，選擇性 (selectivity) 的估計為一個重要的傳統方法。在新的串流環境下，決定一個條件敘述的選擇性也是一個重要的最佳化指標，尤其是在有結合 (join) 運算子的情形下。在 XML 串流處理中，對於路徑敘述 (path expressions) 的選擇性預估問題是更加複雜的。因此，我們提出了一個基於資料摘要 (summarization) 的新方法來估計路徑敘述的選擇性。我們所提出的方法是由路徑樹(path tree)結構與馬可夫模型(Markov model)所組成。我們也提出了一個基於選擇性估計的放寬查詢的技術以避免沒有答案被找出。
同時我們也在實驗部分對於不同型態的資料與查詢驗證我們方法的能力，並且考慮不同大小的記憶體所造成的影響。也分析了不同的刪除節點之方式所造成的影響。

Abstract
As the increasing need for streaming applications, management of these unbounded
and unpredicted data becomes an urgent issue. Many Data Stream Management
Systems (DSMSs) are proposed to accommodate to different streaming environments
and data types. One kind of these DSMSs focuses on the XML data type. XML is
widely used in many kinds of applications such as information exchange and web
services. Therefore, it is an important problem to efficiently process XML queries
over data streams.
Among various optimization techniques of query processing, selectivity estimation is
important in the traditional database researches. In the streaming environment, to
determine the selectivity of some condition clauses is also important for query
optimization, especially the case of join operation. In the XML streams, the estimation
of path selectivity is more complex and difficult. Therefore, a novel selectivity
estimation approach based on data summarization is proposed for XML path
expressions over streaming data. This approach is combined with both path tree
approach and Markov-based approach. The newly arrived data are processed and
recorded in the summary tree, and the unimportant data are archived in a preference
matrix. Based on the summary structures, the selectivity of the query paths can be
efficiently estimated. We also propose a prototype of query relaxation system for
XML path queries, which is based on the proposed selectivity estimation method. This
system is used to automatically relax the user query such that the number of query
results will reach the user requirement.
In the experiment results, we show that the selectivity of various path queries can be
efficiently estimated from different kinds of data. The performance under different
memory limits and the node deletion policies are also discussed.
Keywords: XML, XPath, Stream data, selectivity estimation, summarization

 Table of Contents
Abstract    III
Table of Contents    V
List of Figures    VI
1.    Introduction    1
2.    The Overview    6
2.1    The Environment and the Query Processor    6
2.2    Query Optimization Using Path Selectivity Estimation    8
3.    Selectivity Estimation Using Summarization    11
3.1    Construction of the Summary Tree    11
3.2    Deletion of Nodes and Expiration of Data    18
3.3    Estimate the Selectivity Using the Summary Structure    27
4.    Query Relaxation Using Selectivity Estimation    30
4.1    Motivation    30
4.2    the Relaxation Model    36
4.3    System: an Overview    39
4.4    The Proposed Relaxation Method    40
5.    Performance Evaluation    43
5.1    Data Generation    43
5.2    Query Generation    44
5.3    Experiment Environment    45
5.3.1    The Experiment Setup    45
5.3.2    The Measures    45
5.4    Results and Discussion    46
5.4.1    Deletion Policies: D-Val vs. Leaf    46
5.4.2    Relevance of Data vs. Nodes in the Summary Tree    49
6.    Conclusion    51
7.    References    52

List of Figures
Figure 1 a Sample SQL Query    2
Figure 2 a Sample Query with Window Parameters    3
Figure 3 a Sample XQuery with Window Parameters    4
Figure 4 the Example Stream Proxy    6
Figure 5 the Sample Query Processor    7
Figure 6 the Initial Label Table    12
Figure 7 the Sample Path Expressions    13
Figure 8 an Incoming XML Data Tree    14
Figure 9 the Summarization Process at Step One with Parsing Stack    14
Figure 10 the Summarization Process at Step Two with Parsing Stack    15
Figure 11 the Partial Summary Tree with Parsing Stack    15
Figure 12 Another Partial Summary Tree    17
Figure 13 Yet Another Summary Tree    17
Figure 14 the Completed Summary Tree    18
Figure 15 Example of the Order-1 Markov Property    19
Figure 16 the Example of Order-2 Markov Property    20
Figure 17 A Sample Summary Tree to Be Trimmed    21
Figure 18 A Partial Preference Matrix    21
Figure 19 Another Partial Preference Matrix    23
Figure 20 the Partially Deleted Summary Tree    23
Figure 21 the Complete Preference Matrix    24
Figure 22 the Summary Tree Trimmed    24
Figure 23 the D-Value Formula    25
Figure 24 Estimation Function of Order-1    27
Figure 25 the Computation of Dead    27
Figure 26 the Algorithm for Simple Path Estimation    29
Figure 27 a Sample XML Schema    32
Figure 28 a Sample XML Document    33
Figure 29 a VLDB Example    34
Figure 30 a QAC-framework Query    37
Figure 31 a Sample Query to Be Relaxed    38
Figure 32 the Query Relaxation Framework    39
Figure 33 the Relaxation Result    41
Figure 34 the Complete Summary Tree    41
Figure 35 the Relaxed Query    42
Figure 36 the Error Function    45
Figure 37 the EC-Ratio Function    46
Figure 38 Window Size vs. Error with Deletion Ratio 0.2    46
Figure 39 Window Size vs. Error with Deletion Ratio 0.5    47
Figure 40 EC-Ratio vs. Window Size with Node Deletion Ratio 0.2    47
Figure 41 EC-Ratio vs. Window Size with Node Deletion Ratio 0.5    48
Figure 42 Error vs. Node Deletion Ratio of Path Queries with Different Data    49
Figure 43 Error vs. Node Deletion Ratio of Path Queries with Different Data    50

                                

[1] Mehmet Altinel and Michael J. Franklin, “Efficient Filtering of XML Documents for Selective Dissemination of Information,” Proceedings of the 26th Very Large Data Bases Conference (VLDB), 2000.
[2] Yanlei Diao, Peter Fischer, Michael J. Franklin and Raymond To, “YFilter: Efficient and Scalable Filtering of XML Documents,” Proceedings of the 18th International Conference on Data Engineering (ICDE), 2002.
[3] Chee-Yong Chan, Pascal Felber, Minos Garofalakis and Rajeev Rastoji, “Efficient Filtering of XML Documents with XPath Expressions,” Proceedings of the 18th International Conference on Data Engineering (ICDE), 2002.
[4] Chee-Yong Chan, Minos Garofalakis and Rajeev Rastoji, “”RE-Tree: An Efficient Index Structure for Regular Expressions,” Proceedings of the 28th Very Large Data Bases Conference (VLDB), 2002.
[5] Madoka Yuriyama and Hiroaki Nakamura, “Filtering Contents by Efficient Evaluation of XPath Expressions,” Proceedings of the 2003 Symposium on Applications and the Internet (SAINT ‘03).
[6] Feng Peng and Sudarshan S. Chawathe, "XPath Queries on Streaming Data," Proceedings of Special Interests Groups Management of Data Conference (SIGMOD) 2003.
[7] Ashish Kumar Gupta and Dan Suciu, "Stream processing of XPath queries with predicates," Proceedings of Special Interest Group on Management of Data Conference (SIGMOD) 2003.
[8] Ashraf Aboulnaga, Jeffrey F. Naughton, Chun Zhang, "Generating Synthetic Complex-structured XML Data," Proceedings of International Workshop on the Web and Databases (WebDB) 2001
[9] Nicholas J. Belkin and W. Bruce Croft, "Information Filtering and Information Retrieval: Two Sides of the Same Coin?" Communications of the ACM (CACM) 35(12):29-38 1992.
[10] Georg Gottolob, Christoph Koch, and Reinhard Pichler, "XPath Processing in a Nutshell," SIGMOD Record 32(1):12-19 2003
[11] Jianjun Chen, David J. DeWitt, Feng Tien, and Yuan Wang, "NiagaraCQ, A Scalable Continuous Query System for Internet Databases," Proceedings of the Special Interest Group on Management of Data Conference (SIGMOD) 2000
[12] Zachary G. Ives, Alon Y. Halevy, and Daniel S. Weld, "An XML Query Engine for Network-Bound Data," Very Large Data Bases Journal 11(4):380-402, 2002.
[13] Dongwon Lee, "Query Relaxation for XML Model," PhD. Thesis
[14] Ashraf Aboulnaga, Alaa R. Alameldeen, and Jeffrey F. Naughton, "Estimating the Selectivity of XML Path Expressions for Internet Scale Applications," Proceedings of 27th Very Large Data Bases Conference (VLDB) 2001.
[15] Neoklis Polyzotis and Minos Garofalakis, "Statistical Synopses for General Graph-Structured XML Documents," Proceedings of Special Interest Group on Management f Data Conference (SIGMOD), 2002.
[16] Ashraf Aboulnaga, Jeffrey F. Naughton, "Building XML Statistics for the Hidden Web," Proceedings of Conference on Information and Knowledge Management (CIKM) 2002
[17] J. Clark, XML path language (XPath), 1999, http://www.w3.org/TR/xpath
[18] J. P. T. Bray and C. M. Sperberg-McQueen. eXtensible Markup Language(xml) 1.0, http://www.w3.org/TR/Rec-xml, 1998
[19] The STREAM Group. "STREAM: The Stanford Stream Data Manager," IEEE Data Engineering Bulletin, Vol. 26 No. 1, March 2003
[20] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, "Models and Issues in Data Stream Systems," Proceedings of Principles of Database Systems Conference (PODS) 2002.
[21] Jaewoo Kang, Jeffrey F. Naughton, and Stratis D. Viglac, “Evaluating Window Joins over Unbounded Streams”, Proceedings of the 28th Very Large Data Bases Conference (VLDB) 2002
[22] S. J. Kaplan. “Cooperative Aspects of Database Interactions,” Artificial Intelligence, 19(2):165–187, Oct. 1982.
[23] Lukasz Golab and M. Tamer □zsu, “Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams,” Proceedings of the 29th Very Large Data Bases Conference (VLDB) 2003.
[24] Megginson Technologies, “SAX 1.0: a free API for event-based XML parsing”, http://www.megginson.com/SAX/index.html, May, 1998
[25] TinyXML project, http://tinyxpath.sourceforge.net/, 2004
[26] XQuery, http://www.w3c.org/XML/Query, 2004

全文公開日期本全文未授權公開 (校內網路)
全文公開日期本全文未授權公開 (校外網路)

簡易檢索 / 詳目顯示

相關論文