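Graduate student: | 黃人傑 Jen-Chieh Huang |
---|---|
Thesis title: | 一個為串流資料查詢系統設計之XML路徑選擇性估計方法 An Efficient Method for Estimating XML Path Selectivity in Stream Query Systems |
Advisor: | 陳良弼 Arbee L. P. Chen |
Oral examination committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2004 |
Academic year of graduation: | 92 (ROC calendar) |
Language: | English |
Number of pages: | 54 |
Keywords (Chinese): | XML, selectivity, XPath |
Keywords (English): | XML, path selectivity, XPath |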
相關次數: | 點閱:2 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
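As streaming data becomes increasingly important, managing such massive and unpredictable data has become an urgent problem. Many data stream management systems (DSMSs) have been proposed to handle different environments and data types, and one class of them targets XML (eXtensible Markup Language) streams. XML is widely used for information exchange and as a descriptive language. Speeding up query processing over XML streams is therefore an important problem.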
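Among the various query-processing optimization techniques, selectivity estimation is an important traditional method. In the new streaming environment, determining the selectivity of a condition clause is also an important optimization metric, especially when join operators are involved. In XML stream processing, estimating the selectivity of path expressions is even more complex. We therefore propose a new summarization-based method for estimating the selectivity of path expressions, composed of a path tree structure and a Markov model. We also propose a selectivity-based query relaxation technique that prevents queries from returning no answers.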
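The abstract names the two building blocks, a path tree and a Markov model, without giving their details. As a rough illustration only, the sketch below maintains both summaries from SAX start/end events: a path tree that counts elements per root-to-node label path, and order-2 Markov statistics f(t) and f(t1/t2). It reads /t1/.../tn directly from the tree and estimates //t1/.../tn with the standard Markov product f(t1/t2) · Π f(ti/ti+1)/f(ti). The names `PathTreeBuilder`, `estimate_absolute`, and `estimate_markov` are hypothetical, and this is not the thesis's own algorithm.

```python
import xml.sax
from collections import defaultdict


class PathTreeBuilder(xml.sax.ContentHandler):
    """Hypothetical stream summary fed by SAX events.

    path_counts  : root-to-node label path (tuple) -> number of elements (the path tree)
    label_counts : label t -> f(t)             (order-2 Markov statistics)
    pair_counts  : (t1, t2) -> f(t1/t2), i.e. t2 elements whose parent is t1
    """

    def __init__(self):
        super().__init__()
        self.path_counts = defaultdict(int)
        self.label_counts = defaultdict(int)
        self.pair_counts = defaultdict(int)
        self._stack = []  # labels of the currently open elements

    def startElement(self, name, attrs):
        if self._stack:
            self.pair_counts[(self._stack[-1], name)] += 1
        self._stack.append(name)
        self.path_counts[tuple(self._stack)] += 1
        self.label_counts[name] += 1

    def endElement(self, name):
        self._stack.pop()


def estimate_absolute(tree, labels):
    """Selectivity of /t1/.../tn, read directly from the (unpruned) path tree."""
    return tree.path_counts.get(tuple(labels), 0)


def estimate_markov(tree, labels):
    """Order-2 Markov estimate of //t1/.../tn:
    f(t1/t2) * product over i = 2..n-1 of f(ti/ti+1) / f(ti)."""
    if len(labels) == 1:
        return float(tree.label_counts.get(labels[0], 0))
    estimate = float(tree.pair_counts.get((labels[0], labels[1]), 0))
    for parent, child in zip(labels[1:-1], labels[2:]):
        parent_total = tree.label_counts.get(parent, 0)
        if parent_total == 0:
            return 0.0
        estimate *= tree.pair_counts.get((parent, child), 0) / parent_total
    return estimate


if __name__ == "__main__":
    tree = PathTreeBuilder()
    xml.sax.parseString(b"<r><a><b><c/></b></a><d><b/></d></r>", tree)
    print(estimate_absolute(tree, ["r", "a", "b", "c"]))  # 1 (exact)
    print(estimate_absolute(tree, ["r", "d", "b", "c"]))  # 0 (exact)
    print(estimate_markov(tree, ["d", "b", "c"]))         # 0.5 (true answer is 0)
```

On this toy document the Markov estimate for //d/b/c is 0.5 even though no `c` lies under `d`: the model only remembers that half of all `b` elements have a `c` child. The path tree keeps that correlation, which is one reason to combine the two when the tree itself has to stay small.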
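In the experiments, we evaluate the method on different types of data and queries, consider the effect of different memory sizes, and analyze the effect of different node deletion policies.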
Abstract
With the increasing need for streaming applications, managing unbounded and unpredictable data has become an urgent issue. Many Data Stream Management Systems (DSMSs) have been proposed to accommodate different streaming environments and data types, and one class of them focuses on XML data. XML is widely used in applications such as information exchange and web services. Efficiently processing XML queries over data streams is therefore an important problem.
Among the various query-processing optimization techniques, selectivity estimation has long been important in traditional database research. In the streaming environment, determining the selectivity of condition clauses is also important for query optimization, especially for join operations. For XML streams, estimating path selectivity is more complex and difficult. We therefore propose a novel selectivity estimation approach for XML path expressions over streaming data, based on data summarization. The approach combines the path tree approach with the Markov-based approach: newly arrived data are processed and recorded in a summary tree, and unimportant data are archived in a preference matrix. Based on these summary structures, the selectivity of query paths can be estimated efficiently. We also propose a prototype query relaxation system for XML path queries, built on the proposed selectivity estimation method. The system automatically relaxes a user query so that the number of query results reaches the user's requirement.
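The abstract does not say which relaxation operators the prototype applies. Assuming two common ones, turning a child axis `/` into a descendant axis `//` and replacing a non-output label with the wildcard `*`, a greedy relaxation loop driven by selectivity estimates might look like the sketch below. The estimator simply counts matching root-to-node paths in a summary dictionary, standing in for the thesis's summary structures; `relax`, `estimate`, `one_step_relaxations`, and the sample summary are all hypothetical.

```python
def matches(path, steps):
    """True if the root-to-node label `path` is selected by the absolute query `steps`.

    steps is a list of (axis, label) pairs; axis is "/" (child) or "//" (descendant),
    and label "*" matches any tag. The last step must bind the last label of path.
    """
    def rec(i, j):
        if j == len(steps):
            return i == len(path)
        axis, label = steps[j]
        # A child step must match the very next label; a descendant step may skip some.
        end = len(path) if axis == "//" else min(i + 1, len(path))
        return any(label in ("*", path[k]) and rec(k + 1, j + 1) for k in range(i, end))
    return rec(0, 0)


def estimate(path_counts, steps):
    """Estimated result size: elements whose summarized path matches the query."""
    return sum(n for path, n in path_counts.items() if matches(path, steps))


def one_step_relaxations(steps):
    """Hypothetical operators: '/' -> '//', and label -> '*' on non-output steps."""
    for j, (axis, label) in enumerate(steps):
        if axis == "/":
            yield steps[:j] + [("//", label)] + steps[j + 1:]
        if label != "*" and j < len(steps) - 1:
            yield steps[:j] + [(axis, "*")] + steps[j + 1:]


def relax(path_counts, steps, k):
    """Greedily apply the relaxation with the smallest estimated growth until >= k results."""
    current = estimate(path_counts, steps)
    while current < k:
        grown = [(estimate(path_counts, s), s) for s in one_step_relaxations(steps)]
        grown = [(e, s) for e, s in grown if e > current]
        if not grown:
            break  # no single relaxation increases the estimate any further
        current, steps = min(grown, key=lambda pair: pair[0])
    return steps, current


def to_xpath(steps):
    return "".join(axis + label for axis, label in steps)


if __name__ == "__main__":
    # Hypothetical summary: root-to-node path -> element count.
    summary = {
        ("lib",): 1,
        ("lib", "book"): 3,
        ("lib", "book", "title"): 3,
        ("lib", "book", "author"): 4,
        ("lib", "journal"): 2,
        ("lib", "journal", "title"): 2,
        ("lib", "journal", "article"): 2,
        ("lib", "journal", "article", "author"): 2,
    }
    query = [("/", "lib"), ("/", "journal"), ("/", "author")]
    relaxed, size = relax(summary, query, k=2)
    print(to_xpath(query), "->", to_xpath(relaxed), size)
    # /lib/journal/author -> /lib/journal//author 2
```

Picking the candidate with the smallest estimated growth keeps the relaxed query as close to the user's original as the estimates allow; here `/lib/journal//author` (2 results) is preferred over `/lib/*/author` (4 results). Because the choice is greedy, it is not guaranteed to find the globally least-relaxed query.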
The experimental results show that the selectivity of various path queries can be estimated efficiently on different kinds of data. The performance under different memory limits and node deletion policies is also discussed.
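The abstract does not name the node deletion policies that were compared. The sketch below assumes two plausible ones, evicting the leaf with the smallest count (`min_count`) or the least recently updated leaf (`lru`), to show how a path tree can be kept under a fixed node budget. Evicted counts are folded into a (parent label, label) table, loosely in the spirit of archiving less important data outside the tree. `BoundedPathTree`, both policy names, and the folding scheme are hypothetical.

```python
import itertools
from collections import defaultdict


class BoundedPathTree:
    """Hypothetical path tree capped at max_nodes paths.

    When the budget is exceeded, a leaf path is deleted according to `policy`:
      "min_count" - the leaf reached by the fewest elements
      "lru"       - the leaf updated least recently
    The deleted count is folded into a (parent label, label) table so some
    information survives in bounded memory.
    """

    def __init__(self, max_nodes, policy="min_count"):
        if policy not in ("min_count", "lru"):
            raise ValueError(f"unknown policy: {policy}")
        self.max_nodes = max_nodes
        self.policy = policy
        self.count = {}                          # path tuple -> element count
        self.last_seen = {}                      # path tuple -> logical time of last update
        self.children = defaultdict(int)         # path tuple -> number of child paths kept
        self.archived_pairs = defaultdict(int)   # (parent label, label) -> folded count
        self._clock = itertools.count()

    def add_path(self, path):
        """Record one element; paths must arrive in document order (parent before child)."""
        path = tuple(path)
        if path not in self.count and len(path) > 1:
            self.children[path[:-1]] += 1
        self.count[path] = self.count.get(path, 0) + 1
        self.last_seen[path] = next(self._clock)
        while len(self.count) > self.max_nodes and self._evict(protect=path):
            pass

    def _evict(self, protect):
        """Delete one leaf (never the path just updated); return False if none qualifies."""
        leaves = [p for p in self.count if self.children.get(p, 0) == 0 and p != protect]
        if not leaves:
            return False
        if self.policy == "min_count":
            victim = min(leaves, key=lambda p: self.count[p])
        else:
            victim = min(leaves, key=lambda p: self.last_seen[p])
        parent_label = victim[-2] if len(victim) > 1 else None
        self.archived_pairs[(parent_label, victim[-1])] += self.count[victim]
        if len(victim) > 1:
            self.children[victim[:-1]] -= 1
        del self.count[victim]
        del self.last_seen[victim]
        self.children.pop(victim, None)
        return True


if __name__ == "__main__":
    stream = [("r",), ("r", "a"), ("r", "a"), ("r", "a"), ("r", "b"), ("r", "c")]
    for policy in ("min_count", "lru"):
        tree = BoundedPathTree(max_nodes=3, policy=policy)
        for p in stream:
            tree.add_path(p)
        print(policy, sorted(tree.count), dict(tree.archived_pairs))
    # min_count [('r',), ('r', 'a'), ('r', 'c')] {('r', 'b'): 1}
    # lru [('r',), ('r', 'b'), ('r', 'c')] {('r', 'a'): 3}
```

The two policies disagree on the same stream: `min_count` keeps the frequent path `/r/a`, while `lru` drops it because it has not been touched recently. Which one preserves estimation accuracy better depends on how the data distribution drifts over the stream.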
Keywords: XML, XPath, Stream data, selectivity estimation, summarization