簡易檢索 / 詳目顯示

研究生: 黃人傑
Jen-Chieh Huang
論文名稱: 一個為串流資料查詢系統設計之XML路徑選擇性估計方法
An Efficient Method for Estimating XML Path Selectivity in Stream Query Systems
指導教授: 陳良弼
Arbee L. P. Chen
口試委員:
學位類別: 碩士
Master
系所名稱: 電機資訊學院 - 資訊工程學系
Computer Science
論文出版年: 2004
畢業學年度: 92
語文別: 英文
論文頁數: 54
中文關鍵詞: XML選擇性XPath
外文關鍵詞: XML, path selectivity, XPath
相關次數: 點閱:2下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 隨著串流資料的日益重要,管理此類之龐大而不可預測之資料成為一急須之問題。許多串流資料管理系統 (DSMS) 被提出來解決不同的環境與資料型態之問題。其中一纇即為針對 XML (eXtensible Markup Language) 串流資料。XML 被廣泛用在資訊交換,與描述性之語言。因此,加速 XML 串流的查詢處理為一重要之課題。
    在各種查詢處理的最佳化技巧中,選擇性 (selectivity) 的估計為一個重要的傳統方法。在新的串流環境下,決定一個條件敘述的選擇性也是一個重要的最佳化指標,尤其是在有結合 (join) 運算子的情形下。在 XML 串流處理中,對於路徑敘述 (path expressions) 的選擇性預估問題是更加複雜的。因此,我們提出了一個基於資料摘要 (summarization) 的新方法來估計路徑敘述的選擇性。我們所提出的方法是由路徑樹(path tree)結構與馬可夫模型(Markov model)所組成。我們也提出了一個基於選擇性估計的放寬查詢的技術以避免沒有答案被找出。
    同時我們也在實驗部分對於不同型態的資料與查詢驗證我們方法的能力,並且考慮不同大小的記憶體所造成的影響。也分析了不同的刪除節點之方式所造成的影響。


    Abstract
    As the increasing need for streaming applications, management of these unbounded
    and unpredicted data becomes an urgent issue. Many Data Stream Management
    Systems (DSMSs) are proposed to accommodate to different streaming environments
    and data types. One kind of these DSMSs focuses on the XML data type. XML is
    widely used in many kinds of applications such as information exchange and web
    services. Therefore, it is an important problem to efficiently process XML queries
    over data streams.
    Among various optimization techniques of query processing, selectivity estimation is
    important in the traditional database researches. In the streaming environment, to
    determine the selectivity of some condition clauses is also important for query
    optimization, especially the case of join operation. In the XML streams, the estimation
    of path selectivity is more complex and difficult. Therefore, a novel selectivity
    estimation approach based on data summarization is proposed for XML path
    expressions over streaming data. This approach is combined with both path tree
    approach and Markov-based approach. The newly arrived data are processed and
    recorded in the summary tree, and the unimportant data are archived in a preference
    matrix. Based on the summary structures, the selectivity of the query paths can be
    efficiently estimated. We also propose a prototype of query relaxation system for
    XML path queries, which is based on the proposed selectivity estimation method. This
    system is used to automatically relax the user query such that the number of query
    results will reach the user requirement.
    In the experiment results, we show that the selectivity of various path queries can be
    efficiently estimated from different kinds of data. The performance under different
    memory limits and the node deletion policies are also discussed.
    Keywords: XML, XPath, Stream data, selectivity estimation, summarization

    Table of Contents Abstract III Table of Contents V List of Figures VI 1. Introduction 1 2. The Overview 6 2.1 The Environment and the Query Processor 6 2.2 Query Optimization Using Path Selectivity Estimation 8 3. Selectivity Estimation Using Summarization 11 3.1 Construction of the Summary Tree 11 3.2 Deletion of Nodes and Expiration of Data 18 3.3 Estimate the Selectivity Using the Summary Structure 27 4. Query Relaxation Using Selectivity Estimation 30 4.1 Motivation 30 4.2 the Relaxation Model 36 4.3 System: an Overview 39 4.4 The Proposed Relaxation Method 40 5. Performance Evaluation 43 5.1 Data Generation 43 5.2 Query Generation 44 5.3 Experiment Environment 45 5.3.1 The Experiment Setup 45 5.3.2 The Measures 45 5.4 Results and Discussion 46 5.4.1 Deletion Policies: D-Val vs. Leaf 46 5.4.2 Relevance of Data vs. Nodes in the Summary Tree 49 6. Conclusion 51 7. References 52 List of Figures Figure 1 a Sample SQL Query 2 Figure 2 a Sample Query with Window Parameters 3 Figure 3 a Sample XQuery with Window Parameters 4 Figure 4 the Example Stream Proxy 6 Figure 5 the Sample Query Processor 7 Figure 6 the Initial Label Table 12 Figure 7 the Sample Path Expressions 13 Figure 8 an Incoming XML Data Tree 14 Figure 9 the Summarization Process at Step One with Parsing Stack 14 Figure 10 the Summarization Process at Step Two with Parsing Stack 15 Figure 11 the Partial Summary Tree with Parsing Stack 15 Figure 12 Another Partial Summary Tree 17 Figure 13 Yet Another Summary Tree 17 Figure 14 the Completed Summary Tree 18 Figure 15 Example of the Order-1 Markov Property 19 Figure 16 the Example of Order-2 Markov Property 20 Figure 17 A Sample Summary Tree to Be Trimmed 21 Figure 18 A Partial Preference Matrix 21 Figure 19 Another Partial Preference Matrix 23 Figure 20 the Partially Deleted Summary Tree 23 Figure 21 the Complete Preference Matrix 24 Figure 22 the Summary Tree Trimmed 24 Figure 23 the D-Value Formula 25 Figure 24 Estimation Function of Order-1 27 Figure 25 the Computation of Dead 27 Figure 26 the Algorithm for Simple Path Estimation 29 Figure 27 a Sample XML Schema 32 Figure 28 a Sample XML Document 33 Figure 29 a VLDB Example 34 Figure 30 a QAC-framework Query 37 Figure 31 a Sample Query to Be Relaxed 38 Figure 32 the Query Relaxation Framework 39 Figure 33 the Relaxation Result 41 Figure 34 the Complete Summary Tree 41 Figure 35 the Relaxed Query 42 Figure 36 the Error Function 45 Figure 37 the EC-Ratio Function 46 Figure 38 Window Size vs. Error with Deletion Ratio 0.2 46 Figure 39 Window Size vs. Error with Deletion Ratio 0.5 47 Figure 40 EC-Ratio vs. Window Size with Node Deletion Ratio 0.2 47 Figure 41 EC-Ratio vs. Window Size with Node Deletion Ratio 0.5 48 Figure 42 Error vs. Node Deletion Ratio of Path Queries with Different Data 49 Figure 43 Error vs. Node Deletion Ratio of Path Queries with Different Data 50

    [1] Mehmet Altinel and Michael J. Franklin, “Efficient Filtering of XML Documents for Selective Dissemination of Information,” Proceedings of the 26th Very Large Data Bases Conference (VLDB), 2000.
    [2] Yanlei Diao, Peter Fischer, Michael J. Franklin and Raymond To, “YFilter: Efficient and Scalable Filtering of XML Documents,” Proceedings of the 18th International Conference on Data Engineering (ICDE), 2002.
    [3] Chee-Yong Chan, Pascal Felber, Minos Garofalakis and Rajeev Rastoji, “Efficient Filtering of XML Documents with XPath Expressions,” Proceedings of the 18th International Conference on Data Engineering (ICDE), 2002.
    [4] Chee-Yong Chan, Minos Garofalakis and Rajeev Rastoji, “”RE-Tree: An Efficient Index Structure for Regular Expressions,” Proceedings of the 28th Very Large Data Bases Conference (VLDB), 2002.
    [5] Madoka Yuriyama and Hiroaki Nakamura, “Filtering Contents by Efficient Evaluation of XPath Expressions,” Proceedings of the 2003 Symposium on Applications and the Internet (SAINT ‘03).
    [6] Feng Peng and Sudarshan S. Chawathe, "XPath Queries on Streaming Data," Proceedings of Special Interests Groups Management of Data Conference (SIGMOD) 2003.
    [7] Ashish Kumar Gupta and Dan Suciu, "Stream processing of XPath queries with predicates," Proceedings of Special Interest Group on Management of Data Conference (SIGMOD) 2003.
    [8] Ashraf Aboulnaga, Jeffrey F. Naughton, Chun Zhang, "Generating Synthetic Complex-structured XML Data," Proceedings of International Workshop on the Web and Databases (WebDB) 2001
    [9] Nicholas J. Belkin and W. Bruce Croft, "Information Filtering and Information Retrieval: Two Sides of the Same Coin?" Communications of the ACM (CACM) 35(12):29-38 1992.
    [10] Georg Gottolob, Christoph Koch, and Reinhard Pichler, "XPath Processing in a Nutshell," SIGMOD Record 32(1):12-19 2003
    [11] Jianjun Chen, David J. DeWitt, Feng Tien, and Yuan Wang, "NiagaraCQ, A Scalable Continuous Query System for Internet Databases," Proceedings of the Special Interest Group on Management of Data Conference (SIGMOD) 2000
    [12] Zachary G. Ives, Alon Y. Halevy, and Daniel S. Weld, "An XML Query Engine for Network-Bound Data," Very Large Data Bases Journal 11(4):380-402, 2002.
    [13] Dongwon Lee, "Query Relaxation for XML Model," PhD. Thesis
    [14] Ashraf Aboulnaga, Alaa R. Alameldeen, and Jeffrey F. Naughton, "Estimating the Selectivity of XML Path Expressions for Internet Scale Applications," Proceedings of 27th Very Large Data Bases Conference (VLDB) 2001.
    [15] Neoklis Polyzotis and Minos Garofalakis, "Statistical Synopses for General Graph-Structured XML Documents," Proceedings of Special Interest Group on Management f Data Conference (SIGMOD), 2002.
    [16] Ashraf Aboulnaga, Jeffrey F. Naughton, "Building XML Statistics for the Hidden Web," Proceedings of Conference on Information and Knowledge Management (CIKM) 2002
    [17] J. Clark, XML path language (XPath), 1999, http://www.w3.org/TR/xpath
    [18] J. P. T. Bray and C. M. Sperberg-McQueen. eXtensible Markup Language(xml) 1.0, http://www.w3.org/TR/Rec-xml, 1998
    [19] The STREAM Group. "STREAM: The Stanford Stream Data Manager," IEEE Data Engineering Bulletin, Vol. 26 No. 1, March 2003
    [20] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, "Models and Issues in Data Stream Systems," Proceedings of Principles of Database Systems Conference (PODS) 2002.
    [21] Jaewoo Kang, Jeffrey F. Naughton, and Stratis D. Viglac, “Evaluating Window Joins over Unbounded Streams”, Proceedings of the 28th Very Large Data Bases Conference (VLDB) 2002
    [22] S. J. Kaplan. “Cooperative Aspects of Database Interactions,” Artificial Intelligence, 19(2):165–187, Oct. 1982.
    [23] Lukasz Golab and M. Tamer □zsu, “Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams,” Proceedings of the 29th Very Large Data Bases Conference (VLDB) 2003.
    [24] Megginson Technologies, “SAX 1.0: a free API for event-based XML parsing”, http://www.megginson.com/SAX/index.html, May, 1998
    [25] TinyXML project, http://tinyxpath.sourceforge.net/, 2004
    [26] XQuery, http://www.w3c.org/XML/Query, 2004

    無法下載圖示 全文公開日期 本全文未授權公開 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)

    QR CODE