
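Graduate student: 吳慶順 (Nigel)
Thesis title: XCut: Indexing XML Data for Efficient Twig Matching (有效處理支脈配對在可擴展標記語言資料上的索引)
Advisor: Fenn-Huei Sheu (許奮輝)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar, i.e. 2004-2005)
Language: English
Number of pages: 32
Keywords (Chinese): 可擴展標記語言查詢 (XML query)
Keywords (English): XML, query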
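  • Evaluating XML (Extensible Markup Language) queries efficiently requires quickly finding every occurrence of a twig pattern, that is, a subgraph of the tree structure of an XML document. The first approach treats the query pattern as a query tree and assigns each node of that tree a cursor and a stack: the cursor allows sequential processing, and the stack holds partially matched XML elements. Even though indexes built on the elements in advance can skip mismatches effectively, this approach works well only when the query pattern contains ancestor-descendant edges alone. The other approach first transforms the XML data and the query one-to-one into strings, and then uses a string index for substring matching, which achieves an effect equivalent to skipping. This approach performs better when the twig query contains ordered parent-child edges and nodes with selection predicates. The method proposed in this thesis replaces the stacks of the first approach with double-ended queues that temporarily hold the inner entries of the index trees, so that level-by-level search can be handled. Our design allows cross-checking among the queues to satisfy the twig match. Filtering early at the inner levels of the index trees speeds up processing. Because the leaf entries carry complete information about the query elements, our design can eliminate any subgraph that violates parent-child edges or the selection predicates at query nodes. The experimental results also show that our method outperforms the second approach.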


    Efficient evaluation of XML queries entails quickly finding all occurrences of a twig pattern, a subgraph of tree-structured XML documents. One approach models the pattern directly as a query tree and associates a cursor and a stack with each tree node, for coordinated sequential access and for keeping track of partial matches among the XML elements. Even though pre-built indices on the elements can help skip mismatches, this approach works well only when the pattern involves ancestor-descendant edges exclusively. Another strategy first transforms XML data and query trees one-to-one into sequences. String indices are then used for subsequence matching, which achieves equivalent skips in the transform domain. This strategy performs better when the twig contains ordered parent-child edges and nodes with their own selection predicates. This thesis proposes replacing the stacks used in the first approach with simple pipes (double-ended queues) that also queue the inner entries of index trees, accessed by level-order traversal. Our design permits cross-verification among queued entries subject to the twig pattern. Early filtering at the inner levels of the index trees is thus achievable and further expedites processing. Combined with the leaf entries, which carry the full specifications of query elements, our design can also suppress any subgraph that violates parent-child edges or selection predicates at query tree nodes. Extensive experimental results demonstrate the superiority of this design over the second strategy.
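
    The notions of twig pattern, ancestor-descendant edge, and parent-child edge used above can be illustrated with a minimal sketch. It is not the XCut algorithm and uses no index, cursor, stack, or pipe: it is a naive matcher meant only to show what a twig match is. It assumes a region encoding (start, end, level) for elements, a labeling that is common in structural-join work but is not specified in this abstract. The names `region_encode`, `is_ancestor`, `is_parent`, and `match_twig`, and the sample document, are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import count


def region_encode(root):
    """Label every element as (tag, start, end, level) via a depth-first walk."""
    counter = count(1)
    labels = []

    def visit(elem, level):
        entry = [elem.tag, next(counter), None, level]
        labels.append(entry)              # document (pre-)order
        for child in elem:
            visit(child, level + 1)
        entry[2] = next(counter)          # end number assigned after all descendants

    visit(root, 0)
    return [tuple(e) for e in labels]


def is_ancestor(a, d):
    """Ancestor-descendant edge: d's interval lies strictly inside a's."""
    return a[1] < d[1] and d[2] < a[2]


def is_parent(p, c):
    """Parent-child edge: containment plus exactly one level of difference."""
    return is_ancestor(p, c) and c[3] == p[3] + 1


def match_twig(labels, pattern):
    """Naively enumerate all matches of a twig pattern.

    pattern is (tag, [(axis, subpattern), ...]) where axis is '/' for a
    parent-child edge or '//' for an ancestor-descendant edge.
    Each match is a tuple of labels in pattern pre-order.
    """
    def match_at(node, pat):
        tag, branches = pat
        if node[0] != tag:
            return []
        results = [(node,)]
        for axis, sub in branches:
            test = is_parent if axis == "/" else is_ancestor
            sub_matches = [m for cand in labels if test(node, cand)
                           for m in match_at(cand, sub)]
            # Every branch must be satisfied; combine with earlier branches.
            results = [r + m for r in results for m in sub_matches]
        return results

    return [m for node in labels for m in match_at(node, pattern)]


doc = ET.fromstring(
    "<book><title/><chapter><title/><section><title/></section></chapter></book>"
)
labels = region_encode(doc)

# chapter with a child title and a descendant section
strict = ("chapter", [("/", ("title", [])), ("//", ("section", []))])
# same twig, but the title may be any descendant
relaxed = ("chapter", [("//", ("title", [])), ("//", ("section", []))])

print(len(match_twig(labels, strict)))   # 1
print(len(match_twig(labels, relaxed)))  # 2
```

    The two queries differ only in one edge type, yet the parent-child version rejects the title nested inside the section. Checking that extra level constraint is the kind of filtering on parent-child edges that the abstract discusses.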

    TABLE OF CONTENTS  I
    LIST OF FIGURES  II
    CHAPTER 1 INTRODUCTION  1
    CHAPTER 2 RELATED WORK  5
        2.1 PRIX (PRÜFER SEQUENCE FOR INDEXING XML)  5
        2.2 XR-TREE AND XRTWIG ALGORITHM  7
    CHAPTER 3 OUR SOLUTION: XCUT (CROSSCUT) ALGORITHM  9
        3.1 XCUT ALGORITHM USING PIPES (DOUBLE QUEUES)  9
        3.2 LEVEL-ORDER TRAVERSAL BY XB+-TREES  14
        3.3 OPTIMIZATION THROUGH VIRTUAL LINKS  19
    CHAPTER 4 PERFORMANCE STUDY  22
        4.1 EXPERIMENTAL SETUP  22
        4.2 XCUT VS. XRTWIG ON HOLISTIC STRUCTURAL JOINS  24
        4.3 XCUT VS. PRIX ON GENERAL TWIG QUERIES  26
        4.4 THE EFFECTIVENESS OF USING VIRTUAL LINKS  28
    CHAPTER 5 CONCLUDING REMARKS  30


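    Full-text release date: the full text is not authorized for release (on-campus network)
    Full-text release date: the full text is not authorized for release (off-campus network)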
