Author: Ying-Yi Chen (陳穎毅)
Thesis title: Indexing XML Data Using Structural Relationships between Nodes (利用節點間之結構關係為XML資料建立索引)
Advisor: Arbee L.P. Chen (陳良弼)
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: English
Number of pages: 39
Keywords: Semi-Structured Data, Index, XML, Structural Relationship
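  • Queries on XML data can usually be divided into two parts: the first concerns the contents of nodes (elements), and the second concerns the (structural) relationships between nodes. Because content queries can directly reuse mature techniques from traditional databases, how efficiently the relationships between nodes are evaluated has a decisive impact on the performance of an XML query engine.

    The more representative XML query processing techniques today follow a notion commonly used in information retrieval and store XML data in the form of inverted lists. To evaluate whether two nodes satisfy a given structural relationship, the two inverted lists holding those nodes are retrieved and a structural join is performed on them to obtain the answer. The main problem with this approach is that both inverted lists must be read before the structural join can be carried out, which incurs considerable I/O overhead.

    In this thesis, we propose a new idea: build an index over all nodes in the XML data according to the structural relationships between nodes. The nodes stored in an inverted list can be classified by their structural relationships with other nodes (which ancestors do they have? which descendants?). An inverted list can therefore be further divided into many sub-lists. If the class to which a node belongs is already known, the sub-lists corresponding to that class can be fetched directly, without reading the entire inverted list.

    Our experiments show that this approach improves query performance when processing highly irregular XML data.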
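
The structural join described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes each node carries a region encoding `(start, end)` (an assumption not stated in this record), that both inverted lists are sorted by `start`, and that a node `a` contains a node `d` when `a.start < d.start` and `d.end < a.end`. The names `Node` and `structural_join` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    start: int  # position of the opening tag (assumed region encoding)
    end: int    # position of the closing tag (assumed region encoding)


def structural_join(ancestors: List[Node],
                    descendants: List[Node]) -> List[Tuple[Node, Node]]:
    """Return every (a, d) pair in which a contains d.

    Both input lists must be sorted by start position. Each list is
    scanned once from beginning to end, which is why this style of
    processing has to read both inverted lists in full.
    """
    result: List[Tuple[Node, Node]] = []
    stack: List[Node] = []  # chain of nested, still-open candidate ancestors
    i = 0
    for d in descendants:
        # Push every candidate ancestor that opens before d.
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            # Drop candidates that closed before a opens.
            while stack and stack[-1].end < a.start:
                stack.pop()
            stack.append(a)
            i += 1
        # Drop candidates that closed before d opens.
        while stack and stack[-1].end < d.start:
            stack.pop()
        # Everything left on the stack is open around d.
        result.extend((a, d) for a in stack if a.end > d.end)
    return result
```

Calling `structural_join` on the inverted list of one label and the inverted list of another yields all ancestor-descendant pairs between the two labels; note that every entry of both lists is visited even when only a few of them contribute to the answer.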


    Queries on XML data usually involve selections on the contents (values) of nodes
    and on the structural relationships between nodes. While querying by value can borrow
    from traditional database technologies, how efficiently the structural portion
    of a query is evaluated is critical to the performance of XML query processing engines.
    One representative approach to evaluating the structural portion is information
    retrieval style processing using inverted lists. Nodes with the same label are collected in
    one list. The containment relationship between two nodes is evaluated by performing
    a structural join on the two lists. The major challenge of this approach is the high I/O cost
    of accessing the lists on disk. In this thesis, we propose a new index structure on the
    inverted lists based on structural relationships between nodes. Nodes in one list can be
    further classified into sub-lists by which children, ancestors, or parent they have. Thus, if
    we know the class to which the candidate nodes belong, we can directly fetch the sub-list
    of that class and skip unnecessary entries. Our experiments show the benefit of the
    proposed approach on highly irregular XML data.
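
The idea of splitting one inverted list into sub-lists by class can be illustrated with a small sketch. It is not the index structure proposed in the thesis, whose details are not given in this record; it only assumes that each entry of an inverted list is annotated with the set of labels of its ancestors, and that a class is identified by that set. The names `build_sublists` and `fetch` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

# One inverted-list entry: (node id, labels of the node's ancestors).
# The ancestor-label annotation is an assumption made for this sketch.
Entry = Tuple[int, FrozenSet[str]]


def build_sublists(inverted_list: Iterable[Entry]) -> Dict[FrozenSet[str], List[int]]:
    """Partition one inverted list into sub-lists keyed by ancestor-label set."""
    sublists: Dict[FrozenSet[str], List[int]] = defaultdict(list)
    for node_id, ancestor_labels in inverted_list:
        sublists[ancestor_labels].append(node_id)
    return dict(sublists)


def fetch(sublists: Dict[FrozenSet[str], List[int]],
          required_ancestors: Iterable[str]) -> List[int]:
    """Return ids of nodes whose class includes every required ancestor label.

    Only the sub-lists of matching classes are touched; the entries of
    all other classes are skipped without being read.
    """
    required = frozenset(required_ancestors)
    hits: List[int] = []
    for cls, node_ids in sublists.items():
        if required <= cls:  # subset test on the class key
            hits.extend(node_ids)
    return sorted(hits)
```

For example, if the inverted list for a `title` label is partitioned this way, a query for titles under a `chapter` element only needs the sub-lists whose class key contains `chapter`. The finer the classes (as in highly irregular data), the more entries can be skipped.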

    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1. Introduction
    2. Preliminaries
    3. Indexing By Set-Valued Attributes of Nodes
    4. Subset Queries Using Bloom Filters
        4-1 Bloom Filter Basics
        4-2 The False Positive Rate
        4-3 The Subset Query
        4-4 The Bit-Sliced Bloom Filter File (BSBFF)
        4-5 Integrating BSBFF into the ISR Scheme
    5. The Mixed Approach
        5-1 Analysis
        5-2 The Mixed Approach
    6. Experiments
    7. Related Works
    8. Conclusion
    References

