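| Graduate student | 陳穎毅 Ying-Yi Chen |
|---|---|
| Thesis title | 利用節點間之結構關係為XML資料建立索引 Indexing XML Data Using Structural Relationships between Nodes |
| Advisor | 陳良弼 Arbee L.P. Chen |
| Oral defense committee | |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science - Department of Computer Science |
| Year of publication | 2005 |
| Academic year of graduation | 93 |
| Language | English |
| Number of pages | 39 |
| Keywords (Chinese) | 半結構性資料、索引、XML、結構關係 |
| Keywords (English) | Semi-Structured Data, Index, XML, Structural Relationship |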
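Queries on XML data can usually be divided into two parts: the first is the content of nodes (elements), and the second is the (structural) relationships between nodes. Because content queries can be handled directly with mature techniques developed for traditional databases, how efficiently the relationships between nodes are evaluated has a decisive impact on the performance of an XML query engine.
Representative XML query processing techniques today borrow a concept commonly used in information retrieval and store XML data in the form of inverted lists. To evaluate whether two nodes satisfy a given structural relationship, the two inverted lists containing those nodes are retrieved and a structural join is performed on them to obtain the answer. The main problem with this approach is that both inverted lists must be read before the structural join can be performed, which incurs considerable I/O overhead.
In this thesis, we propose a new idea: index all nodes in the XML data according to the structural relationships between nodes. The nodes stored in an inverted list can be classified by their structural relationships with other nodes (which ancestors do they have? which descendants?). An inverted list can therefore be further partitioned into many sub-lists. If the class to which a node belongs is already known, the sub-lists corresponding to that class can be fetched directly, without reading the entire inverted list.
Our experiments show that this approach improves query performance when processing highly irregular XML data.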
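The structural join described above can be illustrated with a minimal sketch. It assumes a region encoding in which each node carries `(start, end, level)` positions and a node contains another when its region encloses the other's; the abstract does not state which encoding the thesis uses, so the `Node` type, the `structural_join` function, and the sample lists below are all hypothetical. Note that the join scans both input lists in full, which is the I/O cost the abstract points to.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    start: int  # position where the element opens (assumed region encoding)
    end: int    # position where the element closes
    level: int  # depth in the document tree


def structural_join(ancestors: List[Node],
                    descendants: List[Node]) -> List[Tuple[Node, Node]]:
    """Return (a, d) pairs where a's region contains d's region.

    Both inputs are inverted lists sorted by `start`, and regions are
    assumed to be properly nested, as they are in a well-formed document.
    """
    result = []
    stack: List[Node] = []  # ancestor candidates whose regions are still open
    i = 0
    for d in descendants:
        # Push every ancestor candidate that opens before d.
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            while stack and stack[-1].end < a.start:
                stack.pop()  # closed before a opened, cannot contain a
            stack.append(a)
            i += 1
        # Discard candidates that closed before d opened.
        while stack and stack[-1].end < d.start:
            stack.pop()
        # Every node left on the stack encloses d.
        result.extend((a, d) for a in stack)
    return result


# Hypothetical inverted lists for labels "b" and "c".
b_list = [Node(2, 5, 2), Node(8, 9, 2)]
c_list = [Node(3, 4, 3), Node(6, 7, 2)]
pairs = structural_join(b_list, c_list)
```

Here only the first `b` node encloses a `c` node, so `pairs` holds a single pair, yet every entry of both lists was read to find it.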
Queries on XML data usually involve selections on the contents (values) of nodes
and on structural relationships between nodes. While querying by values can borrow
from traditional database technologies, efficiently evaluating the structural portion
of a query is critical to the performance of XML query processing engines.
One representative approach to evaluating the structural portion is information
retrieval style processing using inverted lists. Nodes with the same label are collected
in one list. The containment relationship between two nodes is evaluated by performing
a structural join on two lists. The major challenge of this approach is the high I/O cost
of accessing the lists on disk.
In this thesis, we propose a new index structure on the inverted lists based on
structural relationships between nodes. Nodes in one list can be further classified
into sub-lists according to which children, ancestors, or parent they have. Thus, if
we know the class to which candidate nodes belong, we can directly fetch the sub-list
of that class and skip unnecessary entries. Our experiments show the benefit of the
proposed approach on highly irregular XML data.
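The idea of splitting an inverted list into sub-lists by structural class can be sketched as follows. This is only an illustration under stated assumptions, not the index structure from the thesis: it takes the class of a node to be the set of labels of its ancestors, and the names `build_sublists`, `fetch`, and the sample entries are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# label -> class (assumed here: set of ancestor labels) -> sub-list of node ids
Index = Dict[str, Dict[FrozenSet[str], List[int]]]


def build_sublists(entries: Iterable[Tuple[str, int, Set[str]]]) -> Index:
    """Partition each label's inverted list into one sub-list per class."""
    index: Index = defaultdict(lambda: defaultdict(list))
    for label, node_id, ancestor_labels in entries:
        index[label][frozenset(ancestor_labels)].append(node_id)
    return index


def fetch(index: Index, label: str, required_ancestor: str) -> List[int]:
    """Read only the sub-lists of `label` whose class has the required ancestor."""
    hits: List[int] = []
    for cls, sublist in index.get(label, {}).items():
        if required_ancestor in cls:
            hits.extend(sublist)  # sub-lists of other classes are never touched
    return sorted(hits)


# Hypothetical entries: (label, node id, labels of the node's ancestors).
entries = [
    ("c", 3, {"r", "a", "b"}),
    ("c", 6, {"r", "a"}),
    ("c", 11, {"r", "b"}),
    ("b", 2, {"r", "a"}),
]
index = build_sublists(entries)
under_b = fetch(index, "c", "b")
```

In this sketch a query for `c` nodes below a `b` node reads two of the three `c` sub-lists and skips the third. The more irregular the data, the more classes a label splits into, which is consistent with the abstract's remark that the benefit shows on highly irregular XML data.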