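Indexing XML Data Using Structural Relationships between Nodes (利用節點間之結構關係為XML資料建立索引) | National Tsing Hua University Theses and Dissertations Repository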

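Graduate student:	Ying-Yi Chen (陳穎毅)
Thesis title:	Indexing XML Data Using Structural Relationships between Nodes (利用節點間之結構關係為XML資料建立索引)
Advisor:	Arbee L.P. Chen (陳良弼)
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication:	2005
Academic year of graduation:	93 (ROC calendar)
Language:	English
Number of pages:	39
Chinese keywords:	半結構性資料、索引、XML、結構關係
English keywords:	Semi-Structured Data, Index, XML, Structural Relationship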

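Queries on XML data can usually be divided into two parts: the first concerns the contents of nodes (elements), and the second concerns the (structural) relationships between nodes. Content queries can be handled directly with mature techniques from traditional databases; how efficiently the relationships between nodes are evaluated therefore has a decisive impact on the performance of an XML query engine.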

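The more representative XML query processing techniques today borrow a concept common in information retrieval and store XML data in the form of inverted lists. To evaluate whether two nodes satisfy a given structural relationship, the two inverted lists holding those nodes are retrieved and a structural join is performed on them to obtain the answer. The main problem with this approach is that both inverted lists must be read before the structural join can be performed, which incurs considerable I/O overhead.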
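
As an illustration of the structural join described above, the following is a minimal Python sketch. It assumes a region encoding in which each node carries (start, end, level) positions, and a stack-based merge over two lists sorted by start position. The record does not specify the encoding or the join algorithm used in the thesis, so the `Node` type, the `structural_join` function, and the sample data are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    """Region-encoded node (an assumed encoding, not taken from the thesis)."""
    start: int  # position of the opening tag
    end: int    # position of the closing tag
    level: int  # depth in the document tree


def structural_join(ancestors: List[Node],
                    descendants: List[Node]) -> List[Tuple[Node, Node]]:
    """Return all (ancestor, descendant) pairs.

    Both input lists must be sorted by start position. Node a contains
    node d when a.start < d.start and d.end < a.end.
    """
    result: List[Tuple[Node, Node]] = []
    stack: List[Node] = []  # candidate ancestors, each nested in the one below
    i = 0
    for d in descendants:
        # Push every candidate ancestor that opens before d.
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            while stack and stack[-1].end < a.start:
                stack.pop()  # this region closed before a opened
            stack.append(a)
            i += 1
        # Discard candidates that closed before d opened.
        while stack and stack[-1].end < d.start:
            stack.pop()
        # Every node left on the stack contains d.
        result.extend((a, d) for a in stack)
    return result


# Hypothetical inverted lists for the labels "section" and "title".
sections = [Node(2, 9, 2), Node(5, 8, 3)]
titles = [Node(3, 4, 3), Node(6, 7, 4), Node(10, 11, 2)]

for a, d in structural_join(sections, titles):
    print(a, "contains", d)
```

Both lists are scanned from beginning to end even though the last title has no section ancestor, which is the kind of read cost the paragraph above identifies.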

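In this thesis, we propose a new idea: build an index over all nodes in the XML data according to the structural relationships between nodes. The nodes stored in an inverted list can be classified by their structural relationships to other nodes (which ancestors do they have? which descendants?). An inverted list can therefore be further divided into many sub-lists. If the class to which a node belongs is already known, the sub-lists corresponding to that class can be fetched directly, without reading the whole inverted list.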

Our experiments show that this approach improves query performance when processing highly irregular XML data.

Queries on XML data usually involve selections on the contents (values) of nodes
and on the structural relationships between nodes. While querying by value can borrow
from traditional database technologies, efficiently evaluating the structural portion
of a query is critical to the performance of XML query processing engines.
One representative approach to evaluating the structural portion is information
retrieval style processing using inverted lists. Nodes with the same label are collected in
one list. The containment relationship between two nodes is evaluated by performing
a structural join on two lists. The major challenge of this approach is the high I/O cost
of accessing the lists on disk. In this thesis, we propose a new index structure on the
inverted lists based on structural relationships between nodes. Nodes in one list can be
further classified into sub-lists according to which children, ancestors, or parent they have. Thus, if
we know the class to which candidate nodes belong, we can directly fetch the sub-list
of that class and skip unnecessary entries. Our experiments show the benefit of the
proposed approach on highly irregular XML data.
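
The sub-list idea in the abstract can be sketched as follows. This is a minimal Python illustration and not the thesis's actual index structure: it assumes each entry of an inverted list is tagged with the set of labels of its ancestors, groups entries by that set, and answers a query by reading only the groups whose label set covers the required ancestors. The names `build_sublists` and `fetch_candidates` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

# One inverted-list entry: (node id, labels of the node's ancestors).
Entry = Tuple[int, FrozenSet[str]]


def build_sublists(entries: Iterable[Entry]) -> Dict[FrozenSet[str], List[int]]:
    """Split one inverted list into sub-lists keyed by ancestor-label set."""
    sublists: Dict[FrozenSet[str], List[int]] = defaultdict(list)
    for node_id, ancestor_labels in entries:
        sublists[ancestor_labels].append(node_id)
    return dict(sublists)


def fetch_candidates(sublists: Dict[FrozenSet[str], List[int]],
                     required: Iterable[str]) -> List[int]:
    """Return ids from the sub-lists whose class contains every required label."""
    required_set = frozenset(required)
    hits: List[int] = []
    for labels, ids in sublists.items():
        if required_set <= labels:  # subset test on the class key
            hits.extend(ids)
    return sorted(hits)


# Hypothetical inverted list for the label "title".
title_entries: List[Entry] = [
    (3, frozenset({"book", "section"})),
    (6, frozenset({"book", "section"})),
    (10, frozenset({"book"})),
    (15, frozenset({"article"})),
]
sublists = build_sublists(title_entries)

# Titles that have a "section" ancestor: only one sub-list is read.
print(fetch_candidates(sublists, ["section"]))  # [3, 6]
```

The subset test on class keys is done here with plain set comparison. The table of contents indicates that the thesis handles subset queries with Bloom filters, which this sketch does not attempt to reproduce.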

Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1. Introduction
2. Preliminaries
3. Indexing By Set-Valued Attributes of Nodes
4. Subset Queries Using Bloom Filters
4-1 Bloom Filter Basics
4-2 The False Positive Rate
4-3 The Subset Query
4-4 The Bit-Sliced Bloom Filter File (BSBFF)
4-5 Integrating BSBFF into the ISR Scheme
5. The Mixed Approach
5-1 Analysis
5-2 The Mixed Approach
6. Experiments
7. Related Works
8. Conclusion
References

                                

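Full-text release date: full text not authorized for public release (campus network)
Full-text release date: full text not authorized for public release (off-campus network)
Full-text release date: full text not authorized for public release (National Central Library: Taiwan theses and dissertations system)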
