
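Graduate student: 莊若純
Thesis title: 藉由雙層邊界機制於交易資料庫之快速查詢系統
Efficient Query Processing in Transactional Databases by a Two-Level Bounding Mechanism
Advisor: 陳良弼
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: Chinese
Number of pages: 52
Keywords (Chinese): 交易相似度查詢、邊界機制、交易分群
Keywords (English): similarity search in transactions, bounding mechanism, transaction clustering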
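  • Similarity search over the transactions in a transaction database has long been an important problem. Most existing methods first estimate a distance lower bound between the query and the transactions, remove the transactions that cannot be answers, and then search the remainder for transactions that are similar enough. These methods also consider the problem only on static transaction databases, so they are not suitable for dynamic transaction databases: updates there are frequent, and these methods are not fast enough.
    In this thesis, we propose a method that achieves fast range search by means of a two-level bounding mechanism. First, we group the transactions into clusters and represent each cluster by the intersection and the union of all items in its transactions. The minimum and maximum distances between the query and the transactions in a cluster can be estimated from the sets in this representation. At this first level of the bounding mechanism, we can tell which clusters contain no transaction that is similar enough to the query and which clusters contain only transactions that are answers to the similarity query.
    Within each cluster, we further divide the transactions into smaller groups and record the distance relationship between each transaction in the cluster and the sets that represent the cluster. The minimum and maximum distances between the query and the transactions in each small group can be estimated from these records. Clusters retained by the first-level bounding mechanism are checked further at the second level using these records. We can then tell which small groups contain no transaction similar enough to the query and which contain only answers. The only transactions whose real distance to the query must be computed are those in the small groups that the second-level bounding mechanism still cannot decide.
    Our mechanism is also fast under data updates. Experimental results show that it performs better than previous methods in insertion, deletion, query processing, and pruning effect.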


    Similarity search in transaction databases has long been an important problem. Most existing methods use a branch-and-bound technique to prune transactions that cannot be answers. However, they estimate only distance lower bounds for pruning, and they consider only static databases. As a result, they are not efficient enough in a dynamic environment where the transaction database is updated frequently.
    In this thesis, we develop a novel two-level bounding mechanism for efficient range query processing. We first group the transactions in the database into clusters. Each cluster is represented by the intersection and the union of the items in its transactions, and from this representation we derive lower and upper bounds on the distance between a given query and the transactions in the cluster. The first-level bounding mechanism can then determine efficiently whether a cluster contains no transaction within the distance threshold of the query, or whether all the transactions in the cluster are answers to the query.
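The first-level test can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not name the distance function, so the sketch assumes Hamming distance (the size of the symmetric difference between two item sets), and the names `cluster_bounds` and `classify_cluster` are hypothetical. The bounds rely only on the fact that every transaction in a cluster contains the cluster's intersection and is contained in its union.

```python
def hamming(a: frozenset, b: frozenset) -> int:
    """Number of items that appear in exactly one of the two sets (assumed distance)."""
    return len(a ^ b)


def cluster_bounds(query: frozenset, inter: frozenset, union: frozenset):
    """Bounds on hamming(query, t) for every t with inter <= t <= union."""
    # Query items outside the union are missing from every t;
    # intersection items outside the query are present in every t.
    lower = len(query - union) + len(inter - query)
    # At worst, t lacks every query item not in the intersection
    # and contains every union item not in the query.
    upper = len(query - inter) + len(union - query)
    return lower, upper


def classify_cluster(query, inter, union, eps):
    """Decide the fate of a whole cluster for a range query with threshold eps."""
    lower, upper = cluster_bounds(query, inter, union)
    if lower > eps:
        return "prune"        # no transaction can be within eps
    if upper <= eps:
        return "accept_all"   # every transaction is within eps
    return "refine"           # undecided at the first level


# Hypothetical cluster of three transactions over single-letter items.
transactions = [frozenset("abc"), frozenset("abd"), frozenset("abce")]
inter = frozenset.intersection(*transactions)  # items shared by all transactions
union = frozenset.union(*transactions)         # items appearing in any transaction
query = frozenset("abx")
print(cluster_bounds(query, inter, union))
```

Only clusters classified as "refine" need further work, which is where a second level of bounds becomes useful.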
    Within a cluster, we further group the transactions into sets. Using the distance correlations between each set of transactions and the representative features of the cluster, we further derive lower and upper bounds on the distance between the given query and the transactions in the set. The second-level bounding mechanism can thus determine efficiently whether a set contains no transaction within the distance threshold of the query, or whether all the transactions in the set are answers to the query. Real distances to the query need to be computed only for the transactions in the remaining sets.
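The abstract does not spell out how the second-level bounds are computed (the thesis's table of contents refers to a "difference-value pair"). As a stand-in, the sketch below uses a plain triangle-inequality bound: it assumes Hamming distance, uses a single representative set `rep` for the cluster (for example its intersection), and stores for each set of transactions the smallest and largest distance to `rep`. All names here are hypothetical.

```python
from dataclasses import dataclass


def hamming(a: frozenset, b: frozenset) -> int:
    """Size of the symmetric difference (assumed distance; it is a metric)."""
    return len(a ^ b)


@dataclass
class SubGroup:
    members: list   # transactions in this set
    d_min: int      # smallest distance from a member to the representative
    d_max: int      # largest distance from a member to the representative


def make_subgroup(members, rep):
    dists = [hamming(t, rep) for t in members]
    return SubGroup(list(members), min(dists), max(dists))


def subgroup_bounds(query, rep, group):
    """Triangle-inequality bounds on hamming(query, t) for every member t."""
    dq = hamming(query, rep)
    lower = max(0, dq - group.d_max, group.d_min - dq)
    upper = dq + group.d_max
    return lower, upper


def range_query(query, rep, groups, eps):
    """Return all members within eps of query, computing real distances
    only for the sets that the bounds cannot decide."""
    answers = []
    for g in groups:
        lower, upper = subgroup_bounds(query, rep, g)
        if lower > eps:
            continue                      # whole set pruned
        if upper <= eps:
            answers.extend(g.members)     # whole set accepted
        else:
            answers.extend(t for t in g.members if hamming(query, t) <= eps)
    return answers
```

For example, with `rep = frozenset("ab")`, one set holding `frozenset("abc")` and `frozenset("abd")`, and another holding `frozenset("abcdef")`, a query `frozenset("ab")` with `eps = 1` accepts the first set and prunes the second without computing any member distance.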
    The proposed mechanism is also efficient to maintain. The cluster for a new transaction can be determined easily using the bounding mechanism. Updating after a transaction is inserted into or deleted from a cluster is also easy, since only the sets in that cluster are affected. Experimental results show that our approach outperforms previous work in query processing time, pruning effect, and update processing time.
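One way to keep a cluster's intersection and union current under insertions and deletions is to store a count per item. This is an assumption for illustration only; the abstract does not describe the bookkeeping the thesis uses, and the class name `ClusterSummary` is hypothetical.

```python
from collections import Counter


class ClusterSummary:
    """Per-item counts from which the cluster's intersection and union follow."""

    def __init__(self):
        self.size = 0                 # number of transactions in the cluster
        self.item_counts = Counter()  # item -> number of transactions containing it

    def insert(self, transaction: frozenset) -> None:
        self.size += 1
        self.item_counts.update(transaction)

    def delete(self, transaction: frozenset) -> None:
        # Assumes the transaction was previously inserted into this cluster.
        self.size -= 1
        self.item_counts.subtract(transaction)
        for item in transaction:
            if self.item_counts[item] == 0:
                del self.item_counts[item]

    @property
    def union(self) -> frozenset:
        # Every item that still occurs in at least one transaction.
        return frozenset(self.item_counts)

    @property
    def intersection(self) -> frozenset:
        # Items that occur in every transaction of the cluster.
        return frozenset(i for i, c in self.item_counts.items() if c == self.size)
```

With counts, a deletion touches only the items of the deleted transaction, so the summary never has to be rebuilt from the whole cluster.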

    Abstract
    Acknowledgement
    Contents
    List of Figures
    List of Tables
    1. Introduction
    2. Related Work
    2.1 Signature Table
    2.2 SG-tree
    2.2.1 Insertion, Deletion and Update
    2.2.2 Query Processing
    3. The First-Level Bounding Mechanism
    3.1 Features and Properties of The Cluster
    3.2 The First-Level Query Processing
    3.3 The First-Level Transaction Clustering
    4. The Second-Level Bounding Mechanism
    4.1 The Difference-Value Pair for The Second-Level Transaction Clustering
    4.2 The Second-Level Query Processing
    5. Insertion, Deletion and The Maintenance of Data Update
    5.1 The Maintenance of Data Update
    5.2 Insertion and Deletion
    6. Experimental Results
    6.1 Pruning Effects
    6.2 Correlation among □, The Number of Clusters and The Query Processing Time
    6.3 Correlation between The Node Size and The Query Processing Time of SG-tree
    6.4 Comparison and Analysis
    7. Conclusion
    Reference

