Graduate student: | 葉志剛 Chih-Kang Yeh |
---|---|
Thesis title: | 處理多串流環境下探勘頻繁資料集之大量查詢 Processing Multiple Queries of Finding Frequent Itemsets over Multiple Data Streams |
Advisor: | 陳良弼 Arbee L.P. Chen |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2005 |
Academic year of graduation: | 93 |
Language: | Chinese |
Number of pages: | 47 |
Keywords (Chinese): | 資料串流、項目集、多查詢、多串流、計畫最佳化 |
Keywords (English): | data streams, itemset, multiple queries, multiple streams, plan optimization |
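Mining over data streams has been a popular research topic in recent years, and a variety of studies have been published in related areas. However, most of this work focuses only on a single data stream and ignores the constraints of a multi-stream environment, so many published methods cannot be applied in practice. This motivated the present thesis. We observe that a good execution plan is important in a multi-stream environment, so we propose the concept of a plan graph to help enumerate all possible execution plans, and we further prune this graph. Finally, we propose an intuitive method that quickly finds a near-optimal execution plan. At the end of the thesis, we verify our theory experimentally.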
Data stream management has attracted considerable interest in recent years, and mining frequent itemsets has been widely studied in this field. However, most previous work focuses on mining frequent itemsets over a single data stream, which can be impractical in real applications such as network management and market analysis. In the last few years, several researchers have noticed this problem and proposed approaches to address it. Mining frequent itemsets over multiple data streams is one of the major objectives of our work. Furthermore, we treat a request for frequent itemsets over multiple data streams as a special type of query, and we propose a framework for processing multiple such queries over multiple data streams. The major difficulty is handling a large amount of data from several data streams while producing the mining results efficiently; thus, an efficient processing plan is essential. In this thesis, we introduce the concept of the Plan Graph for enumerating all possible processing plans. Using the Plan Graph, we transform the problem of finding the optimal processing plan into the problem of finding the minimum-cost tree on a graph. Although the Plan Graph successfully enumerates all possible processing plans, finding the optimal one can be time-consuming. Therefore, we propose a heuristic approach that efficiently finds a plan close to the optimal one. Finally, we verify the efficiency of our approach through experiments.
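The idea of casting plan selection as a minimum-cost tree problem can be illustrated with a small sketch. The abstract does not specify the Plan Graph's structure, its cost model, or the thesis's heuristic, so everything below is an assumption: the graph is a hypothetical dictionary of directed edges with made-up costs, the node names (`streams`, `merged_counts`, `q1`, `q2`) are invented for illustration, and the heuristic shown is a generic greedy shortest-path heuristic for directed Steiner trees, not necessarily the one proposed in the thesis.

```python
import heapq

def shortest_paths(graph, sources):
    """Multi-source Dijkstra over graph = {u: {v: cost}} with non-negative costs."""
    dist = {s: 0 for s in sources}
    parent = {}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

def greedy_plan(graph, root, terminals):
    """Grow a tree from root by repeatedly attaching the cheapest-to-reach
    query node; nodes already in the tree are reused at no extra cost."""
    tree_nodes = {root}
    tree_edges = set()
    remaining = set(terminals) - tree_nodes
    while remaining:
        dist, parent = shortest_paths(graph, tree_nodes)
        reachable = [t for t in remaining if t in dist]
        if not reachable:
            raise ValueError("some query node is unreachable from the root")
        node = min(reachable, key=lambda t: (dist[t], t))
        # Walk back along the shortest path until we hit the existing tree.
        while node not in tree_nodes:
            tree_edges.add((parent[node], node))
            tree_nodes.add(node)
            node = parent[node]
        remaining -= tree_nodes
    cost = sum(graph[u][v] for u, v in tree_edges)
    return tree_edges, cost

# Hypothetical plan graph: two queries can share one intermediate result.
PLAN_GRAPH = {
    "streams": {"merged_counts": 3, "q2": 6},
    "merged_counts": {"q1": 2, "q2": 2},
}

edges, cost = greedy_plan(PLAN_GRAPH, "streams", ["q1", "q2"])
print(sorted(edges), cost)
```

In this toy graph, answering the two queries independently would cost 5 + 5 = 10, while the tree that shares `merged_counts` costs 7. A greedy heuristic of this kind is fast but offers no optimality guarantee in general, which matches the abstract's framing of a plan "close to the optimal one".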