Graduate student: | 黃偉哲 Webber Hunag |
---|---|
Thesis title: | 搜尋K個最頻繁閉項目集的複合方法 A Hybrid Method for Top-K Frequent Closed Patterns Mining |
Advisor: | 許奮輝 Simon Sheu |
Oral defense committee: | |
Degree: | 碩士 Master |
Department: | 電機資訊學院 - 資訊工程學系 Computer Science |
Year of publication: | 2005 |
Academic year of graduation: | 93 |
Language: | Chinese |
Number of pages: | 26 |
Chinese keywords: | 資料探勘、最小支持、頻繁、閉、最低長度 |
Foreign-language keywords: | Data Mining, Minimum support, Frequent, Closed, Minimal Length |
As data volumes keep growing, databases grow ever larger, and quickly finding the data a user is interested in within a huge database has become an important problem. Data mining arose as a research field in response, and this thesis falls within it. In the paper "A Hybrid Method for Mining Frequent Closed Patterns", we combined the horizontal-format and vertical-format approaches to improve mining efficiency. In 2002, Han's ICDM paper "Mining Top-K Frequent Closed Patterns without Minimum Support" introduced a new task, called TFP: find the K most frequent closed patterns whose length is no smaller than Minimal Length. The two new parameters, K (the number of answers) and Minimal Length (the length constraint), are set by the user. Unlike traditional methods, the user does not need to specify a minimum support; instead, the minimum support is raised dynamically during mining to speed up the search. Our method is essentially the same as the one in "A Hybrid Method for Mining Frequent Closed Patterns" and searches by intersecting patterns. Because of the added Minimal Length constraint, we can prune during the intersection process a large number of intersection results that are shorter than Minimal Length, which greatly improves mining efficiency.
Our experiments show that on general datasets our efficiency improves substantially as Minimal Length increases. On special datasets, in which transactions are long but the number of transactions is small, our method outperforms TFP.
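The intersection-based search with Minimal Length pruning described above can be illustrated with a minimal sketch. It is not the thesis's algorithm: the function name `top_k_closed` and its parameters are hypothetical, it does not use the hybrid horizontal/vertical layout, and it does not raise the minimum support dynamically. It relies only on two facts: every closed itemset is an intersection of one or more transactions, and intersecting further can only shorten an itemset, so any intersection already shorter than `min_len` can be discarded.

```python
from heapq import nlargest


def top_k_closed(transactions, k, min_len):
    """Return up to k (support, sorted_items) pairs, highest support first.

    Illustrative sketch only: closed itemsets are built by intersecting
    transactions, and results shorter than min_len are pruned at once.
    """
    rows = [frozenset(t) for t in transactions]
    closed = set()
    for t in rows:
        if len(t) < min_len:
            continue  # cannot contain any pattern of the required length
        new = {t}
        for c in closed:
            inter = c & t
            # Length pruning: a short intersection can never grow again.
            if len(inter) >= min_len:
                new.add(inter)
        closed |= new
    # Support of a pattern = number of transactions that contain it.
    support = {c: sum(1 for r in rows if c <= r) for c in closed}
    return nlargest(k, ((s, sorted(c)) for c, s in support.items()))


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c", "d"}, {"b", "c"}]
    for sup, items in top_k_closed(data, k=3, min_len=2):
        print(sup, items)
```

Raising `min_len` shrinks the set of intermediate intersections that must be kept, which is the effect the abstract attributes to the Minimal Length constraint.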
[1] J. Wang, J. Han, Y. Lu, and P. Tzvetkov, "TFP: An Efficient Algorithm for Mining Top-K Frequent Closed Itemsets," IEEE Transactions on Knowledge and Data Engineering, vol. 17, pp. 652-664, 2005.
[2] J. Yang, W. Wang, and P. S. Yu, "STAMP: On Discovery of Statistically Important Pattern Repeats in Long Sequential Data," presented at SIAM International Conference on Data Mining (SDM), San Francisco, CA, USA, 2003.
[3] Q. Zou, W. W. Chu, and B. Lu, "SmartMiner: A Depth First Algorithm Guided by Tail Information for Mining Maximal Frequent Itemsets," presented at ICDM, Maebashi City, Japan, 2002.
[4] W.-G. Teng, M.-S. Chen, and P. S. Yu, "A Regression-Based Temporal Pattern Mining Scheme for Data Streams," presented at VLDB, Berlin, Germany, 2003.
[5] Z. Zheng, R. Kohavi, and L. Mason, "Real World Performance of Association Rule Algorithms," presented at KDD, San Francisco, CA, USA, 2001.
[6] J. Pei, G. Dong, W. Zou, and J. Han, "On Computing Condensed Frequent Pattern Bases," presented at ICDM, Maebashi City, Japan, 2002.
[7] F. Bonchi and C. Lucchese, "On Closed Constrained Frequent Pattern Mining," presented at ICDM, Brighton, UK, 2004.
[8] Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz, "Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window," presented at ICDM, Brighton, UK, 2004.
[9] J. Han, J. Wang, Y. Lu, and P. Tzvetkov, "Mining Top-K Frequent Closed Patterns without Minimum Support," presented at ICDM, Maebashi City, Japan, 2002.
[10] H. Xiong, P.-N. Tan, and V. Kumar, "Mining Strong Affinity Association Patterns in Data Sets with Skewed Support Distribution," presented at ICDM, Melbourne, Florida, USA, 2003.
[11] J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without Candidate Generation," presented at SIGMOD, Dallas, Texas, USA, 2000.
[12] A. Pietracaprina and D. Zandolin, "Mining Frequent Itemsets using Patricia Tries," presented at ICDM Workshop on Frequent Itemset Mining Implementations (FIMI), Melbourne, Florida, USA, 2003.
[13] G. Grahne and J. Zhu, "Mining Frequent Itemsets from Secondary Memory," presented at ICDM, Brighton, UK, 2004.
[14] J. Pei, J. Han, and L. V. S. Lakshmanan, "Mining Frequent Item Sets with Convertible Constraints," presented at ICDE, Heidelberg, Germany, 2001.
[15] J. Liu, Y. Pan, K. Wang, and J. Han, "Mining Frequent Item Sets by Opportunistic Projection," presented at KDD, Edmonton, Alberta, Canada, 2002.
[16] J.-F. Boulicaut and B. Jeudy, "Mining Free Itemsets under Constraints," presented at International Database Engineering and Applications Symposium (IDEAS), Grenoble, France, 2001.
[17] B. Liu, W. Hsu, and Y. Ma, "Mining Association Rules with Multiple Minimum Supports," presented at KDD, San Diego, CA, USA, 1999.
[18] B. Goethals, "Memory Issues in Frequent Itemset Mining," presented at ACM Symposium on Applied Computing (SAC), Nicosia, Cyprus, 2004.
[19] M. Seno and G. Karypis, "LPMiner: An Algorithm for Finding Frequent Itemsets Using Length-Decreasing Support Constraint," presented at ICDM, San Jose, California, USA, 2001.
[20] M. ElHajj and O. R. Zaiane, "Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining," presented at KDD, Washington, DC, USA, 2003.
[21] T. Mielikainen, "Intersecting Data to Closed Sets with Constraints," presented at ICDM Workshop on Frequent Itemset Mining Implementations (FIMI), Melbourne, Florida, USA, 2003.
[22] W. Cheung and O. R. Zaiane, "Incremental Mining of Frequent Patterns Without Candidate Generation or Support Constraint," presented at International Database Engineering and Applications Symposium (IDEAS), Hong Kong, China, 2003.
[23] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, "H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases," presented at ICDM, San Jose, California, USA, 2001.
[24] F. Bonchi and B. Goethals, "FP-Bonsai: the Art of Growing and Pruning Small FP-Trees," presented at Advances in Knowledge Discovery and Data Mining, Pacific-Asia Conference (PAKDD), Sydney, Australia, 2004.
[25] M. J. Zaki and K. Gouda, "Fast Vertical Mining Using Diffsets," presented at KDD, Washington, DC, USA, 2003.
[26] G. Cong, A. K. H. Tung, X. Xu, F. Pan, and J. Yang, "FARMER: Finding Interesting Rule Groups in Microarray Datasets," presented at SIGMOD, Paris, France, 2004.
[27] A. Moffat and L. Stuiver, "Exploiting Clustering in Inverted File Compression," presented at Data Compression Conference (DCC), Snowbird, Utah, USA, 1996.
[28] G. Grahne and J. Zhu, "Efficiently Using Prefix-trees in Mining Frequent Itemsets," presented at ICDM Workshop on Frequent Itemset Mining Implementations (FIMI), Melbourne, Florida, USA, 2003.
[29] C. Bucila, J. Gehrke, D. Kifer, and W. White, "DualMiner: A Dual-Pruning Algorithm for Itemsets with Constraints," presented at KDD, Edmonton, Alberta, Canada, 2002.
[30] J. Li and Y. Zhang, "Direct Interesting Rule Generation," presented at ICDM, Melbourne, Florida, USA, 2003.
[31] J. Wang, J. Han, and J. Pei, "CLOSET+: Searching for the Best Strategies for Mining Frequent Closed Itemsets," presented at KDD, Washington, DC, USA, 2003.
[32] J. Pei, J. Han, and R. Mao, "CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets," presented at SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), Dallas, Texas, USA, 2000.
[33] M. J. Zaki and C.-J. Hsiao, "CHARM: An Efficient Algorithm for Closed Itemset Mining," presented at SIAM International Conference on Data Mining (SDM), Arlington, VA, USA, 2002.
[34] W.-Y. Kim, Y.-K. Lee, and J. Han, "CCMine: Efficient Mining of Confidence-Closed Correlated Patterns," presented at Advances in Knowledge Discovery and Data Mining, Pacific-Asia Conference (PAKDD), Sydney, Australia, 2004.
[35] F. Pan, G. Cong, A. K. H. Tung, J. Yang, and M. J. Zaki, "CARPENTER: Finding Closed Patterns in Long Biological Datasets," presented at KDD, Washington, DC, USA, 2003.
[36] J. Wang and G. Karypis, "BAMBOO: Accelerating Closed Itemset Mining by Deeply Pushing the Length-Decreasing Support Constraint," presented at SIAM International Conference on Data Mining (SDM), Lake Buena Vista, Florida, USA, 2004.
[37] P.-Y. Hsu, Y.-L. Chen, and C.-C. Ling, "Algorithms for Mining Association Rules in Bag Databases," Information Sciences, vol. 166, pp. 31-47, 2004.
[38] S. Orlando, P. Palmerini, R. Perego, and F. Silvestri, "Adaptive and Resource-Aware Mining of Frequent Sets," presented at ICDM, Maebashi City, Japan, 2002.