
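Author: 劉家燕
Jane Liu
Thesis title: 交易間關聯法則的探勘與資料探勘問題的分類之研究
A Study on Inter-transaction Association Rule Mining and the Classification of the Data Mining Problems
Advisor: 陳良弼
Arbee L. P. Chen
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2000
Academic year of graduation: 88
Language: Chinese
Number of pages: 34
Chinese keywords: inter-transaction association rules, data mining, problem classification
English keywords: inter-transaction association rules, non-incremental counting procedure, pattern mining, problem classification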
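  • As demand for commercial trend analysis applications grows, mining inter-transaction association rules has become one of the important topics in data mining. When the association rules being sought no longer occur within a single transaction but span multiple transactions, a new challenge arises.
    Previous studies solved the inter-transaction association rule problem incrementally with Apriori-like methods. In this thesis we propose two graph-based algorithms that mine, respectively, constrained and maximal inter-transaction association rules. Users sometimes pose queries such as "If stock A rises today, which two stocks can be bought in the following days?" For constrained queries of this kind, we hope that the proposed algorithms find results matching the user's needs more efficiently. In addition, mining maximal inter-transaction association rules is in fact another strategy for mining all inter-transaction association rules. The experimental data show that our algorithms find longer inter-transaction association rules faster than the Apriori-like approach. In this thesis we also examine and analyze data mining problems that extend from multiple transactions or multiple sequences, and divide them into five classes.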
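The stock query above can be illustrated with a small brute-force sketch. It is not the graph-based algorithm of the thesis, whose details the abstract does not give. It assumes the common sliding-window formulation in which each item is tagged with its day offset inside a window of `w` consecutive transactions, and support is the fraction of window start positions that contain the pattern. The names `windows`, `constrained_pairs`, and the sample data are hypothetical.

```python
from itertools import combinations


def windows(transactions, w):
    """Yield, for each start day, the set of (item, offset) pairs
    seen in the next w days (assumed sliding-window model)."""
    n = len(transactions)
    for start in range(n):
        yield {
            (item, offset)
            for offset in range(w)
            if start + offset < n
            for item in transactions[start + offset]
        }


def constrained_pairs(transactions, w, antecedent, min_support):
    """Brute force: pairs of later-day items that co-occur with
    `antecedent` on day 0 of a window, with their support."""
    n = len(transactions)
    counts = {}
    for win in windows(transactions, w):
        if (antecedent, 0) not in win:
            continue  # the constraint: antecedent must occur "today"
        later = sorted(e for e in win if e[1] > 0)
        for pair in combinations(later, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


# Hypothetical data: one set of stock movements per trading day.
days = [{"A+"}, {"B+", "C+"}, {"A+"}, {"B+", "C+"}, {"D+"}]
result = constrained_pairs(days, w=2, antecedent="A+", min_support=0.3)
```

Here `result` maps the pair `(("B+", 1), ("C+", 1))` to its support, meaning that B and C both rose one day after A rose. Enumerating every pair per window is only workable for tiny inputs, which is the cost an index-based method aims to avoid.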


    With the growing interest in commercial trend analysis, mining inter-transaction association rules in transaction databases has become an important data mining research topic. Mining inter-transaction association rules creates challenging new problems because the rules involve data items in multiple transactions.
    Previous studies adopt the Apriori-like generate-and-test approach that was developed for mining intra-transaction associations. In this study, we propose graph-based algorithms to discover specific large-k inter-transaction itemsets and maximal large inter-transaction itemsets (a maximal large inter-transaction itemset is one that is not a proper subset of any other large inter-transaction itemset). Most users are interested in certain specified rules; hence, discovering specific large-k inter-transaction itemsets efficiently helps satisfy users' requirements. Moreover, finding maximal large inter-transaction itemsets is another way to mine all large inter-transaction itemsets. Our experiments show that our algorithms perform well, especially for potentially long large inter-transaction itemsets. In addition, we analyze approaches to mining various patterns from transaction or sequence databases and, as a result, classify these problems into five classes.
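The definition of a maximal large itemset, and the reason the maximal sets suffice to recover all large itemsets, can be shown with a minimal sketch. It assumes the large itemsets are already known and relies on the standard downward-closure property (every non-empty subset of a large itemset is also large). The function names and the `(item, offset)` encoding are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def maximal_itemsets(large_itemsets):
    """Keep only itemsets that are not a proper subset of another one."""
    sets = {frozenset(s) for s in large_itemsets}
    return {s for s in sets if not any(s < other for other in sets)}


def expand(maximal):
    """Recover every large itemset as a non-empty subset of a maximal one."""
    out = set()
    for m in maximal:
        items = sorted(m)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                out.add(frozenset(combo))
    return out


# Hypothetical large itemsets over (item, day-offset) pairs.
large = [
    {("a", 0)},
    {("b", 1)},
    {("a", 0), ("b", 1)},
    {("c", 2)},
]
maximal = maximal_itemsets(large)
recovered = expand(maximal)
```

In this example `maximal` holds two itemsets and `recovered` contains the same four itemsets as `large`. The expansion recovers the itemsets only, not their support counts, so an additional counting pass would be needed to derive rule confidences.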

    ABSTRACT
    ACKNOWLEDGEMENTS
    CONTENTS
    LIST OF FIGURES
    CHAPTER 1. INTRODUCTION
    CHAPTER 2. RELATED WORK
    2.1 PROBLEM CLASSIFICATION
    2.2 RELATIONSHIPS
    2.2.1 Relationship between Inter-transaction Association Rules and Sequential Patterns
    2.2.2 Relationship between Partial Periodical Patterns and Inter-transaction Association Rules
    2.2.3 Relationship between Browsing Behaviors and Sequential Patterns
    CHAPTER 3. GRAPH-BASED ALGORITHMS
    3.1 LARGE-1 EITEMSET GENERATION PHASE
    3.2 GRAPH-INDEX CONSTRUCTION PHASE
    3.3 LARGE-K EITEMSET GENERATION PHASE
    3.4 MAXIMAL LARGE EITEMSETS GENERATION PHASE
    CHAPTER 4. EXPERIMENTAL RESULTS
    4.1 SYNTHETIC DATA GENERATION
    4.2 EFFECT OF LARGE-2 GENERATION
    4.3 EFFECT OF LARGE-K GENERATION AND DATABASE SIZE
    4.4 EFFECT OF WINDOW SIZE
    4.5 EFFECT OF MINIMUM SUPPORT
    4.6 EFFECT OF FINDING MAXIMAL LARGE EITEMSETS
    CHAPTER 5. CONCLUSION AND FUTURE WORK
    BIBLIOGRAPHY


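    Full-text release date: full text not authorized for release (campus network)
    Full-text release date: full text not authorized for release (off-campus network)
    Full-text release date: full text not authorized for release (National Central Library: Taiwan theses and dissertations system)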