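Graduate student: | 羅子澄 |
---|---|
Thesis title: | 運用於Hadoop雲端運算的資料探勘混合編碼演算法 A hybrid algorithm with TID Apriori algorithm And Binary encoded algorithm on Hadoop |
Advisor: | 唐傳義 |
Oral examination committee: | 唐傳義, 李家同, 盧錦隆 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2012 |
Academic year of graduation: | 100 (ROC calendar, i.e. 2011-2012) |
Language: | English |
Number of pages: | 39 |
Chinese keywords: | 平行程式 (parallel programming), 關聯式法則 (association rules), 資料探勘 (data mining) |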
Data mining has become a very important research field, and within it, the discovery of association rules is one of the main research directions.
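In this thesis we study and improve the Apriori algorithm, one of the principal algorithms for discovering association rules. Apriori suffers a serious performance bottleneck when processing large amounts of data, caused by the large volume of intermediate data (chiefly candidate itemsets) it generates while it runs. Unfortunately, real-world applications usually involve very large datasets.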
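For readers unfamiliar with Apriori, the level-wise loop below is a minimal Python sketch of the classic algorithm, not the thesis's implementation. It makes the bottleneck described above concrete: every level builds a candidate set and rescans the whole database to count it. The function name `apriori` and the use of an absolute support count (rather than a fraction) are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Minimal level-wise Apriori sketch (illustrative, not the thesis code):
    candidates of size k are built from frequent (k-1)-itemsets, then the
    whole database is rescanned to count them.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: a full database scan per level (the expensive part).
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

On dense data the candidate sets at small k can be far larger than the sets that survive the support threshold, which is why the counting loop dominates the running time.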
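We therefore propose a hybrid algorithm that combines TID Apriori, one of the Apriori-like algorithms, with a binary encoding scheme. In our experiments, the hybrid outperforms both TID Apriori and the original Apriori algorithm. However, the proposed algorithm is still Apriori-like, so it still shares the performance bottleneck common to all Apriori-like algorithms.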
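The abstract does not spell out the binary encoding, so the sketch below shows one common way to pair TID-based counting with binary encoding. This is an assumption, not necessarily the thesis's exact scheme: each item's set of transaction IDs (TIDs) is stored as a bit vector, and a candidate's support is the number of set bits in the AND of its parents' bit vectors. Like AprioriTid, it reads the raw transactions only once. The helper names `encode_tid_bitmaps`, `support`, and `frequent_itemsets_bitmap` are hypothetical.

```python
from itertools import combinations


def encode_tid_bitmaps(transactions):
    """Map each item to an int whose bit i is set iff transaction i contains it."""
    bitmaps = {}
    for tid, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bitmap):
    # Population count: the number of transactions covered by this bitmap.
    return bin(bitmap).count("1")


def frequent_itemsets_bitmap(transactions, min_support):
    """Level-wise mining in which each itemset carries its own TID bitmap,
    so support comes from AND-ing bitmaps instead of rescanning the data."""
    level = {
        frozenset([item]): bits
        for item, bits in encode_tid_bitmaps(transactions).items()
        if support(bits) >= min_support
    }
    result = {s: support(b) for s, b in level.items()}
    k = 2
    while level:
        nxt = {}
        for (a, bits_a), (b, bits_b) in combinations(level.items(), 2):
            union = a | b
            if len(union) != k or union in nxt:
                continue
            bits = bits_a & bits_b  # transactions containing every item of a and b
            if support(bits) >= min_support:
                nxt[union] = bits
        level = nxt
        result.update({s: support(b) for s, b in level.items()})
        k += 1
    return result
```

Python's arbitrary-precision `int` stands in for a fixed-width bitmap here; a production version would more likely use packed machine words or a dedicated bitset type.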
To overcome this bottleneck, we implement the algorithm with the MapReduce programming model on Hadoop, thereby parallelizing it.
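To illustrate how one Apriori pass can map onto the MapReduce programming model, the sketch below simulates the map, shuffle, and reduce phases in plain Python without Hadoop. It assumes a simple decomposition: mappers emit `(itemset, 1)` for each k-subset of a transaction that survives Apriori pruning, and reducers sum the counts and apply the minimum support. This is a common textbook scheme, not necessarily the thesis's design, and all function names are hypothetical. On a real cluster, each input split would be handled by a separate map task, and a driver would launch one job per level k.

```python
from collections import defaultdict
from itertools import combinations


def mapper(transaction, k, prev_frequent):
    """Emit (k-itemset, 1) for each k-subset whose (k-1)-subsets are all frequent."""
    for itemset in combinations(sorted(transaction), k):
        if prev_frequent is None or all(
            sub in prev_frequent for sub in combinations(itemset, k - 1)
        ):
            yield itemset, 1


def reducer(itemset, counts, min_support):
    """Sum the partial counts for one itemset and keep it only if frequent."""
    total = sum(counts)
    if total >= min_support:
        yield itemset, total


def run_pass(splits, k, min_support, prev_frequent):
    grouped = defaultdict(list)
    for split in splits:  # each split would be one map task
        for transaction in split:
            for key, value in mapper(transaction, k, prev_frequent):
                grouped[key].append(value)  # shuffle: group values by key
    level = {}
    for key, values in grouped.items():  # each key would be one reduce call
        for itemset, total in reducer(key, values, min_support):
            level[itemset] = total
    return level


def mapreduce_apriori(splits, min_support):
    """Driver: run one simulated MapReduce job per itemset size k."""
    result, prev, k = {}, None, 1
    while True:
        level = run_pass(splits, k, min_support, prev)
        if not level:
            break
        result.update(level)
        prev = set(level)
        k += 1
    return result
```

Enumerating every k-subset of a transaction grows exponentially with transaction length. Practical implementations typically ship the candidate set to each mapper (for example, through Hadoop's distributed cache) and emit only the candidates a transaction actually contains.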
Keywords: data mining, association rules, parallelism, Apriori algorithm