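| Graduate student: | 張雅芳 Chang, Ya-Fang |
|---|---|
| Thesis title: | 利用MapReduce實作分散式協同推薦系統 MapReduce Implementations of Distributed Collaborative Based Recommendation System |
| Advisor: | 李哲榮 Lee, Che-Rung |
| Oral defense committee: | 洪哲倫 Hung, Che-Lun; 周志遠 Chou, Jerry |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
| Year of publication: | 2012 |
| Academic year of graduation: | 100 |
| Language: | Chinese |
| Number of pages: | 66 |
| Keywords (Chinese): | Recommendation System, Collaborative, MapReduce, Mahout |
| Keywords (foreign language): | Collaborative Filtering |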
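Many applications in today's information society are built on large volumes of data, and the Recommendation System is one example. In recent years Recommendation Systems have been widely deployed in e-commerce, mainly to recommend items that users may find interesting, in the hope of encouraging users to explore the site further. Because they are so broadly useful, they are an important research topic in data processing. As a business grows, the amount of data a Recommendation System must process grows with it, and the required computation time grows many times over.
Apache Mahout implements Recommendation System computation for large-scale data processing on the MapReduce framework, with the aim of making the computation more efficient and more accurate. MapReduce is a programming model for distributed computation over large data sets: deployed on a cluster, it spreads the data across several machines for processing. Google implemented it first and used it heavily for large-scale data processing; Apache Hadoop followed and extended its MapReduce project to a wider range of applications.
After analyzing the Mahout Distributed Item Based Recommendation System described above, we modified the part of that implementation that takes the most execution time, the computation of the Similarity Matrix. We replaced it with Stochastic SVD together with the associated mathematical derivations, and implemented two Distributed Collaborative-based Recommendation Systems.
In this thesis we built two clusters with Apache Hadoop, ran experiments on the Mahout Distributed Item Based Recommendation System and the Distributed SSVD Recommendation System, and compared their overall performance.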
Recommendation Systems have been widely used in electronic commerce in recent years. To encourage users to explore a website, they recommend items that users might be interested in. Large e-commerce sites now often have millions of items and users, which rapidly increases the computational workload of a Recommendation System.
Apache Mahout, an open-source machine learning library that uses the MapReduce framework to implement Collaborative Filtering Recommendation Systems, is designed to make large-scale data processing more efficient. MapReduce is a programming model for distributed computation. It was first proposed by Google and applied in the development of many of Google's services. Apache Hadoop later developed its own MapReduce project, which has found more widespread application. MapReduce is mainly used for large-scale data processing, distributing data and computation across the nodes of a cluster.
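As a rough illustration of the MapReduce programming model mentioned above, the following single-process Python sketch counts how often two items are rated by the same user, a typical building block of item-based recommendation. It is not Mahout's or Hadoop's code: the function names (`map_user`, `shuffle`, `reduce_counts`, `cooccurrence`) and the toy `ratings` data are assumptions made for this example, and the shuffle step that a real framework performs across machines is simulated here with a dictionary.

```python
from collections import defaultdict
from itertools import combinations


def map_user(user, items):
    """Map step (hypothetical): emit ((item_a, item_b), 1) for each item pair of one user."""
    for a, b in combinations(sorted(items), 2):
        yield (a, b), 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one item pair."""
    return key, sum(values)


def cooccurrence(ratings):
    """Run map, shuffle and reduce in one process over a {user: set_of_items} dict."""
    emitted = (kv for user, items in ratings.items() for kv in map_user(user, items))
    return dict(reduce_counts(k, v) for k, v in shuffle(emitted).items())


# Toy data, invented for illustration only.
ratings = {"u1": {"A", "B", "C"}, "u2": {"A", "B"}, "u3": {"B", "C"}}
counts = cooccurrence(ratings)
```

On a real cluster the map calls would run in parallel on different nodes and the grouping by key would be done by the framework; only the per-record map and per-key reduce logic would be written by the programmer.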
In this thesis we analyze the processing pipeline of the Mahout Distributed Item Based Recommendation System and improve its most time-consuming part, the computation of the similarity matrix. We propose two new Distributed Collaborative-based Recommendation System algorithms and implement them using Stochastic SVD. We then conduct experiments comparing the performance and accuracy of these algorithms on two different Apache Hadoop clusters. The experimental evaluation shows that our algorithms and implementation improve the performance of the Mahout Distributed Item Based Recommendation System by a factor of 2.5 and also improve its accuracy through the use of Stochastic SVD features.
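To give a feel for what a stochastic (randomized) SVD computes, here is a minimal single-machine NumPy sketch of the basic randomized scheme described by Halko, Martinsson and Tropp: project the matrix onto a few random directions, orthonormalize, and take an exact SVD of the resulting small matrix. This is not the thesis's distributed algorithm nor Mahout's SSVD job; the function name `randomized_svd`, the `oversample` default and the synthetic test matrix are assumptions chosen for illustration.

```python
import numpy as np


def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate rank-k SVD of A via a random range finder (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(k + oversample, m, n)            # number of random probe vectors
    omega = rng.standard_normal((n, ell))      # random test matrix
    Y = A @ omega                              # sample the column space of A
    Q, _ = np.linalg.qr(Y)                     # orthonormal basis for the samples
    B = Q.T @ A                                # small (ell x n) projected matrix
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat                              # lift left vectors back to m dimensions
    return U[:, :k], s[:k], Vt[:k, :]


# Synthetic rank-3 matrix standing in for a user-item rating matrix (made-up data).
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 40))
U, s, Vt = randomized_svd(A, k=3)
approx = U @ np.diag(s) @ Vt
```

The expensive operations are the two products with `A`, which is what makes the method attractive for a MapReduce setting where `A` is large and stored across nodes; the QR and SVD steps act only on much smaller matrices.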