
Graduate student: Lin, Chen-Yi (林真伊)
Thesis title: Efficient Algorithms for Determining k-Most Demanding/Favorite Products (有效率找出k個滿足最多客戶需求/喜愛產品之演算法)
Advisor: Chen, L. P. (陳良弼)
Oral defense committee: Chen, L. P. (陳良弼); Chen, Ming-Syan (陳銘憲); Tseng, Shin-Mu (曾新穆); Koh, Jia-Ling (柯佳伶); Peng, Wen-Chih (彭文志); Hon, Wing-Kai (韓永楷)
Degree: Doctoral
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: English
Number of pages: 105
Keywords: algorithms for data and knowledge management, decision support, performance evaluation of algorithms and systems, query processing
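  • When planning a new product, market survey data such as the competing products already on the market and customer preferences can be analyzed to estimate the number of customers the product can target, which provides a basis for product marketing decisions. To this end, this dissertation estimates the number of customers of a product in different ways and defines two new product-positioning query processing problems: finding the k products that satisfy the most customer demands, and finding the k products favored by the most customers.
      For a specific type of product, given a dataset of customer preferences on product attributes, a set of products existing in the market, and a set of candidate products the company can offer, the k most demanding products are the k candidate products that together have the largest expected number of customers. When a product has at least 3 attributes, this problem is NP-hard, so we propose a greedy algorithm to solve it. In addition, to guarantee that the optimal solution is found, this study proposes a method for estimating an upper bound on the expected number of customers of k products, which effectively reduces the search space of the optimal solution. With this pruning strategy, we develop an algorithm that finds the optimal solution exactly and efficiently.
      A reverse top-t query for a product finds the customers who regard the product as one of their top-t favorite products; these customers can be viewed as the product's potential customers. Given a set of candidate products and a dataset of customer preferences on product attributes, the k-t most favorite products are the at most k candidate products that together cover the most potential customers. The problem has two versions, depending on whether existing competing products are present. We use properties derived from top-t queries and skyline queries to reduce the search space of the optimal solution. In addition, this study proposes a method for estimating an upper bound on the number of potential customers of a product, which lowers the cost of executing reverse top-t queries on candidate products. Since both versions of the problem are NP-hard, we apply several effective pruning strategies to design a greedy algorithm whose approximate solution is guaranteed to cover at least (1-1/e) of the number of potential customers covered by the optimal solution.
      For both parts of the study, we design a series of experiments on synthetic and real data to verify the effectiveness and efficiency of the proposed algorithms.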


    Estimating the number of customers in a target market while accounting for both product competition and customer preference provides important information for product planning and marketing decisions. To this end, this dissertation defines and solves two product-positioning query processing problems, k-Most Demanding Products (k-MDP) discovering and k-t Most Favorite Products (k-t MFP) discovering, which estimate the number of customers from different perspectives.
    Given a set of customers demanding a certain type of product with multiple features, a set of existing products of that type, and a set of candidate products that a company can offer, the k-MDP discovering problem is to choose k of the candidate products such that the expected total number of customers for the k products is maximized. We show that the problem is NP-hard when a product has three or more features. A greedy algorithm is proposed to find an approximate solution. To find the optimal solution, we estimate an upper bound on the expected total number of customers for a set of k candidate products and use it to reduce the search space. An exact algorithm based on this pruning strategy then finds the optimal solution efficiently.
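    As a rough illustration of the greedy approach to k-MDP, the sketch below adds one candidate at a time, each time choosing the candidate that most increases the expected number of customers. The abstract does not spell out the customer model, so the sketch assumes one: a product satisfies a customer when every feature value is at most the customer's requirement, and each customer picks uniformly at random among all products that satisfy them (existing products plus the chosen candidates). The names `satisfies`, `expected_customers`, and `greedy_k_mdp` are hypothetical, and the code is not the dissertation's algorithm.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def satisfies(product: Vector, requirement: Vector) -> bool:
    # Assumed model: smaller feature values are better, and a product
    # satisfies a customer if it meets the requirement on every feature.
    return all(p <= r for p, r in zip(product, requirement))


def expected_customers(chosen: Sequence[Vector],
                       existing: Sequence[Vector],
                       customers: Sequence[Vector]) -> float:
    # Assumed model: a customer picks uniformly at random among all
    # satisfying products, so the chosen set wins that customer with
    # probability n_chosen / (n_chosen + n_existing).
    total = 0.0
    for requirement in customers:
        n_chosen = sum(satisfies(p, requirement) for p in chosen)
        n_existing = sum(satisfies(p, requirement) for p in existing)
        if n_chosen:
            total += n_chosen / (n_chosen + n_existing)
    return total


def greedy_k_mdp(candidates: Sequence[Vector],
                 existing: Sequence[Vector],
                 customers: Sequence[Vector],
                 k: int) -> List[Vector]:
    # Repeatedly add the candidate that yields the largest expected
    # number of customers together with the products chosen so far.
    chosen: List[Vector] = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(
            remaining,
            key=lambda p: expected_customers(chosen + [p], existing, customers),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    customers = [(5, 5), (3, 8), (8, 3), (2, 2)]
    existing = [(4, 4)]
    candidates = [(3, 3), (2, 7), (7, 2), (1, 1)]
    picked = greedy_k_mdp(candidates, existing, customers, k=2)
    print(picked, expected_customers(picked, existing, customers))
```

    Each greedy step re-evaluates the objective for every remaining candidate, which is simple but costly on large inputs. Under this assumed model the greedy result is only an approximation, which is why an exact method with upper-bound pruning is still of interest.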
    A reverse top-t query for a product returns the set of customers, called potential customers, who regard the product as one of their top-t favorites. Given a set of products and a set of customers with different preferences on the product features, the k-t MFP discovering problem is to select at most k products such that the total number of potential customers is maximized. Two versions of the problem are defined, according to whether existing products are taken into account. We exploit several properties of top-t queries and skyline queries to reduce the solution space. In addition, an upper bound on the number of potential customers is estimated to reduce the cost of performing the reverse top-t query for a product. Because each version of the problem is shown to be NP-hard, we provide a greedy algorithm that uses the designed pruning strategies and achieves an approximation ratio of (1-1/e) with respect to the maximum total number of potential customers.
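    The following sketch shows one way the reverse top-t query and the greedy selection could fit together for the version without existing products. It assumes that each customer is a weight vector, that a product's score is the weighted sum of its feature values, and that higher scores are preferred; none of these details are stated in the abstract. The greedy step is the standard maximum-coverage heuristic, which is known to reach the (1-1/e) ratio; the pruning properties and upper-bound estimates mentioned above are omitted. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]


def top_t(weights: Vector, products: Sequence[Vector], t: int) -> Set[int]:
    # Indices of the t products with the highest weighted-sum score
    # (assumption: higher score means more preferred).
    def score(i: int) -> float:
        return sum(w * x for w, x in zip(weights, products[i]))

    ranked = sorted(range(len(products)), key=score, reverse=True)
    return set(ranked[:t])


def reverse_top_t(products: Sequence[Vector],
                  customers: Sequence[Vector],
                  t: int) -> Dict[int, Set[int]]:
    # For every product, collect the customers who rank it in their top t.
    potential: Dict[int, Set[int]] = {i: set() for i in range(len(products))}
    for c, weights in enumerate(customers):
        for i in top_t(weights, products, t):
            potential[i].add(c)
    return potential


def greedy_k_t_mfp(products: Sequence[Vector],
                   customers: Sequence[Vector],
                   k: int,
                   t: int) -> Tuple[List[int], Set[int]]:
    # Standard greedy maximum coverage: pick the product that adds the
    # most not-yet-covered potential customers, at most k times.
    potential = reverse_top_t(products, customers, t)
    chosen: List[int] = []
    covered: Set[int] = set()
    for _ in range(k):
        best = max(potential, key=lambda i: len(potential[i] - covered))
        if not potential[best] - covered:
            break  # no remaining product adds a new customer
        chosen.append(best)
        covered |= potential[best]
    return chosen, covered


if __name__ == "__main__":
    products = [(9, 1), (1, 9), (6, 6), (2, 2)]
    customers = [(1, 0), (9, 1), (0, 1), (1, 1), (1, 9), (8, 2)]
    print(greedy_k_t_mfp(products, customers, k=2, t=1))
```

    The early `break` reflects the "at most k" wording of the problem: selection stops as soon as no product contributes a new potential customer. This naive version runs a top-t query for every customer over all products, which is exactly the cost that pruning and upper-bound estimation aim to reduce.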
    For both problems, a series of experiments on synthetic and real datasets is performed to show the effectiveness and efficiency of the proposed algorithms.

    Abstract
    1. Introduction
    1.1 k-Most Demanding Products Discovering Problem
    1.2 k-t Most Favorite Products Discovering Problem
    2. Related Work
    2.1 Studies Related to Business Analysis
    2.2 Recommender Systems
    3. k-Most Demanding Products Discovering Problem
    3.1 Problem Statement
    3.1.1 Formal Problem Definition
    3.1.2 Computational Complexity
    3.2 Algorithms for Approximate Solutions
    3.2.1 BMI Index Structure
    3.2.2 Greedy Algorithms
    3.2.2.1 Single-Product-based Greedy Algorithm
    3.2.2.2 Incremental-based Greedy Algorithm
    3.3 Algorithms for Optimal Solutions
    3.3.1 Apriori-based Algorithm
    3.3.2 Upper-Bound Pruning Algorithm
    3.4 Performance Evaluation
    3.4.1 Performance on Synthetic Datasets
    3.4.1.1 Performance of the BMI Index Structure
    3.4.1.2 Comparisons among Proposed Algorithms
    3.4.2 Performance on Real Datasets
    3.5 Summary of the k-MDP Discovering Problem
    4. k-t Most Favorite Products Discovering Problem
    4.1 Problem Statement
    4.1.1 Preliminaries
    4.1.2 Definition of k-t Most Favorite Products
    4.2 Monochromatic k-t MFP Discovering
    4.2.1 mSimpleGreedy Algorithm
    4.2.2 mFastGreedy Algorithm
    4.2.2.1 Properties for Efficiently Discovering the k-t mMFP
    4.2.2.2 Description of the mFastGreedy Algorithm
    4.3 Bichromatic k-t MFP Discovering
    4.3.1 Properties for Efficiently Discovering the k-t bMFP
    4.3.2 bSimpleGreedy Algorithm
    4.3.3 bFastGreedy Algorithm
    4.4 Performance Evaluation
    4.4.1 Comparisons between mSG and mFG for the k-t mMFP Discovering Problem
    4.4.1.1 Performance on Synthetic Datasets
    4.4.1.2 Performance on Real Datasets
    4.4.2 Comparisons between bSG and bFG for the k-t bMFP Discovering Problem
    4.4.2.1 Performance on Synthetic Datasets
    4.4.2.2 Performance on Real Datasets
    4.5 Summary of the k-t MFP Discovering Problem
    5. Conclusion and Future Work
    5.1 Conclusion
    5.2 Future Work
    References

