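Graduate student: 林冠樺 (Lin, Kuan-Hua)
Thesis title: Semi-Supervised Clustering with Perception Vectors (考量使用者觀點之半監督式分群演算法)
Advisor: 吳尚鴻
Committee members: 陳銘憲, 陳良弼, 黃俊龍
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2014
Academic year of graduation: 102
Language: English
Number of pages: 26
Keywords (Chinese): 分群, 半監督式分群, 個人化分群, 資料探勘, 使用者觀點
Keywords (English): clustering, semi-supervised clustering, personalized clustering, data mining, user perception
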
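  • Traditional clustering algorithms consider only the similarity between data points and therefore cannot produce personalized clusterings. Semi-supervised clustering algorithms, which let users supply side information, were proposed to address this. In this thesis, we find that even with the help of side information, a large gap remains between the results of semi-supervised clustering algorithms and the clustering the user has in mind. The main cause is sampling bias: conventional side information may cover only a few non-randomly sampled data points, which misleads the algorithm into finding the wrong clusters. To overcome this problem, we propose learning from the user's perception. The user provides perception vectors, each of which describes the user's concept of a cluster. From this angle we propose an algorithm that considers both conventional side information and the user's perception vectors, named BiLinear Embedded Perception (BLEP) clustering. BLEP learns the latent variables of each cluster and thereby finds more accurate results. We use a crowdsourcing platform to collect clusterings that reflect many different users' perceptions, run experiments on this dataset, and discuss the results and performance of BLEP in more depth.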


    I Introduction
    II Perception Gap
        II-A Experiment Settings
        II-B Evidence
    III Bilinear Embedded Perception Clustering
        III-A Unsmoothening
        III-B New Regularizers
        III-C Training
    IV Experiment
        IV-A Datasets
        IV-B Baselines and Settings
        IV-C Evaluation metrics
        IV-D Case Study
        IV-E General Comparison
        IV-F Effect of the Number of Seeds
        IV-G Effect of the Size of Perception Vectors
        IV-H Not All Clusters Need Perception Vectors
        IV-I One-to-One Correspondence
        IV-J The0-asd-fiau9sr098273 urfc
        IV-K Results Given Pairwise Constraints
    V Conclusions
    References
    VI Appendix
        VI-A Obtaining Perception Vectors

