
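Graduate student: Gong, Xin-Yang (鞏新陽)
Thesis title: Semi-Supervised Clustering with Local Perception of User (考量使用者局部制約觀點之半監督式分群演算法)
Advisor: Wu, Shan-Hung (吳尚鴻)
Committee members: 吳尚鴻, 張正尚, 許秋婷, 陳煥宗
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2014
Academic year of graduation: 103 (ROC calendar)
Language: English
Number of pages: 32
Chinese keywords: 半監督式分群, 抽樣偏差, 局部側面資訊, 觀念向量
English keywords: semi-supervised clustering, sampling bias, local side information, perception vector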
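  • Several semi-supervised clustering algorithms already exist that cluster data by taking into account side information collected from users. Side information falls into two main categories. The first is global information, defined with respect to the overall clustering, which states that certain instances belong to a particular cluster. The second is local link information, which states the relationship between two instances: they must belong to the same cluster or must belong to different clusters.
    Through experiments we find that current semi-supervised clustering algorithms still have a shortcoming, which we call sampling bias. Because users cannot fully and actively express their view of the clustering while side information is being collected, the sampled side information may cover only a few representative instances. Such side information can mislead current algorithms, so that the resulting clusters deviate considerably from the clusters the user actually wants.
    To address this shortcoming, we propose a new clustering algorithm called perception transform analysis. In addition to the side information, it considers the user's perception words, which are modeled as perception vectors. This thesis mainly studies semi-supervised clustering with local link information: each set of perception words is collected together with a local must-link constraint, and each perception vector describes the user's perception behind its corresponding link constraint.
    To verify the effectiveness of the proposed algorithm, we compare it with other current semi-supervised clustering algorithms in extensive experiments on real datasets. The results confirm that perception transform analysis effectively overcomes the sampling bias of the other algorithms and produces clusters that better match the true clustering as the user perceives it.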


    Several semi-supervised clustering algorithms have been proposed that create clusters by exploiting side information collected from users. Side information falls into two main categories: seed indications, which are based on the global clustering, and pairwise link constraints, which are comparatively local. This thesis focuses on the latter, local side information.
    We show that current semi-supervised clustering algorithms still have a limitation. The side information sampled from users may cover only a few representative instances. This problem, which we call sampling bias, can mislead current algorithms and cause a non-negligible difference between the identified clusters and the true clusters perceived by users.
    To address this limitation, we present a new clustering algorithm, perception transform analysis (PTA), which takes the user's perception words into account together with traditional side information by modeling those words as perception vectors. Because this thesis focuses on local side information, each perception vector models the concepts behind a must-link constraint and can be collected from users together with the must-links.
    To verify the effectiveness of the proposed algorithm, we compare it with state-of-the-art semi-supervised clustering algorithms. Extensive experiments on real datasets demonstrate its advantages and its robustness to sampling bias.
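
    The two kinds of side information described in the abstract can be illustrated with a small sketch. It is not the thesis's PTA algorithm, only one plausible way to represent seeds and pairwise link constraints and to count how many of them a given cluster assignment contradicts. The names `SideInformation` and `count_violations` are hypothetical, and the cannot-link list is included as the usual counterpart of must-links in constrained clustering.

```python
from dataclasses import dataclass, field


@dataclass
class SideInformation:
    """Hypothetical container for the two kinds of side information."""
    # Global side information: instance index -> cluster label (seeds).
    seeds: dict[int, int] = field(default_factory=dict)
    # Local side information: pairs of instance indices.
    must_links: list[tuple[int, int]] = field(default_factory=list)
    cannot_links: list[tuple[int, int]] = field(default_factory=list)


def count_violations(labels: list[int], info: SideInformation) -> int:
    """Count the side-information items that a cluster assignment contradicts."""
    # A seed is violated when the instance is not in its indicated cluster.
    violated = sum(1 for i, c in info.seeds.items() if labels[i] != c)
    # A must-link is violated when the two instances are in different clusters.
    violated += sum(1 for i, j in info.must_links if labels[i] != labels[j])
    # A cannot-link is violated when the two instances share a cluster.
    violated += sum(1 for i, j in info.cannot_links if labels[i] == labels[j])
    return violated


# Toy example with four instances (made-up data, for illustration only).
info = SideInformation(
    seeds={0: 0},
    must_links=[(0, 1), (2, 3)],
    cannot_links=[(1, 2)],
)
consistent = count_violations([0, 0, 1, 1], info)    # satisfies every item
inconsistent = count_violations([0, 1, 1, 1], info)  # breaks (0, 1) and (1, 2)
```

    A count of this kind only measures agreement with the sampled constraints. If the sample covers few representative instances, an assignment can score zero violations and still differ from the clustering the user has in mind, which is the sampling bias the abstract describes.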

    Table of Contents
    Abstract
    Abstract (Chinese)
    Acknowledgments
    Table of Contents
    Figure List
    Table List
    Chapter 1 Introduction
      1.1 Current clustering algorithms
      1.2 The proposed method
        1.2.1 Basic assumption
        1.2.2 Problem definition
        1.2.3 Further related work
    Chapter 2 Perception Gap
      2.1 Experiment settings
      2.2 Evidence
    Chapter 3 Perception Transformation Analysis Model
      3.1 Key idea
      3.2 Objective forming
        3.2.1 Step 1st
        3.2.2 Step 2nd
        3.2.2 New regularizer
      3.2 Objective solving
      3.3 Overlapping clustering
        3.3.1 Thresholding strategy
    Chapter 4 Experiment
      4.1 Datasets
      4.2 Baseline and setting
        4.2.1 Baselines
        4.2.2 Parameter settings
        4.2.3 Evaluation metrics
      4.3 Case study
      4.4 General comparison
        4.4.1 Mturk dataset
        4.4.2 Citation dataset
    Chapter 5 Conclusion and Future work
    References

