
Graduate student: 葉佳鑫
Yeh, Chia Hsin
Thesis title: 透過特性級嵌入之多重任務之半監督式分群演算法
Multi-Task Semi-Supervised Clustering via Feature Level Tags Embedding
Advisor: 吳尚鴻
Wu, Shan Hung
Oral defense committee: 陳銘憲
Chen, Ming Xian
沈之涯
Shen, Zhi Ya
張正尚
Zhang, Zheng Shang
黃信騫
Huang, Xin Qian
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2017
Academic year of graduation: 105
Language: English
Number of pages: 25
Keywords: clustering, multi-task, user tag
  • Semi-supervised clustering is the prevailing approach to finding personalized clusterings, but it usually suffers from sampling bias. This thesis addresses the problem in three ways:
    1. Using the tags that users assign to data points as side information for semi-supervised clustering.
    2. Designing a single model that embeds the data points from the feature space into the tag space and clusters them in one phase.
    3. Using a high-order tensor to transfer knowledge between users and thereby improve each individual user's clustering performance.

    Our experiments demonstrate the algorithm's ability to transfer knowledge and its superior clustering performance.

    Other contributions of this thesis include:
    1. Defining a sparse regularization term on tensors.
    2. Designing a CP-free update rule for our model, that is, a training procedure that avoids CP decomposition.

    1 Introduction
    2 Related Work
      2.1 Perception Embedding Model [23]
      2.2 Multi-Task Clustering
      2.3 Collaborative Similarity Matrix Completion
    3 Model
      3.1 One-Domain Case
      3.2 Multi-Domains Case
      3.3 Implementation Proposals
    4 Practical Issue
      4.1 CP-Free Update Rule
      4.2 Training Procedure
      4.3 Time Efficiency
    5 Experiment
      5.1 Dataset
        5.1.1 Mturk Image Dataset
        5.1.2 Song Dataset
        5.1.3 Citation Dataset
      5.2 Preprocessing
      5.3 Baseline
      5.4 Setting
      5.5 Transfer Ability Comparison
      5.6 Compare with Baseline
      5.7 Compare with Enhanced PE
      5.8 Zero-Shot Experiment
    6 Conclusion

    [1] Aharon Bar-Hillel, Tomer Hertz, Noam Shental, and Daphna Weinshall.
    Learning distance functions using equivalence relations. In ICML, volume 3,
    pages 11–18, 2003.
    [2] Sugato Basu, Arindam Banerjee, and Raymond Mooney. Semi-supervised
    clustering by seeding. In Proceedings of 19th International Conference
    on Machine Learning (ICML-2002). Citeseer, 2002.
    [3] Sugato Basu, Mikhail Bilenko, and Raymond J Mooney. A probabilistic
    framework for semi-supervised clustering. In Proceedings of the tenth ACM
    SIGKDD international conference on Knowledge discovery and data mining,
    pages 59–68. ACM, 2004.
    [4] Sanjiv K Bhatia and Jitender S Deogun. Conceptual clustering in information
    retrieval. IEEE Transactions on Systems, Man, and Cybernetics,
    Part B (Cybernetics), 28(3):427–436, 1998.
    [5] Mikhail Bilenko and Raymond J Mooney. Adaptive duplicate detection using
    learnable string similarity measures. In Proceedings of the ninth ACM
    SIGKDD international conference on Knowledge discovery and data mining,
    pages 39–48. ACM, 2003.
    [6] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and
    Yantao Zheng. NUS-WIDE: a real-world web image database from National
    University of Singapore. In Proceedings of the ACM international conference
    on image and video retrieval, page 48. ACM, 2009.
    [7] Guillaume Cleuziou. An extended version of the k-means method for overlapping
    clustering. In Pattern Recognition, 2008. ICPR 2008. 19th International
    Conference on, pages 1–4. IEEE, 2008.
    [8] Ayhan Demiriz, Kristin P Bennett, and Mark J Embrechts. Semi-supervised
    clustering using genetic algorithms. Artificial neural networks
    in engineering (ANNIE-99), pages 809–814, 1999.
    [9] Thomas Finley and Thorsten Joachims. Supervised clustering with support
    vector machines. In Proceedings of the 22nd international conference on
    Machine learning, pages 217–224. ACM, 2005.
    [10] Hichem Frigui and Raghu Krishnapuram. A robust competitive clustering
    algorithm with applications in computer vision. IEEE Transactions on Pattern
    Analysis and Machine Intelligence, 21(5):450–465, 1999.
    [11] Quanquan Gu and Jie Zhou. Learning the shared subspace for multi-task
    clustering and transductive transfer classification. In ICDM, volume 9,
    pages 159–168, 2009.
    [12] Xiangnan He, Min-Yen Kan, Peichu Xie, and Xiao Chen. Comment-based
    multi-view clustering of web 2.0 items. In Proceedings of the 23rd international
    conference on World wide web, pages 771–782. ACM, 2014.
    [13] Dan Klein, Sepandar D Kamvar, and Christopher D Manning. From
    instance-level constraints to space-level constraints: Making the most of
    prior knowledge in data clustering. 2002.
    [14] Brian Kulis and Michael I Jordan. Revisiting k-means: New algorithms via
    Bayesian nonparametrics. arXiv preprint arXiv:1111.0352, 2011.
    [15] Qing Li and Byeong Man Kim. Clustering approach for hybrid recommender
    system. In Web Intelligence, 2003. WI 2003. Proceedings.
    IEEE/WIC International Conference on, pages 33–38. IEEE, 2003.
    [16] Zhenguo Li and Jianzhuang Liu. Constrained clustering by spectral kernel
    learning. In 2009 IEEE 12th International Conference on Computer Vision,
    pages 421–427. IEEE, 2009.
    [17] Zhenguo Li, Jianzhuang Liu, and Xiaoou Tang. Pairwise constraint propagation
    by semidefinite programming for semi-supervised classification. In
    Proceedings of the 25th international conference on Machine learning, pages
    576–583. ACM, 2008.
    [18] Xiaoyong Liu and W Bruce Croft. Cluster-based retrieval using language
    models. In Proceedings of the 27th annual international ACM SIGIR conference
    on Research and development in information retrieval, pages 186–193.
    ACM, 2004.
    [19] Yan-Fu Liu, Cheng-Yu Hsu, and Shan-Hung Wu. Nonlinear
    cross-domain collaborative filtering via hyper-structure transfer. In
    Proceedings of the 32nd International Conference on Machine Learning
    (ICML-15), pages 1190–1198, 2015.
    [20] Zhengdong Lu and Todd K Leen. Semi-supervised learning with penalized
    probabilistic clustering. In Advances in neural information processing
    systems, pages 849–856, 2004.
    [21] Markus Schedl, Nicola Orio, Cynthia Liem, and Geoffroy Peeters. A professionally
    annotated and enriched multimodal data set on popular music. In
    Proceedings of the 4th ACM Multimedia Systems Conference, pages 78–83.
    ACM, 2013.
    [22] Matthew Schultz and Thorsten Joachims. Learning a distance metric from
    relative comparisons. Advances in neural information processing systems
    (NIPS), page 41, 2004.
    [23] Ting-Yu Cheng, Kuan-Hua Lin, Xinyang Gong, Kang-Jun Liu, and Shan-Hung Wu.
    Learning user perceived clusters with feature-level supervision.
    Advances in neural information processing systems (NIPS), 2016.
    [24] Kiri Wagstaff and Claire Cardie. Clustering with instance-level constraints.
    AAAI/IAAI, 1097, 2000.
    [25] Kiri Wagstaff, Claire Cardie, Seth Rogers, Stefan Schrödl, et al. Constrained
    k-means clustering with background knowledge. In ICML, volume 1,
    pages 577–584, 2001.
    [26] Eric P Xing, Andrew Y Ng, Michael I Jordan, and Stuart Russell. Distance
    metric learning with application to clustering with side-information.
    Advances in neural information processing systems, 15:505–512, 2003.
    [27] Yan Yan, Elisa Ricci, Gaowen Liu, and Nicu Sebe. Egocentric daily activity
    recognition via multitask clustering. IEEE Transactions on Image
    Processing, 24(10):2984–2995, 2015.
    [28] Jinfeng Yi, Rong Jin, Anil K Jain, and Shaili Jain. Crowdclustering with
    sparse pairwise labels: A matrix completion approach. In AAAI Workshop
    on Human Computation, volume 2. Citeseer, 2012.
    [29] Jinfeng Yi, Rong Jin, Shaili Jain, Tianbao Yang, and Anil K Jain. Semi-crowdsourced
    clustering: Generalizing crowd labeling by robust distance
    metric learning. In Advances in neural information processing systems,
    pages 1772–1780, 2012.
    [30] Jinfeng Yi, Lijun Zhang, Rong Jin, Qi Qian, and Anil Jain. Semi-supervised
    clustering by input pattern assisted pairwise similarity matrix completion.
    In Proceedings of the 30th International Conference on Machine Learning
    (ICML-13), pages 1400–1408, 2013.
    [31] Jinfeng Yi, Lijun Zhang, Tianbao Yang, Wei Liu, and Jun Wang. An efficient
    semi-supervised clustering algorithm with sequential constraints. In
    Proceedings of the 21th ACM SIGKDD International Conference on Knowledge
    Discovery and Data Mining, pages 1405–1414. ACM, 2015.
    [32] Yisong Yue, Chong Wang, Khalid El-Arini, and Carlos Guestrin. Personalized
    collaborative clustering. In Proceedings of the 23rd international
    conference on World wide web, pages 75–84. ACM, 2014.
    [33] Jianwen Zhang and Changshui Zhang. Multitask Bregman clustering. Neurocomputing,
    74(10):1720–1734, 2011.

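    Full-text release date (on-campus network): full text not authorized for public release
    Full-text release date (off-campus network): full text not authorized for public release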
