| Graduate Student: | 陳奕安 Chen, Yi An |
|---|---|
| Thesis Title: | 基於跨媒體深度融合網路的社群媒體使用者興趣探勘 Mining User Interests from Social Media Based On Deep Cross-Media Fusion Networks |
| Advisor: | 林嘉文 Lin, Chia Wen |
| Committee Members: | 吳尚鴻 Wu, Shan Hung; 王聖智 Wang, Sheng Jyh; 鄭文皇 Cheng, Wen Huang |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
| Publication Year: | 2016 |
| Graduation Academic Year: | 105 |
| Language: | English |
| Pages: | 45 |
| Keywords (Chinese): | 主題模型、興趣發現、社群網站分析、融合網路 |
| Keywords (English): | topic model, interest mining, social media analysis, fusion network |
This thesis proposes a deep learning architecture that analyzes the images and text users share on social networking sites. By fusing these two heterogeneous kinds of data, it trains a neural network model that locates users' points of interest, which can then be used for precise advertisement recommendation.
Applying heterogeneous data to topic models has raised two problems in the past. The first is the difference in semantic level. When images are fed into a topic model such as LDA (Latent Dirichlet Allocation), they are first converted into visual words for the model to use. Visual words, however, suffer from the semantic gap: they are quantized from low-level features, and image texture features cannot represent the true semantic content. The result is an obvious gap between images and text at the semantic level, so images fail to produce good topic distributions, and when the topic model then combines the modalities by variance-based selection and voting, the outcome leans heavily toward the text.
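To make the visual-word step concrete, here is a minimal sketch, assuming SIFT-like local descriptors and a k-means vocabulary (the usual bag-of-visual-words recipe, not code from the thesis): each image becomes a histogram over quantized low-level descriptors, which is exactly why visual words capture texture rather than true semantics.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, vocab_size=500):
    """Cluster low-level local descriptors (e.g. SIFT) into a visual vocabulary."""
    kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
    kmeans.fit(all_descriptors)          # all_descriptors: (N, d) array of descriptors
    return kmeans

def bag_of_visual_words(descriptors, kmeans):
    """Quantize one image's descriptors into a normalized visual-word histogram."""
    words = kmeans.predict(descriptors)  # descriptors: (n, d) array for one image
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)   # the pseudo-document fed to the topic model
```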
The second problem is how to combine the images and text of social posts to obtain better and more robust topic predictions. Previous work exploited the complementary nature of social data and used variance-based selection to filter out posts with ambiguous topics. This hand-crafted rule, which assumes that data whose topic distribution has higher variance describe the topic more precisely, is effective, but not every image-text pair obeys it, and the criterion is largely subjective. The new idea is therefore to replace the variance-based selection mechanism with learning, letting the training data decide how the two kinds of information should be combined.
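For contrast, a minimal sketch of the hand-crafted variance-based selection rule that this thesis sets out to replace; the exact criterion used in prior work may differ, so the function below is an illustrative assumption.

```python
import numpy as np

def select_by_variance(image_topics, text_topics):
    """Pick the modality whose topic distribution is more peaked (higher variance).

    image_topics, text_topics: 1-D arrays that each sum to 1 over the topics.
    """
    if np.var(image_topics) >= np.var(text_topics):
        return image_topics   # a peaked distribution is taken as an explicit topic
    return text_topics
```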
The proposed method uses deep learning to combine textual and visual information. A neural network first projects the two heterogeneous modalities, images and text, into the same semantic space. The two kinds of data can then reasonably be fused with max pooling, a nonlinear operation, and a fully connected layer is appended to learn a prediction model on top of the fused features. When either the image or the text is inaccurate, the fusion network finds the most suitable combination of the two heterogeneous sources, yielding accurate predictions of the topics of user posts and helping to discover user interests.
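A minimal PyTorch sketch of this fusion idea: two embedding branches map image and text features into a shared space, element-wise max pooling fuses them, and a fully connected layer predicts the topic. Layer sizes, names, and the choice of ReLU branches are illustrative assumptions, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class CrossMediaFusionNet(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=2000, embed_dim=256, num_topics=20):
        super().__init__()
        # Embedding networks: project both modalities into one semantic space.
        self.img_embed = nn.Sequential(nn.Linear(img_dim, embed_dim), nn.ReLU())
        self.txt_embed = nn.Sequential(nn.Linear(txt_dim, embed_dim), nn.ReLU())
        # Fully connected layer on top of the fused representation.
        self.classifier = nn.Linear(embed_dim, num_topics)

    def forward(self, img_feat, txt_feat):
        img_e = self.img_embed(img_feat)   # (batch, embed_dim)
        txt_e = self.txt_embed(txt_feat)   # (batch, embed_dim)
        fused = torch.max(img_e, txt_e)    # element-wise max pooling across modalities
        return self.classifier(fused)      # topic scores for each post
```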
This thesis presents a deep learning framework that builds a specific topic space from social group analysis. Our method embeds the image and text of a post into the same semantic space, fuses them, and trains a deep learning model to predict the topic of a general user's post. The prediction helps mine the distribution of a user's interests and serves personalized advertisement recommendation.
Predicting the topic of a social post usually involves two challenging problems. One is the semantic gap between image features and the text bag of words. This gap causes the topic distribution to be dominated by the text words and makes the combination of the two heterogeneous features useless.
The other problem is how to fuse the images and texts of social posts so that the advantages of each modality improve topic prediction performance. Past work used human-designed rules to filter out posts with inexplicit topics. However, not all social posts fit this mechanism. We therefore propose to learn the feature selection from the dataset: using knowledge drawn from the data rather than human-designed rules yields a more robust model.
We propose a deep learning architecture that fuses the image and text of a social post. The architecture consists of three parts: preprocessing and feature extraction, embedding networks, and a fusion network. For text posts, we perform text segmentation, remove stop words, and extract keywords. For image posts, we extract features with a convolutional neural network and use the top-layer features as the input of the embedding networks. We then embed the two high-level semantic features into the same space and obtain initial topic predictions. Next, we apply max pooling to fuse these features and obtain a fused representation, on top of which a fully connected layer learns the final topic prediction. When the image or the text is inaccurate, the fusion network finds a suitable combination of the two heterogeneous sources and achieves more accurate user interest prediction.
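A minimal sketch of the preprocessing and feature-extraction stage, under assumed tools: NLTK stop words for English text and a torchvision VGG-16 whose penultimate fully connected output serves as the top-layer image feature. All names, models, and dimensions here are assumptions for illustration.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from nltk.corpus import stopwords   # requires nltk.download("stopwords") once

STOP = set(stopwords.words("english"))

def extract_keywords(text):
    """Segment a text post into tokens and drop stop words."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOP]

# Pretrained CNN; drop the final 1000-way layer and keep the 4096-d fc output.
cnn = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
cnn.classifier = cnn.classifier[:-1]
cnn.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_image_feature(pil_image):
    """Return the top-layer CNN feature used as the embedding-network input."""
    with torch.no_grad():
        return cnn(preprocess(pil_image).unsqueeze(0)).squeeze(0)   # shape (4096,)
```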