Graduate student: | 羅傑聘 Lo, Chieh Pin |
---|---|
Thesis title: | 以協同濾波與內容導向濾波偵測雙盲論文作者身分 Detecting Authors of Double-Blind Papers by Collaborative Filtering and Content-based Filtering |
Advisor: | 張正尚 Chang, Cheng-Shang |
Oral defense committee: | 張正尚 Chang, Cheng-Shang; 李端興 Lee, Duan Shin; 林華君 Lin, Hwa Chun; 黃之浩 Huang, Chih Hao |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Institute of Communications Engineering |
Year of publication: | 2015 |
Academic year of graduation: | 103 (ROC calendar, i.e., 2014–2015) |
Language: | English |
Number of pages: | 25 |
Keywords: | Text Mining (文字探勘) |
Many conferences and journals use double-blind peer review to ensure the fairness of the review process. In double-blind peer review, neither the authors' nor the reviewers' identities are revealed to each other. An interesting research question is whether the double-blind review process can indeed conceal the authors' identities.
To this end, we consider the author detection problem for double-blind papers: can the authors of a double-blind paper be identified from the information in their past publications?
To solve the author detection problem, we first collect a large set of papers from arXiv. Based on the bag-of-words model, we parse these papers to extract the terms used by each author and construct a document-term matrix and an author-term matrix. Using these matrices, we propose three prediction methods to detect the authors: (i) cosine similarity, (ii) random walk on the bipartite graph, and (iii) matrix factorization. These methods draw on collaborative filtering and content-based filtering techniques.
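The matrix construction and the first of the three methods, cosine similarity, can be sketched in a few lines. This is a minimal illustration under assumptions that the abstract does not state:

- Terms are lowercase alphabetic tokens matched by a regular expression.
- Matrix entries are raw term counts, with no TF-IDF weighting or stop-word removal.
- An author's row in the author-term matrix is the sum of the rows of the papers he or she co-authored.

Sparse `Counter` rows stand in for the two matrices. The function names and toy papers are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """One document-term row: counts of lowercase alphabetic tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def build_author_term(papers):
    """Author-term matrix as {author: Counter}.

    papers: list of (author_list, text) pairs. Each author's row is assumed
    to be the sum of the document-term rows of all papers he or she co-authored.
    """
    author_term = {}
    for authors, text in papers:
        doc_row = bag_of_words(text)
        for author in authors:
            author_term.setdefault(author, Counter()).update(doc_row)
    return author_term

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_authors(blind_text, author_term, k=3):
    """Return the k candidate authors most similar to a double-blind paper."""
    query = bag_of_words(blind_text)
    scores = {a: cosine(query, row) for a, row in author_term.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda a: (-scores[a], a))[:k]

# Toy corpus of past publications (hypothetical data).
papers = [
    (["alice"], "minhash signatures for near duplicate detection"),
    (["bob"], "random walk on bipartite graph for recommendation"),
    (["alice", "carol"], "cosine similarity of document vectors"),
]
author_term = build_author_term(papers)
print(rank_authors("minhash and cosine similarity for documents", author_term, k=2))
# ['alice', 'carol']
```

In this sketch, a blind paper is scored against each author's aggregate vocabulary, so an author with many past papers has a richer row to match against.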
We compare the accuracy of these three methods in our experiments. The experimental results show that the cosine similarity method achieves the highest accuracy, 94%. However, the computation time of the cosine similarity method can be too long, so we use minhash to improve its efficiency. The results also show that authors who have written more than 20 papers are detected by our system with a probability of more than 90%. In other words, double-blind review can hardly conceal the identities of authors who have published many papers.
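The abstract does not say how minhash is combined with cosine similarity. The sketch below assumes one plausible arrangement. Each author's set of terms is summarized by a fixed-length MinHash signature, and the fraction of matching signature positions estimates the Jaccard similarity of two term sets. A cheap pre-filter then shortlists candidate authors, and exact cosine scoring runs only on that shortlist.

Note that MinHash estimates Jaccard similarity, not cosine similarity, so this filtering step is an assumption. The following choices are also illustrative, not taken from the thesis:

- the hash family (a·x + b) mod (2^61 − 1)
- `zlib.crc32` as a stable term id
- the 0.1 threshold
- the function names

```python
import random
import zlib

# Mersenne prime 2**61 - 1, larger than any 32-bit CRC term id.
PRIME = (1 << 61) - 1

def make_hash_funcs(num_hashes, seed=0):
    """Draw random (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]

def minhash_signature(terms, hash_funcs):
    """MinHash signature of a non-empty term set: the minimum hash value per function."""
    # crc32 yields ids that are stable across runs, unlike Python's salted hash().
    ids = [zlib.crc32(term.encode("utf-8")) for term in terms]
    return [min((a * x + b) % PRIME for x in ids) for a, b in hash_funcs]

def estimate_jaccard(sig1, sig2):
    """Fraction of positions where two signatures agree; estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)

def candidate_authors(query_terms, author_sigs, hash_funcs, threshold=0.1):
    """Hypothetical pre-filter: keep authors whose estimated Jaccard similarity
    to the query's term set reaches the threshold. Exact cosine scoring would
    then run only on these candidates."""
    query_sig = minhash_signature(query_terms, hash_funcs)
    return sorted(author for author, sig in author_sigs.items()
                  if estimate_jaccard(query_sig, sig) >= threshold)

# Toy author term sets (hypothetical data).
hash_funcs = make_hash_funcs(128)
author_terms = {
    "alice": {"minhash", "signature", "jaccard", "shingle"},
    "bob": {"random", "walk", "bipartite", "graph"},
}
author_sigs = {a: minhash_signature(t, hash_funcs) for a, t in author_terms.items()}
print(candidate_authors({"minhash", "jaccard", "similarity"}, author_sigs, hash_funcs))
```

Signatures are computed once per author. After that, comparing a blind paper with an author costs a fixed number of integer comparisons (128 here), regardless of how large either vocabulary is.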