
Graduate student: 劉庭維
Liu, Ting-Wei
Thesis title: 學習可泛化之特徵表現於跨來源之可信度分析
Beyond Content Embedding: Learning Generalizable Representations for Cross-Source Credibility Analysis
Advisor: 陳宜欣
Chen, Yi-Shin
Committee members: 彭文志
Peng, Wen-Chih
賴郁雯
Lai, Yu-Wen
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2019
Graduation academic year: 108
Language: English
Number of pages: 55
Keywords (Chinese): 可信度分析, 跨來源, 假新聞, 模式嵌入
Keywords (English): credibility analysis, cross-source, fake news, pattern embedding
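  • Misinformation spread online has had a serious impact on society. Many researchers have proposed methods for assessing the credibility of news and have achieved good results, but this study finds that existing methods suffer from a generalization problem. Existing models do not perform consistently on media sources that are not included in the training set; we call this the cross-source problem, and it lowers model accuracy by about 20%. To address it, we propose the Credibility Pattern Embedding Neural Network (CPENN), which learns generalizable cross-source credibility features from function words and syntactic structure. Experiments on a dataset covering 194 media sources verify that the proposed method surpasses existing techniques and is better able to learn generalizable features. We also analyze the embedded features to verify that the features learned in this study are superior to existing content-embedding features, and conclude that, owing to its better generalization ability, CPENN can be applied more stably to real-world unreliable news detection.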


    False information on the Internet has caused severe damage to society. Researchers have proposed methods to assess the credibility of news and have obtained good results. However, we find that existing methods generalize poorly: they do not perform consistently on news from media sources that are absent from the training set, a problem we call cross-source failure. Cross-source failure reduces the accuracy of current methods by about 20%. To overcome this challenge, we propose the Credibility Pattern Embedding Neural Network (CPENN), which focuses on function words and syntactic structure to learn generalizable representations. Experiments with cross-validation on 194 media sources show that the proposed method learns generalizable features and outperforms state-of-the-art methods on unseen media sources. Extensive analysis of the embedded feature representations demonstrates the strength of the proposed method compared with current content-embedding features. We conclude that CPENN is more robust for real-life unreliable news detection because of its better generalizability.
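The cross-source setting described in the abstract, where test articles come only from media sources unseen during training, can be illustrated with a minimal sketch. It assumes each article carries a source label. The function name `source_held_out_folds` and the toy source names are hypothetical and are not taken from the thesis; this record does not specify the split procedure the thesis actually used.

```python
from collections import defaultdict


def source_held_out_folds(sources, n_folds):
    """Yield (train_idx, test_idx) pairs in which no media source
    contributes articles to both the training and the test side.

    `sources` holds one source label per article (hypothetical input format).
    """
    # Group article indices by their media source.
    by_source = defaultdict(list)
    for idx, src in enumerate(sources):
        by_source[src].append(idx)

    # Deal whole sources into folds round-robin, in sorted order so the
    # split is reproducible.
    folds = [[] for _ in range(n_folds)]
    for k, src in enumerate(sorted(by_source)):
        folds[k % n_folds].extend(by_source[src])

    # Each fold in turn is the held-out (unseen-source) test set.
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


# Toy example: six articles from four made-up sources.
sources = ["a.example", "a.example", "b.example",
           "c.example", "c.example", "d.example"]
splits = list(source_held_out_folds(sources, n_folds=2))

for train_idx, test_idx in splits:
    train_sources = {sources[i] for i in train_idx}
    test_sources = {sources[i] for i in test_idx}
    # The defining property of a cross-source split.
    assert train_sources.isdisjoint(test_sources)
```

Splitting by article instead of by source would let the same outlet appear on both sides, which can hide the kind of accuracy drop the abstract reports.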

    1 Introduction
    2 Related Work
    3 Preliminary Analysis
    4 Methodology
    5 Experiment & Results
    6 Analysis
    7 Conclusions and Future Work
    References

