
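Graduate student: 余仲哲
Yu, Chung-Che
Thesis title: 以不受內容影響之因子於財金新聞預測標題之吸引力
Extracting Content Independent Features for Attractive Title Prediction in Financial News
Advisor: 陳宜欣
Chen, Yi-Shin
Committee members: 陳朝欽
CHEN, Chaur-Chin
蔡宗翰
Tsai, Tzong-Han
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 31
Keywords (Chinese): 標題, 吸引, 新聞, 影響 (title, attraction, news, influence)
Keywords (English): Title, Attractive, News, Independent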
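  • In recent years, with the growth of the Internet, people's main source of news has gradually shifted from newspapers to online news. Unlike newspapers, online news often has only a title and a small thumbnail, or even a title with no image at all, with which to attract clicks, so authors must put more effort into making titles attractive. Related studies analyze whether a title is clickbait, but their data relies on subjective human judgment and is easily influenced by the topic itself; that is, when the content is attractive enough, the title's influence becomes negligible. In this study, to exclude the influence of the content topic, we propose a new framework that removes words with high topic dependence: news articles are clustered by content, the titles within each topic are divided into "attractive" and "unattractive" titles, and the words that are not affected by the topic are then extracted. We use the result to analyze title attractiveness with the topic's influence removed. The experiments show that using topic-independent features yields higher prediction accuracy on data the prediction model has never seen.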


    With the rapid growth of the Internet, the main source of news has shifted from newspapers to online news. Compared with newspapers, online news is usually displayed as a list with small pictures, and sometimes with titles only. To attract more page clicks in this setting, authors need to put much more effort into making titles attractive. Some related research, such as work on clickbait, analyzes the relationship between attractiveness and page clicks. However, that work focuses on whether a title is bait or not, and its data is manually labeled. Its results also tend to rely on content-dependent features, which means that once the content is attractive enough, the title has no effect. In this thesis, we propose a new framework that eliminates the influence of content-dependent information and then extracts content-independent features. As shown in the experiments, taking content-independent features into account generalizes better to small companies.
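The framework outlined in the abstract (group news by content, split each group's titles into attractive and unattractive, keep the words that are not tied to one topic) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It assumes the topic of each article is already known, that a title counts as attractive when its clicks exceed the median of its own topic, that titles can be tokenized on whitespace, and that a token is content independent when it occurs in at least `min_topics` topics. The function names, the threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def split_by_topic(news):
    """Label each title relative to the median click count of its own topic.

    `news` is a list of (topic, title, clicks) tuples; the topic is assumed
    to come from an earlier clustering step that is not shown here.
    """
    by_topic = defaultdict(list)
    for topic, title, clicks in news:
        by_topic[topic].append((title, clicks))

    labeled = []
    for topic, items in by_topic.items():
        cutoff = median(clicks for _, clicks in items)
        for title, clicks in items:
            # Assumption: "attractive" means above the topic's median clicks.
            labeled.append((topic, title, clicks > cutoff))
    return labeled


def content_independent_tokens(labeled, min_topics=2):
    """Return {token: (attractive_count, unattractive_count)} for tokens
    that occur in at least `min_topics` different topics."""
    topics_seen = defaultdict(set)
    counts = defaultdict(lambda: [0, 0])  # token -> [attractive, unattractive]
    for topic, title, attractive in labeled:
        for token in set(title.lower().split()):
            topics_seen[token].add(topic)
            counts[token][0 if attractive else 1] += 1
    return {
        token: tuple(counts[token])
        for token, topics in topics_seen.items()
        if len(topics) >= min_topics
    }


# Made-up sample data, for illustration only.
news = [
    ("oil", "why oil prices surge", 900),
    ("oil", "oil prices report", 100),
    ("banks", "why banks cut rates", 800),
    ("banks", "banks rates report", 50),
]
result = content_independent_tokens(split_by_topic(news))
print(result)
```

In this toy data, topic words such as "oil" and "banks" are dropped because each occurs in a single topic, while "why" and "report" survive because they occur in both topics.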

    Abstract
    Table of contents
    1 Introduction
    2 Related Work
    2.1 Token-based
    2.2 Meaning-based
    2.3 Graph Pattern
    3 Methodology
    3.1 Eliminate Content Dependent Information
    3.2 Extract Content Independent Features
    3.2.1 Importance of Tokens
    3.2.2 Polarity Embedding
    4 Experiment
    4.1 Eliminating Influence from Content Topic to Page Click
    4.2 Extract Features from Title
    5 Conclusion
    References

