
Graduate student: Lin, Yu-Hao (林昱豪)
Thesis title: Context-Aware In-Page Search (上下文相關頁內搜尋)
Advisor: 張俊盛
Oral examination committee: 高照明, 梁婷
Degree: Master's
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
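Year of publication: 2012
Academic year of graduation: 100 (Republic of China calendar; the 2011-2012 academic year)
Language: English
Number of pages: 43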
Keywords: context-aware search; word sense disambiguation; entity linking; Wikipedia; support vector machine
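    This thesis describes an automatic search method that uses the context of a user's query term to link the term to an article in a Wikipedia-like knowledge base. It then presents that article to the user as a reference, which relieves the user of the burden of choosing the correct sense of the term. The method first uses the information in a large Wikipedia-like knowledge base to augment a smaller one. It then uses the augmented knowledge base to compute a hyperlink similarity model. The quantities produced by this model serve as training data for a support vector machine. The resulting model determines, from the surrounding context, which article in the knowledge base the query term should be linked to. Experimental results show that the system effectively selects the correct Wikipedia article for a given query term and its context. The system could serve as the core of a domain-specific search system and, with appropriate modifications, could be applied to cross-language search.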
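The augmentation and hyperlink-similarity steps summarized above can be sketched as follows. The record does not say how the thesis transfers information between knowledge bases or which similarity formula it uses, so this is a minimal sketch under stated assumptions:

- Each knowledge base is represented as a link graph.
- `interwiki` is a hypothetical title mapping from the large knowledge base to the small one.
- A large-KB link is copied whenever both of its endpoints have a counterpart in the small KB.
- Relatedness uses Milne and Witten's link-based measure, swapped in here as a stand-in.

All function names are illustrative.

```python
from math import log
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # article -> set of articles it links to


def augment(small: Graph, large: Graph, interwiki: Dict[str, str]) -> Graph:
    """Copy links from a large KB into a small KB (hypothetical rule).

    A link u -> v in the large KB is added to the small KB when both
    u and v have an inter-wiki counterpart in the small KB.
    """
    merged = {page: set(links) for page, links in small.items()}
    for src, targets in large.items():
        if src not in interwiki:
            continue
        for dst in targets:
            if dst in interwiki:
                merged.setdefault(interwiki[src], set()).add(interwiki[dst])
    return merged


def invert(outlinks: Graph) -> Graph:
    """Turn an out-link graph into an in-link graph."""
    inlinks: Graph = {}
    for src, targets in outlinks.items():
        for dst in targets:
            inlinks.setdefault(dst, set()).add(src)
    return inlinks


def count_articles(outlinks: Graph) -> int:
    """Count every article that appears as a link source or target."""
    pages = set(outlinks)
    for targets in outlinks.values():
        pages |= targets
    return len(pages)


def relatedness(a: str, b: str, inlinks: Graph, total: int) -> float:
    """Milne-Witten style link relatedness, clipped to [0, 1]."""
    A, B = inlinks.get(a, set()), inlinks.get(b, set())
    common = len(A & B)
    if common == 0:
        return 0.0
    n_max, n_min = max(len(A), len(B)), min(len(A), len(B))
    denom = log(total) - log(n_min)
    if denom <= 0:  # both articles are linked from every page
        return 1.0
    distance = (log(n_max) - log(common)) / denom
    return max(0.0, 1.0 - distance)


# Toy example (hypothetical data): the small KB alone has no in-links for "ford".
small = {"engine": {"car"}}
large = {"Engine": {"Car", "Ford"}, "Car": {"Ford"}, "Zoo": {"Lion"}}
interwiki = {"Engine": "engine", "Car": "car", "Ford": "ford",
             "Zoo": "zoo", "Lion": "lion"}
merged = augment(small, large, interwiki)
print(relatedness("car", "ford", invert(merged), count_articles(merged)))  # about 0.569
```

Before augmentation, `car` and `ford` share no in-links in the small knowledge base, so their relatedness is 0. The links borrowed from the large knowledge base give them a common in-link (`engine`), which is the kind of signal a small domain wiki would otherwise lack.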


    In this thesis, we introduce a method for retrieving appropriate articles from a knowledge base (e.g., Wikipedia) for a given query and its context. We cast the problem as a multi-class classification over candidate articles. The method automatically augments small knowledge bases with information from larger ones. It then learns to choose the appropriate article based on the hyperlink similarity between each candidate article and the context. At run time, keyphrases are extracted from the given context. The ambiguity of the query term is then resolved by computing the similarity between the keyphrases of the context and those of each candidate article. Evaluation shows that the method significantly outperforms the strong baseline of assigning each query term its most frequent article. Our method effectively determines appropriate articles for given query-context pairs, which suggests that it could be used in context-aware search engines.
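The run-time disambiguation step can be sketched in the same spirit. This is a minimal sketch built on assumptions the record does not state:

- Keyphrases are found by greedy longest match against an anchor-text table (`AnchorTable`, hypothetical).
- Each context keyphrase is mapped to its most frequent article.
- Each candidate article gets two features: its commonness prior and its average relatedness to the context articles.
- A fixed linear score with hypothetical `weights` and `bias` stands in for the learned classifier; the thesis uses an SVM.

Scoring every candidate and keeping the highest score turns a per-candidate classifier into the multi-class choice described above. `most_frequent` is the baseline.

```python
import re
from typing import Callable, Dict, List, Tuple

# anchor text -> {article: times the anchor links to that article}
AnchorTable = Dict[str, Dict[str, int]]


def extract_keyphrases(context: str, anchors: AnchorTable, max_len: int = 3) -> List[str]:
    """Greedy longest match of context n-grams against known anchor texts."""
    words = re.findall(r"[a-z0-9]+", context.lower())
    found, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in anchors:
                found.append(phrase)
                i += n
                break
        else:  # no anchor starts at position i
            i += 1
    return found


def commonness(anchor: str, article: str, anchors: AnchorTable) -> float:
    """Prior probability that `anchor` refers to `article`."""
    counts = anchors[anchor]
    return counts.get(article, 0) / sum(counts.values())


def most_frequent(anchor: str, anchors: AnchorTable) -> str:
    """Baseline: the article the anchor most often links to."""
    counts = anchors[anchor]
    return max(counts, key=counts.get)


def link(query: str, context: str, anchors: AnchorTable,
         relatedness: Callable[[str, str], float],
         weights: Tuple[float, float] = (0.3, 1.0), bias: float = 0.0) -> str:
    """Return the candidate article with the highest linear score.

    `weights` and `bias` are hypothetical stand-ins for a trained
    classifier's decision function.
    """
    keyphrases = [p for p in extract_keyphrases(context, anchors) if p != query]
    # Simplification: map each context keyphrase to its most frequent article.
    context_articles = [most_frequent(p, anchors) for p in keyphrases]

    def score(candidate: str) -> float:
        prior = commonness(query, candidate, anchors)
        rel = (sum(relatedness(candidate, c) for c in context_articles)
               / len(context_articles)) if context_articles else 0.0
        return weights[0] * prior + weights[1] * rel + bias

    return max(anchors[query], key=score)


# Toy data (hypothetical counts and relatedness values).
anchors: AnchorTable = {
    "jaguar": {"Jaguar (animal)": 60, "Jaguar Cars": 40},
    "engine": {"Engine": 10},
    "sports car": {"Sports car": 5},
    "rainforest": {"Rainforest": 8},
}
REL = {
    frozenset({"Jaguar Cars", "Engine"}): 0.8,
    frozenset({"Jaguar Cars", "Sports car"}): 0.9,
    frozenset({"Jaguar (animal)", "Rainforest"}): 0.85,
}


def rel(a: str, b: str) -> float:
    return REL.get(frozenset({a, b}), 0.0)


print(most_frequent("jaguar", anchors))  # Jaguar (animal)
print(link("jaguar", "The jaguar has a V12 engine; a classic sports car.", anchors, rel))  # Jaguar Cars
```

With this toy table, the baseline always returns `Jaguar (animal)`. `link` switches to `Jaguar Cars` when the context mentions an engine and a sports car. Relatedness is passed in as a callable, so a link-based measure computed over the (augmented) knowledge base can replace the toy lookup.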

    Abstract (Chinese)
    Abstract
    Acknowledgement
    Table of Contents
    List of Figures
    List of Tables
    CHAPTER 1 Introduction
    CHAPTER 2 Related Work
    CHAPTER 3 Method
      3.1 Problem Statement
      3.2 Learning to Link with Wikipedia-like Databases
        3.2.1 Generate Candidate Term-Entity Pairs From Knowledge Base
        3.2.2 Augmenting Knowledge Base using Inter-Wiki Links
        3.2.3 Training the Binary SVM Model
      3.3 Run-Time Entity Linking
    CHAPTER 4 Experimental Setting
      4.1 Data Set
      4.2 Methods Compared
      4.3 Evaluation Metrics
      4.4 Evaluation Results
      4.5 Error Analysis
    CHAPTER 5 Conclusion and Future Works
    References
    Appendix A - Higher order Link Similarity Model


    Full text: not authorized for public release (on-campus or off-campus network).
