Context-Aware In-Page Search | National Tsing Hua University Theses and Dissertations Repository


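Graduate student:	林昱豪 Lin, Yu-Hao
Thesis title:	上下文相關頁內搜尋 Context-Aware In-Page Search
Advisor:	張俊盛
Committee members:	高照明, 梁婷
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication:	2012
Academic year of graduation:	100
Language:	English
Number of pages:	43
Keywords (translated from Chinese):	context-aware search, word sense disambiguation, entity linking, Wikipedia, support vector machine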


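This thesis describes an automatic search method that uses context information to link a user's query term to an article in a Wikipedia-like knowledge base. The linked article is then offered to the user as a reference, relieving the user of the burden of choosing the correct sense of the term. The method first uses the information in a large Wikipedia-like knowledge base to augment a smaller one, then computes a hyperlink similarity model from the augmented knowledge base, and finally uses the quantified output of that model as training data for a support vector machine. The trained model can then determine, from the surrounding context, which knowledge-base article a query term should map to. Experimental results show that the system effectively selects the correct Wikipedia article for a given query term and its context. The results can serve as the core of a domain-specific search system and, with appropriate modification, could be applied to cross-lingual search.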

In this thesis we introduce a method for retrieving appropriate articles from knowledge bases (e.g., Wikipedia) for a given query and its context. In our approach, the problem is cast as a multi-class classification over candidate articles. The method involves automatically augmenting small knowledge bases using larger ones and learning to choose adequate articles based on the hyperlink similarity between article and context. At run time, keyphrases are extracted from the given context, and the sense ambiguity of the query term is resolved by computing the similarity between the keyphrases of the context and those of the candidate articles. Evaluation shows that the method significantly outperforms the strong baseline of assigning the most frequent article to each query term. The method effectively determines adequate articles for given query-context pairs, suggesting that it could be used in context-aware search engines.
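The contrast between the most-frequent-article baseline and a hyperlink-similarity choice can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy link data, the anchor counts, the function names, and the use of Jaccard overlap as a stand-in similarity measure. The thesis feeds its hyperlink similarity scores to an SVM; this sketch omits that step and simply picks the highest-scoring candidate.

```python
from typing import Dict, List, Set

# Toy hyperlink data, invented for illustration (not taken from the thesis).
ARTICLE_LINKS: Dict[str, Set[str]] = {
    "Java (programming language)": {"Compiler", "Object-oriented programming", "Sun Microsystems"},
    "Java (island)": {"Indonesia", "Jakarta", "Volcano"},
    "Compiler": {"Object-oriented programming", "Source code"},
    "Jakarta": {"Indonesia", "Java (island)"},
}

# Hypothetical counts of how often a term is used as a link to each article.
ANCHOR_COUNTS: Dict[str, Dict[str, int]] = {
    "java": {"Java (programming language)": 70, "Java (island)": 30},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbourhood(article: str) -> Set[str]:
    """The article itself plus the articles it links to."""
    return ARTICLE_LINKS.get(article, set()) | {article}


def most_frequent_baseline(term: str) -> str:
    """Baseline: always pick the article the term links to most often."""
    counts = ANCHOR_COUNTS[term]
    return max(counts, key=lambda article: counts[article])


def link_similarity(candidate: str, context_articles: List[str]) -> float:
    """Average link overlap between a candidate and the context articles."""
    if not context_articles:
        return 0.0
    scores = [jaccard(neighbourhood(candidate), neighbourhood(c)) for c in context_articles]
    return sum(scores) / len(scores)


def disambiguate(term: str, context_articles: List[str]) -> str:
    """Pick the candidate article most similar to the context (no SVM here)."""
    candidates = ANCHOR_COUNTS[term]
    return max(candidates, key=lambda c: link_similarity(c, context_articles))


if __name__ == "__main__":
    print(most_frequent_baseline("java"))
    print(disambiguate("java", ["Jakarta"]))
    print(disambiguate("java", ["Compiler"]))
```

With this toy data the baseline always returns the programming-language article, while the similarity-based choice follows the context. A learned classifier over such scores, as the abstract describes, would replace the plain maximum.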

Abstract (Chinese)
Abstract
Acknowledgement
Table of Contents
List of Figures
List of Tables
CHAPTER 1 Introduction
CHAPTER 2 Related Work
CHAPTER 3 Method
3.1 Problem Statement
3.2 Learning to Link with Wikipedia-like Databases
3.2.1 Generate Candidate Term-Entity Pairs From Knowledge Base
3.2.2 Augmenting Knowledge Base using Inter-Wiki Links
3.2.3 Training the Binary SVM Model
3.3 Run-Time Entity Linking
CHAPTER 4 Experimental Setting
4.1 Data Set
4.2 Methods Compared
4.3 Evaluation Metrics
4.4 Evaluation Results
4.5 Error Analysis
CHAPTER 5 Conclusion and Future Works
References
Appendix A: Higher-Order Link Similarity Model

                                


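Full-text release date: full text not authorized for public release (campus network)
Full-text release date: full text not authorized for public release (off-campus network)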
