
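Author: 楊捷扉 (Jie-Fei Yang)
Thesis title: 人物搜尋之資訊擷取與分類 (Information Extraction and Classification for Person Search)
Advisor: 張俊盛 (Jason S. Chang)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar; 2005-2006)
Language: English
Number of pages: 50
Keywords (Chinese): 人名檢索 (person name retrieval), 資訊擷取 (information extraction), 文件分類 (text categorization)
Keywords (English): person search, information extraction, text categorization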
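  • This thesis proposes a Web-based method for automatically collecting the career information and professional domains of people with Chinese names. Extracting personal career information and classifying professional domains effectively addresses the problem of personal name disambiguation. Domain classification also allows personal information to be presented to users in a systematic and consistent way.
    In the training phase, we use linguistic knowledge and statistical techniques to collect surface patterns of career information from the Web; these patterns serve as the basis for gathering information about a name and extracting personal information. We also apply the bootstrapping method of Yarowsky (1995) to train a text classifier from Web resources. At runtime, career information for the input name is collected with the help of the surface patterns, and the information belonging to different people who share the same name is separated by career information and domain classification.
    We also describe a system implementation of the method. Experimental results show that our method effectively extracts a person's career information and distinguishes namesakes in different domains, making Web search for personal information more effective.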
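As a rough illustration of the bootstrapped text classifier mentioned above, the sketch below trains a small decision list from a few seed documents and then grows the training set from unlabeled text, in the spirit of Yarowsky (1994, 1995). It is not the thesis's implementation: the function names, the one-vs-rest log-odds score, the smoothing constant `alpha`, and the confidence `threshold` are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(labeled_docs, alpha=0.1):
    """Return (score, word, domain) rules, strongest first.

    labeled_docs: iterable of (tokens, domain) pairs.
    alpha: additive smoothing constant (an assumption of this sketch).
    """
    counts = defaultdict(Counter)  # word -> number of documents per domain
    domains = set()
    for tokens, domain in labeled_docs:
        domains.add(domain)
        for word in set(tokens):
            counts[word][domain] += 1

    rules = []
    for word, per_domain in counts.items():
        total = sum(per_domain.values())
        for domain in domains:
            hit = per_domain[domain] + alpha
            miss = (total - per_domain[domain]) + alpha
            score = math.log(hit / miss)  # one-vs-rest log-odds
            if score > 0:
                rules.append((score, word, domain))
    rules.sort(key=lambda r: (-r[0], r[1], r[2]))
    return rules


def classify(rules, tokens, min_score=0.0, default=None):
    """Apply the first (strongest) rule whose word occurs in the tokens."""
    present = set(tokens)
    for score, word, domain in rules:
        if score < min_score:
            break  # remaining rules are weaker still
        if word in present:
            return domain
    return default


def bootstrap(seed_docs, unlabeled_docs, threshold=1.0, rounds=5):
    """Grow the labeled set by adopting confident predictions, then retrain."""
    labeled = list(seed_docs)
    pool = list(unlabeled_docs)
    rules = train_decision_list(labeled)
    for _ in range(rounds):
        remaining = []
        for tokens in pool:
            label = classify(rules, tokens, min_score=threshold)
            if label is None:
                remaining.append(tokens)
            else:
                labeled.append((tokens, label))
        if len(remaining) == len(pool):
            break  # nothing new was labeled in this round
        pool = remaining
        rules = train_decision_list(labeled)
    return rules
```

Under these assumptions, calling `bootstrap` with a few seed documents per domain and a pool of unlabeled token lists returns a rule list that `classify` can apply to new passages. Words that appear only in the unlabeled pool can become rules once their documents have been labeled.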


    We introduce a method for automatically collecting a person's personal information and professional domain. In our approach, personal information is extracted and the domain is identified from Web data as a means of personal name disambiguation.
    In the training phase, the method generates surface patterns for personal information extraction from linguistic and statistical information on the Web, and uses an unsupervised algorithm to build a Web-based text categorizer. At runtime, a person name is submitted to a search engine, personal information is extracted, and the domain of each retrieved passage is identified with respect to the queried name; finally, the referents are sorted by domain, personal information, and degree of popularity.
    We also describe an implementation of the proposed method. Blind evaluation on a set of names shows that our method is effective at extracting personal information and cleanly classifying each individual's domain-specific knowledge. The method can help users quickly find out about a person by displaying personal information in a systematic and consistent way.
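The runtime steps in the abstract (extract personal information with surface patterns, identify a domain, sort referents) can be sketched roughly as follows. This is only an illustration under stated assumptions: the two English regular-expression templates, the keyword table `DOMAIN_KEYWORDS`, and all function names are hypothetical, the thesis itself targets Chinese names, and the search-engine step is replaced by a plain list of passages.

```python
import re
from collections import defaultdict

# Hypothetical surface patterns; {name} is replaced by the queried name.
PATTERN_TEMPLATES = [
    r"{name}, (?P<title>[a-z][^,.;]*? (?:of|at) [^,.;]+)",
    r"{name} is (?:a|an|the) (?P<title>[^,.;]+)",
]

# Toy stand-in for a trained domain classifier (an assumption).
DOMAIN_KEYWORDS = {"professor": "academia", "singer": "entertainment"}


def extract_titles(name, passage):
    """Return every title-like string the patterns find for this name."""
    titles = []
    for template in PATTERN_TEMPLATES:
        pattern = re.compile(template.format(name=re.escape(name)))
        for match in pattern.finditer(passage):
            titles.append(match.group("title").strip())
    return titles


def guess_domain(title):
    """Map a title to a domain by keyword lookup, or 'unknown'."""
    lowered = title.lower()
    for word, domain in DOMAIN_KEYWORDS.items():
        if word in lowered:
            return domain
    return "unknown"


def group_referents(name, passages):
    """Group extracted titles by domain, most frequently attested first."""
    groups = defaultdict(list)
    for passage in passages:
        for title in extract_titles(name, passage):
            groups[guess_domain(title)].append(title)
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
```

Here the number of extracted titles per domain stands in for the "degree of popularity"; how the thesis actually measures popularity is not stated in this record.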

    Chinese Abstract (摘要)
    Abstract
    Acknowledgement
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
    Chapter 2 Related Work
    Chapter 3 The PeopleSea System
    3.1 Problem Statement
    3.2 People Search on the Web
    3.2.1 Full Title Extraction
    3.2.2 Domain Classifier
    3.2.3 Runtime Process
    Chapter 4 Experiments and Results
    4.1 Experimental Setting
    4.2 Evaluation Metrics
    4.3 Experimental Results
    Chapter 5 Discussion
    5.1 Evaluation on Full Title Extraction
    5.2 Evaluation on Domain Classification
    5.3 Limitation in Our Current Research
    Chapter 6 Conclusion and Future Work
    References
    Appendix A – Domain Decision List
    Appendix B – Full Titles Extraction

    Al-Kamha, R. and Embley, D. W. Grouping Search-Engine Returned Citations for Person-Name Queries. In WIDM’04, pp. 96-103, Washington, DC, USA, 2004.
    Bagga, A. and Baldwin, B. Entity-Based Cross-Document Coreferencing Using the Vector Space Model. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, pp. 79-85, Montreal, Canada, 1998.
    Bekkerman, R. and McCallum, A. Disambiguating Web Appearances of People in a Social Network. In Proceedings of the 14th World Wide Web Conference (WWW 2005), ACM Press, pp. 463-470, Chiba, Japan, 2005.
    Bollegala, D., Matsuo, Y., and Ishizuka, M. Extracting Key Phrases to Disambiguate Personal Names on the Web. In Proceedings of CICLing, 2006.
    Fleischman, M. B. and Hovy, E. Multi-document Person Name Resolution. In Proceedings of the Workshop on Reference Resolution, Barcelona, Spain, 2004.
    Googlism. 2003. <http://www.googlism.com> (1 July 2006).
    Guha, R. and Garg, A. Disambiguating People in Search. In Proceedings of the 13th World Wide Web Conference (WWW 2004), ACM Press, 2004.
    Lloyd, L., Bhagwan, V., Gruhl, D., and Tomkins, A. Disambiguation of References to Individuals. Technical Report RJ10364 (A0510-011), IBM Research, 2005.
    Malin, B. Unsupervised Name Disambiguation via Social Network Similarity. In Proceedings of the Workshop on Link Analysis, Counterterrorism, and Security, in conjunction with the SIAM International Conference on Data Mining, pp. 93-102, Newport Beach, CA, 2005.
    Mann, G. S. and Yarowsky, D. Unsupervised Personal Name Disambiguation. In Proceedings of 7th Conference on Computational Natural Language Learning (CoNLL-2003), pp. 33-40, Edmonton, Canada, 2003.
    Manning, C. D. and Schütze, H. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, 1999, pp. 232, 249-252, 494, 575.
    Peng, F., Weischedel, R., Licuanan, A., Xu, J. Combining Deep Linguistics Analysis and Surface Pattern Learning: A Hybrid Approach to Chinese Definitional Question Answering, 2005. Retrieved June 2, 2006, from http://www.cs.umass.edu/fuchun/publication/HLT-EMNLP2005.pdf.
    Soubbotin, M. M. Patterns of Potential Answer Expressions as Clues to the Right Answer. In Proceedings of the TREC-10 Conference, NIST, pp. 175-182, Gaithersburg, MD, 2001.
    Vivisimo Inc. 2000. <http://www.vivisimo.com> (1 July 2006).
    Voorhees, E. M. Overview of the TREC 2003 Question Answering Track. In Proceedings of the 12th Text Retrieval Conference (TREC 2003), pp. 54-68, Gaithersburg, MD, 2004.
    Wan, X., Gao, J., Li, M., and Ding, B. Person Resolution in Person Search Results: WebHawk. In Proceedings of ACM 14th Conference on Information and Knowledge Management (CIKM 2005), pp. 163-170, Bremen, Germany, 2005.
    Yarowsky, D. Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pp. 88-95, Las Cruces, NM, 1994.
    Yarowsky, D. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pp. 189-196, 1995.

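    Full text release date: full text not authorized for public release (on-campus network)
    Full text release date: full text not authorized for public release (off-campus network)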
