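Graduate student: | 陳丁其 Ding-Chi Chen |
---|---|
Thesis title: | 利用網路為語料庫之專名實體翻譯研究 NE Translation Using Web as Corpus |
Advisor: | 張俊盛 Jason S. Chang |
Oral examination committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications |
Year of publication: | 2005 |
Academic year of graduation: | 93 (ROC calendar) |
Language: | Chinese |
Pages: | 71 |
Chinese keywords: | 專名實體 (named entity), 專名 (proper name), 翻譯 (translation), 網路 (web), 語料庫 (corpus) |
English keywords: | NE, Named Entity, Named Entities, translation, web, corpus |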
We propose a method for translating Named Entities (NEs) using web resources. The method automatically learns surface patterns that appear on the web. Each pattern combines an English NE, its Chinese translation, and the symbols around them.

Starting from known NE and translation pairs, we query a web search engine and retrieve result snippets that contain both the NE and its translation. The nearby symbols in these snippets are turned into surface patterns.

To translate a new NE, we first query the search engine with the original English NE and keep the returned snippets that mix Chinese and English. Matching these snippets against the learned surface patterns yields strings next to the NE that are likely translations. We then select the best translation by taking data redundancy and string length into account.

For evaluation, we collected 3,581 English question-answer pairs from the web. We took the first 200 answers that are Named Entities as test data and compared the translations obtained with two different search engines. The results show that the proposed method extracts NE translations effectively and outperforms the commercial translation software SYSTRAN. Of the two search engines, Google yielded better translations than Yahoo! Kimo.
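The pipeline summarized in the abstract has three stages: learn surface patterns from known pairs, match them against snippets for a new NE, then rank candidates by redundancy and length. It can be sketched roughly as below.

This is a minimal sketch under several assumptions that the abstract does not state:

- Search-engine access is replaced by hard-coded snippet strings. These snippets are invented for illustration and are not data from the thesis.
- A "surface pattern" is reduced to the short run of symbols between the NE and its translation. It is written with hypothetical `<NE>`/`<TR>` slots.
- Candidates are suffixes or prefixes of the adjacent run of CJK characters.
- The ranking rule (most frequent candidate, ties broken toward the longer string) is one simple way to combine data redundancy with string length. It is not necessarily the thesis's actual formula.
- All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical slot markers used to write a learned pattern as a string.
NE_SLOT, TR_SLOT = "<NE>", "<TR>"
CJK_TAIL = re.compile(r"[\u4e00-\u9fff]+$")  # CJK run ending at the end of a string
CJK_HEAD = re.compile(r"[\u4e00-\u9fff]+")   # CJK run starting at a given position


def learn_patterns(snippets, ne, translation, max_gap=3):
    """Count patterns formed by a known NE, its translation, and the symbols between them."""
    patterns = Counter()
    for s in snippets:
        t, n = s.find(translation), s.find(ne)
        if t < 0 or n < 0:
            continue
        if t < n:  # translation before NE, e.g. 柯林頓(Bill Clinton)
            gap = s[t + len(translation):n]
            pattern = TR_SLOT + gap + NE_SLOT
        else:      # NE before translation, e.g. Bill Clinton（柯林頓）
            gap = s[n + len(ne):t]
            pattern = NE_SLOT + gap + TR_SLOT
        # Keep only short, non-empty gaps made of symbols or spaces, not words.
        if 0 < len(gap) <= max_gap and not any(ch.isalnum() for ch in gap):
            patterns[pattern] += 1
    return patterns


def extract_candidates(snippets, ne, patterns, min_len=2, max_len=6):
    """Collect candidate translations found next to `ne` via the learned patterns."""
    counts = Counter()
    for s in snippets:
        seen = set()  # count each candidate at most once per snippet (redundancy)
        for p in patterns:
            if p.startswith(TR_SLOT):  # <TR>gap<NE>: take the CJK run left of gap+NE
                anchor = p[len(TR_SLOT):-len(NE_SLOT)] + ne
                k = s.find(anchor)
                m = CJK_TAIL.search(s[:k]) if k >= 0 else None
                if m:
                    run = m.group()
                    seen.update(run[-i:] for i in range(min_len, min(max_len, len(run)) + 1))
            else:  # <NE>gap<TR>: take the CJK run right of NE+gap
                anchor = ne + p[len(NE_SLOT):-len(TR_SLOT)]
                k = s.find(anchor)
                m = CJK_HEAD.match(s, k + len(anchor)) if k >= 0 else None
                if m:
                    run = m.group()
                    seen.update(run[:i] for i in range(min_len, min(max_len, len(run)) + 1))
        counts.update(seen)
    return counts


def best_translation(counts):
    """Pick the most frequent candidate; break ties toward the longer string."""
    if not counts:
        return None
    return max(counts, key=lambda c: (counts[c], len(c)))


# Illustrative, made-up snippets (not from the thesis's experiments).
known_snippets = [
    "前總統柯林頓(Bill Clinton)昨日抵達",
    "Bill Clinton（柯林頓）的回憶錄",
    "柯林頓的英文名字是Bill Clinton",  # gap is words, so no pattern is learned
]
query_snippets = [
    "美國總統布希(George Bush)今天表示",
    "總統布希(George Bush)訪問日本",
    "George Bush（布希）與記者會面",
]

patterns = learn_patterns(known_snippets, "Bill Clinton", "柯林頓")
counts = extract_candidates(query_snippets, "George Bush", patterns)
print(best_translation(counts))  # → 布希
```

Each candidate is counted at most once per snippet, so its frequency reflects how many independent snippets support it. That is the sense in which this sketch models data redundancy. The length tie-break keeps a full name from losing to its own shorter suffix when both are equally supported. Here, the noisy candidate "美國總統布希" appears in only one snippet, while "布希" is supported by all three.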