
Author: William Shu-Lei Chen (陳書磊)
Thesis title: Semantic Structure Extraction and Retrieval of Chinese Poetry (詩句中的語意結構之擷取和檢索)
Advisor: Von-Wun Soo (蘇豐文)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: English
Number of pages: 49
Keywords (Chinese): 語義擷取, 語義結構相似度比對, 中文詩詞, 資訊檢索, 語義結構擷取, 本體論擷取
Keywords (English): Semantic Extraction, Semantic Structural Similarity, Chinese Poetry, Information Retrieval, Semantic Structure Extraction, Ontology Instance Extraction
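  • With the development of the Semantic Web, extracting semantic knowledge from documents has become an increasingly important topic. The technical difficulty of semantic extraction limits both the adoption and the applications of the Semantic Web. Chinese is especially hard to process, because its usage varies widely and a single character often carries several meanings. Existing semantic extraction techniques usually extract only certain specific terms from a document and cannot fully express the semantics it contains. We analyze lines of Chinese regulated verse (近體詩) and exploit its neat, regular metrical patterns to extract semantic structure from each line. The extracted semantic structure can be expressed with an ontology, as is common in the Semantic Web, or as a tree. For this semantic tree structure we designed a similarity algorithm. By comparing the semantic structural similarity between a user's query sentence and the lines of poems, we can also help users retrieve lines of poetry.
    This thesis proposes a system that retrieves lines of poetry by matching the semantic structure of a user's query against that of the poem lines. The system uses the Chinese thesaurus Tongyici Cilin (同義詞詞林) to process the lines: through steps such as semantic annotation and semantic parsing rules, it extracts the semantic structure of each line. Our semantic structural similarity matching algorithm then compares that structure with the semantic structure of the user's query to help the user find lines of poetry. In the future we hope to use the same algorithm to paraphrase a user's input sentence into a line of poetry, as a first step toward using semantic analysis for automatic computer composition of Chinese poems.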


    Driven by the vision of the semantic web, semantic extraction has become the topic of many studies. The difficulty of extracting semantic information from natural language documents is one of the major challenges in the development of the semantic web. Semantic extraction from Chinese documents is more difficult still, because formal grammar definitions and Chinese parsers are lacking. Some studies have attempted to extract ontologies from Chinese text documents; however, each ontology was limited to a specific domain, so the extracted semantics were domain-limited as well. Tang & Song Chinese poetry (唐宋近體詩) is a special form of Chinese literature: it is written with regular metric patterns, yet despite its regular format it has rich semantics. Given these characteristics, we believe it is feasible to extract semantic information from Chinese poetry with simple heuristics, independent of the domain of concepts. This research proposes techniques to extract the semantic structure from Chinese poetry and to retrieve Chinese poetry based on semantic structure similarity. We chose to work with Chinese poetry because we want to take advantage of the regularity of its metric patterns. To extract the semantic structure, we have designed a set of parsing rules. With these parsing heuristics, we can parse each line of a Chinese poem into an ontology instance. We can also parse it into a semantic structure of our own design, which can be used later in semantic structure similarity matching. To retrieve poems by structure similarity matching, we have designed a semantic structure similarity algorithm that computes the similarity between two semantic structures. The user can retrieve poems using Chinese query passages or a semantic structure written in a specific format.
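    The retrieval idea in the abstract, comparing a query's semantic tree with the semantic tree of each poem line and ranking lines by similarity, can be illustrated with a small sketch. This is not the algorithm from the thesis, which this record does not describe in detail. The `Node` layout, the role labels, the four-character category codes, the prefix-based `code_similarity`, and the weight `alpha` are all assumptions made for illustration only.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    role: str   # hypothetical role label, e.g. "action" or "agent"
    code: str   # hypothetical thesaurus category code, e.g. "Bh01"
    children: list[Node] = field(default_factory=list)


def code_similarity(a: str, b: str) -> float:
    """Assumed measure: length of the shared prefix over the longer code."""
    if not a or not b:
        return 0.0
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))


def tree_similarity(q: Node, p: Node, alpha: float = 0.5) -> float:
    """Blend the similarity of two nodes with that of their children.

    Each query child is paired with its best-scoring poem child that has
    the same role; unmatched children on either side lower the score.
    """
    node_sim = code_similarity(q.code, p.code) if q.role == p.role else 0.0
    if not q.children and not p.children:
        return node_sim
    total = 0.0
    for qc in q.children:
        scores = [tree_similarity(qc, pc, alpha)
                  for pc in p.children if pc.role == qc.role]
        total += max(scores, default=0.0)
    child_sim = total / max(len(q.children), len(p.children))
    return alpha * node_sim + (1 - alpha) * child_sim


def rank(query: Node, lines: dict[str, Node]) -> list[str]:
    """Return line identifiers ordered from most to least similar."""
    return sorted(lines,
                  key=lambda name: tree_similarity(query, lines[name]),
                  reverse=True)
```

    In this sketch a poem line whose tree matches the query in both roles and codes scores 1.0, and the score falls as the category codes diverge. The greedy best-match pairing lets two query children match the same poem child; a one-to-one assignment would be a stricter alternative.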

    Abstract (Chinese) 2
    Chapter 1 Introduction 4
        1.1 Background 4
        1.2 Problem Description 4
        1.3 Organization of Thesis 5
    Chapter 2 Related Works 6
    Chapter 3 System Overview 8
    Chapter 4 Proposed Methods 11
        4.1 About Chinese Poetry 11
        4.2 The Chinese Thesaurus & Semantic Annotation 14
        4.3 Parsing 15
        4.4 Semantic Structural Similarity 25
        4.5 Poetry Retrieval 35
    Chapter 5 Experiments 37
    Chapter 6 Conclusion and Future Work 43
    References 45
    Appendix A 47
    Appendix B 47


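    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)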
