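Author: 陳立強 (Chen, Li-Chiang)
Thesis title: 基於搭配詞將中文倒裝句還原 (Revert the Chinese inverted sentences base on Collocation)
Advisor: 許聞廉 (Hsu, Wen-Lian)
Committee members: 張詠淳 (Chang, Yung-Chun); 馬偉雲 (Ma, Wei-yun)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2019
Academic year of graduation: 107 (ROC academic year)
Language: Chinese
Number of pages: 58
Keywords (Chinese): 語意理解, 搭配詞, 詞向量, 句子改寫, 主幹抽取
Keywords (English): semantic understanding, collocation, word vector, sentence rewriting, stem extraction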
  • Semantic understanding is the longest-standing, most important, and most difficult task in the history of natural language processing. In the past, researchers relied on semantic networks, ontologies, word segmentation, part-of-speech tagging, syntactic parsing, dependency parsing, and similar methods to understand and manipulate natural language. The recent rise of deep learning has opened up further ways to study semantic understanding. This work uses word vectors trained with deep learning to extract the degree of relatedness between words; word pairs with a high degree of relatedness are treated as collocations. The knowledge carried by a sentence can be glimpsed from the collocation relations among its words. We define several types of collocation relations and extract collocation pairs precisely from a large body of text.
    The degree of collocation describes the relationship between words, and we apply it to the restoration of Chinese inverted sentences. Syntactic parsing is a common way to understand a sentence, but because of Chinese grammar, parsers perform poorly on Chinese inverted sentences. To address this, we use collocations to locate the prefix words, subject, verb, object, and suffix words of an inverted sentence and reorder them according to the more common grammatical order, so that the restored sentence can be parsed and understood.
    For sentences already known to be inverted, we successfully restored 50% of them, and we outline directions for future work.
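
    The restoration idea in the abstract (locate the verb, object, and subject with collocation scores, then reorder) can be illustrated with a minimal sketch. This is not the thesis implementation: the score table `TOY_VO_SCORE`, the function `restore_inverted`, the coarse tags `"N"`, `"V"`, `"ADV"`, `"PART"`, and the example sentences are all hypothetical, and the input is assumed to be already segmented and part-of-speech tagged, with exactly one verb and at least one nominal.

```python
from typing import Dict, List, Tuple

# Hypothetical verb-object collocation scores. In the thesis these would be
# derived from word vectors; the exact formula is not given in this record.
TOY_VO_SCORE: Dict[Tuple[str, str], float] = {
    ("吃", "飯"): 0.9,
    ("吃", "我"): 0.1,
}


def restore_inverted(tagged: List[Tuple[str, str]],
                     vo_score: Dict[Tuple[str, str], float]) -> List[str]:
    """Reorder tokens as prefix + subject + verb + object + suffix."""
    # 1. Locate the verb: the first token tagged "V".
    verb_idx = next(i for i, (_, pos) in enumerate(tagged) if pos == "V")
    verb = tagged[verb_idx][0]

    # 2. Locate the object: the nominal that collocates most strongly
    #    with the verb (unknown pairs score 0.0).
    nominal_idxs = [i for i, (_, pos) in enumerate(tagged) if pos == "N"]
    obj_idx = max(nominal_idxs,
                  key=lambda i: vo_score.get((verb, tagged[i][0]), 0.0))

    # 3. Locate the subject: the first remaining nominal, if there is one.
    subj_idxs = [i for i in nominal_idxs if i != obj_idx]
    subj_idx = subj_idxs[0] if subj_idxs else None

    # 4. Everything else keeps its side of the verb as prefix or suffix.
    located = {verb_idx, obj_idx, subj_idx}
    prefix = [w for i, (w, _) in enumerate(tagged)
              if i < verb_idx and i not in located]
    suffix = [w for i, (w, _) in enumerate(tagged)
              if i > verb_idx and i not in located]
    subject = [tagged[subj_idx][0]] if subj_idx is not None else []
    return prefix + subject + [verb, tagged[obj_idx][0]] + suffix


# Object-fronted sentence "飯我吃了" (roughly "the rice, I ate").
tagged_sentence = [("飯", "N"), ("我", "N"), ("吃", "V"), ("了", "PART")]
restored = restore_inverted(tagged_sentence, TOY_VO_SCORE)
print("".join(restored))
```

    A lookup table stands in for the learned collocation degree so that the reordering logic stays visible; replacing `vo_score` with scores computed from real word vectors would not change the rest of the sketch.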

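    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Purpose and Motivation
        1.1.1 Natural Language Processing
        1.1.2 Rule-Based and Statistics-Based Approaches
        1.1.3 Chinese Natural Language Processing
      1.2 Research Topics
        1.2.1 Semantics
        1.2.2 Sentence Rewriting
      1.3 Chapter Overview
    Chapter 2 Related Work
      2.1 Word Vectors
      2.2 Syntactic Parsers
      2.3 Related Work on Collocations
        2.3.1 Extracting Collocations with a Syntactic Parser
        2.3.2 Extracting Collocations with a Dependency Parser
        2.3.3 Extracting Collocations with Statistical Methods
    Chapter 3 Collocation Extraction
      3.1 Definition of Collocations
      3.2 System Architecture for Collocation Extraction
        3.2.1 Building the Word-Vector Network
        3.2.2 Mathematical Model of Collocation Degree
        3.2.3 Precise Collocation Entity Extraction
    Chapter 4 Sentence Rewriting
      4.1 Definition of Inverted Sentences
      4.2 Problems in Parsing Inverted Sentences
      4.3 Rewriting Inverted Sentences
        4.3.1 Locating the Verb
        4.3.2 Locating the Object and Subject
        4.3.3 Restoring Inverted Sentences
      4.4 Identifying Inverted Sentences
    Chapter 5 Experimental Results and Discussion
      5.1 Collocation Extraction
        5.1.1 Dataset
        5.1.2 Evaluation Method
        5.1.3 Sample Selection
        5.1.4 Part-of-Speech Classification Criteria
        5.1.5 Data Volume and Accuracy
      5.2 Inverted Sentence Restoration
        5.2.1 Inverted Sentence Dataset
        5.2.2 Evaluation Method for Restored Sentences
        5.2.3 Restoration Accuracy
        5.2.4 Error Analysis of Restored Sentences
        5.2.5 Parsing Results After Restoration
      5.3 Inverted Sentence Identification
    Chapter 6 Conclusion and Future Work
    References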

