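Graduate student: | 劉亦真 |
---|---|
Thesis title: | 建立T3剖析樹語料庫:台語部分 Building the Taiwanese Treebank in T3 Corpus |
Advisor: | 江永進 |
Oral defense committee: | |
Degree: | Master |
Department: | College of Science - Institute of Statistics |
Year of publication: | 2005 |
Academic year of graduation: | 93 |
Language: | Chinese |
Number of pages: | 43 |
Keywords (Chinese): | 剖析樹 (treebank) |
Keywords (English): | treebank |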
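The T3 corpus is a parallel treebank of the three Sinitic languages of Taiwan: Taiwanese, Hakka, and Mandarin. Its sentences are taken from the (Mandarin) example sentences in “現代漢語八百詞”, a dictionary of Mandarin function words that is rich in grammatical structures. We first translated the example sentences into written Taiwanese and Hakka, then segmented them into words, tagged parts of speech, and annotated syntactic trees. All of this work was done manually, although we designed software tools to assist. This thesis reports the portion of the Taiwanese part completed so far. Past discussion of Taiwanese parts of speech has been sparse and scattered, so we had to build from the ground up; this thesis also appears to be a first step toward a systematic account of Taiwanese syntactic trees. The dotted tag is a new method of marking phrases, with the advantages of being easier to edit and of stating phrase categories explicitly.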
The T3 corpus is a treebank consisting of parallel sentences in the three major languages of Taiwan: Taiwanese, Hakka, and Mandarin. The sentences are example sentences and phrases taken from “現代漢語八百詞” and translated into Taiwanese and Hakka by native speakers. The translated sentences are then segmented, part-of-speech tagged, and syntactically bracketed. All of this is done manually, although software tools were designed to ease the laborious editing work. This thesis reports the progress on the Taiwanese part. For part-of-speech tagging, only a few, somewhat incomplete studies exist, and we adopt a set of 23 tags. For bracketing, the T3 corpus appears to be the first treebank for Taiwanese. The “dotted tag” is a new system for tagging and bracketing Taiwanese phrases; it has the advantages of being easier to edit and of explicitly promoting the use of phrase categories.
References
[1] Fei Xia(2000). “The Part-Of-Speech Tagging Guidelines for the Penn Chinese Treebank (3.0)”. http://www.cis.upenn.edu/~chinese/posguide.3rd.ch.pdf.
[2] 朱德熙(1982). 語法講義。北京:商務印書館。
[3] 楊秀芳(1991). 臺灣閩南語語法稿。大安出版社。
[4] 馬真(1997). 簡明實用漢語語法教程。北京大學出版社。
[5] 陸儉明(2003). “對“NP+的+VP”結構的重新認識”。北京大學。
[6] 朱德熙(1984). 語法答問。北京:商務印書館。
[7] 洪俊詠(2005). “馬可夫語言模型應用di台語變調gah 注音”。新竹:清華大學統計所碩士論文。
[8] Fei Xia(2000). “The Segmentation Guidelines for the Penn Chinese Treebank (3.0)”. http://www.cis.upenn.edu/~chinese/segguide.3rd.ch.pdf.
[9] Nianwen Xue, Fei Xia(2000). “The Bracketing Guidelines for the Penn Chinese Treebank (3.0)”. http://www.cis.upenn.edu/~chinese/parseguide.3rd.ch.pdf.
[10] 江永進(2003).“台音輸入法6.0”。新竹:清華大學統計所。