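Graduate student: | 李巧雯 Li, Chiao-Wen |
---|---|
Thesis title: | 類神經機器翻譯為本的中文拼字改錯系統 Chinese Spelling Check based on Neural Machine Translation |
Advisor: | 張俊盛 Chang, Jyun-Sheng |
Oral defense committee: | 許永真 Hsu, Yung-Jen; 柯淑津 Ker, Sue-Jin |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2018 |
Academic year of graduation: | 106 |
Language: | English |
Number of pages: | 44 |
Chinese keywords: | Chinese spelling check (中文拼字改錯), artificial error generation (生成人造錯誤), neural machine translation (類神經機器翻譯), revision log (改稿紀錄), edit log (編輯紀錄) |
English keywords: | Chinese Spelling Check, Chinese Error Correction, Artificial Error Generation, Neural Machine Translation, Edit Log |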
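This thesis presents a method for Chinese spelling check that automatically learns to correct potential spelling errors in a sentence. We apply a neural machine translation (NMT) model to Chinese spelling check; that is, we translate a sentence that may contain spelling errors into a correct sentence. We train an NMT spelling correction model on right-and-wrong sentence pairs extracted from newspaper edit logs and from artificially generated error data. In the training stage, we first extract sentences involving spelling corrections from the newspaper edit logs. To expand the training data, we use a confusion set to generate sentences with spelling errors, and then train the model on these data. Experimental results show that the model trained on the edit logs plus the artificial error data performs better.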
We present a method for Chinese spelling check that automatically learns to correct a sentence with potential spelling errors. In our approach, a character-based neural machine translation (NMT) model is trained to translate a potentially misspelled sentence into a correct one, using right-and-wrong sentence pairs from newspaper edit logs and artificially generated data. The method involves three steps: extracting sentences that contain spelling-correction edits from the edit logs, using commonly confused right-and-wrong word pairs to generate artificial right-and-wrong sentence pairs that expand the training data, and training the NMT model. The evaluation on the United Daily News (UDN) Edit Logs and the SIGHAN-7 Shared Task shows that adding artificial error data can significantly improve the performance of a Chinese spelling check system.
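The artificial error generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the confusion-set entries and the function names `generate_error_pairs` and `to_char_tokens` are hypothetical, chosen only to show how commonly confused word pairs might turn a correct sentence into a (wrong, right) training pair, and how a sentence could be split into characters for a character-based NMT model.

```python
import random

# Hypothetical confusion set (illustrative entries, not taken from the thesis):
# each correct word maps to wrong forms it is commonly confused with.
CONFUSION_SET = {
    "已經": ["以經"],
    "應該": ["因該"],
    "再見": ["在見"],
}


def generate_error_pairs(sentence, confusion_set, rng):
    """Return (wrong, right) sentence pairs, one per confusable word found."""
    pairs = []
    for right_word, wrong_words in confusion_set.items():
        if right_word in sentence:
            wrong_word = rng.choice(wrong_words)
            # Replace only the first occurrence so each pair has a single error.
            wrong_sentence = sentence.replace(right_word, wrong_word, 1)
            pairs.append((wrong_sentence, sentence))
    return pairs


def to_char_tokens(sentence):
    """Space-separate characters so each one is treated as a token."""
    return " ".join(sentence)


rng = random.Random(0)  # fixed seed keeps the sketch reproducible
pairs = generate_error_pairs("我已經吃飽了", CONFUSION_SET, rng)
for wrong, right in pairs:
    # Source side is the misspelled sentence, target side the correct one.
    print(to_char_tokens(wrong), "=>", to_char_tokens(right))
```

Under these assumptions, each generated pair would serve as one source/target example, alongside the pairs extracted from the edit logs. A sentence containing no word from the confusion set yields no pairs.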