
Graduate Student: Hsu, Wan-Ting (許菀庭)
Thesis Title: A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss
(以不一致性損失函數結合抽取式和生成式摘要的融合摘要模型)
Advisor: Sun, Min (孫民)
Oral Defense Committee: Wang, Yu-Chiang (王鈺強); Chen, Yun-Nung (陳縕儂)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication: 2018
Graduation Academic Year: 106
Language: English
Number of Pages: 43
Chinese Keywords: natural language processing, summarization, deep learning
English Keywords: NLP, summarization, deep learning
  • Chinese Abstract (translated): We propose a unified summarization model that combines the strengths of extractive and abstractive summarization. An extractive model can produce sentence-level attention and achieve high ROUGE scores; an abstractive model, in contrast, requires a more complex architecture to produce word-level attention and generate more readable summaries. Our model uses the sentence-level attention to modulate the word-level attention, so that words appearing in less attended sentences are less likely to be output. In addition, we propose a novel inconsistency loss function that penalizes inconsistency between the two levels of attention. By training end-to-end with the inconsistency loss together with the original losses of the extractive and abstractive models, we achieve the highest ROUGE scores to date on the CNN/Daily Mail dataset, and in a human evaluation our summaries were also rated the highest in informativeness and readability.


    Abstract: We propose a unified model combining the strengths of extractive and abstractive summarization. On the one hand, a simple extractive model can obtain sentence-level attention and achieve high ROUGE scores, but its output is less readable. On the other hand, a more complicated abstractive model can obtain word-level dynamic attention and generate a more readable paragraph. In our model, sentence-level attention is used to modulate the word-level attention such that words in less attended sentences are less likely to be generated. Moreover, a novel inconsistency loss function is introduced to penalize the inconsistency between the two levels of attention. By training our model end-to-end with the inconsistency loss and the original losses of the extractive and abstractive models, we achieve state-of-the-art ROUGE scores on the CNN/Daily Mail dataset while producing the most informative and readable summaries in a solid human evaluation.
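
    A short sketch may help make the mechanism described in the abstract concrete. The PyTorch-style code below is a minimal illustration under stated assumptions, not the thesis's exact implementation: it assumes the sentence-level attention modulates the word-level attention multiplicatively (followed by renormalization), and that the inconsistency penalty is computed from the top-k attended words at each decoder step; the tensor names (word_attn, sent_attn, word_to_sent) are illustrative.

import torch

def combine_attentions(word_attn, sent_attn, word_to_sent, eps=1e-10):
    # word_attn:    (batch, num_words) word-level attention, sums to 1 per example
    # sent_attn:    (batch, num_sents) sentence-level attention scores in [0, 1]
    # word_to_sent: (batch, num_words) LongTensor, index of the sentence containing each word
    sent_attn_per_word = torch.gather(sent_attn, 1, word_to_sent)
    combined = word_attn * sent_attn_per_word  # down-weight words in weakly attended sentences
    return combined / (combined.sum(dim=1, keepdim=True) + eps)

def inconsistency_loss(word_attn, sent_attn_per_word, k=3, eps=1e-10):
    # Small when the k most attended words also lie in highly attended sentences,
    # large when they fall inside sentences the extractor ignores.
    topk_vals, topk_idx = word_attn.topk(k, dim=1)
    sent_vals = torch.gather(sent_attn_per_word, 1, topk_idx)
    return -torch.log((topk_vals * sent_vals).mean(dim=1) + eps).mean()

# Toy example: one article with six words spread over two sentences.
word_attn = torch.tensor([[0.30, 0.25, 0.15, 0.10, 0.10, 0.10]])
sent_attn = torch.tensor([[0.9, 0.1]])
word_to_sent = torch.tensor([[0, 0, 0, 1, 1, 1]])
combined = combine_attentions(word_attn, sent_attn, word_to_sent)
loss = inconsistency_loss(word_attn, torch.gather(sent_attn, 1, word_to_sent))

    Under these assumptions the loss rewards agreement between the extractor and the abstracter: if the decoder's most attended words sit in sentences the extractor also scored highly, the product inside the logarithm is large and the penalty is small.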

    Table of Contents:
    Abstract (Chinese) ii
    Abstract iii
    Acknowledgements iv
    1 Introduction 1
    2 Related Work 4
      2.1 Extractive Summarization 4
      2.2 Abstractive Summarization 5
      2.3 Hierarchical Attention 6
    3 Preliminaries 7
      3.1 Extractive Summarization Model (Extractor) 7
        3.1.1 Problem Definition 7
        3.1.2 Model Architecture 8
        3.1.3 Loss Function 10
        3.1.4 Ground-Truth Labels 10
      3.2 Abstractive Summarization Model (Abstracter) 10
        3.2.1 Problem Definition 11
        3.2.2 Model Architecture 11
        3.2.3 Loss Function 14
    4 Our Unified Model 15
      4.1 Problem Definition 15
      4.2 Combining Attentions 16
      4.3 Inconsistency Loss 16
      4.4 Extractor 18
      4.5 Abstracter 19
      4.6 Training Procedure 21
    5 Experiments 22
      5.1 Dataset 22
      5.2 Implementation Details 22
    6 Results 24
      6.1 Results of Extracted Sentences 24
      6.2 Results of Abstractive Summarization 25
      6.3 Human Evaluation 26
      6.4 Results on Non-news Articles 28
    7 Conclusion and Future Work 39
      7.1 Conclusion 39
      7.2 Future Work 39
    References 41

