
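Graduate student: 陳駿瑜
Thesis title (Chinese): 運用醫學相關特徵強化電子病歷中的指代消解
Thesis title (English): Enhancing Coreference Resolution for Electronic Medical Records using Medical Specific Features
Advisors: 許聞廉, 蔡宗翰
Oral defense committee: 許聞廉, 蔡宗翰, 蘇豐文
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2012
Academic year of graduation: 100
Language: English
Number of pages: 34
Keywords (translated from Chinese): coreference resolution, Markov logic network, maximum entropy model, electronic medical records
Keywords (English): coreference resolution, Markov logic network, maximum entropy, patient discharge summary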
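  • Electronic medical records provide detailed medical information about individuals, along with rich diagnosis and treatment records that can be mined. To understand a patient better, it is all the more important to uncover the relationships among the various entities in the text. In natural language processing, this task is called coreference resolution.
    A main contribution of this study is a set of rules, designed from contextual relationships, that describe pairs standing in a coreference relation. We use two natural language processing systems: one based on the maximum entropy model and the other on the Markov logic network model.
    In our experiments, the Markov logic network-based system reached an average F-measure of 0.875. In this thesis, we describe the main problems in coreference resolution for electronic medical records, propose a number of rules for uncovering contextual information, and apply two different modeling approaches.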


    Patient discharge summaries provide detailed medical information about hospitalized individuals and a rich resource of data for clinical record text mining. The textual expressions of this information are highly variable. In order to acquire a precise understanding of the patient, it is important to uncover the relationship between all instances in the text. In natural language processing (NLP), this task falls under the category of coreference resolution.
    A key contribution of this paper is a set of context-dependent rules that describe the relationships between coreference pairs. To resolve phrases that refer to the same entity, we apply these rules in two representative NLP systems: one based on the maximum entropy (ME) model and the other on the Markov logic network (MLN) model.
    Our experimental results show that the proposed MLN-based system achieved an unweighted average F-measure of 0.875. In this paper, we describe the main challenges in resolving coreference relations within patient discharge summaries, propose several rules that exploit contextual information, and present two approaches.

    Abstract
    List of Tables
    List of Figures
    Chapter 1 Background and Significance
    Chapter 2 Methods
    2.1 System 1: Markov Logic Network-based Co-reference Resolution System
    2.2 System 2: Maximum Entropy-based Co-reference Resolution System
    2.3 Predicate Definition
    2.4 General Formulae
    2.5 Medical related formulae
    2.6 Collective formulae
    2.7 Singleton detection
    Chapter 3 Results
    3.1 Dataset
    3.2 Results on the Test Set
    Chapter 4 Discussion
    4.1 The Hardest Concept Type—Treatment
    Chapter 5 Conclusion
    References
    Appendix


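    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)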
