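Graduate student: | 戴鴻傑 Dai, Hong-Jie |
---|---|
Thesis title: | Collective Entity Identification and Ranking in Biomedical Full Text Literature |
Advisors: | 許聞廉 Hsu, Wen-Lian; 蔡宗翰 Tsai, Tzong-Han |
Oral examination committee: | 蘇豐文 Soo, Von-Wen; 蔣榮先 Chiang, Jung-Hsien; 陳信希 Chen, Hsin-Hsi |
Degree: | Doctor (Ph.D.) |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2012 |
Academic year of graduation: | 100 (Republic of China calendar; the 2011/12 academic year) |
Language: | English |
Number of pages: | 184 |
Keywords (Chinese): | named entity disambiguation, global ranking, Markov logic network, data fusion, information extraction |
Keywords (English): | entity linking, global ranking, Markov logic network, data fusion, information extraction |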
Several studies have shown that finding information about specific entities is the most common information need of information retrieval users. These needs should be answered by returning specific entities, their properties, or related entities rather than just documents. Although some search engines can recognize specific types of entities, true entity-oriented search still has a long way to go because names are highly ambiguous across documents. Entity linking (EL) goes beyond entity recognition by linking a textual entity mention to a knowledge base entry. It is a difficult task that involves several challenges, including name variation and ambiguity.
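A minimal sketch of the candidate-generation step that precedes linking, assuming a toy in-memory dictionary. The entry IDs `HUMAN:IL2` and `MOUSE:Il2` and the helpers `normalize`, `build_index`, and `candidates` are hypothetical and not taken from the dissertation. Normalizing case, hyphens, and whitespace absorbs simple name variation, while a surface form that maps to several entries exposes the ambiguity a linker must then resolve.

```python
import re


def normalize(name: str) -> str:
    """Lowercase and drop hyphens, underscores and whitespace so that
    variants such as 'IL-2', 'IL2' and 'il 2' share one lookup key."""
    return re.sub(r"[\s\-_]+", "", name.lower())


def build_index(kb: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized synonym to the set of knowledge-base IDs that use it."""
    index: dict[str, set[str]] = {}
    for entry_id, synonyms in kb.items():
        for synonym in synonyms:
            index.setdefault(normalize(synonym), set()).add(entry_id)
    return index


def candidates(mention: str, index: dict[str, set[str]]) -> list[str]:
    """Return every KB entry whose synonyms match the mention after normalization."""
    return sorted(index.get(normalize(mention), set()))


# Hypothetical toy knowledge base: two entries that share the same symbol.
TOY_KB = {
    "HUMAN:IL2": ["IL2", "interleukin 2"],
    "MOUSE:Il2": ["Il2", "interleukin-2"],
}

index = build_index(TOY_KB)
print(candidates("IL-2", index))  # ['HUMAN:IL2', 'MOUSE:Il2']: the mention is ambiguous
```

Dictionary matching alone cannot choose between the two candidates; that decision needs context from the surrounding text, for example species cues elsewhere in the article.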
This dissertation uses the identification of one particular entity type in biomedical articles, gene/gene product mentions, as a case study for exploring the EL task. Unlike most previous EL-related tasks, this work approaches the task at the instance level and evaluates its performance over the recognition and linking steps together. Treating EL at the instance level makes our approach and its evaluation results more relevant to developers of information extraction applications. The dissertation compiles the first instance-based gene mention linking corpus, which reveals new challenges that current EL approaches need to address. To deal with these challenges, a collective EL approach is proposed that uses not only the contextual information of each individual instance but also the relations among instances. Experimental results show that the collective EL approach achieves an F-score of 74.1%, outperforming the traditional individual classification approach by 1.7%. The collective approach can also be extended to exploit the characteristics of different paper sections, which further improves EL performance on full text by 1.82% in F-score.
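The difference between individual classification and collective linking can be illustrated with a simple iterative scheme. This is a hedged sketch, not the dissertation's Markov logic network model: each mention starts from its individually best candidate, and in each round a candidate's score is its local score plus `weight` times the fraction of related mentions (for example, mentions sharing the same normalized name) currently linked to that candidate. The function name `collective_link`, the `weight` parameter, and the toy scores are hypothetical.

```python
from collections import Counter


def collective_link(local, related, weight=0.5, max_iter=10):
    """Iteratively relink mentions using the current links of related mentions.

    local:   {mention_id: {candidate_id: local_score}}
    related: {mention_id: [related mention_ids]}
    """
    # Start from the individually best candidate for each mention.
    labels = {m: max(scores, key=scores.get) for m, scores in local.items()}
    for _ in range(max_iter):
        new_labels = {}
        for m, scores in local.items():
            neighbours = [n for n in related.get(m, []) if n in labels]
            votes = Counter(labels[n] for n in neighbours)

            def combined(candidate):
                # Share of related mentions that currently agree with this candidate.
                support = votes[candidate] / len(neighbours) if neighbours else 0.0
                return scores[candidate] + weight * support

            new_labels[m] = max(scores, key=combined)
        if new_labels == labels:  # converged: no mention changed its link
            break
        labels = new_labels
    return labels


local = {
    "m1": {"A": 0.9, "B": 0.1},
    "m2": {"A": 0.8, "B": 0.2},
    "m3": {"A": 0.45, "B": 0.55},  # weak local evidence favours B
}
related = {"m1": ["m2", "m3"], "m2": ["m1", "m3"], "m3": ["m1", "m2"]}

print(collective_link(local, related, weight=0.0))  # individual: m3 stays on B
print(collective_link(local, related, weight=0.5))  # collective: m3 moves to A
```

Setting `weight=0.0` reduces the procedure to individual classification. The trade-off is that propagation is only as good as the relations: a wrong majority among related mentions would be propagated just as readily.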
In addition, retrieving entities as answers to a query has emerged as a new research field. Here the goal is not merely to recognize entity names in documents but to return a ranked list of relevant entities. The ranking task has wide applications. For example, in the curation of bibliographic databases, generating a ranked entity list is very important because only a small percentage of the entities mentioned in an article are suitable for indexing; such a list helps curators select suitable entities for curation. This dissertation proposes a global ranking framework that considers the existing relationships between the entities to be ranked in order to improve the performance of conventional entity ranking models. With the proposed framework, the performance of the local ranking model improves by 3.2% on the official evaluation metric of the BioCreAtIvE challenge. In addition, using the standard ranking quality measure NDCG (normalized discounted cumulative gain), the dissertation demonstrates that the proposed framework can be cascaded with different local ranking models and still improve their ranking results.
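To make the evaluation concrete, the sketch below computes NDCG using one common variant (gain 2^rel − 1, discount log2(rank + 1)); other formulations exist, and the official BioCreAtIvE metric is a separate measure. It also includes a deliberately simple, hypothetical stand-in for a global step cascaded after a local ranker: each entity's local score is raised by `alpha` times the mean local score of the entities it is related to. This is not the dissertation's framework; `global_rerank`, `alpha`, and the toy data are assumptions for illustration.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain with gain 2^rel - 1 and log2(rank + 1) discount."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg(ranked_ids, relevance, k=None):
    """DCG of a ranking divided by the DCG of the ideal ordering of judged entities."""
    gains = [relevance.get(e, 0) for e in ranked_ids]
    ideal = dcg(sorted(relevance.values(), reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


def global_rerank(local_scores, relations, alpha=0.5):
    """Hypothetical one-step refinement that can follow any local ranker."""
    refined = {}
    for entity, score in local_scores.items():
        neighbours = [n for n in relations.get(entity, []) if n in local_scores]
        mean_neighbour = (sum(local_scores[n] for n in neighbours) / len(neighbours)
                          if neighbours else 0.0)
        refined[entity] = score + alpha * mean_neighbour
    return sorted(refined, key=refined.get, reverse=True)


# Toy data: g2 is relevant but scored low locally; it is related to g1.
local_scores = {"g1": 0.9, "g2": 0.3, "g3": 0.4}
relations = {"g1": ["g2"], "g2": ["g1"]}
relevance = {"g1": 1, "g2": 1, "g3": 0}

local_ranking = sorted(local_scores, key=local_scores.get, reverse=True)
global_ranking = global_rerank(local_scores, relations)
print(local_ranking, round(ndcg(local_ranking, relevance), 3))
print(global_ranking, round(ndcg(global_ranking, relevance), 3))
```

Because `global_rerank` only consumes a score dictionary, the output of any local model could be fed into it, which loosely mirrors the cascading property described above; whether such a naive boost helps on real data is not established by this toy example.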
Adafre, S. F., Rijke, M. d., & Sang, E. T. K. (2007). Entity retrieval. Paper presented at the Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria.
Aha, D. W., & Bankert, R. L. (1995). A comparative evaluation of sequential feature selection algorithms. In D. Fisher & H.-J. Lenz (Eds.), Learning from Data: Artificial Intelligence and Statistics V (pp. 199-206): Springer-Verlag.
Alias-i. (2008). LingPipe 4.1.0., from http://alias-i.com/lingpipe (accessed October 1, 2008)
Aronson, A. (2001). Effective Mapping of Biomedical Text to the UMLS Metathesaurus: The MetaMap Program. JOURNAL OF BIOMEDICAL INFORMATICS, 35, 17-21.
Aronson, A. R., Mork, J. G., Gay, C. W., Humphrey, S. M., & Rogers, W. J. (2004). The NLM Indexing Initiative's Medical Text Indexer. Stud Health Technol Inform, 107(Pt 1), 268-272.
Artiles, J., Gonzalo, J., & Sekine, S. (2007). The SemEval-2007 WePS evaluation: establishing a benchmark for the web people search task. Paper presented at the Proceedings of the 4th International Workshop on Semantic Evaluations, Prague, Czech Republic.
Aslam, J. A., & Montague, M. (2001). Models for metasearch. Paper presented at the Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, New Orleans, Louisiana, United States.
Bailey, P., Craswell, N., Vries, A. P. d., & Soboroff, I. (2007). Overview of the TREC 2007 Enterprise Track. Paper presented at the Proceedings of the Sixteenth Text REtrieval Conference Proceedings (TREC 2007), Gaithersburg, Maryland.
Balog, K., & Serdyukov, P. (2011). Overview of the TREC 2011 Entity Track. Paper presented at the Proceedings of the Twentieth Text REtrieval Conference Proceedings (TREC 2011), Gaithersburg, Maryland.
Balog, K., Serdyukov, P., & Vries, A. P. d. (2010). Overview of the TREC 2010 Entity Track. Paper presented at the Proceedings of the Nineteenth Text REtrieval Conference Proceedings (TREC 2010), Gaithersburg, Maryland.
Balog, K., Serdyukov, P., Vries, A. P. d., Thomas, P., & Westerveld, T. (2009). Overview of the TREC 2009 Entity Track. Paper presented at the Proceedings of the Eighteenth Text REtrieval Conference Proceedings (TREC 2009), Gaithersburg, Maryland.
Balog, K., Thomas, I. S. P., Bailey, P., Craswell, N., & Vries, A. P. d. (2008). Overview of the TREC 2008 Enterprise Track. Paper presented at the Proceedings of the Seventeenth Text REtrieval Conference Proceedings (TREC 2008), Gaithersburg, Maryland.
Bartell, B. T., Cottrell, G. W., & Belew, R. K. (1994). Automatic combination of multiple ranked retrieval systems. Paper presented at the Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, Dublin, Ireland
Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., & Sayers, E. W. (2011). GenBank. Nucleic Acids Res, 39(Database issue), D32-D37. doi: 10.1093/nar/gkq1079
Berger, A. L., Pietra, S. A. D., & Pietra, V. J. D. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), 39-71.
Bhalotia, G., Nakov, P., Schwartz, A., & Hearst, M. (2003). BioText team report for the TREC 2003 genomics track. Paper presented at the Proceedings of The 12th Text Retrieval Conference (TREC 2003), Gaithersburg, Maryland.
Bhattacharya, I., & Getoor, L. (2007). Collective entity resolution in relational data. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1), 5.
Bilenko, M., Mooney, R., Cohen, W., Ravikumar, P., & Fienberg, S. (2005). Adaptive name matching in information integration. Intelligent Systems, IEEE, 18(5), 16-23.
Blake, J. A., Bult, C. J., Kadin, J. A., Richardson, J. E., Eppig, J. T., & Group, M. G. D. (2011). The Mouse Genome Database (MGD): premier model organism resource for mammalian genomics and genetics. Nucleic Acids Research, 39(suppl 1), D842-D848.
Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.-C., Estreicher, A., Gasteiger, E., . . . Schneider, M. (2003). The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research, 31(1), 365.
Ceol, A., Chatr-Aryamontri, A., Licata, L., & Cesareni, G. (2008). Linking entries in protein interaction database to structured text: The FEBS Letters experiment. FEBS Letters, 582(8), 1171-1177.
Chatr-aryamontri, A., Kerrien, S., Khadake, J., Orchard, S., Ceol, A., Licata, L., . . . Hermjakob, H. (2008). MINT and IntAct contribute to the Second BioCreative challenge: serving the text-mining community with high quality molecular interaction data. Genome Biology, 9(Suppl 2), S5.
Chen, L., Liu, H., & Friedman, C. (2005). Gene name ambiguity of eukaryotic nomenclatures. Bioinformatics, 21(2), 248-256. doi: 10.1093/bioinformatics/bth496
Chinchor, N. (1997). MUC-7 named entity task definition. Paper presented at the Proceedings of the 7th Message Understanding Conference.
Chou, P.-H., Dai, H.-J., Huang, C.-H., Tsai, R. T.-H., & Hsu, W.-L. (2008). A Web Application for Biomedical Entities and Relations Annotation Using the Unstructured Information Management Architecture. Paper presented at the Proceedings of the International Computer Symposium (ICS), Taipei, Taiwan.
Chu-Carroll, J., & Prager, J. (2007). An experimental study of the impact of information extraction accuracy on semantic search performance. Paper presented at the Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, Lisbon, Portugal.
Cohen, A. M. (2005). Unsupervised gene/protein named entity normalization using automatically extracted dictionaries. Paper presented at the Proceedings of the Joint ACL Workshop and BioLINK SIG (ISMB) on Linking Biological Literature Ontologies and Databases.
Conrad, J. G., & Utt, M. H. (1994). A system for discovering relationships by feature extraction from text databases. Paper presented at the Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, Dublin, Ireland.
Consortium, T. G. O., Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., . . . Sherlock, G. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1), 25-29.
Crammer, K., & Singer, Y. (2003). Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3, 951-991.
Crim, J., McDonald, R., & Pereira, F. (2005). Automatically Annotating Documents with Normalized Gene Lists. BMC Bioinformatics, 6(Suppl 1), S13.
Cucerzan, S. (2007). Large-scale named entity disambiguation based on Wikipedia data. Paper presented at the Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, Czech Republic.
Culotta, A., & McCallum, A. (2005). Joint deduplication of multiple record types in relational data. Paper presented at the Proceedings of the 14th ACM international conference on Information and knowledge management (CIKM'05), New York, NY, USA.
Dai, H.-J., Chang, Y.-C., Tsai, R. T.-H., & Hsu, W.-L. (2010). New challenges for biological text-mining in the next decade. Journal of Computer Science and Technology, 25(1), 169-179.
Dai, H.-J., Chang, Y.-C., Tsai, R. T.-H., & Hsu, W.-L. (2011). Integration of gene normalization stages and co-reference resolution using a Markov logic network. Bioinformatics, 27(18), 2586-2594.
Dai, H.-J., Huang, C.-H., Lin, J. Y.-W., Chou, P.-H., Tsai, R. T.-H., & Hsu, W.-L. (2008). A Survey of State of the Art Biomedical Text Mining Techniques for Semantic Analysis. Paper presented at the Proceedings of the first IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing (SUTC2008).
Dai, H.-J., Huang, C.-H., Lin, R. T. K., Tsai, R. T.-H., & Hsu, W.-L. (2008). BIOSMILE web search: a web application for annotating biomedical entities and relations. Nucl. Acids Res., 36(Web Server issue), W390-W398.
Dai, H.-J., Hung, H.-C., Tsai, R. T.-H., & Hsu, W.-L. (2007). IASL Systems in the Gene Mention Tagging Task and Protein Interaction Article Sub-task. Paper presented at the Proceedings of Second BioCreAtIvE Challenge Evaluation Workshop, Madrid, Spain.
Dai, H.-J., Lai, P.-T., Huang, C.-H., Chang, Y.-C., Bow, Y.-Y., Wu, H.-T., . . . Hsu, W.-L. (2009). IASL-IISR Interactor Normalization System: Using a Multi-stage Gene Normalization Algorithm and SVM-based ranking. Paper presented at the Proceedings of BioCreAtIvE II.5 Challenge Evaluation Workshop, Madrid, Spain.
Dai, H.-J., Lai, P.-T., & Tsai, R. T.-H. (2010). Multistage Gene Normalization and SVM-Based Ranking for Protein Interactor Extraction in Full-Text Articles. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 7(3), 412-420.
Dai, H.-J., Lai, P.-T., Tsai, R. T.-H., & Hsu, W.-L. (2010). Global Ranking via Data Fusion Paper presented at the Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China.
Dai, H.-J., Tsai, R. T.-H., & Hsu, W.-L. (2011). Entity Disambiguation Using a Markov-Logic Network. Paper presented at the Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand.
Dai, H.-J., Tsai, W.-C., Tsai, R. T.-H., & Hsu, W.-L. (2011). Enhancing Search Results with Semantic Annotation Using Augmented Browsing. Paper presented at the Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI11), Barcelona, Catalonia (Spain).
Dai, H.-J., Wu, C.-Y., Tsai, R. T.-H., & Hsu, W.-L. (2012). From Entity Recognition to Entity Linking: A Survey of Advanced Entity Linking Techniques. Paper presented at the Proceedings of the 26th Annual Conference of the Japanese Society for Artificial Intelligence, Yamaguchi, Japan.
Dill, S., Eiron, N., Gibson, D., Gruhl, D., Guha, R., Jhingran, A., . . . Zien, J. Y. (2003). SemTag and seeker: bootstrapping the semantic web via automated semantic annotation. Paper presented at the Proceedings of the 12th international conference on World Wide Web, Budapest, Hungary.
Dogan, R. I., Murray, G. C., Névéol, A., & Lu, Z. (2009). Understanding PubMed user search behavior through log analysis. Database: the journal of biological databases and curation, 2009.
Domingos, P., Kok, S., Poon, H., Richardson, M., & Singla, P. (2006). Unifying logical and statistical AI. Paper presented at the Proceedings of the 21st National Conference on Artificial Intelligence.
Domingos, P., & Lowd, D. (2009). Markov Logic: An Interface Layer for Artificial Intelligence: Morgan and Claypool Publishers.
Dowell, K. G., McAndrews-Hill, M. S., Hill, D. P., Drabkin, H. J., & Blake, J. A. (2009). Integrating text mining into the MGI biocuration workflow. Database (Oxford), 2009, bap019. doi: 10.1093/database/bap019
Dredze, M., McNamee, P., Rao, D., Gerber, A., & Finin, T. (2010). Entity Disambiguation for Knowledge Base Population. Paper presented at the Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Beijing.
Eales, J. M., Stevens, R. D., & Robertson, D. L. (2008). Full-Text Mining: Linking Practice, Protocols and Articles in Biological Research. Paper presented at the Proceedings of the ISMB BioLINK Special Interest Group on Text Data Mining, Toronto, Canada.
Eaton, A. (2006). HubMed: a web-based biomedical literature search interface. Nucl. Acids Res., 34(Web Server issue), W745.
Elmagarmid, A. K., Ipeirotis, P. G., & Verykios, V. S. (2007). Duplicate Record Detection: A Survey. IEEE Transactions on Knowledge and Data Engineering, 19(1), 1-16. doi: 10.1109/tkde.2007.9
Fang, K., GuoDong, Z., & Qiaoming, Z. (2009, August 6-7). Employing the centering theory in pronoun resolution from the semantic perspective. Paper presented at the Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2, Singapore.
Fang, Y., Si, L., Yu, Z., Xian, Y., & Xu, Y. (2009). Entity Retrieval by Hierarchical Relevance Model, Exploiting the Structure of Tables and Learning Homepage Classifiers. Paper presented at the Proceedings of the Eighteenth Text REtrieval Conference Proceedings (TREC 2009), Gaithersburg, Maryland.
Fellegi, I. P., & Sunter, A. B. (1969). A theory for record linkage. Journal of the American Statistical Association, 64(328), 1183-1210.
Fernández, J. M., Hoffmann, R., & Valencia, A. (2007). iHOP web services. Nucl. Acids Res., 35(Web Server issue), W21-W26.
Finkel, J., Dingare, S., Manning, C., Nissim, M., Alex, B., & Grover, C. (2005). Exploring the boundaries: gene and protein identification in biomedical text. BMC Bioinformatics, 6(Suppl 1), S5.
Florian, R., Hassan, H., Ittycheriah, A., Jing, H., Kambhatla, N., Luo, X., . . . Roukos, S. (2004). A Statistical Model for Multilingual Entity Detection and Tracking. Paper presented at the Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, Boston, MA, USA.
Franzén, K., Eriksson, G., Olsson, F., Asker, L., Lidén, P., & Cöster, J. (2002). Protein names and how to find them. International Journal of Medical Informatics, 67(1-3), 49-61. doi: Doi: 10.1016/s1386-5056(02)00052-7
Genesereth, M. R., & Nilsson, N. J. (1987). Logical foundations of artificial intelligence (Vol. 9). San Francisco, CA: Morgan Kaufmann Publishers, INC.
Getoor, L., & Taskar, B. (2007). Introduction to Statistical Relational Learning: The MIT Press.
Gooi, C. H., & Allan, J. (2004). Cross-document coreference on a large scale corpus. Paper presented at the Proceedings of Human Language Technology Conference/North American Association for Computational Linguistics Annual Meeting, Boston, MA.
Gospodnetic, O., & Hatcher, E. (2004). Lucene IN ACTION: Manning Publications.
Gottipati, S., & Jiang, J. (2011). Linking Entities to a Knowledge Base with Query Expansion. Paper presented at the Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP2011), Edinburgh, Scotland, UK.
Graesser, A. C., Jeon, M., Yan, Y., & Cai, Z. (2007). Discourse cohesion in text and tutorial dialogue. Information Design Journal, 15(3), 199-213.
Grishman, R., & Sundheim, B. (1996). Message Understanding Conference-6: a brief history. Paper presented at the Proceedings of the 16th conference on Computational linguistics - Volume 1, Copenhagen, Denmark.
Grosz, B. J., Weinstein, S., & Joshi, A. K. (1995). Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2), 203-225.
Group, N. S. (2008). Assessment of Detection and Recognition of Entities and Relations Within and Across Documents Automatic Content Extraction 2008 Evaluation Plan (ACE08).
Guo, J., Xu, G., Cheng, X., & Li, H. (2009). Named entity recognition in query. Paper presented at the Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, Boston, MA, USA.
Gupta, R., Sarawagi, S., & Diwan, A. A. (2010). Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials. J. Mach. Learn. Res., 9999, 3097-3135.
Hakenberg, J., Plake, C., Leaman, R., Schroeder, M., & Gonzalez, G. (2008). Inter-species normalization of gene mentions with GNAT. Bioinformatics, 24(16), 126-132. doi: 10.1093/bioinformatics/btn299
Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A., & McKusick, V. A. (2005). Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research, 33(suppl 1), D514-D517.
Han, X., Sun, L., & Zhao, J. (2011). Collective entity linking in web text: a graph-based method. Paper presented at the Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, Beijing, China.
Hersh, W. (2009). Information retrieval: a health and biomedical perspective: Springer Verlag.
Hersh, W., & Bhupatiraju, R. T. (2003). TREC Genomics Track Overview. Paper presented at the Proceedings of The 12th Text Retrieval Conference (TREC 2003), Gaithersburg, Maryland.
Hersh, W., Cohen, A., Ruslen, L., & Roberts, P. (2007). TREC 2007 Genomics Track Overview. Paper presented at the Proceedings of the Sixteenth Text REtrieval Conference Proceedings (TREC 2007), Gaithersburg, Maryland.
Hersh, W., Cohen, A., Yang, J., Bhupatiraju, R. T., Roberts, P., & Hearst, M. (2005). TREC 2005 Genomics Track Overview. Paper presented at the Proceedings of the Fourteenth Text REtrieval Conference Proceedings (TREC 2005), Gaithersburg, Maryland.
Hersh, W., Cohen, A. M., Roberts, P., & Rekapalli, H. K. (2006). TREC 2006 Genomics Track Overview. Paper presented at the Proceedings of the Fifteenth Text REtrieval Conference Proceedings (TREC 2006), Gaithersburg, Maryland.
Hersh, W. R., Bhuptiraju, R. T., Ross, L., Johnson, P., Cohen, A. M., & Kraemer, D. F. (2004). TREC 2004 Genomics Track Overview. Paper presented at the Proceedings of the Thirteenth Text REtrieval Conference (TREC 2004), Gaithersburg, Maryland.
Herskovic, J. R., Tanaka, L. Y., Hersh, W., & Bernstam, E. V. (2007). A Day in the Life of PubMed: Analysis of a Typical Day's Query Log. Journal of the American Medical Informatics Association, 14(2), 212-220.
Hirschman, L., Colosimo, M., Morgan, A., & Yeh, A. (2005). Overview of BioCreAtIvE task 1B: normalized gene lists. BMC Bioinformatics, 6(Suppl 1), S11.
Hirschman, L., Yeh, A., Blaschke, C., & Valencia, A. (2005). Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics, 6(1).
Howe, D., Costanzo, M., Fey, P., Gojobori, T., Hannick, L., Hide, W., . . . Rhee, S. Y. (2008). Big data: The future of biocuration. Nature, 455(7209), 47-50. doi: 10.1038/455047a
Huang, D. W., Xu, Y., Trotman, A., & Geva, S. (2008). Overview of INEX 2007 Link the Wiki Track. In F. Norbert, K. Jaap, L. Mounia & T. Andrew (Eds.), Focused Access to XML Documents (pp. 373-387): Springer-Verlag.
Huang, M., Névéol, A., & Lu, Z. (2011). Recommending MeSH terms for annotating biomedical articles. Journal of the American Medical Informatics Association, 18, 660-667.
Huang, W. C. D., Geva, S., & Trotman, A. (2009). Overview of the INEX 2008 Link the Wiki Track. In S. Geva, J. Kamps & A. Trotman (Eds.), Advances in Focused Retrieval (Vol. 5631, pp. 314-325): Springer Berlin / Heidelberg.
Huynh, T. N., & Mooney, R. J. (2011). Online max-margin weight learning with Markov Logic Networks. Paper presented at the Proceedings of the Eleventh SIAM International Conference on Data Mining (SDM 2011), Mesa, Arizona, USA.
Jämsen, J., Näppilä, T., & Arvola, P. (2008). Entity Ranking Based on Category Expansion. In N. Fuhr, J. Kamps, M. Lalmas & A. Trotman (Eds.), Focused Access to XML Documents (Vol. 4862, pp. 264-278): Springer Berlin/Heidelberg.
Jansen, B. J., & Spink, A. (2006). How are we searching the World Wide Web? A comparison of nine search engine transaction logs. Information Processing & Management, 42(1), 248-263.
Jelier, R., Schuemie, M., Eijk, C., Weeber, M., Mulligen, E., Schijvenaars, B., . . . Kors, J. (2003). Searching for GeneRIFs: concept-based query expansion and Bayes classification.
Jensen, D., Neville, J., & Gallagher, B. (2004). Why collective inference improves relational classification. Paper presented at the Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, Seattle, WA, USA.
Jenssen, T.-K., Lagreid, A., Komorowski, J., & Hovig, E. (2001). A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics, 28(1), 21-28.
Ji, H., & Grishman, R. (2011). Knowledge base population: Successful approaches and challenges. Paper presented at the Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon.
Ji, H., Grishman, R., & Dang, H. T. (2011). Overview of the TAC 2011 Knowledge Base Population Track. Paper presented at the Proceedings of the Fourth Text Analysis Conference (TAC 2011), Gaithersburg, Maryland USA.
Ji, H., Grishman, R., Dang, H. T., Griffitt, K., & Ellis, J. (2010). Overview of the TAC 2010 Knowledge Base Population Track. Paper presented at the Proceedings of the Third Text Analysis Conference (TAC 2010), Gaithersburg, Maryland USA.
Ji, S., Zhou, K., Liao, C., Zheng, Z., Xue, G.-R., Chapelle, O., . . . Zha, H. (2009). Global ranking by exploiting user clicks. Paper presented at the Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, Boston, MA, USA.
Jin-Dong, K., Tomoko, O., Yoshimasa Tsuruoka, Y. T., & Collier, N. (2004). Introduction to the bio-entity recognition task at JNLPBA. Proceedings of the International Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-04), 70-75.
Kautz, H., Selman, B., & Jiang, Y. (1997). A general stochastic approach to solving problems with hard and soft constraints. The Satisfiability Problem: Theory and Applications, 35, 573-586.
Keshava Prasad, T. S., Goel, R., Kandasamy, K., Keerthikumar, S., Kumar, S., Mathivanan, S., . . . Pandey, A. (2009). Human Protein Reference Database--2009 update. Nucleic Acids Res, 37(Database issue), D767-D772.
Khalid, M. A., Jijkoun, V., & Rijke, M. d. (2008). The impact of named entity normalization on information retrieval for question answering. Paper presented at the Proceedings of the IR research, 30th European conference on Advances in information retrieval (ECIR'08).
Kindermann, R., & Snell, J. L. (1980). Markov Random Fields and Their Applications: Amer Mathematical Society.
Knaus, D., Mittendorf, E., & Schäuble, P. (1995). Improving a basic retrieval method by links and passage level evidence. Paper presented at the Proceedings of the third Text REtrieval Conference (TREC-3), Gaithersburg, Maryland.
Kok, S., & Domingos, P. (2010). Learning Markov logic networks using structural motifs. Paper presented at the Proceedings of the Twenty-Seventh International Conference on Machine Learning, Haifa, Israel.
Krauthammer, M., & Nenadic, G. (2004). Term identification in the biomedical literature. Journal of Biomedical Informatics, 37(6), 512-526.
Kulkarni, S., Singh, A., Ramakrishnan, G., & Chakrabarti, S. (2009). Collective annotation of wikipedia entities in web text. Paper presented at the Proceeding of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) Paris, France.
Kuo, C.-J., Ling, M., & Hsu, C.-N. (2011). Soft tagging of overlapping high confidence gene mention variants for cross-species full-text gene normalization. BMC Bioinformatics, 12(Suppl 8), S6.
Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Paper presented at the Proceedings of the 18th International Conference on Machine Learning (ICML).
Lai, P.-T., Bow, Y.-Y., Huang, C.-H., Dai, H.-J., Tsai, R. T.-H., & Hsu, W.-L. (2009). Using Contextual Information to Clarify Gene Normalization Ambiguity. Paper presented at the Proceedings of the IEEE International Conference on Information Reuse and Integration (IEEE IRI 2009), Las Vegas, USA.
Lai, P.-T., Dai, H.-J., Huang, C.-H., & Tsai, R. T.-H. (2010). IISR Gene Normalization System for BioCreAtIvE III. Paper presented at the Proceedings of BioCreative III Challenge Evaluation Workshop, Bethesda, Maryland.
Lawrence, S., Giles, C. L., & Bollacker, K. D. (1999). Autonomous citation matching. Paper presented at the Proceedings of the third annual conference on Autonomous Agents, New York, NY, USA.
Leitner, F., Krallinger, M., Cesareni, G., & Valencia, A. (2010). The FEBS Letters SDA corpus: a collection of protein interaction articles with high quality annotations for the BioCreative II.5 online challenge and the text mining community. FEBS Lett, 584(19), 4129-4130. doi: 10.1016/j.febslet.2010.08.026
Leitner, F., Krallinger, M., Rodriguez-Penagos, C., Hakenberg, J., Plake, C., Kuo, C.-J., . . . Valencia, A. (2008). Introducing meta-services for biomedical information extraction. Genome Biology, 9(Suppl 2), S6.
Leitner, F., Mardis, S. A., Krallinger, M., Cesareni, G., Hirschman, L. A., & Valencia, A. (2010). An Overview of BioCreative II.5. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 7(3), 385-399.
Li, Y., Lin, H., & Yang, Z. (2009). Incorporating rich background knowledge for gene named entity classification and recognition. BMC Bioinformatics, 10(1), 223.
Lin, R. T. K., Dai, H.-J., Bow, Y.-Y., Chiu, J. L.-T., & Tsai, R. T.-H. (2009). Using conditional random fields for result identification in biomedical abstracts Integrated Computer-Aided Engineering, 16(4), 339-352.
Lin, R. T. K., Dai, H.-J., Bow, Y.-Y., Day, M.-Y., Tsai, R. T.-H., & Hsu, W.-L. (2008). Result Identification for Biomedical Abstracts Using Conditional Random Fields. Paper presented at the Proceedings of the IEEE International Conference on Information Reuse and Integration (IEEE IRI 2008), Las Vegas, Nevada, USA.
Liu, Y.-T., Liu, T.-Y., Qin, T., Ma, Z.-M., & Li, H. (2007). Supervised rank aggregation. Paper presented at the Proceedings of the 16th international conference on World Wide Web, Banff, Alberta, Canada.
Lu, Z., Kao, H.-Y., Wei, C.-H., Huang, M., Liu, J., Kuo, C.-J., Hsu, C.-N., . . . Wilbur, W. J. (2011). The gene normalization task in BioCreative III. BMC Bioinformatics, 12(Suppl 9), S2.
McIntosh, T., & Curran, J. (2009). Challenges for automatically extracting molecular interactions from full-text articles. BMC Bioinformatics, 10(1), 311.
McNamee, P., & Dang, H. T. (2009). Overview of the TAC 2009 Knowledge Base Population Track. Paper presented at the Proceedings of the Second Text Analysis Conference (TAC 2009), Gaithersburg, Maryland, USA.
McNamee, P., Dang, H. T., Simpson, H., Schone, P., & Strassel, S. M. (2009). An Evaluation of Technologies for Knowledge Base Population. Paper presented at the Proceedings of Text Analysis Conference 2009 (TAC 09), Gaithersburg, Maryland, USA.
McNamee, P., Mayfield, J., Lawrie, D., Oard, D., & Doermann, D. (2011). Cross-Language Entity Linking. Paper presented at the Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP2011), Chiang Mai, Thailand.
Mendelson, E. (1997). Introduction to Mathematical Logic: Springer.
Mihalcea, R., & Csomai, A. (2007). Wikify!: linking documents to encyclopedic knowledge. Paper presented at the Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, Lisbon, Portugal.
Mihalcea, R., & Moldovan, D. (2000). Semantic indexing using WordNet senses. Paper presented at the ACL-2000 Workshop on Recent Advances in Natural Language Processing and Information Retrieval, Hong Kong, China.
Mihalcea, R., & Moldovan, D. (2001). Document indexing using named entities. Studies in Informatics and Control, 10(1), 21-28.
Miklós, Z., Bonvin, N., Bouquet, P., Catasta, M., Cordioli, D., Fankhauser, P., . . . Stoermer, H. (2010). From web data to entities and back. Paper presented at the Proceedings of the 22nd international conference on Advanced information systems engineering (CAiSE'10), Hammamet, Tunisia.
Milne, D., & Witten, I. H. (2008). Learning to link with wikipedia. Paper presented at the Proceedings of the 17th ACM conference on Information and knowledge management, Napa Valley, California, USA.
Mitchell, J. A., Aronson, A. R., Mork, J. G., Folk, L. C., Humphrey, S. M., & Ward, J. M. (2003). Gene indexing: characterization and analysis of NLM’s GeneRIFs. Paper presented at the Proceedings of the American Medical Informatics Association Annual Symposium (AMIA), Washington, DC.
Mons, B., Ashburner, M., Chichester, C., van Mulligen, E., Weeber, M., den Dunnen, J., . . . Bairoch, A. (2008). Calling on a million minds for community annotation in WikiProteins. Genome Biology, 9(5), R89.
Morgan, A. A., Lu, Z., Wang, X., Cohen, A. M., Fluck, J., Ruch, P., . . . Hirschman, L. (2008). Overview of BioCreative II gene normalization. Genome Biology, 9(Suppl 2), S3.
Myers, G. (1992). 'In this paper we report...': speech acts and scientific facts. Journal of Pragmatics, 17(4), 295-313.
Ng, V. (2005). Machine Learning for Coreference Resolution: From Local Classification to Global Ranking. Paper presented at the Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), University of Michigan, USA.
Oda, K., Kim, J.-D., Ohta, T., Okanohara, D., Matsuzaki, T., Tateisi, Y., & Tsujii, J. i. (2008). New challenges for text mining: mapping between text and manually curated pathways. BMC Bioinformatics, 9(Suppl 3), S5.
Pafilis, E., O'Donoghue, S. I., Jensen, L. J., Horn, H., Kuhn, M., Brown, N. P., & Schneider, R. (2009). Reflect: augmented browsing for the life scientist. Nat Biotech, 27(6), 508-510. doi: 10.1038/nbt0609-508
Paice, C. D. (1977). Information Retrieval and the Computer: Macdonald and Jane's.
Paice, C. D. (1981). The automatic generation of literature abstracts: an approach based on the identification of self-indicating phrases. Paper presented at the Proceedings of the 3rd annual ACM conference on Research and development in information retrieval, Cambridge, England.
Pasula, H., Marthi, B., Milch, B., Russell, S., & Shpitser, I. (2003). Identity uncertainty and citation matching. Advances in Neural Information Processing Systems, 1425-1432.
Pehcevski, J., & Thom, J. A. (2007). Evaluating Focused Retrieval Tasks. Paper presented at the Proceedings of the SIGIR 2007 Workshop on Focused Retrieval, Amsterdam, The Netherlands.
Pehcevski, J., Vercoustre, A.-M., & Thom, J. (2008). Exploiting Locality of Wikipedia Links in Entity Ranking. Paper presented at the Proceedings of the IR research, 30th European conference on Advances in information retrieval (ECIR'08), Glasgow, UK.
Plake, C., Hakenberg, J., & Leser, U. (2005). Optimizing syntax patterns for discovering protein-protein interactions. Paper presented at the Proceedings of the 2005 ACM symposium on Applied computing, Santa Fe, New Mexico.
Poon, H., & Domingos, P. (2008). Joint unsupervised coreference resolution with Markov Logic. Paper presented at the Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu.
Preisach, C., & Schmidt-Thieme, L. (2008). Ensembles of relational classifiers. Knowledge and Information Systems, 14(3), 249-272. doi: 10.1007/s10115-007-0093-3
Qin, T., Liu, T.-Y., Zhang, X.-D., Feng, G., Wang, D.-S., & Ma, W.-Y. (2007). Topic distillation via sub-site retrieval. Inf. Process. Manage., 43(2), 445-460. doi: 10.1016/j.ipm.2006.07.004
Qin, T., Liu, T.-Y., Zhang, X.-D., Wang, D.-S., & Li, H. (2008). Global Ranking Using Continuous Conditional Random Fields. Paper presented at the Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS 2008), Vancouver, Canada.
Qin, T., Liu, T.-Y., Zhang, X.-D., Wang, D.-S., Xiong, W.-Y., & Li, H. (2008). Learning to rank relational objects and its application to web search. Paper presented at the Proceedings of the 17th international conference on World Wide Web (WWW'08), New York, NY, USA.
Rastogi, V., Dalvi, N., & Garofalakis, M. (2011). Large-scale collective entity matching. Paper presented at the Proceedings of the VLDB Endowment, Seattle, Washington.
Rebholz-Schuhmann, D. (2011). Overview: Setup of the first and the second CALBC challenge, overview on the analyses. Paper presented at the Proceedings of the second CALBC (Collaborative Annotation of a Large Biomedical Corpus) workshop, Hinxton, Cambridgeshire, UK.
Rebholz-Schuhmann, D., Yepes, A. J. J., Mulligen, E. M. v., Kang, N., Kors, J., Milward, D., . . . Hahn, U. (2010). CALBC silver standard corpus. Paper presented at the Proceedings of the 3rd International Symposium on Languages in Biology and Medicine, Jeju Island, South Korea.
Regev, Y., Finkelstein-Landau, M., Feldman, R., Gorodetsky, M., Zheng, X., Levy, S., . . . Shatkay, H. (2002). Rule-based extraction of experimental evidence in the biomedical domain: the KDD Cup 2002 (task 1). ACM SIGKDD Explorations Newsletter, 4(2), 90-92.
Richardson, M., & Domingos, P. (2006). Markov logic networks. Machine Learning, 62(Special Issue: Multi-Relational Data Mining and Statistical Relational Learning), 107-136.
Riedel, S. (2008). Improving the Accuracy and Efficiency of MAP Inference for Markov Logic. Paper presented at the Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI 2008), Helsinki, Finland.
Riedel, S. (2009). Markov thebeast User Manual. Retrieved from http://code.google.com/p/thebeast/
Romano, P., Manniello, A., Aresu, O., Armento, M., Cesaro, M., & Parodi, B. (2009). Cell Line Data Base: structure and recent improvements towards molecular authentication of human cell lines. Nucleic Acids Research, 37(Database issue), D925-D932.
Russell, S., & Norvig, P. (1995). Artificial intelligence: a modern approach. Englewood Cliffs, NJ; London: Prentice Hall.
Sang, E. F. T. K. (2002). Introduction to the CoNLL-2002 shared task: language-independent named entity recognition. Paper presented at the Proceedings of the 6th conference on Natural language learning - Volume 20.
Sang, E. F. T. K., & Meulder, F. D. (2003). Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. Paper presented at the Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4, Edmonton, Canada.
Sarawagi, S., & Bhamidipaty, A. (2002). Interactive deduplication using active learning. Paper presented at the Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '02), Edmonton, Alberta, Canada.
Sarmento, L., Kehlenbeck, A., Oliveira, E., & Ungar, L. (2009). An Approach to Web-Scale Named-Entity Disambiguation. Paper presented at the Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition, Leipzig, Germany.
Schuemie, M. J., Weeber, M., Schijvenaars, B. J. A., Mulligen, E. M. v., Eijk, C. C. v. d., Jelier, R., . . . Kors, J. A. (2004). Distribution of information in biomedical abstracts and full-text publications. Bioinformatics, 20, 2597-2604.
Schwartz, A. S., & Hearst, M. A. (2003). A simple algorithm for identifying abbreviation definitions in biomedical text. Paper presented at the Proceedings of the Pacific Symposium on Biocomputing.
Sehgal, A., & Srinivasan, P. (2006). Retrieval with gene queries. BMC Bioinformatics, 7(1), 220.
Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., & Eliassi-Rad, T. (2008). Collective classification in network data. AI Magazine, 29(3), 93.
Shah, P. K., Perez-Iratxeta, C., Bork, P., & Andrade, M. A. (2003). Information extraction from full text scientific articles: Where are the keywords? BMC Bioinformatics, 4, 20.
Shatkay, H., Chen, N., & Blostein, D. (2006). Integrating image data into biomedical text categorization. Bioinformatics, 22(14), e446-e453. doi: 10.1093/bioinformatics/btl235
Smith, L., Tanabe, L. K., Ando, R. J., Kuo, C.-J., Chung, I.-F., Hsu, C.-N., . . . Wilbur, W. J. (2008). Overview of BioCreative II gene mention recognition. Genome Biology, 9(Suppl 2), S2.
Smith, T. F., & Waterman, M. (1981). Identification of Common Molecular Subsequences. Journal of Molecular Biology, 147, 195-197.
Soboroff, I., Vries, A. P. d., & Craswell, N. (2006). Overview of the TREC 2006 Enterprise Track. Paper presented at the Proceedings of the Fifteenth Text REtrieval Conference (TREC 2006), Gaithersburg, Maryland.
Soon, W. M., Ng, H. T., & Lim, D. C. Y. (2001). A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4), 521-544.
Sorden, N. N., Chang, H. F., & Nelson, S. J. (1999). Automated Indexing of Gene Symbols. Paper presented at the Proceedings of the American Medical Informatics Association Symposium (AMIA '99), Washington, DC. http://www.nlm.nih.gov/mesh/gene.html
Stephens, M., Palakal, M., Mukhopadhyay, S., & Raje, R. (2001). Detecting gene relations from Medline abstracts. Paper presented at the Proceedings of the 6th Pacific Symposium on Biocomputing (PSB 2001), Hawaii, USA.
Stern, R., Sagot, B., & Béchet, F. (2012). A Joint Named Entity Recognition and Entity Linking System. Paper presented at the Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data (Hybrid2012), Avignon, France.
Subramaniam, L. V., Mukherjea, S., Kankar, P., Srivastava, B., Batra, V. S., Kamesam, P. V., & Kothari, R. (2003). Information extraction from biomedical literature: methodology, evaluation and an application. Paper presented at the Proceedings of the twelfth international conference on Information and knowledge management, New Orleans, LA, USA.
Sung, C.-L., Lee, C.-W., Yen, H.-C., & Hsu, W.-L. (2009). Alignment-based surface patterns for factoid question answering systems. Integrated Computer-Aided Engineering, 16, 259-269.
Swales, J. M. (1990). Genre analysis: English in academic and research settings: Cambridge University Press.
Swales, J. M. (2004). Research genres: Explorations and applications: Cambridge University Press.
Tao, T., & Zhai, C. (2006). Regularized estimation of mixture models for robust pseudo-relevance feedback. Paper presented at the Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, Seattle, Washington, USA.
Tsai, R. T.-H., Chou, W.-C., Su, Y.-S., Lin, Y.-C., Sung, C.-L., Dai, H.-J., . . . Hsu, W.-L. (2007). BIOSMILE: A semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features. BMC Bioinformatics, 8, 325.
Tsai, R. T.-H., Dai, H.-J., Huang, C.-H., & Hsu, W.-L. (2008). Semi-automatic conversion of BioProp semantic annotation to PASBio annotation. BMC Bioinformatics, 9(Suppl 12), S18.
Tsai, R. T.-H., Dai, H.-J., Hung, H.-C., Sung, C.-L., Day, M.-Y., & Hsu, W.-L. (2006). Chinese Word Segmentation with Minimal Linguistic Knowledge: An Improved Conditional Random Fields Coupled with Character Clustering and Automatically Discovered Template Matching. Paper presented at the Proceedings of the 2006 IEEE International Conference on Information Reuse and Integration (IRI - 2006), Waikoloa Village, HI.
Tsai, R. T.-H., Dai, H.-J., Lai, P.-T., & Huang, C.-H. (2009). PubMed-EX: A web browser extension to enhance PubMed search with text mining features. Bioinformatics, 25, 3031-3032.
Tsai, R. T.-H., Hung, H.-C., Dai, H.-J., Lin, Y.-W., & Hsu, W.-L. (2008). Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles. BMC Bioinformatics, 9(1), S3.
Tsai, R. T.-H., & Lai, P.-T. (2011). Dynamic programming re-ranking for PPI interactor and pair extraction in full-text articles. BMC Bioinformatics, 12, 60.
Tsai, R. T.-H., Lai, P.-T., Dai, H.-J., Huang, C.-H., Bow, Y.-Y., Chang, Y.-C., . . . Hsu, W.-L. (2009). HypertenGene: Extracting key hypertension genes from biomedical literature with position and automatically-generated template features. BMC Bioinformatics, 10(Suppl 15), S9.
Tsai, R. T.-H., Sung, C.-L., Dai, H.-J., Hung, H.-C., Sung, T.-Y., & Hsu, W.-L. (2006). NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition. BMC Bioinformatics, 7(Suppl 5), S11.
Tsai, R. T.-H., Wu, S.-H., Chou, W.-C., Lin, C., He, D., Hsiang, J., . . . Hsu, W.-L. (2006). Various criteria in the evaluation of biomedical named entity recognition. BMC Bioinformatics, 7, 92.
Tsuruoka, Y., McNaught, J., Tsujii, J., & Ananiadou, S. (2007). Learning string similarity measures for gene/protein name dictionary look-up using logistic regression. Bioinformatics, 23(20), 2768-2774.
Tsuruoka, Y., Tateishi, Y., Kim, J.-D., Ohta, T., McNaught, J., Ananiadou, S., & Tsujii, J. i. (2005). Developing a robust part-of-speech tagger for biomedical text. Lecture Notes in Computer Science, 382-392.
Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer-Verlag.
Vogt, C. C., & Cottrell, G. W. (1999). Fusion via a linear combination of scores. Information Retrieval, 1(3), 151-173.
Voorhees, E. M. (2003). Overview of TREC 2003. Paper presented at the Proceedings of The 12th Text Retrieval Conference (TREC), Gaithersburg, Maryland.
Voorhees, E. M. (2007). Overview of TREC 2007. Paper presented at the Proceedings of the Sixteenth Text REtrieval Conference (TREC 2007), Gaithersburg, Maryland.
Vries, A. P. d., Vercoustre, A.-M., Thom, J. A., Craswell, N., & Lalmas, M. (2008). Overview of the INEX 2007 entity ranking track. In N. Fuhr, J. Kamps, M. Lalmas & A. Trotman (Eds.), Focused Access to XML Documents (pp. 245-251): Springer-Verlag.
Wang, X., Tsujii, J. i., & Ananiadou, S. (2010). Disambiguating the species of biomedical named entities using natural language parsers. Bioinformatics, 26(5), 661-667.
Wang, Y., Addess, K. J., Chen, J., Geer, L. Y., He, J., He, S., . . . Bryant, S. H. (2007). MMDB: annotating protein sequences with Entrez's 3D-structure database. Nucleic Acids Res, 35(Database issue), D298-D300. doi: 10.1093/nar/gkl952
Wang, Y., Xiao, J., Suzek, T. O., Zhang, J., Wang, J., & Bryant, S. H. (2009). PubChem: a public information system for analyzing bioactivities of small molecules. Nucl. Acids Res., 37(suppl_2), W623-633. doi: 10.1093/nar/gkp456
Weeber, M., Schijvenaars, B., van Mulligen, E., Mons, B., Jelier, R., van der Eijk, C., & Kors, J. (2003). Ambiguity of Human Gene Symbols in LocusLink and MEDLINE: Creating an Inventory and a Disambiguation Test Collection. Paper presented at the Proceedings of the AMIA Symposium.
Wermter, J., Tomanek, K., & Hahn, U. (2009). High-performance gene name normalization with GENO. Bioinformatics, 25(6), 815-821. doi: 10.1093/bioinformatics/btp071
Baumgartner, W. A., Jr., Cohen, K. B., Fox, L. M., Acquaah-Mensah, G., & Hunter, L. (2007). Manual curation is not sufficient for annotation of genomic databases. Bioinformatics, 23(13), i41-i48.
Baumgartner, W. A., Jr., Lu, Z., Johnson, H. L., Caporaso, J. G., Paquette, J., Lindemann, A., . . . Hunter, L. (2007). An integrated approach to concept recognition in biomedical text. Paper presented at the Proceedings of the Second BioCreative Challenge Evaluation Workshop.
Baumgartner, W. A., Jr., Lu, Z., Johnson, H. L., Caporaso, J. G., Paquette, J., Lindemann, A., . . . Hunter, L. (2008). Concept recognition for extracting protein interaction relations from biomedical text. Genome Biology, 9(Suppl 2), S9.
Willighagen, E., O'Boyle, N., Gopalakrishnan, H., Jiao, D., Guha, R., Steinbeck, C., & Wild, D. (2007). Userscripts for the Life Sciences. BMC Bioinformatics, 8(1), 487.
Winkler, W. E. (1999). The state of record linkage and current research problems. Paper presented at the Proceedings of the Survey Methods Section, Regina, Canada.
Winston, W. L., & Venkataramanan, M. (2002). Introduction to Mathematical Programming: Applications and Algorithms (Vol. 1): Duxbury Press.
Woods, W. A. (1997). Conceptual Indexing: A Better Way to Organize Knowledge (Technical Report SMLI TR-97-61). Mountain View, CA, USA: Sun Microsystems, Inc.
Xu, H., Fan, J.-W., Hripcsak, G., Mendonça, E. A., Markatou, M., & Friedman, C. (2007). Gene symbol disambiguation using knowledge-based profiles. Bioinformatics, 23(8), 1015-1022. doi: 10.1093/bioinformatics/btm056
Yang, Q., Jiang, P., Zhang, C., & Niu, Z. (2010). Reconstruct Logical Hierarchical Sitemap for Related Entity Finding. Paper presented at the Proceedings of the Nineteenth Text REtrieval Conference (TREC 2010), Gaithersburg, Maryland.
Yeh, A., Morgan, A., Colosimo, M., & Hirschman, L. (2005). BioCreAtIvE task 1A: gene mention finding evaluation. BMC Bioinformatics, 6(Suppl 1), S2.
Yin, X., & Shah, S. (2010). Building taxonomy of web search intents for name entity queries. Paper presented at the Proceedings of the 19th international conference on World wide web, Raleigh, North Carolina, USA.
Yoshikawa, K., Riedel, S., Hirao, T., Asahara, M., & Matsumoto, Y. (2011). Coreference based event-argument relation extraction on biomedical text. Journal of Biomedical Semantics, 2(Suppl 5), S6.
Zhang, W., Sim, Y. C., Su, J., & Tan, C. L. (2011). Entity Linking with Effective Acronym Expansion, Instance Selection and Topic Modeling. Paper presented at the Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Spain.
Zhang, W., Su, J., Tan, C. L., & Wang, W. T. (2010). Entity Linking Leveraging Automatically Generated Annotation. Paper presented at the Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China.