Use of Text Summarization Technique for Supporting Patent Prior Art Retrieval


Graduate student: 余杰謙 (Chieh-Chien Yu)
Thesis title: Use of Text Summarization Technique for Supporting Patent Prior Art Retrieval (利用文件摘要技術協助專利先前技術檢索)
Advisor: 魏志平 (Chih-Ping Wei)
Oral defense committee:
Degree: Master
Department: College of Technology Management - Institute of Technology Management
Year of publication: 2008
Academic year of graduation: 96
Language: English
Number of pages: 41
Keywords: Prior Art Retrieval, Patent Search, Text Summarization, Patent Summarization, Summary-based Prior Art Retrieval

Prior art retrieval is the process of identifying relevant prior art for a given patent (or patent application). It is essential to patentability (or novelty) and invalidity searches, and its effectiveness greatly affects the validity of those searches. In this study, we aim to improve the effectiveness of prior art retrieval by proposing a summary-based prior art retrieval (SPAR) technique. Our rationale is that the sentences in a patent document are not equally important in describing the invention claimed in the patent. We therefore use a text summarization approach to develop an automatic patent summarization technique that selects the important sentences in a query patent document, and we then use the resulting patent summary for prior art retrieval. For evaluation, we collect 78,225 patent documents from the United States Patent and Trademark Office (USPTO) website and conduct a series of experiments, using a traditional full-text-based prior art retrieval technique as the performance benchmark. Our evaluation results suggest that the proposed SPAR technique significantly outperforms the benchmark. Moreover, the results indicate that including feature selection and our non-prior art selection method in patent summarization learning improves the effectiveness of prior art retrieval.
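The summary-then-retrieve idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the thesis learns sentence importance with an induction algorithm, whereas this sketch swaps in a simple mean-IDF sentence score, and it ranks candidates by TF-IDF cosine similarity. All function names, the toy corpus, and the parameter `k` are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens only; a deliberately crude tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    # Naive split on sentence-ending punctuation followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def idf_table(docs):
    # Smoothed inverse document frequency over the candidate collection.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def tfidf(text, idf):
    # Terms unseen in the collection get weight 0.
    tf = Counter(tokenize(text))
    return {t: f * idf.get(t, 0.0) for t, f in tf.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(query_text, idf, k=2):
    # Assumed stand-in for learned sentence importance: mean IDF per token.
    sentences = split_sentences(query_text)

    def score(sentence):
        toks = tokenize(sentence)
        return sum(idf.get(t, 0.0) for t in toks) / len(toks) if toks else 0.0

    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]), reverse=True)[:k]
    # Keep the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))


def retrieve(query_text, corpus, k=2):
    # Rank candidate patents against the summary, not the full query text.
    idf = idf_table(list(corpus.values()))
    q = tfidf(summarize(query_text, idf, k), idf)
    return sorted(corpus,
                  key=lambda pid: cosine(q, tfidf(corpus[pid], idf)),
                  reverse=True)


# Hypothetical toy data, not drawn from the thesis's USPTO collection.
corpus = {
    "P1": "A rotor blade for a wind turbine with a curved tip.",
    "P2": "A method for brewing coffee using a paper filter.",
    "P3": "A wind turbine tower with a rotor hub and blade pitch control.",
}
query = ("The present invention relates to machines. "
         "A wind turbine rotor blade has a curved tip. It is useful.")
print(retrieve(query, corpus, k=1))
```

With `k=1`, only the sentence that names the invention's distinguishing terms is kept as the query, so generic sentences no longer dilute the match against candidate documents.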

TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
Acknowledgments
Abstract
Abstract (in Chinese)
Chapter 1 Introduction
1.1    Background
1.2    Research Motivation and Objective
1.3    Organization of the Thesis
Chapter 2 Literature Review
2.1    Patent Prior Art Retrieval Techniques
2.2    Text Summarization
2.2.1    Features for Text Summarization
2.2.2    Methods for Text Summarization
2.2.3    Summary of Text Summarization Techniques
Chapter 3 Design of Summary-based Prior Art Retrieval (SPAR) Technique
3.1    Patent Summarization Learning System
3.1.1    Extraction of Discriminating Terms
3.1.2    Sentence Importance Identification
3.1.3    Summarization Learning
3.2    Summary-based Prior Art Retrieval System
3.2.1    Patent Summarization
3.2.2    Text-based Prior Art Retrieval
Chapter 4 Empirical Evaluation
4.1    Data Collection
4.2    Performance Benchmark and Evaluation Criteria
4.3    Parameter Tuning
4.4    Comparative Evaluation Results
4.5    Effects of Different Induction Algorithms for the SPAR Technique
4.6    Effects of Sentence Importance Identification Methods
4.7    Effects of Representation of Thematic Word Variable for Patent Summarization Learning
4.8    Effects of Non-Prior Art Selection for Patent Summarization Learning
Chapter 5 Conclusion and Future Research Directions
5.1    Conclusion
5.2    Future Research Directions
References

                                


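Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)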
