
Graduate student: 余杰謙
Chieh-Chien Yu
Thesis title: Use of Text Summarization Technique for Supporting Patent Prior Art Retrieval
利用文件摘要技術協助專利先前技術檢索
Advisor: 魏志平
Chih-Ping Wei
Oral examination committee:
Degree: Master
Department: College of Technology Management - Institute of Technology Management
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar, i.e., 2007-2008)
Language: English
Number of pages: 41
Keywords: Prior Art Retrieval, Patent Search, Text Summarization, Patent Summarization, Summary-based Prior Art Retrieval
  • Prior art retrieval is the process of identifying relevant prior art for a given patent (or patent application). It is essential to patentability (or novelty) and invalidity searches, and its effectiveness greatly affects the validity of these searches. In this study, we aim to improve the effectiveness of prior art retrieval by proposing a summary-based prior art retrieval (SPAR) technique. Our rationale is that the sentences in a patent document are not equally important in describing the invention claimed in the patent. We therefore use a text summarization approach to develop an automatic patent summarization technique that selects the important sentences in a query patent document, and then use the resulting patent summary for prior art retrieval. For evaluation, we collected 78,225 patent documents from the United States Patent and Trademark Office (USPTO) website and conducted a series of experiments using a traditional full-text-based prior art retrieval technique as the performance benchmark. Our evaluation results suggest that the proposed SPAR technique significantly outperforms the benchmark. They also indicate that including feature selection and our non-prior art selection method in patent summarization learning improves the effectiveness of prior art retrieval.
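    The summarize-then-retrieve idea in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the thesis learns sentence importance with induction algorithms, whereas the sketch below substitutes a simple stand-in scorer (mean IDF weight of a sentence's terms) and ranks candidates by TF-IDF cosine similarity. All function names and parameters (`summarize`, `retrieve`, `k_sentences`, `top_n`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break on whitespace after ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def idf_table(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are ignored."""
    tf = Counter(tokens)
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(text, idf, k_sentences):
    """Keep the k highest-scoring sentences, in their original order.

    ASSUMPTION: the score is the mean IDF of a sentence's known terms,
    a placeholder for the learned sentence-importance model in the thesis.
    """
    sents = split_sentences(text)

    def score(sentence):
        toks = [t for t in tokenize(sentence) if t in idf]
        return sum(idf[t] for t in toks) / len(toks) if toks else 0.0

    best = sorted(range(len(sents)), key=lambda i: score(sents[i]),
                  reverse=True)[:k_sentences]
    return " ".join(sents[i] for i in sorted(best))


def retrieve(query_text, corpus, k_sentences=2, top_n=3):
    """Summarize the query patent, then rank corpus patents against it."""
    corpus_tokens = {pid: tokenize(doc) for pid, doc in corpus.items()}
    idf = idf_table(list(corpus_tokens.values()))
    vectors = {pid: tfidf(toks, idf) for pid, toks in corpus_tokens.items()}
    summary = summarize(query_text, idf, k_sentences)
    query_vec = tfidf(tokenize(summary), idf)
    ranked = sorted(vectors, key=lambda pid: cosine(query_vec, vectors[pid]),
                    reverse=True)
    return summary, ranked[:top_n]
```

    Calling `retrieve(query_text, corpus)` with `corpus` as a dictionary mapping patent identifiers to document text returns the generated summary and the top-ranked identifiers. The full-text benchmark described in the abstract corresponds to skipping `summarize` and building the query vector from the whole query document.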


    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    Acknowledgments
    Abstract
    Chinese Abstract
    Chapter 1 Introduction
      1.1 Background
      1.2 Research Motivation and Objective
      1.3 Organization of the Thesis
    Chapter 2 Literature Review
      2.1 Patent Prior Art Retrieval Techniques
      2.2 Text Summarization
        2.2.1 Features for Text Summarization
        2.2.2 Methods for Text Summarization
        2.2.3 Summary of Text Summarization Techniques
    Chapter 3 Design of Summary-based Prior Art Retrieval (SPAR) Technique
      3.1 Patent Summarization Learning System
        3.1.1 Extraction of Discriminating Terms
        3.1.2 Sentence Importance Identification
        3.1.3 Summarization Learning
      3.2 Summary-based Prior Art Retrieval System
        3.2.1 Patent Summarization
        3.2.2 Text-based Prior Art Retrieval
    Chapter 4 Empirical Evaluation
      4.1 Data Collection
      4.2 Performance Benchmark and Evaluation Criteria
      4.3 Parameter Tuning
      4.4 Comparative Evaluation Results
      4.5 Effects of Different Induction Algorithms for the SPAR Technique
      4.6 Effects of Sentence Importance Identification Methods
      4.7 Effects of Representation of Thematic Word Variable for Patent Summarization Learning
      4.8 Effects of Non-Prior Art Selection for Patent Summarization Learning
    Chapter 5 Conclusion and Future Research Directions
      5.1 Conclusion
      5.2 Future Research Directions
    References


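    Full-text release date: full text not authorized for public release (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)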
