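Graduate student: 鄭憲倫 (Shian-luen Cheng)
Thesis title: 利用鬆弛演算法分類專利詞彙 (Patent Terminology Classification Applying Relaxation Labeling)
Advisor: 蘇豐文 (Von-Wun Soo)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: English
Number of pages: 42
Keywords (Chinese): 鬆弛演算法 (relaxation labeling), 文件分類 (text classification), 專利文件分類 (patent document classification), 專利分析 (patent analysis)
Keywords (English): relaxation labeling, text classifier, patent classification, patent analysis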
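  • Intellectual property plays a key role in the competitiveness of businesses and nations, and patent documents provide effective legal protection for it. Because the number of patent documents is enormous and the writing formats of patent claims vary, most patent retrieval, patent analysis, and infringement-avoidance work today still relies on experts working manually. Effective automated methods to support patent analysis are still lacking.
    In previous research, various text classifiers have been widely applied to document classification, quickly and effectively assisting manual classification and meeting document classification needs. In this study, we apply a text classifier to patent documents to automatically classify their terminology and thereby assist in classifying the documents themselves.
    This thesis proposes a method that uses relaxation labeling to classify terminology in patent documents. First, CMP patent documents are parsed: exploiting the characteristics of patent documents, natural language processing techniques and regular expressions are used to extract the terminology in the patent claims together with descriptive information about its structure, attributes, and materials, and a classification hierarchy (taxonomy) of CMP terms is then built. Next, the probabilities of the different classes in the taxonomy and the compatibility coefficients are computed from training data. The relaxation labeling computation then determines the most suitable class for each term.
    We validate the accuracy of this method for classifying patent terminology with two experiments. The experiments show that, provided the semantic content of the patent claims is effectively parsed and extracted, the method achieves a certain level of accuracy. It offers users another way to classify domain-specific terms and supports related tasks such as patent retrieval, identification of similar terms, and construction of domain thesauri.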


    Intellectual property (IP) is a powerful tool for a country's economic growth and a source of competitive advantage in innovation for businesses. From a legal standpoint, patents serve to protect IP. With the growing number of patent documents and the varied writing styles of patent claims, patent analysis tasks, including patent retrieval, synonym identification, and domain thesaurus building, remain largely manual work.
    In this thesis, we propose relaxation labeling as an approach to text classification and use it to classify the terminology in patent documents. A taxonomy of the CMP domain is built in advance from training data. The terms, together with information about their relations, attributes, and materials, are extracted using NLP techniques and regular expressions. The probabilities and compatibility coefficients, which are the parameters of the relaxation labeling model, are estimated from training data. During relaxation labeling, the probability of each class for each unclassified term is updated iteratively. The most appropriate class becomes apparent once the model has converged.
    Based on the extracted semantic information, the experimental results show that relaxation labeling is adequate for terminology classification and achieves a certain level of accuracy. We believe that relaxation labeling could be useful in patent analysis work.
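
    The iterative update described in the abstract can be illustrated with a minimal sketch. It assumes the classic relaxation labeling update rule of Rosenfeld, Hummel, and Zucker (1976); the thesis's own formulation in Chapter 4 is not reproduced in this record, so the function name, the data layout, the toy terms, the class labels ("equipment", "material"), and all numeric values below are hypothetical and chosen only for illustration.

```python
def relaxation_labeling(probs, neighbors, compat, max_iters=100, tol=1e-6):
    """Generic relaxation labeling sketch (not the thesis's exact model).

    probs:     {term: {label: probability}} initial class probabilities
    neighbors: {term: [(other_term, weight), ...]}; the weights of each
               term are assumed to sum to at most 1
    compat:    {(label, other_label): r} compatibility coefficients,
               assumed to lie in [-1, 1]; missing pairs count as 0
    """
    probs = {term: dict(dist) for term, dist in probs.items()}
    for _ in range(max_iters):
        new_probs = {}
        max_change = 0.0
        for term, dist in probs.items():
            unnormalized = {}
            for label, p in dist.items():
                # Support for `label` from the current beliefs of related terms.
                support = 0.0
                for other, weight in neighbors.get(term, []):
                    support += weight * sum(
                        compat.get((label, other_label), 0.0) * other_p
                        for other_label, other_p in probs[other].items()
                    )
                # Support lies in [-1, 1], so the factor (1 + support) is >= 0.
                unnormalized[label] = p * (1.0 + support)
            total = sum(unnormalized.values())
            if total > 0:
                updated = {l: v / total for l, v in unnormalized.items()}
            else:
                updated = dict(dist)  # degenerate case: keep the old beliefs
            change = max(abs(updated[l] - dist[l]) for l in dist)
            max_change = max(max_change, change)
            new_probs[term] = updated
        probs = new_probs
        if max_change < tol:  # treat the model as converged
            break
    return probs


# Toy data (hypothetical): one term with a confident initial class and one
# unclassified term that starts from a uniform distribution.
initial = {
    "polishing pad": {"equipment": 0.9, "material": 0.1},
    "unclassified term": {"equipment": 0.5, "material": 0.5},
}
related = {
    "polishing pad": [("unclassified term", 1.0)],
    "unclassified term": [("polishing pad", 1.0)],
}
compatibility = {
    ("equipment", "equipment"): 0.5,
    ("material", "material"): 0.5,
    ("equipment", "material"): -0.5,
    ("material", "equipment"): -0.5,
}

result = relaxation_labeling(initial, related, compatibility)
best_label = max(result["unclassified term"], key=result["unclassified term"].get)
print(best_label)
```

    In this sketch the unclassified term is pulled toward the class of its related term because same-class pairs were given positive compatibility. In practice the probabilities and compatibility coefficients would be estimated from training data, as the abstract states.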

    Chapter 1 Introduction
      1.1 Patent terminology classification
      1.2 Research Motivation and Objective
      1.3 The organization of the Thesis
    Chapter 2 The Backgrounds
      2.1 Text Classification
      2.2 Relaxation Labeling
      2.3 Patent Classification
    Chapter 3 Terminology and Relation Extracting
      3.1 Terminology Extraction
      3.2 CMP Taxonomy Construction
      3.3 Patent Information Extraction
    Chapter 4 Terminology Classifier applying Relaxation Labeling
      4.1 Define initial probability of each class in taxonomy
      4.2 Define compatibility coefficient
      4.3 Terminology classifier applying Relaxation Labeling
    Chapter 5 Evaluations
      5.1 Accuracy of classification by Relaxation Labeling
      5.2 Improve accuracy by adjusting weighting
    Chapter 6 Conclusions
    Reference


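    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)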
