Graduate student: | 黃翊軒 Yi-Hsuan Huang |
---|---|
Thesis title: | 本體論為基之智慧型專利文件分類方法論研究 A Novel Methodology for Ontology-Based Patent Document Categorization |
Advisor: | 張瑞芬 Amy J. C. Trappey |
Oral defense committee: | |
Degree: | Master |
Department: | College of Engineering - Department of Industrial Engineering and Engineering Management |
Year of publication: | 2007 |
Academic year of graduation: | 95 (ROC calendar) |
Language: | Chinese |
Pages: | 145 |
Keywords (Chinese): | 本體論、關鍵詞彙、文件分類、類神經網路、TF-IDF |
Keywords (English): | Ontology, Key Phrases, Document Categorization, Neural Network, TF-IDF |
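As the world economy moves into a knowledge economy, countries are continually upgrading and transforming their industries to improve competitiveness. A company's competitive advantage lies in the quality of its knowledge, with emphasis on leading in creativity and in technology R&D. For companies, patent information is not only a repository of human ingenuity but also an important reference for R&D staff. What matters to them is how to turn patent data, drawn from a vast sea of patent documents, into the useful information and intelligence they need. Because patent information also gives early warning of patent infringement, IP managers use it to monitor competitors' patent grants continuously and so reduce the large costs a company would incur through infringement. Companies can also carry out patent deployment, using patents as a weapon for expansion to increase market share or to pursue strategic licensing, cross-licensing, patent pools, and technology transfer. This research proposes an ontology-based intelligent patent document categorization system. The steps of the methodology are as follows. First, Web Ontology Language (OWL) files are parsed to obtain the ontology of the domain knowledge. Next, a technique based on Term Frequency - Inverse Document Frequency (TF-IDF) extracts the important key phrases from patent documents, and the probability that each extracted key phrase implies a given ontology concept is calculated. The ontology is then combined with a neural network: the frequencies of key phrases in the documents to be classified, the probabilities of the ontology concepts they imply, and calculations over ontology relationships are used to categorize patent documents automatically. The research also includes a patent document search module to support the analysis and use of the categorized documents. It further proposes a correction feedback mechanism that improves categorization accuracy by updating the phrase probabilities and through the learning process of the neural network. Finally, patent documents from the chemical mechanical polishing (CMP) and radio frequency identification (RFID) domains are used as cases to test the performance of the automatic categorization system.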
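The TF-IDF key phrase extraction step described above can be sketched as follows. This is a minimal illustration and not the thesis implementation: it assumes documents are already tokenized into candidate phrases, uses the plain `tf * log(N / df)` weighting, and the function names `tfidf_scores` and `top_key_phrases` as well as the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # Relative term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_key_phrases(docs, k=3):
    """Keep the k highest-scoring terms of each document (ties broken alphabetically)."""
    return [
        [term for term, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for s in tfidf_scores(docs)
    ]


# Hypothetical toy corpus standing in for tokenized patent documents.
docs = [
    ["polishing", "pad", "slurry", "wafer", "polishing"],
    ["rfid", "tag", "antenna", "reader", "tag"],
    ["wafer", "slurry", "tag"],
]
key_phrases = top_key_phrases(docs, k=2)
```

Terms that occur in every document receive a score of zero under this weighting, which is why common vocabulary drops out of the key phrase list.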
To stimulate novel ideas and avoid patent infringement during new product development, R&D engineers need to obtain existing patent information related to the development domain accurately and in a timely manner. Further, patent documents, if well organized and categorized, can give IP managers a clear view of state-of-the-art technologies in an efficient and effective way. Equipped with IP knowledge, companies can set R&D directions and develop patent portfolio and territory strategies to stay competitive in the global marketplace. This thesis proposes a patent categorization methodology that uses an Artificial Neural Network (ANN) to classify patent documents based on a pre-constructed ontology. The proposed methodology not only reads Web Ontology Language (OWL) files created with Protégé but also acquires the probabilities that key phrases belong to specific concepts in the domain ontology. The procedure first extracts key phrases from documents using the Term Frequency - Inverse Document Frequency (TF-IDF) method, and then builds a probability matrix between key phrases and concepts to calculate the probability that a specific key phrase implies a certain concept. By combining the frequencies and probabilities of key phrases, this study obtains more representative input values for the ANN model. In addition, this research provides a document search module that supports IP document analysis by letting users select key phrases and set their weights. Finally, this research uses patents from the Chemical Mechanical Polishing (CMP) and Radio Frequency Identification (RFID) domains as case examples to illustrate the proposed methodology at work, with superior results.
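One way to picture the probability matrix and the ANN input values is the sketch below. The abstract does not give the exact formulas, so everything here is an assumption for illustration: P(concept | phrase) is estimated from co-occurrence counts in labeled training documents, and each input value is the frequency-weighted sum of those probabilities, normalized to sum to one. The names `phrase_concept_probabilities` and `concept_inputs` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def phrase_concept_probabilities(labeled_docs):
    """Estimate P(concept | phrase) from (phrases, concept) training pairs."""
    counts = defaultdict(Counter)
    for phrases, concept in labeled_docs:
        for phrase in phrases:
            counts[phrase][concept] += 1
    # Normalize each phrase's concept counts into a probability row.
    return {
        phrase: {c: n / sum(by_concept.values()) for c, n in by_concept.items()}
        for phrase, by_concept in counts.items()
    }


def concept_inputs(doc_phrases, probs, concepts):
    """Combine phrase frequencies and concept probabilities into one value per concept."""
    freq = Counter(doc_phrases)
    raw = [
        sum(n * probs.get(phrase, {}).get(concept, 0.0) for phrase, n in freq.items())
        for concept in concepts
    ]
    total = sum(raw)
    # Assumed normalization so the vector is comparable across document lengths.
    return [value / total if total else 0.0 for value in raw]


# Hypothetical labeled key phrases for two ontology concepts.
labeled = [
    (["slurry", "pad", "wafer"], "CMP"),
    (["tag", "antenna", "wafer"], "RFID"),
    (["slurry", "pad"], "CMP"),
]
probs = phrase_concept_probabilities(labeled)
inputs = concept_inputs(["slurry", "slurry", "wafer", "tag"], probs, ["CMP", "RFID"])
```

A vector like `inputs` would then be passed to a neural network classifier; the network itself is omitted because the abstract does not describe its architecture.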