Graduate student: Chia-Hung Hsieh (謝佳宏)
Thesis title: Ontology-Based Neural Network Electronic Document Categorization System (以本體論為基之類神經網路電子文件自動分類管理系統)
Advisor: Dr. Amy J.C. Trappey (張瑞芬)
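Degree: Master
Department: College of Engineering - Department of Industrial Engineering and Engineering Management
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: English
Number of pages: 94
Chinese keywords: 類神經網路 (neural network), 文件分類 (document classification), 本體論 (ontology), 知識管理 (knowledge management)
English keywords: neural network, document categorization and classification, ontology, knowledge management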
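    Advances in computer and network technology have led to the generation of large amounts of information, and the growing number of patent knowledge documents makes it harder to classify them consistently and promptly. Searching for documents quickly and accurately has also become more difficult. To address this problem, many experts regard document management, especially of patent knowledge documents, as an important issue and have devoted themselves to research in the patent domain. In traditional classification, domain experts review a document's content and assign its category based on their experience. Because this traditional approach often produces inconsistent results, automatic document classification has become an important research topic. This study proposes a neural network patent document classification system based on a pre-constructed ontology. First, the system extracts feature values from a document using part-of-speech and sentence-pattern analysis. Next, the system matches these feature values against the concepts and relations in the ontology and applies the terminology mapping method proposed in this thesis, which accounts for the influence of both concepts and relations, to convert them into input values for the neural network model. Finally, the system uses a fully trained neural network model to compute and infer the relevance between the document and the categories. The study uses two domains, Chemical mechanical polishing (CMP) patent knowledge documents and patent business knowledge news, as test vehicles for the system and measures its performance. CMP is a planarization technique widely used in semiconductor manufacturing, so the development and application of its patents are extremely important. Patent business knowledge consists of news about the development, trading, and application of patents. Classifying documents well in these two domains not only improves the speed and accuracy of document search but also helps avoid patent traps. In the proposed system, the International Patent Classification (IPC) serves as the standard for patent classification, and system performance is evaluated against it.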


    Advances in modern computing cause information to be generated rapidly, and the growing number of patent knowledge documents is difficult to classify consistently and promptly. To address this problem, many specialists treat document management, especially for technical reports and patent documents, as a significant research issue that combines the expertise of IT, IP, and domain experts. In traditional patent categorization, domain experts review a document's content and classify it based on their experience. Because such traditional methods often yield inconsistent results, automatic document categorization has become an important research topic. In this thesis, a categorization method using an artificial neural network (ANN) is developed to classify patent documents based on a pre-constructed ontology. First, the system extracts the features of a document using morphological analysis and sentence analysis. Second, these features are matched with the classes and relations of the pre-constructed ontology and converted into ANN inputs using two weight-transferring functions proposed in this research. Third, a well-trained ANN model is applied to calculate and infer the relationships between a given document and the categories. Two domains, Chemical Mechanical Polishing (CMP) and business knowledge documents, serve as case studies to demonstrate the proposed system at work. The International Patent Classification (IPC) is used as the classification schema and hierarchy.
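
    The three steps in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not give the two weight-transferring functions, so the weighting in `map_to_inputs` is an assumed stand-in, and the ontology terms, the class `TinyBPN`, and the training documents are all hypothetical. Feature extraction is taken as already done, so each document arrives as a list of matched terms.

```python
import math
import random

# Hypothetical toy ontology: concepts and relations (pairs of linked concepts).
CONCEPTS = ["slurry", "polishing pad", "wafer", "license"]
RELATIONS = [("slurry", "wafer"), ("polishing pad", "wafer")]


def map_to_inputs(features):
    """Convert extracted features into one network input per concept and relation.

    Assumed weighting (not the thesis's functions): a concept input is the
    share of features that match the concept; a relation input is 1.0 when
    both concepts it links occur in the document, else 0.0.
    """
    total = max(len(features), 1)
    concept_in = [features.count(c) / total for c in CONCEPTS]
    present = set(features)
    relation_in = [1.0 if a in present and b in present else 0.0
                   for a, b in RELATIONS]
    return concept_in + relation_in


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyBPN:
    """One-hidden-layer back-propagation network with a single output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One gradient step on the squared error for a single document."""
        h, y = self.forward(x)
        xb, hb = x + [1.0], h + [1.0]
        delta_out = (y - target) * y * (1.0 - y)
        # Hidden deltas are computed before the output weights change.
        delta_h = [delta_out * self.w2[j] * h[j] * (1.0 - h[j])
                   for j in range(len(h))]
        for j in range(len(hb)):
            self.w2[j] -= lr * delta_out * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * delta_h[j] * xb[i]


# Hypothetical training set: relevance of each document to a "CMP" category.
cmp_doc = map_to_inputs(["slurry", "wafer", "polishing pad"])
news_doc = map_to_inputs(["license", "license"])

net = TinyBPN(n_in=len(cmp_doc), n_hidden=3, seed=0)
for _ in range(2000):
    net.train_step(cmp_doc, 1.0)
    net.train_step(news_doc, 0.0)

print("CMP document relevance:", round(net.forward(cmp_doc)[1], 3))
print("News document relevance:", round(net.forward(news_doc)[1], 3))
```

    In a multi-category setting the output layer would hold one unit per category; a single output keeps the sketch short.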

    Chinese Abstract
    Abstract
    Acknowledgements
    1. Introduction
      1.1 Research Motivation and Objectives
      1.2 Research Method and Progress Phase
    2. Literature Review
      2.1 Document Content Analysis
        2.1.1 The Relation between Feature and Document
        2.1.2 Feature Selection
      2.2 Document Classification Methodologies
        2.2.1 Vector Space Model (VSM)
        2.2.2 k-Nearest-Neighbor (kNN)
        2.2.3 Decision Tree
        2.2.4 Naïve Bayes
        2.2.5 Genetic Algorithm
        2.2.6 Artificial Neural Network
      2.3 Ontology
        2.3.1 Introduction of Ontology
        2.3.2 Ontology Language
        2.3.3 Application of Ontology in Knowledge Management
    3. System Methodology
      3.1 System Architecture
        3.1.1 Architecture of System
        3.1.2 Procedure of Methodology
      3.2 Document Content Analysis
      3.3 Ontology-Based Neural Network Model
        3.3.1 Back-Propagation Network
        3.3.2 Ontology Expression
        3.3.3 Neural Network Model
        3.3.4 Construction Techniques for Neural Network Model
      3.4 Terminology Mapping Method
        3.4.1 Input of the Network from Concept
        3.4.2 Input of the Network from Relation
      3.5 Document Searching Methodology
    4. System Analysis and Design
      4.1 Software and Hardware
        4.1.1 System Development Tool
      4.2 Function Module Design
        4.2.1 Function Module
        4.2.2 Module Flow Chart
      4.3 System Database Design
      4.4 Neural Network Model Analysis
        4.4.1 Constructing Hierarchical Document Classification
        4.4.2 Construct the Domain Ontology Schema
        4.4.3 Document Content Analysis
        4.4.4 Network Model Constructing
        4.4.5 Required-Parameter Analysis
    5. System Implementation and Evaluation
      5.1 System Implementation
        5.1.1 Pre-Training Function
        5.1.2 Classified Function
        5.1.3 Search Function
      5.2 System Evaluation
        5.2.1 Estimative Criterion of Retrieval Effectiveness for System
        5.2.2 System Evaluation
    6. Conclusions and Future Works
      6.1 Conclusions
      6.2 Future Works
    References
    Appendix 1. Stop Word List
    Appendix 2. Key Concepts and Relations in System


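    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)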
