
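Student: 張立典 (Li-Tien Chang)
Thesis title: 以知識表徵為基之文件分群法 (An Ontology-based Document Clustering Methodology)
Advisor: 張瑞芬 (Dr. Amy J.C. Trappey)
Degree: Master
Department: College of Engineering, Department of Industrial Engineering and Engineering Management
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar, i.e., 2004–2005)
Language: English
Number of pages: 79
Keywords (Chinese, translated): knowledge representation (ontology), document clustering, fuzzy inference
Keywords (English): Ontology, Fuzzy inference control, Document clustering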
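    This thesis proposes a methodology for analyzing and clustering knowledge documents. Most current methods for analyzing knowledge documents are keyword-based, but keywords, to humans and computers alike, are fragmentary and carry relatively little meaning. We therefore propose an ontology-based method for analyzing knowledge documents, with the aim of letting the computer understand, to some degree, the actual content of a knowledge document through its ontology. The method consists of several major steps. First, experts construct the ontology of a specific domain and supply training data to train the system's terminology. Once training is complete, knowledge documents can be clustered. In the clustering stage, each knowledge document first undergoes natural language processing; the previously trained terminology is then used to identify the ontology that represents the document. Using these ontologies, fuzzy inference derives a relationship value between knowledge documents, and a hierarchical clustering method finally groups the documents. At the end of the study, we evaluate the effectiveness of the method and compare it with a keyword-based method.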
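The terminology-training step (Table 3 of the thesis is titled "Probability of lemma to concept") can be sketched as relative-frequency estimation of P(concept | lemma), followed by mapping a processed document's lemmas to concepts. This is a minimal sketch, not the author's implementation: it assumes the training data arrive as expert-labelled (lemma, concept) pairs, and the function names, the `min_prob` cut-off, and the example CMP lemmas and concept labels are all hypothetical.

```python
from collections import Counter, defaultdict


def train_lemma_concept_probabilities(training_pairs):
    """Estimate P(concept | lemma) by relative frequency over expert-labelled (lemma, concept) pairs."""
    counts = defaultdict(Counter)
    for lemma, concept in training_pairs:
        counts[lemma][concept] += 1
    probs = {}
    for lemma, concept_counts in counts.items():
        total = sum(concept_counts.values())
        probs[lemma] = {c: n / total for c, n in concept_counts.items()}
    return probs


def extract_concepts(lemmas, probs, min_prob=0.5):
    """Map each lemma of a processed document to its most probable concept.

    Lemmas never seen in training, or whose best concept falls below
    min_prob, are skipped.
    """
    concepts = []
    for lemma in lemmas:
        if lemma not in probs:
            continue
        concept, p = max(probs[lemma].items(), key=lambda kv: kv[1])
        if p >= min_prob:
            concepts.append(concept)
    return concepts


# Hypothetical training data for a CMP-patent domain.
training = [
    ("polish", "Process"), ("polish", "Process"), ("polish", "Equipment"),
    ("slurry", "Material"), ("pad", "Equipment"),
]
probs = train_lemma_concept_probabilities(training)
print(extract_concepts(["polish", "slurry", "wafer"], probs))  # ['Process', 'Material']
```

Here "polish" maps to Process (probability 2/3), "slurry" to Material, and the unseen lemma "wafer" is dropped; raising `min_prob` trades coverage for precision.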


    The purpose of this thesis is to present a novel method for analyzing, synthesizing, and managing knowledge documents. Most existing methodologies for synthesizing and managing patents use key phrases as indices of knowledge documents. However, key phrases extracted from patents are meaningless to computers. This thesis therefore develops a novel ontology-based methodology for analyzing and managing knowledge documents, which enables computers to understand knowledge documents to some degree through an ontology rather than through key phrases. The methodology is divided into several steps. First, experts construct the ontology schema of a specific domain and supply training data to train the system. Then a method for learning from natural language texts is adapted to infer the principal ontology of each knowledge document. Next, fuzzy logic control (FLC) is applied to these ontologies to infer the relationships between knowledge documents and to group them into suitable document clusters. Finally, we evaluate the effectiveness of this methodology and compare it with knowledge document clustering based on key phrases.
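The relationship-generation and clustering steps can be illustrated with a Mamdani-style (max-min inference, centroid defuzzification) fuzzy controller feeding an agglomerative hierarchical clustering. This is a hedged sketch under stated assumptions: the linguistic terms few/mediate/many and low/mediate/high come from the thesis's figure list, but the input variable (share of ontology statements two documents have in common), the triangular membership breakpoints, the one-to-one rule table, the average-linkage criterion, the 0.4 merge threshold, and the example statements are illustrative choices, not the thesis's parameters.

```python
import itertools


def tri(x, a, b, c):
    """Triangular membership: 0 outside [a, c], 1 at the peak b (a == b or b == c gives a shoulder)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets on [0, 1]; only the linguistic labels come from the thesis.
IN_SETS = {"few": (0.0, 0.0, 0.5), "mediate": (0.2, 0.5, 0.8), "many": (0.5, 1.0, 1.0)}
OUT_SETS = {"low": (0.0, 0.0, 0.5), "mediate": (0.2, 0.5, 0.8), "high": (0.5, 1.0, 1.0)}
RULES = {"few": "low", "mediate": "mediate", "many": "high"}  # IF shared is X THEN relation is Y


def relationship(stmts_a, stmts_b, steps=101):
    """Mamdani max-min inference with centroid defuzzification on a discretised output axis."""
    union = stmts_a | stmts_b
    x = len(stmts_a & stmts_b) / len(union) if union else 0.0  # share of common statements
    num = den = 0.0
    for k in range(steps):
        y = k / (steps - 1)
        # Clip each rule's output set at its firing strength, aggregate with max.
        mu = max(min(tri(x, *IN_SETS[i]), tri(y, *OUT_SETS[o])) for i, o in RULES.items())
        num += y * mu
        den += mu
    return num / den if den else 0.0


def hierarchical_cluster(docs, threshold=0.4):
    """Average-linkage agglomerative clustering on pairwise fuzzy relationship values."""
    names = list(docs)
    sim = {frozenset(p): relationship(docs[p[0]], docs[p[1]])
           for p in itertools.combinations(names, 2)}
    clusters = [[n] for n in names]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i, j in itertools.combinations(range(len(clusters)), 2):
            links = [sim[frozenset((a, b))] for a in clusters[i] for b in clusters[j]]
            score = sum(links) / len(links)
            if score > best:
                best, pair = score, (i, j)
        if best < threshold:
            break  # no remaining pair is related strongly enough to merge
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical (subject, predicate, object) statements extracted per document.
docs = {
    "P1": {("pad", "usedIn", "CMP"), ("slurry", "contains", "abrasive"), ("wafer", "polishedBy", "pad")},
    "P2": {("pad", "usedIn", "CMP"), ("slurry", "contains", "abrasive"), ("pad", "madeOf", "polyurethane")},
    "N1": {("firm", "sues", "rival"), ("patent", "licensedTo", "firm")},
}
print(hierarchical_cluster(docs))  # [['P1', 'P2'], ['N1']]
```

The two CMP documents share half of their combined statements, fire only the "mediate" rule, and receive a relationship value of 0.5, so they merge; the business-news document shares nothing and stays apart. Average linkage is used so that a single strongly related pair cannot pull two otherwise unrelated groups together; single or complete linkage could be substituted by taking the max or min of `links` instead of the mean.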

    Table of Contents
    Chinese Abstract
    Abstract
    List of Figures
    List of Tables
    1. Introduction
       1.1 Background
       1.2 Motivation
       1.3 Thesis objective
    2. Literature Review
       2.1 Text mining
       2.2 Ontology
       2.3 Fuzzy logic control
       2.4 Clustering methodology
    3. Methodology Architecture and Functional Detail
       3.1 Experts construct domain ontology schema
       3.2 Terminology training
       3.3 Natural language processing
       3.4 Terminological analyzer
       3.5 Knowledge extraction
       3.6 Relationship generator of knowledge documents
       3.7 Clustering of knowledge documents
    4. Results and Evaluations of the Experiment
       4.1 Data collection
       4.2 Ontology construction
       4.3 Result of terminology training
       4.4 Results of knowledge extraction and clustering
       4.5 Evaluation
    5. Conclusions
    Appendix 1. Schema of Business News
    Appendix 2. Ontology of Business News in Protégé
    Appendix 3. Schema of CMP Patent
    Appendix 4. Ontology of CMP Patent in Protégé
    Appendix 5. Clustering Result for Documents of Business News
    Appendix 6. Clustering Result for Documents of CMP Patent

    List of Figures
    Figure 1. Two approaches (keyword-based and ontology-based) in knowledge document management on a computer platform
    Figure 2. A graphic example of a statement in ontology
    Figure 3. Architecture of fuzzy logic control
    Figure 4. Tree structure of hierarchical clustering algorithm
    Figure 5. Structure of neural network
    Figure 6. Functional flow for the operation of FODM methodology
    Figure 7. User interface of Protégé
    Figure 8. An example of training data
    Figure 9. Tagging process of a sentence
    Figure 10. A chunked example
    Figure 11. Process of filtering statement in ontology
    Figure 12. Ontological comparison of two documents
    Figure 13. Membership function of concept "many"
    Figure 14. Membership function of concept "mediate"
    Figure 15. Membership function of concept "few"
    Figure 16. Inference process
    Figure 17. Rules and concepts of fuzzy inference model
    Figure 18. Membership function of concept "high"
    Figure 19. Membership function of concept "mediate"
    Figure 20. Membership function of concept "low"
    Figure 21. Ontology of patent infringement, trade and application
    Figure 22. Ontology of CMP
    Figure 23. An ontological translation in business news
    Figure 24. An ontological translation in CMP patent
    Figure 25. Comparison of the results between two clustering methodologies

    List of Tables
    Table 1. RDF concepts
    Table 2. Meaning of the tags
    Table 3. Probability of lemma to concept
    Table 4. An example of terminological analyzing
    Table 5. Rules of fuzzy logic control for patent document analysis
    Table 6. Profile of knowledge documents
    Table 7. Terminology of business news (1)
    Table 8. Terminology of business news (2)
    Table 9. Terminology of CMP patent (1)
    Table 10. Terminology of CMP patent (2)
    Table 11. Terminology of CMP patent (3)
    Table 12. Clustering result of business news
    Table 13. Clustering result of CMP patent
    Table 14. K-means clustering result of CMP patent based on key phrases
    Table 15. Relevant and retrieved sets
    Table 16. Evaluation of knowledge extraction
    Table 17. Precision and recall comparison between this research and TF*IDF
    Table 18. Comparison between fuzzy logic control clustering based on ontology and K-means clustering based on key phrases


    Full text: not authorized for public release (on-campus or off-campus network).
