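以知識表徵為基之文件分群法 (An Ontology-based Document Clustering Methodology) | National Tsing Hua University Master's and Doctoral Theses Repository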


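Graduate student:	張立典 Li-Tien Chang
Thesis title:	以知識表徵為基之文件分群法 An Ontology-based Document Clustering Methodology
Advisor:	張瑞芬 Dr. Amy J.C. Trappey
Oral defense committee:
Degree:	Master
Department:	College of Engineering - Department of Industrial Engineering and Engineering Management
Year of publication:	2005
Academic year of graduation:	93 (ROC calendar)
Language:	English
Pages:	79
Chinese keywords:	知識表徵 (ontology), 文件分群 (document clustering), 模糊推論 (fuzzy inference)
English keywords:	Ontology, Fuzzy inference control, Document clustering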

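This thesis proposes a methodology for analyzing and clustering knowledge documents. Many existing methods for analyzing knowledge documents are built on keywords, but keywords are fragmentary and carry little meaning, whether for people or for computers. We therefore propose an ontology-based method for analyzing knowledge documents, with the aim of letting computers understand document content to a greater degree. The method consists of several major steps. First, experts build the ontology for a specific domain and supply training data to train the system's terminology. Once training is complete, knowledge documents can be clustered. In the clustering stage, each knowledge document first undergoes natural language processing, and the trained terminology is then used to identify the ontology that represents the document. Next, fuzzy inference is applied to this ontology to infer relationship values between knowledge documents, and finally a hierarchical clustering method groups the documents. At the end of the study, we evaluate the effectiveness of the method and compare and discuss it against the keyword-based approach.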
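
The final clustering step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the average-linkage rule, the stopping threshold, the function names, and the relationship values in `REL` are all assumptions made for the example.

```python
from itertools import combinations


def average_linkage(a, b, rel):
    """Mean relationship value over all cross-cluster document pairs."""
    return sum(rel[i][j] for i in a for j in b) / (len(a) * len(b))


def hierarchical_clusters(rel, threshold):
    """Merge the two most related clusters until no pair reaches `threshold`.

    rel: symmetric matrix of relationship values in [0, 1] (hypothetical input).
    Returns a sorted list of clusters, each a sorted list of document indices.
    """
    clusters = [[i] for i in range(len(rel))]
    while len(clusters) > 1:
        # Find the pair of clusters with the highest average relationship.
        (x, y), best = max(
            (((a, b), average_linkage(clusters[a], clusters[b], rel))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # remaining clusters are too weakly related to merge
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Illustrative relationship values for four documents (made-up numbers).
REL = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

print(hierarchical_clusters(REL, threshold=0.5))
```

Lowering the threshold merges more clusters, which corresponds to cutting the hierarchy closer to its root. Raising it leaves more documents in clusters of their own.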

The purpose of this thesis is to present a novel method for analyzing, synthesizing, and managing knowledge documents. Most existing methodologies for synthesizing and managing patents use key phrases as indices of knowledge documents, but key phrases extracted from patents carry little meaning for computers. This thesis therefore develops an ontology-based methodology for analyzing and managing knowledge documents, which enables computers to understand the documents to some degree through ontology rather than key phrases. The methodology is divided into several steps. First, experts construct the ontology schema for a specific domain and supply training data to train the system. Then a method for learning from natural language texts is adapted to infer the principal ontology of each knowledge document. Next, fuzzy logic control (FLC) is applied to the ontology to infer the relationship between knowledge documents and a suitable document cluster. Finally, we evaluate the effectiveness of the methodology and compare it with knowledge document clustering based on key phrases.
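
To make the fuzzy inference step more concrete, here is a small sketch of how a relationship value between two documents might be inferred. The concept names "few", "mediate", "many" and "low", "mediate", "high" come from the thesis's list of figures. Everything else is an assumption for illustration: the two input variables (`concept_overlap` and `statement_overlap`), the triangular membership shapes, the rule table, and the weighted-average defuzzification are hypothetical and do not reproduce the rule base in the thesis.

```python
def triangle(x, left, peak, right):
    """Triangular membership function; returns a degree in [0, 1]."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed input fuzzy sets over an overlap ratio in [0, 1].
INPUT_SETS = {
    "few": (0.0, 0.0, 0.5),
    "mediate": (0.0, 0.5, 1.0),
    "many": (0.5, 1.0, 1.0),
}

# Assumed representative values of the output fuzzy sets.
OUTPUT_CENTERS = {"low": 0.0, "mediate": 0.5, "high": 1.0}

# Assumed rule table: (concept overlap, statement overlap) -> relationship.
RULES = {
    ("few", "few"): "low",
    ("few", "mediate"): "low",
    ("few", "many"): "mediate",
    ("mediate", "few"): "low",
    ("mediate", "mediate"): "mediate",
    ("mediate", "many"): "high",
    ("many", "few"): "mediate",
    ("many", "mediate"): "high",
    ("many", "many"): "high",
}


def relationship_value(concept_overlap, statement_overlap):
    """Infer a relationship value in [0, 1] from two overlap ratios."""
    # Clamp inputs so at least one rule always fires.
    c = min(max(concept_overlap, 0.0), 1.0)
    s = min(max(statement_overlap, 0.0), 1.0)
    numerator = denominator = 0.0
    for (c_set, s_set), label in RULES.items():
        # Rule strength: fuzzy AND (minimum) of the two antecedents.
        strength = min(triangle(c, *INPUT_SETS[c_set]),
                       triangle(s, *INPUT_SETS[s_set]))
        numerator += strength * OUTPUT_CENTERS[label]
        denominator += strength
    # Weighted-average defuzzification over the fired rules.
    return numerator / denominator


print(relationship_value(0.75, 0.75))
```

Values computed this way for every pair of documents could fill the relationship matrix that a clustering step consumes. The weighted average is used here only because it keeps the sketch short.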

Table of Contents
Chinese Abstract    I
Abstract    II
List of Figures    V
List of Tables    VI
1. Introduction    1
1.1 Background    1
1.2 Motivation    2
1.3 Thesis objective    4
2. Literature Review    5
2.1 Text mining    5
2.2 Ontology    7
2.3 Fuzzy logic control    10
2.4 Clustering methodology    11
3. Methodology Architecture and Functional Detail    14
3.1 Experts construct domain ontology schema    16
3.2 Terminology training    16
3.3 Natural language processing    18
3.4 Terminological analyzer    19
3.5 Knowledge extraction    20
3.6 Relationship generator of knowledge documents    22
3.7 Clustering of knowledge documents    31
4. Results and Evaluations of the Experiment    32
4.1 Data collection    32
4.2 Ontology construction    33
4.3 Result of terminology training    36
4.4 Results of knowledge extraction and clustering    42
4.5 Evaluation    47
5. Conclusions    52
Appendix 1. Schema of Business News    58
Appendix 2. Ontology of Business News in Protégé    59
Appendix 3. Schema of CMP Patent    60
Appendix 4. Ontology of CMP Patent in Protégé    64
Appendix 5. Clustering Result for Documents of Business News    65
Appendix 6. Clustering Result for Documents of CMP Patent    73

List of Figures
Figure 1. Two approaches (keyword-based and ontology-based) in knowledge document management on a computer platform    3
Figure 2. A graphic example of a statement in ontology    8
Figure 3. Architecture of fuzzy logic control    10
Figure 4. Tree structure of hierarchical clustering algorithm    12
Figure 5. Structure of neural network    13
Figure 6. Functional flow for the operation of FODM methodology    15
Figure 7. User interface of Protégé    16
Figure 8. An example of training data    18
Figure 9. Tagging process of a sentence    19
Figure 10. A chunked example    21
Figure 11. Process of filtering statement in ontology    22
Figure 12. Ontological comparison of two documents    24
Figure 13. Membership function of concept “many”    25
Figure 14. Membership function of concept “mediate”    26
Figure 15. Membership function of concept “few”    27
Figure 16. Inference process    28
Figure 17. Rules and concepts of fuzzy inference model    28
Figure 18. Membership function of concept “high”    29
Figure 19. Membership function of concept “mediate”    30
Figure 20. Membership function of concept “low”    31
Figure 21. Ontology of patent infringement, trade and application    34
Figure 22. Ontology of CMP    36
Figure 23. An ontological translation in business news    43
Figure 24. An ontological translation in CMP patent    44
Figure 25. Comparison of the results between two clustering methodologies    51


List of Tables
Table 1. RDF concepts    9
Table 2. Meaning of the tags    17
Table 3. Probability of lemma to concept    18
Table 4. An example of terminological analyzing    20
Table 5. Rules of fuzzy logic control for patent document analysis    23
Table 6. Profile of knowledge documents    33
Table 7. Terminology of business news (1)    37
Table 8. Terminology of business news (2)    38
Table 9. Terminology of CMP patent (1)    39
Table 10. Terminology of CMP patent (2)    40
Table 11. Terminology of CMP patent (3)    41
Table 12. Clustering result of business news    45
Table 13. Clustering result of CMP patent    46
Table 14. K-mean clustering result of CMP patent based on key phrases    47
Table 15. Relevant and retrieved sets    47
Table 16. Evaluation of knowledge extraction    49
Table 17. Precision and recall comparison between this research and TF*IDF    49
Table 18. Comparison between fuzzy logic control clustering based on ontology and K-mean clustering based on key phrases    51

                                


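Full-text release date: full text not authorized for public release (campus network)
Full-text release date: full text not authorized for public release (off-campus network)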
