Graduate student: | 林家誼 Simon C.I. Lin |
---|---|
Thesis title: | 應用類神經網路文件自動分類技術建構電子化知識文件管理系統 Using Neural Network Categorization Technology to Develop an Electronic Document Management System |
Advisor: | 張瑞芬 Amy J.C. Trappey |
Oral defense committee: | |
Degree: | Master |
Department: | College of Engineering, Department of Industrial Engineering and Engineering Management |
Year of publication: | 2004 |
Academic year of graduation: | 92 (ROC calendar) |
Language: | Chinese |
Number of pages: | 78 |
Chinese keywords: | 知識管理 (knowledge management), 文件管理 (document management), 電子化文件 (electronic documents), 自動分類 (automatic classification), 類神經網路 (neural network), 後向傳導網路 (back-propagation network) |
English keywords: | document management, document categorization, document classification, neural network, back-propagation network, patent document |
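This thesis focuses on the classification and management of electronic knowledge documents. It applies neural network technology to build an automatic document classification system with self-learning capability, helping users classify and manage large volumes of electronic knowledge documents. First, text processing is used to extract key terms from each document, and the terms with high information content are selected as the document's key terms. Next, the occurrence frequency of each key term is computed, and a correlation matrix between key terms is derived; this matrix is used to merge related key terms. The result is a key-term frequency vector that adequately represents the content of the document. The thesis then uses the well-established back-propagation network (BPN) as the basis of the classification algorithm; through the network's learning mechanism, the output of automatic classification approaches the classification a human would produce. A hierarchical set of document categories is built from the International Patent Classification (IPC) system, and classification tests are carried out on patent documents related to power hand tools, achieving a considerable level of classification accuracy. Finally, the trained network is used to develop a document search function, in which the network's computed results help users narrow the scope of a document search. The results show that the document classification and search functions developed in this thesis bring significant benefits to the classification, management, and search of large document collections.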
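The feature-construction steps described above (term frequencies, a term-to-term correlation matrix, and merging of related terms) can be sketched roughly as follows. This is a minimal illustration rather than the thesis's actual method: the cosine similarity measure, the 0.95 threshold, the greedy grouping rule, and the toy documents and vocabulary are all assumptions introduced for the example.

```python
from collections import Counter
from math import sqrt


def term_frequencies(docs, vocabulary):
    """Count how often each vocabulary term occurs in each tokenized document."""
    rows = []
    for tokens in docs:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return rows


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_correlated_terms(freq_rows, vocabulary, threshold=0.95):
    """Greedily group terms whose per-document frequency profiles are similar
    (assumed measure: cosine), then sum each group's frequencies into one feature.
    A term is compared only against the first member of each existing group."""
    columns = [list(col) for col in zip(*freq_rows)]
    groups = []  # each group is a list of term indices
    for j in range(len(vocabulary)):
        for group in groups:
            if cosine(columns[group[0]], columns[j]) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])  # no similar group found: start a new one
    merged_vocab = ["+".join(vocabulary[j] for j in g) for g in groups]
    merged_rows = [[sum(row[j] for j in g) for g in groups] for row in freq_rows]
    return merged_vocab, merged_rows


# Hypothetical tokenized documents and key terms, for illustration only.
docs = [
    ["drill", "chuck", "motor", "drill", "chuck"],
    ["drill", "chuck", "battery"],
    ["saw", "blade", "motor", "saw", "blade"],
]
vocabulary = ["drill", "chuck", "saw", "blade", "motor"]

rows = term_frequencies(docs, vocabulary)
merged_vocab, merged_rows = merge_correlated_terms(rows, vocabulary)
# merged_vocab == ["drill+chuck", "saw+blade", "motor"]
# merged_rows  == [[4, 0, 1], [2, 0, 0], [0, 4, 1]]
```

Each row of `merged_rows` plays the role of the key-term frequency vector for one document. Merging terms that always occur together shortens the vector without discarding their counts.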
To process a huge number of electronic documents in an organized manner, automatic document categorization has become an important research area within explicit knowledge management. In this thesis, we propose a new document classification methodology based on neural network technology. We first extract key phrases from the document set by means of text processing and determine the significance of each key phrase by its frequency. After the significant terms are extracted, a keyword correlation analysis model is applied to compute the similarity between terms, and highly similar terms, such as synonyms, are merged. We adopt the back-propagation network model as the classifier. Its target output identifies a document's proper category within a hierarchical document classification scheme. In this research, patents related to the design of power hand tools are studied under the International Patent Classification (IPC) scheme. Hand tool patents can then be classified automatically, with a considerable level of accuracy, using the pre-trained neural network models. The prototype system provides two modules for explicit knowledge management: the automatic classification module helps the user classify patent documents, and the search module helps the user locate the relevant patent document quickly. The results show a significant improvement in document classification and identification for explicit knowledge management.
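As a rough illustration of the back-propagation classifier described above, the sketch below trains a small one-hidden-layer network on term-frequency vectors. The network size, learning rate, epoch count, squared-error objective, and the two unnamed categories (indices 0 and 1) are assumptions made for this example; the abstract does not state the thesis's actual architecture or training settings.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BackPropNetwork:
    """One-hidden-layer feed-forward network trained by back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one unit; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(ws, x + [1.0])))
                  for ws in self.w_hidden]
        output = [sigmoid(sum(w * v for w, v in zip(ws, hidden + [1.0])))
                  for ws in self.w_out]
        return hidden, output

    def train_step(self, x, target, lr):
        hidden, output = self.forward(x)
        # Output deltas: error times the sigmoid derivative.
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
        # Hidden deltas: output deltas propagated back through w_out.
        d_hidden = [h * (1.0 - h) * sum(d * ws[j] for d, ws in zip(d_out, self.w_out))
                    for j, h in enumerate(hidden)]
        for ws, d in zip(self.w_out, d_out):
            for j, v in enumerate(hidden + [1.0]):
                ws[j] += lr * d * v
        for ws, d in zip(self.w_hidden, d_hidden):
            for j, v in enumerate(x + [1.0]):
                ws[j] += lr * d * v

    def predict(self, x):
        """Return the index of the category with the largest output."""
        _, output = self.forward(x)
        return output.index(max(output))


def mean_squared_error(net, samples):
    total = 0.0
    for x, target in samples:
        _, output = net.forward(x)
        total += sum((t - o) ** 2 for t, o in zip(target, output))
    return total / len(samples)


# Hypothetical scaled term-frequency vectors with one-hot category targets.
samples = [
    ([1.0, 0.0, 0.25], [1.0, 0.0]),
    ([1.0, 0.0, 0.0], [1.0, 0.0]),
    ([0.0, 1.0, 0.25], [0.0, 1.0]),
    ([0.0, 1.0, 0.0], [0.0, 1.0]),
]

net = BackPropNetwork(n_in=3, n_hidden=4, n_out=2)
error_before = mean_squared_error(net, samples)
for _ in range(2000):
    for x, target in samples:
        net.train_step(x, target, lr=0.5)
error_after = mean_squared_error(net, samples)
predictions = [net.predict(x) for x, _ in samples]
```

For a hierarchical scheme such as the IPC, one plausible arrangement is a separate network per level of the hierarchy, with each level's prediction selecting the network used at the next level. The same output values could also be ranked to suggest the most likely categories when narrowing a search. Both are design assumptions, not details given in the abstract.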