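Multi-Level Classification Approaches for Enterprise Knowledge and Potential Clients | National Tsing Hua University Theses and Dissertations Repository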

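Graduate student:	林峰興
Thesis title:	多層級知識/使用者分類模式與技術建構 (Multi-Level Classification Approaches for Enterprise Knowledge and Potential Clients)
Advisor:	侯建良
Degree:	Master
Department:	College of Engineering, Department of Industrial Engineering and Engineering Management
Year of publication:	2004
Academic year of graduation:	92 (ROC calendar)
Language:	Chinese
Number of pages:	160
Keywords (translated from Chinese):	multi-level classification rules, association analysis, document classification, user category determination, knowledge management, text mining
Keywords (English):	Document Classification, User Classification, Association Analysis, Knowledge Management, Text Mining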

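Thanks to the rapid development of computer and information technology and the spread of the Internet, people now enjoy a more immediate and convenient environment for transmission, communication, and transactions. In this fast-moving information environment, electronic documents and electronic transaction records grow and circulate at a geometric rate, turning the Internet into an enormous database. Faced with the staggering volume of data in this database, two questions have become important topics in current research and practice: how to use automatic document classification to help organizations and individuals manage electronic documents, and how to use user preference determination to deliver suitable information and services to target customers.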
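Because the content of electronic documents and the online behavior of web users are diverse and complex, judging document types and user preferences manually is slow, uneconomical, and hard to keep consistent. This thesis therefore proposes a model and techniques for intelligently determining document types and user preference types. The model's inference covers three topics: association inference between categories and document keywords, document type determination, and user preference category determination. The basic idea is to take the known documents in an enterprise document repository together with their document types, and to apply keyword extraction rules to obtain the keyword frequencies of the web pages or documents that users have browsed. The keyword frequencies are then used to infer the association between categories and document keywords, yielding the membership relationship between each keyword and each document type. For document category determination and user preference category determination, the records of documents browsed by a user and the corresponding keywords are first obtained and matched against the keyword-type membership relationships to derive a membership coefficient for each document. Finally, the document and user types are inferred from the accumulated membership coefficients.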
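The inference steps described above can be illustrated with a minimal sketch. It is not the thesis's actual algorithm, which the abstract does not spell out: the membership measure (the share of a keyword's occurrences that fall in each category), the whitespace tokenizer, and all function names are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def keyword_frequencies(text):
    """Count keyword occurrences. Splitting on whitespace is a stand-in for
    the thesis's keyword extraction rules, which the abstract does not give."""
    return Counter(text.lower().split())


def learn_memberships(labeled_docs):
    """Infer keyword-to-category membership from documents of known category.

    Assumed measure: the share of a keyword's total occurrences that fall in
    each category.
    """
    per_keyword = defaultdict(Counter)
    for text, category in labeled_docs:
        for keyword, count in keyword_frequencies(text).items():
            per_keyword[keyword][category] += count
    memberships = {}
    for keyword, by_category in per_keyword.items():
        total = sum(by_category.values())
        memberships[keyword] = {c: n / total for c, n in by_category.items()}
    return memberships


def membership_scores(text, memberships):
    """Accumulate frequency-weighted membership coefficients for one document."""
    scores = Counter()
    for keyword, count in keyword_frequencies(text).items():
        for category, degree in memberships.get(keyword, {}).items():
            scores[category] += count * degree
    return scores


def classify_document(text, memberships):
    """Return the category with the highest accumulated coefficient, or None
    when no keyword of the document is known."""
    scores = membership_scores(text, memberships)
    return max(scores, key=scores.get) if scores else None


def classify_user(browsed_texts, memberships):
    """Sum document-level coefficients over a user's browsing record and
    return the category with the highest total, or None."""
    total = Counter()
    for text in browsed_texts:
        total.update(membership_scores(text, memberships))
    return max(total, key=total.get) if total else None
```

In this sketch a user is classified with the same membership table as a document; the only difference is that coefficients are accumulated across every document in the browsing record before the top category is chosen.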
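Traditional automatic classification methods mostly focus on improving accuracy and efficiency, but overlook the fact that real classification problems are carried out repeatedly, level by level. To address the shortcomings of general classification methods, which ignore classification depth and lack flexibility, this thesis proposes a multi-level automatic classification methodology that better fits the needs of industrial applications. Overall, the proposed automatic classification model can infer users' preference trends from their reading records, giving enterprise decision makers a reference for choosing targets for proactive marketing. It can also be applied in an organization's knowledge management system for document classification and access-rights management, making knowledge access and management in the enterprise more effective.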
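One simple way to picture level-by-level classification is a top-down descent through a category tree. The sketch below is a generic illustration only: the abstract does not specify the thesis's multi-level rules, so the tree layout, the keyword weights, the `threshold` parameter, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical two-level category tree with made-up keyword weights.
EXAMPLE_TREE = {
    "children": {
        "hardware": {
            "keywords": {"chip": 1.0, "circuit": 1.0},
            "children": {
                "processor": {"keywords": {"cpu": 1.0, "core": 0.5}, "children": {}},
                "memory": {"keywords": {"dram": 1.0, "cache": 0.5}, "children": {}},
            },
        },
        "software": {
            "keywords": {"compiler": 1.0, "driver": 1.0},
            "children": {},
        },
    }
}


def score(text, keyword_weights):
    """Sum keyword weights over the words of a document (assumed scoring rule)."""
    counts = Counter(text.lower().split())
    return sum(counts[k] * w for k, w in keyword_weights.items())


def classify_multilevel(text, node, threshold=1.0):
    """Descend the category tree, choosing the best-scoring child at each level.

    The descent stops when no child reaches `threshold`, so a document can
    remain at an intermediate level. Returns the list of category names chosen.
    """
    path = []
    while node.get("children"):
        best_name, best_child = max(
            node["children"].items(),
            key=lambda item: score(text, item[1]["keywords"]),
        )
        if score(text, best_child["keywords"]) < threshold:
            break
        path.append(best_name)
        node = best_child
    return path
```

Stopping at an intermediate level when no child scores high enough is one way to allow variable classification depth, which is the kind of flexibility the paragraph above says single-level methods lack.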

Owing to the rapid development of information technologies and the popularity of WWW applications, users now have a more efficient and convenient environment in which to communicate and exchange information. In cyberspace, the great number of digital documents and transaction records makes the Internet a huge knowledge repository. To manage these documents efficiently and to identify target knowledge users, modern enterprises require automatic document and user classification mechanisms in order to provide effective knowledge services.
Automatic classification mechanisms have gradually been developed to reduce the human effort devoted to document and user classification. Given the wide variety of document contents and user behaviors on the Internet, it is not appropriate for modern organizations to determine document and user characteristics by human judgment alone. This thesis develops an approach that automatically and consistently determines document and user categories from document keywords and browsing history.
Previous work has considered only single-level classification of documents and users. However, single-level classification mechanisms cannot meet the operational requirements of organizations. Because enterprise processes, products, and services are complex, this thesis explores automatic multi-level classification methodologies for enterprise documents and users in order to realize intelligent document and knowledge management. To evaluate the feasibility and effectiveness of the proposed methodologies, a web-based prototype system is developed and a demonstration case is provided. The decision support model and the associated technology aim to provide enterprises with an effective classification approach that can be applied in CRM systems for efficient relationship marketing or in KM systems for effective document security management.

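Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables

Chapter 1. Research Background
1 Research Motivation and Objectives
2 Research Procedure
3 Research Positioning
Chapter 2. Literature Review
1 Web Mining
1.1 Web Content Mining
1.2 Web Structure Mining
1.3 Web Usage Mining
2 Association Analysis
2.1 Association
2.2 Sequentiality
2.3 Interestingness
3 Document Classification
3.1 Document Feature Selection
3.2 Document Classification Techniques
4 User Preference Type Determination
4.1 Preference Type Determination Methods
4.2 Personalization
Chapter 3. Document and User Category Determination Model
1 Single-Level Automatic Classification Rules
1.1 Single-Level Category-Keyword Association Inference
1.2 Single-Level Document Classification
1.3 Single-Level User Preference Category Determination
2 Multi-Level Automatic Classification Rules
2.1 Multi-Level Category-Keyword Association Inference
2.2 Multi-Level Document Classification
2.3 Multi-Level User Preference Category Determination
Chapter 4. System Architecture and Planning
1 Process Framework of the Document and User Category Determination Model
2 System Functional Architecture
3 Data Model Definition
4 System Flow
4.1 System Operation Flow
4.2 System Data Flow
5 System Development Tools
Chapter 5. System Development and Case Analysis
1 System Function Operation
1.1 Document Parsing
1.2 Document Sharing
1.3 Document Data Maintenance
1.4 Document Category Management
1.5 Vocabulary Maintenance
1.6 Personnel Management Functions
1.7 System Parameter/Threshold Settings
2 Single-Level Automatic Classification: Case Verification and Analysis
3 Multi-Level Automatic Classification: Case Verification and Analysis
3.1 Case Verification Procedure
3.2 Analysis of Case Verification Results
3.3 System Learning Trend
Chapter 6. Conclusions and Future Work

References
Appendix 1. VSIA SIP Classification
Appendix 2. Multi-Level Automatic Classification Case Verification Results (Stages 2 through 8)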

                                


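Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)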
