Graduate student: | 陳建佑 |
---|---|
Thesis title: | 以圖表為基礎之知識單元擷取技術 A Knowledge Component Extraction Technology Based on the Figures and Tables |
Advisor: | 侯建良 |
Oral defense committee: | |
Degree: | Master |
Department: | College of Engineering - Department of Industrial Engineering and Engineering Management |
Year of publication: | 2008 |
Academic year of graduation: | 96 (ROC calendar) |
Language: | Chinese |
Number of pages: | 175 |
Chinese keywords: | 知識單元擷取, 知識元件化, 知識管理 |
English keywords: | Knowledge Component Extraction, Component-based Knowledge, Knowledge Management |
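As knowledge documents grow more diverse and knowledge accumulates rapidly in every domain, a single document may contain several topical passages and concepts from different domains, which makes it difficult to extract knowledge on one specific topic. Traditional knowledge extraction models, however, can only return whole documents as knowledge units, so knowledge seekers must read a large amount of irrelevant topical material before they obtain the knowledge they need. If documents are instead divided, following the concept of "component-based knowledge", into knowledge units based on topical passages, knowledge seekers can search for and extract specific domain knowledge quickly and accurately. Because figures and tables usually carry the essence of a knowledge document, and the key content of each topic tends to appear in the paragraphs surrounding them, this research proposes a methodology for automatically extracting the topical knowledge of figures and tables from free-form knowledge documents.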
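The methodology proceeds as follows. First, keywords for each figure or table are extracted on the basis of a domain lexicon. Second, the content of the target document is segmented into sentences, which serve as the basis for extracting the passages that describe the figure or table. These descriptive passages are then extracted by two methods, the "keyword matching method" and the "start/end sentence matching method". The keyword matching method computes how often the figure or table keywords occur in each sentence of the document and extracts the descriptive passage on the basis of those frequencies. The start/end sentence matching method first surveys knowledge documents to identify the semantic structure typical of the opening and closing sentences of figure and table descriptions, then compares these structural characteristics against the document content to extract the passage whose opening and closing sentences match them. The extracted descriptive passage, combined with the figure or table itself, constitutes the topical knowledge corresponding to that figure or table.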
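The keyword-based steps can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the punctuation-based sentence splitter, the substring matching, and the `min_hits` threshold are all assumptions made for illustration, and the sample lexicon and text are invented.

```python
import re

# Assumed sentence delimiters (Chinese and Western); the thesis does not
# specify its segmentation rules.
SENTENCE = re.compile(r"[^。!?.!?]+[。!?.!?]?")


def split_sentences(text):
    """Split text into sentences at sentence-ending punctuation."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def extract_keywords(caption, lexicon):
    """Return the lexicon terms that occur in the figure/table caption."""
    lowered = caption.lower()
    return [term for term in lexicon if term.lower() in lowered]


def keyword_hits(sentence, keywords):
    """Count how many times the keywords occur in one sentence."""
    lowered = sentence.lower()
    return sum(lowered.count(k.lower()) for k in keywords)


def select_description(sentences, keywords, min_hits=1):
    """Keep sentences whose keyword frequency reaches an assumed threshold."""
    return [s for s in sentences if keyword_hits(s, keywords) >= min_hits]


# Invented sample data for illustration only.
lexicon = ["logistics cost", "warehousing", "transport"]
caption = "Figure 1: Logistics cost trend"
text = (
    "Logistics cost has risen every year. "
    "The number of warehousing firms grew. "
    "Figure 1 shows logistics cost as a share of revenue."
)

keywords = extract_keywords(caption, lexicon)
sentences = split_sentences(text)
description = select_description(sentences, keywords)
```

With this sample data, only the first and third sentences mention the caption keyword, so they form the candidate description. A real system would need a proper Chinese word segmenter and a tuned threshold.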
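Based on this methodology, the research builds a system for extracting the topical knowledge of figures and tables and validates it using the Taiwan Logistics Yearbook as a case study, in order to confirm the accuracy and feasibility of the methodology. The validation results show that importing training data strengthens the system's inference capability and brings its inference performance to a good level. Overall, the proposed knowledge component extraction technology improves the efficiency with which knowledge seekers search for and extract specific topical knowledge from knowledge documents, making the knowledge contained in those documents easier to find and apply.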
With the growing complexity of document contents and the significant increase in domain knowledge, it is difficult for knowledge receivers to understand specific domain knowledge. Traditional knowledge extraction schemes usually provide complete documents to knowledge receivers, who then need considerable time to acquire the domain knowledge they are looking for. The concept of component-based knowledge is to divide documents into several knowledge components, each corresponding to a more specific domain, which reduces the time knowledge receivers need to search for specific domain knowledge. Moreover, since the figures and tables in a document usually contain the important implicit knowledge expressed within the document, the aim of this research is to extract knowledge components from documents (e.g., industry yearbooks) on the basis of figures and tables.
In this research, a Knowledge Component Extraction (KCE) model with two algorithms, namely the Keyword Mapping Algorithm (KMA) and the Sentence Mapping Algorithm (SMA), is developed. To demonstrate the applicability of the proposed methodology, a web-based knowledge component extraction system is also established based on the proposed model. Furthermore, the Taiwan Logistics Yearbooks are used as examples to evaluate the proposed model. The verification results show that the developed system extracts knowledge components with good performance. As a whole, this research provides an approach for knowledge receivers to acquire domain knowledge efficiently and accurately.
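The Sentence Mapping Algorithm appears to correspond to matching document sentences against the structural characteristics of the opening and closing sentences of figure and table descriptions. A rough sketch of that idea follows, assuming those characteristics can be approximated by regular-expression cue patterns. The patterns, the function names, and the fallback rule are hypothetical and are not taken from the thesis.

```python
import re

# Hypothetical cue patterns for sentences that open a figure/table description.
START_PATTERNS = [
    re.compile(r"\b(as shown in|see)\s+(figure|table)\s+\d+", re.I),
    re.compile(r"^(figure|table)\s+\d+\s+(shows|presents|lists)", re.I),
]
# Hypothetical cue patterns for sentences that close a description.
END_PATTERNS = [
    re.compile(r"^(overall|in summary|in short)\b", re.I),
]


def matches_any(sentence, patterns):
    """True if the sentence matches at least one cue pattern."""
    return any(p.search(sentence) for p in patterns)


def extract_span(sentences, start_patterns=START_PATTERNS,
                 end_patterns=END_PATTERNS):
    """Return sentences from the first start-like sentence through the
    next end-like sentence (inclusive)."""
    start = next(
        (i for i, s in enumerate(sentences) if matches_any(s, start_patterns)),
        None,
    )
    if start is None:
        return []
    for j in range(start, len(sentences)):
        if matches_any(sentences[j], end_patterns):
            return sentences[start:j + 1]
    # Assumed fallback: with no closing cue, keep only the opening sentence.
    return sentences[start:start + 1]


# Invented sample sentences for illustration only.
sample = [
    "The sector grew quickly.",
    "Figure 2 shows revenue by region.",
    "The north leads in volume.",
    "Overall, regional gaps are narrowing.",
    "The next section covers policy.",
]
span = extract_span(sample)
```

Here `span` covers the three sentences from "Figure 2 shows revenue by region." through "Overall, regional gaps are narrowing." In practice the cue patterns would be derived from a training corpus, which fits the abstract's remark that imported training data improves the system's inference.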