Author: | 朱寧敏 Chu, Ning Min |
---|---|
Thesis title: | 以文件內容為基礎之多文件脈絡關係分析-以產品相關文件分析為例 Multi-document Context Relationship Analysis - A Case Study of Product Related Documents |
Advisor: | 侯建良 Hou, Jiang Liang |
Oral defense committee: | 吳建瑋, 廖崇碩 |
Degree: | 碩士 Master |
Department: | College of Engineering - Department of Industrial Engineering and Engineering Management |
Year of publication: | 2016 |
Academic year of graduation: | 104 (ROC calendar) |
Language: | Chinese |
Number of pages: | 228 |
Keywords (translated from Chinese): | Document context relationship, document category determination, reading content recommendation |
Keywords (English): | Document Context Relationship, Classification, Reading Recommendation |
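When information seekers search the Internet for documents, search engines usually give priority to documents that are highly relevant to the query or that other users click often. The ranking of matching documents therefore ignores the context relationship between documents, that is, the order in which documents build on one another's content. As a result, information seekers cannot read the documents in a reasonable sequence from basic to advanced, and may spend more time understanding the content or run into difficulties while reading.
To address this problem, this research first collects documents of various types from the Internet through search engines, classifies the collected documents, and extracts characteristics from each document; it then generalizes the distinguishing characteristics of each document category from the extraction results. Based on this analysis, this research develops a "document context relationship analysis" methodology consisting of three stages: "document characteristic extraction," "document category determination," and "document context ranking." The document characteristic extraction stage extracts characteristics from the content of the documents returned by a search engine. The document category determination stage then uses the extracted characteristics, together with the generalized distinguishing characteristics of each category, to determine the category of each target document. Finally, the document context ranking stage orders the documents of each category by reading sequence, from basic to advanced, and presents the ordering visually to show the context relationship between documents, so that readers can conveniently choose which of the retrieved documents to read.
With this approach, information seekers who have retrieved the documents they need can use the proposed method to obtain a reasonable ordering of a large set of documents and read them in sequence from basic to advanced, reducing the time spent understanding the content and resolving difficulties. The method can in turn provide recommended reading content for different audiences, as well as suggested context relationships for the learning process.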
When one searches for documents by keyword over the Internet, the ranking of the related documents is determined by their correlation with the specified keywords and by their click rates. That is, the context relationship between the related documents is not used to determine the ranking. As a result, readers have to spend more time understanding the document contents or face difficulties in understanding them. To solve these problems, this research analyzes a large number of documents and generalizes the relationship between document characteristics and document categories. Based on the analysis results, this research develops a model for context relationship analysis of multiple documents. With the proposed model, the characteristics and categories of documents can be identified using determinant vectors. Finally, the documents can be sorted and their context relationship displayed visually for reading. Overall, the research helps readers obtain a reasonable, visualized ranking of documents and read the documents in an appropriate sequence.
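The three stages named in the abstract (characteristic extraction, category determination, context ranking) can be illustrated with a minimal sketch. This is not the thesis's implementation: the category names, the cue terms and weights in `CATEGORY_PROFILES`, the `READING_ORDER`, and the use of cosine similarity to match a document against a category are all assumptions made for illustration.

```python
from collections import Counter
import math
import re

# Hypothetical category profiles: cue terms with invented weights.
# The thesis generalizes its own distinguishing characteristics from
# collected documents; these values are placeholders only.
CATEGORY_PROFILES = {
    "introduction": {"overview": 2.0, "what": 1.0, "basics": 2.0},
    "specification": {"dimensions": 2.0, "weight": 1.0, "battery": 1.0},
    "review": {"tested": 2.0, "verdict": 2.0, "compared": 1.0},
}

# Hypothetical reading order, from basic to advanced.
READING_ORDER = ["introduction", "specification", "review"]


def extract_features(text):
    """Stage 1: count lowercase word tokens in the document text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(features, profile):
    """Cosine similarity between a term-count vector and a category profile."""
    dot = sum(features[term] * w for term, w in profile.items())
    norm_f = math.sqrt(sum(v * v for v in features.values()))
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    if norm_f == 0 or norm_p == 0:
        return 0.0
    return dot / (norm_f * norm_p)


def classify(text):
    """Stage 2: pick the category whose profile is most similar to the text."""
    features = extract_features(text)
    return max(READING_ORDER,
               key=lambda c: cosine(features, CATEGORY_PROFILES[c]))


def rank_documents(docs):
    """Stage 3: order titles by the reading position of their category.

    docs maps a title to its text; ties are broken alphabetically by title.
    """
    labelled = [(READING_ORDER.index(classify(text)), title)
                for title, text in docs.items()]
    return [title for _, title in sorted(labelled)]


if __name__ == "__main__":
    docs = {
        "Camera X hands-on": "We tested the camera and compared it with rivals; our verdict is positive.",
        "Camera X basics": "An overview of the basics: what the camera does.",
        "Camera X data sheet": "Dimensions, weight and battery capacity of the camera.",
    }
    for title in rank_documents(docs):
        print(title)
```

In this sketch the ordering is driven entirely by the category label, so two documents in the same category are not distinguished by difficulty; a finer ordering within a category would need an additional signal that the abstract does not specify.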