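Graduate student: | 林育慧 Yu-Hui Lin |
---|---|
Thesis title: | 詞彙週期的分析與預測 Analyzing and Predicting the Occurrence of Terms |
Advisor: | 陳宜欣 Yi-Shin Chen |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2007 |
Academic year of graduation: | 95 |
Language: | English |
Number of pages: | 45 |
Keywords (Chinese): | TF-IDF, 傅立葉轉換 (Fourier transform), 傅立葉頻譜 (Fourier spectrum) |
Keywords (English): | TF-IDF, Fourier Transform, Fourier spectrum |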
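With the rapid growth of the Internet and the explosion in the volume of information, efficiently extracting information that satisfies users' needs has become an important problem. Because some information is time-sensitive, analyzing and exploiting its temporal characteristics should make the extraction of relevant information more efficient. Keywords extracted to represent an article are commonly used to build an index that supports later search. Here we likewise analyze the temporal behavior of keywords to find the period with which a term is likely to occur, and use that period for prediction.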
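We measure the weight of the same term at different points in time and propose a "two stage dichotomy". First, the Fourier spectrum is used to classify terms by how frequently they occur. Then, for terms that may exhibit periodicity, the period is identified from the term's temporal features and from the features of the Fourier spectrum obtained through the Fourier transform, so that prediction can be carried out.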
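As a rough illustration of how a Fourier spectrum can expose a term's period, the following minimal sketch computes the magnitude spectrum of a weight series and reads the period off the strongest peak. It is not the thesis's actual procedure: the function names `fourier_spectrum` and `dominant_period`, the mean removal, the flatness threshold, and the synthetic input are all assumptions made for this example.

```python
import cmath
import math


def fourier_spectrum(weights):
    """Return the magnitudes of DFT coefficients 0..n//2 of a real series."""
    n = len(weights)
    mean = sum(weights) / n
    centered = [w - mean for w in weights]  # remove the constant offset
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff))
    return spectrum


def dominant_period(weights):
    """Estimate the period (in time slots) from the strongest spectral peak."""
    if len(weights) < 2:
        return None
    spectrum = fourier_spectrum(weights)
    # Skip k = 0, which is (near) zero after mean removal.
    k_best = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    if spectrum[k_best] < 1e-9:  # assumed threshold for a flat series
        return None
    return len(weights) / k_best


# Hypothetical weights: a term whose weight rises and falls every 7 slots.
weekly = [1.0 + math.cos(2 * math.pi * t / 7) for t in range(28)]
print(dominant_period(weekly))
```

A series with no variation has no spectral peak, so `dominant_period` returns `None` for it. Real term weights are noisier than this synthetic series, which is presumably why the abstract combines the spectrum with the term's temporal features rather than relying on the peak alone.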
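The experimental results show that our method is both feasible and effective. The term-analysis method proposed in this thesis can indeed classify terms by how frequently they occur. Moreover, although periodicity cannot be observed directly from the analyzed term weights, combining the Fourier spectrum with the temporal features of the term itself does help find the period.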
This thesis proposes a method for analyzing keywords. It measures the weight of each keyword in different time slots and classifies the keywords according to features of their occurrence and of the Fourier spectrum, which transforms a keyword's weights across time slots into the frequency domain. In addition, we propose a new method that combines full and partial periodicity analysis to detect the periodicity of a keyword and predict its occurrence.
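The abstract does not give the weighting formula, but the keyword list names TF-IDF. The sketch below shows one plausible way to weight a keyword per time slot, treating each slot as a single "document". The function name `slot_weights`, the token-list input layout, and the specific TF and IDF definitions are assumptions made for this example, not the thesis's definitions.

```python
import math


def slot_weights(slots, term):
    """TF-IDF-style weight of `term` in each time slot.

    `slots` is a list of token lists, one per time slot (assumed layout).
    """
    n_slots = len(slots)
    # Number of slots in which the term occurs at least once.
    slots_with_term = sum(1 for tokens in slots if term in tokens)
    if slots_with_term == 0:
        return [0.0] * n_slots
    idf = math.log(n_slots / slots_with_term)
    weights = []
    for tokens in slots:
        # Term frequency: share of the slot's tokens that are `term`.
        tf = tokens.count(term) / len(tokens) if tokens else 0.0
        weights.append(tf * idf)
    return weights


# Hypothetical toy corpus: four time slots of already-tokenized text.
slots = [
    ["election", "vote", "poll"],
    ["weather", "rain"],
    ["election", "election", "result", "vote"],
    ["sports", "score"],
]
print(slot_weights(slots, "election"))
```

With this definition, a term that appears in every slot has an IDF of zero and therefore a weight of zero everywhere, so only terms whose presence varies over time produce a usable weight series.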
Few researchers have analyzed keywords over time, applied the Fourier Transform (FT) to the result to characterize the occurrence of a keyword, and then analyzed the features of the resulting Fourier spectrum to detect the keyword's periodicity. The experimental results show that our approach performs well in analyzing keywords and predicting their periodicity. Combining full and partial periodicity analysis improves the accuracy of prediction.
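The abstract does not say how full and partial periodicity analysis are combined. As a hedged illustration of the distinction only: in the periodic-pattern mining literature, a pattern is partially periodic when just some positions within the period repeat reliably, and fully periodic when every position does. The sketch below measures, for an assumed candidate period, how reliably the term occurs at each offset. The names `offset_confidence` and `partial_pattern`, the 0.75 threshold, and the input flags are hypothetical.

```python
def offset_confidence(occurs, period):
    """Fraction of complete cycles in which the term occurs, per offset."""
    cycles = len(occurs) // period
    if cycles == 0:
        return []
    return [
        sum(occurs[c * period + offset] for c in range(cycles)) / cycles
        for offset in range(period)
    ]


def partial_pattern(occurs, period, min_conf=0.75):
    """Offsets where the term recurs in at least `min_conf` of the cycles."""
    conf = offset_confidence(occurs, period)
    return [offset for offset, c in enumerate(conf) if c >= min_conf]


# Hypothetical occurrence flags (1 = term present) over 12 slots.
occurs = [1, 0, 0, 1,
          1, 0, 1, 0,
          1, 0, 0, 1]
print(offset_confidence(occurs, 4))
print(partial_pattern(occurs, 4))
```

In this toy series only offset 0 recurs in every cycle of length 4, so the term is partially rather than fully periodic at that period. A predictor built on this idea would expect the term again at slots 12, 16, and so on.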