Graduate student: | 童旻浩 Tung, Ming-Hao |
---|---|
Thesis title: | A Fast and Accurate Time Series Clustering Algorithm via Precise Center Selection |
Advisor: | 廖崇碩 Liao, Chung-Shou |
Oral defense committee: | 黃文良 Hwang, Wen-Liang; 韓永楷 Hon, Wing Kai; 呂俊賢 Lu, Chun-Shien |
Degree: | Master |
Department: | College of Engineering - Department of Industrial Engineering and Engineering Management |
Year of publication: | 2017 |
Academic year of graduation: | 105 |
Language: | English |
Number of pages: | 36 |
Keywords (translated from Chinese): | clustering problem, time series, dynamic time warping, center selection |
Keywords (English): | Clustering, Time Series, Dynamic Time Warping, Center Selection |
Time series clustering algorithms have been widely studied in many scientific areas, especially for the traffic flow prediction problem over the past decade. In 2015, a density-based time series clustering algorithm called TADPole was proposed, which outperformed the other approaches. Although TADPole performs well in most test cases, there is still room for improvement in both the precision of its output clusters and the quality of the centers it selects.
In this study, we propose a fast and more accurate time series clustering algorithm, Density Peak via Center Selection (DPCS), which selects centers more precisely so that they retain the features of their clusters. Moreover, we construct the output clusters according to the required properties of each data distribution. Experimental results demonstrate the effectiveness of DPCS for a large number of clusters, even when the clusters have significantly different densities. In particular, we show that DPCS is more accurate than TADPole in both output clusters and cluster centers while maintaining a running time similar to that of TADPole. From a practical perspective, the proposed DPCS algorithm is well suited to applications in time series forecasting.
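The abstract names the ingredients (density peaks, center selection, dynamic time warping) without spelling out the procedure. As background only, the following is a minimal sketch of the generic density-peak clustering idea of Rodriguez and Laio combined with a plain DTW distance. It is not the DPCS algorithm of this thesis and does not include TADPole's pruning. The names `dtw_distance` and `density_peak_clusters`, the `cutoff` parameter, and the convention of always taking the densest series as the first center are assumptions made for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Unconstrained DTW between two numeric sequences, absolute-difference cost."""
    m = len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for x in a:
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(x - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def density_peak_clusters(series, cutoff, k):
    """Illustrative density-peak clustering; returns (center indices, labels)."""
    n = len(series)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])

    # Local density: how many other series lie within the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    # Indices by decreasing density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta[i]: distance to the nearest series ranked denser than i.
    delta = [0.0] * n
    parent = [-1] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j

    # Assumed convention: the densest series is always a center; the other
    # k - 1 centers are the remaining series with the largest rho * delta.
    rest = sorted(order[1:], key=lambda i: (-(rho[i] * delta[i]), i))
    centers = [order[0]] + rest[:k - 1]

    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    # Each remaining series inherits the label of its denser nearest neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels


if __name__ == "__main__":
    data = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0],
            [10, 10, 10, 10], [10, 10, 10, 11], [10, 11, 10, 10]]
    print(density_peak_clusters(data, cutoff=3, k=2))
```

In this generic scheme a center is simply a series with high local density that is far from any denser series, so the quality of the centers depends directly on how density and distance are estimated. That is the step the abstract says DPCS makes more precise.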