
Graduate student: 童旻浩
Tung, Ming-Hao
Thesis title: 快速且精準之時間序列分群演算法-透過精準群心選擇
A Fast and Accurate Time Series Clustering Algorithm via Precise Center Selection
Advisor: 廖崇碩
Liao, Chung-Shou
Committee members: 黃文良
Hwang, Wen-Liang
韓永楷
Hon, Wing Kai
呂俊賢
Lu, Chun-Shien
Degree: Master
Department: College of Engineering - Department of Industrial Engineering and Engineering Management
Year of publication: 2017
Academic year of graduation: 105
Language: English
Number of pages: 36
Keywords (Chinese): clustering problem, time series, dynamic time warping, center selection
Keywords (English): Clustering, Time Series, Dynamic Time Warping, Center Selection
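  • Time series clustering algorithms have been studied extensively in many scientific fields, especially for traffic flow prediction over the past decade. In 2015, a density-based time series clustering algorithm called TADPole was proposed, and it outperforms all other methods. Although TADPole performs well on most test cases, there is still room for improvement in the precision of its output clusters and in the quality of the centers it selects.
    In this study, we propose a fast and accurate time series clustering algorithm, Density Peak via Center Selection (DPCS), which uses more precise center selection so that the centers retain the characteristics of their clusters. In addition, we construct the output clusters according to the required properties of each data distribution. Experimental results show that DPCS is highly effective for large numbers of clusters, even when the clusters have markedly different densities. In particular, we show that DPCS is more accurate than TADPole in both the output clusters and the cluster centers, while keeping a running time similar to that of TADPole. From a practical standpoint, the proposed DPCS algorithm is well suited to applications in time series forecasting.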


    Time series clustering algorithms have been widely studied in many scientific areas, especially for the traffic flow prediction problem in the past decade. In 2015, a density-based time series clustering algorithm called TADPole was proposed, which outperforms all other approaches. Although TADPole performs well in most test cases, there is still room for improvement in the precision of its output clusters and in the quality of its selected centers.
    In this study, we propose a fast and more accurate time series clustering algorithm, Density Peak via Center Selection (DPCS), which selects centers that retain the features of the cluster data. Moreover, we construct the output clusters according to the required properties. Experimental results demonstrate the effectiveness of DPCS for a large number of clusters, even when the clusters have significantly different densities. In particular, we show that DPCS is more accurate than TADPole in both output clusters and cluster centers, while maintaining a running time similar to that of TADPole. From a practical perspective, the proposed DPCS algorithm is well suited to applications in time series forecasting.
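
    The abstract does not spell out how DPCS chooses its centers, so the sketch below is not the thesis's method. It only illustrates the generic idea that density-based time series clustering builds on: compute pairwise Dynamic Time Warping (DTW) distances, estimate a local density for each series, and pick as centers the series that are both dense and far from any denser series. The function names `dtw` and `density_peaks`, the parameters `cutoff` and `k`, and the toy data are all assumptions made for illustration.

```python
from math import inf


def dtw(a, b):
    """Unconstrained DTW distance between two 1-D sequences (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] ** 0.5


def density_peaks(series, cutoff, k):
    """Generic density-peaks clustering over DTW distances (illustrative only)."""
    n = len(series)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = dtw(series[i], series[j])

    # Local density: number of other series closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]

    # Visit series from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: r for r, i in enumerate(order)}

    # delta: distance to the nearest series that comes earlier in the density order.
    delta = [0.0] * n
    parent = [-1] * n
    for r, i in enumerate(order):
        if r == 0:
            delta[i] = max(dist[i])  # the densest series gets the largest distance
        else:
            j = min(order[:r], key=lambda h: dist[i][h])
            delta[i], parent[i] = dist[i][j], j

    # Centers: the k series with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))[:k]

    # Every other series inherits the label of its nearest denser neighbour.
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels


# Toy data: two groups that differ in level; the peak position varies within a group.
series = [
    [0, 0, 1, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 1, 0],
    [5, 5, 6, 5, 5], [5, 6, 5, 5, 5], [5, 5, 5, 6, 5],
]
centers, labels = density_peaks(series, cutoff=1.0, k=2)
print(centers, labels)  # prints [0, 3] [0, 0, 0, 1, 1, 1]
```

    In the toy data, DTW absorbs the shift of the peak within each group, so series in the same group are at distance zero and the two groups separate cleanly. A full implementation would also need the pruning and lower-bounding techniques that make DTW clustering fast, which this sketch omits.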

    Contents
    Abstract (Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Figures and Tables
    1 Introduction
        1.1 Motivation
        1.2 Prior work
        1.3 Our contribution
    2 DTW and TADPole Revisited
        2.1 Multi-dimensional DTW
        2.2 Lower bound of multi-dimensional DTW
        2.3 Procedures of TADPole
    3 DPCS: Density Peak via Center Selection
        3.1 Algorithm
        3.2 Comparison with the TADPole algorithm
        3.3 Assignment strategies
    4 Experimental Evaluation
        4.1 Evaluation of the algorithm
        4.2 Cluster centers evaluation
        4.3 Parameter sensitivity of DPCS
        4.4 Discussion
    5 Real-world Case Study: Traffic Network
        5.1 Similar time series patterns
        5.2 Correlation between pattern and traffic situation
    6 Conclusion
    References

