Graduate student: | 艾雪芳 |
---|---|
Thesis title: | 相關性模型與群集偵測 Correlation Model and Cluster Detection |
Advisor: | 周若珍 |
Oral defense committee: | 陳珍信, 黃信誠, 徐南容, 林培生 |
Degree: | 博士 Doctor |
Department: | 理學院 - 統計學研究所 College of Science - Institute of Statistics |
Year of publication: | 2012 |
Academic year of graduation: | 101 |
Language: | Chinese |
Number of pages: | 105 |
Chinese keywords: | 群集偵測, 空間相關性模型, 時空相關性模型, 偵測率, 假警率 |
English keywords: | cluster detection, spatial correlation model, spatial-temporal correlation model, detection rate, false alarm rate |
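Cluster detection has been an important research problem across many fields in recent years. Traditional scan methods are fast, but they produce many false alarms on highly correlated data and cannot detect small clusters with high prevalence. Hierarchical models developed in recent years can detect clusters and estimate parameters at the same time, but they require the number of clusters, or a range for it, to be specified in advance. The spatial and spatial-temporal correlation models proposed in this thesis, together with the two-stage estimation method developed for them, do not require the number of clusters to be specified. Simulations confirm that the method estimates the model parameters effectively; besides providing correlation information, it gives satisfactory detection rates and false alarm rates.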
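The detection rate and false alarm rate used as evaluation criteria can be made concrete with a small sketch. The definitions here are one common convention and are an assumption, not necessarily those of the thesis: the detection rate is the fraction of true cluster regions that are flagged, and the false alarm rate is the fraction of non-cluster regions that are flagged. The function name and the region labels are hypothetical.

```python
def detection_and_false_alarm_rates(true_cluster, flagged, all_regions):
    """Return (detection rate, false alarm rate) for one simulated data set.

    Assumed definitions (not taken from the thesis):
      detection rate   = flagged true-cluster regions / true-cluster regions
      false alarm rate = flagged background regions   / background regions
    """
    true_cluster = set(true_cluster)
    flagged = set(flagged)
    background = set(all_regions) - true_cluster

    nan = float("nan")
    detection = len(flagged & true_cluster) / len(true_cluster) if true_cluster else nan
    false_alarm = len(flagged & background) / len(background) if background else nan
    return detection, false_alarm


# Hypothetical example: 10 regions, true cluster {2, 3, 4}, method flags {3, 4, 7}.
dr, far = detection_and_false_alarm_rates({2, 3, 4}, {3, 4, 7}, range(10))
```

In a simulation study these two rates would typically be averaged over many replicated data sets.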
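This thesis validates the models on disease notification data of the kind common in epidemiology. The example for the spatial correlation model is the New York State leukemia data. Besides detecting the clusters reported in the literature, the model can effectively predict the intensity at held-out locations because it incorporates correlation. The examples for the spatial-temporal correlation model are mumps and dengue fever in Taiwan, two diseases with different routes of transmission. For mumps, neither the spatial nor the temporal effect is evident, and clusters occur mainly in Pingtung and the greater Taipei area. In recent years, however, Tainan and Kaohsiung have shown an increasing trend, while Miaoli, Yunlin, Chiayi, and Hualien have declined. The main dengue fever epidemic areas are Tainan, Kaohsiung, and the greater Taipei area, with clear spatial-temporal effects: clusters in the south mostly begin in summer and persist into winter, whereas clusters in the north seldom persist. Although the total number of dengue fever cases exceeds that of mumps, the prevalence estimated by the model is only 1/35 that of mumps, which shows that an outbreak spreads rapidly once it occurs, so health authorities at all levels must monitor it closely. For both diseases, the predictions of the spatial-temporal correlation model show larger model deviance than a simple spatial-temporal model when case counts are high and smaller deviance otherwise, so the model can be used for monitoring.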
Cluster detection is an important problem in many research fields. For disease cluster detection, the popular scan statistic of Kulldorff et al. is easy to understand and fast to execute, but it has drawbacks. It often produces many false alarms on highly correlated data, and it cannot detect small clusters even when their infection rates are very high. The spatial hierarchical models proposed more recently by Gangnon and Clayton provide information about clusters and about the spatial or spatial-temporal background, but they do not take into account possible correlation among the noise terms. In addition, the maximum number of clusters must be set in advance, and this choice affects the results. In this thesis, we propose spatial and spatial-temporal correlation models and develop a two-stage estimation method that does not require the number of clusters to be set in advance. The proposed models are useful for imputation and forecasting. Simulation studies show that they achieve a low false alarm rate and a high detection rate.
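For readers unfamiliar with the baseline that the thesis argues against, the following is a minimal sketch of the Poisson log-likelihood ratio behind a spatial scan statistic in the style of Kulldorff (1997). It illustrates the scan approach only, not the correlation models proposed in the thesis. The function names, region labels, and counts are hypothetical, and the expected counts are assumed to be positive and scaled so that they sum to the total number of cases.

```python
import math


def poisson_scan_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio for one candidate zone under a Poisson model.

    Assumes expected_in > 0 and that expected counts over the whole study
    area sum to total_cases. Zones without excess risk score 0.
    """
    c, e, big_c = cases_in, expected_in, total_cases
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if big_c > c:  # the outside term vanishes when every case is inside
        llr += (big_c - c) * math.log((big_c - c) / (big_c - e))
    return llr


def best_zone(zones, cases, expected):
    """Return (llr, zone) for the candidate zone with the largest LLR."""
    total = sum(cases.values())
    scored = []
    for zone in zones:
        c = sum(cases[r] for r in zone)
        e = sum(expected[r] for r in zone)
        scored.append((poisson_scan_llr(c, e, total), zone))
    return max(scored, key=lambda pair: pair[0])


# Hypothetical data: four regions with equal expected counts.
cases = {"A": 2, "B": 9, "C": 3, "D": 2}
expected = {"A": 4.0, "B": 4.0, "C": 4.0, "D": 4.0}
zones = [("A",), ("B",), ("B", "C"), ("C", "D")]
llr, zone = best_zone(zones, cases, expected)
```

In practice the significance of the maximum is usually assessed by Monte Carlo replication under the null hypothesis, which this sketch omits.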
Our empirical studies use reported disease cases from epidemiology. The upstate New York leukemia data are used for the spatial correlation model. The model provides spatial correlation estimates, which make it possible to impute missing observations. It can find small clusters, which is not possible with scan statistics. The forecast deviance is large when the number of cases rises, which hints at its potential as a monitoring tool.
For the spatial-temporal correlation model, we use mumps and dengue fever data from Taiwan; the two diseases have different routes of infection. For the mumps data, the spatial and temporal correlations are not clear, and clusters occur frequently in northern Taiwan and Pingtung. Dengue fever clusters usually occur in Tainan, Kaohsiung, and Taipei, and the spatial and temporal correlations are significant. Clusters in southern Taiwan last quite long, often starting in summer and vanishing in winter, whereas clusters in northern Taiwan are usually not sustained. Dengue fever spreads quickly, so health authorities must pay close attention as soon as a case occurs.
Similarly, for both diseases the prediction deviance is large when the number of cases increases, indicating the model's potential use as a monitoring tool.
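The monitoring idea can be illustrated with a short sketch that compares observed counts against model predictions through a deviance. The abstract does not state which deviance is used; the Poisson deviance below is an assumption chosen because the data are case counts, and the function name and numbers are hypothetical.

```python
import math


def poisson_deviance(observed, predicted):
    """Poisson deviance 2 * sum(y * log(y / mu) - (y - mu)).

    Assumes every predicted mean mu is positive. The y * log(y / mu)
    term is taken as 0 when y == 0.
    """
    total = 0.0
    for y, mu in zip(observed, predicted):
        term = mu - y
        if y > 0:
            term += y * math.log(y / mu)
        total += term
    return 2.0 * total


# Hypothetical weekly counts for two regions, with the same predictions.
quiet_week = poisson_deviance([3, 5], [3.0, 5.0])      # counts match predictions
outbreak_week = poisson_deviance([3, 15], [3.0, 5.0])  # one region far above prediction
```

A deviance that jumps well above its usual level, as in the second call, is the kind of signal a surveillance system could flag for follow-up.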
Baddeley, A., and Møller, J. (1989). Nearest-Neighbour Markov point processes and random sets. International Statistical Review / Revue Internationale de Statistique, 57, 89-121.
Bartlett, M. S. (1963). The spectral analysis of point processes. Journal of the Royal Statistical Society. Series B, 25, 264-96.
Bartlett, M. S. (1964). The Spectral Analysis of Two-Dimensional Point Processes. Biometrika, 51, 299-311.
Berman, M., and Diggle, P. (1989). Weighted Integrals of the Second-Order Intensity of a Spatial Point Process. Journal of the Royal Statistical Society. Series B, 51, 81-92.
Besag, J. E. (1974). Spatial Interaction and the Statistical Analysis of Lattice Systems. Journal of the Royal Statistical Society. Series B, 36, 192-236.
Besag, J. E. (1994). Discussion of the paper by Grenander and Miller. Journal of the Royal Statistical Society. Series B, 56, 591-592.
Brix, A., and Diggle, P. J. (2001). Spatiotemporal prediction for log-Gaussian Cox processes. Journal of the Royal Statistical Society. Series B, 63, 823-841.
Byth, K., and Ripley, B. D. (1980). On Sampling Spatial Patterns by Distance Methods. Biometrics, 36, 279-284.
Chan, H. P. (2009). Detection of spatial clustering with average likelihood ratio test statistics. The Annals of Statistics, 37, 3985-4010.
Chang, I., Tiao, G. C., and Chen, C. (1988). Estimation of time series parameters in the presence of outliers. Technometrics, 30, 193-204.
Chang, N. T., Hsu, E. L., Pai, H. H., King, C. C., Tu, W. C., Tang, L. C., Dai, S. M., Luo, Y. P., Wu, H. H., and Lin, Y. H. (2010). Report on the integrated research program for the management of dengue epidemics and vector mosquitoes in Southern Taiwan. In Proceeding of the International forum for Dengue Control, 2010, 111-116.
Clayton, D., and Kaldor, J. (1987). Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics, 43, 671-681.
Cressie, N. A. C. (1991). Statistics for Spatial Data. New York: Wiley.
Diggle, P. J., Besag, J., and Gleaves, J. T. (1976). Statistical analysis of spatial point patterns by means of distance methods. Biometrics, 32, 659-667.
Diggle, P. J. (1983). Statistical Analysis of Spatial Point Patterns. New York: Academic Press.
Diggle, P. J. (1985). A kernel method for smoothing point process data. Journal of the Royal Statistical Society. Series C, 34, 138-147.
Diggle, P. J., and Marron, J. S. (1988). Equivalence of smoothing parameter selectors in density and intensity estimation. Journal of the American Statistical Association, 83, 793-800.
Diggle, P. J. (2003). Statistical Analysis of Spatial Point Patterns. (2nd ed.). London: Arnold.
Diggle, P. J., Ribeiro, P. J., and Christensen, O. (2003). An introduction to model-based geostatistics. In Spatial Statistics and Computational Methods: Lecture Notes in Statistics, 173 (Møller, Ed.). New York: Springer.
Diggle, P. J., Rowlingson, B., and Su, T. (2005). Point process methodology for on-line spatio-temporal disease surveillance. Environmetrics, 16, 423-434.
Fisher, R. A., Thornton, H. G., and Mackenzie, W. A. (1922). The accuracy of the plating method of estimating the density of bacterial populations, with particular reference to the use of Thornton’s agar medium with soil samples. Annals of Applied Biology, 9, 325-359.
Gangnon, R. E., and Clayton, M. K. (2000). Bayesian detection and modeling of spatial disease clustering. Biometrics, 56, 922-935.
Gangnon, R. E., and Clayton, M. K. (2001). A weighted average likelihood ratio test for spatial clustering of disease. Statistics in Medicine, 20, 2977-2987.
Gangnon, R. E., and Clayton, M. K. (2003). A hierarchical model for spatially clustered disease rates. Statistics in Medicine, 22, 3213-3228.
Gangnon, R. E. (2006). Impact of prior choice on local Bayes factors for cluster detection. Statistics in Medicine, 25, 883-895.
Gangnon, R. E., and Clayton, M. K. (2007). Cluster detection using Bayes factors from overparameterized cluster models. Environmental and Ecological Statistics, 14, 69-82.
Gangnon, R. E. (2010). A model for space-time cluster detection using spatial clusters with flexible temporal risk patterns. Statistics in Medicine, 29, 2325-2337.
Gelman, A. (1995). Bayesian Data Analysis. London: Chapman & Hall.
Jeffreys, H. (1961). The Theory of Probability. Oxford: Oxford University Press.
Knox, E. G. (1964). The detection of space-time interaction. Applied Statistics, 13, 25-30.
Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics: Theory and Methods, 26, 1481-1496.
Kulldorff, M., and Nagarwalla, N. (1995). Spatial disease clusters: Detection and inference. Statistics in Medicine, 14, 799-810.
Kulldorff, M. (2001). Prospective time-periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society. Series A, 164, 61-72.
Lawson, A. B. (1995). Markov chain Monte Carlo methods for putative pollution source problems in environmental epidemiology. Statistics in Medicine, 14, 2473-2486.
Mantel, N. (1967). The detection of disease clustering and a generalized regression approach. Cancer Research, 27, 209-220.
Møller, J., Syversveen, A. R., and Waagepetersen, R. P. (1998). Log Gaussian Cox processes. Scandinavian Journal of Statistics, 25, 451-482.
Møller, J., and Waagepetersen, R. P. (2004). Statistical Inference and Simulation for Spatial Point Processes. London: Chapman & Hall.
Neill, D. B. (2006). Detection of spatial and spatio-temporal clusters. Ph.D. thesis. Carnegie Mellon University, Department of Computer Science.
Neill, D. B. (2009a). Expectation-based scan statistics for monitoring spatial time series data. International Journal of Forecasting, 25(3), 498-517.
Neill, D. B. (2009b). An empirical comparison of spatial scan statistics for outbreak detection. International Journal of Health Geographics, 8(20).
Openshaw, S., Charlton, M., Craft, A. W., and Birch, J. M. (1988). Investigation of leukemia clusters by use of a geographical analysis machine. Lancet, 1, 272-273.
Ripley, B. D. (1977). Modelling spatial patterns. Journal of the Royal Statistical Society. Series B, 39, 172-212.
Ripley, B. D. (1981). Spatial Statistics. New York: Wiley.
Robert, C. P., and Casella, G. (2004). Monte Carlo statistical methods. New York: Springer.
Robert, C. P., Chopin, N., and Rousseau, J. (2009). Harold Jeffreys's theory of probability revisited. Statistical Science, 24, 141-172.
Roberts, G. O., and Rosenthal, J. S. (1998). Markov-chain Monte Carlo: Some practical implications of theoretical results. Canadian Journal of Statistics, 26, 5-20.
Roberts, G. O., and Tweedie, R. L. (1997). Exponential convergence of Langevin diffusions and their discrete approximations. Bernoulli, 2, 341-363.
Ross, S. M. (1997). Simulation. (3rd ed.). San Diego: Academic Press.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002). Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society. Series B, 64, 583-639.
Turnbull, B. W., Iwano, E. J., Burnett, W. S., Howe, H. L., and Clark, L. C. (1990). Monitoring for clusters of disease: application to Leukemia incidence in upstate New York. American Journal of Epidemiology, 132, S136-S143.
Waller, L. A., Carlin, B. P., Xia, H., and Gelfand, A. E. (1997). Hierarchical spatio-temporal mapping of disease rates. Journal of the American Statistical Association, 92, 607-617.
Waller, L. A., Turnbull, B. W., Clark, L. C., and Nasca, P. (1994). Spatial pattern analyses to detect rare disease clusters. In Case Studies in Biometry, New York: Wiley, 3-22.
Whittemore, A. S., Friend, N., Brown, B. W., Jr., and Holly, E. A. (1987). A test to detect clusters of disease. Biometrika, 74, 631-637.
Yan, P., and Clayton, M. (2006). A cluster model for space-time disease counts. Statistics in Medicine, 25, 867-881.
內政部統計查詢網 http://statis.moi.gov.tw/micst/stmain.jsp?sys=100
交通部運輸研究所電子地圖 http://www.iot.gov.tw/ct.asp?xItem=105644&CtNode=1086&mp=1
林鼎翔 (2000). 台灣地區登革熱流行情形與防治, 疫情報導, 16, 187-194.
疾病管理傳染病資料統計查詢系統 http://nidss.cdc.gov.tw/FAQContents.aspx
張美齡、林培生、鄒小蕙 (2008). 空間階層模型在偵測台灣疾病群聚的應用. 中國統計學報 46, 22-35.
黃佩櫻 (2009). 貝氏時間與空間統計模式之應用. 國立政治大學統計研究所碩士論文.
廖勇柏 (2000). 癌症地圖的繪製:趨勢面分析法的改與其在時空特性換討之應用. 國立台灣大學流行病學研究所博士論文.