Graduate Student: 李元瀚 (Lee, Yuan-Han)
Thesis Title: 互資訊估計量的比較與軟體開發 (Comparison of Mutual Information Estimators and Software Development)
Advisor: 趙蓮菊
Oral Defense Committee: 鄭又仁, 楊欣洲
Degree: Master
Department: Institute of Statistics, College of Science
Year of Publication: 2014
Academic Year of Graduation: 102 (2013-2014)
Language: Chinese
Number of Pages: 87
Keywords (Chinese): 互資訊, 熵指標
Keywords (English): Mutual Information, Shannon entropy
Abstract:
The Convention on Biological Diversity has been signed by 193 countries, which shows that the importance of biodiversity is now recognized worldwide; studying interactions among organisms is therefore an important research topic. Mutual information measures the association between two random variables; in ecology it can be used to quantify the strength, or the degree of specialization, of interactions between species. Mutual information can be written in terms of Shannon entropies as I(X;Y) = H(X) + H(Y) - H(X,Y). Chao, A., Wang, Y. T. and Jost, L. (2013) expressed entropy as a function of the discovery rates of new species and proposed a nearly unbiased entropy estimator. This thesis investigates how the Chao et al. (2013) entropy estimator performs when applied to estimating mutual information (the estimator proposed here), compares it with five other estimators, and proposes a modified bootstrap standard error estimate for two-dimensional data. Simulation studies show that the proposed estimator attains close to the smallest bias in all experiments, an acceptable level of variability, and almost uniformly the smallest root mean squared error; it also converges faster than the other estimators, so it performs very well even when the number of sampled individuals is small. Real-data examples illustrate the practical use of mutual information in ecology.
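As a concrete illustration of the entropy decomposition above, the following R sketch computes the naive plug-in (maximum likelihood) estimate of mutual information from a two-way frequency table. It is only a minimal baseline under assumed inputs: the function name mi_plugin and the example table are hypothetical, and the sketch does not implement the thesis's proposed Chao et al. (2013)-based estimator or its unseen-species correction.

    # Plug-in (MLE) mutual information from a two-way frequency table,
    # using the identity I(X;Y) = H(X) + H(Y) - H(X,Y).
    # Minimal sketch only: the plug-in entropy is biased downward and
    # plug-in mutual information biased upward in small samples
    # (cf. Treves and Panzeri, 1995), which the proposed estimator corrects.
    mi_plugin <- function(tab) {
      p <- tab / sum(tab)                                  # joint relative frequencies
      H <- function(q) { q <- q[q > 0]; -sum(q * log(q)) } # Shannon entropy (nats)
      H(rowSums(p)) + H(colSums(p)) - H(p)
    }

    # Hypothetical plant-by-pollinator interaction counts
    tab <- matrix(c(20,  2, 0,
                     1, 15, 3,
                     0,  4, 9), nrow = 3, byrow = TRUE)
    mi_plugin(tab)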
In addition, the thesis introduces ChaoEntropy Online, an interactive web application developed in R. The software has two parts, "Shannon entropy" and "Mutual Information"; with a few clicks users obtain point estimates, bootstrap standard errors, and confidence intervals. It can also be run locally, so data never leave the user's machine. We hope that anyone who needs to estimate entropy or mutual information can analyze their data easily with this software.
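To indicate the kind of output the software reports (estimate, bootstrap standard error, confidence interval), here is a minimal sketch of an ordinary multinomial bootstrap for the mi_plugin function above, with a normal-approximation 95% interval. This is the standard bootstrap of Efron (1979) applied to the cell counts, not the modified two-dimensional bootstrap proposed in the thesis; mi_boot and B = 1000 are illustrative choices.

    # Ordinary multinomial bootstrap s.e. and 95% normal-approximation CI
    # for the plug-in MI estimator sketched above (Efron, 1979). The
    # thesis's modified bootstrap for two-dimensional data is not shown.
    mi_boot <- function(tab, B = 1000) {
      n <- sum(tab)
      phat <- as.vector(tab) / n                        # estimated cell probabilities
      boot <- replicate(B, {
        counts <- rmultinom(1, size = n, prob = phat)   # resample n individuals
        mi_plugin(matrix(counts, nrow = nrow(tab)))
      })
      point <- mi_plugin(tab)
      se <- sd(boot)
      c(estimate = point, boot.se = se,
        lcl = point - 1.96 * se, ucl = point + 1.96 * se)
    }

    mi_boot(tab)   # using tab from the previous sketch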
References:
[1] Blüthgen, N., Stork, N. E. and Fiedler, K. (2004). Bottom-up control and co-occurrence in complex communities: honeydew and nectar determine a rainforest ant mosaic. Oikos, 106, 344-358.
[2] Chao, A. (1984). Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics, 11, 265-270.
[3] Chao, A. (1987). Estimating the population size for capture-recapture data with unequal catchability. Biometrics, 43, 783-791.
[4] Chao, A. and Shen, T.-J. (2003). Nonparametric estimation of Shannon's index of diversity when there are unseen species. Environmental and Ecological Statistics, 10, 429-443.
[5] Chao, A. and Jost, L. (2012). Coverage-based rarefaction and extrapolation: standardizing samples by completeness rather than size. Ecology, 93, 2533-2547.
[6] Chao, A., Wang, Y. T. and Jost, L. (2013). Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species. Methods in Ecology and Evolution, 4, 1091-1100.
[7] Chao, A., Lee, Y.-H. and Tseng, K.-S. (2014). ChaoEntropy Online.
[8] Cover, T. M. and Thomas, J. A. (1991). Entropy, relative entropy and mutual information. In Elements of Information Theory, 12-49. Wiley, New York.
[9] Darbellay, G. A. (1999). An estimator of the mutual information based on a criterion for conditional independence. Computational Statistics & Data Analysis, 32(1), 1-17.
[10] Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7(1), 1-26.
[11] Good, I. J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika, 40, 237-264.
[12] Good, I. J. (2000). Turing's anticipation of empirical Bayes in connection with the cryptanalysis of the naval enigma. Journal of Statistical Computation and Simulation, 66, 101-111.
[13] Haghighat, M. B. A., Aghagolzadeh, A. and Seyedarabi, H. (2011). A non-reference image fusion metric based on mutual information of image features. Computers & Electrical Engineering, 37(5), 744-756.
[14] Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663-685.
[15] Khan, S., Bandyopadhyay, S., Ganguly, A. R., Saigal, S., Erickson III, D. J., Protopopescu, V. and Ostrouchov, G. (2007). Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data. Physical Review E, 76(2), 026209.
[16] Margolin, A. A., Nemenman, I., Basso, K., Wiggins, C., Stolovitzky, G., Favera, R. D. and Califano, A. (2006). ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 7(Suppl 1), S7.
[17] Miller, G. A. (1955). Note on the bias of information estimates. In Information Theory in Psychology, 95-100.
[18] Paninski, L. (2003). Estimation of entropy and mutual information. Neural Computation, 15, 1191-1253.
[19] Panzeri, S., Senatore, R., Montemurro, M. A. and Petersen, R. S. (2007). Correcting for the sampling bias problem in spike train information measures. Journal of Neurophysiology, 98(3), 1064-1072.
[20] Quenouille, M. H. (1949). Approximate tests of correlation in time-series. Journal of the Royal Statistical Society, Series B, 11, 68-84.
[21] Schleuning, M., Blüthgen, N., Flörchinger, M., Braun, J., Schaefer, H. M. and Böhning-Gaese, K. (2011). Specialization and interaction strength in a tropical plant-frugivore network differ among forest strata. Ecology, 92(1), 26-36.
[22] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423.
[23] Suzuki, T., Sugiyama, M., Sese, J. and Kanamori, T. (2008). Approximating mutual information by maximum likelihood density ratio estimation. Journal of Machine Learning Research - Proceedings Track, 4, 5-20.
[24] Treves, A. and Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Computation, 7, 399-407.
[25] Tukey, J. W. (1958). Bias and confidence in not-quite large samples (abstract). Annals of Mathematical Statistics, 29, 614.
[26] Wells III, W. M., Viola, P., Atsumi, H., Nakajima, S. and Kikinis, R. (1996). Multi-modal volume registration by maximization of mutual information. Medical Image Analysis, 1(1), 35-51.
[27] Wilson, E. O. (2002). The Future of Life. Knopf, New York.
[28] Zahl, S. (1977). Jackknifing an index of diversity. Ecology, 58, 907-913.
[29] 趙蓮菊, 邱春火, 王怡婷, 謝宗震, 馬光輝 (2013). 仰觀宇宙之大, 俯察品類之盛: 如何量化生物多樣性 [Looking up at the vastness of the universe, looking down at the richness of its species: how to quantify biodiversity] (in Chinese). Journal of the Chinese Statistical Association, 51, 8-53.
[30] 徐源泰 (1999). 生物多樣性, 生物技術與生物產業 [Biodiversity, biotechnology and the bio-industry] (in Chinese). 1999年生物多樣性論文集 [Proceedings of the 1999 Biodiversity Symposium], 145-158.
[31] 周三童 (2013). 區塊抽樣之熵指標估計 [Entropy estimation under block sampling] (in Chinese). Master's thesis, Institute of Statistics, National Tsing Hua University, Hsinchu (advisor: 趙蓮菊).