
Student: 李元瀚 (Lee, Yuan-Han)
Thesis title: 互資訊估計量的比較與軟體開發 (Comparison of Mutual Information Estimators and Software Development)
Advisor: 趙蓮菊 (Chao, Anne)
Committee members: 鄭又仁, 楊欣洲
Degree: Master
Department: College of Science, Institute of Statistics
Year of publication: 2014
Academic year of graduation: 102 (2013-14)
Language: Chinese
Number of pages: 87
Chinese keywords: 互資訊 (mutual information), 熵指標 (entropy)
Foreign keywords: Mutual Information, Shannon entropy
  • The Convention on Biological Diversity has been signed by 193 countries, showing that nations worldwide now take biodiversity seriously; studying interactions among organisms is therefore an important topic. Mutual information measures the association between two random variables; in ecology it can be used to quantify the strength or degree of specialization of interactions between species. Mutual information can be expressed in terms of entropy. Chao, Wang and Jost (2013) expressed entropy as a function of the rate at which new species are discovered and proposed a nearly unbiased entropy estimator. This thesis studies how the Chao et al. (2013) entropy estimator performs when applied to estimating mutual information (the proposed estimator), compares it with five other estimators, and proposes a corrected bootstrap standard error estimate for two-dimensional data. Computer simulations show that the proposed estimator attains close to the smallest bias in all experiments, an acceptable level of variability, and almost uniformly the smallest root mean squared error, and that it converges faster than the other estimators; it performs very well even when the number of sampled individuals is small. A real-data analysis further illustrates the practical use of mutual information in ecology.
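As a concrete illustration of the entropy representation used above, I(X;Y) = H(X) + H(Y) - H(X,Y), the following Python sketch computes the plug-in (maximum likelihood) estimate of mutual information from a two-way interaction count table. This is the naive estimator that the thesis compares against, not the proposed Chao et al. (2013)-based estimator; the function names and the small example table are illustrative only.

```python
import math

def entropy(counts):
    """Plug-in (maximum likelihood) Shannon entropy of a count vector."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def mutual_information_mle(table):
    """Plug-in estimate I(X;Y) = H(X) + H(Y) - H(X,Y) from a two-way count table."""
    row_sums = [sum(row) for row in table]               # marginal counts of X
    col_sums = [sum(col) for col in zip(*table)]         # marginal counts of Y
    joint = [c for row in table for c in row]            # joint cell counts
    return entropy(row_sums) + entropy(col_sums) - entropy(joint)

# Hypothetical 3x3 species-interaction table (e.g. plants x pollinators)
table = [[10, 0, 2],
         [1, 8, 0],
         [0, 3, 6]]
print(mutual_information_mle(table))
```

For a table of independent, uniform counts the estimate is exactly zero; for the strongly diagonal table above it is positive, reflecting specialized interactions.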
    In addition, this thesis introduces ChaoEntropy Online, an interactive web application developed in R. The software consists of two parts, "Shannon entropy" and "Mutual Information"; with a few clicks users obtain point estimates, bootstrap standard errors, and confidence intervals. It can also be run locally, so that data never leave the user's machine. We hope this software makes entropy and mutual information estimation easy for anyone who needs it.
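The bootstrap standard errors mentioned above can be sketched as follows. This is the standard nonparametric bootstrap of Efron (1979) applied to the plug-in mutual information estimate, resampling n individuals from the empirical joint distribution; it is not the corrected two-dimensional bootstrap proposed in the thesis. Names and the example table are illustrative.

```python
import math
import random

def entropy(counts):
    """Plug-in Shannon entropy of a count vector."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def mi_mle(table):
    """Plug-in mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return entropy(rows) + entropy(cols) - entropy([c for r in table for c in r])

def bootstrap_se(table, reps=200, seed=1):
    """Naive bootstrap SE: resample n individuals from the observed cells,
    recompute the plug-in MI each time, and return the sample SD."""
    rng = random.Random(seed)
    # Expand the table into one (row, col) label per observed individual.
    cells = [(i, j) for i, row in enumerate(table)
                    for j, c in enumerate(row) for _ in range(c)]
    n = len(cells)
    stats = []
    for _ in range(reps):
        boot = [[0] * len(table[0]) for _ in table]
        for _ in range(n):
            i, j = rng.choice(cells)
            boot[i][j] += 1
        stats.append(mi_mle(boot))
    mean = sum(stats) / reps
    return math.sqrt(sum((s - mean) ** 2 for s in stats) / (reps - 1))
```

Because this resamples only observed cells, it ignores unseen species; correcting for that undercoverage in two-dimensional data is precisely the modification the thesis develops.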


    Acknowledgements
    Abstract
    Chapter 1  Introduction
    Chapter 2  Model, Notation, and Literature Review
      2.1 Model assumptions and notation
        2.1.1 Sampling scheme and model assumptions
        2.1.2 Notation
      2.2 Literature review
        2.2.1 Species richness estimation
        2.2.2 Sample coverage estimation
        2.2.3 Bootstrap standard error estimation and its correction
    Chapter 3  Shannon Entropy, Mutual Information, and Their Estimators
      3.1 Entropy
      3.2 Joint, conditional, and relative entropy
        3.2.1 Joint entropy
        3.2.2 Conditional entropy
        3.2.3 Relative entropy
      3.3 Mutual information
      3.4 Mutual information estimators
        3.4.1 Maximum likelihood estimator
        3.4.2 Bias correction of the maximum likelihood estimator
        3.4.3 Jackknife estimator
        3.4.4 Chao and Shen estimator
        3.4.5 The proposed estimator
      3.5 Two-dimensional bootstrap standard error estimation and its correction
    Chapter 4  Simulation Study
      4.1 Simulation settings
      4.2 Simulation results and discussion
      4.3 Assessing the normality assumption for the mutual information estimators
      4.4 Underestimation by the maximum likelihood estimator
    Chapter 5  Real Data Analysis
    Chapter 6  Software Development
      6.1 Overview
      6.2 Part I: Shannon entropy
        6.2.1 Abundance data
        6.2.2 Incidence data
      6.3 Part II: Mutual information
    Chapter 7  Conclusions and Future Work
    Appendix
      Appendix A: Simulation results for the five experiments
      Appendix B: Bias and root mean squared error for the five experiments
    References

    [1] Blüthgen, N., N. E. Stork, and K. Fiedler. (2004). Bottom-up control and co-occurrence in complex communities: honeydew and nectar determine a rainforest ant mosaic. Oikos, 106, 344-358.
    [2] Chao, A. (1984). Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics, 11, 265-270.
    [3] Chao, A. (1987). Estimating the population size for capture-recapture data with unequal catchability. Biometrics, 43, 783-791.
    [4] Chao, A. and Shen, T.-J. (2003). Nonparametric estimation of Shannon's index of diversity when there are unseen species. Environmental and Ecological Statistics, 10, 429-443.
    [5] Chao, A. and Jost, L. (2012). Coverage-based rarefaction: standardizing samples by completeness rather than by size. Ecology, 93, 2533-2547.
    [6] Chao, A., Wang, Y. T. and Jost, L. (2013). Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species. Methods in Ecology and Evolution, 4, 1091-1100.
    [7] Chao, A., Lee, Y.-H. and Tseng, K.-S. (2014). ChaoEntropy Online.
    [8] Cover, T. M., & Thomas, J. A. (1991). Entropy, relative entropy and mutual information. Elements of Information Theory, 12-49.
    [9] Darbellay, G. A. (1999). An estimator of the mutual information based on a criterion for conditional independence. Computational Statistics & Data Analysis, 32(1), 1-17.
    [10] Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1), 1-26.
    [11] Good, I.J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika, 40, 237-264.
    [12] Good, I.J. (2000). Turing's anticipation of empirical Bayes in connection with the cryptanalysis of the naval enigma. Journal of Statistical Computation and Simulation, 66, 101-111.
    [13] Haghighat, M. B. A., Aghagolzadeh, A., & Seyedarabi, H. (2011). A non-reference image fusion metric based on mutual information of image features. Computers & Electrical Engineering, 37(5), 744-756.
    [14] Horvitz, D.G. & Thompson, D.J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663-685.
    [15] Khan, S., Bandyopadhyay, S., Ganguly, A. R., Saigal, S., Erickson III, D. J., Protopopescu, V., & Ostrouchov, G. (2007). Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data. Physical Review E, 76(2), 026209.
    [16] Margolin, A. A., Nemenman, I., Basso, K., Wiggins, C., Stolovitzky, G., Favera, R. D., & Califano, A. (2006). ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC bioinformatics, 7(Suppl 1), S7.
    [17] Miller, G.A. (1955). Note on the bias of information estimates. Information Theory in Psychology, 95-100.
    [18] Paninski, L. (2003). Estimation of entropy and mutual information. Neural Computation, 15, 1191-1253.
    [19] Panzeri, S., Senatore, R., Montemurro, M. A., & Petersen, R. S. (2007). Correcting for the sampling bias problem in spike train information measures. Journal of neurophysiology, 98(3), 1064-1072.
    [20] Quenouille M. (1949). Approximate tests of correlation in time series. Journal of the Royal Statistical Society, 11, 68-84.
    [21] Schleuning, M., Blüthgen, N., Flörchinger, M., Braun, J., Schaefer, H. M., & Böhning-Gaese, K. (2011). Specialization and interaction strength in a tropical plant-frugivore network differ among forest strata. Ecology, 92(1), 26-36.
    [22] Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423.
    [23] Suzuki, T., Sugiyama, M., Sese, J., & Kanamori, T. (2008). Approximating Mutual Information by Maximum Likelihood Density Ratio Estimation. Journal of Machine Learning Research-Proceedings Track, 4, 5-20.
    [24] Treves, A., & Panzeri, S. (1995). The upward bias in measures of information derived from limited data samples. Neural Computation, 7, 399-407.
    [25] Tukey J. (1958). Bias and confidence in not quite large samples. Annals of Mathematical Statistics, 29, 6.
    [26] Wells III, W. M., Viola, P., Atsumi, H., Nakajima, S., & Kikinis, R. (1996). Multi-modal volume registration by maximization of mutual information. Medical image analysis, 1(1), 35-51.
    [27] Wilson, E. O. (2002). The Future of Life.
    [28] Zahl, S. (1977). Jackknifing an index of diversity. Ecology, 58, 907-913.
    [29] 趙蓮菊, 邱春火, 王怡婷, 謝宗震, 馬光輝 (2013). 仰觀宇宙之大, 俯察品類之盛: 如何量化生物多樣性 [Surveying the vastness of the universe and the richness of its species: how to quantify biodiversity] (in Chinese). Journal of the Chinese Statistical Association, 51, 8-53.
    [30] 徐源泰 (1999). 生物多樣性, 生物技術與生物產業 [Biodiversity, biotechnology, and the bio-industry] (in Chinese). 1999 Proceedings on Biodiversity, 145-158.
    [31] 周三童 (2013). Entropy estimation under incidence-based sampling (in Chinese; advisor: 趙蓮菊). Master's thesis, Institute of Statistics, National Tsing Hua University, Hsinchu.

    Full-text release date: full text not authorized for public release (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
