Graduate student: | 李宗霖 Li, Zong-Lin |
---|---|
Thesis title: | 網路巨量資料應用於臺灣GDP之最適建模方法-以Google Trends搜尋指數為例 Modeling Taiwan's GDP Using Network Big Data: Evidence from Google Trends Search Volume |
Advisors: | 莊慧玲 Chuang, Hwei-Lin; 黃禮珊 Huang, Li-Shan |
Oral defense committee: | 黃朝熙 Huang, Chao-Hsi; 孫立憲 Sun, Li-Hsien |
Degree: | Master |
Department: | College of Science - Institute of Statistics |
Year of publication: | 2022 |
Academic year of graduation: | 110 |
Language: | Chinese |
Number of pages: | 68 |
Keywords (Chinese): | 國內生產毛額, Google搜尋趨勢, 時間序列分析, 模式選取, 巨量資料 |
Keywords (English): | Gross domestic product, Google Trends, Time series analysis, Model selection, Big data |
Economic indicators are figures that government agencies publish regularly after collecting raw data, compiling statistics, and applying specific formulas. They can be used to describe an economy's current socioeconomic conditions or to forecast its future development. However, Choi & Varian (2012) note that government economic data in every country are released with a noticeable delay. Most macroeconomic series are built from numerous, complex components, so the figures cannot be finalized quickly and cannot be reported at a finer time scale. This delay creates considerable uncertainty for national policy and other decisions that depend on up-to-date economic data.
This thesis therefore takes Taiwan's gross domestic product (GDP) as the response variable, with a study period from the first quarter of 2004 to the second quarter of 2021. Using time series analysis, it examines two types of data collected by different sampling methods: macroeconomic data released by the government and the Google Trends search volume index. Six estimation error criteria are used to quantify the information value of each data type for estimating and forecasting Taiwan's GDP. A four-step modeling process constructs models with differing combinations of variables, so that patterns in their relative estimation and forecasting performance can be observed. From these patterns, the thesis recommends the most suitable modeling approach and data usage for specific data types and error criteria.
The empirical results show that, for estimating and forecasting Taiwan's GDP, modeling without variable pre-screening (that is, using all data variables) achieves the best performance on most error criteria. By contrast, models built without first considering the strength of correlation among variables, while still addressing multicollinearity and excluding the basic-model variables, are relatively likely to have a lower goodness of fit. In addition, models that use only Google Trends data reduce the mean absolute error (MAE) by 13.44% and the root mean squared error (RMSE) by 12.17% relative to the basic model. This thesis therefore concludes that the Google Trends search volume index is an effective data source for improving goodness of fit and providing timely information when estimating and forecasting Taiwan's GDP.
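The MAE and RMSE comparison reported in the abstract can be illustrated with a minimal sketch. The function names and all numbers below are hypothetical and chosen only for illustration; they are not the thesis data, and the thesis may compute its criteria differently. The sketch assumes the reported improvement is the relative reduction in error with respect to the basic model.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def pct_reduction(baseline_error, candidate_error):
    """Percentage by which the candidate error is lower than the baseline error."""
    return 100.0 * (baseline_error - candidate_error) / baseline_error


# Hypothetical quarterly values (NOT from the thesis).
actual = [100.0, 102.0, 101.0, 105.0]
baseline_pred = [98.0, 104.0, 103.0, 102.0]    # stand-in for a basic model
candidate_pred = [99.0, 103.0, 102.0, 103.0]   # stand-in for a Google Trends model

mae_gain = pct_reduction(mae(actual, baseline_pred), mae(actual, candidate_pred))
rmse_gain = pct_reduction(rmse(actual, baseline_pred), rmse(actual, candidate_pred))
print(f"MAE reduction:  {mae_gain:.2f}%")
print(f"RMSE reduction: {rmse_gain:.2f}%")
```

A positive value means the candidate model has a smaller error than the baseline; under this reading, the 13.44% and 12.17% figures in the abstract would correspond to `pct_reduction` applied to the MAE and RMSE of the two models.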
[1] Akaike, Hirotugu (1973), “Information Theory and an Extension of the Maximum Likelihood Principle”, In B. N. Petrov, & F. Csaki (Eds.), Proceedings of the 2nd International Symposium on Information Theory (pp. 267-281). Budapest: Akademiai Kiado.
[2] Araz, Ozgur M., Dan Bentley & Robert L. Muelleman (2014), “Using Google Flu Trends data in forecasting influenza-like-illness related ED visits in Omaha, Nebraska”, American Journal of Emergency Medicine, 32(9), 1016-1023.
[3] Askitas, Nikos & Klaus F. Zimmermann (2009), “Google econometrics and unemployment forecasting”, Applied Economics Quarterly, 55, 107-120.
[4] Constant, Amelie F. & Klaus F. Zimmermann (2008), “Measuring Ethnic Identity and Its Impact on Economic Behavior”, Journal of the European Economic Association, 2008, 6 (2-3), 424-433.
[5] Choi, Hyunyoung & Hal Varian (2012), “Predicting the Present with Google Trends”, The Economic Record, 2012, vol. 88, issue s1, 2-9.
[6] Dickey, David A. & Wayne A. Fuller (1979), “Distribution of the Estimators for Autoregressive Time Series With a Unit Root”, Journal of the American Statistical Association, Vol. 74, No. 366 (Jun., 1979), pp. 427-431.
[7] Divya, K. Hema & Rama Devi (2014), “A Study on Predictors of GDP Early Signals”, Procedia Economics and Finance Volume 11, 2014, Pages 375-382.
[8] Ginsberg, Jeremy, Matthew H. Mohebbi, Rajan S. Patel, Lynnette Brammer, Mark S. Smolinski & Larry Brilliant (2009), “Detecting influenza epidemics using search engine query data”, Nature, 457, 1012–1014.
[9] Granger, C. W. J. (1969), “Investigating Causal Relations by Econometric Models and Cross-spectral Methods”, Econometrica, Vol. 37, No. 3 (Aug., 1969), pp. 424-438.
[10] Granger, C.W.J. & P. Newbold (1974), “Spurious regressions in econometrics”, Journal of Econometrics, Volume 2, Issue 2, July 1974, Pages 111-120.
[11] Götz, Thomas & Thomas Knetsch (2017), “Google data in bridge equation models for German GDP”, International Journal of Forecasting, Elsevier, vol. 35(1), pages 45-66.
[12] Narita, Futoshi & Rujun Yin (2018), “In Search of Information: Use of Google Trends’ Data to Narrow Information Gaps for Low-income Developing Countries”, IMF Working Papers 2018/286, International Monetary Fund.
[13] Nelson, Charles & Charles Plosser (1982), “Trends and random walks in macroeconomic time series”, Journal of Monetary Economics, 10 (1982), 139-162.
[14] Qian, Eric (2018), “Nowcasting Indian GDP with Google Search data”.
[15] Rakesh, Mohan (2006), “Causal Relationship Between Savings And Economic Growth In Countries With Different Income Levels”, Economics Bulletin, AccessEcon, vol. 5(3), pages 1-12.
[16] Ramos, Francisco F.Ribeiro (2001), “Exports, imports, and economic growth in Portugal: evidence from causality and cointegration analysis”, Economic Modelling, Elsevier, vol. 18(4), pages 613-623.
[17] Said, Said E. & David A. Dickey (1984), “Testing for Unit Roots in Autoregressive-Moving Average Models of Unknown Order”, Biometrika, Vol. 71, No. 3 (Dec., 1984), pp. 599-607.
[18] Santillana, Mauricio, André T. Nguyen, Mark Dredze, Michael J. Paul, Elaine O. Nsoesie & John S. Brownstein (2014), “Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance”, PLoS Comput Biol 11(10): e1004513.
[19] Schwarz, Gideon (1978), “Estimating the Dimension of a Model”, The Annals of Statistics, 6(2), 461-464.
[20] Shibata, Ritei (1976), “Selection of the order of an autoregressive model by Akaike's information criterion”, Biometrika, Volume 63, Issue 1, 1976, Pages 117–126.
[21] Simionescu, Mihaela & Klaus F. Zimmermann (2017), “Big Data and Unemployment Analysis”, GLO Discussion Paper Series 81, Global Labor Organization (GLO).
[22] Vosen, Simeon & Torsten Schmidt (2011), “Forecasting private consumption: survey-based indicators vs. Google trends”, Journal of Forecasting Volume 30, Issue 6 p. 565-578.