
Graduate Student: Fan, Hsin-Hung (樊信宏)
Thesis Title: An Analysis on Forecasting Stock Returns Using Machine Learning Models (運用機器學習模型分析股票的預測報酬)
Advisor: Yang, Jui-Chung (楊睿中)
Committee Members: Lee, Yi (李宜); Ao, Chon-Kit (區俊傑)
Degree: Master
Department: College of Technology Management, Department of Economics
Year of Publication: 2021
Graduation Academic Year: 109
Language: Chinese
Pages: 28
Keywords (Chinese): 逐步多重 SPA 檢定法、資料窺探、機器學習、技術分析、標準普爾 500 指數
Keywords (English): Step-SPA, Machine Learning, Data Snooping, Technical Analysis, S&P 500
    Abstract (translated from the Chinese): This thesis attempts to identify the best prediction model for the stock market from different machine learning algorithms and various types of technical indicators, and then uses the forecasts to form corresponding investment strategies. To avoid the inference bias caused by data snooping, we apply the stepwise multiple SPA test (Step-SPA) proposed by Hsu, Hsu, and Kuan (2008) to test the significance of the strategies' returns. Using the S&P 500 index in 2019, we find that at the 5% significance level no investment strategy has a significantly positive return. When the significance level is relaxed to 10%, exactly one strategy does: a random forest in which the number of variables sampled at each split is selected by K-fold cross-validation. The model generates predicted returns, and the strategy buys when the predicted return exceeds 0.001, sells when it falls below -0.001, and then unconditionally repeats each trading signal for 2 additional days.


    Abstract (English): In this paper, we search for the best prediction model over different machine learning algorithms and various types of technical indicators, and use the predictions to select the corresponding investment strategy. To avoid the inference bias caused by data snooping, the Step-SPA method proposed by Hsu, Hsu, and Kuan (2008) is used to identify investment strategies with significantly positive returns. The empirical results show that, for the S&P 500 index in 2019, no strategy has a significantly positive return at the 5% significance level. When the significance level is relaxed to 10%, exactly one investment strategy has a significantly positive return: a random forest in which the number of variables randomly sampled as candidates at each split is selected by K-fold cross-validation. The model generates predicted returns, and the strategy buys when the predicted return exceeds 0.001, sells when it falls below -0.001, and then unconditionally repeats each trading signal for 2 additional days.
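    The strategy described in the abstract can be sketched as follows. This is a minimal Python illustration, not the author's code (the thesis itself uses R packages such as randomForest): the candidate `max_features` grid, `n_estimators`, and the number of folds are assumptions; only the ±0.001 thresholds and the 2-day signal extension come from the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

def fit_forest(X, y, n_splits=5):
    """Select the number of variables sampled as candidates at each split
    (max_features, i.e. mtry) by K-fold cross-validation, as described in
    the abstract. The candidate grid is an illustrative assumption."""
    grid = GridSearchCV(
        RandomForestRegressor(n_estimators=500, random_state=0),
        param_grid={"max_features": [2, 4, 8]},
        cv=n_splits,
    )
    grid.fit(X, y)
    return grid.best_estimator_

def signals(pred_returns, threshold=0.001, hold=2):
    """Map predicted returns to trading signals: +1 (buy) above +threshold,
    -1 (sell) below -threshold, 0 otherwise; then unconditionally repeat
    each nonzero signal for `hold` additional days."""
    raw = np.where(pred_returns > threshold, 1,
                   np.where(pred_returns < -threshold, -1, 0))
    out = raw.copy()
    for t, s in enumerate(raw):
        if s != 0:
            out[t + 1 : t + 1 + hold] = s  # extend the signal by `hold` days
    return out
```

    The abstract does not say how overlapping signals are resolved; the sketch lets a later extension overwrite an earlier one, which is one reasonable reading.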

    Table of Contents
    List of Tables ........................................ 6
    1 Introduction ........................................ 7
    2 Literature Review ................................... 8
    3 Testing Methodology ................................. 10
    4 Methods for Combining Technical Indicators .......... 13
        4.1 Lasso Regression and Ridge Regression ......... 14
        4.2 Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR) ... 15
        4.3 Regression Tree ............................... 15
        4.4 Random Forest ................................. 16
        4.5 Gradient Boosting ............................. 17
        4.6 Neural Network ................................ 17
    5 Performance Measurement of Investment Strategies Based on Technical-Indicator Models ... 19
    6 Empirical Results ................................... 20
        6.1 Empirical Data ................................ 20
        6.2 SPA and Step-SPA Test Results ................. 22
        6.3 Alternative Parameter Settings ................ 24
    7 Conclusion .......................................... 26

    Allaire, J. J. and F. Chollet (2021) "keras: R Interface to 'Keras'," R package version 2.4.0.
    Breiman, L., J. Friedman, R. Olshen, and C. Stone (1984) Classification and Regression Trees, Wadsworth, Belmont, CA.
    Breiman, L. (2001) "Random Forests," Machine Learning, 45, 5-32.
    Efron, B. (1979) "Bootstrap Methods: Another Look at the Jackknife," The Annals of Statistics, 7, 1-26.
    Friedman, J. H. (2001) "Greedy Function Approximation: A Gradient Boosting Machine," The Annals of Statistics, 29, 1189-1232.
    Friedman, J., T. Hastie, and R. Tibshirani (2010) "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, 33, 1-22.
    Greenwell, B., B. Boehmke, J. Cunningham, and GBM Developers (2020) "gbm: Generalized Boosted Regression Models," R package version 2.1.8.
    Hansen, P. R. (2005) "A Test for Superior Predictive Ability," Journal of Business & Economic Statistics, 23, 365-380.
    Hoerl, A. E. and R. W. Kennard (1970) "Ridge Regression: Biased Estimation for Nonorthogonal Problems," Technometrics, 12, 55-67.
    Hsu, P.-H. and C.-M. Kuan (2005) "Reexamining the Profitability of Technical Analysis with Data Snooping Checks," Journal of Financial Econometrics, 3, 606-628.
    Hsu, P.-H., Y.-C. Hsu, and C.-M. Kuan (2008) "Testing the Predictive Ability of Technical Analysis Using a New Stepwise Test Without Data Snooping Bias," Journal of Empirical Finance, 17, 471-484.
    Huang, J.-Z., W. Huang, and J. Ni (2019) "Predicting Bitcoin Returns Using High-Dimensional Technical Indicators," The Journal of Finance and Data Science, 5, 140-155.
    Kuan, C.-M. (2008) "Artificial Neural Networks," in The New Palgrave Dictionary of Economics, S. N. Durlauf and L. E. Blume (eds.), Palgrave Macmillan.
    Liaw, A. and M. Wiener (2002) "Classification and Regression by randomForest," R News, 2, 18-22.
    Mevik, B.-H., R. Wehrens, and K. H. Liland (2020) "pls: Partial Least Squares and Principal Component Regression," R package version 2.7-3.
    Newey, W. K. and K. D. West (1987) "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55, 703-708.
    Politis, D. N. and J. P. Romano (1994) "The Stationary Bootstrap," Journal of the American Statistical Association, 89, 1303-1313.
    Ripley, B. (2019) "tree: Classification and Regression Trees," R package version 1.0-40.
    Romano, J. P. and M. Wolf (2005) "Stepwise Multiple Testing as Formalized Data Snooping," Econometrica, 73, 1237-1282.
    Sullivan, R., A. Timmermann, and H. White (1999) "Data-Snooping, Technical Trading Rule Performance, and the Bootstrap," The Journal of Finance, 54, 1647-1691.
    Tibshirani, R. (1996) "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B, 58, 267-288.
    White, H. (2000) "A Reality Check for Data Snooping," Econometrica, 68, 1097-1126.
