| Field | Value |
|---|---|
| Graduate Student | 劉宇超 Liu, Yu-Tsao |
| Thesis Title | 基於深度強化學習的交易代理人於逐筆交易資料之應用 (Trading Agents Based on Deep Reinforcement Learning with Tick Data) |
| Advisor | 馬席彬 Ma, Hsi-Pin |
| Committee Members | 翁詠祿 Ueng, Yeong-Luh; 孫宏民 Sun, Hung-Min; 楊家驤 Yang, Chia-Hsiang |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication | 2022 |
| Graduation Academic Year | 111 |
| Language | Chinese |
| Number of Pages | 61 |
| Keywords | Deep reinforcement learning, Proximal policy optimization, Automatic trading, Algorithmic trading, Tick data, TAIEX Futures |
In applications of machine learning to finance, most existing experiments use the transaction information of each time interval as the input features, such as the opening price, the closing price, and the trading volume. Machine learning can be used to predict future market trends and, combined with a trading strategy, to backtest on historical data. Backtest performance evaluation can be roughly divided into two categories, profit and risk, and the goal of model training is to maximize profit while minimizing the risk taken during the trading period.
In this thesis, a limit order book containing tick-by-tick transaction information is used as the training data for the deep reinforcement learning model. I use three types of features to describe the current state of the trading market: the current state of the limit order book, the dynamic changes of the limit order book, and the account state of the model. The feature set most suitable for training the model is selected through experiments. The reward function is designed around the concept of high-frequency trading: earning a small profit on each round trip and accumulating profit through a high win rate and frequent trading. Concretely, the agent is penalized for closing a position within too short a time and rewarded for trades that realize a small profit. The proximal policy optimization (PPO) algorithm used in this thesis is an actor-critic algorithm that trains the model by constructing two separate neural networks.
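As a rough illustration of the three feature groups described above, the sketch below assembles one observation vector from a limit order book snapshot. It is a minimal sketch only: the level depth, field names, and lack of normalization are assumptions for illustration, not the thesis's actual preprocessing.

```python
import numpy as np

def build_observation(lob_now, lob_prev, account, levels=5):
    """Assemble one observation from two (hypothetical) LOB snapshots and the account state."""
    def flatten(lob):
        # Best `levels` bid/ask prices and volumes of one snapshot.
        return np.concatenate([
            lob["bid_prices"][:levels], lob["bid_volumes"][:levels],
            lob["ask_prices"][:levels], lob["ask_volumes"][:levels],
        ])

    current = flatten(lob_now)              # 1) current state of the limit order book
    dynamics = current - flatten(lob_prev)  # 2) dynamic changes of the limit order book
    account_state = np.array([              # 3) account state of the agent
        account["position"], account["unrealized_pnl"], account["holding_time"]
    ], dtype=float)
    return np.concatenate([current, dynamics, account_state])
```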
This thesis discusses how the profit-related indicators of the model change under various conditions. The model is trained and backtested with TAIEX Futures tick data from April 26, 2022 to June 15, 2022. Taking the exchange handling fee, the clearing fee, and the futures transaction tax into account as transaction costs, the best-performing model achieves a win rate of 59.55% and a profit factor of 1.62 over the backtest period.
In applications of machine learning to the financial field, many experiments use transaction information over a period of time as the input features of machine learning, such as the opening price, the closing price, and the trading volume. Machine learning can be used to predict future market trends and, combined with trading strategies, to backtest on historical data. Backtest performance evaluation can be roughly divided into profit and risk; the goal of training the model is to maximize the profit and minimize the risk taken during trading.
In this thesis, the limit order book containing tick-by-tick transaction information is used as the training data for the deep reinforcement learning model. I use three types of features to describe the current state of the trading market as the training features of the model: the current state of the limit order book, the dynamic changes of the limit order book, and the account status of the deep reinforcement learning agent. The feature set most suitable for training the model is selected through experiments. The reward function is designed based on the concept of high-frequency trading, accumulating profit through frequent trading and a high win rate: a negative reward is given for closing a position too quickly, and a positive reward is given for realizing a small profit. I use the proximal policy optimization (PPO) algorithm, an actor-critic method that constructs two separate neural networks to train the model.
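A minimal sketch of the reward shaping described above: closing a position within too short a time is penalized, and realizing a profit is rewarded. The time threshold, penalty size, and profit scaling below are hypothetical placeholders, not the values used in the thesis.

```python
def shaped_reward(position_closed, holding_time, realized_pnl,
                  min_holding_time=10,        # hypothetical minimum holding time (in ticks)
                  early_close_penalty=-1.0,   # hypothetical penalty for closing too quickly
                  pnl_scale=0.01):            # hypothetical scaling of realized profit/loss
    """Reward for one environment step under the (assumed) high-frequency trading scheme."""
    if not position_closed:
        return 0.0                            # no shaping reward while the position stays open
    reward = pnl_scale * realized_pnl         # positive reward for a profitable round trip
    if holding_time < min_holding_time:
        reward += early_close_penalty         # penalty for closing the position too fast
    return reward
```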
I discuss the profit-related indicators of the model under different conditions. The model is trained and backtested with TAIEX Futures tick data from April 26, 2022 to June 15, 2022. Taking the exchange handling fee, the clearing fee, and the futures transaction tax into account, the best-performing model reaches a 59.55% win rate and a profit factor of 1.62 during the backtest period.
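For reference, the two reported indicators can be computed from per-trade results: win rate is the fraction of trades whose net profit is positive, and profit factor is gross profit divided by gross loss. In the sketch below, the NT$200-per-point contract multiplier, the per-side fee, and the transaction tax rate are assumptions for illustration and not necessarily the cost values used in the thesis.

```python
def trade_net_pnl(entry_price, exit_price, direction,
                  point_value=200.0,    # NT$ per index point for TAIEX Futures (assumed)
                  fees_per_side=20.0,   # hypothetical handling + clearing fee per side (NT$)
                  tax_rate=0.00002):    # assumed futures transaction tax rate per side
    """Net profit of one round-trip trade after (assumed) transaction costs."""
    gross = direction * (exit_price - entry_price) * point_value
    tax = tax_rate * point_value * (entry_price + exit_price)  # taxed on both sides
    return gross - 2 * fees_per_side - tax

def win_rate(net_pnls):
    # Fraction of trades with positive net profit.
    return sum(1 for p in net_pnls if p > 0) / len(net_pnls)

def profit_factor(net_pnls):
    # Gross profit divided by gross loss (absolute value).
    gross_profit = sum(p for p in net_pnls if p > 0)
    gross_loss = -sum(p for p in net_pnls if p < 0)
    return gross_profit / gross_loss

# Usage with hypothetical trades: (entry, exit, direction), direction is +1 long / -1 short.
trades = [(16500, 16504, +1), (16510, 16508, +1), (16490, 16485, -1)]
pnls = [trade_net_pnl(e, x, d) for e, x, d in trades]
print(f"win rate = {win_rate(pnls):.2%}, profit factor = {profit_factor(pnls):.2f}")
```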