| Field | Value |
|---|---|
| Graduate Student | 劉宇超 Liu, Yu-Tsao |
| Thesis Title | 基於深度強化學習的交易代理人於逐筆交易資料之應用 (Trading Agents Based on Deep Reinforcement Learning with Tick Data) |
| Advisor | 馬席彬 Ma, Hsi-Pin |
| Committee Members | 翁詠祿 Ueng, Yeong-Luh; 孫宏民 Sun, Hung-Min; 楊家驤 Yang, Chia-Hsiang |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication | 2022 |
| Graduation Academic Year | 111 |
| Language | Chinese |
| Number of Pages | 61 |
| Keywords | Deep reinforcement learning, Proximal policy optimization, Automatic trading, Algorithmic trading, Tick data, TAIEX Futures |
In applications of machine learning to finance, most existing experiments use the transaction information of each time interval as the input features, such as the opening price, the closing price, and the trading volume. Machine learning can be used to predict future market trends and, combined with a trading strategy, to backtest on historical data. Backtest performance evaluation can be roughly divided into two categories, profit and risk, and the goal of model training is to maximize profit while minimizing the risk taken during the trading period.
In this thesis, a limit order book containing tick-by-tick transaction information is used as the training data for the deep reinforcement learning model. I use three types of features to describe the current state of the trading market: the current state of the limit order book, the dynamic changes of the limit order book, and the account state of the model. The feature set most suitable for training the model is selected through experiments. The reward function is designed around the concept of high-frequency trading: earning a small profit on each round trip and accumulating profit through a high win rate and frequent trading. Concretely, the agent is penalized for closing a position within too short a time and rewarded for trades that realize a small profit. The proximal policy optimization (PPO) algorithm used in this thesis is an actor-critic algorithm that trains the model by constructing two separate neural networks.
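As a rough illustration of the three feature groups described above, the sketch below assembles one observation vector from a limit order book snapshot. It is a minimal sketch only: the level depth, field names, and lack of normalization are assumptions for illustration, not the thesis's actual preprocessing.

```python
import numpy as np

def build_observation(lob_now, lob_prev, account, levels=5):
    """Assemble one observation from two (hypothetical) LOB snapshots and the account state."""
    def flatten(lob):
        # Best `levels` bid/ask prices and volumes of one snapshot.
        return np.concatenate([
            lob["bid_prices"][:levels], lob["bid_volumes"][:levels],
            lob["ask_prices"][:levels], lob["ask_volumes"][:levels],
        ])

    current = flatten(lob_now)              # 1) current state of the limit order book
    dynamics = current - flatten(lob_prev)  # 2) dynamic changes of the limit order book
    account_state = np.array([              # 3) account state of the agent
        account["position"], account["unrealized_pnl"], account["holding_time"]
    ], dtype=float)
    return np.concatenate([current, dynamics, account_state])
```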
This thesis discusses how the profit-related indicators of the model change under various conditions. The model is trained and backtested with TAIEX Futures tick data from April 26, 2022 to June 15, 2022. Taking the exchange handling fee, the clearing fee, and the futures transaction tax into account as transaction costs, the best-performing model achieves a win rate of 59.55% and a profit factor of 1.62 over the backtest period.
In applications of machine learning to the financial field, many experiments use transaction information over a period of time as the input features of machine learning, such as the opening price, the closing price, and the trading volume. Machine learning can be used to predict future market trends and, combined with trading strategies, to backtest on historical data. Backtest performance evaluation can be roughly divided into profit and risk; the goal of training the model is to maximize the profit and minimize the risk taken during trading.
In this thesis, the limit order book containing tick-by-tick transaction information is used as the training data for the deep reinforcement learning model. I use three types of features to describe the current state of the trading market as the training features of the model: the current state of the limit order book, the dynamic changes of the limit order book, and the account status of the deep reinforcement learning agent. The feature set most suitable for training the model is selected through experiments. The reward function is designed based on the concept of high-frequency trading, accumulating profit through frequent trading and a high win rate: a negative reward is given for closing a position too quickly, and a positive reward is given for realizing a small profit. I use the proximal policy optimization (PPO) algorithm, an actor-critic method that constructs two separate neural networks to train the model.
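A minimal sketch of the reward shaping described above: closing a position within too short a time is penalized, and realizing a profit is rewarded. The time threshold, penalty size, and profit scaling below are hypothetical placeholders, not the values used in the thesis.

```python
def shaped_reward(position_closed, holding_time, realized_pnl,
                  min_holding_time=10,        # hypothetical minimum holding time (in ticks)
                  early_close_penalty=-1.0,   # hypothetical penalty for closing too quickly
                  pnl_scale=0.01):            # hypothetical scaling of realized profit/loss
    """Reward for one environment step under the (assumed) high-frequency trading scheme."""
    if not position_closed:
        return 0.0                            # no shaping reward while the position stays open
    reward = pnl_scale * realized_pnl         # positive reward for a profitable round trip
    if holding_time < min_holding_time:
        reward += early_close_penalty         # penalty for closing the position too fast
    return reward
```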
I discuss the profit-related indicators of the model under different conditions. The model is trained and backtested with TAIEX Futures tick data from April 26, 2022 to June 15, 2022. Taking the exchange handling fee, the clearing fee, and the futures transaction tax into account, the best-performing model reaches a 59.55% win rate and a profit factor of 1.62 during the backtest period.
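For reference, the two reported indicators can be computed from per-trade results: win rate is the fraction of trades whose net profit is positive, and profit factor is gross profit divided by gross loss. In the sketch below, the NT$200-per-point contract multiplier, the per-side fee, and the transaction tax rate are assumptions for illustration and not necessarily the cost values used in the thesis.

```python
def trade_net_pnl(entry_price, exit_price, direction,
                  point_value=200.0,    # NT$ per index point for TAIEX Futures (assumed)
                  fees_per_side=20.0,   # hypothetical handling + clearing fee per side (NT$)
                  tax_rate=0.00002):    # assumed futures transaction tax rate per side
    """Net profit of one round-trip trade after (assumed) transaction costs."""
    gross = direction * (exit_price - entry_price) * point_value
    tax = tax_rate * point_value * (entry_price + exit_price)  # taxed on both sides
    return gross - 2 * fees_per_side - tax

def win_rate(net_pnls):
    # Fraction of trades with positive net profit.
    return sum(1 for p in net_pnls if p > 0) / len(net_pnls)

def profit_factor(net_pnls):
    # Gross profit divided by gross loss (absolute value).
    gross_profit = sum(p for p in net_pnls if p > 0)
    gross_loss = -sum(p for p in net_pnls if p < 0)
    return gross_profit / gross_loss

# Usage with hypothetical trades: (entry, exit, direction), direction is +1 long / -1 short.
trades = [(16500, 16504, +1), (16510, 16508, +1), (16490, 16485, -1)]
pnls = [trade_net_pnl(e, x, d) for e, x, d in trades]
print(f"win rate = {win_rate(pnls):.2%}, profit factor = {profit_factor(pnls):.2f}")
```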