
Author: 陳仕軒 (Chen, Shih-Hsuan)
Title: 模擬為基之強化學習訓練資料探討: 以考量等候區容量限制之零工式排程問題為例
Title (English): Training Data Investigation of a Simulation-Based Reinforcement Learning Environment: Job Shop Scheduling Problem with Limited Buffer as an Example
Advisor: 林則孟 (Lin, James T.)
Committee members: 陳子立 (Chen, Tzu-Li); 林裕訓 (Lin, Yu-Hsun)
Degree: Master
Department: Department of Industrial Engineering and Engineering Management, College of Engineering
Year of publication: 2023
Graduating academic year: 111 (ROC calendar, 2022-2023)
Language: Chinese
Number of pages: 133
Keywords (Chinese): 深度強化學習、零工式排程問題、離散事件模擬、記憶回放策略、決策頻繁度
Keywords (English): Deep reinforcement learning, Job shop scheduling problem, Discrete event simulation, Experience replay strategy, Decision frequency
Abstract: This study applies deep reinforcement learning to scheduling problems, taking the job shop scheduling problem with limited buffer capacity as an example, and investigates how training samples for scheduling problems are generated, with particular attention to the decision frequency and experience replay issues that arise in DQN training. The first challenge in applying reinforcement learning to scheduling is generating training data: because the agent can hardly interact with an actual production scheduling environment, a simulator is needed to reproduce the dynamic events that occur during scheduling. This study therefore proposes a method for constructing a reinforcement learning "environment" based on discrete event simulation. It first defines the reinforcement learning elements such as states, actions, and reward functions, then uses the Systems Modeling Language (SysML) as a system analysis tool to analyze the object structure and behavior of the job shop production scheduling system, and finally converts the simulation model into the "environment" through a simulation-environment interaction framework, so that training samples are generated by simulation.
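To make the simulation-based "environment" idea concrete, the following minimal Python sketch wraps a toy single-machine discrete event simulator with a finite buffer behind a reset/step interface: step() applies the agent's dispatching decision and then advances the event list until the next decision event, which is exactly where one training sample is produced. The class name, state encoding, reward, and event set are illustrative assumptions, not the thesis's actual SysML-derived job shop model.

import heapq
import random

class SingleMachineSimEnv:
    # Toy discrete-event simulator exposed as an RL "environment" (hypothetical
    # names): jobs arrive over time, wait in a finite buffer, and whenever the
    # machine is idle with jobs waiting, control returns to the agent, which
    # picks the next job to process. That hand-over point is the decision event.
    def __init__(self, n_jobs=20, buffer_size=3, seed=0):
        self.n_jobs, self.buffer_size = n_jobs, buffer_size
        self.rng = random.Random(seed)

    def reset(self):
        self.clock = 0.0
        self.machine_free_at = 0.0
        self.done_jobs = 0
        self.total_tardiness = 0.0
        self.buffer = []               # waiting jobs as (processing_time, due_date)
        self.events = []               # min-heap of (time, kind)
        t = 0.0
        for _ in range(self.n_jobs):   # pre-schedule all job arrivals
            t += self.rng.expovariate(1.0)
            heapq.heappush(self.events, (t, "arrival"))
        return self._run_to_decision()

    def step(self, action):
        # action = index of the buffered job to process next (clamped for safety)
        proc, due = self.buffer.pop(min(action, len(self.buffer) - 1))
        finish = max(self.clock, self.machine_free_at) + proc
        self.machine_free_at = finish
        heapq.heappush(self.events, (finish, "departure"))
        self.total_tardiness += max(0.0, finish - due)
        state = self._run_to_decision()
        done = self.done_jobs == self.n_jobs
        reward = -self.total_tardiness if done else 0.0   # illustrative sparse reward
        return state, reward, done

    def _run_to_decision(self):
        # Advance the simulation event by event; stop only at a decision event.
        while self.events:
            self.clock, kind = heapq.heappop(self.events)
            if kind == "arrival":
                if len(self.buffer) < self.buffer_size:
                    self.buffer.append((self.rng.uniform(1, 3),
                                        self.clock + self.rng.uniform(5, 15)))
                else:                  # buffer full: the arrival retries later
                    heapq.heappush(self.events, (self.clock + 1.0, "arrival"))
            else:                      # "departure": a job left the machine
                self.done_jobs += 1
            if self.buffer and self.clock >= self.machine_free_at:
                break                  # machine idle and jobs waiting: ask the agent
        return (len(self.buffer), self.clock - self.machine_free_at)

In the thesis the simulation model is built from the SysML analysis of the job shop and the state, action, and reward designs are far richer; the point of the sketch is only the interaction framework in which the simulator, rather than a real shop floor, supplies the training samples.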
The decision epoch, the point in time at which the agent makes a decision in a reinforcement learning problem, determines the interval at which training data are generated and is therefore an important issue in training sample generation. This study considers the influence of decision frequency and decision events in scheduling problems and proposes a simulation-based DQN algorithm through which different decision events can be tried during training. The experimental results confirm that an optimal decision frequency exists; too many or too few decisions both degrade the agent's learning. Using DQN as the reinforcement learning algorithm, this study also examines its experience replay strategy: whereas the conventional strategy draws minibatches of training samples uniformly at random from the replay memory, this study proposes stratified sampling so that every subset is appropriately represented in each batch. The experimental results show that the two improvements, decision frequency and the stratified sampling strategy, accelerate the convergence of agent training, and that under different system load and due date tightness scenarios the DQN outperforms any single dispatching rule, demonstrating the agent's ability to make appropriate decisions based on the system state.
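The stratified replay idea can likewise be sketched in a few lines. The grouping key below (for illustration, whatever label the caller attaches to a transition, such as the type of decision event that produced it) is an assumption; the abstract only states that stratified sampling keeps every subset of the memory appropriately represented in each minibatch, in contrast to plain uniform sampling.

import random
from collections import defaultdict, deque

class StratifiedReplayBuffer:
    # Sketch of stratified experience replay (hypothetical class): transitions
    # are stored in per-stratum queues, and each minibatch takes a roughly equal
    # share from every non-empty stratum instead of sampling the whole memory
    # uniformly at random.
    def __init__(self, capacity_per_stratum=10_000, seed=0):
        self.strata = defaultdict(lambda: deque(maxlen=capacity_per_stratum))
        self.rng = random.Random(seed)

    def push(self, stratum_key, transition):
        # transition = (state, action, reward, next_state, done)
        self.strata[stratum_key].append(transition)

    def __len__(self):
        return sum(len(q) for q in self.strata.values())

    def sample(self, batch_size):
        keys = [k for k, q in self.strata.items() if q]
        if not keys:
            return []
        share, remainder = divmod(batch_size, len(keys))
        batch = []
        for i, key in enumerate(keys):
            n = share + (1 if i < remainder else 0)
            # draw with replacement so small strata can still fill their share
            batch.extend(self.rng.choices(self.strata[key], k=n))
        self.rng.shuffle(batch)
        return batch

Equal shares per stratum is only one possible allocation; proportional quotas or any other allocation rule would fit the same push/sample interface.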


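Putting the two sketches together, a simulation-based DQN training loop might look roughly as follows. Everything here, from the tiny two-feature state and three-way action space to the stratification rule and the hyperparameters, is a placeholder chosen to stay consistent with the hypothetical SingleMachineSimEnv and StratifiedReplayBuffer above; it is not the thesis's actual network, decision events, or parameter settings.

import random
import torch
import torch.nn as nn

# Hypothetical hyperparameters for the sketch only.
MAX_ACTIONS, GAMMA, EPSILON, BATCH_SIZE, WARMUP = 3, 0.99, 0.1, 32, 500

def make_q_net():
    # Tiny network over the toy 2-feature state; the thesis's state encoding
    # and architecture are richer.
    return nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, MAX_ACTIONS))

q_net, target_net = make_q_net(), make_q_net()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
env, memory = SingleMachineSimEnv(), StratifiedReplayBuffer()

for episode in range(200):
    state, done = env.reset(), False
    while not done:
        # Epsilon-greedy choice among the first MAX_ACTIONS buffered jobs.
        if random.random() < EPSILON:
            action = random.randrange(MAX_ACTIONS)
        else:
            with torch.no_grad():
                action = int(q_net(torch.tensor(state, dtype=torch.float32)).argmax())
        next_state, reward, done = env.step(action)

        # Illustrative stratification: had the machine been starved before deciding?
        stratum = "starved" if state[1] > 0 else "busy"
        memory.push(stratum, (state, action, reward, next_state, done))
        state = next_state

        if len(memory) >= WARMUP:
            s, a, r, s2, d = map(list, zip(*memory.sample(BATCH_SIZE)))
            s = torch.tensor(s, dtype=torch.float32)
            s2 = torch.tensor(s2, dtype=torch.float32)
            a = torch.tensor(a).unsqueeze(1)
            r = torch.tensor(r, dtype=torch.float32)
            d = torch.tensor(d, dtype=torch.float32)
            q = q_net(s).gather(1, a).squeeze(1)
            with torch.no_grad():
                target = r + GAMMA * (1.0 - d) * target_net(s2).max(1).values
            loss = nn.functional.mse_loss(q, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    if episode % 10 == 0:               # periodic target-network sync
        target_net.load_state_dict(q_net.state_dict())

In the thesis the decision events themselves are among the things the simulation-based DQN algorithm varies during training, which is what the decision frequency experiments measure; the fixed event set in this sketch is a simplification.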

Table of Contents:
Abstract
Acknowledgements
Table of Contents
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives
Chapter 2 Literature Review
  2.1 Machine Learning Applied to Scheduling Problems
    2.1.1 Artificial Neural Networks
    2.1.2 Reinforcement Learning
  2.2 Types of Job Shop Scheduling Problems
  2.3 Design of Reinforcement Learning Elements for Scheduling Problems
    2.3.1 Element Design: Action
    2.3.2 Element Design: State
    2.3.3 Element Design: Reward
    2.3.4 Element Design: Decision Frequency
  2.4 Simulation-Based Reinforcement Learning "Environment"
  2.5 Deep Reinforcement Learning Methods
    2.5.1 Training Strategies
    2.5.2 Neural Network Architecture
    2.5.3 Exploration Strategies
    2.5.4 Experience Replay Strategies
Chapter 3 Concept of Dynamic Scheduling Based on Deep Reinforcement Learning
  3.1 Dynamic Scheduling Problems
  3.2 Dynamic Scheduling Based on Deep Reinforcement Learning
  3.3 DQN Research Framework for Scheduling Problems
Chapter 4 Reinforcement Learning Analysis: The Job Shop Scheduling Problem as an Example
  4.1 Job Shop Scheduling Problem Scenario
  4.2 Case Analysis of Reinforcement Learning Scheduling
    4.2.1 Description of the Scheduling Case
    4.2.2 Example Design of Reinforcement Learning Elements
    4.2.3 Analysis of Agent-Environment Interaction
  4.3 Analysis and Findings
    4.3.1 Scheduling and Markov Decision Processes
    4.3.2 Deep Reinforcement Learning for Dynamic Scheduling Problems
    4.3.3 Reinforcement Learning for the Job Shop Scheduling Problem with Limited Buffer Capacity
    4.3.4 Scheduling Problems and Reinforcement Learning Issues
Chapter 5 Design of Reinforcement Learning Elements
  5.1 Element Design: State
  5.2 Element Design: Action
  5.3 Element Design: Reward
Chapter 6 Simulation-Based Reinforcement Learning Environment
  6.1 Markov Decision Processes and Discrete Event Simulation
  6.2 Simulation-Based "Environment"
    6.2.1 Simulation-Based "Environment": The JSSP with Buffer Constraints as an Example
    6.2.2 System Analysis Phase
    6.2.3 Event-Driven Reinforcement Learning Algorithm
Chapter 7 Deep Reinforcement Learning Algorithm: DQN as an Example
  7.1 Deep Reinforcement Learning Algorithms
  7.2 Input State Encoding and Neural Network Architecture
  7.3 Exploration Strategy
  7.4 Experience Replay Strategy
Chapter 8 Experimental Results and Analysis
  8.1 Experimental Settings
    8.1.1 Training and Testing Environment Settings
    8.1.2 Experimental Scenarios and Objectives
  8.2 Experiment 1: Decision Frequency
  8.3 Experiment 2: Experience Replay Strategy
  8.4 Experiment 3: Different System Load and Due Date Tightness Scenarios
  8.5 Experiment 4: Steady-State Simulation Scenario
    8.5.1 Impact of Decision Frequency in Steady State
    8.5.2 Impact of Different Learning Events
Chapter 9 Conclusions and Recommendations
  9.1 Conclusions
  9.2 Future Research Directions
References

