
Author: 李宗霖 (Lee, Tsung-Lin)
Title: 模擬為基之深度強化學習於零工式生產排程 (Simulation-based Deep Reinforcement Learning for Job Shop Scheduling Problem)
Advisor: 林則孟 (Lin, James T.)
Committee members: 陳勝一 (Chen, Sheng-I); 林裕訓 (Lin, Yu-Hsun)
Degree: Master
Department: College of Engineering, Intelligent Manufacturing & Intelligent Motor Electronic Control Master Program of Industry
Publication year: 2022
Graduation academic year: 110
Language: Chinese
Pages: 108
Chinese keywords: 深度強化學習、動態排程問題、離散事件模擬
English keywords: Deep Reinforcement Learning, Dynamic Scheduling, Discrete Event Simulation
    This thesis investigates deep reinforcement learning for the job shop scheduling problem. Under the dynamic scenario created by jobs arriving dynamically in the system, the Deep Q Network (DQN) method is adopted to study dynamic scheduling and centralized dispatching, with the objective of minimizing the mean flow time of jobs over a period of time.
    The reinforcement learning "environment" is constructed on the basis of simulation. Discrete event simulation is combined with reinforcement learning: time is advanced by the next-event method, which replaces the unknown state transition probabilities of the production system. Following the OpenAI Gym framework, state transition interface functions are introduced into the simulation model so that it acts as an environment that interacts with the learning agent. The agent observes the current state of the environment and decides an action; the environment executes the action, performs the state transition, and returns a reward. Training samples are collected from the system transitions produced by this repeated interaction and are used to train the reinforcement learning agent.
    This thesis proposes a simulation-based deep reinforcement learning approach and applies it to a job shop production system. The reinforcement learning elements, namely the state, action, and reward function, are defined first. A Deep Q Network based on a self-attention mechanism is then constructed; by encoding the state so that the temporal information of dynamic job arrivals is preserved, the agent learns from past experience the relationship between states and dispatching rules and explores more potentially optimal decisions.
    The experimental results verify that, with an appropriate reward design and neural network model, the deep reinforcement learning method learns to adjust dispatching in real time according to the system state and to adapt its decisions to different dynamic scenarios in the environment. Under dynamic job arrivals it achieves performance beyond the best single dispatching rule, demonstrating the advantage of reinforcement learning in dynamic decision problems.


    This research applies deep reinforcement learning to the job shop scheduling problem. Considering a system with dynamic job arrivals, the Deep Q Network learning algorithm is used to minimize the mean flow time over a period of time for the dynamic scheduling and centralized dispatching problem.
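    To make the objective and the action space concrete, the following minimal Python sketch computes the mean flow time of a set of finished jobs and shows two classic dispatching rules a centralized agent could choose between. The Job fields and the rule set are illustrative assumptions, not the thesis's exact definitions.

        from dataclasses import dataclass
        from typing import List

        @dataclass
        class Job:                      # hypothetical job record, not the thesis's data model
            arrival_time: float         # when the job entered the shop
            completion_time: float      # when its last operation finished (finished jobs only)
            next_proc_time: float       # processing time of its next operation

        def mean_flow_time(finished: List[Job]) -> float:
            # Objective: average of (completion time - arrival time) over finished jobs.
            return sum(j.completion_time - j.arrival_time for j in finished) / len(finished)

        def fifo(queue: List[Job]) -> Job:
            # First In First Out: dispatch the job that arrived earliest.
            return min(queue, key=lambda j: j.arrival_time)

        def spt(queue: List[Job]) -> Job:
            # Shortest Processing Time: dispatch the job with the shortest next operation.
            return min(queue, key=lambda j: j.next_proc_time)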
    The "environment" of reinforcement learning is constructed based on simulation. Using discrete event simulation, time is advanced with the next-event method, so state transitions are produced by the simulation itself rather than by the unknown transition probabilities of the production system. This study introduces OpenAI Gym-compatible interfaces into the simulation model so that the "environment" can interact with the RL agent. When time advances to a decision point, the agent observes the state from the environment and decides an action; the environment then executes the action, transitions to the next state, and feeds back a reward. By collecting such state transitions as training samples, the RL agent is trained from experience.
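    The coupling between a SimPy simulation and the agent can be sketched with a Gym-style reset/step interface: the machine process pauses at each decision point, step() supplies the chosen dispatching rule, and the simulation then advances by the next-event method to the following decision point. This is a minimal single-machine illustration under assumed arrival, processing-time, state, and reward definitions, not the thesis's job shop model.

        import random
        import simpy

        class DispatchEnv:
            """Gym-style wrapper around a SimPy model: one machine, dynamic job arrivals."""

            RULES = ("FIFO", "SPT")  # action space: which rule picks the next job

            def __init__(self, arrival_rate=0.5, horizon=1000.0, seed=0):
                self.arrival_rate, self.horizon = arrival_rate, horizon
                self.rng = random.Random(seed)

            def reset(self):
                self.sim = simpy.Environment()
                self.queue = []            # waiting jobs: (arrival_time, proc_time)
                self.finished = []         # finished jobs: (arrival_time, completion_time)
                self.prev_done = 0
                self.need_action = self.sim.event()   # fired by the machine at a decision point
                self.action_ready = self.sim.event()  # fired by step() once an action is chosen
                self.sim.process(self._arrivals())
                self.sim.process(self._machine())
                self.sim.run(until=self.need_action)  # advance to the first decision point
                return self._state()

            def step(self, action):
                self.chosen = action
                ready, self.action_ready = self.action_ready, self.sim.event()
                self.need_action = self.sim.event()
                ready.succeed()                        # wake the machine with the chosen rule
                self.sim.run(until=self.need_action)   # next-event advance to the next decision
                newly_done = self.finished[self.prev_done:]
                self.prev_done = len(self.finished)
                reward = -sum(c - a for a, c in newly_done)  # illustrative flow-time penalty
                return self._state(), reward, self.sim.now >= self.horizon, {}

            def _arrivals(self):                       # dynamic (Poisson) job arrivals
                while True:
                    yield self.sim.timeout(self.rng.expovariate(self.arrival_rate))
                    self.queue.append((self.sim.now, self.rng.uniform(1.0, 5.0)))

            def _machine(self):
                while True:
                    while not self.queue:              # idle: poll until a job exists
                        yield self.sim.timeout(0.1)
                    self.need_action.succeed()         # ask the agent for a decision
                    yield self.action_ready            # block until step() supplies it
                    key = (lambda j: j[0]) if self.RULES[self.chosen] == "FIFO" else (lambda j: j[1])
                    job = min(self.queue, key=key)
                    self.queue.remove(job)
                    yield self.sim.timeout(job[1])     # process the selected job
                    self.finished.append((job[0], self.sim.now))

            def _state(self):                          # illustrative observation
                return [len(self.queue), self.sim.now]

    The agent loop is then the usual one: observe the state returned by reset() or step(), pick a dispatching-rule index with the Q-network, call step(action), and store the resulting transition for DQN training.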
    This research proposes a simulation-based deep reinforcement learning approach for the job shop scheduling problem. First, the RL elements such as the state, action, and reward are defined, and then a Deep Q Network with a self-attention module is constructed. By encoding the state with the temporal information about the dynamic arrival of jobs, the agent is able to learn the relationship between the state and the dispatching rule from past experience and to explore better decisions.
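    A possible shape of such a network is sketched below in PyTorch: each queued job is embedded from a feature vector that can include its arrival time, self-attention relates the jobs to one another, and the pooled representation is mapped to one Q-value per dispatching rule. The feature dimension, number of rules, and pooling are illustrative assumptions rather than the thesis's exact architecture.

        import torch
        import torch.nn as nn

        class SelfAttentionDQN(nn.Module):
            """Q-network over a set of queued jobs; outputs one Q-value per dispatching rule."""

            def __init__(self, job_feat_dim=6, embed_dim=64, n_heads=4, n_rules=5):
                super().__init__()
                self.embed = nn.Linear(job_feat_dim, embed_dim)   # per-job encoding
                self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
                self.q_head = nn.Sequential(
                    nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, n_rules)
                )

            def forward(self, jobs, pad_mask=None):
                # jobs: (batch, n_jobs, job_feat_dim); features may include arrival time,
                # waiting time, and remaining processing time of each queued job.
                h = torch.relu(self.embed(jobs))
                h, _ = self.attn(h, h, h, key_padding_mask=pad_mask)  # relate jobs to each other
                pooled = h.mean(dim=1)        # simple pooling; a real model would mask padding here
                return self.q_head(pooled)    # (batch, n_rules)

        # Greedy action selection for one observed state (10 queued jobs, 6 features each).
        net = SelfAttentionDQN()
        state = torch.randn(1, 10, 6)
        rule_index = net(state).argmax(dim=-1)   # index of the dispatching rule to apply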
    The experimental results verify that, with appropriate designs of the reward function and the neural network architecture, the deep reinforcement learning dynamic scheduling method can outperform traditional dispatching rules.

    Abstract (Chinese) i
    Abstract (English) ii
    Acknowledgements iii
    Chapter 1 Introduction 1
      1.1 Research Background and Motivation 1
      1.2 Research Objectives 2
    Chapter 2 Literature Review 4
      2.1 Scheduling Problems 4
        2.1.1 Static and Dynamic Scheduling 4
        2.1.2 Solution Methods for Scheduling Problems 6
      2.2 Literature on Reinforcement Learning for Dynamic Scheduling 8
        2.2.1 Reinforcement Learning for Dynamic Scheduling Problems 8
        2.2.2 Summary of Reinforcement Learning for Dynamic Scheduling 10
      2.3 Deep Reinforcement Learning 13
        2.3.1 Reinforcement Learning 13
        2.3.2 Markov Decision Process 14
        2.3.3 Deep Reinforcement Learning 14
        2.3.4 Deep Q Network (DQN) 17
    Chapter 3 Concept of Dynamic Scheduling Based on Deep Reinforcement Learning 23
      3.1 Dynamic Scheduling Problem 23
      3.2 Dynamic Scheduling Based on Deep Reinforcement Learning 24
      3.3 Issues Faced in Deep Q Network Learning 26
    Chapter 4 Problem Analysis and Definition 30
      4.1 Problem Description 30
      4.2 Analysis of the Dynamic Scheduling and Dispatching Problem (Case Illustration) 31
      4.3 Reinforcement Learning for Job Shop Scheduling and Dispatching 48
      4.4 Design of Reinforcement Learning Elements 50
    Chapter 5 Simulation-Based Reinforcement Learning Environment 54
      5.1 Discrete Event Simulation for Reinforcement Learning 54
      5.2 Learning Events 56
      5.3 Constructing the Discrete Event Simulation "Environment" for Scheduling and Dispatching 57
      5.4 SimPy Simulation Program 67
    Chapter 6 Construction of the Deep Reinforcement Learning Model 75
      6.1 Building the DQN Neural Network 76
        6.1.1 Network Architecture 77
        6.1.2 State Encoding 78
        6.1.3 Action Selection 80
      6.2 DQN Training Strategy 81
      6.3 DQN Training Procedure Based on the Simulation as Environment 84
    Chapter 7 Experimental Results and Analysis 87
      7.1 Experimental Environment 87
        7.1.1 Experimental Platform 87
        7.1.2 Experimental Settings 88
      7.2 Experimental Scenario Design and Results 88
        7.2.1 Experiment 1: Deterministic Static Environment and DQN Learning Performance 89
        7.2.2 Experiment 2: Effect of Neural Network Design on Dispatching Decisions 95
        7.2.3 Experiment 3: Varying Order Arrival Rates 98
        7.2.4 Summary 100
    Chapter 8 Conclusions and Suggestions 102
      8.1 Conclusions 102
      8.2 Suggestions 103
    References 105

    1. 林則孟, "Production Planning and Control" (in Chinese), 華泰出版社, 2012.
    2. 林則孟、李宗霖, "Simulation-based Deep Reinforcement Learning for the Job Shop Scheduling Problem" (in Chinese), Annual Meeting and Conference of the Chinese Institute of Industrial Engineers, 2021.
    3. 林則孟、羅士銓, "Building a Reinforcement Learning Environment for Production Systems by Combining SysML and Discrete Event Simulation" (in Chinese), Annual Meeting and Conference of the Chinese Institute of Industrial Engineers, 2020.
    4. 黃怡嘉, "A Study of Reinforcement Learning for the Hybrid Flow Shop Scheduling Problem" (in Chinese), Master's thesis, Institute of Industrial Engineering, National Tsing Hua University, 2020.
    5. 羅士銓, "A Study on Applying Discrete Event Simulation to Reinforcement Learning" (in Chinese), Master's thesis, Institute of Industrial Engineering, National Tsing Hua University, 2020.
    6. Allen, M., K. Pearn, and T. Monks, "Developing an OpenAI Gym-compatible framework and simulation environment for testing Deep Reinforcement Learning agents solving the Ambulance Location Problem". arXiv preprint arXiv:2101.04434, 2021.
    7. Arulkumaran, K., et al., "A brief survey of deep reinforcement learning". arXiv preprint arXiv:1708.05866, 2017.
    8. Barto, A.G., R.S. Sutton, and C.W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems". IEEE Transactions on Systems, Man, and Cybernetics, 1983. SMC-13(5): p. 834-846.
    9. Battaglia, P.W., et al., "Relational inductive biases, deep learning, and graph networks". arXiv preprint arXiv:1806.01261, 2018.
    10. Beasley, J.E., "OR-Library: distributing test problems by electronic mail". Journal of the Operational Research Society, 1990. 41(11): p. 1069-1072.
    11. Bertel, S. and J.-C. Billaut, "A genetic algorithm for an industrial multiprocessor flow shop scheduling problem with recirculation". European Journal of Operational Research, 2004. 159(3): p. 651-662.
    12. Brockman, G., et al., "OpenAI Gym". arXiv preprint arXiv:1606.01540, 2016.
    13. de Lara, J., et al., "Domain-specific discrete event modelling and simulation using graph transformation". Software & Systems Modeling, 2014. 13(1): p. 209-238.
    14. Dominic, P.D., S. Kaliyamoorthy, and M.S. Kumar, "Efficient dispatching rules for dynamic job shop scheduling". The International Journal of Advanced Manufacturing Technology, 2004. 24(1): p. 70-75.
    15. Fortunato, M., et al., "Noisy networks for exploration". arXiv preprint arXiv:1706.10295, 2017.
    16. François-Lavet, V., et al., "An introduction to deep reinforcement learning". arXiv preprint arXiv:1811.12560, 2018.
    17. Garey, M.R. and D.S. Johnson, "Computers and Intractability; A Guide to the Theory of NP-Completeness". 1990: W. H. Freeman & Co.
    18. Han, B.-A. and J.-J. Yang, "Research on adaptive job shop scheduling problems based on dueling double DQN". IEEE Access, 2020. 8: p. 186474-186495.
    19. Han, W., F. Guo, and X. Su, "A reinforcement learning method for a hybrid flow-shop scheduling problem". Algorithms, 2019. 12(11): p. 222.
    20. Li, M.P., et al. "Simulation analysis of a deep reinforcement learning approach for task selection by autonomous material handling vehicles". 2018 Winter Simulation Conference (WSC). 2018. IEEE.
    21. Liao, C.-J. and C.-T. You, "An improved formulation for the job-shop scheduling problem". Journal of the Operational Research Society, 1992. 43(11): p. 1047-1054.
    22. Lillicrap, T.P., et al., "Continuous control with deep reinforcement learning". arXiv preprint arXiv:1509.02971, 2016.
    23. Liu, C.-L., C.-C. Chang, and C.-J. Tseng, "Actor-critic deep reinforcement learning for solving job shop scheduling problems". IEEE Access, 2020. 8: p. 71752-71762.
    24. Lu, M.-S. and R. Romanowski, "Multicontextual dispatching rules for job shops with dynamic job arrival". The International Journal of Advanced Manufacturing Technology, 2013. 67(1-4): p. 19-33.
    25. Mnih, V., et al. "Asynchronous methods for deep reinforcement learning". International conference on machine learning. 2016.
    26. Mnih, V., et al., "Playing atari with deep reinforcement learning". arXiv preprint arXiv:1312.5602, 2013.
    27. Mnih, V., et al., "Human-level control through deep reinforcement learning". Nature, 2015. 518(7540): p. 529-533.
    28. Ouelhadj, D. and S. Petrovic, "A survey of dynamic scheduling in manufacturing systems". Journal of Scheduling, 2009. 12(4): p. 417-431.
    29. Pan, J.C.-H. and J.-S. Chen, "Mixed binary integer programming formulations for the reentrant job shop scheduling problem". Computers & Operations Research, 2005. 32(5): p. 1197-1212.
    30. Park, J., et al., "Learning to schedule job-shop problems: representation and policy learning using graph neural network and reinforcement learning". International Journal of Production Research, 2021. 59(11): p. 3360-3377.
    31. Pfeiffer, A., B. Kádár, and L. Monostori, "Stability-oriented evaluation of rescheduling strategies, by using simulation". Computers in Industry, 2007. 58(7): p. 630-643.
    32. Priore, P., et al., "Dynamic scheduling of manufacturing systems using machine learning: An updated review". AI EDAM, 2014. 28(1): p. 83-97.
    33. Ren, J., C. Ye, and F. Yang, "A novel solution to JSPS based on long short-term memory and policy gradient algorithm". International Journal of Simulation Modelling, 2020. 19(1): p. 157-168.
    34. Renke, L., R. Piplani, and C. Toro, "A Review of Dynamic Scheduling: Context, Techniques and Prospects". Implementing Industry 4.0, 2021: p. 229.
    35. Schaul, T., et al., "Prioritized experience replay". arXiv preprint arXiv:1511.05952, 2015.
    36. Schulman, J., et al., "Proximal policy optimization algorithms". arXiv preprint arXiv:1707.06347, 2017.
    37. Sewak, M., "Deep reinforcement learning". 2019: Springer.
    38. Shi, D., et al., "Intelligent scheduling of discrete automated production line via deep reinforcement learning". International Journal of Production Research, 2020. 58(11): p. 3362-3380.
    39. Silver, D., et al., "Mastering the game of Go with deep neural networks and tree search". Nature, 2016. 529(7587): p. 484-489.
    40. Silver, D., et al., "Mastering the game of Go without human knowledge". Nature, 2017. 550(7676): p. 354-359.
    41. Singh, S., et al., "Convergence results for single-step on-policy reinforcement-learning algorithms". Machine Learning, 2000. 38(3): p. 287-308.
    42. Stricker, N., et al., "Reinforcement learning for adaptive order dispatching in the semiconductor industry". CIRP Annals, 2018. 67(1): p. 511-514.
    43. Sutton, R.S. and A.G. Barto, "Reinforcement learning: An introduction". 2018: MIT press.
    44. Tang, L.-L., Y. Yih, and C.-Y. Liu, "A study on decision rules of a scheduling model in an FMS". Computers in Industry, 1993. 22(1): p. 1-13.
    45. Vallada, E. and R. Ruiz, "A genetic algorithm for the unrelated parallel machine scheduling problem with sequence dependent setup times". European Journal of Operational Research, 2011. 211(3): p. 612-622.
    46. Wang, L., et al., "Dynamic job-shop scheduling in smart manufacturing using deep reinforcement learning". Computer Networks, 2021. 190: p. 107969.
    47. Wang, Y.-C. and J.M. Usher, "Application of reinforcement learning for agent-based production scheduling". Engineering Applications of Artificial Intelligence, 2005. 18(1): p. 73-82.
    48. Waschneck, B., et al. "Deep reinforcement learning for semiconductor production scheduling". 2018 29th annual SEMI advanced semiconductor manufacturing conference (ASMC). 2018. IEEE.
    49. Watkins, C.J.C.H., "Learning from delayed rewards". PhD thesis, University of Cambridge, 1989.
    50. Williams, R.J., "Simple statistical gradient-following algorithms for connectionist reinforcement learning". Machine Learning, 1992. 8(3): p. 229-256.
    51. Wu, C., et al., "UAV autonomous target search based on deep reinforcement learning in complex disaster scene". IEEE Access, 2019. 7: p. 117227-117245.
    52. Yuan, B., L. Wang, and Z. Jiang. "Dynamic parallel machine scheduling using the learning agent". 2013 IEEE international conference on industrial engineering and engineering management. 2013. IEEE.
    53. Zambaldi, V., et al. "Deep reinforcement learning with relational inductive biases". International Conference on Learning Representations. 2018.
    54. Zhang, T., S. Xie, and O. Rose. "Real-time job shop scheduling based on simulation and Markov decision processes". 2017 Winter Simulation Conference (WSC). 2017. IEEE.
    55. Zhang, Z., et al., "Minimizing mean weighted tardiness in unrelated parallel machine scheduling with reinforcement learning". Computers & Operations Research, 2012. 39(7): p. 1315-1324.
    56. Zhang, Z., L. Zheng, and M.X. Weng, "Dynamic parallel machine scheduling with mean weighted tardiness objective by Q-Learning". The International Journal of Advanced Manufacturing Technology, 2007. 34(9): p. 968-980.
    57. Zhao, Y., et al., "Dynamic Jobshop Scheduling Algorithm Based on Deep Q Network". IEEE Access, 2021. 9: p. 122995-123011.
