
Author: 陳仕軒 (Chen, Shih-Hsuan)
Title: 模擬為基之強化學習訓練資料探討: 以考量等候區容量限制之零工式排程問題為例
Title (English): Training Data Investigation of a Simulation-Based Reinforcement Learning Environment: Job Shop Scheduling Problem with Limited Buffer as an Example
Advisor: 林則孟 (Lin, James T.)
Committee members: 陳子立 (Chen, Tzu-Li); 林裕訓 (Lin, Yu-Hsun)
Degree: Master
Department: Department of Industrial Engineering and Engineering Management, College of Engineering
Year of publication: 2023
Graduating academic year: 111 (ROC calendar, 2022-2023)
Language: Chinese
Number of pages: 133
Keywords (Chinese): 深度強化學習、零工式排程問題、離散事件模擬、記憶回放策略、決策頻繁度
Keywords (English): Deep reinforcement learning, Job shop scheduling problem, Discrete event simulation, Experience replay strategy, Decision frequency
Abstract: This study applies deep reinforcement learning to scheduling problems, taking the job shop scheduling problem with limited buffer capacity as an example, and investigates how training samples for scheduling problems are generated, with particular attention to the decision frequency and experience replay issues that arise in DQN training. The first challenge in applying reinforcement learning to scheduling is generating training data: because the agent can hardly interact with an actual production scheduling environment, a simulator is needed to reproduce the dynamic events that occur during scheduling. This study therefore proposes a method for constructing a reinforcement learning "environment" based on discrete event simulation. It first defines the reinforcement learning elements such as states, actions, and reward functions, then uses the Systems Modeling Language (SysML) as a system analysis tool to analyze the object structure and behavior of the job shop production scheduling system, and finally converts the simulation model into the "environment" through a simulation-environment interaction framework, so that training samples are generated by simulation.
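To make the simulation-based "environment" idea concrete, the following minimal Python sketch wraps a toy single-machine discrete event simulator with a finite buffer behind a reset/step interface: step() applies the agent's dispatching decision and then advances the event list until the next decision event, which is exactly where one training sample is produced. The class name, state encoding, reward, and event set are illustrative assumptions, not the thesis's actual SysML-derived job shop model.

import heapq
import random

class SingleMachineSimEnv:
    # Toy discrete-event simulator exposed as an RL "environment" (hypothetical
    # names): jobs arrive over time, wait in a finite buffer, and whenever the
    # machine is idle with jobs waiting, control returns to the agent, which
    # picks the next job to process. That hand-over point is the decision event.
    def __init__(self, n_jobs=20, buffer_size=3, seed=0):
        self.n_jobs, self.buffer_size = n_jobs, buffer_size
        self.rng = random.Random(seed)

    def reset(self):
        self.clock = 0.0
        self.machine_free_at = 0.0
        self.done_jobs = 0
        self.total_tardiness = 0.0
        self.buffer = []               # waiting jobs as (processing_time, due_date)
        self.events = []               # min-heap of (time, kind)
        t = 0.0
        for _ in range(self.n_jobs):   # pre-schedule all job arrivals
            t += self.rng.expovariate(1.0)
            heapq.heappush(self.events, (t, "arrival"))
        return self._run_to_decision()

    def step(self, action):
        # action = index of the buffered job to process next (clamped for safety)
        proc, due = self.buffer.pop(min(action, len(self.buffer) - 1))
        finish = max(self.clock, self.machine_free_at) + proc
        self.machine_free_at = finish
        heapq.heappush(self.events, (finish, "departure"))
        self.total_tardiness += max(0.0, finish - due)
        state = self._run_to_decision()
        done = self.done_jobs == self.n_jobs
        reward = -self.total_tardiness if done else 0.0   # illustrative sparse reward
        return state, reward, done

    def _run_to_decision(self):
        # Advance the simulation event by event; stop only at a decision event.
        while self.events:
            self.clock, kind = heapq.heappop(self.events)
            if kind == "arrival":
                if len(self.buffer) < self.buffer_size:
                    self.buffer.append((self.rng.uniform(1, 3),
                                        self.clock + self.rng.uniform(5, 15)))
                else:                  # buffer full: the arrival retries later
                    heapq.heappush(self.events, (self.clock + 1.0, "arrival"))
            else:                      # "departure": a job left the machine
                self.done_jobs += 1
            if self.buffer and self.clock >= self.machine_free_at:
                break                  # machine idle and jobs waiting: ask the agent
        return (len(self.buffer), self.clock - self.machine_free_at)

In the thesis the simulation model is built from the SysML analysis of the job shop and the state, action, and reward designs are far richer; the point of the sketch is only the interaction framework in which the simulator, rather than a real shop floor, supplies the training samples.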
The decision epoch, the point in time at which the agent makes a decision in a reinforcement learning problem, determines the interval at which training data are generated and is therefore an important issue in training sample generation. This study considers the influence of decision frequency and decision events in scheduling problems and proposes a simulation-based DQN algorithm through which different decision events can be tried during training. The experimental results confirm that an optimal decision frequency exists; too many or too few decisions both degrade the agent's learning. Using DQN as the reinforcement learning algorithm, this study also examines its experience replay strategy: whereas the conventional strategy draws minibatches of training samples uniformly at random from the replay memory, this study proposes stratified sampling so that every subset is appropriately represented in each batch. The experimental results show that the two improvements, decision frequency and the stratified sampling strategy, accelerate the convergence of agent training, and that under different system load and due date tightness scenarios the DQN outperforms any single dispatching rule, demonstrating the agent's ability to make appropriate decisions based on the system state.
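The stratified replay idea can likewise be sketched in a few lines. The grouping key below (for illustration, whatever label the caller attaches to a transition, such as the type of decision event that produced it) is an assumption; the abstract only states that stratified sampling keeps every subset of the memory appropriately represented in each minibatch, in contrast to plain uniform sampling.

import random
from collections import defaultdict, deque

class StratifiedReplayBuffer:
    # Sketch of stratified experience replay (hypothetical class): transitions
    # are stored in per-stratum queues, and each minibatch takes a roughly equal
    # share from every non-empty stratum instead of sampling the whole memory
    # uniformly at random.
    def __init__(self, capacity_per_stratum=10_000, seed=0):
        self.strata = defaultdict(lambda: deque(maxlen=capacity_per_stratum))
        self.rng = random.Random(seed)

    def push(self, stratum_key, transition):
        # transition = (state, action, reward, next_state, done)
        self.strata[stratum_key].append(transition)

    def __len__(self):
        return sum(len(q) for q in self.strata.values())

    def sample(self, batch_size):
        keys = [k for k, q in self.strata.items() if q]
        if not keys:
            return []
        share, remainder = divmod(batch_size, len(keys))
        batch = []
        for i, key in enumerate(keys):
            n = share + (1 if i < remainder else 0)
            # draw with replacement so small strata can still fill their share
            batch.extend(self.rng.choices(self.strata[key], k=n))
        self.rng.shuffle(batch)
        return batch

Equal shares per stratum is only one possible allocation; proportional quotas or any other allocation rule would fit the same push/sample interface.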


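Putting the two sketches together, a simulation-based DQN training loop might look roughly as follows. Everything here, from the tiny two-feature state and three-way action space to the stratification rule and the hyperparameters, is a placeholder chosen to stay consistent with the hypothetical SingleMachineSimEnv and StratifiedReplayBuffer above; it is not the thesis's actual network, decision events, or parameter settings.

import random
import torch
import torch.nn as nn

# Hypothetical hyperparameters for the sketch only.
MAX_ACTIONS, GAMMA, EPSILON, BATCH_SIZE, WARMUP = 3, 0.99, 0.1, 32, 500

def make_q_net():
    # Tiny network over the toy 2-feature state; the thesis's state encoding
    # and architecture are richer.
    return nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, MAX_ACTIONS))

q_net, target_net = make_q_net(), make_q_net()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
env, memory = SingleMachineSimEnv(), StratifiedReplayBuffer()

for episode in range(200):
    state, done = env.reset(), False
    while not done:
        # Epsilon-greedy choice among the first MAX_ACTIONS buffered jobs.
        if random.random() < EPSILON:
            action = random.randrange(MAX_ACTIONS)
        else:
            with torch.no_grad():
                action = int(q_net(torch.tensor(state, dtype=torch.float32)).argmax())
        next_state, reward, done = env.step(action)

        # Illustrative stratification: had the machine been starved before deciding?
        stratum = "starved" if state[1] > 0 else "busy"
        memory.push(stratum, (state, action, reward, next_state, done))
        state = next_state

        if len(memory) >= WARMUP:
            s, a, r, s2, d = map(list, zip(*memory.sample(BATCH_SIZE)))
            s = torch.tensor(s, dtype=torch.float32)
            s2 = torch.tensor(s2, dtype=torch.float32)
            a = torch.tensor(a).unsqueeze(1)
            r = torch.tensor(r, dtype=torch.float32)
            d = torch.tensor(d, dtype=torch.float32)
            q = q_net(s).gather(1, a).squeeze(1)
            with torch.no_grad():
                target = r + GAMMA * (1.0 - d) * target_net(s2).max(1).values
            loss = nn.functional.mse_loss(q, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    if episode % 10 == 0:               # periodic target-network sync
        target_net.load_state_dict(q_net.state_dict())

In the thesis the decision events themselves are among the things the simulation-based DQN algorithm varies during training, which is what the decision frequency experiments measure; the fixed event set in this sketch is a simplification.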

Table of Contents:
Abstract
Acknowledgements
Table of Contents
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives
Chapter 2 Literature Review
  2.1 Machine Learning Applied to Scheduling Problems
    2.1.1 Artificial Neural Networks
    2.1.2 Reinforcement Learning
  2.2 Types of Job Shop Scheduling Problems
  2.3 Design of Reinforcement Learning Elements for Scheduling Problems
    2.3.1 Element Design: Action
    2.3.2 Element Design: State
    2.3.3 Element Design: Reward
    2.3.4 Element Design: Decision Frequency
  2.4 Simulation-Based Reinforcement Learning "Environment"
  2.5 Deep Reinforcement Learning Methods
    2.5.1 Training Strategies
    2.5.2 Neural Network Architecture
    2.5.3 Exploration Strategies
    2.5.4 Experience Replay Strategies
Chapter 3 Concept of Dynamic Scheduling Based on Deep Reinforcement Learning
  3.1 Dynamic Scheduling Problems
  3.2 Dynamic Scheduling Based on Deep Reinforcement Learning
  3.3 DQN Research Framework for Scheduling Problems
Chapter 4 Reinforcement Learning Analysis: The Job Shop Scheduling Problem as an Example
  4.1 Job Shop Scheduling Problem Scenario
  4.2 Case Analysis of Reinforcement Learning Scheduling
    4.2.1 Description of the Scheduling Case
    4.2.2 Example Design of Reinforcement Learning Elements
    4.2.3 Analysis of Agent-Environment Interaction
  4.3 Analysis and Findings
    4.3.1 Scheduling and Markov Decision Processes
    4.3.2 Deep Reinforcement Learning for Dynamic Scheduling Problems
    4.3.3 Reinforcement Learning for the Job Shop Scheduling Problem with Limited Buffer Capacity
    4.3.4 Scheduling Problems and Reinforcement Learning Issues
Chapter 5 Design of Reinforcement Learning Elements
  5.1 Element Design: State
  5.2 Element Design: Action
  5.3 Element Design: Reward
Chapter 6 Simulation-Based Reinforcement Learning Environment
  6.1 Markov Decision Processes and Discrete Event Simulation
  6.2 Simulation-Based "Environment"
    6.2.1 Simulation-Based "Environment": The JSSP with Buffer Constraints as an Example
    6.2.2 System Analysis Phase
    6.2.3 Event-Driven Reinforcement Learning Algorithm
Chapter 7 Deep Reinforcement Learning Algorithm: DQN as an Example
  7.1 Deep Reinforcement Learning Algorithms
  7.2 Input State Encoding and Neural Network Architecture
  7.3 Exploration Strategy
  7.4 Experience Replay Strategy
Chapter 8 Experimental Results and Analysis
  8.1 Experimental Settings
    8.1.1 Training and Testing Environment Settings
    8.1.2 Experimental Scenarios and Objectives
  8.2 Experiment 1: Decision Frequency
  8.3 Experiment 2: Experience Replay Strategy
  8.4 Experiment 3: Different System Load and Due Date Tightness Scenarios
  8.5 Experiment 4: Steady-State Simulation Scenario
    8.5.1 Impact of Decision Frequency in Steady State
    8.5.2 Impact of Different Learning Events
Chapter 9 Conclusions and Recommendations
  9.1 Conclusions
  9.2 Future Research Directions
References

