Author: Wang, Wan-Yu (王婉瑜)
Title: Integrating Simulation and Reinforcement Learning for Task Allocation Problem in Robotic Mobile Fulfillment System (結合模擬與強化學習於貨到人自動搬運車系統之任務指派問題)
Advisor: Lin, James T. (林則孟)
Committee members: Ting, Ching-Jung (丁慶榮); Liu, Chien-Liang (劉建良)
Degree: Master
Department: Department of Industrial Engineering and Engineering Management, College of Engineering
Year of publication: 2020
Graduation academic year: 108 (2019-2020)
Language: English
Number of pages: 93
Keywords: Automated Guided Vehicle, Task Allocation, Robotic Mobile Fulfillment System, Reinforcement Learning, Deep Learning


Abstract:
    Use of automated guided vehicles (AGVs) is a growing trend in material handling and warehousing. A new type of automated warehousing system, the Robotic Mobile Fulfillment System (RMFS), in which the AGVs are also called robots, such as Amazon's Kiva system, has received great attention in e-commerce order picking. In any system with AGVs, whether a flexible manufacturing system (FMS) or an RMFS, dispatching the vehicles so that jobs are delivered on time is a standing challenge. Most of the literature solves this problem with mathematical models or heuristic algorithms; dynamic environments, however, are rarely addressed. This research therefore applies reinforcement learning (RL) to a dynamic dispatching problem.
    This work views the task allocation problem in an RMFS as a sequential decision-making problem and adopts a deep reinforcement learning approach, proposing a framework in which a simulation model generates the training samples for the RL algorithm. The basic elements of a training sample, namely the state, the action, and the reward, are formulated and designed in this research; the flow of these samples is what connects the model of the environment to the learning of the policy. Two factors in the RMFS, empty-trip time and queue time, are analyzed in depth and incorporated into the reward design. Furthermore, to overcome the curse of dimensionality posed by the intractably large state space, a Deep Q-Network (DQN) is introduced.
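    A minimal Python sketch of the loop this paragraph describes may help: a simulation step emits (state, action, reward, next state) samples, the reward penalizes empty-trip time and queue time, and a small Q-network is nudged toward the temporal-difference target. All names and sizes here (the state encoding, the network, the simulate_step stub) are illustrative assumptions rather than the thesis's implementation, and a full DQN would additionally use experience replay and a target network.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes (assumptions, not from the thesis): a state vector
    # summarizing vehicle/queue status, one discrete action per dispatching rule.
    STATE_DIM, N_ACTIONS, HIDDEN = 8, 4, 32
    GAMMA, LR, EPS = 0.95, 1e-3, 0.1

    # One-hidden-layer Q-network: a deliberately small stand-in for a DQN.
    W1 = rng.normal(0.0, 0.1, (HIDDEN, STATE_DIM)); b1 = np.zeros(HIDDEN)
    W2 = rng.normal(0.0, 0.1, (N_ACTIONS, HIDDEN)); b2 = np.zeros(N_ACTIONS)

    def q_values(s):
        h = np.maximum(0.0, W1 @ s + b1)   # ReLU hidden layer
        return W2 @ h + b2, h

    def select_action(s):
        # Epsilon-greedy choice among the candidate dispatching rules.
        if rng.random() < EPS:
            return int(rng.integers(N_ACTIONS))
        return int(np.argmax(q_values(s)[0]))

    def reward(empty_trip_time, queue_time):
        # Hypothetical reward shaping built on the two factors named above.
        return -(empty_trip_time + queue_time)

    def td_update(s, a, r, s_next):
        # One gradient step toward the TD target r + gamma * max_a' Q(s', a').
        global W1, b1, W2, b2
        q, h = q_values(s)
        delta = q[a] - (r + GAMMA * np.max(q_values(s_next)[0]))
        dh = delta * W2[a] * (h > 0)       # backprop through the ReLU
        W2[a] -= LR * delta * h
        b2[a] -= LR * delta
        W1 -= LR * np.outer(dh, s)
        b1 -= LR * dh

    # Stand-in for the thesis's simulation model: the real one would advance
    # the RMFS, execute the chosen rule, and report the resulting times.
    def simulate_step(state, action):
        next_state = rng.random(STATE_DIM)
        return next_state, reward(empty_trip_time=rng.random(), queue_time=rng.random())

    state = rng.random(STATE_DIM)
    for _ in range(1000):                  # training-sample generation loop
        action = select_action(state)
        next_state, r = simulate_step(state, action)
        td_update(state, action, r, next_state)
        state = next_state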
    To evaluate the deep reinforcement learning methodology, experiments with different AGV fleet sizes and dynamically changing order arrivals are conducted. The simulation results show that, by learning continually from training samples generated out of past experience, the agent approaches the performance of the best single dispatching rule; in highly dynamic environments, it adapts its rule selection to the changing conditions and outperforms the best single rule.
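    Continuing the sketch above (same illustrative names), the comparison described here could be run by freezing the policy and scoring it against each fixed rule; this evaluation loop is again an assumption, not the thesis's experiment code.

    # Hypothetical evaluation, reusing rng, STATE_DIM, N_ACTIONS, q_values, and
    # simulate_step from the previous sketch: score a policy by its average
    # episode reward, acting greedily rather than epsilon-greedily.
    def evaluate(policy, episodes=100, steps=200):
        total = 0.0
        for _ in range(episodes):
            s = rng.random(STATE_DIM)
            for _ in range(steps):
                a = policy(s)
                s, r = simulate_step(s, a)
                total += r
        return total / episodes

    learned = evaluate(lambda s: int(np.argmax(q_values(s)[0])))
    fixed = [evaluate(lambda s, a=a: a) for a in range(N_ACTIONS)]
    print(f"learned policy: {learned:.2f} vs best single rule: {max(fixed):.2f}")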

Table of Contents
摘要 (Abstract in Chinese) i
Abstract ii
1. Introduction 1
    1.1 Research Background and Motivation 1
    1.2 Research Objective 2
2. Robotic Mobile Fulfillment System 4
    2.1. Components 4
    2.2. Agents and Behaviors 7
    2.3. Decisions 8
        2.3.1. Order Assignment 9
        2.3.2. Pod Selection 10
        2.3.3. Task Allocation 11
        2.3.4. Pod Storage Assignment 12
        2.3.5. Path Planning 13
    2.4. Discussion of Research Issues 13
3. Literature Review 16
    3.1. AGV Dispatching Problem 16
    3.2. Reinforcement Learning 17
        3.2.1. RL Introduction 17
        3.2.2. Deep Q Network 19
        3.2.3. RL on AGV Dispatching 21
4. Integrating Simulation and RL for Task Allocation Problem in RMFS 26
    4.1. Problem Statement 26
        4.1.1. Problem Analysis 26
        4.1.2. Problem Definition 28
    4.2. RL Framework for RMFS 31
        4.2.1. RL Elements Design 32
        4.2.2. Simulation-based Training Data Generator 38
        4.2.3. Deep Q Network 41
5. Experiment and Result Analysis 45
    5.1. Experiment Scenario and Setting 45
        5.1.1. Training Stability and Convergence 46
        5.1.2. AGV Fleet Size 47
        5.1.3. Tasks with Different t/p Ratio 54
    5.2. Result and Analysis 62
6. Conclusion and Suggestion 64
    6.1. Conclusion 64
    6.2. Suggestion and Future Works 65
Reference 67
Appendix 73
