
Author: Lin, Yu-Cheng (林育丞)
Thesis Title: Optimization of Dynamic Vehicle Routing Problems Based on Reinforcement Learning (基於強化學習的動態車輛途程問題優化)
Advisor: Yeh, Wei-Chang (葉維彰)
Committee Members: Liang, Yun-Chia (梁韵嘉); Lai, Chyh-Ming (賴智明); Hsieh, Tsung-Jung (謝宗融)
Degree: Master
Department: College of Engineering - Department of Industrial Engineering and Engineering Management
Year of Publication: 2024
Graduation Academic Year: 112 (AY 2023-2024)
Language: Chinese
Number of Pages: 66
Keywords: Dynamic Vehicle Routing Problem, Soft Time Window, Reinforcement Learning, Self-Attention Mechanism
Abstract:
    With the rapid advancement of technology, the maturation of global positioning systems, mobile communications, and cloud computing has driven the emergence of many new business models and shifted the focus of vehicle routing research from static to dynamic problems. In recent years, machine learning has seen increasingly widespread adoption. Among such methods, reinforcement learning stands out for its generalization ability and computational efficiency, making it well suited to dynamic vehicle routing problems, which require real-time adaptation to changing information.
    This study therefore addresses the vehicle routing problem with soft time windows and dynamic travel times using an improved dynamic attention model trained by reinforcement learning. Specifically, it adds feature fusion and time advancement mechanisms to a previously proposed dynamic attention model and simplifies the solution process, so that the model can handle the time windows and dynamic changes inherent in the problem. The model is then trained with reinforcement learning, allowing it to progressively improve its solution quality.
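    The feature fusion idea described above can be pictured with a short sketch. The following is a minimal, hypothetical PyTorch fragment, not the thesis's actual architecture: it embeds static customer attributes (coordinates, demand, time window) and step-dependent dynamic attributes separately, concatenates and re-projects them (the fusion), and applies one self-attention layer. All class names, feature choices, and layer sizes are assumptions for illustration.

    import torch
    import torch.nn as nn

    class FusedAttentionEncoder(nn.Module):
        """Illustrative encoder: static/dynamic feature fusion + self-attention."""
        def __init__(self, static_dim=5, dynamic_dim=2, d_model=128, n_heads=8):
            super().__init__()
            self.static_proj = nn.Linear(static_dim, d_model)    # x, y, demand, tw_start, tw_end
            self.dynamic_proj = nn.Linear(dynamic_dim, d_model)  # e.g. current travel time, elapsed time
            self.fuse = nn.Linear(2 * d_model, d_model)          # feature fusion: concat then re-project
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, static_feats, dynamic_feats):
            # static_feats: (batch, n_nodes, static_dim); dynamic_feats: (batch, n_nodes, dynamic_dim).
            # Dynamic features would be refreshed at each decoding step as the state changes.
            h = self.fuse(torch.cat([self.static_proj(static_feats),
                                     self.dynamic_proj(dynamic_feats)], dim=-1))
            out, _ = self.attn(h, h, h)   # one self-attention layer; a real model stacks several
            return self.norm(h + out)     # residual connection + layer normalization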
    To evaluate the effectiveness of the proposed method, this study first compares it with the original dynamic attention model to verify the improvements, and then with other meta-heuristic methods proposed for dynamic vehicle routing problems. The experimental results show that the proposed improvements yield favorable objective values and training times, and that the proposed method generates better solutions than the competing meta-heuristics in far less time.
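    For context on the training side, the sketch below shows a REINFORCE-style policy-gradient update with a greedy rollout baseline, the standard scheme for attention-based routing models (Kool et al., 2018), together with a linear soft-time-window cost. The record does not state which variant this thesis actually uses; route_cost, the model.rollout interface, and the penalty weights alpha and beta are all hypothetical.

    import torch

    def route_cost(travel_time, arrival, tw_start, tw_end, alpha=0.5, beta=1.0):
        # Soft-time-window objective: travel time plus linear penalties for
        # arriving before tw_start (earliness) or after tw_end (lateness).
        early = torch.clamp(tw_start - arrival, min=0.0).sum(-1)
        late = torch.clamp(arrival - tw_end, min=0.0).sum(-1)
        return travel_time + alpha * early + beta * late

    def train_step(model, baseline_model, instances, optimizer):
        # Sampling rollout: the policy returns per-instance costs and the summed
        # log-probability of its action sequence (hypothetical interface).
        cost, log_prob = model.rollout(instances, greedy=False)
        with torch.no_grad():
            # A frozen copy of the model, decoded greedily, serves as the baseline.
            baseline_cost, _ = baseline_model.rollout(instances, greedy=True)
        # REINFORCE: advantage-weighted log-likelihood; gradient descent lowers
        # the probability of rollouts that cost more than the baseline.
        loss = ((cost - baseline_cost) * log_prob).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()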

Table of Contents:
    Abstract (Chinese) / Abstract (English) / Table of Contents / List of Figures / List of Tables
    1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Process
    2 Literature Review
      2.1 Vehicle Routing Problem
        2.1.1 Definition of the Vehicle Routing Problem
        2.1.2 Types of Vehicle Routing Problems
        2.1.3 Summary
      2.2 Dynamic Vehicle Routing Problem
        2.2.1 Origins of the Dynamic Vehicle Routing Problem
        2.2.2 Solution Methods for the Dynamic Vehicle Routing Problem
        2.2.3 Summary
      2.3 Travel Times in Vehicle Routing Problems
        2.3.1 Time-Dependent Travel Times
        2.3.2 Stochastic Travel Times
        2.3.3 Dynamic Travel Times
        2.3.4 Summary
      2.4 Reinforcement Learning
        2.4.1 Concepts of Reinforcement Learning
        2.4.2 Reinforcement Learning Algorithms
        2.4.3 Neural Network Models
        2.4.4 Applications of Reinforcement Learning to Dynamic Vehicle Routing Problems
        2.4.5 Summary
    3 Methodology
      3.1 Problem Definition
      3.2 Encoder-Decoder Model
      3.3 Reinforcement Learning
    4 Experimental Results and Analysis
      4.1 Training Examples
      4.2 Parameter Analysis
      4.3 Performance Testing
      4.4 Dynamic Effectiveness
    5 Conclusion
    References

