
Author: Lin, Yu-Cheng (林育丞)
Thesis Title: Optimization of Dynamic Vehicle Routing Problems Based on Reinforcement Learning (基於強化學習的動態車輛途程問題優化)
Advisor: Yeh, Wei-Chang (葉維彰)
Committee Members: Liang, Yun-Chia (梁韵嘉); Lai, Chyh-Ming (賴智明); Hsieh, Tsung-Jung (謝宗融)
Degree: Master
Department: College of Engineering - Department of Industrial Engineering and Engineering Management
Year of Publication: 2024
Graduation Academic Year: 112 (AY 2023-2024)
Language: Chinese
Number of Pages: 66
Keywords: Dynamic Vehicle Routing Problem, Soft Time Window, Reinforcement Learning, Self-Attention Mechanism
Abstract:
    With the rapid advancement of technology, the maturation of global positioning systems, mobile communications, and cloud computing has driven the emergence of many new business models and shifted the focus of vehicle routing research from static to dynamic problems. In recent years, machine learning has seen increasingly widespread adoption. Among such methods, reinforcement learning stands out for its generalization ability and computational efficiency, making it well suited to dynamic vehicle routing problems, which require real-time adaptation to changing information.
    This study therefore addresses the vehicle routing problem with soft time windows and dynamic travel times using an improved dynamic attention model trained by reinforcement learning. Specifically, it adds feature fusion and time advancement mechanisms to a previously proposed dynamic attention model and simplifies the solution process, so that the model can handle the time windows and dynamic changes inherent in the problem. The model is then trained with reinforcement learning, allowing it to progressively improve its solution quality.
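    The feature fusion idea described above can be pictured with a short sketch. The following is a minimal, hypothetical PyTorch fragment, not the thesis's actual architecture: it embeds static customer attributes (coordinates, demand, time window) and step-dependent dynamic attributes separately, concatenates and re-projects them (the fusion), and applies one self-attention layer. All class names, feature choices, and layer sizes are assumptions for illustration.

    import torch
    import torch.nn as nn

    class FusedAttentionEncoder(nn.Module):
        """Illustrative encoder: static/dynamic feature fusion + self-attention."""
        def __init__(self, static_dim=5, dynamic_dim=2, d_model=128, n_heads=8):
            super().__init__()
            self.static_proj = nn.Linear(static_dim, d_model)    # x, y, demand, tw_start, tw_end
            self.dynamic_proj = nn.Linear(dynamic_dim, d_model)  # e.g. current travel time, elapsed time
            self.fuse = nn.Linear(2 * d_model, d_model)          # feature fusion: concat then re-project
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, static_feats, dynamic_feats):
            # static_feats: (batch, n_nodes, static_dim); dynamic_feats: (batch, n_nodes, dynamic_dim).
            # Dynamic features would be refreshed at each decoding step as the state changes.
            h = self.fuse(torch.cat([self.static_proj(static_feats),
                                     self.dynamic_proj(dynamic_feats)], dim=-1))
            out, _ = self.attn(h, h, h)   # one self-attention layer; a real model stacks several
            return self.norm(h + out)     # residual connection + layer normalization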
    To evaluate the effectiveness of the proposed method, this study first compares it with the original dynamic attention model to verify the improvements, and then with other meta-heuristic methods proposed for dynamic vehicle routing problems. The experimental results show that the proposed improvements yield favorable objective values and training times, and that the proposed method generates better solutions than the competing meta-heuristics in far less time.
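    For context on the training side, the sketch below shows a REINFORCE-style policy-gradient update with a greedy rollout baseline, the standard scheme for attention-based routing models (Kool et al., 2018), together with a linear soft-time-window cost. The record does not state which variant this thesis actually uses; route_cost, the model.rollout interface, and the penalty weights alpha and beta are all hypothetical.

    import torch

    def route_cost(travel_time, arrival, tw_start, tw_end, alpha=0.5, beta=1.0):
        # Soft-time-window objective: travel time plus linear penalties for
        # arriving before tw_start (earliness) or after tw_end (lateness).
        early = torch.clamp(tw_start - arrival, min=0.0).sum(-1)
        late = torch.clamp(arrival - tw_end, min=0.0).sum(-1)
        return travel_time + alpha * early + beta * late

    def train_step(model, baseline_model, instances, optimizer):
        # Sampling rollout: the policy returns per-instance costs and the summed
        # log-probability of its action sequence (hypothetical interface).
        cost, log_prob = model.rollout(instances, greedy=False)
        with torch.no_grad():
            # A frozen copy of the model, decoded greedily, serves as the baseline.
            baseline_cost, _ = baseline_model.rollout(instances, greedy=True)
        # REINFORCE: advantage-weighted log-likelihood; gradient descent lowers
        # the probability of rollouts that cost more than the baseline.
        loss = ((cost - baseline_cost) * log_prob).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()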

Table of Contents:
    Abstract (Chinese) / Abstract (English) / Table of Contents / List of Figures / List of Tables
    1 Introduction
      1.1 Research Background and Motivation
      1.2 Research Objectives
      1.3 Research Process
    2 Literature Review
      2.1 Vehicle Routing Problem
        2.1.1 Definition of the Vehicle Routing Problem
        2.1.2 Types of Vehicle Routing Problems
        2.1.3 Summary
      2.2 Dynamic Vehicle Routing Problem
        2.2.1 Origins of the Dynamic Vehicle Routing Problem
        2.2.2 Solution Methods for the Dynamic Vehicle Routing Problem
        2.2.3 Summary
      2.3 Travel Times in Vehicle Routing Problems
        2.3.1 Time-Dependent Travel Times
        2.3.2 Stochastic Travel Times
        2.3.3 Dynamic Travel Times
        2.3.4 Summary
      2.4 Reinforcement Learning
        2.4.1 Concepts of Reinforcement Learning
        2.4.2 Reinforcement Learning Algorithms
        2.4.3 Neural Network Models
        2.4.4 Applications of Reinforcement Learning to Dynamic Vehicle Routing Problems
        2.4.5 Summary
    3 Methodology
      3.1 Problem Definition
      3.2 Encoder-Decoder Model
      3.3 Reinforcement Learning
    4 Experimental Results and Analysis
      4.1 Training Examples
      4.2 Parameter Analysis
      4.3 Performance Testing
      4.4 Dynamic Effectiveness
    5 Conclusion
    References

