| Graduate student: | Rajshekhar Bhatta |
|---|---|
| Thesis title: | Deep Reinforcement Learning-Driven Preemptive Backup Placement for Software Defined Networks |
| Advisors: | Liu, Kuang-Hao; Wu, Tsai-Fu |
| Committee members: | Huang, C.-H.; Tsai, Meng-Hsun |
| Degree: | Master |
| Department: | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of publication: | 2023 |
| Academic year: | 111 |
| Language: | English |
| Pages: | 50 |
| Keywords: | Deep reinforcement learning (DRL), Greedy Synchronization Algorithm, Network Function Virtualization (NFV), Preemptive Recovery, Service function chaining (SFC), Soft actor-critic (SAC) |
This research implements an approach that integrates a Greedy synchronization algorithm with Deep Reinforcement Learning (DRL) to tackle fault handling and mitigation in Software Defined Networks (SDNs). The primary goal is to develop an intelligent and efficient system that proactively responds to network faults and dynamically adapts its actions to minimize downtime and ensure reliable network operation.
The Greedy synchronization component facilitates swift decision-making by selecting locally optimal actions based on current network conditions. This allows the algorithm to respond rapidly to detected faults and begin the recovery process promptly. The DRL framework, in turn, equips the algorithm with learning and adaptation capabilities, enabling it to optimize its decision-making based on rewards and penalties obtained through interaction with the SDN environment.
The algorithm's design includes a carefully crafted exploration-exploitation strategy that balances exploiting known optimal actions against exploring potential improvements. By navigating the SDN states effectively, the algorithm mitigates the risk of getting stuck in locally optimal solutions, thereby increasing its resilience in fault recovery.
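The greedy step described above can be sketched as scoring candidate hosts for a backup placement and picking the locally best one. This is an illustrative assumption of how such a selection might look; the node attributes, cost function, and function name are hypothetical, not the thesis's exact formulation.

```python
# Hypothetical sketch of a greedy backup-placement step: given the current
# network state, score each candidate node and return the locally optimal
# choice. The attributes and cost rule below are illustrative assumptions.

def greedy_backup_placement(candidate_nodes, demand):
    """Return the feasible node with the lowest cost, or None if none fits."""
    best_node, best_cost = None, float("inf")
    for node in candidate_nodes:
        if node["free_capacity"] < demand:
            continue  # skip nodes that cannot host the backup VNF
        # Lower load and lower latency => lower cost (locally optimal choice)
        cost = node["load"] + node["latency"]
        if cost < best_cost:
            best_node, best_cost = node, cost
    return best_node

nodes = [
    {"id": "n1", "free_capacity": 4, "load": 0.7, "latency": 2.0},
    {"id": "n2", "free_capacity": 8, "load": 0.3, "latency": 1.5},
    {"id": "n3", "free_capacity": 2, "load": 0.1, "latency": 0.5},
]
print(greedy_backup_placement(nodes, demand=3)["id"])  # -> n2
```

Because each decision looks only at the current state, the selection is fast, which is what enables the prompt recovery response described above.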
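A minimal way to express the exploration-exploitation balance is an epsilon-greedy rule: with probability epsilon the agent tries a random recovery action, and otherwise it exploits the best-known one. This is a simplified stand-in for the strategy described above (the thesis's policy is SAC-based); the action names and value table are hypothetical.

```python
import random

# Illustrative epsilon-greedy action selection: explore with probability
# epsilon, otherwise exploit the action with the highest estimated value.
# A simplified stand-in for the exploration-exploitation strategy above.

def select_action(q_values, epsilon, rng=random):
    """q_values: dict mapping action name -> estimated value."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))   # explore a random action
    return max(q_values, key=q_values.get)  # exploit the best-known action

q = {"migrate_vnf": 0.8, "restart_vnf": 0.2, "reroute_flow": 0.5}
print(select_action(q, epsilon=0.0))  # with epsilon=0, always exploits -> migrate_vnf
```

Decaying epsilon over training is a common way to shift from exploration toward exploitation as value estimates improve, which reduces the chance of settling on a locally optimal recovery policy early on.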