Graduate Student: Lee, Hsiang-Yu (李相宇)
Thesis Title: Multi-connectivity Handover Optimization Using Sequence To Sequence Reinforcement Learning in Cellular Network (蜂巢式網路架構內利用序列到序列模型強化式學習實現多連線換手最佳化)
Advisor: Tsai, Ming-Jer (蔡明哲)
Oral Defense Committee: Kuo, Tung-Wei (郭桐惟); Chang, Shih-Ying (張仕穎); Kuo, Jian-Jhih (郭建志)
Degree: Master (碩士)
Department: College of Electrical Engineering and Computer Science – Institute of Information Systems and Applications
Year of Publication: 2023
Academic Year of Graduation: 111 (2022–2023)
Language: English
Number of Pages: 20
Chinese Keywords: 換手、蜂巢式網路、強化式學習、序列至序列模型、深度學習
English Keywords: Handover, Cellular Network, Reinforcement Learning, Sequence to Sequence Model, Deep Learning
In cellular networks, dual-connectivity or multi-connectivity techniques are indispensable for meeting users' growing bandwidth demands and achieving faster, more stable connections. Under a multi-connectivity architecture, each time a user selects base stations for a handover, one or more base stations can be chosen to satisfy the user's needs. In such a dynamic environment, every handover by every user affects the network, so the problem to solve is how to correctly choose handover targets in response to these changes, allocate bandwidth reasonably, and satisfy the needs of most users. To address this problem, we propose a method that selects handover targets using a sequence-to-sequence model combined with deep reinforcement learning.
Our method improves on prior reinforcement learning approaches in this field, which must enumerate all possible base-station combinations and are therefore hard to scale. By viewing the handover problem as a multi-label classification problem in machine learning, we exploit the sequence-to-sequence model's ability to output a sequence, achieving flexible scalability without affecting performance. Through simulation experiments we find that, thanks to a more advanced reinforcement learning method, our approach obtains better results than earlier reinforcement learning methods. We also experimentally verify the benefit of the sequence-to-sequence model, finding that it accelerates training and reaches good results sooner, which increases practicality.
In cellular network architectures, dual- or multi-connectivity is indispensable for coping with users' increasing bandwidth demands and achieving faster, more stable connections. In a multi-connectivity architecture, each time a handover is triggered, one or more base stations can be selected to meet the user's needs. In such a dynamic environment, each handover by every user affects the network environment and is hard to predict. The problem to be solved is therefore how to correctly choose handover targets to adapt to these changes, allocate bandwidth reasonably, and meet the needs of the majority of users. To solve this problem, we propose a method that combines a sequence-to-sequence model with deep reinforcement learning to select handover targets.
Our proposed method addresses a drawback of previous reinforcement learning methods in this field, which require listing all possible base-station combinations and are therefore difficult to scale up. By casting the handover problem as a multi-label classification problem in machine learning and exploiting the sequence-to-sequence model's sequential output, we achieve flexible scalability without degrading performance. Simulation experiments show that, thanks to a more advanced reinforcement learning method, our approach achieves better results than previous reinforcement learning approaches. Additionally, we validate the benefits of the sequence-to-sequence model experimentally, finding that it speeds up training and yields good results faster, thereby increasing practicality.
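As a rough illustration of the core idea (selecting a variable-size set of base stations by emitting them one at a time, rather than scoring every combination), the following toy sketch mimics an encoder-decoder loop with plain linear algebra. Everything here is an illustrative assumption: the real thesis trains a neural sequence-to-sequence policy with deep reinforcement learning, whereas this sketch pools base-station features into a context vector, then greedily emits indices, masking already-chosen stations and stopping when no candidate scores above zero (a stand-in for a learned STOP token).

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(features):
    # Stand-in for a learned encoder: mean-pool per-base-station
    # feature vectors into a single context vector.
    return features.mean(axis=0)

def decode(features, context, max_picks):
    # Autoregressively emit base-station indices, one per step.
    picked = []
    state = context.copy()
    for _ in range(max_picks):
        scores = features @ state      # one score per base station
        scores[picked] = -np.inf       # mask stations already selected
        best = int(np.argmax(scores))
        if scores[best] < 0:           # hypothetical STOP criterion
            break
        picked.append(best)
        state = state + features[best] # fold the choice into the decoder state
    return picked

# Hypothetical scenario: 6 candidate base stations, 4 features each,
# at most 3 simultaneous connections per user.
features = rng.normal(size=(6, 4))
selection = decode(features, encode(features), max_picks=3)
```

Because the decoder emits indices sequentially, the output length adapts to the number of candidates, which is what lets the sequence-to-sequence formulation scale without enumerating all base-station combinations.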