| Graduate Student | 吳宜倍 Wu, Yi-Bei |
|---|---|
| Thesis Title | 基於深度強化式學習細胞網路平均資訊年齡最小化 (Average AoI Minimization for Cellular Network Based on Deep Reinforcement Learning) |
| Advisor | 劉光浩 Liu, Kuang-Hao |
| Committee Members | 方凱田 Feng, Kai-Ten; 鍾偉和 Chung, Wei-Ho |
| Degree | Master |
| Department | Institute of Communications Engineering, College of Electrical Engineering and Computer Science |
| Publication Year | 2024 |
| Graduation Academic Year | 113 |
| Language | English |
| Pages | 36 |
| Keywords | age of information, convolutional neural network, deep reinforcement learning, uplink user selection |
The age of information (AoI) is a key metric for ensuring timely data delivery in cellular networks. This thesis aims to minimize the AoI and preserve the freshness of information at the base station (BS). Because the user-selection outcomes of different cells affect one another, conventional techniques cannot cope effectively. In addition, the absence of labeled data rules out supervised learning. To overcome these problems, we adopt deep reinforcement learning (DRL) to perform user selection in an unsupervised manner.
The core of the proposed method is a double deep Q-network (DDQN). Unlike approaches that simply select the earliest-generated packet in each time slot, the DDQN allows the BS to weigh a broader range of factors when making decisions, including the packet error rate, the packet generation time, and the signal-to-interference-plus-noise ratio (SINR). To capture and process these complex network characteristics effectively, a convolutional neural network (CNN) is integrated into the DDQN architecture. The CNN extracts informative features from the network state, enabling the agent to learn the intricate relationships among them and to select users accordingly.
In the simulations, compared with a method that considers only the packet generation time, the proposed method achieves a lower AoI under various packet arrival rates and coding rates. The simulation results show that the DDQN-based method effectively reduces the AoI and demonstrate its potential to benefit future cellular networks that are sensitive to information freshness.
The age of information (AoI) in cellular networks is a critical metric for ensuring timely data delivery in time-sensitive applications. The aim of this thesis is to minimize the AoI and improve the freshness of information at the base station (BS). Reaching this objective is difficult owing to the interdependent results of user selection across different cells, rendering conventional techniques ineffective. Additionally, the absence of ground-truth labels makes supervised learning infeasible. To overcome these obstacles, deep reinforcement learning (DRL) is utilized to make user-selection decisions in an unsupervised manner.
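To make the metric being minimized concrete, the following is a minimal sketch of the standard discrete-time AoI recursion, assuming unit-length slots: the age at the BS grows by one each slot and, on a successful update, resets to the elapsed time since the delivered packet was generated. The function name and the toy delivery trace are illustrative assumptions, not taken from the thesis.

```python
def step_aoi(aoi, slot, delivered, gen_time=None):
    """Advance the instantaneous AoI at the BS by one slot.

    aoi       -- current age of information
    slot      -- current slot index
    delivered -- True if an update packet was decoded in this slot
    gen_time  -- generation time of the decoded packet (if delivered)
    """
    if delivered:
        return slot - gen_time  # age resets to the delivered packet's age
    return aoi + 1              # otherwise the information keeps aging

# Example: average AoI of one user over a toy trace (hypothetical values).
aoi, total = 0, 0
deliveries = {3: 1, 7: 5}       # slot -> generation time of decoded packet
for slot in range(1, 11):
    aoi = step_aoi(aoi, slot, slot in deliveries, deliveries.get(slot))
    total += aoi
print("average AoI:", total / 10)
```

Averaging this sawtooth-shaped age over time and over users yields the objective that the scheduler is trained to minimize.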
The core of the proposed method lies in a double deep Q-network (DDQN) framework. Unlike methods that simply choose the earliest-generated packet in each time slot, the DDQN enables the BS to make intelligent decisions by considering a broader range of factors, including the packet error rate, packet creation time, and signal-to-interference-plus-noise ratio (SINR).
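For readers unfamiliar with how a DDQN differs from a vanilla DQN, here is a minimal PyTorch sketch of the double-Q training target, where the online network selects the greedy next action and the target network evaluates it; this decoupling is what mitigates Q-value overestimation. All names (online_net, target_net, gamma) are illustrative assumptions, not code from the thesis.

```python
import torch

def ddqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute the double-DQN bootstrap target for a batch of transitions."""
    with torch.no_grad():
        # Online network picks the greedy next action (action selection)...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...while the target network scores it (action evaluation).
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    # dones is a 0/1 float tensor; terminal transitions carry no bootstrap term.
    return rewards + gamma * (1.0 - dones) * next_q
```

In this setting each discrete action would correspond to selecting one uplink user, and the per-slot reward would penalize the resulting AoI.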
To effectively capture and process these complex network dynamics, a convolutional neural network (CNN) is integrated into the DDQN architecture. The CNN extracts salient features from the network state, enabling the agent to learn intricate patterns and make informed user-selection decisions.
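As one possible realization of such an architecture, the sketch below shows a small CNN-based Q-network that maps a 2-D arrangement of per-user state features (e.g., SINR, packet age, error rate) to one Q-value per candidate user. The layer sizes and input layout are assumptions for illustration, not the exact design used in the thesis.

```python
import torch.nn as nn

class CnnQNetwork(nn.Module):
    """Q-network: 2-D network-state map in, one Q-value per uplink user out."""

    def __init__(self, in_channels, num_users):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # pool to a fixed-size embedding
        )
        self.head = nn.Linear(64, num_users)  # one Q-value per candidate user

    def forward(self, state):
        z = self.features(state).flatten(1)   # (batch, 64)
        return self.head(z)                   # Q(s, a) for every uplink user
```

Two such networks, one online and one target, would then plug directly into the double-Q target shown above.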
Extensive simulations demonstrate the proposed method's superiority over conventional scheduling schemes, particularly in scenarios with varying packet interarrival times and coding rates. The results highlight the effectiveness of the DDQN-based approach in minimizing AoI, showcasing its potential to enhance the performance of future cellular networks for applications sensitive to information freshness.