Graduate student: | 王人弘 Wang, Jen-Hung |
---|---|
Thesis title: | 多通道交會問題於近似策略與強化學習之比較 Comparison of Approximation Policies and Reinforcement Learning for Multichannel Rendezvous Problem |
Advisor: | 張正尚 Chang, Cheng-Shang |
Oral defense committee: | 李端興 Lee, Duan-Shin; 林華君 Lin, Hwa-Chun |
Degree: | 碩士 Master |
Department: | 電機資訊學院 - 通訊工程研究所 College of Electrical Engineering and Computer Science - Institute of Communications Engineering |
Year of publication: | 2019 |
Academic year of graduation: | 107 |
Language: | English |
Number of pages: | 45 |
Keywords (Chinese): | 多通道交會、強化學習 |
Keywords (English): | Multichannel Rendezvous, Reinforcement Learning |
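In this thesis, we consider the multichannel rendezvous problem in cognitive radio networks (CRNs), and we assume that the probability of a successful rendezvous when two users hop to a common channel at the same time depends on the state of that channel. We use stochastic processes to represent the states of all channels at any time. The two users know only the distribution of the channel states and cannot observe the exact state of a channel at any given time. In [1], there are two channel models: the fast time-varying channel model (in which the channel states are independent and identically distributed across time slots) and the slow time-varying channel model (in which the channel states remain unchanged over time). We extend these to a general time-varying channel model in which the joint probability distribution of the channel states is only assumed to be stationary in time. We show that the expected time-to-rendezvous (ETTR) of the general time-varying channel model is upper bounded by the ETTR of the slow time-varying channel model. Assuming a Markov chain model with two channel states, we also show that the ETTR of the general time-varying channel model is lower bounded by the ETTR of the fast time-varying channel model. By describing the multichannel rendezvous problem as an adversarial bandit problem, we propose a reinforcement learning approach to learn the channel selection probabilities. Our experimental results show that the reinforcement learning approach is very effective and that the resulting ETTRs are comparable to those of the various approximation policies in the literature.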
In this thesis, we consider the multichannel rendezvous problem in cognitive radio networks (CRNs), where the probability that two users hopping on the same channel have a successful rendezvous is a function of the channel states. The channel states are modelled by stochastic processes whose joint distributions are known to the users; however, the exact state of a channel at any time is not observable. In [1], there are two channel models: (i) the fast time-varying channel model (where the channel states are assumed to be independent and identically distributed in each time slot), and (ii) the slow time-varying channel model (where the channel states remain unchanged over time). We extend the results in [1] to general time-varying channel models, where the joint distribution of the channel states is only assumed to be stationary in time. We show that the expected time-to-rendezvous (ETTR) of the general channel model is upper bounded by the ETTR of the slow time-varying channel model. Under a Markov channel model with two states, we further show that the ETTR of the general channel model is lower bounded by the ETTR of the fast time-varying channel model. By formulating this multichannel rendezvous problem as an adversarial bandit problem, we propose a reinforcement learning approach to learn the channel selection probabilities $p_i(t)$, $i = 1, 2, \ldots, N$. Our experimental results show that the reinforcement learning approach is very effective and yields ETTRs comparable to those of various approximation policies in the literature.
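The ordering between the fast and slow time-varying models can be illustrated with a small exact calculation. The Python sketch below is not taken from the thesis. It assumes a simplified setting in which both users independently pick channel i with the same fixed probability `p[i]`, each channel is in one of two states (good with probability `q`, bad otherwise), and a rendezvous on a common channel succeeds with probability `r_good` or `r_bad` depending on that channel's state. The function names and all parameter values are hypothetical.

```python
import itertools


def ettr_fast(p, q, r_bad, r_good):
    """ETTR when every channel state is redrawn i.i.d. in each time slot.

    Assumed model: the per-slot success probability is constant, so the
    time-to-rendezvous is geometric and the ETTR is its reciprocal.
    """
    mean_r = q * r_good + (1 - q) * r_bad
    per_slot = sum(pi * pi for pi in p) * mean_r
    return 1.0 / per_slot


def ettr_slow(p, q, r_bad, r_good):
    """ETTR when channel states are drawn once and then never change.

    Conditioned on the (fixed) states, the time-to-rendezvous is geometric;
    the ETTR averages the conditional means over all 2^N state vectors.
    """
    total = 0.0
    for states in itertools.product((0, 1), repeat=len(p)):
        prob = 1.0      # probability of this state vector
        per_slot = 0.0  # success probability per slot given these states
        for pi, s in zip(p, states):
            prob *= q if s else (1 - q)
            per_slot += pi * pi * (r_good if s else r_bad)
        total += prob / per_slot
    return total


# Hypothetical example: 4 channels, uniform channel selection.
p = [0.25] * 4
fast = ettr_fast(p, q=0.5, r_bad=0.1, r_good=0.9)
slow = ettr_slow(p, q=0.5, r_bad=0.1, r_good=0.9)
print(f"fast ETTR = {fast:.3f}, slow ETTR = {slow:.3f}")
```

Under these assumptions the fast-model ETTR is the reciprocal of the mean per-slot success probability, while the slow-model ETTR is the mean of the reciprocal, so Jensen's inequality gives a slow-model value that is at least as large. This agrees with the direction of the bounds stated in the abstract. Note that `r_bad` must be positive here, otherwise the slow-model ETTR is infinite whenever all channels start in the bad state.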
[1] C.-S. Chang, D.-S. Lee, Y.-L. Lin, and J.-H. Wang, "ETTR Bounds and Approximation Solutions of Blind Rendezvous Policies in Cognitive Radio Networks with Random Channel States," arXiv e-prints, p. arXiv:1906.10424, Jun 2019.
[2] C. Chang, D. Lee, and W. Liao, "A tutorial on multichannel rendezvous in cognitive radio networks," in Cognitive Radio Networks: Performance, Applications and Technology. Nova Sci, 2018.
[3] Z. Gu, Y. Wang, Q.-S. Hua, and F. C. M. Lau, Rendezvous in Distributed Systems: Theory, Algorithms and Applications. Springer, 2017.
[4] S. Alpern and S. Gal, The Theory of Search Games and Rendezvous, 2003.
[5] C.-F. Shih, T. Y. Wu, and W. Liao, "DH-MAC: A dynamic channel hopping MAC protocol for cognitive radio networks," in 2010 IEEE International Conference on Communications. IEEE, 2010, pp. 1-5.
[6] C.-S. Chang, W. Liao, and T.-Y. Wu, "Tight lower bounds for channel hopping schemes in cognitive radio networks," IEEE/ACM Transactions on Networking (TON), vol. 24, no. 4, pp. 2343-2356, 2016.
[7] M. J. Abdel-Rahman, H. Rahbari, and M. Krunz, "Multicast rendezvous in fast-varying DSA networks," IEEE Transactions on Mobile Computing, vol. 14, no. 7, pp. 1449-1462, 2014.
[8] K. Bian, J.-M. Park, and R. Chen, "A quorum-based framework for establishing control channels in dynamic spectrum access networks," in Proceedings of the 15th annual international conference on Mobile computing and networking. ACM, 2009, pp. 25-36.
[9] D. Yang, J. Shin, and C. Kim, "Deterministic rendezvous scheme in multichannel access networks," Electronics Letters, vol. 46, no. 20, pp. 1402-1404, 2010.
[10] N. C. Theis, R. W. Thomas, and L. A. DaSilva, "Rendezvous for cognitive radios," IEEE Transactions on Mobile Computing, vol. 10, no. 2, pp. 216-227, 2010.
[11] Z. Lin, H. Liu, X. Chu, and Y.-W. Leung, "Jump-stay based channel-hopping algorithm with guaranteed rendezvous for cognitive radio networks," in 2011 Proceedings IEEE INFOCOM. IEEE, 2011, pp. 2444-2452.
[12] Z. Gu, Q.-S. Hua, Y. Wang, and F. C. Lau, "Nearly optimal asynchronous blind rendezvous algorithm for cognitive radio networks," in 2013 IEEE international conference on sensing, communications and networking (SECON). IEEE, 2013, pp. 371-379.
[13] G.-Y. Chang and J.-F. Huang, "A fast rendezvous channel-hopping algorithm for cognitive radio networks," IEEE Communications Letters, vol. 17, no. 7, pp. 1475-1478, 2013.
[14] G.-Y. Chang, W.-H. Teng, H.-Y. Chen, and J.-P. Sheu, "Novel channel-hopping schemes for cognitive radio networks," IEEE Transactions on Mobile Computing, vol. 13, no. 2, pp. 407-421, 2012.
[15] Z. Gu, Q.-S. Hua, and W. Dai, "Fully distributed algorithms for blind rendezvous in cognitive radio networks," in Proceedings of the 15th ACM international symposium on Mobile ad hoc networking and computing. ACM, 2014, pp. 155-164.
[16] C.-S. Chang, C.-Y. Chen, D.-S. Lee, and W. Liao, "Efficient encoding of user IDs for nearly optimal expected time-to-rendezvous in heterogeneous cognitive radio networks," IEEE/ACM Transactions on Networking (TON), vol. 25, no. 6, pp. 3323-3337, 2017.
[17] G. Li, Z. Gu, X. Lin, H. Pu, and Q.-s. Hua, "Deterministic distributed rendezvous algorithms for multi-radio cognitive radio networks," in Proceedings of the 17th ACM international conference on Modeling, analysis and simulation of wireless and mobile systems. ACM, 2014, pp. 313-320.
[18] L. Yu, H. Liu, Y.-W. Leung, X. Chu, and Z. Lin, "Multiple radios for fast rendezvous in cognitive radio networks," IEEE Transactions on Mobile Computing, vol. 14, no. 9, pp. 1917-1931, 2014.
[19] L. Yu, H. Liu, Y.-W. Leung, X. Chu, and Z. Lin, "Adjustable rendezvous in multi-radio cognitive radio networks," in 2015 IEEE Global Communications Conference (GLOBECOM). IEEE, 2015, pp. 1-7.
[20] Y.-C. Chang, C.-S. Chang, and J.-P. Sheu, "An enhanced fast multi-radio rendezvous algorithm in heterogeneous cognitive radio networks," IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 4, pp. 847-859, 2018.
[21] H. Pu, Z. Gu, X. Lin, Q.-S. Hua, and H. Jin, "Dynamic rendezvous algorithms for cognitive radio networks," in 2016 IEEE International Conference on Communications (ICC). IEEE, 2016, pp. 1-6.
[22] A. Al-Mqdashi, A. Sali, M. J. Abdel-Rahman, N. K. Noordin, S. J. Hashim, and R. Nordin, "Efficient rendezvous schemes for fast-varying cognitive radio ad hoc networks," Transactions on Emerging Telecommunications Technologies, vol. 28, no. 12, p. e3217, 2017.
[23] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
[24] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, and D. I. Kim, "Applications of deep reinforcement learning in communications and networking: A survey," IEEE Communications Surveys & Tutorials, 2019.
[25] J. C. Gittins, K. D. Glazebrook, and R. Weber, Multi-armed bandit allocation indices. Wiley Online Library, 1989, vol. 25.
[26] S. M. Ross, Stochastic processes. Wiley New York, 1996, vol. 2.
[27] C.-S. Chang, X. Chao, and M. Pinedo, "Integration of discrete-time correlated Markov processes in a TDM system," Probability in the Engineering and Informational Sciences, vol. 4, no. 1, pp. 29-56, 1990.
[28] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities: theory of majorization and its applications. Springer, 1979, vol. 143.
[29] A. H. Tchen et al., "Inequalities for distributions with given marginals," The Annals of Probability, vol. 8, no. 4, pp. 814-827, 1980.
[30] T. Rolski, "Upper bounds for single server queues with doubly stochastic Poisson arrivals," Mathematics of Operations Research, vol. 11, no. 3, pp. 442-450, 1986.
[31] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, "The nonstochastic multiarmed bandit problem," SIAM Journal on Computing, vol. 32, no. 1, pp. 48-77, 2002.
[32] S. Gold, A. Rangarajan et al., "Softmax to softassign: Neural network algorithms for combinatorial optimization," Journal of Artificial Neural Networks, vol. 2, no. 4, pp. 381-399, 1996.
[33] L. S. Shapley, "Stochastic games," Proceedings of the National Academy of Sciences, vol. 39, no. 10, pp. 1095-1100, 1953.