Comparison of Approximation Policies and Reinforcement Learning for Multichannel Rendezvous Problem

Graduate student:	王人弘 Wang, Jen-Hung
Thesis title:	多通道交會問題於近似策略與強化學習之比較 Comparison of Approximation Policies and Reinforcement Learning for Multichannel Rendezvous Problem
Advisor:	張正尚 Chang, Cheng-Shang
Committee members:	李端興 Lee, Duan-Shin 林華君 Lin, Hwa-Chun
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Communications Engineering
Year of publication:	2019
Academic year of graduation:	107 (ROC calendar)
Language:	English
Number of pages:	45
Chinese keywords:	多通道交會、強化學習
English keywords:	Multichannel Rendezvous, Reinforcement Learning

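Abstract (translated from the Chinese): In this thesis, we consider the multichannel rendezvous problem in cognitive radio networks (CRNs), and we assume that the probability that two users hopping onto a common channel at the same time rendezvous successfully depends on the state of that channel. We use stochastic processes to represent the states of all channels at any time. The two users know only the distribution of the channel states and cannot observe the exact state of a channel at any given time. In [1], there are two channel models: the fast time-varying channel model (the channel states are independent and identically distributed across time) and the slow time-varying channel model (the channel states remain unchanged over time). We extend these to a general time-varying channel model in which the joint probability distribution of the channel states is only assumed to be stationary in time. We show that the expected time-to-rendezvous (ETTR) under the general time-varying channel model is upper-bounded by the ETTR under the slow time-varying channel model. Under a Markov chain model with two channel states, we further show that the ETTR under the general model is lower-bounded by the ETTR under the fast time-varying channel model. By formulating the multichannel rendezvous problem as an adversarial bandit problem, we propose a reinforcement learning approach to learn the channel selection probabilities. Our experimental results show that the reinforcement learning approach is very effective, and the resulting ETTR is comparable to those of the various approximation policies in the literature.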

In this thesis, we consider the multichannel rendezvous problem in cognitive radio networks (CRNs), where the probability that two users hopping on the same channel have a successful rendezvous is a function of the channel states. The channel states are modelled by stochastic processes with joint distributions known to the users. However, the exact state of a channel at any time is not observable. In [1], there are two channel models: (i) the fast time-varying channel model (where the channel states are assumed to be independent and identically distributed in each time slot), and (ii) the slow time-varying channel model (where the channel states remain unchanged over time). We extend the results in [1] to general time-varying channel models where the joint distribution of the channel states is only assumed to be stationary in time. We show that the expected time-to-rendezvous (ETTR) of the general channel model is upper-bounded by the ETTR of the slow time-varying channel model. Under a Markov channel model with two states, we show that the ETTR of the general channel model is lower-bounded by the ETTR of the fast time-varying channel model. By formulating such a multichannel rendezvous problem as an adversarial bandit problem, we propose using a reinforcement learning approach to learn the channel selection probabilities $p_i(t)$, $i = 1, 2, \ldots, N$. Our experimental results show that the reinforcement learning approach is very effective and yields ETTRs comparable to those of various approximation policies in the literature.
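The ordering of the two ETTR bounds described in the abstract can be illustrated with a small numerical sketch. It is not the thesis's own model or code; it rests on the following assumptions, all of which are hypothetical: each of the N channels is independently "good" with probability `good_prob[i]`, two users on the same channel rendezvous with probability `r_good` or `r_bad` depending on that channel's state, and both users independently pick channel i with a fixed probability `p[i]` in every slot. Under these assumptions the fast model has a constant per-slot success probability, while the slow model draws the states once and keeps them fixed.

```python
from itertools import product


def ettr_fast(p, good_prob, r_good, r_bad):
    """ETTR when channel states are redrawn i.i.d. in every slot (assumed model)."""
    # Per-slot success probability: both users pick channel i (p_i^2) and the
    # rendezvous succeeds with the state-averaged success probability.
    q = sum(pi ** 2 * (g * r_good + (1 - g) * r_bad)
            for pi, g in zip(p, good_prob))
    return 1.0 / q  # mean of a geometric random variable


def ettr_slow(p, good_prob, r_good, r_bad):
    """ETTR when channel states are drawn once and never change (assumed model)."""
    total = 0.0
    # Enumerate every joint state (True = good, False = bad).
    for states in product([True, False], repeat=len(p)):
        weight = 1.0  # probability of this joint state
        q = 0.0       # per-slot success probability given this joint state
        for pi, g, good in zip(p, good_prob, states):
            weight *= g if good else (1 - g)
            q += pi ** 2 * (r_good if good else r_bad)
        total += weight / q  # requires q > 0, i.e. r_bad > 0 here
    return total


if __name__ == "__main__":
    p = [0.5, 0.5]
    good_prob = [0.5, 0.5]
    print("fast:", ettr_fast(p, good_prob, 1.0, 0.5))
    print("slow:", ettr_slow(p, good_prob, 1.0, 0.5))
```

In this sketch the slow-model value is never smaller than the fast-model value, because the slow model averages 1/q over the states while the fast model inverts the averaged q (Jensen's inequality). The two values coincide when `r_good == r_bad`, since the channel state then has no effect.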

Contents
List of Figures
Introduction
System Model
A General Time-Varying Channel Model
A Markov Channel Model with Two States
1 An ETTR lower bound for positively correlated Markov chains
2 Approximation solutions
Simulation of General Time-Varying Channel Model
Reinforcement Learning
Experimental Results of Reinforcement Learning
Conclusion
Bibliography

[1] C.-S. Chang, D.-S. Lee, Y.-L. Lin, and J.-H. Wang, "ETTR Bounds and Approximation Solutions of Blind Rendezvous Policies in Cognitive Radio Networks with Random Channel States," arXiv e-prints, p. arXiv:1906.10424, Jun 2019.
[2] C. Chang, D. Lee, and W. Liao, "A tutorial on multichannel rendezvous in cognitive radio networks," in Cognitive Radio Networks: Performance, Applications and Technology. Nova Sci, 2018.
[3] Z. Gu, Y. Wang, Q.-S. Hua, and F. C. M. Lau, Rendezvous in Distributed Systems: Theory, Algorithms and Applications. Springer, 2017.
[4] S. Alpern and S. Gal, The Theory of Search Games and Rendezvous. 2003.
[5] C.-F. Shih, T. Y. Wu, and W. Liao, "DH-MAC: A dynamic channel hopping MAC protocol for cognitive radio networks," in 2010 IEEE International Conference on Communications. IEEE, 2010, pp. 1-5.
[6] C.-S. Chang, W. Liao, and T.-Y. Wu, "Tight lower bounds for channel hopping schemes in cognitive radio networks," IEEE/ACM Transactions on Networking (TON), vol. 24, no. 4, pp. 2343-2356, 2016.
[7] M. J. Abdel-Rahman, H. Rahbari, and M. Krunz, "Multicast rendezvous in fast-varying DSA networks," IEEE Transactions on Mobile Computing, vol. 14, no. 7, pp. 1449-1462, 2014.
[8] K. Bian, J.-M. Park, and R. Chen, "A quorum-based framework for establishing control channels in dynamic spectrum access networks," in Proceedings of the 15th Annual International Conference on Mobile Computing and Networking. ACM, 2009, pp. 25-36.
[9] D. Yang, J. Shin, and C. Kim, "Deterministic rendezvous scheme in multichannel access networks," Electronics Letters, vol. 46, no. 20, pp. 1402-1404, 2010.
[10] N. C. Theis, R. W. Thomas, and L. A. DaSilva, "Rendezvous for cognitive radios," IEEE Transactions on Mobile Computing, vol. 10, no. 2, pp. 216-227, 2010.
[11] Z. Lin, H. Liu, X. Chu, and Y.-W. Leung, "Jump-stay based channel-hopping algorithm with guaranteed rendezvous for cognitive radio networks," in 2011 Proceedings IEEE INFOCOM. IEEE, 2011, pp. 2444-2452.
[12] Z. Gu, Q.-S. Hua, Y. Wang, and F. C. Lau, "Nearly optimal asynchronous blind rendezvous algorithm for cognitive radio networks," in 2013 IEEE International Conference on Sensing, Communications and Networking (SECON). IEEE, 2013, pp. 371-379.
[13] G.-Y. Chang and J.-F. Huang, "A fast rendezvous channel-hopping algorithm for cognitive radio networks," IEEE Communications Letters, vol. 17, no. 7, pp. 1475-1478, 2013.
[14] G.-Y. Chang, W.-H. Teng, H.-Y. Chen, and J.-P. Sheu, "Novel channel-hopping schemes for cognitive radio networks," IEEE Transactions on Mobile Computing, vol. 13, no. 2, pp. 407-421, 2012.
[15] Z. Gu, Q.-S. Hua, and W. Dai, "Fully distributed algorithms for blind rendezvous in cognitive radio networks," in Proceedings of the 15th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 2014, pp. 155-164.
[16] C.-S. Chang, C.-Y. Chen, D.-S. Lee, and W. Liao, "Efficient encoding of user IDs for nearly optimal expected time-to-rendezvous in heterogeneous cognitive radio networks," IEEE/ACM Transactions on Networking (TON), vol. 25, no. 6, pp. 3323-3337, 2017.
[17] G. Li, Z. Gu, X. Lin, H. Pu, and Q.-S. Hua, "Deterministic distributed rendezvous algorithms for multi-radio cognitive radio networks," in Proceedings of the 17th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems. ACM, 2014, pp. 313-320.
[18] L. Yu, H. Liu, Y.-W. Leung, X. Chu, and Z. Lin, "Multiple radios for fast rendezvous in cognitive radio networks," IEEE Transactions on Mobile Computing, vol. 14, no. 9, pp. 1917-1931, 2014.
[19] L. Yu, H. Liu, Y.-W. Leung, X. Chu, and Z. Lin, "Adjustable rendezvous in multi-radio cognitive radio networks," in 2015 IEEE Global Communications Conference (GLOBECOM). IEEE, 2015, pp. 1-7.
[20] Y.-C. Chang, C.-S. Chang, and J.-P. Sheu, "An enhanced fast multi-radio rendezvous algorithm in heterogeneous cognitive radio networks," IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 4, pp. 847-859, 2018.
[21] H. Pu, Z. Gu, X. Lin, Q.-S. Hua, and H. Jin, "Dynamic rendezvous algorithms for cognitive radio networks," in 2016 IEEE International Conference on Communications (ICC). IEEE, 2016, pp. 1-6.
[22] A. Al-Mqdashi, A. Sali, M. J. Abdel-Rahman, N. K. Noordin, S. J. Hashim, and R. Nordin, "Efficient rendezvous schemes for fast-varying cognitive radio ad hoc networks," Transactions on Emerging Telecommunications Technologies, vol. 28, no. 12, p. e3217, 2017.
[23] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[24] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, and D. I. Kim, "Applications of deep reinforcement learning in communications and networking: A survey," IEEE Communications Surveys & Tutorials, 2019.
[25] J. C. Gittins, K. D. Glazebrook, and R. Weber, Multi-armed Bandit Allocation Indices. Wiley Online Library, 1989, vol. 25.
[26] S. M. Ross, Stochastic Processes. Wiley New York, 1996, vol. 2.
[27] C.-S. Chang, X. Chao, and M. Pinedo, "Integration of discrete-time correlated Markov processes in a TDM system," Probability in the Engineering and Informational Sciences, vol. 4, no. 1, pp. 29-56, 1990.
[28] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities: theory of majorization and its applications. Springer, 1979, vol. 143.
[29] A. H. Tchen, "Inequalities for distributions with given marginals," The Annals of Probability, vol. 8, no. 4, pp. 814-827, 1980.
[30] T. Rolski, "Upper bounds for single server queues with doubly stochastic Poisson arrivals," Mathematics of Operations Research, vol. 11, no. 3, pp. 442-450, 1986.
[31] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, "The nonstochastic multiarmed bandit problem," SIAM Journal on Computing, vol. 32, no. 1, pp. 48-77, 2002.
[32] S. Gold and A. Rangarajan, "Softmax to softassign: Neural network algorithms for combinatorial optimization," Journal of Artificial Neural Networks, vol. 2, no. 4, pp. 381-399, 1996.
[33] L. S. Shapley, "Stochastic games," Proceedings of the National Academy of Sciences, vol. 39, no. 10, pp. 1095-1100, 1953.
