
Graduate Student: 林俊廷 (Lin, Chun-Ting)
Thesis Title (Chinese): 基於深度強化學習方法下,針對接收訊號強度指標進行室內定位
Thesis Title (English): Indoor Positioning via Received Signal Strength Indicators Using Deep Reinforcement Learning
Advisor: 鐘太郎 (Jong, Tai-Lang)
Committee Members: 廖梨君 (Liao, Li-Chun), 黃裕煒 (Huang, Yu-Wei), 謝奇文 (Hsieh, Chi-Wen), 鐘太郎 (Jong, Tai-Lang)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2019
Graduation Academic Year: 107 (ROC calendar)
Language: Chinese
Pages: 85
Chinese Keywords: 深度強化學習、機器學習、人工智慧、室內定位、接收訊號強度指標、變分自動編碼器、物聯網
English Keywords: Deep reinforcement learning, Machine learning, Artificial intelligence, Indoor positioning, Received signal strength indicators, Variational autoencoder, Internet of things
    Abstract (Chinese):
    This thesis mainly uses wireless received signal strength indicator (RSSI) data for indoor positioning. RSSIs are commonly used in indoor positioning methods based on Bluetooth Beacon devices, and positioning of moving or fixed objects equipped with Beacon devices has many application areas. This thesis uses deep reinforcement learning to predict where an object is located indoors, and compares and discusses the experimental results against other well-known machine learning methods.
    The thesis first tries the deep reinforcement learning method without past experience, testing a single data sample at a time under different environment assumptions: first the most simplified environment, then adding the noise interference caused by people moving indoors, then adding the noise interference caused by indoor obstacles, and finally combining the deep reinforcement learning model with a variational autoencoder model. Testing on 200 randomly selected unlabeled samples, the positioning error under the last combined environment assumption is 7.92 meters. The thesis then focuses on using labeled and unlabeled data to train the deep reinforcement learning model with past experience taken into account. Repeating the measurement 10 times with 200 randomly selected unlabeled samples, the overall average of the predicted mean distance error is only 5.31 meters; repeating it 10 times with 200 randomly selected labeled samples, the overall average is only 5.18 meters.
    In addition to the positioning method that combines deep reinforcement learning with a variational autoencoder, this thesis also compares against the experimental results of other well-known machine learning methods, including the variational autoencoder and K-means clustering (unsupervised learning) and the convolutional neural network (supervised learning). Using the same data as the combined deep reinforcement learning and variational autoencoder method, and repeating 10 times with 200 randomly selected labeled and unlabeled samples, the overall averages of the mean distance error for unlabeled data are 11.61, 7.08, and 6.27 meters, respectively, and for labeled data 12.36, 7.2, and 5.99 meters, respectively. The combined deep reinforcement learning and variational autoencoder method therefore obtains better results in all cases.
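    As a rough illustration of the reinforcement-learning formulation described above, the following Python sketch trains a tabular Q-learning agent that walks over a discretized indoor grid and is rewarded when the RSSI fingerprint stored at its current cell matches the measured RSSI vector. Tabular Q-learning here stands in for the thesis's deep Q-network combined with a VAE, and the grid size, beacon count, fingerprint map, noise level, and hyperparameters are all hypothetical placeholders, not the thesis's actual setup.

```python
# Hypothetical sketch: tabular Q-learning over a discretized indoor grid.
# State = the agent's current grid cell; action = move up/down/left/right;
# reward = higher when the cell's stored RSSI fingerprint is closer to the
# measured RSSI vector. All sizes and data are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

GRID_W, GRID_H = 10, 8          # discretized room: 10 x 8 cells (assumed)
N_BEACONS = 5                   # number of Bluetooth beacons (assumed)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # left, right, down, up

# Fake RSSI fingerprint map: one RSSI vector per grid cell (placeholder data).
fingerprints = rng.normal(-70.0, 10.0, size=(GRID_W, GRID_H, N_BEACONS))

def reward(cell, rssi_measurement):
    """Negative Euclidean distance between the measured RSSI vector and the
    fingerprint stored at this cell: a closer match gives a higher reward."""
    return -np.linalg.norm(fingerprints[cell] - rssi_measurement)

Q = np.zeros((GRID_W, GRID_H, len(ACTIONS)))   # tabular Q-values
alpha, gamma, epsilon = 0.1, 0.9, 0.2

for episode in range(2000):
    # Each episode: pick a true location, observe its noisy RSSI vector, and
    # let the agent walk toward the best-matching cell from a random start.
    true_cell = (rng.integers(GRID_W), rng.integers(GRID_H))
    rssi = fingerprints[true_cell] + rng.normal(0.0, 2.0, N_BEACONS)
    cell = (rng.integers(GRID_W), rng.integers(GRID_H))
    for step in range(50):
        a = (rng.integers(len(ACTIONS)) if rng.random() < epsilon
             else int(np.argmax(Q[cell])))
        dx, dy = ACTIONS[a]
        nxt = (min(max(cell[0] + dx, 0), GRID_W - 1),
               min(max(cell[1] + dy, 0), GRID_H - 1))
        r = reward(nxt, rssi)
        # Standard Q-learning update.
        Q[cell + (a,)] += alpha * (r + gamma * np.max(Q[nxt]) - Q[cell + (a,)])
        cell = nxt
```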


    Abstract (English):
    This thesis mainly studies indoor positioning using wireless received signal strength indicator (RSSI) data. RSSIs are often utilized in indoor positioning methods based on Bluetooth Beacon devices, and there are many applications in positioning moving or fixed objects equipped with Beacon devices. This thesis aims to use deep reinforcement learning (DRL) to predict where an object is located indoors, and to compare and discuss the results with those of other well-known machine learning methods.
    The thesis first tries the DRL method using a single data sample at a time, without yet considering past experience, under different environments: first the most simplified environment, then with noise interference caused by the movement of indoor people, then with noise interference caused by indoor obstacles added, and finally with the DRL model combined with a variational autoencoder (VAE) model. On 200 randomly selected unlabeled data samples, the distance error under the last combined environment is 7.92 m. In addition, the thesis focuses on using labeled and unlabeled data to train the DRL+VAE model with past experience taken into account; the total average distance errors, obtained by measuring the predicted average distance error over 10 repetitions of 200 randomly selected unlabeled and labeled data samples, are only 5.31 m and 5.18 m, respectively.
    In addition to the DRL+VAE method, this thesis also compares against the experimental results of other well-known machine learning methods, including the VAE and the K-means algorithm (unsupervised learning) and the convolutional neural network (CNN) (supervised learning), using the same data as the DRL+VAE method and measuring over 10 repetitions of randomly selected labeled and unlabeled data. The total averages of the average distance error for unlabeled data are 11.61, 7.08, and 6.27 meters, respectively, and for labeled data 12.36, 7.2, and 5.99 meters, respectively. The DRL+VAE model therefore achieves better results.
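    The reported figures follow an evaluation protocol of 10 repetitions, each drawing 200 test samples at random and averaging the Euclidean distance error. Below is a minimal sketch of that protocol, assuming placeholder data and a stand-in predict_position function rather than the thesis's trained models.

```python
# Hypothetical sketch of the evaluation protocol described in the abstract:
# repeat 10 times, each time drawing 200 test samples at random, computing the
# mean Euclidean distance error, then averaging over the repetitions.
# predict_position and the data arrays are placeholders, not the thesis code.
import numpy as np

rng = np.random.default_rng(42)

def predict_position(rssi_vector):
    # Placeholder predictor; the thesis would use the trained DRL+VAE model.
    return rng.uniform(0.0, 20.0, size=2)

# Placeholder dataset: 1000 RSSI vectors with known 2-D coordinates (meters).
rssi_data = rng.normal(-70.0, 10.0, size=(1000, 5))
true_xy = rng.uniform(0.0, 20.0, size=(1000, 2))

repetition_means = []
for rep in range(10):
    idx = rng.choice(len(rssi_data), size=200, replace=False)
    errors = [np.linalg.norm(predict_position(rssi_data[i]) - true_xy[i])
              for i in idx]
    repetition_means.append(np.mean(errors))

print("total average distance error: %.2f m" % np.mean(repetition_means))
```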

    Chinese Abstract
    English Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
        1.1  Preface
        1.2  Research Background
        1.3  Literature Review
        1.4  Research Motivation
        1.5  Contributions
        1.6  Thesis Organization
    Chapter 2  Types of Machine Learning
        2.1  Preface
        2.2  Supervised Learning
            2.2.1  Linear Regression
            2.2.2  Classification
            2.2.3  Example: Convolutional Neural Network (CNN) [20]
            2.2.4  Other Supervised Learning Methods and Comparison
        2.3  Unsupervised Learning
            2.3.1  Clustering
            2.3.2  Dimension Reduction
            2.3.3  Example: Deep Generative Model
        2.4  Reinforcement Learning
            2.4.1  Basic Introduction to Reinforcement Learning
            2.4.2  Applications of Reinforcement Learning
        2.5  Summary
    Chapter 3  Deep Reinforcement Learning
        3.1  Preface
        3.2  Markov Decision Process (MDP) [53]
            3.2.1  Markov Process
            3.2.2  Markov Decision Process (MDP) [53]
        3.3  Reinforcement Learning Training Method: Policy-Based
            3.3.1  Policy-Based Steps
            3.3.2  Policy Gradient
        3.4  Reinforcement Learning Training Method: Critic-Based
            3.4.1  State Value Function V^π(s)
            3.4.2  State-Action Value Function Q^π(s,a)
            3.4.3  Q-Learning
        3.5  On-Policy and Off-Policy Learning
    Chapter 4  Variational Autoencoder
        4.1  Preface
        4.2  Gaussian Mixture Model (GMM)
        4.3  Architecture and Principles
            4.3.1  Architecture
            4.3.2  Principles
    Chapter 5  Analysis Methods and Experimental Results
        5.1  Preface
        5.2  Data Description
        5.3  Prediction Results Using DRL Without Past Experience
            5.3.1  Introduction
            5.3.2  Results Under the Most Simplified Environment Assumption
            5.3.3  Results Under the Environment Assumption with People-Movement Interference
            5.3.4  Results Under the Environment Assumption with Fixed-Obstacle Interference
            5.3.5  Results Under the Environment Assumption Combining DRL with a VAE
            5.3.6  Comparison of Results Under Different Gaussian Noise Assumptions
        5.4  Experimental Results Using Supervised and Unsupervised Learning
            5.4.1  Supervised Learning: Convolutional Neural Network
            5.4.2  Unsupervised Learning: Variational Autoencoder
            5.4.3  Unsupervised Learning: K-means Clustering
        5.5  Comparison of DRL with Supervised and Unsupervised Methods
    Chapter 6  Conclusion and Future Work
    References

    [1] G. Dedes and A. G. Dempster, "Indoor GPS positioning - challenges and opportunities," VTC-2005-Fall. 2005 IEEE 62nd Vehicular Technology Conference, 2005., Dallas, TX, USA, 2005, pp. 412-415.
    [2] Hartley, R.I., Sturm, P.: Triangulation. Computer Vision and Image Understanding Journal (CVIU) 68(2) (1997) 146–157
    [3] M. Li and Y. Lu, "Angle-of-arrival estimation for localization and communication in wireless networks," 2008 16th European Signal Processing Conference, Lausanne, 2008, pp. 1-5.
    [4] R. Kaune, "Accuracy studies for TDOA and TOA localization," 2012 15th International Conference on Information Fusion, Singapore, 2012, pp. 408-415.
    [5] F. Zafari, A. Gkelias and K. K. Leung, "A Survey of Indoor Localization Systems and Technologies," in IEEE Communications Surveys & Tutorials.
    [6] C. Chen, Y. Chen, H. Lai, Y. Han and K. J. R. Liu, "High accuracy indoor localization: A WiFi-based approach," 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, 2016, pp. 6245-6249.
    [7] Nowicki M., Wietrzykowski J. (2017) Low-Effort Place Recognition with WiFi Fingerprints Using Deep Learning. In: Szewczyk R., Zieliński C., Kaliczyńska M. (eds) Automation 2017. ICA 2017. Advances in Intelligent Systems and Computing, vol 550. Springer, Cham
    [8] H. A. Nahas and J. S. Deogun, "Radio Frequency Identification Applications in Smart Hospitals," Twentieth IEEE International Symposium on Computer-Based Medical Systems (CBMS'07), Maribor, 2007, pp. 337-342.
    [9] S. Holm, "Hybrid ultrasound-RFID indoor positioning: Combining the best of both worlds," 2009 IEEE International Conference on RFID, Orlando, FL, 2009, pp. 155-162.
    [10] S. Gezici et al., "Localization via ultra-wideband radios: a look at positioning aspects for future sensor networks," in IEEE Signal Processing Magazine, vol. 22, no. 4, pp. 70-84, July 2005.
    [11] M. Mohammadi, A. Al-Fuqaha, M. Guizani and J. Oh, "Semisupervised Deep Reinforcement Learning in Support of IoT and Smart City Services," in IEEE Internet of Things Journal, vol. 5, no. 2, pp. 624-635, April 2018.
    [12] Y. Wang, Q. Ye, J. Cheng and L. Wang, "RSSI-Based Bluetooth Indoor Localization," 2015 11th International Conference on Mobile Ad-hoc and Sensor Networks (MSN), Shenzhen, 2015, pp. 165-171.
    [13] S. Feldmann, K. Kyamakya, A. Zapater, and Z. Lue, “An indoor Bluetooth-based positioning system: Concept, implementation and experimental evaluation,” in Proc. ICWN, vol. 272. Las Vegas, NV, USA, 2003, pp. 109–113.
    [14] M. Terán, J. Aranda, H. Carrillo, D. Mendez and C. Parra, "IoT-based system for indoor location using bluetooth low energy," 2017 IEEE Colombian Conference on Communications and Computing (COLCOM), Cartagena, 2017, pp. 1-6.
    [15] Andy Cavallini (2014) iBeacon Bible [Online]. Available: https://meetingofideas.files.wordpress.com/2015/09/beacon-bible-3-0.pdf
    [16] C. Gomez, J. Oller, and J. Paradells, “Overview and Evaluation of Bluetooth Low Energy: An Emerging Low-Power Wireless Technology,” Sensors, vol. 12, no. 9, pp. 11734–11753, Aug. 2012.
    [17] G. Félix, M. Siller and E. N. Álvarez, "A fingerprinting indoor localization algorithm based deep learning," 2016 Eighth International Conference on Ubiquitous and Future Networks (ICUFN), Vienna, 2016, pp. 1006-1011.
    [18] C. Hsieh, J. Chen and B. Nien, "Deep Learning-Based Indoor Localization Using Received Signal Strength and Channel State Information," in IEEE Access, vol. 7, pp. 33256-33267, 2019.
    [19] Ibrahim, Mai et al. “CNN based Indoor Localization using RSS Time-Series.” 2018 IEEE Symposium on Computers and Communications (ISCC) (2018): 01044-01049.
    [20] Y. Lecun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition," in Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
    [21] F. Pernkopf, “Bayesian network classifiers versus selective k-NN classifiers”, Pattern recognition, vol. 38, no. 1, pp. 1-10, 2005
    [22] Peng, C., Lee, K.L., & Ingersoll, G.M. (2002). An Introduction to Logistic Regression Analysis and Reporting.
    [23] Quinlan, J. 1986. Induction of decision trees. Machine Learning
    [24] L. Breiman. 2001. Random forests. Machine learning
    [25] Hsu, C.-W & Chang, C.-C & Lin, C.-J. (2003). A Practical Guide to Support Vector Classification. 101. 1396-1400.
    [26] D. P. Kingma and M. Welling, "Auto-Encoding Variational Bayes," arXiv preprint arXiv:1312.6114, 2013.
    [27] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman and A. Y. Wu, "An efficient k-means clustering algorithm: analysis and implementation," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881-892, July 2002.
    [28] J. Shlens. (2005, December) A tutorial on principal component analysis. [Online]. Available: http://www.cs.cmu.edu/∼elaw/papers/pca.pdf
    [29] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio. Generative adversarial nets. In Proceedings of NIPS, pages 2672– 2680, 2014
    [30] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
    [31] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
    [32] F. Musumeci et al., "An Overview on Application of Machine Learning Techniques in Optical Networks," in IEEE Communications Surveys & Tutorials, vol. 21, no. 2, pp. 1383-1408, Secondquarter 2019.
    [33] Osmankovic, Dinko & Konjicija, Samim (2011). Implementation of Q-learning algorithm for solving maze problem. pp. 1619-1622.
    [34] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998
    [35] Yuxi Li. Deep Reinforcement Learning: An Overview. arXiv:1701.07274, 2017
    [36] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279–292, 1992.
    [37] Hinton, G. E. & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
    [38] M Bishop, Christopher. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics).
    [39] Yeh James (2017) Data Analysis & Machine Learning, Lecture 5-1: Introduction to Convolutional Neural Networks [Online]. Available: https://medium.com/@yehjames/
    [40] Michael Copeland, What's the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning? [Online]. Available: https://blogs.nvidia.com.tw/2016/07/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
    [41] 何維涓, Simulating the Brain's Learning Process: DeepMind Uses Reinforcement-Learning Neural Networks to Find How Humans Internalize Past Experience to Solve New Tasks [Online]. Available: https://www.ithome.com.tw/news/123178
    [42] C. Poyton, Digital Video and HDTV Algorithms and Interfaces. San Francisco, CA: Morgan Kaufmann, 2003.
    [43] A. Singh, N. Thakur and A. Sharma, "A review of supervised machine learning algorithms," 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, 2016, pp. 1310-1315.
    [44] Tommy Huang (2018) Machine Learning: Cluster Analysis, K-means Clustering [Online]. Available: https://medium.com/@chih.sheng.huang821/機器學習-集群分析
    [45] Hinton, G.E., McClelland, J.L., & Rumelhart, D.E. (1986). Distributed representations. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
    [46] jonbruner generative-adversarial-networks [Online]. Available: https://github.com/jonbruner/generative-adversarial-networks/
    [47] Atari [Online]. Available: https://www.atari.com/
    [48] Alpha Go [Online]. Available: https://deepmind.com/research/alphago/
    [49] Hung-yi Lee (李宏毅) (2017) Machine Learning [Online]. Available: https://www.youtube.com/watch?v=CXgbekl66jc&list=PLJV_el3uVTsPy9oCRY30oBPNLCo89yu49
    [50] Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (1999a). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (NIPS) 12
    [51] Code HeroKu Introduction to Reinforcement Learning — Part 1 [Online]. Available: https://medium.com/code-heroku/introduction-to-reinforcement-learning-67826ec177ea
    [52] D. Ciregan, U. Meier and J. Schmidhuber, "Multi-column deep neural networks for image classification," 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, 2012, pp. 3642-3649.
    [53] Dan Klein (2013) Markov Decision Processes [Online]. Available: http://artificial-intelligence-class.org/assets/slides/11-Reinforcement-Learning.pdf
    [54] Feller, W. (1971) Introduction to Probability Theory and Its Applications, Vol II (2nd edition),Wiley. Section I.3
    [55] Metelli, A. M., Papini, M., Faccio, F., and Restelli, M. Policy optimization via importance sampling. In Advances in Neural Information Processing Systems, pp. 5447–5459, 2018.
    [56] Python, https://www.python.org/
    [57] Keras, https://keras.io/
    [58] Tensorflow, https://www.tensorflow.org/
    [59] Theano, http://deeplearning.net/software/theano/
    [60] Kyle Bai (2018) TensorFlow 筆記 [Online]. Available: https://hackmd.io/s/HJxsUvOpg
    [61] Tkinter, https://docs.python.org/3/library/tkinter.html
    [62] S. Sadowski and P. Spachos, "RSSI-Based Indoor Localization With the Internet of Things," in IEEE Access, vol. 6, pp. 30149-30161, 2018.
    [63] Ben-David, Shai; Kushilevitz, Eyal; Mansour, Yishay (1997-10-01). "Online Learning versus Offline Learning". Machine Learning. 29 (1): 45–63.
    [64] Harris, David and Harris, Sarah. Digital design and computer architecture (2nd ed.). San Francisco, Calif.: Morgan Kaufmann
    [65] Kullback, S.; Leibler, R. A. On Information and Sufficiency. Ann. Math. Statist. 22 (1951), no. 1, 79--86.
    [66] 郭韋良, "Bone Age Assessment and C. elegans Age Prediction Using Deep Convolutional Neural Network," Master's thesis, Department of Electrical Engineering, National Tsing Hua University, 2018.
