| Graduate Student: | 徐嘉駿 Hsu, Chia-Chun |
|---|---|
| Thesis Title: | 行動網路之合作式卷積神經網路佈署 (Cooperative Convolutional Neural Network Deployment over Mobile Networks) |
| Advisors: | 陳文村 Chen, Wen-Tsuen; 許健平 Sheu, Jang-Ping |
| Committee Members: | 楊得年 Yang, De-Nian; 郭建志 Kuo, Jian-Jhih |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
| Year of Publication: | 2020 |
| Academic Year: | 108 |
| Language: | English |
| Pages: | 41 |
| Keywords: | Edge Computing, Convolutional Neural Network, Mobile Network |
With the rise of artificial intelligence (AI) in recent years, many applications (e.g., face recognition and object detection) have achieved higher accuracy through Convolutional Neural Networks (CNNs). Although CNNs perform well in image recognition, their heavy computation has long been a bottleneck for response time, especially when they are deployed on edge devices with limited computing power (e.g., smartphones and IoT devices). Many studies reduce the response time via Fog Computing and Mobile Edge Computing, and distributed deployment techniques based on Deep Neural Network model/data partition have been developed in this setting.

However, a careless model/data partition scheme may increase rather than decrease the response time. Selecting the best number of edge servers to assist the computation, and performing appropriate data partition and neural network model deployment, is therefore a major challenge.

In this thesis, we study an optimization problem, named CONVENE, whose objective is to minimize the response time of CNN inference. To understand the fundamental properties of the problem, we first simplify CONVENE to CONVENE-SC and design an algorithm, named THREAD-SC, for the case where data is transmitted over a single channel. Then, to solve CONVENE, we extend the idea of THREAD-SC and propose another algorithm, named THREAD, which jointly considers the choice of the number of edge servers and the appropriate model and data allocation when data can be transmitted over multiple channels, so as to achieve a shorter response time. Simulation results show that our algorithm (THREAD) reduces the response time by about 50% compared with other approaches, and real response-time measurements roughly match the simulations.
Inference acceleration has drawn much attention as a way to meet the real-time requirements of artificial intelligence (AI) applications. Typically, the entire AI model is offloaded to the cloud for acceleration, which may impose a heavy workload on the backhaul networks and the cloud. To alleviate this workload, model partition for Convolutional Neural Networks (CNNs) has been proposed to exploit parallel and distributed computing units (e.g., mobile edge servers). Previous works concentrated on load balancing among servers but tended to overlook the interplay between computing and communication. This makes the existing approaches less efficient, especially in mobile edge networks, where smart devices are usually equipped with limited computing capacity and therefore must offload tasks over limited bandwidth to nearby servers. On the other hand, smart devices may have one or multiple available channels that can be used to accelerate transmission in parallel. Therefore, in this thesis, we propose a new system and formulate a new optimization problem, Cooperative CNN Deployment over Mobile Network Edges (CONVENE), to minimize the inference completion time for smart devices over one or more available channels. To explore its intrinsic properties, we first study CONVENE with a single channel and derive an algorithm, termed Time-Length-Aware Cooperator Selection and Model Partition Algorithm with Single Channel (THREAD-SC), which uses cumulative server laws to determine a suitable number of servers and an elastic model partition, yielding the optimal solution. Next, an extension, the Time-Length-Aware Cooperator Selection and Model Partition Algorithm (THREAD), is proposed to subtly utilize multiple channels to further reduce the completion time.
The simulation and implementation results show that our system, DeepCo, running our algorithm THREAD, reduces the total completion time by about 50% compared with naive solutions and works effectively in practice.
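The core tension the abstract describes — that offloading shares must travel over a bandwidth-limited channel before a server can compute them, so pure compute load balancing is suboptimal — can be illustrated with a toy calculation. The sketch below is not the thesis's THREAD algorithm; it is a minimal model, with made-up rates and sizes, of partitioning one inference workload across two edge servers whose input shares are serialized on a single shared channel.

```python
# Illustrative sketch (assumed toy model, not THREAD): each server's data
# share must be uploaded over one shared channel, in order, before that
# server can start computing its share of the work.

def completion_time(shares, compute_rates, bandwidth, data_size, flops):
    """Completion time of a data partition.

    shares        -- fraction of the input assigned to each server (sums to 1)
    compute_rates -- per-server compute speed in FLOP/s
    bandwidth     -- shared uplink in bytes/s (transmissions serialized)
    data_size     -- total input size in bytes
    flops         -- total compute work in FLOPs
    """
    t = 0.0
    finish = []
    for share, rate in zip(shares, compute_rates):
        t += share * data_size / bandwidth        # wait for this share to upload
        finish.append(t + share * flops / rate)   # then compute it locally
    return max(finish)  # inference is done when the slowest server finishes

# Two servers, the second twice as fast (hypothetical numbers).
rates = [1e9, 2e9]               # FLOP/s
bw, size, work = 1e8, 1e6, 1e9   # bytes/s, bytes, FLOPs

equal = completion_time([0.5, 0.5], rates, bw, size, work)
skewed = completion_time([1/3, 2/3], rates, bw, size, work)
print(equal, skewed)  # the compute-aware split finishes sooner
```

Even this toy model shows why the number of cooperating servers and the partition sizes must be chosen together: the equal split leaves the slower server as a straggler, while a split that accounts for both upload order and compute speed lowers the overall completion time.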