
Author: Yang, Tien-Chi (楊天琪)
Title: Accelerate Deep Learning Model Training with Distributed Datasets in Edge Clouds (在邊緣雲加速分散資料的深度學習模型訓練)
Advisor: Chou, Jerry (周志遠)
Committee Members: 李哲榮, 賴冠州
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Graduation Academic Year: 108
Language: English
Number of Pages: 22
Chinese Keywords: distributed deep learning, edge computing, edge artificial intelligence
English Keywords: distributed computing system, edge computing, artificial intelligence
  • In this master's thesis, we study the behavior of deep learning in edge computing.
    When distributed deep learning training is run on an edge computing system, three
    problems degrade overall performance. First, the amount of data held by each edge
    node differs. Second, the distances between nodes make the network speeds inside
    the edge system uneven. Third, the nodes differ in hardware and in remaining
    computing resources, so their computing capabilities are not equal. These problems
    make synchronous distributed deep learning training inefficient, so we propose two
    data movement algorithms, one based on integer linear programming and one based on
    a greedy method, to spread the workload of the edge computing system evenly and
    thereby improve efficiency (a sketch of the greedy idea is given below). In our
    simulated environment, our methods speed up overall training by roughly 3x, and in
    the real edge computing environment we built, they improve training speed by about
    40%.
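    The record does not include the greedy algorithm itself, so the following is only a
    minimal Python sketch of the idea described above: repeatedly move a batch of samples
    from the node with the longest estimated epoch time to the node with the shortest.
    The function name greedy_balance, the batch size, and the node profiles are
    illustrative assumptions, not the thesis's actual algorithm or numbers.

        # Hypothetical greedy data-movement heuristic (not the thesis's exact algorithm):
        # move batches of samples from the slowest node to the fastest node until
        # another move would no longer help.
        def greedy_balance(data, speed, batch=100, max_moves=10000):
            """data[i]: samples on node i; speed[i]: training throughput of node i (samples/sec)."""
            data = list(data)
            moves = []                                   # list of (src, dst, #samples)
            for _ in range(max_moves):
                # estimated per-epoch compute time of every node
                times = [d / s for d, s in zip(data, speed)]
                src = max(range(len(data)), key=lambda i: times[i])
                dst = min(range(len(data)), key=lambda i: times[i])
                step = min(batch, data[src])
                if src == dst or step == 0:
                    break
                # stop once moving another batch would overshoot the balance point
                if (data[src] - step) / speed[src] < (data[dst] + step) / speed[dst]:
                    break
                data[src] -= step
                data[dst] += step
                moves.append((src, dst, step))
            return data, moves

        # Assumed node profiles: node 0 holds most of the data, node 2 has the fastest hardware.
        balanced, plan = greedy_balance(data=[6000, 1000, 3000],
                                        speed=[100.0, 50.0, 200.0])
        print(balanced, len(plan))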


    In this thesis, we investigate deep learning training in edge computing systems and observe three problems that cause workload imbalance and make model training inefficient: first, the amount of data on each node differs from the others; second, the computing capability of each edge node differs; and third, the network bandwidth between edge nodes differs. We propose two data movement strategies, an integer linear programming (ILP) method and a greedy method, to rebalance the workload before training. In our simulation experiments, the proposed methods achieve almost 3x faster training than the baseline without data transfer, and in our real-world experiments they are 40% faster than the baseline.
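    The ILP formulation is not given in this record, so the sketch below only illustrates,
    using the PuLP solver, how a data-movement ILP for this setting could be written:
    integer variables x[i][j] count samples moved from node i to node j, and the objective
    minimizes the slowest node's estimated per-epoch time after the moves. All sample
    counts, node speeds, and per-sample transfer times are assumed values, not numbers
    from the thesis.

        # Hypothetical ILP sketch (PuLP) for balancing per-epoch time by moving
        # training samples between edge nodes; NOT the thesis's exact formulation.
        # Assumed inputs: data[i] = samples on node i, speed[i] = samples/sec,
        # xfer[i][j] = seconds to move one sample from node i to node j.
        import pulp

        data  = [6000, 1000, 3000]            # assumed sample counts per node
        speed = [100.0, 50.0, 200.0]          # assumed training throughput per node
        xfer  = [[0.00, 0.02, 0.05],
                 [0.02, 0.00, 0.03],
                 [0.05, 0.03, 0.00]]          # assumed per-sample transfer times
        n = len(data)

        prob = pulp.LpProblem("edge_data_balance", pulp.LpMinimize)
        # x[i][j]: integer number of samples moved from node i to node j
        x = [[pulp.LpVariable(f"x_{i}_{j}", lowBound=0, cat="Integer")
              for j in range(n)] for i in range(n)]
        T = pulp.LpVariable("epoch_time", lowBound=0)   # makespan to minimize
        prob += T                                       # objective: minimize T

        for i in range(n):
            out_i = pulp.lpSum(x[i][j] for j in range(n) if j != i)
            in_i  = pulp.lpSum(x[j][i] for j in range(n) if j != i)
            # a node cannot send away more samples than it holds
            prob += out_i <= data[i]
            # compute time after movement plus outgoing transfer time must fit in T
            prob += (data[i] - out_i + in_i) * (1.0 / speed[i]) + \
                    pulp.lpSum(xfer[i][j] * x[i][j] for j in range(n) if j != i) <= T

        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        for i in range(n):
            for j in range(n):
                if i != j and x[i][j].value():
                    print(f"move {int(x[i][j].value())} samples from node {i} to node {j}")
        print("estimated epoch time:", pulp.value(T))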

    Introduction ----- 1
    Related work ----- 4
    Methodology ----- 6
        System model ----- 6
        The integer linear programming method ----- 7
        The Greedy method ----- 8
    Evaluation ----- 11
        Experiment setup ----- 11
        Simulation Environment Evaluation ----- 12
        Real-World Raspberry Pi Evaluation ----- 15
    Conclusion ----- 19
    References ----- 21

