| Graduate Student: | 盧允凡 Lu, Yun-Fan |
|---|---|
| Thesis Title: | 基於權重剪枝下的深度學習硬體加速器之工作排程問題 (Job Scheduling Based on Weight Pruning in a Deep Learning Accelerator) |
| Advisor: | 黃婷婷 Hwang, TingTing |
| Committee Members: | 吳中浩 Wu, Allen C.-H.; 黃稚存 Huang, Chih-Tsun |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 - 資訊系統與應用研究所 (Institute of Information Systems and Applications, College of Electrical Engineering and Computer Science) |
| Year of Publication: | 2020 |
| Academic Year of Graduation: | 108 (ROC calendar) |
| Language: | 英文 English |
| Number of Pages: | 39 |
| Keywords (Chinese): | 深度學習加速器、權重剪枝、排程問題、卷積神經網路 |
| Keywords (English): | Deep Learning Accelerator, Weight Pruning, Scheduling Problem, Convolutional Neural Network |
Abstract (translated from the Chinese original): Deep Learning (DL) has achieved breakthroughs in many fields, but high-performance computation is the key to realizing artificial-intelligence applications. Prior work has found that deep neural networks (DNNs) contain many weights that are zero or very close to zero. When designing a deep learning hardware accelerator, removing these weights, known as weight pruning, can greatly improve computational efficiency. However, even for the same neural network model, the model parameters differ across applications. These differences lead to different hardware designs and therefore different job scheduling requirements. To reduce hardware design time and cost, it is important to analyze and derive an appropriate job schedule automatically. Based on weight pruning, we formulate an optimization problem over hardware resources, examine performance metrics of the hardware architecture, and propose a method for solving the job scheduling problem.
Abstract (English): Deep Learning (DL) has achieved major breakthroughs in many fields, and many innovative DL applications require efficient computation. Previous work has found that deep neural networks (DNNs) contain many weights that are zero or near zero. These weights can be removed, i.e., weight pruning, to improve the computational efficiency of the network. Moreover, even for the same network architecture, the model parameters vary from one application to another, which complicates hardware design and job scheduling. Thus, an automated method for analyzing and supporting the hardware accelerator design flow may be helpful. In this work, we study an optimization problem based on weight pruning, discuss the performance of the hardware design, and propose a solution to a job scheduling problem.
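The abstracts describe removing weights that are zero or near zero so that the accelerator can skip the corresponding computation. As a rough, minimal sketch of that general idea, and not the specific pruning procedure, threshold, or hardware mapping used in this thesis, the following Python/NumPy snippet zeroes out small-magnitude weights and reports the resulting sparsity; the threshold value and tensor shape are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Zero out weights whose magnitude falls below `threshold`.

    A generic magnitude-based pruning sketch; the threshold is illustrative,
    not the criterion used in the thesis.
    """
    mask = np.abs(weights) >= threshold
    return weights * mask

def sparsity(weights: np.ndarray) -> float:
    """Fraction of weights that are exactly zero after pruning."""
    return float(np.mean(weights == 0.0))

# Example: prune a hypothetical convolutional kernel tensor (out, in, kh, kw).
rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.05, size=(64, 32, 3, 3))
pruned = magnitude_prune(kernel, threshold=0.05)
print(f"sparsity after pruning: {sparsity(pruned):.2%}")
```

In hardware, the benefit comes from skipping the multiply-accumulate operations and storage associated with the zeroed weights, which is the efficiency gain the abstract refers to and what makes the resulting workload uneven enough to require job scheduling.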