| Graduate Student | 阮郁善 (Ruan, Yu-Shan) |
|---|---|
| Thesis Title | A High-throughput Low-latency Inner Product Engine for Small-batch Inference of Deep Learning Recommendation Models |
| Advisor | 林永隆 (Lin, Youn-Long) |
| Committee Members | 黃俊達, 陳建文, 郭皇志 |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science |
| Publication Year | 2021 |
| Academic Year of Graduation | 109 |
| Language | English |
| Pages | 30 |
| Keywords | Deep learning, Recommendation system, Accelerator, FPGA, Small-batch |
Recommendation systems are widely deployed across industries; built on a user's past behavior, they can precisely surface appealing products or advertisements. With deep learning achieving outstanding results in many fields in recent years, a growing research community has turned to applying deep neural networks to recommendation. The state-of-the-art, open-source deep learning recommendation model (DLRM) lets researchers contribute their expertise to issues such as system efficiency, performance, and workload capacity.
Memory-resource constraints are the greatest practical obstacle to deploying deep learning, and most studies adopt large-batch-oriented architectures to relieve memory-bandwidth demand. Under low-latency, small-batch constraints, however, such architectures suffer from low system utilization. This thesis therefore proposes a high-throughput, low-latency inner product engine (RecIP) for DLRM accelerators; it strikes a suitable balance between hardware-resource limits and high memory-bandwidth demand, and meets low-latency specifications for small-batch DLRM inference.
On an Intel Stratix 10 FPGA at 100 MHz, the RecIP engine reaches up to 819.2 GOP/s with nearly 90% system utilization, using 50% of the logic and 19% of the on-chip memory resources. Compared with a large-batch-oriented accelerator, it spends only 3% more of each of the logic and memory resources to achieve high utilization at small batch sizes.
Recommendation systems are used in a wide range of business applications. Based on a user's records, they predict his or her rating of, or preference for, an item. Deep learning has achieved superior performance in diverse applications, and an open-source deep learning recommendation model (DLRM) is gaining popularity in both academia and industry. A growing research community contributes to optimizing the efficiency, performance, and workload capacity of deep-learning-based recommendation systems. However, most researchers adopt traditional DNN accelerators or batch-oriented architectures to reduce memory traffic.
To address DLRM's unique compute and memory characteristics, we propose a latency-aware, high-throughput Inner Product Engine (RecIP). It balances hardware-resource limitations against high memory-bandwidth requirements, and supports a low-latency DLRM accelerator for small-query inference.
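The inner products that such an engine accelerates arise in DLRM's feature-interaction stage, where dense and embedding feature vectors are dotted pairwise. A minimal NumPy sketch of that operation (illustrative only; the function name and shapes are our assumptions, not taken from the thesis):

```python
import numpy as np

def feature_interaction(vectors):
    """Pairwise inner products among feature vectors, as in DLRM's
    interaction stage. `vectors` has shape (num_features, dim)."""
    z = vectors @ vectors.T                  # all pairwise dot products
    iu = np.triu_indices(len(vectors), k=1)  # keep each unordered pair once
    return z[iu]                             # shape: (n*(n-1)/2,)

# tiny example: 3 feature vectors of dimension 4
v = np.arange(12, dtype=np.float32).reshape(3, 4)
out = feature_interaction(v)
print(out)  # [ 38.  62. 214.]
```

For a batch, this matrix product is repeated per query, which is why small batches leave a batch-oriented matrix engine underutilized.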
Implemented on an Intel Stratix 10 FPGA, the RecIP engine achieves 819.2 GOP/s running at 100 MHz. It attains 90% utilization of its computing resources while using 50% of the logic resources, only 3% more than a batch-oriented architecture.
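As a back-of-envelope consistency check of the reported peak (the MAC count below is not stated in the abstract; 4096 is simply the value implied by the numbers, under the common convention that one multiply-accumulate counts as two operations):

```python
# 819.2 GOP/s at 100 MHz implies 8192 ops/cycle,
# i.e. 4096 parallel multiply-accumulate units.
clock_hz = 100e6           # 100 MHz
macs = 4096                # implied MAC count (our assumption)
ops_per_cycle = macs * 2   # multiply + add per MAC per cycle
peak_gops = clock_hz * ops_per_cycle / 1e9
print(peak_gops)           # 819.2
```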