| Field | Value |
|---|---|
| Student | Lin, Yu-Hsuan (林郁軒) |
| Thesis Title | A Spiking Neural Network (SNN) Accelerator with Temporal Parallel Dataflow and Efficient Synapse Memory Compression Mechanism (具備時間平行數據流和高效突觸記憶體壓縮的突波神經網路加速器) |
| Advisor | Tang, Kea-Tiong (鄭桂忠) |
| Committee Members | Hsieh, Chih-Cheng (謝志成); Hsieh, Ping-Hsuan (謝秉璇); Lu, Chih-Cheng (盧峙丞) |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2024 |
| Academic Year of Graduation | 112 (ROC calendar) |
| Language | Chinese |
| Pages | 55 |
| Keywords (Chinese) | spiking neural network; accelerator; time-step-parallel computation; synapse compression; input-signal sparsity awareness |
| Keywords (English) | Spiking neural networks (SNNs); Weight sparsity |
Spiking neural networks (SNNs), the third generation of artificial neural network models, operate in a way that is closer to biological neurons than traditional artificial neural networks (ANNs). SNNs convey information through binary input and output spikes, mimicking how neurons transmit spikes across time and space. Because only the occurrence of a spike needs to be recorded, SNNs are particularly well suited to complex spatiotemporal data and can save substantial computational energy on edge hardware. Neuromorphic computing is regarded as a promising direction for machine learning, offering a new approach to cognitive computation, and the demand for hardware accelerators that execute SNNs is growing rapidly.

The inherent high sparsity of SNNs, together with the low power consumption of their event-driven computation, makes them a natural fit for edge devices with stringent energy budgets; resource-constrained mobile devices further demand a small storage footprint. Compared with traditional ANNs, SNNs excel at processing complex data with temporal correlations. However, typical SNN chips repeatedly fetch the same data when computing successive time steps, which leads to high energy consumption.
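To make the per-time-step cost concrete, here is a minimal NumPy sketch of a leaky integrate-and-fire (LIF) layer evaluated one time step at a time (the function name, leak model, and parameters are illustrative assumptions, not taken from the thesis). Note how the loop streams the entire weight matrix `W` once per time step; this is the repeated-access pattern described above.

```python
import numpy as np

def lif_layer_naive(spikes_in, W, v_th=1.0, leak=0.9):
    """Per-time-step LIF layer (illustrative sketch, not the thesis design).

    spikes_in: (T, N_in) binary spike raster
    W:         (N_in, N_out) synaptic weight matrix
    Returns a (T, N_out) binary output raster.
    """
    T = spikes_in.shape[0]
    n_out = W.shape[1]
    v = np.zeros(n_out)                       # membrane potentials
    spikes_out = np.zeros((T, n_out), dtype=np.uint8)
    for t in range(T):                        # streams all of W every step
        v = leak * v + spikes_in[t] @ W       # accumulate weighted input spikes
        fired = v >= v_th                     # threshold crossing
        spikes_out[t] = fired
        v[fired] = 0.0                        # reset neurons that fired
    return spikes_out
```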
This thesis proposes a hardware accelerator architecture featuring a time-step-parallel, Temporally Parallel Weight-Friendly (TPWF) dataflow and an efficient synaptic memory structure, combined with a spike-sparsity-aware strategy built around a fired-neuron weight-search circuit. Together, these techniques deliver lower energy consumption, reduced hardware resource usage, and more energy-efficient membrane-potential accumulation. Applied to a fully connected 256-128-128-10 network classifying 16×16 MNIST images, the proposed architecture achieves an energy efficiency of 0.2 pJ/SOP, up to a 1.9× speedup, and a 2× reduction in memory accesses.
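The sketch below illustrates, in the same NumPy style, the reuse-and-skip principle suggested by the abstract: fetch each presynaptic weight row once, apply it to every time step in which that neuron fired, and skip neurons that never fire. This is only a software analogy under the assumption of a leak-free neuron; it is not the thesis's TPWF hardware or its compressed synapse memory layout.

```python
import numpy as np

def lif_layer_weight_reuse(spikes_in, W, v_th=1.0):
    """Time-step-parallel accumulation with fired-neuron weight search
    (software analogy only; leak omitted so per-step inputs are independent).
    """
    T = spikes_in.shape[0]
    n_out = W.shape[1]
    contrib = np.zeros((T, n_out))            # per-step synaptic input
    weight_row_reads = 0
    for i in range(spikes_in.shape[1]):       # loop over presynaptic neurons
        fired_steps = np.flatnonzero(spikes_in[:, i])
        if fired_steps.size == 0:
            continue                          # sparsity: skip silent neurons
        row = W[i]                            # one weight fetch for all T steps
        weight_row_reads += 1
        contrib[fired_steps] += row           # reuse the row across time steps
    v = np.zeros(n_out)                       # sequential threshold/reset pass
    spikes_out = np.zeros((T, n_out), dtype=np.uint8)
    for t in range(T):
        v += contrib[t]
        fired = v >= v_th
        spikes_out[t] = fired
        v[fired] = 0.0
    return spikes_out, weight_row_reads
```

With T time steps, the naive loop above reads all of `W` T times, whereas this version performs at most one row fetch per active presynaptic neuron, in the same spirit as the reported 2× reduction in memory accesses.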