
Graduate Student: Lee, Chia-Chi (李佳齊)
Thesis Title: A Systematic Reuse-distance-based Approach for Convolution Neural Network Dataflow Optimization
(Chinese title: 基於「資訊再使用距離」之分析的系統化卷積類神經網路資料流優化方法)
Advisor: Tsay, Ren-Song (蔡仁松)
Committee Members: Mak, Wai-Kei (麥偉基); Ho, Tsung-Yi (何宗易); Liu, Ren-Shuo (呂仁碩)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Graduation Academic Year: 108 (ROC calendar)
Language: English
Number of Pages: 43
Keywords (Chinese): Convolutional Neural Network; Dataflow Optimization; Data Reuse Distance
Keywords (English): Neural, Dataflow, Reuse-distance
Abstract:

    In this paper, we propose an efficient CNN dataflow optimization approach that leverages the reuse-distance method to quantify the data locality of different dataflows. A unique contribution of this quantitative approach is that the optimal dataflow of convolution computation can be derived systematically for different memory architectures in the early system design phase. In addition, the method applies to both general-purpose and customized processors. We validate the approach by comparing it against the best published results for specific processors: it finds the optimal dataflow in under one second and runs three to four orders of magnitude faster than previous works. To demonstrate its generality, the experimental results also show that the approach produces the optimal convolution dataflow on other memory architectures simulated by DineroIV.
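    To make the reuse-distance idea concrete, the sketch below generates the address trace of one possible tiled convolution loop order and counts the accesses whose reuse distance (the number of distinct addresses touched since the previous use of the same address, as in Mattson et al. [29]) reaches a fully associative LRU cache capacity. This is a minimal illustration under stated assumptions, not the thesis's actual tool: the single-channel convolution, the tile sizes, the cache model, and the function names conv_trace and miss_count are all invented for this example.

    # Minimal sketch (see assumptions above): rank tiled convolution dataflows
    # by how many of their accesses have a reuse distance that exceeds a fully
    # associative LRU cache capacity.
    from collections import OrderedDict

    def conv_trace(H=8, W=8, K=3, tile=4):
        """Yield the symbolic addresses touched by a single-channel KxK
        convolution over an HxW input, tiled over the output rows/columns."""
        for tr in range(0, H - K + 1, tile):               # output-row tiles
            for tc in range(0, W - K + 1, tile):           # output-column tiles
                for r in range(tr, min(tr + tile, H - K + 1)):
                    for c in range(tc, min(tc + tile, W - K + 1)):
                        for kh in range(K):
                            for kw in range(K):
                                yield ("in", r + kh, c + kw)   # input element
                                yield ("wt", kh, kw)           # weight element
                        yield ("out", r, c)                    # output element

    def miss_count(trace, capacity):
        """Stack-distance simulation: an access hits iff its reuse distance
        (distinct addresses since its last use) is smaller than `capacity`."""
        stack, misses = OrderedDict(), 0
        for addr in trace:
            if addr in stack:
                # Depth of addr in the LRU stack equals its reuse distance.
                if list(reversed(stack)).index(addr) >= capacity:
                    misses += 1
                stack.move_to_end(addr)                    # addr is now most recent
            else:
                misses += 1                                # cold (first-use) miss
                stack[addr] = True
        return misses

    # Compare two hypothetical tile sizes for a small on-chip buffer.
    for t in (2, 4):
        print(f"tile={t}: external accesses ~ {miss_count(conv_trace(tile=t), capacity=32)}")

    Comparing such miss counts across candidate loop orders and tile sizes illustrates the locality ranking the abstract describes; the sketch is far cruder and slower than the approach reported in the thesis and should be read only as an illustration of the metric being compared.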

Table of Contents:
    I. Introduction 5
    II. Preliminary 8
        A. CNN Basics and Notations 8
        B. Related Work 10
            1) Convolution Computation with Rank-lowering 10
            2) Convolution Computation with Tiling 12
            3) Data Reuse Pattern of Convolution Computation 14
            4) Reuse-distance Approach 15
    III. Method 17
        A. Overview 17
        B. The Loop Order of Convolution Computation with Tiling 19
        C. Quantifying the Data Locality of Convolution Computation with Tiling 21
        D. Quantify the Data Locality of Convolution Computation with Rank-lowering 30
        E. Evaluate the Number of External Memory Accesses 32
    IV. Experimental Results 34
        A. Experiment Setup 34
        B. Verifying Our Approach on Published Memory Architecture 34
        C. Verifying Our Approach on Simulated Cache 36
    V. Conclusion 40
    VI. Reference 41

    [1] Jason Cong and Bingjun Xiao. “Minimizing computation in convolutional neural networks.” in Proc. Int. Conf. Artif. Neural Netw. (ICANN), pp. 281-290, 2014.
    [2] Wonkyung Jung, Daejin Jung, Byeongho Kim, Sunjung Lee, Wonjong Rhee, and Jung Ho Ahn. “Restructuring Batch Normalization to Accelerate CNN Training.”, CoRR, vol. abs/1807.01702, 2018.
    [3] Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze. “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks.”, in Proc. IEEE Int'l Solid-State Circuits Conf. (ISSCC), pp. 262-263, 2016.
    [4] Song Han, Huizi Mao, and William J. Dally. “Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding.”, in Int. Conf. Learning Representations (ICLR), 2016.
    [5] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. “Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1.”, CoRR, vol. abs/1511.00363, 2015.
    [6] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.”, CoRR, vol. abs/1603.05279, 2016.
    [7] Philipp Gysel, Mohammad Motamedi, and Soheil Ghiasi. “Hardware-oriented Approximation of Convolutional Neural Networks.”, CoRR, vol. abs/1605.06402, 2016.
    [8] Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, and Jian Cheng. “Quantized Convolutional Neural Networks for Mobile Devices.”, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pp. 4820-4828, 2016.
    [9] Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen. “Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights.”, in Proc. ICLR, 2017.
    [10] Song Han, Jeff Pool, John Tran, and William J. Dally. “Learning both Weights and Connections for Efficient Neural Networks.”, in Advances in Neural Information Processing Systems, 2015.
    [11] Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. “Pruning Filters for Efficient ConvNets.”, in ICLR, pages 1-13, 2017.
    [12] Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. “Pruning Convolutional Neural Networks for Resource Efficient Inference.”, in ICLR, pages 1-17, 2017.
    [13] Yihui He, Xiangyu Zhang, and Jian Sun. “Channel Pruning for Accelerating Very Deep Neural Networks.”, in International Conference on Computer Vision (ICCV), vol. 2, p. 6, 2017.
    [14] Jian-Hao Luo, Jianxin Wu, and Weiyao Lin. “ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression.”, in The IEEE International Conference on Computer Vision (ICCV), 2017.
    [15] Kumar Chellapilla, Sidd Puri, and Patrice Simard. “High Performance Convolutional Neural Networks for Document Processing.”, in Tenth International Workshop on Frontiers in Handwriting Recognition, October 2006.
    [16] Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. “cuDNN: Efficient Primitives for Deep Learning.”, CoRR, vol. abs/1410.0759, 2014.
    [17] Minsik Cho and Daniel Brand. “MEC: Memory-efficient Convolution for Deep Neural Network.”, in Proc. Int. Conf. Mach. Learn. (ICML), Sydney, NSW, Australia, pp. 815-824, 2017.
    [18] Aravind Vasudevan, Andrew Anderson, and David Gregg. “Parallel Multi Channel Convolution using General Matrix Multiplication.”, in 28th IEEE International Conference on Application-specific Systems, Architectures and Processors, ASAP 2017.
    [19] Andrew Anderson, Aravind Vasudevan, Cormac Keane, David Gregg. “Low-memory GEMM-based Convolution Algorithms for Deep Neural Networks.”, arXiv preprint arXiv:1709.03395, 2017.
    [20] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet Classification with Deep Convolutional Neural Networks.”, in NIPS, 2012.
    [21] Karen Simonyan and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition.”, in Proc. ICLR, 2015.
    [22] Joseph Redmon and Ali Farhadi. “YOLO9000: Better, Faster, Stronger.”, in CVPR, 2017.
    [23] Jiyuan Zhang, Franz Franchetti, Tze Meng Low. “High Performance Zero-Memory Overhead Direct Convolutions.”, in ICML, 2018.
    [24] Xuan Yang, Jing Pu, Blaine Burton Rister, Nikhil Bhagdikar, Stephen Richardson, Shahar Kvatinsky, Jonathan Ragan-Kelley, Ardavan Pedram and Mark Horowitz. “A Systematic Approach to Blocking Convolutional Neural Networks.”, arXiv, 2016.
    [25] Yufei Ma, Yu Cao, Sarma Vrudhula, and Jae-sun Seo. “Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks.”, in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays (FPGA), pp. 45-54, 2017.
    [26] Fengbin Tu, Shouyi Yin, Peng Ouyang, Shibin Tang, Leibo Liu, and Shaojun Wei. “Deep Convolutional Neural Network Architecture with Reconfigurable Computation Patterns.”, IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 25, no. 8, pp. 2220-2233, 2017.
    [27] Arthur Stoutchinin, Francesco Conti, Luca Benini. “Optimally Scheduling CNN Convolutions for Efficient Memory Access.”, CoRR, vol. abs/1902.01492, 2019.
    [28] Marian Verhelst and Bert Moons. “Embedded Deep Neural Network Processing: Algorithmic and Processor Techniques Bring Deep Learning to IoT and Edge Devices.”, IEEE Solid-State Circuits Magazine, 9(4):55-65, 2017.
    [29] Richard L. Mattson et al. “Evaluation Techniques for Storage Hierarchies.”, IBM Systems Journal, vol. 9, no. 2, pp. 78-117, 1970.
    [30] Cheng-Lin Tsai, et al. “A Fast-and-Effective Early-Stage Multi-level Cache Optimization Method based on Reuse-Distance Analysis.” National Tsing Hua University, 2016.
    [31] DineroIV cache simulator, http://www.cs.wisc.edu/~markhill/DineroIV
