| Student: | 黃上科 Huang, Shang-Ke |
| --- | --- |
| Thesis title: | 平行簡化群體演算法與混合梯度下降法於CUDA架構之實現 / Parallel Simplified Swarm Optimization and Hybrid Gradient Descent Implemented in CUDA |
| Advisor: | 葉維彰 Yeh, Wei-Chang |
| Oral defense committee: | 賴鵬仁 Lai, Peng-Jen; 賴智明 Lai, Chyh-Ming |
| Degree: | Master |
| Department: | College of Engineering, Department of Industrial Engineering and Engineering Management |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 |
| Language: | English |
| Number of pages: | 64 |
| Keywords (Chinese): | 圖形處理器、平行運算、智能群體演算法、簡化群體演算法、梯度下降法、統一計算架構 |
| Keywords (English): | Graphics Processing Unit (GPU), Parallelism, Swarm Intelligence Algorithms (SIAs), Simplified Swarm Optimization (SSO), Gradient Descent (GD), Compute Unified Device Architecture (CUDA) |
Abstract (translated from the Chinese):

As the acquisition cost of graphics processing units (GPUs) falls year by year, as their computing power advances substantially, and as their cross-domain support improves, optimization problems that require heavy parallel processing can now be handled on a personal computer. Within optimization computing, the data characteristics of swarm intelligence algorithms (SIAs) and of the gradient descent method suit them to parallelization, yet no prior work has proposed a simplified swarm optimization (SSO) that runs on the GPU, nor an easily implemented hybrid gradient descent method. Taking into account limited computing resources, algorithmic generality, and implementation difficulty, this study therefore proposes a parallel simplified swarm optimization (PSSO) that runs on the Compute Unified Device Architecture (CUDA), together with a gradient-based hybrid method, G-SOA.

Regarding PSSO: studies that port other SIAs from the CPU to the GPU share two shortcomings. First, the theoretical time complexity of fitness evaluation is O(tNm), where t is the number of iterations and each of the N fitness evaluations requires m pairwise comparisons. Second, pBests and gBest suffer from resource preemption while being updated. This study proposes PSSO to remedy both shortcomings; the results show that the time complexity is reduced by an order of magnitude in N and that the resource-preemption problem is avoided entirely.
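One way to read the complexity claim, as a sketch of the arithmetic under the assumption that PSSO assigns one GPU thread to each of the N fitness evaluations of an iteration:

```latex
% Serial baseline: t iterations, N fitness evaluations per iteration,
% m pairwise comparisons per evaluation.
% With one thread per evaluation (at least N concurrent threads), the N
% evaluations of an iteration finish in a single parallel step, so the
% factor N leaves the critical path:
\[
  T_{\text{serial}} = O(t\,N\,m)
  \quad\longrightarrow\quad
  T_{\text{parallel}} = O(t\,m).
\]
```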
Regarding G-SOA: the gradient descent method is guaranteed to converge to a local optimum, but it cannot guarantee convergence to the global optimum. This study therefore employs ea-mutation together with a hybrid-gradient Search method, so that particles that have fallen into a local optimum can escape it. The results show that G-SOA finds better solutions than the original gradient descent method does.

Finally, this study validates the solution quality and running speed of the proposed methods on a variety of optimization problems.
Abstract (English):

As the acquisition cost of graphics processing units (GPUs) has decreased, personal computers (PCs) can nowadays handle optimization problems. In optimization computing, swarm intelligence algorithms (SIAs) and the gradient descent method are suitable for parallelization. However, neither a GPU-based simplified swarm optimization (SSO) nor an easily implemented hybrid gradient method had been proposed. Accordingly, this thesis proposes PSSO and G-SOA on the CUDA platform, considering computational ability and versatility.

In PSSO, prior GPU ports of SIAs show two shortcomings. First, the theoretical time complexity of the fitness function is O(tNm): there are t iterations and N fitness evaluations, each of which requires m pairwise comparisons. Second, pBests and gBest contend for resources (resource preemption) when they are updated. As the experiments show, PSSO reduces the time complexity by an order of magnitude in N and avoids the resource-preemption problem entirely.
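To make the two fixes concrete, here is a minimal CUDA sketch, not the thesis's actual PSSO kernels: one thread evaluates one solution, so the factor N moves off the serial path, and gBest is selected by a tree reduction in shared memory rather than by threads racing to overwrite a shared best value. The sphere fitness, the problem sizes, and all identifiers are illustrative assumptions.

```cuda
// Hypothetical sketch (not the thesis's PSSO kernels). N, DIM, the sphere
// fitness, and all identifiers are assumptions for illustration.
#include <cstdio>

#define N   256   // number of solutions; one block of N threads for simplicity
#define DIM 32    // problem dimension

// Example fitness: sphere function f(x) = sum_i x_i^2.
__device__ float sphere(const float *x) {
    float s = 0.0f;
    for (int d = 0; d < DIM; ++d) s += x[d] * x[d];
    return s;
}

__global__ void evaluateAndReduce(const float *pos, float *bestFit, int *bestIdx) {
    __shared__ float fit[N];
    __shared__ int   idx[N];
    const int i = threadIdx.x;

    // One thread per solution: the N evaluations of an iteration run
    // concurrently instead of in an O(N) serial loop.
    fit[i] = sphere(&pos[i * DIM]);
    idx[i] = i;
    __syncthreads();

    // Tree reduction for the minimum: no thread ever writes a slot another
    // thread is reading in the same pass, so there is no preemption or race.
    for (int stride = N / 2; stride > 0; stride >>= 1) {
        if (i < stride && fit[i + stride] < fit[i]) {
            fit[i] = fit[i + stride];
            idx[i] = idx[i + stride];
        }
        __syncthreads();
    }
    if (i == 0) { *bestFit = fit[0]; *bestIdx = idx[0]; }
}

int main() {
    float *dPos, *dBestFit; int *dBestIdx;
    cudaMalloc(&dPos,     N * DIM * sizeof(float));
    cudaMalloc(&dBestFit, sizeof(float));
    cudaMalloc(&dBestIdx, sizeof(int));

    // Deterministic dummy positions instead of a real swarm update.
    static float h[N * DIM];
    for (int j = 0; j < N * DIM; ++j) h[j] = (j % 7) - 3.0f;
    cudaMemcpy(dPos, h, sizeof(h), cudaMemcpyHostToDevice);

    evaluateAndReduce<<<1, N>>>(dPos, dBestFit, dBestIdx);

    float bf; int bi;
    cudaMemcpy(&bf, dBestFit, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&bi, dBestIdx, sizeof(int), cudaMemcpyDeviceToHost);
    printf("gBest: solution %d, fitness %f\n", bi, bf);

    cudaFree(dPos); cudaFree(dBestFit); cudaFree(dBestIdx);
    return 0;
}
```

A full PSSO would also update positions and pBests every iteration; the point here is only that evaluation parallelizes over N and that the best value is selected without contended writes.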
In G-SOA, the gradient descent method is guaranteed to converge to a local optimum but cannot guarantee convergence to the global optimum. This study therefore uses ea-mutation and a hybrid-gradient Search method to enable particles to escape local optima. The results show that G-SOA performs better than the original gradient descent method.
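The abstract does not spell out the ea-mutation or Search operators, so the following CUDA sketch only illustrates the general idea under stated assumptions: each thread descends one particle by plain gradient steps on an assumed Rastrigin objective, and when a step stops improving (a sign of a local optimum) the particle is perturbed by a random mutation as a stand-in for ea-mutation.

```cuda
// Hypothetical sketch of the idea behind G-SOA, not the author's exact
// operators: per-thread gradient descent plus a random "mutation" escape.
// NP, DIM, ITERS, LR, the Rastrigin objective, and all identifiers are
// assumptions for illustration.
#include <cstdio>
#include <cmath>
#include <curand_kernel.h>

#define NP    128    // particles, one per thread
#define DIM   2
#define ITERS 200
#define LR    0.05f  // gradient step size

// Assumed multimodal objective: 2-D Rastrigin, full of local minima.
__device__ float f(const float *x) {
    float s = 10.0f * DIM;
    for (int d = 0; d < DIM; ++d)
        s += x[d] * x[d] - 10.0f * cosf(6.2831853f * x[d]);
    return s;
}

// Analytic gradient of the Rastrigin objective above.
__device__ void grad(const float *x, float *g) {
    for (int d = 0; d < DIM; ++d)
        g[d] = 2.0f * x[d] + 62.831853f * sinf(6.2831853f * x[d]);
}

__global__ void gsoaSketch(float *best) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    curandState rng;
    curand_init(1234ULL, i, 0, &rng);

    float x[DIM], g[DIM], y[DIM];
    for (int d = 0; d < DIM; ++d) x[d] = curand_uniform(&rng) * 10.0f - 5.0f;
    float fx = f(x);

    for (int t = 0; t < ITERS; ++t) {
        grad(x, g);
        for (int d = 0; d < DIM; ++d) y[d] = x[d] - LR * g[d];  // GD step
        float fy = f(y);
        if (fy < fx) {                        // the step still improves: take it
            for (int d = 0; d < DIM; ++d) x[d] = y[d];
            fx = fy;
        } else {                              // stalled near a local optimum:
            for (int d = 0; d < DIM; ++d)     // random mutation as a stand-in
                x[d] += curand_normal(&rng);  // for the thesis's ea-mutation
            fx = f(x);
        }
    }
    best[i] = fx;   // fitness reached by this particle
}

int main() {
    float *dBest;
    cudaMalloc(&dBest, NP * sizeof(float));
    gsoaSketch<<<1, NP>>>(dBest);

    float h[NP];
    cudaMemcpy(h, dBest, sizeof(h), cudaMemcpyDeviceToHost);
    float m = h[0];
    for (int i = 1; i < NP; ++i) if (h[i] < m) m = h[i];
    printf("best fitness over %d particles: %f\n", NP, m);

    cudaFree(dBest);
    return 0;
}
```

Accepting the mutated point unconditionally is a deliberate simplification; a closer analogue of an evolutionary mutation would compare the mutated particle against the pre-mutation one (or against its pBest) before accepting it.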