| 研究生 (Author) | 余洪楓 Yu, Hong-feng |
| 論文名稱 (Title) | 基於預測的大規模平行模型超參數調較方法 / PBHS: A Prediction-Based Strategy for Massively Parallel Hyperparameter Tuning |
| 指導教授 (Advisor) | 周志遠 Chou, Jerry |
| 口試委員 (Committee Members) | 賴冠州 Lai, Kuan-Chou; 李哲榮 Lee, Che-Rung |
| 學位類別 (Degree) | 碩士 Master |
| 系所名稱 (Department) | 電機資訊學院 資訊系統與應用研究所 (Institute of Information Systems and Applications, College of Electrical Engineering and Computer Science) |
| 論文出版年 (Year of Publication) | 2022 |
| 畢業學年度 (Graduation Academic Year) | 110 |
| 語文別 (Language) | 英文 (English) |
| 論文頁數 (Pages) | 33 |
| 關鍵詞 (Keywords) | AutoML (自動化機器學習), Hyperparameter Optimization (超參數優化), Distributed Machine Learning (分散式機器學習) |
摘要 / Abstract:
In this thesis, we present PBHS, a prediction-based hyperparameter tuning scheduler. Recent years have seen a Cambrian explosion of deep learning technology, powered by rapid growth in both data availability and computational power. Meanwhile, experiments have shown that a model's architecture and its hyperparameters are critical factors in the final performance, so a randomly chosen hyperparameter configuration can lead to catastrophic results. As deep learning continues to spread into everyday applications, it becomes crucial to have a tool with a well-designed algorithm that tunes hyperparameters automatically and finds the best-performing configuration under time and resource constraints. Existing solutions terminate candidate configurations based only on their current performance, which may falsely kill candidates that are actually very promising. We therefore design an auto-tuning scheduler that uses a mathematical model to predict the trend of each learning curve, allowing the scheduler to look ahead. We evaluate our scheduler on a variety of neural networks and datasets and show that it outperforms existing solutions.
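To make the look-ahead idea concrete, the sketch below illustrates prediction-based early stopping in the spirit described by the abstract: fit a saturating parametric curve to a trial's partial validation accuracies, extrapolate it to the full training horizon, and terminate the trial only if even its predicted final accuracy falls short of the best trial so far. This is a minimal illustration, not the thesis's actual PBHS algorithm; the parametric form a - b*t^(-c), all function names, and the termination margin are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    """Saturating learning-curve model: accuracy climbs toward the asymptote a."""
    return a - b * np.power(t, -c)

def predict_final_accuracy(epochs, accuracies, horizon):
    """Fit the curve model to the observed points and extrapolate to `horizon` epochs."""
    popt, _ = curve_fit(
        power_law,
        epochs,
        accuracies,
        p0=[0.9, 0.5, 0.5],                          # rough initial guess
        bounds=([0.0, 0.0, 0.0], [1.0, 10.0, 5.0]),  # accuracy asymptote capped at 1
    )
    return float(power_law(horizon, *popt))

def should_terminate(epochs, accuracies, horizon, best_so_far, margin=0.0):
    """Look-ahead rule: kill a trial only if even its predicted final accuracy
    (plus a tolerance margin) falls short of the best trial seen so far."""
    try:
        predicted = predict_final_accuracy(
            np.asarray(epochs, dtype=float),
            np.asarray(accuracies, dtype=float),
            horizon,
        )
    except RuntimeError:
        return False  # curve fit failed: keep the trial alive rather than guess
    return predicted + margin < best_so_far

# Example: after 5 epochs, decide whether a slow-starting trial should keep
# training toward epoch 50, given an incumbent trial that reached 0.80.
epochs = [1, 2, 3, 4, 5]
accs = [0.42, 0.55, 0.61, 0.65, 0.68]
print(should_terminate(epochs, accs, horizon=50, best_so_far=0.80))
```

Unlike rules that compare raw current accuracies (as in successive halving), this kind of rule keeps a slow starter alive as long as its fitted curve still points above the incumbent.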