Reinforcement Learning and Large Language Model for Thermodynamic Properties Molecular Design

Graduate student:	Liao, Yuan-Qun (廖元群)
Thesis title:	Reinforcement Learning and Large Language Model for Thermodynamic Properties Molecular Design (利用強化學習及大型語言模型進行熱力學性質分子設計)
Advisors:	Wong, Shan-Hill (汪上曉); Yao, Yuan (姚遠)
Oral examination committee:	Jang, Shi-Shang (鄭西顯); Kang, Jia-Lin (康嘉麟)
Degree:	Master
Department:	College of Engineering, Department of Chemical Engineering
Year of publication:	2023
Graduation academic year:	111 (ROC calendar, i.e. the 2022-2023 academic year)
Language:	Chinese
Number of pages:	76
Keywords:	Molecular design, MolDQN, large language models, ChatGPT, GPT-3

In chemistry, finding molecules with desirable properties is a challenging task, especially when designing specialty chemicals or new drugs. Researchers must search a vast chemical space for the few molecules that have the desired properties. In recent years, growing computing power has driven rapid progress in machine learning, which now offers a variety of methods for predicting molecular properties and designing chemical products. These methods are collectively known as computer-aided molecular design (CAMD). They dramatically shorten the development cycle and reduce the cost of developing chemical products.
In this thesis, we use two very different models to generate molecules with specified solubility parameters, both taking the Simplified Molecular Input Line Entry Specification (SMILES) as input. The first is MolDQN, a reinforcement learning method. The second is a fine-tuned version of the large language model GPT-3.
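The abstract does not say how the solubility parameters were obtained. For orientation, below is a minimal sketch of the Hildebrand definition, δ = √((ΔH_vap − RT)/V_m). It is an assumption that the thesis computed δ this way rather than, for example, by a group-contribution method. The function names are hypothetical, and the n-hexane inputs are approximate literature values used only for illustration. The appendix titles mention "delta=13100". If the δ values in the thesis's prompts are in (J/m³)^0.5, that number would correspond to 13.1 MPa^0.5, but the units are not stated here.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def hildebrand_delta(dh_vap: float, molar_volume: float, temperature: float = 298.15) -> float:
    """Hildebrand solubility parameter in (J/m^3)^0.5 = Pa^0.5.

    dh_vap: enthalpy of vaporization, J/mol
    molar_volume: liquid molar volume, m^3/mol
    dh_vap - R*T approximates the molar cohesive energy (energy of vaporization).
    """
    cohesive_energy = dh_vap - R * temperature
    if cohesive_energy <= 0 or molar_volume <= 0:
        raise ValueError("cohesive energy and molar volume must be positive")
    return math.sqrt(cohesive_energy / molar_volume)


def pa_half_to_mpa_half(delta: float) -> float:
    """Convert Pa^0.5 to MPa^0.5, since (1e6 Pa)^0.5 = 1000 Pa^0.5."""
    return delta / 1000.0


# n-hexane near 25 °C, approximate literature inputs (illustration only):
# dH_vap ≈ 31.56 kJ/mol, V_m ≈ 131.6 cm^3/mol
delta_hexane = hildebrand_delta(31.56e3, 131.6e-6)
print(f"n-hexane: {delta_hexane:.0f} Pa^0.5 = {pa_half_to_mpa_half(delta_hexane):.1f} MPa^0.5")
```

This gives roughly 14.9 MPa^0.5, in line with the value commonly tabulated for n-hexane.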
MolDQN combines the deep Q-network (DQN) from reinforcement learning with chemical domain knowledge to optimize molecular properties. It permits only chemically reasonable modifications, which guarantees that every optimized molecule is valid. Modifications to the molecule serve as actions and the desired property serves as the reward, and molecules are generated from scratch without any pre-training. For solubility-parameter targets, we found that the agent can learn to assemble or modify a molecule given only the reward signal. Up to 50% of the generated molecules fall within the target range.
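To make the MolDQN loop concrete, here is a minimal pure-Python sketch of its reinforcement-learning pieces. It covers three things:

- ε-greedy choice among valid next molecules
- the one-step Q-learning (DQN) target
- the in-range fraction behind the "up to 50%" figure

The reward shaping, the SMILES strings, the Q-values, and all function names are hypothetical. MolDQN itself scores candidates with a neural network over molecular fingerprints rather than a Python dict. Chemistry, such as enumerating and validating modifications, is omitted here.

```python
import random
from typing import Dict, List


def target_range_reward(delta: float, low: float, high: float, scale: float = 1000.0) -> float:
    """Hypothetical shaped reward: 1.0 inside [low, high]; otherwise a penalty
    proportional to the distance from the nearest bound."""
    if low <= delta <= high:
        return 1.0
    distance = low - delta if delta < low else delta - high
    return -distance / scale


def epsilon_greedy(q_values: Dict[str, float], epsilon: float, rng: random.Random) -> str:
    """Choose among valid next molecules (keys, e.g. SMILES) by epsilon-greedy.

    Only chemically valid modifications are offered as keys, echoing MolDQN's
    restriction to valid actions; each action is identified with the molecule
    it produces."""
    if not q_values:
        raise ValueError("no valid actions available")
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))  # explore
    return max(q_values, key=q_values.get)   # exploit


def td_target(reward: float, next_q_values: List[float], gamma: float, terminal: bool) -> float:
    """One-step Q-learning / DQN target: r + gamma * max_a' Q(s', a')."""
    if terminal or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def q_update(q_sa: float, target: float, alpha: float) -> float:
    """Tabular Q-learning step; a DQN instead regresses Q(s, a) toward the
    target by minimizing the squared error with gradient descent."""
    return q_sa + alpha * (target - q_sa)


def fraction_in_range(deltas: List[float], low: float, high: float) -> float:
    """Share of generated molecules whose solubility parameter lies in the window."""
    return sum(low <= d <= high for d in deltas) / len(deltas) if deltas else 0.0


rng = random.Random(0)
# Hypothetical Q-values for three candidate next molecules
q = {"CCCO": 0.5, "CC(C)O": 0.1, "OCCO": 0.2}
action = epsilon_greedy(q, epsilon=0.0, rng=rng)  # greedy pick
reward = target_range_reward(20500.0, low=20000.0, high=22000.0)
target = td_target(reward, [0.3, 0.7], gamma=0.9, terminal=False)
new_q = q_update(q[action], target, alpha=0.1)
print(action, reward, round(target, 3), round(new_q, 3))
```

Restricting the argmax to the offered keys is what keeps every chosen step chemically valid in this sketch. The exploration rate and discount factor would be tuned in practice.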
We also examined ChatGPT's ability to plan molecular design experiments. The results show that ChatGPT can efficiently offer suggestions on the overall direction. With precise descriptions and step-by-step guidance, it can also generate simulation code very efficiently. In addition, we fine-tuned GPT-3, the base model of ChatGPT, to perform molecular design through few-shot learning. The molecules' boiling points were supplied as an easily obtained auxiliary property, and the model was asked to find molecules with specified solubility parameters. When the prompt states the requirement explicitly, the model effectively learns to generate reasonable SMILES. It also doubles its effectiveness at generating molecules that meet extreme solubility parameters, although there is still room for improvement for more typical solubility-parameter values.
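The abstract does not give the exact prompt format used for fine-tuning. As a hedged sketch, the snippet below builds training examples in the prompt/completion JSONL layout that OpenAI's legacy GPT-3 fine-tuning accepted. Each prompt carries the boiling point as the auxiliary property and the target solubility parameter, and each completion carries a SMILES string. The `bp=..., delta=...` template, the `SEPARATOR` and `STOP` strings, and the helper names are assumptions for illustration, not the thesis's actual prompts. The property values are approximate literature figures.

```python
import json
from typing import Dict, List

SEPARATOR = "\n\n###\n\n"  # assumed fixed marker ending every prompt
STOP = " END"              # assumed stop sequence ending every completion


def make_prompt(boiling_point_k: float, delta: float) -> str:
    # Hypothetical template: auxiliary property (normal boiling point, K)
    # plus target solubility parameter (Pa^0.5), then the separator.
    return f"bp={boiling_point_k:.0f}, delta={delta:.0f}{SEPARATOR}"


def make_record(smiles: str, boiling_point_k: float, delta: float) -> Dict[str, str]:
    # The completion starts with a space and ends with the stop sequence,
    # so generation can be cut cleanly at STOP.
    return {"prompt": make_prompt(boiling_point_k, delta), "completion": f" {smiles}{STOP}"}


def to_jsonl(records: List[Dict[str, str]]) -> str:
    # One JSON object per line, as expected for a fine-tuning file.
    return "".join(json.dumps(r) + "\n" for r in records)


def parse_completion(text: str) -> str:
    # Keep only the text before the first stop sequence.
    return text.split(STOP, 1)[0].strip()


# Approximate literature values, for illustration only:
# ethanol: bp ~351 K, delta ~26.5 MPa^0.5; n-hexane: bp ~342 K, delta ~14.9 MPa^0.5
rows = [make_record("CCO", 351.4, 26500), make_record("CCCCCC", 341.9, 14900)]
print(to_jsonl(rows), end="")

# At inference time only the prompt is sent; a hypothetical model output is trimmed at STOP:
print(parse_completion(" CC(C)CO END"))
```

A fixed separator and stop sequence let the fine-tuned model know where the request ends and where its answer should stop. Each parsed SMILES would then still need a validity check and a property estimate before it counts as a hit.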

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
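Chapter 1 Introduction
1.1 Research Background
1.2 Computer-Aided Molecular Design
1.2.1 Forward Algorithms
1.2.2 Inverse Algorithms
1.3 Reinforcement Learning
1.4 Large Language Models
1.5 Research Motivation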
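Chapter 2 Atom-Based Reinforcement Learning
2.1 Molecular Representations
2.1.1 SMILES
2.1.2 Molecular Fingerprints
2.2 Reinforcement Learning
2.2.1 Monte Carlo Methods
2.2.2 Temporal-Difference Methods
2.3 Q-Learning and DQN
2.3.1 Q-Learning
2.3.2 DQN
2.4 MolDQN
2.4.1 Agent
2.4.2 Environment
2.4.3 Reward
2.4.4 State
2.4.5 Model Architecture
2.5 Training Results
2.5.1 QED Training Results
2.5.2 SIM Training Results
2.5.3 Single-Objective Training Results for the Solubility Parameter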
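Chapter 3 Molecular Design Based on Large Language Models
3.1 The Large Language Model GPT-3
3.2 Simple Prompts
3.2.1 Model Training
3.2.2 Generation Settings
3.2.3 Effect of Epochs on Generation
3.2.4 Effect of the Prompt on Generation
3.2.5 Effect of the Number of Auxiliary Properties on Generation
3.2.6 Effect of the Number of Target Properties on Generation
3.3 Evolutionary Design
3.3.1 Effect of a Fixed-Temperature Evolutionary Model on Generated Molecules
3.3.2 Effect of a Decreasing-Temperature Evolutionary Model on Generated Molecules
3.4 Instructional Prompts
3.4.1 Original Instructional Prompts
3.4.2 Revised Instructional Prompts
3.4.3 Revised Instructional Prompts with Separators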
Chapter 4 Conclusions
References
Appendices
Appendix 1: t-Test for Simple Prompt “delta=13100”
Appendix 2: t-Test for Simple Prompt “delta=23100”

