
Graduate student: 廖元群 (Liao, Yuan-Qun)
Thesis title: Reinforcement Learning and Large Language Model for Thermodynamic Properties Molecular Design (利用強化學習及大型語言模型進行熱力學性質分子設計)
Advisors: 汪上曉 (Wong, Shan-Hill); 姚遠 (Yao, Yuan)
Oral defense committee: 鄭西顯 (Jang, Shi-Shang); 康嘉麟 (Kang, Jia-Lin)
Degree: Master
Department: Department of Chemical Engineering, College of Engineering
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: Chinese
Number of pages: 76
Keywords: Molecular design, MolDQN, large language models, ChatGPT, GPT-3
    In chemistry, finding molecules with desirable properties is a challenging task, especially in the design of specialty chemicals and new drugs. Researchers must search a vast chemical space for the few molecules that have the desired properties. In recent years, improvements in computing performance have driven rapid progress in machine learning, which now offers a variety of methods for predicting molecular properties and designing chemical products. These methods are collectively known as computer-aided molecular design (CAMD), and they substantially shorten the development cycle of chemical products and reduce research and development costs.
    In this thesis we study two very different models: a reinforcement learning method called MolDQN and a fine-tuned version of the large language model GPT-3. Both take the Simplified Molecular Input Line Entry System (SMILES) as input and are used to generate molecules with specified solubility parameters.
    MolDQN combines the deep Q-network (DQN) from reinforcement learning with chemical domain knowledge to optimize molecular properties. Only chemically reasonable modifications are allowed, which ensures that every optimized molecule is valid. Modifications to the molecule serve as actions and the desired property serves as the reward, and molecules are generated from scratch without any pre-training. We found that, for generation conditioned on a target solubility parameter, the model can learn to assemble or modify a molecule from the reward alone, and up to 50% of the generated molecules fall within the target range.
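    The reward-driven loop described above can be illustrated with a minimal sketch. This is not the thesis implementation: the reward shape, the target range, the discount factor, and the function names `range_reward`, `td_target`, and `choose_action` are all assumptions made for illustration. In practice the valid actions and the property values would come from a cheminformatics toolkit and a property model, which are not shown here.

```python
import random


def range_reward(delta, low, high):
    """Hypothetical reward: 0 inside the target range [low, high],
    otherwise the negative distance to the nearest bound."""
    if low <= delta <= high:
        return 0.0
    return -min(abs(delta - low), abs(delta - high))


def td_target(reward, next_q_values, gamma=0.9, done=False):
    """One-step Q-learning target: r + gamma * max over next-state Q-values."""
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def choose_action(valid_actions, q_value, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to the chemically valid actions.

    valid_actions: candidate molecules reachable by one allowed modification.
    q_value: callable returning the estimated Q-value of a candidate.
    """
    if rng.random() < epsilon:
        return rng.choice(valid_actions)
    return max(valid_actions, key=q_value)


if __name__ == "__main__":
    # Placeholder candidates and Q-values, not results from the thesis.
    q_table = {"CC": 0.1, "CCO": 0.7}
    print(choose_action(list(q_table), q_table.get, epsilon=0.0))
    print(range_reward(20500.0, 17000.0, 19000.0))
    print(td_target(1.0, [0.5, 2.0], gamma=0.5))
```

    Restricting the choice to `valid_actions` mirrors the idea that only reasonable modifications are allowed, so every molecule the agent reaches stays valid.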
    We also tested the ability of ChatGPT to plan molecular design experiments. The results show that ChatGPT provides high-level suggestions efficiently and, given a precise description followed by step-by-step guidance, generates simulation code very efficiently. In addition, GPT-3, the base model of ChatGPT, was fine-tuned to perform few-shot molecular design. The boiling point of each molecule was supplied as an easily obtained auxiliary property, and the model was asked for molecules with a specified solubility parameter. The results show that, when the prompt states the requirement explicitly, the model learns to generate reasonable SMILES and doubles the effectiveness of generating molecules with extreme solubility parameters, although there is still room for improvement at more typical solubility parameter values.
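    As an illustration of how property-conditioned training examples for such a fine-tuned model might be assembled, the sketch below builds prompt/completion records in JSON Lines form. Only the `delta=<value>` prompt style is taken from this record (it appears in the appendix titles). The `Tb` key for the boiling point, the ` ->` separator, the function names, and the numbers are assumptions, and the values are placeholders rather than data from the thesis.

```python
import json


def make_record(smiles, delta, boiling_point=None):
    """Build one hypothetical prompt/completion pair.

    The target solubility parameter (and optionally the boiling point as an
    auxiliary property) goes in the prompt; the SMILES string is the completion.
    """
    parts = [f"delta={delta:.0f}"]
    if boiling_point is not None:
        parts.append(f"Tb={boiling_point:.0f}")
    return {
        "prompt": ", ".join(parts) + " ->",
        "completion": " " + smiles + "\n",
    }


def to_jsonl(records):
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    # Placeholder property values for illustration only.
    examples = [
        make_record("CCO", 26500, boiling_point=351),
        make_record("CC", 13100),
    ]
    print(to_jsonl(examples))
```

    At generation time the same prompt format would be sent with the desired target value, and the returned completion would be checked for SMILES validity before use.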

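    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Computer-Aided Molecular Design
            1.2.1 Forward Algorithms
            1.2.2 Inverse Algorithms
        1.3 Reinforcement Learning
        1.4 Large Language Models
        1.5 Research Motivation
    Chapter 2 Atom-Based Reinforcement Learning
        2.1 Molecular Representations
            2.1.1 SMILES
            2.1.2 Molecular Fingerprints
        2.2 Reinforcement Learning
            2.2.1 Monte Carlo Methods
            2.2.2 Temporal-Difference Methods
        2.3 Q-Learning and DQN
            2.3.1 Q-Learning
            2.3.2 DQN
        2.4 MolDQN
            2.4.1 Agent
            2.4.2 Environment
            2.4.3 Reward
            2.4.4 State
            2.4.5 Model Architecture
        2.5 Training Results
            2.5.1 QED Training Results
            2.5.2 SIM Training Results
            2.5.3 Single-Objective Training Results for the Solubility Parameter
    Chapter 3 Molecular Design Based on Large Language Models
        3.1 The Large Language Model GPT-3
        3.2 Simple Prompts
            3.2.1 Model Training
            3.2.2 Model Generation Settings
            3.2.3 Effect of Epochs on Model Generation
            3.2.4 Effect of the Prompt on Model Generation
            3.2.5 Effect of the Number of Auxiliary Properties on Model Generation
            3.2.6 Effect of the Number of Target Properties on Model Generation
        3.3 Evolutionary Design
            3.3.1 Effect of a Fixed-Temperature Evolutionary Model on Generated Molecules
            3.3.2 Effect of a Decreasing-Temperature Evolutionary Model on Generated Molecules
        3.4 Instructive Prompts
            3.4.1 Original Instructive Prompt
            3.4.2 Revised Instructive Prompt
            3.4.3 Revised Instructive Prompt with Delimiters
    Chapter 4 Conclusion
    References
    Appendices
        Appendix 1 t-Test for the Simple Prompt "delta=13100"
        Appendix 2 t-Test for the Simple Prompt "delta=23100"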

