
Author: Liao, Ping-Kan (廖柄淦)
Thesis title: Using Random Walk and Convolutional Neural Network Models to Infer Potential Traditional Chinese Medicine Prescriptions from Indication Descriptions (利用隨機漫步與卷積神經網路模型從適應症敘述推導出潛在的中藥方劑)
Advisor: Soo, Von-Wun (蘇豐文)
Oral defense committee: 郭柏志, 柯宏慧
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2022
Academic year of graduation: 110
Language: English
Number of pages: 77
Chinese keywords: 中藥方劑, 隨機漫步, 卷積神經網路, 適應症敘述
English keywords: Traditional Chinese Medicine Prescriptions, Random Walk, Convolutional Neural Network, Indication Descriptions
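  • Abstract (translated from the Chinese version): In recent years, traditional Chinese medicine (TCM) and Chinese herbal medicine have gained increasing attention in Western countries. Related scientific research has grown accordingly, and putting TCM on a scientific footing has become a future trend. As TCM texts are increasingly digitized and herbal ingredients increasingly analyzed, the loosely defined, experience-based tradition of TCM can now be studied with big data analysis and deep learning in an attempt to derive a set of rules. This study builds on the Compendium of Materia Medica (本草綱目), the Traditional Chinese Medicine Integrated Database, and the Traditional Chinese Medicine Systems Pharmacology Database, together with a deep learning convolutional network and a random walk inference model, to infer potential TCM prescriptions from input TCM indications, and evaluates them with a pre-trained convolutional neural network. We constructed a multi-type network comprising indications, target proteins, and chemical compounds, ran inference over it with a random walk algorithm, and then proposed potential TCM prescriptions. The pre-trained convolutional neural network has an average accuracy of 92.43%. Experts were invited to evaluate the prescriptions proposed by the system, and the combined share of "agree" and "strongly agree" responses was 38.00%. Although the data are incomplete, especially the ingredient data for herbs, we obtained preliminary results that future research can build on and verify, and confirmed that it is feasible to propose potential prescriptions by combining Chinese and Western medical concepts.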


    In recent years, traditional Chinese medicine has drawn growing attention in Western countries, which has spurred related scientific research and made the scientific study of traditional Chinese medicine a likely future trend. As more Chinese medicine texts are digitized and more herbal components are analyzed, the loosely codified, experience-based knowledge of traditional Chinese medicine can now be examined with big data analysis and deep learning in an attempt to derive a set of rules.

    This study combines the Compendium of Materia Medica, the Traditional Chinese Medicine Integrated Database, and the Traditional Chinese Medicine Systems Pharmacology Database to infer potential Chinese medicine prescriptions from an input indication description, and uses a pre-trained convolutional neural network (CNN) to evaluate them. We constructed multiple networks covering indications, target proteins, and chemical compounds, ran a random walk algorithm over them, and used the results to propose potential Chinese medicine prescriptions.
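
    To make the random walk step concrete, here is a minimal sketch of random walk with restart, one common form of network propagation. This record does not give the thesis's actual model or its optimization, so the toy graph, the restart probability of 0.3, the node names, and the function names below are all assumptions chosen only for illustration.

```python
def column_normalize(adj):
    """Scale each column of a square adjacency matrix so it sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until p stops changing."""
    n = len(adj)
    w = column_normalize(adj)
    # The walker restarts uniformly on the seed nodes (here: the query indication).
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: one indication, two target proteins, two compounds.
# The edges are made up; they are not taken from the thesis data.
NODES = ["indication", "target_A", "target_B", "compound_X", "compound_Y"]
ADJ = [
    [0, 1, 1, 0, 0],  # indication -- target_A, target_B
    [1, 0, 0, 1, 0],  # target_A   -- indication, compound_X
    [1, 0, 0, 1, 1],  # target_B   -- indication, compound_X, compound_Y
    [0, 1, 1, 0, 0],  # compound_X -- target_A, target_B
    [0, 0, 1, 0, 0],  # compound_Y -- target_B
]

scores = random_walk_with_restart(ADJ, seeds={0})
ranked = sorted(zip(NODES, scores), key=lambda pair: pair[1], reverse=True)
```

    In this toy graph, compound_X is reachable from the indication through both targets and therefore scores higher than compound_Y, which is reachable through only one. A ranking of this kind is what would feed a later herb selection step.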

    The pre-trained CNN used to evaluate the proposed prescriptions has an average accuracy of 92.43%. In a blind evaluation by human experts, the overall share of prescriptions proposed by our system that were rated "suitable" or "very suitable", as opposed to "not suitable" or "very unsuitable", was 38.00%.

    Although information is severely incomplete in many areas, such as the ingredients of traditional Chinese herbs, we still obtained preliminary results that warrant further investigation and evaluation. We also show that it is possible to combine traditional Chinese and modern Western medical concepts to propose potential prescriptions for various indications.

    Chinese Abstract (摘要)
    Abstract
    Acknowledgement
    List of Tables
    List of Figures
    1 Introduction
    2 Background and Related Work
      2.1 Indications for TCM
      2.2 The construction of biological networks
        2.2.1 Chemical-based networks
        2.2.2 Target-based networks
        2.2.3 Indication-based networks
    3 Methodology
      3.1 Overview of the system architecture
      3.2 Pre-trained CNN
        3.2.1 Compendium of Materia Medica encoding (CoMM-encoding)
        3.2.2 The TCMID Data set
        3.2.3 The architecture of CNN
      3.3 Building the biological heterogeneous and homogeneous networks of western medicine knowledge
        3.3.1 Data sources
        3.3.2 Indication homogeneous network
        3.3.3 The homogeneous networks of target proteins
        3.3.4 Chemical compound homogeneous network
        3.3.5 The heterogeneous networks from indications to target proteins
        3.3.6 The heterogeneous networks from indications to chemical compounds
        3.3.7 The heterogeneous networks from target proteins to chemical compounds
      3.4 The random walk inference models on complex heterogeneous networks
        3.4.1 The basic random walk model
        3.4.2 Optimization of random walk
      3.5 Converting TCM indications terms into western medicine concepts
        3.5.1 MetaMap
      3.6 Generating TCM prescriptions
        3.6.1 Cartesian product
    4 Experiments and Results
      4.1 Steps of experiment
      4.2 The performance of the pre-trained CNN classifier
      4.3 Test data for random walk models
      4.4 Results
        4.4.1 Number of herb combinations
        4.4.2 Evaluation on the pre-trained CNN
        4.4.3 Evaluation by questionnaire
      4.5 Evaluation on the Robustness of prediction on TCM prescriptions
        4.5.1 Preparing the near-miss data set
        4.5.2 Evaluation Results
    5 Conclusion and Future Work
      5.1 Conclusion
      5.2 The future work
    References
    .1 Appendix: CoMM-encoding Vector Attributes Detail Content
    .2 Appendix: The List of Another Names of Herb in CoMM
    .3 Appendix: Detail Content of Test Data for Random Walk
    .4 Appendix: Detail Content of Potential Prescriptions
    .5 Appendix: Detail Questionnaire Response of Test Data

