Graduate student: | 林儀潤 Lin, Yi-Ruen |
---|---|
Thesis title: | 人類疾病相關蛋白質資料庫之建構 Design and Construction of Human Disease-Associated Protein Database (HDAPD) |
Advisor: | 林志侯 Lin, Thy-Hou |
Oral defense committee: | |
Degree: | Master |
Department: | College of Life Sciences and Medicine - Institute of Molecular Medicine |
Year of publication: | 2009 |
Academic year of graduation: | 97 |
Language: | English |
Number of pages: | 53 |
Keywords (Chinese): | 疾病, 蛋白質, 資料庫, 生物資訊 |
Keywords (English): | disease, protein, database, bioinformatics |
After DNA sequencing methods were developed in the 1970s by Frederick Sanger at the University of Cambridge, UK (the dideoxy chain-termination method) and by Walter Gilbert and Allan Maxam at Harvard University, USA (the chemical modification method), researchers were able to determine DNA sequences quickly, and sequence data accumulated rapidly. Large databases such as GenBank, EMBL, and DDBJ were established from the 1980s onward to store these sequences. Alongside the nucleotide databases, large protein databases such as Swiss-Prot also began to be built. After the Human Genome Project (HGP) was completed in 2003, the life sciences turned toward proteomics, and the sequence, structure, function, and pathway involvement of proteins became major research subjects. Protein-specific databases and analysis and prediction programs have continued to appear, and their growth remains rapid.
Compared with the fast-growing molecular biology databases, disease-associated databases remain few, and most of them mainly provide clinical information such as symptoms, diagnostic methods, treatments, and clinical care. At present, only OMIM at NCBI (USA) and GeneCards from the Weizmann Institute of Science (Israel) provide reasonably complete disease-associated gene information, and no large database provides disease-associated protein information. We therefore plan to construct HDAPD, a database dedicated to human disease-associated proteins. Its primary goal is to establish the relationships between diseases and proteins. We will then collect data from several large protein databases, for example protein sequences and structures from PDB, associated pathways from KEGG, and the latest scientific literature from PubMed. HDAPD is intended to help researchers study human disease-associated proteins in greater depth and to accelerate the development of therapeutic drugs.
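The abstract does not describe the HDAPD schema, so the following is only an illustrative sketch of how a disease-to-protein relationship and its links to external resources might be modeled. It uses Python's standard `sqlite3` module; all table names, column names, and function names are hypothetical, and the inserted values are placeholders rather than real records or accessions.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical disease/protein schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE disease (
            disease_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE
        );
        CREATE TABLE protein (
            protein_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE
        );
        -- Many-to-many link: the disease-protein relationship itself.
        CREATE TABLE disease_protein (
            disease_id INTEGER NOT NULL REFERENCES disease(disease_id),
            protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
            PRIMARY KEY (disease_id, protein_id)
        );
        -- Pointers into the external resources named in the abstract.
        CREATE TABLE cross_reference (
            protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
            source     TEXT NOT NULL CHECK (source IN ('PDB', 'KEGG', 'PubMed')),
            accession  TEXT NOT NULL,
            PRIMARY KEY (protein_id, source, accession)
        );
        """
    )
    return conn


def add_association(conn, disease, protein):
    """Record that a protein is associated with a disease (idempotent)."""
    conn.execute("INSERT OR IGNORE INTO disease (name) VALUES (?)", (disease,))
    conn.execute("INSERT OR IGNORE INTO protein (name) VALUES (?)", (protein,))
    conn.execute(
        "INSERT OR IGNORE INTO disease_protein "
        "SELECT d.disease_id, p.protein_id FROM disease d, protein p "
        "WHERE d.name = ? AND p.name = ?",
        (disease, protein),
    )


def add_cross_reference(conn, protein, source, accession):
    """Attach an external identifier (PDB, KEGG, or PubMed) to a known protein."""
    conn.execute(
        "INSERT INTO cross_reference "
        "SELECT protein_id, ?, ? FROM protein WHERE name = ?",
        (source, accession, protein),
    )


def proteins_for_disease(conn, disease):
    """Return (protein, source, accession) rows for one disease, sorted."""
    cursor = conn.execute(
        "SELECT p.name, x.source, x.accession "
        "FROM disease d "
        "JOIN disease_protein dp ON dp.disease_id = d.disease_id "
        "JOIN protein p ON p.protein_id = dp.protein_id "
        "LEFT JOIN cross_reference x ON x.protein_id = p.protein_id "
        "WHERE d.name = ? "
        "ORDER BY p.name, x.source, x.accession",
        (disease,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    # Placeholder values only; these are not real diseases, proteins, or IDs.
    add_association(db, "example disease", "example protein")
    add_cross_reference(db, "example protein", "PDB", "PLACEHOLDER-PDB-ID")
    add_cross_reference(db, "example protein", "KEGG", "PLACEHOLDER-PATHWAY-ID")
    for row in proteins_for_disease(db, "example disease"):
        print(row)
```

Keeping the disease-protein link in its own table reflects the stated priority of establishing that relationship first. External data are stored as cross-references, so sequences, structures, pathways, and literature can be attached later without changing the core tables.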