Summary
In the post-Human Genome Project era, awareness of the resources available through the internet is essential to both molecular biologists and clinicians. An overview of the main databases and analytical tools described in this chapter is important for understanding the principles upon which hypotheses are generated, experiments are based and conclusions are reached. Similarly, an introduction to the terminology of these resources often facilitates their use and adoption into practice. This chapter covers database resources such as NCBI/Entrez, Ensembl and UCSC, as well as analytical tools for sequence alignment, promoter analysis and molecular interactions.
Abbreviations
- BLAST: Basic Local Alignment Search Tool
- DDBJ: DNA Data Bank of Japan
- EMBL: European Molecular Biology Laboratory
- EST: Expressed Sequence Tag
- NCBI: National Center for Biotechnology Information
- UCSC: University of California, Santa Cruz
Copyright information
© 2008 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Joshua, A.M., Boutros, P.C. (2008). Web-Based Resources for Clinical Bioinformatics. In: Trent, R.J. (eds) Clinical Bioinformatics. Methods in Molecular Medicine™, vol 141. Humana Press. https://doi.org/10.1007/978-1-60327-148-6_17
DOI: https://doi.org/10.1007/978-1-60327-148-6_17
Publisher Name: Humana Press
Print ISBN: 978-1-58829-791-4
Online ISBN: 978-1-60327-148-6
eBook Packages: Springer Protocols