Abstract
Systems biologists often have to deal with large gene groups obtained from high-throughput experiments, genome-wide predictions, and literature searches. Handling and functional interpretation of these gene groups is rather challenging. Problems arise from redundancies in databases, where a gene is given several names or identifiers, and from falsely assigned genes in the list. Moreover, genes in gene groups obtained by different methods are often represented by different types of identifiers, or are even genes from other model organisms. Thus, research in systems biology requires software tools that help to handle and interpret gene groups.
This chapter will review tools to store and compare gene groups represented by various identifiers. We introduce software that uses Gene Ontology (GO) annotations to infer biological processes associated with the gene groups. Additionally, we review approaches to further analyze gene groups regarding their transcriptional regulation by retrieving and analyzing their putative promoter regions.
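As an illustration of the GO-based interpretation mentioned above, the following is a minimal sketch of an overrepresentation test, not the procedure of any specific tool reviewed in the chapter. It assumes a hypergeometric null model and a Bonferroni correction, both chosen here for simplicity. The function names, the toy gene identifiers, and the term labels are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are annotated with the term of interest."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enriched_terms(gene_group, annotations):
    """Rank terms by overrepresentation in gene_group.

    annotations maps gene identifier -> set of term labels; its keys define
    the background. Returns (term, k, K, p, p_bonferroni) tuples sorted by p.
    """
    background = set(annotations)
    group = set(gene_group) & background  # ignore genes without annotation
    N, n = len(background), len(group)
    terms = set().union(*annotations.values())

    tested = []
    for term in terms:
        annotated = {g for g, ts in annotations.items() if term in ts}
        K = len(annotated)
        k = len(annotated & group)
        if k == 0:
            continue  # term absent from the group, nothing to test
        tested.append((term, k, K, hypergeom_upper_tail(k, n, K, N)))

    m = len(tested)  # Bonferroni factor: number of terms actually tested (assumption)
    ranked = [(t, k, K, p, min(1.0, p * m)) for t, k, K, p in tested]
    return sorted(ranked, key=lambda r: r[3])


# Toy data (hypothetical identifiers and term labels).
toy_annotations = {f"g{i}": {"term_A"} for i in range(1, 4)}
toy_annotations.update({f"g{i}": {"term_B"} for i in range(4, 11)})
toy_group = ["g1", "g2", "g3"]
results = enriched_terms(toy_group, toy_annotations)
```

In practice the background set, the handling of the GO hierarchy, and the multiple-testing correction all influence the result, so these choices should be checked against the documentation of whichever tool is used.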
© 2007 Humana Press Inc.
About this chapter
Cite this chapter
Blüthgen, N., Kielbasa, S.M., Beule, D. (2007). Handling and Interpreting Gene Groups. In: Choi, S. (eds) Introduction to Systems Biology. Humana Press. https://doi.org/10.1007/978-1-59745-531-2_4
DOI: https://doi.org/10.1007/978-1-59745-531-2_4
Publisher Name: Humana Press
Print ISBN: 978-1-58829-706-8
Online ISBN: 978-1-59745-531-2
eBook Packages: Biomedical and Life Sciences (R0)