Abstract
In this paper, we focus on finding complex annotation patterns that represent novel and interesting hypotheses in gene annotation data. We define a generalization of the densest subgraph problem by adding a distance restriction (defined by a separate metric) on the nodes of the subgraph. We show that while this generalization makes the problem NP-hard for arbitrary metrics, the problem can be solved optimally in polynomial time when the metric is the distance metric of a tree or an interval graph. We also show that the densest subgraph problem with a specified subset of vertices that must be included in the solution can be solved optimally in polynomial time. In addition, we consider extensions in which we wish not only to find one solution but also to list all subgraphs of almost maximum density. We apply this method to a dataset of genes and their annotations obtained from The Arabidopsis Information Resource (TAIR). A user evaluation confirms that the patterns found in the distance-restricted densest subgraph for a dataset of photomorphogenesis genes are supported by the literature; a control dataset confirms that these are not random patterns. Interestingly, the complex annotation patterns potentially lead to new and as yet unknown hypotheses. We perform experiments to determine how the properties of the dense subgraphs change as we vary parameters, including the number of genes and the distance.
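To make the problem statement concrete, the following is a minimal brute-force sketch in Python, not the polynomial-time algorithms of the paper. It assumes that density means the number of induced edges divided by the number of nodes, and that the distance restriction requires every pair of chosen nodes to lie within a bound `d` under the separate metric; both readings are assumptions, as are all function and variable names. The search enumerates every subset, so it is only usable on toy inputs.

```python
from itertools import combinations


def density(nodes, edges):
    """Density |E(S)| / |S| of the subgraph induced by `nodes` (assumed definition)."""
    chosen = set(nodes)
    if not chosen:
        return 0.0
    induced = sum(1 for u, v in edges if u in chosen and v in chosen)
    return induced / len(chosen)


def within_distance(nodes, dist, d):
    """True if every pair of nodes is at distance <= d (assumed form of the restriction)."""
    return all(dist(u, v) <= d for u, v in combinations(nodes, 2))


def densest_restricted_subgraph(vertices, edges, dist, d):
    """Exhaustive search over all subsets; exponential time, for illustration only."""
    best, best_density = (), 0.0
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            if not within_distance(subset, dist, d):
                continue
            rho = density(subset, edges)
            if rho > best_density:
                best, best_density = subset, rho
    return set(best), best_density


# Hypothetical toy instance: a triangle a-b-c with a pendant node d.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

# The separate metric: positions on a line, unrelated to the edges above.
position = {"a": 0, "b": 1, "c": 2, "d": 5}


def line_dist(u, v):
    return abs(position[u] - position[v])


# A loose bound leaves the problem effectively unrestricted.
unrestricted = densest_restricted_subgraph(vertices, edges, line_dist, 10)
# A tight bound excludes the triangle, because a and c are 2 apart.
restricted = densest_restricted_subgraph(vertices, edges, line_dist, 1)
```

In this toy instance the loose bound returns the triangle with density 1.0, while the tight bound forces a single edge with density 0.5, which shows how the second metric can change the optimum.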
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saha, B., Hoch, A., Khuller, S., Raschid, L., Zhang, XN. (2010). Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs. In: Berger, B. (ed.) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_30
DOI: https://doi.org/10.1007/978-3-642-12683-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12682-6
Online ISBN: 978-3-642-12683-3
eBook Packages: Computer Science, Computer Science (R0)