Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6044)

Abstract

In this paper, we focus on finding complex annotation patterns that represent novel and interesting hypotheses in gene annotation data. We define a generalization of the densest subgraph problem by adding a distance restriction (defined by a separate metric) on the nodes of the subgraph. We show that while this generalization makes the problem NP-hard for arbitrary metrics, the problem can be solved optimally in polynomial time when the metric is the distance metric of a tree or of an interval graph. We also show that the densest subgraph problem with a specified subset of vertices that must be included in the solution can be solved optimally in polynomial time. In addition, we consider extensions in which not just one solution is needed, but all subgraphs of almost maximum density are to be listed. We apply this method to a dataset of genes and their annotations obtained from The Arabidopsis Information Resource (TAIR). A user evaluation confirms that the patterns found in the distance-restricted densest subgraph for a dataset of photomorphogenesis genes are validated in the literature; a control dataset confirms that these are not random patterns. Interestingly, the complex annotation patterns potentially lead to new and as yet unknown hypotheses. We perform experiments to determine how the properties of the dense subgraphs change as we vary parameters, including the number of genes and the distance.
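To make the problem statement concrete, the following is a minimal sketch of a distance-restricted densest subgraph search on a toy graph. It is not the authors' method: it is a brute-force enumeration that is exponential in the number of vertices and only meant to illustrate the definition. It assumes the standard density measure (number of induced edges divided by number of nodes) and assumes the restriction means that every pair of chosen nodes is within a given limit under the separate metric; the abstract does not spell out either detail. All names (`density`, `within_distance`, `densest_restricted`, `position`) and the toy data are hypothetical.

```python
from itertools import combinations


def density(nodes, edges):
    """Number of edges induced by `nodes`, divided by the number of nodes."""
    chosen = set(nodes)
    induced = sum(1 for u, v in edges if u in chosen and v in chosen)
    return induced / len(chosen)


def within_distance(nodes, dist, limit):
    """Assumed restriction: every pair of nodes is at most `limit` apart."""
    return all(dist(u, v) <= limit for u, v in combinations(nodes, 2))


def densest_restricted(vertices, edges, dist, limit):
    """Brute force over all non-empty subsets (illustration only)."""
    best, best_density = None, -1.0
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            if not within_distance(subset, dist, limit):
                continue
            d = density(subset, edges)
            if d > best_density:  # ties keep the first subset found
                best, best_density = set(subset), d
    return best, best_density


# Toy graph: vertices 0..3 form a clique, vertex 4 hangs off vertex 3.
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (1, 3), (0, 3)]

# Separate metric (hypothetical): each vertex sits at a position on a line.
position = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}


def dist(u, v):
    return abs(position[u] - position[v])


loose_set, loose_density = densest_restricted(vertices, edges, dist, limit=4)
tight_set, tight_density = densest_restricted(vertices, edges, dist, limit=2)
print(loose_set, loose_density)
print(tight_set, tight_density)
```

With the loose limit the restriction is inactive and the 4-clique is the densest subgraph; with the tight limit the clique no longer qualifies, so the best feasible subgraph is a triangle. The polynomial-time results stated in the abstract for tree and interval-graph metrics rely on different techniques that this sketch does not attempt to reproduce.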



Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Saha, B., Hoch, A., Khuller, S., Raschid, L., Zhang, XN. (2010). Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs. In: Berger, B. (ed.) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_30

  • DOI: https://doi.org/10.1007/978-3-642-12683-3_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12682-6

  • Online ISBN: 978-3-642-12683-3

  • eBook Packages: Computer Science, Computer Science (R0)
