Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs

Saha, Barna; Hoch, Allison; Khuller, Samir; Raschid, Louiqa; Zhang, Xiao-Ning

doi:10.1007/978-3-642-12683-3_30

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6044)

Abstract

In this paper, we focus on finding complex annotation patterns that represent novel and interesting hypotheses in gene annotation data. We define a generalization of the densest subgraph problem by adding a distance restriction (defined by a separate metric) on the nodes of the subgraph. We show that while this generalization makes the problem NP-hard for arbitrary metrics, the problem can be solved optimally in polynomial time when the metric comes from the distance metric of a tree or an interval graph. We also show that the densest subgraph problem with a specified subset of vertices that must be included in the solution can be solved optimally in polynomial time. In addition, we consider extensions in which, rather than finding a single solution, we wish to list all subgraphs of almost maximum density. We apply this method to a dataset of genes and their annotations obtained from The Arabidopsis Information Resource (TAIR). A user evaluation confirms that the patterns found in the distance-restricted densest subgraph for a dataset of photomorphogenesis genes are validated in the literature; a control dataset confirms that these are not random patterns. Interestingly, the complex annotation patterns potentially lead to new and as yet unknown hypotheses. We perform experiments to determine the properties of the dense subgraphs as we vary parameters, including the number of genes and the distance.
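
To make the underlying objective concrete, the sketch below assumes the standard definition of density for a node set S, namely the number of edges with both endpoints in S divided by |S|. It uses the well-known greedy peeling heuristic (repeatedly delete a minimum-degree vertex and keep the densest intermediate subgraph), which is a 2-approximation for the unrestricted problem. This is a swapped-in illustration, not the exact polynomial-time method the abstract refers to, and it ignores the distance restriction entirely. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict


def density(nodes, edges):
    """Density |E(S)| / |S| of the subgraph induced by `nodes` (assumed definition)."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    inside = sum(1 for u, v in edges if u in nodes and v in nodes)
    return inside / len(nodes)


def greedy_dense_subgraph(edges):
    """Greedy peeling heuristic for the unrestricted densest subgraph problem.

    Repeatedly removes a minimum-degree vertex and returns the densest of the
    intermediate subgraphs. Illustrative only: no distance restriction is applied.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    remaining = set(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_nodes = set(remaining)
    best_density = num_edges / len(remaining) if remaining else 0.0

    while len(remaining) > 1:
        # Pick a minimum-degree vertex; ties are broken by name for determinism.
        u = min(remaining, key=lambda x: (len(adj[x]), str(x)))
        num_edges -= len(adj[u])
        for w in adj[u]:
            adj[w].discard(u)
        del adj[u]
        remaining.remove(u)

        d = num_edges / len(remaining)
        if d > best_density:
            best_nodes, best_density = set(remaining), d

    return best_nodes, best_density


# Toy graph (hypothetical): a 4-clique on a, b, c, d plus a pendant node e.
edges = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d"),
         ("a", "e")]
nodes, d = greedy_dense_subgraph(edges)
print(sorted(nodes), d)  # ['a', 'b', 'c', 'd'] 1.5
```

On this toy input the whole graph has density 7/5 = 1.4, and peeling the pendant node leaves the 4-clique with density 6/4 = 1.5. A distance-restricted variant would additionally require the chosen nodes to satisfy the restriction under the separate metric, which this sketch does not attempt.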

Author information

Authors and Affiliations

Department of Computer Science, University of Maryland, College Park, MD, 20742. Research supported by NSF Award CCF-0728839.
Barna Saha
Department of Computer Science, University of Maryland, College Park, MD, 20742. Research supported by NSF REU Supplement to Award CCF-0728839.
Allison Hoch
Department of Computer Science and UMIACS, University of Maryland, College Park, MD, 20742. Research supported by NSF Award CCF-0728839 and a Google Research Award.
Samir Khuller
UMIACS and Robert H. Smith School of Business, University of Maryland, College Park, MD, 20742. Research supported by NSF Awards IIS-0430915 and IIS-0960963.
Louiqa Raschid
Research supported by Department of Biology, St. Bonaventure University, St. Bonaventure, NY, 14778
Xiao-Ning Zhang
Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, 20742
Xiao-Ning Zhang

Editor information

Editors and Affiliations

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, 02139, Cambridge, MA, USA
Bonnie Berger

Cite this paper

Saha, B., Hoch, A., Khuller, S., Raschid, L., Zhang, XN. (2010). Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs. In: Berger, B. (ed.) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_30

DOI: https://doi.org/10.1007/978-3-642-12683-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12682-6
Online ISBN: 978-3-642-12683-3
eBook Packages: Computer Science, Computer Science (R0)
