
Link Discovery in Graphs Derived from Biological Databases

(Research Paper)

  • Conference paper
Data Integration in the Life Sciences (DILS 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4075)

Abstract

Public biological databases contain vast amounts of rich data that can also be used to create and evaluate new biological hypotheses. We propose a method for link discovery in biological databases, i.e., for the prediction and evaluation of implicit or previously unknown connections between biological entities and concepts. In our framework, information extracted from available databases is represented as a graph, where vertices correspond to entities and concepts, and edges represent known, annotated relationships between vertices. A link, i.e., an implicit and possibly unknown relation between two entities, is manifested as a path or a subgraph connecting the corresponding vertices. We propose measures of link goodness based on three factors: edge reliability, relevance, and rarity, and we give these factors a proper probabilistic interpretation. We present practical methods for finding and evaluating links in large graphs and report experimental results with Alzheimer genes and protein interactions.
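The abstract treats a link as a path or subgraph in a graph whose edges carry a probabilistic interpretation. The following is a minimal Python sketch of that idea under stated assumptions; it is not the authors' implementation. It assumes that each edge exists independently with some probability p. It also assumes, hypothetically, that p is the product of the three factors (reliability, relevance, rarity). On that basis it computes two candidate goodness measures. The first is the probability of the single best path, found with Dijkstra's algorithm on -log p. The second is a Monte Carlo estimate of the probability that the two vertices are connected at all, known as two-terminal network reliability. The function names (`edge_probability`, `build_graph`, `best_path`, `reliability`), the toy vertices and the probabilities are all illustrative.

```python
import heapq
import math
import random
from collections import defaultdict


def edge_probability(reliability, relevance, rarity):
    """ASSUMPTION (hypothetical): treat the three factors as independent
    probabilities and multiply them; not the paper's actual formula."""
    for factor in (reliability, relevance, rarity):
        if not 0.0 <= factor <= 1.0:
            raise ValueError(f"factor must lie in [0, 1]: {factor}")
    return reliability * relevance * rarity


def build_graph(edges):
    """Undirected adjacency lists: vertex -> [(neighbour, probability)].
    Edges with probability 0 can never exist, so they are dropped."""
    graph = defaultdict(list)
    for u, v, p in edges:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"edge probability must lie in [0, 1]: {p}")
        if p > 0.0:
            graph[u].append((v, p))
            graph[v].append((u, p))
    return graph


def best_path(graph, source, target):
    """Most probable path. Maximising a product of probabilities equals
    minimising the sum of -log(p), a non-negative weight, so Dijkstra works."""
    dist, prev = {source: 0.0}, {}
    heap, visited = [(0.0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, p in graph[u]:
            nd = d - math.log(p)
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in visited:
        return 0.0, []  # no path at all
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return math.exp(-dist[target]), path[::-1]


def reliability(edges, source, target, samples=10_000, seed=0):
    """Monte Carlo estimate of P(source and target are connected) when every
    edge is present independently with its own probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        adj = defaultdict(list)
        for u, v, p in edges:  # sample one possible world
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
        stack, seen = [source], {source}  # depth-first search
        while stack:
            u = stack.pop()
            if u == target:
                hits += 1
                break
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return hits / samples


# Toy graph with made-up vertices and probabilities, for illustration only.
TOY_EDGES = [
    ("gene:A", "protein:A", 0.9),
    ("protein:A", "protein:B", 0.5),
    ("protein:B", "disease:X", 0.8),
    ("gene:A", "GO:T", 0.6),
    ("GO:T", "disease:X", 0.7),
]

if __name__ == "__main__":
    prob, path = best_path(build_graph(TOY_EDGES), "gene:A", "disease:X")
    print(f"best path {path} with probability {prob:.3f}")
    est = reliability(TOY_EDGES, "gene:A", "disease:X")
    print(f"estimated connection probability {est:.3f}")
```

On this toy graph, the best single path is gene:A, GO:T, disease:X, with probability 0.6 × 0.7 = 0.42. The two edge-disjoint paths together give an exact connection probability of 1 − (1 − 0.36)(1 − 0.42) = 0.6288, and the sampled estimate should approximate this value. The comparison illustrates the trade-off between the two measures. The best-path measure is cheap but ignores parallel evidence. Two-terminal reliability rewards multiple independent connections, but exact computation is #P-complete in general, which is why this sketch samples instead.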

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sevon, P., Eronen, L., Hintsanen, P., Kulovesi, K., Toivonen, H. (2006). Link Discovery in Graphs Derived from Biological Databases. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_5

  • DOI: https://doi.org/10.1007/11799511_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36593-8

  • Online ISBN: 978-3-540-36595-2

  • eBook Packages: Computer Science, Computer Science (R0)
