Link Discovery in Graphs Derived from Biological Databases

Sevon, Petteri; Eronen, Lauri; Hintsanen, Petteri; Kulovesi, Kimmo; Toivonen, Hannu

doi:10.1007/11799511_5

(Research Paper)

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4075)

Abstract

Public biological databases contain vast amounts of rich data that can also be used to create and evaluate new biological hypotheses. We propose a method for link discovery in biological databases, i.e., for the prediction and evaluation of implicit or previously unknown connections between biological entities and concepts. In our framework, information extracted from available databases is represented as a graph, where vertices correspond to entities and concepts, and edges represent known, annotated relationships between vertices. A link, i.e., an implicit and possibly unknown relation between two entities, is manifested as a path or a subgraph connecting the corresponding vertices. We propose measures for link goodness that are based on three factors: edge reliability, relevance, and rarity. We handle these factors with a proper probabilistic interpretation. We give practical methods for finding and evaluating links in large graphs and report experimental results with Alzheimer genes and protein interactions.
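
The graph view in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so the code assumes that each edge's three factors (reliability, relevance, rarity) are probabilities that multiply into a single edge goodness, that edges are independent, and that the goodness of a path is the product of its edge goodnesses. Under those assumptions the best path can be found with Dijkstra's algorithm on costs -log(p). All function names, vertex names, and numbers are hypothetical, and the sketch covers only single paths, not the subgraph-based measures the abstract also mentions.

```python
import heapq
import math


def edge_goodness(reliability, relevance, rarity):
    # ASSUMPTION: the three factors are independent probabilities that multiply.
    return reliability * relevance * rarity


def best_path(graph, source, target):
    """Return (goodness, path) for the most probable path, or (0.0, []) if none.

    graph maps each vertex to a list of (neighbor, probability) pairs,
    with every probability in (0, 1].
    """
    # Dijkstra on cost = -log(p): the smallest summed cost is the largest product.
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, p in graph.get(u, []):
            new_cost = cost - math.log(p)
            if new_cost < dist.get(v, math.inf):
                dist[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if target not in dist:
        return 0.0, []
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-dist[target]), path


# Toy undirected graph with made-up vertices and factor values.
toy_edges = [
    ("gene_A", "protein_P", edge_goodness(0.9, 1.0, 0.8)),
    ("protein_P", "gene_B", edge_goodness(0.5, 1.0, 1.0)),
    ("gene_A", "article_X", edge_goodness(0.9, 0.5, 0.2)),
    ("article_X", "gene_B", edge_goodness(0.9, 0.5, 0.2)),
]
graph = {}
for a, b, p in toy_edges:
    graph.setdefault(a, []).append((b, p))
    graph.setdefault(b, []).append((a, p))

goodness, path = best_path(graph, "gene_A", "gene_B")
print(path, round(goodness, 4))
```

In this toy graph the route through `protein_P` wins because the edges through `article_X` are penalized by low relevance and rarity values, which is the kind of trade-off the three factors are meant to capture.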

Author information

Authors and Affiliations

HIIT Basic Research Unit,Department of Computer Science, University of Helsinki, P.O. Box 68, FI-00014, Finland
Petteri Sevon, Lauri Eronen, Petteri Hintsanen, Kimmo Kulovesi & Hannu Toivonen

Editor information

Editors and Affiliations

Humboldt-Universität zu Berlin
Ulf Leser
Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
Felix Naumann
IBM Application and Integration Middleware, 1475 Phoenixville Pike, 19380, West Chester, PA, USA
Barbara Eckman

About this paper

Cite this paper

Sevon, P., Eronen, L., Hintsanen, P., Kulovesi, K., Toivonen, H. (2006). Link Discovery in Graphs Derived from Biological Databases. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_5

DOI: https://doi.org/10.1007/11799511_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36593-8
Online ISBN: 978-3-540-36595-2
eBook Packages: Computer Science, Computer Science (R0)
