Abstract
Public biological databases contain vast amounts of rich data that can also be used to create and evaluate new biological hypotheses. We propose a method for link discovery in biological databases, i.e., for the prediction and evaluation of implicit or previously unknown connections between biological entities and concepts. In our framework, information extracted from available databases is represented as a graph, where vertices correspond to entities and concepts, and edges represent known, annotated relationships between vertices. A link, i.e., an implicit and possibly unknown relation between two entities, is manifested as a path or a subgraph connecting the corresponding vertices. We propose measures of link goodness that are based on three factors: edge reliability, relevance, and rarity. We give these factors a proper probabilistic interpretation. We give practical methods for finding and evaluating links in large graphs and report experimental results with Alzheimer genes and protein interactions.
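To make the graph view concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each edge carries a single probability obtained by multiplying reliability, relevance, and rarity scores (the abstract does not state how the factors are combined), and that the goodness of a path is the product of its edge probabilities. Under those assumptions the best path can be found with Dijkstra's algorithm on costs of -log(p). The names `edge_goodness`, `best_path`, and the toy vertices are hypothetical.

```python
import heapq
import math


def edge_goodness(reliability, relevance, rarity):
    # Assumption: the three factors are independent probabilities in (0, 1]
    # and are combined by multiplication. The source does not specify this.
    return reliability * relevance * rarity


def best_path(graph, source, target):
    """Return (probability, path) for the most probable source-target path.

    graph maps each vertex to a dict {neighbor: edge probability in (0, 1]}.
    Returns (0.0, []) when target cannot be reached from source.
    """
    # Minimizing the sum of -log(p) maximizes the product of p along a path.
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, p in graph.get(u, {}).items():
            new_cost = cost - math.log(p)
            if new_cost < dist.get(v, math.inf):
                dist[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if target not in dist:
        return 0.0, []
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-dist[target]), path


# Toy graph with made-up vertices and scores, for illustration only.
toy_graph = {
    "geneA": {
        "proteinX": edge_goodness(0.9, 1.0, 1.0),
        "pathwayP": edge_goodness(0.5, 1.0, 1.0),
    },
    "proteinX": {"geneB": edge_goodness(0.8, 1.0, 1.0)},
    "pathwayP": {"geneB": edge_goodness(0.6, 1.0, 1.0)},
}

probability, path = best_path(toy_graph, "geneA", "geneB")
print(path, round(probability, 2))
```

In the toy graph the route through `proteinX` scores about 0.72 and the route through `pathwayP` scores 0.30, so the first is returned. A single best path is only one possible link measure; a subgraph-based measure would also account for several parallel paths between the same two vertices.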
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sevon, P., Eronen, L., Hintsanen, P., Kulovesi, K., Toivonen, H. (2006). Link Discovery in Graphs Derived from Biological Databases. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_5
DOI: https://doi.org/10.1007/11799511_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36593-8
Online ISBN: 978-3-540-36595-2
eBook Packages: Computer Science, Computer Science (R0)