Abstract
Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences, where genes or proteins are annotated with controlled vocabulary terms (CV terms) from ontologies. The W3C Linking Open Data (LOD) initiative and semantic Web technologies are playing a leading role in making such datasets widely available. Scientists can mine these datasets to discover patterns of annotation. While ontology alignment and integration across datasets have been explored in the context of the semantic Web, there is no current approach to mine such patterns in annotation graph datasets. In this paper, we propose a novel approach for link prediction, a preliminary task when discovering more complex patterns. Our prediction is based on a complementary methodology of graph summarization (GS) and dense subgraphs (DSG). GS can exploit and summarize knowledge captured within the ontologies and in the annotation patterns. DSG uses the ontology structure, in particular the distance between CV terms, to filter the graph and to find promising subgraphs. We develop a scoring function based on multiple heuristics to rank the predictions. We perform an extensive evaluation on Arabidopsis thaliana genes.
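To make the dense-subgraph idea concrete, the following is a minimal sketch and not the method of the paper. It assumes a toy bipartite annotation graph with hypothetical gene and CV-term identifiers (g1, t1, and so on), finds a dense subgraph with the well-known greedy peeling heuristic, and treats every gene/term pair that is missing inside that subgraph as a candidate link. The ontology-distance filter, the graph summarization step, and the scoring function described above are not reproduced here.

```python
def densest_subgraph(edges):
    """Greedy peeling: repeatedly remove a minimum-degree node and keep the
    intermediate subgraph with the highest density |E| / |V|."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n_edges = len(edges)
    best_density, best_nodes = 0.0, set(adj)
    while adj:
        density = n_edges / len(adj)
        if density > best_density:
            best_density, best_nodes = density, set(adj)
        # Ties are broken by node name so the result is deterministic.
        u = min(adj, key=lambda x: (len(adj[x]), x))
        for v in adj[u]:
            adj[v].discard(u)
        n_edges -= len(adj[u])
        del adj[u]
    return best_nodes, best_density


# Hypothetical toy data: genes g1..g4 annotated with CV terms t1..t4.
genes = {"g1", "g2", "g3", "g4"}
terms = {"t1", "t2", "t3", "t4"}
edges = {
    ("g1", "t1"), ("g1", "t2"), ("g1", "t3"),
    ("g2", "t1"), ("g2", "t2"), ("g2", "t3"),
    ("g3", "t1"), ("g3", "t2"),
    ("g4", "t4"),
}

dense, density = densest_subgraph(edges)

# Candidate links: gene/term pairs inside the dense subgraph with no edge yet.
candidates = sorted(
    (g, t)
    for g in genes & dense
    for t in terms & dense
    if (g, t) not in edges
)
print(sorted(dense), round(density, 3), candidates)
```

In this toy graph the sparsely connected pair g4/t4 is peeled away first, and the only annotation missing from the remaining dense block is (g3, t3), which a link predictor of this kind would propose for review.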
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thor, A. et al. (2011). Link Prediction for Annotation Graphs Using Graph Summarization. In: Aroyo, L., et al. The Semantic Web – ISWC 2011. ISWC 2011. Lecture Notes in Computer Science, vol 7031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25073-6_45
DOI: https://doi.org/10.1007/978-3-642-25073-6_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25072-9
Online ISBN: 978-3-642-25073-6
eBook Packages: Computer Science, Computer Science (R0)