Abstract
The increasing availability of large RDF datasets offers an exciting opportunity to build predictive models from such data using machine learning algorithms. However, the massive size and distributed nature of RDF data call for approaches to learning in a setting where the data can be accessed only through a query interface, e.g., the SPARQL endpoint of the RDF store. In applications where the data are subject to frequent updates, there is a need for algorithms that allow the predictive model to be incrementally updated in response to changes in the data. Furthermore, in some applications, the attributes that are relevant for a specific prediction task are not known a priori and hence need to be discovered by the algorithm. We present an approach to learning Relational Bayesian Classifiers (RBCs) from RDF data that addresses such scenarios. Specifically, we show how to build RBCs from RDF data using statistical queries through the SPARQL endpoint of the RDF store. We compare the communication complexity of our algorithm with that of an algorithm that requires direct centralized access to the data and hence has to retrieve the entire RDF dataset from the remote location for processing. We establish the conditions under which RBC models can be incrementally updated in response to the addition or deletion of RDF data. We show how our approach can be extended to the setting where the attributes that are relevant for prediction are not known a priori, by selectively crawling the RDF data for attributes of interest. We provide an open source implementation and evaluate the proposed approach on several large RDF datasets.
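As a rough illustration of why count-style statistical queries suffice for learning and for incremental updates, the following Python sketch stores a naive Bayes-style model purely as counts. It is a simplified stand-in under stated assumptions, not the RBC algorithm of the paper: it ignores multiset-valued relational attributes, and the class name `CountBasedClassifier`, the helper `count_query`, the example predicates, and the use of SPARQL 1.1 `COUNT` aggregate syntax are all hypothetical choices made for this example.

```python
from collections import Counter
import math


def count_query(class_pred, label, attr_pred, value):
    """Build a counting query (SPARQL 1.1 aggregate syntax, assumed here).

    The predicate IRIs and literal values are placeholders for illustration.
    """
    return (
        "SELECT (COUNT(?s) AS ?n) WHERE { "
        f"?s <{class_pred}> \"{label}\" . "
        f"?s <{attr_pred}> \"{value}\" . }}"
    )


class CountBasedClassifier:
    """Naive Bayes-style model kept entirely as counts (sufficient statistics)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha              # Laplace smoothing constant
        self.class_counts = Counter()   # count(class)
        self.value_counts = Counter()   # count(attribute, value, class)
        self.values = {}                # attribute -> set of observed values

    def update(self, attributes, label, sign=+1):
        """Add (sign=+1) or delete (sign=-1) one labeled instance."""
        self.class_counts[label] += sign
        for attr, value in attributes.items():
            self.value_counts[(attr, value, label)] += sign
            self.values.setdefault(attr, set()).add(value)

    def _classes(self):
        # Only classes that still have at least one instance are candidates.
        return [c for c, n in self.class_counts.items() if n > 0]

    def log_score(self, attributes, label):
        classes = self._classes()
        total = sum(self.class_counts[c] for c in classes)
        n_c = self.class_counts[label]
        score = math.log((n_c + self.alpha) / (total + self.alpha * len(classes)))
        for attr, value in attributes.items():
            k = len(self.values.get(attr, ())) or 1
            n = self.value_counts[(attr, value, label)]
            score += math.log((n + self.alpha) / (n_c + self.alpha * k))
        return score

    def predict(self, attributes):
        return max(self._classes(), key=lambda c: self.log_score(attributes, c))
```

In this sketch, each counter entry corresponds to a number that a counting query such as the one built by `count_query` could return, so the learner never needs the raw triples. Adding or deleting data only increments or decrements the affected counts, which is the intuition behind updating the model without retraining from scratch.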
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, H.T., Koul, N., Honavar, V. (2011). Learning Relational Bayesian Classifiers from RDF Data. In: Aroyo, L., et al. The Semantic Web – ISWC 2011. ISWC 2011. Lecture Notes in Computer Science, vol 7031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25073-6_25
DOI: https://doi.org/10.1007/978-3-642-25073-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25072-9
Online ISBN: 978-3-642-25073-6
eBook Packages: Computer Science; Computer Science (R0)