Abstract
In the Semantic Web context, procedures for deciding the class-membership of an individual with respect to a target concept in a knowledge base are generally based on automated reasoning. However, cases of incompleteness and inconsistency are frequent, owing to the distributed, heterogeneous nature and the Web-scale dimension of the knowledge bases. It has been shown that resorting to models induced from the data may offer comparably effective and efficient solutions in these cases, although skewness in the instance distribution may affect the quality of such models. This is known as the class-imbalance problem. We propose a machine learning approach based on the induction of Terminological Random Forests, an extension of the notion of Random Forest, to cope with this problem for knowledge bases expressed in the standard Web ontology languages. We show experimentally the feasibility of our approach and its effectiveness with respect to related methods, especially on imbalanced datasets.
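To make the class-imbalance issue concrete, the following minimal sketch shows one generic way an ensemble can counter a skewed instance distribution: each ensemble member is trained on a bootstrap sample that holds the same number of examples per class, and the members' predictions are combined by majority vote. This is an illustration under stated assumptions, not the Terminological Random Forest algorithm proposed in the paper; the function names `balanced_bootstrap` and `majority_vote`, the individual identifiers, and the +1/-1 labels are all hypothetical.

```python
import random
from collections import Counter


def balanced_bootstrap(examples, labels, rng):
    """Draw a bootstrap sample with the same number of examples per class.

    The per-class sample size is the size of the smallest class, so the
    majority class is undersampled (sampling is with replacement).
    """
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    n = min(len(members) for members in by_class.values())
    sample = []
    for y, members in by_class.items():
        sample.extend((rng.choice(members), y) for _ in range(n))
    return sample


def majority_vote(predictions):
    """Return the label predicted by the largest number of ensemble members."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical data: 5 positive and 95 negative individuals.
rng = random.Random(0)
individuals = [f"ind{i}" for i in range(100)]
labels = [+1] * 5 + [-1] * 95

sample = balanced_bootstrap(individuals, labels, rng)
counts = Counter(y for _, y in sample)  # class counts in the balanced sample
```

Each tree in a forest would be grown on its own balanced sample, so no single tree is dominated by the majority class, while the ensemble as a whole still sees much of the majority data across samples.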
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Rizzo, G., d’Amato, C., Fanizzi, N., Esposito, F. (2014). Tackling the Class-Imbalance Learning Problem in Semantic Web Knowledge Bases. In: Janowicz, K., Schlobach, S., Lambrix, P., Hyvönen, E. (eds) Knowledge Engineering and Knowledge Management. EKAW 2014. Lecture Notes in Computer Science, vol 8876. Springer, Cham. https://doi.org/10.1007/978-3-319-13704-9_35
DOI: https://doi.org/10.1007/978-3-319-13704-9_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13703-2
Online ISBN: 978-3-319-13704-9
eBook Packages: Computer Science (R0)