Tackling the Class-Imbalance Learning Problem in Semantic Web Knowledge Bases

Rizzo, Giuseppe; d’Amato, Claudia; Fanizzi, Nicola; Esposito, Floriana

doi:10.1007/978-3-319-13704-9_35

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8876)

Abstract

In the Semantic Web context, procedures for deciding the class-membership of an individual with respect to a target concept in a knowledge base are generally based on automated reasoning. However, cases of incompleteness and inconsistency are frequent, owing to the distributed, heterogeneous nature and the Web-scale dimension of the knowledge bases. It has been shown that resorting to models induced from the data may offer comparably effective and efficient solutions for these cases, although skewness in the instance distribution may affect the quality of such models. This is known as the class-imbalance problem. We propose a machine learning approach based on the induction of Terminological Random Forests, an extension of the notion of Random Forest, to cope with this problem in the case of knowledge bases expressed through the standard Web ontology languages. We show experimentally the feasibility of our approach and its effectiveness w.r.t. related methods, especially with imbalanced datasets.
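
To make the class-imbalance issue concrete, the following is a minimal sketch of balanced bootstrap sampling, a general technique used with random forests on skewed data. It is not the procedure from the paper, whose details are not given in the abstract. The function names, the labels "member" and "non-member", and the toy data are all assumptions made for illustration. Each ensemble member would be trained on a sample in which every class is equally represented, and the ensemble prediction is taken by majority vote.

```python
import random
from collections import Counter


def balanced_bootstrap(examples, labels, rng):
    """Draw a bootstrap sample with the same number of examples per class.

    Hypothetical helper: every class contributes as many draws (with
    replacement) as the rarest class has examples.
    """
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    n = min(len(xs) for xs in by_class.values())  # size of the rarest class
    sample = []
    for y, xs in by_class.items():
        sample.extend((rng.choice(xs), y) for _ in range(n))
    return sample


def majority_vote(predictions):
    """Return the label predicted by most ensemble members."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy skewed data (assumed): 5 members versus 95 non-members.
    examples = list(range(100))
    labels = ["member"] * 5 + ["non-member"] * 95
    sample = balanced_bootstrap(examples, labels, rng)
    # Each class contributes 5 examples to the sample.
    print(Counter(y for _, y in sample))
    print(majority_vote(["member", "non-member", "member"]))
```

Without the rebalancing step, a plain bootstrap over this toy data would contain about 95% non-members on average, so a model trained on it could reach high accuracy while rarely predicting the minority class.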

Author information

Authors and Affiliations

LACAM – Dipartimento di Informatica, Università degli Studi di Bari “Aldo Moro”, Via E.Orabona 4, 70125, Bari, Italy
Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi & Floriana Esposito

Editor information

Editors and Affiliations

Department of Geography, University of California, Santa Barbara, CA, USA
Krzysztof Janowicz
Dept. of Computer Science, VU University Amsterdam, The Netherlands
Stefan Schlobach
University of Linköping, Sweden
Patrick Lambrix
Aalto University, Finland
Eero Hyvönen

Cite this paper

Rizzo, G., d’Amato, C., Fanizzi, N., Esposito, F. (2014). Tackling the Class-Imbalance Learning Problem in Semantic Web Knowledge Bases. In: Janowicz, K., Schlobach, S., Lambrix, P., Hyvönen, E. (eds) Knowledge Engineering and Knowledge Management. EKAW 2014. Lecture Notes in Computer Science, vol 8876. Springer, Cham. https://doi.org/10.1007/978-3-319-13704-9_35

DOI: https://doi.org/10.1007/978-3-319-13704-9_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13703-2
Online ISBN: 978-3-319-13704-9
eBook Packages: Computer Science, Computer Science (R0)