Abstract
The paper investigates the use of Machine Learning (ML) to support experts in validating skos:exactMatch links. It trains ML techniques provided by RapidMiner on manually validated links and shows how to use the resulting predictive models to save expert effort. The results are preliminary but encouraging: the trained predictive models reduce the number of manual checks required from experts by up to 70%, leaving only 10% of the wrong links unnoticed. Cutting 70% of the expert burden is crucial, especially when validating large sets of links.
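The trade-off described in the abstract, between manual checks saved and wrong links left unnoticed, can be illustrated with a minimal sketch. It assumes a model that outputs, for each link, a confidence that the link is correct, and that links at or above a chosen threshold are accepted without expert review. The function name `triage`, the threshold, and the toy data are hypothetical and are not taken from the paper; reading "wrong links unnoticed" as the share of all wrong links that bypass review is also an assumption.

```python
from typing import List, Tuple


def triage(links: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Estimate expert effort saved and the share of wrong links that slip through.

    links: (model confidence that the link is correct, whether it really is correct).
    Links with confidence >= threshold are accepted without manual checking.
    """
    # Links the experts no longer need to check manually.
    skipped = [(p, ok) for p, ok in links if p >= threshold]
    total_wrong = sum(1 for _, ok in links if not ok)
    # Wrong links that were auto-accepted and therefore go unnoticed.
    missed_wrong = sum(1 for _, ok in skipped if not ok)
    effort_saved = len(skipped) / len(links) if links else 0.0
    wrong_unnoticed = missed_wrong / total_wrong if total_wrong else 0.0
    return effort_saved, wrong_unnoticed


# Invented toy data, for illustration only (not results from the paper).
toy_links = [
    (0.95, True), (0.90, True), (0.88, True), (0.85, False), (0.80, True),
    (0.75, True), (0.70, True), (0.40, False), (0.30, False), (0.20, True),
]
effort_saved, wrong_unnoticed = triage(toy_links, threshold=0.7)
print(f"manual checks saved: {effort_saved:.0%}")
print(f"wrong links unnoticed: {wrong_unnoticed:.0%}")
```

Raising the threshold sends more links to the experts and lets fewer wrong links through; lowering it does the opposite. The figures printed for the toy data say nothing about the paper's actual models.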
Notes
- 5. This strategy works for languages such as English, Italian, and Spanish, which use spaces or hyphens to divide compound words. It may not work for German and Dutch, where compound words are represented differently.
Acknowledgment
The author thanks RapidMiner GmbH for granting an education license of their studio tool and the EU project eENVPlus for providing data about the validation of links.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Albertoni, R. (2019). Applying Predictive Models to Support skos:ExactMatch Validation. In: Garoufallou, E., Fallucchi, F., William De Luca, E. (eds) Metadata and Semantic Research. MTSR 2019. Communications in Computer and Information Science, vol 1057. Springer, Cham. https://doi.org/10.1007/978-3-030-36599-8_16
DOI: https://doi.org/10.1007/978-3-030-36599-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36598-1
Online ISBN: 978-3-030-36599-8
eBook Packages: Computer Science, Computer Science (R0)