Applying Predictive Models to Support skos:ExactMatch Validation

  • Conference paper
Metadata and Semantic Research (MTSR 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1057)

Abstract

The paper investigates the use of Machine Learning (ML) to support experts in validating skos:exactMatch links. It trains ML techniques provided by RapidMiner on manually validated links and shows how the resulting predictive models can save expert effort. The results are preliminary but encouraging: the trained predictive models reduce the number of manual checks required from experts by up to 70%, while leaving only 10% of the wrong links unnoticed. Cutting 70% of the expert burden is crucial, especially when validating large sets of links.
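The two figures in the abstract (share of manual checks saved, share of wrong links left unnoticed) can be illustrated with a minimal Python sketch. It is not the author's RapidMiner process: the `Link` class, the `p_wrong` scores, the 0.5 threshold, and the toy data are all hypothetical, and the sketch only assumes that a trained model outputs a per-link score and that links scoring below a threshold are accepted without expert review.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A candidate skos:exactMatch link (hypothetical structure)."""
    source: str
    target: str
    p_wrong: float   # assumed model output: estimated probability the link is wrong
    is_wrong: bool   # ground truth from manual validation


def triage(links, threshold):
    """Split links into those sent to experts and those accepted automatically."""
    to_check = [l for l in links if l.p_wrong >= threshold]
    auto_accepted = [l for l in links if l.p_wrong < threshold]
    return to_check, auto_accepted


def effort_saved(links, threshold):
    """Return (fraction of manual checks saved, fraction of wrong links unnoticed)."""
    _, auto_accepted = triage(links, threshold)
    saved = len(auto_accepted) / len(links)
    wrong_total = sum(l.is_wrong for l in links)
    wrong_missed = sum(l.is_wrong for l in auto_accepted)
    missed = wrong_missed / wrong_total if wrong_total else 0.0
    return saved, missed


# Toy data, invented for illustration only (not taken from the paper).
toy_scores = [
    (0.05, False), (0.10, False), (0.20, False), (0.30, True), (0.35, False),
    (0.45, False), (0.60, True), (0.70, False), (0.80, True), (0.90, False),
]
links = [Link(f"a:{i}", f"b:{i}", p, w) for i, (p, w) in enumerate(toy_scores)]

saved, missed = effort_saved(links, threshold=0.5)
print(f"manual checks saved: {saved:.0%}, wrong links unnoticed: {missed:.0%}")
```

Raising the threshold accepts more links automatically, which saves more expert effort but lets more wrong links through; the percentages reported in the abstract describe one such trade-off point.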

Notes

  1. https://developers.google.com/machine-learning/crash-course/ml-intro

  2. https://pypi.org/project/textdistance/

  3. https://radimrehurek.com/gensim

  4. https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/

  5. This strategy works for languages such as English, Italian, and Spanish, which use spaces or hyphens to divide compound words. It may not work for German and Dutch, where compound words are represented differently.

  6. https://github.com/riccardoAlbertoni/LinkCorrectness/blob/master/PreparingFeaturesForLinksetCorrectness.ipynb

Acknowledgment

The author thanks RapidMiner GmbH for granting an education license of their studio tool and the EU project eENVPlus for providing data about the validation of links.

Author information

Corresponding author

Correspondence to Riccardo Albertoni.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Albertoni, R. (2019). Applying Predictive Models to Support skos:ExactMatch Validation. In: Garoufallou, E., Fallucchi, F., William De Luca, E. (eds) Metadata and Semantic Research. MTSR 2019. Communications in Computer and Information Science, vol 1057. Springer, Cham. https://doi.org/10.1007/978-3-030-36599-8_16

  • DOI: https://doi.org/10.1007/978-3-030-36599-8_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36598-1

  • Online ISBN: 978-3-030-36599-8

  • eBook Packages: Computer Science, Computer Science (R0)
