Why Enriching Business Transactions with Linked Open Data May Be Problematic in Classification Tasks

  • Conference paper
Knowledge Engineering and Semantic Web (KESW 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 786)

Abstract

Linked Open Data has proven useful in disambiguation and query extension tasks, but its incomplete and inconsistent nature may make it less useful in analyzing brief, low-level business transactions. In this paper, we investigate the effect of using Wikidata and DBpedia to aid in the classification of real bank transactions. The experiments indicate that Linked Open Data may have the potential to supplement transaction classification systems effectively. However, given the nature of the transaction data used in this research and the current state of Wikidata and DBpedia, the extracted data in fact has a negative impact on the accuracy of the classification model when compared to the Baseline approach. The Baseline approach produces an accuracy score of 88.60%, whereas the Wikidata, DBpedia and combined approaches yield accuracy scores of 84.99%, 86.65% and 83.48%, respectively.
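To make the size of the reported effect concrete, the short sketch below computes how far each enriched approach falls below the Baseline, in percentage points. The accuracy figures are the ones quoted in the abstract; the variable and function names are illustrative assumptions and do not come from the paper.

```python
# Accuracy scores (in percent) as quoted in the abstract.
# The dictionary keys and names below are illustrative, not the authors' own.
BASELINE_ACCURACY = 88.60

enriched_accuracy = {
    "Wikidata": 84.99,
    "DBpedia": 86.65,
    "Wikidata + DBpedia": 83.48,
}


def drop_vs_baseline(accuracy: float, baseline: float = BASELINE_ACCURACY) -> float:
    """Return the accuracy loss relative to the baseline, in percentage points."""
    return round(baseline - accuracy, 2)


drops = {name: drop_vs_baseline(acc) for name, acc in enriched_accuracy.items()}

for name, drop in drops.items():
    print(f"{name}: {drop:.2f} percentage points below Baseline")
```

Under these figures, every enriched variant scores below the Baseline, and the combined approach shows the largest drop.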



Corresponding author

Correspondence to Eirik Folkestad.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Folkestad, E., Vollset, E., Gallala, M.R., Gulla, J.A. (2017). Why Enriching Business Transactions with Linked Open Data May Be Problematic in Classification Tasks. In: Różewski, P., Lange, C. (eds) Knowledge Engineering and Semantic Web. KESW 2017. Communications in Computer and Information Science, vol 786. Springer, Cham. https://doi.org/10.1007/978-3-319-69548-8_24

  • DOI: https://doi.org/10.1007/978-3-319-69548-8_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69547-1

  • Online ISBN: 978-3-319-69548-8

  • eBook Packages: Computer Science, Computer Science (R0)
