Why Enriching Business Transactions with Linked Open Data May Be Problematic in Classification Tasks

  • Eirik Folkestad
  • Erlend Vollset
  • Marius Rise Gallala
  • Jon Atle Gulla
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 786)


Linked Open Data has proven useful in disambiguation and query extension tasks, but its incomplete and inconsistent nature may make it less useful in analyzing brief, low-level business transactions. In this paper, we investigate the effect of using Wikidata and DBpedia to aid in the classification of real bank transactions. The experiments indicate that Linked Open Data may have the potential to supplement transaction classification systems effectively. However, given the nature of the transaction data used in this research and the current state of Wikidata and DBpedia, the extracted data in fact has a negative impact on the accuracy of the classification model when compared to the Baseline approach. The Baseline approach produces an accuracy score of 88.60%, whereas the Wikidata, DBpedia, and combined approaches yield accuracy scores of 84.99%, 86.65%, and 83.48%, respectively.


Classification · Bank transactions · Logistic Regression · Linked Open Data · Wikidata · DBpedia



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Eirik Folkestad (1), corresponding author
  • Erlend Vollset (1)
  • Marius Rise Gallala (2)
  • Jon Atle Gulla (1)
  1. Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
  2. Sparebank1 SMN, Trondheim, Norway
