Why Enriching Business Transactions with Linked Open Data May Be Problematic in Classification Tasks

Folkestad, Eirik; Vollset, Erlend; Gallala, Marius Rise; Gulla, Jon Atle

doi:10.1007/978-3-319-69548-8_24


Part of the book series: Communications in Computer and Information Science (CCIS, volume 786)

Included in the following conference series:

International Conference on Knowledge Engineering and the Semantic Web


Abstract

Linked Open Data has proven useful in disambiguation and query extension tasks, but its incomplete and inconsistent nature may make it less useful in analyzing brief, low-level business transactions. In this paper, we investigate the effect of using Wikidata and DBpedia to aid in the classification of real bank transactions. The experiments indicate that Linked Open Data may have the potential to supplement transaction classification systems effectively. However, given the nature of the transaction data used in this research and the current state of Wikidata and DBpedia, the extracted data in fact has a negative impact on the accuracy of the classification model when compared to the Baseline approach. The Baseline approach produces an accuracy score of 88.60%, whereas the Wikidata, DBpedia and combined approaches yield accuracy scores of 84.99%, 86.65% and 83.48%, respectively.
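
To make the size of the reported effect concrete, the following minimal sketch computes each enriched approach's accuracy change relative to the Baseline, in percentage points. It uses only the four scores stated in the abstract. The dictionary labels (including "Wikidata + DBpedia" for the combined approach) and the function name are illustrative assumptions, not names taken from the paper.

```python
# Accuracy scores (percent) as reported in the abstract.
# The labels are illustrative; "Wikidata + DBpedia" stands for the combined approach.
reported_accuracy = {
    "Baseline": 88.60,
    "Wikidata": 84.99,
    "DBpedia": 86.65,
    "Wikidata + DBpedia": 83.48,
}


def accuracy_change_vs_baseline(scores, baseline_key="Baseline"):
    """Return each approach's accuracy change relative to the baseline,
    in percentage points, rounded to two decimals."""
    baseline = scores[baseline_key]
    return {
        name: round(value - baseline, 2)
        for name, value in scores.items()
        if name != baseline_key
    }


changes = accuracy_change_vs_baseline(reported_accuracy)
for name, delta in changes.items():
    # The explicit sign makes it clear that every enriched approach scores lower.
    print(f"{name}: {delta:+.2f} percentage points")
```

Under these figures, every enriched variant falls below the Baseline, with the combined approach showing the largest drop.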



Author information

Authors and Affiliations

Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
Eirik Folkestad, Erlend Vollset & Jon Atle Gulla
Sparebank1 SMN, Trondheim, Norway
Marius Rise Gallala


Corresponding author

Correspondence to Eirik Folkestad.

Editor information

Editors and Affiliations

West Pomeranian University of Technology in Szczecin, Szczecin, Poland
Przemysław Różewski
University of Bonn, Bonn, Germany
Christoph Lange


About this paper

Cite this paper

Folkestad, E., Vollset, E., Gallala, M.R., Gulla, J.A. (2017). Why Enriching Business Transactions with Linked Open Data May Be Problematic in Classification Tasks. In: Różewski, P., Lange, C. (eds) Knowledge Engineering and Semantic Web. KESW 2017. Communications in Computer and Information Science, vol 786. Springer, Cham. https://doi.org/10.1007/978-3-319-69548-8_24


DOI: https://doi.org/10.1007/978-3-319-69548-8_24
Published: 18 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69547-1
Online ISBN: 978-3-319-69548-8
eBook Packages: Computer Science, Computer Science (R0)
