A Domain Specific ESA Method for Semantic Text Matching

Mazzola, Luca; Siegfried, Patrick; Waldis, Andreas; Stalder, Florian; Denzler, Alexander; Kaufmann, Michael

doi:10.1007/978-3-030-38704-4_2

Part of the book series: Studies in Computational Intelligence (SCI, volume 864)

Abstract

One approach to semantic text similarity matching is the concept-based characterization of entities and themes that can be automatically extracted from content. Such a characterization makes it possible to build effective recommender systems on top of similarity measures and to use them for document retrieval and ranking. In this work, our research goal is to create an expert system for education recommendation, based on the skills, capabilities, and areas of expertise present in someone’s curriculum vitae, together with their personal preferences. This semantic text matching task needs to take into account all of a person’s educational experiences (formal, informal, and on-the-job), as well as work-related know-how, to create a concept-based profile of the person. This profile allows a reasoned matching process from CVs and career vision to descriptions of education programs. Taking inspiration from explicit semantic analysis (ESA), we developed a domain-specific approach to semantically characterize short texts and to compare their content for semantic similarity. Through an enriching and a filtering process, we transform the general-purpose German Wikipedia into a domain-specific model for our task. The domain is also defined through a German knowledge base, or vocabulary, of descriptions of educational experiences and job offers. Initial testing with a small set of documents demonstrated that our approach covers the main requirements and can match semantically similar text content. The approach was applied in a use case and led to the implementation of an education recommender system prototype.
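The general ESA idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy collection of three invented English "concept articles" standing in for the filtered German Wikipedia, plain whitespace tokenization, and a simple smoothed TF-IDF weighting. All names (`CONCEPTS`, `build_index`, `esa_vector`, `cosine`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy concept articles; stand-ins for domain-filtered Wikipedia pages.
CONCEPTS = {
    "Machine learning": "algorithm data model training prediction statistics",
    "Accounting": "finance balance audit tax ledger report",
    "Project management": "planning team budget schedule risk report",
}

def tokenize(text):
    """Lowercase and split on whitespace (an assumption; no lemmatization here)."""
    return text.lower().split()

def build_index(concepts):
    """Build an inverted index {term: {concept: tf-idf weight}}."""
    n = len(concepts)
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for name, counts in tf.items():
        for term, count in counts.items():
            idf = math.log(n / df[term]) + 1.0  # smoothed so shared terms keep weight
            index.setdefault(term, {})[name] = count * idf
    return index

def esa_vector(text, index):
    """Represent a text as summed concept weights of its known terms."""
    vec = Counter()
    for term in tokenize(text):
        for concept, weight in index.get(term, {}).items():
            vec[concept] += weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

INDEX = build_index(CONCEPTS)
cv = esa_vector("training a prediction model on data", INDEX)
course_ml = esa_vector("statistics and algorithm", INDEX)
course_acc = esa_vector("tax audit and ledger", INDEX)
```

In this toy setup the CV snippet and the first course description share no words, yet both map onto the same concept, so `cosine(cv, course_ml)` is higher than `cosine(cv, course_acc)`. That is the property a concept-based representation is meant to provide over direct word overlap.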

Notes

1. Identification of the base word by removing derived or inflected variations.

Acknowledgements

The research leading to this work was partially financed by the KTI/Innosuisse Swiss federal agency, through a competitive call. The financed project KTI-Nr. 27104.1 is called CVCube: digitale Aus- und Weiterbildungsberatung mittels Bildungsgraphen. The authors would like to thank the business project partner for the fruitful discussions and for allowing us to use the examples in this publication. We would like to thank Benjamin Haymond for his very helpful and precise revision and language editing support of this manuscript.

Author information

Authors and Affiliations

School of Information Technology, Lucerne University of Applied Sciences, 6343, Rotkreuz, Switzerland
Luca Mazzola, Patrick Siegfried, Andreas Waldis, Florian Stalder, Alexander Denzler & Michael Kaufmann

Corresponding author

Correspondence to Michael Kaufmann.

Editor information

Editors and Affiliations

Department of Electrical Engineering and Computers, Faculdade de Ciências e Tecnologia, UNINOVA-CTS Centre of Technology and Systems, Universidade Nova de Lisboa, Caparica, Portugal
Ricardo Jardim-Goncalves
Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
Vassil Sgurev
University of Library Studies and Information Technologies, Sofia, Bulgaria
Vladimir Jotsov
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Janusz Kacprzyk

About this chapter

Cite this chapter

Mazzola, L., Siegfried, P., Waldis, A., Stalder, F., Denzler, A., Kaufmann, M. (2020). A Domain Specific ESA Method for Semantic Text Matching. In: Jardim-Goncalves, R., Sgurev, V., Jotsov, V., Kacprzyk, J. (eds) Intelligent Systems: Theory, Research and Innovation in Applications. Studies in Computational Intelligence, vol 864. Springer, Cham. https://doi.org/10.1007/978-3-030-38704-4_2

DOI: https://doi.org/10.1007/978-3-030-38704-4_2
Published: 04 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38703-7
Online ISBN: 978-3-030-38704-4
eBook Packages: Intelligent Technologies and Robotics (R0)
