Abstract
One approach to semantic text similarity matching is the concept-based characterization of entities and themes that can be automatically extracted from content. Such a characterization is useful for building recommender systems on top of similarity measures and for document retrieval and ranking. In this work, our research goal is to create an expert system for education recommendation, based on the skills, capabilities, and areas of expertise found in a person’s curriculum vitae, together with their personal preferences. This semantic text matching challenge needs to take into account all of a person’s educational experiences (formal, informal, and on-the-job) as well as work-related know-how, in order to create a concept-based profile of the person. Such a profile allows a reasoned matching process from CVs and career visions to descriptions of education programs. Taking inspiration from explicit semantic analysis (ESA), we developed a domain-specific approach to semantically characterize short texts and to compare their content for semantic similarity. Through an enrichment and a filtering process, we transform the general-purpose German Wikipedia into a domain-specific model for our task. The domain is also defined through a German knowledge base, or vocabulary, of descriptions for educational experiences and job offers. Initial testing with a small set of documents demonstrated that our approach covers the main requirements and can match semantically similar text content. The approach was applied in a use case and led to the implementation of an education recommender system prototype.
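The general ESA idea referred to in the abstract can be illustrated with a minimal sketch: each text is projected onto a space of concepts (Wikipedia articles in the original ESA), and two texts are compared by the cosine of their concept vectors. The sketch below is not the chapter's implementation. The three toy concept texts, the English vocabulary, and the names `CONCEPTS`, `build_inverted_index`, `concept_vector`, and `cosine` are all assumptions made for illustration, and the enrichment and filtering of the German Wikipedia described above is not reproduced.

```python
import math
from collections import Counter

# Hypothetical toy concept "articles"; the chapter uses a filtered German Wikipedia.
CONCEPTS = {
    "Software engineering": "software programming code testing design software",
    "Accounting": "finance bookkeeping balance tax audit finance",
    "Nursing": "patient care hospital health medication care",
}


def tokenize(text):
    """Lowercase whitespace tokenization (no stemming or stop-word removal)."""
    return text.lower().split()


def build_inverted_index(concepts):
    """Map each term to {concept name: TF-IDF weight of the term in that concept}."""
    n_concepts = len(concepts)
    term_counts = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    index = {}
    for name, counts in term_counts.items():
        for term, count in counts.items():
            # Adding 1.0 keeps a non-zero weight for terms present in every concept.
            idf = math.log(n_concepts / doc_freq[term]) + 1.0
            index.setdefault(term, {})[name] = count * idf
    return index


def concept_vector(text, index):
    """Sum the concept weights of every known term in the text."""
    vector = Counter()
    for term in tokenize(text):
        for name, weight in index.get(term, {}).items():
            vector[name] += weight
    return dict(vector)


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


INDEX = build_inverted_index(CONCEPTS)
cv_profile = concept_vector("programming and software testing experience", INDEX)
it_course = concept_vector("course on software design and code quality", INDEX)
care_course = concept_vector("hospital patient care training", INDEX)

print(cosine(cv_profile, it_course))
print(cosine(cv_profile, care_course))
```

In this toy setup the CV snippet and the software course share no words beyond "software", yet both activate the same concept, so their similarity is high, while the nursing course activates a different concept. A domain-specific variant in the spirit of the abstract would restrict and enrich the concept set rather than change the comparison step.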
Notes
- 1. Identification of the base word, by removal of derived or inflected variations.
Acknowledgements
The research leading to this work was partially financed by the KTI/Innosuisse Swiss federal agency, through a competitive call. The financed project KTI-Nr. 27104.1 is called CVCube: digitale Aus- und Weiterbildungsberatung mittels Bildungsgraphen. The authors would like to thank the business project partner for the fruitful discussions and for allowing us to use the examples in this publication. We would like to thank Benjamin Haymond for his very helpful and precise revision and language editing support of this manuscript.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Mazzola, L., Siegfried, P., Waldis, A., Stalder, F., Denzler, A., Kaufmann, M. (2020). A Domain Specific ESA Method for Semantic Text Matching. In: Jardim-Goncalves, R., Sgurev, V., Jotsov, V., Kacprzyk, J. (eds) Intelligent Systems: Theory, Research and Innovation in Applications. Studies in Computational Intelligence, vol 864. Springer, Cham. https://doi.org/10.1007/978-3-030-38704-4_2
DOI: https://doi.org/10.1007/978-3-030-38704-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38703-7
Online ISBN: 978-3-030-38704-4
eBook Packages: Intelligent Technologies and Robotics (R0)