A Domain Specific ESA Method for Semantic Text Matching

Chapter in: Intelligent Systems: Theory, Research and Innovation in Applications

Abstract

One approach to semantic text similarity matching is the concept-based characterization of entities and themes that can be automatically extracted from content. Such a characterization makes it possible to build an effective recommender system on top of similarity measures and to use it for document retrieval and ranking. In this work, our research goal is to create an expert system for education recommendation, based on the skills, capabilities, and areas of expertise present in someone’s curriculum vitae, together with their personal preferences. This form of semantic text matching must take into account all of the person's educational experiences (formal, informal, and on-the-job), as well as work-related know-how, in order to create a concept-based profile of the person. Such a profile allows a reasoned matching process from CVs and career vision to descriptions of education programs. Taking inspiration from explicit semantic analysis (ESA), we developed a domain-specific approach to semantically characterize short texts and to compare their content for semantic similarity. Through an enriching and a filtering process, we transform the general-purpose German Wikipedia into a domain-specific model for our task. The domain is also defined through a German knowledge base, or vocabulary, of descriptions of educational experiences and job offers. Initial testing with a small set of documents demonstrated that our approach covers the main requirements and can match semantically similar text content. The approach was applied in a use case and led to the implementation of an education recommender system prototype.
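
As a rough illustration of the general ESA idea that the abstract builds on, the following minimal sketch maps short texts into a space of concepts and compares them by cosine similarity. It is not the authors' implementation: the three English toy "concept articles", the whitespace tokenizer, the smoothed TF-IDF weighting, and all function names are assumptions made for this example, and the enriching and filtering steps applied to the German Wikipedia are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy concept articles, standing in for a filtered,
# domain-specific subset of Wikipedia (assumption for illustration).
CONCEPTS = {
    "Machine learning": "machine learning algorithms learn statistical models from data",
    "Accounting": "accounting records financial transactions balance sheets and audits",
    "Project management": "project management plans schedules teams budgets and milestones",
}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def build_index(concepts):
    """Build an inverted index {term: {concept: tf-idf weight}}."""
    n = len(concepts)
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    # Document frequency: in how many concept articles each term occurs.
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for name, counts in tf.items():
        for term, count in counts.items():
            idf = math.log(n / df[term]) + 1.0  # smoothed IDF (assumed variant)
            index.setdefault(term, {})[name] = count * idf
    return index


def concept_vector(text, index):
    """Represent a text as a weighted sum of the concept vectors of its terms."""
    vec = Counter()
    for term, count in Counter(tokenize(text)).items():
        for concept, weight in index.get(term, {}).items():
            vec[concept] += count * weight
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


index = build_index(CONCEPTS)
cv = concept_vector("built statistical models from data", index)
course_ml = concept_vector("introduction to machine learning algorithms", index)
course_acc = concept_vector("financial accounting and audits", index)
print(cosine(cv, course_ml), cosine(cv, course_acc))
```

In this toy setup the CV snippet and the machine learning course description share no words, yet both activate the same concept, so they are scored as similar, while the accounting course is not. A realistic system would also need language-specific preprocessing and a far larger concept base.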

Notes

  1. Identification of the base word, by removal of derived or inflected variations.

Acknowledgements

The research leading to this work was partially financed by the KTI/Innosuisse Swiss federal agency, through a competitive call. The financed project KTI-Nr. 27104.1 is called CVCube: digitale Aus- und Weiterbildungsberatung mittels Bildungsgraphen. The authors would like to thank the business project partner for the fruitful discussions and for allowing us to use the examples in this publication. We would like to thank Benjamin Haymond for his very helpful and precise revision and language editing support of this manuscript.

Corresponding author

Correspondence to Michael Kaufmann.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Mazzola, L., Siegfried, P., Waldis, A., Stalder, F., Denzler, A., Kaufmann, M. (2020). A Domain Specific ESA Method for Semantic Text Matching. In: Jardim-Goncalves, R., Sgurev, V., Jotsov, V., Kacprzyk, J. (eds) Intelligent Systems: Theory, Research and Innovation in Applications. Studies in Computational Intelligence, vol 864. Springer, Cham. https://doi.org/10.1007/978-3-030-38704-4_2
