Experimental Comparison of Unsupervised Approaches in the Task of Separating Specializations Within Professions in Job Vacancies

  • Conference paper
Artificial Intelligence and Natural Language (AINL 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1119)

Abstract

This article presents an unsupervised approach to analyzing labor market requirements that addresses the problem of discovering latent specializations within broadly defined professions. For instance, for the profession of “programmer”, such specializations could be “CNC programmer”, “mobile developer”, “frontend developer”, and so on. Several methods of text vector representation have been experimentally evaluated: TF-IDF, probabilistic topic modeling, neural language models based on distributional semantics (word2vec, fastText), and deep contextualized word representations (ELMo and multilingual BERT). Both pre-trained models and models trained on the texts of Russian-language job vacancies have been investigated. The experiments were conducted on a dataset provided by online recruitment platforms. Several clustering methods have been tested: K-means, Affinity Propagation, BIRCH, agglomerative clustering, and HDBSCAN. When the number of clusters was fixed in advance (K-means, agglomerative clustering), the best result was achieved by ARTM (additive regularization of topic models). However, when the number of clusters was not specified ahead of time, word2vec trained on our job vacancy dataset outperformed the other models. The models trained on our corpora perform much better than pre-trained models, even those with large multilingual vocabularies.
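As a rough illustration of the simplest combination named in the abstract (TF-IDF vectors clustered with a predetermined number of clusters), the following minimal sketch may help. It is not the authors' pipeline: the toy English vacancy texts, the function names, the smoothed IDF formula, the cosine-based (spherical) K-means variant, and the farthest-first initialization are all assumptions made for this example.

```python
import math
from collections import Counter


def normalise(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def tfidf_vectors(docs):
    """Build unit-length TF-IDF vectors from whitespace-tokenised texts."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(t for tokens in tokenised for t in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption): common terms keep a small weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, c in counts.items()}
        vectors.append(normalise(vec))
    return vectors


def cosine(a, b):
    """Dot product of two sparse vectors; equals cosine for unit vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors, k, n_iter=10):
    """K-means with cosine similarity and deterministic farthest-first init."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the vector least similar to every centroid chosen so far.
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = []
    for _ in range(n_iter):
        # Assignment step: the most similar centroid wins.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid becomes the normalised sum of its members.
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    total.update(v)
            if total:
                centroids[j] = normalise(dict(total))
    return labels


# Hypothetical toy vacancies for the broad profession "programmer".
vacancies = [
    "frontend developer javascript react css",
    "frontend developer javascript vue html css",
    "mobile developer android kotlin",
    "mobile developer ios swift",
    "cnc programmer machine g-code milling",
    "cnc programmer lathe machine g-code",
]
labels = spherical_kmeans(tfidf_vectors(vacancies), k=3)
print(labels)
```

On this toy input the frontend, mobile, and CNC vacancies fall into three separate clusters. On real vacancy texts the representation (TF-IDF, topic model, word2vec, and so on) and the clustering method would be swapped in and compared, which is the comparison the abstract describes.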

Supported by RFBR, research project No. 18-47-860013.

Acknowledgments

The reported study was partially funded by RFBR under research project No. 18-47-860013, “Intelligent system for the formation of educational programs based on neural network models of natural language to meet the requirements of the digital economy”.

Author information

Corresponding author

Correspondence to Ivan Ryazanov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Vinel, M., Ryazanov, I., Botov, D., Nikolaev, I. (2019). Experimental Comparison of Unsupervised Approaches in the Task of Separating Specializations Within Professions in Job Vacancies. In: Ustalov, D., Filchenkov, A., Pivovarova, L. (eds) Artificial Intelligence and Natural Language. AINL 2019. Communications in Computer and Information Science, vol 1119. Springer, Cham. https://doi.org/10.1007/978-3-030-34518-1_7

  • DOI: https://doi.org/10.1007/978-3-030-34518-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34517-4

  • Online ISBN: 978-3-030-34518-1

  • eBook Packages: Computer Science, Computer Science (R0)