Experimental Comparison of Unsupervised Approaches in the Task of Separating Specializations Within Professions in Job Vacancies

Vinel, Mikhail; Ryazanov, Ivan; Botov, Dmitriy; Nikolaev, Ivan

doi:10.1007/978-3-030-34518-1_7


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1119)

Included in the following conference series:

Conference on Artificial Intelligence and Natural Language


Abstract

This article presents an unsupervised approach to analyzing labor market requirements that addresses the problem of discovering latent specializations within broadly defined professions. For instance, for the profession of “programmer” such specializations could be “CNC programmer”, “mobile developer”, “frontend developer”, and so on. Several methods of text vector representation have been evaluated experimentally: TF-IDF, probabilistic topic modeling, neural language models based on distributional semantics (word2vec, fastText), and deep contextualized word representations (ELMo and multilingual BERT). Both pre-trained models and models trained on Russian-language job vacancy texts have been investigated. The experiments were conducted on a dataset provided by online recruitment platforms. Several clustering methods have been tested: K-means, Affinity Propagation, BIRCH, agglomerative clustering, and HDBSCAN. When the number of clusters was fixed in advance (K-means, agglomerative clustering), the best result was achieved by ARTM (additive regularization of topic models). However, when the number of clusters was not specified ahead of time, word2vec trained on our job vacancy dataset outperformed the other models. The models trained on our corpus perform much better than pre-trained models with large, even multilingual, vocabularies.
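The abstract describes a two-stage pipeline: turn each vacancy into a vector, then cluster the vectors. A minimal sketch of the simplest combination it names, TF-IDF vectors clustered with K-means at a fixed number of clusters, is shown below. It is written in pure Python so that it runs without dependencies. The toy English vacancy snippets, the smoothed IDF variant, the farthest-point seeding, and the function names `tfidf_vectors`, `to_dense`, `sq_dist`, and `kmeans` are all illustrative assumptions; they are not the authors' code, data, or preprocessing (the authors' corpus is in Russian).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (term -> weight dict) per tokenised doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1 (one common variant, assumed here)
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0  # guard empty docs
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def to_dense(vectors):
    """Project sparse term -> weight dicts onto a shared, sorted vocabulary."""
    vocab = sorted({t for v in vectors for t in v})
    return [[v.get(t, 0.0) for t in vocab] for v in vectors]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point whose nearest existing seed is farthest away
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy vacancies for the three "programmer" specializations named above
vacancies = [
    "cnc programmer machine tool gcode milling",
    "cnc operator programmer lathe gcode",
    "mobile developer android kotlin app",
    "ios mobile developer swift app",
    "frontend developer javascript react css",
    "frontend developer html css vue javascript",
]
labels = kmeans(to_dense(tfidf_vectors([v.split() for v in vacancies])), k=3)
print(labels)
```

In practice one would more likely use scikit-learn's `TfidfVectorizer` and `KMeans`, and swap in a method that infers the number of clusters on its own (such as Affinity Propagation or HDBSCAN) when `k` is not known in advance. The chapter's abstract does not state which implementations the authors used.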

Supported by RFBR, research project No. 18-47-860013.



Acknowledgments

The reported study was partially funded by RFBR under research project No. 18-47-860013, “Intelligent system for the formation of educational programs based on neural network models of natural language to meet the requirements of the digital economy”.

Author information

Authors and Affiliations

Chelyabinsk State University, 129, Bratiev Kashirinykh street, 454001, Chelyabinsk, Russia
Mikhail Vinel, Ivan Ryazanov, Dmitriy Botov & Ivan Nikolaev


Corresponding author

Correspondence to Ivan Ryazanov.

Editor information

Editors and Affiliations

Krasovskii Institute of Mathematics and Mechanics, Yekaterinburg, Russia
Dmitry Ustalov
ITMO University, St. Petersburg, Russia
Andrey Filchenkov
Computer Science, University of Helsinki, Helsinki, Finland
Lidia Pivovarova


About this paper

Cite this paper

Vinel, M., Ryazanov, I., Botov, D., Nikolaev, I. (2019). Experimental Comparison of Unsupervised Approaches in the Task of Separating Specializations Within Professions in Job Vacancies. In: Ustalov, D., Filchenkov, A., Pivovarova, L. (eds) Artificial Intelligence and Natural Language. AINL 2019. Communications in Computer and Information Science, vol 1119. Springer, Cham. https://doi.org/10.1007/978-3-030-34518-1_7


DOI: https://doi.org/10.1007/978-3-030-34518-1_7
Published: 13 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34517-4
Online ISBN: 978-3-030-34518-1
eBook Packages: Computer Science, Computer Science (R0)
