Abstract
This article presents an unsupervised approach to analyzing labor market requirements that addresses the problem of discovering latent specializations within broadly defined professions. For instance, for the profession of “programmer”, such specializations could be “CNC programmer”, “mobile developer”, “frontend developer”, and so on. Several methods of representing texts as vectors were experimentally evaluated: TF-IDF, probabilistic topic modeling, neural language models based on distributional semantics (word2vec, fastText), and deep contextualized word representations (ELMo and multilingual BERT). Both pre-trained models and models trained on Russian-language job vacancy texts were investigated. The experiments were conducted on a dataset provided by online recruitment platforms. Several clustering methods were tested: K-means, Affinity Propagation, BIRCH, agglomerative clustering, and HDBSCAN. When the number of clusters was fixed in advance (K-means, agglomerative clustering), the best result was achieved by ARTM (additive regularization of topic models). However, when the number of clusters was not specified ahead of time, word2vec trained on our job vacancies dataset outperformed the other models. Models trained on our own corpora performed much better than pre-trained models with large, even multilingual, vocabularies.
Supported by RFBR, research project No. 18-47-860013.
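To make the general pipeline described in the abstract concrete (vectorize vacancy texts, then cluster them into candidate specializations), the following is a minimal sketch in pure Python. It is not the authors' implementation: the toy vacancy texts, the function names (tfidf_vectors, kmeans), the smoothed IDF formula, the cosine-based (spherical) K-means variant, and the deterministic farthest-first initialization are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return unit-length TF-IDF vectors (sparse dicts) for tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumed variant; other formulas are equally valid).
        vec = {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        vectors.append(normalise(vec))
    return vectors


def normalise(vec):
    """Scale a sparse vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def dot(a, b):
    """Dot product of two sparse vectors (cosine similarity for unit vectors)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=20):
    """Spherical K-means: assign each vector to the most similar centroid."""
    # Farthest-first initialization keeps this toy example deterministic.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(dot(v, c) for c in centroids)))
    labels = [None] * len(vectors)
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)  # adds the weights term by term
                centroids[j] = normalise(dict(total))
    return labels


# Hypothetical, hand-written vacancy snippets (not from the paper's dataset).
vacancies = [
    "frontend developer javascript react css".split(),
    "frontend developer html css javascript".split(),
    "cnc programmer machine tool operator".split(),
    "cnc programmer lathe machine".split(),
    "mobile developer android kotlin".split(),
    "mobile developer ios swift".split(),
]
labels = kmeans(tfidf_vectors(vacancies), k=3)
print(labels)
```

In this sketch the number of clusters k is fixed in advance, which corresponds to the "predetermined number of clusters" setting in the abstract; methods such as Affinity Propagation or HDBSCAN would instead infer the number of clusters from the data.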
Acknowledgments
The reported study was partially funded by RFBR under research project No. 18-47-860013, “Intelligent system for the formation of educational programs based on neural network models of natural language to meet the requirements of the digital economy”.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Vinel, M., Ryazanov, I., Botov, D., Nikolaev, I. (2019). Experimental Comparison of Unsupervised Approaches in the Task of Separating Specializations Within Professions in Job Vacancies. In: Ustalov, D., Filchenkov, A., Pivovarova, L. (eds) Artificial Intelligence and Natural Language. AINL 2019. Communications in Computer and Information Science, vol 1119. Springer, Cham. https://doi.org/10.1007/978-3-030-34518-1_7
DOI: https://doi.org/10.1007/978-3-030-34518-1_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34517-4
Online ISBN: 978-3-030-34518-1
eBook Packages: Computer Science, Computer Science (R0)