
1 Introduction

Organizations have invested heavily in collecting and analyzing various data sets, such as those pertaining to their customers and the usage of their resources. The knowledge generated by these investments has resulted in different variants and architectures of Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems. The wealth of data that organizations now hold about their core processes and their human capital has prompted HR professionals to exploit this data for improving the HR function. This is partly because organizations face a fast-changing reality in which new digital solutions force them to adapt quickly. As a consequence, the competencies that employees need for the organization to perform well are also changing. HR professionals currently observe that the development of the human capital, on the one hand, and the development of the core processes, on the other hand, are diverging. This creates a gap between the competencies of an organization's employees and the skillset needed to adequately implement the organization's core processes. Consequently, HR professionals are increasingly unable to adequately match the wishes and abilities of employees to the requirements of the organization's core processes.

Concepts and techniques from the field of big data/data analytics may provide the desired insights into this gap and, more importantly, may inform organizations about how to bridge it. Realizing such applications, however, entails addressing a number of technical issues and challenges, such as efficiently integrating, storing and retrieving large data volumes, and processing different types of data in near real time. It is also relevant to note that big data is often used for purposes that do not fully coincide with the purpose for which the original data was collected. This entails a number of so-called soft challenges for using big data in (HR) practice. Relying on data gathered from various sources and for diverse purposes can, for example, result in violations of fundamental human rights such as privacy and autonomy [1,2,3].

This paper aims at utilizing big data for improving the HR function in organizations. Specifically, our research question can be formulated as: How can big data be utilized to bridge the gap between the skills and knowledge that the employees of an organization have and the skills and knowledge that the organization needs in order to run its core processes effectively? This research question, which was originally posed by HR practitioners, is an emerging topic in the HR field. Our study lies at the intersection of HR and big data, referred to as HR Analytics. As pointed out in [4], the realization of HR Analytics is still in its infancy, despite the many efforts made so far. We analyze the characteristics of HR Analytics and, on the basis of this analysis, present a framework for realizing HR Analytics. Two key findings of our analyses are: (a) typical HR needs must be elaborated in considerably more detail before they can be translated into data analytics tasks such as classification, association and profiling, and (b) in practice, HR data is scattered over various systems and a significant amount of effort is needed to integrate the corresponding data sets. The latter finding is in line with [5].

Although our framework consists of four main steps, the implementation part of this paper is primarily devoted to the first two steps. The goal of these steps is to map broad HR needs/concepts onto a set of feasible data analytics tasks. For this purpose, we present a number of tools for achieving this mapping and illustrate how these tools have been applied to two real-life cases.

We envision that in the near future HR Analytics will be an integral part of the HR function. As such, HR Analytics is going to restructure the HR function by providing HR professionals with the values of predefined, relevant indicators. These values can help HR professionals make appropriate decisions for bridging the aforementioned gap. By means of the two real-life cases, we illustrate how the relevant indicators can be measured in practice.

The remainder of this paper is organized as follows. Section 2 reviews the contemporary HR function and reports our findings. Based on these findings, we propose in Sect. 3 a comprehensive framework for HR Analytics and elaborate on the implementations of the first two steps of the framework. Section 4 concludes the paper.

2 HR Analytics

Although there is an overwhelming number of papers related to HR Analytics, the term HR Analytics is relatively new [4]. In this section, we describe the state of the art of HR Analytics, discuss how HR Analytics restructures the HR function, and present some examples to illustrate the added value of HR Analytics. Finally, we describe a number of existing challenges involving HR data.

2.1 State of the Art

To improve their business functions, organizations take full advantage of the large amount of data available inside and outside their organizations. HR, however, has not yet exploited its data to the same extent. So far, the HR field has focused mainly on descriptive analytics [6], somewhat on predictive analytics [7], and very little on prescriptive analytics [8]. Moreover, there is no widely accepted definition of HR Analytics. In [4], HR Analytics is regarded as an IT-enabled practice focusing on the analysis and visualization of data, while [9] defines HR Analytics as a set of tools and technologies that provides a wide range of capabilities, ranging from simple reporting to predictive modelling.

In [10], a distinction is made between HR Analytics and HR Metrics. HR Metrics are measures of key HR management outcomes, classified as efficiency, effectiveness or impact measures. The authors of [10] subsequently argue that HR Analytics is not about the measures themselves, but rather represents the statistical techniques and experimental approaches that can be used to show the impact of HR activities. Despite this distinction, definitional ambiguity remains. In contrast to [10], we consider HR Metrics to be a part of HR Analytics.

Despite the growing interest in and abundant literature on HR Analytics, research and development on how to exploit relevant data to improve the HR function are still in their infancy. A search on Google Scholar for the term "HR Analytics", for example, resulted in more than 80,000 hits. Nevertheless, as mentioned in [4], little scientific evidence is found on the use and adoption of HR Analytics: of the 60 papers reviewed in [4], only 14 were classified as scientific papers. There are a number of reasons why HR Analytics has not met expectations so far. The main reason, as identified in [11], is that HR professionals do not understand big data analytics and big data analytics teams do not understand HR. In [4], an approach is proposed in which the central question is: How can HR data be used to create, capture, leverage and protect value? Developing advanced big data concepts and techniques may help answer this question. In our view, these concepts and techniques should be tailored to the HR domain in order to contribute to solving HR problems [12].

Given the current development stage of HR Analytics, we agree with [4] that more scientific research is needed.

2.2 Restructuring the HR Function

An HR department's goal is to help the organization deliver its strategy and objectives by effectively managing its human capital. To realize this goal, the HR function covers several areas, ranging from recruitment and learning and development to vitality.

Fast-changing circumstances in society raise new questions that require up-to-date answers. Typical questions that contemporary HR departments must address are: What will our staff look like in 10 years if we do nothing? What interventions are needed to bridge the gap between the knowledge and skills of our employees and the competencies required in the future? Are we diverse enough? To what extent does our policy contribute to the sustainable employability of our employees? Up to now, HR professionals have tried to answer these questions by formulating a set of hypotheses and collecting the data needed to accept or reject them. We note that a hypothesis does not necessarily answer a question but provides insights that may help to find an answer. As discussed in the literature, this traditional statistical approach suffers from some flaws [13]. For example, the hypotheses must be devised beforehand, which may be time-consuming because it requires advanced, in-depth knowledge of the field. Instead, interesting hypotheses can be searched for by applying contemporary (big) data analytics/data mining algorithms to all relevant available data sets.
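As a minimal sketch of such a hypothesis search, assuming employee records have already been reduced to boolean flags (the attribute names below are hypothetical), one can scan all attribute pairs and rank them by lift, i.e., how much more often two attributes co-occur than independence would predict. The top-ranked pairs are only candidate hypotheses for HR professionals to examine, not conclusions.

```python
from itertools import combinations

def candidate_hypotheses(records, min_support=0.1, top_k=3):
    """Rank pairs of boolean attributes by lift as candidate hypotheses.

    records: list of dicts mapping a (hypothetical) attribute name -> bool.
    Lift > 1 means the pair co-occurs more often than expected under independence.
    """
    n = len(records)
    attributes = sorted({a for r in records for a in r})
    # Marginal support: fraction of records in which each attribute is True
    support = {a: sum(r.get(a, False) for r in records) / n for a in attributes}
    scored = []
    for a, b in combinations(attributes, 2):
        joint = sum(r.get(a, False) and r.get(b, False) for r in records) / n
        if joint < min_support or support[a] == 0 or support[b] == 0:
            continue  # too rare to be an interesting hypothesis
        lift = joint / (support[a] * support[b])
        scored.append((round(lift, 3), a, b))
    scored.sort(reverse=True)  # strongest positive association first
    return scored[:top_k]

# Toy records, not real HR data
records = [
    {"part_time": True, "age_60_plus": True, "took_training": False},
    {"part_time": True, "age_60_plus": True, "took_training": False},
    {"part_time": False, "age_60_plus": False, "took_training": True},
    {"part_time": False, "age_60_plus": False, "took_training": True},
    {"part_time": True, "age_60_plus": False, "took_training": True},
]
result = candidate_hypotheses(records)
```

Lift is only one possible interestingness measure; confidence, leverage or a chi-square statistic could be used instead, depending on what the HR professionals find easiest to interpret.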

An emerging area that is becoming part of the HR function is the so-called information provisioning [14]. In this area, the collection and analysis of employee data and the provision of the results to the organization are the main tasks. We argue that big data can strengthen the function of information provisioning.

There are many descriptions and definitions of big data in circulation, varying from formal to informal. A common description of big data is based on the well-known 3 V's [15]. A formal definition, on the other hand, regards big data as inducing a model from observations of the world/data stored in information systems. In this formal definition, we can distinguish three building blocks, namely: data from different systems, analysis techniques based on inductive reasoning, and models derived from big data [12]. A wide range of tools, concepts and algorithms has been developed to facilitate the design and implementation of these building blocks. The first building block organizes the reliable employee data that is necessary to underpin information provisioning. The second building block organizes existing analytics algorithms, which must be tailored to extract models that can answer the HR questions and needs. The last building block organizes the tools that help interpret the obtained models when answering the HR questions. Naturally, HR professionals should be able to interact with this last building block.
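The three building blocks can be pictured as a small pipeline. The sketch below is illustrative only: the function names, the shared `employee_id` key and the choice of a single-attribute majority rule (a simplified OneR-style learner) for the inductive step are all assumptions, standing in for whatever integration, mining and interpretation tools an organization actually uses.

```python
from collections import Counter, defaultdict

# Building block 1 (hypothetical): integrate records from two sources on a shared key.
def integrate(source_a, source_b, key="employee_id"):
    b_index = {row[key]: row for row in source_b}
    return [{**row, **b_index[row[key]]} for row in source_a if row[key] in b_index]

# Building block 2: induce a model from observations. Here, a single-attribute rule
# that predicts the most frequent outcome for each attribute value.
def induce_one_rule(rows, attribute, target):
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row[target]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

# Building block 3: render the model in a form HR professionals can read and question.
def interpret(rule, attribute, target):
    return [f"IF {attribute} = {v} THEN {target} = {o}" for v, o in sorted(rule.items())]

hris = [{"employee_id": 1, "department": "IT"},
        {"employee_id": 2, "department": "IT"},
        {"employee_id": 3, "department": "Legal"}]
lms = [{"employee_id": 1, "completed_course": "yes"},
       {"employee_id": 2, "completed_course": "yes"},
       {"employee_id": 3, "completed_course": "no"}]

rows = integrate(hris, lms)
rule = induce_one_rule(rows, "department", "completed_course")
explanation = interpret(rule, "department", "completed_course")
```

Keeping the blocks as separate functions mirrors the separation in the text: the interpretation step can be replaced by a dashboard without touching the data or modelling steps.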

2.3 Illustrative Examples

In large organizations, such as ministries or municipalities, a lot of money is invested in employees' professional development. However, employees in these organizations often follow the same course or training, without their job descriptions, abilities and personal wishes being taken into account. HR professionals, who currently cannot adequately match the wishes and abilities of individual employees to the needs of the organization, can use HR Analytics to generate employee profiles. An HR professional can use the resulting profile to advise an employee about a suitable training or to prepare the employee for a better-fitting job.
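A minimal sketch of how a profile could drive training advice, assuming that both the employee profile and the target role are represented as sets of skill names and that each training lists the skills it teaches (all names are hypothetical). The greedy set-cover heuristic used here is one simple choice, not a method prescribed by the text.

```python
def recommend_trainings(employee_skills, role_requirements, trainings):
    """Suggest trainings that close the gap between a profile and a target role.

    employee_skills, role_requirements: sets of skill names (hypothetical format).
    trainings: dict mapping training name -> set of skills it teaches.
    Greedy heuristic: repeatedly pick the training covering most still-missing skills.
    """
    missing = set(role_requirements) - set(employee_skills)
    plan = []
    while missing and trainings:
        # sorted() makes tie-breaking deterministic (alphabetical training name)
        best = max(sorted(trainings), key=lambda t: len(trainings[t] & missing))
        gained = trainings[best] & missing
        if not gained:
            break  # remaining skills are not taught by any available training
        plan.append(best)
        missing -= gained
    return plan, missing

employee = {"contract_law", "writing"}
role = {"contract_law", "cyber_security", "data_protection", "forensics"}
trainings = {
    "cyber_basics": {"cyber_security"},
    "privacy_law": {"data_protection", "cyber_security"},
    "excel": {"spreadsheets"},
}
plan, uncovered = recommend_trainings(employee, role, trainings)
```

The returned `uncovered` set is as informative as the plan itself: skills that no available training covers point to a gap in the training offer rather than in the employee.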

Next, suppose that the HR department of the Dutch Ministry of Justice and Security observes that crime in society is becoming more technology-driven. This HR department subsequently experiences a skill gap in the organization, because its staff are unable to cope with the criminal implications of new technologies. Consequently, the organization may need employees who combine a law background with cyber security skills. The questions that may arise are: How can the employer improve the skillset of its workforce? Which employees are the best candidates for an expensive in-depth training on new technologies? With HR Analytics, the HR professional could obtain indications that, for example, employees with a background in financial law tend to adopt ICT skills more easily. The HR department can therefore focus on retraining these employees. Such an informed policy should help to close the observed gap significantly.
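An indication such as "employees with a background in financial law adopt ICT skills more easily" could, in its simplest form, come from comparing success rates per group. The sketch below assumes hypothetical records of (legal background, passed an ICT training) and skips very small groups; the resulting ranking is a descriptive indication, not evidence of a causal effect.

```python
from collections import defaultdict

def adoption_rate_by_background(employees, min_group_size=2):
    """Share of employees per legal background who passed an ICT training.

    employees: list of (background, passed_ict_training) tuples (hypothetical data).
    Groups smaller than min_group_size are skipped to avoid over-reading tiny samples.
    """
    totals, passed = defaultdict(int), defaultdict(int)
    for background, ok in employees:
        totals[background] += 1
        passed[background] += int(ok)
    rates = {b: passed[b] / totals[b] for b in totals if totals[b] >= min_group_size}
    # Highest adoption rate first
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

data = [
    ("financial_law", True), ("financial_law", True), ("financial_law", False),
    ("criminal_law", True), ("criminal_law", False), ("criminal_law", False),
    ("family_law", True),  # single employee: below min_group_size, ignored
]
ranking = adoption_rate_by_background(data)
```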

Another example concerns the changing retirement pattern in the Dutch public sector, where more and more employees delay their retirement and change their contracts to part-time work. Note that in the Netherlands, retiring public sector employees have the option of postponing their retirement by up to 5 years. Once these optional extra years have expired, retirement is obligatory. This retirement policy is based on the fact that retiring people often hold key roles and key relationships and are critical for ensuring the continuity of an organization's performance. However, it is also challenging to keep a potential successor waiting if the incumbent chooses not to retire at the expected time. Currently, HR professionals use two indicators of age to estimate the collective retirement behavior of their employees. HR Analytics can improve this estimation by taking into account many additional indicators, such as recent changes in role, pay level, rates of change in pay, and incentive eligibility. This improved estimation allows HR professionals to manage the retirement cycle more effectively and to ensure that key roles have a successor ready just in time.
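One way to combine several such indicators into a single estimate is a logistic regression that outputs the probability that an employee delays retirement. The sketch below is a toy, plain-Python version fitted by batch gradient descent; the three binary features and the labels are invented for illustration, and a real application would use a statistics library, far more data and proper validation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that an employee with features x delays retirement."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical features: [recent_role_change, pay_rise_last_year, incentive_eligible]
X = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]  # 1 = delayed retirement (toy labels, not real data)
w, b = fit_logistic(X, y)
```

Summing the predicted probabilities over a cohort gives an expected number of delayed retirements, which is the collective estimate the text refers to.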

2.4 HR Data

Several processes may be distinguished in the HR function. For each of these processes, a number of data items are collected in several HR Information Systems (HRIS). Examples of collected HR data items are competence assessment, date of birth, disciplinary date and email traffic between employees. Often, these data items are scattered among different legacy HRIS. As several scholars have pointed out, legacy data sets are hard to integrate, since they are noisy and incomplete and their semantics are uncertain; see [16] for an overview. In addition, the silo mentality in organizations prevents HR-related data from being combined [11].

Based on a prescriptive framework, Pape [5] prioritizes the data items that are important for HR Analytics. Pape collected a total of 298 descriptive and predictive HR analyses: 202 from the academic and professional literature and 96 from interviews with HR analysts. From these analyses, Pape derives 126 data items and identifies the top 30 data items that are influential in the identified HR processes. To identify these top 30 data items, Pape uses a framework based on interviews with 24 HR professionals from 15 organizations. Six of the top data items in the derived top 30 list are location, role, function, manager ID, performance score and total other benefits. In [5], it is further argued that collecting these data items is challenging for many HR departments.

For HR Analytics, the quality, accessibility and availability of data are crucial. As argued in [4, 5], an enormous effort is required to obtain data that meets these requirements. Therefore, although many interesting HR questions (as raised in the previous sections) have a potential solution based on HR Analytics, it makes sense to choose a data set that is feasible to integrate and to exploit this data set to gain some of the insights needed by HR professionals. Thus, in a sense, HR Analytics is constrained by the effort needed to build such an integrated data set.

2.5 Problem Statement

So far, we have seen that HR professionals need insight into the main question of this contribution, namely: How can big data be utilized to bridge the gap between the current and the desired skills and knowledge of employees within an organization? This is a very broad question, as it encompasses many of the questions asked today by contemporary HR departments (see Sects. 2.2 and 2.3).

Answering these questions raises several issues. Firstly, the HR questions are open to various interpretations. Secondly, the HR questions are insufficiently elaborated to be translated into a set of data analytics solutions (such as association, clustering, profiling, or classification problems) [17]. Thirdly, it is unclear beforehand how much effort will be required to collect and integrate the proper data. Therefore, a systematic and specific approach is required to deal with such HR questions. In the next section, we present the main building blocks of such a systematic approach.

3 A Framework for HR Analytics

In this section, we present a framework to realize HR Analytics in practice, as shown in Fig. 1. As concluded in [11], HR professionals and big data analytics professionals unfortunately do not understand each other well. This lack of understanding may cause HR Analytics applications to fail and result in disappointment. Therefore, an essential part of our framework is to create a common understanding between these professionals. By means of the two real-life cases, we illustrate how to identify broad HR needs and translate them into a set of feasible data analytics tasks, and how the values of relevant indicators may be collected in practice. Although our framework consists of four steps, this paper primarily focuses on the implementation aspects of the first two steps. Due to page limitations, we do not discuss implementations of the latter two steps; issues related to these steps are common to many domains. For more discussion, see, e.g., [16].

Fig. 1. Framework for HR Analytics.

In the following subsections, we discuss each of the steps of the framework. In Sect. 3.1, we discuss how we use a multi-disciplinary approach [18] based on the method of design thinking to create consensus about the insights that are needed to contribute to the solution of our main question. In Sect. 3.2, we describe a route for mapping a broad HR notion/concept to concrete measurable indicators. Section 3.3 sketches the data collection and integration measures needed to acquire a clean and integrated data set for further analysis. Finally, Sect. 3.4 sketches the big data techniques to analyze the integrated data set and to interpret the analysis outcomes, i.e., the models.

3.1 Question Elucidation

In HR Analytics, situations may arise where there is neither consensus on what the main HR need (or problem) is, nor a common understanding of how (partial) problems should be tackled. One way to address this is to create more consensus between the stakeholders' viewpoints about the problem at hand and/or a shared understanding and acceptance of the cause-effect relations. Design Thinking [19] can contribute to resolving these uncertainties. It is rooted in product and service design and has been successfully applied to cases where people and organizations interact with technological processes. Design Thinking has proven fruitful particularly in cases where user needs are insufficiently documented and hidden in the tacit knowledge of poorly communicating stakeholders [20].

Design Thinking is a multi-disciplinary and participatory approach that encourages successive meetings in which various stakeholders (e.g., system developers and end-users) discuss their viewpoints and insights to address the issue at hand. These meetings continue until consensus is reached, i.e., until a fitting solution arises from the viewpoints of the stakeholders involved [21].

As case studies, we involved the HR professionals from the HR department of the Dutch Ministry of Justice and Security (case 1) and the HR department of the municipality of Rotterdam (case 2). By means of a number of lectures on HR Analytics that we gave to HR professionals, workshops, and bilateral discussions between data analytics and HR professionals, we elaborated our data analytics perspective on the main research question of Sect. 2.5. In both cases, the participants agreed that gaining insight into the development of the HR concepts of sustainable employability, diversity and job performance is important for the envisioned development of the (core) processes of these organizations. Although these processes can differ for each HR department, one can distinguish a number of similar sub-processes, for example, those needed for job applications.

Once agreement about the relevant HR concepts has been reached, it is worthwhile to work out how to operationalize these concepts based on the available and accessible data. In the next section, we describe and illustrate how the concepts of sustainable employability, diversity and job performance have been made operational.

3.2 Concepts, Phenomena and Indicator Definition

The second step of the framework involves operationalizing the identified HR concepts. Based on desk research, representative indicators that cover these concepts are searched for. In addition, the usefulness of these indicators must be determined by the HR professionals and big data analytics professionals involved: the former indicate whether an indicator is representative enough, and the latter determine the quality of the underlying attributes. For operationalization purposes, metrics are then determined for these representative indicators. The values of the indicators must be measured from the collected data.
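The two-sided check described above can be captured in a small data structure. In this sketch, which is an assumption rather than the tool used in the cases, each candidate indicator records the concept it operationalizes, its metric, and the two judgements; only indicators approved by both groups proceed to data collection. The example indicators are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Indicator:
    name: str
    concept: str           # HR concept the indicator operationalizes
    metric: str            # unit or formula in which the value is expressed
    representative: bool   # judgement of the HR professionals
    data_quality_ok: bool  # judgement of the data analytics professionals

def usable_indicators(candidates):
    """Keep only the indicators that both groups of professionals approve."""
    return [ind.name for ind in candidates if ind.representative and ind.data_quality_ok]

candidates = [
    Indicator("age", "diversity", "years", True, True),
    # Parental origin is not registered, so the data quality check fails
    Indicator("ethnicity", "diversity", "share of immigrants (CBS definition)", True, False),
    Indicator("courses_completed", "learning and development", "courses per year", True, True),
]
selected = usable_indicators(candidates)
```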

Suppose that diversity has two indicators: age and ethnicity. Age can be expressed in years and measured reliably, so a distribution of the number of people over different age classes can be calculated. Ethnicity is more difficult to measure, because it must first be defined. Suppose we limit ourselves to distinguishing between immigrants and non-immigrants and use the definition proposed by Statistics Netherlands (abbreviated as CBS in Dutch). According to this definition, an immigrant is a person at least one of whose parents was not born in the Netherlands. The values of the ethnicity indicator are nevertheless difficult to measure, because organizations do not register the origin of their employees' parents.
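A minimal sketch of both diversity indicators, assuming ages in whole years and parental birth countries encoded as country codes with `None` for "not registered" (both encodings are hypothetical). The age distribution is straightforward; the immigrant rule can only return a definite answer when the parental data exists, which illustrates why this indicator is hard to measure.

```python
from collections import Counter

def age_distribution(ages, width=10):
    """Number of employees per age class, e.g. {'30-39': 2, ...}."""
    classes = Counter((age // width) * width for age in ages)
    return {f"{lo}-{lo + width - 1}": classes[lo] for lo in sorted(classes)}

def is_immigrant_cbs(parent_birth_countries):
    """Apply the CBS-style rule to a pair of parental birth countries.

    Returns True if at least one parent was born outside the Netherlands, False if
    both were born there, and None if this cannot be decided from registered data.
    'NL' and None (= not registered) are hypothetical encodings.
    """
    known = [c for c in parent_birth_countries if c is not None]
    if any(c != "NL" for c in known):
        return True
    if len(known) == 2:
        return False
    return None

distribution = age_distribution([23, 31, 38, 45, 59, 61])
```

In a typical HR system every call would look like `is_immigrant_cbs((None, None))`, so the indicator value stays undetermined for the whole workforce.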

In case 1, the operationalization of the sustainable employability concept was taken up by two student groups of the Rotterdam University of Applied Sciences. Because of the importance of sustainable employability to the HR department of the Dutch Ministry of Justice and Security, the students were assigned to develop a tool that gives HR professionals insight into the sustainable employability of employees. Based on desk research, the students quickly discovered that sustainable employability is a broad notion, and that HR organizations tend to have different views about which topics are relevant to it. The literature showed that the sustainable employability concept encompasses several HR topics, such as (1) engagement and vitality, (2) work organization, (3) health, and (4) learning and development.

To limit the work needed to define all indicators of these four HR topics within the time available for the project, the students, in consultation with the HR professionals of the ministry, chose to further operationalize the topic of learning and development. This topic builds on the premise that employees are more productive if they work at the right level and experience appropriate challenges. Involved and interested employees keep a close eye on developments in their field, gain new knowledge, and follow additional training courses. Involved employers make room for this, because investing in sustainable employability pays off. Relevant indicators can show how many employees work at the right level and which employees should develop themselves further so that the organization can operate optimally.
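The two indicators mentioned last can be computed once competency level and job level are expressed on a common ordinal scale. That shared scale, and the split into "should develop" and "under-challenged" employees, are assumptions made for this sketch; the case itself does not prescribe an encoding.

```python
def level_fit(employees):
    """Share of employees at the right level, plus who is below or above it.

    employees: dict name -> (competency_level, job_level) on a common ordinal scale
    (a hypothetical encoding).
    """
    at_level = [n for n, (comp, job) in employees.items() if comp == job]
    below = sorted(n for n, (comp, job) in employees.items() if comp < job)  # should develop
    above = sorted(n for n, (comp, job) in employees.items() if comp > job)  # under-challenged
    share = len(at_level) / len(employees) if employees else 0.0
    return share, below, above

staff = {"a": (3, 3), "b": (2, 3), "c": (4, 3), "d": (3, 3)}
share, should_develop, under_challenged = level_fit(staff)
```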

Figure 2 shows the operationalization of the learning and development concept. Some of the indicators in Fig. 2 measure organizational aspects (e.g., the amount of facilities offered) and others measure personnel aspects (e.g., motivation and personal development). Due to space limits, Fig. 2 shows only some of the values to be measured for the indicators. Next, the data collection and integration steps are needed to determine which data is available and useful per indicator.

Fig. 2. From learning and development topic to measurable indicators.

3.3 Data Collection and Integration

Data about employees, such as their function, salary scale, performance appraisals, absences and completed training courses, is currently distributed among different legacy systems (i.e., database management systems). In the HR domain, there are many different Human Resource Information Systems (HRIS) for different HR purposes. The data varies from structured data, like salary information, to unstructured data, like natural-language paragraphs about employees' performance appraisals. Bringing this data together is a necessary step to measure the values of the indicators. To this end, employee data needs to be extracted from these legacy systems. Before the data is used as input for big data analytics, it is subjected to an Ethical Impact Assessment (EIA) and a Data Privacy Impact Assessment (DPIA) to determine, in particular, whether (or how) the secondary use of the data is ethical and legal. Based on these assessments, one can revise the predefined indicators or further anonymize the data.
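As an illustration of a measure that could follow from a DPIA, the sketch below replaces the direct employee identifier with a keyed hash (HMAC-SHA256 from Python's standard library) and coarsens age into five-year bands. The field names and key are hypothetical. Note that this is pseudonymization, not full anonymization: whoever holds the key can re-link records, and combinations of remaining attributes may still identify individuals.

```python
import hashlib
import hmac

def pseudonymize(record, secret_key, id_field="employee_id", age_band=5):
    """Replace the direct identifier with a keyed hash and coarsen age into bands.

    The same employee gets the same token in every source, so data sets can still be
    linked, but the original ID is not exposed to analysts without the key.
    """
    out = dict(record)  # leave the input record untouched
    token = hmac.new(secret_key, str(record[id_field]).encode(), hashlib.sha256)
    out[id_field] = token.hexdigest()[:16]
    if "age" in out:
        lo = (out["age"] // age_band) * age_band
        out["age"] = f"{lo}-{lo + age_band - 1}"
    return out

key = b"hypothetical-secret-key"  # would be held by a trusted party, not the analysts
original = {"employee_id": 12345, "age": 47, "dept": "Legal"}
p1 = pseudonymize(original, key)
p2 = pseudonymize({"employee_id": 12345, "age": 30}, key)
```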

Figure 3 depicts a schematic overview of the process for collecting and integrating data from multiple HR sources. The HR department of the Dutch Ministry of Justice & Security uses multiple different HRISs. For example, P-direkt is the HR service provider for the Dutch national government, Leonardo is the ERP system for financial-logistics information, Coach Pool is an employee coaching system, Goodhabitz registers followed online training courses, Learning Management System (LMS) gives an insight into the skills/talents of employees, and MO contains data about employee surveys.

Fig. 3.
figure 3

Schematics of an approach to integrate data from different sources

The relation management module in Fig. 3 is used to clean the source files (i.e., to remove records with known registration errors) and to combine the data (i.e., to relate all available data to a unique event or class of employees). For example, integrating the subjective MO survey data with the data from LMS (e.g., to find out whether satisfied employees have developed their talents and skills) requires exploiting common employee attributes. The module can also be used to deal with missing data. If the value of an attribute is (temporarily) unavailable, for example due to technical problems, it may be replaced with the value of a similar attribute, measured at a slightly earlier or later point in time, that covers more or less the same notion. This replacement is referred to as imputation.
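The cleaning and imputation steps can be sketched for a single employee attribute measured per period (for example, a quarterly survey score). As a simplification of the text, this sketch imputes from the same attribute in the nearest neighbouring period rather than from a different but similar attribute; the period indexing, `max_gap` parameter and tie-breaking toward the earlier period are assumptions.

```python
def clean_and_impute(measurements, known_errors, max_gap=1):
    """Drop known wrong registrations, then fill gaps from the nearest period.

    measurements: dict period (int, e.g. a quarter index) -> value or None.
    known_errors: set of periods whose registrations are known to be wrong.
    A missing value is imputed from the closest available period at most max_gap away;
    on a tie the earlier period is used. Values that cannot be imputed stay None.
    """
    cleaned = {p: v for p, v in measurements.items() if p not in known_errors}
    available = {p: v for p, v in cleaned.items() if v is not None}
    result = {}
    for period, value in sorted(cleaned.items()):
        if value is None:
            nearby = [q for q in available if abs(q - period) <= max_gap]
            if nearby:
                nearest = min(nearby, key=lambda q: (abs(q - period), q))
                value = available[nearest]
        result[period] = value
    return result

scores = {1: 7.0, 2: None, 3: 8.0, 4: 99.0, 5: None, 6: None, 7: None}
filled = clean_and_impute(scores, known_errors={4})
```

Periods 5 to 7 stay missing because their only neighbour within reach was removed as a wrong registration; flagging such gaps is often safer than imputing over longer distances.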

To obtain the data set for analysis, a unique identifying key that is present in all database systems is usually required. If such a key is not available, or its use is not desired because of, for example, privacy regulations, the data sets can be combined using a set of common attributes [1]. Based on this key or attribute set, the data items of an employee from different sources are combined. The first step of the data integration process is to extract the same conceptual data from the different database systems. In general, this is so-called "micro data" at the person level.
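Both linkage options can be sketched with one function: match on a unique key when one exists, otherwise on a combination of common attributes. The sketch assumes every record contains the matching fields, and it deliberately drops attribute combinations that are ambiguous (shared by more than one record in a source), since linking those would risk merging data of different employees. Field names are hypothetical.

```python
def link_records(source_a, source_b, key=None, common_attributes=()):
    """Combine per-person records from two sources.

    Uses a unique key when one is given; otherwise matches on a set of common
    attributes, keeping only combinations that identify exactly one record per source.
    """
    fields = (key,) if key else tuple(common_attributes)
    if not fields:
        raise ValueError("provide a key or at least one common attribute")

    def index(rows):
        groups = {}
        for row in rows:
            groups.setdefault(tuple(row[f] for f in fields), []).append(row)
        return groups

    a_idx, b_idx = index(source_a), index(source_b)
    linked = []
    for match_key, a_rows in a_idx.items():
        b_rows = b_idx.get(match_key, [])
        if len(a_rows) == 1 and len(b_rows) == 1:  # ambiguous or unmatched: skip
            linked.append({**a_rows[0], **b_rows[0]})
    return linked

hris = [
    {"birth_year": 1970, "postcode": "3011", "dept": "Legal"},
    {"birth_year": 1985, "postcode": "2511", "dept": "IT"},
    {"birth_year": 1985, "postcode": "2511", "dept": "HR"},
]
lms = [
    {"birth_year": 1970, "postcode": "3011", "courses": 4},
    {"birth_year": 1985, "postcode": "2511", "courses": 1},
]
by_attributes = link_records(hris, lms, common_attributes=("birth_year", "postcode"))
by_key = link_records([{"id": 1, "a": 1}], [{"id": 1, "b": 2}], key="id")
```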

3.4 Big Data Analytics and Output Interpretation

Over the years, a wide range of techniques has been developed, resulting in many algorithms for analyzing data. These algorithms focus on performing a task such as classifying, associating, clustering or searching. For a given task, various techniques are usually available. For a classification task, for example, one may apply discriminant analysis, decision trees, regression, support vector machines or neural networks. Depending on the task and the given preconditions, an algorithm is chosen and, if necessary, tailored. If, for example, it is very clear which characteristics should be used for classification, discriminant analysis can be applied. If it is not clear which attributes are relevant and a lot of data is available, a neural network can be used to learn the classification task and generate a classification model. The challenge of using big data is to determine which tasks need to be performed in order to achieve meaningful results. Choosing an algorithm to perform each task is a further challenge; best practices and guidelines for this are available in the literature [12].
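To make the notion of a classification task concrete, the sketch below fits the simplest member of the decision-tree family, a one-level tree (decision stump) on a single numeric attribute. The attribute (number of completed ICT courses) and the labels are hypothetical; in practice one would use a library implementation and compare several techniques, as the text suggests.

```python
def fit_stump(values, labels):
    """Fit a one-level decision tree (decision stump) on one numeric attribute.

    Tries a threshold midway between each pair of consecutive distinct values and
    returns (threshold, label_below, label_above) with the fewest training errors.
    """
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        below = [l for v, l in zip(values, labels) if v <= t]
        above = [l for v, l in zip(values, labels) if v > t]
        # Majority label on each side; sorted() makes ties deterministic
        left = max(sorted(set(below)), key=below.count)
        right = max(sorted(set(above)), key=above.count)
        errors = sum(l != left for l in below) + sum(l != right for l in above)
        if best is None or errors < best[0]:
            best = (errors, t, left, right)
    if best is None:
        raise ValueError("need at least two distinct attribute values")
    return best[1:]

# Hypothetical: completed ICT courses vs. passing an advanced cyber training
courses = [1, 2, 3, 8, 9, 10]
passed = ["no", "no", "no", "yes", "yes", "yes"]
threshold, below_label, above_label = fit_stump(courses, passed)
```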

Interpreting the results of big data analytics is a challenging task. One reason is that these results are statistical truths that apply to groups (e.g., a set of persons) and are not always applicable to individuals. A statistical truth describes a distribution of possible outcomes that only emerges from a very large (in the limit, infinitely large) group of observations. In addition, the data set used contains only a limited number of variables, which can only partially describe individual variation. Consequently, no conclusion can be drawn and no prediction can be made about individual cases on this basis. Such incorrect interpretations can lead to unjust decisions about individuals. In practice, however, group-level results are often applied in this way to make (policy) decisions. Finally, each application domain has its own challenges in interpreting big data results. In HR data, which usually concerns personal data, privacy and security issues play an important role [. Guidelines and strategies for applying big data results in practice, while taking the mentioned challenges into account, are still in their infancy.
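A short calculation illustrates the group-versus-individual point. Using the standard normal-approximation 95% confidence interval for a proportion, the half-width is 1.96 × sqrt(p(1 − p)/n): a group rate becomes precise only as n grows, while each individual outcome remains 0 or 1 regardless of how precisely the group rate is known.

```python
import math

def proportion_half_width(p, n, z=1.96):
    """Half-width of the normal-approximation 95% confidence interval for a group rate."""
    return z * math.sqrt(p * (1 - p) / n)

# The same observed rate of 0.5, estimated from groups of increasing size
widths = {n: proportion_half_width(0.5, n) for n in (10, 100, 1000)}
```

For n = 100 the half-width is 0.098, so a rate of 0.5 is known to within about ten percentage points; with n = 10 it is about 0.31. Even a very precise group rate of 0.5 says nothing about whether a particular employee will show the outcome.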

4 Conclusion

Currently, HR professionals face challenges when searching for an adequate answer to the question of how to match the wishes and abilities of individual employees to the requirements of the organization. To acquire the insights and knowledge that can help HR professionals minimize the gap, or anticipate an even greater one, organizations see great potential in harnessing the wealth of data about their human capital and core processes. We argued that big data may provide new insights from this data, an approach referred to as HR Analytics in this contribution.

In this paper, we analyzed the specific characteristics of HR Analytics and, on the basis of this analysis, presented a framework to realize HR Analytics in practice. Two key findings of our analyses are: (a) typical HR needs must be elaborated in considerably more detail before they can be translated into data analytics tasks such as classification, association and profiling, and (b) in practice, HR data is scattered across various systems and a significant amount of effort is needed to integrate the corresponding data.

This paper is primarily devoted to the first two steps of the presented framework. The goal of these steps is to operationalize broad HR needs into a set of feasible data analytics tasks. We presented a number of tools for achieving this goal and illustrated how these tools have been applied in practice. We envision that HR Analytics will be an integral part of the HR function. As such, HR Analytics will restructure the HR function by informing HR professionals via relevant indicators. These indicators can help HR professionals make appropriate decisions for bridging the mentioned gap.