1 Introduction

Data are growing in importance. In a digital and fast-changing environment, large companies such as Wal-Mart, a retail giant, log more than 1 million customer transactions every hour, feeding databases estimated at more than 2.5 petabytes, the equivalent of 167 times the books in America’s Library of Congress. Similarly, social media platforms such as Facebook store more than 40 billion photos [28]. These examples highlight that organizations are becoming increasingly interested in the volume and value of the data they can capture, in the creation of new data, and in the search for external data sources.

Given the attention that data are gaining, the public sector is slowly but steadily adopting practices similar to those used in the private sector. Governments and public sector bodies attempt to exploit data to their benefit in order to deliver higher-quality services to all citizens. For example, through the use, reuse and free distribution of datasets, governments promote entrepreneurship, innovation, and citizen-centric services [23].

The paper is structured as follows. Sect. 2 presents the research method and the steps followed to select the final set of studies. Sects. 3 and 4 describe the findings in detail; specifically, opportunities and challenges are collected and categorized into broader groups. Finally, Sect. 5 presents the conclusions.

2 Research Procedure

This paper constitutes a systematic literature review [25]. The research followed the guidelines proposed in [11], with explicit inclusion and exclusion criteria. Specifically, the process comprised the following steps: first, the definition of research questions and relevant keywords; then, the identification of digital libraries; and lastly, the selection of studies according to the inclusion and exclusion criteria.

2.1 Definition of Research Questions

The aim of this systematic literature review is to investigate the role of emerging data technologies and applications, such as big data technologies and cloud computing [3], in the public sector, while identifying potential opportunities and risks deriving from different uses of data in governments worldwide. Our work addresses the following main research question, which is further divided into two sub-questions:

What is the impact of the implementation of Data Technologies in public sector entities?

  1. Which opportunities arise for data-driven administrations?

  2. What are the challenges of a data-driven public sector?

2.2 Search Methodology

The aim was to include scientific articles, other literature reviews, reports and studies from credible sources. Accordingly, the search was conducted using the following digital libraries: ACM, IEEE Xplore, Science Direct, Springer Link, JSTOR, and EBSCO Information Services. The search keywords used for this literature review were the following:

  1. “data-driven public sector”

  2. “public sector AND (machine learning OR artificial intelligence OR data analytics OR big data OR Internet of Things)”

  3. “data driven AND (public service OR (public sector AND (challenges OR machine learning)))”

As the keywords show, we made extensive use of the Boolean operators “AND” and “OR” in order to link the main topic of the public sector with different fields closely related to the concept of data-driven administrations. Moreover, we extended the search by using synonyms for the term “public sector”, i.e. “public administration” and “public service”.
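The combination of Boolean operators and synonyms described above can be sketched as follows. This is a minimal illustration only: the helper names (`quote`, `any_of`, `build_queries`) are hypothetical, the exact query syntax accepted by each digital library differs, and the sketch does not claim to reproduce the queries actually submitted.

```python
# Sector terms: the main term plus the two synonyms named in the text.
SECTOR_TERMS = ["public sector", "public administration", "public service"]

# Topic terms taken from the second search keyword.
TOPIC_TERMS = [
    "machine learning",
    "artificial intelligence",
    "data analytics",
    "big data",
    "Internet of Things",
]


def quote(term):
    """Wrap a term in double quotes so it is treated as an exact phrase."""
    return f'"{term}"'


def any_of(terms):
    """Join quoted terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_queries(sector_terms, topic_terms):
    """Return one 'sector AND (topic OR topic ...)' query per sector term."""
    topics = any_of(topic_terms)
    return [f"{quote(sector)} AND {topics}" for sector in sector_terms]


queries = build_queries(SECTOR_TERMS, TOPIC_TERMS)
for query in queries:
    print(query)
```

Generating one query per synonym, rather than a single query with a nested OR group, keeps each string simple enough to adapt by hand to a library's own search form.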

2.3 Study Selection

Applying the above method produced a list of articles relevant to the study. However, some of the results were still irrelevant to the research questions, even though the search keywords appeared in the abstract and/or title. Therefore, inclusion and exclusion criteria were developed and applied. Publications were included in the list of essential studies according to the following inclusion criteria:

  1. Articles and conference papers focusing on data driven organizations, data usage and innovation deriving from the data usage in public sector.

  2. Studies that describe how data driven societies and organizations collaborate in promoting innovation.

The exclusion criteria were the following:

  1. Studies that only partially discuss the topic and do not focus on the main aspects of a data-driven public sector.

  2. Studies that focus on describing data-driven administrations without discussing opportunities or risks.

  3. Studies that cover only technical aspects (e.g. data technologies, architectures).

The papers used in our research were gathered in December 2017, so this literature review covers studies published before that date. Articles and studies published before 2000 were excluded, as were all articles in languages other than English. As shown in Fig. 1, the initial keyword search retrieved 1030 results. In the next step, duplicates were discarded and studies that were not journal articles or conference papers were excluded. After checking the titles, the results were reduced to 220. The process continued with a thorough reading of the abstracts, which led to 84 relevant articles. In the final step, the full text of these articles was read, and those found irrelevant in terms of content were excluded, leaving 29 articles in the final review.

Fig. 1. Literature review summary
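The stage counts reported above can be turned into retention ratios to show how selective each screening step was. The following is a minimal sketch; the stage labels are paraphrased and the function name `retention` is hypothetical. Only the four counts come from the text.

```python
# Stage counts as reported in the study selection; labels are paraphrased.
STAGES = [
    ("keyword search", 1030),
    ("de-duplication, type filter, title check", 220),
    ("abstract reading", 84),
    ("full-text reading", 29),
]


def retention(stages):
    """Return (label, count, share kept from previous stage, share of initial results)."""
    initial = stages[0][1]
    previous = initial
    rows = []
    for label, count in stages:
        rows.append((label, count, count / previous, count / initial))
        previous = count
    return rows


rows = retention(STAGES)
for label, count, step_share, overall_share in rows:
    # Columns: stage, count, share kept at this step, share of the initial results.
    print(f"{label:<42} {count:>5} {step_share:>7.1%} {overall_share:>7.1%}")
```

With these counts, just under 3% of the initial search results (29 of 1030) remain in the final review.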

3 Opportunities for a Data Driven Public Sector

The adoption of new data technologies entails many opportunities for governments. The findings presented in this part of the review are grouped as shown in Table 1; 22 of the 29 reviewed articles discuss opportunities.

Table 1. Papers discussing opportunities for data-driven administrations

3.1 Efficiency

According to the reviewed articles, efficiency is positively affected by the data revolution in almost every case. Analysis of the data held by public sector services could improve governance practices in terms of expenditure control and policy-making [14]. In Ireland, scientists applied data mining techniques to social media data in order to predict early variations in unemployment rates. In Indonesia, Twitter data was used to predict inflation rates by analyzing real-time tweets about the price of rice [10, 24]. In addition, in New York, data scientists managed to detect illegal conversions, a condition in which the number of people living in a building exceeds the permitted number of tenants [9].

Detailed, timely and linked data could be used to uncover hidden patterns and meaningful information [24]. In this respect, big data analytics would be extremely helpful if integrated into fraud detection operations, as seen in the Internal Revenue Service (IRS) of the United States. In addition, by applying clustering and classification methods to big data, public administrations could correlate and manage requests faster. Big data could enable the conversion of existing e-government systems into multi-dynamic systems in which analytical tools, such as descriptive or predictive analytics, would provide valuable reports and insights [14]. Regarding the efficiency of the provided services, big data analytics assist policy-makers by providing insights about the imposed policies, either by estimating the opinion of citizens through participative technologies and applications or by predicting it with high accuracy [9, 14, 15].

In [27], the authors state that artificial intelligence (AI) could boost the efficiency of administration. The combination of humans and AI would change decision-making processes in public services. Until now, these processes have traditionally been executed by humans in several steps: problem identification; information collection and analysis; searching for different solutions; assessing the alternatives; selecting among them; implementation; and evaluation and feedback. Decisions that are currently made exclusively by humans could therefore be supplemented by AI-embedded decision-making. In the case of problem identification, for example, AI applications might detect problems that managers would not pay attention to [27].

3.2 Public Participation and Transparency

A data-oriented public sector could offer opportunities for public participation and transparency. These could be achieved through big data technologies and tools, which have the potential to deliver reliable e-government services and to open up a variety of datasets [1]. Open data published on governmental portals, combined with ease of access, would lead to higher transparency for citizens [7]. Furthermore, one of the most notable advantages of open data is that making data transparent increases public trust in government and civil servants [21, 32].

Sharing data is also an opportunity for governments to strengthen cooperation and communication between citizens and the state. For example, the US government gathers and publishes different kinds of data, which gives both citizens and government the opportunity to use them more efficiently [2]. Social media channels could also significantly improve public sector transparency and accountability, since citizens can share suggestions or objections through social networks [5]. Therefore, the extended use of social media in e-government has the potential to gradually improve transparency [14].

3.3 Innovation

Innovation in a data-driven government environment is centered on collaboration between governments and private and public organizations in order to invent new applications and solutions. These innovations could lead to higher levels of efficiency, transparency, accountability, service quality and trust in government. Data-driven innovation requires the technological capabilities needed for collecting, opening, sharing, combining, and analyzing data. Hence, it is a complex field that involves many stakeholders as well as influential factors at the strategic, political, organizational, technical and governance levels [13].

Cloud services could enable a disruptive form of innovation. The use of cloud computing would be a formidable asset for both public and business administrations, as well as for citizens themselves. For the public sector, it is an important tool that could facilitate data interoperability, security and flexibility at the same time. Overall, cloud computing could advance the software services of public administrations and accelerate innovation processes [3].

Several case studies support the promotion of data-driven innovation in the public sector. For example, the “Living Lab” project is an environment that supports public open innovation processes by enabling public administrations to communicate digitally with private sector organizations and jointly derive creative solutions [13]. An experiment conducted in the United Kingdom showed that increasing the proportion of data shared between public administrations was a worthwhile venture. If local authorities share their data, developers are able to produce innovative applications (e.g. Transport for London) that could benefit both citizens and the council [31].

Smart cities are yet another innovative approach to regional government, as they create a new path to understanding urban problems [4]. Cities are becoming more complex than ever due to new technologies and digitalization. Hence, there is a global willingness to better understand the problems of modern cities, to make thoughtful decisions and to take action in the right direction. At the same time, critical urban problems such as transportation and energy management could be resolved by using data technologies. The Smart Cities approach is closely tied to big data technologies [1]. Examples of big data applications include “Smart Education”, “Smart Traffic Lights” and “Smart Grid” [2].

E-participation is also fundamental to the Smart Cities Initiative, as it promotes the cooperation of all community members [20]. Smart cities exploit data technologies for communication and dissemination purposes. Citizens can give feedback online on urban problems, new plans and policies [4]. Additionally, open data produces social benefits, as citizens can interact with government more easily and in an informed manner [32]. As a result, citizens’ views and expectations can be identified, adopted and addressed in future policies.

For all these reasons, data technologies and tools represent an unprecedented innovation, which has already been introduced in many countries, such as Brazil, Singapore, and Portugal [20].

4 Challenges

A data-driven public sector faces several types of challenges, including data security and privacy, portability and interoperability. Additionally, there are legal problems, such as national domination, old-fashioned legacy laws and heavy procurement processes. At the same time, citizens and businesses have high expectations of better, more effective and personalized services [3]. These issues have been identified in the 26 papers listed in Table 2.

Table 2. Papers discussing challenges for data-driven administrations.

4.1 Cultural and Political Barriers

Despite the spread of technology and its integration into people’s everyday lives, public governance faces serious challenges regarding the wide adoption of data technologies. Governance culture, politics, and ethics play a significant role in multiple ways.

An important challenge in the public sector’s general approach to data is to recognize and use it as an asset in order to strengthen internal procedures [8]. As the digital world evolves, governmental open data is becoming part of policy agendas. However, there is reluctance to release public information because of a preference for maintaining secrecy. Further, there are fears concerning quality, accuracy and exposure to mistakes made by employees, as well as a misunderstanding of the real value created by open data [12].

Traditionally, the public sector has been a centralized and bureaucracy-oriented organization, so resistance to change exists at many levels of governance. The existing organizational models may impede the implementation of new technologies [14]. For instance, the use of semantic web technologies in the public sector is hindered by the complex public organizational structure as well as by limited resources [22].

Cultural impediments also affect the public sector’s workforce management. The training of employees and administrators on data-related issues is absent from the governmental agenda [22]. Furthermore, data may not be transformed into knowledge in the hands of inexperienced people. Misconceptions and misinterpretations could negatively affect policy decision-making if analysis is carried out by uninformed individuals, which makes the role of experts crucial [1, 19]. A shortage of leadership may also be observed [12].

A major factor underlying these cultural differences is the level of public engagement in innovative public sector initiatives. A large proportion of the public is not digitally engaged [6]. This lack of familiarity strengthens citizens’ resistance to change, which may in turn hinder the holistic implementation of programs like Smart Cities [2, 20].

Lastly, ethical concerns arise from the possible introduction of artificial intelligence devices in the public sector. One example is ‘Hubogent’, a compound of ‘Hubo’ (a humanoid robot developed by the Korea Advanced Institute of Science and Technology (KAIST) in South Korea) and ‘agent’: a robotic agent equipped with an intelligence system that carries out administrative work for human beings. According to the authors, there is an inherent danger in creating devices whose intelligence could, in certain areas, surpass that of humans. In order to prevent potential misuse and an imbalance of responsibility, appropriate prevention measures must be established, including monitoring and controlling AI, effective legal instruments and ethical systems, and principles of liability [27].

4.2 Technical Barriers

A further challenge for data implementations in the public sector is that huge amounts of data are collected from various sources, such as mobile devices and sensors, and in many different formats. The existing tools lack the capabilities needed to process and store the amount of data generated, and traditional systems are becoming inadequate for the emerging data processing needs [1, 14]. Hence, upgraded methods and tools are needed.

Another major issue is that paper-based media in public organizations inhibit information sharing, as the reproduction and dissemination of information are relatively expensive and time-consuming. The movement of information is slow and cumbersome, which creates information silos and barriers. Using digital applications to store and share information would remove these obstacles [33].

Low data quality is a major drawback, because data exist in many different formats, unlinked and stored in separate systems with no provision for integration [10]. Due to the complex structure of the public sector and the absence of collaborative design, large data sets are held in non-interoperable systems, which makes the cost of integrating them for future analysis very high [3, 9].

Governments dealing with big data integration face some unique challenges. The biggest one appears to be data collection. Governments encounter difficulties because the data derive not only from multiple channels (such as social networks, the Web, and crowdsourcing) but also from different sources (such as countries, institutions, agencies, and departments) [20]. Also, sharing data and information between countries remains a perpetual challenge. This issue complicates the integration of complementary data among government agencies and departments [16].

Moreover, the integration of open data is challenging and often requires detailed analytical skills. Additionally, difficulties in accessing open data and the failure to update them on a regular basis prevent organizations and people from relying on the public sector’s open data. At present, open data are mostly available in different formats and through interfaces that are not user-friendly, which fails to attract many users [32].

4.3 Data Protection – Privacy and Security Issues

There are also major issues regarding data privacy and security in the public sector. Not only must all collected data be in a machine-readable format, but action must also be taken to protect personal data and employee details [17]. Citizens expect that their personal data will be collected, protected and appropriately used by public agencies [29]. Unauthorized access to personal information may cause numerous problems. First of all, it may damage public opinion or, worse, cause physical damage. In addition, it may lead to a lack of trust in employees and to the immediate rejection of individuals. It might also cause problems in operational systems, such as delays and inadequacies [29].

Every public sector department holds both public and private data, so high security levels must be maintained in order to avoid unauthorized access. Public agencies should collect, share and use data in a way that does not violate citizens’ personal data. This is of utmost importance because most public organizations are targeted by cyber attackers, which affects both the productivity and the professional reputation of organizations [21].

Furthermore, similar security obstacles are raised regarding big data. Specifically, collecting and using big data while protecting privacy is a major challenge [24]. Governments have access to huge volumes of data and, in several cases, the information provided is not filtered and derives from non-credible sources [30].

In addition, discrepancies arise between U.S. legislation and the potential use of data by public organizations, as the U.S. public sector needs new data protection technology in order to ensure privacy. This is a considerable challenge, given that citizens show a certain unwillingness to hand over data to the state [26].

4.4 Efficient Data Management

Difficulties with efficient data management are another barrier for a data-driven public sector. A notable problem is that the majority of the data remains unstructured and is thus rendered unusable. Such data includes photos, videos, audio files, etc. that must be converted into structured forms in order to be analyzed and used appropriately [2].

An Australian case study of a Firearms Management System (FMS), under the responsibility of the Western Australia Police, provides strong evidence of these difficulties. The FMS holds information about all citizens of Western Australia who are licensed to own a firearm. The study revealed major problems in entering, processing and presenting data. As a result, the local authorities were unable to manage the license information [29].

Goal displacement appears to be one of the main concerns. Specifically, this problem occurs when the administrative level of the organization focuses on tasks that are oriented towards measured outcomes that involve a massive volume of data [19].

Although intelligent data technologies are already available to governments, it remains a challenge for administrators to correctly identify areas of potential use. Where they already possess data, they should manage them in the most beneficial way. According to a public sector CIO [9], only 30% of the current data goes into analytics, leaving a vast amount of information untapped [18]. A further challenge concerns the interpretation of the data, because each interested party may hold a different perspective on the same information [30].

5 Conclusion

During our research, we came across many successful cases and examples that demonstrate the net value that could be added to public sector entities through the successful management and exploitation of existing data. That value could be measured in terms of the quality of delivered services, functionality, fiscal planning, local governance and new practices regarding government crowdsourcing. Nevertheless, several challenges remain due to the introduction of new processes and to issues related to privacy, the applicable legal framework, security, and culture. Although considerable insight has been gained into these matters, our review suggests that further primary research is needed, especially to assess the user experience, the impact of new data technologies, the challenge of rationalizing and simplifying public sector process design, and the cultural issues and mentality of the public sector. Policy makers should encourage, among other things, the (re-)education of public sector employees and create training programs in accordance with public needs.