1 Introduction

‘We only have to imagine a world without Google searches, online weather forecasts or GPS technologies to realize the current impact of data on our lives’ (Jetzek et al. 2014, p.101).

The rapid advancement of ICTs together with electronic publishing has enabled wide distribution of large amounts of data previously held in closed, internal systems. ‘Big data’ consists of datasets so large and complex that they require advanced capture, storage, management, and analysis technologies (Chen et al. 2012; Hota et al. 2015). While big data is characterised by its size and variety (Gandomi and Haider 2015; Kankanhalli et al. 2016), ‘open data’ is characterised by its free availability and absence of privacy restrictions (Janssen et al. 2012). Although large volumes of raw open data published in an electronic format are machine-readable and can be shared online and re-used, on its own open data offers limited potential for decision making. However, when dispersed open data is interlinked to provide more context, greater opportunities for stakeholders to exploit the data for innovative purposes are provided, for example through collaboration and co-creation (Behkamal et al. 2014).

‘Big open linked data’ (BOLD) is a recent and rapidly emerging field in the technology-oriented business world (Janssen et al. 2015). It refers to the integration of diverse data, without predefined restrictions or conditions of use, to create new insights (Janssen and Kuk 2016). BOLD can be released by public and private organizations or individuals (Janssen et al. 2015) and can increase the reach of statistical and operational information, and deepen analysis of outcomes and impacts. Realising the variety of potential benefits (Hossain et al. 2016), governments are keen to adopt open data policies, as documented by the increasing number of countries committing to the Open Government Partnership, with 65 countries collectively developing more than 2000 policy initiatives by 2014 (Open Government Partnership 2014). McKinsey and Company (2011) estimate that the value of big data to US healthcare could be more than $300 billion through driving efficiency and quality, and that in the private sector using big data effectively has the potential to increase retailers’ operating margins by 60%. The use of BOLD is often tied to evidence-based policymaking (Ferro et al. 2013; Janssen and Kuk 2016); however, unlike public sector actors, private organizations can view data as a strategic asset, which presents a challenge to greater information sharing (Sayogo et al. 2014).

It is widely recognised that innovation is key to growth and performance (Hauser et al. 2006; Van der Panne et al. 2003). BOLD creates innovation opportunities for both the public and private sectors, from innovation of processes and products to developments in the supply chain and new markets (Jetzek et al. 2014; Zuiderwijk et al. 2014). However, Janssen et al. (2015, p.87) state that ‘creating innovations with data is a complex process in which both the available data and the users’ demands need to be taken into account’. Despite the complexities, research has not yet attempted to draw together the factors affecting innovation through BOLD. Industry-focussed research highlights issues that need to be addressed to capture the full potential of big data - such as innovation - including data policies, technology infrastructure, organizational change and talent, access to data, and competitive advantage (McKinsey and Company 2011). Although providing a useful starting point for further investigation, the interrelationships between the issues have not been explored, which is necessary for avoiding failure and maximising success of new initiatives in this area (Dwivedi et al. 2015a; Hughes et al. 2015). Therefore, adopting the interpretive structural modelling (ISM) method, this research seeks to attend to this gap.

The remainder of the paper is structured as follows. First, a literature review of research regarding BOLD and innovation is undertaken. Next is a section detailing the ISM method employed to determine the power of different factors in driving innovation through BOLD, followed by further sections discussing the results and their implications. Finally, the paper is concluded, outlining limitations and discussing future lines of research.

2 Literature review

In their analysis of the literature, Chen et al. (2012) found research regarding ‘big data’ began to gain traction from 2007. Similarly, Zuiderwijk et al. (2014) report a sharp increase in publications regarding ‘open data’ from 2009. However, research combining the concepts of big, open, and linked data has only recently begun to emerge, and studies considering innovation through BOLD are even more scarce.

This review of the literature finds support for Zuiderwijk et al.’s (2014) suggestion that much of the existing research has been oriented towards data provision. Shadbolt et al. (2012) consider how to bring open government data into the linked-data web. They report that licensing restrictions are one of the biggest obstacles, that managing an influx of heterogeneous data is a challenge, and that ease of citizen access and better infrastructure are critical to realize value. Considering data disclosure in the private sector, Sayogo et al. (2014) found several challenges and motivating factors regarding market dynamics, information policies, data challenges, and technological capability. Nevertheless, research is beginning to emerge regarding the acceptance and use of data and open data technologies (Zuiderwijk et al. 2015). Juell-Skielse et al.’s (2014) study investigates the role and functions of digital innovation contests and explores the support provided following such contests to finalise and implement the participants’ ideas. Susha et al. (2015) examined the organisational measures to facilitate the use of open data. Their findings indicated that most public organisations have no or limited interaction with data users and are often selective about with whom and how they communicate.

Given the novelty of the area, many existing studies adopt a case study method. Lassinantti et al. (2014) used two in-depth case studies of Swedish municipalities to consider how local open data initiatives can stimulate innovation. Analysis of the cases revealed different drivers for open data initiatives – ‘techno-economic growth’ and ‘co-created societal growth’. The authors note that although targeted innovation activities initially render quicker results, excluding potential innovators can inhibit more radical innovations. Janssen et al. (2015) explored the link between BOLD and smart cities based on case studies of Amsterdam and Rio de Janeiro and found that BOLD combined with predictive analytics enables improved use of resources in the urban area. It was found that a main challenge of using BOLD to create smart cities is in identifying data sources and the availability of the data. The authors noted that much can be accomplished with simple analytic techniques but in order to take advantage of the methods citizens must be smart with the knowledge provided.

Nugroho et al. (2015) provided a comprehensive cross-national comparative framework to compare the open data policies from different countries. The comparison highlighted various lessons including actions related to strong legal framework, generic operational policies, data providers and data users, data quality, designated agencies and initiatives, and incentives for stimulating demand for data. Jetzek et al. (2014) devise a framework of value generation strategies from the data provider’s perspective. The four identified mechanisms are transparency, participation, efficiency, and innovation. Jetzek et al. (2014) propose a conceptual model of the data driven innovation mechanism consisting of three fundamental phases: idea generation, idea conversion, and idea diffusion. They determine four multi-dimensional ‘enabling factors’ capable of influencing the innovation mechanism, namely absorptive capacity, such as organizational capabilities; openness, such as ease of access to data; resource governance, including leadership and privacy; and technical connectivity, for instance number of platforms. However, the conceptual model is presented at a high level of abstraction, failing to account for interrelationships between individual factors, and is based on a single-case study.

Following Dwivedi et al.’s (2015a) approach, a recent panel discussion held at the 14th IFIP I3E Conference brought together invited academic and practitioner experts to consider how BOLD can be utilised to drive innovation and the obstacles and challenges that might be implicated (Dwivedi et al. 2015b). Several of the panellists noted the diverging interests of different stakeholders and the risks of forgetting users’ needs as a result of data-driven solutions. As disadvantages of BOLD are often overlooked (see Zuiderwijk and Janssen 2014), panellists discussed the technical, legal, regulatory, and ethical challenges. This panel discussion provides further foundations for the development of a conceptual model of innovation through BOLD.

Zuiderwijk et al. (2014) argue that the diversity of theories that are currently implicated in open data research is likely to be a result of the topic being an emerging phenomenon. The authors recommend that future research should focus on theory development and stimulating the use of open data. Therefore, this paper responds to these recommendations by taking pioneering steps to develop a theory of driving innovation through BOLD.

3 Methods

Interpretive structural modelling (ISM) is a well-established method for identifying relationships among specific items, which define a problem or an issue (Jharkharia and Shankar 2005). A number of factors may be related to any complex problem under consideration. However, the direct and indirect relationships between the factors describe the situation far more accurately than a specific factor taken in isolation. Therefore, ISM develops insight into collective understanding of these relationships (Attri et al. 2013). The method is interpretive in the sense that a group’s adjudication decides whether and how the variables are related. It is structural in the sense that an overall structure is extracted from the complex set of variables based on their relationships. Finally, it is modelling in the sense that the specific relationships and overall structure are portrayed in a digraph model through a hierarchical configuration.

The ISM method helps to impose order and direction on the complexity of the relationships among the variables of a system (Attri et al. 2013; Sage 1977; Warfield 1974). This makes it well suited to a complex and emerging problem such as innovation through BOLD, in which a number of factors may be implicated. For example, Singh et al. (2007) used ISM to develop structural relationships between competitiveness factors to aid small and medium enterprises’ strategic decisions. Similarly, Agarwal et al. (2007) applied ISM to identify and analyse the interrelationships of the variables influencing supply chain agility. Moreover, Talib et al. (2011) employed ISM to analyse the interactions among the barriers to total quality management implementation. The application of ISM typically forces managers to review perceived priorities and improves their understanding of the linkages among key concerns. The various steps involved in the ISM method are (Singh et al. 2007):

  1. Identification of elements relevant to the problem or issue; this could be undertaken through a literature review or any group problem solving technique (such as panel discussion).

  2. Establishing a contextual relationship between variables with respect to which pairs of variables will be examined.

  3. Developing a Structural Self-Interaction Matrix (SSIM) of elements to indicate pair-wise relationships between variables of the system.

  4. Developing a reachability matrix from the SSIM and checking the matrix for transitivity. Transitivity of the contextual relation is a basic assumption in ISM, which states that if element A is related to B, and B is related to C, then A will be necessarily related to C.

  5. Partitioning of the reachability matrix into different levels.

  6. Based on the relationships given above in the reachability matrix, drawing a directed graph (digraph), and removing transitive links.

  7. Converting the resultant digraph into an ISM-based model, by replacing element nodes with statements.

  8. Reviewing the ISM-based model to check for conceptual inconsistency and making the necessary modifications.

The above outlined steps that lead to the development of the ISM model are discussed below.

3.1 Identification of elements

The literature review revealed that a comprehensive identification of the factors related to innovation through BOLD has not previously been undertaken. Therefore, expert opinions were sought to identify elements and develop contextual relationships among relevant variables.

The first step involved identifying all relevant facets of innovation through BOLD via a panel session with interested BOLD experts attending the first day of the 14th IFIP I3E Conference in Delft, The Netherlands. Every element was discussed thoroughly to develop a common understanding. The factors that experts finally agreed on were: resistance to change, value, access to data, awareness, security, privacy, human resource factors, organisational factors, data licensing, data quality, technical infrastructure, cost, acceptance, risk, competitive advantage, external pressure, legal aspect, trust, and innovation through BOLD. As the aim of the research is to identify and analyse factors driving “innovation through BOLD”, it is considered the ultimate variable and the impact of all other variables is explored around it. Table 1 presents the meaning/definition/example/type of various factors as discussed and finalised by the panel of experts.

Table 1 Description of identified elements

3.2 Structural self-interaction matrix (SSIM)

Once the elements had been identified it was necessary to determine contextual relationships between the factors to develop the SSIM. In total seven specialists, including three professionals with diverse industry experience related to BOLD and four highly proficient academics with mixed experience of teaching, researching, and advising government on policy and on BOLD related matters, were chosen to provide their expert views. The diversity among participants helped to ensure a holistic view was achieved.

To analyse variables associated with innovation through BOLD, a contextual relationship of ‘helps achieve’ or ‘influences’ was chosen. To express the relationships between the different factors relating to innovation through BOLD, four symbols were used to denote the direction of the relationship between the parameters i and j (here, i < j):

  • V – Construct i helps achieve or influences j.

  • A – Construct j helps achieve or influences i.

  • X – Constructs i and j help achieve or influence each other.

  • O – Constructs i and j are unrelated.

For example, the following statements explain the use of symbols V, A, X, O in SSIM:

  1. Resistance to change (Variable 1) helps achieve or influences innovation through BOLD (Variable 19) = V

  2. Legal aspect (Variable 17) helps achieve or influences security (Variable 5) = A

  3. Technical infrastructure (Variable 11) and privacy (Variable 6) help achieve or influence each other = X

  4. Data quality (Variable 10) and access to data (Variable 3) are unrelated = O

Based on contextual relationships, the SSIM is developed (see Table 2).

Table 2 Structural self-interaction matrix (SSIM)

3.3 Reachability matrix

The SSIM is converted into a binary matrix, called the initial reachability matrix, by substituting V, A, X, and O with 1 or 0 as appropriate. The substitution of 1s and 0s follows these rules:

  1. If the (i, j) entry in the SSIM is V, the (i, j) entry in the reachability matrix becomes 1 and the (j, i) entry becomes 0.

  2. If the (i, j) entry in the SSIM is A, the (i, j) entry in the reachability matrix becomes 0 and the (j, i) entry becomes 1.

  3. If the (i, j) entry in the SSIM is X, the (i, j) entry in the reachability matrix becomes 1 and the (j, i) entry becomes 1.

  4. If the (i, j) entry in the SSIM is O, the (i, j) entry in the reachability matrix becomes 0 and the (j, i) entry becomes 0.

Following these rules, the initial reachability matrix for innovation through BOLD is shown in Table 3.

Table 3 Initial reachability matrix
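The four substitution rules can be illustrated with a minimal Python sketch. The function name `ssim_to_reachability`, the dictionary layout of the SSIM, and the three-variable example are assumptions made for illustration and are not taken from Table 2; setting the diagonal to 1 is also an assumption, chosen to match the later convention that each variable counts itself in its driving and dependence power.

```python
# Substitution rules: SSIM symbol -> ((i, j) entry, (j, i) entry).
RULES = {"V": (1, 0), "A": (0, 1), "X": (1, 1), "O": (0, 0)}


def ssim_to_reachability(n, ssim):
    """Return the initial reachability matrix as a list of lists.

    n    -- number of variables
    ssim -- dict mapping (i, j), 1-based with i < j, to 'V', 'A', 'X' or 'O'
    """
    # Assumption: each variable reaches itself, so the diagonal is 1.
    matrix = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    for (i, j), symbol in ssim.items():
        if not 1 <= i < j <= n:
            raise ValueError(f"invalid SSIM cell: {(i, j)}")
        forward, backward = RULES[symbol]
        matrix[i - 1][j - 1] = forward   # (i, j) entry
        matrix[j - 1][i - 1] = backward  # (j, i) entry
    return matrix


# Hypothetical three-variable SSIM, not taken from Table 2.
toy_ssim = {(1, 2): "V", (1, 3): "O", (2, 3): "A"}
for row in ssim_to_reachability(3, toy_ssim):
    print(row)
# [1, 1, 0]
# [0, 1, 0]
# [0, 1, 1]
```

Only the upper triangle of the SSIM (i < j) is supplied, since each symbol fixes both the (i, j) and the (j, i) entry.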

After including transitivity as explained in Step 4 of the ISM method, the final reachability matrix is shown in Table 4. Table 4 also shows the driving and dependence power of each variable. The driving power for each variable is the total number of variables (including itself), which it may help to achieve. On the other hand, dependence power is the total number of variables (including itself), which may help in achieving it. These driving and dependence powers will be used later in the classification of variables into the four groups including autonomous, dependent, linkage, and drivers.

Table 4 Final reachability matrix
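As an illustration of how transitivity and the two powers could be computed, the sketch below applies Warshall's algorithm to a binary matrix and then takes row and column sums. The paper does not state which procedure was used to incorporate transitivity, so the choice of algorithm, the function names, and the three-variable matrix are assumptions made for this example.

```python
def transitive_closure(matrix):
    """Return a copy of the matrix with all transitive links added."""
    n = len(matrix)
    closed = [row[:] for row in matrix]
    for k in range(n):
        for i in range(n):
            if closed[i][k]:
                for j in range(n):
                    # If i reaches k and k reaches j, then i reaches j.
                    if closed[k][j]:
                        closed[i][j] = 1
    return closed


def driving_and_dependence(matrix):
    """Row sums give driving power; column sums give dependence power."""
    driving = [sum(row) for row in matrix]
    dependence = [sum(col) for col in zip(*matrix)]
    return driving, dependence


# Hypothetical initial matrix: 1 -> 2 and 2 -> 3, diagonal set to 1.
initial = [[1, 1, 0],
           [0, 1, 1],
           [0, 0, 1]]
final = transitive_closure(initial)
print(final)                          # [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
print(driving_and_dependence(final))  # ([3, 2, 1], [1, 2, 3])
```

Because the diagonal entries are 1, both sums count the variable itself, in line with the definitions of driving and dependence power given above.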

3.4 Level partitions

The matrix is partitioned by assessing the reachability and antecedent sets for each variable (Warfield 1974). The final reachability matrix leads to the reachability and antecedent set for each factor relating to innovation through BOLD. The reachability set R(si) of the variable si is the set of variables defined in the columns that contain 1 in row si. Similarly, the antecedent set A(si) of the variable si is the set of variables defined in the rows that contain 1 in column si. Then, the intersection of these sets is derived for all the variables. The variables for which the reachability and intersection sets are the same are the top-level variables of the ISM hierarchy. The top-level variables of the hierarchy would not help to achieve any other variable above their own level in the hierarchy. Once the top-level variables are identified, they are separated out from the rest of the variables and then the same process is repeated to find the next level of variables, and so on. These identified levels help in building the digraph and the final ISM model (Agarwal et al. 2007; Singh et al. 2007). In the present context, the variables along with their reachability set, antecedent set, and the top level are shown in Table 5. The process is completed in seven iterations (in Tables 5-11) as follows:

Table 5 Partition on reachability matrix: interaction I

In Table 5, variables 2 (i.e., value), 6 (i.e., privacy), 12 (i.e., cost), 13 (i.e., acceptance), 15 (i.e., competitive advantage), and 19 (i.e., innovation through BOLD) are found at level I because the reachability and intersection sets of each of these variables are the same (e.g., elements 1, 2, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 17, 18, and 19 for variable 2). They are therefore positioned at the top of the hierarchy of the ISM model.

In Table 6, variables 1 (i.e., resistance to change), 5 (i.e., security), 7 (i.e., human resource factors), 14 (i.e., risk), and 18 (i.e., trust) are put at level II because their reachability and intersection sets are the same (e.g., elements 1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 17, and 18 for variable 1, and elements 1, 5, 7, 8, 14, 16, 17, and 18 for variable 18). They are therefore positioned at level II in the ISM model. For this iteration, the rows corresponding to variables 2, 6, 12, 13, 15, and 19, which are already positioned at the top level (i.e., level I), are removed from Table 5. The same process of deleting the rows of the previous level and marking the next level in the new table is repeated until the final variable is reached.

Table 6 Partition on reachability matrix: interaction II

In Table 7, variables 3 (i.e., access to data), 4 (i.e., awareness), and 9 (i.e., data licensing) are put at level III because their reachability and intersection sets are the same (i.e., elements 3, 4, 8, 9, and 11 for variable 3; elements 3, 4, 8, 9, 11, and 16 for variable 4; and elements 3, 4, 9, 11, and 17 for variable 9). They are therefore positioned at level III in the ISM model.

Table 7 Partition on reachability matrix: interaction III

In Table 8, variables 8 (i.e., organisational factors) and 17 (i.e., legal aspect) are put at level IV because their reachability and intersection sets are the same (i.e., elements 8, 10, 11, 16, and 17 for variable 8, and elements 8, 10, 11, and 17 for variable 17). They are therefore positioned at level IV in the ISM model.

Table 8 Partition on reachability matrix: interaction IV

In Table 9, variable 16 (i.e., external pressure) is put at level V, as the elements (i.e., 11 and 16) at reachability set and intersection set for this variable are the same. Thus, it will be positioned at level V in the ISM model.

Table 9 Partition on reachability matrix: interaction V

In Table 10, variable 10 (i.e., data quality) is put at level VI as the element (i.e., 10) at reachability set and intersection set for this variable is the same. Thus, it will be positioned at level VI in the ISM model.

Table 10 Partition on reachability matrix: interaction VI

In Table 11, variable 11 (i.e., technical infrastructure) is put at level VII as the element (i.e., 11) at reachability set and intersection set for this variable is the same. Thus, it will be positioned at level VII in the ISM model.

Table 11 Partition on reachability matrix: interaction VII
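The iterative partitioning described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `level_partition` is hypothetical, variables are identified by 0-based index rather than by the numbering used in the tables, the input is assumed to be a transitively closed matrix with 1s on the diagonal, and the three-variable matrix is a toy example rather than the paper's Table 4.

```python
def level_partition(matrix):
    """Return the ISM levels as a list of lists of 0-based variable indices.

    The first inner list is level I (the top of the hierarchy).
    """
    remaining = set(range(len(matrix)))
    levels = []
    while remaining:
        current = []
        for v in sorted(remaining):
            # Reachability set: remaining columns with 1 in row v.
            reach = {c for c in remaining if matrix[v][c]}
            # Antecedent set: remaining rows with 1 in column v.
            ante = {r for r in remaining if matrix[r][v]}
            # Top level when reachability set equals the intersection.
            if reach == reach & ante:
                current.append(v)
        if not current:
            raise ValueError("no top-level variable; is the matrix closed?")
        levels.append(current)
        # Remove the identified variables before the next iteration.
        remaining -= set(current)
    return levels


# Hypothetical final reachability matrix for a chain 1 -> 2 -> 3.
final = [[1, 1, 1],
         [0, 1, 1],
         [0, 0, 1]]
print(level_partition(final))  # [[2], [1], [0]]
```

In the toy example the third variable (index 2) reaches only itself and is therefore placed at level I, while the first variable (index 0), which reaches all others, ends up at the base of the hierarchy.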

3.5 Developing canonical matrix

A canonical matrix is developed by clustering variables in the same level, across the rows and columns of the final reachability matrix as shown in Table 12. This matrix is just another, more convenient, form of the final reachability matrix (i.e., Table 4) as far as drawing the ISM model is concerned.

Table 12 Canonical form of final reachability matrix
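Clustering the variables by level amounts to applying the same permutation to the rows and columns of the final reachability matrix. The short sketch below shows one way this could be done; the function name `canonical_matrix`, the 0-based indexing, and the toy inputs are assumptions made for illustration.

```python
def canonical_matrix(matrix, levels):
    """Reorder rows and columns so that variables of one level are adjacent.

    levels -- list of lists of 0-based indices, level I first
    Returns the variable order and the reordered matrix.
    """
    order = [v for level in levels for v in level]
    reordered = [[matrix[r][c] for c in order] for r in order]
    return order, reordered


# Hypothetical final reachability matrix and its level partition.
final = [[1, 1, 1],
         [0, 1, 1],
         [0, 0, 1]]
levels = [[2], [1], [0]]
order, canonical = canonical_matrix(final, levels)
print(order)      # [2, 1, 0]
print(canonical)  # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
```

With level I listed first, the reordered toy matrix is lower triangular, which makes the links from lower levels up to higher levels easy to read off when drawing the digraph.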

3.6 Classification of factors influencing innovation using BOLD

The factors for innovation using BOLD are classified into four categories based on driving power and dependence power: autonomous, dependent, linkage, and drivers (Mandal and Deshmukh 1994). The driving power and dependence power of each of these BOLD factors is shown in Table 4. The driver power – dependence power diagram is shown in Fig. 1.

Fig. 1 Driving power and dependence diagram

This figure has four quadrants that represent the autonomous, dependent, linkage, and driver categories. For example, a factor that has a driving power of 2 and dependence power of 17 is positioned at a place with dependence power of 17 in the X-axis and driving power of 2 on the Y-axis. Based on its position, it can be defined as a dependent variable. Similarly, a factor having a driving power of 17 and a dependence power of 2 can be positioned at dependence power of 2 at the X-axis and driving power of 17 on the Y-axis. Based on its position, it can be defined as a driving variable. The objective behind the classification for innovation through BOLD is to analyse the driver power and dependency of the factors.

The first cluster includes autonomous factors that have weak driver power and weak dependence. These factors are relatively disconnected from the system. In the context of the current research, none of the factors belong to this cluster. The second cluster consists of the dependent variables that have weak driver power but strong dependence; acceptance is the only variable that belongs to this cluster. The third cluster has the linkage variables that have strong driver power and dependence. Any action on these variables will have an effect on the others and also a feedback effect on themselves. The majority of the variables - resistance to change, value, access to data, awareness, security, privacy, human resource factors, organisational factors, data licensing, technical infrastructure, cost, risk, competitive advantage, external pressure, legal aspect, trust, and innovation through BOLD - fall under this category. The fourth cluster includes drivers or independent variables with strong driving power and weak dependence. Only one variable, namely data quality, falls under this category (see Fig. 1).
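The assignment of a factor to one of the four clusters can be sketched as a simple threshold rule. The paper does not state the cut-off between weak and strong power, so the midpoint of the number of factors used here is an assumption, as is the function name `classify_factor`; the first two calls reuse the worked example given for Fig. 1.

```python
def classify_factor(driving, dependence, n_factors):
    """Classify a factor by its driving and dependence power.

    Assumption: power above n_factors / 2 counts as strong.
    """
    threshold = n_factors / 2
    strong_driving = driving > threshold
    strong_dependence = dependence > threshold
    if strong_driving and strong_dependence:
        return "linkage"
    if strong_driving:
        return "driver"
    if strong_dependence:
        return "dependent"
    return "autonomous"


# Worked example from the text, with nineteen factors in total.
print(classify_factor(2, 17, 19))   # dependent
print(classify_factor(17, 2, 19))   # driver
```

A different cut-off would move factors that lie near the quadrant boundaries, so the threshold should be read from the axes of the diagram rather than from this sketch.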

3.7 Formation of structural model

From the canonical form of the reachability matrix (see Table 12), the structural model is generated by means of vertices (nodes) and lines (edges). If there is a relationship between the factors i and j responsible for innovation through BOLD, this is shown by an arrow that points from i to j. This graph is called a directed graph or digraph. After removing the indirect links, the digraph is finally converted into an ISM-based model as shown in Fig. 2.
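One possible way to remove indirect links is to drop every arrow from i to j for which a path through a third variable already exists. The sketch below does this for a transitively closed matrix. It assumes that the relation contains no cycles; variables that influence each other (an X entry in the SSIM) would first need to be grouped, which this illustration does not attempt. The function name `direct_links` and the toy matrix are hypothetical.

```python
def direct_links(closed):
    """Return the arrows (i, j) that are not implied by a longer path.

    closed -- transitively closed binary matrix, 0-based indices
    Assumption: no cycles apart from the 1s on the diagonal.
    """
    n = len(closed)
    edges = []
    for i in range(n):
        for j in range(n):
            if i == j or not closed[i][j]:
                continue
            # The link is indirect if some third variable k lies between.
            implied = any(
                closed[i][k] and closed[k][j]
                for k in range(n)
                if k != i and k != j
            )
            if not implied:
                edges.append((i, j))
    return edges


# Hypothetical closed matrix for a chain 1 -> 2 -> 3.
closed = [[1, 1, 1],
          [0, 1, 1],
          [0, 0, 1]]
print(direct_links(closed))  # [(0, 1), (1, 2)]
```

In the toy example the arrow from the first to the third variable is dropped because it is already implied by the path through the second variable.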

Fig. 2 ISM-based model. = shows links to all nodes in next upper level

The different levels, and the variables at each level, are identified using the level partitioning process of the ISM method. They indicate the degree of driving and dependence power of a variable or set of variables and how they are linked up to each other at the same level and with the variables of the next upper level.

The ISM-based model developed in this research shows that technical infrastructure (such as processing power, legacy systems, software access, high storage capability, scalability and performance, and fragmentation) is the most fundamental variable for innovation using BOLD as it comes at the base of the ISM hierarchy (i.e., Level VII) (see Agarwal et al. 2007). Technical infrastructure facilitates data quality, which further helps in building external pressure to address and maintain it. Collectively, technical infrastructure, data quality, and external pressure provide the basis for innovation through BOLD. Moreover, they are also closely linked to each other. These lower level factors shape the organisational factors (including culture, strategy, structure, governance, competency, ambitions, vision etc.) and legal aspect (see Level IV).

The improvement in middle level variables helps to achieve next-level variables (Agarwal et al. 2007). Therefore, improvement in organisational factors and legal aspects leads to better access to data, superior awareness (including awareness of data, the platform where it is published, and potential of innovation), and data licensing. These factors at Level III directly influence resistance to change, security, HR factors (such as leadership, management competency, lack of knowledge, capacity building, and asymmetry of information), risk, and trust (including trust of technology, data, processes, and innovation) at the next higher level (i.e., Level II). For example, open access to data can raise questions regarding data security, especially in relation to sensitive data, and can also raise concerns about the trust of data, leading to higher risk in using and implementing it further.

The top level variables demonstrate strong dependence on other variables (Agarwal et al. 2007). In the present context, the variables value, privacy, cost, acceptance, competitive advantage, and innovation through BOLD, which are at the top level (i.e., Level I), show strong dependence power. The variables at Level II influence the topmost level (i.e., Level I) of the ISM model. For example, aspects related to security can better serve the privacy of BOLD. Similarly, the relationship between risk and innovation through BOLD indicates that the higher the risk involved with access and use of BOLD, the weaker the innovation using such data will be, whereas higher trust can strengthen innovation through BOLD.

4 Discussion

BOLD opens a world of possibilities for innovation but creating innovations with BOLD is a complex process. The ISM method has uncovered the relationships between the numerous variables identified during the brainstorming session at the 14th IFIP I3E Conference as being associated with innovation through BOLD. The findings are now discussed in the context of existing literature as well as discussions undertaken by experts at the 14th IFIP I3E Conference panel, and theoretical contributions and practical implications are explored.

Almost all variables were determined to have both strong driving and dependence powers, determining them as ‘linkage’ variables. Linkage variables can be considered relatively unstable (Singh et al. 2007; Talib et al. 2011). Therefore, in the context of innovation through BOLD, any action on almost all the variables will have an effect on the others as well as feedback on themselves. An explanation for this is that BOLD is in its infancy and governments and companies are still struggling with how to make sense of it. There is not one proven or best infrastructure, and data quality is often unclear and needs to be investigated. The hype might result in pressure, but the capabilities to take advantage of this and to create acceptable and feasible innovations that are not conflicting with legislation are lacking. Therefore, knowledge about all aspects presented in the ISM-based model is necessary to drive innovation.

Finding that technical infrastructure comes at the base of the ISM hierarchy is in accordance with much of the existing research regarding BOLD. Insufficient technical capabilities and lack of adequate technical infrastructure create a major impediment for data creation as well as data sharing (Sayogo et al. 2014; Shadbolt et al. 2012). One of the experts at the 14th IFIP I3E Conference panel commented that “all too often datasets are not linked and there is a need for tools to derive links between datasets”. Without the technical infrastructure, BOLD will not be able to be found, processed and analysed (Zuiderwijk et al. 2015) – an obvious requirement for innovation through BOLD.

Park et al. (2012) argue that business intelligence systems are of limited value when they deal with inaccurate and unreliable data, which are common characteristics of self-reported data. As the only ‘driver’ according to Fig. 1, data quality needs consistent attention (Agarwal et al. 2007) to encourage innovation through BOLD. The results of the ISM-based model suggest that poor data quality will eventually lead to less trust and more risk, and will ultimately prohibit innovation through BOLD.

That legal aspects have an effect on access to data and data licensing supports Sayogo et al.’s (2014) argument that unclear demarcation of legal boundaries can hamper data openness, which in turn would inhibit innovation through BOLD. Experts at the 14th IFIP I3E Conference panel discussion asked questions like “Who is in control of the data?”, “Who guarantees business continuity and quality?” and “What happens if the people who open and manage the data are corrupt?”. Often miscellaneous data are combined from various sources and different owners, so nobody has responsibility. The use of BOLD places high demands on data governance. However, McKinsey and Company (2011) suggest that for benefits to be realised, policy makers will often also need to push the deployment of big data innovation, and the findings of this study support this. Nevertheless, the ISM-based model also determined that organisational factors appear on the same level as legal aspects, suggesting that internal and external governance are equally important.

At the 14th IFIP I3E Conference panel it was expressed that “there is a lot of value that can be derived [from BOLD] – customers become the product as soon as they use platforms such as Facebook”. Jetzek et al. (2014) suggest that innovation through BOLD creates value through new structures, which themselves form the foundation for new data and hence innovation, resulting in a cyclical process where value and innovation through BOLD feed into each other. Support is found for this value generation framework and conceptual model of the data driven innovation mechanism given that value, competitive advantage, and innovation through BOLD appear at the same top level in the ISM-based model and are all linkage variables.

4.1 Theoretical contributions

BOLD is a relatively new and emerging field of research, and thus only a few studies (e.g., Dwivedi et al. 2015b; Janssen and Kuk 2016; Janssen et al. 2015) have been published in this area. As far as the authors are aware, there has not been any previous attempt to identify factors driving innovation through BOLD. Therefore, this is the first study in the field that identifies and links nineteen factors related to innovation through BOLD. The formal development of these links, and of further predictive causal links between the factors identified in this research, can be considered a significant contribution in this area.

A further key theoretical contribution is in the method adopted, being the first study to utilise ISM to determine the links between constructs steering innovation through BOLD and assess how these links are represented in the perspective of their driving and dependence power in relation to the other factors. The hierarchy or level of constructs presented in the ISM-based model indicates the relative importance of different variables as drivers, relatively dependent constructs or constructs somewhere in the middle across the levels. The ISM-based model also provides the correlations between the constructs presented at the upper four levels. The interdependency of these constructs at the same level indicates how closely they are related to each other and so will allow researchers to select these constructs for further framework development and validation.

4.2 Practical implications

The proposed ISM-based model for identification and ranking of factors influencing innovation through BOLD provides a framework for practitioners and policy makers to help encourage and manage innovation through BOLD. The utility of the ISM method lies in imposing order and direction on the complexity of relationships among these factors, which will help decision-makers to better utilise their available resources for maximising innovation through BOLD.

The driver-dependence matrix (Fig. 1) indicates that no construct falls in the autonomous cluster. Constructs in this cluster are weak drivers and weak dependents and hence do not have much influence. The absence of any autonomous factors in this study indicates that policy makers and practitioners should pay attention to all factors identified as being related to innovation through BOLD. As acceptance is a factor with weak driving power and relatively high dependence power, practitioners should give high priority to understanding the acceptance of innovations and of the use of BOLD. All other factors except acceptance and data quality fall under the linkage cluster, making them unstable, as any action on these factors will have an impact on the others and also feed back on themselves (Talib et al. 2011). This reiterates the importance for practitioners of ensuring that their attention is shared across all variables identified.
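
The cluster assignment described above can be illustrated with a minimal sketch. It assumes the usual convention that driving power is the row sum and dependence power the column sum of a final reachability matrix, and that the boundary between 'weak' and 'strong' is the midpoint of the number of factors; that cut-off is an assumption, not a value reported in this study. The function name `classify_factors` and the four-factor matrix are hypothetical and do not reproduce the data behind Fig. 1.

```python
from typing import Dict, List


def classify_factors(reachability: List[List[int]], names: List[str]) -> Dict[str, str]:
    """Assign each factor to a driver-dependence cluster.

    reachability[i][j] == 1 means factor i reaches (influences) factor j.
    """
    midpoint = len(names) / 2  # assumed cut-off between weak and strong power
    clusters: Dict[str, str] = {}
    for i, name in enumerate(names):
        driving = sum(reachability[i])                    # row sum
        dependence = sum(row[i] for row in reachability)  # column sum
        strong_driver = driving > midpoint
        strong_dependent = dependence > midpoint
        if strong_driver and strong_dependent:
            clusters[name] = "linkage"
        elif strong_driver:
            clusters[name] = "driver"
        elif strong_dependent:
            clusters[name] = "dependent"
        else:
            clusters[name] = "autonomous"
    return clusters


# Hypothetical four-factor reachability matrix (illustrative only).
names = ["A", "B", "C", "D"]
reachability = [
    [1, 1, 1, 1],  # A reaches every factor
    [0, 1, 1, 1],  # B and C reach each other and D
    [0, 1, 1, 1],
    [0, 0, 0, 1],  # D reaches only itself
]

print(classify_factors(reachability, names))
```

In this toy example, A behaves as a driver, B and C as linkage factors, and D as a dependent factor; no factor is autonomous, which mirrors the pattern reported for the study's own matrix.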

5 Conclusion

In order to attend to the current gap in the literature, the key objective of the present study was to develop a hierarchy of factors influencing innovation through BOLD. The variety of data sources, the different interests of stakeholders, and unknown outcomes make it a challenge to drive innovation through BOLD. From a panel of experts, 19 variables relevant to innovation through BOLD were identified, including resistance to change, value, access to data, awareness, security, privacy, human resource factors, organisational factors, data licensing, data quality, technology infrastructure, cost, acceptance, risk, competitive advantage, external pressure, legal aspect, trust, and innovation through BOLD itself. Utilising ISM, the categorisation of factors was achieved and relationships between the variables were established. The findings indicate that technical infrastructure, data quality, and external pressure form the foundations for innovation through BOLD. The placing of value, competitive advantage, and innovation through BOLD at the same top level in the ISM-based model reinforces the utility of innovation through BOLD and thus the importance of this research. However, the high dependencies and linkages among variables show that for many components there are uncertainties about how to achieve such innovation, as there is no standard infrastructure for BOLD that can be used to foster it. Despite this, organizations need to be able to deal with all aspects of the ISM-based model to create innovation through BOLD; it is likely that only a few organizations are able to deal with all these aspects. This suggests that more proven practices are necessary before innovation through BOLD can fly.

5.1 Future lines of research

Despite the significant contributions of this research, like all studies it is not without limitations. Although experts were consulted to generate factors relevant to innovation through BOLD, there are likely to be other relevant factors, which could be explored in future research. Similarly, it would be useful for future research to conceptually develop the factors further using both inductive and deductive methods before the model is statistically tested and validated using structural equation modelling. As identified in the literature review, some research is emerging on overcoming challenges such as technological capability, management of heterogeneous data, and quality assessment. Further research should be conducted on each component of the ISM-based model in order to assess the policy and practical implications for each.

Aside from the future research directions resulting from the limitations of the study, the novelty of BOLD presents a wide range of further lines of research. BOLD innovation might be conceptualized as a complex adaptive system (CAS). A CAS can generally be defined as a system that emerges over time into a coherent form, and adapts and organizes itself without any singular entity deliberately managing or controlling it (Holland 1996). Innovation through BOLD is a complex process in which many organizations might interact with each other. Therefore, social interaction among actors and the use of technology are both key aspects. Users may change over time and innovations will be shaped and reshaped based on input from different actors. The use of BOLD is a typical situation in which various stakeholders have different objectives; some might prefer transparency whereas others may want to keep data private as a strategic asset. Different scenarios or use contexts might focus on one type of actor or sector, a range of innovation trajectories including deductive and inductive, and/or different needs and objectives. Therefore, further research is required to delineate how different actors can successfully interact to achieve innovation through BOLD as a CAS.

Whereas the literature has mainly focussed on the role that technology can play in facilitating humans in processes of innovation, there is a rise in innovative practices and products that are shaped by technology. As computational power, networks and algorithms grow in speed and strength, BOLD can be ordered, reordered and analysed by non-human intelligent systems. Industry-wide there has been a rise of predictive algorithms that can automatically detect new business opportunities and can help assess whether business concepts or start-ups will succeed or fail. As humans increasingly have to deal with non-human actors in the form of intelligent BOLD systems, more research is needed to understand this relationship in general, and more specifically to understand the role of artificially intelligent systems in the process of innovation through BOLD.

From a data-management perspective, successful BOLD innovation raises several challenges, including: finding and dealing with large data sets; integrating datasets that were not originally intended to be integrated; restructuring datasets to fit a common vocabulary; and building usable data management interfaces for users of various levels of expertise. Future research is required to uncover effective data models and existing formalisms for handling the integration and transformation of data. Moreover, these systems should be able to deal with both structured and unstructured data. More research is needed to develop new tools for big data analytics, as existing statistical tools may not facilitate the analysis of large volumes of unstructured data. The concept of ‘deep learning’ is relevant here: intelligent algorithms need to be developed that are capable of recognizing items of interest in large quantities of unstructured and binary data, and of deducing relationships without needing specific models or programming instructions. More attention is also required to develop effective user interfaces that enable non-experts who do not have deep data-management experience to find, integrate, transform, and visualise data in meaningful ways.

Related to data-management, another area that requires further work in the use of BOLD is ethics, where tools as well as policies and guidelines are needed that are capable of ensuring the privacy and security of data. In this respect, more research is needed into the anonymization of organisations and individuals during use and re-use, while at the same time ensuring that transparency and accountability are maintained. For this purpose, regulatory frameworks are evolving and need to be developed further to help define how to collect, manage and interpret data for scientific and practical purposes.

More research is needed to identify and define the business case and conditions for small and medium-sized enterprises to come up with innovative real-time systems that are capable of extracting, indexing and linking data across multiple data sources, such as internal systems, data warehouses, sensors, and social media streams, as well as user-generated, location-based data from mobile devices. A key area that is as yet unclear in the BOLD debate is the value proposition that it offers the third-party organisations and entrepreneurs on whom the public sector relies for developing applications that can exploit its open data. Moreover, several questions still remain to be answered empirically, including who will be the ultimate end users of public sector open data, who will pay for the use of the analytical tools and solutions that can make sense of the open data, and how useful public sector open data is for end users. Indeed, answers to these questions will form the basis for defining a sustainable business model in which conditions for exploiting BOLD can be set out in a public sector context for all stakeholders, including the business community and citizens.

Innovation generally requires, on the one hand, diversity of contexts, actors and evidence, and, on the other, interaction between these through various forms of experimentation. This can take place deductively, in a designed and top-down manner directed by a particular need or objective, or more inductively, in an open-ended, bottom-up and emergent manner (as in CAS). Although the former is more common as innovation with or for a purpose, the latter can also make important contributions. Developing more proven practices of BOLD needs more research into the array of specific roles it can play in these two contexts to drive or support innovation, for example by developing real-life scenarios which recognise that the context, purpose and perceived benefits of use are highly important. In turn, this will likely rest on the recognition that non-BOLD evidence and inputs are both unavoidable and necessary; BOLD is unlikely to achieve high impact or meaningful innovations on its own.