Introduction

Open data is “digital data that is made available with the technical and legal characteristics necessary for it to be freely used, reused, and redistributed by anyone, anytime, anywhere” (Open Data Charter 2015, para. 1). This concept has proliferated and made diverse data sets publicly available from various organisations, including government spending, research outcomes, archives, surveys, environmental sensing, and museum collections. While sharing data online is not new, the formalisation of open data has created a global movement and led to a huge growth in the breadth and depth of data available. More than 250 governments, and global and local organisations, have initiatives that create and share open data (World Bank 2017), with more than 17,000 data sets of international relevance from the World Bank alone (World Bank 2018).

To date, potential uses of open data in learning are not well understood. However, several drivers could add impetus to this. Within the open data movement there is a desire to find ways to broaden public engagement (Shadbolt et al. 2012). There are also wider calls for data literacies to be considered as a key twenty-first-century skill set (Wolff et al. 2016). The Open Data Charter, which has been adopted by 17 national and 35 local governments to date, includes a principle to “engage with schools and post-secondary education institutions to support increased open data research and to incorporate data literacy into educational curricula” (Open Data Charter 2015, “Principle 6”, para. 5). In response, this paper asks how open data could become a valuable part of education by analysing the perspectives of early adopters of open data in teaching.

Openness already has a substantial history in education (Weller 2015) which has only recently been linked with open data. It is therefore important to compare and contrast how openness has emerged in relation to data and to educational resources. Atenas and Havermann (2015) suggest that when open data is used in education it can be considered as a form of open educational resources (OER), given the similar licensing and definitions. However, they note that the use of open data in this way is under-discussed. OER is specifically defined as “free and openly licenced educational materials that can be used for teaching, learning, research and other purposes” (Creative Commons 2016, para. 1). OER emerges from educational practices and OER initiatives focus on producing resources specifically for teaching and learning. While there are common principles, there are also distinctions in the nature of these forms of knowledge and their provenance. Unlike OER, open data can be created and released with no consideration that it might be used for educational purposes, and major proponents of open data primarily highlight the economic and social value it could hold (e.g. Open Data Institute 2017; Open Knowledge International, n.d.), rather than its educational uses. Given this, it is necessary to explore whether there are challenges to educators in using open data in order for it to be useful as a material for learning.

Research suggests a lack of awareness of open data amongst educators, for whom distinctions between open data, OER, and other online content are ambiguous (Atenas et al. 2015). There is little research to suggest why or how open data should be employed in teaching practices. The contribution of this paper is to provide a rich understanding of reasons for using open data, the pedagogical approaches that are considered to be relevant, and the challenges to wider use. While open data is not currently a mainstream educational resource, there are educators who make substantial use of it in their teaching. Possibilities are being explored at all levels of education. Therefore we identify and interview a diverse sample of these ‘early adopter’ educators in order to understand their practices, perceptions and experiences. A document analysis was also performed on materials related to the interviewed educators’ practices with open data. Through a thematic analysis of this data, we develop a conceptual framework that suggests how open data could become a mainstream resource for education.

Open data in learning activities

Uses of open data in learning

The limited research on open data in teaching to date suggests a lack of awareness or coordination. Atenas et al. (2015) found that a majority of the 26 respondents to a survey on open data use in education were not actually making use of open data. They were using data that was free, but not openly-licenced. The authors concluded that there is a lack of awareness of open data amongst educators and a challenge in clarifying its distinct value. Similar confusion between open and copyrighted materials has been identified in findings on educator perceptions of OER (Rolfe 2012). Among the respondents who did use open data, Atenas et al. (2015) found a wide variety of sources and topics, including governmental, NGO, museum, and research project data. This suggests that a breadth of possibilities is being explored by a minority.

Case studies of teaching with open data suggest a trend towards project or inquiry-based learning: Shamash et al. (2015) ask students to explore open data sets through visualisation, in order to develop analysis skills. Gray (2014) describes assignments in which a database of court records provides the grounding for exploring themes in historical crime. Ciocola and Reggi (2015) describe an approach that leads school children to investigate the outcomes of publicly funded projects. Across these activities, the capacity for choice of direction and real-world relevance are emphasised. Alternatively, Dunwell et al. (2016) developed a set of mini-games around nutritional data, but they still identify that a quality of open data is the extensive volume of material, used to create a different learner experience each time.

Challenges involved in using open data can also be identified. Dunwell et al. (2016) highlight difficulties with using ‘live’ open data, as opposed to immutable, controlled data. Love et al. (2016) report difficulties in finding convenient open data for a data mining assignment, and caution that work needs to be done upfront by educators. Reflecting on the process of creating introductory programming assignments using open data, Coughlan (2015) describes work to identify data sets that facilitated activities related to learning outcomes, test model solutions, and monitor for emergent issues, to ensure that activities worked well.

Drivers for open data use in education

Three drivers could encourage educational use of open data. Firstly, public engagement with open data is considered an important tool for democracy and empowerment of individuals in society. However, there is a lack of effective socio-technical mechanisms for this (Atenas et al. 2015). Open data could provide societal and global benefits (Manca et al. 2016). Worthy (2015) highlights the popularity of some interfaces built using UK government open data, such as ‘TheyWorkForYou’, which summarises the activities of elected representatives to 200,000–300,000 visitors a month. Meanwhile other sources, such as those on local government spending, have limited public use and are primarily aimed at businesses and experts. Who uses open data, and the mechanisms through which society is influenced by it, remain open issues where education could play a pivotal role as a mechanism for public engagement.

A second driver is that open data holds potential as a material suited to learning important skills. Data literacy broadly combines technical and statistical skills with the ability to draw meaning by posing questions, interpreting data and analyses effectively, and also in developing skills in reading and creating visualisations (Crusoe 2016). Data literacy skills can be developed through learning activities that use open data (Wolff et al. 2016), and beyond this, Atenas et al. (2015) propose that critical thinking, statistical literacies, global citizenship, teamwork, data curation and research skills can be acquired through learning with open data.

A third driver is that organisations want to encourage public engagement and enhancement of the data they release. This could develop into valuable forms of interaction between organisations, educators, and learners. Coughlan et al. (2015) describe how the development of a system for schoolchildren and the public to identify locations related to artworks was prompted by a museum choosing to share data about its collection. The additional data generated through these public engagement activities could enhance collection data while providing learning experiences. Bradley et al. (2009) describe how an educational game to teach spectroscopy can develop skills and crowdsource improvements to the data. Although these organisational drivers may not necessarily align with learning goals, there is potential for mutual benefit.

Pedagogical connections

Rather than suggest that open data offers an entirely new pedagogy, it is important to identify links with existing teaching and learning concepts to guide research and practice. Four such connections are: Inquiry-based learning, Open education, Personalisation, and Authenticity.

The prevalence of inquiry or project-based pedagogy noted above brings a large body of research which, in particular, highlights the importance of providing suitable guidance to learners (Lazonder and Harmsen 2016). Project-based learning often includes first-hand data collection, but has also been based around using existing sources of data. For example, Edelson et al. (1999) used a climate data-based activity to explore inquiry learning challenges. Land and Greene (2000) analysed projects where learners searched for and used openly-available (but not openly-licensed) online data, such as grocery or airline ticket prices. A key finding was the need to scaffold the development of goal-driven strategies, while being open to some serendipitous data-driven exploration. This meant that time was not lost to ineffective searching and browsing.

There is a need to clarify how open data offers benefits above proprietary data. As the difference is primarily one of licensing, the benefits of open data could be similar to those of OER. Butcher (2015) summarises three transformative benefits of OER, which could be applicable to open data:

  • Increased range and reduced cost of resources.

  • Support for adaptation of materials, which allows educators and students to be active participants who learn by doing and creating.

  • Building capacity by providing educators access to the means of production of the resources.

Open data offers potential for personalised learning, for example by supporting a student to learn with data on a topic of interest to them. Responsibility for this may be taken by the learner or teacher, or could be performed by an algorithm (Fitzgerald et al. 2017). Research suggests that personally-relevant contextualisation of activities can improve motivation, level of involvement, and learning (Cordova and Lepper 1996). Familiarity with the topic of the data can create confidence and ground critical thinking. For example, Mittelmeier et al. (2018) report use of open data from the World Bank in a learning activity and found that the use of locally-relevant content (country-level educational statistics) encouraged greater participation in intercultural student collaborations.

The potential for authentic learning experiences (Herrington and Herrington 2006) with open data should be explored and evaluated. One could compare the efficacy of simulations such as virtual laboratories with activities based on ‘real’ open data. Benefits to the use of a well-designed simulation include removal of the ‘messiness’ of activities based on real-world data collection. But losing complex authenticity, for example the possibility of mistakes in the data collection, can negatively impact on learning (De Jong et al. 2013). Extrapolating from this, the authenticity of open data could be of educational value, because it is drawn from real-world actions and professional organisational practices. However, it is argued that broader characteristics, such as community interaction and authentic assessments, are needed for learning to be wholly authentic (Hung et al. 2012; Herrington and Herrington 2006). Simply using open data does not create authentic learning, but it offers a starting point.

In summary, disparate examples show the potential of open data for learning, and external drivers suggest that stronger links will emerge between open data and education. However, there is a lack of mainstream understanding of open data amongst educators. Underexplored links exist with established research and practices. In the remainder of this paper, the perspectives of educators using open data are analysed to understand the potential and challenges for wider use.

Methodology

This research employs a qualitative descriptive design. This is appropriate to understand and summarise specific events experienced by individuals (Knupfer and McLellan 1996). In this case, these events are the practices of early adopters of open data in teaching. The design assumes a level of objectivity in the reports of activities by interviewees. In the phenomenological research tradition the data would be considered to represent lived experiences, rather than objective facts (Knupfer and McLellan 1996). However, the document analysis of materials related to each interviewee’s practice provides further support for the reports, and the combined focus on describing the learning activities used, educator experiences, and views on the potential of open data in learning, is valuable in gaining a holistic view from early adopters.

The overarching question was: How do experienced educators conceptualise open data and its use within education? Within this, three sub-questions highlight the rationale, pedagogy, and practical issues found in the use of open data:

  1. Why would educators make use of open data in their teaching?

  2. What approaches do educators employ when using open data?

  3. How do educators incorporate open data into their teaching and what challenges exist in this?

A semi-structured interview approach was used to gather rich data from the first-hand perspectives of educators identified as having used open data in teaching through a web-based review. A process of thematic analysis was conducted with this data following the approach described by Braun and Clarke (2006). The resulting phases of this process are described in the findings section below.

A document analysis was also performed using gathered materials related to the educators’ practice, including course materials, project websites, and published reports of their teaching practice. The documents were originally used to develop an initial understanding of each interviewee’s practice pre-interview. After the interviews were conducted and analysed, the documents were analysed and coded according to the interview themes in order to confirm the trustworthiness of the interview data, and to augment understanding of the themes derived from the analysis of the interviews.

Sampling and recruitment

Given limited information on open data use in education, a purposive sampling approach was deemed appropriate to recruit a sample of educators to provide insight through their experiences with open data. As preparation, a web-based review was conducted to identify projects and educators connecting open data and education. This created a base understanding of the area and identified a population of potential interviewees. 26 relevant links to courses and educational projects were identified through Google searches on relevant terms (“open data course”, “open data education”, “open data teaching”). In response to Atenas et al.’s (2015) finding that the meaning of open data could be ambiguous to educators, the researcher confirmed that all or most of the data sources used in each of these courses or projects were openly-licenced. The research aimed to gain perspectives across diverse educational contexts, so interviewees were selected who were judged to have experience of using open data in different teaching or educational contexts. 14 suitable contacts were identified from these 26 links. Contact was made via email to provide a project information sheet and request an interview.

Interview structure

To understand the emergence of this practice and its meaning, a constructivist viewpoint on the creation of knowledge is taken. This means that interviewees, as experts in this area, can shape the themes arising alongside the interviewer. Semi-structured interviews were used to explore experiences and perceptions through dialogue, with space for the interviewee to raise themes beyond a structure of consistent questions (see “Appendix A”).

In the first stage, the interviewee provided a record of their own experiences and explained their rationale and approach to using open data. Questions then prompted further discussion of the potential, requirements, and challenges of open data in education. The themes in literature described above were used to devise these questions, but questions approached these topics generally to avoid leading the interviewee. For example, rather than asking whether they use an inquiry-based learning design, the interviewer asked what sort of approaches or learning activities they think work with open data, and whether they use particular theories of learning or pedagogy. It is also important to reflect on the influence of the background of the researcher. The researcher/interviewer has first-hand experience of using open data in their own teaching, and an interest in OER. However, the questioning focused on the interviewee’s practices, to prioritise their perspectives.

Document analysis

For each of the interviewees, materials, reports, media, and websites related to their use of open data were gathered for a process of document analysis (Bowen 2009). This material was reviewed prior to the interviews for an initial understanding of the aims and practices of each interviewee. After the thematic analysis of the interviews was performed, this material was returned to and a thematic analysis was performed to identify instances of themes proposed. This provides a suitable means to triangulate and validate the assertions made by interviewees (Bowen 2009).

Ethics

The research was approved by the Human Research Ethics Committee of The Open University, and conducted in accordance with the 1964 Helsinki declaration and later amendments. To provide an opportunity for interviewees to correct any misrepresentation of their views, a report detailing the full analysis of the data was provided to the interviewees for amendment. Informed consent was obtained from all individual participants included in the study. It was agreed that the names of interviewees, and their organisations, staff, or students are not published, in order to protect anonymity and support candid responses.

Results

14 identified people or projects were contacted and 10 interviews were conducted. 4 of the interviews were conducted in person and the remaining 6 via video conferencing. The interviews ranged from 28 to 47 min in length, with an average length of 37 min. A total of 15 artefacts were used in the document analysis with at least one artefact for each interviewee.

Table 1 describes the interviewees and the materials used for the document analysis in each case.

Table 1 Interviewees, the contexts of their teaching, and the artefacts used in document analysis

As a sample of interviewees in a specialised area, this provides a broad representation across teaching contexts, subject matter and location. Hereafter, ID codes are provided in brackets after quotes to identify interviewees.

Findings

177 initial codes were generated by the researcher in the initial stage of thematic analysis to describe concepts in the interview transcripts. These were refined and sorted into broader themes and an initial set of overarching categories. The researcher then provided interviewees with a report of this analysis to check for any misrepresentation or inaccuracies of interpretation. None were identified. Further stages of analysis explored the relationships between these themes and categories. A final stage then focused on how these categories and themes could be refined to support answering the research questions by splitting or combining themes that contained multiple concepts within them. As one example of this, a theme of “data literacies” from the earlier analysis was found to contain distinct themes that reflected different questions: firstly, that open data provided a means to teach data literacies, and secondly, that students and educators needed appropriate data literacies to complete the learning activities. This was therefore split into a “Basis for learning data literacies” theme within the “Characteristics for Learning” category, and “Managing data literacy requirements” and “Developing educator data literacies” themes within the “Skills and Knowledge” category.

The final refined analysis described below has 6 categories that contain 28 themes. The categories and themes are described in the following sections (including Tables 2, 3, 4, 5, 6 and 7) and highlighted in bold where mentioned.

The six categories are as follows. Characteristics for Learning represents perceived strengths of open data when used in learning. Teaching Approach represents elements of the learning activities described. Open Data in Society themes describe ways in which education, open data, and societal issues interact. Creating and Sharing themes highlight social interactions in learning with open data. Skills and Knowledge themes represent ways in which the understanding of learners and educators interacts with the capacity to use open data in learning. Finally, Making Data Usable describes work that is performed in order that learning activities can occur with open data. Below we describe each category in turn and the themes within them.

The document analysis validated the existence of 25 of the 28 themes and identified 65 instances of these themes in total. Tables 2, 3, 4, 5, 6 and 7 identify the count of documents in which each theme was found. The nature of the documents analysed meant that certain themes were strongly represented. For example, ‘tools that support learning with open data’ are commonly introduced to students in course materials, and tasks related to ‘communicating with data’ were often given to students. The nature of the documents can also explain the lack of instances of the three themes that were not identified. Educators would be unlikely to describe the need for ‘developing educator data literacies’ in their own teaching materials. The potential for generating ‘supply prompted by demand for open data’ through educational activity would not be a priority to communicate in these situations. The challenge of ‘finding raw data’ reported by educators was implicit in the examples of data sets used, but not explicitly discussed in these documents. Beyond validating the themes, the documents provide useful elaboration of them, and examples of this are integrated with the interview data in the descriptions below.

Characteristics for learning

A range of valuable characteristics of open data were identified by educators (Table 2). Firstly, the broad scope of the materials can support a wide variety of novel activities. For instance, E’s assignment guidance states that:

“Canada’s Open Data portal is http://data.gc.ca/, where you can find tens of thousands of data sets from various government ministries and agencies. One goal of releasing Open Data is to… use this data to provide services or information to Canadians that the government does not have the resources or the mandate to provide.” (E, Assignment Guidance).

Table 2 Themes identified within the ‘characteristics for learning’ category

Related to this, open data can have relevance to the learner or to current affairs, and this is seen to have motivational qualities. As D surmised: “You take a map with some data, the first thing (students) do is put their postcode in. ‘Can I see my house?’” (D). Similarly, K took open data on test scores in local schools back into the classroom to teach statistics, and found that:

“They have family or friends who go to different schools in their neighbourhoods. That drab learning experience around box plots was transformed into something completely different” (K).

Open data offers universal access to digitised records and analyses from a range of contexts. It can therefore create advanced possibilities for activities where learning occurs in ways that previously would have been only the realm of professional or higher-level research activity, or would have been too time consuming to perform. Interviewees noted how the use of these databases changed the possibilities in learning activities and outcomes. For example, G teaches college bioinformatics and described how they run lab activities in which students explore plant genetics based around an open scientific database: “They pick a gene, they research it, they even do some transformations with the gene” (G). This database is designed for researchers, rather than education. However, with sufficient guidance, the labs allowed students “to experience how all this data is just there and if we work together we can get better research results” (G). They were enthused by the way that open data brought students closer to the world of professional practice, and “found it is not actually that hard for them to interpret the data once they figure out how to get there” (G) by being guided through the interface. Open data can therefore support transitions to higher levels of scholarship and help to close gaps between teaching and research.

Related to this, the real-world provenance of the data, and the processes through which it is collected, offers authenticity for the exploration of rich and complex issues, and for development of skills in real-world contexts. In the case of the plant genetics database, G described the unpredictability that arises from the database, in contrast to a contrived textbook example. When “comparing genes you would expect to have a certain outcome. But based on what gene you pick, you may not have the data line up as you would expect…(students) ask me how come this box is red rather than green. And I get to say ‘I don’t know. That is an open question. You get to figure that out.’” (G).

Other educators were keen to use open data as a basis for learning data literacies. Interviewees expressed that developing capacity to understand data was important to learners and societies. There was seen to be potential through this to address imbalances where, for example, there was a “culture where people rely a lot on what politicians say rather than the data” (F), or for E, to make important data accessible for the public to understand.

Teaching approach

A variety of themes highlighted the ways in which learning activities were designed to harness open data and to respond to the challenges of matching it to student capabilities (Table 3).

Table 3 Themes identified within the ‘teaching approach’ category

Eight of ten interviewees employed forms of project or inquiry-based learning to harness the relevance, broad scope and authenticity characteristics described above. The lesson plans used by C were structured in project phases of “designing, exploring, analysing and telling”, and the adaptation of inquiry-based approaches to use of open data was considered relatively straightforward, with D stating that:

“In scientific experimentation, often that says ‘you collect your data’, but you can collect your data from an open data source. Really it is the same scaffolding.” (D).

The broad scope of data sources supported students to make choices for personalisation, through which the potential for the characteristic of relevance to the learner could emerge. Teaching materials from C and E guide students to explore a particular data repository in order to develop an interesting or “non-trivial” (E) project, while identifying a topic and any relevant open data sources is an initial stage in the activity developed by F.

Most educators saw the capacity to harness connections by working across multiple data and information sources as important. This theme was prevalent in the document analysis, where assignments and examples of student activities showed the value of combining sources. For example, educator E provided an example to their students where border agency data on expulsions from the country was combined with other sources of data on crime and demographics to identify correlations. The examples they share of assignments created by their students include various comparisons of data from different sources. One such assignment analysed mortgage rates in conjunction with data on average home rental prices over time to show changes in the financial implications of home ownership or renting.

Where assessment was important, process-based assessments were mentioned. These focused on how well students had followed a process and used sources, rather than on accuracy or a specific goal. For example, A stated that they “judge them on… the effective use of the sources they have got. They (should) go off and find other sources.” (A).

Although they were keen to support students to connect multiple sources and for the authenticity of the data to emerge, educators could also use various forms of simplification to reduce the burden of complexity and support the achievement of learning outcomes in the time available. This was particularly evident in school-based activities. In working with teachers to create such activities, D noted that:

“we’ve got three sessions (in the school) but in reality if someone is doing a genuine data analysis of these kinds of complex data sets (they might take) five or six weeks.” (D).

Simplification could also include devising small test activities before a main project activity began, and scaling up from small data sets in which each value and computation could be understood. For example, one starting activity restricted students to a small number of values within the data, which let pupils “see that: ‘OK if I can make it work with 10 values it will also work with 12,000’” (J).
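The scaling-up strategy described by J can be illustrated with a minimal sketch. It is not drawn from any interviewee's materials: the temperature readings are generated purely for illustration, and the function name `mean_temperature` is a hypothetical choice. The point is only that a computation checked by hand on 10 values can be applied unchanged to 12,000.

```python
import random

def mean_temperature(readings):
    """Return the arithmetic mean of a list of temperature readings."""
    return sum(readings) / len(readings)

# Hypothetical stand-in for an open data set of 12,000 readings
# (illustrative values only, not real data).
random.seed(0)
all_readings = [round(random.uniform(-5.0, 30.0), 1) for _ in range(12_000)]

# Step 1: a sample small enough that every value can be inspected
# and the result checked by hand.
small_sample = all_readings[:10]
small_mean = mean_temperature(small_sample)

# Step 2: the same, unchanged function applied to the full data set.
full_mean = mean_temperature(all_readings)
```

In a classroom setting, pupils would verify `small_mean` manually before trusting `full_mean`, which is one possible reading of the progression from 10 to 12,000 values.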

Open data in society

The rationale for and use of open data were influenced by the open data movement and data-producing organisations, and the engagement of learners with open data was considered a means towards societal change (Table 4). This was highlighted when interviewees expressed that their learning activities had the potential for real-world impacts, even if these were small or sporadic. For example, students had added information to a public research database. Open data also held potential for infusing education with politics and governance. Some educators, such as C, directly use open data as a means to engage learners with political issues like spending. Others were more concerned with developing public engagement with open data through education. For example, F noted that:

“…now that we have this base of available open data, there is the issue of getting people to use it. And getting people to understand the possibilities for their own work.” (F)

In this, supply and demand for open data are intertwined: if open data is used by the general public through education, this can create demand for more data to be released in usable forms. As E noted: “It is not enough to work with the open data that is there. You should also be asking questions about the data that is not there” (E).

Table 4 Themes identified within the ‘open data in society’ category

Creating and sharing

This category captures how social and collaborative approaches were employed around open data-based activities (Table 5). All ten interviewees identified communicating with data as a key skill to be developed. There was a strong emphasis on creating stories with the data, for example that “what you are using it for is to go in and find a story…that will be compelling” (E). This theme was also identified in six artefacts in the document analysis, showing how communicating with data was embedded in the design of learning activities. For example, the teaching materials used by C and F both included the assignment of a group member to be the ‘storyteller’, who could be responsible for “finding interesting angles to explore and producing outputs that really speak to the intended audience” (F).

Table 5 Themes identified within the ‘creating and sharing’ category

Creating data and sharing it offers another route to make contributions to society and to broaden understanding of the processes and influences on data in our lives. Learners could combine activities around existing open data with others to “collect (data) with thermometers, to geographically map where they took different temperature measurements” (J). Such activities could provide an additional perspective and raise awareness of issues such as accuracy or personal data privacy. F’s organisation provides materials on ‘Collecting data using Smartphones’, as well as on creating maps to represent data. Peer support mechanisms included sharing advice as students grappled with new concepts and skills, and support for the different roles taken in collaboration.

Skills and knowledge

Themes in this category highlight the interplay between the understanding required to enable open data-based activities, and where skills could be developed through these activities (Table 6). Eight of ten educators highlighted that their teaching combined subject knowledge and data literacies, even where data literacies were the primary goal. For example:

(For other educators) “…the real end is understanding more about politics. For me, the real end is… (to) use an interest in politics to get them interested in data” (B). However, they also “wanted data sets where a fairly wide spectrum of people would have some idea of what it is about” (B) and felt that “you should be learning real and interesting things as part of the by product of what you are doing” (B).

Table 6 Themes identified within the ‘skills and knowledge’ category

They therefore chose data sets of wide interest, which provided scope for comparing basic statistics about countries around the world and about health. A foundational familiarity with the subject domain of the data was seen as necessary, and interest in the subject matter as a motivator.

Managing data literacy requirements for students was a key issue for educators, including the technical skills required and the capacity to follow an inquiry process using data. G noted that:

“The only way I’ve managed that is to take it real slow… Because they first have to grasp the computer stuff and then they have to grasp the science… I don’t actually see the two as in conflict. I see tech as being a bigger part of the science nowadays.” (G)

The wider development of student digital literacies was also raised. Skills could be developed in activities; however, there could also be checkpoints to assess existing literacies. For example, A noted that they get their students “au fait with the technology… very early and then we come back to it. So I get them to do mini things… so I know then that they have been able to use the search mechanism” (A).

Six interviewees stated that developing educator data literacies was essential, but it was not clear how this would be facilitated. It was noted that: “open education and open data…(should) meet and have some sort of repository where all these (learning activities) exist, where we can share them” (G). Educators feeling comfortable in their own knowledge of the data was seen as key to making data usable.

Making data usable

A range of themes described preparation and manipulation activities around the open data (Table 7). All interviewees noted that finding good data for a purpose could be a substantial undertaking for educators or learners. Teaching materials from F identified the ‘scout’ as a member of each team whose role was to find relevant data, and to find information about its quality and limitations. Educators looked for certain qualities in these sources, for example, that they offered raw data, rather than aggregations that were of limited value for some purposes. Refining and curating data sources was a common undertaking to simplify the learning activity, with “a lot of curation and cleaning up and digging…before that open data set has any value in the context of learning” (K).

Table 7 Themes identified within the ‘making data usable’ category

Four interviewees emphasised that poor choices in data formats and documentation could lead to the potential misinterpretation of data by students and additional challenges for educators. A lack of clarity or consistency in how data is captured and analysed could obscure issues of accuracy and changes to the data, to which educators then had to respond. Interviewee H described how, on returning to run a second presentation of their course, they were surprised that the same data points had changed and no longer matched the course materials. This was because these points were based on estimates that had altered after the inclusion of additional data. As in other tasks to make data usable, the work required to assess accuracy could become part of the learning activity. For example, the skills taught in D’s project were stated as including “understanding the validity” and engaging students in assessing “anomalies, bias, and completeness” in the data used.
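The kind of check that H's experience suggests, comparing the data used in one presentation of a course with the data available for the next, could be sketched as follows. This is an illustrative assumption rather than a method reported by any interviewee: the `changed_points` function, the snapshot dictionaries, and all values are hypothetical.

```python
def changed_points(old, new):
    """Return {key: (old_value, new_value)} for every key whose value
    differs between two snapshots, including values that are missing."""
    keys = set(old) | set(new)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(keys)
        if old.get(key) != new.get(key)
    }


# Hypothetical snapshots of the same indicator, saved for two course runs.
snapshot_first_run = {
    "Country A 2010": 71.2,
    "Country B 2010": 64.8,
    "Country C 2010": 80.1,
}
snapshot_second_run = {
    "Country A 2010": 71.2,
    "Country B 2010": 65.3,   # estimate revised by the data producer
    "Country C 2010": None,   # value no longer published
}

differences = changed_points(snapshot_first_run, snapshot_second_run)
for key, (before, after) in differences.items():
    print(f"{key}: was {before}, now {after}")
```

A comparison of this kind could be run by the educator before a course begins, or given to students as a small exercise in assessing completeness and change in the data.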

Six interviewees described tools that support learning with open data. Open data was commonly manipulated in spreadsheets and this was seen to help students to review data and formulas. However, some interviewees noted that these general tools for data manipulation and analysis offered poor support to learn from mistakes. For example: “Most spreadsheet software, if you don’t get the exact syntax …they will give you an error message and that’s pretty much it” (K). The potential to simplify understanding of data using visualisation was also raised, to support students “to be able to look at a whole lot of variables at once, using graphical metaphors that are pretty easy to understand” (B). B and K were involved with projects to produce tools for learning with open data. Tools were described in eight of the analysed documents, including in lesson plans, course materials, and assignment guidance. This shows the importance of guiding learners to understand tools as part of the process of using open data in learning, and again emphasises that learning with open data encompasses learning data and digital literacies alongside subject knowledge. It also highlights the need to simplify and scaffold learning through choice and guidance around tools, with A and J highlighting how simple interfaces to open data, such as Gapminder (2018), were instrumental to their work. Figure 1 shows a Gapminder tool, as used by J, which provides simple ways to compare data about countries over time.

Fig. 1 Gapminder interface for exploring open data (author Gapminder.org, CC-BY)

Discussion

The analysis is now used to answer the research questions and construct a conceptual framework through which support for the use of open data in education can be understood.

Why would educators make use of open data in their teaching?

Educators identified positive characteristics that could emerge through the use of open data in learning. The broad scope for exploration of varied subject matter, and the potential relevance to the learner, were perceived to motivate learners. Open data also presented advanced possibilities for activities that reflected higher level professional or academic practices, and the potential for authentic learning experiences drawing on real-world data and processes. A final characteristic is that open data provides a basis for teaching data literacy. This was a motivation for educators engaged with the themes of the ‘Open data in society’ category, as these data literacies are essential to developing awareness of, and public participation with, open data.

These characteristics can link with, and extend, familiar concepts in education that interviewees implicitly or explicitly followed in their teaching. This provides avenues to inform next steps in research and practice. To highlight two such links: research evidence suggests that relevance can improve educational outcomes (Mittelmeier et al. 2018; Cordova and Lepper 1996). Authentic learning research (e.g. Herrington and Herrington 2006) would suggest that learning with open data should go beyond just accessing a ‘real’ data set, and include elements such as authentic collaboration and assessment of outcomes. The themes within the ‘Creating and sharing’ and ‘Open data in society’ categories show how some educators developed a broader sense of authenticity by getting students to communicate, collaborate, and engage with the wider world through open data.

Using these themes, the rationale of interviewees could be classified into three groups: those doing subject matter teaching who used open data to create advanced possibilities (A, G), those seeking to develop public engagement through open data with authentic inquiry (C, F, J), and those using the relevance and broad scope of open data to motivate learners and develop data literacies (B, D, E, H, K).

What approaches do educators employ when using open data?

Themes in the ‘Teaching approach’ category provide a foundation for answering this question, but other categories are connected. Project or inquiry-based activities were commonly employed, and these were a familiar pedagogy for most interviewees. This facilitated the choice of a topic and of data as a means to achieving personalisation. Research suggests that there are alternative ways to provide such personalisation, and insufficient clarity on where and why it has a positive impact (Fitzgerald et al. 2017). Educators built in encouragement to harness connections across multiple data sets and for sharing and communicating with data. In this regard, open data-based learning could be linked with Connectivism (Siemens 2005), and the principles associated with this could form a basis for activity design.

The above themes focus on harnessing positive characteristics of open data, yet teachers also designed activities to overcome particular challenges. Simplification of complex activities was required. This relates to themes present in the ‘Skills and knowledge’ category, where learners’ and educators’ data and digital literacies may need to be monitored and developed in tandem with subject knowledge. Research on scaffolding strategies could be particularly relevant (Belland et al. 2013), given the need for complex skill development in the context of motivation-based, authentic and ill-structured tasks.

How do educators incorporate open data into their teaching and what challenges exist in this?

The strong sense of value in using open data was tempered by practical challenges in implementation. Themes within ‘Making data usable’ highlight barriers for many educators. Finding good data for a purpose, developing awareness of formats and documentation, and refining data in order to make it usable were not trivial tasks for educators or learners as they explored how to harness the value of open data.

Greater educator data literacy would support wider use of open data in learning, and this could be connected with a wider data-driven agenda for education, in which it is argued that educator data literacy is essential for evaluating and improving teaching. Mandinach and Gummer (2013) identify a growth in awareness that all educators must use evidence in decision making and therefore need training in data literacies, and Kippers et al. (2018) describe a program that significantly improved the data literacy of teachers. If this wider agenda brings advances in data literacy, it could overcome some of the existing barriers to open data as a mainstream educational material.

Challenges raised by interviewees arise from gaps between producers of data and the educational users. Raw data may not be available in the form that would be most useful, and the educators or learners may not fully understand the accuracy of the data or its propensity to change. It is therefore pertinent for educators to consider how they might be influential in the wider ‘Open data in society’ space. There is potential for educators and learners to engage in the demand for open data, and therefore influence its supply. Tools that support educational use of open data, whether designed specifically for this purpose or having appropriate characteristics, are also sought after as a means to alleviate burdens and enhance the learning potential.

Illustrating the connections between rationale, approach and challenges

To illustrate how rationale, approach and challenges are interrelated in the interviewees’ activities, the overall approaches of the three differing rationale groups are now described.

Creating advanced possibilities for subject matter teaching

For A and G, open data enhances the activities they use to teach their subject. A uses open data to get students to understand how legal trials operated historically. Students can choose a single trial or set of related trials on a theme of interest. Students produce a group presentation in which they enact a trial and discuss it, and an individual written report. Assessment is process-based, including the effective use of multiple sources and the cohesion of the presentation. Open data is valuable to this pedagogical design because it overcomes the limitations of access to, and understanding of, printed records. It supports inquiry across multiple sources. One source is at the core but two others are pointed to for supplementary information. A and G emphasise the ease of use of these resources, including tools built into the websites for visualisation and search, and also emphasise the challenge of ensuring that students have appropriate data and digital literacies.

Develop public engagement through open data with authentic inquiry

For C, F, and J, open data is a basis for public engagement with societal and political issues. This requires the development of literacies in data and civic participation. For example, F’s initiative aims to develop greater data literacy in the public to tackle local concerns. This is achieved through devising inquiry-based structures which can adapt to the interests of the local groups taking part. There is an emphasis on identifying a problem that is of authentic interest and relevance. The learning process is driven by this problem, so if the existing open data is inadequate, the response is to identify or collect further datasets. Skills are developed in areas such as communicating with data in journalism or lobbying. C and J work in schools, but present a similar emphasis on basing inquiry in authentic problems and relevance to current affairs.

Use relevance and broad scope to motivate learners to develop data literacies

B, D, E, H, and K all teach topics related to data literacy, including programming for data analysis, or data visualisation. For B, the pedagogy of the MOOC provides choice to learners to personalise their data analysis activities with datasets that have been simplified to reduce the burden or potential for mistakes. B, D, and H emphasise that supporting the choice of data with personal relevance (for example, local to the student) is a tool to motivate interest where this can otherwise be lacking. Requirements for subject matter knowledge and data literacies are again a tension, but here it is the subject matter knowledge that can be missing: data cannot be understood without context. B selects data sets around widely-understood topics such as health. E guides students towards using data sets where they have some existing knowledge.

A framework to support greater use of open data in learning

A conceptual framework using this analysis is presented in Fig. 2. This is grounded in the themes raised in the interview analysis, but elaborates on the relationships between the categories which suggest how further practice, research and development could support open data to become a mainstream part of learning activities. This framework can guide thinking around the construction of learning activities using open data, or could inform wider initiatives to engage educators with open data.

Fig. 2 Framework for using open data in learning activities

The framework highlights the interconnected nature of each category. Here we highlight some of the relationships across these categories that should be a focus for further research and development.

Identify shared goals for cooperation between educators and open data actors, leading to tools and initiatives that make open data more usable with less effort

Educators could engage large numbers of people with relevant data, and activities can combine use of existing open data with sharing and advocacy for release of data. The responsibility for making open data usable for education is unclear and mutually beneficial models of cooperation could support the tasks of ‘Making data usable’, increasing the potential for educators and learners to engage.

Harnessing motivations and personalisation towards real world impact from learning, to create shared artefacts and communications that improve the potential to use open data

Open data is grounded in societal and organisational practices and can support authentic learning experiences. However, open data is rarely devised for educational purposes, and is often poorly specified. Sharing the structures, created data, and outcomes devised in educational activities using open data could provide a means to reduce these burdens.

Work to make data usable requires and develops literacies; education should be integrated with open data initiatives to support the development of literacies

For educators and learners, data literacies are both a requirement and potential learning outcome. At the same time, pedagogical expertise from the education sector would be beneficial in reducing barriers to public engagement with open data, which is a goal of many open data initiatives.

Limitations and future work

In line with many qualitative studies, this research has focused on in-depth analysis of rich data from a small sample. The limited mainstream use of open data in education to date constrained the sample of interviewees and projects. The research was also limited to English-language speakers and future research should rebalance this. Early adopters may act in some ways that are not generalisable to mainstream educators. This research has identified a set of reasons and approaches to using open data in teaching. It did not seek to measure the effectiveness of open data in a specific pedagogical approach. Future work should seek to measure whether specific pedagogical designs using open data are effective for students, building on the views of educators and the links to related pedagogical research described here.

Conclusions

By working with early adopters this research identifies how open data has potential as an authentic and relevant material that expands the possibilities for learning activities. It can be used to develop public engagement with data. It offers advanced possibilities for learning activities across a range of subject matters, and can utilise the personal interests of learners to motivate learning of data literacies. Similar themes arose from interviewees working in diverse educational contexts, from schools to MOOCs.

Limited mainstream use in education can be attributed to challenges identified here. Most open data sets are being released from real-world professional practices, such as scientific research or policy-making. Learners and educators may not have the literacies or resources to make these data sets useful. Tools that simplify some of the complexities of working with open data are emerging, but there remain challenges to making data usable. These challenges are not insurmountable, and their repetition across interviewees suggests that shared solutions would be valuable. The framework in Fig. 2 highlights relationships between the themes that would be part of these solutions.

Data-sharing organisations could be actively engaged in providing suitable forms of data and support. Identifying mutual benefits for these organisations to engage with learners is one way forward. Partnerships in which educators and learners engage with the data producer could also increase the authenticity of learning experiences. However, open data can be used and adapted independently of the data-sharing organisation. A second approach therefore relies on educators, learners, and educational technologists to build a layer of support and translation on top of original data sources that makes them suitable to the teaching approaches described above. This would be appropriate to working across data sources and utilising connections between these. Some evidence for the emergence of both of these approaches could be found in the interviews, and they are not mutually exclusive.

To increase effective use of open data in learning, there is a need to understand how to harness the qualities identified here effectively in the design and implementation of learning activities, and of supporting approaches and technologies. The adaptation of existing research and pedagogy into this endeavour should be fruitful, and the analysis and framework presented here offer a conceptual starting point through which to direct these efforts.