Defining and conceptualising data harmonisation: a scoping review protocol
Data harmonisation is an important intervention to strengthen health systems functioning. It has the potential to enhance the production, accessibility and utilisation of routine health information for clinical and service management decision-making. It is important to understand the range of definitions and concepts of data harmonisation, as well as how its various social and technical components and processes are thought to lead to better health management decision-making. However, there is a lack of agreement in the literature, and in practice, on definitions and conceptualisations of data harmonisation, making it difficult for health system decision-makers and researchers to design, implement, evaluate and compare data harmonisation interventions. This scoping review aims to synthesise (1) definitions and conceptualisations of data harmonisation and (2) explanations in the literature of the causal relationships between data harmonisation and health management decision-making.
This review follows recommended methodological stages for scoping studies. We will identify relevant studies (peer-reviewed and grey literature) from 2000 onwards, in English only, and with no methodological restriction, in various electronic databases, such as CINAHL, MEDLINE via PubMed and Global Health. Two reviewers will independently screen records for potential inclusion for the abstract and full-text screening stages. One reviewer will do the data extraction, analysis and synthesis, with built-in reliability checks from the rest of the team. We will use a combination of sampling techniques, including two types of ‘purposeful sampling’, a methodological approach that is particularly suitable for a scoping review with our objectives. We will provide (a) a numerical synthesis of characteristics of the included studies and (b) a narrative synthesis of definitions and explanations in the literature of the relationship between data harmonisation and health management decision-making.
We list potential limitations of this scoping review. To our knowledge, this scoping review will be the first to synthesise definitions and conceptualisations of data harmonisation in the literature as well as the underlying explanations in the literature of the causal links between data harmonisation and health management decision-making.
Keywords: Data harmonisation, Data linkage, Health information exchange, Routine health information system, Scoping review, Health management decision-making
HIE: Health information exchange
LMICs: Low- and middle-income countries
MeSH: Medical Subject Headings
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RHIS: Routine health information system
WHO: World Health Organization
An effective health system relies on a routine health information system (RHIS) that provides the informational support needed by health managers to identify gaps in service delivery and to inform planning, implementation and monitoring of interventions. However, many countries, especially in low- and middle-income settings, do not have well-functioning RHISs to monitor and evaluate their work [2, 3]. This limits countries’ ability to improve the effectiveness, efficiency, quality and equity of their health services.
Given the increasing availability of large electronic databases of routine health information, health authorities and managers, information technology (IT) stakeholders and researchers have identified data harmonisation as an important intervention for strengthening RHISs [4, 5]. There is often a lack of coordination and integration of large electronic databases. This is typically due to inconsistencies between key variables and indicators for collecting, analysing and reporting health information across programmes; the production of poor quality data that cannot easily be exchanged; and programmatic fragmentation across levels of the health system, which can result in the duplication and excessive production of data. Data harmonisation has the potential to address all these problems, through coordination, linkage and integration of existing large-scale databases [6, 7, 8].
Harmonised data sets also have the potential to improve informational support for health management decision-making and, in turn, support health systems strengthening [9, 10]. However, data harmonisation interventions may take on different parts of the problem of fragmented systems, use different definitions and have different intended outcomes with regard to improving RHISs. This makes it difficult to compare interventions and to assess their usefulness for improving health systems functioning. A second challenge is that, even when quality and timely health information is available, limited access to and use of it by management for planning, monitoring and evaluation and quality improvement is still a problem [6, 7, 9]. Data harmonisation has the potential to provide timely, relevant and accessible informational support for health management decision-making [12, 13], but we need to better understand how data harmonisation might actually work to improve decision-making. In this review, we are interested in learning more about the scope of data harmonisation definitions and activities, as well as how those working in this field understand its effect on management decision-making. Our assumption is that harmonised routine health information may increase access to and use of relevant routine health information, which could improve management decision-making and the tasks of monitoring, evaluation, planning and ongoing quality improvement. We define effective management decision-making as the proactive and interactive process that demands and uses the best available data (well-integrated, complete and accurate data) during programme development as well as monitoring and evaluation.
Why it is important to do this scoping review
There is growing recognition that the successful implementation of data harmonisation interventions occurs in multiple technical and social (i.e. organisational and behavioural) contexts. This multi-faceted nature of data harmonisation has resulted in a range of different terms being used for interventions with similar aims and activities. For example, terms such as data integration, data linkage and health information exchange are all used to describe data harmonisation-type activities, and it is not always clear to what extent these efforts are similar in practice, scope and relevance. While the use of multiple terms is not a problem in itself, lack of clarity on what constitutes ‘data harmonisation’ makes it difficult to compare studies and synthesise evidence on impact.
Lack of understanding of the underlying causal mechanisms between data harmonisation activities and the intended outcomes for health management decision-making also makes it difficult to compare interventions and to evaluate their impact and implications for health systems strengthening. Having a clearer idea of the range of definitions and concepts used, the various components and activities included in data harmonisation interventions and the proposed underlying causal mechanisms being tested can help inform researchers and health system decision-makers on the design, implementation and evaluation of different data harmonisation interventions.
Since 2012, there have been three systematic reviews on data harmonisation and related activities, indicating a growing interest in the topic. The reviews were concerned with the integration of health information found in multiple databases across multiple organisations, for the purposes of clinical and service improvements, and for research analyses. One review focused on the determinants of RHIS performance and its role in improving health systems functioning and performance at the local level. Another focused on the views of health care professionals on data sharing or data linkage of clinical data for research purposes, while the third focused on barriers to and facilitators of health information exchange (HIE) in low- and middle-income countries (LMICs). Consistent with what was found in primary studies of data harmonisation processes, these reviews used a variety of terms to explain the integration and exchange of health information. Data harmonisation was defined both narrowly and broadly depending on its objectives: in one review, data linkage was used solely to describe the technical stages of combining multiple databases, while in another, health information exchange was used to describe similar as well as broader processes involving multiple stakeholders to mobilise information across various systems, organisations and geographical areas. It is important to identify and synthesise these variations in terminology in a systematic way, both to reflect the range of activities and to identify the commonalities, and to build an understanding of how data harmonisation interventions are thought to work to support the different needs of implementers and/or users of harmonised data.
This scoping review will follow the methodological stages for scoping studies proposed by Arksey and O’Malley, who recommend a process that is “not linear but iterative, requiring researchers to engage with each stage in a reflexive way” in order to achieve both ‘in-depth and broad’ results. The steps involved are identifying the research question, identifying relevant studies, selecting studies for inclusion, data extraction and data synthesis.
Study question and objectives
To identify and synthesise the characteristics of studies of data harmonisation;
To identify and synthesise the various definitions and concepts used to describe data harmonisation interventions, and
To develop a conceptual understanding of explanations in the literature of the causal relationship between data harmonisation interventions and health management decision-making.
In order to inform our understanding of the causal mechanisms (including the role of key contextual socio-technical dynamics) (objective 3), we will draw on information extracted for objectives 1 and 2 and, in addition, extract data on the descriptions of the components, processes, contexts and intended causal pathways of data harmonisation interventions. Such a synthesis has the potential to broaden and clarify the knowledge base of researchers and health management about the range of and variation in data harmonisation interventions, and the intended relationship between the components (individually or in combination) and management decision-making.
Identifying relevant studies
Peer-reviewed research studies (no methodological restrictions) and grey literature on data harmonisation in health-related information databases are eligible if they (a) provide a definition and/or a conceptualisation of data harmonisation (and/or related terms), and/or (b) provide a description of a data harmonisation intervention (in terms of components, processes and causal mechanisms), and/or (c) contribute to an explanation of the causal relationship between data harmonisation and health management decision-making (for example, through improved quality and accessibility of harmonised information for management and/or the utilisation of harmonised health information for management decision-making). Studies concerned with various technical aspects of data harmonisation, such as changes in key variables and indicators, software and hardware infrastructure for data generation, and reporting and feedback procedures, are also eligible, provided these aspects are considered part of a data harmonisation intervention.
MEDLINE via PubMed
Science Citation Index and Social Sciences Citation Index, ISI Web of Science
Relevant websites, such as the World Health Organization (WHO) and MEASURE Evaluation websites
Search terms will include a distillation of keywords and Medical Subject Headings (MeSH) terms related to data harmonisation (concept A) and health information system (concept B). We have developed a preliminary search strategy using relevant keywords and MeSH terms (see Additional file 1). To ensure that we do not miss potential studies, we will apply an iterative approach using known studies that meet the inclusion criteria identified during preparation of the protocol. Studies known to meet the inclusion criteria will be searched for among “hits” (search records) and used to identify new keywords and MeSH terms not already included in the search strategy. Once the search strategy has been finalised using the PubMed database, we will tailor it to each database and report on the adaptations. Searches will be limited to English as we do not have the resources required for reviewing non-English literature. There will be no geographic restrictions.
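As a rough illustration of how the two concepts could be combined into one search string, the sketch below joins the terms within each concept with OR and links the two concepts with AND. The term lists are assumptions taken from this protocol's keywords, not the preliminary strategy in Additional file 1, and the function names are hypothetical; database-specific field tags and MeSH syntax are deliberately left out.

```python
def or_block(terms):
    """Join the quoted terms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(concept_a, concept_b):
    """Combine two concepts so that a record must match at least one term from each."""
    return or_block(concept_a) + " AND " + or_block(concept_b)


# Illustrative terms only (assumed from the keywords of this protocol).
concept_a = ["data harmonisation", "data linkage", "health information exchange"]
concept_b = ["routine health information system", "health information system"]

query = build_query(concept_a, concept_b)
print(query)
```

New keywords found in known eligible studies during the iterative step would simply be appended to the relevant list before the string is rebuilt and tailored to each database.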
In addition to the electronic searches, review authors will (a) search the reference lists of all included studies and key references (for example, relevant systematic reviews) and (b) contact authors of included studies and/or experts in the field for additional references.
Selecting studies for inclusion
The initial search across the different sources will generate a database of records (titles and abstracts) of potentially relevant studies. The search results will be collated in the EndNote reference management programme and duplicates removed. The final search database will then be uploaded into Covidence, an electronic programme designed for managing the screening process in systematic reviews (https://www.covidence.org). Two reviewers (BS and AH) will then independently screen the records to evaluate their eligibility for full-text review. The full texts of those studies identified as potentially relevant will be retrieved and read by the two reviewers to make a final decision about inclusion. During this full-text review stage, where necessary, study authors will be contacted for further information. At both the abstract and full-text screening stages, conflicts will be resolved by the two reviewers (BS and AH) first attempting to reach a consensus view; failing this, a third reviewer (NL) will be the final arbitrator. The study selection process will be summarised using a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram.
We will use a combination of sampling techniques, including two types of ‘purposeful sampling’, a methodological approach that is particularly suitable for the focus of our scoping review. These sampling techniques are intended to address both the breadth (for example, exploring the characteristics of studies on data harmonisation) and depth (for example, definitions, concepts, components, processes and explanations of causal mechanisms of data harmonisation) [18, 19] of the review.
For objective 1, we will not apply any sampling strategy, in order to ensure we capture the characteristics of the widest range of studies on the topic. For objectives 2 and 3, we will apply both maximum variation sampling (to identify both variation and similarities in definitions, concepts and intervention descriptions) and theoretical sampling (where we will sample in relation to emerging theoretical insights and questions to provide a sufficiently ‘rich’ synthesis of descriptions of underlying causal mechanisms). The theoretical sampling will be iterative: we will start by synthesising emerging insights and may then loop back and look for more studies.
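The conflict-resolution rule for screening can be written out as a small decision function. This is only a sketch of the logic described above, not part of Covidence or any other tool; the function and parameter names are hypothetical, and `True` is assumed to mean "include".

```python
from typing import Optional


def resolve_screening(first: bool, second: bool,
                      consensus: Optional[bool] = None,
                      arbitrator: Optional[bool] = None) -> bool:
    """Return the final include/exclude decision for one record.

    first, second: independent decisions of the two reviewers.
    consensus: decision reached after discussion, or None if none was reached.
    arbitrator: decision of the third reviewer, used only as a last resort.
    """
    if first == second:
        return first  # reviewers agree, no conflict to resolve
    if consensus is not None:
        return consensus  # conflict settled by discussion between the two reviewers
    if arbitrator is None:
        raise ValueError("unresolved conflict: the third reviewer must decide")
    return arbitrator  # third reviewer is the final arbitrator


print(resolve_screening(True, False, arbitrator=True))
```

The same rule would apply at both the abstract and the full-text stage; only the material being judged differs.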
Data extraction or ‘charting the data’
The process of data extraction and sorting will be done in Excel, using the data items in the data extraction framework (Fig. 1) to fill in information for each of the items in the framework. This will also allow for comparison of key items across studies and allow for synthesis within and across data items (for example, comparing definitions across studies, or comparing within one study, the definition and the description of the intervention components and processes).
As this scoping review aims to identify various characteristics, definitions and causal mechanisms of data harmonisation, we will not conduct any risk of bias or quality assessment of included studies. This approach is consistent with scoping reviews of similar aims and methodological frameworks for conducting scoping reviews [15, 20, 21].
Data synthesis or ‘collating, summarising and reporting the findings’
One review author (BS) will conduct data analysis, using manual coding and data synthesis methods on the extracted data from included studies. Another reviewer (NL) will review the data analysis work on an ongoing basis as an additional quality check.
This review will combine quantitative and qualitative syntheses to provide an overview of our findings. First, we will present an overview of all the included studies using a numerical analysis of the key characteristics of the studies. The numerical synthesis will include the following categories: income level of the country, the level of the health care system targeted in the intervention (for example, primary health care, hospital-level or community-based health care), the particular type of routine health information system involved (for example, clinical care, finance, human resources or drug supply information systems), the governance/management level targeted in the intervention (for example, facility, district, regional or national levels) and the type of patient population or disease programme (for example, non-communicable disease or adult reproductive health).
The second synthesis approach will be a qualitative narrative synthesis of data harmonisation definitions and of the conceptual models for understanding how data harmonisation is meant to improve health management decision-making. We will collate and summarise definitions of data harmonisation and related concepts describing data harmonisation activities by looking for the key components across definitions and for key variations. We will code and synthesise the extracted data to identify the key issues that emerge regarding components and processes of data harmonisation interventions, the expected outcomes and impacts, and the factors influencing data harmonisation effects on management decision-making (including the steps of production, access and/or utilisation of health information).
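A numerical synthesis of this kind amounts to counting studies per category and reporting each count with its share of the total. The sketch below shows one possible way to do this; the three records are made-up placeholders rather than extracted data, and the field names and the `tabulate` helper are assumptions for illustration only.

```python
from collections import Counter

# Placeholder records for illustration only; these are not review findings.
studies = [
    {"id": "S1", "income_level": "low-income", "care_level": "primary health care"},
    {"id": "S2", "income_level": "middle-income", "care_level": "hospital-level"},
    {"id": "S3", "income_level": "middle-income", "care_level": "primary health care"},
]


def tabulate(records, field):
    """Count records per category of `field`; return {category: (n, percentage)}."""
    counts = Counter(record[field] for record in records)
    total = len(records)
    return {category: (n, round(100 * n / total, 1)) for category, n in counts.items()}


income_table = tabulate(studies, "income_level")
care_table = tabulate(studies, "care_level")
print(income_table)
print(care_table)
```

Each characteristic listed above would get its own table, which keeps the categories comparable across studies without implying any quality ranking.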
To summarise, the numerical and narrative synthesis will result in three sets of findings: (a) an overview of key characteristics of data harmonisation studies, (b) the definitions and conceptualisations of data harmonisation, and (c) a narrative synthesis of the relationship between data harmonisation and health management decision-making.
Finally, we will ensure that the reporting of our findings is aligned with the PRISMA 2015 statement presented in Additional file 2.
Ethics and dissemination
This is a scoping review of completed studies, so no ethical approval is required. The results will be disseminated through peer-reviewed publications and conference presentations as well as shared with local and national stakeholders engaged in data harmonisation projects.
To our knowledge, this scoping review will be the first to synthesise definitions and conceptualisations of data harmonisation in the literature, as well as the underlying explanations in the literature of the causal links between data harmonisation and health management decision-making. Given time and financial constraints, we will only search for English-language studies published from 2000 onwards, so potentially relevant studies may be missed. Applying purposeful sampling techniques will assist with addressing both breadth and depth of explanation in this scoping review, but it may also result in missing potentially useful content [18, 19]. This scoping review will be of interest to designers, implementers and users of data harmonisation interventions; it will broaden understandings of the range and complexity of studies, definitions, systems, organisations and stakeholders involved in such interventions and of the intended causal pathways for improving health management decision-making.
We would like to thank Ms. Gill Morgan, University of Cape Town, who assisted with developing the search strategy.
Time to write this paper was supported by the US National Institute of Mental Health [grant number 1R01MH106600] and the South African Medical Research Council (SAMRC). The content of this paper is solely the responsibility of the authors and does not necessarily represent the official views of the US National Institutes of Health or the SAMRC.
Availability of data and materials
BS drafted the protocol together with NL and CC. AH contributed to the development of the search strategy. All authors reviewed and approved the final manuscript before final submission for peer review.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1. World Health Organization. Everybody’s business-strengthening health systems to improve health outcomes: WHO’s framework for action. Geneva: WHO; 2007.
2. Lippeveld T, editor. Routine health information systems: the glue of a unified health system. Keynote address at the Workshop on Issues and Innovation in Routine Health Information in Developing Countries, Potomac, March; 2001.
3. World Health Organization. Country health information systems assessments: overview and lessons learnt. Geneva: WHO; 2012.
4. Boulle A. The use of routine health data sources in the Western Cape to address health service questions (presentation). Health Impact Assessment Directorate, Western Cape Department of Health. 2014.
6. Mutale W, Chintu N, Amoroso C, Awoonor-Williams K, Phillips J, Baynes C, et al. Improving health information systems for decision making across five sub-Saharan African countries: implementation strategies from the African Health Initiative. BMC Health Serv Res. 2013;13:S9.
7. Karuri J, Waiganjo P, Daniel O, Manya A. DHIS2: the tool to improve health data demand and use in Kenya. J Health Inform Dev Countr. 2014;8:38–60.
8. Heywood A, Boone D. Guidelines for data management standards in routine health information systems. Measure Evaluation. 2015.
11. Fichtinger A, Rix J, Schäffler U, et al. Data harmonisation put into practice by the HUMBOLDT Project. Int J Spat Data Infrastruc Res. 2011;6:234–60.
12. Lippeveld T. Routine health facility and community information systems: creating an information use culture. Glob Health Sci Pract. 2017;5(3):338–40.
21. Popay J, Roberts H, Sowden A, et al. Guidance on the conduct of narrative synthesis in systematic reviews. A product from the ESRC methods programme. 2006.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.