Background

Childbirth experiences can have immediate as well as long-term positive or negative effects on life, well-being and health [1]. A positive experience can be remembered as an empowering life event [1,2,3] connected to personal growth and self-knowledge affecting the transition to motherhood [4]. A negative birth experience increases the risk of negative health outcomes, such as postpartum depression [5] and future fear of giving birth [6], which can lead to a request for caesarean birth in future pregnancies [7, 8] and have an impact on future reproduction [9, 10]. The memory of a birth can vary over time for the woman, with either more positive or more negative memories being recalled at a later period after birth than directly afterwards [3, 11]. Furthermore, childbirth as experienced by the woman giving birth can differ considerably from how a caregiver or relative experiences the same event. The person beside the woman may focus on more tangible, observable aspects and underestimate psychological aspects. It is therefore important that women are asked about their experiences [12]. Women have the right to dignified, respectful and humane health care during childbirth. Mistreatment of women in childbirth is a violation of women’s fundamental human rights [13]. Such mistreatment can occur both in the interaction between the woman and the health care provider and through systemic failures at the health facility and health system levels. Therefore, there is a need for reliable and validated instruments to highlight women’s experiences and promote respectful and supportive care [14].

Studies on women’s childbirth experiences have used different surrogate terms and related concepts such as ‘childbirth satisfaction’, ‘satisfaction with care’, ‘experiences of control’ or ‘of support’, ‘experience of relationship with caregivers’ and ‘experience of pain’ [15]. Women’s satisfaction with childbirth is multidimensional and affects the childbirth experience [16]. When care in labour and birth is evaluated, women’s experience of childbirth is an outcome of considerable importance to measure. This requires reliable and valid instruments adapted to the purpose. As researchers might select and use different but related terms when studying women’s childbirth experiences, we have chosen to include instruments that use surrogate terms and related concepts in this review.

For an instrument to achieve good levels of reliability and validity, extensive development and testing of its psychometric properties are needed [17]. Without sound psychometric properties, conclusions drawn about the concept may be invalid [18].

No review specifically focusing on instruments measuring women’s childbirth experiences has been found, but there are two reviews evaluating instruments measuring ‘maternal childbirth satisfaction’ [19, 20]. Perriman and Davis identified and reviewed four instruments measuring maternal satisfaction with continuity of maternity care models before, during and after labour and birth. The papers describing the instruments primarily compared outcomes rather than describing the development of the tool [19]. Sawyer et al. identified and reviewed nine multi-item instruments specifically studying maternal satisfaction with care given during labour and birth [20]. To give researchers and clinicians an overview, we performed a systematic review to identify and present validated instruments measuring women’s childbirth experience.

Methods

A systematic review is a rigorous research method that follows a systematic procedure to enable a summary of all findings from multiple studies on a specific topic. The starting point is a rigorous search process to capture the entire body of scientific studies [21]. As researchers might select and use different but related terms when studying women’s childbirth experience [15], we have chosen to use a broad definition and to include surrogate terms and related concepts in this review, e.g. childbirth satisfaction, control, support and fear. The Cochrane guideline was used as guidance [21].

Eligibility criteria

First, a review protocol was developed (see Additional file 1). Inclusion and exclusion criteria were established in advance and documented in the review protocol. Criteria for inclusion in this review were as follows:

  • Papers representing instruments measuring women’s childbirth experience.

  • Papers should describe the development or test psychometric properties of an instrument.

  • Instruments assessing pregnancy, childbirth and the postpartum period were included if one or more dimensions related to women’s childbirth experiences and could be assessed as a separate scale.

  • Papers reporting original research, published in a peer-reviewed journal.

  • Reviews were included to enable us to find original papers.

  • Papers published in English or French were included as the researchers could understand these languages.

Dissertations, non-original research, or conference papers were excluded.

Search strategy

The search strategy was designed and developed following consultation with a healthcare librarian. Before the final search, all authors commented on and agreed to the search string, which was adapted for the individual databases (see Additional file 2). The final search took place in January 2016 in the electronic databases PubMed, Scopus, CINAHL, Cochrane Library and PsycINFO. No restriction on publication date was applied.

In total, 8074 citations were identified (PubMed n = 2785, CINAHL n = 1140, PsycINFO n = 558, Scopus n = 3426 and Cochrane n = 165). For the initial screening, all search results were imported into reference management software (EndNote) and duplicates were removed, leaving 5106 titles and abstracts to be screened for inclusion. First, papers clearly irrelevant to our topic, such as papers assessing childhood development or contraceptives, were removed by one of us (HN). The remaining 809 titles and abstracts were assessed independently by two researchers (HN and an assistant, JC). This identified 266 papers, which were assessed independently by two of the reviewers (HN and MB) to select papers for in-depth full-text assessment. Sixty-nine papers were retrieved in full text and assessed against the eligibility criteria by two reviewers independently (HN and CB, or MB and CB, or HN and MB). Any disagreements were resolved by the third reviewer. Fourteen additional studies were found by searching the reference lists of included papers and were assessed in full text against the eligibility criteria by two independent reviewers (HN and MB). Three of these papers were included after full-text assessment. In total, 83 papers were thus assessed in full text, of which 37 did not fulfil the inclusion criteria and were excluded with reason (see Table 1). The name of each instrument was then searched in PubMed and CINAHL to retrieve further papers related to that instrument. No further papers on the development or testing of psychometric properties of the identified instruments were found. The flow of study selection is shown in Fig. 1.
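The counts reported in the selection process can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures stated above; the variable names are illustrative, and the number of duplicates removed is derived here rather than reported in the text.

```python
# Database hits as reported in the search description.
database_hits = {
    "PubMed": 2785,
    "CINAHL": 1140,
    "PsycINFO": 558,
    "Scopus": 3426,
    "Cochrane": 165,
}

total_citations = sum(database_hits.values())

# Records left after duplicate removal in the reference manager.
after_deduplication = 5106
# Derived value (not stated in the text): duplicates removed.
duplicates_removed = total_citations - after_deduplication

# Full-text assessment: database search plus reference-list search.
full_text_from_search = 69
full_text_from_reference_lists = 14
full_text_assessed = full_text_from_search + full_text_from_reference_lists

# Papers excluded with reason after full-text assessment.
excluded_with_reason = 37
papers_included = full_text_assessed - excluded_with_reason

print(total_citations, duplicates_removed, full_text_assessed, papers_included)
```

Under these figures the database hits sum to the reported 8074 citations, and 83 full-text papers minus 37 exclusions leaves 46 papers for inclusion.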

Table 1 Excluded papers with reason
Fig. 1 Flow chart of study selection

Quality assessment of included instruments

As the aim of this review was to identify and assess instruments measuring women’s childbirth experiences, the focus was not on the quality of the included studies but on the psychometric properties of the identified instruments. These were assessed using the criteria specified by Terwee et al. [17], which refer to the following properties: content validity, internal consistency, criterion validity, construct validity, reproducibility (agreement), reproducibility (reliability), responsiveness, floor and ceiling effects, and interpretability. The properties were rated as follows: + = positive rating, ? = indeterminate rating, − = negative rating, and 0 = no information available. Terwee et al. emphasise the importance of a clear design and method, and that the sample size needs to be greater than 50 subjects in every subgroup of the analysis [17]. In addition to these properties, we added two further criteria. The first considers the need for the instrument: for a positive rating, a search for existing instruments had to have been done, demonstrating the need to develop and test a new instrument. The second relates to face validity: for a positive rating, members of the target population should have been asked about the appropriateness of the questionnaire and of each question.

The measurement properties were rated independently by two review authors (HN and MB, or HN and CB, or MB and CB). When ratings differed within a pair, the differences were discussed and, when disagreement remained, the third reviewer joined the discussion to reach consensus. An overview of the quality ratings of the psychometric properties of the included instruments is displayed in Table 2. The last column in the table gives the total mark awarded to each tool, based on a mark of 1 for each ‘+’, and 0.5 for one or more ‘?’ grades. This is only a rough guide to the overall quality of the instrument and must be interpreted with caution. For example, two tools that both received a mark of 6 may be of very different quality, depending on the criteria for which the points were awarded.
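The overall mark described above can be illustrated with a short sketch. It assumes one literal reading of the rule: each ‘+’ adds 1, and a single 0.5 is added if the instrument has one or more ‘?’ ratings. The function name, the criterion labels and the example ratings are hypothetical and are not taken from Table 2.

```python
from typing import Mapping

# Rating symbols used in the review; both the ASCII hyphen and the
# typographic minus sign are accepted for a negative rating.
VALID_RATINGS = {"+", "?", "-", "\u2212", "0"}


def overall_mark(ratings: Mapping[str, str]) -> float:
    """Return the overall mark for one instrument.

    Assumed reading of the rule: 1 point per '+', plus 0.5 once
    if there is at least one '?' rating.
    """
    values = list(ratings.values())
    unknown = set(values) - VALID_RATINGS
    if unknown:
        raise ValueError(f"Unknown rating symbols: {sorted(unknown)}")
    mark = float(values.count("+"))
    if "?" in values:
        mark += 0.5
    return mark


# Hypothetical ratings for an imaginary instrument across the nine
# Terwee et al. properties and the two criteria added in this review.
example_ratings = {
    "content validity": "+",
    "internal consistency": "+",
    "criterion validity": "0",
    "construct validity": "+",
    "reproducibility (agreement)": "0",
    "reproducibility (reliability)": "?",
    "responsiveness": "0",
    "floor and ceiling effects": "?",
    "interpretability": "+",
    "need for the instrument": "+",
    "face validity": "-",
}

print(overall_mark(example_ratings))
```

As the sketch makes visible, two instruments can reach the same total through entirely different criteria, which is why the mark should be read only as a rough guide.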

In conducting this review, our aim was to identify measures and conduct a broad assessment of their psychometric properties. Given the large number of instruments found, and their very different foci, it was not possible to recommend one particular instrument that would suit all purposes. Instead, some general suggestions are made as to the instruments that appear to be emerging as the top-ranking tools in terms of the quality assessment performed and the overall mark given.

Data extraction and analysis

The following data were extracted for each instrument: Name of instrument/acronym, authors (year), country of origin, aim/motive of instrument, number of items, dimensions/subscales, response scale, timeframe to answer the questionnaire, whether or not the questionnaire was available and a short narrative summary of included instruments. The data extraction was made by the first author (HN) and then checked by the other authors for accuracy.

One of the included papers was authored by one of the review authors (MB). To avoid a conflict of interest, this paper was assessed against the eligibility criteria, and its quality assessment was performed, by the two other authors (HN and CB).

Results

Forty-six articles presenting 36 instruments [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59] measuring women’s childbirth experiences were included for quality assessment. The surrogate terms and related concepts used in the identified instruments were described by the authors as: childbirth experience (27.8%), satisfaction with care/birth/childbirth (36.1%), perception of birth/care (13.9%), control (11.1%), support (8.3%), fear of childbirth (5.6%), childbirth trauma (2.8%), birth memories (2.8%) and childbirth schema (2.8%). For five of the identified instruments, cultural validation or translation had been carried out. Most of the instruments were developed and tested in the United States (6) and in the United Kingdom (6). Further countries represented were: Canada (4), the Netherlands (4), Turkey (3), Sweden (3), Jordan (3), France (2), Italy (2), Australia (1), Senegal (1), and Norway (1). The number of items in the instruments varied from three to 145. Nine of the instruments were uni-dimensional, and 27 consisted of several dimensions/subscales. Quality ratings of psychometric properties are presented in Table 2. Descriptive data of included instruments are presented in Table 3, and characteristics in Table 4. Instruments are reported in alphabetical order by first author.

Table 2 Quality rating of psychometric properties with Terwee et al.’s criteria
Table 3 Descriptive data of the included instruments
Table 4 Characteristics of included instruments

A few of the tools gained a low quality rating, which would indicate the need for further development and evaluation of their psychometric properties. These included: The Childbirth Trauma Index for adolescents [22] (overall quality mark of 2); The Perception of Birth Scale [23, 24] (overall quality marks of 3); Support and Control in Birth [25] (overall quality marks of 4); The Childbirth Experience Perception Questionnaire [26] and The Birth satisfaction scale and the Birth satisfaction scale - revised [27,28,29] (overall quality marks of 4.5); The Birth Memories and Recall Questionnaire [30], The labour and delivery satisfaction index [31] (an instrument developed and evaluated in 1987, and in need of further testing and updating of its psychometric properties), the Women’s delivery experience measures [32], and the Childbirth schema scale [33] (overall quality marks of 5).

In general, we would suggest that tools with marks of 2 to 4.5 are not suitable for use without further testing, especially if there is another existing tool that will serve the same purpose. Tools with a mark of 5 may be suitable if they are the only instrument developed in that topic area, but not otherwise, and further testing before use is recommended.

The majority of tools (20 out of 36, 56%) had marks of 6 or 6.5, which probably indicates a suitable tool, unless there is a higher quality one in the same area. We suggest that the seven instruments with marks of 7 to 9 (Table 2) can be considered valid and reliable although, of course, further testing is always welcome and could improve them further. These included: The Childbirth Experience Questionnaire [34], The maternal satisfaction scale for caesarean section [35], The Responsiveness in Perinatal and Obstetric Health Care Questionnaire [36, 37], Pregnancy and maternity care patients experiences questionnaire [38] and The Childbirth Perception Scale [39]. The tool with the highest quality rating, of 9, was the Wijma Delivery Expectancy/experience Questionnaire [40], an instrument measuring fear specific to labour and childbirth with one version used during pregnancy (version A) and one used after childbirth (version B). The Wijma Delivery Expectancy/experience questionnaire has been used extensively [60,61,62,63,64,65,66] and cultural validation and translations have been made in several countries [67,68,69]. As this scale is commonly used for measuring fear of childbirth, and it is properly developed with good psychometric properties, we recommend this scale for measuring women’s experience of fear in childbirth, when a detailed survey is necessary. However, a number of different cut-off points are used to define severe fear of childbirth, resulting in different prevalence rates, and these should be standardised.
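The suggestions made in the preceding paragraphs can be summarised as a simple lookup from overall mark to suggested use. This is only an illustrative sketch: the function name and the wording of the returned labels are ours, and marks falling outside the bands named in the text are reported as such rather than assigned to a band.

```python
def suggested_use(mark: float) -> str:
    """Map an overall quality mark to the suggestion given in the text."""
    if 7 <= mark <= 9:
        return "can be considered valid and reliable"
    if mark in (6, 6.5):
        return "probably suitable unless a higher quality tool exists in the same area"
    if mark == 5:
        return "may be suitable only if no other instrument exists; test further before use"
    if 2 <= mark <= 4.5:
        return "not suitable for use without further testing"
    return "not covered by the bands described in the text"


# Marks reported in the text for two of the instruments.
reported_marks = {
    "Childbirth Trauma Index for adolescents": 2,
    "Wijma Delivery Expectancy/Experience Questionnaire": 9,
}

for name, mark in reported_marks.items():
    print(f"{name}: {mark} -> {suggested_use(mark)}")
```

The bands remain a rough guide, since the same mark can be reached through different combinations of criteria.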

Discussion

The purpose of this systematic review was to identify and analyse instruments that measure women’s childbirth experiences, and 46 papers representing 36 instruments were identified and included. By including surrogate terms and concepts related to childbirth experience, a broader and more holistic overview of existing instruments was achieved. The identified instruments demonstrated a wide range in purpose and content as well as in the quality of their psychometric properties.

When choosing between different instruments, one needs to consider all ratings together, as well as taking into account those measurement properties that are most important for a specific application, setting and population, e.g. practical aspects such as burden for women, and cost and quality aspects regarding the validity and reliability of the instrument [70]. If the researcher chooses an inappropriate or poor-quality measurement instrument, this may lead to bias in the conclusions, resulting in wasted resources and unethical procedures for the women who participated [71]. Rudman [72] concluded that a multi-item instrument including different dimensions of care, instead of a single global measure, gave a more diverse and richer picture of women’s childbirth experiences but also led to a more negative picture. Choosing the right instrument for a specific context is a complex process for clinicians and researchers. In our results we present an overview of descriptive data and characteristics of the instruments in Tables 3 and 4, as well as a narrative summary of the individual instruments, which can aid in this process.

Terwee et al. [17] consider content validity to be the single most important psychometric property of a questionnaire, and state that only if the content validity is adequate can the questionnaire be considered, and the remaining measurement properties become useful. All instruments in our review received a positive rating for content validity, but a more thorough investigation would still be advisable to see which instruments have the strongest content validity, to aid in choosing an appropriate instrument. Many of the instruments that we identified would need further testing of their psychometric properties to determine which would be best. This is consistent with the finding of Sawyer et al. [20], who evaluated nine questionnaires about women’s satisfaction during labour and birth and concluded that none of the questionnaires had optimal testing of validity and reliability. Most of the instruments in our review reported several tests of psychometric properties, but further evaluation of validity and reliability is needed.

Among the excluded papers (Table 1) are several questionnaires that were not included in this review because the papers did not report on psychometric properties [73] or focused on a study rather than on the development of the instrument [72, 74]. Before using a specific instrument, we suggest that a thorough investigation of its development and testing should be done to ensure good psychometric properties. The US Food and Drug Administration’s guidelines on developing new patient-reported outcome measures suggest that a new instrument can be developed by modifying an existing one [18]. As we found a large number of questionnaires and instruments, we agree with this suggestion. When conducting studies of the psychometric properties of an instrument, we recommend applying standards such as the COSMIN checklist [75, 76] and Terwee et al.’s criteria [17] in order to enhance the quality of the results and to help researchers compare and find an instrument with good psychometric properties.

Several of the papers included in our review concerned the further development and validation of existing questionnaires [23, 26, 41]. In addition, several of the questionnaires have been translated and culturally validated in other languages and cultures [67,68,69, 77,78,79,80].

Methodological considerations

The aim of this review was to identify all studies and instruments that meet the eligibility criteria, but it is possible that we have missed relevant articles written in languages other than English and French, or indexed in databases other than those chosen. A limitation of this search was that we did not use Terwee et al.’s PubMed search filter [81], which may have generated more papers. We suggest that this review can be used as a tool for identifying existing instruments, while acknowledging that researchers will have to assess their chosen tool themselves, given that testing is in most cases insufficient. In their discussion of the quality of systematic reviews of health-related outcome measurement, Terwee et al. [82] raised the need for reviewers to make strong recommendations. Our review covers a large number and wide range of instruments, making it difficult to make such recommendations, particularly as a more thorough evaluation of psychometric properties and quality assessment of included studies is needed. Nevertheless, we have made some suggestions on the use of tools depending on their overall quality score. As we chose to include instruments that use surrogate terms and concepts related to women’s childbirth experiences, this review presents the diversity of instruments developed to researchers and clinicians. For assessing methodological quality, the COSMIN checklist has recently been developed. It is a detailed and rigorous checklist [75, 76], useful in future systematic literature reviews with a narrower construct of interest, where a more in-depth assessment of each instrument, comprising both its psychometric properties and the methodological quality of its development process, would be manageable.

Conclusions

This systematic review provides an overview of existing instruments measuring women’s childbirth experiences and can support researchers in identifying an appropriate instrument for their research purpose. Most of the instruments require further validation and reliability testing. Given the plethora of instruments in use in the literature, and the lack of complete testing for many of them, we recommend that researchers do not develop any more new tools, but instead thoroughly test, adapt and improve those that already exist.

Researchers and clinicians need help in finding and selecting the most suitable instrument for their purpose. This makes reviews of measurement instruments important, as they aid researchers in finding appropriate, established and tested instruments instead of developing new ones. When different instruments are used to measure the same construct of interest, e.g. women’s experiences of caesarean section, it can become difficult in systematic reviews to compare and statistically report the results. We trust that this review can help clinicians and researchers to find the right instrument for their specific context.