1 The Data Demand and Challenge

Poverty is the paramount indicator used to gauge the socioeconomic well-being of a population. After a shock or in a volatile context, poverty estimates can identify who was affected, and how severely. This is particularly relevant in fragile countries, where monitoring poverty dynamics helps measure the country’s progress toward stability, or its increased risk of relapsing into conflict. As one of the main indicators for poverty, monetary poverty is measured by comparing a welfare aggregate, which in developing countries is usually based on consumption, with a poverty line. The poverty line indicates the minimum level of welfare required for healthy living.

Consumption aggregates are traditionally estimated based on time-consuming household consumption surveys. A household consumption questionnaire records consumption (how much was consumed) and expenditure (how much was purchased, or obtained in other ways like gifts or aid) for a comprehensive list of food and non-food items. Covering between 300 and 400 items, the questionnaire often exceeds 120 minutes to administer. In addition to the longer administering time leading to higher costs, response fatigue can increase measurement error, especially for items at the end of the questionnaire. In a fragile country context, a face-to-face time of 90–120 minutes can be prohibitively high. In the case of Somalia, security concerns restricted the duration of a survey visit in Mogadishu to about 60 minutes.

The extensive nature of household consumption surveys makes it difficult to obtain updated poverty estimates, especially when they are needed the most, such as after a shock and in fragile countries. Approaches have therefore been developed to reduce administering times to allow for the collection of consumption data. The most straightforward approach to minimize administering time is to reduce the number of items surveyed, either by asking for aggregates, or by skipping less frequently consumed items, which is called the reduced consumption methodology. However, both approaches (using aggregates, and skipping less common items) have been shown to underestimate consumption, which in turn overestimates poverty (Footnote 1). Splitting the questionnaire to allow for multiple visits is another solution, but potential attrition issues, especially in fragile contexts, increase the required sample size and may make the survey costlier. In addition, multiple visits to the same household can increase security concerns.

The second class of approaches utilizes a full consumption baseline survey and updates poverty estimates based on a small subset of collected indicators (Footnote 2). These approaches estimate a welfare model based on the baseline survey using a small number of easy-to-collect indicators. This allows poverty estimates to be updated by collecting only the set of indicators instead of the direct consumption data. While this approach is cost-effective and easy to implement in normal circumstances, it has two major drawbacks in the context of fragility and shocks. First, the approach requires a baseline survey, which is sometimes not available, as in the case of Mogadishu. Second, the approach relies on a structural model estimated from the baseline survey (Footnote 3). In the case of shocks, structural assumptions that cannot be tested are often violated. Thus, poverty updates based on the violated assumptions tend to underestimate the impact of the shock on poverty. Therefore, cross-survey imputation methodologies are not applicable in the context of shocks and fragility.

2 The Innovation

To assess poverty in Mogadishu, we tested a new methodology combining an innovative questionnaire design with standard imputation techniques. This substantially reduces the administering time of a consumption survey from multiple hours or even days to about 60 minutes, while still resulting in credible poverty estimates. The gain in shorter administering time, however, is offset by the need to impute missing consumption values. Given the design of the questionnaire, this method circumvents the systematic biases identified for alternative methodologies.

2.1 Overview

The rapid consumption survey methodology involves five main steps (Fig. 1). First, core items are selected based on their importance for consumption. Second, the remaining items are partitioned into optional modules. Third, optional modules are assigned to groups of households. Fourth, after data collection, consumption of optional modules is imputed for all households. Fifth, the resulting consumption aggregate is used to estimate poverty indicators.

Fig. 1 Illustration of the rapid consumption survey methodology (using illustrative data only)

First, core consumption items are selected. Consumption in a country bears some variability, but usually a few dozen items capture the majority of consumption. These items are assigned to the core module, which will be administered to all households. Important items can be identified by their average consumption share per household or across households. Previous consumption surveys in the same country, or consumption shares of neighboring or similar countries, can be used to estimate consumption shares.
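The selection of core items can be sketched as a ranking by consumption share. The following is a minimal illustration, not the procedure used in the surveys: the function name `select_core_items`, the `target_share` threshold, and the item shares are all hypothetical.

```python
def select_core_items(shares, target_share=0.6):
    """Pick the highest-share items until their cumulative share reaches target_share.

    shares: dict mapping item name -> average consumption share (summing to about 1).
    target_share is an assumed cut-off, not a value taken from the text.
    """
    ranked = sorted(shares.items(), key=lambda pair: pair[1], reverse=True)
    core, cumulative = [], 0.0
    for item, share in ranked:
        if cumulative >= target_share:
            break
        core.append(item)
        cumulative += share
    return core, cumulative


# Illustrative shares only (not from any survey).
example_shares = {"rice": 0.30, "sugar": 0.20, "oil": 0.15, "tea": 0.10,
                  "milk": 0.09, "pasta": 0.08, "salt": 0.05, "spices": 0.03}
core_items, core_share = select_core_items(example_shares, target_share=0.6)
```

In practice the shares would come from a previous survey or from a similar country, as described above.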

Second, non-core items are partitioned into optional modules. Different methods can be used for this partitioning. In the simplest case, the remaining items are ordered according to their consumption share and assigned one by one while iterating the optional module in each step. A more sophisticated method takes into account the correlation between items, and partitions them so that all items within a module explain consumption as well as possible, while the information between modules should be highly correlated. The partitioning influences the standard error of the estimation, but does not introduce bias. Thus, even in the absence of a previous survey, this methodology can be applied. More complicated partition patterns can result in a set of very different items in each module. However, the modular structure should not influence the layout of the questionnaire. Instead, all items should be grouped into categories of consumption (e.g. cereals) and different recall periods. It is therefore recommended to use CAPI technology, which allows the structure of the consumption module to be hidden from the enumerator.
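The simplest partitioning described above (order by share, then assign one by one while cycling through the modules) can be sketched as follows. The function name `partition_round_robin` and the item shares are hypothetical, and the correlation-based variant is not covered here.

```python
def partition_round_robin(shares, core_items, n_modules):
    """Assign non-core items to optional modules in round-robin order of share.

    shares: dict mapping item name -> consumption share.
    core_items: items already placed in the core module.
    Returns a list of n_modules lists of item names.
    """
    core = set(core_items)
    remaining = sorted((item for item in shares if item not in core),
                       key=lambda item: shares[item], reverse=True)
    modules = [[] for _ in range(n_modules)]
    for position, item in enumerate(remaining):
        modules[position % n_modules].append(item)
    return modules


# Illustrative shares only (not from any survey).
example_shares = {"rice": 0.30, "sugar": 0.20, "oil": 0.15, "tea": 0.10,
                  "milk": 0.09, "pasta": 0.08, "salt": 0.05, "spices": 0.03}
optional_modules = partition_round_robin(
    example_shares, core_items=["rice", "sugar", "oil"], n_modules=2)
```

Cycling through the modules spreads high-share items evenly, so each optional module captures a roughly similar share of consumption.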

Third, optional modules should be assigned to groups of households. Optional modules should be assigned randomly, stratified by clusters to ensure appropriate representation of optional modules in each cluster. This means that each cluster should include about the same number of households assigned to each optional module. This step is followed by the actual data collection.
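A cluster-stratified random assignment can be sketched as below. This is one possible scheme under stated assumptions: households are shuffled within each cluster and modules are dealt out in turn from a random starting module, so that module counts within a cluster differ by at most one. The function name `assign_modules` and the household identifiers are hypothetical.

```python
import random
from collections import defaultdict


def assign_modules(households, n_modules, seed=0):
    """Randomly assign one optional module per household, balanced within clusters.

    households: list of (household_id, cluster_id) pairs.
    Returns a dict mapping household_id -> module index in range(n_modules).
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for household_id, cluster_id in households:
        by_cluster[cluster_id].append(household_id)

    assignment = {}
    for members in by_cluster.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        # A random offset avoids always favoring module 0 when the cluster
        # size is not a multiple of the number of modules.
        offset = rng.randrange(n_modules)
        for position, household_id in enumerate(shuffled):
            assignment[household_id] = (position + offset) % n_modules
    return assignment


# Three illustrative clusters of nine households each, four optional modules.
households = [(f"hh{c}_{i}", c) for c in range(3) for i in range(9)]
assignment = assign_modules(households, n_modules=4, seed=1)
```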

Fourth, household consumption should be estimated by imputation. The average consumption of each optional module can be estimated based on the subsample of households assigned to the optional module. In the most straightforward case, a simple average can be estimated. More sophisticated techniques can employ a welfare model based on household characteristics and consumption of the core items. The next section presents six techniques and demonstrates their performance on the dataset from Hargeisa.
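The most straightforward case, imputing each non-assigned module with the simple average observed among households that were assigned it, can be sketched as follows. The function name `impute_by_module_mean` and the data layout are assumptions for illustration; the model-based techniques are not shown.

```python
from statistics import mean


def impute_by_module_mean(core, optional, n_modules):
    """Total consumption = core + observed optional module + mean of each other module.

    core: dict household -> consumption of core items.
    optional: dict household -> (assigned module index, observed module consumption).
    """
    module_means = []
    for module in range(n_modules):
        observed = [value for (assigned, value) in optional.values()
                    if assigned == module]
        module_means.append(mean(observed))

    totals = {}
    for household, core_value in core.items():
        assigned, observed_value = optional[household]
        imputed = sum(module_means[m] for m in range(n_modules) if m != assigned)
        totals[household] = core_value + observed_value + imputed
    return totals


# Illustrative data only: four households, two optional modules.
core = {"a": 10, "b": 20, "c": 30, "d": 40}
optional = {"a": (0, 2), "b": (0, 4), "c": (1, 6), "d": (1, 10)}
totals = impute_by_module_mean(core, optional, n_modules=2)
```

Every household receives the same imputed value for a given module, so the spread of the resulting aggregate is compressed relative to the true distribution.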

Single imputation of the consumption aggregate underestimates the variance of household consumption. Depending on the location of the poverty line relative to the consumption distribution, this may either consistently under- or overestimate poverty. Multiple imputations based on bootstrapping can mitigate the problem but will render analysis more complicated. We use single as well as multiple imputation techniques for the evaluation of the methodology.
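The idea of multiple imputation can be illustrated with a deliberately simplified stand-in: instead of a fitted model, each missing module value is drawn at random from the values observed for households that were assigned that module, and the poverty headcount is averaged over several imputed datasets. This donor-draw scheme is an assumption chosen for brevity, not the technique evaluated in this chapter, and all names and numbers are hypothetical.

```python
import random


def multiple_imputation_draws(core, optional, n_modules, n_imputations=5, seed=0):
    """Return a list of imputed datasets (dict household -> total consumption)."""
    rng = random.Random(seed)
    donors = {m: [value for (assigned, value) in optional.values() if assigned == m]
              for m in range(n_modules)}
    datasets = []
    for _ in range(n_imputations):
        totals = {}
        for household, core_value in core.items():
            assigned, observed_value = optional[household]
            # Draw one donor value for every module the household was not asked about.
            imputed = sum(rng.choice(donors[m])
                          for m in range(n_modules) if m != assigned)
            totals[household] = core_value + observed_value + imputed
        datasets.append(totals)
    return datasets


def poverty_headcount(totals, poverty_line):
    """Share of households with consumption strictly below the poverty line."""
    return sum(1 for value in totals.values() if value < poverty_line) / len(totals)


# Illustrative data only: four households, two optional modules.
core = {"a": 10, "b": 20, "c": 30, "d": 40}
optional = {"a": (0, 2), "b": (0, 4), "c": (1, 6), "d": (1, 10)}
imputed_datasets = multiple_imputation_draws(core, optional, n_modules=2)
headcounts = [poverty_headcount(d, poverty_line=32) for d in imputed_datasets]
average_headcount = sum(headcounts) / len(headcounts)
```

Because the imputed values vary across draws, the household-level variance is not collapsed to a single average, at the price of having to carry several datasets through the analysis.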

3 Key Results

In this section, the rapid consumption methodology will first be applied to a dataset including a full consumption module from Hargeisa, Somaliland. This will be used to assess the performance of the rapid consumption methodology compared to the traditional full consumption methodology. The results of the High Frequency Survey in Mogadishu are then presented. Security risks in Mogadishu restrict face-to-face interview time to less than one hour; therefore, the rapid consumption methodology was used to derive the first ever consumption estimates for Mogadishu. We present the resulting consumption aggregate, and perform consistency checks for its validation.

3.1 Ex Post Simulation

The rapid consumption methodology is applied ex post to household budget data collected in Hargeisa, Somaliland. Hargeisa was chosen as it is very similar to Mogadishu. Using the full consumption dataset from Hargeisa allows a full assessment of the new methodology. Based on selected indicators, we compare the results of the estimated consumption based on the rapid consumption methodology with the results from using the traditional full consumption module. We add a comparison with the results for a reduced consumption module.

The simulation assigns each household to one optional module. The consumption data for the modules not assigned to the household is deleted. Multiple simulations are performed, with various modules being assigned to households. Across the simulations, we calculate three consumption indicators and four poverty and inequality indicators. The consumption indicators capture the accuracy of the estimation at three different levels: the household level, the cluster level (consisting of about nine households), and the level of the dataset. In addition, we calculate the poverty headcount (FGT0), the poverty depth (FGT1), the poverty severity (FGT2), and the Gini coefficient to capture inequality.
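For reference, the four poverty and inequality indicators can be computed as in the sketch below, using the standard Foster-Greer-Thorbecke (FGT) definition and the rank-based Gini formula. Treating households exactly at the poverty line as non-poor and ignoring survey weights are simplifying assumptions; the consumption values are illustrative.

```python
def fgt(consumption, poverty_line, alpha):
    """FGT index: mean of ((z - y) / z) ** alpha over the poor, zero for the non-poor.

    alpha = 0 gives the headcount, 1 the poverty depth, 2 the poverty severity.
    """
    total = 0.0
    for value in consumption:
        if value < poverty_line:
            gap = (poverty_line - value) / poverty_line
            total += gap ** alpha
    return total / len(consumption)


def gini(consumption):
    """Gini coefficient from the rank-weighted sum of sorted values."""
    values = sorted(consumption)
    n = len(values)
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * sum(values)) - (n + 1) / n


# Illustrative per capita consumption values and poverty line.
consumption = [1, 2, 3, 4]
headcount = fgt(consumption, poverty_line=2.5, alpha=0)
depth = fgt(consumption, poverty_line=2.5, alpha=1)
severity = fgt(consumption, poverty_line=2.5, alpha=2)
inequality = gini(consumption)
```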

Six estimation techniques are compared with respect to their relative bias and relative standard error, based on 20 simulations. All simulations used the same item assignment to modules using the algorithm as described (see Table 1 for the resulting consumption shares per module; Footnote 4). The estimation techniques differ considerably in terms of performance. We also compare the techniques to using a reduced consumption module where the same consumption items are collected for all households. The number of items is equal to the size of the core module and one optional module, implying a comparable face-to-face interview time to the rapid consumption methodology.

Table 1 Number of items and consumption share captured per module

Comparing the reduced consumption approach with the full consumption as a reference, the reduced consumption approach suffers from an underestimation of consumption. This is not surprising because the approach only collects information on the consumption of a subset of items. Applying the median as a summary statistic also results in an underestimation of consumption. As consumption distributions have a long right tail, the median consumption belongs to a poorer household than the average household. In the case of Hargeisa, several optional modules have a median of zero consumption. Thus, the median underestimates the consumption in a similar way to the reduced consumption approach. In contrast, the average consumption of households is larger than the consumption of the median household. Thus, it is not surprising that the technique using the average as a summary statistic overestimates total consumption at the household and cluster levels.

The regression techniques have a similar performance, with a considerable upward bias at all levels. The Tobit regression performs slightly better at the household and cluster levels. As known from literature about small area estimates, the regression approaches do not model the error distribution correctly and, thus, underestimate the tails of the distribution. Depending on the value of the poverty line relative to the mode of the distribution, this results in an over- or under-estimation of the poverty rate. In contrast, both imputation techniques perform exceptionally well, with a bias below 1% at all levels (Fig. 2).

Fig. 2 Average relative bias and standard error

While the bias is important in order to understand the systematic deviation of the estimation, the relative standard error helps to understand the variation of the estimation. Other than in a simulation setting, the standard error of the estimation cannot be calculated, as only one assignment of households to optional modules is available. Thus, it is important that the estimation technique delivers a small relative standard error.
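The two performance measures can be sketched as below for a single aggregate indicator. The exact definitions used in the simulations are not spelled out here, so this assumes that the relative bias is the mean deviation from the reference value divided by the reference, and that the relative standard error is the sample standard deviation across simulations divided by the reference. The numbers are illustrative.

```python
from statistics import mean, stdev


def relative_bias(estimates, reference):
    """Systematic deviation of the simulated estimates, relative to the reference."""
    return (mean(estimates) - reference) / reference


def relative_standard_error(estimates, reference):
    """Spread of the simulated estimates across module assignments, relative to the reference."""
    return stdev(estimates) / reference


# Illustrative estimates of mean consumption from four simulated module assignments.
simulated_estimates = [9, 10, 11, 10]
bias = relative_bias(simulated_estimates, reference=10)
rse = relative_standard_error(simulated_estimates, reference=10)
```

With real survey data only one assignment exists, so the spread across `simulated_estimates` is available only in a simulation setting such as this one.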

Generally, the relative standard error decreases when moving from the household level through the cluster level to the simulation level. The relative standard error for the reduced consumption methodology is smaller than for the summary statistic techniques because the reduced consumption is not subject to variation from the module assignment to households. The regression techniques have large relative standard errors of around 20% at the household level, while the multiple imputation techniques vary between 15 and 20%. At the cluster level, the relative standard error drops to 7% for regression techniques and 5% for multiple imputation techniques. At the simulation level, the relative standard error is around 3% for regression techniques and 1% for multiple imputation techniques.

The distributional shape of the estimated household consumption level can be compared to the reference household consumption by employing standard poverty and inequality indicators. The poverty headcount (FGT0) is 57.4% for the reference distribution (Footnote 5). Not surprisingly, the reduced consumption technique and the median summary statistic overestimate poverty by several percentage points due to the underestimation of consumption, while the average summary statistic and the regression techniques underestimate poverty, since they overestimate consumption. The multiple imputation techniques overestimate poverty, but only by 0.5 percentage points (or about 1%), performing significantly better than the reduced consumption approach, which has a bias that is more than two times larger. The reduced consumption technique and the median summary statistic as well as the multiple imputation techniques deliver good results for FGT1 and FGT2, emphasizing that not only can the headcount be estimated reasonably well, but the distributional shape is also conserved. With the exception of the median summary statistic, these techniques also perform well in estimating the Gini coefficient, with a bias of less than 0.5 percentage points. The relative standard errors show results similar to those for the estimation of consumption. The relative standard error of the reduced consumption for FGT0 is double that of the multiple imputation techniques. The relative standard errors for the multiple imputation techniques for FGT1 are comparable but larger than for FGT2 and Gini (Fig. 3).

Fig. 3 Bias and standard errors

In conclusion, the average summary statistic and the regression approaches cannot deliver convincing estimates. While the reduced consumption technique and the median summary statistic perform considerably better, they both overestimate poverty. Only the multiple imputation techniques are convincing in all estimation exercises. In terms of the estimation of the important poverty headcount (FGT0), the multiple imputation techniques are virtually unbiased.

4 Implementation Challenges, Lessons Learned, and Next Steps

In late 2014, consumption data using the proposed rapid consumption methodology was collected in Mogadishu using CAPI. The rapid consumption questionnaire reduced face-to-face interview time considerably. A household visit took about 40 minutes on average (with a median of 35 minutes), including greetings, household characteristics, consumption modules, and a number of perception questions. Nine out of ten interviews took less than 65 minutes.

After data cleaning and quality assurance procedures, 675 households with consumption data were retained (Footnote 6). A welfare model was built to predict missing consumption in optional modules. The welfare model was tested on the core consumption, after removing the core consumption as an explanatory variable. The model for food consumption retrieved an R² of 0.24, while non-food consumption was modeled with an R² of 0.16. It is important to emphasize that these models give a lower bound of the R² compared to the models used in the prediction, as the prediction models include the core consumption as an explanatory variable. Given the assessment of the different estimation techniques in the previous section, the multivariate normal approximation using multiple imputations is applied to the Mogadishu dataset.

For the Mogadishu dataset, the assignment of items to modules had to be manually refined (Footnote 7). The refinement had a minor impact on the share of consumption per module. It is notable, though, that the share of consumption per module differs between Hargeisa and Mogadishu. Using the Hargeisa dataset, 91% of food consumption (and 76% of non-food consumption) is captured in the core module. In contrast, the core food consumption share is only 64% (and 62% of non-food consumption) in Mogadishu before imputing the consumption of non-assigned modules. Thus, employing a reduced consumption module based on consumption shares identified in Hargeisa would have crudely underestimated consumption in Mogadishu, without being able to evaluate the inaccuracy. In contrast, the rapid consumption methodology allows the estimation of shares for each module, while the consumption estimation procedure implicitly takes into account the ‘missing’ consumption shares for each household (Table 2).

Table 2 Number of items and consumption shares captured per module

The cumulative consumption distribution can be compared for the consumption captured in the core module, the assigned optional modules, and the imputed consumption. By construction, the core consumption shows the lowest consumption per household. Adding the consumption from the assigned optional modules shifts the cumulative consumption curve slightly. The imputed consumption is shifted even further as the estimated consumption shares from the non-assigned modules are added (Fig. 4).

Fig. 4 Cumulative consumption distribution (in USD) per day and per capita. Note: core module (dark blue), core and assigned optional modules (medium blue), and imputed consumption (light blue). The presented consumption aggregate does not include consumption from durable goods.

Without full consumption aggregate values for Mogadishu, we can only show the consistency of the retrieved consumption aggregate with other household characteristics to validate the estimates. Consumption per capita usually decreases with increasing household size. Indeed, we find that household size is significantly negatively correlated with estimated per capita consumption (Footnote 8). Per capita consumption also decreases with a larger share of children among the household members. The proportion of employed members of the household significantly increases consumption per capita. Thus, the retrieved consumption estimate is consistent and, given the evidence from the ex post simulations, highly accurate.
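A consistency check of this kind can be sketched with a plain Pearson correlation between household size and estimated per capita consumption. The data below are invented for illustration, the function name `pearson_correlation` is hypothetical, and the sketch reports only the sign and strength of the association, not its statistical significance.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(variance_x * variance_y)


# Illustrative values only (not survey data).
household_size = [2, 3, 4, 5, 6, 8]
per_capita_consumption = [3.1, 2.6, 2.4, 1.9, 1.7, 1.2]
correlation = pearson_correlation(household_size, per_capita_consumption)
```

A negative coefficient is what the expected relationship predicts; the same check can be repeated for the share of children and the proportion of employed members.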

The results of the ex post simulation indicate that the rapid consumption methodology can reliably estimate consumption and poverty. The experience in Mogadishu also shows that the rapid consumption methodology can be implemented in extremely high-risk areas, due to its success in limiting face-to-face interview time to less than one hour. While these results are encouraging, the rapid consumption methodology has some limitations.

The rapid consumption questionnaire varies in comprehensiveness and the order of items in the consumption module between households. The resulting response bias can be estimated neither from the simulations nor from the data collected in Mogadishu. However, an enhanced design with different optional modules varying in their comprehensiveness can shed light on this bias. Comparison between responses for the same item in a comprehensive and a less comprehensive list would indicate a lower bound for response bias. Assuming that a comprehensive list results in a better estimate, the response bias could be corrected.

The rapid consumption methodology can increase the gap between capacity at enumerator level and the complexity of the survey instrument. Capacity at the enumerator level is often low in developing countries, especially in a fragile context. The rapid consumption methodology increases the complexity of the questionnaire, which can further increase the gap between existing and required enumerator capacity. However, CAPI technology can seal off complexity from enumerators, as software can automatically create the consumption module based on core and optional modules for each household without showing the partition to the enumerator. In Mogadishu, advanced CAPI technology was used to automatically generate the questionnaire based on the assignment of the household to an optional module. While enumerators were made aware that different households would be asked about different items, administering the rapid consumption questionnaire did not require any additional training of enumerators beyond that needed for a standard consumption questionnaire.

Analysis of rapid consumption data requires high capacity. Analysis capacity is usually limited in developing countries, and especially in fragile contexts. While the general idea of optional consumption modules being assigned to households is digestible by local counterparts, poverty analysis based on a bootstrapped sample of consumption distribution is likely to overwhelm local capacity. However, even standard poverty analysis is often beyond the limits of local capacity in fragile countries. Therefore, capacity building usually focuses on data collection skills with a longer-term perspective on increasing data analysis capacity. In addition, the rapid consumption methodology might be the only way of creating poverty estimates in certain areas, for example, in Mogadishu.

The results of the ex post simulation and the application of the methodology in Mogadishu suggest that the rapid consumption methodology is a promising approach to estimating consumption and poverty in a cost-efficient and fast manner, even in fragile areas (Footnote 9). A similar ex post simulation for South Sudan and Kenya (data not shown) indicates that the rapid consumption methodology can also be applied at the country level, with large intra-country consumption variation (Footnote 10). The rapid consumption methodology has been implemented in Somalia, South Sudan, and Kenya, with additional countries in the pipeline.