Age heaping is a pattern of responses to questions on censuses and other record-keeping systems where too many responses are found for certain ages, typically ages ending in zero, five and even numbers. It is usually found in censuses or other demographic source materials where illiterate or semiliterate people responded. Demographers have developed methods for reapportioning these responses to adjacent ages, but these techniques are not recommended for older populations (age 62 and up) because their age misreporting often takes more complex forms and because higher death rates may cause biases in distributing people across nearby age groups.
Nineteenth- and early twentieth-century censuses in Europe and North America often showed response errors due to age heaping, and this pattern of error was often found to be concentrated among illiterate and innumerate population groups. Several indices (typically algebraic smoothing formulas) were developed to correct for this problem, most notably by Whipple (1919) and Myers (1940).
Age heaping is not widespread in most major world censuses today, and it cannot always be assumed that the original data can be made more accurate by applying age heaping formulas. Even with the updated age heaping formulas (e.g., Spoorenberg 2007), this technique should not be applied to older populations because patterns of mortality and age distribution among those in their 60s and above can be complex. Further, the mental decline that often accompanies aging can result in age misreporting that cannot be corrected with age heaping indices.
Key Research Findings
As mass literacy spread in Europe and North America from the nineteenth century onward, the problem of age heaping declined. After 1950, as nations in many other world regions conducted censuses among populations with high levels of illiteracy, there was renewed interest in this technique, which was originally developed for and applied to US data. In the past 20 years, researchers have found that historical European populations often showed age heaping, but the patterns are complex.
The analysis of age heaping emerged as a demographic technique applied to census data when reported age distributions showed overly large counts at certain ages. When single years of age were reported, they often had a nonrandom occurrence. Respondents’ ages appeared to cluster on ages ending in zero, five, and even-numbered digits in a regular fashion across different censuses.
Across the globe there has been a century-long compression of the age at death as societies went through the demographic transition. Yet the degree to which this has resulted in a “rectangularization” of population appears to differ between nations (Wilmoth and Horiuchi 1999). Because of this and because, as will be shown below, nations can show large annual shifts in single-year birth and death rates, using age heaping indices for older populations may result in misestimating true historical population changes.
History of the Method
The mis-enumeration of age in censuses has been of concern to demographers since the 1920s. While smoothing techniques for age distributions such as the Whipple Index (Whipple 1919; Shryock and Siegel 1973) and Myers’ Blended Index (Myers 1940) have been developed to adjust age distributions, there has been little concern with external validation of these methods.
After Whipple’s early twentieth-century (1919) demonstration of age heaping, most methods of measuring and norming data focused on discovering a way of apportioning census and other data for adult populations and sometimes for adult populations below a certain age. Initially, these techniques for data cleaning were used primarily on US data (see e.g., Bachi 1951).
At the time there was relatively little concern with external validation of these methods. Only recently have studies evaluated whether age misreporting follows this assumed pattern; errors in age have been shown to follow other patterns as well (Blum and Krauss 2018). In addition, there has been little study of whether large single-year drops in the age distributions of historical populations could be real (due to disease or famine) and simply happen to fall on the ages that census respondents are hypothesized to avoid.
The mathematical formula used in decomposing age heaping assumes that the age distribution of the population between ages 23 and 62 (or 72 in some recent studies) is flat; that is, it is assumed that the same size population is found at every age and that deviations due to heaping on individual ages are evenly distributed. This is usually not true. Populations that had not completed the demographic transition (Harper and Armelagos 2010), especially those of the nineteenth and twentieth centuries, usually had a pyramidal shape across age.
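The computation behind this assumption can be sketched in a few lines of Python. This is a minimal illustration of the original Whipple Index over ages 23 to 62, not the code of any particular statistical agency; the function name `whipple_index` and the dictionary input format are assumptions made for this example.

```python
def whipple_index(counts):
    """Original Whipple Index for ages 23-62.

    counts: dict mapping single year of age (int) to the reported
    population count at that age. Missing ages are treated as zero
    (an assumption of this sketch).
    """
    # Counts at ages ending in 0 or 5 within the range: 25, 30, ..., 60.
    heaped = sum(counts.get(age, 0) for age in range(25, 61, 5))
    # Counts at every single year of age from 23 through 62 (40 ages).
    total = sum(counts.get(age, 0) for age in range(23, 63))
    if total == 0:
        raise ValueError("no population reported between ages 23 and 62")
    # If the distribution were flat, one-fifth of the total would fall
    # on ages ending in 0 or 5, so the ratio is scaled by 5 and by 100.
    return 100.0 * 5.0 * heaped / total
```

Under the flat-distribution assumption, a value of 100 indicates no preference for ages ending in zero or five, and 500 would mean that every reported age in the range ends in zero or five.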
With a growing population, younger ages are likely to hold larger numbers, which introduces some bias in the measure. After the demographic transition, which was linked in most countries to bureaucratization, rising educational levels and the institutionalization of public health measures, populations at each age tend to resemble each other in size. High survival rates in modern societies until the oldest ages result in a rectangularization of the age structure, and this age structure replaces the pyramidal form found in traditional or transitional nations where birth and death rates remain high (see Ebeling et al. 2018). Paradoxically, just as age heaping measures become less biased (because they are increasingly applied to a rectangular rather than a pyramidal age structure), they are probably becoming less necessary. A recent modification of the Whipple Index tries to remedy this problem (Spoorenberg 2007).
This age distribution problem is complicated by the fact that in most countries, in any given census, younger people (relative to older people in the same census) are (a) more likely to be literate and numerate and (b) more likely to have recently experienced events (school leaving, military draft registration, courtship, and marriage) that are more closely linked to specific ages than is true for middle-aged or elderly respondents (e.g., Blum and Krauss 2018). Younger people are also less likely to suffer from dementia and other mental incapacities that may interfere with the correct recall of one’s true age. None of these age heaping indices takes these historical factors into account, nor have estimates been made of the rate at which mental incapacity affects an aging population. Probably a larger proportion of those in institutional settings like nursing homes get their age assigned to them on surveys and censuses, but this has apparently never been investigated as a source of bias.
Illiteracy, innumeracy, different cultural systems of calendrical and age calculation, and cognitive decline can all contribute to age heaping, and each appears to have distinct effects. The methodological literature on how to correct for age heaping apparently assumes that there is one main way in which age heaping occurs, and that is by individual respondents preferring certain nearby or adjacent ages (ending in zero, five, and even numbers). Current measures of age heaping can only measure this pattern of mis-enumeration.
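To make concrete what a digit-preference measure captures, the following Python sketch computes a version of Myers’ Blended Index. It is an illustration under stated assumptions, not a definitive implementation: the function name `myers_blended_index`, the dictionary input, the age windows (10 to 89 and 20 to 99), and the choice to halve the summed deviations (giving a range of 0 to 90) are all assumptions of this example, and published variants differ on the last two points.

```python
def myers_blended_index(counts):
    """Sketch of Myers' Blended Index of terminal-digit preference.

    counts: dict mapping single year of age (int) to the reported
    population count. Missing ages are treated as zero (assumption).
    Returns (index, shares), where shares[d] is the blended percentage
    of the population reported at ages ending in digit d.
    """
    blended = []
    for digit in range(10):
        # Ages 10+digit, 20+digit, ..., 80+digit (window starting at age 10).
        first = sum(counts.get(10 * k + digit, 0) for k in range(1, 9))
        # Ages 20+digit, ..., 90+digit (same window shifted up one decade).
        second = sum(counts.get(10 * k + digit, 0) for k in range(2, 10))
        # Weights 1..10 and 9..0 blend ten overlapping starting ages so
        # that no terminal digit is favored by where the count begins.
        blended.append((digit + 1) * first + (9 - digit) * second)

    total = sum(blended)
    if total == 0:
        raise ValueError("no population reported between ages 10 and 99")

    shares = [100.0 * value / total for value in blended]
    # Without digit preference each digit would hold 10 percent.
    # Halving the absolute deviations is a convention assumed here.
    index = sum(abs(share - 10.0) for share in shares) / 2.0
    return index, shares
```

The per-digit shares show which terminal digits are over- or under-reported, but, as the text notes, the index cannot distinguish heaping from other forms of age misreporting or from real irregularities in cohort size.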
Historical Trends in Age Heaping
Age heaping in census and other demographic data is a problem in early censuses and other record-keeping systems where the ages of a significant proportion of the population cannot be accurately estimated. It can continue to be a problem in countries with weak vital statistics systems and where age data on census results cannot be checked against other administrative records. With the advent of decolonization, many new nations conducted modern censuses during the 1946–1965 era. These new nations often had very low levels of literacy, and many of the age data reported in these censuses were subject to age heaping. The United Nations and other agencies providing technical demographic aid to these nations began to include measures of age heaping in their suggestions for data refinement and improvement. The age heaping measures developed by Whipple and Myers for use with deficient data from the United States gained a renewed popularity among demographers in the developing world (Carrier and Farrag 1959; Shryock and Siegel 1973).
Scholars have long debated the quality of age and mortality data for the African American and white populations of the United States because it has important implications for the interpretation of differences in mortality and health in different eras (Zelnik 1961; Coale and Rives 1973). Because the US Census is conducted independently of other sources of demographic data (enumerators must rely on self-reports by respondents and cannot check other records for accuracy), demographers have had to discover ways of trying to ensure data quality and accuracy (Ewbank 1981).
The key debate on whether there is a “crossover effect” of prior high mortality among African Americans that weeded out the older African American population and led to superior survivorship for them (compared to whites) above age 65 has largely revolved around other possible biases in the census and vital statistics data (Preston et al. 1999). Age heaping per se does not figure much in these arguments because it would only involve a relatively inconsequential shift in age data and risk of death.
The US baby boom (made up of people born between 1946 and 1964; the effect is much less striking in postwar European nations) poses a special problem for age heaping. In 2012, for example, the US white population (technically the resident white population, either alone or in combination with other races) stood at about 251 million people.
Table columns: Age in 2012 | Surviving number (in 1,000s) | Percent increase over prior year
At first glance, it looks like age heaping may be affecting the reported US white population (Blacks and other races were excluded because their baby boom is much less pronounced). There appears to be a huge jump in numbers at age 65 in 2012, just as age heaping theory would predict.
Yet this surge in births in 1947 is real: birth rates shot up by almost 40 percent between 1946 and 1947. During and after 1945, millions of US soldiers returned home and married, had children, and found jobs. This apparent surge in the number of white Americans at age 65 is due to a real, 1-year increase in the birth rate over a half century earlier and has been confirmed by census and administrative data for decades afterward.
Other historical events can also make it appear that surviving age cohorts are suffering from age heaping in official data. While no accurate figures exist, there were probably huge demographic effects on a narrow band of age cohorts that sustained huge losses between 1942 and 1945 in Germany, Japan, and the former Soviet Union and in World War I’s peak mortality years of 1916–1918. Some data suggest that the influenza epidemic of 1918 (which hit again in 1920 in East Asia) may have led to a high death rate among people in their late 20s, a pattern not repeated in subsequent twentieth-century influenza pandemics (Gagnon et al. 2013). If the very small birth cohorts of the Three Bitter Years (born in 1959–1961) in China are included in computations of age heaping for the 1982 and following censuses, age heaping indices are far higher for that nation. Yet this is due to a real event, not to age misstatement.
Over the last two decades, two trends have reignited interest in attacking the question of age heaping in census and other historical data available from European and North American sources. First, a group of economic historians has made broad claims that a lack of age heaping in early modern data in some societies is indicative of a higher level of human capital (due to improved numeracy) and may be a harbinger of a tendency toward earlier economic development (see A’Hearn et al. 2009; Tollnek and Baten 2016). In some senses this is a resuscitation of modernization theory (Inkeles 1969) in that it looks for a precursor variable in economic development (other classic examples are Weber’s Protestant Ethic and David McClelland’s n-Ach, a psychological need to achieve that drives entrepreneurship). While the theory is suggestive, it may require better specification of the pathways through which knowledge of one’s age has a direct effect on societal economic growth. No researcher has shown how or whether an additional proportion of numerate individuals is specifically responsible for sustained economic growth.
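In this economic-history literature, the Whipple Index is commonly rescaled into the ABCC index (A’Hearn et al. 2009), which is usually read as the approximate percentage of people who report their age without rounding to a multiple of five. The Python sketch below shows that linear rescaling; the function name `abcc_index` and the input validation are assumptions made for this example.

```python
def abcc_index(whipple):
    """Rescale a Whipple Index value (0-500) to the ABCC index (0-100).

    A Whipple value of 100 (no heaping) maps to an ABCC of 100, and a
    Whipple value of 500 (all ages end in 0 or 5) maps to an ABCC of 0.
    """
    if whipple < 0 or whipple > 500:
        raise ValueError("Whipple Index must lie between 0 and 500")
    # Values at or below 100 show no preference for 0 and 5, so the
    # ABCC is capped at 100.
    if whipple <= 100.0:
        return 100.0
    return (1.0 - (whipple - 100.0) / 400.0) * 100.0
```

Because the ABCC is only a rescaling of the Whipple Index, it inherits the same assumptions about the underlying age distribution and the same inability to separate heaping from real cohort irregularities.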
Second, several large-scale data sets now allow researchers to see how age heaping changed over time in national, sub-national, and sometimes individual-level data in such databases as Integrated Public Use Microdata Samples (IPUMS), the North American Population Project (NAPP), and the Mosaic project at the Max Planck Institute for Demographic Research (Sobek 2016; Szoltysek et al. 2018). The Mosaic project has shown that there were distinct “bands” of age heaping across Europe, with much greater age heaping found in Eastern and Southern Europe (Szoltysek et al. 2018). These new data sources have led to studies that show that factors such as gender and marital status can affect age heaping (Foldvari et al. 2012).
At the oldest ages, research has shown that age heaping still exists in census data from advanced and highly literate nations (Jdanov et al. 2008). Data on the oldest-old in Sweden are generally considered to be highly reliable and to exhibit little age heaping. Methods of gauging age heaping among the oldest-old involve mortality estimation using model life tables combined with smoothing of data on reported ages of survivors and decedents (see comparisons of Han Chinese and Swedish populations in Zeng and Vaupel 2003). Since pensions sometimes require the attainment of a certain age, attempts to qualify can result in new patterns of age heaping (Budd and Guinane 1991).
Effects of Cultural Systems on Age Heaping
Demographers have had little interest in the psychological or cultural sources of age heaping. Numeracy is a learned skill, as is the ability to estimate or approximate large numbers. In the United States, most school children learn the concept of odd and even numbers at about age 6 and can count to 100 and learn how to round up by about age 7 (Child Development Tracker 2019). Early research in the United States suggested that there are numerous reasons why children misreported their age; Stiles’s report on 5,000 children says that “there is a very great carelessness or ignorance among school children in reference to their age and birthday” and pushed for “the rapid extension of birth registration and issuance of birth-registration certificates to parents” (Stiles 1915). Little cross-cultural research exists on the topic, but, as the Chinese case (mentioned below) shows, the salience or importance of knowing one’s exact age may differ across cultures.
Knowing one’s age is profoundly social: no one independently remembers the circumstances and time of one’s birth. The classic twentieth-century British anthropologists’ guide to how to conduct social and cultural anthropology, “Notes and Queries on Anthropology,” exhorts the anthropologist, amateur, or colonial officer to make inquiries and take detailed field notes on concepts in numeracy and age computation (Royal Anthropological Institute of Great Britain and Ireland 1951). Yet most such studies ignore these topics; we know very little about how high people could count in most preliterate (and many literate) societies, how they rounded numerical data, and how they conceived of age and the passage of years.
Age heaping in census and other demographic data can continue to be a problem in countries with weak vital statistics systems and where age data on census results cannot be checked against other administrative records. Yet age heaping appears to be declining in the population census of India; it declined by almost half between the 2001 and 2011 censuses, largely as a result of the spread of primary education (Agrawal and Khanduja 2015). However, age heaping is still prevalent in recent censuses in Nepal and Bangladesh, although both the Whipple and Myers’ indices are quite low in Indonesia and Malaysia (National Planning Commission [Government of Nepal] 2014: 62).
The other large world population where age heaping might have been prevalent, China, has never (at least since the late nineteenth century) exhibited a tendency toward age heaping. Since the late seventeenth century, when the Jesuits Schall and Verbiest convinced the new Qing rulers to introduce a new civil and ritual calendar, the days of the Chinese lunisolar calendar have matched the Gregorian calendar (see Hu 2002). Li and Sun’s analysis of the Whipple Index and Myers’ Blended Index for men and women in the Chinese censuses of 1953, 1964, 1982, 1990, and 2000 shows remarkably little age heaping in any census (2003: 37). This is due to the use of the traditional Chinese lunisolar calendar (Dershowitz and Reingold 2008) and the importance of correct recording of age for fortune-telling associated with major life events. Of course, this applies to the approximately 93 percent of the Chinese population of Han origin; some studies have suggested that non-Han Muslim groups have pronounced age exaggeration, but whether these groups also exhibit age heaping has yet to be established.
While the use of the oriental zodiacal calendar (i.e., using animal years of birth as the age referent) does introduce some other minor but well-known differences from Gregorian (western) calendrical calculations (Saw 1967), age heaping is not one of them. Most researchers attribute the accurate computation of ages by Chinese to the use of the Chinese “animal year” calendar (Banister 1984; Jowett and Li 1992). The high levels of accuracy in age recall by Chinese respondents suggest that in Chinese culture at least, age recall is not correlated with numeracy or even literacy. In the 1953 census, probably well over half of Chinese women were illiterate, but they recalled their ages very accurately, and the spread of literacy and numeracy in China after 1949 had little discernible effect on age heaping.
Traditional Korean, Japanese, Vietnamese, and Thai age and calendrical systems of age computation roughly follow the Han Chinese model, where years are of different lengths because an intercalary month is injected at periodic intervals to balance out the lunar and solar cycles (Dershowitz and Reingold 2008; on Thailand see Chayovan and Knodel 1993). There is no evidence that age heaping is of major importance even among the illiterate in early censuses in these nations. Oddly enough, some Afghan ethnic groups, including the Turkoman, Uzbek, and Hazara, know their Chinese animal year of birth but not their age. One inventive researcher used this referent point as well as the dates of historical events and Muslim calendar years to help a respondent who did not know his or her age to estimate it (Scanland 1976). In the Afghan case, the problem of age reckoning is complicated by the use of different Islamic calendars (Conant 2001).
For most populations in high-income nations with virtually universal primary school education systems, age heaping is not a problem in censuses and other sources of data. Age heaping remains a problem in data interpretation in some societies, but it is probably of declining significance. However, the increase in the proportion of older people will require more work on whether age heaping is becoming more prevalent among the oldest-old as they lack the capacity to give accurate age data. Except for work by researchers working on Chinese data (Zeng and Vaupel 2003), relatively little work is being done on the topic, and the special nature of the elderly in East Asian societies (lower literacy but extremely clear knowledge of exact age before cognitive decline sets in) suggests that such results are not generalizable to other populations.
For all the attention paid to age heaping by demographers, it is surprising to find that no national statistical agency or historical archive provides census data that have been corrected for obvious cases of age heaping. Clearly, the application of age heaping formulas could improve some flawed historical data, but it has yet to be done. For most modern censuses, more complex models of age misreporting that introduce insights from studies of innumeracy, the use of different calendars, and the psychology of cognitive decline with age might lead to a better specification of age heaping than the “laws” of age heaping introduced by Whipple and others a century ago. For gerontologists, the use of age heaping formulas to fix uneven age totals in data sets with older respondents is tempting but inadvisable because it may introduce as many new errors as it takes away.
Future research might include more analysis of whether knowledge of one’s exact age differs by age, gender, educational level, cultural background, or method of census collection, as well as by who is supplying the information. The fact that I have never seen age data from any nation with age unspecified suggests that demographers may have been overly credulous as to where in the statistics-gathering process exact ages appear.
- A’Hearn B, Baten J, Crayen D (2009) Quantifying quantitative literacy: age heaping and the history of human capital. J Econ Hist 69(3):783–808
- Agrawal G, Khanduja P (2015) Influence of literacy on India’s tendency for age misreporting: evidence from the census of 2011. J Popul Soc Stud 23(1):47–56
- Bachi R (1951) The tendency to round off age returns, measurement and correction. Bull Int Stat Inst 33(4):195–222
- Banister J (1984) An analysis of recent data on the population of China. Popul Dev Rev 10(2):241–271
- Blum M, Krauss KP (2018) Age heaping and numeracy: looking behind the curtain. Econ Hist Rev 71(2):464–479
- Budd JW, Guinane T (1991) Intentional age-misreporting, age heaping and the 1908 old age pensions act in Ireland. Popul Stud 45(3):497–518
- Carrier NH, Farrag AM (1959) The reduction of errors in census populations for statistically underdeveloped countries. Popul Stud 12(3):240–285
- Chayovan N, Knodel J (1993) Age and birth date reporting in Thailand: evidence from the 1987 demographic and health survey. Macro International, Columbia
- Child Development Tracker (2019). http://www.pbs.org/parents/childdevelopmenttracker/six/mathematics.html. Accessed Feb 4 2019
- Coale AJ, Rives NW (1973) A statistical reconstruction of the Black population of the United States 1880–1970: estimates of the true numbers by age and sex, birth rates, and total fertility. Popul Index 39(1):3–36
- Conant E (2001) Why Afghans don’t know their ages. Newsweek 29 November. https://www.newsweek.com/why-afghans-dont-know-their-ages-149749. Accessed 16 Jan 2019
- Dershowitz N, Reingold EM (2008) Calendrical calculations, 3rd edn. Cambridge University Press, Cambridge
- Ebeling M, Rau R, Baudisch A (2018) Rectangularization of the survival curve reconsidered: the maximum inner rectangle approach. Popul Stud 72(3):369–379
- Ewbank DC (1981) Age misreporting and age-selective underenumeration: sources, patterns, and consequences for demographic analysis. National Academy Press, Washington, DC
- Foldvari P, van Leeuwen B, van Leeuwen-Li J (2012) How did women count? A note on gender-specific age heaping differences in the sixteenth to nineteenth centuries. Econ Hist Rev 65(1):304–313
- Gagnon A, Miller MS, Hallman SA, Bourbeau R, Herring DA, Earn DJD, Madrenas J (2013) Age-specific mortality during the 1918 influenza pandemic: unravelling the mystery of high young adult mortality. PLoS One 8(8):1–9
- Harper K, Armelagos G (2010) The changing disease-scape in the third epidemiological transition. Int J Environ Res Public Health 7:675–697
- Hu MH (2002) Provenance in contest: searching for the origins of Jesuit astronomy in early Qing China, 1664–1703. Int Hist Rev 24(1):1–36
- Inkeles A (1969) Making men modern: on the causes and consequences of individual change in six developing countries. Am J Sociol 75(2):208–225
- Jdanov DA, Jasilionis D, Soroko EL, Rau R, Vaupel JW (2008) Beyond the Kannisto-Thatcher database on old age mortality: an assessment of data quality at advanced ages. Working paper 2008–13. Max Planck Institute for Demographic Research, Rostock
- Jowett AJ, Li YQ (1992) Age heaping: contrasting patterns from China. GeoJournal 28(4):427–442
- Li SZ, Sun FB (2003) Mortality analysis of China’s 2000 population census data: a preliminary examination. China Rev 3(2):31–48
- Myers RJ (1940) Errors and bias in the reporting of ages in census data. Trans Actuar Soc Am 41(2):104
- National Planning Commission [Government of Nepal] (2014) Population monograph of Nepal. Central Bureau of Statistics, Kathmandu
- Preston SH, Elo IT, Stewart Q (1999) Effects of age misreporting on mortality estimates at older ages. Popul Stud 53(2):165–177
- Royal Anthropological Institute of Great Britain and Ireland (1951) Notes and queries on anthropology, 6th edn. Routledge and Kegan Paul, London
- Saw SH (1967) Errors in Chinese age statistics. Demography 4(2):859–875
- Scanland P (1976) The age computer: a simple device for improving age computation in censuses and surveys. Public Health Rep 91(4):360–367
- Shryock HS, Siegel JS (1973) The methods and materials of demography. Academic, New York
- Sobek M (2016) Data prospects: IPUMS-International. In: White MJ (ed) International handbook of migration and population distribution. Springer, New York, pp 157–174
- Spoorenberg T (2007) Quality of age reporting: extension and application of the modified Whipple’s index. Popul Ecol 62(4):729–742
- Stiles CW (1915) Difficulties in obtaining ages. Public Health Rep 30(5):310–311
- Szoltysek M, Poniat R, Gruber S (2018) Age heaping patterns in MOSAIC data. Hist Methods 51(1):13–38
- Tollnek F, Baten J (2016) Age-heaping-based human capital estimates. In: Diebolt C, Haupert M (eds) Handbook of cliometrics. Springer, New York, pp 132–154
- Whipple GC (1919) Vital statistics: an introduction to the science of demography. Wiley, New York
- Wilmoth JR, Horiuchi S (1999) Rectangularization revisited: variability of age at death within human populations. Demography 36(4):475–495
- Zelnik M (1961) Age heaping in the United States census: 1880–1950. Milbank Q 39:540–573
- Zeng Y, Vaupel JW (2003) Oldest-old mortality in China. Demogr Res 8:215–244