Abstract
Several approaches have been proposed to classify populations into ethnic groups using people’s names, as an alternative to self-identified ethnicity when that information is not available. These methodologies have been developed primarily in the public health literature of different countries, in isolation from, and with little participation by, demographers or social scientists. This chapter brings together these isolated efforts and provides a coherent comparison and a common methodology and terminology. It presents a systematic review of the most representative studies that develop new name-based ethnicity classifications, extracting their methodological commonalities, achievements and shortcomings. Their current limitations are mainly due to the restricted number of names and the partial spatio-temporal coverage of the reference population datasets used to produce name reference lists. The chapter concludes with a review of unconventional computational approaches that set the baseline for the development of an innovative name classification methodology in the next chapter (Chap. 7).
“The classificatory role of names proves very useful. By studying names we can find out how the human race divides up and then sort into groups the many people living in a single society” (Smith-Bannister 1997: 15)
This chapter is partly based on material previously published in: Mateos P. 2007. A Review of Name-based Ethnicity Classification Methods and their Potential in Population Studies. Population Space and Place 13(4): 243–263. Parts of the text, tables and figures from this article are reproduced here with permission of the journal publisher.
References
Abbotts J, Williams R, Smith GD (1999) Association of medical, physiological, behavioural and socio-economic factors with elevated mortality in men of Irish heritage in West Scotland. J Public Health Med 21(1):46–54
Abrahamse AF, Morrison PA, Bolton NM (1994) Surname analysis for estimating local concentration of Hispanics and Asians. Popul Res Policy Rev 13:383–398
Adebayo C, Mitchell P (2005) Patient profiling. Presented at GEONom, London, 25 May. Available at http://www.casa.ucl.ac.uk/geonom/Initial_meeting. Accessed 12 May 2006
Bhopal R, Fischbacher C, Steiner M, Chalmers J, Povey C et al (2004) Ethnicity and health in Scotland: can we fill the information gap? Centre for Public Health and Primary Care Research, University of Edinburgh. Available at http://www.chs.med.ed.ac.uk/phs/research/Retrocoding%20final%20report.pdf. Accessed 22 Nov 2005
Bonaventura P, Gori M, Maggini M, Scarselli F, Sheng J (2003) A hybrid model for the prediction of the linguistic origin of surnames. IEEE Trans Knowl Data Eng 15(3):760–763
Bouwhuis CB, Moll HA (2003) Determination of ethnicity in children in the Netherlands: two methods compared. Eur J Epidemiol 18(5):385–388
Buechley RW (1961) A reproducible method of counting persons of Spanish surname. J Am Stat Assoc 56(293):88–97
Buechley RW (1967) Characteristic name sets of Spanish populations. Names 15:53–69
Buechley RW (1976) Generally useful ethnic search system: GUESS (mimeo). Cancer Research and Treatment Center, University of New Mexico, Albuquerque
Buechley RW, Dunn J, Linden G, Breslow L (1957) Excess lung cancer mortality rates among Mexican women in California. Cancer 10:63–66
Chaudhry S, Fink A, Gelberg L, Brook R (2003) Utilization of Papanicolaou smears by South Asian women living in the United States. J Gen Intern Med 18(5):377–384
Choi BCK, Hanley AJ, Holowaty EJ, Dale D (1993) Use of surnames to identify individuals of Chinese ancestry. Am J Epidemiol 138:723–734
Coldman AJ, Braun T, Gallagher RP (1988) The classification of ethnic status using name information. J Epidemiol Community Health 42(4):390–395
Cook D, Hewitt D, Milner J (1972) Uses of the surname in epidemiologic research. Am J Epidemiol 95:38–45
Coronado GD, Koepsell TD, Thompson B, Schwartz SM, Wharton RS et al (2002) Assessing cervical cancer risk in Hispanics. Cancer Epidemiol Biomark Prev 11(10 Pt 1):979–984
Cummins C, Winter H, Cheng K-K, Maric R, Silcocks P et al (1999) An assessment of the Nam Pehchan computer program for the identification of names of south Asian ethnic origin. J Public Health Med 21(4):401–406
Elliott MN, Fremont A, Morrison PA, Pantoja P, Lurie N (2008) A new method for estimating race/ethnicity and associated disparities where administrative records lack self-reported race/ethnicity. Health Serv Res 43(5):1722–1736
Fernandez EW (1975) Comparison of persons of Spanish surname and persons of Spanish origin in the United States. Technical Paper No. 38. U.S. Bureau of the Census, Washington
Fucilla JG (1943) The anglicization of Italian surnames in the United States. Am Speech 18(1):26–32
Hage BH, Oliver RG, Powles JW, Wahlqvist ML (1990) Telephone directory listings of presumptive Chinese surnames: an appropriate sampling frame for a dispersed population with characteristic surnames. Epidemiology 1(5):405–408
Hanks P, Tucker DK (2000) A diagnostic database of American personal names. Names 48(1):59–69
Harding S, Dews H, Simpson S (1999) The potential to identify South Asians using a computerised algorithm to classify names. Popul Trends 97:46–50
Harland JO, White M, Bhopal RS (1997) Identifying Chinese populations in the UK for epidemiological research: experience of a name analysis of the FHSA register. Family Health Services Authority. Public Health 111:331–337
Himmelfarb HS, Loar RM, Mott SH (1983) Sampling by ethnic surnames: the case of American Jews. Public Opin Q 47:247–260
Hinton L, Jenkins CN, McPhee S, Wong C, Lai KQ et al (1998) A survey of depressive symptoms among Vietnamese-American men in three locales: prevalence and correlates. J Nerv Ment Dis 186(11):677–683
Hofstetter CR, Hovell MF, Lee J, Zakarian J, Park H et al (2004) Tobacco use and acculturation among Californians of Korean descent: a behavioral epidemiological analysis. Nicotine Tob Res 6(3):481–489
Honer D (2004) Identifying ethnicity: a comparison of two computer programmes designed to identify names of South Asian ethnic origin. UK Centre for Evidence in Ethnicity Health & Diversity, University of Warwick. Available at http://www2.warwick.ac.uk/fac/med/research/csri/ethnicityhealth/aspects_diversity/identifying_ethnicity/. Accessed 22 Jun 2006
Humpert A, Schneiderheinze K (2000) Stichprobenziehung für telefonische Zuwandererumfragen. Einsatzmöglichkeiten der Namenforschung. ZUMA-Nachrichten 24(47):36–64
Jobling MA (2001) In the name of the father: surnames and genetics. Trends Genet 17(6):353–357
Kimmerle MM (1942) Norwegian-American surnames in transition. Am Speech 17(3):158–165
Kitano HH, Lubben JE, Chi I (1988) Predicting Japanese American drinking behavior. Int J Addict 23(4):417–428
Kolehmainen JI (1939) Finnish surnames in America. Am Speech 14(1):33–38
Lai DW (2004) Impact of culture on depressive symptoms of elderly Chinese immigrants. Can J Psychiatry 49(12):820–827
Lasker GW (1985) Surnames and genetic structure. Cambridge University Press, Cambridge
Lasker G (1997) Census versus sample data in isonymy studies: relationship at short distances. Hum Biol 69(5):733–738
Lauderdale DS (2006) Birth outcomes for Arabic-named women in California before and after September 11. Demography 43(1):185–201
Lauderdale D, Kestenbaum B (2000) Asian American ethnic identification by surname. Popul Res Policy Rev 19(3):283–300
Linguistic Minorities Project (1985) The other languages of England. Routledge & Kegan Paul, London
Longley PA, Maguire DJ, Goodchild MF, Rhind D (2005) Geographic information systems and science. Wiley, Chichester
Lyra F (1966) Polish surnames in the United States. Am Speech 41(1):39–44
Martineau A, White M (1998) What’s not in a name. The accuracy of using names to ascribe religious and geographical origin in a British population. J Epidemiol Community Health 52(5):336–337
Mateos P (2007) A review of name-based ethnicity classification methods and their potential in population studies. Popul Space Place 13(4):243–263
Nanchahal K, Mangtani P, Alston M, dos Santos Silva I (2001) Development and validation of a computerized South Asian Names and Group Recognition Algorithm (SANGRA) for use in British Health-related studies. J Public Health Med 23(4):278–285
Nicoll A, Bassett K, Ulijaszek SJ (1986) What’s in a name? Accuracy of using surnames and forenames in ascribing Asian ethnic identity in English populations. J Epidemiol Community Health 40(4):364–368
Passel JS, Word DL (1980) Constructing the list of Spanish surnames for the 1980 Census: an application of Bayes’ theorem. Presented at annual meeting of the Population Association of America, Denver, CO, April 1980
Passel JS, Word DL, McKenney ND, Kim Y (1982) Postcensal estimates of the Asian population in the United States: description of methods using surname and administrative records. Presented at annual meeting of the Population Association of America, San Diego, CA, April 1982
Peach C, Owen D (2004) Social geography of British South Asian Muslim, Sikh and Hindu sub-communities. ESRC end of project full report R-000239765. Available at http://www.esrcsocietytoday.ac.uk/ESRCInfoCentre/ (search for “R-000239765”). Accessed 15 Aug 2006
Perkins RC (1993) Evaluating the Passel-Word Spanish surname list: 1990 decennial census post enumeration survey results. Technical Working Paper 4. US Bureau of the Census, Population Division, Washington, DC. Available at http://www.census.gov/population/www/documentation/twps0004.html. Accessed 29 May 2005
Petersen W (2001) Surnames in US population records. Popul Dev Rev 27(2):315
Rahman MM, Luong NT, Divan HA, Jesser C, Golz SD et al (2005) Prevalence and predictors of smoking behavior among Vietnamese men living in California. Nicotine Tob Res 7(1):103–109
Razum O, Zeeb H, Akgun S (2001) How useful is a name-based algorithm in health research among Turkish migrants in Germany? Trop Med Int Health 6(8):654–661
Rissel C, Ward JE, Jorm L (1999) Estimates of smoking and related behaviour in an immigrant Lebanese community: does survey method matter? Aust N Z J Public Health 23(5):534–537
Senior PA, Bhopal R (1994) Ethnicity as a variable in epidemiological research. Br Med J 309(6950):327–330
Sheth T, Nair C, Nargundkar M, Anand S, Yusuf S (1999) Cardiovascular and cancer mortality among Canadians of European, south Asian and Chinese origin from 1979 to 1993: an analysis of 1.2 million deaths. Can Med Assoc J 161:132–138
Smith-Bannister S (1997) Names and naming patterns in England 1538–1700. Clarendon Press, Oxford
Stewart SL, Swallen KC, Glaser SL, Horn-Ross PL, West DW (1999) Comparison of methods for classifying Hispanic ethnicity in a population-based cancer registry. Am J Epidemiol 149(11):1063–1071
The Economist (2007) What’s in a name? The Economist, Technology Quarterly Survey, 10 March: 27
Tu SP, Yasui Y, Kuniyuki A, Schwartz SM, Jackson JC et al (2002) Breast cancer screening: stages of adoption among Cambodian American women. Cancer Detect Prev 26(1):33–41
Tucker DK (2003) Surnames, forenames and correlations. In: Hanks P (ed) Dictionary of American family names. Oxford University Press, New York, pp xxiii–xxvii
US Bureau of the Census (1953) Persons of Spanish surname. US Census of Population: 1950, vol IV, Special Report P-E, No. 3C, U.S. Department of Commerce. US Government Printing Office, Washington, DC
US Bureau of the Census (1963) U.S. Census of Population: 1960. Subject reports, Persons of Spanish surname. U.S. Government Printing Office, Washington, DC
US Bureau of the Census (1980) 1980 census of population and housing: Spanish list technical documentation. Data User Services Division, Washington, DC
US Census Bureau (2006) US Census Bureau genealogy resources. Available at http://www.census.gov/genealogy/www/. Accessed 12 May 2006
Williams K (2007) US Patent application: name classifier algorithm. Available at http://www.uspto.gov (search for patent number ‘20070005597’). Accessed 19 Mar 2007
Williams K, Patman F (2005) Personal entity extraction filtering using name data stores. Presented at international conference on intelligence analysis. McLean, VA, 2–6 May. Available at https://analysis.mitre.org/proceedings/Final_Papers_Files/33_Camera_Ready_Paper.pdf. Accessed 26 May 2006
Winnie WW Jr (1960) The Spanish surname criterion for identifying Hispanos in the southwestern United States: a preliminary evaluation. Soc Forces 38(4):363–366
Word DL, Perkins RC (1996) Building a Spanish surname list for the 1990s: a new approach to an old problem. Technical Working Paper 13. US Census Bureau, Population Division, Washington, DC. Available at http://www.census.gov/population/documentation/twpno13.pdf. Accessed 29 May 2005
Word DL, Passel JS, Causey BD, Fernandez EF (1978) Determining a list of Spanish surnames by analysis of geographical distributions. Presented at annual meeting of southern regional demographic group, San Antonio, TX, October
Yavari P, Hislop TG, Abanto Z (2005) Methodology to identify Iranian immigrants for epidemiological studies. Asian Pac J Cancer Prev 6(4):455–457
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Mateos, P. (2014). Classifying Ethnicity Through People’s Names. In: Names, Ethnicity and Populations. Advances in Spatial Science. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45413-4_6
DOI: https://doi.org/10.1007/978-3-642-45413-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45412-7
Online ISBN: 978-3-642-45413-4
eBook Packages: Business and Economics, Economics and Finance (R0)