Classifying Ethnicity Through People’s Names

Names, Ethnicity and Populations

Part of the book series: Advances in Spatial Science (ADVSPATIAL)

Abstract

Several approaches have been proposed to classify populations into ethnic groups using people’s names, as an alternative to self-identified ethnicity when that information is not available. These methodologies have been developed primarily in the public health literature, in different countries, in isolation from, and with little participation by, demographers or social scientists. This chapter brings together these isolated efforts and provides a coherent comparison with a common methodology and terminology. It presents a systematic review of the most representative studies that develop new name-based ethnicity classifications, extracting their methodological commonalities, achievements and shortcomings. Their current limitations stem mainly from the restricted number of names and the partial spatio-temporal coverage of the reference population datasets used to produce name reference lists. The chapter concludes with a review of unconventional computational approaches that set the baseline for the innovative name classification methodology developed in the next chapter (Chap. 7).

“The classificatory role of names proves very useful. By studying names we can find out how the human race divides up and then sort into groups the many people living in a single society” (Smith-Bannister 1997: 15)

This chapter is partly based on material previously published in: Mateos P. 2007. A Review of Name-based Ethnicity Classification Methods and their Potential in Population Studies. Population Space and Place 13(4): 243–263. Part of the text, tables and figures from this article is reproduced here with permission of the journal publisher.

References

  • Abbotts J, Williams R, Smith GD (1999) Association of medical, physiological, behavioural and socio-economic factors with elevated mortality in men of Irish heritage in West Scotland. J Public Health Med 21(1):46–54

  • Abrahamse AF, Morrison PA, Bolton NM (1994) Surname analysis for estimating local concentration of Hispanics and Asians. Popul Res Policy Rev 13:383–398

  • Adebayo C, Mitchell P (2005) Patient profiling. Presented at GEONom, London, 25 May. Available at http://www.casa.ucl.ac.uk/geonom/Initial_meeting. Accessed 12 May 2006

  • Bhopal R, Fischbacher C, Steiner M, Chalmers J, Povey C et al (2004) Ethnicity and health in Scotland: can we fill the information gap? Centre for Public Health and Primary Care Research, University of Edinburgh. Available at http://www.chs.med.ed.ac.uk/phs/research/Retrocoding%20final%20report.pdf. Accessed 22 Nov 2005

  • Bonaventura P, Gori M, Maggini M, Scarselli F, Sheng J (2003) A hybrid model for the prediction of the linguistic origin of surnames. IEEE Trans Knowl Data Eng 15(3):760–763

  • Bouwhuis CB, Moll HA (2003) Determination of ethnicity in children in the Netherlands: two methods compared. Eur J Epidemiol 18(5):385–388

  • Buechley RW (1961) A reproducible method of counting persons of Spanish surname. J Am Stat Assoc 56(293):88–97

  • Buechley RW (1967) Characteristic name sets of Spanish populations. Names 15:53–69

  • Buechley RW (1976) Generally useful ethnic search system: GUESS (mimeo). Cancer Research and Treatment Center, University of New Mexico, Albuquerque

  • Buechley RW, Dunn J, Linden G, Breslow L (1957) Excess lung cancer mortality rates among Mexican women in California. Cancer 10:63–66

  • Chaudhry S, Fink A, Gelberg L, Brook R (2003) Utilization of papanicolaou smears by South Asian women living in the United States. J Gen Intern Med 18(5):377–384

  • Choi BCK, Hanley AJ, Holowaty EJ, Dale D (1993) Use of surnames to identify individuals of Chinese ancestry. Am J Epidemiol 138:723–734

  • Coldman AJ, Braun T, Gallagher RP (1988) The classification of ethnic status using name information. J Epidemiol Community Health 42(4):390–395

  • Cook D, Hewitt D, Milner J (1972) Uses of the surname in epidemiologic research. Am J Epidemiol 95:38–45

  • Coronado GD, Koepsell TD, Thompson B, Schwartz SM, Wharton RS et al (2002) Assessing cervical cancer risk in Hispanics. Cancer Epidemiol Biomark Prev 11(10 Pt 1):979–984

  • Cummins C, Winter H, Cheng K-K, Maric R, Silcocks P et al (1999) An assessment of the Nam Pehchan computer program for the identification of names of south Asian ethnic origin. J Public Health Med 21(4):401–406

  • Elliott MN, Fremont A, Morrison PA, Pantoja P, Lurie N (2008) A new method for estimating race/ethnicity and associated disparities where administrative records lack self-reported race/ethnicity. Health Serv Res 43(5):1722–1736

  • Fernandez EW (1975) Comparison of persons of Spanish surname and persons of Spanish origin in the United States. Technical Paper No. 38. U.S. Bureau of the Census, Washington

  • Fucilla JG (1943) The anglicization of Italian surnames in the United States. Am Speech 18(1):26–32

  • Hage BH, Oliver RG, Powles JW, Wahlqvist ML (1990) Telephone directory listings of presumptive Chinese surnames: an appropriate sampling frame for a dispersed population with characteristic surnames. Epidemiology 1(5):405–408

  • Hanks P, Tucker DK (2000) A diagnostic database of American personal names. Names 48(1):59–69

  • Harding S, Dews H, Simpson S (1999) The potential to identify South Asians using a computerised algorithm to classify names. Popul Trends 97:46–50

  • Harland JO, White M, Bhopal RS (1997) Identifying Chinese populations in the UK for epidemiological research: experience of a name analysis of the FHSA register. Family Health Services Authority. Public Health 111:331–337

  • Himmelfarb HS, Loar RM, Mott SH (1983) Sampling by ethnic surnames: the case of American Jews. Public Opin Q 47:247–260

  • Hinton L, Jenkins CN, McPhee S, Wong C, Lai KQ et al (1998) A survey of depressive symptoms among Vietnamese-American men in three locales: prevalence and correlates. J Nerv Ment Dis 186(11):677–683

  • Hofstetter CR, Hovell MF, Lee J, Zakarian J, Park H et al (2004) Tobacco use and acculturation among Californians of Korean descent: a behavioral epidemiological analysis. Nicotine Tob Res 6(3):481–489

  • Honer D (2004) Identifying ethnicity: a comparison of two computer programmes designed to identify names of South Asian ethnic origin. UK Centre for Evidence in Ethnicity Health & Diversity, University of Warwick. Available at http://www2.warwick.ac.uk/fac/med/research/csri/ethnicityhealth/aspects_diversity/identifying_ethnicity/. Accessed 22 Jun 2006

  • Humpert A, Schneiderheinze K (2000) Stichprobenziehung für telefonische zuwandererumfragen. Einsatzmöglichkeiten der namenforschung. ZUMA-Nachrichten 24(47):36–64

  • Jobling MA (2001) In the name of the father: surnames and genetics. Trends Genet 17(6):353–357

  • Kimmerle MM (1942) Norwegian-American surnames in transition. Am Speech 17(3):158–165

  • Kitano HH, Lubben JE, Chi I (1988) Predicting Japanese American drinking behavior. Int J Addict 23(4):417–428

  • Kolehmainen JI (1939) Finnish surnames in America. Am Speech 14(1):33–38

  • Lai DW (2004) Impact of culture on depressive symptoms of elderly Chinese immigrants. Can J Psychiatry 49(12):820–827

  • Lasker GW (1985) Surnames and genetic structure. Cambridge University Press, Cambridge

  • Lasker G (1997) Census versus sample data in isonymy studies: relationship at short distances. Hum Biol 69(5):733–738

  • Lauderdale DS (2006) Birth outcomes for Arabic-named women in California before and after September 11. Demography 43(1):185–201

  • Lauderdale D, Kestenbaum B (2000) Asian American ethnic identification by surname. Popul Res Policy Rev 19(3):283–300

  • Linguistic Minorities Project (1985) The other languages of England. Routledge & Kegan Paul, London

  • Longley PA, Maguire DJ, Goodchild MF, Rhind D (2005) Geographic information systems and science. Wiley, Chichester

  • Lyra F (1966) Polish surnames in the United States. Am Speech 41(1):39–44

  • Martineau A, White M (1998) What’s not in a name. The accuracy of using names to ascribe religious and geographical origin in a British population. J Epidemiol Community Health 52(5):336–337

  • Mateos P (2007) A review of name-based ethnicity classification methods and their potential in population studies. Popul Space Place 13(4):243–263

  • Nanchahal K, Mangtani P, Alston M, dos Santos Silva I (2001) Development and validation of a computerized South Asian Names and Group Recognition Algorithm (SANGRA) for use in British Health-related studies. J Public Health Med 23(4):278–285

  • Nicoll A, Bassett K, Ulijaszek SJ (1986) What’s in a name? Accuracy of using surnames and forenames in ascribing Asian ethnic identity in English populations. J Epidemiol Community Health 40(4):364–368

  • Passel JS, Word DL (1980) Constructing the list of Spanish surnames for the 1980 Census: an application of Bayes’ theorem. Presented at annual meeting of the Population Association of America, Denver, CO, April 1980

  • Passel JS, Word DL, McKenney ND, Kim Y (1982) Postcensal estimates of the Asian population in the United States: description of methods using surname and administrative records. Presented at annual meeting of the Population Association of America, San Diego, CA, April 1982

  • Peach C, Owen D (2004) Social geography of British South Asian Muslim, Sikh and Hindu sub-communities. ESRC end of project full report R-000239765. Available at http://www.esrcsocietytoday.ac.uk/ESRCInfoCentre/ (search for “R-000239765”). Accessed 15 Aug 2006

  • Perkins RC (1993) Evaluating the Passel-Word Spanish surname list: 1990 decennial census post enumeration survey results. Technical Working Paper 4. US Bureau of the Census, Population Division, Washington, DC. Available at http://www.census.gov/population/www/documentation/twps0004.html. Accessed 29 May 2005

  • Petersen W (2001) Surnames in US population records. Popul Dev Rev 27(2):315

  • Rahman MM, Luong NT, Divan HA, Jesser C, Golz SD et al (2005) Prevalence and predictors of smoking behavior among Vietnamese men living in California. Nicotine Tob Res 7(1):103–109

  • Razum O, Zeeb H, Akgun S (2001) How useful is a name-based algorithm in health research among Turkish migrants in Germany? Trop Med Int Health 6(8):654–661

  • Rissel C, Ward JE, Jorm L (1999) Estimates of smoking and related behaviour in an immigrant Lebanese community: does survey method matter? Aust N Z J Public Health 23(5):534–537

  • Senior PA, Bhopal R (1994) Ethnicity as a variable in epidemiological research. Br Med J 309(6950):327–330

  • Sheth T, Nair C, Nargundkar M, Anand S, Yusuf S (1999) Cardiovascular and cancer mortality among Canadians of European, south Asian and Chinese origin from 1979 to 1993: an analysis of 1.2 million deaths. Can Med Assoc J 161:132–138

  • Smith-Bannister S (1997) Names and naming patterns in England 1538–1700. Clarendon Press, Oxford

  • Stewart SL, Swallen KC, Glaser SL, Horn-Ross PL, West DW (1999) Comparison of methods for classifying Hispanic ethnicity in a population-based cancer registry. Am J Epidemiol 149(11):1063–1071

  • The Economist (2007) What’s in a name? The Economist, Technology Quarterly Survey, 10 March: 27

  • Tu SP, Yasui Y, Kuniyuki A, Schwartz SM, Jackson JC et al (2002) Breast cancer screening: stages of adoption among Cambodian American women. Cancer Detect Prev 26(1):33–41

  • Tucker DK (2003) Surnames, forenames and correlations. In: Hanks P (ed) Dictionary of American family names. Oxford University Press, New York, pp xxiii–xxvii

  • US Bureau of the Census (1953) Persons of Spanish surname. US Census of Population: 1950, vol IV, Special Report P-E, No. 3C, U.S. Department of Commerce. US Government Printing Office, Washington, DC

  • US Bureau of the Census (1963) U.S. Census of Population: 1960. Subject reports, Persons of Spanish surname. U.S. Government Printing Office, Washington, DC

  • US Bureau of the Census (1980) 1980 census of population and housing: Spanish list technical documentation. Data User Services Division, Washington, DC

  • US Census Bureau (2006) US Census Bureau genealogy resources. Available at http://www.census.gov/genealogy/www/. Accessed 12 May 2006

  • Williams K (2007) US Patent application: name classifier algorithm. Available at http://www.uspto.gov (search for patent number ‘20070005597’). Accessed 19 Mar 2007

  • Williams K, Patman F (2005) Personal entity extraction filtering using name data stores. Presented at international conference on intelligence analysis. McLean, VA, 2–6 May. Available at https://analysis.mitre.org/proceedings/Final_Papers_Files/33_Camera_Ready_Paper.pdf. Accessed 26 May 2006

  • Winnie WW Jr (1960) The Spanish surname criterion for identifying Hispanos in the southwestern United States: a preliminary evaluation. Soc Forces 38(4):363–366

  • Word DL, Perkins RC (1996) Building a Spanish surname list for the 1990s: a new approach to an old problem. Technical Working Paper 13. US Census Bureau, Population Division, Washington, DC. Available at http://www.census.gov/population/documentation/twpno13.pdf. Accessed 29 May 2005

  • Word DL, Passel JS, Causey BD, Fernandez EF (1978) Determining a list of Spanish surnames by analysis of geographical distributions. Presented at annual meeting of southern regional demographic group, San Antonio, TX, October

  • Yavari P, Hislop TG, Abanto Z (2005) Methodology to identify Iranian immigrants for epidemiological studies. Asian Pac J Cancer Prev 6(4):455–457

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mateos, P. (2014). Classifying Ethnicity Through People’s Names. In: Names, Ethnicity and Populations. Advances in Spatial Science. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45413-4_6

  • DOI: https://doi.org/10.1007/978-3-642-45413-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45412-7

  • Online ISBN: 978-3-642-45413-4

  • eBook Packages: Business and Economics, Economics and Finance (R0)