“We Know Who You Are and We Know Where You Live”: A Research Agenda for Web Demographics

Crowdsourcing Geographic Knowledge

Abstract

In the digital era, personal data such as full name, address, age, phone number, and household members may be scattered across administrative records, private company databases, and social networks, as well as information contributed as volunteered geographic information (VGI). People-finder sites gather such data and provide interfaces through which Internet users can query web demographics. The emergence of web technologies such as mash-ups, web mapping, and web scraping presents opportunities to capitalize on the availability of web demographics and opens up new frontiers of research. The main objectives of this chapter are to (1) examine web demographics as an example of VGI and (2) explore the research agenda of web demographics. More research and development are needed to enhance extraction rules, identify and remove erroneous enumeration (e.g., duplicate, fictitious, and incomplete records), validate the coverage and accuracy of web demographics, and explore potential applications. Web demographics must be used cautiously in light of their uncertainty (e.g., the digital divide), privacy issues, and other societal impacts.

This title is partially adapted from Goss (1995) as a tribute to his early vision of “data merchants” acquiring massive personal demographic data and the instrumentation prospects and potential for resistance to geodemographic marketing systems.

References

  • Anderson, M. J., & Fienberg, S. E. (1999). Who counts? The politics of census-taking in contemporary America. New York: Russell Sage.

  • Anderson, M. J., & Fienberg, S. E. (2002). Why is there still a controversy about adjusting the census for undercount? PSOnline, March, 83–85.

  • Armstrong, M. P., Rushton, G., Zimmerman, D. L., et al. (1999). Geographically masking health data to preserve confidentiality. Statistics in Medicine, 18, 497–525.

  • Barreto, M., DeFrancesco-Soto, V., Merolla, J., & Ramirez, R. (2008). Latino campaign ad experimental study. Los Angeles, CA.

  • Berners-Lee, T., Hendler, J., Lassila, O., et al. (2001). The semantic web (Resource document). Scientific American Magazine. http://www.scientificamerican.com/article.cfm?id=the-semantic-web. Accessed 26 May 2009.

  • Berry, J. W., Phinney, J. S., Sam, D. L., Vedder, P., et al. (2006). Immigrant youth: Acculturation, identity, and adaptation. Applied Psychology: An International Review, 55(3), 303–332.

  • Bhaduri, D. (2011). Enhancing resolution of population distribution data in spatial, temporal, and sociocultural dimensions: Advances and challenges (Resource document). Specialist meeting in future direction of spatial demography. http://ncgia.ucsb.edu/projects/spatial-demography/docs/Bhaduri-position.pdf. Accessed 2 Jan 2012.

  • Browning, C. R. (2011). Future directions in spatial demography (Resource document). Specialist meeting in future direction of spatial demography. http://ncgia.ucsb.edu/projects/spatial-demography/docs/Browning-position.pdf. Accessed 2 Jan 2012.

  • Chang, C., Kayed, M., Girgis, M., Shaalan, K., et al. (2006). A survey of web information extraction systems. IEEE Transactions on Knowledge and Data Engineering, 18, 1411–1428.

  • Cheng, X., Hu, Y., Chia, L.-T., et al. (2011). Exploiting local dependencies with spatial-scale space (S-Cube) for near-duplicate retrieval. Computer Vision and Image Understanding, 115(6), 750–758.

  • Chow, T. E. (2008). The potential of maps APIs for internet GIS. Transactions in GIS, 12(2), 179–191.

  • Chow, T. E. (2011). Geography 2.0: A mashup perspective. In S. Li, S. Dragicevic, & B. Veenendaal (Eds.), Advances in web-based GIS, mapping services and applications (pp. 15–36). Boca Raton: CRC Press.

  • Chow, T. E., Lin, Y., Huynh, N. T., Davis, J., et al. (2010). Using web demographics to model population change of Vietnamese-Americans in Texas between 2000–2009. GeoJournal. doi:10.1007/s10708-010-9390-6.

  • Chow, T. E., Lin, Y., Chan, W. D., et al. (2011). The development of a web-based demographic data extraction tool for population monitoring. Transactions in GIS, 15(4), 479–494.

  • Coleman, D. J., Georgiadou, Y., Labonte, J., et al. (2009). Volunteered geographic information: The nature and motivation of producers. International Journal of Spatial Data Infrastructures Research, 4, 332–358.

  • De Longueville, B. (2010). Community-based geoportals: The next generation? Concepts and methods for the geospatial Web 2.0. Computers, Environment and Urban Systems, 34(4), 299–308.

  • Dobson, J. E., & Fisher, P. F. (2003). Geoslavery. IEEE Technology and Society Magazine, 22(1), 47–52.

  • Egghe, L. (2005). Power laws in the information production process: Lotkaian informetrics. Amsterdam: Academic.

  • Elwood, S., & Leszczynski, A. (2011). Privacy, reconsidered: New representations, data practices, and the geoweb. Geoforum, 42(1), 6–15.

  • Firestone, S. M., Ward, M. P., Christley, R. M., Dhand, N. K., et al. (2011). The importance of location in contact networks: Describing early epidemic spread using spatial social network analysis. Preventive Veterinary Medicine, 102(2), 185–195.

  • Goodchild, M. F. (1989). Modeling errors in objects and fields. In M. F. Goodchild & S. Gopal (Eds.), The accuracy of spatial databases. New York: Taylor & Francis.

  • Goodchild, M. F. (2007). Citizens as voluntary sensors: Spatial data infrastructure in the world of web 2.0. International Journal of Spatial Data Infrastructures Research, 2, 24–32.

  • Goodchild, M. F. (2008). Spatial accuracy 2.0. Spatial Uncertainty: Proceedings of the Eighth International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, 1, 1–7.

  • Goss, J. (1995). We know who you are and we know where you live: The instrumental rationality of geodemographic systems. Economic Geography, 71, 171–198.

  • Gujjary, V. A., & Saxena, A. (2011). A neural network approach for data masking. Neurocomputing, 74(9), 1497–1501.

  • Hardy, D. (2010). The wikification of geospatial metadata (Resource document). Workshop on the “Role of Volunteered Geographic Information in Advancing Science”. http://www.ornl.gov/sci/gist/workshops/papers/Hardy.pdf. Accessed 21 July 2011.

  • Hugl, U. (2011). Reviewing person’s value of privacy of online social networking. Internet Research, 21(4), 1–17.

  • Kaplan, B., & Lasker, G. (1983). The present distribution of some English surnames derived from place names. Human Biology, 55(2), 243–250.

  • Kwan, M. P., Casas, I., Schmitz, B. C., et al. (2004). Protection of geoprivacy and accuracy of spatial information: How effective are geographical masks? Cartographica, 39, 15–28.

  • Landsbergen, D. (2004). Screen-level bureaucracy: Databases as public records. Government Information Quarterly, 21(1), 24–25.

  • Lasker, G. (1985). Surnames and genetic structure. Cambridge: Cambridge University Press.

  • Lasker, G., & Mascie-Taylor, C. (1990). Atlas of British surnames. Detroit: Wayne State University Press.

  • Lauderdale, D. S., & Kestenbaum, B. (2000). Asian-American ethnic identification by surname. Population Research and Policy Review, 19, 283–300.

  • Li, W., Liu, J., Wang, C., et al. (2005). Web document duplicate removal algorithm based on keyword sequences. In Proceedings of 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (pp. 511–516). Piscataway: IEEE.

  • Linberger, P., & White, G. (1998). Geographic information on the web: Extracting demographic and market research information. Proceedings of the Nineteenth Annual National Online Meeting, 19, 235–242.

  • Longley, P. A., Cheshire, J. A., Mateos, P., et al. (2011). Creating a regional geography of Britain through the spatial analysis of surnames. Geoforum, 42(4), 506–516.

  • Low, W. L., Lee, M. L., Ling, T. W., et al. (2001). A knowledge-based approach for duplicate elimination in data cleaning. Information Systems, 26(8), 585–606.

  • Métais, E. (2002). Enhancing information systems management with natural language processing techniques. Data and Knowledge Engineering, 41(2–3), 247–272.

  • Morgan, R. O., Wei, I. I., Virnig, B. A., et al. (2004). Improving identification of Hispanic males in Medicare: Use of surname matching. Medical Care, 42, 810–816.

  • Nagata, M. L. (1999). Why did you change your name? Name changing patterns and the life course in early modern Japan. The History of the Family, 4(3), 315–338.

  • Nasseri, K. (2007). Construction and validation of a list of common Middle Eastern surnames for epidemiological research. Cancer Detection and Prevention, 31, 424–429.

  • Newman, G., Graham, J., Crall, A., Laituri, M., et al. (2011). The art and science of multi-scale citizen science support. Ecological Informatics, 6(3–4), 217–227.

  • Oliva, J., Serrano, J. I., del Castillo, M. D., Iglesias, A., et al. (2011). SyMSS: A syntax-based measure for short-text semantic similarity. Data and Knowledge Engineering, 70(4), 390–405.

  • Perez-Stable, E. J., Hiatt, R. A., Sabogal, F., Otero-Sabogal, R., et al. (1995). Use of Spanish surnames to identify Latinos: Comparison to self-identification. Journal of the National Cancer Institute Monographs, 18, 11–15.

  • Perkins, R. C. (1993). Evaluating the Passel-Word Spanish surname list: 1990 decennial census post enumeration survey results (Resource document, U.S. Bureau of the Census, Population Division Working Paper No. 4). http://www.census.gov/population/www/documentation/twps0004.html. Accessed 15 July 2011.

  • Phillip, M. (2005). Why pay for value-added information? World Patent Information, 27(1), 7–11.

  • Prieger, J. E., & Hu, W. (2008). The broadband digital divide and the nexus of race, competition, and quality. Information Economics and Policy, 20(2), 150–167.

  • Quan, H., Wang, F., Schopflocher, D., Norris, C., Galbraith, P. D., Faris, P., Graham, M. M., Knudtson, M. L., Ghali, W. A., et al. (2006). Development and validation of a surname list to define Chinese ethnicity. Medical Care, 44, 328–333.

  • Robbin, A. (2001). The loss of personal privacy and its consequences for social research. Journal of Government Information, 28(5), 493–527.

  • Robinson, J. G., & Adlakha, A. (2002). Comparison of A.C.E. revision II results with demographic analysis (Resource document, U.S. Bureau of the Census, DSSD A.C.E. Revision II Estimates Memorandum Series #PP-41). http://www.census.gov/dmd/www/pdf/pp-41r.pdf. Accessed 12 July 2011.

  • Seeger, C. J. (2008). The role of facilitated volunteered geographic information in the landscape planning and site design process. GeoJournal, 72(3–4), 199–213.

  • Shah, B. R., Chiu, M., Amin, S., Ramani, M., Sadry, S., Tu, J. V., et al. (2010). Surname lists to identify South Asian and Chinese ethnicity from secondary data in Ontario, Canada: A validation study. BMC Medical Research Methodology, 10, 42. doi:10.1186/1471-2288-10-42.

  • Singer, E., Mathiowetz, N. A., & Couper, M. P. (1993). The impact of privacy and confidentiality concerns on survey participation: The case of the 1990 U.S. census. Public Opinion Quarterly, 57(4), 465–482.

  • Singleton, A. D., & Longley, P. A. (2009). Geodemographics, visualization, and social networks in applied geography. Applied Geography, 29(3), 289–298.

  • Sui, D. Z. (2008). The wikification of GIS and its consequences: Or Angelina Jolie’s new tattoo and the future of GIS. Computers Environment and Urban Systems, 32(1), 1–5.

  • Swift, J. N., Goldberg, D. W., & Wilson, J. P. (2008). Geocoding best practices: Review of eight commonly used geocoding systems (Resource document, University of Southern California GIS Research Laboratory Technical Report No 10). http://spatial.usc.edu/Users/dan/gislabtr10_Eight-Commonly-Used-Geocoding-Systems.pdf. Accessed 2 Jan 2012.

  • Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46, 234–240.

  • Tulloch, D. L. (2008). Is VGI participation? From vernal pools to video games. GeoJournal, 72(3–4), 161–171.

  • Uhlaner, C. J., Cain, B. E., & Kiewiet, D. R. (1989). Political participation of ethnic minorities in the 1980s. Political Behavior, 11(3), 195–231.

  • U.S. Census Bureau. (2010). Genealogy data: Frequently occurring surnames from Census 2000 (Resource document). http://www.census.gov/genealogy/www/data/2000surnames/index.html. Accessed 15 July 2011.

  • U.S. Government Accountability Office. (2001). Significant increase in cost per housing unit compared to 1990 (Resource document. GAO-02–31). http://www.gao.gov/new.items/d0231.pdf. Accessed 12 July 2011.

  • U.S. Government Accountability Office. (2008). Census Bureau should take action to improve the credibility and accuracy of its cost estimate for the decennial census (Resource document. GAO-08–554). http://www.gao.gov/new.items/d08554.pdf. Accessed 23 July 2011.

  • Wei, I. I., Virnig, B. A., John, D. A., & Morgan, R. O. (2006). Using a Spanish surname match to improve identification of Hispanic women in Medicare administrative data. Health Research and Educational Trust, 41(4), 1469–1481.

  • WhitePages. (2011). WhitePages privacy central (Resource document). http://www.whitepage.com/help/privacy_central. Accessed 20 July 2011.

  • Word, D. L., Coleman, C. D., Nunziata, R., Kominski, R., et al. (n.d.). Demographic aspects of surname from Census 2000, genealogy data: Frequent occurring surnames from Census 2000 (Resource document. US Census Bureau). http://www.census.gov/genealogy/www/data/2000surnames/surnames.pdf. Accessed 14 July 2011.

  • Wright, T. (2000). Census 2000: Who says counting is easy as 1–2–3? Government Information Quarterly, 17(2), 121–136.

Acknowledgments

The author is thankful to Yan Lin, who provided valuable assistance in preparing the statistics of web demographics acquired from several people-finder sites. Collaboration with colleagues and fellow students, including Niem Huynh, David Parr, John Davis, and Anne Ngu, on projects related to web demographics was instrumental to the ideas articulated in this manuscript. The author is indebted to Nancy Wilson, David Parr, and Niem Huynh for their editorial assistance and helpful reviews. The constructive comments from the reviewers greatly improved the quality of this manuscript. Any errors in the manuscript are solely the responsibility of the author.

Author information

Corresponding author

Correspondence to T. Edwin Chow.

Copyright information

© 2013 Springer Science+Business Media Dordrecht.

About this chapter

Cite this chapter

Chow, T.E. (2013). “We Know Who You Are and We Know Where You Live”: A Research Agenda for Web Demographics. In: Sui, D., Elwood, S., Goodchild, M. (eds) Crowdsourcing Geographic Knowledge. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-4587-2_15