Abstract
In the digital era, personal data such as full name, address, age, phone number, and household members may be scattered across administrative records, databases of private companies, and social networks, as well as in information contributed as volunteered geographic information (VGI). People-finder sites gather such data and provide user interfaces for Internet users to query web demographics. The emergence of web technologies such as mash-ups, web mapping, and web scraping presents opportunities to capitalize on the availability of web demographics and opens up new frontiers of research. The main objectives of this chapter are to (1) examine web demographics as an example of VGI and (2) explore the research agenda of web demographics. More research and development are needed to enhance extraction rules, identify and remove erroneous enumeration (e.g., duplicate, fictitious, and incomplete records), validate the coverage and accuracy of web demographics, and explore potential applications. Web demographics must be used cautiously in light of their uncertainty (e.g., the digital divide), privacy issues, and other societal impacts.
The title is partially adapted from Goss (1995) as a tribute to his early vision of “data merchants” acquiring massive amounts of personal demographic data, and of the instrumentation prospects of, and potential for resistance to, geodemographic marketing systems.
References
Anderson, M. J., & Fienberg, S. E. (1999). Who counts? The politics of census-taking in contemporary America. New York: Russell Sage.
Anderson, M. J., & Fienberg, S. E. (2002). Why is there still a controversy about adjusting the census for undercount? PSOnline, March, 83–85.
Armstrong, M. P., Rushton, G., & Zimmerman, D. L. (1999). Geographically masking health data to preserve confidentiality. Statistics in Medicine, 18, 497–525.
Barreto, M., DeFrancesco-Soto, V., Merolla, J., & Ramirez, R. (2008). Latino campaign ad experimental study. Los Angeles, CA.
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web (Resource document). Scientific American Magazine. http://www.scientificamerican.com/article.cfm?id=the-semantic-web. Accessed 26 May 2009.
Berry, J. W., Phinney, J. S., Sam, D. L., Vedder, P., et al. (2006). Immigrant youth: Acculturation, identity, and adaptation. Applied Psychology: An International Review, 55(3), 303–332.
Bhaduri, D. (2011). Enhancing resolution of population distribution data in spatial, temporal, and sociocultural dimensions: Advances and challenges (Resource document). Specialist meeting in future direction of spatial demography. http://ncgia.ucsb.edu/projects/spatial-demography/docs/Bhaduri-position.pdf. Accessed 2 Jan 2012.
Browning, C. R. (2011). Future directions in spatial demography (Resource document). Specialist meeting in future direction of spatial demography. http://ncgia.ucsb.edu/projects/spatial-demography/docs/Browning-position.pdf. Accessed 2 Jan 2012.
Chang, C., Kayed, M., Girgis, M., Shaalan, K., et al. (2006). A survey of web information extraction systems. IEEE Transactions on Knowledge and Data Engineering, 18, 1411–1428.
Cheng, X., Hu, Y., Chia, L.-T., et al. (2011). Exploiting local dependencies with spatial-scale space (S-Cube) for near-duplicate retrieval. Computer Vision and Image Understanding, 115(6), 750–758.
Chow, T. E. (2008). The potential of maps APIs for internet GIS. Transactions in GIS, 12(2), 179–191.
Chow, T. E. (2011). Geography 2.0: A mashup perspective. In S. Li, S. Dragicevic, & B. Veenendaal (Eds.), Advances in web-based GIS, mapping services and applications (pp. 15–36). Boca Raton: CRC Press.
Chow, T. E., Lin, Y., Huynh, N. T., Davis, J., et al. (2010). Using web demographics to model population change of Vietnamese-Americans in Texas between 2000–2009. GeoJournal. doi:10.1007/s10708-010-9390-6.
Chow, T. E., Lin, Y., Chan, W. D., et al. (2011). The development of a web-based demographic data extraction tool for population monitoring. Transactions in GIS, 15(4), 479–494.
Coleman, D. J., Georgiadou, Y., Labonte, J., et al. (2009). Volunteered geographic information: The nature and motivation of producers. International Journal of Spatial Data Infrastructures Research, 4, 332–358.
De Longueville, B. (2010). Community-based geoportals: The next generation? Concepts and methods for the geospatial Web 2.0. Computers, Environment and Urban Systems, 34(4), 299–308.
Dobson, J. E., & Fisher, P. F. (2003). Geoslavery. IEEE Technology and Society Magazine, 22(1), 47–52.
Egghe, L. (2005). Power laws in the information production process: Lotkaian informetrics. Amsterdam: Academic.
Elwood, S., & Leszczynski, A. (2011). Privacy, reconsidered: New representations, data practices, and the geoweb. Geoforum, 42(1), 6–15.
Firestone, S. M., Ward, M. P., Christley, R. M., Dhand, N. K., et al. (2011). The importance of location in contact networks: Describing early epidemic spread using spatial social network analysis. Preventive Veterinary Medicine, 102(2), 185–195.
Goodchild, M. F. (1989). Modeling errors in objects and fields. In M. F. Goodchild & S. Gopal (Eds.), The accuracy of spatial databases. New York: Taylor & Francis.
Goodchild, M. F. (2007). Citizens as voluntary sensors: Spatial data infrastructure in the world of web 2.0. International Journal of Spatial Data Infrastructures Research, 2, 24–32.
Goodchild, M. F. (2008). Spatial accuracy 2.0. Spatial Uncertainty: Proceedings of the Eighth International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, 1, 1–7.
Goss, J. (1995). We know who you are and we know where you live: The instrumental rationality of geodemographic systems. Economic Geography, 71, 171–198.
Gujjary, V. A., & Saxena, A. (2011). A neural network approach for data masking. Neurocomputing, 74(9), 1497–1501.
Hardy, D. (2010). The wikification of geospatial metadata (Resource document). Workshop on the “Role of Volunteered Geographic Information in Advancing Science”. http://www.ornl.gov/sci/gist/workshops/papers/Hardy.pdf. Accessed 21 July 2011.
Hugl, U. (2011). Reviewing person’s value of privacy of online social networking. Internet Research, 21(4), 1–17.
Kaplan, B., & Lasker, G. (1983). The present distribution of some English surnames derived from place names. Human Biology, 55(2), 243–250.
Kwan, M. P., Casas, I., & Schmitz, B. C. (2004). Protection of geoprivacy and accuracy of spatial information: How effective are geographical masks? Cartographica, 39, 15–28.
Landsbergen, D. (2004). Screen-level bureaucracy: Databases as public records. Government Information Quarterly, 21(1), 24–25.
Lasker, G. (1985). Surnames and genetic structure. Cambridge: Cambridge University Press.
Lasker, G., & Mascie-Taylor, C. (1990). Atlas of British surnames. Detroit: Wayne State University Press.
Lauderdale, D. S., & Kestenbaum, B. (2000). Asian-American ethnic identification by surname. Population Research and Policy Review, 19, 283–300.
Li, W., Liu, J., Wang, C., et al. (2005). Web document duplicate removal algorithm based on keyword sequences. In Proceedings of 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (pp. 511–516). Piscataway: IEEE.
Linberger, P., & White, G. (1998). Geographic information on the web: Extracting demographic and market research information. Proceedings of the Nineteenth Annual National Online Meeting, 19, 235–242.
Longley, P. A., Cheshire, J. A., Mateos, P., et al. (2011). Creating a regional geography of Britain through the spatial analysis of surnames. Geoforum, 42(4), 506–516.
Low, W. L., Lee, M. L., & Ling, T. W. (2001). A knowledge-based approach for duplicate elimination in data cleaning. Information Systems, 26(8), 585–606.
Métais, E. (2002). Enhancing information systems management with natural language processing techniques. Data and Knowledge Engineering, 41(2–3), 247–272.
Morgan, R. O., Wei, I. I., Virnig, B. A., et al. (2004). Improving identification of Hispanic males in Medicare: Use of surname matching. Medical Care, 42, 810–816.
Nagata, M. L. (1999). Why did you change your name? Name changing patterns and the life course in early modern Japan. The History of the Family, 4(3), 315–338.
Nasseri, K. (2007). Construction and validation of a list of common Middle Eastern surnames for epidemiological research. Cancer Detection and Prevention, 31, 424–429.
Newman, G., Graham, J., Crall, A., Laituri, M., et al. (2011). The art and science of multi-scale citizen science support. Ecological Informatics, 6(3–4), 217–227.
Oliva, J., Serrano, J. I., del Castillo, M. D., Iglesias, A., et al. (2011). SyMSS: A syntax-based measure for short-text semantic similarity. Data and Knowledge Engineering, 70(4), 390–405.
Perez-Stable, E. J., Hiatt, R. A., Sabogal, F., Otero-Sabogal, R., et al. (1995). Use of Spanish surnames to identify Latinos: Comparison to self-identification. Journal of the National Cancer Institute Monographs, 18, 11–15.
Perkins, R. C. (1993). Evaluating the Passel-Word Spanish surname list: 1990 decennial census post enumeration survey results (Resource document, U.S. Bureau of the Census, Population Division Working Paper No. 4). http://www.census.gov/population/www/documentation/twps0004.html. Accessed 15 July 2011.
Phillip, M. (2005). Why pay for value-added information? World Patent Information, 27(1), 7–11.
Prieger, J. E., & Hu, W. (2008). The broadband digital divide and the nexus of race, competition, and quality. Information Economics and Policy, 20(2), 150–167.
Quan, H., Wang, F., Schopflocher, D., Norris, C., Galbraith, P. D., Faris, P., Graham, M. M., Knudtson, M. L., & Ghali, W. A. (2006). Development and validation of a surname list to define Chinese ethnicity. Medical Care, 44, 328–333.
Robbin, A. (2001). The loss of personal privacy and its consequences for social research. Journal of Government Information, 28(5), 493–527.
Robinson, J. G., & Adlakha, A. (2002). Comparison of A.C.E. revision II results with demographic analysis (Resource document, U.S. Bureau of the Census, DSSD A.C.E. Revision II Estimates Memorandum Series #PP-41). http://www.census.gov/dmd/www/pdf/pp-41r.pdf. Accessed 12 July 2011.
Seeger, C. J. (2008). The role of facilitated volunteered geographic information in the landscape planning and site design process. GeoJournal, 72(3–4), 199–213.
Shah, B. R., Chiu, M., Amin, S., Ramani, M., Sadry, S., & Tu, J. V. (2010). Surname lists to identify South Asian and Chinese ethnicity from secondary data in Ontario, Canada: A validation study. BMC Medical Research Methodology, 10, 42. doi:10.1186/1471-2288-10-42.
Singer, E., Mathiowetz, N. A., & Couper, M. P. (1993). The impact of privacy and confidentiality concerns on survey participation: The case of the 1990 U.S. census. Public Opinion Quarterly, 57(4), 465–482.
Singleton, A. D., & Longley, P. A. (2009). Geodemographics, visualization, and social networks in applied geography. Applied Geography, 29(3), 289–298.
Sui, D. Z. (2008). The wikification of GIS and its consequences: Or Angelina Jolie’s new tattoo and the future of GIS. Computers, Environment and Urban Systems, 32(1), 1–5.
Swift, J. N., Goldberg, D. W., & Wilson, J. P. (2008). Geocoding best practices: Review of eight commonly used geocoding systems (Resource document, University of Southern California GIS Research Laboratory Technical Report No 10). http://spatial.usc.edu/Users/dan/gislabtr10_Eight-Commonly-Used-Geocoding-Systems.pdf. Accessed 2 Jan 2012.
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46, 234–240.
Tulloch, D. L. (2008). Is VGI participation? From vernal pools to video games. GeoJournal, 72(3–4), 161–171.
Uhlaner, C. J., Cain, B. E., & Kiewiet, D. R. (1989). Political participation of ethnic minorities in the 1980s. Political Behavior, 11(3), 195–231.
U.S. Census Bureau. (2010). Genealogy data: Frequently occurring surnames from Census 2000 (Resource document). http://www.census.gov/genealogy/www/data/2000surnames/index.html. Accessed 15 July 2011.
U.S. Government Accountability Office. (2001). Significant increase in cost per housing unit compared to 1990 (Resource document, GAO-02-31). http://www.gao.gov/new.items/d0231.pdf. Accessed 12 July 2011.
U.S. Government Accountability Office. (2008). Census Bureau should take action to improve the credibility and accuracy of its cost estimate for the decennial census (Resource document, GAO-08-554). http://www.gao.gov/new.items/d08554.pdf. Accessed 23 July 2011.
Wei, I. I., Virnig, B. A., John, D. A., & Morgan, R. O. (2006). Using a Spanish surname match to improve identification of Hispanic women in Medicare administrative data. Health Research and Educational Trust, 41(4), 1469–1481.
WhitePages. (2011). WhitePages privacy central (Resource document). http://www.whitepage.com/help/privacy_central. Accessed 20 July 2011.
Word, D. L., Coleman, C. D., Nunziata, R., & Kominski, R. (n.d.). Demographic aspects of surnames from Census 2000, genealogy data: Frequently occurring surnames from Census 2000 (Resource document, U.S. Census Bureau). http://www.census.gov/genealogy/www/data/2000surnames/surnames.pdf. Accessed 14 July 2011.
Wright, T. (2000). Census 2000: Who says counting is easy as 1–2–3? Government Information Quarterly, 17(2), 121–136.
Acknowledgments
The author is thankful to Yan Lin, who provided valuable assistance in preparing the statistics of web demographics acquired from several people-finder sites. Collaboration with colleagues and fellow students, including Niem Huynh, David Parr, John Davis, and Anne Ngu, on projects related to web demographics was instrumental to the ideas articulated in this manuscript. The author is indebted to Nancy Wilson, David Parr, and Niem Huynh for their editorial assistance and helpful reviews. The constructive comments from the reviewers greatly improved the quality of this manuscript. Any errors in the manuscript are solely the responsibility of the author.
Copyright information
© 2013 Springer Science+Business Media Dordrecht.
About this chapter
Cite this chapter
Chow, T.E. (2013). “We Know Who You Are and We Know Where You Live”: A Research Agenda for Web Demographics. In: Sui, D., Elwood, S., Goodchild, M. (eds) Crowdsourcing Geographic Knowledge. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-4587-2_15
DOI: https://doi.org/10.1007/978-94-007-4587-2_15
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-4586-5
Online ISBN: 978-94-007-4587-2
eBook Packages: Earth and Environmental Science (R0)