Understanding User Behavior Through Log Data and Analysis

Abstract

HCI researchers are increasingly collecting rich behavioral traces of user interactions with online systems in situ at a scale not previously possible. These logs can be used to characterize user interactions with existing systems and compare different designs. Large-scale log studies give rise to new challenges in experimental design, data collection and interpretation, and ethics. The chapter discusses how to address these challenges using search engine logs, but the methods are applicable to other types of log data.

Author information

Correspondence to Susan Dumais.

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Dumais, S., Jeffries, R., Russell, D.M., Tang, D., Teevan, J. (2014). Understanding User Behavior Through Log Data and Analysis. In: Olson, J., Kellogg, W. (eds) Ways of Knowing in HCI. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0378-8_14

  • DOI: https://doi.org/10.1007/978-1-4939-0378-8_14

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-0377-1

  • Online ISBN: 978-1-4939-0378-8

  • eBook Packages: Computer Science (R0)
