Search query formation by strategic consumers

Abstract

Submitting queries to search engines has become a major way for consumers to search for information and products. The massive amount of search query data available today has the potential to provide valuable information on consumer preferences. In order to unlock this potential, it is necessary to understand how consumers translate their preferences into search queries. Strategic consumers should attempt to maximize the information content of the search results, conditional on a set of beliefs about how the search engine operates. We show, using field data, that optimal queries may exclude some of the terms that are most relevant to the consumer, potentially in favor of less relevant terms. In two incentive-aligned lab experiments, we find that consumers have some ability to strategically omit relevant terms when forming their search queries, but that their search queries tend to be suboptimal. In a third incentive-aligned experiment, we find that consumers’ beliefs about how the search engine operates tend to be inaccurate. Overall, our results are consistent with consumers being strategic when formulating their queries, but acting on incorrect beliefs about how the search engine operates.



Notes

  1. A topic model is a statistical model that describes text using a set of topics rather than individual words, where topics are defined as probabilistic combinations of words.

  2. The five queries that do not have any related term in the vocabulary are: fast food com, food poisoning symptoms, boys food#, food lion weekly ad, pet food express.

  3. This would fail to hold only if the optimal query included all the words in G plus some additional non-relevant terms, which is highly unlikely. In this paper we focus on queries that only include relevant terms.

  4. Before running each study, we obtained all the activation probabilities using the same approach as with our field data (see “Search results” in Section 4.1). We ran all queries on a single computer to ensure that the results shown to participants during the game would not depend on the computer on which the query was run. We used these pre-computed results during the game, i.e., we did not actually run any query during the game. We also re-ran these queries on different computers, and the optimal queries and results were mostly consistent.

  5. As the specific query type selected under naive beliefs depends on the unobserved belief parameters \(\alpha_{low}\) and \(\alpha_{high}\) in this study, we do not create a table like Table 5 or a figure like Fig. 3b.

  6. For more examples, one can refer to these articles/blogs: https://neilpatel.com/blog/seo-excel-hacks/; https://blog.hubspot.com/marketing/on-page-seo-template; https://www.distilled.net/excel-for-seo/; https://medium.com/@jacobjs/beginners-guide-to-seotools-for-excel-part-1-db84ed54daff; https://moz.com/blog/one-formula-seo-data-analysis-made-easy-excel; https://mainpath.com/using-excel-as-an-seo-tool/; https://cleverclicks.com.au/blog/seo-excel-formula-toolkit/.


Author information


Corresponding author

Correspondence to Olivier Toubia.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Alternative utility function

We reproduce Table 1 under an alternative utility function, which captures the number of times each word appears on a page, not just whether it appears. That is, the utility function in Eq. 1 is replaced with:

$$ U(l) = \sum\limits_{t_{j} \in G} \beta_{j} \ n_{t_{j},l} $$
(11)

where \(n_{t_{j},l}\) is the number of times word \(t_{j}\) appears on webpage \(l\). The results are summarized in Table 10.

Table 10 Benefits of omitting relevant terms
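To make Eq. 11 concrete, the following minimal sketch (our own illustrative Python, not code from the paper; the word values and counts are hypothetical) computes the alternative utility of a single webpage from the number of times each relevant word appears on it.

```python
# Illustrative sketch of the alternative utility in Eq. 11 (hypothetical inputs).
# beta[j] is the dollar value of relevant word t_j in G; counts[j] is n_{t_j, l},
# the number of times t_j appears on webpage l.

def page_utility(beta, counts):
    """U(l) = sum over t_j in G of beta_j * n_{t_j, l}."""
    return sum(b * n for b, n in zip(beta, counts))

# Example: three relevant words valued at 2, 2, and 1 that appear
# 3, 0, and 2 times on the page, respectively.
print(page_utility(beta=[2, 2, 1], counts=[3, 0, 2]))  # 2*3 + 2*0 + 1*2 = 8
```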

Appendix B: Instruction page for search query game

Fig. 6 Study 1 – Instructions

Fig. 7 Study 1 – Quiz

Fig. 8 Study 2 – Instructions

Fig. 9 Study 2 – Quiz

Appendix C: Naive queries in study 2

C.1 Two words valued at $2, one word valued at $1

Without loss of generality, we assume \(\beta_{1} = \beta_{2} = 2\) and \(\beta_{3} = 1\). The support of the utility of webpages is {0, 1, 2, 3, 4, 5}. We compute the probability distribution of the maximum utility across L pages, under the naive beliefs represented in Eq. 8. We simplify notation by setting \(\widetilde{Prob}(t_{i} \in l|q) = p_{i}\) for \(i \in \{1, 2, 3\}\). We start by computing the cumulative distribution function of the maximum utility, i.e., \(\phi_{j} = Prob(\max\nolimits_{l}\{U(l)\}\leq j)\) for \(j \in \{0, 1, 2, 3, 4, 5\}\), as follows:

$$ \begin{array}{@{}rcl@{}} \phi_{5} &=&1 \\ \phi_{4} &=&(1-p_{1}p_{2}p_{3})^{L} \\ \phi_{3} &=&(1-p_{1}p_{2})^{L} \\ \phi_{2} &=& \left[{\sum}_{i=1}^{3} \left( p_{i}{\Pi}_{j \neq i}(1-p_{j}) \right)+ {\Pi}_{i=1}^{3}(1-p_{i}) \right]^{L} \\ \phi_{1} &=&(1-p_{1})^{L}(1-p_{2})^{L} \\ \phi_{0} &=&(1-p_{1})^{L}(1-p_{2})^{L}(1-p_{3})^{L} \end{array} $$

Given this, we can express the objective function as (recall that c = 1 in the experiment):

$$ \begin{array}{@{}rcl@{}} \tilde{f}^{\text{naive}}(q)&=&E[\max_{l}\{U(l)\}|q]-|q| \\ &=&5(\phi_{5}-\phi_{4})+4(\phi_{4}-\phi_{3})+3(\phi_{3}-\phi_{2})+2(\phi_{2}-\phi_{1})+(\phi_{1}-\phi_{0})-|q| \\ &=&5-(\phi_{4}+\phi_{3}+\phi_{2}+\phi_{1}+\phi_{0})-|q| \end{array} $$
(12)
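As a numerical illustration of Eq. 12, the following sketch (our own Python, not code from the paper; the activation probabilities are hypothetical) evaluates the cumulative probabilities \(\phi_{j}\) and the naive objective for given values of \(p_{1}\), \(p_{2}\), \(p_{3}\), with L pages and a cost of c = 1 per query term.

```python
# Sketch: evaluate the naive objective of Eq. 12 (hypothetical activation probabilities).
# p1, p2 are the probabilities for the two $2 words, p3 for the $1 word.

def naive_objective_c1(p1, p2, p3, L, query_length):
    phi4 = (1 - p1 * p2 * p3) ** L                    # no page contains all three words
    phi3 = (1 - p1 * p2) ** L                         # no page contains both $2 words
    at_most_one = (p1 * (1 - p2) * (1 - p3)           # exactly one word present ...
                   + p2 * (1 - p1) * (1 - p3)
                   + p3 * (1 - p1) * (1 - p2)
                   + (1 - p1) * (1 - p2) * (1 - p3))  # ... or no word present
    phi2 = at_most_one ** L
    phi1 = ((1 - p1) * (1 - p2)) ** L                 # no page contains a $2 word
    phi0 = ((1 - p1) * (1 - p2) * (1 - p3)) ** L      # no page contains any word
    expected_max = 5 - (phi4 + phi3 + phi2 + phi1 + phi0)
    return expected_max - query_length                # c = 1 per query term

# Example: a query with the two $2 words (query type 5 below),
# with alpha_high = 0.6 and alpha_low = 0.1, over L = 10 pages.
print(naive_objective_c1(0.6, 0.6, 0.1, L=10, query_length=2))
```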
Fig. 10 Naive query type – one term is valued at $1

There are six query types such that all queries from the same type achieve the same value of the objective function in Eq. 12:

  1. Empty queries (\(|q| = 0\)): \(p_{1} = p_{2} = p_{3} = \alpha_{low}\)

  2. Queries with only the low-value term (\(|q| = 1\)): \(p_{1} = p_{2} = \alpha_{low}\), \(p_{3} = \alpha_{high}\)

  3. Queries with only one high-value term (\(|q| = 1\)): we assume that \(p_{1} = \alpha_{high}\) and \(p_{2} = p_{3} = \alpha_{low}\) without loss of generality

  4. Queries with only one high-value term and the low-value term (\(|q| = 2\)): we assume that \(p_{1} = \alpha_{high}\), \(p_{2} = \alpha_{low}\), and \(p_{3} = \alpha_{high}\) without loss of generality

  5. Queries with only the two high-value terms (\(|q| = 2\)): \(p_{1} = p_{2} = \alpha_{high}\), \(p_{3} = \alpha_{low}\)

  6. Queries with all three terms (\(|q| = 3\)): \(p_{1} = p_{2} = p_{3} = \alpha_{high}\)

We compute the naive objective function, i.e., Eq. 12, for each query type, when both \(\alpha_{low}\) and \(\alpha_{high}\) vary between 0 and 1 under the constraint that \(\alpha_{low} \leq \alpha_{high}\), for L = 10. Figure 10 displays the query type that maximizes the naive objective function, as a function of \(\alpha_{high}\) and \(\alpha_{low}\). We see that the query types containing the low-value term (types 2, 4, and 6) never maximize the naive objective function. The other three query types may maximize the objective function, depending on the values of the parameters \(\alpha_{high}\) and \(\alpha_{low}\).
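This grid search could be reproduced along the following lines (again our own illustrative Python, reusing the naive_objective_c1 sketch from above; the grid values shown are arbitrary).

```python
# Sketch: which of the six query types maximizes the naive objective of Eq. 12,
# for a few (alpha_low, alpha_high) pairs with alpha_low <= alpha_high and L = 10.
# Reuses naive_objective_c1 from the previous sketch.

QUERY_TYPES = {  # type -> ((p1, p2, p3) as a function of (alpha_low, alpha_high), |q|)
    1: (lambda lo, hi: (lo, lo, lo), 0),  # empty query
    2: (lambda lo, hi: (lo, lo, hi), 1),  # only the $1 word
    3: (lambda lo, hi: (hi, lo, lo), 1),  # one $2 word
    4: (lambda lo, hi: (hi, lo, hi), 2),  # one $2 word and the $1 word
    5: (lambda lo, hi: (hi, hi, lo), 2),  # both $2 words
    6: (lambda lo, hi: (hi, hi, hi), 3),  # all three words
}

def best_query_type(alpha_low, alpha_high, L=10):
    scores = {}
    for qtype, (probs, q_len) in QUERY_TYPES.items():
        p1, p2, p3 = probs(alpha_low, alpha_high)
        scores[qtype] = naive_objective_c1(p1, p2, p3, L, q_len)
    return max(scores, key=scores.get)

# A coarse version of Fig. 10: the winning query type at a few points of the grid.
for hi in (0.2, 0.5, 0.8):
    for lo in (0.05, 0.2, 0.5):
        if lo <= hi:
            print(f"alpha_low={lo:.2f}, alpha_high={hi:.2f} -> type {best_query_type(lo, hi)}")
```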

C.2 All words valued at $2

The support of the webpage utility is {0, 2, 4, 6}. We compute the probability distribution of the maximum utility across L pages, under the naive beliefs defined in Eq. 8. We again simplify notation by setting \(\widetilde{Prob}(t_{i} \in l|q) = p_{i}\) for \(i \in \{1, 2, 3\}\). We first compute the cumulative distribution function of the maximum utility, i.e., \(\phi_{j} = Prob(\max\nolimits_{l}\{U(l)\}\leq j)\) for \(j \in \{0, 2, 4, 6\}\), as follows:

$$ \begin{array}{@{}rcl@{}} \phi_{6} &=& 1 \\ \phi_{4} &=& (1-p_{1}p_{2}p_{3})^{L} \\ \phi_{2} &=& \left[{\sum}_{i=1}^{3} \left( p_{i}{\Pi}_{j \neq i}(1-p_{j}) \right) + {\Pi}_{i=1}^{3}(1-p_{i}) \right]^{L} \\ \phi_{0} &=& (1-p_{1})^{L}(1-p_{2})^{L}(1-p_{3})^{L} \end{array} $$

Given this, we can express the objective function as (recall that c = 1 in the experiment):

$$ \begin{array}{@{}rcl@{}} \tilde{f}^{\text{naive}}(q)&=&E[\max_{l}\{U(l)\}|q]-|q| \\ &=&6(\phi_{6}-\phi_{4})+4(\phi_{4}-\phi_{2})+2(\phi_{2}-\phi_{0})-|q| \\ &=&6-2(\phi_{4}+\phi_{2}+\phi_{0})-|q| \end{array} $$
(13)

In this case, there are four query types such that all queries from the same type achieve the same value of the objective function in Eq. 13:

  1. Empty queries (\(|q| = 0\)): \(p_{1} = p_{2} = p_{3} = \alpha_{low}\)

  2. Queries with only one term (\(|q| = 1\)): we assume that \(p_{1} = \alpha_{high}\) and \(p_{2} = p_{3} = \alpha_{low}\) without loss of generality

  3. Queries with two terms (\(|q| = 2\)): we assume that \(p_{1} = p_{2} = \alpha_{high}\) and \(p_{3} = \alpha_{low}\) without loss of generality

  4. Queries with all three terms (\(|q| = 3\)): \(p_{1} = p_{2} = p_{3} = \alpha_{high}\)

We compute the naive objective function, i.e., Eq. 13, for each query type, when \(\alpha_{low}\) and \(\alpha_{high}\) vary between 0 and 1 under the constraint that \(\alpha_{low} \leq \alpha_{high}\), for L = 10. Figure 11 shows that under naive beliefs, all four query types may maximize the objective function, depending on the values of the parameters \(\alpha_{high}\) and \(\alpha_{low}\).
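Analogously, a short sketch (our own Python, not code from the paper; the grid values are hypothetical) of the objective in Eq. 13 and the query type that maximizes it when all three words are valued at $2.

```python
# Sketch: naive objective of Eq. 13 (all three words valued at $2) and the number of
# query terms that maximizes it, for a given (alpha_low, alpha_high) and L = 10.

def naive_objective_c2(p1, p2, p3, L, query_length):
    phi4 = (1 - p1 * p2 * p3) ** L                    # no page contains all three words
    phi2 = (p1 * (1 - p2) * (1 - p3) + p2 * (1 - p1) * (1 - p3)
            + p3 * (1 - p1) * (1 - p2)
            + (1 - p1) * (1 - p2) * (1 - p3)) ** L    # every page contains at most one word
    phi0 = ((1 - p1) * (1 - p2) * (1 - p3)) ** L      # no page contains any word
    return 6 - 2 * (phi4 + phi2 + phi0) - query_length  # c = 1 per query term

def best_num_terms(alpha_low, alpha_high, L=10):
    # Query type k includes k of the three words (those k words activated at alpha_high).
    scores = {k: naive_objective_c2(*((alpha_high,) * k + (alpha_low,) * (3 - k)), L, k)
              for k in range(4)}
    return max(scores, key=scores.get)

print(best_num_terms(0.1, 0.7))  # number of terms in the naive-optimal query
```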

Fig. 11 Naive query type – all terms are valued at $2

Appendix D: Field data: all shorter queries and related words

Table 11 Shorter queries and related words (part 1)
Table 12 Shorter queries and sample related words (part 2)


Cite this article

Liu, J., Toubia, O. Search query formation by strategic consumers. Quant Mark Econ 18, 155–194 (2020). https://doi.org/10.1007/s11129-019-09217-3


Keywords

  • Search engines
  • Revealed preference
  • Experiments

JEL Classification

  • M300