Abstract
Submitting queries to search engines has become a major way for consumers to search for information and products. The massive amount of search query data available today has the potential to provide valuable information on consumer preferences. In order to unlock this potential, it is necessary to understand how consumers translate their preferences into search queries. Strategic consumers should attempt to maximize the information content of the search results, conditional on a set of beliefs about how the search engine operates. We show using field data that optimal queries may exclude some of the terms that are more relevant to the consumer, potentially at the expense of less relevant terms. In two incentive-aligned lab experiments, we find that consumers have some ability to strategically omit relevant terms when forming their search queries, but that their search queries tend to be suboptimal. In a third incentive-aligned experiment, we find that consumers’ beliefs about how the search engine operates tend to be inaccurate. Overall, our results are consistent with consumers being strategic when formulating their queries, but acting on incorrect beliefs about how the search engine operates.
Notes
 1.
A topic model is a statistical model that describes text using a set of topics rather than individual words, where topics are defined as probabilistic combinations of words.
 2.
The five queries that do not have any related term in the vocabulary are: fast food com, food poisoning symptoms, boys food#, food lion weekly ad, pet food express.
 3.
The only reason this would not be true would be if the optimal query included all the words in G plus some additional nonrelevant terms, which is highly unlikely. In this paper we focus on queries that only include relevant terms.
 4.
Before running each study, we obtained all the activation probabilities using the same approach as with our field data (see “Search results” in Section 4.1). We ran all queries on a single computer to ensure that the results given to participants during the game would not be dependent on the computer on which the query was run. We used these results during the game, i.e., we did not actually run any query during the game. We also reran these queries using different computers, and the optimal queries and results were mostly consistent.
 5.
 6.
For more examples, one can refer to these articles/blogs: https://neilpatel.com/blog/seoexcelhacks/; https://blog.hubspot.com/marketing/onpageseotemplate; https://www.distilled.net/excelforseo/; https://medium.com/@jacobjs/beginnersguidetoseotoolsforexcelpart1db84ed54daff; https://moz.com/blog/oneformulaseodataanalysismadeeasyexcel; https://mainpath.com/usingexcelasanseotool/; https://cleverclicks.com.au/blog/seoexcelformulatoolkit/.
Appendices
Appendix A: Alternative utility function
We reproduce Table 1 under an alternative utility function, which captures the number of times each word appears on a page, not just whether it appears. That is, the utility function in Eq. 1 is replaced with:
where \(n_{t_{j},l}\) is the number of times word \(t_{j}\) appears on webpage \(l\). The results are summarized in Table 10.
Appendix B: Instruction page for search query game
Appendix C: Naive queries in study 2
C.1 Two words valued at $2, one word valued at $1
Without loss of generality, we assume β_{1} = β_{2} = 2, β_{3} = 1. The support of the utility of webpages is {0, 1, 2, 3, 4, 5}. We compute the probability distribution of the maximum utility across L pages, under the naive beliefs represented in Eq. 8. We simplify notation by setting \(\widetilde{Prob}(t_{i} \in l \mid q) = p_{i}\) for i ∈ {1, 2, 3}. We start by computing the cumulative distribution function of the maximum utility, i.e., \(\phi_{j} = Prob(\max\nolimits_{l}\{U(l)\} \leq j)\) for j ∈ {0, 1, 2, 3, 4, 5}, as follows:
Given this, we can express the objective function as (recall that c = 1 in the experiment):
There are six query types such that all queries from the same type achieve the same value of the objective function in Eq. 12:

1.
Empty queries (|q| = 0): p_{1} = p_{2} = p_{3} = α^{low}

2.
Queries with only the low-value term (|q| = 1): p_{1} = p_{2} = α^{low}, p_{3} = α^{high}

3.
Queries with only one high-value term (|q| = 1): we assume that p_{1} = α^{high} and p_{2} = p_{3} = α^{low} without loss of generality

4.
Queries with one high-value term and the low-value term (|q| = 2): we assume that p_{1} = α^{high}, p_{2} = α^{low}, and p_{3} = α^{high} without loss of generality

5.
Queries with only the two high-value terms (|q| = 2): p_{1} = p_{2} = α^{high}, p_{3} = α^{low}

6.
Queries with all three terms (|q| = 3): p_{1} = p_{2} = p_{3} = α^{high}
We compute the naive objective function, i.e., Eq. 12, for each query type, when both α^{low} and α^{high} vary between 0 and 1 under the constraint that α^{low} ≤ α^{high}, for L = 10. Figure 10 displays the query type that maximizes the naive objective function, as a function of α^{high} and α^{low}. We see that the query types containing the low-value term (types 2, 4, and 6) never maximize the naive objective function. The other three query types may maximize the objective function, depending on the values of the parameters α^{high} and α^{low}.
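The comparison across the six query types can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: Eq. 8 and Eq. 12 are not reproduced in this section, so the code assumes that term occurrences are independent within a page and across the L pages, and that the objective is the expected maximum utility minus a cost c per query term. The names `cdf_max_utility`, `naive_objective`, and `QUERY_TYPES` are hypothetical.

```python
from itertools import product


def cdf_max_utility(betas, probs, num_pages):
    """Return {j: Prob(max over pages of U <= j)}.

    ASSUMPTION: terms appear independently within a page, and pages are
    independent draws, so the CDF of the maximum is the single-page CDF
    raised to the power num_pages.
    """
    pmf = {}
    # Enumerate every presence/absence pattern of the terms on one page.
    for present in product([0, 1], repeat=len(betas)):
        utility = sum(b for b, x in zip(betas, present) if x)
        prob = 1.0
        for p, x in zip(probs, present):
            prob *= p if x else (1.0 - p)
        pmf[utility] = pmf.get(utility, 0.0) + prob
    cdf, running = {}, 0.0
    for utility in sorted(pmf):
        running += pmf[utility]
        cdf[utility] = running ** num_pages
    return cdf


def naive_objective(betas, in_query, alpha_low, alpha_high,
                    num_pages=10, cost=1.0):
    """Expected maximum utility minus cost per query term.

    ASSUMPTION: this form of the objective stands in for Eq. 12,
    which is not shown in this section.
    """
    probs = [alpha_high if q else alpha_low for q in in_query]
    cdf = cdf_max_utility(betas, probs, num_pages)
    expected_max, previous = 0.0, 0.0
    for utility in sorted(cdf):
        expected_max += utility * (cdf[utility] - previous)
        previous = cdf[utility]
    return expected_max - cost * sum(in_query)


# Query types 1 to 6 from the list above, as (term 1, term 2, term 3) flags;
# terms 1 and 2 are valued at $2 and term 3 at $1.
QUERY_TYPES = {
    1: (0, 0, 0),
    2: (0, 0, 1),
    3: (1, 0, 0),
    4: (1, 0, 1),
    5: (1, 1, 0),
    6: (1, 1, 1),
}

if __name__ == "__main__":
    betas = [2, 2, 1]
    for query_type, in_query in QUERY_TYPES.items():
        value = naive_objective(betas, in_query, alpha_low=0.1, alpha_high=0.6)
        print(query_type, round(value, 4))
```

Under these assumptions, adding the low-value term raises the expected maximum utility by at most $1 while costing $1, which is in line with the observation that types 2, 4, and 6 never maximize the naive objective.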
C.2 All words valued at $2
The support of the webpage utility is {0, 2, 4, 6}. We compute the probability distribution of the maximum utility across L pages, under the naive beliefs defined in Eq. 8. We also simplify notation by setting \(\widetilde{Prob}(t_{i} \in l \mid q) = p_{i}\) for i ∈ {1, 2, 3}. We first compute the cumulative distribution function of the maximum utility, i.e., \(\phi_{j} = Prob(\max\nolimits_{l}\{U(l)\} \leq j)\) for j ∈ {0, 2, 4, 6}, as follows:
Given this, we can express the objective function as (recall that c = 1 in the experiment):
In this case, there are four query types such that all queries from the same type achieve the same value of the objective function in Eq. 13:

1.
Empty queries (|q| = 0): p_{1} = p_{2} = p_{3} = α^{low}

2.
Queries with only one term (|q| = 1): we assume that p_{1} = α^{high} and p_{2} = p_{3} = α^{low} without loss of generality

3.
Queries with two terms (|q| = 2): we assume that p_{1} = p_{2} = α^{high} and p_{3} = α^{low} without loss of generality

4.
Queries with all three terms (|q| = 3): p_{1} = p_{2} = p_{3} = α^{high}
We compute the naive objective function, i.e., Eq. 13, for each type of query, when α^{low} and α^{high} vary between 0 and 1 under the constraint that α^{low} ≤ α^{high}, for L = 10. Figure 11 shows that under naive beliefs, all four query types may maximize the objective function, depending on the values of the parameters α^{high} and α^{low}.
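For intuition, the quantities in this case can be written out under simplifying assumptions. The following is an illustrative sketch, not a reproduction of the paper's equations: it assumes that the three terms appear independently on a page, that the L pages are independent, and that the objective is the expected maximum utility minus a cost c = 1 per query term.

```latex
% ASSUMPTION: independent terms within a page, independent pages.
\begin{align*}
\phi_0 &= \big[(1-p_1)(1-p_2)(1-p_3)\big]^{L},\\
\phi_2 &= \big[(1-p_1)(1-p_2)(1-p_3) + p_1(1-p_2)(1-p_3)
          + (1-p_1)p_2(1-p_3) + (1-p_1)(1-p_2)p_3\big]^{L},\\
\phi_4 &= \big[1 - p_1 p_2 p_3\big]^{L},\qquad \phi_6 = 1.\\[4pt]
% Expected maximum on the grid {0, 2, 4, 6}: 2 * sum of tail probabilities.
E\big[\max\nolimits_l U(l)\big]
       &= 2\big[(1-\phi_0) + (1-\phi_2) + (1-\phi_4)\big]
        = 6 - 2(\phi_0 + \phi_2 + \phi_4).\\[4pt]
% ASSUMPTION: objective = expected maximum utility minus c|q|, with c = 1.
\text{Objective}(q) &= 6 - 2(\phi_0 + \phi_2 + \phi_4) - |q|.
\end{align*}
```

As a check at one extreme, with α^{low} = 0 and α^{high} = 1 this sketch gives an objective of 0 for the empty query, 2 − 1 = 1 for a one-term query, 4 − 2 = 2 for a two-term query, and 6 − 3 = 3 for the three-term query, so the full query is best at that corner of the parameter space.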
Appendix D: Field data: all shorter queries and related words
Cite this article
Liu, J., Toubia, O. Search query formation by strategic consumers. Quant Mark Econ 18, 155–194 (2020). https://doi.org/10.1007/s11129019092173
Keywords
 Search engines
 Revealed preference
 Experiments
JEL Classification
 M300