Abstract
The Unbiased Learning-to-Rank framework [16] has recently been proposed as a general approach to systematically remove biases, such as position bias, from learning-to-rank models. The method consists of two steps: estimating click propensities and using them to train unbiased models. Most methods proposed in the literature for estimating propensities involve some degree of intervention in the live search engine. An alternative approach proposed recently uses an Expectation Maximization (EM) algorithm that estimates propensities while modeling relevance with ranking features [21]. In this work we propose a novel method to estimate propensities directly, without any intervention in live search and without modeling relevance. Instead, we take advantage of the fact that the same query-document pair may naturally change ranks over time. This is typical for eCommerce search because item popularity changes over time, some ranking features are time-dependent, and items are added to or removed from the index (an item gets sold or a new item is listed). However, our method is general and can be applied to any search engine in which the rank of the same document may naturally change over time for the same query. We derive a simple likelihood function that depends only on the propensities, and by maximizing it we obtain propensity estimates. We apply this method to eBay search data to estimate click propensities for web and mobile search and compare these with estimates obtained using the EM method [21]. We also use simulated data to show that the method gives reliable estimates of the "true" simulated propensities. Finally, we train an unbiased learning-to-rank model for eBay search using the estimated propensities and show that it outperforms both baselines: one without position bias correction and one with position bias correction using the EM method.
Notes
1. Note that keeping only query-document pairs that appeared at exactly two ranks is in no way a requirement of our method. The method is general and can be used for query-document pairs that appeared more than twice. This is just intended to simplify our analysis without a significant loss of data, since it is rare for the same query-document pair to appear at more than two ranks.
2. Note that these ranking models are significantly different from the eBay production ranker, the details of which are proprietary.
3. This is true for our data, as discussed in Sect. 4. For cases when most query-document pairs receive multiple clicks we suggest using a different method, such as estimating the ratios of propensities from the ratios of click counts.
References
Agarwal, A., Zaitsev, I., Wang, X., Li, C., Najork, M., Joachims, T.: Estimating position bias without intrusive interventions. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 474–482. ACM (2019)
Ai, Q., Bi, K., Luo, C., Guo, J., Croft, W.B.: Unbiased learning to rank with unbiased propensity estimation. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 385–394. ACM (2018)
Burges, C.J.: From RankNet to LambdaRank to LambdaMART: an overview. Technical report, June 2010
Carterette, B., Chandar, P.: Offline comparative evaluation with incremental, minimally-invasive online feedback. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, pp. 705–714. ACM, New York (2018). https://doi.org/10.1145/3209978.3210050
Casella, G., George, E.I.: Explaining the Gibbs sampler. Am. Stat. 46(3), 167–174 (1992)
Chapelle, O., Zhang, Y.: A dynamic Bayesian network click model for web search ranking. In: Proceedings of the 18th International Conference on World Wide Web, pp. 1–10. ACM (2009)
Chuklin, A., Markov, I., Rijke, M.D.: Click models for web search. Synth. Lect. Inf. Concepts Retrieval Serv. 7(3), 1–115 (2015)
Craswell, N., Zoeter, O., Taylor, M., Ramsey, B.: An experimental comparison of click position-bias models. In: Proceedings of the 2008 International Conference on Web Search and Data Mining, pp. 87–94. ACM (2008)
Dupret, G.E., Piwowarski, B.: A user browsing model to predict search engine click data from past observations. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 331–338. ACM (2008)
Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Statist. 29(5), 1189–1232 (2001). https://doi.org/10.1214/aos/1013203451
Guo, F., et al.: Click chain model in web search. In: Proceedings of the 18th International Conference on World Wide Web, pp. 11–20. ACM (2009)
Guo, F., Liu, C., Wang, Y.M.: Efficient multiple-click models in web search. In: Proceedings of the Second ACM International Conference on Web Search and Data Mining, pp. 124–131. ACM (2009)
He, J., Zhai, C., Li, X.: Evaluation of methods for relative comparison of retrieval systems based on clickthroughs. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 2029–2032. ACM (2009)
Hofmann, K., Whiteson, S., De Rijke, M.: A probabilistic method for inferring preferences from clicks. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 249–258. ACM (2011)
Joachims, T., Granka, L., Pan, B., Hembrooke, H., Gay, G.: Accurately interpreting clickthrough data as implicit feedback. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 154–161. SIGIR 2005. ACM, New York (2005). https://doi.org/10.1145/1076034.1076063
Joachims, T., Swaminathan, A., Schnabel, T.: Unbiased learning-to-rank with biased feedback. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. WSDM 2017, pp. 781–789. ACM, New York (2017). https://doi.org/10.1145/3018661.3018699
Joachims, T., et al.: Evaluating retrieval performance using clickthrough data (2003)
Li, H.: A short introduction to learning to rank. IEICE Trans. Inf. Syst. 94(10), 1854–1862 (2011)
Radlinski, F., Joachims, T.: Minimally invasive randomization for collecting unbiased preferences from clickthrough logs (2006)
Radlinski, F., Kleinberg, R., Joachims, T.: Learning diverse rankings with multi-armed bandits. In: Proceedings of the 25th International Conference on Machine Learning, pp. 784–791. ACM (2008)
Wang, X., Golbandi, N., Bendersky, M., Metzler, D., Najork, M.: Position bias estimation for unbiased learning to rank in personal search. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, pp. 610–618. ACM, New York (2018). https://doi.org/10.1145/3159652.3159732
Appendices
A Likelihood Function Simplification
There are multiple approaches one can take to estimate the propensities, depending on the data itself. Let us first consider the query-document pairs that appeared at only one rank. The parameters \(p_i\) and \(z_j\) appear in the likelihood function (2) only as a product. Such query-document pairs could therefore help estimate the product of the propensity at the rank at which they appeared and the relevance \(z_j\), but not each factor individually. With \(z_j\) unknown, this does not help to estimate the propensity. We should mention that in the presence of a reliable prior for \(z_j\) and/or \(p_i\) the likelihood function above can be used even for query-document pairs that appeared at only one rank. In that case it would be more useful to take a Bayesian approach and estimate the posterior distribution of the propensities, for example using Gibbs sampling [5].
From now on we will assume that each query-document pair appears at at least two different ranks. The other extreme is the case when each query-document pair appears a large number of times at different ranks, which gives a large number of observations at each rank. In this case the propensity ratio for two ranks can be estimated simply by taking the ratio of click-through rates of the same query-document pairs at those ranks.
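In this high-data regime the estimator is a few lines of code. A minimal sketch in Python; the data layout (a mapping from each query-document pair to per-rank impression and click counts) is our own illustrative choice, not taken from the paper:

```python
def propensity_ratio(stats, rank_a, rank_b):
    """Estimate p_a / p_b as the ratio of click-through rates of the
    same query-document pairs observed at both ranks.

    `stats` maps a query-document pair id to {rank: (impressions, clicks)};
    this layout is an assumption for illustration.
    """
    clicks_a = imps_a = clicks_b = imps_b = 0
    for by_rank in stats.values():
        # restrict to pairs seen at both ranks so relevance cancels out
        if rank_a in by_rank and rank_b in by_rank:
            ia, ca = by_rank[rank_a]
            ib, cb = by_rank[rank_b]
            imps_a, clicks_a = imps_a + ia, clicks_a + ca
            imps_b, clicks_b = imps_b + ib, clicks_b + cb
    return (clicks_a / imps_a) / (clicks_b / imps_b)
```

Restricting to pairs seen at both ranks is what makes the relevance factor cancel; the estimate is only reliable when each pair has many impressions at each rank.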
Let us now consider the case when the data consists of a large number of query-document pairs that appeared a few times (as few as twice) at different ranks, but not often enough to obtain reliable propensity estimates from ratios of click-through rates. In this case we need to maximize the likelihood above and eliminate the nuisance parameters \(z_j\) to get estimates of the \(p_i\). We focus the rest of this work on this case; the data we have collected from eBay search logs also falls in this category, as discussed in Sect. 4.
If a query-document pair appeared only a few times, there is a good chance that it did not receive any clicks. Such query-document pairs do not help in estimating the propensities by likelihood maximization because of the unknown parameter \(z_j\): for them the likelihood contains the terms \(\prod_{k=1}^{m_j}(1-p_{r_{jk}}z_j)\), and the maximum is reached at \(z_j=0\), for which these terms equal 1. So query-document pairs without any clicks do not change the maximum likelihood estimate of the propensities, and for that reason we keep only query-document pairs that received at least one click. However, we cannot simply drop the terms for query-document pairs that did not receive any clicks from the likelihood function; doing so would bias the data towards query-document pairs with a higher likelihood of a click. Instead, we replace the likelihood function above by a conditional probability. Specifically, the likelihood function (2) computes the probability of the click data \(\{c_{jk}\}\) obtained for a query-document pair. We replace that probability by a conditional probability: the probability of the click data \(\{c_{jk}\}\) under the condition that at least one click was received, \(\sum_k c_{jk}>0\). The likelihood function for the query-document pair \(x_j\) then takes the form:

\[\begin{aligned} \mathcal{L}_j &= P\Big(D_j \,\Big|\, \sum_k c_{jk}>0\Big)\\ &= \frac{P\big(D_j,\ \sum_k c_{jk}>0\big)}{P\big(\sum_k c_{jk}>0\big)} = \frac{P(D_j)}{P\big(\sum_k c_{jk}>0\big)}\\ &= \frac{\prod_{k=1}^{m_j}(p_{r_{jk}}z_j)^{c_{jk}}(1-p_{r_{jk}}z_j)^{1-c_{jk}}}{1-\prod_{k=1}^{m_j}(1-p_{r_{jk}}z_j)} \end{aligned}\]
Here \(\mathcal{L}_j\) denotes the likelihood function for the query-document pair \(x_j\), \(D_j=\{c_{jk}\}\) denotes the click data for query-document pair j, and P denotes probability. \(\sum_k c_{jk} > 0\) simply means that there was at least one click. In the first line above we have replaced the probability of the data \(D_j\) by a conditional probability. The second line uses the formula for conditional probability. The probability of \(D_j\) and at least one click is just equal to the probability of \(D_j\), since we are only keeping query-document pairs that received at least one click; this gives the second equality of the second line. Finally, in the last line we have explicitly written out \(P(D_j)\) in the numerator as in (2) and the probability of at least one click in the denominator (the probability of no click is \(\prod_{k=1}^{m_j}(1-p_{r_{jk}}z_j)\), so the probability of at least one click is 1 minus that).
The full likelihood is then the product of \(\mathcal{L}_j\) over all query-document pairs:

\[\mathcal{L} = \prod_j \mathcal{L}_j = \prod_j \frac{\prod_{k=1}^{m_j}(p_{r_{jk}}z_j)^{c_{jk}}(1-p_{r_{jk}}z_j)^{1-c_{jk}}}{1-\prod_{k=1}^{m_j}(1-p_{r_{jk}}z_j)} \qquad (5)\]
From now on we will assume by default that our dataset contains only query-document pairs that received at least one click and will omit the subscript \(\sum _k c_{jk} > 0\).
Our last step is to simplify the likelihood function (5). Typically the click probabilities \(p_iz_j\), i.e. the probability that query-document pair j gets a click when displayed at rank i, are not very large (not close to 1). To simplify the likelihood for each query-document pair we keep only terms linear in \(p_iz_j\) and drop higher-order terms like \(p_{i_1}z_{j_1}p_{i_2}z_{j_2}\). We have verified this simplifying assumption for our data in Sect. 4. In general, we expect the assumption to be valid for most search engines. It certainly holds for lower ranks, since click-through rates are typically much smaller there. Since we are dropping product terms, the largest ones would be between ranks 1 and 2. For most search engines the click-through rate at rank 2 is around 10% or below, which we believe is small enough to safely ignore the product terms mentioned above (they would be at least 10 times smaller than the linear terms). We show empirically using simulations in Appendix B that this assumption works very well for data similar to eBay data. If for other search engines the click-through rates at the topmost ranks are much larger, we suggest keeping only those query-document pairs that appeared at least once at a low enough rank. Using the simulation methodology of Appendix B, one can also verify how well this assumption works for one's particular data.
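The error introduced by dropping the higher-order terms can be checked numerically. A minimal sketch, using the roughly 10% click-probability scale quoted above as an assumed input rather than a measured value:

```python
def exact_at_least_one_click(props, z):
    """Exact P(at least one click): 1 - prod_k (1 - p_k * z)."""
    no_click = 1.0
    for p in props:
        no_click *= 1.0 - p * z
    return 1.0 - no_click

def linear_approx(props, z):
    """First-order approximation: z * sum_k p_k (product terms dropped)."""
    return z * sum(props)

# assumed top-rank click probabilities, around the 10% scale discussed above
props, z = [0.10, 0.08], 1.0
exact = exact_at_least_one_click(props, z)   # 1 - 0.90 * 0.92 = 0.172
approx = linear_approx(props, z)             # 0.18, under 5% relative error
```

Even for these relatively large top-rank click probabilities the linear approximation is off by under 5%; at lower ranks, where click probabilities are far smaller, the error shrinks quadratically.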
Under the simplifying assumption, the denominator in (5) becomes:

\[1-\prod_{k=1}^{m_j}(1-p_{r_{jk}}z_j) \approx z_j\sum_{k=1}^{m_j}p_{r_{jk}} \qquad (6)\]
Let us now simplify the numerator of (5). First, since the click probabilities are not large and each query-document pair appears only a few times, we can assume there is only one click per query-document pair (see Note 3): \(c_{jl_j}=1\) and \(c_{jk}=0\) for \(k\ne l_j\). The numerator then simplifies to

\[\prod_{k=1}^{m_j}(p_{r_{jk}}z_j)^{c_{jk}}(1-p_{r_{jk}}z_j)^{1-c_{jk}} \approx p_{r_{jl_j}}z_j \qquad (7)\]
Using (6) and (7), the likelihood function (5) simplifies to

\[\mathcal{L} = \prod_j \frac{p_{r_{jl_j}}z_j}{z_j\sum_{k=1}^{m_j}p_{r_{jk}}} = \prod_j \frac{p_{r_{jl_j}}}{\sum_{k=1}^{m_j}p_{r_{jk}}} \qquad (8)\]
In the last step \(z_j\) cancels between the numerator and the denominator. Our assumption of small click probabilities, together with keeping only query-document pairs that received at least one click, allowed us to simplify the likelihood into a function of the propensities only. Now we can simply maximize the likelihood (8) to estimate the propensities.
Equation (8) makes it clear why we require each query-document pair to appear more than once, at different ranks. For a query-document pair that appeared only once (or multiple times but always at the same rank), the numerator and the denominator in (8) cancel each other. For that reason we keep only query-document pairs that appeared at at least two different ranks.
It is numerically better to maximize the log-likelihood, which takes the form:

\[\log\mathcal{L} = \sum_j\Big(\log p_{r_{jl_j}} - \log\sum_{k=1}^{m_j}p_{r_{jk}}\Big) \qquad (9)\]
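Maximizing this log-likelihood needs no special machinery. A minimal sketch in plain Python, using gradient ascent on \(\theta_i=\log p_i\) with \(p_1\) fixed to 1 (only propensity ratios are identifiable); the function name and observation layout are our own assumptions:

```python
import math

def estimate_propensities(observations, num_ranks, steps=2000, lr=0.05):
    """Maximize sum_j [log p[clicked_j] - log sum_k p[shown_jk]] by
    gradient ascent on theta_i = log p_i, with theta_1 fixed at 0.

    `observations` is a list of (clicked_rank, shown_ranks) tuples with
    1-based ranks; this layout is an assumption for illustration.
    """
    theta = [0.0] * (num_ranks + 1)  # index 0 unused; theta[1] stays 0
    for _ in range(steps):
        grad = [0.0] * (num_ranks + 1)
        for clicked, shown in observations:
            denom = sum(math.exp(theta[r]) for r in shown)
            grad[clicked] += 1.0
            for r in shown:
                grad[r] -= math.exp(theta[r]) / denom
        for i in range(2, num_ranks + 1):  # skip the fixed theta[1]
            theta[i] += lr * grad[i] / len(observations)
    return [math.exp(t) for t in theta]  # entry 0 is a placeholder
```

For example, with pairs shown once each at ranks 1 and 2, and twice as many clicks at rank 1 as at rank 2, the estimate converges to \(p_2/p_1 \approx 0.5\), the closed-form maximizer of (8) for this two-rank case.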
B Results on Simulations
In this Appendix we use simulated data to verify that the method of estimating propensities developed in Sect. 3 works well. For our simulations we choose the following propensity function as truth:

\[p_i^{\mathrm{sim}} = \frac{1}{\max(1, \log_2 i)} \qquad (10)\]
which assigns propensity of 1 for ranks 1 and 2, and then decreases as the inverse of the log of the rank.
Other than choosing our own version of the propensities, we simulate the data to be as similar to the eBay dataset as possible. We generate a large number of query-document pairs and randomly choose a mean rank \(rank_{mean}\) for each query-document pair uniformly between 1 and 500. We randomly generate a click probability z for each query-document pair depending on the mean rank \(rank_{mean}\), choosing the distribution from which the click probabilities are drawn such that the click-through rates at each rank closely match those of the real data, taking into account the "true" propensities (10). We then generate two different ranks drawn from \(\mathcal{N}(rank_{mean}, (rank_{mean}/5)^2)\). For each rank i we compute the probability of a click as \(zp_i^{\mathrm{sim}}\). We keep only those query-document pairs which appeared at two different ranks and got at least one click, in agreement with our method for the real eBay data. Finally, we keep about 40,000 query-document pairs so that the simulated data is similar in size to the eBay web search data. This becomes the simulated data.
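The generation procedure above can be sketched as follows. The concrete relevance distribution (a capped exponential) and the exact propensity form are our own assumptions chosen to mimic the description:

```python
import math
import random

def true_propensity(rank):
    """Assumed "true" propensity: 1 at ranks 1-2, then 1/log2(rank)."""
    return 1.0 if rank <= 2 else 1.0 / math.log2(rank)

def simulate_pairs(n_pairs, seed=0):
    """Generate query-document pairs seen at two distinct ranks with at
    least one click, mimicking the procedure described above."""
    rng = random.Random(seed)
    data = []
    while len(data) < n_pairs:
        mean_rank = rng.uniform(1, 500)
        z = min(1.0, rng.expovariate(10.0))  # assumed relevance distribution
        ranks = set()
        while len(ranks) < 2:  # two distinct ranks around mean_rank
            r = max(1, round(rng.gauss(mean_rank, mean_rank / 5)))
            ranks.add(r)
        r1, r2 = sorted(ranks)
        clicks = tuple(rng.random() < z * true_propensity(r) for r in (r1, r2))
        if any(clicks):  # keep pairs with at least one click, as in the text
            data.append(((r1, r2), clicks))
    return data
```

Feeding such data to the likelihood maximization of Sect. 3 lets one compare the estimated propensities against the known truth.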
The estimated propensities on the simulated dataset are shown in Fig. 2. The green solid curve shows the true propensity (10), the blue solid curve shows the propensity estimated using the direct estimation method, and the red dashed curve is the propensity estimated using interpolation. As we can see, the estimates closely match the truth. Furthermore, the interpolation method gives a better result by reducing the noise in the estimate. These results show that the propensity estimation method developed in this paper works well.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Aslanyan, G., Porwal, U. (2019). Position Bias Estimation for Unbiased Learning-to-Rank in eCommerce Search. In: Brisaboa, N., Puglisi, S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol. 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_4