Consistency of the estimator of binary response models based on AUC maximization


Abstract

This paper examines the asymptotic properties of a binary response model estimator based on maximization of the Area Under the receiver operating characteristic Curve (AUC). Under certain assumptions, AUC maximization is a consistent method of binary response model estimation up to normalizations. Because the AUC is equivalent to the Mann–Whitney U statistic and the Wilcoxon rank-sum test, maximization of the area under the ROC curve is equivalent to maximization of the corresponding statistics. Compared to parametric methods such as logit and probit, AUC maximization relaxes the assumptions about the error distribution, but imposes some restrictions on the distribution of the explanatory variables; these restrictions can easily be checked, since the explanatory variables are observable.
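
To make the estimator concrete, the following minimal Python sketch (an illustration only, not the paper's implementation; the two-regressor design, the logistic errors and the Nelder–Mead search are assumptions made here) computes the empirical AUC as the normalized Mann–Whitney U statistic and maximizes it over coefficient vectors normalized to unit length.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Illustrative data-generating process: y = 1{beta'x + eps > 0}, logistic errors (assumption)
n, beta_true = 2000, np.array([1.0, -0.5])
X = rng.normal(size=(n, 2))
y = (X @ beta_true + rng.logistic(size=n) > 0).astype(int)

def empirical_auc(b):
    """Empirical AUC = Mann-Whitney U / (n1 * n0) for the linear score b'x."""
    s1, s0 = X[y == 1] @ b, X[y == 0] @ b
    d = s1[:, None] - s0[None, :]          # all (positive, negative) score pairs
    return (d > 0).mean() + 0.5 * (d == 0).mean()

def neg_auc(b):
    b = b / np.linalg.norm(b)              # AUC identifies b only up to scale
    return -empirical_auc(b)

# Derivative-free search; the empirical AUC is a step function of b, so this is only a heuristic
res = minimize(neg_auc, x0=np.array([1.0, 0.0]), method="Nelder-Mead")
b_hat = res.x / np.linalg.norm(res.x)
print("normalized estimate:", b_hat, "  empirical AUC:", -res.fun)
```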


Notes

  1. \(k=1\) leads to a degenerate case, when the parameter of interest is normalized to \(1\) or \(-1\).


Acknowledgments

I would like to thank the participants at the 12th Symposium of Mathematics and its Applications (2009) in Timisoara. Furthermore, I wish to thank Alfredas Račkauskas, Dmitrij Celov and Irena Mikolajun for their useful comments and Steve Guttenberg for his help with the English language.

Author information

Corresponding author

Correspondence to Igor Fedotenkov.

Appendix

1.1 Proof of Lemma 2

Proof

Using the definition of conditional probability, the expression for \(AUC_\infty (b)\) in Eq. (4) can be rewritten:

$$\begin{aligned} AUC_\infty (b)=CP_X(b^{\prime } X_1>b^{\prime } X_2)P_{X,\epsilon }(Y_1=1, Y_2=0 \vert b^{\prime } X_1>b^{\prime } X_2). \end{aligned}$$
(5)

The probability that the inequality \(b^{\prime } X_1>b^{\prime }X_2\) holds for a randomly drawn pair \(X_1\), \(X_2\) is constant and, by Assumption 6, equal to 0.5. This constant will be included in \(C\) to keep the notation as simple as possible. Employing this notation and the law of total probability, we find:

$$\begin{aligned}&\displaystyle AUC_\infty =C P_{X,\epsilon }(Y_1=1, Y_2=0 \vert b^{\prime }X_1>b^{\prime }X_2)=\end{aligned}$$
(6)
$$\begin{aligned}&\displaystyle C\int \int \limits _{b^{\prime }X_1>b^{\prime }X_2}\int \mathbb {1}(Y_1=1 \text{ and } Y_2=0 \vert X_1,X_2)\,dF_\epsilon \,dF_X(X_1)\,dF_X(X_2)= \end{aligned}$$
(7)
$$\begin{aligned}&\displaystyle C\int \int \limits _{b^{\prime }X_1>b^{\prime }X_2}P_\epsilon (Y_1=1\vert X_1)P_\epsilon (Y_2=0 \vert X_2)dF_X(X_1)dF_X(X_2). \end{aligned}$$
(8)

Equation (8) follows from the facts that the inner integral in Eq. (7) can be treated as a probability and that, conditional on \(X_1\) and \(X_2\), the events \(Y_1=1\) and \(Y_2=0\) are independent.
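
The constant \(P_X(b^{\prime }X_1>b^{\prime }X_2)=0.5\) absorbed into \(C\) above can also be checked numerically, since \(b^{\prime }(X_1-X_2)\) is symmetrically distributed around zero for two independent draws. In the quick Monte Carlo sketch below, the coefficient vector and the standard normal design are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
b = np.array([0.8, -1.3, 2.1])                 # arbitrary illustrative coefficient vector
X1, X2 = rng.normal(size=(2, 1_000_000, 3))    # two independent draws from a symmetric design
# b'(X1 - X2) is symmetric around zero, so the inequality holds with probability 0.5
print((X1 @ b > X2 @ b).mean())                # approximately 0.5
```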

Next, we take the true parameter \(\beta \) and compare it with an arbitrary parameter \(\tilde{\beta }\). In Eq. (8), the parameter alters the integration region and also affects \(P_\epsilon (Y_1=1\vert X_1)P_\epsilon (Y_2=0 \vert X_2)\), because \(b\) determines whether an observation is treated as \(Y_1\) or \(Y_2\). Note that for an observation with explanatory factors \(X_r\), \(P(Y_r=0\vert X_r)=F_\epsilon (-b^{\prime } X_r)\) and \(P(Y_r=1\vert X_r)=1-F_\epsilon (-b^{\prime } X_r)\), \(\forall X_r\), \(r=1,2\).

Consider \(X_1\) and \(X_2\) from \(D_X\). Without loss of generality it can be assumed that \(\beta ^{\prime } X_1> \beta ^{\prime } X_2\). The observations are ranked correctly when \(Y_1=1\) and \(Y_2=0\). With parameter \(\beta \), the probability of ranking the observations correctly is \(A= P_\epsilon (Y_1=1\vert X_1)P_\epsilon (Y_2=0\vert X_2)= (1-F_\epsilon (-\beta ^{\prime } X_1))F_\epsilon (-\beta ^{\prime } X_2)\). Now consider another parameter \(\tilde{\beta }\). When \(\tilde{\beta }^{\prime } X_1 > \tilde{\beta }^{\prime } X_2\), the probability of ranking the observations correctly remains the same, because \(\tilde{\beta }\) does not generate the data. Namely, the term \(1-F_\epsilon (-\beta ^{\prime } X_1)\) remains, because the true probability of \(Y_1=1\) is \(1-F_\epsilon (-\beta ^{\prime } X_1)\). The other situation is \(\tilde{\beta }^{\prime } X_2 > \tilde{\beta }^{\prime } X_1\). In that case the probability of a correct ranking is \(\tilde{A}= P_\epsilon (Y_2=1\vert X_2)P_\epsilon (Y_1=0\vert X_1)= F_\epsilon (-\beta ^{\prime } X_1)(1-F_\epsilon (-\beta ^{\prime } X_2)) =F_\epsilon (-\beta ^{\prime } X_1)-F_\epsilon (-\beta ^{\prime } X_1)F_\epsilon (-\beta ^{\prime } X_2)\). Comparing this with \(A=F_\epsilon (-\beta ^{\prime } X_2)-F_\epsilon (-\beta ^{\prime } X_1)F_\epsilon (-\beta ^{\prime } X_2)\), it is clear that \(F_\epsilon (-\beta ^{\prime } X_2)\ge F_\epsilon (-\beta ^{\prime } X_1)\), because \(\beta ^{\prime } X_1> \beta ^{\prime } X_2\) and \(F_\epsilon \) is nondecreasing. Hence, \(A\ge \tilde{A}\).
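
As a numerical sanity check of the inequality \(A\ge \tilde{A}\) (a hypothetical example; the logistic form of \(F_\epsilon \) and the normal index values are chosen here purely for illustration), one can verify it directly for random pairs satisfying \(\beta ^{\prime }X_1\ge \beta ^{\prime }X_2\):

```python
import numpy as np

rng = np.random.default_rng(2)
F = lambda t: 1.0 / (1.0 + np.exp(-t))     # logistic CDF, standing in for F_eps

u = np.sort(rng.normal(size=(100_000, 2)), axis=1)
bx2, bx1 = u[:, 0], u[:, 1]                # index values with beta'X1 >= beta'X2

A       = (1 - F(-bx1)) * F(-bx2)          # correct-ranking probability under beta
A_tilde = F(-bx1) * (1 - F(-bx2))          # probability when beta_tilde reverses the ranking

print(bool(np.all(A >= A_tilde)))          # True
```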

The integration region of \(AUC_\infty (\beta )\) is \(\beta ^{\prime }(X_1-X_2)>0\), while for a parameter \(\tilde{\beta }\) the integration region is \(\tilde{\beta }^{\prime }(X_1-X_2)>0\). It may be the case that \(X\) is concentrated in the region \(\tilde{\beta }^{\prime }(X_1-X_2)>0\), with relatively few observations in \(\beta ^{\prime }(X_1-X_2)>0\). To ensure that this is not the case, it is assumed that \(X\) is drawn from a distribution that is symmetric around zero. The sets \(\beta ^{\prime }(X_1-X_2)=0\) and \(\tilde{\beta }^{\prime }(X_1-X_2)=0\) both pass through the origin in \(D_X\times D_X\) space; therefore, the symmetry of \(X\) around zero ensures that \(AUC_\infty (\beta )\ge AUC_\infty (\tilde{\beta })\). \(\square \)

1.2 Proof of Lemma 3

Proof

Suppose there exists a pair \(X_1, X_2\), with \(X_1\in D_X\) and \(X_2 \in D_X\), such that \(\beta ^{\prime } X_1> \beta ^{\prime } X_2\) but \(\tilde{\beta }^{\prime } X_1< \tilde{\beta }^{\prime } X_2\). Then there exists a neighborhood \(\tilde{U}(X_1)\) of the point \(X_1\) such that if we substitute \(X_1\) with an element \(\tilde{X}\) from \(\tilde{U}(X_1)\), the inequalities \(\beta ^{\prime } \tilde{X}> \beta ^{\prime } X_2\) and \(\tilde{\beta }^{\prime } \tilde{X}< \tilde{\beta }^{\prime } X_2\) remain valid.

Define \(E_r\):

$$\begin{aligned} E_r=\min \left(\Bigg \vert \frac{\beta ^{\prime }(X_1-X_2)}{2\beta _r k}\Bigg \vert , \Bigg \vert \frac{\tilde{\beta }^{\prime }(X_1-X_2)}{2\tilde{\beta }_r k}\Bigg \vert \right), \quad r=1,2,\ldots ,k. \end{aligned}$$
(9)

Now define the neighborhood \(\tilde{U}(X_1)\) of the point \(X_1\) as the set of all \(\tilde{X}\) such that each component of \(\tilde{X} \in \tilde{U}(X_1)\) satisfies \(X_{1,r}-E_r \le \tilde{X}_r \le X_{1,r}+E_r\), \(r=1,\ldots ,k\). In general, \(\tilde{U}(X_1)\) need not be a subset of \(D_X\).

The convergence in Assumption 5 implies that an \(r<\infty \) exists such that \(U_r(X_1)\subset \tilde{U}(X_1)\). Hence, \(P_X( \tilde{\beta }^{\prime } \tilde{X}< \tilde{\beta }^{\prime } X_2 \text{ and } \beta ^{\prime } \tilde{X}> \beta ^{\prime } X_2)>0\). \(\square \)
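
The construction of \(\tilde{U}(X_1)\) can be illustrated numerically. In the sketch below, the particular \(\beta \), \(\tilde{\beta }\), \(X_1\) and \(X_2\) are arbitrary values chosen only for this example; the code computes \(E_r\) as in Eq. (9) and verifies that every point of the resulting box satisfies both inequalities.

```python
import numpy as np

rng = np.random.default_rng(3)

# Arbitrary illustrative values with beta'X1 > beta'X2 but beta_tilde'X1 < beta_tilde'X2
beta, beta_t = np.array([1.0, 1.0]), np.array([-1.0, 1.0])
X1, X2 = np.array([0.4, 0.5]), np.array([0.1, 0.6])
k = beta.size
assert beta @ X1 > beta @ X2 and beta_t @ X1 < beta_t @ X2

# Box half-widths E_r from Eq. (9)
E = np.minimum(np.abs(beta @ (X1 - X2) / (2 * beta * k)),
               np.abs(beta_t @ (X1 - X2) / (2 * beta_t * k)))

# Sample points from the box U~(X1) and check that both inequalities still hold
Xt = X1 + rng.uniform(-1.0, 1.0, size=(100_000, k)) * E
ok = (Xt @ beta > beta @ X2) & (Xt @ beta_t < beta_t @ X2)
print(bool(ok.all()))                      # True
```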

1.3 Proof of Lemma 4

Proof

The previous lemma implies that if a pair \(X_1, X_2\) exists, with \(X_1\in D_X\) and \(X_2 \in D_X\), such that \(\beta ^{\prime } X_1> \beta ^{\prime } X_2\) but \(\tilde{\beta }^{\prime } X_1< \tilde{\beta }^{\prime } X_2\), then these inequalities hold with nonzero probability. Together with Assumption 4, this yields \(AUC_\infty (\beta )>AUC_\infty (\tilde{\beta })\) (see the proof of Lemma 2).

Suppose that a \(\tilde{\beta }\) exists such that, whenever \(\beta ^{\prime } X_1> \beta ^{\prime } X_2\) holds, \(\tilde{\beta }^{\prime } X_1> \tilde{\beta }^{\prime } X_2\) holds as well. We assumed that the first element of \(X\) has a strictly increasing continuous distribution function. Take a pair \(X_1,X_2 \in D_X\). Furthermore, consider a sequence \(\eta _r\), \(r=1,2,3,\ldots \), such that \(\eta _r<\beta ^{\prime }(X_{1}-X_{2})/\beta _1\), where \(\beta _1\) is the first element of the vector \(\beta \), and \(\lim _{r\rightarrow \infty }\eta _r=\beta ^{\prime }(X_{1}-X_{2})/\beta _1\). Then the inequality \(\beta ^{\prime } X_{1}> \beta ^{\prime } X_{2}+\beta _1\eta _r\) is satisfied, and likewise the inequality with \(\tilde{\beta }\): \(\tilde{\beta }^{\prime } X_{1}> \tilde{\beta }^{\prime } X_{2}+\tilde{\beta }_1\eta _r\). Letting \(r\rightarrow \infty \), the last inequality may be rewritten as \((\tilde{\beta }^{\prime }- (\tilde{\beta }_1/\beta _1)\beta ^{\prime })(X_{1}-X_2)\ge 0\). Taking instead a sequence \(\eta _r>\beta ^{\prime }(X_{1}-X_{2})/\beta _1\) converging to \(\beta ^{\prime }(X_{1}-X_{2})/\beta _1\), the opposite inequality is found: \((\tilde{\beta }^{\prime }- (\tilde{\beta }_1/\beta _1)\beta ^{\prime })(X_{1}-X_2)\le 0\). Hence, \((\tilde{\beta }^{\prime }- (\tilde{\beta }_1/\beta _1)\beta ^{\prime })(X_{1}-X_2)= 0\). Since the pair \(X_1, X_2 \in D_X\) was arbitrary, it follows that \(\tilde{\beta }/\tilde{\beta }_1=\beta /\beta _1\): the coefficients \(\beta \) and \(\tilde{\beta }\) are proportional.

If \(\tilde{\beta }=c\beta \) for some \(c>0\), the proof that \(AUC_{\infty }(\tilde{\beta })=AUC_{\infty }(\beta )\) is trivial: it follows directly from the definition of the AUC.

It follows that \(AUC_\infty (\tilde{\beta })=AUC_\infty (\beta )\) is equivalent to \(\tilde{\beta }=c\beta \), where \(c\) is a positive constant. \(\square \)
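
The scale invariance used in the last step can be illustrated with a small simulation (an assumption-laden sketch, not part of the original proof; the design and error distribution are chosen here for illustration): multiplying \(\beta \) by any \(c>0\) leaves the pairwise ordering, and hence the empirical AUC, unchanged.

```python
import numpy as np

rng = np.random.default_rng(4)
beta = np.array([1.0, -0.5])
X = rng.normal(size=(4000, 2))
y = (X @ beta + rng.logistic(size=4000) > 0).astype(int)

def empirical_auc(b):
    s1, s0 = X[y == 1] @ b, X[y == 0] @ b
    d = s1[:, None] - s0[None, :]
    return (d > 0).mean() + 0.5 * (d == 0).mean()

# A positive rescaling of beta leaves the empirical AUC unchanged (up to floating point)
print(np.isclose(empirical_auc(beta), empirical_auc(3.7 * beta)))   # True
```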

1.4 Proof of Lemma 5

Proof

To show that \(AUC_\infty (b)\) is continuous in \(b\), it is sufficient to show that \(AUC_\infty (b+\Delta b)\rightarrow AUC_\infty (b)\) as \(\Delta b\rightarrow 0\). Rewrite Eq. (4) for \(b+\Delta b\):

$$\begin{aligned}&\!\!\!\! AUC_\infty (b+\Delta b)= C P_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1>(b^{\prime }+\Delta b^{\prime })X_2;\quad Y_1=1, Y_2=0\big ) \nonumber \\&\quad =CP_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1>(b^{\prime }+\Delta b^{\prime })X_2;\quad b^{\prime }X_1>b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ) \nonumber \\&\qquad +CP_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1>(b^{\prime }+\Delta b^{\prime })X_2;\quad b^{\prime }X_1\le b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ).\nonumber \\ \end{aligned}$$
(10)

A similar decomposition can be applied to \(AUC_\infty (b)\):

$$\begin{aligned}&\!\!\!\! AUC_\infty (b)=\nonumber \\&CP_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1>(b^{\prime }+\Delta b^{\prime })X_2;\quad b^{\prime }X_1>b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ) \nonumber \\&+CP_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1\le (b^{\prime }+\Delta b^{\prime })X_2;\quad b^{\prime }X_1> b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ).\qquad \quad \end{aligned}$$
(11)

Subtracting \(AUC_\infty (b)\) from \(AUC_\infty (b+\Delta b)\) we get:

$$\begin{aligned}&AUC_\infty (b+\Delta b)-AUC_\infty (b)=\nonumber \\&\quad CP_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1>(b^{\prime }+\Delta b^{\prime })X_2;\quad b^{\prime }X_1\le b^{\prime }X_2;\quad Y_1=1, Y_2=0\big )\nonumber \\&\quad - CP_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1\le (b^{\prime }+\Delta b^{\prime })X_2;\quad b^{\prime }X_1> b^{\prime }X_2;\quad Y_1=1, Y_2=0\big )\nonumber \\ \end{aligned}$$
(12)
$$\begin{aligned}&\lim _{\Delta b\rightarrow 0} \Big (AUC_\infty (b{+}\Delta b){-}AUC_\infty (b)\Big )=\nonumber \\&\quad CP_{X,\epsilon }\big (b^{\prime }X_1\ge b^{\prime }X_2;\quad b^{\prime }X_1\le b^{\prime }X_2;\quad Y_1{=}1, Y_2=0\big ) \nonumber \\&\quad -CP_{X,\epsilon }\big (b^{\prime }X_1\le b^{\prime } X_2;\quad b^{\prime }X_1> b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ). \qquad \qquad \qquad \end{aligned}$$
(13)

The first term on the right-hand side of Eq. (13) may be rewritten as \(CP_{X,\epsilon }\big (b^{\prime }X_1= b^{\prime }X_2;\; Y_1=1, Y_2=0\big )\). It is equal to zero because of Assumption 6. In the second term, the events \(b^{\prime }X_1\le b^{\prime } X_2\) and \(b^{\prime }X_1> b^{\prime }X_2\) are mutually exclusive, so the probability of that event is also equal to zero. Therefore, \(AUC_\infty (b)\) is continuous in \(b\). \(\square \)
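
A Monte Carlo illustration of this continuity (the logistic errors, the normal design and the unit-circle parametrization \(b(\theta )=(\cos \theta ,\sin \theta )\) are choices made for this sketch, not part of the paper): approximating the AUC on a fine grid along the circle shows that nearby parameter values produce nearly identical AUC values, with no visible jumps.

```python
import numpy as np

rng = np.random.default_rng(5)
beta = np.array([1.0, -0.5])
n = 4000
X = rng.normal(size=(n, 2))
y = (X @ beta + rng.logistic(size=n) > 0).astype(int)

def empirical_auc(b):
    s1, s0 = X[y == 1] @ b, X[y == 0] @ b
    d = s1[:, None] - s0[None, :]
    return (d > 0).mean() + 0.5 * (d == 0).mean()

# Trace the AUC along the unit circle b(theta) = (cos theta, sin theta)
thetas = np.linspace(-np.pi, np.pi, 181)
vals = np.array([empirical_auc(np.array([np.cos(t), np.sin(t)])) for t in thetas])
# Small changes in b produce only small changes in the approximated AUC
print("largest change between adjacent grid points:", float(np.abs(np.diff(vals)).max()))
```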

Cite this article

Fedotenkov, I. Consistency of the estimator of binary response models based on AUC maximization. Stat Methods Appl 22, 381–390 (2013). https://doi.org/10.1007/s10260-013-0229-4
