Abstract
This paper studies nonparametric estimation of the regression function with surrogate outcome data under double-sampling designs, in which a proxy (surrogate) response is observed for the full sample and the true response is observed only on a validation subsample. The authors propose a two-step estimator. They first estimate the regression function with a kernel smoother based on the validation subsample, and then improve this estimate by using the incomplete observations in the non-validation subsample together with the surrogate response available for the full sample. They derive the asymptotic normality of the proposed estimator and demonstrate its effectiveness through simulations.
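The abstract does not give the estimator's formulas, so the Python sketch below is only a rough illustration of how the two steps could be set up. It is not the authors' estimator. Step 1 is a Nadaraya-Watson kernel smoother fitted on the validation subsample. Step 2 is a hypothetical imputation-based variant: it keeps the observed response where available, replaces each missing response by a kernel estimate of E[Y | X, S] fitted on the validation subsample, and then smooths the completed responses on X over the full sample. The Gaussian kernel, the bandwidths `h` and `hs`, the toy data model in `simulate`, and all function names are assumptions made for this example. Validation is assumed to be selected completely at random.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_smoother(x0, xs, ys, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] with bandwidth h."""
    weights = [gaussian_kernel((x - x0) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations near x0; try a larger bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


def nw_smoother_2d(x0, s0, xs, ss, ys, hx, hs):
    """Product-kernel Nadaraya-Watson estimate of E[Y | X = x0, S = s0]."""
    weights = [gaussian_kernel((x - x0) / hx) * gaussian_kernel((s - s0) / hs)
               for x, s in zip(xs, ss)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations near (x0, s0); try larger bandwidths")
    return sum(w * y for w, y in zip(weights, ys)) / total


def validation_estimate(x0, xs, ys, h):
    """Step 1: smooth Y on X using only units whose true Y was observed."""
    vx = [x for x, y in zip(xs, ys) if y is not None]
    vy = [y for y in ys if y is not None]
    return nw_smoother(x0, vx, vy, h)


def surrogate_augmented_estimate(x0, xs, ss, ys, h, hs):
    """Step 2 (hypothetical): keep observed Y, impute E[Y | X, S] for the
    non-validated units, then smooth the completed responses on X."""
    val = [(x, s, y) for x, s, y in zip(xs, ss, ys) if y is not None]
    vx = [v[0] for v in val]
    vs = [v[1] for v in val]
    vy = [v[2] for v in val]
    completed = [y if y is not None else nw_smoother_2d(x, s, vx, vs, vy, h, hs)
                 for x, s, y in zip(xs, ss, ys)]
    return nw_smoother(x0, xs, completed, h)


def simulate(n, p_validate, seed):
    """Toy double-sampling data: Y = 2X + noise, surrogate S = Y + noise;
    true Y is kept only for a random validation subset (None elsewhere)."""
    rng = random.Random(seed)
    xs, ss, ys = [], [], []
    for _ in range(n):
        x = rng.random()
        y = 2.0 * x + rng.gauss(0.0, 0.3)
        s = y + rng.gauss(0.0, 0.3)
        xs.append(x)
        ss.append(s)
        ys.append(y if rng.random() < p_validate else None)
    return xs, ss, ys


if __name__ == "__main__":
    xs, ss, ys = simulate(n=1000, p_validate=0.3, seed=1)
    # In this toy model the true regression value at x0 = 0.5 is 2 * 0.5 = 1.0.
    print("validation only:    ", validation_estimate(0.5, xs, ys, h=0.1))
    print("surrogate augmented:", surrogate_augmented_estimate(0.5, xs, ss, ys, h=0.1, hs=0.15))
```

The imputation step is motivated by the identity E[Y | X] = E[E(Y | X, S) | X]. Estimating E(Y | X, S) from the validation subsample alone is justified when selection into validation depends at most on (X, S). Bandwidth selection, which matters in practice, is ignored here.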
Additional information
This research is supported by the US National Science Foundation under Grant No. DMS-0906482.
This paper was recommended for publication by Editor Guohua ZOU.
Cite this article
Jiang, X., Jiang, J. & Liu, Y. Nonparametric regression under double-sampling designs. J Syst Sci Complex 24, 167–175 (2011). https://doi.org/10.1007/s11424-011-8129-x
DOI: https://doi.org/10.1007/s11424-011-8129-x