Abstract
Regression is the problem of estimating the value of a dependent numeric variable from the values of one or more independent variables. Regression algorithms are used for prediction (including forecasting of time-series data), inference, hypothesis testing, and modeling of causal relationships. Although this problem has been studied extensively, most existing approaches are not generic: they require the user to make an intelligent guess about the form of the regression equation. In this paper we present a new regression algorithm, PAGER (Parameterless, Accurate, Generic, Efficient kNN-based Regression). PAGER is also simple and outlier-resilient. These features make PAGER an attractive alternative to existing approaches. Our experimental study compares PAGER with 12 other algorithms on 4 standard real datasets and shows that PAGER is more accurate than its competitors.
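For readers unfamiliar with the kNN family that PAGER belongs to, the following is a minimal sketch of plain k-nearest-neighbour regression. It is not the PAGER algorithm, whose details are not given in this abstract. The function name `knn_regress`, the Euclidean distance, the unweighted mean, and the user-supplied `k` are all assumptions made for illustration; in particular, plain kNN needs the user to pick `k`, whereas the abstract describes PAGER as parameterless.

```python
import math
from typing import Sequence


def knn_regress(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[float],
    query: Sequence[float],
    k: int = 3,
) -> float:
    """Plain kNN regression (illustrative baseline, NOT PAGER).

    Predicts the target for `query` as the unweighted mean of the targets
    of its k nearest training points under Euclidean distance.
    """
    if len(train_x) == 0 or len(train_x) != len(train_y):
        raise ValueError("train_x and train_y must be non-empty and equal in length")
    if k < 1:
        raise ValueError("k must be at least 1")

    # Never ask for more neighbours than there are training points.
    k = min(k, len(train_x))

    # Pair each training target with its distance to the query, nearest first.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )

    # Average the targets of the k closest points.
    return sum(y for _, y in by_distance[:k]) / k


# Hypothetical toy data: one independent variable, target equal to it.
toy_x = [(0.0,), (1.0,), (2.0,), (10.0,)]
toy_y = [0.0, 1.0, 2.0, 10.0]
prediction = knn_regress(toy_x, toy_y, (1.0,), k=3)
```

Note that no form of the regression equation is assumed anywhere in the sketch: the prediction comes directly from nearby training points, which is the sense in which kNN-style methods can be called generic.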
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Singh, H., Desai, A., Pudi, V. (2010). PAGER: Parameterless, Accurate, Generic, Efficient kNN-Based Regression. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15251-1_13
DOI: https://doi.org/10.1007/978-3-642-15251-1_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15250-4
Online ISBN: 978-3-642-15251-1
eBook Packages: Computer Science, Computer Science (R0)