Abstract
Among all machine learning problems, classification is the most widely studied and has the largest number of solution methodologies. This embarrassment of riches leads naturally to the problems of model selection and evaluation.
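As a concrete illustration of model selection by evaluation, the following is a minimal sketch using scikit-learn (referenced in the bibliography below). The synthetic data from make_classification, the two candidate models, and the choice of cross-validated AUC as the selection criterion are illustrative assumptions, not prescriptions from the chapter.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for a real labeled data set (hypothetical example).
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    candidates = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }

    # Score each candidate with 5-fold cross-validated AUC; the model with
    # the best out-of-fold score is the one a model-selection loop would keep.
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"{name}: mean AUC = {scores.mean():.3f} (std = {scores.std():.3f})")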
Notes
1. Instead of computing the expected values in the bias-variance trade-off over different choices of training data sets, one can compute them over different randomized choices of models. This approach is referred to as the model-centric view of the bias-variance trade-off [9]. The traditional view is data-centric: the randomized process underlying the trade-off is defined by different choices of training data sets. From the data-centric view, a random forest is really a bias reduction method over training data sets of small size (see the decomposition sketched after these notes).
2. Throughout this book, we have used y_j ∈ {−1, +1} in the classification setting. However, we switch to the notation {0, 1} here for greater conformity with the information retrieval literature.
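For reference, the squared-loss decomposition that both views of note 1 randomize differently can be sketched as follows; the notation (f for the true function, \hat{f} for the learned model, \sigma^2 for the intrinsic noise) is standard and assumed here rather than taken verbatim from the chapter. In the data-centric view, the expectation is over random draws of the training data set; in the model-centric view of [9], it is over randomized choices of the model.

    % Squared-loss bias-variance decomposition at a fixed test point x.
    \mathbb{E}\big[(y - \hat{f}(x))^2\big]
      \;=\; \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2}
      \;+\; \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
      \;+\; \underbrace{\sigma^2}_{\text{noise}}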
Bibliography
C. Aggarwal. Recommender systems: The textbook. Springer, 2016.
C. Aggarwal. Outlier analysis. Springer, 2017.
C. Aggarwal and S. Sathe. Outlier ensembles: An introduction. Springer, 2017.
L. Breiman. Random forests. Machine Learning, 45(1), pp. 5–32, 2001.
L. Breiman. Bagging predictors. Machine Learning, 24(2), pp. 123–140, 1996.
P. Bühlmann and B. Yu. Analyzing bagging. Annals of Statistics, pp. 927–961, 2002.
T. Fawcett. ROC Graphs: Notes and Practical Considerations for Researchers. Technical Report HPL-2003-4, Palo Alto, CA, HP Laboratories, 2003.
Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory, pp. 23–37, 1995.
J. Friedman. Stochastic gradient boosting. Computational Statistics and Data Analysis, 38(4), pp. 367–378, 2002.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 28(2), pp. 337–407, 2000.
T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer, 2009.
T. Hastie and R. Tibshirani. Generalized additive models. CRC Press, 1990.
T. K. Ho. Random decision forests. Third International Conference on Document Analysis and Recognition, 1995. Extended version appears as “The random subspace method for constructing decision forests” in IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), pp. 832–844, 1998.
T. K. Ho. Nearest neighbors in random subspaces. Lecture Notes in Computer Science, Vol. 1451, pp. 640–648, Proceedings of the Joint IAPR Workshops SSPR’98 and SPR’98, 1998. http://link.springer.com/chapter/10.1007/BFb0033288
R. Kohavi and D. Wolpert. Bias plus variance decomposition for zero-one loss functions. ICML Conference, 1996.
E. Kong and T. Dietterich. Error-correcting output coding corrects bias and variance. ICML Conference, pp. 313–321, 1995.
M. Kuhn. Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), pp. 1–26, 2008. https://cran.r-project.org/web/packages/caret/index.html
C. Manning, P. Raghavan, and H. Schütze. Introduction to information retrieval. Cambridge University Press, Cambridge, 2008.
R. Samworth. Optimal weighted nearest neighbour classifiers. The Annals of Statistics, 40(5), pp. 2733–2763, 2012.
G. Seni and J. Elder. Ensemble methods in data mining: Improving accuracy through combining predictions. Synthesis Lectures in Data Mining and Knowledge Discovery, Morgan and Claypool, 2010.
J. Xu and H. Li. AdaRank: a boosting algorithm for information retrieval. ACM SIGIR Conference, 2007.
Z.-H. Zhou. Ensemble methods: Foundations and algorithms. CRC Press, 2012.
http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
https://cran.r-project.org/web/packages/RTextTools/RTextTools.pdf
https://cran.r-project.org/web/packages/rotationForest/index.html
https://archive.ics.uci.edu/ml/datasets/Reuters-21578+Text+Categorization+Collection
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this chapter
Aggarwal, C.C. (2018). Classifier Performance and Evaluation. In: Machine Learning for Text. Springer, Cham. https://doi.org/10.1007/978-3-319-73531-3_7
DOI: https://doi.org/10.1007/978-3-319-73531-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73530-6
Online ISBN: 978-3-319-73531-3