Abstract
Among all machine learning problems, classification is the most widely studied and has the largest number of solution methodologies. This embarrassment of riches leads naturally to the problems of model selection and evaluation.
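As a concrete illustration of model selection by evaluation, the following is a minimal sketch using scikit-learn (referenced in the bibliography below). The synthetic data from make_classification, the two candidate models, and the choice of cross-validated AUC as the selection criterion are illustrative assumptions, not prescriptions from the chapter.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for a real labeled data set (hypothetical example).
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    candidates = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }

    # Score each candidate with 5-fold cross-validated AUC; the model with
    # the best out-of-fold score is the one a model-selection loop would keep.
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"{name}: mean AUC = {scores.mean():.3f} (std = {scores.std():.3f})")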
Notes
1. Instead of computing the expected values in the bias-variance trade-off over different choices of training data sets, one can compute them over different randomized choices of models. This approach is referred to as the model-centric view of the bias-variance trade-off [9]. The traditional view is data-centric: the randomized process underlying the trade-off is defined by different choices of training data sets. From the data-centric view, a random forest is really a bias reduction method over training data sets of small size (see the decomposition sketched after these notes).
2. Throughout this book, we have used y_j ∈ {−1, +1} in the classification setting. However, we switch to the notation {0, 1} here for greater conformity with the information retrieval literature.
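For reference, the squared-loss decomposition that both views of note 1 randomize differently can be sketched as follows; the notation (f for the true function, \hat{f} for the learned model, \sigma^2 for the intrinsic noise) is standard and assumed here rather than taken verbatim from the chapter. In the data-centric view, the expectation is over random draws of the training data set; in the model-centric view of [9], it is over randomized choices of the model.

    % Squared-loss bias-variance decomposition at a fixed test point x.
    \mathbb{E}\big[(y - \hat{f}(x))^2\big]
      \;=\; \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2}
      \;+\; \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
      \;+\; \underbrace{\sigma^2}_{\text{noise}}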
Bibliography
C. Aggarwal. Recommender systems: The textbook. Springer, 2016.
C. Aggarwal. Outlier analysis. Springer, 2017.
C. Aggarwal and S. Sathe. Outlier ensembles: An introduction. Springer, 2017.
L. Breiman. Random forests. Machine Learning, 45(1), pp. 5–32, 2001.
L. Breiman. Bagging predictors. Machine Learning, 24(2), pp. 123–140, 1996.
P. Bühlmann and B. Yu. Analyzing bagging. Annals of Statistics, pp. 927–961, 2002.
T. Fawcett. ROC Graphs: Notes and Practical Considerations for Researchers. Technical Report HPL-2003-4, Palo Alto, CA, HP Laboratories, 2003.
Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory, pp. 23–37, 1995.
J. Friedman. Stochastic gradient boosting. Computational Statistics and Data Analysis, 38(4), pp. 367–378, 2002.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 28(2), pp. 337–407, 2000.
T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer, 2009.
T. Hastie and R. Tibshirani. Generalized additive models. CRC Press, 1990.
T. K. Ho. Random decision forests. Third International Conference on Document Analysis and Recognition, 1995. Extended version appears as “The random subspace method for constructing decision forests” in IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), pp. 832–844, 1998.
T. K. Ho. Nearest neighbors in random subspaces. Lecture Notes in Computer Science, Vol. 1451, pp. 640–648, Proceedings of the Joint IAPR Workshops SSPR’98 and SPR’98, 1998. http://link.springer.com/chapter/10.1007/BFb0033288
R. Kohavi and D. Wolpert. Bias plus variance decomposition for zero-one loss functions. ICML Conference, 1996.
E. Kong and T. Dietterich. Error-correcting output coding corrects bias and variance. ICML Conference, pp. 313–321, 1995.
M. Kuhn. Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), pp. 1–26, 2008. https://cran.r-project.org/web/packages/caret/index.html
C. Manning, P. Raghavan, and H. Schütze. Introduction to information retrieval. Cambridge University Press, Cambridge, 2008.
R. Samworth. Optimal weighted nearest neighbour classifiers. The Annals of Statistics, 40(5), pp. 2733–2763, 2012.
G. Seni and J. Elder. Ensemble methods in data mining: Improving accuracy through combining predictions. Synthesis Lectures in Data Mining and Knowledge Discovery, Morgan and Claypool, 2010.
J. Xu and H. Li. AdaRank: a boosting algorithm for information retrieval. ACM SIGIR Conference, 2007.
Z.-H. Zhou. Ensemble methods: Foundations and algorithms. CRC Press, 2012.
http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
https://cran.r-project.org/web/packages/RTextTools/RTextTools.pdf
https://cran.r-project.org/web/packages/rotationForest/index.html
https://archive.ics.uci.edu/ml/datasets/Reuters-21578+Text+Categorization+Collection
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this chapter
Aggarwal, C.C. (2018). Classifier Performance and Evaluation. In: Machine Learning for Text. Springer, Cham. https://doi.org/10.1007/978-3-319-73531-3_7
DOI: https://doi.org/10.1007/978-3-319-73531-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73530-6
Online ISBN: 978-3-319-73531-3