Remedies for Severe Class Imbalance

  • Max Kuhn
  • Kjell Johnson

Abstract

When modeling discrete classes, the relative frequencies of the classes can have a significant impact on the effectiveness of the model. An imbalance occurs when one or more classes have very low proportions in the training data compared to the other classes. Imbalance can be present in any data set or application, so the practitioner should be aware of the implications of modeling this type of data. To illustrate the impacts of and remedies for severe class imbalance, we present a case study (Section 16.1) and examine the effect of class imbalance on performance measures (Section 16.2). Sections 16.3–16.6 describe approaches that handle imbalance using the existing data, such as maximizing minority class accuracy, adjusting classification cut-offs or prior probabilities, or adjusting sample weights prior to model tuning. Imbalance can also be addressed through sophisticated up- or down-sampling methods (Section 16.7) or by applying costs to the classification errors (Section 16.8). In the Computing Section (16.9) we demonstrate how to implement these remedies in R. Finally, exercises are provided at the end of the chapter to solidify the concepts.
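As a brief taste of the remedies outlined above, the sketch below illustrates two of them in R: rebalancing the training set by down- or up-sampling, and moving the probability cut-off rather than using the default 50% threshold. This is a minimal sketch, not the chapter's Section 16.9 code; it assumes the caret package (for twoClassSim, downSample, and upSample) and the pROC package (for roc and coords) are available.

```r
library(caret)   # downSample(), upSample(), twoClassSim()
library(pROC)    # roc(), coords()

set.seed(1234)
## Simulate an imbalanced two-class training set; a large negative
## intercept in twoClassSim() makes one class much rarer than the other.
training <- twoClassSim(1000, intercept = -16)
table(training$Class)

## Sampling remedy: down-sample the majority class (or up-sample the
## minority class) so the classes are balanced before model tuning.
predictors  <- training[, names(training) != "Class"]
downSampled <- downSample(x = predictors, y = training$Class)
upSampled   <- upSample(x = predictors, y = training$Class)

## Cut-off remedy: keep the data as-is, fit a simple logistic
## regression, then choose the probability cut-off that maximizes
## Youden's J statistic (sensitivity + specificity - 1) on the
## training-set ROC curve.
fit    <- glm(Class ~ ., data = training, family = binomial)
probs  <- predict(fit, type = "response")
rocObj <- roc(training$Class, probs)
coords(rocObj, x = "best", best.method = "youden")
```

In practice the alternate cut-off should be derived from a held-out or resampled data set rather than the training-set predictions shown here, for the reasons discussed in the chapter.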

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Max Kuhn (1)
  • Kjell Johnson (2)
  1. Division of Nonclinical Statistics, Pfizer Global Research and Development, Groton, USA
  2. Arbor Analytics, Saline, USA
