Hydrometeorological variables predict fecal indicator bacteria densities in freshwater: data-driven methods for variable selection
Statistical models of microbial water quality inform risk management for water recreation. Current research focuses on resource-intensive, location-specific data collection and water quality modeling, but this approach may be cost-prohibitive for risk managers responsible for numerous recreation sites. As an alternative, we tested the ability of two data-driven models, tree regression and random forests with conditional inference trees, to select readily available hydrometeorological variables for use in linear mixed effects (LME) models predicting bacterial density. The study included the Chicago Area Waterway System (CAWS) and Lake Michigan beaches and harbors in Chicago, Illinois, at which Escherichia coli and enterococci were measured seasonally in 2007–2009. Tree regression node variables reduced data dimensionality by >50%. Variable importance ranks from random forests were used in a forward-step selection based on R² and root mean squared prediction error (RMSPE). We found two to three variables explained bacteria densities well relative to random forests with all variables. LME models with tree- or forest-selected variables performed reasonably well (0.335 < R² < 0.658). LME models for Lake Michigan had good prediction accuracy with respect to the single sample maximum standard (72–77%), but limited sensitivity (23–62%). Results suggest that our alternative approach is feasible and performs similarly to more resource-intensive approaches.
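The evaluation measures named in the abstract (RMSPE, and accuracy and sensitivity relative to a single sample maximum standard) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the example densities, and the threshold of 235 CFU/100 mL (assumed here to be the E. coli single sample maximum) are all illustrative assumptions, as is the choice to count a value strictly above the threshold as an exceedance.

```python
import math

def rmspe(observed, predicted):
    """Root mean squared prediction error over paired values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def exceedance_metrics(observed, predicted, threshold):
    """Accuracy and sensitivity of predicted threshold exceedances.

    Accuracy: share of samples whose exceedance status is predicted correctly.
    Sensitivity: share of observed exceedances that the model also predicts.
    """
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        obs_exceeds = obs > threshold    # assumption: strict inequality
        pred_exceeds = pred > threshold
        if obs_exceeds and pred_exceeds:
            tp += 1
        elif obs_exceeds:
            fn += 1
        elif pred_exceeds:
            fp += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity

# Hypothetical densities (CFU/100 mL), invented for illustration only.
observed = [50, 300, 120, 500, 240, 80, 1000, 30]
predicted = [70, 180, 150, 420, 260, 60, 200, 40]
THRESHOLD = 235  # assumed single sample maximum for E. coli

accuracy, sensitivity = exceedance_metrics(observed, predicted, THRESHOLD)
print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}")
```

In this made-up data the model classifies most samples correctly but misses half of the observed exceedances, which is the same qualitative pattern the abstract reports: reasonable accuracy alongside limited sensitivity.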
Keywords: Random forests; Combined sewer overflow; Tree regression; Rainfall; Fecal indicator bacteria
We would like to acknowledge the contributions of the CHEERS sample collection and data management team, particularly, Mr. Ross Gladding, Dr. Margit Javor, Ms. Chiping Nieh, Dr. Peter Scheff, and Ms. Ember Vannoy. The map was created by Mr. Raja Kaliappan. The CHEERS study was funded by the Metropolitan Water Reclamation District of Greater Chicago.