Knowledge Discovery in Environmental Data

  • Conference paper

Part of the book series: NATO Science Series (NAIV, volume 80)

One approach to tackling water resource management problems is the introduction of advanced Information and Communication Technologies (ICTs). In Integrated Water Management (IWM) and, more generally, in the environmental field, the power of these technologies has made large-scale data collection campaigns possible. The number of parameters that must be measured to monitor an ecosystem is potentially high, and measuring them systematically generates huge amounts of data that must be suitably interpreted and used. A pragmatic approach is needed to turn all this data into the best possible information, that is, knowledge. Such volumes of data hold a great deal of hidden information in the form of models, patterns and trends, but this information is difficult to extract because the data varies in quantity and quality. As a consequence, semi-automatic knowledge extraction from data has gained great importance in the economic and scientific communities. Knowledge Discovery from Databases (KDD) has emerged as a framework within which a wide range of techniques for identifying useful and understandable patterns in data have flourished. Most of those techniques can be applied successfully in the environmental field and, in particular, in IWM. In this paper we give an overview of what KDD is and mention some of its applications in these areas.

Keywords: Integrated Water Management, Data Mining, Environmental Databases

© 2008 Springer

Cite this paper

Izquierdo, J., Díaz, J.L., Pérez, R., López, P.A., Mora, J.J. (2008). Knowledge Discovery in Environmental Data. In: Meire, P., Coenen, M., Lombardo, C., Robba, M., Sacile, R. (eds) Integrated Water Management. NATO Science Series, vol 80. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6552-1_5
