Abstract
Data quality is one of the fundamental issues influencing the performance of any data investigation algorithm: poor data quality always leads to poor-quality results. In the investigation chain, the data selection phase is followed by the preprocessing phase, which increases data quality but also consumes more time than any other part of the overall data investigation chain. The preprocessing phase includes the handling of missing data, the handling of outliers, data de-trending, and data smoothing. The methods used in this phase are usually not sufficiently reported in the literature on environmental data analysis and knowledge extraction. The current paper investigates the performance of several methods in all phases of the preprocessing chain for environmental data, emphasizing the use of ICT (Information & Communication Technology) methods to carry out these preprocessing tasks and using air quality as the paradigm environmental domain.
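To make the four preprocessing steps named above concrete, here is a minimal sketch in Python (standard library only). The concrete techniques are assumptions chosen for clarity, not necessarily the methods evaluated in the paper: linear interpolation for missing values, median/MAD clipping for outliers, removal of an ordinary least-squares linear trend for de-trending, and a centred moving average for smoothing. The function names and the sample values are hypothetical.

```python
from statistics import median


def fill_missing(series):
    """Replace None entries by linear interpolation between the nearest
    valid neighbours; leading/trailing gaps copy the nearest valid value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series contains no valid values")
    out = list(series)
    for i, v in enumerate(out):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:          # gap at the start of the series
            out[i] = series[right]
        elif right is None:       # gap at the end of the series
            out[i] = series[left]
        else:                     # interior gap: straight line between neighbours
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out


def clip_outliers(series, k=3.0):
    """Clip values lying more than k robust standard deviations from the
    median; the robust scale is 1.4826 * MAD (median absolute deviation)."""
    m = median(series)
    mad = median(abs(x - m) for x in series)
    scale = 1.4826 * mad
    if scale == 0:                # constant-ish series: nothing to clip
        return list(series)
    lo, hi = m - k * scale, m + k * scale
    return [min(max(x, lo), hi) for x in series]


def detrend(series):
    """Subtract the ordinary least-squares line a + b*t fitted to the series."""
    n = len(series)
    if n == 0:
        return []
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx if sxx else 0.0
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]


def smooth(series, window=3):
    """Centred moving average; the window is truncated at the series ends."""
    half = window // 2
    out = []
    for i in range(len(series)):
        seg = series[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def preprocess(series):
    """Hypothetical chain in the order given in the abstract:
    missing data -> outliers -> de-trending -> smoothing."""
    return smooth(detrend(clip_outliers(fill_missing(series))))


if __name__ == "__main__":
    # Hypothetical hourly concentrations with two gaps and one spike.
    raw = [20.0, 22.0, None, 26.0, 250.0, 30.0, 32.0, None, 36.0]
    print([round(v, 2) for v in preprocess(raw)])
```

Each step is a separate function so that an alternative technique (for example a Savitzky-Golay filter instead of the moving average, or a robust regression fit instead of the least-squares trend) can be swapped in at one stage without touching the others.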
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kyriakidis, I., Karatzas, K.D., Papadourakis, G. (2009). Using Preprocessing Techniques in Air Quality forecasting with Artificial Neural Networks. In: Athanasiadis, I.N., Rizzoli, A.E., Mitkas, P.A., Gómez, J.M. (eds) Information Technologies in Environmental Engineering. Environmental Science and Engineering. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88351-7_27
DOI: https://doi.org/10.1007/978-3-540-88351-7_27
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88350-0
Online ISBN: 978-3-540-88351-7
eBook Packages: Earth and Environmental Science, Earth and Environmental Science (R0)