An Approach to Clustering Using the Expectation-Maximization and Selection of Attributes ReliefF Applied to Water Treatment Plants process

  • Fábio Cosme Rodrigues dos Santos
  • André Felipe Henriques Librantz
  • Renato José Sassi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10657)


The water treatment process involves several physico-chemical parameters that are relevant to decision making and to the identification of water quality scenarios. Some scenarios are evident and can be observed without mathematical or statistical techniques; others are difficult to distinguish and require computational intelligence techniques. In this context, the paper applies the expectation-maximization (EM) algorithm to cluster data from the coagulation process and the ReliefF algorithm to determine the importance of the physico-chemical parameters, using the WEKA tool to analyze a historical dataset from a water treatment plant. The results were favorable both for identifying the scenarios and for determining the relevance of the process parameters.
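
As a rough illustration of the clustering step, the sketch below fits a two-component, one-dimensional Gaussian mixture with plain EM in pure Python. It is not the authors' WEKA workflow: the function name `em_gmm_1d`, the quantile-based initialization, and the synthetic turbidity readings are all assumptions made for this example.

```python
import math
import random


def em_gmm_1d(values, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(values)
    ordered = sorted(values)
    # Assumed initialization: evenly spaced quantiles as starting means.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(values) / n
    overall_var = sum((v - overall_mean) ** 2 for v in values) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for v in values:
            dens = [
                w * math.exp(-((v - m) ** 2) / (2 * s)) / math.sqrt(2 * math.pi * s)
                for w, m, s in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])

        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(var_j, 1e-6)  # keep variances strictly positive

    # Hard assignment: most responsible component from the final E-step.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Synthetic, made-up raw-water turbidity readings (NTU) from two regimes.
rng = random.Random(42)
turbidity = [rng.gauss(5.0, 1.0) for _ in range(100)] + \
            [rng.gauss(20.0, 1.5) for _ in range(100)]

weights, means, variances, labels = em_gmm_1d(turbidity, k=2)
print("estimated cluster means:", means)
```

Each recovered component would correspond to one candidate water quality scenario. A real analysis would use several parameters at once (a multivariate mixture), which is what a tool such as WEKA handles.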


Keywords: Clustering, Expectation-maximization, ReliefF, Water treatment, Coagulant dosage



The authors would like to thank the São Paulo Research Foundation (FAPESP, grant #2016/02641-1), the Basic Sanitation Company of the State of São Paulo (SABESP), and Nove de Julho University (UNINOVE) for their support.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Fábio Cosme Rodrigues dos Santos (1, 2)
  • André Felipe Henriques Librantz (1)
  • Renato José Sassi (1)

  1. Nove de Julho University - UNINOVE, São Paulo, Brazil
  2. Basic Sanitation Company of the State of São Paulo - SABESP, São Paulo, Brazil
