Single Imputation Methods Applied to a Global Geothermal Database

  • Román-Flores Mariana Alelhí
  • Santamaría-Bonfil Guillermo
  • Díaz-González Lorena
  • Arroyo-Figueroa Gustavo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11288)


In the exploitation stage of a geothermal reservoir, estimating the bottomhole temperature (BHT) is essential for determining the available energy potential and the viability of exploitation. The BHT can be measured directly, but this is very expensive; therefore, statistical models used as virtual geothermometers are preferred. Geothermometers have been widely used to infer the temperature of deep geothermal reservoirs from the analysis of fluid samples collected at the surface from springs and exploration wells. Our procedure is based on an extensive geochemical database (n = 708) containing BHT measurements and the compositions of eight main elements in the geothermal fluid. Unfortunately, the database has missing values for some of the measured main-element compositions. Therefore, to take advantage of all this information in the BHT estimate, a process of imputation, or completion of the missing values, is necessary.

In the present work, we compare imputation using the mean and the median, as well as stochastic regression and the support vector machine (SVM), to complete our data set of geochemical components. The results showed that stochastic regression and SVM are superior to the mean and median, since these methods obtained the smallest root mean square error (RMSE) and mean absolute error (MAE).
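The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins for the geochemical database, all function and variable names are hypothetical, values are hidden completely at random, and the SVM variant is omitted to keep the example dependency-free. Known values are hidden, imputed with the mean, the median, and stochastic regression (a least-squares prediction plus a random draw from the residual distribution), and then scored against the hidden truth with RMSE and MAE.

```python
import math
import random
import statistics


def rmse(truth, pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def mae(truth, pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = math.sqrt(sum(r * r for r in resid) / (len(xs) - 2))
    return a, b, sd


def compare_imputations(seed=0, n=200, missing_rate=0.2):
    """Hide some known values, impute them three ways, return (RMSE, MAE) per method."""
    rng = random.Random(seed)
    # ASSUMPTION: synthetic data; x is a fully observed predictor, y depends on x.
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [2.0 + 3.0 * xi + rng.gauss(0, 1) for xi in x]

    # Hide a fixed fraction of y completely at random.
    hidden = sorted(rng.sample(range(n), int(n * missing_rate)))
    hidden_set = set(hidden)
    obs_x = [x[i] for i in range(n) if i not in hidden_set]
    obs_y = [y[i] for i in range(n) if i not in hidden_set]
    truth = [y[i] for i in hidden]

    a, b, sd = fit_line(obs_x, obs_y)
    imputed = {
        "mean": [statistics.fmean(obs_y)] * len(hidden),
        "median": [statistics.median(obs_y)] * len(hidden),
        # Stochastic regression: regression prediction plus random residual noise.
        "stochastic_regression": [a + b * x[i] + rng.gauss(0, sd) for i in hidden],
    }
    return {name: (rmse(truth, p), mae(truth, p)) for name, p in imputed.items()}


if __name__ == "__main__":
    for name, (r, m) in compare_imputations().items():
        print(f"{name:>22}: RMSE={r:.3f}  MAE={m:.3f}")
```

In this synthetic setting the hidden variable depends strongly on the predictor, so the regression-based imputation should score noticeably lower errors than the mean or median. How large the gap is on real data depends on how well the observed variables predict the missing ones.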


Keywords: Geothermal data, Missing data, Imputation, Stochastic regression



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Román-Flores Mariana Alelhí (1, corresponding author)
  • Santamaría-Bonfil Guillermo (2)
  • Díaz-González Lorena (3)
  • Arroyo-Figueroa Gustavo (4)

  1. Posgrado en Optimización y Cómputo Aplicado, Universidad Autónoma del Estado de Morelos, Cuernavaca, Mexico
  2. Instituto Nacional de Electricidad y Energías Limpias, Gerencia de Tecnologías de la Información, Cuernavaca, Mexico
  3. Departamento de Computación, Centro de Investigación en Ciencias, Instituto de Investigación en Ciencias Básicas Aplicadas, Universidad Autónoma del Estado de Morelos, Cuernavaca, Mexico
  4. Instituto Nacional de Electricidad y Energías Limpias, Cuernavaca, Mexico
