Improving Regression Models by Dissimilarity Representation of Bio-chemical Data

  • Francisco Jose Silva-Mata
  • Catherine Jiménez
  • Gabriela Barcas
  • David Estevez-Bresó
  • Niusvel Acosta-Mendoza (corresponding author)
  • Andres Gago-Alonso
  • Isneri Talavera-Bustamante
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11401)

Abstract

Predicting properties with regression models built on bio-chemical data from analytical techniques such as Near Infrared (NIR) Spectroscopy and Nuclear Magnetic Resonance (NMR) is a common task in identifying substances and their chemical-physical properties. The data produced by these techniques are commonly represented as vectors, a representation that ignores both the continuous nature of the data and the correlation between variables, and this affects regression modeling and calibration. To address these problems, alternative data representations have previously been used with good results, notably representations based on functions and representations based on dissimilarities. The dissimilarity-based representation has improved the efficiency of classification, but experience with it in regression is scarce. In this paper, to improve the quality of regression models, we therefore combine the dissimilarity representation with suitable data pre-processing, using classical Partial Least Squares (PLS) regression as the modeling method. The results are evaluated with the coefficient of determination \(R^2\) for each case, followed by a statistical analysis of these values.
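The abstract outlines a pipeline: pre-process the spectra, represent each sample by its dissimilarities to a set of prototypes, fit PLS regression on that representation, and score the model with \(R^2\). The sketch below illustrates such a pipeline under assumptions that are not taken from the paper: it uses NumPy only, Euclidean distance as the dissimilarity measure, Standard Normal Variate (SNV) as the pre-processing step, the first training samples as prototypes, and single-response PLS fitted with the NIPALS algorithm. All function names, parameters, and the synthetic data in `demo` are hypothetical.

```python
import numpy as np


def snv(X):
    """Standard Normal Variate: centre and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def dissimilarity_representation(X, prototypes):
    """Euclidean distances from every object (row of X) to every prototype.

    Entry (i, j) is d(x_i, p_j), so each object becomes a vector with one
    coordinate per prototype instead of one per wavelength.
    """
    diff = X[:, None, :] - prototypes[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def pls1_fit(X, y, n_components):
    """Single-response PLS regression fitted with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean  # mean-centred X block and response
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                 # weight vector: covariance of X with y
        w = w / np.linalg.norm(w)
        t = E @ w                   # scores
        tt = t @ t
        p = E.T @ t / tt            # X loadings
        c = (f @ t) / tt            # y loading
        E = E - np.outer(t, p)      # deflate X
        f = f - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # B = W (P'W)^-1 q
    return coef, x_mean, y_mean


def pls1_predict(X, coef, x_mean, y_mean):
    """Apply fitted PLS coefficients to new data (centred with training means)."""
    return (X - x_mean) @ coef + y_mean


def r2_score(y, y_hat):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def demo(seed=0, n_prototypes=15, n_components=5):
    """Hypothetical two-band 'spectra'; the target is a ratio of band intensities."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 100)
    band1 = np.exp(-((grid - 0.3) / 0.05) ** 2)
    band2 = np.exp(-((grid - 0.7) / 0.08) ** 2)
    c = rng.uniform(0.1, 1.0, size=(120, 2))
    y = c[:, 0] / c.sum(axis=1)
    scale = rng.uniform(0.8, 1.2, size=(120, 1))    # multiplicative scatter
    offset = rng.uniform(-0.1, 0.1, size=(120, 1))  # additive baseline
    X = scale * (c[:, :1] * band1 + c[:, 1:] * band2) + offset
    X = snv(X + rng.normal(0.0, 0.01, size=X.shape))
    X_tr, X_te, y_tr, y_te = X[:80], X[80:], y[:80], y[80:]
    prototypes = X_tr[:n_prototypes]  # assumed prototype choice
    D_tr = dissimilarity_representation(X_tr, prototypes)
    D_te = dissimilarity_representation(X_te, prototypes)
    coef, xm, ym = pls1_fit(D_tr, y_tr, n_components)
    return r2_score(y_te, pls1_predict(D_te, coef, xm, ym))


if __name__ == "__main__":
    print(f"test-set R^2 on dissimilarity features: {demo():.3f}")
```

Because distances are always measured to the same fixed prototype set, new samples are mapped exactly like the training samples, so a PLS model fitted on `D_tr` applies directly to `D_te`. Another dissimilarity measure or prototype-selection rule can be swapped into `dissimilarity_representation` without touching the regression step.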

Keywords

Dissimilarity representation, Regression, Bio-chemical data

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Advanced Technologies Application Center, Havana, Cuba (all authors)
