FS4RVDD: A Feature Selection Algorithm for Random Variables with Discrete Distribution

  • Conference paper
Information Processing and Management of Uncertainty in Knowledge-Based Systems. Applications (IPMU 2018)

Abstract

Feature selection is a crucial step when inferring regression and classification models in QSPR (Quantitative Structure–Property Relationship) modelling applied to Cheminformatics. A particularly complex case of QSPR modelling occurs in Polymer Informatics, because the features under analysis require the management of uncertainty. In this paper, a novel feature selection method for this special QSPR scenario is presented. The proposed methodology assumes that each feature is characterized by a probability distribution of values associated with the polydispersity of the polymers included in the training dataset. The new algorithm has two sequential steps: a ranking of the features, generated by correlation analysis, and an iterative subset reduction, obtained by feature redundancy analysis. A prototype of the algorithm has been implemented as a proof of concept. The performance of the method has been evaluated on synthetic datasets of different sizes, varying the cardinality of the selected feature subsets. These preliminary results suggest that the chosen mathematical representation and the proposed method are suitable for managing the uncertainty inherent to polymerization. Nevertheless, this research is work in progress, and additional experiments should be conducted in the future to assess the actual benefits and limitations of the methodology.
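
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which correlation or redundancy measures FS4RVDD uses, so the sketch assumes that each discrete distribution is summarized by its expected value, that ranking uses the absolute Pearson correlation with the target, and that redundancy is the absolute Pearson correlation between two features. The names `expected_value`, `pearson` and `select_features`, and the toy data, are hypothetical.

```python
from math import sqrt


def expected_value(dist):
    """Mean of a discrete distribution given as (value, probability) pairs."""
    return sum(value * prob for value, prob in dist)


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_features(features, target, k):
    """Rank features by correlation with the target, then prune redundant ones.

    features: dict mapping a feature name to one discrete distribution per sample.
    target:   list with one numeric property value per sample.
    k:        size of the returned subset (k >= 1).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Assumption: each distribution is collapsed to its expected value.
    means = {name: [expected_value(d) for d in dists]
             for name, dists in features.items()}

    # Step 1: ranking by absolute correlation with the target (best first).
    subset = sorted(means, key=lambda n: abs(pearson(means[n], target)),
                    reverse=True)

    # Step 2: iterative reduction. Find the most redundant pair and drop
    # its lower-ranked member until only k features remain.
    while len(subset) > k:
        drop_index, highest = None, -1.0
        for i in range(len(subset)):
            for j in range(i + 1, len(subset)):
                r = abs(pearson(means[subset[i]], means[subset[j]]))
                if r > highest:
                    highest, drop_index = r, j  # j is ranked below i
        subset.pop(drop_index)
    return subset


# Toy data: four samples, three features; "b" nearly duplicates "a".
target = [1.0, 2.0, 3.0, 4.0]
features = {
    "a": [[(0.0, 0.5), (2.0, 0.5)], [(1.0, 0.5), (3.0, 0.5)],
          [(2.0, 0.5), (4.0, 0.5)], [(3.0, 0.5), (5.0, 0.5)]],
    "b": [[(1.1, 1.0)], [(2.0, 1.0)], [(3.0, 1.0)], [(4.1, 1.0)]],
    "c": [[(2.0, 1.0)], [(1.0, 1.0)], [(4.0, 1.0)], [(3.0, 1.0)]],
}
selected = select_features(features, target, k=2)
print(selected)  # ['a', 'c']
```

In this toy run, "b" is discarded because it is almost perfectly correlated with the higher-ranked "a". Collapsing each distribution to its mean throws away the spread that the paper's representation is meant to preserve, so a measure defined directly on the distributions would be needed to reflect the method more closely.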

Acknowledgments

This work is kindly supported by CONICET, grant PIP 112-2012-0100471 and UNS, grants PGI 24/N042 and PGI 24/ZM17.

Author information

Corresponding author

Correspondence to Ignacio Ponzoni.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Cravero, F., Schustik, S., Martínez, M.J., Díaz, M.F., Ponzoni, I. (2018). FS4RVDD: A Feature Selection Algorithm for Random Variables with Discrete Distribution. In: Medina, J., Ojeda-Aciego, M., Verdegay, J., Perfilieva, I., Bouchon-Meunier, B., Yager, R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. Applications. IPMU 2018. Communications in Computer and Information Science, vol 855. Springer, Cham. https://doi.org/10.1007/978-3-319-91479-4_18

  • DOI: https://doi.org/10.1007/978-3-319-91479-4_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91478-7

  • Online ISBN: 978-3-319-91479-4

  • eBook Packages: Computer Science, Computer Science (R0)
