Missing-Values Imputation Algorithms for Microarray Gene Expression Data

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1986)

Abstract

In gene expression studies, missing values are a common problem with important consequences for the interpretation of the final data (Satija et al., Nat Biotechnol 33(5):495, 2015). Many bioinformatics analysis tools used for cancer prediction take the data set matrix as input (Bailey et al., Cell 173(2):371–385, 2018), so the missing values must first be imputed. This chapter reviews research on missing-values imputation approaches for gene expression data. We focus mainly on the differences between the algorithms in how they use local and global correlation in the data, and we classify them as global, hybrid, local, or knowledge-based techniques. The chapter also presents suitable assessments of the different approaches. The purpose of the review is to direct scientists toward developments in the current techniques, rather than toward applying different or newly developed algorithms with identical functional goals, with the aim of adapting the algorithms to the characteristics of the data.
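
The chapter's algorithms are not reproduced in this preview. As a rough illustration of what a "local" technique means here, the following is a minimal sketch of k-nearest-neighbour imputation on a genes-by-samples matrix: each missing entry is estimated from the few genes whose observed profiles are closest to the incomplete one. The function name `knn_impute`, the parameter `k`, the root-mean-square distance, and the toy matrix are all assumptions made for illustration and are not taken from the chapter.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs in a genes-by-samples matrix from the k most similar genes (rows).

    Hypothetical helper for illustration only, not the chapter's algorithm.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    n_genes = X.shape[0]
    for i in np.where(missing.any(axis=1))[0]:
        # Root-mean-square distance from gene i to every other gene,
        # computed only over samples observed in both genes.
        dists = np.full(n_genes, np.inf)
        for j in range(n_genes):
            if j == i:
                continue
            shared = ~missing[i] & ~missing[j]
            if shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))
        order = np.argsort(dists)
        for col in np.where(missing[i])[0]:
            # Keep the k nearest genes that have an observed value in this sample.
            donors = [j for j in order
                      if np.isfinite(dists[j]) and not missing[j, col]][:k]
            if donors:
                filled[i, col] = X[donors, col].mean()
            else:
                # Fallback: mean of the gene's own observed values
                # (stays NaN if the gene has no observed values at all).
                filled[i, col] = np.nanmean(X[i])
    return filled

# Toy matrix: 4 genes x 4 samples, one missing value in gene 1.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, np.nan, 4.1],
    [0.9, 1.9, 2.9, 3.9],
    [8.0, 7.0, 6.0, 5.0],
])
completed = knn_impute(expr, k=2)
print(completed[1, 2])
```

In the toy matrix, genes 0 and 2 have profiles close to gene 1, so they act as donors and the dissimilar gene 3 is ignored. A "global" technique would instead estimate the entry from structure shared across the whole matrix, for example a low-rank approximation.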

References

  1. Fehrmann RS, Karjalainen JM, Krajewska M, Westra H-J, Maloney D, Simeonov A, Pers TH, Hirschhorn JN, Jansen RC, Schultes EA (2015) Gene expression analysis identifies global gene dosage sensitivity in cancer. Nat Genet 47(2):115

  2. Lima-Tenório MK, Pineda EAG, Ahmad NM, Fessi H, Elaissari A (2015) Magnetic nanoparticles: in vivo cancer diagnosis and therapy. Int J Pharm 493(1-2):313–327

  3. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, Colaprico A, Wendl MC, Kim J, Reardon B (2018) Comprehensive characterization of cancer driver genes and mutations. Cell 173(2):371–385; e318

  4. Criscuolo E, Spadini S, Lamanna J, Ferro M, Burioni R (2017) Bacteriophages and their immunological applications against infectious threats. J Immunol Res 2017:3780697

  5. Salem H, Attiya G, El-Fishawy N (2017) Classification of human cancer diseases by gene expression profiles. Appl Soft Comput 50:124–134

  6. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517

  7. Satija R, Farrell JA, Gennert D, Schier AF, Regev A (2015) Spatial reconstruction of single-cell gene expression data. Nat Biotechnol 33(5):495

  8. Lai H-H, Chuang T-H, Wong L-K, Lee M-J, Hsieh C-L, Wang H-L, Chen S-U (2017) Identification of mosaic and segmental aneuploidies by next-generation sequencing in preimplantation genetic screening can improve clinical outcomes compared to array-comparative genomic hybridization. Mol Cytogenet 10(1):14

  9. Danaee P, Ghaeini R, Hendrix DA (2017) A deep learning approach for cancer detection and relevant gene identification. In: Pacific symposium on biocomputing 2017. World Scientific, pp 219–229

  10. Larose DT, Larose CD (2014) Discovering knowledge in data: an introduction to data mining. Wiley, Hoboken, NJ

  11. Quinn JJ, Chang HY (2016) Unique features of long non-coding RNA biogenesis and function. Nat Rev Genet 17(1):47

  12. Gogoshin G, Boerwinkle E, Rodin AS (2017) New algorithm and software (BNOmics) for inferring and visualizing Bayesian networks from heterogeneous big biological and genetic data. J Comput Biol 24(4):340–356

  13. Zomorrodi AR, Segrè D (2016) Synthetic ecology of microbes: mathematical models and applications. J Mol Biol 428(5):837–861

  14. Hu W, Lin X, Chen K (2015) Integrated analysis of differential gene expression profiles in hippocampi to identify candidate genes involved in Alzheimer’s disease. Mol Med Rep 12(5):6679–6687

  15. Cressie N (2015) Statistics for spatial data. Wiley, Hoboken, NJ

  16. Hira ZM, Gillies DF (2015) A review of feature selection and feature extraction methods applied on microarray data. Adv Bioinformatics 2015:198363

  17. Lang KM, Little TD (2018) Principled missing data treatments. Prev Sci 19(3):284–294

  18. Josse J, Husson F (2016) missMDA: a package for handling missing values in multivariate data analysis. J Stat Softw 70(1):1–31

  19. Tsai C-F, Li M-L, Lin W-C (2018) A class center based approach for missing value imputation. Knowl-Based Syst 151:124–135

  20. Garvey C, Meng C, Nagy JG (2018) Singular value decomposition approximation via Kronecker summations for imaging applications. arXiv preprint arXiv:1803.11525

  21. Chatfield C (2018) Introduction to multivariate analysis. Routledge, New York

  22. Tran CT, Zhang M, Andreae P (2016) A genetic programming-based imputation method for classification with missing data. In: European conference on genetic programming. Springer, pp 149–163

  23. Shah AD, Bartlett JW, Carpenter J, Nicholas O, Hemingway H (2014) Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. Am J Epidemiol 179(6):764–774

  24. Bhattacharya S, Rajan V, Anand A (2017) Clustering high dimensional data using Gaussian mixture copula model with lasso based regularization. Google Patents

  25. Fox J (2015) Applied regression analysis and generalized linear models. Sage Publications, Thousand Oaks, CA

  26. van der Loo M (2017) Simputation: simple imputation. R package version 0.2.2

  27. Armina R, Zain AM, Ali NA, Sallehuddin R (2017) A review on missing value estimation using imputation algorithm. J Phys Conf Ser 892:012004

  28. Rubinstein RY, Kroese DP (2016) Simulation and the Monte Carlo method, vol 10. Wiley, New York

  29. Colantonio A, Di Pietro R, Ocello A, Verde NV (2010) ABBA: adaptive bicluster-based approach to impute missing values in binary matrices. In: Proceedings of the 2010 ACM symposium on applied computing. ACM, pp 1026–1033

  30. Smart Richman L, Blodorn A, Major B (2016) An identity-based motivational model of the effects of perceived discrimination on health-related behaviors. Group Process Intergroup Relat 19(4):415–425

  31. Naik B, Mahapatra S, Nayak J, Behera H (2017) Fuzzy clustering with improved swarm optimization and genetic algorithm: hybrid approach. In: Computational intelligence in data mining. Springer, pp 237–247

  32. Qi S, Schmid F (2017) Hybrid particle-continuum simulations coupling Brownian dynamics and local dynamic density functional theory. Soft Matter 13(43):7938–7947

  33. Shukur OB, Lee MH (2015) Imputation of missing values in daily wind speed data using hybrid AR-ANN method. Mod Appl Sci 9(11):1

  34. Kayri M (2016) Predictive abilities of Bayesian regularization and Levenberg–Marquardt algorithms in artificial neural networks: a comparative empirical study on social data. Math Comput Appl 21(2):20

  35. Gan S, Wang S, Chen Y, Chen X, Huang W, Chen H (2016) Compressive sensing for seismic data reconstruction via fast projection onto convex sets based on seislet transform. J Appl Geophys 130:194–208

  36. van der Loo M, de Jonge E (2018) Statistical data cleaning with applications in R. Wiley, New York

  37. Mi H, Huang X, Muruganujan A, Tang H, Mills C, Kang D, Thomas PD (2016) PANTHER version 11: expanded annotation data from gene ontology and reactome pathways, and data analysis tool enhancements. Nucleic Acids Res 45(D1):D183–D189

  38. Aziz MF, Caetano-Anollés K, Caetano-Anollés G (2016) The early history and emergence of molecular functions and modular scale-free network behavior. Sci Rep 6:25058

Acknowledgements

We would like to thank Universiti Malaysia Pahang for supporting this work under RDU grant numbers RDU1703200 and RDU180344.

Corresponding author

Correspondence to Kohbalan Moorthy.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

Cite this protocol

Moorthy, K., Jaber, A.N., Ismail, M.A., Ernawan, F., Mohamad, M.S., Deris, S. (2019). Missing-Values Imputation Algorithms for Microarray Gene Expression Data. In: Bolón-Canedo, V., Alonso-Betanzos, A. (eds) Microarray Bioinformatics. Methods in Molecular Biology, vol 1986. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9442-7_12

  • DOI: https://doi.org/10.1007/978-1-4939-9442-7_12

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9441-0

  • Online ISBN: 978-1-4939-9442-7

  • eBook Packages: Springer Protocols
