Effective Removal of Noisy Data Via Batch Effect Processing

Ryan G. Benton
Part of the Methods in Molecular Biology book series (MIMB, volume 1617)


To trust the analysis of data, one must first be confident that the data is reliable. For microRNA data, reliability requires understanding how the data was collected, ensuring that the analysis is appropriate, and ensuring that the data itself is accurate. A key element of data accuracy is the removal of noise. Noise can arise from several sources; a common one is the batch effect, defined as systematic variability in the data caused by non-biological factors. This chapter presents various techniques designed to remove variability caused by batch effects and discusses their potential effectiveness.

Key words

MicroRNA, Batch effects, Normalization, Knowledge Discovery in Databases, Noise removal
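The abstract defines a batch effect as systematic, non-biological variability but does not say which removal techniques the chapter covers, so the following is not the author's method. It is a minimal sketch of one of the simplest batch adjustments, per-batch mean-centering, assuming (1) values are already on a log scale, (2) the batch effect is a purely additive offset per feature, and (3) batch is not confounded with the biological groups being compared. If batch and biology are confounded, centering removes the biological signal along with the batch effect. The function name `center_batches` and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_batches(expr, batches):
    """Hypothetical helper: remove additive per-batch shifts, feature by feature.

    expr    -- list of rows; each row is one feature (e.g. one microRNA),
               each column is one sample (assumed log-scale values).
    batches -- batch label for each sample, in column order.

    For every feature, each batch's mean is moved onto that feature's
    overall mean. Only additive offsets are corrected; batch-specific
    differences in variance are left untouched.
    """
    n_samples = len(batches)

    # Group sample (column) indices by their batch label.
    cols_by_batch = defaultdict(list)
    for j, label in enumerate(batches):
        cols_by_batch[label].append(j)

    adjusted = []
    for row in expr:
        if len(row) != n_samples:
            raise ValueError("each row must have one value per sample")
        overall = mean(row)
        new_row = list(row)
        for cols in cols_by_batch.values():
            batch_mean = mean(row[j] for j in cols)
            # Shift this batch so its mean equals the feature's overall mean.
            for j in cols:
                new_row[j] = row[j] - batch_mean + overall
        adjusted.append(new_row)
    return adjusted


if __name__ == "__main__":
    # Two made-up microRNAs in four samples; samples 3 and 4 come from a
    # second batch that reads roughly 2 log2 units higher.
    expr = [
        [5.0, 5.2, 7.1, 6.9],
        [8.0, 7.8, 10.2, 9.8],
    ]
    batches = ["A", "A", "B", "B"]
    for row in center_batches(expr, batches):
        print([round(v, 2) for v in row])
```

After adjustment, the two batches share the same mean for each microRNA, while the within-batch differences between samples are preserved. More elaborate methods are needed when batches also differ in scale or when the study design confounds batch with biology.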



Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

Department of Computer Science, University of South Alabama School of Computing, Mobile, USA
