TNorm: An Unsupervised Batch Effects Correction Method for Gene Expression Data Classification

  • Conference paper
Neural Information Processing (ICONIP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9489)

Abstract

In biomedical research, gene expression analysis helps identify disease-related genes that can serve as genetic markers for diagnosis. Because a large number of gene expression datasets are publicly available, an ongoing challenge is to use them effectively. Merging microarray datasets from different batches to improve the statistical power of a study is an active research topic. However, several works have highlighted the problem of batch effects, that is, variation in gene expression levels induced by different experimental environments. Ignoring this variation may lead to erroneous findings. This work proposes a method for batch effects correction that maps the underlying topology of different batches. The mapping process for cross-batch normalization is examined using a basic linear transformation. A comparative study on three cancers is conducted to compare the proposed method with a proven batch effects correction method. The results show that the proposed method outperforms the existing method in most cases.
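To make the idea of cross-batch normalization by a basic linear transformation concrete, here is a minimal sketch. It is not the TNorm method, since the abstract does not give the details of the topology mapping. Instead it assumes a simple per-gene location and scale alignment, in which each gene in one batch is shifted and rescaled to match the mean and spread of the same gene in a reference batch. No class labels are used, so the sketch is unsupervised. The function name `align_batch` and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def align_batch(reference, batch):
    """Linearly map each gene (column) of `batch` onto the mean and
    spread of the same gene in `reference`. Rows are samples.

    Illustrative assumption only: a per-gene transform x' = a * x + b,
    not the topology mapping proposed in the paper.
    """
    n_genes = len(reference[0])
    aligned = [list(row) for row in batch]
    for g in range(n_genes):
        ref_col = [row[g] for row in reference]
        bat_col = [row[g] for row in batch]
        ref_mu, ref_sd = mean(ref_col), pstdev(ref_col)
        bat_mu, bat_sd = mean(bat_col), pstdev(bat_col)
        # Use a scale of 1 when the batch gene is constant, to avoid dividing by zero.
        scale = ref_sd / bat_sd if bat_sd > 0 else 1.0
        for i, row in enumerate(batch):
            aligned[i][g] = (row[g] - bat_mu) * scale + ref_mu
    return aligned


# Toy data: two genes, three samples per batch (hypothetical values).
reference = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
shifted = [[11.0, 5.0], [13.0, 6.0], [15.0, 7.0]]
corrected = align_batch(reference, shifted)
```

After the transformation, each gene in `corrected` has the same mean and standard deviation as in `reference`. A transform of this kind removes only per-gene offset and scale differences between batches; it does not use any structure shared across genes.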

Author information

Corresponding author

Correspondence to Praisan Padungweang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Padungweang, P., Engchuan, W., Chan, J.H. (2015). TNorm: An Unsupervised Batch Effects Correction Method for Gene Expression Data Classification. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9489. Springer, Cham. https://doi.org/10.1007/978-3-319-26532-2_45

  • DOI: https://doi.org/10.1007/978-3-319-26532-2_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26531-5

  • Online ISBN: 978-3-319-26532-2

  • eBook Packages: Computer Science, Computer Science (R0)
