Major Challenges and Limitations of Big Data Analytics

  • Bo Cao
  • Jim Reilly


Big data analytics opens a promising path to personalized psychiatry, but the opportunities come with unprecedented challenges. In this chapter, we discuss some of the challenges facing big data analytics in psychiatry. For example, the field still lacks standardization of diagnoses, variables, and protocols, and the application of machine learning techniques has its own limitations. However, big data analytics in psychiatry is developing rapidly, and we expect these challenges to be overcome in the near future through the joint efforts of researchers in related fields.


Keywords: Data standardization; Feature selection; Imbalanced data; Overfitting problem; Missing data



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Psychiatry, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, Canada
  2. Department of Electrical and Computer Engineering, McMaster University, Hamilton, Canada
