Cognitive Data Analysis for Big Data

  • Jing Shyr
  • Jane Chu
  • Mike Woods
Part of the Springer Handbooks of Computational Statistics book series (SHCS)


Cognitive data analysis (CDA) automates data analysis and adds cognitive processes to it so that the business user or data analyst can gain insights from advanced analytics. CDA is especially important in the age of big data, where the data is so complex, including both structured and unstructured data, that manually examining all possible combinations is impossible. As a cognitive computing system, CDA does not simply take over the entire process. Instead, CDA interacts with the user and learns from the interactions. This chapter reviews the Cross Industry Standard Process for Data Mining (CRISP-DM; IBM Corporation 2011) as a precursor of CDA. Then, building on the ideas set forth in Shyr and Spisic (2014), this chapter defines a new three-stage CDA process. Each stage (Data Preparation, Automated Modeling, and Application of Results) is discussed in detail. The Data Preparation stage reduces or removes the user’s data preparation burden by including smart technologies such as natural language query and metadata discovery. This stage prepares the data for specific and appropriate analyses in the Automated Modeling stage, which performs descriptive as well as predictive analytics and presents the user with starting points and recommendations for exploration. Finally, the Application of Results stage considers the user’s purpose, which may be to gain insights directly for smarter decisions and better business outcomes or to deploy the predictive models in an operational system.
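As a rough illustration of how the three stages hand off to one another, the following minimal Python sketch may help. It is not the authors' implementation: the function names, the toy records, and the use of a one-predictor least-squares fit as the "automated modeling" step are all assumptions made purely for illustration.

```python
from statistics import mean

def discover_metadata(rows):
    """Stage 1 (Data Preparation, hypothetical): infer a coarse type per field."""
    def is_number(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False
    return {
        field: "continuous" if all(is_number(r[field]) for r in rows) else "categorical"
        for field in rows[0]
    }

def automated_modeling(rows, target, metadata):
    """Stage 2 (Automated Modeling, hypothetical): fit one least-squares line
    per continuous predictor and keep the one with the smallest squared error."""
    y = [float(r[target]) for r in rows]
    y_bar = mean(y)
    best = None
    for field, kind in metadata.items():
        if field == target or kind != "continuous":
            continue
        x = [float(r[field]) for r in rows]
        x_bar = mean(x)
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        if sxx == 0:  # a constant predictor cannot explain the target
            continue
        slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        intercept = y_bar - slope * x_bar
        sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best["sse"]:
            best = {"field": field, "slope": slope, "intercept": intercept, "sse": sse}
    return best

def apply_results(model, new_row):
    """Stage 3 (Application of Results, hypothetical): score a new record."""
    return model["intercept"] + model["slope"] * float(new_row[model["field"]])

# Toy records (made up for this sketch); values arrive as strings, as in raw files.
rows = [
    {"region": "north", "ads": "1", "visits": "10", "sales": "3"},
    {"region": "south", "ads": "2", "visits": "30", "sales": "5"},
    {"region": "north", "ads": "3", "visits": "20", "sales": "7"},
    {"region": "east", "ads": "4", "visits": "40", "sales": "9"},
]
metadata = discover_metadata(rows)
model = automated_modeling(rows, "sales", metadata)
prediction = apply_results(model, {"ads": "5"})
```

In this sketch the metadata step decides which fields are eligible predictors, the modeling step recommends a single starting point, and the application step scores a new record. A real CDA system would also interact with the user and learn from those interactions, which this sketch does not attempt.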


Business intelligence; Cognitive data analysis; Data lineage; Data quality; Entity analytics; Metadata discovery; Natural language query; Social media analytics


  1. Ansotegui C, Malitsky Y, Samulowitz H, Sellmann M, Tierney K (2015) Model-based genetic algorithms for algorithm configuration. In: Proceedings of the twenty-fourth international joint conference on artificial intelligence. Association for the Advancement of Artificial Intelligence Press, Palo Alto, CA, pp 733–739
  2. Barrenechea M (2013) Big data: big hype? Forbes, February 4 2013.
  3. Chu J, Zhong WC (2015) Automatic time interval metadata determination for business intelligence and predictive analytics. US Patent Application 14/884,468. 15 Oct 2015
  4. City of Chicago (2011) Crimes—2001 to present. City of Chicago Data Portal. Accessed 5 Apr 2018
  5. Davenport TH, Harris JG (2007) Competing on analytics: the new science of winning. Harvard Business School Press, Boston, MA
  6. Hurwitz JS, Kaufman M, Bowles A (2015) Cognitive computing and big data analytics. Wiley, Indianapolis, IN
  7. IBM Corporation (2011) IBM SPSS Modeler CRISP-DM guide. IBM Corporation, Armonk, NY
  8. IBM Corporation (2014) The four V’s of big data. IBM Big Data & Analytics Hub.
  9. International Plain Language Federation (2018). Plain Language definition. Accessed 5 Apr 2018
  10. Lohr S (2014) For big-data scientists, ‘Janitor Work’ is key hurdle to insights. New York Times, August 17 2014.
  11. Maney K (2014) ‘Big Data’ will change how you play, see the doctor, even eat. Newsweek, July 24 2014.
  12. Rais-Ghasem M, Grosset R, Petitclerc M, Wei Q (2013) Towards semantic data analysis. IBM Canada, Ltd., Ottawa, Ontario
  13. Shyr J, Spisic D (2014) Automated data analysis for Big Data. WIREs Comp Stats 6:359–366
  14. Shyr J, Spisic D, Chu J, Han S, Zhang XY (2013) Relationship discovery in business analytics. In: JSM Proceedings, Social Statistics Section. American Statistical Association, Alexandria, VA, pp 5146–5158
  15. Sokol L, Jonas J (2012) Using entity analytics to greatly increase the accuracy of your models quickly and easily. IBM Corporation, Armonk, NY
  16. Techopedia (2018). Data lineage. Accessed 5 Apr 2018
  17. Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013) Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, New York, NY, pp 847–855

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. IBM Business Analytics, Chicago, USA
