Abstract
Cognitive data analysis (CDA) automates and adds cognitive processes to data analysis so that the business user or data analyst can gain insights from advanced analytics. CDA is especially important in the age of big data, where the data, both structured and unstructured, is so complex that it is impossible to manually examine all possible combinations. As a cognitive computing system, CDA does not simply take over the entire process. Instead, CDA interacts with the user and learns from the interactions. This chapter reviews the Cross Industry Standard Process for Data Mining (CRISP-DM), as described in IBM Corporation (2011), as a precursor of CDA. Then, building on the ideas set forth in Shyr and Spisic (2014), it defines a new three-stage CDA process. Each stage (Data Preparation, Automated Modeling, and Application of Results) is discussed in detail. The Data Preparation stage reduces or removes the user's data preparation burden by including smart technologies such as natural language query and metadata discovery. This stage prepares the data for specific and appropriate analyses in the Automated Modeling stage, which performs descriptive as well as predictive analytics and presents the user with starting points and recommendations for exploration. Finally, the Application of Results stage considers the user’s purpose, which may be to directly gain insights for smarter decisions and better business outcomes or to deploy the predictive models in an operational system.
References
Ansotegui C, Malitsky Y, Samulowitz H, Sellmann M, Tierney K (2015) Model-based genetic algorithms for algorithm configuration. In: Proceedings of the twenty-fourth international joint conference on artificial intelligence. Association for the Advancement of Artificial Intelligence Press, Palo Alto, CA, pp 733–739
Barrenechea M (2013) Big data: big hype? Forbes, February 4 2013. http://www.forbes.com/sites/ciocentral/2013/02/04/big-data-big-hype/
Chu J, Zhong WC (2015) Automatic time interval metadata determination for business intelligence and predictive analytics. US Patent Application 14/884,468. 15 Oct 2015
City of Chicago (2011) Crimes—2001 to present. City of Chicago Data Portal. https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2. Accessed 5 Apr 2018
Davenport TH, Harris JG (2007) Competing on analytics: the new science of winning. Harvard Business School Press, Boston, MA
Hurwitz JS, Kaufman M, Bowles A (2015) Cognitive computing and big data analytics. Wiley, Indianapolis, IN
IBM Corporation (2011) IBM SPSS Modeler CRISP-DM guide. IBM Corporation, Armonk, NY
IBM Corporation (2014) The four V’s of big data. IBM Big Data & Analytics Hub. http://www.ibmbigdatahub.com/infographic/four-vs-big-data
International Plain Language Federation (2018) Plain Language definition. http://www.iplfederation.org/. Accessed 5 Apr 2018
Lohr S (2014) For big-data scientists, ‘Janitor Work’ is key hurdle to insights. New York Times, August 17 2014. http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html?_r=0
Maney K (2014) ‘Big Data’ will change how you play, see the doctor, even eat. Newsweek, July 24 2014. http://www.newsweek.com/2014/08/01/big-data-big-data-companies-260864.html
Rais-Ghasem M, Grosset R, Petitclerc M, Wei Q (2013) Towards semantic data analysis. IBM Canada, Ltd., Ottawa, Ontario
Shyr J, Spisic D (2014) Automated data analysis for Big Data. WIREs Comp Stats 6:359–366
Shyr J, Spisic D, Chu J, Han S, Zhang XY (2013) Relationship discovery in business analytics. In: JSM Proceedings, Social Statistics Section. American Statistical Association, Alexandria, VA, pp 5146–5158
Sokol L, Jonas J (2012) Using entity analytics to greatly increase the accuracy of your models quickly and easily. IBM Corporation, Armonk, NY
Techopedia (2018) Data lineage. https://www.techopedia.com/definition/28040/data-lineage. Accessed 5 Apr 2018
Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013) Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, New York, NY, pp 847–855
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Shyr, J., Chu, J., Woods, M. (2018). Cognitive Data Analysis for Big Data. In: Härdle, W., Lu, HS., Shen, X. (eds) Handbook of Big Data Analytics. Springer Handbooks of Computational Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18284-1_2
DOI: https://doi.org/10.1007/978-3-319-18284-1_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18283-4
Online ISBN: 978-3-319-18284-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)