Text Analytics: The Dark Data Frontier

Masood, Adnan; Hashmi, Adnan

doi:10.1007/978-1-4842-4106-6_4



Abstract

Text is everywhere. Analysts at Gartner estimate that upward of 80 percent of enterprise data today is unstructured. Our everyday interactions generate torrents of such data: tweets, blog posts, advertisements, news, articles, research papers, descriptions, emails, YouTube comments, Yelp reviews, surveys from your insurance company, and call transcripts. The amount of unstructured data is tremendous, and the majority of it is text. Another general way to describe this large body of mostly monetizable data (except YouTube comments, which are toxic!) is to classify it as dark data. The origin of the term is not well known, but it was popularized by Stanford’s Dr. Chris Ré, who founded the DeepDive program for extracting valuable information from dark data. The term refers to the mountains of raw information that are collected in various ways but remain difficult to analyze.


Notes

1. A comprehensive list of Azure solution architectures to help you design and implement secure, highly available, performant, and resilient solutions on Azure can be found here: https://azure.microsoft.com/en-us/solutions/architecture/

Author information

Authors and Affiliations

Adnan Masood: Stanford, CA, USA
Adnan Hashmi: Nashville, TN, USA



About this chapter

Cite this chapter

Masood, A., Hashmi, A. (2019). Text Analytics: The Dark Data Frontier. In: Cognitive Computing Recipes. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4106-6_4


DOI: https://doi.org/10.1007/978-1-4842-4106-6_4
Published: 28 March 2019
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-4105-9
Online ISBN: 978-1-4842-4106-6
eBook Packages: Professional and Applied Computing; Apress Access Books; Professional and Applied Computing (R0)
