Abstract
We have entered the age of Big Data, in which organizations and businesses struggle to manage all the data generated by their various systems, processes, and transactions. However, the term Big Data is often misused because of the vague definition of the 3Vs of data: volume, variety, and velocity. It is sometimes difficult to quantify what data is “big”. Some might think a billion records in a database is “Big Data,” but that number seems small compared to the petabytes of data being generated by various sensors or by social media. One common characteristic is the large volume of unstructured textual data that is present across all organizations, irrespective of their domain. As an example, we have vast amounts of data in the form of tweets, status messages, hashtags, articles, blogs, wikis, and much more on social media. Even retail and ecommerce stores generate a lot of textual data, from new product information and metadata to customer reviews and feedback.
Copyright information
© 2019 Dipanjan Sarkar
About this chapter
Cite this chapter
Sarkar, D. (2019). Natural Language Processing Basics. In: Text Analytics with Python. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4354-1_1
DOI: https://doi.org/10.1007/978-1-4842-4354-1_1
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-4353-4
Online ISBN: 978-1-4842-4354-1
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)