Natural Language Processing Basics

Sarkar, Dipanjan

doi:10.1007/978-1-4842-4354-1_1

Abstract

We have entered the age of Big Data, in which organizations and businesses struggle to manage all the data generated by their systems, processes, and transactions. However, the term Big Data is often misused because of the vague definition of the 3Vs of data: volume, variety, and velocity. It is sometimes difficult to quantify what data is “big.” Some might think a billion records in a database is “Big Data,” but that number seems small compared to the petabytes of data being generated by sensors or by social media. One common characteristic, however, is the large volume of unstructured textual data present across all organizations, irrespective of their domain. For example, social media holds vast amounts of data in the form of tweets, status messages, hashtags, articles, blogs, wikis, and much more. Even retail and ecommerce stores generate a lot of textual data, from new product information and metadata to customer reviews and feedback.

Author information

Authors and Affiliations

Bangalore, Karnataka, India
Dipanjan Sarkar

About this chapter

Cite this chapter

Sarkar, D. (2019). Natural Language Processing Basics. In: Text Analytics with Python. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4354-1_1

DOI: https://doi.org/10.1007/978-1-4842-4354-1_1
Published: 22 May 2019
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-4353-4
Online ISBN: 978-1-4842-4354-1
eBook Packages: Professional and Applied Computing; Apress Access Books; Professional and Applied Computing (R0)
