Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Text Mining

  • Yanli Cai
  • Jian-Tao Sun
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_418


Synonyms

Knowledge discovery in text (KDT)

Definition

Text mining is data mining applied to collections of text. Its goal is to discover knowledge (information or patterns) from text data, which are unstructured or semi-structured. It is a subfield of Data Mining (DM), which is also known as Knowledge Discovery in Databases (KDD). KDD aims to discover knowledge from a variety of data sources, including text data, relational databases, Web data, and user log data. Text mining is also related to other research fields, including Machine Learning (ML), Information Retrieval (IR), Natural Language Processing (NLP), Information Extraction (IE), Statistics, Pattern Recognition (PR), and Artificial Intelligence (AI).

Historical Background

The phrase Knowledge Discovery in Databases (KDD) was first used at the first KDD workshop in 1989. Marti Hearst [4] first used the term text data mining (TDM) and differentiated it from other concepts such as information retrieval and natural language...

Recommended Reading

  1. Andreas H, Andreas N, Gerhard P. A brief survey of text mining. J Computat Linguistics Lang Technol. 2005;20(1):19–62.
  2. Bing L. Web data mining: exploring hyperlinks contents and usage data. Berlin: Springer; 2007. p. 411–47.
  3. Dipanjan D, Martins AFT. A survey on automatic text summarization. Literature survey for the language and statistics II course at Carnegie Mellon University; November 2007.
  4. Hearst M. Untangling text data mining. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics; 1999.
  5. Informative and indicative summarization. Available at: http://www1.cs.columbia.edu/~min/papers/sigirDuc01/node2.html.
  6. Liebman M. Bioinformatics: an editorial perspective. Available at: http://www.netsci.org/Science/Bioinform/feature01.html.
  7. Usama F, Gregory P-S, Padhraic S. From data mining to knowledge discovery in databases. AI Mag. 1996;17(3):37–54.
  8. Wayne CL. Multilingual topic detection and tracking: successful research enabled by corpora and evaluation. In: Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics; 2000.
  9. Witten IH. Text mining. In: Singh MP, editor. Practical handbook of internet computing. Boca Raton: Chapman and Hall/CRC Press; 2005. p. 14-1–14-22.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Shanghai Jiao Tong University, Shanghai, China
  2. Microsoft Research Asia, Beijing, China

Section editors and affiliations

  • Zheng Chen (1)
  1. Microsoft Research Asia, Microsoft Corporation, Beijing, China