Text Representation

Definition

Text representation is one of the fundamental problems in text mining and Information Retrieval (IR). It aims to represent unstructured text documents numerically so that they become mathematically computable. For a given set of text documents $D = \{d_i \mid i = 1, 2, \dots, n\}$, where each $d_i$ stands for a document, the problem of text representation is to represent each $d_i$ of $D$ as a point $s_i$ in a numerical space $S$, where the distance/similarity between each pair of points in $S$ is well defined.
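As an illustration of this definition, the sketch below maps each document to a point in a numerical space and computes a similarity between two such points. The entry does not fix a particular space S or similarity measure; the term-frequency (bag-of-words) vectors, the cosine similarity, the whitespace tokenizer, and the names `tokenize`, `represent`, and `cosine_similarity` are all assumptions chosen for the example, as are the three sample documents.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def represent(documents):
    """Map each document to a term-frequency vector over a shared vocabulary.

    Assumption: the numerical space S is the space of term counts; the
    definition itself leaves the choice of S open.
    """
    counts = [Counter(tokenize(d)) for d in documents]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents, used only to exercise the functions.
documents = [
    "text mining represents text numerically",
    "information retrieval ranks text documents",
    "genome expression data",
]
vocabulary, vectors = represent(documents)
similarity_0_1 = cosine_similarity(vectors[0], vectors[1])
similarity_0_2 = cosine_similarity(vectors[0], vectors[2])
```

Here the first two documents share the term "text", so their similarity is positive, while the first and third share no terms, so their similarity is zero. Other choices of S (for example weighted or reduced-dimension spaces) would change the vectors but not the overall shape of the mapping.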

Historical Background

Mining unstructured text data has attracted considerable attention from researchers in different areas because of its great industrial and commercial application potential. A fundamental problem of text mining is how to represent text documents so that they become mathematically computable. Various text representation strategies have been proposed in the past decades for different application purposes such as text categorization, novelty detection, and Information Retrieval (IR) [5]...

Recommended Reading

  1. Alter O, Brown PO, Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci USA. 2000;97(18):10101–6.

  2. Jurafsky D, Martin JH. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech processing. Englewood Cliffs: Prentice-Hall; 2000.

  3. Deerwester S, Dumais ST, Landauer TK, Furnas GW, Harshman RA. Indexing by latent semantic analysis. J Am Soc Inf Sci. 1990;41(6):391–407.

  4. Salton G. A theory of indexing. Philadelphia: Society for Industrial and Applied Mathematics; 1987.

  5. Salton G, McGill MJ. Introduction to modern information retrieval. New York: McGraw-Hill; 1983.

  6. Hofmann T. Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1999. p. 50–7.

Author information

Correspondence to Jun Yan.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Yan, J. (2018). Text Representation. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_420