Definition
Text representation is a fundamental problem in text mining and Information Retrieval (IR). It aims to represent unstructured text documents numerically so that they become mathematically computable. Given a set of text documents D = {d_i, i = 1, 2, …, n}, where each d_i is a document, the problem of text representation is to map each d_i in D to a point s_i in a numerical space S in which the distance or similarity between each pair of points is well defined.
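As an illustration of this definition, the following minimal sketch maps each document to a term-count vector and uses cosine similarity as the similarity measure on S. The bag-of-words representation, the whitespace tokenization, the sample documents, and the function names (`build_vocabulary`, `represent`, `cosine_similarity`) are all assumptions chosen for the example; the definition itself does not prescribe any particular space or measure.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Assign each distinct token one dimension of the space S (illustrative choice)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def represent(document, vocabulary):
    """Return the point s_i for a document d_i as a term-count vector."""
    counts = Counter(document.lower().split())
    vector = [0.0] * len(vocabulary)
    for tok, count in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = float(count)
    return vector


def cosine_similarity(u, v):
    """Similarity between two points of S; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical document set D with n = 3.
documents = [
    "text mining needs text representation",
    "text representation for information retrieval",
    "genome expression data",
]
vocabulary = build_vocabulary(documents)
points = [represent(doc, vocabulary) for doc in documents]
```

Here `points[i]` plays the role of s_i, and `cosine_similarity(points[i], points[j])` is defined for every pair, which is the property the definition requires. Documents that share no terms, such as the first and third above, receive a similarity of zero.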
Historical Background
Mining unstructured text data has attracted considerable attention from researchers in different areas because of its great industrial and commercial potential. A fundamental problem of text mining is how to represent text documents so that they become mathematically computable. Various text representation strategies have been proposed over the past decades for different applications, such as text categorization, novelty detection, and Information Retrieval (IR) [5]...
Recommended Reading
Alter O, Brown PO, Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci USA. 2000;97(18):10101–6.
Jurafsky D, Martin JH. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech processing. Englewood Cliffs: Prentice-Hall; 2000.
Deerwester S, Dumais ST, Landauer TK, Furnas GW, Harshman RA. Indexing by latent semantic analysis. J Am Soc Inf Sci. 1990;41(6):391–407.
Salton G. Theory of indexing. Philadelphia: Society for Industrial Mathematics; 1987.
Salton G, McGill MJ. Introduction to modern information retrieval. New York: McGraw-Hill; 1983.
Hofmann T. Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1999. p. 50–7.
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Yan, J. (2018). Text Representation. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_420
DOI: https://doi.org/10.1007/978-1-4614-8265-9_420
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering