Abstract
Summarizing the contents of a book is a matter of personal preference. One way to gain more objectivity is to use formal criteria to identify the most important findings. In Chap. 8, we introduced text mining, in particular analysis methods based on the term document matrix for detecting structure in text data. It therefore seemed natural to use the text mining approach as the starting point for a summary. Specifically, we used the following procedure to obtain an overview of the contents of the individual chapters:
- Definition of a corpus containing the eight chapters of the book.
- Cleaning the documents in the standard way by removal of stop words, punctuation, and numbers.
- Definition of two document term matrices: one with words and the other with words and bigrams. For these two matrices, some stemming was done, mainly to clean up plurals. Furthermore, some additional stop words were removed, mainly words in the context of the examples in Chap. 8.
- For both matrices, term frequency-inverse document frequencies (TF-IDF) were calculated.
- Definition of comparison clouds, each based on 60 terms.
- Calculation of topic maps of order 2–8 for the term document matrices with stemmed terms.
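The first four steps of this procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the three toy documents, the short stop-word list, the crude plural stripper `naive_stem`, and the weighting `count * log(N / df)` are all hypothetical stand-ins chosen to keep the example self-contained, and the abstract does not say which tools or which TF-IDF variant were actually used.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "to", "for", "with"}


def naive_stem(token):
    """Crude plural stripping; a stand-in for a real stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def clean(text):
    """Lowercase, drop punctuation and numbers, remove stop words, strip plurals."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def terms(tokens, with_bigrams=False):
    """Return the unigrams, optionally followed by adjacent-token bigrams."""
    result = list(tokens)
    if with_bigrams:
        result += [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return result


def document_term_matrix(corpus, with_bigrams=False):
    """Map each document name to a Counter of term frequencies."""
    return {
        name: Counter(terms(clean(text), with_bigrams))
        for name, text in corpus.items()
    }


def tf_idf(dtm):
    """Weight each count by log(N / df); df = number of documents containing the term."""
    n_docs = len(dtm)
    doc_freq = Counter(term for counts in dtm.values() for term in counts)
    return {
        name: {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        for name, counts in dtm.items()
    }


# Made-up toy corpus standing in for the chapters (not text from the book).
corpus = {
    "chapter_a": "Data models and data warehouses for reporting.",
    "chapter_b": "Process mining discovers process models from event logs.",
    "chapter_c": "Text mining uses the term document matrix.",
}

dtm_words = document_term_matrix(corpus)                       # words only
dtm_bigrams = document_term_matrix(corpus, with_bigrams=True)  # words and bigrams
weights = tf_idf(dtm_words)

# Print the three highest-weighted terms per document.
for name, w in weights.items():
    print(name, sorted(w, key=w.get, reverse=True)[:3])
```

Terms that occur in every document receive a weight of zero under this weighting, which is why TF-IDF tends to surface the vocabulary that distinguishes one chapter from the others.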
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Grossmann, W., Rinderle-Ma, S. (2015). Summary. In: Fundamentals of Business Intelligence. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46531-8_9
DOI: https://doi.org/10.1007/978-3-662-46531-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46530-1
Online ISBN: 978-3-662-46531-8
eBook Packages: Computer Science, Computer Science (R0)