Abstract
Keyword and feature extraction is a fundamental problem in text data mining and document processing. Most document processing applications depend directly on the quality and speed of keyword extraction algorithms. In this article, an approach to rapid change detection in data streams and documents, introduced in [1], is developed and analysed. It is based on ideas from image processing, and especially on the Helmholtz Principle from the Gestalt Theory of human perception. Applied to the problem of keyword extraction, it delivers fast and effective tools for identifying meaningful keywords using parameter-free methods. We also define a level of meaningfulness of the keywords, which can be used to modify the set of keywords depending on application needs.
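The abstract does not state the formulas, so the following is only a minimal sketch of how a Helmholtz-style keyword test might look, assuming the "number of false alarms" (NFA) formulation used in a contrario analysis: a word occurring m times in one document and K times across N documents of roughly equal length is flagged when C(K, m) / N^(m-1) falls below a threshold. The function names `nfa` and `meaningful_keywords`, the `epsilon` parameter (standing in for the level of meaningfulness), and the score -log(NFA)/m are illustrative assumptions, not taken from the chapter.

```python
import math
from collections import Counter


def nfa(m, K, N):
    """Assumed NFA: expected number of times a word with K total occurrences
    would land m times in the same one of N equal-sized documents by chance."""
    return math.comb(K, m) / N ** (m - 1)


def meaningful_keywords(docs, epsilon=1.0):
    """docs is a list of token lists. For each document, return a dict
    mapping every word with NFA < epsilon to an illustrative score."""
    N = len(docs)
    # K for each word: total occurrences over the whole collection.
    total = Counter(word for doc in docs for word in doc)
    results = []
    for doc in docs:
        scored = {}
        # m for each word: occurrences inside this document only.
        for word, m in Counter(doc).items():
            value = nfa(m, total[word], N)
            if value < epsilon:
                # Larger score means a less expected concentration.
                scored[word] = -math.log(value) / m
        results.append(scored)
    return results


docs = [
    ["helmholtz", "helmholtz", "helmholtz", "the"],
    ["the", "a"],
    ["the", "b"],
    ["the", "c"],
]
for index, keywords in enumerate(meaningful_keywords(docs)):
    print(index, keywords)
```

With the default `epsilon=1.0` no tuning is required, which is one way to read "parameter-free"; lowering `epsilon` keeps only the more strongly concentrated words. In this toy input only "helmholtz" in the first document passes the test, while "the", spread evenly over all documents, does not.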
References
A. Balinsky, H. Balinsky, and S. Simske, On Helmholtz's principle for documents processing, Proc. 10th ACM Symposium on Document Engineering, Sep. 2010.
A. N. Srivastava and M. Sahami (editors), Text Mining: classification, clustering, and applications, CRC Press, 2009.
D. Lowe, Perceptual Organization and Visual Recognition, Amsterdam: Kluwer Academic Publishers, 1985.
K. Spärck Jones, A statistical interpretation of term specificity and its application in retrieval, Journal of Documentation, vol. 28, no. 1, pp. 11–21, 1972.
S. Robertson, Understanding inverse document frequency: On theoretical arguments for IDF, Journal of Documentation, vol. 60, no. 5, pp. 503–520, 2004.
J. Kleinberg, Bursty and hierarchical structure in streams, Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002.
A. Desolneux, L. Moisan, and J.-M. Morel, From Gestalt Theory to Image Analysis: A Probabilistic Approach, ser. Interdisciplinary Applied Mathematics, vol. 34, Springer, 2008.
Union Addresses. [Online]. Available at http://stateoftheunion.onetwothree.net/
20 Newsgroups Data Set. [Online], 1999. Available at http://kdd.ics.uci.edu/databases/20newsgroups
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Balinsky, A., Balinsky, H., Simske, S. (2017). On the Helmholtz Principle for Data Mining. In: Kreinovich, V. (eds) Uncertainty Modeling. Studies in Computational Intelligence, vol 683. Springer, Cham. https://doi.org/10.1007/978-3-319-51052-1_2
DOI: https://doi.org/10.1007/978-3-319-51052-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51051-4
Online ISBN: 978-3-319-51052-1
eBook Packages: Engineering, Engineering (R0)