On the Helmholtz Principle for Data Mining

  • Chapter
Uncertainty Modeling

Part of the book series: Studies in Computational Intelligence ((SCI,volume 683))

Abstract

Keyword and feature extraction is a fundamental problem in text data mining and document processing. The majority of document processing applications depend directly on the quality and speed of keyword extraction algorithms. In this article, an approach to rapid change detection in data streams and documents, introduced in [1], is developed and analysed. It is based on ideas from image processing, and especially on the Helmholtz Principle from the Gestalt theory of human perception. Applied to the problem of keyword extraction, it delivers fast and effective tools for identifying meaningful keywords using parameter-free methods. We also define a level of meaningfulness for keywords, which can be used to adjust the keyword set to the needs of the application.
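
The abstract does not state the formulas involved, so the following is only a minimal sketch of how a Helmholtz-style keyword test could look, under assumptions that are not taken from this page. It assumes a corpus of N documents of roughly equal length, a word that occurs K times in the whole corpus and m times in the document of interest, and a number-of-false-alarms score NFA = C(K, m) / N^(m-1); a word is treated as meaningful when NFA < 1, and its level of meaningfulness is taken as -(1/m) log NFA. The function name `meaningful_keywords`, the helper `log_binom`, and the threshold parameter `epsilon` are hypothetical names introduced for illustration.

```python
import math
from collections import Counter


def log_binom(k, m):
    """Natural log of the binomial coefficient C(k, m), via log-gamma."""
    return math.lgamma(k + 1) - math.lgamma(m + 1) - math.lgamma(k - m + 1)


def meaningful_keywords(documents, target_index, epsilon=1.0):
    """Score the words of one document against the whole corpus.

    documents: list of token lists, assumed to be of similar length.
    Returns {word: meaning} for words whose assumed NFA is below epsilon.
    """
    n_docs = len(documents)

    # K: total number of occurrences of each word across the corpus.
    total = Counter()
    for doc in documents:
        total.update(doc)

    # m: number of occurrences of each word in the target document.
    local = Counter(documents[target_index])

    result = {}
    for word, m in local.items():
        k = total[word]
        # log NFA = log C(K, m) - (m - 1) * log N  (assumed formula).
        log_nfa = log_binom(k, m) - (m - 1) * math.log(n_docs)
        if log_nfa < math.log(epsilon):
            # Assumed level of meaningfulness: -(1/m) * log NFA.
            result[word] = -log_nfa / m
    return result


# Toy corpus: "helmholtz" is concentrated in document 0,
# while "the" is spread evenly over all four documents.
docs = [
    ["the", "the"] + ["helmholtz"] * 5,
    ["the", "the", "a", "b", "c", "d", "e"],
    ["the", "the", "f", "g", "h", "i", "j"],
    ["the", "the", "k", "l", "m", "n", "o"],
]
keywords = meaningful_keywords(docs, target_index=0)
```

In this toy corpus the evenly spread word fails the test while the concentrated word passes it, which is the behaviour the abstract describes as identifying meaningful keywords without tuning parameters. The default `epsilon=1.0` is kept only so the threshold is visible in the sketch.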

References

  1. A. Balinsky, H. Balinsky, and S. Simske, On Helmholtz's principle for documents processing, Proc. 10th ACM Symposium on Document Engineering, Sep. 2010.

  2. A. N. Srivastava and M. Sahami (editors), Text Mining: classification, clustering, and applications, CRC Press, 2009.

  3. D. Lowe, Perceptual Organization and Visual Recognition, Amsterdam: Kluwer Academic Publishers, 1985.

  4. K. Spärck Jones, A statistical interpretation of term specificity and its application in retrieval, Journal of Documentation, vol. 28, no. 1, pp. 11–21, 1972.

  5. S. Robertson, Understanding inverse document frequency: On theoretical arguments for IDF, Journal of Documentation, vol. 60, no. 5, pp. 503–520, 2004.

  6. J. Kleinberg, Bursty and hierarchical structure in streams, Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002.

  7. A. Desolneux, L. Moisan, and J.-M. Morel, From Gestalt Theory to Image Analysis: A Probabilistic Approach, ser. Interdisciplinary Applied Mathematics, vol. 34, Springer, 2008.

  8. Union Addresses. [Online]. Available at http://stateoftheunion.onetwothree.net/

  9. 20 Newsgroups Data Set. [Online], 1999. Available at http://kdd.ics.uci.edu/databases/20newsgroups

Author information

Corresponding author

Correspondence to Alexander Balinsky.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Balinsky, A., Balinsky, H., Simske, S. (2017). On the Helmholtz Principle for Data Mining. In: Kreinovich, V. (eds) Uncertainty Modeling. Studies in Computational Intelligence, vol 683. Springer, Cham. https://doi.org/10.1007/978-3-319-51052-1_2

  • DOI: https://doi.org/10.1007/978-3-319-51052-1_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51051-4

  • Online ISBN: 978-3-319-51052-1

  • eBook Packages: Engineering, Engineering (R0)