Correlating Words - Approaches and Applications

  • Mario M. Kubek
  • Herwig Unger
  • Jan Dusik
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9256)


The determination of characteristic and discriminating terms, as well as of their semantic relationships, plays a vital role in text processing applications. Term clustering techniques, for example, rely heavily on this information. Classic approaches for this purpose, such as statistical co-occurrence analysis, usually consider only relationships between two terms that co-occur as immediate neighbours or at sentence level. This article presents flexible approaches for finding statistically significant correlations between two or more terms using co-occurrence windows of arbitrary size. Their applicability is discussed in detail by presenting solutions that improve interactive and image-based search on the World Wide Web. Approaches for determining directed term associations, and applications for them, are explained as well.


Word correlations, Co-occurrence analysis, N-term co-occurrences, Term associations, Co-occurrence graphs
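
To illustrate what co-occurrence windows of arbitrary size can mean in practice, the following is a minimal Python sketch, not the authors' implementation. It assumes a pre-tokenised text, slides a fixed-size window over it, and counts how often groups of n terms appear together in a window. The function names `window_cooccurrences` and `dice`, and the use of a Dice-style coefficient generalised to n terms as the correlation score, are assumptions made for this example; the paper may rely on a different significance measure.

```python
from collections import Counter
from itertools import combinations


def window_cooccurrences(tokens, window_size, n=2):
    """Count, over all sliding windows, how many windows contain each term
    and how many contain each group of n distinct terms.

    Hypothetical helper for illustration only.
    """
    term_windows = Counter()   # term -> number of windows containing it
    group_windows = Counter()  # sorted n-tuple of terms -> number of windows
    num_windows = max(len(tokens) - window_size + 1, 1)
    for start in range(num_windows):
        # Use the set of distinct terms so that a term repeated inside
        # one window is counted only once for that window.
        window = sorted(set(tokens[start:start + window_size]))
        term_windows.update(window)
        group_windows.update(combinations(window, n))
    return term_windows, group_windows, num_windows


def dice(term_windows, group_windows, group):
    """Dice-style score for a group of terms (assumed measure):
    n * (windows containing all terms) / (sum of per-term window counts).
    For two terms this is the ordinary Dice coefficient.
    """
    key = tuple(sorted(group))
    denominator = sum(term_windows[t] for t in key)
    if denominator == 0:
        return 0.0
    return len(key) * group_windows[key] / denominator


if __name__ == "__main__":
    text = "a b c a b d".split()
    terms, pairs, total = window_cooccurrences(text, window_size=3, n=2)
    for pair in sorted(pairs):
        print(pair, round(dice(terms, pairs, pair), 3))
```

Changing `window_size` moves the analysis between neighbour-level and sentence-level or wider contexts, and setting `n=3` or higher yields counts for n-term co-occurrences rather than pairs.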





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Chair of Communication Networks, FernUniversität in Hagen, Hagen, Germany
