On N-term Co-occurrences

Kubek, Mario; Unger, Herwig

doi:10.1007/978-3-319-06538-0_7


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 265)


Abstract

Since 80% of all information on the World Wide Web (WWW) is in textual form, most user search activity is based on groups of search words that form queries representing the users' information needs. The quality of the returned results, usually evaluated using measures such as precision and recall, depends largely on the quality of the chosen query terms. Therefore, the relatedness of these terms must be evaluated accordingly and matched against the documents to be found. To do so properly, this paper introduces the notion of n-term co-occurrences and distinguishes it from the related concepts of n-grams and higher-order co-occurrences. Finally, the applicability of n-term co-occurrences to search, clustering and data mining processes is considered.
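The distinction between n-grams and n-term co-occurrences can be illustrated with a minimal sketch. It rests on a working assumption that is not taken from the chapter itself: an n-term co-occurrence is read here as any unordered set of n distinct terms appearing in the same sentence, whereas an n-gram is a contiguous, ordered run of n tokens. The function names and the toy sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Return the contiguous, ordered runs of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def n_term_cooccurrences(sentences, n):
    """Count unordered sets of n distinct terms sharing a sentence.

    Assumption: the co-occurrence window is one sentence, and term
    order and adjacency are ignored.
    """
    counts = Counter()
    for tokens in sentences:
        # Sorting gives each term set one canonical tuple as its key.
        for combo in combinations(sorted(set(tokens)), n):
            counts[combo] += 1
    return counts


# Toy corpus: two tokenised sentences (hypothetical data).
sentences = [
    "web search query terms".split(),
    "query terms guide web search".split(),
]

trigrams = Counter(g for s in sentences for g in ngrams(s, 3))
cooc = n_term_cooccurrences(sentences, 3)

# The three terms co-occur in both sentences, but form a trigram
# only in the first one, and only in one particular order.
print(cooc[("query", "search", "web")])
print(trigrams[("web", "search", "query")])
```

Under this reading, the co-occurrence count captures that the three terms appear together in both sentences, while the n-gram count depends on their exact order and adjacency.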



Author information

Authors and Affiliations

Faculty of Mathematics and Computer Science, FernUniversität in Hagen, Hagen, Germany
Mario Kubek & Herwig Unger


Corresponding author

Correspondence to Mario Kubek.

Editor information

Editors and Affiliations

Faculty of Information Technology, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
Sirapat Boonkrong
Lehrgebiet Kommunikationsnetze, University of Hagen, Hagen, Germany
Herwig Unger
Faculty of Information Technology, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
Phayung Meesad


About this paper

Cite this paper

Kubek, M., Unger, H. (2014). On N-term Co-occurrences. In: Boonkrong, S., Unger, H., Meesad, P. (eds) Recent Advances in Information and Communication Technology. Advances in Intelligent Systems and Computing, vol 265. Springer, Cham. https://doi.org/10.1007/978-3-319-06538-0_7


DOI: https://doi.org/10.1007/978-3-319-06538-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06537-3
Online ISBN: 978-3-319-06538-0
eBook Packages: Engineering, Engineering (R0)
