Search Word Extraction Using Extended PageRank Calculations

Kubek, Mario; Unger, Herwig

doi:10.1007/978-3-642-24806-1_25

Part of the book series: Studies in Computational Intelligence (SCI, volume 391)

Abstract

This paper describes a new method to determine characteristic terms from texts by weighting them using extended PageRank calculations. Additionally, this method clusters the semantic term relations it finds to assign each term a level of specificity, making it possible to distinguish between general and specific terms. In this way, it is also possible to differentiate between terms of different semantic orientations at the same specificity level. The experiments show which terms can be used for the automatic retrieval of semantically similar documents from large corpora such as the World Wide Web through automatic query formulation. Selecting query terms of different specificity levels is also a useful instrument in interactive document retrieval for expressing the intended similarity of the documents to be found. An added advantage of this method is that it does not rely on third-party datasets and works on single texts.
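
The abstract does not state the extended PageRank formula, so the following is only a minimal sketch of the general idea under stated assumptions: terms are graph nodes, sentence-level co-occurrence counts serve as edge weights (an assumption, not taken from the chapter), and a standard weighted PageRank stands in for the authors' extended calculation. The function names `cooccurrence_graph` and `weighted_pagerank` and the sample sentences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(sentences):
    """Build an undirected weighted graph from tokenised sentences.

    Assumption: the edge weight is the number of sentences in which
    two distinct terms appear together.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        terms = sorted(set(sentence))
        for a, b in combinations(terms, 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Plain weighted PageRank by power iteration.

    Each node passes its score to its neighbours in proportion to the
    edge weight. This is a stand-in, not the extended variant of the
    chapter.
    """
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            incoming = sum(
                rank[nb] * w / out_weight[nb]
                for nb, w in graph[node].items()
            )
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Hypothetical toy input: each sentence is a list of already extracted terms.
sentences = [
    ["pagerank", "term", "graph"],
    ["term", "graph"],
    ["term", "query"],
]
ranks = weighted_pagerank(cooccurrence_graph(sentences))
# Terms sorted by descending score are candidate query terms.
candidates = sorted(ranks, key=ranks.get, reverse=True)
```

In this sketch the highest-ranked terms would be the candidates for automatic query formulation. Grouping terms into specificity levels, as the abstract describes, would need an additional clustering step that is not shown here.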

Author information

Authors and Affiliations

Faculty of Mathematics and Computer Science, FernUniversity in Hagen, Hagen, Germany
Mario Kubek & Herwig Unger

Corresponding author

Correspondence to Mario Kubek.

Editor information

Editors and Affiliations

Lehrgebiet Informationstechnik, FernUniversität Hagen, Universitätsstraße 27, Hagen, 58084, Germany
Herwig Unger
Institut für Intelligente Systemtechnolo, Alpen Adria Universität Klagenfurt, Universitätsstraße 65-67, Klagenfurt, 9020, Austria
Kyandoghere Kyamaky
Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, Warsaw, 01-447, Poland
Janusz Kacprzyk

About this chapter

Cite this chapter

Kubek, M., Unger, H. (2012). Search Word Extraction Using Extended PageRank Calculations. In: Unger, H., Kyamaky, K., Kacprzyk, J. (eds) Autonomous Systems: Developments and Trends. Studies in Computational Intelligence, vol 391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24806-1_25

DOI: https://doi.org/10.1007/978-3-642-24806-1_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24805-4
Online ISBN: 978-3-642-24806-1
eBook Packages: Engineering (R0)