Skip to main content

Search Word Extraction Using Extended PageRank Calculations

  • Chapter
Autonomous Systems: Developments and Trends

Part of the book series: Studies in Computational Intelligence ((SCI,volume 391))

Abstract

This paper describes a newmethod to determine characteristic terms from texts by weighting them using extended PageRank calculations. Additionally, this method clusters found semantic term relations to assign each term a level of specifity to be able to distinguish between general and specific terms. This way, it is also possible to differentiate between terms of different semantic orientations in the same specifity level. In the experiments, it is shown which terms can be used for the automatic retrieval of semantically similar documents from large corpora like the World Wide Web through automatic query formulation. The selection of query terms of a different specifity level is also a useful instrument in interactive document retrieval to express the intended similarity of documents to be found. An added advantage of this method is, that it does not rely on third-party datasets and works on single texts.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Communications of the ACM 18(11), 613–620 (1975)

    Article  Google Scholar 

  2. Heyer, G., Quasthoff, U., Wittig, T.: Text Mining - Wissensrohstoff Text. W3L Verlag, Bochum (2006)

    Google Scholar 

  3. Kubek, M., Unger, H.: Empiric Considerations of the PageRank’s Clustering Property. In: 7th International Conference on Computing and Information Technology (IC2IT), Bangkok (2011)

    Google Scholar 

  4. Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: Bringing order to the web. In: Technical report, Stanford Digital Library Technologies Project (1998)

    Google Scholar 

  5. Wang, J., Liu, J., Wang, C.: Keyword extraction based on pageRank. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 857–864. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  6. Mihalcea, R., Tarau, P., Figa, E.: PageRank on Semantic Networks, with application to Word Sense Disambiguation. In: Proceedings of the 20th International Conference on Computational Linguistics (2004)

    Google Scholar 

  7. Sodsee, S., Komkhao, M., Meesad, P., Unger, H.: An Extended PageRank Calculation Including Network Parameters. In: Computer Science Education: Innovation and Technology (CSEIT 2010) Special Track: Knowledge Discovery, KD 2010 (2010)

    Google Scholar 

  8. Buechler, M.: Flexibles Berechnen von Kookkurrenzen auf strukturierten und unstrukturierten Daten. Masters thesis, University of Leipzig (2006)

    Google Scholar 

  9. Quasthoff, U., Wolff, C.: The Poisson Collocation Measure and its Applications. In: Proc. Second International Workshop on Computational Approaches to Collocations, Wien (2002)

    Google Scholar 

  10. Dunning, T.: Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19(1), 61–74 (1994)

    Google Scholar 

  11. Kubek, M., Witschel, H.F.: Searching the Web by Using the Knowledge in Local Text Documents. In: Proceedings of Mallorca Workshop 2010 Autonomous Systems. Shaker Verlag, Aachen (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mario Kubek .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kubek, M., Unger, H. (2012). Search Word Extraction Using Extended PageRank Calculations. In: Unger, H., Kyamaky, K., Kacprzyk, J. (eds) Autonomous Systems: Developments and Trends. Studies in Computational Intelligence, vol 391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24806-1_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-24806-1_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24805-4

  • Online ISBN: 978-3-642-24806-1

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics