Abstract
This paper describes a new method to determine characteristic terms from texts by weighting them using extended PageRank calculations. Additionally, this method clusters the semantic term relations it finds in order to assign each term a level of specificity, which makes it possible to distinguish between general and specific terms. This way, it is also possible to differentiate between terms of different semantic orientations within the same specificity level. The experiments show which terms can be used for the automatic retrieval of semantically similar documents from large corpora such as the World Wide Web through automatic query formulation. Selecting query terms of a different specificity level is also a useful instrument in interactive document retrieval for expressing the intended similarity of the documents to be found. An added advantage of this method is that it does not rely on third-party datasets and works on single texts.
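To make the idea of weighting terms with PageRank concrete, the following is a minimal sketch and not the authors' method: it runs plain weighted PageRank over a sentence-level co-occurrence graph built from a single text. The abstract does not specify the extended calculation or the clustering step, so neither is reproduced here. The function names `cooccurrence_graph` and `weighted_pagerank`, the damping value of 0.85, the fixed iteration count, and the toy sentences are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(sentences):
    """Build an undirected weighted graph from tokenized sentences.

    Edge weight = number of sentences in which two distinct terms
    appear together (an assumed, simple significance measure).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        terms = sorted(set(sentence))
        for a, b in combinations(terms, 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    # Convert to plain dicts so later lookups cannot create new keys.
    return {term: dict(neighbours) for term, neighbours in graph.items()}


def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Plain weighted PageRank by power iteration (no extensions)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    # Total edge weight leaving each node, used to normalize its votes.
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            incoming = sum(
                rank[nb] * graph[nb][node] / out_weight[nb]
                for nb in graph[node]
            )
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Hypothetical toy input: one short "text" split into tokenized sentences.
sentences = [
    ["pagerank", "term", "graph"],
    ["pagerank", "term", "query"],
    ["pagerank", "document", "query"],
]
ranks = weighted_pagerank(cooccurrence_graph(sentences))
# Terms sorted by descending weight are candidate query terms.
candidates = sorted(ranks, key=ranks.get, reverse=True)
```

In this sketch the highest-ranked terms would serve as candidate search words; how the paper maps weights to specificity levels is not stated in the abstract and is left open here.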
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kubek, M., Unger, H. (2012). Search Word Extraction Using Extended PageRank Calculations. In: Unger, H., Kyamaky, K., Kacprzyk, J. (eds) Autonomous Systems: Developments and Trends. Studies in Computational Intelligence, vol 391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24806-1_25
DOI: https://doi.org/10.1007/978-3-642-24806-1_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24805-4
Online ISBN: 978-3-642-24806-1
eBook Packages: Engineering, Engineering (R0)