Abstract
In this paper, we propose several improved versions of the rapid automatic keyword extraction (RAKE) algorithm for extracting keywords from Hindi documents. RAKE requires a stopword list to generate its set of candidate keywords, and no such list was available for Hindi, so we constructed a Hindi stopword list for this purpose. We also identified weaknesses in the keyword scoring measures of RAKE and propose several models, namely N-RAKE, SD-RAKE, NSD-RAKE, and WOS-RAKE, to improve on its effectiveness. We find that our modifications generally yield better results than the original RAKE.
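To make the role of the stopword list concrete, the following is a minimal sketch of the original RAKE procedure as it is commonly described: the text is split at stopwords and punctuation into candidate phrases, each word is scored by its degree-to-frequency ratio, and a phrase score is the sum of its word scores. The names `SAMPLE_STOPWORDS`, `candidate_phrases`, `word_scores`, and `rake` are hypothetical, and the nine-word stopword set is only an illustrative sample, not the list constructed in the chapter. The proposed variants (N-RAKE, SD-RAKE, NSD-RAKE, WOS-RAKE) are not defined in the abstract and are therefore not sketched here.

```python
import re
from collections import defaultdict

# Hypothetical, tiny sample of common Hindi function words (assumption:
# for illustration only; this is NOT the stopword list built by the authors).
SAMPLE_STOPWORDS = {"का", "के", "की", "में", "है", "और", "से", "को", "पर"}


def candidate_phrases(text, stopwords):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Phrase delimiters include the Devanagari danda as well as common punctuation.
    for fragment in re.split(r"[।.,;:!?()\n]+", text):
        current = []
        for token in fragment.split():
            if token in stopwords:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(token)
        if current:
            phrases.append(tuple(current))
    return phrases


def word_scores(phrases):
    """Score each word as deg(w) / freq(w), as in the original RAKE."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, the word itself included.
            degree[word] += len(phrase)
    return {word: degree[word] / freq[word] for word in freq}


def rake(text, stopwords):
    """Return (phrase, score) pairs sorted by descending score."""
    phrases = candidate_phrases(text, stopwords)
    scores = word_scores(phrases)
    ranked = {p: sum(scores[w] for w in p) for p in set(phrases)}
    return sorted(ranked.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = "मशीन लर्निंग का उपयोग डेटा विश्लेषण में है। डेटा की गुणवत्ता"
    for phrase, score in rake(sample, SAMPLE_STOPWORDS):
        print(" ".join(phrase), score)
```

In this sketch the stopword set alone determines where phrases break, which is why the quality of a Hindi stopword list directly affects the candidates RAKE can produce.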
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Siddiqi, S., Sharan, A. (2018). Improved RAKE Models to Extract Keywords from Hindi Documents. In: Bhateja, V., Nguyen, B., Nguyen, N., Satapathy, S., Le, DN. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 672. Springer, Singapore. https://doi.org/10.1007/978-981-10-7512-4_47
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7511-7
Online ISBN: 978-981-10-7512-4
eBook Packages: Engineering, Engineering (R0)