Improved RAKE Models to Extract Keywords from Hindi Documents

Siddiqi, Sifatullah; Sharan, Aditi

doi:10.1007/978-981-10-7512-4_47


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 672)


Abstract

In this paper, we propose several improved versions of the rapid automatic keyword extraction (RAKE) algorithm for extracting keywords from Hindi documents. Because RAKE requires a stopword list to generate the set of candidate keywords, and no such list was available for Hindi, we constructed a Hindi stopword list for this purpose. We identified weaknesses in RAKE's keyword scoring measures and propose several models, namely N-RAKE, SD-RAKE, NSD-RAKE, and WOS-RAKE, to improve on its effectiveness. Our modifications generally yield better results than the original RAKE.
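RAKE uses the stopword list to cut text into candidate keywords (runs of words between stopwords and phrase delimiters) and then ranks the candidates with a scoring measure. The Python sketch below implements only the original RAKE scoring, where each word gets deg(w)/freq(w) and a candidate's score is the sum of its word scores. It assumes whitespace tokenization and treats the danda (।) and common punctuation as phrase delimiters. The stopword set `HINDI_STOPWORDS`, the helper names, and the sample sentence are hypothetical illustrations, not the authors' stopword list or code, and the N-RAKE, SD-RAKE, NSD-RAKE, and WOS-RAKE modifications are not reproduced because the abstract does not define them.

```python
import re
from collections import defaultdict

# Hypothetical, illustrative Hindi stopword set; NOT the list constructed by the authors.
HINDI_STOPWORDS = {"और", "का", "की", "के", "है", "हैं", "में", "से", "को", "एक", "यह", "पर", "लिए"}

# Assumed phrase delimiters: danda (।), double danda (॥), and common ASCII punctuation.
DELIMITERS = re.compile(r"[।॥,.;:!?\n]+")

# Hypothetical sample text used in the usage example.
SAMPLE_TEXT = "कीवर्ड निष्कर्षण एक महत्वपूर्ण कार्य है। हिंदी दस्तावेजों से कीवर्ड निष्कर्षण के लिए स्टॉपवर्ड सूची आवश्यक है।"


def candidate_phrases(text, stopwords):
    """Split text into candidate keywords: runs of non-stopwords between delimiters and stopwords."""
    candidates = []
    for fragment in DELIMITERS.split(text):
        phrase = []
        # Whitespace tokenization (avoids regex \w, which can split Devanagari vowel signs).
        for token in fragment.lower().split():
            if token in stopwords:
                if phrase:
                    candidates.append(tuple(phrase))
                phrase = []
            else:
                phrase.append(token)
        if phrase:
            candidates.append(tuple(phrase))
    return candidates


def rake(text, stopwords, top_k=5):
    """Rank candidates by the original RAKE measure: sum of deg(w)/freq(w) over member words."""
    candidates = candidate_phrases(text, stopwords)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in candidates:
        for word in phrase:
            freq[word] += 1
            # Each occurrence adds the phrase length: co-occurring words plus the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # Identical candidates are merged (order-preserving), so each distinct phrase is scored once.
    phrase_score = {" ".join(p): sum(word_score[w] for w in p) for p in dict.fromkeys(candidates)}
    # sorted() is stable, so tied phrases keep their first-occurrence order.
    ranked = sorted(phrase_score.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    print(rake(SAMPLE_TEXT, HINDI_STOPWORDS, top_k=2))
    # [('स्टॉपवर्ड सूची आवश्यक', 9.0), ('कीवर्ड निष्कर्षण', 4.0)]
```

In this example every word of the three-word candidate has deg(w)/freq(w) = 3, while the two-word candidates score 2 per word, so the longer phrase ranks first. Replacing `word_score` or `phrase_score` is where a modified scoring measure would plug in.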



Author information

Authors and Affiliations

School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India
Sifatullah Siddiqi & Aditi Sharan


Corresponding author

Correspondence to Sifatullah Siddiqi.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges, Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Technology and Engineering Division, Duy Tan University, Da Nang, Vietnam
Bao Le Nguyen
Duy Tan University, Da Nang, Vietnam
Nhu Gia Nguyen
Department of CSE, PVP Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India
Suresh Chandra Satapathy
Faculty of Information Technology, Hai Phong University, Hai Phong, Vietnam
Dac-Nhuong Le


About this paper

Cite this paper

Siddiqi, S., Sharan, A. (2018). Improved RAKE Models to Extract Keywords from Hindi Documents. In: Bhateja, V., Nguyen, B., Nguyen, N., Satapathy, S., Le, DN. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 672. Springer, Singapore. https://doi.org/10.1007/978-981-10-7512-4_47


DOI: https://doi.org/10.1007/978-981-10-7512-4_47
Published: 02 March 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7511-7
Online ISBN: 978-981-10-7512-4
eBook Packages: Engineering, Engineering (R0)
