Keyword Extraction from Hindi Documents Using Document Statistics and Fuzzy Modelling

Siddiqi, Sifatullah; Sharan, Aditi

doi:10.1007/978-981-10-7512-4_35

Conference paper
First Online: 02 March 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 672)

Abstract

In this paper, we put forward a novel unsupervised, domain-independent and corpus-independent approach to automatic keyword extraction. Our approach combines two document statistics of a word, its frequency and its spatial distribution, to extract keywords. We extract keywords from Hindi documents and use fuzzy logic to combine these statistics effectively for better results, framing fuzzy rules for keyword extraction from them. The main advantages of our approach are that it uses fuzzy membership for the variables instead of crisp thresholds, and that the fuzzy membership boundaries are set in a corpus-independent way. Our work is especially significant because it has been implemented and tested on Hindi, a resource-poor and underrepresented language.
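
The abstract does not give the exact statistics, membership functions, or rules, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes that spatial distribution is measured by the normalized standard deviation of the gaps between successive occurrences of a word, that a "high" membership is a linear ramp whose boundaries are the minimum and maximum observed in the document itself (no external corpus), and that a single rule combines the two memberships with a fuzzy AND (minimum). All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def word_stats(tokens):
    """Return {word: (frequency, sigma)} for a tokenized document.

    sigma is the standard deviation of the gaps between successive
    occurrences divided by the mean gap (an assumed clustering measure).
    """
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    stats = {}
    for word, pos in positions.items():
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        # With fewer than two gaps the spread is undefined; treat it as 0.
        sigma = pstdev(gaps) / mean(gaps) if len(gaps) >= 2 else 0.0
        stats[word] = (len(pos), sigma)
    return stats


def ramp(x, low, high):
    """'High' membership: 0 at or below low, 1 at or above high."""
    if high <= low:
        return 0.0
    return min(1.0, max(0.0, (x - low) / (high - low)))


def fuzzy_keyword_scores(tokens):
    """Score each word with one assumed rule:
    IF frequency is high AND clustering is high THEN the word is a keyword.
    """
    stats = word_stats(tokens)
    freqs = [f for f, _ in stats.values()]
    sigmas = [s for _, s in stats.values()]
    # Membership boundaries come from this document only (assumption).
    f_lo, f_hi = min(freqs), max(freqs)
    s_lo, s_hi = min(sigmas), max(sigmas)
    return {
        word: min(ramp(f, f_lo, f_hi), ramp(s, s_lo, s_hi))
        for word, (f, s) in stats.items()
    }
```

Because the scores are degrees of membership rather than pass/fail decisions, words can be ranked and the top few taken as keywords without fixing a crisp cut-off in advance. The input is any list of tokens, so a Hindi document would first need to be tokenized (and, typically, stripped of stop words) by whatever means are available.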

Author information

Authors and Affiliations

School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India
Sifatullah Siddiqi & Aditi Sharan

Corresponding author

Correspondence to Sifatullah Siddiqi.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges, Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Technology and Engineering Division, Duy Tan University, Da Nang, Vietnam
Bao Le Nguyen
Duy Tan University, Da Nang, Vietnam
Nhu Gia Nguyen
Department of CSE, PVP Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India
Suresh Chandra Satapathy
Faculty of Information Technology, Hai Phong University, Hai Phong, Vietnam
Dac-Nhuong Le

About this paper

Cite this paper

Siddiqi, S., Sharan, A. (2018). Keyword Extraction from Hindi Documents Using Document Statistics and Fuzzy Modelling. In: Bhateja, V., Nguyen, B., Nguyen, N., Satapathy, S., Le, DN. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 672. Springer, Singapore. https://doi.org/10.1007/978-981-10-7512-4_35

DOI: https://doi.org/10.1007/978-981-10-7512-4_35
Published: 02 March 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7511-7
Online ISBN: 978-981-10-7512-4
eBook Packages: Engineering, Engineering (R0)
