Abstract
In this paper, we put forward a novel unsupervised, domain-independent and corpus-independent approach to automatic keyword extraction. Our approach combines two document statistics of a word, its frequency and its spatial distribution, to extract keywords. We extract keywords from Hindi documents using these statistics and use fuzzy logic to combine them effectively. Further, we use this information to frame fuzzy rules for keyword extraction. The main advantages of our approach are that it uses fuzzy membership for the variables instead of crisp thresholds, and that the fuzzy membership boundaries are set in a corpus-independent way. Our work is especially significant because it has been implemented and tested on Hindi, a resource-poor and underrepresented language.
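The idea of combining frequency and spatial distribution through fuzzy membership can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the spatial statistic, the membership shapes, or the rules, so everything below is an assumption. The sketch takes the spatial statistic to be the standard deviation of the gaps between successive occurrences of a word divided by the mean gap, uses linear ramp memberships whose boundaries (mean and maximum) come from the document itself, and applies a single rule, "IF frequency is high AND clustering is high THEN keyword", with `min` as the fuzzy AND. The names `word_statistics`, `ramp` and `keyword_scores` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def word_statistics(tokens):
    """Map each word to (frequency, sigma).

    sigma is the standard deviation of the gaps between successive
    occurrences divided by the mean gap (assumed spatial statistic).
    Words with fewer than three occurrences get sigma = 0.0.
    """
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    stats = {}
    for word, pos in positions.items():
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        sigma = pstdev(gaps) / mean(gaps) if len(gaps) >= 2 else 0.0
        stats[word] = (len(pos), sigma)
    return stats


def ramp(x, low, high):
    """Rising linear membership: 0 at or below low, 1 at or above high."""
    if high <= low:
        # Degenerate range: fall back to a crisp step.
        return 1.0 if x >= high else 0.0
    return min(1.0, max(0.0, (x - low) / (high - low)))


def keyword_scores(tokens):
    """Score each word with one illustrative fuzzy rule (assumed, not from the paper)."""
    stats = word_statistics(tokens)
    freqs = [f for f, _ in stats.values()]
    sigmas = [s for _, s in stats.values()]

    # Membership boundaries come only from this document (no external corpus).
    f_low, f_high = mean(freqs), max(freqs)
    s_low, s_high = mean(sigmas), max(sigmas)

    # Fuzzy AND of "frequency is high" and "clustering is high" via min.
    return {
        word: min(ramp(f, f_low, f_high), ramp(s, s_low, s_high))
        for word, (f, s) in stats.items()
    }
```

Under these assumptions, a word that is frequent but evenly spread (typical of function words) receives a low score, while a word that is both frequent and clustered receives a higher one. The token list can be any sequence of strings, for example the output of a whitespace split of a Hindi document.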
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Siddiqi, S., Sharan, A. (2018). Keyword Extraction from Hindi Documents Using Document Statistics and Fuzzy Modelling. In: Bhateja, V., Nguyen, B., Nguyen, N., Satapathy, S., Le, DN. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 672. Springer, Singapore. https://doi.org/10.1007/978-981-10-7512-4_35
DOI: https://doi.org/10.1007/978-981-10-7512-4_35
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7511-7
Online ISBN: 978-981-10-7512-4
eBook Packages: Engineering, Engineering (R0)