Abstract
Information retrieval relies on estimating the similarity between a user's query and the documents in a repository. This closeness is generally expressed as a similarity or distance score, which is used to rank the documents and retrieve those that match the user's need. Distance-based similarity algorithms are generally of order \(O(n)\) rather than \(O(n^{2})\). Similarity measures are used not only to score documents for retrieval but also for clustering and classification, and researchers have proposed numerous such measures. This paper presents a new and efficient information retrieval algorithm based on the Bray–Curtis distance and applies it to the OHSUMED collection. Detailed analysis shows that the Bray–Curtis distance-based similarity measure outperforms the other prevailing similarity methods for information retrieval.
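To make the scoring step concrete, the following is a minimal sketch of ranking documents by Bray–Curtis distance, using the standard definition: the sum of absolute differences divided by the sum of all components. It is not the authors' implementation. The raw term-count vectors, whitespace tokenization, and the helper names `bray_curtis`, `term_vector`, and `rank` are assumptions made for illustration only; the abstract does not state how the paper weights or preprocesses terms. Each distance is computed in a single pass over the vector, which is linear in the vocabulary size.

```python
from collections import Counter


def bray_curtis(u, v):
    """Bray-Curtis distance between two non-negative vectors of equal length.

    Standard definition: sum(|u_i - v_i|) / sum(u_i + v_i).
    Returns 0.0 when both vectors are all zeros (a convention assumed here).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def term_vector(text, vocab):
    """Raw term counts over a fixed vocabulary (hypothetical preprocessing)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]


def rank(query, docs):
    """Return (distance, document index) pairs, closest document first."""
    vocab = sorted({t for text in [query, *docs] for t in text.lower().split()})
    q = term_vector(query, vocab)
    scored = [(bray_curtis(q, term_vector(d, vocab)), i) for i, d in enumerate(docs)]
    return sorted(scored)


if __name__ == "__main__":
    docs = ["heart attack treatment", "kidney stone surgery"]
    for distance, index in rank("heart treatment", docs):
        print(f"{distance:.2f}  {docs[index]}")
```

A distance of 0 means the two count vectors are identical and a distance of 1 means they share no terms, so sorting in ascending order puts the closest documents first.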
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Thakur, N., Mehrotra, D., Bansal, A., Bala, M. (2019). Analysis and Implementation of the Bray–Curtis Distance-Based Similarity Measure for Retrieving Information from the Medical Repository. In: Bhattacharyya, S., Hassanien, A., Gupta, D., Khanna, A., Pan, I. (eds) International Conference on Innovative Computing and Communications. Lecture Notes in Networks and Systems, vol 56. Springer, Singapore. https://doi.org/10.1007/978-981-13-2354-6_14
DOI: https://doi.org/10.1007/978-981-13-2354-6_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2353-9
Online ISBN: 978-981-13-2354-6
eBook Packages: Intelligent Technologies and Robotics (R0)