Abstract
Generating metadata and retrieving documents efficiently is becoming increasingly challenging as the volume of resources grows. Most document retrieval systems suffer from high recall with low precision. Effective automatic generation of metadata can reduce the time required to create metadata manually. This paper attempts to develop a search algorithm that retrieves documents using automatically generated metadata. The fifteen elements of Dublin Core Metadata are discussed, and the addition of a further element consisting of popular search keywords for retrieving the document is suggested. The keywords in the suggested extended metadata element can be grouped by geographic location to personalize the search results. Precision can be increased and recall decreased by retrieving books of specific topic-ids for which the maximum number of words match the searched keywords. Different stemming mechanisms for the search keywords are also discussed. A flowchart of the proposed algorithm is given at the end.
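The matching idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the record layout (`topic_id`, `title`, `keywords`), the function names, and the crude suffix list are all assumptions made for illustration, and the toy stemmer merely stands in for the stemming mechanisms the paper discusses. The sketch stems the query and each document's keyword element, counts the overlap, and returns only the documents with the maximum number of matching words.

```python
# Hypothetical suffix list; a toy stand-in for a real stemmer, not Porter's algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Lower-case a word and strip the first matching suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def rank_documents(query, documents):
    """Return titles of the documents whose keyword element shares the most stems with the query."""
    query_stems = {crude_stem(w) for w in query.split()}
    scored = []
    for doc in documents:
        keyword_stems = {crude_stem(k) for k in doc["keywords"]}
        scored.append((len(query_stems & keyword_stems), doc))
    best = max((score for score, _ in scored), default=0)
    if best == 0:
        return []  # no keyword matched, so nothing is retrieved
    return [doc["title"] for score, doc in scored if score == best]


# Illustrative records; topic ids, titles, and keywords are invented for this example.
docs = [
    {"topic_id": 1, "title": "Indexing Basics",
     "keywords": ["indexing", "metadata", "retrieval"]},
    {"topic_id": 2, "title": "Cooking at Home",
     "keywords": ["recipes", "baking"]},
    {"topic_id": 1, "title": "Metadata Harvesting",
     "keywords": ["metadata", "harvesting"]},
]

print(rank_documents("metadata indexes", docs))
```

Keeping only the top-scoring documents trades recall for precision, which is the direction the abstract describes; a fuller version might aggregate scores per topic-id or weight keywords by geographic location.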
References
DCMI Dublin Core Metadata, http://dublincore.org
Smeaton, A.F.: Natural Language Processing & Information Retrieval, http://www.compapp.dcu.ie/~asmeaton/asmeaton.html
Frakes, W.B.: Term Conflation for Information Retrieval. In: 7th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 383–389 (1984)
Porter, M.F.: An Algorithm for Suffix Stripping. Program 14, 130–137 (1980)
Paice, C.D.: Another Stemmer. ACM SIGIR Forum 24, 56–61 (1990)
Dawson, J.: Suffix Removal and Word Conflation. ALLC Bulletin 2, 33–46 (1974)
Mayfield, J., McNamee, P.: Single N-gram Stemming. In: 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 415–416 (2003)
Melucci, M., Orio, N.: A Novel Method for Stemmer Generation based on Hidden Markov Models. In: 12th International Conference on Information and Knowledge Management, pp. 131–138 (2003)
Majumder, P., Mitra, M., Parui, K.S., Kole, G., Mitra, P., Datta, K.: YASS: Yet another Suffix Stripper. ACM Transactions on Information Systems 25 (2007)
Xu, J., Croft, W.B.: Corpus-based Stemming using Co-occurrence of Word Variants. ACM Transactions on Information Systems 16, 61–81 (1998)
Peng, F., Ahmed, N., Li, X., Lu, Y.: Context Sensitive Stemming for Web Search. In: 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 639–646 (2007)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bandyopadhyay, S., Bandyopadhyay, A.K. (2012). Developing an Efficient Mechanism for Retrieving Documents Using Automatically Generated Metadata. In: Krishna, P.V., Babu, M.R., Ariwa, E. (eds) Global Trends in Computing and Communication Systems. ObCom 2011. Communications in Computer and Information Science, vol 269. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29219-4_9
DOI: https://doi.org/10.1007/978-3-642-29219-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29218-7
Online ISBN: 978-3-642-29219-4
eBook Packages: Computer Science, Computer Science (R0)