Abstract
Traditional Information Retrieval (IR) systems are based on a bag-of-words representation: relevant documents are retrieved by lexical matching between query terms and document terms. Because of synonymy and polysemy, lexical methods produce imprecise or incomplete results. In this paper we present SENSE (SEmantic N-levels Search Engine), an IR system that tries to overcome the limitations of the ranked keyword approach by introducing semantic levels that integrate (rather than simply replace) the lexical level represented by keywords. The semantic levels provide information about named entities and about word meanings as described in a reference dictionary. This paper focuses on the named entity level. Our aim is to show that named entities improve retrieval performance. We exploit a model able to capture relationships between entities even when these relationships are not explicit in the document text. Experiments on the CLEF dataset confirm our hypothesis.
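To illustrate what it means for a semantic level to integrate, rather than replace, the keyword level, here is a minimal Python sketch. It assumes that each level (keywords, named entities) is indexed as a separate bag of tokens and scored independently, and that the levels are merged by a weighted sum of max-normalized scores (a CombSUM-style fusion). The TF-IDF variant, the `PER:`/`LOC:` entity labels, the toy documents, and the 0.7/0.3 level weights are all hypothetical. This is not SENSE's actual scoring or fusion scheme, and it omits the model that captures implicit entity relationships.

```python
from collections import Counter
from math import log


def tfidf_scores(query_terms, docs):
    """Bag-of-words TF-IDF score of every document at one level (hypothetical weighting)."""
    n = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))  # document frequency: count each term once per document
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        # Lexical matching: only query tokens that literally occur in the document contribute.
        scores[doc_id] = sum(tf[q] * log(1 + n / df[q]) for q in query_terms if df[q])
    return scores


def normalize(scores):
    """Scale one level's scores into [0, 1] by dividing by that level's best score."""
    top = max(scores.values(), default=0.0)
    return {d: (s / top if top > 0 else 0.0) for d, s in scores.items()}


def combine_levels(level_scores, weights):
    """Weighted sum (CombSUM-style) of normalized per-level scores, best first."""
    combined = {}
    for level, scores in level_scores.items():
        w = weights.get(level, 0.0)
        for d, s in normalize(scores).items():
            combined[d] = combined.get(d, 0.0) + w * s
    # Sort by descending score; break ties by document id for a stable ranking.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy collection indexed at two levels (hypothetical data and entity labels).
KEYWORD_LEVEL = {
    "d1": ["president", "visit", "rome"],
    "d2": ["pope", "speech", "vatican"],
    "d3": ["weather", "rome", "rain"],
    "d4": ["benedict", "addressed", "crowd"],
}
ENTITY_LEVEL = {
    "d1": ["PER:Bush", "LOC:Rome"],
    "d2": ["PER:Benedict_XVI", "LOC:Vatican_City"],
    "d3": ["LOC:Rome"],
    "d4": ["PER:Benedict_XVI"],
}

keyword = tfidf_scores(["pope", "speech"], KEYWORD_LEVEL)
entity = tfidf_scores(["PER:Benedict_XVI"], ENTITY_LEVEL)
ranking = combine_levels(
    {"keyword": keyword, "entity": entity},
    {"keyword": 0.7, "entity": 0.3},  # hypothetical level weights
)
print(ranking)
```

Document `d4` shares no keyword with the query, so the keyword level alone scores it 0; the entity level, which matches `PER:Benedict_XVI`, lifts it to second place behind `d2`. Because entity scores are added to the keyword scores instead of substituted for them, `d2`, which matches at both levels, stays on top.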
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Caputo, A., Basile, P., Semeraro, G. (2009). Boosting a Semantic Search Engine by Named Entities. In: Rauch, J., Raś, Z.W., Berka, P., Elomaa, T. (eds) Foundations of Intelligent Systems. ISMIS 2009. Lecture Notes in Computer Science, vol 5722. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04125-9_27
DOI: https://doi.org/10.1007/978-3-642-04125-9_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04124-2
Online ISBN: 978-3-642-04125-9
eBook Packages: Computer Science, Computer Science (R0)