Boosting a Semantic Search Engine by Named Entities

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5722)

Abstract

Traditional Information Retrieval (IR) systems are based on the bag-of-words representation, which retrieves relevant documents by lexical matching between query and document terms. Because of synonymy and polysemy, lexical methods produce imprecise or incomplete results. In this paper we present SENSE (SEmantic N-levels Search Engine), an IR system that tries to overcome the limitations of the ranked keyword approach by introducing semantic levels that integrate (rather than simply replace) the lexical level represented by keywords. Semantic levels provide information about word meanings, as described in a reference dictionary, and about named entities. This paper focuses on the named entity level. Our aim is to show that named entities help improve retrieval performance. We exploit a model able to capture entity relationships even when they are not explicit in the document text. Experiments on the CLEF dataset support our hypothesis.
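
To make the idea of levels that integrate rather than replace the keyword level more concrete, the following is a minimal sketch of one generic way to merge per-level retrieval scores: min-max normalization followed by a weighted sum (often called CombSUM-style fusion). The abstract does not state which combination strategy SENSE uses, so this should not be read as the authors' method. The function names, level names, weights, and scores are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def normalize(scores):
    """Min-max normalize a {doc_id: score} map into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored equally on this level.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_levels(level_scores, weights):
    """Weighted sum of normalized per-level scores (CombSUM-style).

    level_scores: {level_name: {doc_id: raw_score}}  (hypothetical layout)
    weights:      {level_name: weight}, defaulting to 1.0 per level
    Returns a list of (doc_id, fused_score), best first.
    """
    fused = defaultdict(float)
    for level, scores in level_scores.items():
        w = weights.get(level, 1.0)
        for doc, s in normalize(scores).items():
            fused[doc] += w * s
    # Sort by descending score, breaking ties by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative scores only: a keyword level and a named-entity level.
level_scores = {
    "keyword": {"d1": 2.0, "d2": 1.0, "d3": 0.0},
    "entity": {"d2": 4.0, "d3": 2.0},
}
ranking = fuse_levels(level_scores, {"keyword": 1.0, "entity": 1.0})
```

In this toy input, a document that matches on both levels can overtake one that matches only on keywords, which is the sense in which an extra level adds evidence to the lexical ranking instead of discarding it.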

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Caputo, A., Basile, P., Semeraro, G. (2009). Boosting a Semantic Search Engine by Named Entities. In: Rauch, J., Raś, Z.W., Berka, P., Elomaa, T. (eds) Foundations of Intelligent Systems. ISMIS 2009. Lecture Notes in Computer Science, vol 5722. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04125-9_27

  • DOI: https://doi.org/10.1007/978-3-642-04125-9_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04124-2

  • Online ISBN: 978-3-642-04125-9

  • eBook Packages: Computer Science, Computer Science (R0)
