A Node Indexing Scheme for Web Entity Retrieval

Delbru, Renaud; Toupikov, Nickolai; Catasta, Michele; Tummarello, Giovanni

doi:10.1007/978-3-642-13489-0_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6089)

Included in the following conference series:

Extended Semantic Web Conference

Abstract

Motivated in part by the partial support of major search engines, hundreds of millions of documents embedding semi-structured data in RDF, RDFa and Microformats are now being published on the web. This scenario calls for novel information search systems that provide effective means of retrieving relevant semi-structured information. In this paper, we present an “entity retrieval system” designed to provide entity search capabilities over datasets as large as the entire Web of Data. Our system supports full-text search, semi-structural queries and top-k query results while maintaining a concise index and efficient incremental updates. We advocate the use of a node indexing scheme and show that it offers a good compromise between query expressiveness, query processing time and update complexity in comparison to three other indexing techniques. We then demonstrate how such a system can effectively answer queries over 10 billion triples on a single commodity machine.

Author information

Authors and Affiliations

Digital Enterprise Research Institute, National University of Ireland, Galway, Galway, Ireland
Renaud Delbru, Nickolai Toupikov & Giovanni Tummarello
School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
Michele Catasta
Fondazione Bruno Kessler, Trento, Italy
Giovanni Tummarello

Editor information

Editors and Affiliations

Department of Computer Science, Free University Amsterdam, De Boelelaan 1081a, 1081 HV, Amsterdam, The Netherlands
Lora Aroyo
Institute of Computer Science, FORTH and Computer Science Department, University of Crete, P.O. Box 1385, 71110, Heraklion, Greece
Grigoris Antoniou
School of Science and Technology, Department of Media Technology, Aalto University, P.O. Box 15500, 00076, Aalto, Finland
Eero Hyvönen
Department of AI, Free University Amsterdam, De Boelelaan 1081A, 1081HV, Amsterdam, The Netherlands
Annette ten Teije
Institut für Informatik, B6, 26, Universität Mannheim, 68159, Mannheim, Germany
Heiner Stuckenschmidt
Knowledge Media Institute, The Open University, Walton Hall, MK7 6AA, Milton Keynes, UK
Liliana Cabral
Stanford Biomedical Informatics Research Center, 251 Campus Drive, 94305-5479, Stanford, CA, USA
Tania Tudorache

About this paper

Cite this paper

Delbru, R., Toupikov, N., Catasta, M., Tummarello, G. (2010). A Node Indexing Scheme for Web Entity Retrieval. In: Aroyo, L., et al. The Semantic Web: Research and Applications. ESWC 2010. Lecture Notes in Computer Science, vol 6089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13489-0_17

DOI: https://doi.org/10.1007/978-3-642-13489-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13488-3
Online ISBN: 978-3-642-13489-0
eBook Packages: Computer Science, Computer Science (R0)
