A Node Indexing Scheme for Web Entity Retrieval

  • Renaud Delbru
  • Nickolai Toupikov
  • Michele Catasta
  • Giovanni Tummarello
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6089)

Abstract

Encouraged in part by the partial support of major search engines, hundreds of millions of documents embedding semi-structured data in RDF, RDFa and Microformats are now being published on the web. This scenario calls for novel information search systems that provide effective means of retrieving relevant semi-structured information. In this paper, we present an “entity retrieval system” designed to provide entity search capabilities over datasets as large as the entire Web of Data. Our system supports full-text search, semi-structural queries and top-k query results while maintaining a concise index and efficient incremental updates. We advocate the use of a node indexing scheme and show that, in comparison to three other indexing techniques, it offers a good compromise between query expressiveness, query processing time and update complexity. We then demonstrate how such a system can effectively answer queries over 10 billion triples on a single commodity machine.

Keywords

Index Size, Inverted Index, SPARQL Query, Triple Pattern, Inverted List
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Renaud Delbru (1)
  • Nickolai Toupikov (1)
  • Michele Catasta (2)
  • Giovanni Tummarello (1, 3)

  1. Digital Enterprise Research Institute, National University of Ireland, Galway, Galway, Ireland
  2. School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  3. Fondazione Bruno Kessler, Trento, Italy
