Publishing Data on the Web

Chapter in: Web Information Retrieval

Abstract

Over the last 20 years, a number of techniques for publishing data on the Web have emerged. They range from the inexpensive approach of setting up a Web form to query a database to the costly one of publishing Linked Data. So far, none of these techniques has emerged as the preferable one, but search engine rich snippets are rapidly changing the game: search engine optimization is becoming the driving business model for data publishing. This chapter surveys, in chronological order, the techniques data owners have used to publish data on the Web, provides a means to evaluate them comparatively, and presents an outlook on the near future of Web data publishing.

Notes

  1. For example, http://www.census.gov/ online since 1996.

  2. For example, http://www.amazon.com/ online since 1995.

  3. The internationalized resource identifier (IRI) generalizes the uniform resource identifier (URI) that generalizes the uniform resource locator (URL) used on the Web to identify a Web page. While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from other character sets including Chinese, Japanese, Korean, Cyrillic, and so forth. The IRIs are defined by RFC 3987.

  4. http://opendatacommons.org/licenses/pddl/.

  5. http://opendatacommons.org/licenses/by/.

  6. http://creativecommons.org/publicdomain/zero/1.0/.

  7. http://www.opensearch.org/.

  8. Modern browsers have a search box (e.g., Firefox 18 has it in the top right corner) with which users can issue search requests to their preferred search engines without opening the search engine page. The set of search engines reachable from the browser is extensible. When the user is on a Web page that contains an OpenSearch service description (e.g., amazon.com), the auto-discovery feature of the protocol allows the browser to detect the search engine interface, and the user can add it to the set of search engines directly invocable from the browser.

  9. Source http://trends.builtwith.com/docinfo/OpenSearch.

  10. A permalink is an IRI that points to a piece of information published on the Web and remains unchanged indefinitely. Most Web 2.0 content syndication software systems support such links.

  11. http://www.json.org/.

  12. Interested readers can read a popular blog post entitled “Web Services Org Folds Up and the REST is History” by R. Irani available online at http://blog.programmableweb.com/2010/11/30/web-services-org-folds-up-and-the-rest-is-history/.

  13. http://blog.programmableweb.com/2012/11/26/8000-apis-rise-of-the-enterprise/.

  14. http://blog.programmableweb.com/2012/08/23/7000-apis-twice-as-many-as-this-time-last-year/.

  15. http://www.citysearch.com/.

  16. http://www.urbanspoon.com/.

  17. http://www.bloglines.com/.

  18. http://www.insiderpages.com/.

  19. http://www.merchantcircle.com/.

  20. http://microformats.org/wiki/hCalendar.

  21. The International Standard ISO 8601 specifies numeric representations of date and time. Interested readers can refer to http://www.cl.cam.ac.uk/~mgk25/iso-time.html for a summary.

  22. http://www.microformats.org/.

  23. The term name collision refers to the problem that occurs in computer programs when the same name denotes different things; typically it arises when two separate program components are merged. Problems of name collision are commonly addressed by introducing namespaces and prefixes.

  24. http://microformats.org/wiki/issues#opened_2010.

  25. http://schema.org/Event.

  26. http://www.readwriteweb.com/archives/how_best_buy_is_using_the_semantic_web.php.

  27. http://www.readwriteweb.com/archives/facebook_the_semantic_web.php.

  28. http://developers.facebook.com/docs/opengraph.

  29. Source http://trends.builtwith.com/docinfo/Open-Graph-Protocol.

  30. http://readwrite.com/2010/04/15/the_modigliani_test_semantic_web_tipping_point.

  31. http://readwrite.com/2010/04/25/the_modigliani_test_for_linked_data.

  32. http://factforge.net/.

  33. http://www.fdic.gov.

  34. http://www2.fdic.gov/hsob/SelectRpt.asp?EntryTyp=30.

  35. http://blog.okfn.org/2012/10/19/data-party-tracking-europes-failed-banks/.

  36. http://www.datatables.org/.

  37. http://developer.yahoo.com/yql/.

  38. http://www.google.com/fusiontables.

  39. http://www.freebase.com/.

  40. Within the Seventh Information and Communication Technologies Work Programme, the European Commission funded the projects LOD2 (http://lod2.eu) and LATC (http://latc-project.eu) to further push the development of Linked Data technologies, and LDBC (http://www.ldbc.eu/) to establish industry cooperation between vendors of Linked Data technologies in developing, endorsing, and publishing reliable and insightful benchmark results.

  41. http://www.google.com/trends/.
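
The relation between IRIs and URIs described in the note on RFC 3987 can be illustrated with a small Python sketch. It is only an approximation of the mapping the RFC defines, written under simplifying assumptions: the function name iri_to_uri is hypothetical, user information and IPv6 literals are ignored, the host is converted with Python's built-in "idna" codec, and every other non-ASCII character is percent-encoded from its UTF-8 bytes.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched while escaping: URI delimiters plus "%",
# so that percent-escapes already present are not encoded a second time.
_SAFE = "/:@!$&'()*+,;=?%"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (simplified, hypothetical helper)."""
    parts = urlsplit(iri)

    # Host names use the IDNA (Punycode) encoding rather than percent-escapes.
    host = parts.hostname or ""
    if host:
        host = host.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Path, query, and fragment: quote() encodes each non-ASCII character
    # as UTF-8 and writes every byte as a %XX escape.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE)
    fragment = quote(parts.fragment, safe=_SAFE)

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(iri_to_uri("http://example.org/caf\u00e9?q=\u00e9"))
```

An identifier that is already a plain ASCII URI, such as http://www.census.gov/, passes through this function unchanged, which reflects the point of the note: every URI is also an IRI, but not the other way round.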

References

  1. B. Adida, M. Birbeck, S. McCarron, I. Herman, RDFa Core 1.1—syntax and processing rules for embedding RDF through attributes, Technical report, W3C Recommendation, 2010. http://www.w3.org/TR/rdfa-core/

  2. B. Adida, M. Birbeck, S. Pemberton, HTML+RDFa 1.1—support for RDFa in HTML4 and HTML5, Technical report, W3C Working Draft, 2012. http://www.w3.org/TR/rdfa-in-html/

  3. J. Allsopp, Link-based microformats: rel-license, rel-tag, rel-nofollow, and votelinks, in Microformats: Empowering Your Markup for Web 2.0 (Apress, New York, 2007), pp. 51–74

  4. G. Alonso, F. Casati, H. Kuno, V. Machiraju, Web Services—Concepts, Architectures and Applications, 1st edn. (Springer, Berlin, 2003)

  5. M.K. Bergman, White paper: the deep web. Surfacing hidden value. J. Electron. Publ. 7(1) (2001). doi:10.3998/3336451.0007.104

  6. T. Berners-Lee, Linked data, World wide web design issues (2010), http://www.w3.org/DesignIssues/LinkedData.html

  7. C. Bizer, The emerging web of linked data. IEEE Intell. Syst. 24(5), 87–92 (2009)

  8. C. Bizer, T. Heath, T. Berners-Lee, Linked data—the story so far. Int. J. Semantic Web Inf. Syst. 5(3), 1–22 (2009)

  9. D. Brickley, R.V. Guha, RDF vocabulary description language 1.0: RDF schema, Technical report, W3C Recommendation (2004), http://www.w3.org/TR/rdf-schema/

  10. J. Cope, N. Craswell, D. Hawking, Automated discovery of search interfaces on the web, in ADC, ed. by K.-D. Schewe, X. Zhou. CRPIT, vol. 17 (Australian Computer Society, Darlinghurst, 2003), pp. 181–189

  11. E. Della Valle, I. Celino, D. Dell’Aglio, The experience of realizing a semantic web urban computing application. Trans. GIS 14(2), 163–181 (2010)

  12. D.W. Embley, Y.S. Jiang, Y.-K. Ng, Record-boundary discovery in web documents, in SIGMOD Conference, ed. by A. Delis, C. Faloutsos, S. Ghandeharizadeh (ACM, New York, 1999), pp. 467–478

  13. B. He, M. Patel, Z. Zhang, K.C.-C. Chang, Accessing the deep web. Commun. ACM 50(5), 94–101 (2007)

  14. M. Hepp, GoodRelations: an ontology for describing products and services offers on the web, in EKAW, ed. by A. Gangemi, J. Euzenat. Lecture Notes in Computer Science, vol. 5268 (Springer, Berlin, 2008), pp. 329–346

  15. R. Khare, Microformats: the next (small) thing on the semantic web? IEEE Internet Comput. 10(1), 68–75 (2006)

  16. R. Khare, T. Çelik, Microformats: a pragmatic path to the semantic web, in WWW, ed. by L. Carr, D.D. Roure, A. Iyengar, C.A. Goble, M. Dahlin (ACM, New York, 2006), pp. 865–866

  17. O. Lassila, R.R. Swick, Resource description framework (RDF) model and syntax specification, Technical report, W3C Recommendation, Feb 1999, http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

  18. J. Madhavan, D. Ko, L. Kot, V. Ganapathy, A. Rasmussen, A.Y. Halevy, Google’s deep web crawl. Proc. VLDB Endow. 1(2), 1241–1252 (2008)

  19. S. McCarron, XHTML+RDFa 1.1—support for RDFa via XHTML modularization, Technical report, W3C Recommendation, June 2012, http://www.w3.org/TR/xhtml-rdfa/

  20. D.L. McGuinness, F. Van Harmelen, OWL web ontology language overview, Technical report, W3C Recommendation, Nov 2004, http://www.w3.org/TR/owl-features/

  21. H. Mühleisen, C. Bizer, Web data commons—extracting structured data from two large web corpora, in LDOW, ed. by C. Bizer, T. Heath, T. Berners-Lee, M. Hausenblas. CEUR Workshop Proceedings, vol. 937 (2012). CEUR-WS.org

  22. B. Uzzi, The sources and consequences of embeddedness for the economic performance of organizations: the network effect. Am. Sociol. Rev. 61(4), 674–698 (1996)

  23. J. Yu, B. Benatallah, F. Casati, F. Daniel, Understanding mashup development. IEEE Internet Comput. 12(5), 44–52 (2008)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ceri, S., Bozzon, A., Brambilla, M., Della Valle, E., Fraternali, P., Quarteroni, S. (2013). Publishing Data on the Web. In: Web Information Retrieval. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39314-3_10

  • DOI: https://doi.org/10.1007/978-3-642-39314-3_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39313-6

  • Online ISBN: 978-3-642-39314-3

  • eBook Packages: Computer Science, Computer Science (R0)
