Abstract
Recent years have witnessed an exponential increase in the number of data services available on the Web. Many popular Web sites, including social networks, offer APIs for interacting with their information, and open data initiatives such as the Linked Data project promise to achieve the vision of the Web of data. Unfortunately, access to Web data is typically limited by the constraints imposed by the query interface and by technical limitations such as network latency or the number and frequency of allowed daily service invocations. Moreover, several sources may independently publish data about the same real-world objects; in such cases, combining them to assemble all available information about those objects requires duplicate removal, reconciliation, and integration. This paper describes various data materialization problems, defining properties of the materialized data such as source coverage and data alignment, and then focuses on one specific problem: the reseeding of data access methods, which uses information available from previous calls to build a materialization of maximum size.
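To make the reseeding idea concrete, the following is a minimal sketch and not the chapter's algorithm, which the abstract does not spell out. It assumes a service that can only be queried by supplying an input value, a fixed budget of calls, and a hypothetical `extract_inputs` function that derives new input values from returned records. The names `reseed_materialize`, `toy_service`, and `TOY_DATA` are invented for illustration. The sketch explores inputs breadth-first, which does not by itself guarantee a materialization of maximum size.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Set, Tuple

Record = Tuple  # assumption: a record is a plain tuple of attribute values


def reseed_materialize(
    call_service: Callable[[Hashable], Iterable[Record]],
    seeds: Iterable[Hashable],
    extract_inputs: Callable[[Record], Iterable[Hashable]],
    max_calls: int,
) -> Set[Record]:
    """Collect records by feeding values seen in earlier results back in as inputs."""
    pending = deque(seeds)          # input values still to be tried
    seen_inputs = set(pending)      # never invoke the service twice with the same input
    materialized: Set[Record] = set()
    calls = 0
    while pending and calls < max_calls:
        value = pending.popleft()
        calls += 1                  # each invocation consumes one unit of the budget
        for record in call_service(value):
            materialized.add(record)            # the set drops exact duplicates
            for new_input in extract_inputs(record):
                if new_input not in seen_inputs:
                    seen_inputs.add(new_input)
                    pending.append(new_input)   # reseed with a newly discovered value
    return materialized


# Toy in-memory stand-in for a remote service (hypothetical data):
# given a city, it returns (restaurant, city, nearby_city) records.
TOY_DATA = {
    "Milan": [("Trattoria A", "Milan", "Como"), ("Osteria B", "Milan", "Bergamo")],
    "Como": [("Lago C", "Como", "Milan")],
    "Bergamo": [("Alta D", "Bergamo", "Brescia")],
    "Brescia": [],
}


def toy_service(city):
    return TOY_DATA.get(city, [])


# Start from one seed and reuse the nearby_city field of each result as a new input.
result = reseed_materialize(toy_service, ["Milan"], lambda r: [r[2]], max_calls=3)
```

With a budget of three calls, the sketch queries Milan, then Como and Bergamo, which were both discovered in the first answer. A budget of one call would stop after Milan, which shows how the invocation limit bounds the size of the materialization.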
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bozzon, A., Ceri, S., Zagorac, S. (2012). Materialization of Web Data Sources. In: Ceri, S., Brambilla, M. (eds) Search Computing. Lecture Notes in Computer Science, vol 7538. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34213-4_5
DOI: https://doi.org/10.1007/978-3-642-34213-4_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34212-7
Online ISBN: 978-3-642-34213-4
eBook Packages: Computer Science, Computer Science (R0)