Abstract
The increasing amount of information available on the web demands sophisticated querying methods and knowledge discovery techniques. In this study, we introduce WIND, our architectural framework for a data warehouse over a domain-specific thematic section of the Internet. The aim of WIND is to provide a partially materialized, structured view of the underlying information sources, to which database queries can be applied and on which mining techniques can be developed. WIND loads web documents into several complementary local repositories, such as OODBMSs and text retrieval systems. This allows attribute-oriented and content-oriented query processing to be combined. Special attention is paid to domain-specific document formats. To support conversion between (semi-)structured documents and database objects, we consider a technique for generating format converters based on the notion of object-grammars.
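The combination of attribute-oriented and content-oriented querying over complementary repositories can be illustrated with a minimal sketch. It assumes two hypothetical in-memory stand-ins: `ObjectStore` plays the role of the OODBMS (answering predicates over structured attributes such as document type and year), and `TextIndex` plays the role of the text retrieval system (a simple conjunctive inverted index). The hypothetical `Warehouse.load` places each document in both repositories, and `Warehouse.query` intersects the two answer sets. This is not WIND's actual design or API; a real system might, for instance, rank content matches instead of intersecting Boolean results.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class WebDocument:
    # Hypothetical document record; field names are illustrative only.
    doc_id: int
    url: str
    doc_type: str  # structured attribute, e.g. "paper" or "homepage"
    year: int      # structured attribute
    text: str      # unstructured content for full-text search


def tokenize(text: str) -> set[str]:
    """Lower-case alphanumeric tokens; shared by indexing and querying."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ObjectStore:
    """Hypothetical stand-in for the OODBMS: attribute-oriented queries."""

    def __init__(self) -> None:
        self.objects: dict[int, WebDocument] = {}

    def add(self, doc: WebDocument) -> None:
        self.objects[doc.doc_id] = doc

    def select(self, predicate: Callable[[WebDocument], bool]) -> set[int]:
        return {d.doc_id for d in self.objects.values() if predicate(d)}


class TextIndex:
    """Hypothetical stand-in for the text retrieval system: inverted index."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[int]] = defaultdict(set)

    def add(self, doc: WebDocument) -> None:
        for term in tokenize(doc.text):
            self.postings[term].add(doc.doc_id)

    def search(self, keywords: str) -> set[int]:
        # Conjunctive query: every keyword must occur in the document.
        terms = tokenize(keywords)
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


class Warehouse:
    """Loads each document into both repositories and combines their answers."""

    def __init__(self) -> None:
        self.store = ObjectStore()
        self.index = TextIndex()

    def load(self, doc: WebDocument) -> None:
        self.store.add(doc)
        self.index.add(doc)

    def query(self, predicate: Callable[[WebDocument], bool], keywords: str) -> list[str]:
        # Attribute-oriented result AND content-oriented result.
        ids = self.store.select(predicate) & self.index.search(keywords)
        return sorted(self.store.objects[i].url for i in ids)


wh = Warehouse()
wh.load(WebDocument(1, "http://example.org/p1", "paper", 1996, "Object grammars for data warehousing"))
wh.load(WebDocument(2, "http://example.org/p2", "paper", 1993, "Grammar based document transformation"))
wh.load(WebDocument(3, "http://example.org/h1", "homepage", 1996, "Warehousing and grammars"))

recent_papers = wh.query(lambda d: d.doc_type == "paper" and d.year >= 1995, "grammars")
print(recent_papers)  # ['http://example.org/p1']
```

Neither repository alone answers the example query: the object store cannot search the text, and the text index knows nothing about document types or years. Keeping both repositories keyed by the same document identifier is what makes the final intersection cheap.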
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Faulstich, L.C., Spiliopoulou, M., Linnemann, V. (1997). WIND: A warehouse for internet data. In: Small, C., Douglas, P., Johnson, R., King, P., Martin, N. (eds) Advances in Databases. BNCOD 1997. Lecture Notes in Computer Science, vol 1271. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63263-8_20
DOI: https://doi.org/10.1007/3-540-63263-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63263-4
Online ISBN: 978-3-540-69254-6
eBook Packages: Springer Book Archive