Interactivity, Scalability and Resource Control for Efficient KDD Support in DBMS

Gimbel, Matthias; Klein, Michael; Lockemann, P. C.

doi:10.1007/978-3-540-44497-8_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2682)

Abstract

The conflict between resource consumption and query performance in the data mining context often has no satisfactory solution. This is in sharp contrast to the analysts' need for interactive response times, and it has made the seamless integration of data mining operators into common multiuser database systems a difficult and (so far) not very successful task. This paper describes an approach that combines preprocessing and data mining operators into one common KDD-aware implementation algebra such that interactivity, scalability and resource efficiency can be achieved simultaneously. The basic idea of our framework is pipelining. However, since pipelines are in danger of blocking, we introduce controlled ordering, cardinality and special-value properties of the data stream across the whole query tree, up to the complex data mining operators. The framework builds on a specialized index that is basically an extension of the UB-Tree and efficiently provides various data orderings. These orderings and the remaining properties are then exploited by the KDD-algebra operators to release results and internal data structures early enough to allow pipelined, resource-efficient query processing with interactive response times. This paper describes the framework and demonstrates its benefits in preprocessing and in the parallel and interactive detection of outliers.
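
The role that an ordering property plays in keeping a pipeline from blocking can be illustrated with a small sketch. It is not the authors' KDD algebra or their index; it is a generic grouped sum written under the assumption that the input stream arrives sorted by the grouping key. The names `streaming_group_sum` and `sorted_source` are hypothetical and chosen only for this illustration. Because of the ordering, a group is known to be complete as soon as the key changes, so its result can be released and its internal state dropped without waiting for the end of the input.

```python
from typing import Iterable, Iterator, Tuple


def streaming_group_sum(
    rows: Iterable[Tuple[str, float]],
) -> Iterator[Tuple[str, float]]:
    """Sum values per key, ASSUMING rows arrive sorted by key.

    The ordering guarantees that a group is finished once the key changes,
    so the operator holds state for one group at a time and never blocks
    until the whole input has been read.
    """
    current_key = None
    total = 0.0
    seen_any = False
    for key, value in rows:
        if seen_any and key != current_key:
            yield current_key, total  # release the finished group early
            total = 0.0               # discard the state of that group
        current_key = key
        total += value
        seen_any = True
    if seen_any:
        yield current_key, total      # flush the last group


def sorted_source() -> Iterator[Tuple[str, float]]:
    """Hypothetical stand-in for an index scan that delivers sorted rows."""
    yield from [("a", 1.0), ("a", 2.0), ("b", 5.0), ("c", 0.5), ("c", 0.5)]


# Results are produced lazily, one group at a time, as the input streams by.
results = list(streaming_group_sum(sorted_source()))
```

Without the ordering guarantee, the same operator would have to keep one running total per distinct key until the input is exhausted, which is the kind of blocking and memory growth the abstract describes as a danger for pipelines.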

Author information

Authors and Affiliations

Fakultät für Informatik, Universität Karlsruhe, Am Fasanengarten 5, 76128, Karlsruhe, Germany
Matthias Gimbel, Michael Klein & P. C. Lockemann

Editor information

Editors and Affiliations

Dipartimento di Informatica, Università di Torino, Italy
Rosa Meo
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
Pier Luca Lanzi
Nokia Research Center, Nokia Group, P.O.Box 407, FIN-00045, Finland
Mika Klemettinen

About this chapter

Cite this chapter

Gimbel, M., Klein, M., Lockemann, P.C. (2004). Interactivity, Scalability and Resource Control for Efficient KDD Support in DBMS. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_9

DOI: https://doi.org/10.1007/978-3-540-44497-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22479-2
Online ISBN: 978-3-540-44497-8
eBook Packages: Springer Book Archive
