MC-Tree: Improving Bayesian Anytime Classification

Kranen, Philipp; Günnemann, Stephan; Fries, Sergej; Seidl, Thomas

doi:10.1007/978-3-642-13818-8_19

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6187)

Abstract

In scientific databases, large amounts of data are collected to create knowledge repositories for deriving new insights or planning further experiments. These databases can be used to train classifiers that later categorize new data tuples. However, the large amounts of data can make classification time-consuming, e.g. for nearest neighbor classifiers or kernel density estimators. Anytime classifiers avoid this drawback: they can be interrupted at any time, and the quality of the result improves with the time allowed. Interruptible classifiers are especially useful when newly arriving data has to be classified on demand, e.g. during a running experiment. A statistical approach to anytime classification has recently been proposed using Bayes classification on kernel density estimates.
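As a minimal sketch of the idea of Bayes classification on kernel density estimates, the following Python code estimates each class-conditional density with Gaussian kernels centred on the training points and picks the class with the highest posterior. The function names, the isotropic Gaussian kernel, and the fixed `bandwidth` are assumptions made for illustration; they are not taken from the paper.

```python
import math

def gaussian_kernel(x, center, bandwidth):
    """Isotropic Gaussian kernel density at x, centred on one training point."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    norm = (2.0 * math.pi * bandwidth ** 2) ** (d / 2.0)
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2)) / norm

def kde_bayes_posteriors(x, training, bandwidth=1.0):
    """training maps class label -> list of points; returns label -> posterior."""
    total = sum(len(points) for points in training.values())
    scores = {}
    for label, points in training.items():
        prior = len(points) / total
        # Kernel density estimate of p(x | label): average over all kernels.
        density = sum(gaussian_kernel(x, p, bandwidth) for p in points) / len(points)
        scores[label] = prior * density
    evidence = sum(scores.values())
    if evidence == 0.0:
        # Query is far from all data: fall back to a uniform posterior.
        return {label: 1.0 / len(scores) for label in scores}
    return {label: s / evidence for label, s in scores.items()}

def classify(x, training, bandwidth=1.0):
    """Return the label with the highest posterior probability."""
    posteriors = kde_bayes_posteriors(x, training, bandwidth)
    return max(posteriors, key=posteriors.get)

# Hypothetical toy data, for illustration only.
training = {"a": [(0.0, 0.0), (0.0, 1.0)], "b": [(5.0, 5.0), (5.0, 6.0)]}
label = classify((0.2, 0.3), training)
```

Every query here evaluates one kernel per training object, which illustrates why classification cost grows with the size of the database.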

In this paper, we present a novel data structure called MC-Tree (Multi-Class Tree) that significantly improves Bayesian anytime classification. The tree stores a hierarchy of mixture densities that represent objects from several classes. Data transformations are used during tree construction to optimize the condition of the tree with respect to multiple classes. Anytime classification is achieved through novel query-dependent model refinement approaches that take the entropy of the current mixture components into account. Our experimental evaluation shows that the MC-Tree outperforms previous approaches in terms of anytime classification accuracy.
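To illustrate what query-dependent, entropy-aware refinement over a hierarchy of mixture densities could look like, the sketch below keeps a frontier of tree nodes and, while time remains, replaces one frontier node by its children. The `Node` layout, the isotropic Gaussian summaries, and the priority "density at the query times class entropy of the node" are hypothetical choices for this sketch; the abstract does not specify the actual MC-Tree structure or refinement strategies.

```python
import math
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    mean: tuple      # centre of the Gaussian summarising the objects below this node
    var: float       # isotropic variance of that Gaussian
    counts: dict     # class label -> number of objects below this node
    children: list = field(default_factory=list)

def density(x, node):
    """Isotropic Gaussian density of the node's summary at the query x."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, node.mean))
    return math.exp(-sq_dist / (2.0 * node.var)) / (2.0 * math.pi * node.var) ** (d / 2.0)

def class_entropy(node):
    """Entropy (in bits) of the class distribution stored in the node."""
    n = sum(node.counts.values())
    return -sum(c / n * math.log2(c / n) for c in node.counts.values() if c > 0)

def posteriors(x, frontier):
    """Bayes posteriors from the current mixture model (the frontier)."""
    scores = {}
    for node in frontier:
        dens = density(x, node)
        for label, count in node.counts.items():
            # prior * class-conditional density is proportional to count * density
            scores[label] = scores.get(label, 0.0) + count * dens
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()} if total > 0 else scores

def anytime_classify(x, root, max_steps):
    """Refine the model for at most max_steps steps, then classify x."""
    frontier = [root]
    for _ in range(max_steps):   # each step stands for one unit of available time
        candidates = [n for n in frontier if n.children]
        if not candidates:
            break
        # Assumed priority: nodes close to the query with mixed classes first.
        pick = max(candidates, key=lambda n: density(x, n) * class_entropy(n))
        frontier.remove(pick)
        frontier.extend(pick.children)
    post = posteriors(x, frontier)
    return max(post, key=post.get), post

# Hypothetical two-level tree, for illustration only.
leaf_a = Node(mean=(0.0, 0.0), var=1.0, counts={"a": 2})
leaf_b = Node(mean=(5.0, 5.0), var=1.0, counts={"b": 2})
root = Node(mean=(2.5, 2.5), var=10.0, counts={"a": 2, "b": 2}, children=[leaf_a, leaf_b])
```

With `max_steps=0` the coarse root model cannot separate the two classes; a single refinement step is enough for a confident decision near either leaf.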

Author information

Authors and Affiliations

Data Management and Data Exploration Group, RWTH Aachen University, Germany
Philipp Kranen, Stephan Günnemann, Sergej Fries & Thomas Seidl

Editor information

Editors and Affiliations

Institute of Computer Science, University of Heidelberg, 69120, Heidelberg, Germany
Michael Gertz
Dept. of Computer Science and Genome Center, University of California, One Shields Avenue, 95616, Davis, CA, USA
Bertram Ludäscher

About this paper

Cite this paper

Kranen, P., Günnemann, S., Fries, S., Seidl, T. (2010). MC-Tree: Improving Bayesian Anytime Classification. In: Gertz, M., Ludäscher, B. (eds) Scientific and Statistical Database Management. SSDBM 2010. Lecture Notes in Computer Science, vol 6187. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13818-8_19

DOI: https://doi.org/10.1007/978-3-642-13818-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13817-1
Online ISBN: 978-3-642-13818-8
eBook Packages: Computer Science, Computer Science (R0)