Abstract
Pareto Density Estimation (PDE), as defined in this work, is a method for estimating probability density functions using hyperspheres. The radius of the hyperspheres is derived by optimizing information while minimizing set size. PDE is shown to be a very good estimate for data containing clusters of Gaussian structure. The behavior of the method is demonstrated with respect to cluster overlap, number of clusters, differing variances across clusters, and application to high-dimensional data. For high-dimensional data, PDE is found to be appropriate for the purpose of cluster analysis. The method is tested successfully on a difficult high-dimensional real-world problem: stock picking in falling markets.
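The general idea of a hypersphere-based density estimate can be illustrated with a minimal sketch: for each data point, count how many data points fall inside a hypersphere of fixed radius centered on it. The abstract does not give the formula for the radius, so the sketch below substitutes a placeholder rule (a fixed fraction of the sorted pairwise distances). The function names and the `fraction` parameter are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def pairwise_distances(points):
    """Return the Euclidean distances between all distinct pairs of points."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


def percentile_radius(points, fraction=0.2):
    """Placeholder radius rule (an assumption, not the paper's derivation):
    pick the sorted pairwise distance at position fraction * (number of pairs)."""
    dists = sorted(pairwise_distances(points))
    index = min(len(dists) - 1, int(fraction * len(dists)))
    return dists[index]


def hypersphere_counts(points, radius):
    """For each point, count the data points lying inside the hypersphere
    of the given radius centered on it (the center itself is included)."""
    return [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]


# Two synthetic Gaussian clusters in two dimensions (illustrative data only).
random.seed(0)
cluster_a = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
cluster_b = [(random.gauss(6, 1), random.gauss(6, 1)) for _ in range(100)]
data = cluster_a + cluster_b

radius = percentile_radius(data)
counts = hypersphere_counts(data, radius)
```

Each entry of `counts` is an unnormalized density value at the corresponding data point. The same functions work unchanged for points of any dimension, since `math.dist` accepts coordinate sequences of equal length.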
Copyright information
© 2005 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Ultsch, A. (2005). Pareto Density Estimation: A Density Estimation for Knowledge Discovery. In: Baier, D., Wernecke, KD. (eds) Innovations in Classification, Data Science, and Information Systems. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-26981-9_12
DOI: https://doi.org/10.1007/3-540-26981-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23221-6
Online ISBN: 978-3-540-26981-6
eBook Packages: Mathematics and Statistics (R0)