Abstract
Several original methods for conditional density estimation (CDE) have recently been developed. The full conditional density of the target variables carries far more information than regression or quantile regression estimates. Still, there are only a few independent experimental investigations of these methods, especially for a multidimensional target variable, and this paper aims to address this gap. We consider several approaches to the conditional density estimation problem: kernel density estimation, reduction to binary classification, Naïve Bayes, Bayesian networks, the “varying coefficient” approach, random forests, and Approximate Bayesian Computation. We apply these methods to various datasets and examine how their performance depends on different parameters, including the number of irrelevant covariates and the smoothness and flatness of the distribution. The datasets include artificial models with the required properties, for which the exact value of the CDE evaluation measure is known, and a real-world dataset arising from the problem of structure recognition from XANES spectra, which reduces to a regression task with a complex multimodal probability distribution of the target variable. Special attention is paid to the computation of the evaluation measure: methods based on direct optimization of the loss employ an imprecise but fast approximation of it, which results in poor prediction quality for datasets with a small target variance.
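To make the last point of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional covariate and target, a simple Gaussian kernel estimator, and a squared-error CDE loss of the form that is common in the literature (mean of ∫ f̂(y|x)² dy minus twice the mean of f̂(y_i|x_i)); the abstract does not state which evaluation measure the paper uses. The function names, bandwidths, grids, and synthetic data are all hypothetical. The sketch evaluates the same estimator on a fine and on a coarse grid of target values, which is one way a fast approximation of the loss can become imprecise when the target variance is small.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def kernel_cde(x_train, y_train, x_query, y_grid, hx, hy):
    """Kernel estimate of f(y | x) on y_grid for 1-D x and y (assumed setup).

    Returns an array of shape (len(x_query), len(y_grid)).
    """
    # Weights of the training points for each query point, normalised to sum to 1.
    wx = gaussian_kernel((x_query[:, None] - x_train[None, :]) / hx)
    wx /= wx.sum(axis=1, keepdims=True)
    # Kernel density in y centred at each training target.
    ky = gaussian_kernel((y_grid[:, None] - y_train[None, :]) / hy) / hy
    return wx @ ky.T


def cde_loss(dens, y_grid, y_true):
    """Empirical squared-error CDE loss (assumed form, up to a constant).

    The integral of the squared density is approximated by a Riemann sum
    on y_grid, and f(y_i | x_i) by linear interpolation on the same grid.
    """
    dy = y_grid[1] - y_grid[0]
    sq_term = np.mean(np.sum(dens ** 2, axis=1) * dy)
    f_at_true = np.array(
        [np.interp(y, y_grid, d) for y, d in zip(y_true, dens)]
    )
    return sq_term - 2.0 * np.mean(f_at_true)


# Hypothetical synthetic data with a small target variance around y = x.
rng = np.random.default_rng(0)
sigma = 0.02
x_train = rng.uniform(0.0, 1.0, 400)
y_train = x_train + sigma * rng.normal(size=400)
x_test = rng.uniform(0.1, 0.9, 100)
y_test = x_test + sigma * rng.normal(size=100)

# Same estimator, evaluated on a fine and on a coarse grid of target values.
fine_grid = np.linspace(-0.5, 1.5, 2001)
coarse_grid = np.linspace(-0.5, 1.5, 21)
dens_fine = kernel_cde(x_train, y_train, x_test, fine_grid, hx=0.05, hy=0.02)
dens_coarse = kernel_cde(x_train, y_train, x_test, coarse_grid, hx=0.05, hy=0.02)

loss_fine = cde_loss(dens_fine, fine_grid, y_test)
loss_coarse = cde_loss(dens_coarse, coarse_grid, y_test)
print("CDE loss, fine grid:  ", loss_fine)
print("CDE loss, coarse grid:", loss_coarse)
```

Comparing the two printed values shows how strongly the loss estimate can depend on the grid resolution when the conditional density is narrow relative to the grid spacing. For a multidimensional target the same idea applies, but the grid grows exponentially with the dimension, which is one reason fast approximations are attractive in the first place.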
This work was supported by the Russian Foundation for Basic Research, project 18-02-40029.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Berger, A., Guda, S. (2019). Experimental Analysis of Approaches to Multidimensional Conditional Density Estimation. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2019. Lecture Notes in Computer Science, vol 11832. Springer, Cham. https://doi.org/10.1007/978-3-030-37334-4_3
DOI: https://doi.org/10.1007/978-3-030-37334-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37333-7
Online ISBN: 978-3-030-37334-4
eBook Packages: Computer Science, Computer Science (R0)