Optimal unions of hidden classes

  • Radek Hrebik
  • Jaromir Kukal
  • Josef Jablonsky
Original Paper


Cluster analysis is a traditional tool for multivariate data processing. Using the k-means method, we can split a pattern set into a given number of clusters. These clusters can be used for the final classification into known output classes. This paper focuses on various approaches to an optimal union of hidden classes. The resulting tasks are binary programming or convex optimization problems. Another way of obtaining hidden classes is to design an imperfect classifier system. A novel context out learning approach is also discussed, in which simple classifiers form the basis of a system of hidden classes that are easy to unite into output classes using the optimal algorithm. All these approaches are useful in many applications, including econometric research. There are two main methodologies, supervised and unsupervised learning, based on a given pattern set with known or unknown output classification, respectively. With supervised learning, we can combine context out learning with the optimal union of hidden classes to obtain the final classifier. With unsupervised learning, we begin with cluster analysis or a similar approach to obtain the hidden class system for the subsequent optimal union. Therefore, the optimal union algorithm is widely applicable to classification tasks of any kind. The presented techniques are demonstrated on an artificial pattern set and on real data related to crisis prediction based on the clustering of macroeconomic indicators.
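
The idea of uniting hidden classes into output classes can be illustrated with a minimal sketch. It is not the paper's formulation, which is not reproduced in this abstract. It assumes the simplest binary program one could write for the task: each hidden class is assigned to exactly one output class, and the number of misclassified training patterns is minimized. Under that assumption the problem decouples per hidden class, so a row-wise maximum over the contingency table is optimal. The function name `optimal_union` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def optimal_union(hidden, output):
    """Assign every hidden class to one output class with minimal training error.

    Assumed model (illustrative only): binary variables x[h][k], exactly one
    k per hidden class h, maximizing sum of N[h][k] * x[h][k], where N is the
    contingency table of hidden versus output classes. With no further
    constraints, picking the most frequent output class per row is optimal.
    """
    # Build the contingency table N[h][k] from paired labels.
    table = defaultdict(Counter)
    for h, k in zip(hidden, output):
        table[h][k] += 1

    # Row-wise argmax; ties are broken by the smallest label for determinism.
    mapping = {
        h: min(counts, key=lambda k: (-counts[k], k))
        for h, counts in table.items()
    }

    # Patterns whose output class differs from the class their cluster joins.
    errors = sum(
        sum(counts.values()) - counts[mapping[h]]
        for h, counts in table.items()
    )
    return mapping, errors


# Toy data: hidden classes could come from k-means, output classes are known.
hidden = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
output = ["A", "A", "B", "B", "B", "A", "A", "A", "B", "A"]
mapping, errors = optimal_union(hidden, output)
print(mapping, errors)
```

Here four hidden classes are united into two output classes. Any additional constraints or a different objective would break the row-wise decoupling and require a genuine binary or convex solver.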


Keywords: Classification; Cluster analysis; Binary programming; Convex programming; Cluster union; Crisis prediction



The authors would like to acknowledge the support of the research grants SGS 17/196/OHK4/3T/14 and SGS 17/197/OHK4/3T/14.



Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. FNSPE, CTU in Prague, Prague 2, Czech Republic
  2. FIS, University of Economics, Prague 3, Czech Republic
