Clusters of high-dimensional interval data and related Boolean functions of events in Euclidean space

Abstract

Clustering interval data has been studied for decades. When the data have d real-valued attributes, high-dimensional interval data can be expressed as hyperrectangles (d-orthotopes) in \(\mathbb {R}^d\). This paper investigates such high-dimensional interval data, viewed as Cartesian products of intervals or, equivalently, vectors of intervals. For the efficient computation of related Boolean functions, we identify several useful properties of the vertices and edges of a graph generated from the given events. We also study lower- and upper-bounded orthants in \(\mathbb {R}^d\) as events, and show that a polynomial-time algorithm exists for calculating the probability of the union of such events. The algorithm is obtained by constructing a suitable partial order relation based on a recursive projection onto lower-dimensional spaces. Illustrative real-life applications are presented.
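To make the objects in the abstract concrete, the following is a minimal sketch of a brute-force baseline. It is not the polynomial-time algorithm of the paper. It represents each hyperrectangle as a list of (lower, upper) intervals and computes the volume of a union by inclusion-exclusion, using the fact that an intersection of hyperrectangles is again a hyperrectangle. The function names (`intersect`, `volume`, `union_volume`) are hypothetical, and the running time is exponential in the number of boxes. Under a uniform distribution on a bounding region, dividing the result by the region's volume would give the probability of the union.

```python
from itertools import combinations
from math import prod


def intersect(boxes):
    """Intersect hyperrectangles given as lists of (lo, hi) intervals.

    Returns the intersection as a list of intervals, or None if it is
    empty or has zero volume.
    """
    d = len(boxes[0])
    result = []
    for k in range(d):
        lo = max(b[k][0] for b in boxes)  # largest lower bound in coordinate k
        hi = min(b[k][1] for b in boxes)  # smallest upper bound in coordinate k
        if lo >= hi:
            return None
        result.append((lo, hi))
    return result


def volume(box):
    """Volume of one hyperrectangle: the product of its side lengths."""
    return prod(hi - lo for lo, hi in box)


def union_volume(boxes):
    """Volume of the union by inclusion-exclusion (exponential baseline)."""
    total = 0.0
    for r in range(1, len(boxes) + 1):
        sign = 1 if r % 2 == 1 else -1
        for subset in combinations(boxes, r):
            inter = intersect(subset)
            if inter is not None:
                total += sign * volume(inter)
    return total


# Two overlapping 2-dimensional boxes: 4 + 4 - 1 (overlap) = 7.
a = [(0, 2), (0, 2)]
b = [(1, 3), (1, 3)]
print(union_volume([a, b]))
```

The exponential cost of this baseline is the motivation for more structured approaches such as the partial-order construction mentioned in the abstract.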

Acknowledgements

It is an honor for the first author to have his academic father, Professor András Prékopa (1929–2016), as the second author of this paper. The paper's main topic, the probability of Boolean functions of high-dimensional interval data, was studied in 2019–2020 solely by the first author, who presented the main idea at ISAIM (International Symposium on Artificial Intelligence and Mathematics) in January 2020 in Fort Lauderdale, Florida. Work on Boolean functions of hyperrectangles and the related binomial moment problem formulation was initially suggested by Professor Prékopa in May 2016. The first author dearly misses him.

Author information

Corresponding author

Correspondence to Jinwook Lee.

Additional information

András Prékopa: Deceased 18 September 2016.

About this article

Cite this article

Lee, J., Prékopa, A. Clusters of high-dimensional interval data and related Boolean functions of events in Euclidean space. Ann Oper Res (2021). https://doi.org/10.1007/s10479-021-03951-2

Keywords

  • Clustering
  • Multivariate interval data
  • Orthant
  • Hyperrectangle
  • Graph
  • Spanning tree
  • Boolean functions
  • Euclidean space
  • Probability bounds