Model-Independent Bounding of the Supports of Boolean Formulae in Binary Data

Bykowski, Artur; Seppänen, Jouni K.; Hollmén, Jaakko

doi:10.1007/978-3-540-44497-8_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2682)

Abstract

Data mining algorithms such as the Apriori method for finding frequent sets in sparse binary data can be used for efficient computation of a large number of summaries from huge data sets. The collection of frequent sets gives a collection of marginal frequencies about the underlying data set. Sometimes, we would like to use a collection of such marginal frequencies instead of the entire data set (e.g. when the original data is inaccessible for confidentiality reasons) to compute other interesting summaries. Using combinatorial arguments, we may obtain tight upper and lower bounds on the values of inferred summaries. In this paper, we consider a class of summaries wider than frequent sets, namely that of frequencies of arbitrary Boolean formulae. Given frequencies of a number of any different Boolean formulae, we consider the problem of finding tight bounds on the frequency of another arbitrary formula. We give a general formulation of the problem of bounding formula frequencies given some background information, and show how the bounds can be obtained by solving a linear programming problem. We illustrate the accuracy of the bounds by giving empirical results on real data sets.
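
The linear programming view mentioned in the abstract can be illustrated with a minimal sketch. The formulation used here is an assumption made for illustration, not necessarily the one in the chapter: one nonnegative variable per truth assignment of the items (the fraction of rows with that assignment), one equality constraint per known formula frequency, and the query frequency as the objective to minimise and maximise. The names `solve_square` and `frequency_bounds` are hypothetical, and the brute-force vertex enumeration stands in for a real LP solver; it is only practical for a handful of items and assumes the constraints are consistent and linearly independent.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_square(rows, rhs):
    """Solve a square linear system exactly; return None if it is singular."""
    n = len(rows)
    aug = [list(row) + [b] for row, b in zip(rows, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row (Gauss-Jordan).
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def frequency_bounds(num_vars, known, query):
    """Return (lower, upper) bounds on the frequency of `query`.

    known: list of (formula, frequency) pairs; each formula is a function
    of num_vars Boolean arguments. Illustrative sketch only.
    """
    worlds = list(product([False, True], repeat=num_vars))
    # First constraint: the fractions over all truth assignments sum to 1.
    rows = [[Fraction(1)] * len(worlds)]
    rhs = [Fraction(1)]
    # One constraint per known formula: assignments satisfying it sum to its frequency.
    for formula, freq in known:
        rows.append([Fraction(int(formula(*w))) for w in worlds])
        rhs.append(Fraction(freq))
    objective = [int(query(*w)) for w in worlds]

    # The optimum of a bounded LP is attained at a vertex, so try every
    # basis (choice of as many columns as constraints) and keep feasible ones.
    values = []
    for basis in combinations(range(len(worlds)), len(rows)):
        sub = [[row[j] for j in basis] for row in rows]
        x = solve_square(sub, rhs)
        if x is not None and all(v >= 0 for v in x):
            values.append(sum(objective[j] * v for j, v in zip(basis, x)))
    if not values:
        raise ValueError("no feasible basis (inconsistent or redundant constraints)")
    return min(values), max(values)


# Two items with known frequencies f(A) = 3/5 and f(B) = 7/10.
known = [
    (lambda a, b: a, Fraction(3, 5)),
    (lambda a, b: b, Fraction(7, 10)),
]
lower, upper = frequency_bounds(2, known, lambda a, b: a and b)
print(lower, upper)  # 3/10 3/5
```

For this toy input the result agrees with the familiar combinatorial bounds max(0, f(A) + f(B) - 1) and min(f(A), f(B)) on the frequency of A ∧ B. Exact `Fraction` arithmetic is used so that the feasibility test is not affected by rounding.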

Author information

Authors and Affiliations

LISI, INSA-Lyon, Bât. Blaise Pascal, 20, ave A. Einstein, F-69621 Cedex, Villeurbanne, France
Artur Bykowski
Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, 02015 HUT, Finland
Jouni K. Seppänen & Jaakko Hollmén

Editor information

Editors and Affiliations

Dipartimento di Informatica, Università di Torino, Italy
Rosa Meo
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
Pier Luca Lanzi
Nokia Research Center, Nokia Group, P.O.Box 407, FIN-00045, Finland
Mika Klemettinen

About this chapter

Cite this chapter

Bykowski, A., Seppänen, J.K., Hollmén, J. (2004). Model-Independent Bounding of the Supports of Boolean Formulae in Binary Data. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_12

DOI: https://doi.org/10.1007/978-3-540-44497-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22479-2
Online ISBN: 978-3-540-44497-8
eBook Packages: Springer Book Archive
