Mining Bi-sets in Numerical Data

Besson, Jérémy; Robardet, Céline; De Raedt, Luc; Boulicaut, Jean-François

doi:10.1007/978-3-540-75549-4_2

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4747)

Abstract

Thanks to a significant research effort over the last few years, inductive queries on set patterns and complete solvers that can evaluate them on large 0/1 data sets have proved extremely useful. However, in many application domains the raw data is numerical (matrices of real numbers whose dimensions denote objects and properties). Using efficient 0/1 mining techniques therefore requires tedious Boolean property encoding phases. This is the case, for example, in microarray data mining and its impact on knowledge discovery in molecular biology. We consider the possibility of mining numerical data directly to extract collections of relevant bi-sets, i.e., pairs of associated sets of objects and attributes that satisfy user-defined constraints. We not only propose a new pattern domain but also introduce a complete solver for computing the so-called numerical bi-sets. A preliminary experimental validation is given.

Author information

Authors and Affiliations

LIRIS UMR 5205 CNRS/INSA Lyon, Bâtiment Blaise Pascal, F-69621 Villeurbanne, France
Jérémy Besson, Céline Robardet & Jean-François Boulicaut
UMR INRA/INSERM 1235, F-69372 Lyon cedex 08, France
Jérémy Besson
Albert-Ludwigs-Universität Freiburg, Georges-Köhler-Allee, Gebäude 079, D-79110 Freiburg, Germany
Luc De Raedt

Editor information

Sašo Džeroski, Jan Struyf

About this paper

Cite this paper

Besson, J., Robardet, C., De Raedt, L., Boulicaut, JF. (2007). Mining Bi-sets in Numerical Data. In: Džeroski, S., Struyf, J. (eds) Knowledge Discovery in Inductive Databases. KDID 2006. Lecture Notes in Computer Science, vol 4747. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75549-4_2

DOI: https://doi.org/10.1007/978-3-540-75549-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75548-7
Online ISBN: 978-3-540-75549-4
eBook Packages: Computer Science, Computer Science (R0)
