Abstract
We briefly introduce the notion of an inductive database, explain its relation to constraint-based data mining, and illustrate it with an example. We then discuss constraints and constraint-based data mining in more detail. Finally, we give an overview of recent developments in the area, focusing on those made within the IQ project and presented in a recent volume with the same title as this paper, edited by the author together with Bart Goethals and Panče Panov, and published by Springer.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Džeroski, S. (2011). Inductive Databases and Constraint-Based Data Mining. In: Valtchev, P., Jäschke, R. (eds) Formal Concept Analysis. ICFCA 2011. Lecture Notes in Computer Science, vol 6628. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20514-9_1
DOI: https://doi.org/10.1007/978-3-642-20514-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20513-2
Online ISBN: 978-3-642-20514-9
eBook Packages: Computer Science (R0)