What Is Redescription Mining

  • Esther Galbrun
  • Pauli Miettinen
Chapter
Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)

Abstract

In scientific investigations, data often differ in nature; for instance, they might originate from distinct sources or be cast over separate terminologies. To gain insight into the phenomenon of interest, an intuitive first task is to identify the correspondences that exist between these different aspects. This is the motivating principle behind redescription mining, a data analysis task that aims at finding distinct common characterizations of the same objects. In this chapter, we provide the basic definitions of redescription mining, including the data model, query languages, similarity measures, p-value calculations, and methods for pruning redundant redescriptions. We also briefly cover related data analysis methods and provide a short history of redescription mining research.
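
The core idea above can be made concrete with a small sketch. It is illustrative only and rests on assumptions not taken from the chapter. The two views are plain lists of records over the same entities, with row i of each view describing the same entity. Queries are ordinary Python predicates. The similarity measure is the Jaccard index of the two support sets, a common choice in redescription mining, though the chapter's own definitions may differ. All names, variables, thresholds, and data values are hypothetical.

```python
from typing import Any, Callable, Dict, List, Set

# Hypothetical representation (an assumption, not the chapter's data model):
# each view is a list of records, and row i of the left view describes the
# same entity as row i of the right view.
Record = Dict[str, Any]
Query = Callable[[Record], bool]


def support(view: List[Record], query: Query) -> Set[int]:
    """Return the indices of the entities for which the query holds."""
    return {i for i, record in enumerate(view) if query(record)}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard index |a & b| / |a | b|; two empty sets score 0.0 by our convention."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def accuracy(left_view: List[Record], right_view: List[Record],
             q_left: Query, q_right: Query) -> float:
    """Similarity of the supports of a candidate redescription (q_left, q_right)."""
    if len(left_view) != len(right_view):
        raise ValueError("both views must describe the same entities")
    return jaccard(support(left_view, q_left), support(right_view, q_right))


# Toy data invented for illustration: climate variables on the left,
# species presence on the right, for five hypothetical locations.
left_view = [
    {"temp_jan": -8.0, "prec_jul": 70.0},
    {"temp_jan": -2.0, "prec_jul": 55.0},
    {"temp_jan": 3.0, "prec_jul": 40.0},
    {"temp_jan": -6.0, "prec_jul": 80.0},
    {"temp_jan": 5.0, "prec_jul": 30.0},
]
right_view = [
    {"moose": True, "wild_boar": False},
    {"moose": True, "wild_boar": True},
    {"moose": False, "wild_boar": True},
    {"moose": True, "wild_boar": False},
    {"moose": False, "wild_boar": True},
]


def q_left(r: Record) -> bool:
    # Hypothetical left-hand query: cold winters and wet summers.
    return r["temp_jan"] < -1.0 and r["prec_jul"] > 50.0


def q_right(r: Record) -> bool:
    # Hypothetical right-hand query: moose present, wild boar absent.
    return bool(r["moose"]) and not r["wild_boar"]


print(support(left_view, q_left))    # {0, 1, 3}
print(support(right_view, q_right))  # {0, 3}
print(round(accuracy(left_view, right_view, q_left, q_right), 3))  # 0.667
```

Here the left query holds for one location (index 1) where the right query does not, so the pair describes nearly, but not exactly, the same entities. An exact redescription would have identical supports and a Jaccard accuracy of 1.0.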

Copyright information

© The Author(s) 2017

Authors and Affiliations

  • Esther Galbrun (1)
  • Pauli Miettinen (2)
  1. LORIA, Inria Nancy – Grand Est, Villers-lès-Nancy, France
  2. Databases and Information Systems, Max-Planck-Institute for Informatics, Saarbrücken, Germany