
Algorithms for Redescription Mining

  • Esther Galbrun
  • Pauli Miettinen
Chapter
Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)

Abstract

The aim of redescription mining is to find valid redescriptions for given data, query language, similarity relation, and user-specified constraints. In other words, we need to explore the search space of query pairs from the query language, looking for pairs whose supports in the data are similar enough and that satisfy the other constraints. In this chapter, we present the methods that have been proposed to carry out this exploration efficiently. Existing methods fall into three main categories: (1) mine-and-pair approaches, (2) alternating approaches, and (3) approaches that use atomic updates. We consider each category in turn, explaining its common principles and examining algorithms built on them. Next, we compare the methods and discuss their relative strengths and weaknesses. Finally, we consider how to adapt the algorithms to handle input data with missing values.
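To make the search problem concrete, here is a minimal brute-force sketch in Python. It is not one of the three categories of methods covered in the chapter; it simply enumerates every query pair, which is the exhaustive exploration those methods are designed to avoid. Several choices are assumptions made for illustration only: the query language is restricted to short conjunctions of Boolean attributes, the similarity relation is a threshold on the Jaccard index of the two supports, the only additional constraint is a minimum number of shared entities, and the toy data and all function names are hypothetical.

```python
from itertools import combinations


def support(rows, query):
    """Indices of the rows in which every attribute of the conjunctive query is true."""
    return {i for i, row in enumerate(rows) if all(row[a] for a in query)}


def jaccard(s, t):
    """Jaccard index of two sets; taken to be 0.0 when both sets are empty."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def conjunctions(attributes, max_len):
    """All conjunctions of at most max_len attributes (assumed query language)."""
    for k in range(1, max_len + 1):
        yield from combinations(sorted(attributes), k)


def naive_redescriptions(left, right, min_jaccard=0.8, min_support=2, max_len=2):
    """Exhaustively pair queries over the two views and keep the similar pairs.

    left and right describe the same entities, row by row, with disjoint
    attribute sets. A pair is kept if its supports share at least min_support
    entities (assumed constraint) and reach min_jaccard (assumed similarity).
    """
    results = []
    for q_left in conjunctions(left[0].keys(), max_len):
        s_left = support(left, q_left)
        for q_right in conjunctions(right[0].keys(), max_len):
            s_right = support(right, q_right)
            if len(s_left & s_right) < min_support:
                continue
            score = jaccard(s_left, s_right)
            if score >= min_jaccard:
                results.append((q_left, q_right, score))
    # Most similar pairs first.
    return sorted(results, key=lambda r: -r[2])


# Hypothetical toy data: five entities described by two Boolean views.
LEFT = [
    {"a": 1, "b": 1},
    {"a": 1, "b": 0},
    {"a": 1, "b": 1},
    {"a": 0, "b": 1},
    {"a": 0, "b": 0},
]
RIGHT = [
    {"x": 1, "y": 0},
    {"x": 1, "y": 1},
    {"x": 1, "y": 0},
    {"x": 0, "y": 1},
    {"x": 0, "y": 1},
]

found = naive_redescriptions(LEFT, RIGHT)
print(found)
```

On this toy data the only pair that passes the default thresholds is the query `a` on the left with the query `x` on the right, whose supports coincide. The number of candidate pairs grows with the product of the two query spaces, which is why a more efficient exploration strategy is needed on real data.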

Copyright information

© The Author(s) 2017

Authors and Affiliations

  • Esther Galbrun (1)
  • Pauli Miettinen (2)
  1. LORIA, Inria Nancy – Grand Est, Villers-lès-Nancy, France
  2. Databases and Information Systems, Max-Planck-Institute for Informatics, Saarbrücken, Germany
