Applications, Variants, and Extensions of Redescription Mining

  • Esther Galbrun
  • Pauli Miettinen
Chapter
Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)

Abstract

Redescription mining is a data analysis task that aims at finding distinct common characterizations of the same objects. After defining the core problem and presenting algorithmic techniques to solve this task, we look in this chapter at some of the applications, variants, and extensions of redescription mining. We start by outlining different applications, as examples of how the method can be used in various domains. Next, we present two problem variants, namely, relational redescription mining and storytelling. The former aims at finding alternative descriptions for groups of objects in a relational data set, while the goal in the latter is to build a sequence of related queries in order to establish a connection between two given queries. Finally, we point out extensions of the task that constitute possible directions for future research. In particular, we discuss how redescription mining could be augmented with richer query languages and consider going beyond pairs of queries to multiple descriptions.

Copyright information

© The Author(s) 2017

Authors and Affiliations

  • Esther Galbrun (1)
  • Pauli Miettinen (2)
  1. LORIA, Inria Nancy – Grand Est, Villers-lès-Nancy, France
  2. Databases and Information Systems, Max-Planck-Institute for Informatics, Saarbrücken, Germany