Applications, Variants, and Extensions of Redescription Mining

Galbrun, Esther; Miettinen, Pauli

doi:10.1007/978-3-319-72889-6_3

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

Redescription mining is a data analysis task that aims at finding distinct common characterizations of the same objects. Having defined the core problem and presented algorithmic techniques for solving it, we look in this chapter at some of the applications, variants, and extensions of redescription mining. We start by outlining different applications, as examples of how the method can be used in various domains. Next, we present two problem variants, namely, relational redescription mining and storytelling. The former aims at finding alternative descriptions for groups of objects in a relational data set, while the goal of the latter is to build a sequence of related queries that establishes a connection between two given queries. Finally, we point out extensions of the task that constitute possible directions for future research. In particular, we discuss how redescription mining could be augmented with richer query languages and consider going beyond pairs of queries to multiple descriptions.
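To make the two core notions of the abstract concrete, here is a minimal Python sketch under simplifying assumptions: objects carry Boolean attributes split into a left and a right view (the `L:`/`R:` prefixes are a hypothetical convention), and queries are restricted to conjunctions of attributes, far poorer than the query languages discussed in the book. The accuracy of a redescription is taken to be the Jaccard index of the supports of its two queries, a common choice in redescription mining. For storytelling, the sketch simply runs a breadth-first search over a given pool of candidate queries, linking two queries whenever the Jaccard index of their supports reaches a threshold. It is not the storytelling algorithm of Kumar et al. (2008) or of this chapter. The toy data, function names, and threshold are illustrative only.

```python
from collections import deque

# Hypothetical toy data: object -> set of Boolean attributes it has.
# "L:" attributes form the left view, "R:" attributes the right view.
DATA = {
    "o1": {"L:a", "R:x"},
    "o2": {"L:a", "L:b", "R:x", "R:y"},
    "o3": {"L:b", "R:y", "R:z"},
    "o4": {"L:b", "L:c", "R:z"},
    "o5": {"L:c", "R:z"},
}


def support(query, data):
    """Objects satisfying a conjunctive query (a frozenset of attributes)."""
    return frozenset(o for o, attrs in data.items() if query <= attrs)


def jaccard(s, t):
    """Jaccard index |s & t| / |s | t|, taken as 0.0 when both sets are empty."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def accuracy(left_query, right_query, data):
    """Accuracy of the redescription (left_query, right_query)."""
    return jaccard(support(left_query, data), support(right_query, data))


def story(start, goal, candidates, data, threshold):
    """Breadth-first search for a chain with the fewest queries from start to
    goal in which each consecutive pair has support Jaccard >= threshold.
    Returns the chain as a list of queries, or None if no chain exists."""
    supp = {q: support(q, data) for q in set(candidates) | {start, goal}}
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the parent links back to start, then reverse.
            chain = []
            while current is not None:
                chain.append(current)
                current = parent[current]
            return chain[::-1]
        for nxt, nxt_supp in supp.items():
            if nxt not in parent and jaccard(supp[current], nxt_supp) >= threshold:
                parent[nxt] = current
                queue.append(nxt)
    return None


if __name__ == "__main__":
    q = frozenset
    print(accuracy(q({"L:a"}), q({"R:x"}), DATA))  # 1.0
    pool = [q({a}) for a in ("L:b", "R:x", "R:y", "R:z")]
    chain = story(q({"L:a"}), q({"L:c"}), pool, DATA, threshold=0.3)
    print([next(iter(s)) for s in chain])  # ['L:a', 'R:y', 'L:b', 'R:z', 'L:c']
```

Lowering the threshold admits weaker links and thus longer stories; with `threshold=0.5` the same pool yields no chain from `L:a` to `L:c`, because the only query linked to `L:a` at that level is `R:x`, which leads nowhere new.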

Notes

1. http://www.geneontology.org/. Accessed 25 Oct 2017.
2. Units of fluorescent intensity depend on the measuring device and the procedure used, hence they are called arbitrary units.
3. http://www.adni-info.org/. Accessed 25 Oct 2017.
4. The term is used in its Grinnellian sense, see Soberón and Nakamura (2009).
5. http://www.culturalsciences.info/AlyaWeb/. Accessed 25 Oct 2017.
6. https://www.ncbi.nlm.nih.gov/pubmed/. Accessed 25 Oct 2017.
7. Story adapted from https://bioinformatics.cs.vt.edu/connectingthedots/stories.html, case study 3 (accessed 25 Oct 2017).
8. The result should perhaps be called ‘tridescription’ or ‘multi-description’, though.
9. http://www.w3.org/TR/rdf-syntax-grammar. Accessed 25 Oct 2017.
10. http://www.w3.org/TR/rdf-sparql-query. Accessed 25 Oct 2017.

References

Gaidar D (2015) Mining redescriptors in Staphylococcus aureus data. Master’s thesis, Universität des Saarlandes, Saarbrücken
Galbrun E (2013) Methods for redescription mining. PhD thesis, Department of Computer Science, University of Helsinki
Galbrun E, Kimmig A (2014) Finding relational redescriptions. Mach Learn 96(3):225–248, https://doi.org/10.1007/s10994-013-5402-3
Galbrun E, Miettinen P (2012) From black and white to full color: Extending redescription mining outside the Boolean world. Stat Anal Data Min 5(4):284–303, https://doi.org/10.1002/sam.11145
Galbrun E, Miettinen P (2016) Analysing political opinions using redescription mining. In: IEEE International Conference on Data Mining Workshops, pp 422–427, https://doi.org/10.1109/ICDMW.2016.0066
Galbrun E, Tang H, Fortelius M, Žliobaitė I (2017) Computational biomes: The ecometrics of large mammal teeth. Palaeontol Electron. Submitted
Goel N, Hsiao MS, Ramakrishnan N, Zaki MJ (2010) Mining complex Boolean expressions for sequential equivalence checking. In: Proceedings of the 19th IEEE Asian Test Symposium (ATS’10), pp 442–447, https://doi.org/10.1109/ATS.2010.81
Hossain MS, Butler P, Boedihardjo AP, Ramakrishnan N (2012a) Storytelling in entity networks to support intelligence analysts. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’12), pp 1375–1383, https://doi.org/10.1145/2339530.2339742
Hossain MS, Gresock J, Edmonds Y, Helm RF, Potts M, Ramakrishnan N (2012b) Connecting the dots between PubMed abstracts. PLoS ONE 7(1):1–23, https://doi.org/10.1371/journal.pone.0029509
Kohonen T (1989) Self-organization and associative memory. Springer, New York
Kumar D (2007) Redescription mining: Algorithms and applications in bioinformatics. PhD thesis, Department of Computer Science, Virginia Polytechnic Institute and State University
Kumar D, Ramakrishnan N, Helm RF, Potts M (2008) Algorithms for storytelling. IEEE Trans Knowl Data Eng 20(6):736–751, https://doi.org/10.1109/TKDE.2008.32
van Leeuwen M, Galbrun E (2015) Association discovery in two-view data. IEEE Trans Knowl Data Eng 27(12):3190–3202, https://doi.org/10.1109/TKDE.2015.2453159
Metzler S, Miettinen P (2015a) Join size estimation on Boolean tensors of RDF data. In: Proceedings of the 24th International Conference on the World Wide Web (WWW’15), pp 77–78, https://doi.org/10.1145/2740908.2742738
Metzler S, Miettinen P (2015b) On defining SPARQL with Boolean tensor algebra. https://doi.org/10.1145/2740908.2742738
Mihelčić M, Džeroski S, Lavrač N, Šmuc T (2017) A framework for redescription set construction. Expert Syst Appl 68:196–215, https://doi.org/10.1016/j.eswa.2016.10.012
Mihelčić M, Šimić G, Babić-Leko M, Lavrač N, Džeroski S, Šmuc T (2017) Using redescription mining to relate clinical and biological characteristics of cognitively impaired and Alzheimer’s disease patients.
Mihelčić M, Džeroski S, Lavrač N, Šmuc T (2016) Redescription mining with multi-target predictive clustering trees. In: Proceedings of the 4th International Workshop on the New Frontiers in Mining Complex Patterns (NFMCP’15), pp 125–143, https://doi.org/10.1007/978-3-319-39315-5_9
Nijssen S, Kok JN (2005) The Gaston tool for frequent subgraph mining. Proceedings of the International Workshop on Graph-Based Tools (GraBaTs 2004) 127(1):77–87, https://doi.org/10.1016/j.entcs.2004.12.039
Pearson RG, Dawson TP (2003) Predicting the impacts of climate change on the distribution of species: Are bioclimate envelope models useful? Glob Ecol Biogeogr 12:361–371, https://doi.org/10.1046/j.1466-822X.2003.00042.x
Phillips SJ, Anderson RP, Schapire RE (2006) Maximum entropy modeling of species geographic distributions. Ecol Model 190(3):231–259, https://doi.org/10.1016/j.ecolmodel.2005.03.026
Ramakrishnan N, Zaki MJ (2009) Redescription mining and applications in bioinformatics. In: Chen J, Lonardi S (eds) Biological Data Mining, Chapman and Hall/CRC, Boca Raton, FL
Ramakrishnan N, Kumar D, Mishra B, Potts M, Helm RF (2004) Turning CARTwheels: An alternating algorithm for mining redescriptions. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04), pp 266–275, https://doi.org/10.1145/1014052.1014083
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326, https://doi.org/10.1126/science.290.5500.2323
Shvaiko P, Euzenat J (2005) A survey of schema-based matching approaches. J Data Semantics IV 3730:146–171, https://doi.org/10.1007/11603412_5
Singh J, Kumar D, Ramakrishnan N, Singhal V, Jervis J, Garst JF, Slaughter SM, DeSantis AM, Potts M, Helm RF (2005) Transcriptional response of Saccharomyces cerevisiae to desiccation and rehydration. Appl Environ Microbiol 71(12):8752–8763, https://doi.org/10.1128/AEM.71.12.8752-8763.2005
Soberón J, Nakamura M (2009) Niches and distributional areas: Concepts, methods, and assumptions. Proc Natl Acad Sci USA 106(Supplement 2):19644–19650, https://doi.org/10.1073/pnas.0901637106
Suchanek FM, Kasneci G, Weikum G (2007) YAGO: A core of semantic knowledge. In: Proceedings of the 16th International Conference on World Wide Web (WWW’07), pp 697–706, https://doi.org/10.1145/1242572.1242667
Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323, https://doi.org/10.1126/science.290.5500.2319
Thuiller W, Lafourcade B, Engler R, Araújo MB (2009) BIOMOD – A platform for ensemble forecasting of species distributions. Ecography 32(3):369–373, https://doi.org/10.1111/j.1600-0587.2008.05742.x
Watts A, Ke D, Wang Q, Pillay A, Nicholson-Weller A, Lee JC (2005) Staphylococcus aureus strains that express serotype 5 or serotype 8 capsular polysaccharides differ in virulence. Infect Immun 73(6):3502–3511, https://doi.org/10.1128/IAI.73.6.3502-3511.2005
Wu H, Vreeken J, Tatti N, Ramakrishnan N (2014) Uncovering the plot: Detecting surprising coalitions of entities in multi-relational schemas. Data Min Knowl Disc 28(5–6):1398–1428, https://doi.org/10.1007/s10618-014-0370-1
Yan X, Han J (2002) gSpan: Graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02), pp 721–724, https://doi.org/10.1109/ICDM.2002.1184038
Zhao L, Zaki MJ, Ramakrishnan N (2006) BLOSOM: A framework for mining arbitrary Boolean expressions. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’06), pp 827–832, https://doi.org/10.1145/1150402.1150511
Zinchenko T, Galbrun E, Miettinen P (2015) Mining predictive redescriptions with trees. In: IEEE International Conference on Data Mining Workshops, pp 1672–1675, https://doi.org/10.1109/ICDMW.2015.123

Author information

Authors and Affiliations

LORIA, Inria Nancy – Grand Est, Villers-lès-Nancy, France
Esther Galbrun
Databases and Information Systems, Max-Planck-Institute for Informatics, Saarbrücken, Germany
Pauli Miettinen

About this chapter

Cite this chapter

Galbrun, E., Miettinen, P. (2017). Applications, Variants, and Extensions of Redescription Mining. In: Redescription Mining. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-72889-6_3

DOI: https://doi.org/10.1007/978-3-319-72889-6_3
Published: 12 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72888-9
Online ISBN: 978-3-319-72889-6
eBook Packages: Computer Science, Computer Science (R0)
