Applications, Variants, and Extensions of Redescription Mining

Redescription Mining

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

Redescription mining is a data analysis task that aims at finding distinct common characterizations of the same objects. After defining the core problem and presenting algorithmic techniques to solve this task, we look in this chapter at some of the applications, variants, and extensions of redescription mining. We start by outlining different applications, as examples of how the method can be used in various domains. Next, we present two problem variants, namely, relational redescription mining and storytelling. The former aims at finding alternative descriptions for groups of objects in a relational data set, while the goal in the latter is to build a sequence of related queries in order to establish a connection between two given queries. Finally, we point out extensions of the task that constitute possible directions for future research. In particular, we discuss how redescription mining could be augmented with richer query languages and consider going beyond pairs of queries to multiple descriptions.
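As a minimal sketch of the core task summarized above, the following example evaluates two queries over different attributes of the same objects and compares the sets of objects they select. The toy data, the attribute names, and the helper functions `support` and `jaccard` are all invented for illustration, and the use of the Jaccard coefficient to compare supports is an assumption here, not something stated in the abstract.

```python
# Toy data set: five objects described by climate attributes (one view)
# and species occurrences (another view). All values are invented.
objects = {
    "a": {"temp": 21.0, "rain": 300, "has_oak": True, "has_pine": False},
    "b": {"temp": 18.0, "rain": 450, "has_oak": True, "has_pine": False},
    "c": {"temp": 9.0, "rain": 800, "has_oak": False, "has_pine": True},
    "d": {"temp": 16.0, "rain": 400, "has_oak": False, "has_pine": True},
    "e": {"temp": 5.0, "rain": 600, "has_oak": True, "has_pine": False},
}


def q_climate(attrs):
    """Hypothetical query over the climate view."""
    return attrs["temp"] >= 15 and attrs["rain"] < 500


def q_species(attrs):
    """Hypothetical query over the species view."""
    return attrs["has_oak"] and not attrs["has_pine"]


def support(query):
    """Return the set of objects for which the query holds."""
    return {name for name, attrs in objects.items() if query(attrs)}


def jaccard(s, t):
    """Jaccard coefficient of two sets (assumed similarity measure)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


supp_climate = support(q_climate)
supp_species = support(q_species)
similarity = jaccard(supp_climate, supp_species)
print(sorted(supp_climate), sorted(supp_species), similarity)
```

In this sketch, the two queries would form a good redescription only if their supports overlap strongly; how strong the overlap must be is a choice left to the analyst.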


Notes

  1. http://www.geneontology.org/. Accessed 25 Oct 2017.

  2. Units of fluorescent intensity depend on the measuring device and the procedure used, hence they are called arbitrary units.

  3. http://www.adni-info.org/. Accessed 25 Oct 2017.

  4. The term is used in its Grinnellian sense, see Soberón and Nakamura (2009).

  5. http://www.culturalsciences.info/AlyaWeb/. Accessed 25 Oct 2017.

  6. https://www.ncbi.nlm.nih.gov/pubmed/. Accessed 25 Oct 2017.

  7. Story adapted from https://bioinformatics.cs.vt.edu/connectingthedots/stories.html, case study 3 (accessed 25 Oct 2017).

  8. The result should perhaps be called ‘tridescription’ or ‘multi-description’, though.

  9. http://www.w3.org/TR/rdf-syntax-grammar. Accessed 25 Oct 2017.

  10. http://www.w3.org/TR/rdf-sparql-query. Accessed 25 Oct 2017.

References

  • Gaidar D (2015) Mining redescriptors in Staphylococcus aureus data. Master’s thesis, Universität des Saarlandes, Saarbrücken

  • Galbrun E (2013) Methods for redescription mining. PhD thesis, Department of Computer Science, University of Helsinki

  • Galbrun E, Kimmig A (2014) Finding relational redescriptions. Mach Learn 96(3):225–248, https://doi.org/10.1007/s10994-013-5402-3

  • Galbrun E, Miettinen P (2012) From black and white to full color: Extending redescription mining outside the Boolean world. Stat Anal Data Min 5(4):284–303, https://doi.org/10.1002/sam.11145

  • Galbrun E, Miettinen P (2016) Analysing political opinions using redescription mining. In: IEEE International Conference on Data Mining Workshops, pp 422–427, https://doi.org/10.1109/ICDMW.2016.0066

  • Galbrun E, Tang H, Fortelius M, Žliobaitė I (2017) Computational biomes: The ecometrics of large mammal teeth. Palaeontol Electron. Submitted

  • Goel N, Hsiao MS, Ramakrishnan N, Zaki MJ (2010) Mining complex Boolean expressions for sequential equivalence checking. In: Proceedings of the 19th IEEE Asian Test Symposium (ATS’10), pp 442–447, https://doi.org/10.1109/ATS.2010.81

  • Hossain MS, Butler P, Boedihardjo AP, Ramakrishnan N (2012a) Storytelling in entity networks to support intelligence analysts. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’12), pp 1375–1383, https://doi.org/10.1145/2339530.2339742

  • Hossain MS, Gresock J, Edmonds Y, Helm RF, Potts M, Ramakrishnan N (2012b) Connecting the dots between PubMed abstracts. PLoS ONE 7(1):1–23, https://doi.org/10.1371/journal.pone.0029509

  • Kohonen T (1989) Self-organization and associative memory. Springer, New York

  • Kumar D (2007) Redescription mining: Algorithms and applications in bioinformatics. PhD thesis, Department of Computer Science, Virginia Polytechnic Institute and State University

  • Kumar D, Ramakrishnan N, Helm RF, Potts M (2008) Algorithms for storytelling. IEEE Trans Knowl Data Eng 20(6):736–751, https://doi.org/10.1109/TKDE.2008.32

  • van Leeuwen M, Galbrun E (2015) Association discovery in two-view data. IEEE Trans Knowl Data Eng 27(12):3190–3202, https://doi.org/10.1109/TKDE.2015.2453159

  • Metzler S, Miettinen P (2015a) Join size estimation on Boolean tensors of RDF data. In: Proceedings of the 24th International Conference on the World Wide Web (WWW’15), pp 77–78, https://doi.org/10.1145/2740908.2742738

  • Metzler S, Miettinen P (2015b) On defining SPARQL with Boolean tensor algebra. https://doi.org/10.1145/2740908.2742738

  • Mihelčić M, Džeroski S, Lavrač N, Šmuc T (2017) A framework for redescription set construction. Expert Syst Appl 68:196–215, https://doi.org/10.1016/j.eswa.2016.10.012

  • Mihelčić M, Šimić G, Babić-Leko M, Lavrač N, Džeroski S, Šmuc T (2017) Using redescription mining to relate clinical and biological characteristics of cognitively impaired and Alzheimer’s disease patients.

  • Mihelčić M, Džeroski S, Lavrač N, Šmuc T (2016) Redescription mining with multi-target predictive clustering trees. In: Proceedings of the 4th International Workshop on the New Frontiers in Mining Complex Patterns (NFMCP’15), pp 125–143, https://doi.org/10.1007/978-3-319-39315-5_9

  • Nijssen S, Kok JN (2005) The Gaston tool for frequent subgraph mining. Proceedings of the International Workshop on Graph-Based Tools (GraBaTs 2004) 127(1):77–87, https://doi.org/10.1016/j.entcs.2004.12.039

  • Pearson RG, Dawson TP (2003) Predicting the impacts of climate change on the distribution of species: Are bioclimate envelope models useful? Glob Ecol Biogeogr 12:361–371, https://doi.org/10.1046/j.1466-822X.2003.00042.x

  • Phillips SJ, Anderson RP, Schapire RE (2006) Maximum entropy modeling of species geographic distributions. Ecol Model 190(3):231–259, https://doi.org/10.1016/j.ecolmodel.2005.03.026

  • Ramakrishnan N, Zaki MJ (2009) Redescription mining and applications in bioinformatics. In: Chen J, Lonardi S (eds) Biological Data Mining, Chapman and Hall/CRC, Boca Raton, FL

  • Ramakrishnan N, Kumar D, Mishra B, Potts M, Helm RF (2004) Turning CARTwheels: An alternating algorithm for mining redescriptions. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’04), pp 266–275, https://doi.org/10.1145/1014052.1014083

  • Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326, https://doi.org/10.1126/science.290.5500.2323

  • Shvaiko P, Euzenat J (2005) A survey of schema-based matching approaches. J Data Semantics IV 3730:146–171, https://doi.org/10.1007/11603412_5

  • Singh J, Kumar D, Ramakrishnan N, Singhal V, Jervis J, Garst JF, Slaughter SM, DeSantis AM, Potts M, Helm RF (2005) Transcriptional response of Saccharomyces cerevisiae to desiccation and rehydration. Appl Environ Microbiol 71(12):8752–8763, https://doi.org/10.1128/AEM.71.12.8752-8763.2005

  • Soberón J, Nakamura M (2009) Niches and distributional areas: Concepts, methods, and assumptions. Proc Natl Acad Sci USA 106(Supplement 2):19644–19650, https://doi.org/10.1073/pnas.0901637106

  • Suchanek FM, Kasneci G, Weikum G (2007) YAGO: A core of semantic knowledge. In: Proceedings of the 16th International Conference on World Wide Web (WWW’07), pp 697–706, https://doi.org/10.1145/1242572.1242667

  • Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323, https://doi.org/10.1126/science.290.5500.2319

  • Thuiller W, Lafourcade B, Engler R, Araújo MB (2009) BIOMOD – A platform for ensemble forecasting of species distributions. Ecography 32(3):369–373, https://doi.org/10.1111/j.1600-0587.2008.05742.x

  • Watts A, Ke D, Wang Q, Pillay A, Nicholson-Weller A, Lee JC (2005) Staphylococcus aureus strains that express serotype 5 or serotype 8 capsular polysaccharides differ in virulence. Infect Immun 73(6):3502–3511, https://doi.org/10.1128/IAI.73.6.3502-3511.2005

  • Wu H, Vreeken J, Tatti N, Ramakrishnan N (2014) Uncovering the plot: Detecting surprising coalitions of entities in multi-relational schemas. Data Min Knowl Disc 28(5–6):1398–1428, https://doi.org/10.1007/s10618-014-0370-1

  • Yan X, Han J (2002) gSpan: Graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02), pp 721–724, https://doi.org/10.1109/ICDM.2002.1184038

  • Zhao L, Zaki MJ, Ramakrishnan N (2006) BLOSOM: A framework for mining arbitrary Boolean expressions. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’06), pp 827–832, https://doi.org/10.1145/1150402.1150511

  • Zinchenko T, Galbrun E, Miettinen P (2015) Mining predictive redescriptions with trees. In: IEEE International Conference on Data Mining Workshops, pp 1672–1675, https://doi.org/10.1109/ICDMW.2015.123

Copyright information

© 2017 The Author(s)

About this chapter

Cite this chapter

Galbrun, E., Miettinen, P. (2017). Applications, Variants, and Extensions of Redescription Mining. In: Redescription Mining. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-72889-6_3

  • DOI: https://doi.org/10.1007/978-3-319-72889-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72888-9

  • Online ISBN: 978-3-319-72889-6

  • eBook Packages: Computer Science, Computer Science (R0)
