
Extending Redescription Mining to Multiple Views

  • Matej Mihelčić
  • Sašo Džeroski
  • Tomislav Šmuc
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11198)

Abstract

Redescription mining is a data mining task that discovers redescriptions, that is, alternative descriptions of different subsets of entities in the available data. Finding such redescriptions is important in many scientific disciplines because it allows detecting different types of associations, including synergies between different attributes of interest. A number of redescription mining algorithms exist; however, they are all restricted to using one, or at most two, disjoint sets of attributes (views) to redescribe subsets of entities. The main reasons for this limitation are computational complexity and the potentially large increase in the number of patterns produced during redescription mining in the multi-view setting. In this work, we present an algorithm, based on CLUS-RM, that mines redescriptions from multiple views. The presented algorithm efficiently addresses both of these problems: its computational complexity with respect to attribute operations increases linearly with the number of views, and we present techniques to handle the large number of redescriptions produced during the redescription mining step.
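To make the multi-view setting concrete, the sketch below shows one way to evaluate a candidate redescription over k views: each view gets its own query, each query is evaluated once on its view to obtain a support set, and the supports are compared with a Jaccard index generalised to k sets (size of the intersection divided by size of the union). This is an illustrative sketch under stated assumptions, not CLUS-RM: the data layout (`View`), the predicate-style `Query`, and the helpers `support` and `multi_view_jaccard` are hypothetical, and the k-set Jaccard generalisation is an assumed accuracy measure rather than one confirmed by the abstract. It does show why evaluating a redescription requires one pass per view, so that this part of the work grows linearly with the number of views.

```python
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical layout: a view maps entity id -> that entity's attribute values.
View = Dict[str, Dict[str, float]]
# Hypothetical query model: a predicate over one entity's attributes in one view.
Query = Callable[[Dict[str, float]], bool]


def support(view: View, query: Query) -> Set[str]:
    """Return the ids of entities in `view` whose attributes satisfy `query`."""
    return {entity for entity, attrs in view.items() if query(attrs)}


def multi_view_jaccard(views: Sequence[View], queries: Sequence[Query]) -> float:
    """Assumed accuracy measure: |intersection| / |union| of the k query supports.

    Each query is evaluated exactly once on its own view, so the number of
    support computations grows linearly with the number of views.
    """
    if not views or len(views) != len(queries):
        raise ValueError("expected one query per view and at least one view")
    supports: List[Set[str]] = [support(v, q) for v, q in zip(views, queries)]
    union = set().union(*supports)
    if not union:
        return 0.0
    intersection = set.intersection(*supports)
    return len(intersection) / len(union)


# Toy example with three views over the same four entities (made-up data).
views: List[View] = [
    {"a": {"x": 1.0}, "b": {"x": 5.0}, "c": {"x": 7.0}, "d": {"x": 2.0}},
    {"a": {"y": 0.0}, "b": {"y": 3.0}, "c": {"y": 4.0}, "d": {"y": 4.5}},
    {"a": {"z": 1.0}, "b": {"z": 1.0}, "c": {"z": 1.0}, "d": {"z": 0.0}},
]
queries: List[Query] = [
    lambda attrs: attrs["x"] > 3,     # support {b, c}
    lambda attrs: attrs["y"] > 2.5,   # support {b, c, d}
    lambda attrs: attrs["z"] == 1,    # support {a, b, c}
]

print(multi_view_jaccard(views, queries))  # 0.5: |{b, c}| / |{a, b, c, d}|
```

Adding a view here means adding one more query and one more support set; how the actual algorithm builds queries and prunes the resulting redescriptions is not shown.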

Notes

Acknowledgement

The authors acknowledge the European Commission’s support through the projects MAESTRA (Gr. no. 612944) and HBP SGA2 (Gr. no. 785907), support of the Croatian Science Foundation (Pr. no. 9623: Machine Learning Algorithms for Insightful Analysis of Complex Data Structures) and partial support by the European Regional Development Fund under the grant KK.01.1.1.01.0009 (DATACROSS).


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Matej Mihelčić (1)
  • Sašo Džeroski (2)
  • Tomislav Šmuc (1)
  1. Ruđer Bošković Institute, Zagreb, Croatia
  2. Jožef Stefan Institute, Ljubljana, Slovenia
