Variable Transformation for Granularity Change in Hierarchical Databases in Actual Data Mining Solutions

Adeodato, Paulo J. L.

doi:10.1007/978-3-319-24834-9_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9375)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

This paper presents a variable transformation strategy for enriching the variables' information content and defining the project target in actual data mining applications based on relational databases with data at different grains. In an actual solution for assessing the quality of schools based on official school survey and student test data, variables at the student and teacher grains had to become features of the schools to which they belonged. The formal problem was how to summarize the relevant information content of the attribute distributions in a few summarizing concepts (features). Instead of the typical lowest-order moments of the distribution, the proposed transformations, based on the distribution histogram, produced a weighted score for the input variables. Following the CRISP-DM method, the problem has been precisely formulated as a binary decision problem on a granularly transformed student grade. The proposed granular transformation embedded additional human expert knowledge into the input variables at the school level. Logistic regression produced a classification score for good schools, and the area under the ROC curve (AUC_ROC) and the maximum Kolmogorov-Smirnov distance (Max_KS) assessed that score's performance on statistically independent datasets. A 10-fold cross-validation experimental procedure showed that this domain-driven data mining approach produced a statistically significant improvement, at a 0.99 confidence level, over the usual approach based on the central tendency of the distribution.
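The abstract does not spell out the exact histogram-based transformation, so the following is only a minimal sketch under stated assumptions, not the author's method. It assumes that each school's feature is a weighted sum of the relative frequencies of its students' values over fixed bins, with the bin edges and per-bin weights standing in for the expert knowledge. It contrasts that feature with the mean (central tendency) baseline and computes the two metrics the abstract names: AUC_ROC, via the pairwise Mann-Whitney formulation, and Max_KS, as the largest gap between the empirical score distributions of the two classes. All function names, bins, weights, and data below are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def mean_feature(values: Sequence[float]) -> float:
    """Baseline: the central tendency (mean) of the child-level distribution."""
    return sum(values) / len(values)


def histogram_score(values: Sequence[float], edges: Sequence[float],
                    weights: Sequence[float]) -> float:
    """Hypothetical histogram-based feature: weighted sum of relative bin frequencies.

    edges: ascending bin boundaries, len(edges) == len(weights) + 1.
    weights: one (assumed expert-chosen) weight per bin.
    Values outside [edges[0], edges[-1]] fall in no bin but still count in n.
    """
    counts = [0] * len(weights)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        if v == edges[-1]:
            idx = len(weights) - 1  # close the last bin on the right
        if 0 <= idx < len(weights):
            counts[idx] += 1
    n = len(values)
    return sum(w * c / n for w, c in zip(weights, counts))


def to_parent_grain(rows: Iterable[Tuple[str, float]],
                    feature: Callable[[Sequence[float]], float]) -> Dict[str, float]:
    """Change granularity: group (parent_id, value) rows and summarize each group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for parent, value in rows:
        groups[parent].append(value)
    return {parent: feature(vals) for parent, vals in groups.items()}


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC_ROC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def max_ks(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Max_KS: largest gap between the empirical score CDFs of the two classes."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    best = 0.0
    for t in sorted(set(scores)):
        gap = abs(bisect.bisect_right(pos, t) / len(pos)
                  - bisect.bisect_right(neg, t) / len(neg))
        best = max(best, gap)
    return best


# Toy student grades per school (invented for illustration only).
rows = [("A", 9.0), ("A", 9.0), ("A", 0.0),
        ("B", 6.5), ("B", 6.5), ("B", 6.5)]
# Hypothetical expert bins: penalize failing grades, reward high grades.
edges, weights = [0.0, 5.0, 7.0, 10.0], [-1.0, 0.0, 1.0]
print(to_parent_grain(rows, mean_feature))
print(to_parent_grain(rows, lambda v: histogram_score(v, edges, weights)))
print(auc_roc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]),
      max_ks([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

On these toy grades the mean ranks school B (6.5) above school A (6.0), whereas the histogram score ranks A (1/3) above B (0). This shows only that the two summaries can disagree. It says nothing about which one suits the paper's task better. In the paper's pipeline, school-level features like these would feed a logistic regression, and its output scores are what `auc_roc` and `max_ks` would evaluate.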

Acknowledgments

The author would like to thank Mr. Fábio C. Pereira for running the experiments.

Author information

Authors and Affiliations

Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil
Paulo J. L. Adeodato

Corresponding author

Correspondence to Paulo J. L. Adeodato.

Editor information

Editors and Affiliations

Wroclaw University of Technology, Wroclaw, Poland
Konrad Jackowski
Department of Systems, Wroclaw University of Technology, Wroclaw, Poland
Robert Burduk
Wroclaw University of Technology, Wroclaw, Poland
Krzysztof Walkowiak
Wroclaw University of Technology, Faculty of Electronics, Wroclaw, Poland
Michal Wozniak
School of Electrical & Electronic E, University of Manchester, Manchester, United Kingdom
Hujun Yin

About this paper

Cite this paper

Adeodato, P.J.L. (2015). Variable Transformation for Granularity Change in Hierarchical Databases in Actual Data Mining Solutions. In: Jackowski, K., Burduk, R., Walkowiak, K., Wozniak, M., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2015. IDEAL 2015. Lecture Notes in Computer Science, vol 9375. Springer, Cham. https://doi.org/10.1007/978-3-319-24834-9_18

DOI: https://doi.org/10.1007/978-3-319-24834-9_18
Published: 07 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24833-2
Online ISBN: 978-3-319-24834-9
eBook Packages: Computer Science, Computer Science (R0)