Relevance Measures for Multivalued Attributes in Multiclass Datasets

Tasca, Mariana; Zadrozny, Bianca; Plastino, Alexandre

doi:10.1007/978-3-319-12027-0_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8864)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence

Abstract

An important step in the knowledge discovery in databases (KDD) process is the attribute selection procedure, which aims at choosing a subset of attributes that can represent the important information within the data. Most of the existing attribute selection methods can only handle simple attribute types, such as categorical and numerical. In particular, these methods cannot be applied to multivalued attributes, which are attributes that take multiple values simultaneously for the same instance in the dataset. This article proposes two relevance measures for multivalued attributes, which aim at measuring their importance for classification. The proposed measures are adaptations of two widely used relevance measures for categorical attributes: information gain and gain ratio. In order to evaluate the proposed measures, experiments were conducted with multiclass datasets submitted to multi-relational classifiers. The experiments show that the proposed measures are good indicators of the relevance of multivalued attributes for multiclass classification.
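
For orientation, the two standard measures for categorical attributes named in the abstract, information gain and gain ratio, can be sketched as below. This is a minimal illustration of the textbook definitions only, not the adapted measures for multivalued attributes that the chapter proposes; the function names and the toy data are assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def information_gain(values, labels):
    """Reduction in class entropy after partitioning by a categorical attribute."""
    total = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # Weighted average entropy of the class labels within each partition.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(values)
    if split_info == 0:
        # The attribute takes a single value, so it cannot split the data.
        return 0.0
    return information_gain(values, labels) / split_info


# Hypothetical toy data: one categorical attribute and three classes.
colour = ["red", "red", "blue", "blue", "green", "green"]
label = ["x", "x", "y", "y", "z", "z"]
print(information_gain(colour, label), gain_ratio(colour, label))
```

Both functions accept any number of classes, so the same definitions apply to the multiclass setting. The normalisation in gain ratio counteracts the tendency of information gain to favour attributes with many distinct values.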

Author information

Authors and Affiliations

Instituto de Computação, Universidade Federal Fluminense, Niterói, Brazil
Mariana Tasca & Alexandre Plastino
IBM Research, Rio de Janeiro, Brazil
Bianca Zadrozny

Corresponding author

Correspondence to Mariana Tasca.

Editor information

Editors and Affiliations

Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
Ana L.C. Bazzan
Pontificia Universidad Católica (PUC), Santiago de Chile, Chile
Karim Pichara

About this paper

Cite this paper

Tasca, M., Zadrozny, B., Plastino, A. (2014). Relevance Measures for Multivalued Attributes in Multiclass Datasets. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_28

DOI: https://doi.org/10.1007/978-3-319-12027-0_28
Published: 12 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12026-3
Online ISBN: 978-3-319-12027-0
eBook Packages: Computer Science, Computer Science (R0)