Selecting Candidate Labels for Hierarchical Document Clusters Using Association Rules

dos Santos, Fabiano Fernandes; de Carvalho, Veronica Oliveira; Oliveira Rezende, Solange

doi:10.1007/978-3-642-16773-7_14

Fabiano Fernandes dos Santos,
Veronica Oliveira de Carvalho &
Solange Oliveira Rezende

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6438)

Abstract

One way to organize knowledge and make it easier to search and retrieve is to build a structural representation organized into hierarchically related topics. Once this hierarchy is built, each of the resulting clusters needs a label, and in many cases these labels must be built using only the terms that appear in the documents of the collection. This paper presents SeCLAR (Selecting Candidate Labels using Association Rules), a method that uses association rules to select good label candidates for hierarchical document clusters. The selected candidates are then processed by a classical labeling method to generate the final labels. The key idea is to treat each parent-child relationship between nodes of the hierarchy as an antecedent-consequent relationship of an association rule. Experimental results show that the proposed method can improve the precision and recall of the labels obtained by classical methods.
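The abstract states the core idea but not the exact rule-mining or filtering procedure, so the Python sketch below is a hypothetical illustration, not SeCLAR as published. It treats the documents of a parent node as transactions (sets of terms) and keeps a term t from a child cluster as a label candidate when some rule {p} → {t}, with p one of the parent's terms, reaches a minimum support and confidence over the parent's documents. The function name `child_label_candidates`, the thresholds `min_sup` and `min_conf`, and the choice to evaluate rules over the parent's documents are all assumptions. Support and confidence follow their standard definitions: sup(X) is the fraction of transactions containing X, and conf(A → C) = sup(A ∪ C) / sup(A).

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Doc = FrozenSet[str]  # a document reduced to its set of terms (one transaction)


def support(itemset: FrozenSet[str], docs: Sequence[Doc]) -> float:
    """Fraction of documents that contain every term in `itemset`."""
    if not docs:
        return 0.0
    return sum(1 for d in docs if itemset <= d) / len(docs)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               docs: Sequence[Doc]) -> float:
    """conf(A -> C) = sup(A ∪ C) / sup(A); 0.0 when A never occurs."""
    sup_a = support(antecedent, docs)
    if sup_a == 0.0:
        return 0.0
    return support(antecedent | consequent, docs) / sup_a


def child_label_candidates(parent_terms: Sequence[str],
                           parent_docs: Sequence[Doc],
                           child_docs: Sequence[Doc],
                           min_sup: float,
                           min_conf: float) -> List[Tuple[str, float, float]]:
    """HYPOTHETICAL selection step (not the published SeCLAR procedure).

    Each parent term p is an antecedent and each child term t a consequent;
    t is kept if some rule {p} -> {t} reaches min_sup and min_conf over the
    parent's documents. Returns (term, support, confidence) tuples sorted by
    confidence (descending), then term.
    """
    child_vocab = set().union(*child_docs) - set(parent_terms)
    best: Dict[str, Tuple[float, float]] = {}
    for p in parent_terms:
        antecedent = frozenset([p])
        for t in child_vocab:
            consequent = frozenset([t])
            sup = support(antecedent | consequent, parent_docs)
            conf = confidence(antecedent, consequent, parent_docs)
            # Keep the strongest rule found for each consequent term.
            if sup >= min_sup and conf >= min_conf and (
                    t not in best or conf > best[t][1]):
                best[t] = (sup, conf)
    return sorted(((t, s, c) for t, (s, c) in best.items()),
                  key=lambda x: (-x[2], x[0]))


if __name__ == "__main__":
    # Toy parent node: four documents; the first, second and fourth form one child.
    docs = [frozenset(d) for d in (
        {"learning", "neural", "network"},
        {"learning", "neural", "training"},
        {"learning", "tree", "decision"},
        {"learning", "neural", "network", "training"},
    )]
    child = [docs[0], docs[1], docs[3]]
    print(child_label_candidates(["learning"], docs, child, 0.5, 0.6))
    # [('neural', 0.75, 0.75)]
```

In this toy example, with `learning` as the parent's term, only `neural` (support and confidence 0.75) survives at `min_conf=0.6`; lowering `min_conf` to 0.5 also admits `network` and `training`. In the pipeline the abstract describes, such candidates would then be handed to a classical labeling method, which this sketch does not implement.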

Author information

Authors and Affiliations

Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo (USP), Brazil
Fabiano Fernandes dos Santos & Solange Oliveira Rezende
Instituto de Geociências e Ciências Exatas UNESP, Univ Estadual Paulista, Brazil
Veronica Oliveira de Carvalho

Editor information

Editors and Affiliations

Centro de Investigación en Computación, Instituto Politécnico Nacional, Av. Juan Dios Batiz, s/n, Zacatenco, 07738, Mexico City, México
Grigori Sidorov
Centro de Investigación en Matemáticas (CIMAT), Area de Computación, Callejón de Jalisco s/n, Mineral de Valenciana, Guanajuato, 36240, Guanajuato, México
Arturo Hernández Aguirre
Ciencias Computacionales, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro No. 1, Santa María Tonantzintla, 72840, Puebla, México
Carlos Alberto Reyes García

About this paper

Cite this paper

dos Santos, F.F., de Carvalho, V.O., Oliveira Rezende, S. (2010). Selecting Candidate Labels for Hierarchical Document Clusters Using Association Rules. In: Sidorov, G., Hernández Aguirre, A., Reyes García, C.A. (eds) Advances in Soft Computing. MICAI 2010. Lecture Notes in Computer Science, vol 6438. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16773-7_14

DOI: https://doi.org/10.1007/978-3-642-16773-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16772-0
Online ISBN: 978-3-642-16773-7
eBook Packages: Computer Science, Computer Science (R0)