Document Clustering by Relevant Terms: An Approach

Reyes-Peña, Cecilia; Tovar Vidal, Mireya; Lavalle Martínez, José de Jesús

doi:10.1007/978-3-030-32520-6_44

Conference paper
First Online: 13 October 2019

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1069)

Abstract

In this work, an approach to document clustering based on relevant terms in an untagged medical text corpus is presented. First, a list of the documents containing each word must be created. Then, for relevant-term extraction, the frequency of each term is obtained in order to compute the word's weight in the corpus and in each document. Finally, the clusters are built by mapping main concepts from an ontology to the relevant terms (subjects only), under the assumption that if two words appear in the same documents, the words are related. Each resulting cluster has a category corresponding to an ontology concept. The clusters are compared with clusters produced by K-Means (assuming the K-Means clusters are well formed) using the Overlap Coefficient, yielding 70% similarity between the two sets of clusters.
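The pipeline summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: tokenization is plain whitespace splitting, the term weight is assumed to be corpus-level relative frequency, the subject-only filtering is replaced by an optional stop-word set, and the rule that attaches a relevant term to an ontology concept (an overlap threshold between their document sets) is hypothetical. The function names (inverted_index, relevant_terms, overlap, cluster_by_concepts, best_overlaps), the parameters top_k and threshold, and the toy corpus are all invented for illustration. The Overlap Coefficient used is the standard |A ∩ B| / min(|A|, |B|).

```python
from collections import Counter, defaultdict


def inverted_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():  # naive whitespace tokenization (assumption)
            index[word].add(doc_id)
    return index


def relevant_terms(docs, top_k, stopwords=frozenset()):
    """Keep the top_k words by corpus-level relative frequency (assumed weight)."""
    counts = Counter(
        w for text in docs.values() for w in text.lower().split() if w not in stopwords
    )
    total = sum(counts.values())
    weight = {w: c / total for w, c in counts.items()}
    # Highest weight first; ties broken alphabetically so the result is deterministic.
    return sorted(weight, key=lambda w: (-weight[w], w))[:top_k]


def overlap(a, b):
    """Overlap coefficient |A & B| / min(|A|, |B|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cluster_by_concepts(docs, concepts, top_k=10, threshold=0.5, stopwords=frozenset()):
    """Hypothetical mapping rule: a relevant term joins a concept's cluster when the
    documents containing the term overlap enough with those containing the concept
    (co-occurrence is taken as evidence that the two words are related)."""
    index = inverted_index(docs)
    terms = relevant_terms(docs, top_k, stopwords)
    clusters = {}
    for concept in concepts:
        concept_docs = index.get(concept, set())
        members = set(concept_docs)
        for term in terms:
            if term != concept and overlap(index[term], concept_docs) >= threshold:
                members |= index[term]
        clusters[concept] = members
    return clusters


def best_overlaps(clusters, reference_clusters):
    """For each concept cluster, its highest overlap with any reference cluster
    (e.g. document-id sets produced by K-Means)."""
    return {
        concept: max((overlap(members, ref) for ref in reference_clusters), default=0.0)
        for concept, members in clusters.items()
    }


# Toy corpus (invented); the concept labels stand in for ontology concepts.
corpus = {
    1: "diabetes insulin glucose",
    2: "diabetes insulin",
    3: "asthma lung inhaler",
    4: "asthma lung",
    5: "glucose insulin",
}
clusters = cluster_by_concepts(corpus, ["diabetes", "asthma"])
print(clusters)  # {'diabetes': {1, 2, 5}, 'asthma': {3, 4}}
print(best_overlaps(clusters, [{1, 2}, {3, 4, 5}]))  # {'diabetes': 1.0, 'asthma': 1.0}
```

The reference clusters passed to best_overlaps are hard-coded here; in practice they would come from running K-Means on document vectors. Replacing the whitespace split with a part-of-speech filter would bring the sketch closer to the subject-only terms the abstract describes, and the threshold and top_k values are arbitrary.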

Acknowledgment

This work is supported by the Sectoral Research Fund for Education with the CONACyT project 257357, and partially supported by the VIEP-BUAP project.

Author information

Authors and Affiliations

Faculty of Computer Science, Benemérita Universidad Autónoma de Puebla, 14 sur y Av. San Claudio, C.U., Puebla, Puebla, Mexico
Cecilia Reyes-Peña, Mireya Tovar Vidal & José de Jesús Lavalle Martínez

Corresponding author

Correspondence to Mireya Tovar Vidal.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor

About this paper

Cite this paper

Reyes-Peña, C., Tovar Vidal, M., Lavalle Martínez, J.d.J. (2020). Document Clustering by Relevant Terms: An Approach. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Proceedings of the Future Technologies Conference (FTC) 2019. FTC 2019. Advances in Intelligent Systems and Computing, vol 1069. Springer, Cham. https://doi.org/10.1007/978-3-030-32520-6_44

DOI: https://doi.org/10.1007/978-3-030-32520-6_44
Published: 13 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32519-0
Online ISBN: 978-3-030-32520-6
eBook Packages: Intelligent Technologies and Robotics (R0)