Abstract
This work presents an approach to clustering the documents of an untagged medical text corpus based on relevant terms. To achieve this, a list of the documents containing each word is first built. Then, to extract the relevant terms, the frequency of each term is obtained in order to compute the weight of each word in the corpus and in each document. Finally, the clusters are built by mapping the main concepts of an ontology to the relevant terms (subjects only), under the assumption that two words appearing in the same documents are related. Each resulting cluster is labeled with a category corresponding to an ontology concept. The clusters were compared against clusters obtained with K-Means (assuming the K-Means clusters were well formed) using the Overlap Coefficient, obtaining 70% similarity between the clusters.
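The pipeline described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: terms are whitespace tokens; a term's weight is its relative frequency (in the corpus and in each document); relevant terms are the top-k terms by corpus weight after removing stop words; a relevant term counts as related to an ontology concept when both occur in at least one common document. The restriction to subjects, which would need syntactic parsing, is omitted. The toy corpus, the concept list, and all function names are hypothetical. The overlap coefficient uses the standard Szymkiewicz-Simpson definition, |A ∩ B| / min(|A|, |B|).

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus and ontology concepts, for illustration only.
DOCS = {
    "d1": "diabetes insulin glucose",
    "d2": "insulin glucose level",
    "d3": "asthma inhaler lung",
    "d4": "lung inhaler airway",
}
CONCEPTS = ["diabetes", "asthma", "cancer"]


def build_inverted_index(docs):
    """Step 1: map each word to the set of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def term_weights(docs):
    """Step 2: weight each term by relative frequency, over the whole corpus
    and within each document (an assumed weighting scheme)."""
    corpus_counts = Counter()
    doc_weights = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        total = sum(counts.values())
        doc_weights[doc_id] = {w: c / total for w, c in counts.items()}
        corpus_counts.update(counts)
    corpus_total = sum(corpus_counts.values())
    corpus_weights = {w: c / corpus_total for w, c in corpus_counts.items()}
    return corpus_weights, doc_weights


def relevant_terms(corpus_weights, top_k, stopwords=frozenset()):
    """Keep the top_k heaviest non-stop-word terms (ties broken alphabetically)."""
    candidates = [w for w in corpus_weights if w not in stopwords]
    candidates.sort(key=lambda w: (-corpus_weights[w], w))
    return candidates[:top_k]


def cluster_by_concepts(index, concepts, terms):
    """Step 3: one cluster per ontology concept present in the corpus.
    A relevant term is treated as related to a concept when both words
    occur in at least one common document (assumed relatedness test);
    the documents of related terms join the concept's cluster."""
    clusters = {}
    for concept in concepts:
        concept_docs = index.get(concept)
        if not concept_docs:
            continue  # concept never appears in the corpus
        members = set(concept_docs)
        for term in terms:
            if index[term] & concept_docs:
                members |= index[term]
        clusters[concept] = members
    return clusters


def overlap_coefficient(a, b):
    """Step 4: Szymkiewicz-Simpson overlap, |A & B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    corpus_w, _ = term_weights(DOCS)
    terms = relevant_terms(corpus_w, top_k=4)
    clusters = cluster_by_concepts(index, CONCEPTS, terms)
    print(terms)  # ['glucose', 'inhaler', 'insulin', 'lung']
    print({c: sorted(d) for c, d in clusters.items()})
    # {'diabetes': ['d1', 'd2'], 'asthma': ['d3', 'd4']}
```

The concept "cancer" yields no cluster because it never occurs in the toy corpus. Comparing a concept cluster with a K-Means cluster then reduces to a call such as `overlap_coefficient(clusters["diabetes"], kmeans_cluster)`; how the per-cluster scores are aggregated into a single similarity figure is not stated in the abstract, so the sketch leaves that step out.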
Acknowledgment
This work is supported by the Sectoral Research Fund for Education with the CONACyT project 257357, and partially supported by the VIEP-BUAP project.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Reyes-Peña, C., Tovar Vidal, M., Lavalle Martínez, J.d.J. (2020). Document Clustering by Relevant Terms: An Approach. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Proceedings of the Future Technologies Conference (FTC) 2019. FTC 2019. Advances in Intelligent Systems and Computing, vol 1069. Springer, Cham. https://doi.org/10.1007/978-3-030-32520-6_44
DOI: https://doi.org/10.1007/978-3-030-32520-6_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32519-0
Online ISBN: 978-3-030-32520-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)