Document Clustering by Relevant Terms: An Approach

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1069))

Abstract

In this work, an approach to document clustering based on relevant terms in an untagged medical text corpus is presented. To achieve this, a list of the documents containing each word is first created. Then, for relevant term extraction, the frequency of each term is obtained in order to compute the word's weight in the corpus and in each document. Finally, the clusters are built by mapping the main concepts of an ontology to the relevant terms (subjects only), under the assumption that two words appearing in the same documents are related. Each resulting cluster has a category corresponding to an ontology concept. The clusters are compared with clusters produced by K-Means (assuming the K-Means clusters are well formed) using the Overlap Coefficient, yielding 70% similarity among the clusters.
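
As a rough illustration of two building blocks named in the abstract, the sketch below builds a word-to-documents list and compares two clusters with the Overlap Coefficient, |A ∩ B| / min(|A|, |B|). It is not the authors' implementation: the function names, the whitespace tokenization, and the three toy documents are assumptions made for this example only.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def overlap_coefficient(a, b):
    """Overlap coefficient of two sets: |A & B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # convention chosen here for empty clusters
    return len(a & b) / min(len(a), len(b))


# Hypothetical toy corpus, not taken from the paper.
docs = {
    "d1": "asthma treatment in children",
    "d2": "asthma and allergy",
    "d3": "allergy treatment",
}
index = build_inverted_index(docs)

# Documents sharing the term "asthma" form one term-based cluster.
term_cluster = index["asthma"]
# A made-up K-Means cluster to compare against.
kmeans_cluster = {"d2", "d3"}

score = overlap_coefficient(term_cluster, kmeans_cluster)
print(score)
```

Because the denominator is the size of the smaller set, the coefficient reaches 1.0 whenever one cluster is fully contained in the other, which makes it more lenient than the Jaccard index when cluster sizes differ.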


Acknowledgment

This work is supported by the Sectoral Research Fund for Education with the CONACyT project 257357, and partially supported by the VIEP-BUAP project.

Author information

Corresponding author

Correspondence to Mireya Tovar Vidal.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Reyes-Peña, C., Tovar Vidal, M., Lavalle Martínez, J.d.J. (2020). Document Clustering by Relevant Terms: An Approach. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Proceedings of the Future Technologies Conference (FTC) 2019. FTC 2019. Advances in Intelligent Systems and Computing, vol 1069. Springer, Cham. https://doi.org/10.1007/978-3-030-32520-6_44
