Document Theme Extraction Using Named-Entity Recognition

  • Deepali Nagrale
  • Vaibhav Khatavkar
  • Parag Kulkarni
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 810)

Abstract

Text mining can be implemented through term analysis at the word or phrase level. Terms that describe the concepts of a particular sentence are used to define the document theme. We introduce a new context-based mining technique that uses the concept-based mining model to analyze terms at the sentence, document, and corpus levels. The technique identifies the theme of a document, such as organization, medical, entertainment, or sport. Context-based mining is applied to statistical data as well as real-time data, such as data exported from Wikipedia. The theme of a document is extracted using natural language processing (NLP), which handles the interaction between computers and human languages, and a named-entity recognition (NER) algorithm for entity identification, entity chunking, and entity extraction. NER is used to find named entities in text, such as person names, organization names, specific locations, time expressions, percentages, and quantities. NLP and NER are used in context-based mining to find entity names and the relationships between them. A context vector containing a set of documents is used to extract the context of a document. Finally, the K-means algorithm is used to cluster the text documents and find their inherent groupings, generating a set of clusters in which each cluster exhibits high intra-cluster similarity and low inter-cluster similarity. Text document clustering thus separates documents into groups based on their similarity, so that each group defines a distinct topic.
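
The pipeline outlined in the abstract (entity extraction, a vector per document, then K-means clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a tiny hypothetical gazetteer (`GAZETTEER`) in place of a trained NER tagger, represents each document as counts of entity types (one possible reading of "context vector"), and uses a plain K-means with farthest-first initialization so that the result is deterministic. All names, entity types, and sample sentences are invented for illustration.

```python
from collections import Counter

# Hypothetical gazetteer: entity surface form -> entity type.
# A real system would use a trained NER tagger instead of dictionary lookup.
GAZETTEER = {
    "messi": "ATHLETE", "federer": "ATHLETE",
    "fifa": "SPORT_ORG", "wimbledon": "SPORT_EVENT",
    "pfizer": "PHARMA_ORG", "novartis": "PHARMA_ORG",
    "insulin": "DRUG", "aspirin": "DRUG",
}
TYPES = sorted(set(GAZETTEER.values()))


def extract_entity_types(text):
    """Return the entity type of every gazetteer hit in the text."""
    tokens = (tok.strip(".,;:!?()\"'").lower() for tok in text.split())
    return [GAZETTEER[tok] for tok in tokens if tok in GAZETTEER]


def context_vector(text):
    """Count entity types in the text (an assumed form of context vector)."""
    counts = Counter(extract_entity_types(text))
    return [float(counts[t]) for t in TYPES]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(vectors, k):
    """Pick k initial centroids: the first vector, then the farthest ones."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    return [list(c) for c in centroids]


def kmeans(vectors, k, max_iter=20):
    """Plain K-means; returns one cluster label per input vector."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for each vector.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "Messi attended the FIFA gala.",
    "Federer won Wimbledon again.",
    "Pfizer tested a new insulin formulation.",
    "Novartis sells aspirin.",
]
vectors = [context_vector(d) for d in docs]
labels = kmeans(vectors, k=2)
print(labels)
```

With these sample sentences, the two sports documents fall into one cluster and the two medical documents into the other. Counting entity types rather than raw words is a design choice of this sketch: it lets documents that share no vocabulary still end up close together when they mention the same kinds of entities.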

Keywords

Context mining, NLP, NER, Context vector

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Deepali Nagrale (1)
  • Vaibhav Khatavkar (1)
  • Parag Kulkarni (1)
  1. Department of Computer Engineering and IT, College of Engineering, Pune, India
