A Semantic Taxonomy for Weighting Assumptions to Reduce Feature Selection from Social Media and Forum Posts

  • Ali Muttaleb Hasan (email author)
  • Taha Hussein Rassem
  • Noorhuzaimi Mohd Noor
  • Ahmed Muttaleb Hasan
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1073)

Abstract

Numerous researchers have used knowledge-based word semantics to resolve the ambiguity of synonyms (https://github.com/alimuttaleb/Ali-Muttaleb/blob/master/Synonym.txt) in various natural-language processing settings, such as Wikipedia, websites, and social networks. This paper attempts to clarify ambiguities in the lexical semantics of taxonomy in social media. It proposes a new knowledge-based semantic representation approach that can handle the ambiguity and high dimensionality issues of text mining. The approach has two main components: a feature-based method that incorporates the relationships between lexical sources, and a topic-based reduction method that overcomes high dimensionality. Together, these components weight and reduce the relevant features of a concept, and the approach captures further lexical semantic similarity between words. The paper also evaluates the use of WordNet 3.1 (https://wordnet.princeton.edu) in text clustering, and the constant weighting assumption in the feature-based method used to select concepts/words from social media. To address ambiguity, the semantics of concepts are represented with a reduced, smaller feature subset, and the performance of the semantic similarity measurement is improved. The proposed method is evaluated on word semantic similarity using the MC30 dataset (https://github.com/alimuttaleb/semantictaxonomy/blob/master/mc30.txt) with WordNet and obtains the following results for semantic representation: r = 0.82, p = 0.81, m = 0.81, and nz = 0.96.
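
As an illustration of how word-similarity measures are commonly scored on a word-pair benchmark such as MC30, the sketch below correlates system similarity scores with human ratings. The abstract does not define r, p, m, or nz, so treating a reported score as a correlation coefficient is an assumption made here for illustration. The function names and the toy numbers are hypothetical and are not taken from the paper or from the MC30 file.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical values for illustration only; NOT taken from MC30 or the paper.
human_ratings = [3.9, 3.1, 1.2, 0.4, 2.5]
system_scores = [0.95, 0.70, 0.30, 0.05, 0.40]

print(f"Pearson  = {pearson(human_ratings, system_scores):.2f}")
print(f"Spearman = {spearman(human_ratings, system_scores):.2f}")
```

In practice, the human ratings would be read from the benchmark file and the system scores would come from the similarity measure under test; the correlation code itself would stay the same.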

Keywords

Semantic taxonomy; Feature-based method; Semantic representation; Feature selection; Gloss; Social media; MC30

Notes

Acknowledgment

This work is supported by the University Malaysia Pahang (UMP) via Research Grant UMP RDU1803141.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Ali Muttaleb Hasan (1), email author
  • Taha Hussein Rassem (1)
  • Noorhuzaimi Mohd Noor (1)
  • Ahmed Muttaleb Hasan (1)
  1. Faculty of Computing (FKOM), University Malaysia Pahang, Gambang, Kuantan, Malaysia
