Semantic Similarity Analysis of Urdu Documents

  • Rida Hijab Basit
  • Muhammad Aslam
  • A. M. Martinez-Enriquez
  • Afraz Z. Syed
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10267)


Semantic similarity analysis is an emerging research area that plays an important role in document classification, text summarization, and plagiarism identification. The volume of digital data on the Internet is growing rapidly, and such unstructured data call for efficient tools that can locate relevant topics or related content. Systems that retrieve documents by semantic similarity have been developed for many languages (English, Arabic, Hindi, Turkish, etc.), but no such work has been done for the Urdu language. Effective search over Urdu digital documents therefore requires a system that finds semantically similar documents. This paper studies the existing systems and proposes an approach for Urdu documents that provides a better semantic similarity score. An evaluation of the proposed system, Semantic Similarity System for Urdu (TripleS4Urdu), shows good results.


Semantic similarity analysis, Latent semantic analysis, LSA



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Rida Hijab Basit (1)
  • Muhammad Aslam (1)
  • A. M. Martinez-Enriquez (2)
  • Afraz Z. Syed (3)
  1. Department of Computer Science and Engineering, University of Engineering and Technology, Lahore, Pakistan
  2. Department of Computer Science, CINVESTAV-IPN, Mexico, D.F., Mexico
  3. Information Technology Program (ITP), Lambton College of Applied Science and Technology, Sarnia, Canada
