A Semantic Proximity Based System of Arabic Text Indexation

  • Taher Zaki
  • Driss Mammass
  • Abdellatif Ennaji
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6134)


In this paper, we extend the vectorial model of Salton [9], [11], [12], [14] by combining the TF-IDF weight with the Okapi formula for index term extraction and evaluation, in order to identify the relevant concepts that represent a document. We propose a new measure, TFIDF-ABR, which takes semantic vicinity into account: it uses a similarity measure between terms obtained by combining the TF-IDF-Okapi calculation with a kernel approach (radial basis function).
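The abstract does not give the TFIDF-ABR formula, so the following is only a minimal sketch of the general idea, under stated assumptions: an Okapi BM25-style term weight is computed for each term, and each term's final weight is the sum of the weights of all terms in the document, scaled by a radial basis function of their semantic distance. The function names, the parameters `k1`, `b` and `sigma`, the smoothed IDF variant, and the `sem_dist` callback are all illustrative choices, not taken from the paper.

```python
import math
from collections import Counter


def okapi_tf(freq, doc_len, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 term-frequency component with length normalisation."""
    return freq * (k1 + 1) / (freq + k1 * (1 - b + b * doc_len / avg_len))


def idf(term, docs):
    """Smoothed inverse document frequency (always positive)."""
    df = sum(1 for d in docs if term in d)
    return math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))


def rbf(distance, sigma=1.0):
    """Radial basis function: 1 at distance 0, decaying towards 0."""
    return math.exp(-distance ** 2 / (2 * sigma ** 2))


def proximity_weights(doc, docs, sem_dist, sigma=1.0):
    """Hypothetical combination: kernel-weighted sum of Okapi TF-IDF scores.

    `doc` is a list of terms, `docs` the whole corpus (list of term lists),
    and `sem_dist(t, u)` an assumed semantic distance between two terms.
    """
    avg_len = sum(len(d) for d in docs) / len(docs)
    counts = Counter(doc)
    base = {
        t: okapi_tf(c, len(doc), avg_len) * idf(t, docs)
        for t, c in counts.items()
    }
    # Each term receives its own score plus a share of its neighbours' scores.
    return {
        t: sum(rbf(sem_dist(t, u), sigma) * w for u, w in base.items())
        for t in base
    }
```

With a distance function that returns infinity for every pair of distinct terms, the kernel contributes nothing and the result reduces to the plain Okapi TF-IDF weights. Smaller distances let semantically close terms reinforce one another.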

This indexation approach supports contextual and semantic search. To obtain a robust descriptor index, we use not only a semantic graph to highlight the semantic connections between terms, but also an auxiliary dictionary to increase the connectivity of the constructed graph and therefore the semantic weight of the index words.
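To illustrate how an auxiliary dictionary can increase the connectivity of a semantic graph, here is a small sketch. It assumes that the graph is undirected and weighted, that the dictionary supplies extra term-to-term links, and that semantic distance is measured as shortest-path length (Dijkstra's algorithm). The abstract does not specify how the authors build the graph or measure distance, so the edge weights, term names, and function names are hypothetical.

```python
import heapq


def build_graph(text_edges, dictionary_edges):
    """Undirected weighted graph; dictionary edges add links missing from the text."""
    graph = {}
    for u, v, w in list(text_edges) + list(dictionary_edges):
        graph.setdefault(u, {})
        graph.setdefault(v, {})
        # Keep the cheapest weight if both sources propose the same link.
        w = min(w, graph[u].get(v, w))
        graph[u][v] = w
        graph[v][u] = w
    return graph


def semantic_distance(graph, source, target):
    """Dijkstra shortest-path length; infinity when the terms are not connected."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, w in graph.get(node, {}).items():
            cand = dist + w
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(heap, (cand, nxt))
    return float("inf")


# Toy example: the text links "book" and "library"; the dictionary adds
# a link between "library" and "reading".
text_only = build_graph([("book", "library", 1.0)], [])
enriched = build_graph([("book", "library", 1.0)], [("library", "reading", 1.0)])
```

In this toy example, `semantic_distance(text_only, "book", "reading")` is infinite because the two terms are disconnected, while the same call on `enriched` returns `2.0`: the dictionary edge makes "reading" reachable from "book".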


Keywords: Document indexation; semantic graph; semantic vicinity; dictionary; kernel function; Okapi formula; similarity; TF-IDF; vectorial model


  1. Budanitsky, A., Hirst, G.: Evaluating WordNet-based measures of semantic distance. Computational Linguistics 32(1), 13–47 (2006)
  2. Dijkstra, E.W.: A short introduction to the art of programming (contains the original article describing Dijkstra's algorithm), pp. 67–73
  3. Gabrilovich, E., Markovitch, S.: Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. In: Proc. IJCAI'07, pp. 1606–1611 (2007)
  4. Khoja, S., Garside, S.: Stemming Arabic Text. Computing Department, Lancaster University, Lancaster, September 22 (1999)
  5. Quillian, M.R.: Semantic memory. In: Semantic Information Processing (1968)
  6. Rada, R., Mili, H., Bicknell, E., Blettner, M.: Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics 19(1), 17–30 (1989)
  7. Resnik, P.: Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research 11, 95–130 (1999)
  8. Robertson, S., Walker, S., Beaulieu, M.: Experimentation as a way of life: Okapi at TREC. Information Processing and Management 36(1), 95–108 (2000)
  9. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing and Management 24, 513–523 (1988)
  10. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)
  11. Salton, G., Yang, C.S., Yu, C.T.: A theory of term importance in automatic text analysis. Journal of the American Society for Information Science and Technology 26(1), 33–44 (1975)
  12. Salton, G., Fox, E.A., Wu, H.: Extended boolean information retrieval. Communications of the ACM, 1022–1036 (1983)
  13. Salton, G., Singhal, A., Buckley, C., Mitra, M.: Automatic text decomposition using text segments and text themes. In: UK Conference on Hypertext, pp. 53–65 (1996)
  14. Salton, G.: The SMART retrieval system: experiments in automatic document processing. Prentice-Hall, Englewood Cliffs (1971)
  15. Seydoux, F., Rajman, M., Chappelier, J.C.: Exploitation de connaissances sémantiques externes dans les représentations vectorielles en recherche documentaire. Ph.D. thesis (2006)
  16. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, Section 24.3: Dijkstra's algorithm, 2nd edn., pp. 595–601. MIT Press, McGraw-Hill (2001) ISBN 0-262-03293-7
  18. Al charq Al awsat
  19. Al ahdat Al maghrebiya

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Taher Zaki
    • 1
  • Driss Mammass
    • 1
  • Abdellatif Ennaji
    • 2
  1. Ibn Zohr University, Agadir, Morocco
  2. LITIS EA 4108, University of Rouen, France
