Abstract
In this paper, we extend the vector space model of Salton [9], [11], [12], [14] by combining the TF-IDF weight with the Okapi formula for index term extraction and evaluation, in order to identify the relevant concepts that represent a document. We propose a new measure, TFIDF-ABR, which takes semantic proximity into account through a similarity measure between terms, combining the TF-IDF-Okapi calculation with a kernel approach (radial basis function).
This indexing approach supports contextual and semantic search. To obtain a robust descriptor index, we use not only a semantic graph to highlight the semantic connections between terms, but also an auxiliary dictionary to increase the connectivity of the constructed graph and therefore the semantic weight of the index terms.
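The abstract does not state the TFIDF-ABR formula, so the following is only a minimal sketch of the general idea: an Okapi-style TF-IDF score for each term, augmented by the scores of semantically close terms and discounted by a radial basis function of their distance. The function names, the parameters `k1`, `b`, and `sigma`, the smoothed IDF variant, and the user-supplied `dist` function (for example, a shortest-path length in a semantic graph) are all assumptions made for illustration, not the authors' definitions.

```python
import math
from collections import Counter


def okapi_tf(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 saturated term frequency (k1 and b are assumed defaults)."""
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1.0) / (freq + norm)


def idf(term, docs):
    """Smoothed inverse document frequency; always positive."""
    df = sum(1 for d in docs if term in d)
    return math.log(1.0 + (len(docs) - df + 0.5) / (df + 0.5))


def rbf(distance, sigma=1.0):
    """Radial basis function kernel applied to a term-to-term distance."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def semantic_weights(doc, docs, dist, sigma=1.0):
    """Hypothetical semantic weighting of the terms in one document.

    doc  : list of (stemmed) tokens for the document being indexed
    docs : list of token lists for the whole collection
    dist : assumed callable returning a non-negative distance between
           two terms, with dist(t, t) == 0
    """
    counts = Counter(doc)
    avg_len = sum(len(d) for d in docs) / len(docs)
    # Base score: Okapi-saturated TF multiplied by IDF.
    base = {
        t: okapi_tf(f, len(doc), avg_len) * idf(t, docs)
        for t, f in counts.items()
    }
    # Each term keeps its own score (kernel value 1 at distance 0) and
    # gains a kernel-discounted share of the scores of nearby terms.
    return {
        t: sum(rbf(dist(t, u), sigma) * base[u] for u in base)
        for t in base
    }
```

With a `dist` that returns `float("inf")` for unrelated terms, the kernel value is zero and each term falls back to its plain TF-IDF-Okapi score; a denser semantic graph yields shorter distances and therefore larger weights for well-connected terms.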
References
Budanitsky, A., Hirst, G.: Evaluating wordnet-based measures of semantic distance. Computational Linguistics 32(1), 13–47 (2006)
Dijkstra, E.W.: A short introduction to the art of programming, containing the original article describing Dijkstra's algorithm, pp. 67–73
Gabrilovich, E., Markovitch, S.: Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. In: Proc. IJCAI’07, pp. 1606–1611 (2007)
Khoja, S., Garside, S.: Stemming Arabic Text. Computing Department. Lancaster University, Lancaster, September 22 (1999), http://www.comp.lancs.ac.uk/computing/users/khoja/stemmer.ps
Quillian, M.R.: Semantic memory. In: Semantic Information Processing (1968)
Rada, R., Mili, H., Bicknell, E., Blettner, M.: Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics 19(1), 17–30 (1989)
Resnik, P.: Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research 11, 95–130 (1999)
Robertson, S., Walker, S., Beaulieu, M.: Experimentation as a way of life: Okapi at TREC. Information Processing and Management 36(1), 95–108 (2000)
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing and Management 24, 513–523 (1988)
Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)
Salton, G., Yang, C.S., Yu, C.T.: A theory of term importance in automatic text analysis. Journal of the American Society for Information Science and Technology 26(1), 33–44 (1975)
Salton, G., Fox, E.A., Wu, H.: Extended boolean information retrieval. Communications of the ACM, 1022–1036 (1983)
Salton, G., Singhal, A., Buckley, C., Mitra, M.: Automatic text decomposition using text segments and text themes. In: UK Conference on Hypertext, pp. 53–65 (1996)
Salton, G.: The SMART retrieval system: experiments in automatic document processing. Prentice-Hall, Englewood Cliffs (1971)
Seydoux, F., Rajman, M., Chappelier, J.C.: Exploitation de connaissances sémantiques externes dans les représentations vectorielles en recherche documentaire. Ph.D. thesis (2006)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn., Section 24.3: Dijkstra’s algorithm, pp. 595–601. MIT Press, McGraw-Hill (2001) ISBN 0-262-03293-7
Al Jazeera: http://www.aljazeera.net/
Al charq Al awsat, http://www.aawsat.com/
Al ahdat Al maghrebiya, http://www.almaghribia.ma/
Associated Press, http://www.cs.princeton.edu/~blei/lda-c/ap.tgz
Wikipedia, http://en.wikipedia.org/
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zaki, T., Mammass, D., Ennaji, A. (2010). A Semantic Proximity Based System of Arabic Text Indexation. In: Elmoataz, A., Lezoray, O., Nouboud, F., Mammass, D., Meunier, J. (eds) Image and Signal Processing. ICISP 2010. Lecture Notes in Computer Science, vol 6134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13681-8_49
DOI: https://doi.org/10.1007/978-3-642-13681-8_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13680-1
Online ISBN: 978-3-642-13681-8
eBook Packages: Computer Science (R0)