Use of Elliptic Curves in Term Discrimination

  • Darnes Vilariño
  • David Pinto
  • Carlos Balderas
  • Mireya Tovar
  • Beatriz Beltrán
  • Sofia Paniagua
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6718)

Abstract

Detecting discriminant terms allows us to improve the performance of natural language processing systems. The goal is to estimate the contribution of each term in a given corpus and then to use the high-contribution terms to represent the corpus. In this paper we present several experiments that use elliptic curves to discover the discriminant terms of a given textual corpus. These experiments led us to use the mean and variance of the corpus terms to determine the parameters of a reduced Weierstrass equation (elliptic curve). We use the elliptic curves to visualize the behavior of the corpus vocabulary graphically. We then use the elliptic curve parameters to cluster terms that share characteristics. These clusters are used as discriminant terms to represent the original document collection. Finally, we evaluate all of these corpus representations to determine the terms that best discriminate each document.
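As a rough illustration of the idea of deriving curve parameters from term statistics, the sketch below computes the mean and variance of a term's per-document frequencies and uses them as the coefficients a and b of a reduced Weierstrass equation y^2 = x^3 + ax + b. The abstract does not say how the statistics map to the coefficients, so the assignment a = mean, b = variance is an assumption made only for this example, as are the function names and the toy corpus.

```python
from statistics import mean, pvariance


def term_curve_params(freqs):
    """Return (a, b) for y^2 = x^3 + a*x + b from one term's
    per-document frequencies.

    Assumption: a = mean and b = population variance. This mapping is
    illustrative only; the abstract does not specify it.
    """
    return mean(freqs), pvariance(freqs)


def discriminant(a, b):
    """Discriminant of the reduced Weierstrass curve, -16(4a^3 + 27b^2).

    The curve is non-singular exactly when this value is nonzero.
    """
    return -16 * (4 * a**3 + 27 * b**2)


# Hypothetical toy corpus: term -> frequency in each of four documents.
corpus = {
    "curve": [4, 0, 0, 0],  # concentrated in one document
    "the": [5, 5, 5, 5],    # evenly spread, zero variance
    "term": [2, 1, 0, 1],
}

# One (a, b) pair per term; terms with similar pairs could then be
# grouped by any standard clustering method.
params = {term: term_curve_params(freqs) for term, freqs in corpus.items()}

for term, (a, b) in params.items():
    print(term, a, b, discriminant(a, b))
```

Under this assumed mapping, an evenly spread term such as "the" gets b = 0, while a term concentrated in few documents gets a larger b, which is one way the parameters could separate terms by their distribution.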

Keywords

Elliptic Curve; North American Free Trade Agreement; Textual Corpus; Elliptic Curve Cryptography
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Darnes Vilariño (1)
  • David Pinto (1)
  • Carlos Balderas (1)
  • Mireya Tovar (1)
  • Beatriz Beltrán (1)
  • Sofia Paniagua (1)

  1. Faculty of Computer Science, Benemérita Universidad Autónoma de Puebla, Mexico
