Semantic Classification and Co-occurrences: A Method for the Rules Production for the Information Extraction from Textual Data

  • Alessio Canzonetti
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Information extraction is a field of computer science research that addresses the problem of detecting and retrieving desired information from textual data. This paper proposes a two-step method for detecting relevant information within a corpus of textual data. The first step identifies the most recurrent structures through the study of textual co-occurrences and collocations. The second step derives rules from these structures, which make it possible to build an inventory of all the expressions that identify a particular concept of interest, that is, the desired information.
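The two steps can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy corpus, the seed word `euro`, the `NUM` placeholder for digits, and all function names are assumptions introduced here for illustration. Step one counts repeated word sequences (a simple stand-in for the co-occurrence and collocation analysis), and step two turns a selected sequence into a regular-expression rule that inventories the matching expressions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; digit-only tokens are generalised to 'NUM' (an assumption)."""
    return ["NUM" if t.isdigit() else t for t in re.findall(r"\w+", text.lower())]

def repeated_segments(docs, n=2, min_count=2):
    """Step 1: count every n-token sequence and keep those seen at least min_count times."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {seg: c for seg, c in counts.items() if c >= min_count}

def segment_to_rule(segment):
    """Step 2: turn a recurrent segment into a regex rule ('NUM' matches any digits)."""
    parts = [r"\d+" if tok == "NUM" else re.escape(tok) for tok in segment]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def extract(docs, rules):
    """Apply the rules to the corpus and list every matching expression in order."""
    found = []
    for doc in docs:
        for rule in rules:
            found.extend(m.group(0) for m in rule.finditer(doc))
    return found

# Hypothetical toy corpus; the concept of interest is a monetary amount.
docs = [
    "The price of the ticket is 20 euro.",
    "The price of the book is 35 euro.",
    "We paid a price of 12 euro.",
]
segments = repeated_segments(docs, n=2, min_count=2)
# Keep only the recurrent segments that contain the seed word for the concept.
rules = [segment_to_rule(seg) for seg in sorted(segments) if "euro" in seg]
print(extract(docs, rules))
```

In this sketch the analyst still chooses the seed word and the frequency threshold by hand; a real application would likely rely on richer collocation statistics and a semantic classification of the vocabulary, as the abstract suggests.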


Information Extraction; Semantic Classification; Textual Data; Semantic Class; Computer Science Research



The present research was funded by MIUR 2004 – C26F040421.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Dipartimento Studi Geoeconomici, Linguistici, Statistici, Storici per l’Analisi regionale, Facoltà di Economia, Sapienza Università di Roma, Roma, Italy
