
Meta-data and Strategies of Textual Data Analysis: Problems and Instruments

  • Sergio Bolasco
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Summary

To support a proper multidimensional content analysis, we discuss some typical aspects of pre-treatment in textual data analysis, in particular: i) how to select the peculiar subset of words in a text; ii) how to reduce word ambiguity. Our proposal is to use both frequency dictionaries and reference lexicons as lexical knowledge bases external to the corpus, by means of a comparison of rankings inspired by Wegman’s parallel coordinate method. The condition of iso-frequency of unlemmatized forms is considered as an indication of the need for lemmatization. Finally, in order to evaluate the appropriateness of the choices (both disambiguations and fusions), we propose reconstructing, by means of a bootstrap strategy, convex hulls in a factorial plane that serve as word confidence areas. Some examples from a large corpus of parliamentary discourses are presented.
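The rank comparison mentioned in the summary can be illustrated with a minimal sketch. It is not the author's procedure: the function names, the toy frequencies, and the use of a simple rank difference as the measure of "peculiarity" are all assumptions made for illustration. The idea shown is only that a word ranking much higher in the corpus than in an external reference lexicon is a candidate for the peculiar subset.

```python
def ranks(freq):
    """Map each word to its rank (1 = most frequent); ties broken alphabetically."""
    ordered = sorted(freq, key=lambda w: (-freq[w], w))
    return {w: i + 1 for i, w in enumerate(ordered)}


def rank_gaps(corpus_freq, reference_freq):
    """Return (word, gap) pairs for words present in both lists.

    gap = reference rank - corpus rank, so a positive gap means the word
    ranks higher in the corpus than in the reference lexicon.
    This scoring rule is a hypothetical simplification.
    """
    shared = set(corpus_freq) & set(reference_freq)
    corpus_rank = ranks({w: corpus_freq[w] for w in shared})
    reference_rank = ranks({w: reference_freq[w] for w in shared})
    gaps = [(w, reference_rank[w] - corpus_rank[w]) for w in shared]
    # Largest positive gap first; alphabetical order for equal gaps.
    return sorted(gaps, key=lambda pair: (-pair[1], pair[0]))


# Toy frequencies, invented purely for illustration.
corpus_freq = {"government": 120, "law": 90, "the": 900, "house": 15}
reference_freq = {"the": 50000, "house": 900, "law": 300, "government": 200}

gaps = rank_gaps(corpus_freq, reference_freq)
for word, gap in gaps:
    print(word, gap)
```

In this toy case "government" gets the largest positive gap, while function words such as "the" keep the same rank in both lists and would not be selected.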

Keywords

Convex Hull, Word Ambiguity, Factorial Plane, Inflected Form, Reference Matrix
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Balbi, S. (1995): Non symmetrical correspondence analysis of textual data and confidence regions for graphical forms. In: JADT 1995 Analisi statistica dei dati testuali, Bolasco, S. et al. (eds.), II, 5–12, CISU, Roma
  2. Bécue, M. et Haeusler, L. (1995): Vers une post-codification automatique. In: JADT 1995 Analisi statistica dei dati testuali, Bolasco, S. et al. (eds.), I, 35–42, CISU, Roma
  3. Bolasco, S. (1993): Choix de lemmatisation en vue de reconstructions syntagmatiques du texte par l’analyse des correspondances. Proc. JADT 1993, 399–410, ENST-Telecom, Paris
  4. Bolasco, S. (1994): L’individuazione di forme testuali per lo studio statistico dei testi con tecniche di analisi multidimensionale. Atti della XXXVII Riunione Scientifica della S.I.S., II, 95–103, CISU, Roma
  5. Bortolini, N., Tagliavini, C., Zampolli, A. (1971): Lessico di frequenza della lingua italiana contemporanea. Garzanti, Milano
  6. Dubois, J. et al. (1979): Dizionario di Linguistica. Zanichelli, Bologna
  7. Elia, A. (1995): Per una disambiguazione semi-automatica di sintagmi composti: i dizionari elettronici lessico-grammaticali. In: Ricerca Qualitativa e Computer, Cipriani, R. e Bolasco, S. (eds.), 112–141, Franco Angeli, Milano
  8. Cipriani, R. e Bolasco, S., eds. (1995): Ricerca Qualitativa e Computer. Franco Angeli, Milano
  9. Lavit, Ch. (1988): Analyse conjointe de tableaux quantitatifs. Masson, Paris
  10. Lebart, L. et Salem, A. (1994): Statistique textuelle. Dunod, Paris
  11. Lyne, A. A. (1985): The vocabulary of French business correspondence. Slatkine-Champion, Paris
  12. Salem, A. (1987): Pratique des segments répétés. Essai de statistique textuelle. Klincksieck, Paris
  13. Wegman, E. J. (1990): Hyperdimensional Data Analysis Using Parallel Coordinates. JASA, 85, 411, 664–675

Copyright information

© Springer Japan 1998

Authors and Affiliations

  • Sergio Bolasco (1)
  1. Faculty of Economy, University of Rome “La Sapienza”, Roma, Italy
