Summary
In text analysis, a wide range of statistical methods has been developed to solve problems such as authorship attribution, dating of texts, information retrieval, and the processing of responses to open questions in surveys. In analyses of this kind, the statistical methods applied have to produce discrimination models. The textual data entities may be chosen by features of form or by characteristics of content.
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lebart, L., Callant, C. (1994). Discriminant Analysis Using Textual Data. In: Diday, E., Lechevallier, Y., Schader, M., Bertrand, P., Burtschy, B. (eds) New Approaches in Classification and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-51175-2_68
DOI: https://doi.org/10.1007/978-3-642-51175-2_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58425-4
Online ISBN: 978-3-642-51175-2
eBook Packages: Springer Book Archive