A performance evaluation of automatic survey classifiers

  • Peter Viechnicki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1433)


A novel NLP task, automatic survey coding, is described, and two methods for performing this task are presented. The first method uses a Boolean pattern-matching strategy to code survey responses, while the second uses a vector-based (probabilistic) method. The performance of the two methods is tested and compared on three representative survey datasets. The Boolean method is shown to perform slightly better on average than the vector-based method. Linguistic factors affecting the difficulty of the coding task for each survey are discussed.
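The abstract names the two coding strategies without describing them in detail. What follows is a minimal Python sketch of each, built on assumptions that do not come from the paper. The category labels, keyword rules, and training responses are hypothetical. The Boolean coder assumes "any of / none of" stem sets matched as token prefixes. The vector coder uses raw term counts and cosine similarity against per-category vectors. That is a standard vector-space approach, not the paper's specific probabilistic weighting, which the abstract does not spell out.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL categories and rules for illustration only; not from the paper.
# A category fires if ANY of its "any" stems and NONE of its "none" stems
# occur as a prefix of some token. Rules are tried in order; first match wins.
BOOLEAN_RULES = {
    "TEACHER": {"any": {"teach"}, "none": {"assistant"}},
    "NURSE": {"any": {"nurs"}, "none": set()},
    "DRIVER": {"any": {"driv"}, "none": set()},
}

# HYPOTHETICAL hand-coded training responses for the vector-based coder.
TRAINING = {
    "TEACHER": ["I teach third grade", "high school teacher"],
    "NURSE": ["registered nurse at a hospital", "nursing home aide"],
    "DRIVER": ["I drive a truck", "bus driver for the city"],
}


def tokenize(text):
    """Lowercase and split on runs of non-letters (a deliberately crude tokenizer)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def has_stem(stem, tokens):
    """True if any token begins with the given stem."""
    return any(tok.startswith(stem) for tok in tokens)


def boolean_code(response, rules=BOOLEAN_RULES):
    """Return the first category whose Boolean rule matches, or None."""
    tokens = tokenize(response)
    for category, rule in rules.items():
        if any(has_stem(s, tokens) for s in rule["any"]) and not any(
            has_stem(s, tokens) for s in rule["none"]
        ):
            return category
    return None


def build_category_vectors(training):
    """Sum term counts over each category's training responses."""
    vectors = {}
    for category, responses in training.items():
        vec = Counter()
        for r in responses:
            vec.update(tokenize(r))
        vectors[category] = vec
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


CATEGORY_VECTORS = build_category_vectors(TRAINING)


def vector_code(response, category_vectors=CATEGORY_VECTORS):
    """Return the most similar category, or None if no term overlaps."""
    query = Counter(tokenize(response))
    scores = {c: cosine(query, v) for c, v in category_vectors.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(boolean_code("I teach 5th grade science"))  # TEACHER
print(boolean_code("teaching assistant"))         # None
print(vector_code("I drive a bus"))               # DRIVER
```

In this sketch both coders return None when nothing matches. That is one plausible way to route a response to a human coder. Because the first matching rule wins, the order of `BOOLEAN_RULES` matters. The vector coder, by contrast, always ranks every category and abstains only when the response shares no terms with any of them.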

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Peter Viechnicki, Department of Linguistics, The University of Chicago, USA