Simple Estimation for Categorical Data

Lectures on Categorical Data Analysis

Part of the book series: Springer Texts in Statistics (STS)

Abstract

This chapter summarizes several simple procedures often used in the analysis of categorical data. These include maximum likelihood estimation of parameters of binomial, multinomial, and Poisson distributions and also unbiased estimation with unequal selection probabilities. The Lagrange multiplier method is introduced, and maximum likelihood estimation in general parametric models is considered. In addition to the usual formula for the standard error of an estimated probability, the δ-method is used to derive asymptotic standard errors for estimates of more complex quantities, which are routinely reported in surveys. Standard errors of estimates of fractions based on stratified samples are compared to standard errors obtained from simple random samples.
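As a minimal illustration of two of the procedures named in the abstract, the sketch below computes the maximum likelihood estimate of a binomial probability with its usual standard error, and then applies the δ-method to the log odds. The function names and the choice of the log odds as the "more complex quantity" are assumptions made for this example, not the chapter's own notation or examples.

```python
import math
from typing import Tuple


def binomial_mle(x: int, n: int) -> Tuple[float, float]:
    """ML estimate x/n of a binomial probability and its estimated standard error."""
    p_hat = x / n
    # Usual plug-in standard error: sqrt(p(1 - p) / n) evaluated at p_hat.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se


def log_odds_delta(x: int, n: int) -> Tuple[float, float]:
    """Log odds log(p / (1 - p)) and its delta-method standard error.

    Assumes 0 < x < n so that the log odds is finite.
    """
    p_hat, se = binomial_mle(x, n)
    estimate = math.log(p_hat / (1 - p_hat))
    # Delta method: SE(g(p_hat)) is approximately |g'(p_hat)| * SE(p_hat),
    # and the derivative of log(p / (1 - p)) is 1 / (p (1 - p)).
    se_log_odds = se / (p_hat * (1 - p_hat))
    return estimate, se_log_odds


if __name__ == "__main__":
    print(binomial_mle(40, 100))
    print(log_odds_delta(40, 100))
```

For the log odds, the δ-method standard error simplifies to 1/sqrt(n p (1 − p)) evaluated at the estimate, which gives a quick check on the code.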


Notes

  1. In a categorical setup, the likelihood of the sample is the same as its probability. The word likelihood is also used to refer to situations where all observations have zero probability (as with continuous random variables) but different likelihoods (as, e.g., with a normal distribution).

  2. In a more general setting, the density.

  3. For the time being, it is assumed that this is possible, that is, p > 0. Section 4.1.1 gives a more detailed discussion that includes the case of p = 0 and that also applies here.

  4. Later on, maximum likelihood estimates under statistical models defined in terms of restrictions on p will be determined. In those cases, the additional restrictions implied by the model need to be imposed, too.

  5. Avoiding self-selection of the respondents and reducing the effects of nonresponse and other kinds of missing data are major problems of survey methodology. Also, many published statistical analyses of survey data disregard the peculiarities of the sampling procedure and proceed as if the sampling distribution were multinomial or Poisson. In reality, most nationwide surveys use complex sampling procedures that often include stratification and multistage selection. The goal of applying these procedures may be to reduce the data collection cost per respondent or to incorporate information about the population to reduce the standard deviations of estimates. There are many good books on survey sampling; (41) and (54) are recommended in particular.

  6. Note that some authors distinguish between standard deviation, which is a parameter associated with a random variable, and standard error, which is the same parameter associated with a quantity determined from a sample. Such a strict distinction is not made in this book.

  7. Care should be taken not to interpret the margin of error as if it were an absolute bound on the magnitude of the possible error of the estimate.

  8. More precisely, the margin of error addresses only the size of the so-called sampling error, that is, the difference between the estimate from the survey and the value that would be obtained if the methods of the survey were used to carry out a census. In a census, data are collected from the entire population; no sampling occurs. In most cases, however, even the value obtained from the census may be different from the true value. For example, respondents may not remember or may not want to tell the truth, or some other kind of measurement error may occur. The difference between the census value and the true value is called the nonsampling error of the survey.

  9. In many countries, a party has to receive at least 5% of the votes that were actually cast to get into the parliament. For this, and other reasons, the number of seats in the parliament may not be a linear function of the fraction of votes received.
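The caution in the notes that the margin of error is not an absolute bound can be checked numerically. The sketch below assumes a simple binomial sample, a known true probability p, and the conventional multiplier 1.96; none of these choices, nor the function name, are taken from the chapter. It sums the exact binomial probabilities of all outcomes whose estimate x/n misses p by more than the margin of error.

```python
import math


def prob_error_exceeds_margin(n: int, p: float, z: float = 1.96) -> float:
    """Exact probability that |x/n - p| exceeds z * sqrt(p(1 - p)/n).

    Assumes X ~ Binomial(n, p) with 0 < p < 1.
    """
    margin = z * math.sqrt(p * (1 - p) / n)
    total = 0.0
    for x in range(n + 1):
        if abs(x / n - p) > margin:
            # Binomial probability computed on the log scale for numerical stability.
            log_pmf = (
                math.lgamma(n + 1)
                - math.lgamma(x + 1)
                - math.lgamma(n - x + 1)
                + x * math.log(p)
                + (n - x) * math.log(1 - p)
            )
            total += math.exp(log_pmf)
    return total


if __name__ == "__main__":
    # With n = 1000 and p = 0.5 the result is roughly 5%, not zero.
    print(prob_error_exceeds_margin(1000, 0.5))
```

The probability is small but clearly positive, so errors larger than the margin do occur. This covers sampling error only; nonsampling error is outside the calculation.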

References

  1. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York (1982)

  2. Hansen, M.H., Hurwitz, W.N., Madow, W.G.: Sample Survey Methods and Theory, Volumes I and II. Wiley, New York (1993)

  3. Kish, L.: Survey Sampling. Wiley, New York (1995)

  4. Lohr, S.L.: Sampling: Design and Analysis. Brooks/Cole, Boston (2009)

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this chapter

Cite this chapter

Rudas, T. (2018). Simple Estimation for Categorical Data. In: Lectures on Categorical Data Analysis. Springer Texts in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-7693-5_4
