
On the Sample Coefficient of Nominal Variation

Conference paper

Part of the book series: Springer Proceedings in Mathematics & Statistics ((PROMS,volume 294))

Abstract

Categorical dispersion is commonly measured by the index of qualitative variation (Gini index), but a transformed version of it, the coefficient of nominal variation (CNV), has been recommended as more easily interpretable. We consider the sample version of the CNV and derive its asymptotic distribution for both independent and time series data. The finite-sample performance of this approximation is analyzed in a simulation study. The CNV is also applied to a real-data example.


References

  1. Agresti, A.: Categorical Data Analysis, 2nd edn. Wiley, Hoboken, NJ (2002)

  2. Billingsley, P.: Convergence of Probability Measures, 2nd edn. Wiley, New York (1999)

  3. GESIS—Leibniz Institute for the Social Sciences: ALLBUS 2016 (German General Social Survey 2016). ZA5250 data file (version 2.1.0), GESIS Data Archive, Cologne (2017)

  4. Jacobs, P.A., Lewis, P.A.W.: Stationary discrete autoregressive-moving average time series generated by mixtures. J. Time Ser. Anal. 4(1), 19–36 (1983)

  5. Kvålseth, T.O.: Coefficients of variation for nominal and ordinal categorical data. Percept. Mot. Skills 80(3), 843–847 (1995)

  6. Kvålseth, T.O.: Variation for categorical variables. In: Lovric, M. (ed.) International Encyclopedia of Statistical Science, pp. 1642–1645. Springer, Berlin (2011)

  7. Kvålseth, T.O.: The lambda distribution and its applications to categorical summary measures. Adv. Appl. Stat. 24(2), 83–106 (2011)

  8. Rao, C.R.: Diversity and dissimilarity coefficients: a unified approach. Theor. Popul. Biol. 21(1), 24–43 (1982)

  9. Weiß, C.H.: Serial dependence of NDARMA processes. Comput. Stat. Data Anal. 68, 213–238 (2013)

  10. Weiß, C.H.: An Introduction to Discrete-Valued Time Series. Wiley, Chichester (2018)

  11. Weiß, C.H., Göb, R.: Measuring serial dependence in categorical time series. AStA Adv. Stat. Anal. 92(1), 71–89 (2008)


Author information

Correspondence to Christian H. Weiß.


Appendices

Appendix 1

The NDARMA model was proposed by [4]; following [11], it can be defined as follows:

Let \((X_t)_{\mathbb {Z}}\) and \((\epsilon _t)_{\mathbb {Z}}\) be nominal processes with state space \(\mathcal S\), where \((\epsilon _t)_{\mathbb {Z}}\) is i.i.d. with marginal distribution \({\varvec{p}}\), and where \(\epsilon _t\) is independent of \((X_s)_{s<t}\). Let

$$ (\alpha _{t,1},\ldots ,\alpha _{t,{\text {p}}},\beta _{t,0},\ldots ,\beta _{t,{\text {q}}}) \quad \sim \ {\text {MULT}}(1;\ \phi _1, \ldots ,\phi _{{\text {p}}},\varphi _0,\ldots ,\varphi _{{\text {q}}}) $$

be i.i.d. multinomial random vectors, which are independent of \((\epsilon _t)_{\mathbb {Z}}\) and of \((X_s)_{s<t}\). Then \((X_t)_{\mathbb {Z}}\) is said to be an NDARMA(p, q) process (and the cases \({\text {q}}=0\) and \({\text {p}}=0\) are referred to as a DAR(p) process and DMA(q) process, respectively) if it follows the recursion

$$\begin{aligned} X_t\ =\ \alpha _{t,1}\cdot X_{t-1}+\cdots +\alpha _{t,{\text {p}}}\cdot X_{t-{\text {p}}}\ +\ \beta _{t,0}\cdot \epsilon _{t}+\cdots +\beta _{t,{\text {q}}}\cdot \epsilon _{t-{\text {q}}}. \end{aligned}$$
(7)

(Here, if the state space \(\mathcal S\) is not numerically coded, we assume \(0\cdot s = 0\), \(1\cdot s=s\) and \(s+0=s\) for each \(s\in \mathcal S\).)
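
To make recursion (7) concrete, the following minimal sketch simulates the DAR(1) special case (\({\text {p}}=1\), \({\text {q}}=0\)), where \(X_t=\alpha _{t,1}\cdot X_{t-1}+\beta _{t,0}\cdot \epsilon _t\). The function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def simulate_dar1(n, p, phi1, seed=None):
    """Simulate a DAR(1) process, the NDARMA(1,0) special case of (7):
    alpha_{t,1} = 1 with probability phi_1 (copy the previous value),
    otherwise beta_{t,0} = 1 (use a fresh innovation).

    p : marginal distribution (p_0, ..., p_m) over the states {0, ..., m}
    """
    rng = np.random.default_rng(seed)
    states = np.arange(len(p))
    eps = rng.choice(states, size=n, p=p)  # i.i.d. innovations with distribution p
    x = np.empty(n, dtype=int)
    x[0] = eps[0]
    for t in range(1, n):
        x[t] = x[t - 1] if rng.random() < phi1 else eps[t]
    return x
```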

NDARMA processes have several attractive properties, e.g., \(X_t\) and \(\epsilon _{t}\) have the same stationary marginal distribution: \(P(X_t=s_i)\,=\,p_i\,=\,P(\epsilon _t=s_i)\) for all \(s_i\in \mathcal S\). Their serial dependence structure is characterized by a set of Yule-Walker-type equations for the serial dependence measure Cohen’s \(\kappa \) from (3) [11]:

$$\begin{aligned} \textstyle \kappa (h)\ =\ \sum _{j=1}^{{\text {p}}} \phi _j\, \kappa (|h-j|)\ +\ \sum _{i=0}^{{\text {q}}-h} \varphi _{i+h}\, r(i)\qquad \text {for } h\ge 1, \end{aligned}$$
(8)

where the r(i) satisfy \(r(i)\, =\, \sum _{j=\max {\{0,i-{\text {p}}\}}}^{i-1} \phi _{i-j}\cdot r(j)\, +\, \varphi _i\,\mathbbm {1}(0\le i\le {\text {q}})\). The bivariate distributions at lag h, in turn, are \(p_{i|j}(h)\, =\, p_i\, +\, \kappa (h)\,(\delta _{i,j}-p_i)\). Consequently, by (4), we always have \(\vartheta (h)=\kappa (h)\) for NDARMA processes.
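
For \({\text {p}}=1\), the Yule-Walker-type equations (8) can be evaluated forward, since \(\kappa (h)\) then depends only on \(\kappa (h-1)\) and the r(i); no linear system has to be solved. The sketch below does this for an NDARMA(1, q) process (the multinomial weights \(\phi _1,\varphi _0,\ldots ,\varphi _{{\text {q}}}\) must sum to one); the names are illustrative.

```python
import numpy as np

def ndarma1q_kappa(phi1, varphi, H):
    """Forward evaluation of the Yule-Walker-type equations (8)
    for an NDARMA(1, q) process.

    phi1   : phi_1, the weight on X_{t-1}
    varphi : (varphi_0, ..., varphi_q), the weights on eps_t, ..., eps_{t-q}
    H      : maximal lag
    """
    varphi = np.asarray(varphi, dtype=float)
    q = len(varphi) - 1
    # r(i) = phi_1 * r(i-1) + varphi_i for 0 <= i <= q (the p = 1 case)
    r = np.zeros(q + 1)
    for i in range(q + 1):
        r[i] = (phi1 * r[i - 1] if i > 0 else 0.0) + varphi[i]
    kappa = np.zeros(H + 1)
    kappa[0] = 1.0  # kappa(0) = 1
    for h in range(1, H + 1):
        tail = sum(varphi[i + h] * r[i] for i in range(q - h + 1))
        kappa[h] = phi1 * kappa[h - 1] + tail
    return kappa
```

For a DAR(1) process (varphi = [1.0]), this reproduces \(\kappa (h)=\phi _1^h\).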

Finally, [9] showed that an NDARMA process is \(\phi \)-mixing with exponentially decreasing weights such that the CLT on p. 200 in [2] is applicable.

Appendix 2

From Sect. 2, we know that \(\widehat{\text {IQV}}\) is asymptotically normally distributed. Furthermore, with \(f(y)=1-\sqrt{1-y}\), it holds that \(f(\text {IQV})=\text {CNV}\). Since the derivative of f equals \(f'(y)=1/\big (2\,\sqrt{1-y}\big )=1/\big (2\,(1-f(y))\big )\), the Delta method immediately implies that the asymptotic variance of \(\widehat{\text {CNV}}\) equals \(\sigma _{\text {CNV}}^2/n\) with

$$\begin{aligned} \sigma _{\text {CNV}}^2\ =\ \frac{\sigma _{\text {IQV}}^2}{4\,(1-\text {IQV})} \ \overset{(6)}{=}\ c_\vartheta \, \left( \tfrac{m+1}{m}\right) ^2\,\frac{s_3({\varvec{p}}) - s_2^2({\varvec{p}})}{(1-\text {CNV})^2}. \end{aligned}$$
(9)

Note that (9) also implies the relation \(\sigma _{\text {IQV}}^2\,=\,4\,(1-\text {CNV})^2\,\sigma _{\text {CNV}}^2\).
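
As an illustration, a plug-in evaluation of (9) for i.i.d. data might look as follows. Two assumptions about notation from the main text (not visible in this excerpt) are made: \(s_k({\varvec{p}})=\sum _i p_i^k\), and the dependence factor \(c_\vartheta \) from (6) equals 1 in the i.i.d. case.

```python
import numpy as np

def cnv_and_se(x, num_states=None):
    """CNV point estimate with an asymptotic standard error via (9),
    for i.i.d. data coded as integers 0, ..., m (c_theta taken as 1)."""
    x = np.asarray(x)
    n = len(x)
    k = num_states if num_states is not None else int(x.max()) + 1  # k = m + 1
    m = k - 1
    p_hat = np.bincount(x, minlength=k) / n
    s2 = float(np.sum(p_hat ** 2))
    s3 = float(np.sum(p_hat ** 3))
    iqv = (k / m) * (1.0 - s2)                 # sample IQV (Gini index)
    cnv = 1.0 - np.sqrt(max(1.0 - iqv, 0.0))   # CNV = 1 - sqrt(1 - IQV)
    sigma2_cnv = (k / m) ** 2 * (s3 - s2 ** 2) / (1.0 - cnv) ** 2
    return cnv, np.sqrt(sigma2_cnv / n)
```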

To derive an approximate bias correction for \(\widehat{\text {CNV}}\), we have to go back to the asymptotic result (5) for \(\sqrt{n}\,\big (\hat{{\varvec{p}}}-{\varvec{p}}\big )\). Define \(g({\varvec{x}})=\frac{m+1}{m}\,(1-\sum _{i=0}^{m} x_i^2)\) such that \(g({\varvec{p}}) = \text {IQV}\) and \(f\big (g({\varvec{p}})\big ) = \text {CNV}\). Our approach is to consider the second-order Taylor expansion of \(f\big (g({\varvec{x}})\big )\), where \(f\big (g({\varvec{x}})\big )\) satisfies

$$ \begin{aligned} \frac{\partial }{\partial x_k}\,f\big (g({\varvec{x}})\big )\ &=\ f'\big (g({\varvec{x}})\big )\,\frac{\partial }{\partial x_k}\,g({\varvec{x}}),\\ \frac{\partial ^2}{\partial x_k\,\partial x_l}\,f\big (g({\varvec{x}})\big )\ &=\ f''\big (g({\varvec{x}})\big )\,\big (\tfrac{\partial }{\partial x_k}\,g({\varvec{x}})\big )\,\big (\tfrac{\partial }{\partial x_l}\,g({\varvec{x}})\big ) \ +\ f'\big (g({\varvec{x}})\big )\,\frac{\partial ^2}{\partial x_k\,\partial x_l}\,g({\varvec{x}}). \end{aligned} $$

Here,

$$ \textstyle f''(y)=\frac{1/4}{(1-f(y))^3},\quad \tfrac{\partial }{\partial x_k}\,g({\varvec{x}})=-2\,\frac{m+1}{m}\,x_k,\ \tfrac{\partial ^2}{\partial x_k\,\partial x_l}\,g({\varvec{x}}) = -2\,\frac{m+1}{m}\,\delta _{k,l}. $$

Hence, the matrix \(\mathbf{H }\), defined as the Hessian of \(f\big (g({\varvec{x}})\big )\) evaluated at \({\varvec{x}}={\varvec{p}}\), has the entries

$$ h_{k,l} \ =\ \frac{(\frac{m+1}{m})^2\,p_k\,p_l}{(1-\text {CNV})^3} \ -\ \frac{\frac{m+1}{m}\,\delta _{k,l}}{1-\text {CNV}}. $$

Since \(E[\hat{{\varvec{p}}}-{\varvec{p}}]={\varvec{0}}\), the second-order Taylor expansion implies that \(n\,\Big (E\big [f\big (g(\hat{{\varvec{p}}})\big )\big ]-f\big (g({\varvec{p}})\big )\Big )\,\approx \, \frac{1}{2}\,E\big [\sqrt{n}(\hat{{\varvec{p}}}-{\varvec{p}})^\top \,\mathbf{H }\,\sqrt{n}(\hat{{\varvec{p}}}-{\varvec{p}})\big ]\). Thus, using (5), a bias correction for \(n\,\widehat{\text {CNV}}\) is computed as

$$ \textstyle \tfrac{1}{2}\sum \limits _{k=0}^{m} h_{k,k}\,\sigma _{k,k} \ +\ \sum \limits _{k=0}^{m-1} \sum \limits _{l=k+1}^{m} h_{k,l}\,\sigma _{k,l} \ =\ \tfrac{1}{2}\sum \limits _{k,l=0}^m\frac{(\frac{m+1}{m})^2\,p_k\,p_l}{(1-\text {CNV})^3}\,\sigma _{k,l} \ -\ \tfrac{1}{2}\sum \limits _{k=0}^{m} \frac{\frac{m+1}{m}}{1-\text {CNV}}\,\sigma _{k,k}. $$

Using (6) and the fact that \(\sum _{k=0}^{m} \sigma _{k,k} = c_\kappa \left( 1-s_2({\varvec{p}})\right) \) by (5), this expression becomes

$$ \tfrac{1}{8}\,\frac{\sigma _{\text {IQV}}^2}{(1-\text {CNV})^3} \ -\ \tfrac{1}{2}\,\frac{c_\kappa \,\text {IQV}}{1-\text {CNV}} \ \overset{(9)}{=}\ \tfrac{1}{2}\,\frac{\sigma _{\text {CNV}}^2}{1-\text {CNV}} \ -\ \tfrac{1}{2}\,\frac{c_\kappa \,\text {CNV}\,(2-\text {CNV})}{1-\text {CNV}}. $$

This completes the proof of Theorem 1.
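
For completeness, the bias expression just derived can be evaluated in plug-in form as sketched below; \(c_\kappa \) is the dependence factor from (5), and taking \(c_\kappa =1\) for i.i.d. data is an assumption about the main text's notation. A bias-corrected estimate is then \(\widehat{\text {CNV}}\) minus this value divided by n.

```python
def cnv_bias(cnv, sigma2_cnv, c_kappa=1.0):
    """Approximate bias of n * CNV-hat, i.e. the final display above:
    0.5 * sigma2_cnv / (1 - cnv) - 0.5 * c_kappa * cnv * (2 - cnv) / (1 - cnv)."""
    return 0.5 * (sigma2_cnv - c_kappa * cnv * (2.0 - cnv)) / (1.0 - cnv)
```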


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Weiß, C.H. (2019). On the Sample Coefficient of Nominal Variation. In: Steland, A., Rafajłowicz, E., Okhrin, O. (eds) Stochastic Models, Statistics and Their Applications. SMSA 2019. Springer Proceedings in Mathematics & Statistics, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-030-28665-1_18
