Abstract
Categorical dispersion is commonly measured by the index of qualitative variation (Gini index), but a transformed version of it, the coefficient of nominal variation (CNV), is recommended as a more interpretable alternative. We consider the sample version of the CNV and derive its asymptotic distribution for both independent and time-series data. The finite-sample performance of this approximation is analyzed in a simulation study. The CNV is also applied to a real-data example.
References
Agresti, A.: Categorical Data Analysis, 2nd edn. Wiley, Hoboken, NJ (2002)
Billingsley, P.: Convergence of Probability Measures, 2nd edn. Wiley, New York (1999)
GESIS—Leibniz Institute for the Social Sciences. ALLBUS 2016 (German General Social Survey 2016). ZA5250 data file (version 2.1.0), GESIS Data Archive, Cologne (2017)
Jacobs, P.A., Lewis, P.A.W.: Stationary discrete autoregressive-moving average time series generated by mixtures. J. Time Ser. Anal. 4(1), 19–36 (1983)
Kvålseth, T.O.: Coefficients of variation for nominal and ordinal categorical data. Percept. Mot. Skills 80(3), 843–847 (1995)
Kvålseth, T.O.: Variation for categorical variables. In: Lovric, M. (ed.) International Encyclopedia of Statistical Science, pp. 1642–1645. Springer, Berlin (2011)
Kvålseth, T.O.: The lambda distribution and its applications to categorical summary measures. Adv. Appl. Stat. 24(2), 83–106 (2011)
Rao, C.R.: Diversity and dissimilarity coefficients: a unified approach. Theor. Popul. Biol. 21(1), 24–43 (1982)
Weiß, C.H.: Serial dependence of NDARMA processes. Comput. Stat. Data Anal. 68, 213–238 (2013)
Weiß, C.H.: An Introduction to Discrete-Valued Time Series. Wiley, Chichester (2018)
Weiß, C.H., Göb, R.: Measuring serial dependence in categorical time series. AStA Adv. Stat. Anal. 92(1), 71–89 (2008)
Appendices
Appendix 1
The NDARMA model was proposed by [4]; its definition can be stated as follows [11]:

Let \((X_t)_{\mathbb {Z}}\) and \((\epsilon _t)_{\mathbb {Z}}\) be nominal processes with state space \(\mathcal S\), where \((\epsilon _t)_{\mathbb {Z}}\) is i.i.d. with marginal distribution \({\varvec{p}}\), and where \(\epsilon _t\) is independent of \((X_s)_{s<t}\). Let

\((\alpha _{t,1},\ldots ,\alpha _{t,p},\ \beta _{t,0},\ldots ,\beta _{t,q})\ \sim \ \text {MULT}(1;\ \phi _1,\ldots ,\phi _p,\ \varphi _0,\ldots ,\varphi _q)\)

be i.i.d. multinomial random vectors, which are independent of \((\epsilon _t)_{\mathbb {Z}}\) and of \((X_s)_{s<t}\). Then \((X_t)_{\mathbb {Z}}\) is said to be an NDARMA(p, q) process (the cases \(q=0\) and \(p=0\) are referred to as a DAR(p) process and a DMA(q) process, respectively) if it follows the recursion

\(X_t\ =\ \alpha _{t,1}\cdot X_{t-1}\ +\ \cdots \ +\ \alpha _{t,p}\cdot X_{t-p}\ +\ \beta _{t,0}\cdot \epsilon _t\ +\ \cdots \ +\ \beta _{t,q}\cdot \epsilon _{t-q}.\)

(Here, if the state space \(\mathcal S\) is not numerically coded, we use the conventions \(0\cdot s = 0\), \(1\cdot s=s\) and \(s+0=s\) for each \(s\in \mathcal S\).)
NDARMA processes have several attractive properties; e.g., \(X_t\) and \(\epsilon _{t}\) have the same stationary marginal distribution: \(P(X_t=s_i)\,=\,p_i\,=\,P(\epsilon _t=s_i)\) for all \(s_i\in \mathcal S\). Their serial dependence structure is characterized by a set of Yule-Walker-type equations for the serial dependence measure Cohen's \(\kappa \) from (3) [11]:

\(\kappa (h)\ =\ \sum _{j=1}^{\min \{p,h\}} \phi _j\,\kappa (h-j)\ +\ \sum _{i=h}^{q} \varphi _i\, r(i-h) \quad \text {for } h\ge 1,\)

where the r(i) satisfy \(r(i)\, =\, \sum _{j=\max {\{0,i-p\}}}^{i-1} \phi _{i-j}\cdot r(j)\, +\, \varphi _i\,\mathbbm {1}(0\le i\le q)\). The bivariate distributions at lag h, in turn, are \(p_{i|j}(h)\, =\, p_i\, +\, \kappa (h)\,(\delta _{i,j}-p_i)\). Consequently, see (4), we always have \(\vartheta (h)=\kappa (h)\) for NDARMA processes.
Finally, [9] showed that an NDARMA process is \(\phi \)-mixing with exponentially decreasing weights such that the CLT on p. 200 in [2] is applicable.
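As a numerical illustration, the DAR(1) special case is easy to simulate, and the Yule-Walker equations then reduce to \(\kappa (h)=\phi _1^h\). The minimal sketch below (state labels, parameter values, and function names are ours, not from the text) checks this empirically, using the usual nominal sample version of Cohen's \(\kappa \), \(\hat{\kappa }(h) = \big (\hat{P}(X_t=X_{t-h}) - \sum _i \hat{p}_i^2\big )\big /\big (1-\sum _i \hat{p}_i^2\big )\).

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_dar1(n, p, phi1, burn_in=500):
    """DAR(1): X_t = X_{t-1} with probability phi1, otherwise X_t = eps_t,
    where eps_t is i.i.d. with marginal distribution p."""
    N = n + burn_in
    eps = rng.choice(len(p), p=p, size=N)   # i.i.d. innovations eps_t
    alpha = rng.random(N) < phi1            # alpha_{t,1} ~ Bernoulli(phi1)
    x = np.empty(N, dtype=int)
    x[0] = eps[0]
    for t in range(1, N):
        x[t] = x[t - 1] if alpha[t] else eps[t]
    return x[burn_in:]                      # drop burn-in to approximate stationarity

def kappa_hat(x, h):
    """Sample Cohen's kappa at lag h for a nominal series coded 0..m."""
    p_hat = np.bincount(x) / len(x)
    s2 = np.sum(p_hat ** 2)
    match = np.mean(x[h:] == x[:-h])        # estimates P(X_t = X_{t-h})
    return (match - s2) / (1 - s2)

p = np.array([0.5, 0.3, 0.2])
phi1 = 0.4
x = simulate_dar1(100_000, p, phi1)
# Yule-Walker for DAR(1): kappa(h) = phi1**h
print(kappa_hat(x, 1), kappa_hat(x, 2))  # close to 0.4 and 0.16
```

The burn-in discards the arbitrary initialization; the marginal frequencies of the simulated series also match \({\varvec{p}}\), illustrating the stationary-marginal property stated above.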
Appendix 2
From Sect. 2, we know that \(\widehat{\text {IQV}}\) is asymptotically normally distributed. Furthermore, with \(f(y)=1-\sqrt{1-y}\), it holds that \(f(\text {IQV})=\text {CNV}\). Since the derivative of f equals \(f'(y)=1/\big (2\,\sqrt{1-y}\big )=1/\big (2\,(1-f(y))\big )\), the Delta method immediately implies that the asymptotic variance of \(\widehat{\text {CNV}}\) equals \(\sigma _{\text {CNV}}^2/n\) with

\(\sigma _{\text {CNV}}^2\ =\ \big (f'(\text {IQV})\big )^2\,\sigma _{\text {IQV}}^2\ =\ \frac{\sigma _{\text {IQV}}^2}{4\,(1-\text {IQV})}\ =\ \frac{\sigma _{\text {IQV}}^2}{4\,(1-\text {CNV})^2}. \qquad (9)\)
Note that (9) also implies the relation \(\sigma _{\text {IQV}}^2\,=\,4\,(1-\text {CNV})^2\,\sigma _{\text {CNV}}^2\).
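For i.i.d. categorical data, this delta-method variance can be checked by simulation. The sketch below assumes the multinomial covariance \(\text {diag}({\varvec{p}})-{\varvec{p}}{\varvec{p}}^\top \) of \(\sqrt{n}(\hat{{\varvec{p}}}-{\varvec{p}})\), under which \(\sigma _{\text {IQV}}^2 = 4\,\big (\tfrac{m+1}{m}\big )^2\big (s_3-s_2^2\big )\) with \(s_j=\sum _i p_i^j\); the probability vector and sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def iqv(p):
    """Index of qualitative variation: (m+1)/m * (1 - sum p_i^2)."""
    m = len(p) - 1
    return (m + 1) / m * (1 - np.sum(p ** 2))

def cnv(p):
    """Coefficient of nominal variation: CNV = 1 - sqrt(1 - IQV)."""
    return 1 - np.sqrt(1 - iqv(p))

p = np.array([0.45, 0.25, 0.2, 0.1])
m = len(p) - 1
s2, s3 = np.sum(p ** 2), np.sum(p ** 3)

# i.i.d. asymptotic variance of IQV-hat (delta method, multinomial covariance)
sigma2_iqv = ((m + 1) / m) ** 2 * 4 * (s3 - s2 ** 2)
# relation (9): sigma2_iqv = 4 * (1 - CNV)^2 * sigma2_cnv
sigma2_cnv = sigma2_iqv / (4 * (1 - cnv(p)) ** 2)

# Monte Carlo check of n * Var(CNV-hat)
n, reps = 2000, 5000
samples = rng.multinomial(n, p, size=reps) / n
cnv_hats = np.array([cnv(q) for q in samples])
mc_var = n * np.var(cnv_hats)
print(sigma2_cnv, mc_var)  # the two values should be close
```

Since \(\text {IQV}\le 1\) holds for every probability vector, the square root in \(f\) is always well defined, so the Monte Carlo replications never leave the domain of \(f\).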
To derive an approximate bias correction for \(\widehat{\text {CNV}}\), we have to go back to the asymptotic result (5) for \(\sqrt{n}\,\big (\hat{{\varvec{p}}}-{\varvec{p}}\big )\). Define \(g({\varvec{x}})=\frac{m+1}{m}\,(1-\sum _{i=0}^{m} x_i^2)\) such that \(g({\varvec{p}}) = \text {IQV}\) and \(f\big (g({\varvec{p}})\big ) = \text {CNV}\). Our approach is to consider the second-order Taylor expansion of \(f\big (g({\varvec{x}})\big )\) around \({\varvec{x}}={\varvec{p}}\), i.e.,

\(f\big (g({\varvec{x}})\big )\ \approx \ f\big (g({\varvec{p}})\big )\ +\ \sum _{i=0}^{m} \frac{\partial f(g({\varvec{p}}))}{\partial x_i}\,(x_i-p_i)\ +\ \frac{1}{2}\,\sum _{i,j=0}^{m} \frac{\partial ^2 f(g({\varvec{p}}))}{\partial x_i\,\partial x_j}\,(x_i-p_i)(x_j-p_j).\)

Here,

\(\frac{\partial f(g({\varvec{x}}))}{\partial x_i}\ =\ f'\big (g({\varvec{x}})\big )\cdot \frac{\partial g({\varvec{x}})}{\partial x_i}\ =\ -\frac{m+1}{m}\,\frac{x_i}{1-f(g({\varvec{x}}))}, \qquad \frac{\partial ^2 f(g({\varvec{x}}))}{\partial x_i\,\partial x_j}\ =\ \Big (\frac{m+1}{m}\Big )^2\,\frac{x_i\,x_j}{\big (1-f(g({\varvec{x}}))\big )^3}\ -\ \frac{m+1}{m}\,\frac{\delta _{i,j}}{1-f(g({\varvec{x}}))}.\)

Hence, the matrix \(\mathbf{H }\), defined as the Hessian of \(f\big (g({\varvec{x}})\big )\) evaluated in \({\varvec{x}}={\varvec{p}}\), has the entries

\(h_{i,j}\ =\ \Big (\frac{m+1}{m}\Big )^2\,\frac{p_i\,p_j}{(1-\text {CNV})^3}\ -\ \frac{m+1}{m}\,\frac{\delta _{i,j}}{1-\text {CNV}}.\)
Since \(E[\hat{{\varvec{p}}}-{\varvec{p}}]={\varvec{0}}\), the second-order Taylor expansion implies that \(n\,\Big (E\big [f\big (g(\hat{{\varvec{p}}})\big )\big ]-f\big (g({\varvec{p}})\big )\Big )\,\approx \, \frac{1}{2}\,E\big [\sqrt{n}(\hat{{\varvec{p}}}-{\varvec{p}})^\top \,\mathbf{H }\,\sqrt{n}(\hat{{\varvec{p}}}-{\varvec{p}})\big ]\). Thus, using (5), a bias correction for \(n\,\widehat{\text {CNV}}\) is computed as

\(\frac{1}{2}\,\sum _{i,j=0}^{m} \sigma _{i,j}\,h_{i,j}\ =\ \frac{1}{2}\,\Big (\frac{m+1}{m}\Big )^2\,\frac{\sum _{i,j=0}^{m} p_i\,p_j\,\sigma _{i,j}}{(1-\text {CNV})^3}\ -\ \frac{1}{2}\,\frac{m+1}{m}\,\frac{\sum _{k=0}^{m} \sigma _{k,k}}{1-\text {CNV}}.\)
Using (6) and that \(\sum _{k=0}^{m} \sigma _{k,k} = c_\kappa \left( 1-s_2({\varvec{p}})\right) \) according to (5), this expression becomes

\(\frac{c_\kappa }{2}\,\bigg (\Big (\frac{m+1}{m}\Big )^2\,\frac{s_3({\varvec{p}})-s_2({\varvec{p}})^2}{(1-\text {CNV})^3}\ -\ \frac{m+1}{m}\,\frac{1-s_2({\varvec{p}})}{1-\text {CNV}}\bigg ).\)
So the proof of Theorem 1 is complete.
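The Hessian entering this bias correction can be verified mechanically. The sketch below differentiates \(f(g(\cdot ))\) by central finite differences and compares with the closed form \(h_{i,j}=\big (\tfrac{m+1}{m}\big )^2 p_i p_j/(1-\text {CNV})^3-\tfrac{m+1}{m}\,\delta _{i,j}/(1-\text {CNV})\) that follows from the derivatives of f and g; for the trace term, the i.i.d. covariance \(\text {diag}({\varvec{p}})-{\varvec{p}}{\varvec{p}}^\top \) is assumed, and the probability vector is illustrative.

```python
import numpy as np

def fg(x):
    """CNV as a function of the probability vector x: f(g(x))."""
    m = len(x) - 1
    g = (m + 1) / m * (1 - np.sum(x ** 2))
    return 1 - np.sqrt(1 - g)

def hessian_closed_form(p):
    """h_ij = a^2 p_i p_j / (1-CNV)^3 - a delta_ij / (1-CNV), a = (m+1)/m."""
    m = len(p) - 1
    a = (m + 1) / m
    c = fg(p)  # CNV
    return a ** 2 * np.outer(p, p) / (1 - c) ** 3 - a * np.eye(len(p)) / (1 - c)

def hessian_numeric(p, eps=1e-5):
    """Mixed second partials of fg by central finite differences."""
    k = len(p)
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            e_i, e_j = np.eye(k)[i] * eps, np.eye(k)[j] * eps
            H[i, j] = (fg(p + e_i + e_j) - fg(p + e_i - e_j)
                       - fg(p - e_i + e_j) + fg(p - e_i - e_j)) / (4 * eps ** 2)
    return H

p = np.array([0.5, 0.3, 0.2])
H = hessian_closed_form(p)
print(np.max(np.abs(H - hessian_numeric(p))))  # small (finite-difference error only)

# Second-order bias term for i.i.d. data: n (E[CNV-hat] - CNV) ~ (1/2) tr(H Sigma)
Sigma = np.diag(p) - np.outer(p, p)
bias = 0.5 * np.trace(H @ Sigma)
print(bias)  # negative here: CNV-hat tends to underestimate CNV
```

For time-series data, \(\Sigma \) would instead be the long-run covariance from (5), but the trace computation is unchanged.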
Copyright information
© 2019 Springer Nature Switzerland AG
Weiß, C.H. (2019). On the Sample Coefficient of Nominal Variation. In: Steland, A., Rafajłowicz, E., Okhrin, O. (eds) Stochastic Models, Statistics and Their Applications. SMSA 2019. Springer Proceedings in Mathematics & Statistics, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-030-28665-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28664-4
Online ISBN: 978-3-030-28665-1