
On the Sample Coefficient of Nominal Variation

Conference paper

Part of the book series: Springer Proceedings in Mathematics & Statistics ((PROMS,volume 294))

Abstract

Categorical dispersion is commonly measured by the index of qualitative variation (Gini index), but a transformed version of it, the coefficient of nominal variation (CNV), has been recommended as more easily interpretable. We consider the sample version of the CNV and derive its asymptotic distribution for both independent and time series data. The finite-sample performance of this approximation is analyzed in a simulation study. The CNV is also applied to a real-data example.


References

  1. Agresti, A.: Categorical Data Analysis, 2nd edn. Wiley, Hoboken, NJ (2002)

  2. Billingsley, P.: Convergence of Probability Measures, 2nd edn. Wiley, New York (1999)

  3. GESIS—Leibniz Institute for the Social Sciences: ALLBUS 2016 (German General Social Survey 2016). ZA5250 data file (version 2.1.0), GESIS Data Archive, Cologne (2017)

  4. Jacobs, P.A., Lewis, P.A.W.: Stationary discrete autoregressive-moving average time series generated by mixtures. J. Time Ser. Anal. 4(1), 19–36 (1983)

  5. Kvålseth, T.O.: Coefficients of variation for nominal and ordinal categorical data. Percept. Mot. Skills 80(3), 843–847 (1995)

  6. Kvålseth, T.O.: Variation for categorical variables. In: Lovric, M. (ed.) International Encyclopedia of Statistical Science, pp. 1642–1645. Springer, Berlin (2011)

  7. Kvålseth, T.O.: The lambda distribution and its applications to categorical summary measures. Adv. Appl. Stat. 24(2), 83–106 (2011)

  8. Rao, C.R.: Diversity and dissimilarity coefficients: a unified approach. Theor. Popul. Biol. 21(1), 24–43 (1982)

  9. Weiß, C.H.: Serial dependence of NDARMA processes. Comput. Stat. Data Anal. 68, 213–238 (2013)

  10. Weiß, C.H.: An Introduction to Discrete-Valued Time Series. Wiley, Chichester (2018)

  11. Weiß, C.H., Göb, R.: Measuring serial dependence in categorical time series. AStA Adv. Stat. Anal. 92(1), 71–89 (2008)


Author information

Correspondence to Christian H. Weiß.


Appendices

Appendix 1

The NDARMA model was proposed by [4]; following [11], it can be defined as follows:

Let \((X_t)_{\mathbb {Z}}\) and \((\epsilon _t)_{\mathbb {Z}}\) be nominal processes with state space \(\mathcal S\), where \((\epsilon _t)_{\mathbb {Z}}\) is i.i.d. with marginal distribution \({\varvec{p}}\), and where \(\epsilon _t\) is independent of \((X_s)_{s<t}\). Let

$$ (\alpha _{t,1},\ldots ,\alpha _{t,{\text {p}}},\beta _{t,0},\ldots ,\beta _{t,{\text {q}}}) \quad \sim \ {\text {MULT}}(1;\ \phi _1, \ldots ,\phi _{{\text {p}}},\varphi _0,\ldots ,\varphi _{{\text {q}}}) $$

be i.i.d. multinomial random vectors, which are independent of \((\epsilon _t)_{\mathbb {Z}}\) and of \((X_s)_{s<t}\). Then \((X_t)_{\mathbb {Z}}\) is said to be an NDARMA(p, q) process (and the cases \({\text {q}}=0\) and \({\text {p}}=0\) are referred to as a DAR(p) process and DMA(q) process, respectively) if it follows the recursion

$$\begin{aligned} X_t\ =\ \alpha _{t,1}\cdot X_{t-1}+\cdots +\alpha _{t,{\text {p}}}\cdot X_{t-{\text {p}}}\ +\ \beta _{t,0}\cdot \epsilon _{t}+\cdots +\beta _{t,{\text {q}}}\cdot \epsilon _{t-{\text {q}}}. \end{aligned}$$
(7)

(Here, if the state space \(\mathcal S\) is not numerically coded, we assume \(0\cdot s = 0\), \(1\cdot s=s\) and \(s+0=s\) for each \(s\in \mathcal S\).)
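
To make recursion (7) concrete, the following minimal sketch simulates the DAR(1) special case (\({\text {p}}=1\), \({\text {q}}=0\)), where \(X_t=\alpha _{t,1}\cdot X_{t-1}+\beta _{t,0}\cdot \epsilon _t\). The function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def simulate_dar1(n, p, phi1, seed=None):
    """Simulate a DAR(1) process, the NDARMA(1,0) special case of (7):
    alpha_{t,1} = 1 with probability phi_1 (copy the previous value),
    otherwise beta_{t,0} = 1 (use a fresh innovation).

    p : marginal distribution (p_0, ..., p_m) over the states {0, ..., m}
    """
    rng = np.random.default_rng(seed)
    states = np.arange(len(p))
    eps = rng.choice(states, size=n, p=p)  # i.i.d. innovations with distribution p
    x = np.empty(n, dtype=int)
    x[0] = eps[0]
    for t in range(1, n):
        x[t] = x[t - 1] if rng.random() < phi1 else eps[t]
    return x
```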

NDARMA processes have several attractive properties, e.g., \(X_t\) and \(\epsilon _{t}\) have the same stationary marginal distribution: \(P(X_t=s_i)\,=\,p_i\,=\,P(\epsilon _t=s_i)\) for all \(s_i\in \mathcal S\). Their serial dependence structure is characterized by a set of Yule-Walker-type equations for the serial dependence measure Cohen’s \(\kappa \) from (3) [11]:

$$\begin{aligned} \textstyle \kappa (h)\ =\ \sum _{j=1}^{{\text {p}}} \phi _j\, \kappa (|h-j|)\ +\ \sum _{i=0}^{{\text {q}}-h} \varphi _{i+h}\, r(i)\qquad \text {for } h\ge 1, \end{aligned}$$
(8)

where the r(i) satisfy \(r(i)\, =\, \sum _{j=\max {\{0,i-{\text {p}}\}}}^{i-1} \phi _{i-j}\cdot r(j)\, +\, \varphi _i\,\mathbbm {1}(0\le i\le {\text {q}})\). The bivariate distributions at lag h, in turn, are \(p_{i|j}(h)\, =\, p_i\, +\, \kappa (h)\,(\delta _{i,j}-p_i)\). Consequently, by (4), we always have \(\vartheta (h)=\kappa (h)\) for NDARMA processes.
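
For \({\text {p}}=1\), the Yule-Walker-type equations (8) can be evaluated forward, since \(\kappa (h)\) then depends only on \(\kappa (h-1)\) and the r(i); no linear system has to be solved. The sketch below does this for an NDARMA(1, q) process (the multinomial weights \(\phi _1,\varphi _0,\ldots ,\varphi _{{\text {q}}}\) must sum to one); the names are illustrative.

```python
import numpy as np

def ndarma1q_kappa(phi1, varphi, H):
    """Forward evaluation of the Yule-Walker-type equations (8)
    for an NDARMA(1, q) process.

    phi1   : phi_1, the weight on X_{t-1}
    varphi : (varphi_0, ..., varphi_q), the weights on eps_t, ..., eps_{t-q}
    H      : maximal lag
    """
    varphi = np.asarray(varphi, dtype=float)
    q = len(varphi) - 1
    # r(i) = phi_1 * r(i-1) + varphi_i for 0 <= i <= q (the p = 1 case)
    r = np.zeros(q + 1)
    for i in range(q + 1):
        r[i] = (phi1 * r[i - 1] if i > 0 else 0.0) + varphi[i]
    kappa = np.zeros(H + 1)
    kappa[0] = 1.0  # kappa(0) = 1
    for h in range(1, H + 1):
        tail = sum(varphi[i + h] * r[i] for i in range(q - h + 1))
        kappa[h] = phi1 * kappa[h - 1] + tail
    return kappa
```

For a DAR(1) process (varphi = [1.0]), this reproduces \(\kappa (h)=\phi _1^h\).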

Finally, [9] showed that an NDARMA process is \(\phi \)-mixing with exponentially decreasing weights such that the CLT on p. 200 in [2] is applicable.

Appendix 2

From Sect. 2, we know that \(\widehat{\text {IQV}}\) is asymptotically normally distributed. Furthermore, with \(f(y)=1-\sqrt{1-y}\), it holds that \(f(\text {IQV})=\text {CNV}\). Since the derivative of f equals \(f'(y)=1/\big (2\,\sqrt{1-y}\big )=1/\big (2\,(1-f(y))\big )\), the Delta method immediately implies that the asymptotic variance of \(\widehat{\text {CNV}}\) equals \(\sigma _{\text {CNV}}^2/n\) with

$$\begin{aligned} \sigma _{\text {CNV}}^2\ =\ \frac{\sigma _{\text {IQV}}^2}{4\,(1-\text {IQV})} \ \overset{(6)}{=}\ c_\vartheta \, \left( \tfrac{m+1}{m}\right) ^2\,\frac{s_3({\varvec{p}}) - s_2^2({\varvec{p}})}{(1-\text {CNV})^2}. \end{aligned}$$
(9)

Note that (9) also implies the relation \(\sigma _{\text {IQV}}^2\,=\,4\,(1-\text {CNV})^2\,\sigma _{\text {CNV}}^2\).
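
As an illustration, a plug-in evaluation of (9) for i.i.d. data might look as follows. Two assumptions about notation from the main text (not visible in this excerpt) are made: \(s_k({\varvec{p}})=\sum _i p_i^k\), and the dependence factor \(c_\vartheta \) from (6) equals 1 in the i.i.d. case.

```python
import numpy as np

def cnv_and_se(x, num_states=None):
    """CNV point estimate with an asymptotic standard error via (9),
    for i.i.d. data coded as integers 0, ..., m (c_theta taken as 1)."""
    x = np.asarray(x)
    n = len(x)
    k = num_states if num_states is not None else int(x.max()) + 1  # k = m + 1
    m = k - 1
    p_hat = np.bincount(x, minlength=k) / n
    s2 = float(np.sum(p_hat ** 2))
    s3 = float(np.sum(p_hat ** 3))
    iqv = (k / m) * (1.0 - s2)                 # sample IQV (Gini index)
    cnv = 1.0 - np.sqrt(max(1.0 - iqv, 0.0))   # CNV = 1 - sqrt(1 - IQV)
    sigma2_cnv = (k / m) ** 2 * (s3 - s2 ** 2) / (1.0 - cnv) ** 2
    return cnv, np.sqrt(sigma2_cnv / n)
```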

To derive an approximate bias correction for \(\widehat{\text {CNV}}\), we have to go back to the asymptotic result (5) for \(\sqrt{n}\,\big (\hat{{\varvec{p}}}-{\varvec{p}}\big )\). Define \(g({\varvec{x}})=\frac{m+1}{m}\,(1-\sum _{i=0}^{m} x_i^2)\) such that \(g({\varvec{p}}) = \text {IQV}\) and \(f\big (g({\varvec{p}})\big ) = \text {CNV}\). Our approach is to consider the second-order Taylor expansion of \(f\big (g({\varvec{x}})\big )\), where \(f\big (g({\varvec{x}})\big )\) satisfies

$$ \begin{aligned} \frac{\partial }{\partial x_k}\,f\big (g({\varvec{x}})\big )\ &=\ f'\big (g({\varvec{x}})\big )\,\frac{\partial }{\partial x_k}\,g({\varvec{x}}),\\ \frac{\partial ^2}{\partial x_k\,\partial x_l}\,f\big (g({\varvec{x}})\big )\ &=\ f''\big (g({\varvec{x}})\big )\,\big (\tfrac{\partial }{\partial x_k}\,g({\varvec{x}})\big )\,\big (\tfrac{\partial }{\partial x_l}\,g({\varvec{x}})\big ) \ +\ f'\big (g({\varvec{x}})\big )\,\frac{\partial ^2}{\partial x_k\,\partial x_l}\,g({\varvec{x}}). \end{aligned} $$

Here,

$$ \textstyle f''(y)=\frac{1/4}{(1-f(y))^3},\quad \tfrac{\partial }{\partial x_k}\,g({\varvec{x}})=-2\,\frac{m+1}{m}\,x_k,\ \tfrac{\partial ^2}{\partial x_k\,\partial x_l}\,g({\varvec{x}}) = -2\,\frac{m+1}{m}\,\delta _{k,l}. $$

Hence, the matrix \(\mathbf{H }\), defined as the Hessian of \(f\big (g({\varvec{x}})\big )\) evaluated at \({\varvec{x}}={\varvec{p}}\), has the entries

$$ h_{k,l} \ =\ \frac{(\frac{m+1}{m})^2\,p_k\,p_l}{(1-\text {CNV})^3} \ -\ \frac{\frac{m+1}{m}\,\delta _{k,l}}{1-\text {CNV}}. $$

Since \(E[\hat{{\varvec{p}}}-{\varvec{p}}]={\varvec{0}}\), the second-order Taylor expansion implies that \(n\,\Big (E\big [f\big (g(\hat{{\varvec{p}}})\big )\big ]-f\big (g({\varvec{p}})\big )\Big )\,\approx \, \frac{1}{2}\,E\big [\sqrt{n}(\hat{{\varvec{p}}}-{\varvec{p}})^\top \,\mathbf{H }\,\sqrt{n}(\hat{{\varvec{p}}}-{\varvec{p}})\big ]\). Thus, using (5), a bias correction for \(n\,\widehat{\text {CNV}}\) is computed as

$$ \textstyle \tfrac{1}{2}\sum \limits _{k=0}^{m} h_{k,k}\,\sigma _{k,k} \ +\ \sum \limits _{k=0}^{m-1} \sum \limits _{l=k+1}^{m} h_{k,l}\,\sigma _{k,l} \ =\ \tfrac{1}{2}\sum \limits _{k,l=0}^m\frac{(\frac{m+1}{m})^2\,p_k\,p_l}{(1-\text {CNV})^3}\,\sigma _{k,l} \ -\ \tfrac{1}{2}\sum \limits _{k=0}^{m} \frac{\frac{m+1}{m}}{1-\text {CNV}}\,\sigma _{k,k}. $$

Using (6) and the fact that \(\sum _{k=0}^{m} \sigma _{k,k} = c_\kappa \left( 1-s_2({\varvec{p}})\right) \) by (5), this expression becomes

$$ \tfrac{1}{8}\,\frac{\sigma _{\text {IQV}}^2}{(1-\text {CNV})^3} \ -\ \tfrac{1}{2}\,\frac{c_\kappa \,\text {IQV}}{1-\text {CNV}} \ \overset{(9)}{=}\ \tfrac{1}{2}\,\frac{\sigma _{\text {CNV}}^2}{1-\text {CNV}} \ -\ \tfrac{1}{2}\,\frac{c_\kappa \,\text {CNV}\,(2-\text {CNV})}{1-\text {CNV}}. $$

This completes the proof of Theorem 1.
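
For completeness, the bias expression just derived can be evaluated in plug-in form as sketched below; \(c_\kappa \) is the dependence factor from (5), and taking \(c_\kappa =1\) for i.i.d. data is an assumption about the main text's notation. A bias-corrected estimate is then \(\widehat{\text {CNV}}\) minus this value divided by n.

```python
def cnv_bias(cnv, sigma2_cnv, c_kappa=1.0):
    """Approximate bias of n * CNV-hat, i.e. the final display above:
    0.5 * sigma2_cnv / (1 - cnv) - 0.5 * c_kappa * cnv * (2 - cnv) / (1 - cnv)."""
    return 0.5 * (sigma2_cnv - c_kappa * cnv * (2.0 - cnv)) / (1.0 - cnv)
```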


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Weiß, C.H. (2019). On the Sample Coefficient of Nominal Variation. In: Steland, A., Rafajłowicz, E., Okhrin, O. (eds) Stochastic Models, Statistics and Their Applications. SMSA 2019. Springer Proceedings in Mathematics & Statistics, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-030-28665-1_18
