A Sparse Latent Class Model for Cognitive Diagnosis


Abstract

Cognitive diagnostic models (CDMs) are latent variable models developed to infer latent skills, knowledge, or personalities that underlie responses to educational, psychological, and social science tests and measures. Recent research focused on theory and methods for using sparse latent class models (SLCMs) in an exploratory fashion to infer the latent processes and structure underlying responses. We report new theoretical results about sufficient conditions for generic identifiability of SLCM parameters. An important contribution for practice is that our new generic identifiability conditions are more likely to be satisfied in empirical applications than existing conditions that ensure strict identifiability. Learning the underlying latent structure can be formulated as a variable selection problem. We develop a new Bayesian variable selection algorithm that explicitly enforces generic identifiability conditions and monotonicity of item response functions to ensure valid posterior inference. We present Monte Carlo simulation results to support accurate inferences and discuss the implications of our findings for future SLCM research and educational testing.


Notes

  1. https://openpsychometrics.org/_rawdata/.

References

  • Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669–679.

  • Allman, E. S., Matias, C., & Rhodes, J. A. (2009). Identifiability of parameters in latent structure models with many observed variables. The Annals of Statistics, 37, 3099–3132.

  • Carreira-Perpiñán, M., & Renals, S. (2000). Practical identifiability of finite mixtures of multivariate Bernoulli distributions. Neural Computation, 12, 141–152.

  • Chen, Y., Culpepper, S. A., Chen, Y., & Douglas, J. (2018). Bayesian estimation of the DINA Q-matrix. Psychometrika, 83, 89–108.

  • Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110(510), 850–866.

  • Chiu, C.-Y., Douglas, J. A., & Li, X. (2009). Cluster analysis for cognitive diagnosis: Theory and applications. Psychometrika, 74(4), 633–665.

  • Cox, D., Little, J., O’Shea, D., & Sweedler, M. (1994). Ideals, varieties, and algorithms. American Mathematical Monthly, 101(6), 582–586.

  • Culpepper, S. A. (2019). Estimating the cognitive diagnosis Q matrix with expert knowledge: Application to the fraction-subtraction dataset. Psychometrika, 84, 333–357.

  • Dang, N. V. (2015). Complex powers of analytic functions and meromorphic renormalization in QFT. arXiv preprint arXiv:1503.00995.

  • von Davier, M. (2005). A general diagnostic model applied to language testing data. ETS Research Report Series, 2005(2).

  • de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199.

  • de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353.

  • DiBello, L. V., Stout, W. F., & Roussos, L. A. (1995). Unified cognitive/psychometric diagnostic assessment likelihood-based classification techniques. In P. D. Nichols, S. F. Chipman, & R. L. Brennan (Eds.), Cognitively diagnostic assessment, Chapter 15 (pp. 361–389). New York: Routledge.

  • Fang, G., Liu, J., & Ying, Z. (2019). On the identifiability of diagnostic classification models. Psychometrika, 84(1), 19–40.

  • Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61(2), 215–231.

  • Gyllenberg, M., Koski, T., Reilink, E., & Verlaan, M. (1994). Non-uniqueness in probabilistic numerical identification of bacteria. Journal of Applied Probability, 31(2), 542–548.

  • Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26(4), 301–321.

  • Hagenaars, J. A. (1993). Loglinear models with latent variables (Vol. 94). Newbury Park, CA: Sage Publications Inc.

  • Hartz, S. M. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality. Ph.D. thesis, University of Illinois at Urbana-Champaign.

  • Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272.

  • Kruskal, J. B. (1976). More factors than subjects, tests and treatments: An indeterminacy theorem for canonical decomposition and individual differences scaling. Psychometrika, 41(3), 281–293.

  • Kruskal, J. B. (1977). Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and Its Applications, 18(2), 95–138.

  • Liu, J., Xu, G., & Ying, Z. (2013). Theory of the self-learning Q-matrix. Bernoulli, 19(5A), 1790–1817.

  • Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187–212.

  • Mityagin, B. (2015). The zero set of a real analytic function. arXiv preprint arXiv:1512.07276.

  • Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford Press.

  • Tatsuoka, C. (2002). Data analytic methods for latent partially ordered classification models. Journal of the Royal Statistical Society, Series C (Applied Statistics), 51(3), 337–350.

  • Tatsuoka, K. K. (1984). Analysis of errors in fraction addition and subtraction problems. Final report, Technical report.

  • Teicher, H. (1961). Identifiability of mixtures. The Annals of Mathematical Statistics, 32(1), 244–248.

  • Templin, J. L., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11(3), 287.

  • Xu, G. (2017). Identifiability of restricted latent class models with binary responses. The Annals of Statistics, 45(2), 675–707.

  • Xu, G., & Shang, Z. (2017). Identifying latent structures in restricted latent class models. Journal of the American Statistical Association, 113, 1284–1295.

  • Yakowitz, S. J., & Spragins, J. D. (1968). On the identifiability of finite mixtures. The Annals of Mathematical Statistics, 39, 209–214.

Author information

Corresponding author: Correspondence to Steven Culpepper.

Appendices

A Connections of SLCM to Popular CDMs

In this section, we discuss the connections between the SLCM and popular CDMs. To simplify notation, we assume the relevant skills of item j are \(k_1, \ldots , k_R\), i.e., \(q_{jk_1} = \cdots = q_{jk_R} = 1\) and \(q_{jk}=0\) otherwise.

Example 3

(DINA model). The deterministic input noisy output “and” gate model (Haertel 1989; Junker and Sijtsma 2001) is a conjunctive model: it assumes that a student can answer item j positively only if he or she has mastered all of its relevant skills. The item response function takes the following form,

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{q}_j) =(1-s_j)^{\varvec{1}({\varvec{\alpha }\succeq \varvec{q}_j})} {g_j}^{\varvec{1}(\varvec{\alpha }\nsucceq \varvec{q}_j)}, \end{aligned}$$

where \(s_j = {\mathbb {P}}(Y_j = 0 | \varvec{\alpha }\succeq \varvec{q}_j)\) is the slipping parameter, the probability that a student who has mastered all relevant skills of item j nonetheless responds negatively, and \(g_j= {\mathbb {P}}(Y_j = 1 | \varvec{\alpha }\nsucceq \varvec{q}_j)\) is the guessing parameter, the probability that a non-master responds positively. It is assumed that \(g_j < 1 - s_j\) in most applications. The DINA model can be written as

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{\beta }_j) = \Psi \left( \beta _{j,0} +\beta _{j,k_1\ldots k_R} \alpha _{k_1}\ldots \alpha _{k_R}\right) \end{aligned}$$

where only one coefficient, besides the intercept, in \(\varvec{\beta }_j\) is active,

$$\begin{aligned} \delta _{j,0} = \delta _{j,k_1\ldots k_R} = 1, \quad \delta _{j,p}=0\, \, \text { otherwise}. \end{aligned}$$

The guessing parameter \(g_j\) and slipping parameter \(s_j\) are given by

$$\begin{aligned} g_j = \Psi (\beta _{j,0}),\quad s_j = 1 - \Psi (\beta _{j,0}+\beta _{j,k_1\ldots k_R}). \end{aligned}$$
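
To make the mapping concrete, here is a minimal sketch (Python, assuming a probit link \(\Psi = \Phi \); the helper name dina_beta and the values \(g_j=0.2\), \(s_j=0.1\) are ours for illustration) that converts \((g_j, s_j)\) to the two active coefficients and back:

```python
import numpy as np
from scipy.stats import norm  # Psi taken here as the standard normal CDF (probit link)

def dina_beta(g, s):
    """Map DINA (guessing, slipping) to the SLCM intercept and highest-order coefficient."""
    beta0 = norm.ppf(g)                  # g_j = Psi(beta_{j,0})
    beta_top = norm.ppf(1 - s) - beta0   # 1 - s_j = Psi(beta_{j,0} + beta_{j,k_1...k_R})
    return beta0, beta_top

beta0, beta_top = dina_beta(g=0.2, s=0.1)
print(norm.cdf(beta0), norm.cdf(beta0 + beta_top))  # recovers 0.2 and 0.9
```

Note that \(g_j < 1 - s_j\) is exactly what makes \(\beta _{j,k_1\ldots k_R} > 0\), consistent with the monotonicity constraints.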

Example 4

(DINO model). The deterministic input noisy output “or” gate model (Templin and Henson 2006) is a disjunctive model, which assumes that a student can answer item j positively if at least one of the relevant skills is mastered. The item response function is

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{q}_j)= (1-s_j)^{\varvec{1}(\varvec{\alpha }^\mathrm{T} \varvec{q}_j >0 )} {g_j}^{{\varvec{1}(\varvec{\alpha }^\mathrm{T} \varvec{q}_j = 0 )}} \end{aligned}$$

where \(s_j\) and \(g_j\) are defined as in the DINA model, and \(g_j < 1 - s_j\) is assumed. The DINO model can be reparameterized as

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{\beta }_j) = \Psi \left( \beta _{j,0}+\sum _{r=1}^{R} \beta _{j,k_r} \alpha _{k_r}+\underset{k_r> k_r^\prime }{\sum \sum }\beta _{j,k_rk_r^\prime }\alpha _{k_r}\alpha _{k_r^\prime }+\cdots +\beta _{j,k_1\ldots k_R}\prod _{r=1}^R \alpha _{k_r}\right) \end{aligned}$$

where the coefficients containing only the relevant skills are active,

$$\begin{aligned} \delta _{j,0} = \delta _{j,k_1} = \cdots = \delta _{j,k_R} = \delta _{j,k_1k_2} = \cdots = \delta _{j,k_{R-1}k_{R}} = \cdots = \delta _{j,k_1\ldots k_R} = 1,\quad \delta _{j,p}=0\, \, \text { otherwise} \end{aligned}$$

The coefficients of odd order are all equal and positive, and the coefficients of even order are the additive inverses of those of odd order:

$$\begin{aligned}&\beta _{j,k_1} = \beta _{j,k_2} = \cdots = \beta _{j,k_R} = \beta _{j,k_1k_2k_3} = \cdots = \beta _{j,k_{R-2}k_{R-1}k_{R}}=\cdots \\ =&-\beta _{j,k_1k_2} = \cdots = -\beta _{j,k_{R-1}k_R} = -\beta _{j,k_1k_2k_3k_4} = \cdots =-\beta _{j,k_{R-3}k_{R-2}k_{R-1}k_R} = \cdots . \end{aligned}$$

The guessing parameter \(g_j\) has the same form as in the DINA model, and the slipping parameter \(s_j\) is given by \(1 - \Psi (\varvec{a}_{\varvec{\alpha }}^\mathrm{T} \varvec{\beta }_j)\) for any \(\varvec{\alpha }\) satisfying \(\varvec{\alpha }^\mathrm{T} \varvec{q}_j > 0\), which is equivalent to \( 1 - \Psi (\beta _{j,0}+\beta _{j,k_r})\), \(r = 1,\ldots , R\),

$$\begin{aligned} g_j = \Psi (\beta _{j,0}),\quad s_j = 1 - \Psi (\beta _{j,0}+\beta _{j,k_r}), r = 1,\ldots , R. \end{aligned}$$
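
As a check on this alternating-sign pattern, the following sketch (Python, again with a probit link; dino_predictor is a hypothetical helper and \(\beta _{j,0}=-1\), \(c=2\) are illustrative values) verifies by inclusion–exclusion that every profile mastering at least one relevant skill attains the same linear predictor \(\beta _{j,0}+c\):

```python
import numpy as np
from itertools import combinations
from scipy.stats import norm

def dino_predictor(alpha_rel, beta0, c):
    """Linear predictor under the DINO pattern: odd-order interaction
    coefficients equal +c, even-order ones equal -c, over relevant skills."""
    total = beta0
    R = len(alpha_rel)
    for m in range(1, R + 1):
        for idx in combinations(range(R), m):
            total += (-1) ** (m + 1) * c * np.prod([alpha_rel[i] for i in idx])
    return total

for alpha in [(1, 0, 0), (1, 1, 0), (1, 1, 1)]:  # any profile with a mastered skill
    assert np.isclose(dino_predictor(alpha, beta0=-1.0, c=2.0), 1.0)
print(norm.cdf(-1.0), 1 - norm.cdf(1.0))  # g_j and s_j
```

The identity \(\sum _{m=1}^{s} (-1)^{m+1} \binom{s}{m} = 1\) for \(s \ge 1\) is what collapses the sum to \(\beta _{j,0} + c\).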

Example 5

(G-DINA model). de la Torre (2011) generalizes the DINA model to the G-DINA model, which takes the form

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{q}_j) = \beta _{j,0}+\sum _{k=1}^{K} \beta _{j,k} q_{jk}\alpha _{k}+\underset{k> k^\prime }{\sum \sum }\beta _{j,kk^\prime }q_{jk}\alpha _{k}q_{jk^\prime }\alpha _{k^\prime }+\cdots +\beta _{j,12\ldots K}\prod _{k=1}^K q_{jk}\alpha _{k}. \end{aligned}$$

By using the identity link in Eq. (1), it can be written as

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{\beta }_j) =\beta _{j,0}+\sum _{r=1}^{R} \beta _{j,k_r} \alpha _{k_r}+\underset{k_r> k_r^\prime }{\sum \sum }\beta _{j,k_rk_r^\prime }\alpha _{k_r}\alpha _{k_r^\prime }+\cdots +\beta _{j,k_1\ldots k_R}\prod _{r=1}^R \alpha _{k_r} \end{aligned}$$

where the coefficients containing only the relevant skills are active,

$$\begin{aligned} \delta _{j,0} = \delta _{j,k_1} = \cdots = \delta _{j,k_R} = \delta _{j,k_1k_2} = \cdots = \delta _{j,k_{R-1}k_{R}} = \cdots = \delta _{j,k_1\ldots k_R} = 1,\quad \delta _{j,p}=0\, \, \text { otherwise}. \end{aligned}$$

Example 6

(NC-RUM model). Under the reduced noncompensatory reparameterized unified model (DiBello et al. 1995; Rupp et al. 2010), attributes have a noncompensatory relationship with the observed response: missing any relevant skill inflicts a penalty on the positive response probability,

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{q}_j) = b_j \prod _{k=1}^K r_{j,k}^{q_{jk}(1-\alpha _{k})} \end{aligned}$$

Here \(b_j\) is the positive response probability for students who possess all relevant skills, and \(r_{j,k}\), \(0<r_{j,k}<1\), is the penalty for not mastering the kth attribute. As pointed out by Xu (2017), by using the exponential link function, the NC-RUM can be equivalently written as

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{\beta }_j) = \exp \left( \beta _{j,0} + \sum ^{R}_{r=1} \beta _{j,k_r}\alpha _{k_r} \right) , \end{aligned}$$

where the main effects of relevant attributes are active,

$$\begin{aligned} \delta _{j,0} = \delta _{j,k_1} = \cdots = \delta _{j,k_R} = 1, \quad \delta _{j,p}=0\, \, \text { otherwise}. \end{aligned}$$

The parameters are given by

$$\begin{aligned} b_j = \exp \left( \beta _{j,0}+\sum _{r=1}^R \beta _{j,k_r}\right) ,\quad r_{j,k} = {\left\{ \begin{array}{ll} \exp (-\beta _{j,k_r}), \quad &{}\text {if } k \in \{k_1, \ldots , k_R\}\\ 1, \quad &{}\text {otherwise}\end{array}\right. } \end{aligned}$$
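
The mapping between the two parameterizations is straightforward to compute; a small sketch (Python; the helper name ncrum_to_slcm and the toy values are ours):

```python
import numpy as np

def ncrum_to_slcm(b, r, rel):
    """Map NC-RUM parameters (b_j, r_{j,k}) to SLCM coefficients under the exp link.
    b: positive response probability for full masters; r: dict of penalties in (0, 1);
    rel: indices of the relevant skills k_1, ..., k_R."""
    beta_main = {k: -np.log(r[k]) for k in rel}  # beta_{j,k_r} = -log r_{j,k_r} > 0
    beta0 = np.log(b) - sum(beta_main.values())  # b_j = exp(beta_{j,0} + sum of main effects)
    return beta0, beta_main

beta0, beta_main = ncrum_to_slcm(b=0.9, r={0: 0.5, 2: 0.25}, rel=[0, 2])
print(np.exp(beta0 + sum(beta_main.values())))   # recovers b_j = 0.9
```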

Example 7

(C-RUM model). The compensatory RUM (Hagenaars 1993; Maris 1999) is given by

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{q}_j) = \frac{\exp \left( \beta _{j,0} + \sum _{k=1}^K \beta _{j,k}q_{jk}\alpha _{k}\right) }{\exp \left( \beta _{j,0} + \sum _{k=1}^K \beta _{j,k}q_{jk}\alpha _{k}\right) +1}. \end{aligned}$$

Equivalently,

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{q}_j) = \text {logit}^{-1} \left( \beta _{j,0} + \sum ^{R}_{r=1} \beta _{j,k_r}\alpha _{k_r} \right) \end{aligned}$$

where \(\text {logit}^{-1}(\cdot )\) denotes the inverse of the logit function (i.e., \(\Psi \) in Eq. (1) is taken to be the logistic CDF) and the main effects of relevant attributes are active,

$$\begin{aligned} \delta _{j,0} = \delta _{j,k_1} = \cdots = \delta _{j,k_R} = 1, \quad \delta _{j,p}=0\, \, \text { otherwise}. \end{aligned}$$

B Proof of Theorems

In this section, we provide the proofs of Theorems 4 and 2.

1.1 B.1 Proof of Theorem 4

We first introduce Lemma 5 (Mityagin 2015; Dang 2015), which shows that the zero set of a real analytic function has Lebesgue measure zero if the function is not identically zero. Then, in Proposition 3, we show that \(G_{\varvec{D}}(\varvec{B}) := \det [\varvec{M}(\varvec{D}, \varvec{B})]\) is a real analytic function, and in Proposition 4, we show that \(G_{\varvec{D}}(\varvec{B})\) is not identically zero on \(\Omega _{\varvec{D}} (\varvec{B})\) if \(\varvec{D} \in {\mathbb {D}}_g\), so that Lemma 5 applies and Theorem 4 is proved.

Lemma 5

(Mityagin 2015; Dang 2015) If \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) is a real analytic function which is not identically zero, then the set \(\{\varvec{x} :f(\varvec{x}) =0\}\) has Lebesgue measure zero.

Proposition 3

\(G_{\varvec{D}}(\varvec{B}) = \det [\varvec{M}(\varvec{D}, \varvec{B})]: \Omega _{\varvec{D}} \rightarrow {\mathbb {R}}\) is a real analytic function of \(\varvec{B}\).

Proof

\(G_{\varvec{D}}(\varvec{B})\) is a composition of functions:

$$\begin{aligned} G_{\varvec{D}}(\varvec{B}) = \det [\varvec{M}] = h(\varvec{\theta }_{\varvec{\alpha }_0},\ldots , \varvec{\theta }_{\varvec{\alpha }_{2^K-1}} ) = h \left( \Psi (\varvec{B}_{\varvec{D}} \varvec{a}_{\varvec{\alpha }_0}),\ldots , \Psi (\varvec{B}_{\varvec{D}} \varvec{a}_{\varvec{\alpha }_{2^K-1}})\right) \end{aligned}$$

where \(h(\varvec{\theta }): [0,1]^{K\times 2^K}\rightarrow {\mathbb {R}}\) is a polynomial function and \(\Psi (\cdot )\) is a CDF.

\(\Psi (\cdot )\) is a real analytic function because it is the integral of a real analytic density, and \(h(\varvec{\theta })\) is also real analytic since it is a polynomial. Therefore, the composition \(G_{\varvec{D}}(\varvec{B})\) is real analytic, because a composition of real analytic functions is real analytic. \(\square \)

Proposition 4

If \(\varvec{D}\in {\mathbb {D}}_g\), there exists some \(\varvec{B}\in \Omega _{\varvec{D}} (\varvec{B})\), s.t., \(G_{\varvec{D}}(\varvec{B})\ne 0\).

Proof

Let \(\varvec{B}^1 = (\varvec{1}_K, \varvec{I}_K, \varvec{0}) \in \Omega _{\varvec{D}}(\varvec{B}), \forall \varvec{D} \in {\mathbb {D}}_g\). As shown in Example 2, \(\varvec{M}(\varvec{D}, \varvec{B}^1)\) is of full rank, so that \(G_{\varvec{D}}(\varvec{B}^1 )\ne 0\). \(\square \)
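
For intuition, here is a quick numerical spot-check of this witness (a Python sketch; taking \(K=2\), a probit link, and reading \(\varvec{B}^1 = (\varvec{1}_K, \varvec{I}_K, \varvec{0})\) as unit intercepts plus unit main effects, which are our illustrative choices):

```python
import numpy as np
from itertools import product
from scipy.stats import norm

K = 2
classes = list(product([0, 1], repeat=K))
# under B^1, item j responds through theta_{j,alpha} = Psi(1 + alpha_j)
theta = np.array([[norm.cdf(1.0 + a[j]) for j in range(K)] for a in classes])
# class-response matrix M: rows are classes alpha, columns are response patterns y
M = np.array([[np.prod([t[j] if y[j] else 1.0 - t[j] for j in range(K)])
               for y in product([0, 1], repeat=K)]
              for t in theta])
print(np.linalg.matrix_rank(M), np.linalg.det(M))  # rank 4 = 2^K, det != 0
```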

Remark 7

The conclusion \(G_{\varvec{D}}(\varvec{B})\not \equiv 0\) does not hold trivially for all \(\varvec{D}\): \(\varvec{D} \in {\mathbb {D}}\) is a sufficient condition for \(G_{\varvec{D}}(\varvec{B})\not \equiv 0\), and if \(\varvec{D} \not \in {\mathbb {D}}\), it is possible that \(G_{\varvec{D}}(\varvec{B})\equiv 0\), as the following example shows.

Example 8

Assume \(K=3\) and the main effect of the first skill is inactive for all items, i.e., \(\delta _{j,1}=0, \, \forall 1\le j \le K\), then \(\varvec{D}_{3\times 8}\) takes the form

$$\begin{aligned} \begin{bmatrix} * &{}\quad 0 &{}\quad * &{}\quad * &{}\quad * &{}\quad * &{}\quad * &{}\quad *\\ * &{}\quad 0 &{}\quad 1 &{}\quad * &{}\quad * &{}\quad * &{}\quad * &{}\quad *\\ * &{}\quad 0 &{}\quad * &{}\quad 1 &{}\quad * &{}\quad * &{}\quad * &{}\quad * \\ \end{bmatrix}. \end{aligned}$$

For any \(\varvec{B}\in \Omega _{\varvec{D}}(\varvec{B})\) and any response \(\varvec{y} \in \{0,1\}^3\),

$$\begin{aligned} \varvec{M}_{\varvec{\alpha }= (0,0,0),\varvec{y}}(\varvec{D}, \varvec{B}) = \varvec{M}_{\varvec{\alpha }= (1,0,0),\varvec{y}}(\varvec{D}, \varvec{B}). \end{aligned}$$

Thus the two corresponding rows of \(\varvec{M}(\varvec{D}, \varvec{B})\) are identical, so \(\varvec{M}(\varvec{D}, \varvec{B})\) does not have full row rank, i.e., \(\det [\varvec{M}(\varvec{D}, \varvec{B})] \equiv 0\).

By Lemma 5 and Propositions 3 and 4, Theorem 4 is proved.

1.2 B.2 Proof of Theorem 2

Proof

As shown in Example 2, for any \(\varvec{B} \in \Omega _{\varvec{D}_s} (\varvec{B})\), the corresponding class-response matrix has full rank, \(\mathrm{rank}(\varvec{M}(\varvec{D}_s, \varvec{B})) = 2^K\), if and only if, for each item, the success probabilities of students with and without the relevant skill differ. In fact, if the two probabilities were equal, the monotonicity constraints would be violated. Then, using the notation from Sect. 3.5, we conclude that under condition (S1) and the monotonicity constraints, \(\mathrm{rank}(\varvec{M}_1) = \mathrm{rank}(\varvec{M}_2) = 2^K\).

For \(\varvec{M}_3\), each element is nonnegative and each row sums to 1. Under condition (S2), there must exist an item j such that \(\theta _{j,\varvec{\alpha }_s} \ne \theta _{j, \varvec{\alpha }_t}\), so \(\mathrm{rank}(\varvec{M}_3) \ge 2\).

\(\square \)

C Initialization from the Identifiable Space

Initialization of the sparsity matrix \(\varvec{\Delta }^{(0)}_{J\times 2^K}\):

  1. Activate the intercepts. Fix the entries in the first column of \(\varvec{\Delta }^{(0)}\) (i.e., \(\varvec{\Delta }^{(0)}_{\cdot 1}\)) at 1. Denote the remaining \(J\times (2^K-1)\) sub-matrix as \({{\tilde{\varvec{\Delta }}}}^{(0)}\).

  2. Construct \(\varvec{D}^{(0)}_1\) and \(\varvec{D}^{(0)}_2\). Fix the first 2K rows of \( {{\tilde{\varvec{\Delta }}}}^{(0)}\) to be

     $$\begin{aligned}\left( \begin{array}{ll} \varvec{I}_K &{} \varvec{0} \\ \varvec{I}_K &{}\varvec{0} \\ \end{array}\right) .\end{aligned}$$

  3. Construct \({{\tilde{\varvec{\Delta }}}}^{'(0)}\).

     (a) Randomly select K indexes, \(j_1,\ldots ,j_K\), from the set \(\{2K+1, \ldots , J\}\) with replacement and set \({\tilde{\varvec{\Delta }}}^{(0)}_{j_k,k} = 1\).

     (b) Sample the remaining entries in \( {{\tilde{\varvec{\Delta }}}}^{'(0)}\) by

     $$\begin{aligned} \delta _{jp}^{(0)}|w^{(0)}\sim \text {Bernoulli}(w^{(0)}), \quad j>2K,\,\, (j,p) \notin \{(j_k,k)\}_{k=1}^K \end{aligned}$$

     where \(w^{(0)}\sim \text {Beta}(w_0, w_1)\) and \(w_0, w_1\) are the parameters of the prior distribution and are treated as fixed.

     (c) Check the row sums. If any row of \(\varvec{\Delta }^{'(0)}\) sums to 0, randomly pick an entry in this row and set it at 1.

  4. Shuffle the rows. Draw a \(J\times J\) permutation matrix \(\varvec{P} = (\varvec{e}_{j_1},\ldots ,\varvec{e}_{j_J})\), where \((j_1,\ldots , j_J)\) is a permutation of \((1,\ldots ,J)\), and let \({{\tilde{\varvec{\Delta }}}}^{(0)} \leftarrow \varvec{P} {{\tilde{\varvec{\Delta }}}}^{(0)}\).

The above initialization is designed for the strict identifiability conditions. To generate a \(\varvec{\Delta }\) under the generic identifiability conditions, we only need to enlarge the range of entries sampled from the prior distribution in step 3b. Specifically, we change step 3b to:

Sample the remaining entries in \({{\tilde{\varvec{\Delta }}}}^{'(0)}\) by

$$\begin{aligned} \delta _{jp}^{(0)}|w^{(0)}\sim \text {Bernoulli}(w^{(0)}), \quad \, (j,p) \notin \{(j_k,k), (k,k), (K+k,k)\}_{k=1}^K. \end{aligned}$$

Initialization of the coefficient matrix \(\varvec{\beta }^{(0)}_{J\times 2^K} = (\varvec{\beta }_1, \ldots , \varvec{\beta }_J)^\mathrm{T}\):

$$\begin{aligned} \beta ^{(0)}_{jp}=0, \text {if } \delta _{jp}^{(0)} = 0, \end{aligned}$$
$$\begin{aligned} \beta ^{(0)}_{jp} | \delta _{jp}^{(0)}=1 \propto {\mathcal {N}}(0, \sigma _{\beta }^2)I(\beta ^{(0)}_{jp} >0). \end{aligned}$$
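
A compact sketch of the whole initialization (Python; the function names init_delta and init_beta are ours, the main-effect columns are assumed to occupy columns 2 through \(K+1\) of \(\varvec{\Delta }\), and generic=True implements the modified step 3b):

```python
import numpy as np

def init_delta(J, K, w0=1.0, w1=1.0, generic=False, rng=None):
    """Draw Delta^(0) (J x 2^K) from the identifiable space."""
    rng = rng if rng is not None else np.random.default_rng()
    P = 2 ** K
    delta = np.zeros((J, P), dtype=int)
    delta[:, 0] = 1                                 # step 1: intercepts active
    sub = delta[:, 1:]                              # tilde Delta^(0)
    for k in range(K):                              # step 2: (I_K 0; I_K 0) on top
        sub[k, k] = sub[K + k, k] = 1
    anchors = rng.choice(np.arange(2 * K, J), size=K, replace=True)
    sub[anchors, np.arange(K)] = 1                  # step 3a
    free = np.zeros((J, P - 1), dtype=bool)         # entries sampled in step 3b
    free[2 * K:, :] = True
    if generic:                                     # modified step 3b: free the first
        free[:2 * K, :] = True                      # 2K rows except the two identity
        free[np.arange(2 * K), np.tile(np.arange(K), 2)] = False  # diagonals
    free[anchors, np.arange(K)] = False
    w = rng.beta(w0, w1)
    sub[free] = rng.binomial(1, w, size=free.sum())
    for j in range(2 * K, J):                       # step 3c: no all-zero row
        if sub[j].sum() == 0:
            sub[j, rng.integers(P - 1)] = 1
    delta[:, 1:] = sub[rng.permutation(J)]          # step 4: shuffle the rows
    return delta

def init_beta(delta, sigma_beta=2.0, rng=None):
    """beta = 0 where delta = 0; positive truncated N(0, sigma^2) where delta = 1."""
    rng = rng if rng is not None else np.random.default_rng()
    beta = np.zeros(delta.shape)
    active = delta == 1
    beta[active] = np.abs(rng.normal(0.0, sigma_beta, size=active.sum()))
    return beta

delta0 = init_delta(J=20, K=3)
beta0 = init_beta(delta0)
```

Sampling \(|N(0, \sigma _\beta ^2)|\) is equivalent to the truncated normal \(N(0, \sigma _\beta ^2)I(\beta >0)\) above.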

D Derivation of \({{\tilde{\omega }}}_{jp}\)

$$\begin{aligned} \delta _{jp}\left| \varvec{Z}_j,\varvec{\alpha },\varvec{\beta }_{j(p)},\omega ,\sigma _\beta ^2 \right. \sim \text {Bernoulli}\left( {{\tilde{\omega }}}_{jp}\right) \end{aligned}$$
$$\begin{aligned} {{\tilde{\omega }}}_{jp} = \frac{\omega \int _{L}^\infty p\left( \varvec{Z}_j\left| \varvec{\alpha },\varvec{\beta }_j\right. \right) p\left( \beta _{jp}\left| \sigma _\beta ^2\right. \right) \hbox {d}\beta _{jp} }{\omega \int _{L}^\infty p\left( \varvec{Z}_j\left| \varvec{\alpha },\varvec{\beta }_j\right. \right) p\left( \beta _{jp}\left| \sigma _\beta ^2\right. \right) \hbox {d}\beta _{jp} + \left( 1-\omega \right) p\left( \varvec{Z}_j\left| \varvec{\alpha },\varvec{\beta }_{j(p)},\beta _{jp}=0\right. \right) } \end{aligned}$$

The numerator is

$$\begin{aligned}&\omega \int _{L}^\infty p\left( \varvec{Z}_j\left| \varvec{\alpha },\varvec{\beta }_j\right. \right) p\left( \beta _{jp}\left| \sigma _\beta ^2\right. \right) \hbox {d}\beta _{jp} \\&\quad = \omega \int _{L}^\infty \left( 2\pi \right) ^{-\frac{N}{2}} \exp \left[ -\frac{1}{2}\left( \widetilde{\varvec{Z}}_j - \varvec{A}_p \beta _{jp}\right) ^\prime \left( \widetilde{\varvec{Z}}_j - \varvec{A}_p \beta _{jp}\right) \right] \\&\qquad \cdot {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1}\left( 2\pi \right) ^{-\frac{1}{2}} \frac{1}{\sigma _\beta } \exp \left( -\frac{\beta _{jp}^2}{2\sigma _\beta ^2}\right) \hbox {d}\beta _{jp} \\&\quad = \left( 2\pi \right) ^{-\frac{N}{2}} {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \frac{\omega }{\sigma _\beta }\\&\qquad \times \int _{L}^\infty \left( 2\pi \right) ^{-\frac{1}{2}} \exp \left\{ -\frac{1}{2}\left[ \left( \varvec{A}'_p\varvec{A}_p+\frac{1}{\sigma _\beta ^2}\right) \beta _{jp}^2-2\varvec{A}'_p\widetilde{\varvec{Z}}_j \beta _{jp} + \widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j\right] \right\} \hbox {d}\beta _{jp} \\&\quad = \left( 2\pi \right) ^{-\frac{N}{2}} {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \frac{\omega {{\tilde{\sigma }}}_p}{\sigma _\beta } \exp \left[ -\frac{1}{2}\widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j +\frac{1}{2}{{\tilde{\sigma }}}_p^2\left( \varvec{A}'_p\widetilde{\varvec{Z}}_j \right) ^2\right] \\&\qquad \times \int _{L}^\infty \left( 2\pi \right) ^{-\frac{1}{2}}\frac{1}{{{\tilde{\sigma }}}_p} \exp \left[ -\frac{1}{2{{\tilde{\sigma }}}_p^2}\left( \beta _{jp}- {{\tilde{\sigma }}}_p^2\varvec{A}'_p\widetilde{\varvec{Z}}_j \right) ^2\right] \hbox {d}\beta _{jp} \\&\quad = \left( 2\pi \right) ^{-\frac{N}{2}} {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \frac{\omega {{\tilde{\sigma }}}_p}{\sigma _\beta } \exp \left( -\frac{1}{2}\widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j + \frac{1}{2} \frac{{{\tilde{\mu }}}^2_{jp}}{{{\tilde{\sigma }}}^2_p} \right) \int _{-\frac{{{\tilde{\mu }}}_{jp}-L}{{{\tilde{\sigma }}}_p}}^\infty \left( 2\pi \right) ^{-\frac{1}{2}} \\&\qquad \times \exp \left[ -\frac{1}{2}\left( \frac{\beta _{jp}- {{\tilde{\mu }}}_{jp}}{{{\tilde{\sigma }}}_p}\right) ^2\right] \hbox {d}\left( \frac{\beta _{jp}- {{\tilde{\mu }}}_{jp}}{{{\tilde{\sigma }}}_p}\right) \\&\quad = \left( 2\pi \right) ^{-\frac{N}{2}} \exp \left( - \frac{1}{2}\widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j \right) {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \frac{\omega {{\tilde{\sigma }}}_p}{\sigma _\beta } \exp \left( \frac{1}{2} \frac{{{\tilde{\mu }}}^2_{jp}}{{{\tilde{\sigma }}}^2_p} \right) \Phi \left( \frac{{{\tilde{\mu }}}_{jp} -L}{{{\tilde{\sigma }}}_p} \right) \\ \end{aligned}$$

where \({{\tilde{\sigma }}}_p^2 = \left( \varvec{A}_p^\prime \varvec{A}_p + \sigma _\beta ^{-2}\right) ^{-1}\) and \({{\tilde{\mu }}}_{jp} = \varvec{A}_p^\prime \widetilde{\varvec{Z}}_j\left( \varvec{A}_p^\prime \varvec{A}_p + \sigma _\beta ^{-2}\right) ^{-1}\). Accordingly, \({{\tilde{\omega }}}_{jp}\) is,

$$\begin{aligned} {{\tilde{\omega }}}_{jp}&= \frac{ \left( 2\pi \right) ^{-\frac{N}{2}} \exp \left( - \frac{1}{2}\widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j \right) {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \frac{\omega {{\tilde{\sigma }}}_p}{\sigma _\beta } \exp \left( \frac{1}{2} \frac{{{\tilde{\mu }}}^2_{jp}}{{{\tilde{\sigma }}}^2_p} \right) \Phi \left( \frac{{{\tilde{\mu }}}_{jp}-L}{{{\tilde{\sigma }}}_p} \right) }{ \left( 2\pi \right) ^{-\frac{N}{2}} \exp \left( - \frac{1}{2}\widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j \right) {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \frac{\omega {{\tilde{\sigma }}}_p}{\sigma _\beta } \exp \left( \frac{1}{2} \frac{{{\tilde{\mu }}}^2_{jp}}{{{\tilde{\sigma }}}^2_p} \right) \Phi \left( \frac{{{\tilde{\mu }}}_{jp} -L}{{{\tilde{\sigma }}}_p}\right) + (1-\omega )\left( 2\pi \right) ^{-\frac{N}{2}} \exp \left( -\frac{1}{2}\widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j \right) } \\&= \frac{ {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1}\omega \left( \frac{{{\tilde{\sigma }}}_p}{\sigma _\beta }\right) \Phi \left( \frac{{{\tilde{\mu }}}_{jp} -L}{{{\tilde{\sigma }}}_p}\right) \exp \left( \frac{1}{2}\frac{{\tilde{\mu }}_{jp}^2}{{\tilde{\sigma }}_p^2}\right) }{ {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \omega \left( \frac{{{\tilde{\sigma }}}_p}{\sigma _\beta }\right) \Phi \left( \frac{{{\tilde{\mu }}}_{jp} -L}{{{\tilde{\sigma }}}_p}\right) \exp \left( \frac{1}{2}\frac{{\tilde{\mu }}_{jp}^2}{{\tilde{\sigma }}_p^2}\right) + 1-\omega } \end{aligned}$$
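
As a sanity check on this closed form, here is a short sketch (Python; the function names are ours, and the \((2\pi )^{-N/2}\) factors common to numerator and denominator are dropped since they cancel in the ratio) comparing it against direct numerical integration:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def omega_tilde_closed(A_p, Z_t, omega, sigma_beta, L):
    """Closed-form inclusion probability from the last display above."""
    s2 = 1.0 / (A_p @ A_p + sigma_beta ** -2)   # tilde sigma_p^2
    mu = s2 * (A_p @ Z_t)                       # tilde mu_jp
    num = (omega / norm.cdf(-L / sigma_beta) * np.sqrt(s2) / sigma_beta
           * np.exp(0.5 * mu ** 2 / s2) * norm.cdf((mu - L) / np.sqrt(s2)))
    return num / (num + 1.0 - omega)

def omega_tilde_numeric(A_p, Z_t, omega, sigma_beta, L):
    """Direct numerical integration of the defining ratio (kernels only)."""
    c = norm.cdf(-L / sigma_beta)               # truncated-prior normalizer
    def integrand(b):
        r = Z_t - A_p * b
        return np.exp(-0.5 * r @ r) * norm.pdf(b, 0.0, sigma_beta) / c
    num, _ = quad(integrand, L, np.inf)
    num *= omega
    return num / (num + (1.0 - omega) * np.exp(-0.5 * Z_t @ Z_t))

rng = np.random.default_rng(0)
A_p, Z_t = rng.normal(size=5), rng.normal(size=5)
args = (A_p, Z_t, 0.5, 2.0, 0.3)
print(omega_tilde_closed(*args), omega_tilde_numeric(*args))  # should agree
```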

E Proof of Lower Bound L

In this section, we derive the lower bound L of \(\beta _{jp}\) (Proposition 1).

Suppose at time t we have a \(\varvec{B}^{(t)}\in {\mathcal {B}}(\varvec{B})\) satisfying the monotonicity constraints (4), and at time \(t+1\) we sample only \(\beta _{jp}\), leaving every other coefficient unchanged, i.e., \(\beta _{js}^{(t+1)} = \beta _{js}^{(t)}\) for all \(s\ne p\).

In what follows, we introduce some notation.

We write \(\beta _{jp}\) and \(\delta _{jp}\) as \(\beta _p\) and \(\delta _p\), respectively; that is, we omit the item subscript j, as the lower bound of the coefficient \(\beta _{jp}\) does not depend on the coefficients of other items.

Let \(\gamma _{\varvec{\alpha }} = \varvec{\beta }^\mathrm{T} \varvec{a}_{\varvec{\alpha }} - \beta _0 = \Psi ^{-1}(\theta _{\varvec{\alpha }}) - \beta _0\) be the sum of the linear component excluding the intercept for class \(\varvec{\alpha }\). Further, let \( \gamma _{\varvec{\alpha }, -p} = {\left\{ \begin{array}{ll} \gamma _{\varvec{\alpha }} - \beta _p &{} \varvec{\alpha }\in {\mathbb {L}}^p_1 \\ \gamma _{\varvec{\alpha }} &{} \varvec{\alpha }\in {\mathbb {L}}^p_0, \end{array}\right. }\) denote the sum of the linear component excluding the intercept and the pth coefficients for class \(\varvec{\alpha }\), where \({\mathbb {L}}^p_1= \{\varvec{\alpha }| \varvec{a}_{\varvec{\alpha },p} = 1\}\) and \({\mathbb {L}}^p_0 = \{\varvec{\alpha }| \varvec{a}_{\varvec{\alpha },p} = 0, \varvec{\alpha }\succ \varvec{\alpha }_0\}\).

We rewrite the monotonicity constraints (4) as follows,

$$\begin{aligned} \theta _{\varvec{\alpha }_0} \le \min _{\varvec{\alpha }\succ \varvec{\alpha }_0} \theta _{\varvec{\alpha }}, \qquad \max _{\varvec{\alpha }\in {\mathbb {S}}_0} \theta _{\varvec{\alpha }} < \theta _{\varvec{q}} = \min _{\varvec{\alpha }\in {\mathbb {S}}_1} \theta _{\varvec{\alpha }} = \max _{\varvec{\alpha }\in {\mathbb {S}}_1} \theta _{\varvec{\alpha }}, \qquad (\star ) \end{aligned}$$

where \({\mathbb {S}}_0 = \{\varvec{\alpha }| \varvec{\alpha }\nsucceq \varvec{q}, \, \varvec{\alpha }\succ \varvec{\alpha }_0= \varvec{0} \}\) is the set of classes that do not master all the relevant skills, and \({\mathbb {S}}_1 = \{\varvec{\alpha }| \varvec{\alpha }\succeq \varvec{q}\} \) is the set of classes that master all the relevant skills.

Since \(\Psi (\cdot )\) is a strictly increasing function, we have the following equivalent form of the monotonicity constraints (\(\star \)):

$$\begin{aligned} \min _{\varvec{\alpha }\succ \varvec{\alpha }_0} \gamma _{\varvec{\alpha }}&\ge \gamma _{\varvec{\alpha }_0} = 0 , \end{aligned}$$
(11)
$$\begin{aligned} \gamma _ {\varvec{q}} = \max _{\varvec{\alpha }\in {\mathbb {S}}_1} \gamma _{\varvec{\alpha }} = \min _{\varvec{\alpha }\in {\mathbb {S}}_1} \gamma _{\varvec{\alpha }}&> \max _{\varvec{\alpha }\in {\mathbb {S}}_0} \gamma _{\varvec{\alpha }}. \end{aligned}$$
(12)

In the SLCM, \(\varvec{q}\) is uniquely determined by the structure vector \(\varvec{\delta }\). Mathematically, \(\varvec{q} = \arg \min _{\varvec{\alpha }: \varvec{a}_{\varvec{\alpha }} \succeq \varvec{\delta }} |\varvec{\alpha }|\), where \(|\cdot |\) denotes cardinality. By this definition, \(\gamma _ {\varvec{q}} = \max _{\varvec{\alpha }\in {\mathbb {S}}_1} \gamma _{\varvec{\alpha }} = \min _{\varvec{\alpha }\in {\mathbb {S}}_1} \gamma _{\varvec{\alpha }} \) always holds; therefore, to verify (12), we only need to check

$$\begin{aligned} \gamma _ {\varvec{q}} > \max _{\varvec{\alpha }\in {\mathbb {S}}_0} \gamma _{\varvec{\alpha }}. \end{aligned}$$
(13)

In the following two remarks, we list some observations that are useful in the proof.

Remark 8

  1. \({\mathbb {L}}^p_0 \bigcup {\mathbb {L}}^p_1 = {\mathbb {S}}_0 \bigcup {\mathbb {S}}_1 = \{\varvec{\alpha }| \varvec{\alpha }\succ \varvec{\alpha }_0\}\).

  2. \(\varvec{a}_{\varvec{q}, p}= 1\) \(\Rightarrow \) \( {\mathbb {S}}_1 \subseteq {\mathbb {L}}^p_1\), \( {\mathbb {S}}_0 \supseteq {\mathbb {L}}^p_0\).

Remark 9

  1. \(\forall \varvec{\alpha }, \qquad \gamma ^{(t)}_{\varvec{\alpha }, -p} = \gamma ^{(t+1)}_{\varvec{\alpha }, -p} := \gamma _{\varvec{\alpha }, -p}\).

  2. \( \forall \varvec{\alpha }\in {\mathbb {L}}^p_0, \qquad \gamma ^{(t+1)}_{\varvec{\alpha }} =\gamma ^{(t)}_{\varvec{\alpha }}\).

  3. \(\forall \varvec{\alpha }\in {\mathbb {L}}^p_1, \qquad \gamma ^{(t+1)}_{\varvec{\alpha }} =\gamma ^{(t)}_{\varvec{\alpha }} - \beta _p^{(t)} +\beta _p^{(t+1)}\).

  4. \(\forall \varvec{\alpha }_1, \varvec{\alpha }_2 \in {\mathbb {L}}^p_1\), \(\qquad \gamma ^{(t)}_{\varvec{\alpha }_1} > \gamma ^{(t)}_{\varvec{\alpha }_2} \Rightarrow \gamma ^{(t+1)}_{\varvec{\alpha }_1} > \gamma ^{(t+1)}_{\varvec{\alpha }_2}\).

  5. \(\forall \varvec{\alpha }_1, \varvec{\alpha }_2 \in {\mathbb {L}}^p_0\), \(\qquad \gamma ^{(t)}_{\varvec{\alpha }_1} >\gamma ^{(t)}_{\varvec{\alpha }_2} \Rightarrow \gamma ^{(t+1)}_{\varvec{\alpha }_1} > \gamma ^{(t+1)}_{\varvec{\alpha }_2}\).

In the following lemma, we give a necessary and sufficient condition for (11).

Lemma 6

(Lower bound 1)

$$\begin{aligned} \min _{\varvec{\alpha }\succ \varvec{\alpha }_0} \gamma ^{(t+1)}_{\varvec{\alpha }} \ge \gamma ^{(t+1)}_{\varvec{\alpha }_0} = 0 \end{aligned}$$

holds if and only if

$$\begin{aligned} \beta ^{(t+1)}_p \ge \max _{\varvec{\alpha }\in {\mathbb {L}}^p_1}( -\gamma _{\varvec{\alpha }, -p}). \end{aligned}$$
(14)

Proof

Since \(\varvec{B}^{(t)}\in {\mathcal {B}}(\varvec{B})\), we have \(\min _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma _{\varvec{\alpha }}^{(t+1)} = \min _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma _{\varvec{\alpha }}^{(t)} \ge 0.\) So we only need to consider \(\varvec{\alpha }\in {\mathbb {L}}^p_1\), for which

$$\begin{aligned} \min _{\varvec{\alpha }\in {\mathbb {L}}^p_1}\gamma _{\varvec{\alpha }}^{(t+1)} = \min _{\varvec{\alpha }\in {\mathbb {L}}^p_1}(\gamma ^{(t)}_{\varvec{\alpha }, -p} + \beta ^{(t+1)}_p )\ge 0, \end{aligned}$$

which holds if and only if (14) holds. \(\square \)

We show the relationship between \(\gamma ^{(t+1)}_{\varvec{q}^{(t+1)}}\) and \(\gamma ^{(t)}_{\varvec{q}^{(t)}}\) in the following lemma.

Lemma 7

$$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} = \gamma _{\varvec{q}^{(t)}, -p} + \beta ^{(t+1)}_p. \end{aligned}$$

Proof

  • If \(\varvec{q}^{(t+1)} = \varvec{q}^{(t)}\), \(\gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} = \gamma ^{(t+1)}_{\varvec{q}^{(t)}} = \gamma _{\varvec{q}^{(t)}, -p} + \beta ^{(t+1)}_p.\)

  • If \(\varvec{q}^{(t+1)} \succ \varvec{q}^{(t)}\), it implies \(\delta _p^{(t)} = 0\) and \(\delta _p^{(t+1)} = 1\). Therefore, \(\varvec{q}^{(t)}, \varvec{q}^{(t+1)} \in {\mathbb {S}}^{(t)}_1\), so that,

    $$\begin{aligned} \gamma ^{(t)}_{\varvec{q}^{(t+1)}} = \gamma ^{(t)}_{\varvec{q}^{(t)}} = \gamma _{\varvec{q}^{(t)}, -p}. \end{aligned}$$

    Then,

    $$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} = \gamma _{\varvec{q}^{(t+1)},-p}+\beta ^{(t+1)}_p = \gamma _{\varvec{q}^{(t)}, -p} + \beta ^{(t+1)}_p. \end{aligned}$$
  • If \(\varvec{q}^{(t+1)} \prec \varvec{q}^{(t)}\), it means \(\delta _p^{(t)} = 1\), \(\beta _p^{(t+1)} = \delta _p^{(t+1)} = 0\), and \(\varvec{a}_{\varvec{q}^{(t+1)}} + \varvec{e}_p = \varvec{a}_{\varvec{q}^{(t)}}\). Then,

    $$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} =\gamma _{\varvec{q}^{(t+1)},-p} =\gamma _{\varvec{q}^{(t)},-p}= \gamma _{\varvec{q}^{(t)},-p} + \beta ^{(t+1)}_p . \end{aligned}$$

\(\square \)

Next, we give a necessary and sufficient condition for (13) in the following lemma.

Lemma 8

(Lower bound 2) Suppose \(\delta _p^{(t+1)} = 1\),

$$\begin{aligned} \gamma ^{(t+1)}_ {\varvec{q}^{(t+1)}} > \max _{\varvec{\alpha }\in {\mathbb {S}}^{(t+1)}_0} \gamma ^{(t+1)}_{\varvec{\alpha }}, \end{aligned}$$

if and only if,

$$\begin{aligned} \beta _p^{(t+1)} > \max _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma _{\varvec{\alpha }, -p} - \gamma _{\varvec{q}^{(t)}, -p}. \end{aligned}$$
(15)

Proof

Since \(\delta ^{(t+1)}_p = 1\), by Remark 8.2, we have \({\mathbb {S}}_0^{(t+1)} \supseteq {\mathbb {L}}_0^p\).

It is easy to see that if (12) holds at time \(t+1\), then (15) holds, because

$$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} = \gamma _{\varvec{q}^{(t)}, -p} + \beta ^{(t+1)}_p > \max _{\varvec{\alpha }\in {\mathbb {S}}_0^{(t+1)}}\gamma _{\varvec{\alpha }}^{(t+1)} \ge \max _{\varvec{\alpha }\in {\mathbb {L}}_0^{p}}\gamma _{\varvec{\alpha }}^{(t+1)} = \max _{\varvec{\alpha }\in {\mathbb {L}}_0^{p}}\gamma _{\varvec{\alpha }, -p}. \end{aligned}$$

Next we show that if (15) holds, then (13) holds at time \(t+1\).

Because (12) holds at time t, we have,

$$\begin{aligned} \gamma ^{(t)}_{\varvec{q}^{(t)}}> \max _{\varvec{\alpha }\in \mathbb S_0^{(t)}}\gamma ^{(t)}_{\varvec{\alpha }} \ge \max _{\varvec{\alpha }\in {\mathbb {L}}_1^p \bigcap {\mathbb {S}}_0^{(t)}}\gamma ^{(t)}_{\varvec{\alpha }}. \end{aligned}$$
(16)

Next, we check (13) in two different scenarios.

  • If \(\varvec{q}^{(t)} = \varvec{q}^{(t+1)}\), \({\mathbb {S}}_0^{(t)} = {\mathbb {S}}_0^{(t+1)}\), then by (16) and Remark 9.4, we obtain

    $$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} = \gamma ^{(t+1)}_{\varvec{q}^{(t)}} > \max _{\varvec{\alpha }\in {\mathbb {L}}_1^p \bigcap \mathbb S_0^{(t+1)}}\gamma ^{(t+1)}_{\varvec{\alpha }}. \end{aligned}$$
  • If \(\varvec{q}^{(t)} \prec \varvec{q}^{(t+1)}\), then since \(\varvec{q}^{(t+1)} \in {\mathbb {S}}^{(t)}_1\), we have \(\gamma _{\varvec{q}^{(t+1)}}^{(t)} > \max _{\varvec{\alpha }\in {\mathbb {L}}_1^p \bigcap \mathbb S_0^{(t)}} \gamma ^{(t)}_{\varvec{\alpha }}\). By Remark 9.4, we have

    $$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} > \max _{\varvec{\alpha }\in {\mathbb {L}}_1^p \bigcap {\mathbb {S}}_0^{(t)}}\gamma ^{(t+1)}_{\varvec{\alpha }}.\end{aligned}$$

    On the other hand, since \(\varvec{\delta }^{(t+1)} = \varvec{\delta }^{(t)} + \varvec{e}_p\),

    $$\begin{aligned}\{\varvec{\alpha }| \varvec{\alpha }\in {\mathbb {S}}^{(t+1)}_0, \varvec{\alpha }\notin {\mathbb {S}}^{(t)}_0 \} = \{\varvec{\alpha }| \varvec{\alpha }\succeq \varvec{\delta }^{(t)}, \varvec{\alpha }\nsucceq \varvec{\delta }^{(t+1)} \} \subseteq {\mathbb {L}}^p_0 = ({\mathbb {L}}^p_1)^c,\end{aligned}$$

    leading to,

    $$\begin{aligned} {\mathbb {L}}_1^p \bigcap {\mathbb {S}}_0^{(t+1)} = {\mathbb {L}}_1^p \bigcap {\mathbb {S}}_0^{(t)}. \end{aligned}$$

\(\square \)

Proof of Proposition 1

Suppose \(\delta _p^{(t+1)} = 1\), by Lemmas 6 and 8, the monotonicity constraints hold at time \(t+1\), if

$$\begin{aligned} \beta ^{(t+1)}_p&> \max \left\{ \max _{\varvec{\alpha }\in {\mathbb {L}}^p_1}( -\gamma _{\varvec{\alpha }, -p}), \max _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma _{\varvec{\alpha }, -p} - \gamma _{\varvec{q}^{(t)}, -p}\right\} \\&:= \max (L_1, L_2) = L. \end{aligned}$$
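
In implementation terms, L is just a pair of maxima over the two class sets; a minimal sketch (Python; the function name lower_bound and its argument layout are ours):

```python
import numpy as np

def lower_bound(gamma_minus_p, in_L1, gamma_q_minus_p):
    """L = max(L1, L2) from Proposition 1.
    gamma_minus_p  : gamma_{alpha,-p} for every class alpha succ alpha_0
    in_L1          : boolean mask, True where a_{alpha,p} = 1 (alpha in L_1^p)
    gamma_q_minus_p: gamma_{q^(t),-p} for the current ideal class q^(t)
    """
    L1 = np.max(-gamma_minus_p[in_L1])                    # from Lemma 6, bound (14)
    L2 = np.max(gamma_minus_p[~in_L1]) - gamma_q_minus_p  # from Lemma 8, bound (15)
    return max(L1, L2)

# toy example: K = 2, p the main effect of skill 1, classes (0,1), (1,0), (1,1)
gammas = np.array([0.5, 0.8, 1.4])    # gamma_{alpha,-p}
mask = np.array([False, True, True])  # (1,0) and (1,1) lie in L_1^p
print(lower_bound(gammas, mask, gamma_q_minus_p=1.4))  # -0.8
```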

In the following two lemmas, we discuss the flipping rules for \(\delta _p\).

Lemma 9

(Flipping rule 1) If \(\delta ^{(t)}_p = 0, \delta ^{(t+1)}_p = 0\), the monotonicity constraints hold at time \(t+1\) and \(L = 0\).

Proof

The monotonicity constraints hold at time \(t+1\) because \(\varvec{B}^{(t+1)} = \varvec{B}^{(t)}\) and \(\varvec{B}^{(t)}\) satisfies the constraints.

  • \(L_1 = - \min _{\varvec{\alpha }\in {\mathbb {L}}^p_1} \gamma ^{(t)}_{\varvec{\alpha }} \le 0\) because (11) holds at t.

  • \(L_2 = \max _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma ^{(t)}_{\varvec{\alpha }} - \gamma ^{(t)}_{\varvec{q}^{(t)}} = 0\) because

    $$\begin{aligned} \gamma ^{(t)}_{\varvec{q}^{(t)}} = \min _{\varvec{\alpha }\in \mathbb S^{(t)}_1}\gamma ^{(t)}_{\varvec{\alpha }} \le \max _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma ^{(t)}_{\varvec{\alpha }} \le \max _{\varvec{\alpha }}\gamma ^{(t)}_{\varvec{\alpha }} = \gamma ^{(t)}_{\varvec{q}^{(t)}}, \end{aligned}$$

    since \({\mathbb {L}}^p_0 \bigcap {\mathbb {S}}^{(t)}_1\) is not empty.

Therefore, \(L = \max (L_1, L_2 ) = 0\). \(\square \)

Lemma 10

(Flipping rule 2) Suppose \(\delta ^{(t)}_p = 1, \delta ^{(t+1)}_p = 0\). The monotonicity constraints hold at time \(t+1\) if \(L \le 0\).

Proof

If \(\varvec{q}^{(t)} = \varvec{q}^{(t+1)}\), the statement follows easily from Lemmas 6 and 8. We check (11) and (13) for the case \(\varvec{q}^{(t)} \succ \varvec{q}^{(t+1)}\).

  • Since \(L_1 = - \min _{\varvec{\alpha }\in {\mathbb {L}}^p_1} \gamma ^{(t+1)}_{\varvec{\alpha }} \le 0\) and \(\min _{\varvec{\alpha }\in {\mathbb {L}}^p_0} \gamma ^{(t+1)}_{\varvec{\alpha }} = \min _{\varvec{\alpha }\in {\mathbb {L}}^p_0} \gamma ^{(t)}_{\varvec{\alpha }} \ge 0\), (11) holds at \(t+1\).

  • By Remark 9.4, for any \(\varvec{\alpha }\in {\mathbb {L}}_1^p \bigcap {\mathbb {S}}_0^{(t+1)}\),

    $$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} = \gamma ^{(t+1)}_{\varvec{q}^{(t)}} > \gamma ^{(t+1)}_{\varvec{\alpha }} \end{aligned}$$
    (17)

    Together with \(L_2 = \max _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma ^{(t+1)}_{\varvec{\alpha }} - \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} \le 0\), (17) holds for any \(\varvec{\alpha }\in {\mathbb {L}}^p_0 \bigcup ({\mathbb {L}}_1^p \bigcap \mathbb S_0^{(t+1)})\). Further, as shown in the proof of Lemma 8, we have \(\{\varvec{\alpha }| \varvec{\alpha }\in {\mathbb {S}}^{(t+1)}_0, \varvec{\alpha }\notin {\mathbb {S}}^{(t)}_0 \} \subseteq {\mathbb {L}}^p_0\), such that \({\mathbb {S}}^{(t+1)}_0 \subseteq {\mathbb {L}}^p_0 \bigcup ({\mathbb {L}}_1^p \bigcap {\mathbb {S}}_0^{(t+1)})\).

\(\square \)

Cite this article

Chen, Y., Culpepper, S. & Liang, F. A Sparse Latent Class Model for Cognitive Diagnosis. Psychometrika 85, 121–153 (2020). https://doi.org/10.1007/s11336-019-09693-2