A Sparse Latent Class Model for Cognitive Diagnosis


Cognitive diagnostic models (CDMs) are latent variable models developed to infer latent skills, knowledge, or personalities that underlie responses to educational, psychological, and social science tests and measures. Recent research has focused on theory and methods for using sparse latent class models (SLCMs) in an exploratory fashion to infer the latent processes and structure underlying responses. We report new theoretical results about sufficient conditions for generic identifiability of SLCM parameters. An important contribution for practice is that our new generic identifiability conditions are more likely to be satisfied in empirical applications than existing conditions that ensure strict identifiability. Learning the underlying latent structure can be formulated as a variable selection problem. We develop a new Bayesian variable selection algorithm that explicitly enforces generic identifiability conditions and monotonicity of item response functions to ensure valid posterior inference. We present Monte Carlo simulation results to support accurate inferences and discuss the implications of our findings for future SLCM research and educational testing.





  1. Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669–679.

  2. Allman, E. S., Matias, C., & Rhodes, J. A. (2009). Identifiability of parameters in latent structure models with many observed variables. The Annals of Statistics, 37, 3099–3132.

  3. Carreira-Perpiñán, M., & Renals, S. (2000). Practical identifiability of finite mixtures of multivariate Bernoulli distributions. Neural Computation, 12, 141–152.

  4. Chen, Y., Culpepper, S. A., Chen, Y., & Douglas, J. (2018). Bayesian estimation of the DINA Q-matrix. Psychometrika, 83, 89–108.

  5. Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110(510), 850–866.

  6. Chiu, C.-Y., Douglas, J. A., & Li, X. (2009). Cluster analysis for cognitive diagnosis: Theory and applications. Psychometrika, 74(4), 633–665.

  7. Cox, D., Little, J., O’Shea, D., & Sweedler, M. (1994). Ideals, varieties, and algorithms. American Mathematical Monthly, 101(6), 582–586.

  8. Culpepper, S. A. (2019). Estimating the cognitive diagnosis Q matrix with expert knowledge: Application to the fraction-subtraction dataset. Psychometrika, 84, 333–357.

  9. Dang, N. V. (2015). Complex powers of analytic functions and meromorphic renormalization in QFT. arXiv preprint arXiv:1503.00995.

  10. von Davier, M. (2005). A general diagnostic model applied to language testing data. ETS Research Report Series, 2005(2).

  11. de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199.

  12. de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353.

  13. DiBello, L. V., Stout, W. F., & Roussos, L. A. (1995). Unified cognitive/psychometric diagnostic assessment likelihood-based classification techniques. In P. D. Nichols, S. F. Chipman, & R. L. Brennan (Eds.), Cognitively Diagnostic Assessment, Chapter 15 (pp. 361–389). New York: Routledge.

  14. Fang, G., Liu, J., & Ying, Z. (2019). On the identifiability of diagnostic classification models. Psychometrika, 84(1), 19–40.

  15. Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61(2), 215–231.

  16. Gyllenberg, M., Koski, T., Reilink, E., & Verlaan, M. (1994a). Non-uniqueness in probabilistic numerical identification of bacteria. Journal of Applied Probability, 31(2), 542–548.

  17. Gyllenberg, M., Koski, T., Reilink, E., & Verlaan, M. (1994b). Non-uniqueness in probabilistic numerical identification of bacteria. Journal of Applied Probability, 31(2), 542–548.

  18. Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26(4), 301–321.

  19. Hagenaars, J. A. (1993). Loglinear models with latent variables (Vol. 94). Newbury Park, CA: Sage Publications Inc.

  20. Hartz, S.M. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality. Ph.D. thesis, University of Illinois at Urbana-Champaign.

  21. Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272.

  22. Kruskal, J. B. (1976). More factors than subjects, tests and treatments: An indeterminacy theorem for canonical decomposition and individual differences scaling. Psychometrika, 41(3), 281–293.

  23. Kruskal, J. B. (1977). Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and Its Applications, 18(2), 95–138.

  24. Liu, J., Xu, G., & Ying, Z. (2013). Theory of the self-learning q-matrix. Bernoulli, 19(5A), 1790–1817.

  25. Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187–212.

  26. Mityagin, B. (2015). The zero set of a real analytic function. arXiv preprint arXiv:1512.07276.

  27. Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford Press.

  28. Tatsuoka, C. (2002). Data analytic methods for latent partially ordered classification models. Journal of the Royal Statistical Society, Series C (Applied Statistics), 51(3), 337–350.

  29. Tatsuoka, K. K. (1984). Analysis of errors in fraction addition and subtraction problems (Final report). Technical report.

  30. Teicher, H. (1961). Identifiability of mixtures. The Annals of Mathematical Statistics, 32(1), 244–248.

  31. Templin, J. L., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11(3), 287.

  32. Xu, G. (2017). Identifiability of restricted latent class models with binary responses. The Annals of Statistics, 45(2), 675–707.

  33. Xu, G., & Shang, Z. (2017). Identifying latent structures in restricted latent class models. Journal of the American Statistical Association, 113, 1284–1295.

  34. Yakowitz, S. J., & Spragins, J. D. (1968). On the identifiability of finite mixtures. The Annals of Mathematical Statistics, 39, 209–214.


Author information

Correspondence to Steven Culpepper.



A Connections of SLCM to Popular CDMs

In this section, we discuss the connections between SLCM and popular CDMs. To simplify the expression, we assume the relevant skills of item j are \(k_1, \ldots , k_R\), i.e., \(q_{jk_1} = \cdots = q_{jk_R} = 1, \, q_{jk}=0, \text { otherwise}\).

Example 3

(DINA model). The deterministic input noisy output “and” gate model (Haertel 1989; Junker and Sijtsma 2001) is a conjunctive model. It assumes that a student is capable of answering question j positively only if he/she masters all of its relevant skills. The item response function takes the following form,

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{q}_j) =(1-s_j)^{\varvec{1}({\varvec{\alpha }\succeq \varvec{q}_j})} {g_j}^{\varvec{1}(\varvec{\alpha }\nsucceq \varvec{q}_j)}, \end{aligned}$$

where \(s_j = {\mathbb {P}}(Y_j = 1 | \varvec{\alpha }\succeq \varvec{q}_j)\) is the slipping parameter, the probability that a student who masters all relevant skills of item j nevertheless responds negatively, and \(g_j= {\mathbb {P}}(Y_j = 1 | \varvec{\alpha }\nsucceq \varvec{q}_j)\) is the guessing parameter, the probability that a non-master responds positively. It is assumed that \(g_j < 1 - s_j\) in most applications. The DINA model can be written as

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{\beta }_j) = \Psi \left( \beta _{j,0} +\beta _{j,k_1\ldots k_R} \alpha _{k_1}\ldots \alpha _{k_R}\right) \end{aligned}$$

where only one coefficient, besides the intercept, in \(\varvec{\beta }_j\) is active,

$$\begin{aligned} \delta _{j,0} = \delta _{j,k_1\ldots k_R} = 1, \quad \delta _{j,p}=0\, \, \text { otherwise}. \end{aligned}$$

The guessing parameter \(g_j\) and slipping parameter \(s_j\) are given by

$$\begin{aligned} g_j = \Psi (\beta _{j,0}),\quad s_j = 1 - \Psi (\beta _{j,0}+\beta _{j,k_1\ldots k_R}). \end{aligned}$$
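As a quick numerical check of this reparameterization (a sketch with illustrative coefficient values, assuming the probit link for \(\Psi \); the helper `dina_prob` is ours, not code from the paper):

```python
# Sketch: DINA probabilities recovered from the two active SLCM
# coefficients, assuming Psi is the standard normal CDF (probit link).
import numpy as np
from scipy.stats import norm

beta0, beta_int = -1.0, 2.0           # illustrative values; beta_int > 0
q = np.array([1, 1, 0])               # item requires skills 1 and 2

def dina_prob(alpha):
    """P(Y_j = 1 | alpha): the interaction fires iff alpha >= q."""
    mastered = np.all(alpha >= q)
    return norm.cdf(beta0 + beta_int * mastered)

g = norm.cdf(beta0)                   # guessing parameter g_j = Psi(beta_{j,0})
s = 1 - norm.cdf(beta0 + beta_int)    # slipping parameter s_j
```

Masters of both required skills respond positively with probability \(1-s_j\), everyone else with probability \(g_j\), and \(g_j < 1-s_j\) holds automatically because the interaction coefficient is positive.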

Example 4

(DINO model). The deterministic input noisy output “or” gate model (Templin and Henson 2006) is a disjunctive model, which assumes that a student is capable of answering question j positively if at least one of the relevant skills is mastered. The item response function is

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{q}_j)= (1-s_j)^{\varvec{1}(\varvec{\alpha }^\mathrm{T} \varvec{q}_j >0 )} {g_j}^{{\varvec{1}(\varvec{\alpha }^\mathrm{T} \varvec{q}_j = 0 )}} \end{aligned}$$

where \(s_j\) and \(g_j\) are defined the same as in DINA, and \(g_j < 1 - s_j\) is assumed. The DINO model can be reparameterized as

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{\beta }_j) = \Psi \left( \beta _{j,0}+\sum _{r=1}^{R} \beta _{j,k_r} \alpha _{k_r}+\underset{k_r> k_r^\prime }{\sum \sum }\beta _{j,k_rk_r^\prime }\alpha _{k_r}\alpha _{k_r^\prime }+\cdots +\beta _{j,k_1\ldots k_R}\prod _{r=1}^R \alpha _{k_r}\right) \end{aligned}$$

where the coefficients containing only the relevant skills are active,

$$\begin{aligned} \delta _{j,0} = \delta _{j,k_1} = \cdots = \delta _{j,k_R} = \delta _{j,k_1k_2} = \cdots = \delta _{j,k_{R-1}k_{R}} = \cdots = \delta _{j,k_1\ldots k_R} = 1,\quad \delta _{j,p}=0\, \, \text { otherwise} \end{aligned}$$

The coefficients of odd order are all equal and positive; the coefficients of even order are their additive inverses:

$$\begin{aligned}&\beta _{j,k_1} = \beta _{j,k_2} = \cdots = \beta _{j,k_R} = \beta _{j,k_1k_2k_3} = \cdots = \beta _{j,k_{R-2}k_{R-1}k_{R}}=\cdots \\ =&-\beta _{j,k_1k_2} = \cdots = -\beta _{j,k_{R-1}k_R} = -\beta _{j,k_1k_2k_3k_4} = \cdots =-\beta _{j,k_{R-3}k_{R-2}k_{R-1}k_R} = \cdots . \end{aligned}$$

The guessing parameter \(g_j\) has the same form as in the DINA model, and the slipping parameter \(s_j\) is given by \(1 - \Psi (\varvec{a}_{\varvec{\alpha }}^\mathrm{T} \varvec{\beta }_j)\) for any \(\varvec{\alpha }\) satisfying \(\varvec{\alpha }^\mathrm{T} \varvec{q}_j > 0\), which is equivalent to \( 1 - \Psi (\beta _{j,0}+\beta _{j,k_r})\) for \(r = 1,\ldots , R\):

$$\begin{aligned} g_j = \Psi (\beta _{j,0}),\quad s_j = 1 - \Psi (\beta _{j,0}+\beta _{j,k_r}), r = 1,\ldots , R. \end{aligned}$$
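The alternating-sign identity can be verified numerically. The sketch below (illustrative values, probit link assumed) checks that a DINO item with \(R=2\) relevant skills takes only the two values \(g_j\) and \(1-s_j\):

```python
# Sketch: for R = 2 relevant skills, the DINO coefficients are
# beta_{k1} = beta_{k2} = b and beta_{k1 k2} = -b (alternating signs);
# the item then takes only the two values g_j and 1 - s_j.
import numpy as np
from itertools import product
from scipy.stats import norm

beta0, b = -0.8, 1.5                  # illustrative values; b > 0

def dino_prob(a1, a2):
    # eta = beta0 + b*a1 + b*a2 - b*a1*a2
    return norm.cdf(beta0 + b * a1 + b * a2 - b * a1 * a2)

g = norm.cdf(beta0)                   # no relevant skill mastered
one_minus_s = norm.cdf(beta0 + b)     # at least one relevant skill mastered
probs = {(a1, a2): dino_prob(a1, a2) for a1, a2 in product([0, 1], repeat=2)}
```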

Example 5

(G-DINA model). The G-DINA model (de la Torre 2011) generalizes the DINA model and takes the form

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{q}_j) = \beta _{j,0}+\sum _{k=1}^{K} \beta _{j,k} q_{jk}\alpha _{k}+\underset{k> k^\prime }{\sum \sum }\beta _{j,kk^\prime }q_{jk}\alpha _{k}q_{jk^\prime }\alpha _{k^\prime }+\cdots +\beta _{j,12\ldots K}\prod _{k=1}^K q_{jk}\alpha _{k}. \end{aligned}$$

By using the identity link in Eq. (1), it can be written as

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{\beta }_j) =\beta _{j,0}+\sum _{r=1}^{R} \beta _{j,k_r} \alpha _{k_r}+\underset{k_r> k_r^\prime }{\sum \sum }\beta _{j,k_rk_r^\prime }\alpha _{k_r}\alpha _{k_r^\prime }+\cdots +\beta _{j,k_1\ldots k_R}\prod _{r=1}^R \alpha _{k_r} \end{aligned}$$

where the coefficients containing only the relevant skills are active,

$$\begin{aligned} \delta _{j,0} = \delta _{j,k_1} = \cdots = \delta _{j,k_R} = \delta _{j,k_1k_2} = \cdots = \delta _{j,k_{R-1}k_{R}} = \cdots = \delta _{j,k_1\ldots k_R} = 1,\quad \delta _{j,p}=0\, \, \text { otherwise}. \end{aligned}$$

Example 6

(NC-RUM model). Under the reduced noncompensatory reparameterized unified model (DiBello et al. 1995; Rupp et al. 2010), attributes have a noncompensatory relationship with the observed response: missing any relevant skill inflicts a multiplicative penalty on the positive response probability,

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{q}_j) = b_j \prod _{k=1}^K r_{j,k}^{q_{jk}(1-\alpha _{k})} \end{aligned}$$

where \(b_j\) is the positive response probability for students who possess all relevant skills, and \(r_{j,k}\), \(0<r_{j,k}<1\), is the penalty for not mastering the kth attribute. As pointed out by Xu (2017), using the exponential link function, the NC-RUM can be equivalently written as

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{\beta }_j) = \exp \left( \beta _{j,0} + \sum ^{R}_{r=1} \beta _{j,k_r}\alpha _{k_r} \right) , \end{aligned}$$

where the main effects of relevant attributes are active,

$$\begin{aligned} \delta _{j,0} = \delta _{j,k_1} = \cdots = \delta _{j,k_R} = 1, \quad \delta _{j,p}=0\, \, \text { otherwise}. \end{aligned}$$

The parameters are given by

$$\begin{aligned} b_j = \exp \left( \beta _{j,0}+\sum _{r=1}^R \beta _{j,k_r}\right) ,\quad r_{j,k} = {\left\{ \begin{array}{ll} \exp (-\beta _{j,k_r}), \quad &{}\text {if } k \in \{k_1, \ldots , k_R\}\\ 1, \quad &{}\text {otherwise}\end{array}\right. } \end{aligned}$$
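The parameter mapping can likewise be verified numerically. The sketch below (illustrative coefficients chosen so that \(b_j \le 1\)) checks that the NC-RUM form and the exponential-link SLCM form agree on every class:

```python
# Sketch: the NC-RUM <-> exponential-link SLCM mapping, with illustrative
# coefficients satisfying beta0 + sum_r beta_r <= 0 so that b_j <= 1.
import numpy as np

beta0 = -2.0
beta = {1: 0.6, 2: 0.9}               # active main effects of skills k1=1, k2=2

b_j = np.exp(beta0 + sum(beta.values()))      # full-mastery probability
r = {k: np.exp(-v) for k, v in beta.items()}  # penalties, 0 < r_{j,k} < 1

def ncrum_prob(alpha):                # alpha: dict skill -> 0/1
    out = b_j
    for k in r:
        out *= r[k] ** (1 - alpha[k])
    return out

def slcm_prob(alpha):
    return np.exp(beta0 + sum(beta[k] * alpha[k] for k in beta))

alphas = [{1: a1, 2: a2} for a1 in (0, 1) for a2 in (0, 1)]
```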

Example 7

(C-RUM model). The compensatory RUM (Hagenaars 1993; Maris 1999) is given by

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{q}_j) = \frac{\exp \left( \beta _{j,0} + \sum _{k=1}^K \beta _{j,k}q_{jk}\alpha _{k}\right) }{\exp \left( \beta _{j,0} + \sum _{k=1}^K \beta _{j,k}q_{jk}\alpha _{k}\right) +1}. \end{aligned}$$


Using the logit link in Eq. (1), it can be written as

$$\begin{aligned} {\mathbb {P}}(Y_j =1| \varvec{\alpha }, \varvec{q}_j) = \Psi \left( \beta _{j,0} + \sum ^{R}_{r=1} \beta _{j,k_r}\alpha _{k_r} \right) \end{aligned}$$

where \(\Psi (\cdot )\) is the inverse of the logit function and the main effects of relevant attributes are active,

$$\begin{aligned} \delta _{j,0} = \delta _{j,k_1} = \cdots = \delta _{j,k_R} = 1, \quad \delta _{j,p}=0\, \, \text { otherwise}. \end{aligned}$$

B Proof of Theorems

In this section, we provide the proofs of Theorems 4 and 2.

B.1 Proof of Theorem 4

We first introduce Lemma 5 (Mityagin 2015; Dang 2015), which shows that the zero set of a real analytic function has Lebesgue measure zero if the function is not identically zero. Then, in Proposition 3, we show that \(G_{\varvec{D}}(\varvec{B}) : = \det [\varvec{M}(\varvec{D}, \varvec{B})]\) is a real analytic function, and, in Proposition 4, that \(G_{\varvec{D}}(\varvec{B})\) is not identically zero on \(\Omega _{\varvec{D}} (\varvec{B})\) if \(\varvec{D} \in {\mathbb {D}}_g\), so that Lemma 5 applies and Theorem 4 is proved.

Lemma 5

(Mityagin 2015; Dang 2015) If \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) is a real analytic function which is not identically zero, then the set \(\{\varvec{x} :f(\varvec{x}) =0\}\) has Lebesgue measure zero.

Proposition 3

\(G_{\varvec{D}}(\varvec{B}) = \det [\varvec{M}(\varvec{D}, \varvec{B})]: \Omega _{\varvec{D}} \rightarrow {\mathbb {R}}\) is a real analytic function of \(\varvec{B}\).


Proof

\(G_{\varvec{D}}(\varvec{B})\) is a composition of functions:

$$\begin{aligned} G_{\varvec{D}}(\varvec{B}) = \det [\varvec{M}] = h(\varvec{\theta }_{\varvec{\alpha }_0},\ldots , \varvec{\theta }_{\varvec{\alpha }_{2^K-1}} ) = h \left( \Psi (\varvec{B}_{\varvec{D}} \varvec{a}_{\varvec{\alpha }_0}),\ldots , \Psi (\varvec{B}_{\varvec{D}} \varvec{a}_{\varvec{\alpha }_{2^K-1}})\right) \end{aligned}$$

where \(h(\varvec{\theta }): [0,1]^{K\times 2^K}\rightarrow {\mathbb {R}}\) is a polynomial function and \(\Psi (\cdot )\) is a CDF.

\(\Psi (\cdot )\) is a real analytic function because it is the integral of a real analytic density function, and \(h(\varvec{\theta })\) is also a real analytic function since it is a polynomial. Therefore, the composition \(G_{\varvec{D}}(\varvec{B})\) is a real analytic function, because the composition of real analytic functions is real analytic. \(\square \)

Proposition 4

If \(\varvec{D}\in {\mathbb {D}}_g\), there exists some \(\varvec{B}\in \Omega _{\varvec{D}} (\varvec{B})\), s.t., \(G_{\varvec{D}}(\varvec{B})\ne 0\).


Proof

Let \(\varvec{B}^1 = (\varvec{1}_K, \varvec{I}_K, \varvec{0}) \in \Omega _{\varvec{D}}(\varvec{B}), \forall \varvec{D} \in {\mathbb {D}}_g\). As shown in Example 2, \(\varvec{M}(\varvec{D}, \varvec{B}^1)\) is of full rank, so that \(G_{\varvec{D}}(\varvec{B}^1 )\ne 0\). \(\square \)
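This witness is easy to check numerically. The sketch below (our own code, probit link assumed) builds the \(2^K \times 2^K\) class-response matrix for \(\varvec{B}^1 = (\varvec{1}_K, \varvec{I}_K, \varvec{0})\) with \(K = 3\) and confirms it has full rank, so \(G_{\varvec{D}}(\varvec{B}^1) \ne 0\):

```python
# Sketch: class-response matrix M(D, B^1) for the witness
# B^1 = (1_K, I_K, 0), K = 3, probit link. Not code from the paper.
import numpy as np
from itertools import product
from scipy.stats import norm

K = 3
classes = list(product([0, 1], repeat=K))
# theta[i, j] = Psi(1 + alpha_j): unit intercepts, identity main effects,
# no interactions
theta = np.array([[norm.cdf(1.0 + a[j]) for j in range(K)] for a in classes])

# M[i, y] = P(Y = y | class i) = prod_j theta^{y_j} (1 - theta)^{1 - y_j}
M = np.array([[np.prod([theta[i, j] if y[j] else 1.0 - theta[i, j]
                        for j in range(K)])
               for y in classes]
              for i in range(len(classes))])
```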

Remark 7

\(G_{\varvec{D}}(\varvec{B})\not \equiv 0\) is not a trivial conclusion that holds for all \(\varvec{D}\): \(\varvec{D} \in {\mathbb {D}}\) is a sufficient condition for \(G_{\varvec{D}}(\varvec{B})\not \equiv 0\), and if \(\varvec{D} \not \in {\mathbb {D}}\), it is possible that \(G_{\varvec{D}}(\varvec{B})\equiv 0\), as the following example shows.

Example 8

Assume \(K=3\) and that the main effect of the first skill is inactive for all items, i.e., \(\delta _{j,1}=0\) for all \(1\le j \le K\). Then \(\varvec{D}_{3\times 8}\) takes the form

$$\begin{aligned} \begin{bmatrix} * &{}\quad 0 &{}\quad * &{}\quad * &{}\quad * &{}\quad * &{}\quad * &{}\quad *\\ * &{}\quad 0 &{}\quad 1 &{}\quad * &{}\quad * &{}\quad * &{}\quad * &{}\quad *\\ * &{}\quad 0 &{}\quad * &{}\quad 1 &{}\quad * &{}\quad * &{}\quad * &{}\quad * \\ \end{bmatrix}. \end{aligned}$$

For any \(\varvec{B}\in \Omega _{\varvec{D}}(\varvec{B})\) and any response \(\varvec{y} \in \{0,1\}^3\),

$$\begin{aligned} \varvec{M}_{\varvec{\alpha }= (0,0,0),\varvec{y}}(\varvec{D}, \varvec{B}) = \varvec{M}_{\varvec{\alpha }= (1,0,0),\varvec{y}}(\varvec{D}, \varvec{B}). \end{aligned}$$

So these two rows of \(\varvec{M}(\varvec{D}, \varvec{B})\) are identical, \(\varvec{M}(\varvec{D}, \varvec{B})\) is not of full row rank, and hence \(\det [\varvec{M}(\varvec{D}, \varvec{B})] \equiv 0\).
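Example 8 can also be checked numerically. The sketch below (random coefficients; for simplicity only the main effects of skills 2 and 3 are active, a special case of the displayed \(\varvec{D}\)) confirms that the rows for classes \((0,0,0)\) and \((1,0,0)\) coincide and the determinant vanishes:

```python
# Sketch of Example 8: with delta_{j,1} = 0 for every item, the rows of M
# for classes (0,0,0) and (1,0,0) coincide, so det M = 0.
import numpy as np
from itertools import product
from scipy.stats import norm

K = 3
classes = list(product([0, 1], repeat=K))
rng = np.random.default_rng(0)
B = rng.normal(size=(K, 3))           # per item: intercept, beta_2, beta_3

def theta(j, a):
    # the linear predictor never involves alpha_1
    return norm.cdf(B[j, 0] + B[j, 1] * a[1] + B[j, 2] * a[2])

M = np.array([[np.prod([theta(j, a) if y[j] else 1.0 - theta(j, a)
                        for j in range(K)])
               for y in classes]
              for a in classes])
```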

By Lemma 5 and Propositions 3 and 4, Theorem 4 is proved.

B.2 Proof of Theorem 2


Proof

As shown in Example 2, for any \(\varvec{B} \in \Omega _{\varvec{D}_s} (\varvec{B})\), the corresponding class-response matrix is of full rank, \(rank(\varvec{M}(\varvec{D}_s, \varvec{B})) = 2^K\), if and only if, for each item, the success probabilities of students with the relevant skill and of those without it differ. In fact, if the two probabilities were equal, the monotonicity constraints would be violated. Then, using the notation of Sect. 3.5, we conclude that under condition (S1) and the monotonicity constraints, \(rank(\varvec{M}_1) = rank(\varvec{M}_2) = 2^K\).

For \(\varvec{M}_3\), each element is nonnegative and each row sums to 1. Under condition (S2), there must exist an item j such that \(\theta _{j,\varvec{\alpha }_s} \ne \theta _{j, \varvec{\alpha }_t}\), so \(rank(\varvec{M}_3) \ge 2\).

\(\square \)

C Initialization from the Identifiable Space

Initialization of the sparsity matrix \(\varvec{\Delta }^{(0)}_{J\times 2^K}\):

  1. Activate the intercepts. Fix the entries in the first column of \(\varvec{\Delta }^{(0)}\) (i.e., \(\varvec{\Delta }^{(0)}_{\cdot 1}\)) to 1. Denote the remaining \(J\times (2^K-1)\) sub-matrix as \({{\tilde{\varvec{\Delta }}}}^{(0)}\).

  2. Construct \(\varvec{D}^{(0)}_1\) and \(\varvec{D}^{(0)}_2\). Fix the first 2K rows of \({{\tilde{\varvec{\Delta }}}}^{(0)}\) to be

     $$\begin{aligned}\left( \begin{array}{ll} \varvec{I}_K &{} \varvec{0} \\ \varvec{I}_K &{}\varvec{0} \\ \end{array}\right) .\end{aligned}$$

  3. Construct \({{\tilde{\varvec{\Delta }}}}^{'(0)}\).

     (a) Randomly select K indexes, \(j_1,\ldots ,j_K\), from the set \(\{2K+1, \ldots , J\}\) with replacement and set \({\tilde{\varvec{\Delta }}}^{(0)}_{j_k,k} = 1.\)

     (b) Sample the remaining entries in \({{\tilde{\varvec{\Delta }}}}^{'(0)}\) by

     $$\begin{aligned} \delta _{jp}^{(0)}|w^{(0)}\sim \text {Bernoulli}(w^{(0)}), \quad j>2K,\,\, (j,p) \notin \{(j_k,k)\}_{k=1}^K \end{aligned}$$

     where \(w^{(0)}\sim \text {Beta}(w_0, w_1)\) and \(w_0, w_1\) are the parameters of the prior distribution and are treated as fixed.

     (c) Check the row sums. If any row of \(\varvec{\Delta }^{'(0)}\) sums to 0, randomly pick an entry in this row and set it to 1.

  4. Shuffle the rows. Draw a \(J\times J\) permutation matrix \(\varvec{P} = (\varvec{e}_{j_1},\ldots ,\varvec{e}_{j_J})\), where \((j_1,\ldots , j_J)\) is a permutation of \((1,\ldots ,J)\), and let \({{\tilde{\varvec{\Delta }}}}^{(0)} \leftarrow \varvec{P} {{\tilde{\varvec{\Delta }}}}^{(0)}\).

The above initialization is designed for the strict identifiability conditions. To generate a \(\varvec{\Delta }\) under the generic identifiability conditions, we only need to enlarge the set of entries sampled from the prior distribution in step 3(b). Specifically, we change step 3(b) to

Sample the remaining entries in \({{\tilde{\varvec{\Delta }}}}^{'(0)}\) by

$$\begin{aligned} \delta _{jp}^{(0)}|w^{(0)}\sim \text {Bernoulli}(w^{(0)}), \quad \, (j,p) \notin \{(j_k,k), (k,k), (K+k,k)\}_{k=1}^K. \end{aligned}$$

Initialization of the coefficients matrix \(\varvec{\beta }^{(0)}_{J\times 2^K} = (\varvec{\beta }_1, \ldots , \varvec{\beta }_J)^\mathrm{T}\):

$$\begin{aligned} \beta ^{(0)}_{jp}=0, \text {if } \delta _{jp}^{(0)} = 0, \end{aligned}$$
$$\begin{aligned} \beta ^{(0)}_{jp} | \delta _{jp}^{(0)}=1 \propto {\mathcal {N}}(0, \sigma _{\beta }^2)I(\beta ^{(0)}_{jp} >0). \end{aligned}$$
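The sparsity-matrix initialization above (strict version) can be sketched in code as follows; the function name `init_delta`, its defaults, and the use of NumPy are our own illustration, not the authors' implementation:

```python
# Sketch of the Delta^(0) initialization, strict-identifiability version.
import numpy as np

def init_delta(J, K, w0=1.0, w1=1.0, rng=None):
    """Return a J x 2^K binary sparsity matrix; requires J > 2K."""
    rng = rng or np.random.default_rng()
    P = 2 ** K
    delta = np.zeros((J, P), dtype=int)
    delta[:, 0] = 1                              # step 1: activate intercepts
    tilde = delta[:, 1:]                         # view of remaining columns
    tilde[:K, :K] = np.eye(K, dtype=int)         # step 2: two identity blocks
    tilde[K:2 * K, :K] = np.eye(K, dtype=int)
    for k in range(K):                           # step 3(a): anchor each skill
        j = rng.integers(2 * K, J)               # sampled with replacement
        tilde[j, k] = 1
    w = rng.beta(w0, w1)                         # step 3(b): Bernoulli(w) fill;
    mask = rng.random((J - 2 * K, P - 1)) < w    # OR-ing keeps the anchors at 1
    tilde[2 * K:] |= mask.astype(int)
    for row in tilde[2 * K:]:                    # step 3(c): no all-zero rows
        if row.sum() == 0:
            row[rng.integers(0, P - 1)] = 1
    delta[:, 1:] = tilde[rng.permutation(J)]     # step 4: shuffle the rows
    return delta
```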

D Derivation of \({{\tilde{\omega }}}_{jp}\)

$$\begin{aligned} \delta _{jp}\left| \varvec{Z}_j,\varvec{\alpha },\varvec{\beta }_{j(p)},\omega ,\sigma _\beta ^2 \right. \sim \text {Bernoulli}\left( {{\tilde{\omega }}}_{jp}\right) \end{aligned}$$
$$\begin{aligned} {{\tilde{\omega }}}_{jp} = \frac{\omega \int _{L}^\infty p\left( \varvec{Z}_j\left| \varvec{\alpha },\varvec{\beta }_j\right. \right) p\left( \beta _{jp}\left| \sigma _\beta ^2\right. \right) \hbox {d}\beta _{jp} }{\omega \int _{L}^\infty p\left( \varvec{Z}_j\left| \varvec{\alpha },\varvec{\beta }_j\right. \right) p\left( \beta _{jp}\left| \sigma _\beta ^2\right. \right) \hbox {d}\beta _{jp} + \left( 1-\omega \right) p\left( \varvec{Z}_j\left| \varvec{\alpha },\varvec{\beta }_{j(p)},\beta _{jp}=0\right. \right) } \end{aligned}$$

The numerator is

$$\begin{aligned}&\omega \int _{L}^\infty p\left( \varvec{Z}_j\left| \varvec{\alpha },\varvec{\beta }_j\right. \right) p\left( \beta _{jp}\left| \sigma _\beta ^2\right. \right) \hbox {d}\beta _{jp} \\&\quad = \omega \int _{L}^\infty \left( 2\pi \right) ^{-\frac{N}{2}} \exp \left[ -\frac{1}{2}\left( \widetilde{\varvec{Z}}_j - \varvec{A}_p \beta _{jp}\right) ^\prime \left( \widetilde{\varvec{Z}}_j - \varvec{A}_p \beta _{jp}\right) \right] \\&\qquad \cdot {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1}\left( 2\pi \right) ^{-\frac{1}{2}} \frac{1}{\sigma _\beta } \exp \left( -\frac{\beta _{jp}^2}{2\sigma _\beta ^2}\right) \hbox {d}\beta _{jp} \\&\quad = \left( 2\pi \right) ^{-\frac{N}{2}} {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \frac{\omega }{\sigma _\beta }\\&\qquad \times \int _{L}^\infty \left( 2\pi \right) ^{-\frac{1}{2}} \exp \left\{ -\frac{1}{2}\left[ \left( \varvec{A}'_p\varvec{A}_p+\frac{1}{\sigma _\beta ^2}\right) \beta _{jp}^2-2\varvec{A}'_p\widetilde{\varvec{Z}}_j \beta _{jp} + \widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j\right] \right\} \hbox {d}\beta _{jp} \\&\quad = \left( 2\pi \right) ^{-\frac{N}{2}} {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \frac{\omega {{\tilde{\sigma }}}_p}{\sigma _\beta } \exp \left[ -\frac{1}{2}\widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j +\frac{1}{2}{{\tilde{\sigma }}}_p^2\left( \varvec{A}'_p\widetilde{\varvec{Z}}_j \right) ^2\right] \\&\qquad \times \int _{L}^\infty \left( 2\pi \right) ^{-\frac{1}{2}}\frac{1}{{{\tilde{\sigma }}}_p} \exp \left[ -\frac{1}{2{{\tilde{\sigma }}}_p^2}\left( \beta _{jp}- {{\tilde{\sigma }}}_p^2\varvec{A}'_p\widetilde{\varvec{Z}}_j \right) ^2\right] \hbox {d}\beta _{jp} \\&\quad = \left( 2\pi \right) ^{-\frac{N}{2}} {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \frac{\omega {{\tilde{\sigma }}}_p}{\sigma _\beta } \exp \left( -\frac{1}{2}\widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j + \frac{1}{2} \frac{{{\tilde{\mu }}}^2_{jp}}{{{\tilde{\sigma }}}^2_p} \right) 
\int _{-\frac{{{\tilde{\mu }}}_{jp}-L}{{{\tilde{\sigma }}}_p}}^\infty \left( 2\pi \right) ^{-\frac{1}{2}} \\&\qquad \times \exp \left[ -\frac{1}{2}\left( \frac{\beta _{jp}- {{\tilde{\mu }}}_{jp}}{{{\tilde{\sigma }}}_p}\right) ^2\right] \hbox {d}\left( \frac{\beta _{jp}- {{\tilde{\mu }}}_{jp}}{{{\tilde{\sigma }}}_p}\right) \\&\quad = \left( 2\pi \right) ^{-\frac{N}{2}} \exp \left( - \frac{1}{2}\widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j \right) {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \frac{\omega {{\tilde{\sigma }}}_p}{\sigma _\beta } \exp \left( \frac{1}{2} \frac{{{\tilde{\mu }}}^2_{jp}}{{{\tilde{\sigma }}}^2_p} \right) \Phi \left( \frac{{{\tilde{\mu }}}_{jp} -L}{{{\tilde{\sigma }}}_p} \right) \\ \end{aligned}$$

where \({{\tilde{\sigma }}}_p^2 = \left( \varvec{A}_p^\prime \varvec{A}_p + \sigma _\beta ^{-2}\right) ^{-1}\) and \({{\tilde{\mu }}}_{jp} = \varvec{A}_p^\prime \widetilde{\varvec{Z}}_j\left( \varvec{A}_p^\prime \varvec{A}_p + \sigma _\beta ^{-2}\right) ^{-1}\). Accordingly, \({{\tilde{\omega }}}_{jp}\) is,

$$\begin{aligned} {{\tilde{\omega }}}_{jp}&= \frac{ \left( 2\pi \right) ^{-\frac{N}{2}} \exp \left( - \frac{1}{2}\widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j \right) {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \frac{\omega {{\tilde{\sigma }}}_p}{\sigma _\beta } \exp \left( \frac{1}{2} \frac{{{\tilde{\mu }}}^2_{jp}}{{{\tilde{\sigma }}}^2_p} \right) \Phi \left( \frac{{{\tilde{\mu }}}_{jp}-L}{{{\tilde{\sigma }}}_p} \right) }{ \left( 2\pi \right) ^{-\frac{N}{2}} \exp \left( - \frac{1}{2}\widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j \right) {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \frac{\omega {{\tilde{\sigma }}}_p}{\sigma _\beta } \exp \left( \frac{1}{2} \frac{{{\tilde{\mu }}}^2_{jp}}{{{\tilde{\sigma }}}^2_p} \right) \Phi \left( \frac{{{\tilde{\mu }}}_{jp} -L}{{{\tilde{\sigma }}}_p}\right) + (1-\omega )\left( 2\pi \right) ^{-\frac{N}{2}} \exp \left( -\frac{1}{2}\widetilde{\varvec{Z}}'_j\widetilde{\varvec{Z}}_j \right) } \\&= \frac{ {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1}\omega \left( \frac{{{\tilde{\sigma }}}_p}{\sigma _\beta }\right) \Phi \left( \frac{{{\tilde{\mu }}}_{jp} -L}{{{\tilde{\sigma }}}_p}\right) \exp \left( \frac{1}{2}\frac{{\tilde{\mu }}_{jp}^2}{{\tilde{\sigma }}_p^2}\right) }{ {\Phi \left( \frac{-L}{\sigma _\beta }\right) }^{-1} \omega \left( \frac{{{\tilde{\sigma }}}_p}{\sigma _\beta }\right) \Phi \left( \frac{{{\tilde{\mu }}}_{jp} -L}{{{\tilde{\sigma }}}_p}\right) \exp \left( \frac{1}{2}\frac{{\tilde{\mu }}_{jp}^2}{{\tilde{\sigma }}_p^2}\right) + 1-\omega } \end{aligned}$$
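The closed form can be transcribed directly. The sketch below (our own helper, assuming the positive-truncated normal prior above, with `A_p`, `Z_tilde`, and the truncation point `L` supplied by the caller) computes \({{\tilde{\omega }}}_{jp}\):

```python
# Sketch: the closed-form inclusion probability omega-tilde_{jp}.
import numpy as np
from scipy.stats import norm

def omega_tilde(A_p, Z_tilde, omega, sigma_beta, L):
    sig2 = 1.0 / (A_p @ A_p + sigma_beta ** -2)   # tilde sigma_p^2
    mu = sig2 * (A_p @ Z_tilde)                   # tilde mu_jp
    active = (omega
              / norm.cdf(-L / sigma_beta)         # truncation normalizer
              * np.sqrt(sig2) / sigma_beta
              * norm.cdf((mu - L) / np.sqrt(sig2))
              * np.exp(0.5 * mu ** 2 / sig2))
    return active / (active + 1.0 - omega)
```

With \(\omega = 0\) the inclusion probability is 0; a numerically stable implementation would work with the logarithm of the active term before forming the ratio.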

E Proof of Lower Bound L

In this section, we derive the lower bound L of \(\beta _{jp}\) (Proposition 1).

Suppose at time t we have a \(\varvec{B}^{(t)}\in {\mathcal {B}}(\varvec{B})\) satisfying the monotonicity constraints (4), and at time \(t+1\) we sample only \(\beta _{jp}\), leaving every other coefficient the same as at time t, i.e., \(\beta _{js}^{(t+1)} = \beta _{js}^{(t)},\quad \forall s\ne p\).

In what follows, we introduce some notation. We write \(\beta _{jp}\) and \(\delta _{jp}\) as \(\beta _p\) and \(\delta _p\), respectively; that is, we omit the item subscript j, since the lower bound of \(\beta _{jp}\) does not depend on the coefficients of other items.

Let \(\gamma _{\varvec{\alpha }} = \varvec{\beta }^\mathrm{T} \varvec{a}_{\varvec{\alpha }} - \beta _0 = \Psi ^{-1}(\theta _{\varvec{\alpha }}) - \beta _0\) be the sum of the linear component excluding the intercept for class \(\varvec{\alpha }\). Further, let \( \gamma _{\varvec{\alpha }, -p} = {\left\{ \begin{array}{ll} \gamma _{\varvec{\alpha }} - \beta _p &{} \varvec{\alpha }\in {\mathbb {L}}^p_1 \\ \gamma _{\varvec{\alpha }} &{} \varvec{\alpha }\in {\mathbb {L}}^p_0, \end{array}\right. }\) denote the sum of the linear component excluding the intercept and the pth coefficient for class \(\varvec{\alpha }\), where \({\mathbb {L}}^p_1= \{\varvec{\alpha }| \varvec{a}_{\varvec{\alpha },p} = 1\}\) and \({\mathbb {L}}^p_0 = \{\varvec{\alpha }| \varvec{a}_{\varvec{\alpha },p} = 0, \varvec{\alpha }\succ \varvec{\alpha }_0\}\).

We rewrite the monotonicity constraints (4) as follows,

$$\begin{aligned} \min _{\varvec{\alpha }\succ \varvec{\alpha }_0} \theta _{\varvec{\alpha }}&\ge \theta _{\varvec{\alpha }_0}, \\ \theta _{\varvec{q}} = \max _{\varvec{\alpha }\in {\mathbb {S}}_1} \theta _{\varvec{\alpha }} = \min _{\varvec{\alpha }\in {\mathbb {S}}_1} \theta _{\varvec{\alpha }}&> \max _{\varvec{\alpha }\in {\mathbb {S}}_0} \theta _{\varvec{\alpha }}, \qquad (\star ) \end{aligned}$$

where \({\mathbb {S}}_0 = \{\varvec{\alpha }| \varvec{\alpha }\nsucceq \varvec{q}, \, \varvec{\alpha }\succ \varvec{\alpha }_0= \varvec{0} \}\) is the set of classes that do not master all the relevant skills, and \({\mathbb {S}}_1 = \{\varvec{\alpha }| \varvec{\alpha }\succeq \varvec{q}\} \) is the set of classes that master all the relevant skills.

Noting that \(\Psi (\cdot )\) is a strictly increasing function, we have the following equivalent form of the monotonicity constraints (\(\star \)):

$$\begin{aligned} \min _{\varvec{\alpha }\succ \varvec{\alpha }_0} \gamma _{\varvec{\alpha }}&\ge \gamma _{\varvec{\alpha }_0} = 0 , \end{aligned}$$
$$\begin{aligned} \gamma _ {\varvec{q}} = \max _{\varvec{\alpha }\in {\mathbb {S}}_1} \gamma _{\varvec{\alpha }} = \min _{\varvec{\alpha }\in {\mathbb {S}}_1} \gamma _{\varvec{\alpha }}&> \max _{\varvec{\alpha }\in {\mathbb {S}}_0} \gamma _{\varvec{\alpha }}. \end{aligned}$$

In the SLCM, \(\varvec{q}\) is uniquely determined by the structure vector \(\varvec{\delta }\). Mathematically, \(\varvec{q} = \arg \min _{\varvec{\alpha }: \varvec{a}_{\varvec{\alpha }}\succeq \varvec{\delta }} |\varvec{\alpha }|\), where \(|\cdot |\) denotes the cardinality. By this definition, \(\gamma _ {\varvec{q}} = \max _{\varvec{\alpha }\in {\mathbb {S}}_1} \gamma _{\varvec{\alpha }} = \min _{\varvec{\alpha }\in {\mathbb {S}}_1} \gamma _{\varvec{\alpha }} \) always holds; therefore, to verify (12), we only need to check

$$\begin{aligned} \gamma _ {\varvec{q}} > \max _{\varvec{\alpha }\in {\mathbb {S}}_0} \gamma _{\varvec{\alpha }}. \end{aligned}$$
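This reduced check can be illustrated with a small worked example (our own sketch with illustrative coefficients, \(K = 3\), \(\varvec{q} = (1,1,0)\)):

```python
# Worked example: gamma is constant on S_1 by construction, so the only
# condition to verify is gamma_q > max over S_0 of gamma_alpha.
import numpy as np
from itertools import product

K = 3
beta = {(0,): 0.7, (1,): 0.9}         # active main effects of skills 1 and 2
q = np.array([1, 1, 0])

def gamma(alpha):
    # linear component excluding the intercept
    return sum(b for idx, b in beta.items() if all(alpha[k] for k in idx))

classes = [np.array(a) for a in product([0, 1], repeat=K)]
S1 = [a for a in classes if np.all(a >= q)]                  # alpha >= q
S0 = [a for a in classes if not np.all(a >= q) and a.any()]  # nonzero, not >= q
gamma_q = gamma(q)
```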

In the following two remarks, we list some observations that are useful in the proof.

Remark 8

  1. \({\mathbb {L}}^p_0 \bigcup {\mathbb {L}}^p_1 = {\mathbb {S}}_0 \bigcup {\mathbb {S}}_1 = \{\varvec{\alpha }| \varvec{\alpha }\succ \varvec{\alpha }_0\}\).

  2. \(\varvec{a}_{\varvec{q}, p}= 1 \Rightarrow {\mathbb {S}}_1 \subseteq {\mathbb {L}}^p_1\) and \({\mathbb {S}}_0 \supseteq {\mathbb {L}}^p_0\).

Remark 9

  1. \(\forall \varvec{\alpha }, \quad \gamma ^{(t)}_{\varvec{\alpha }, -p} = \gamma ^{(t+1)}_{\varvec{\alpha }, -p} := \gamma _{\varvec{\alpha }, -p}\).

  2. \(\forall \varvec{\alpha }\in {\mathbb {L}}^p_0, \quad \gamma ^{(t+1)}_{\varvec{\alpha }} =\gamma ^{(t)}_{\varvec{\alpha }}\).

  3. \(\forall \varvec{\alpha }\in {\mathbb {L}}^p_1, \quad \gamma ^{(t+1)}_{\varvec{\alpha }} =\gamma ^{(t)}_{\varvec{\alpha }} - \beta _p^{(t)} +\beta _p^{(t+1)}\).

  4. \(\forall \varvec{\alpha }_1, \varvec{\alpha }_2 \in {\mathbb {L}}^p_1, \quad \gamma ^{(t)}_{\varvec{\alpha }_1} > \gamma ^{(t)}_{\varvec{\alpha }_2} \Rightarrow \gamma ^{(t+1)}_{\varvec{\alpha }_1} > \gamma ^{(t+1)}_{\varvec{\alpha }_2}\).

  5. \(\forall \varvec{\alpha }_1, \varvec{\alpha }_2 \in {\mathbb {L}}^p_0, \quad \gamma ^{(t)}_{\varvec{\alpha }_1} >\gamma ^{(t)}_{\varvec{\alpha }_2} \Rightarrow \gamma ^{(t+1)}_{\varvec{\alpha }_1} > \gamma ^{(t+1)}_{\varvec{\alpha }_2}\).

In the following lemma, we give a necessary and sufficient condition for (11).

Lemma 6

(Lower bound 1)

$$\begin{aligned} \min _{\varvec{\alpha }\succ \varvec{\alpha }_0} \gamma ^{(t+1)}_{\varvec{\alpha }} \ge \gamma ^{(t+1)}_{\varvec{\alpha }_0} = 0 \end{aligned}$$

holds if and only if

$$\begin{aligned} \beta ^{(t+1)}_p \ge \max _{\varvec{\alpha }\in {\mathbb {L}}^p_1}( -\gamma _{\varvec{\alpha }, -p}). \end{aligned}$$


Proof

Since \(\varvec{B}^{(t)}\in {\mathcal {B}}(\varvec{B})\), we have \(\min _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma _{\varvec{\alpha }}^{(t+1)} = \min _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma _{\varvec{\alpha }}^{(t)} \ge 0.\) So we only need to consider \(\varvec{\alpha }\in {\mathbb {L}}^p_1\), for which

$$\begin{aligned} \min _{\varvec{\alpha }\in {\mathbb {L}}^p_1}\gamma _{\varvec{\alpha }}^{(t+1)} = \min _{\varvec{\alpha }\in {\mathbb {L}}^p_1}(\gamma ^{(t)}_{\varvec{\alpha }, -p} + \beta ^{(t+1)}_p )\ge 0, \end{aligned}$$

which holds if and only if (14) holds. \(\square \)

We show the relationship between \(\gamma ^{(t+1)}_{\varvec{q}^{(t+1)}}\) and \(\gamma ^{(t)}_{\varvec{q}^{(t)}}\) in the following lemma.

Lemma 7

$$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} = \gamma _{\varvec{q}^{(t)}, -p} + \beta ^{(t+1)}_p. \end{aligned}$$


Proof

  • If \(\varvec{q}^{(t+1)} = \varvec{q}^{(t)}\), \(\gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} = \gamma ^{(t+1)}_{\varvec{q}^{(t)}} = \gamma _{\varvec{q}^{(t)}, -p} + \beta ^{(t+1)}_p.\)

  • If \(\varvec{q}^{(t+1)} \succ \varvec{q}^{(t)}\), then \(\delta _p^{(t)} = 0\) and \(\delta _p^{(t+1)} = 1\). Therefore, \(\varvec{q}^{(t)}, \varvec{q}^{(t+1)} \in {\mathbb {S}}^{(t)}_1\), so that

    $$\begin{aligned} \gamma ^{(t)}_{\varvec{q}^{(t+1)}} = \gamma ^{(t)}_{\varvec{q}^{(t)}} = \gamma _{\varvec{q}^{(t)}, -p}, \end{aligned}$$

    and hence

    $$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} = \gamma _{\varvec{q}^{(t+1)},-p}+\beta ^{(t+1)}_p = \gamma _{\varvec{q}^{(t)}, -p} + \beta ^{(t+1)}_p. \end{aligned}$$
  • If \(\varvec{q}^{(t+1)} \prec \varvec{q}^{(t)}\), then \(\delta _p^{(t)} = 1\), \(\beta _p^{(t+1)} = \delta _p^{(t+1)} = 0\), and \(\varvec{a}_{\varvec{q}^{(t+1)}} + \varvec{e}_p = \varvec{a}_{\varvec{q}^{(t)}}\). Then,

    $$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} =\gamma _{\varvec{q}^{(t+1)},-p} =\gamma _{\varvec{q}^{(t)},-p}= \gamma _{\varvec{q}^{(t)},-p} + \beta ^{(t+1)}_p . \end{aligned}$$

\(\square \)

Next, we give a necessary and sufficient condition for (13) in the following lemma.

Lemma 8

(Lower bound 2) Suppose \(\delta _p^{(t+1)} = 1\). Then

$$\begin{aligned} \gamma ^{(t+1)}_ {\varvec{q}^{(t+1)}} > \max _{\varvec{\alpha }\in {\mathbb {S}}^{(t+1)}_0} \gamma ^{(t+1)}_{\varvec{\alpha }}, \end{aligned}$$

if and only if,

$$\begin{aligned} \beta _p^{(t+1)} > \max _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma _{\varvec{\alpha }, -p} - \gamma _{\varvec{q}^{(t)}, -p}. \end{aligned}$$


Proof

Since \(\delta ^{(t+1)}_p = 1\), by Remark 8.2, we have \({\mathbb {S}}_0^{(t+1)} \supseteq {\mathbb {L}}_0^p\).

It is easy to see that if (12) holds at time \(t+1\), then (15) holds, because

$$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} = \gamma _{\varvec{q}^{(t)}, -p} + \beta ^{(t+1)}_p > \max _{\varvec{\alpha }\in {\mathbb {S}}_0^{(t+1)}}\gamma _{\varvec{\alpha }}^{(t+1)} \ge \max _{\varvec{\alpha }\in {\mathbb {L}}_0^{p}}\gamma _{\varvec{\alpha }}^{(t+1)} = \max _{\varvec{\alpha }\in {\mathbb {L}}_0^{p}}\gamma _{\varvec{\alpha }, -p}. \end{aligned}$$

Next we show that if (15) holds, then (13) holds at time \(t+1\).

Because (12) holds at time t, we have,

$$\begin{aligned} \gamma ^{(t)}_{\varvec{q}^{(t)}}> \max _{\varvec{\alpha }\in \mathbb S_0^{(t)}}\gamma ^{(t)}_{\varvec{\alpha }} \ge \max _{\varvec{\alpha }\in {\mathbb {L}}_1^p \bigcap {\mathbb {S}}_0^{(t)}}\gamma ^{(t)}_{\varvec{\alpha }}. \end{aligned}$$

Next, we check (13) in two different scenarios.

  • If \(\varvec{q}^{(t)} = \varvec{q}^{(t+1)}\), then \({\mathbb {S}}_0^{(t)} = {\mathbb {S}}_0^{(t+1)}\), and by (16) and Remark 9.4 we obtain

    $$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} = \gamma ^{(t+1)}_{\varvec{q}^{(t)}} > \max _{\varvec{\alpha }\in {\mathbb {L}}_1^p \bigcap \mathbb S_0^{(t+1)}}\gamma ^{(t+1)}_{\varvec{\alpha }}. \end{aligned}$$
  • If \(\varvec{q}^{(t)} \prec \varvec{q}^{(t+1)}\), then since \(\varvec{q}^{(t+1)} \in {\mathbb {S}}^{(t)}_1\), we have \(\gamma _{\varvec{q}^{(t+1)}}^{(t)} > \max _{\varvec{\alpha }\in {\mathbb {L}}_1^p \bigcap \mathbb S_0^{(t)}} \gamma ^{(t)}_{\varvec{\alpha }}\). By Remark 9.4, we have

    $$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} > \max _{\varvec{\alpha }\in {\mathbb {L}}_1^p \bigcap {\mathbb {S}}_0^{(t)}}\gamma ^{(t+1)}_{\varvec{\alpha }}.\end{aligned}$$

    On the other hand, since \(\varvec{\delta }^{(t+1)} = \varvec{\delta }^{(t)} + \varvec{e}_p\),

    $$\begin{aligned}\{\varvec{\alpha }| \varvec{\alpha }\in {\mathbb {S}}^{(t+1)}_0, \varvec{\alpha }\notin {\mathbb {S}}^{(t)}_0 \} = \{\varvec{\alpha }| \varvec{\alpha }\succeq \varvec{\delta }^{(t)}, \varvec{\alpha }\nsucceq \varvec{\delta }^{(t+1)} \} \subseteq {\mathbb {L}}^p_0 = ({\mathbb {L}}^p_1)^c,\end{aligned}$$

    leading to,

    $$\begin{aligned} {\mathbb {L}}_1^p \bigcap {\mathbb {S}}_0^{(t+1)} = {\mathbb {L}}_1^p \bigcap {\mathbb {S}}_0^{(t)}. \end{aligned}$$

\(\square \)

Proof of Proposition 1

Suppose \(\delta _p^{(t+1)} = 1\). By Lemmas 6 and 8, the monotonicity constraints hold at time \(t+1\) if

$$\begin{aligned} \beta ^{(t+1)}_p&> \max \left\{ \max _{\varvec{\alpha }\in {\mathbb {L}}^p_1}( -\gamma _{\varvec{\alpha }, -p}), \max _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma _{\varvec{\alpha }, -p} - \gamma _{\varvec{q}^{(t)}, -p}\right\} \\&:= \max (L_1, L_2) = L. \end{aligned}$$
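The truncation point \(L\) can be computed directly from the \(\gamma _{\varvec{\alpha }, -p}\) values. The following is a minimal sketch (the function and argument names are hypothetical, not from the paper; the \(\gamma _{\varvec{\alpha }, -p}\) values are assumed to be precomputed and passed in as a list):

```python
# Hedged sketch: compute the lower truncation point L = max(L1, L2) for
# sampling beta_p^{(t+1)} when delta_p^{(t+1)} = 1. All names are
# illustrative: gammas_minus_p[i] stores gamma_{alpha_i, -p}, mask_L1p[i]
# is True when alpha_i lies in L_1^p, and q_idx indexes q^{(t)}.
def lower_bound(gammas_minus_p, mask_L1p, q_idx):
    # L1 = max over L_1^p of -gamma_{alpha, -p}  (Lemma 6)
    L1 = max(-g for g, in_L1 in zip(gammas_minus_p, mask_L1p) if in_L1)
    # L2 = max over L_0^p of gamma_{alpha, -p} - gamma_{q^{(t)}, -p}  (Lemma 8)
    L2 = max(g for g, in_L1 in zip(gammas_minus_p, mask_L1p) if not in_L1) \
        - gammas_minus_p[q_idx]
    return max(L1, L2)
```

In a sampler, \(\beta _p^{(t+1)}\) would then be drawn from its full conditional truncated to \((L, \infty )\).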

In the following two lemmas, we discuss the flipping rules of \(\delta _p\).

Lemma 9

(Flipping rule 1) If \(\delta ^{(t)}_p = \delta ^{(t+1)}_p = 0\), the monotonicity constraints hold at time \(t+1\), and \(L = 0\).


Proof

The monotonicity constraints hold at time \(t+1\) because \(\varvec{B}^{(t)} = \varvec{B}^{(t+1)}\) and \(\varvec{B}^{(t)}\) satisfies the constraints.

  • \(L_1 = - \min _{\varvec{\alpha }\in {\mathbb {L}}^p_1} \gamma ^{(t)}_{\varvec{\alpha }} \le 0\) because (11) holds at t.

  • \(L_2 = \max _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma ^{(t)}_{\varvec{\alpha }} - \gamma ^{(t)}_{\varvec{q}^{(t)}} = 0\) because

    $$\begin{aligned} \gamma ^{(t)}_{\varvec{q}^{(t)}} = \min _{\varvec{\alpha }\in \mathbb S^{(t)}_1}\gamma ^{(t)}_{\varvec{\alpha }} \le \max _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma ^{(t)}_{\varvec{\alpha }} \le \max _{\varvec{\alpha }}\gamma ^{(t)}_{\varvec{\alpha }} = \gamma ^{(t)}_{\varvec{q}^{(t)}}, \end{aligned}$$

    since \({\mathbb {L}}^p_0 \bigcap {\mathbb {S}}^{(t)}_1\) is not empty.

Therefore, \(L = \max (L_1, L_2 ) = 0\). \(\square \)

Lemma 10

(Flipping rule 2) Suppose \(\delta ^{(t)}_p = 1, \delta ^{(t+1)}_p = 0\). The monotonicity constraints hold at time \(t+1\) if \(L \le 0\).


Proof

If \(\varvec{q}^{(t)} = \varvec{q}^{(t+1)}\), the statement follows directly from Lemmas 6 and 8. We check (11) and (13) for the case \(\varvec{q}^{(t)} \succ \varvec{q}^{(t+1)}\).

  • Since \(L_1 = - \min _{\varvec{\alpha }\in {\mathbb {L}}^p_1} \gamma ^{(t+1)}_{\varvec{\alpha }} \le 0\) and \(\min _{\varvec{\alpha }\in {\mathbb {L}}^p_0} \gamma ^{(t+1)}_{\varvec{\alpha }} = \min _{\varvec{\alpha }\in {\mathbb {L}}^p_0} \gamma ^{(t)}_{\varvec{\alpha }} \ge 0\), (11) holds at \(t+1\).

  • By Remark 9.4, for any \(\varvec{\alpha }\in {\mathbb {L}}_1^p \bigcap {\mathbb {S}}_0^{(t+1)}\),

    $$\begin{aligned} \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} = \gamma ^{(t+1)}_{\varvec{q}^{(t)}} > \gamma ^{(t+1)}_{\varvec{\alpha }}. \end{aligned}$$

    Together with \(L_2 = \max _{\varvec{\alpha }\in {\mathbb {L}}^p_0}\gamma ^{(t+1)}_{\varvec{\alpha }} - \gamma ^{(t+1)}_{\varvec{q}^{(t+1)}} \le 0\), (17) holds for any \(\varvec{\alpha }\in {\mathbb {L}}^p_0 \bigcup ({\mathbb {L}}_1^p \bigcap \mathbb S_0^{(t+1)})\). Further, as shown in the proof of Lemma 8, we have \(\{\varvec{\alpha }| \varvec{\alpha }\in {\mathbb {S}}^{(t+1)}_0, \varvec{\alpha }\notin {\mathbb {S}}^{(t)}_0 \} \subseteq {\mathbb {L}}^p_0\), such that \({\mathbb {S}}^{(t+1)}_0 \subseteq {\mathbb {L}}^p_0 \bigcup ({\mathbb {L}}_1^p \bigcap {\mathbb {S}}_0^{(t+1)})\).

\(\square \)
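Taken together, Lemmas 9 and 10 yield a simple admissibility rule for a proposed update of \(\delta _p\). The following hedged sketch illustrates the rule (the function name and interface are illustrative, not from the paper's sampler):

```python
# Illustrative decision rule implied by Lemmas 9 and 10 for a proposed
# update of delta_p; L is the truncation point from Proposition 1.
def delta_update_keeps_monotonicity(delta_p_old, delta_p_new, L):
    if delta_p_new == 1:
        # Activating (or keeping) delta_p = 1 is always feasible by
        # sampling beta_p^{(t+1)} from the truncated region beta > L.
        return True
    if delta_p_old == 0:
        # Lemma 9: 0 -> 0 leaves B unchanged, and L = 0.
        return True
    # Lemma 10: flipping 1 -> 0 is admissible only when L <= 0.
    return L <= 0
```

In the Gibbs sampler, this rule determines which \(\delta _p\) moves preserve the monotonicity constraints without rejection.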

Cite this article

Chen, Y., Culpepper, S., & Liang, F. (2020). A Sparse Latent Class Model for Cognitive Diagnosis. Psychometrika. https://doi.org/10.1007/s11336-019-09693-2


Keywords

  • sparse latent class models
  • Bayesian variable selection
  • identifiability