
1 Introduction

A very simple data structure is a triple \(\mathfrak {C}= \langle U,V,R \rangle \), sometimes called a formal context [6, 19], where R is a binary relation between elements of U and elements of V. From this, various data models can be obtained, one of the more popular ones being the concept lattice obtained from \(\mathfrak {C}\), introduced by Wille [19]. With each concept lattice a line diagram can be associated which depicts the lattice in a consolidated way. For lack of space we shall not describe this further; for details we invite the reader to consult, for example, [20] or [6].

As a context \(\mathfrak {C}\) grows large, the construction of the concept lattice becomes costly and it is difficult to interpret the structure and its associated line diagram. Therefore, various techniques have been proposed to simplify a formal context \(\mathfrak {C}\) or its associated concept lattice, such as stability indices [1, 11, 14, 15] which consider only part of the concept lattice, simplification using fuzzy K-Means clustering (FKM) [13] or object similarity [2], or selection of relevant concepts in the presence of noisy data [11]. All these techniques can be subsumed under one of the following strategies:

  1. Omit attributes (or objects), or

  2. Merge attributes (or objects) which are similar according to some criterion, or

  3. Remove concepts with low index values.

In each case, the adjacency matrix of R is changed. However, reducing the matrix does not guarantee that the associated concept lattice will be reduced as well, see Example 3 of [12]. In this paper we propose a simple algorithm to simplify a context which does not increase the size of its associated concept lattice.

2 Notation and Definitions

Throughout we suppose that \(U = \{p_1, \ldots , p_n\}\) is a finite set of objects (such as problems) and \(V = \{s_1, \ldots , s_k\}\) is a finite set of attributes (such as skills). \(R \subseteq U \times V\) is a binary relation between elements of U and elements of V. For each \(u \in U\) we set \(R(u) \overset{\mathrm {df}}{=}\{s \in V: uRs\}\), and \(\fancyscript{R}\overset{\mathrm {df}}{=}\{R(u): u \in U\}\). The identity relation on U is denoted by \(1'_U\). The relational converse of R is denoted by \(\breve{R}\), and \(-R\) is the complement of R in \(U \times V\). The set \(\fancyscript{R}\) is partially ordered by \(\subseteq \). The adjacency matrix of R has rows labeled by the elements of U and columns labeled by the elements of V. The entry in cell \(\langle p_i,s_j \rangle \) is 1 if and only if \(p_iRs_j\); otherwise, the cell is left empty. A formal context \(\langle U,V,R \rangle \) gives rise to several set operators frequently used in modal logics: Let \(X \subseteq U\) and define

$$\begin{aligned} \langle R \rangle (X)&\overset{\mathrm {df}}{=}\{s \in V: (\exists x \in X)\, xRs\}, \\ [ R ](X)&\overset{\mathrm {df}}{=}\{s \in V: (\forall x \in U)(xRs \Rightarrow x \in X)\}, \\ [[ R ]](X)&\overset{\mathrm {df}}{=}\{s \in V: (\forall x \in X)\, xRs\}. \end{aligned}$$

The mappings \(\langle R \rangle \) and \([[ R ]]\) are, respectively, the existential (disjunctive) and universal (conjunctive) extensions of the assignment \(x \mapsto R(x)\) to subsets of U, since it follows immediately from the definitions that for all \(x \in U, X \subseteq U\),

$$\begin{aligned} \langle R \rangle (\{x\})&= [[ R ]](\{x\}) = R(x), \end{aligned}$$
(1)
$$\begin{aligned} \langle R \rangle (X)&= \bigcup _{x \in X} R(x), [[ R ]](X) = \bigcap _{x \in X} R(x). \end{aligned}$$
(2)
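Equations (1) and (2) can be tried out on a small example. The following is a minimal sketch, assuming a context is stored as a Python dictionary that maps each object to its image set R(x); the function names `span` and `intent` and the toy data are illustrative assumptions, not part of the original text.

```python
def span(R, X):
    """Existential extension <R>(X): attributes related to at least one x in X."""
    return set().union(*(R[x] for x in X))


def intent(R, X, V):
    """Universal extension [[R]](X): attributes related to every x in X."""
    common = set(V)
    for x in X:
        common &= R[x]
    return common


# Hypothetical toy context: three problems and three skills.
V = {"s1", "s2", "s3"}
R = {"p1": {"s1", "s2"}, "p2": {"s2", "s3"}, "p3": {"s2"}}

union_of_images = span(R, {"p1", "p2"})         # R(p1) united with R(p2)
common_attributes = intent(R, {"p1", "p2"}, V)  # R(p1) intersected with R(p2)
```

On a singleton both functions return R(x) itself, as stated in (1).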

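The operators \([[ R ]]\) and \([ R ]\), as well as \(\langle R \rangle \), are related since, writing \(-\) for complementation in U, V, or \(U \times V\) as appropriate,

$$\begin{aligned} [[ R ]](X) = [ -R ](-X) = -\langle -R \rangle (X). \end{aligned}$$
(3)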

For unexplained notation and concepts in lattice theory we refer the reader to [8].

3 Data Models Based on Modal Operators

Suppose we have a formal context \(\mathfrak {C}= \langle U,V,R \rangle \) which we regard as “raw data”. The image sets R(x) are our basic constructs. As a first approach to a data model based on \(\langle U,V,R \rangle \), which, in our view, is a structural representation of raw data, we define a quasiorder \(\preceq \) on U by setting \(x \preceq y\) if and only if \(R(x) \subseteq R(y)\). We also define the incomparability relation by

$$\begin{aligned} x \# y \overset{\mathrm {df}}{\Longleftrightarrow }(x \not \preceq y) \text { and }(y \not \preceq x). \end{aligned}$$
(4)
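The quasiorder and the incomparability relation (4) translate directly into subset tests on the image sets. The sketch below assumes the same dictionary representation of a context (object mapped to R(x)); the helper names and the toy data are hypothetical.

```python
def precedes(R, x, y):
    """x precedes y in the quasiorder iff R(x) is a subset of R(y)."""
    return R[x] <= R[y]


def incomparable(R, x, y):
    """x # y iff neither x precedes y nor y precedes x, as in (4)."""
    return not precedes(R, x, y) and not precedes(R, y, x)


# Hypothetical toy context.
R = {"p1": {"s1"}, "p2": {"s1", "s2"}, "p3": {"s3"}}

p1_below_p2 = precedes(R, "p1", "p2")       # R(p1) is contained in R(p2)
p1_vs_p3 = incomparable(R, "p1", "p3")      # neither image set contains the other
```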

From this starting point, several more involved data models can be developed. Among the better known models are those based on the sufficiency operators \([[ R ]]\) (“intent”) and \([[ \breve{R} ]]\) (“extent”), where \(\breve{R}\) is the converse of R: For each \(X \subseteq U\), \([[ R ]](X)\) is the set of all attributes common to all elements of X, and for \(Y \subseteq V\), \([[ \breve{R} ]](Y)\) is the set of all objects which possess all attributes in Y. A pair \(\langle X,Y \rangle \) with \([[ R ]](X) = Y\) and \([[ \breve{R} ]](Y) = X\) is called a formal concept. The set of all formal concepts can be made into a lattice which can be drawn as a consolidated line diagram [19] as in Fig. 1. Each node of the diagram represents a formal concept, and for each object x, R(x) is the set of all attributes above the node labelled x (we interpret “above” and “below” as reflexive relations). In the line diagram of R, \(x \preceq y\) if and only if x and y label the same node or the node labelled by y is below the node labelled by x.
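For very small contexts the formal concepts can be listed by brute force: take the intent of every subset of objects and pair it with the extent of that intent. This is only an illustrative sketch, exponential in the number of objects and not an efficient lattice construction; the function names and the toy context are assumptions.

```python
from itertools import combinations


def intent(R, X, V):
    """Attributes common to all objects in X."""
    common = set(V)
    for x in X:
        common &= R[x]
    return frozenset(common)


def extent(R, Y):
    """Objects that possess all attributes in Y."""
    return frozenset(x for x in R if Y <= R[x])


def formal_concepts(R, V):
    """All pairs (extent, intent) obtained by closing each subset of objects."""
    objects = list(R)
    concepts = set()
    for size in range(len(objects) + 1):
        for X in combinations(objects, size):
            Y = intent(R, X, V)
            concepts.add((extent(R, Y), Y))
    return concepts


# Hypothetical toy context.
V = {"s1", "s2", "s3"}
R = {"p1": {"s1", "s2"}, "p2": {"s2", "s3"}, "p3": {"s2"}}
concepts = formal_concepts(R, V)
```

For this toy context the procedure yields four concepts, including the pair with extent {p1} and intent {s1, s2}.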

Fig. 1. A context and its line diagram

A data model which in some sense competes with concept lattices is that of the knowledge spaces introduced in [4]. These are set systems closed under union and can be related to the modal operator \(\langle R \rangle \), which is called the span operator in [3]. It was shown in [7] that the models arising from \([[ R ]]\) and \(\langle R \rangle \) have the same expressive power and are useful in situations different from those where conjunctive assignments such as the DINA model [9, 10, 16] and the rule space model [18] are employed.

Taking \(\{R(x): x \in U\}\) as a starting point, the set of spans and the set of intents go in different directions: It follows from (1) and (2) that \(\fancyscript{K}_R \overset{\mathrm {df}}{=}\{\langle R \rangle (X): X \subseteq U\}\) is the \(\cup \)-semilattice generated by \(\{R(x): x \in U\}\), and \(\fancyscript{I}_R \overset{\mathrm {df}}{=}\{[[ R ]](X): X \subseteq U\}\) is the \(\cap \)-semilattice generated by \(\{R(x): x \in U\}\). For \(X \subseteq U\), \([[ R ]](X)\) is the set of all attributes lying above all objects in X, and \(\langle R \rangle (\{x\})\) is the set of all attributes not upwards reachable from object x in the line diagram of \(-R\).
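The two semilattices can be generated by brute force for a small context, which makes the difference in size visible. The sketch below is an assumption-laden illustration: it enumerates all subsets X of U, including the empty one, so the empty set always appears among the spans and V always appears among the intents. The toy context, in which each object has exactly one private attribute, is hypothetical.

```python
from itertools import combinations


def subsets(items):
    """All subsets of the given collection, as tuples."""
    items = list(items)
    for size in range(len(items) + 1):
        yield from combinations(items, size)


def spans(R):
    """Union-semilattice generated by the image sets R(x)."""
    return {frozenset().union(*(R[x] for x in X)) for X in subsets(R)}


def intents(R, V):
    """Intersection-semilattice generated by the image sets R(x)."""
    return {frozenset(V).intersection(*(R[x] for x in X)) for X in subsets(R)}


# Hypothetical context in which no two objects are comparable.
V = {"v1", "v2", "v3"}
R = {"x1": {"v1"}, "x2": {"v2"}, "x3": {"v3"}}

K_R = spans(R)       # every subset of V arises as a union
I_R = intents(R, V)  # only V, the three singletons, and the empty set arise
```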

4 Reducing the Complexity

The simplest way to change the adjacency matrix is to change one bit at a time, according to a given criterion. The question arises which criterion we shall use. If \(\preceq \) is a linear quasiorder, i.e. if any two objects of U are comparable, then \(\fancyscript{K}_R\) and \(\fancyscript{I}_R\) coincide and are equal to \(\langle \fancyscript{R}, \subseteq \rangle \) (possibly with added \(\emptyset \) or V); nothing is gained by going from the simple model \(\langle U, \preceq \rangle \) to one of the more involved ones. At the other extreme, if no two different elements of U are comparable with respect to \(\preceq \), i.e. if \(\# = U^2 \setminus 1'_U\), then the representations obtained from \(\mathfrak {C}\) depend very strongly on the modal operator used and may differ widely. Consider the simple relation depicted in Fig. 2. There, \(\fancyscript{I}_R\) consists of the singletons \(\{v_i\}\) and the empty set, while \(\fancyscript{K}_R\) is the set of all nonempty subsets of V. If we consider the complement \(-R\), then the situation is reversed, see Fig. 3.

Fig. 2. \(\# = U^2 \setminus 1'_U\), 1st example

Fig. 3. \(\# = U^2 \setminus 1'_U\), 2nd example

Therefore, if the incomparability relation is large, choosing one operator over the other may not provide a meaningful interpretation, and it may not be the wisest choice at the outset to prefer one over the other. Keeping in mind the problem/skill situation, we suggest the relative incomparability of objects as a measure of context complexity which we aim to reduce: If \(\mathfrak {C}= \langle U,V,R \rangle \) is a formal context and \(u \in U\), then we let

$$\begin{aligned} \mathtt {incomp}(u) \overset{\mathrm {df}}{=}\{v \in U: u \# v\}, \quad \mathtt {incomp}(\mathfrak {C}) \overset{\mathrm {df}}{=}\frac{|\{\langle u,v \rangle : u\# v\} |}{n^2 - n}, \end{aligned}$$

where \(n = |U |\). Now, \(\mathtt {incomp}(\mathfrak {C}) = 0\) if and only if \(\preceq \) is a linear quasiorder, and \(\mathtt {incomp}(\mathfrak {C}) = 1\) if no two different elements are \(\preceq \)-comparable. The measure of success is the reduction of \(\mathtt {incomp}(\mathfrak {C})\) relative to the number of bit changes.
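The measure can be computed directly from the definition by counting ordered pairs of distinct incomparable objects. The following minimal sketch again assumes a dictionary from objects to image sets; the function names and the three toy contexts are hypothetical and chosen to hit the two extreme values and one value in between.

```python
def incomparable_pairs(R):
    """Ordered pairs (u, v) of distinct objects with u # v."""
    objects = list(R)
    return [(u, v) for u in objects for v in objects
            if u != v and not R[u] <= R[v] and not R[v] <= R[u]]


def incomp(R):
    """Relative incomparability: incomparable ordered pairs over n^2 - n."""
    n = len(R)
    if n < 2:
        return 0.0  # assumption: a context with fewer than two objects has no pairs
    return len(incomparable_pairs(R)) / (n * n - n)


# Hypothetical contexts.
chain = {"a": {"s1"}, "b": {"s1", "s2"}, "c": {"s1", "s2", "s3"}}
antichain = {"a": {"s1"}, "b": {"s2"}, "c": {"s3"}}
mixed = {"a": {"s1"}, "b": {"s1", "s2"}, "c": {"s3"}}

values = (incomp(chain), incomp(antichain), incomp(mixed))
```

In the mixed context only the pairs involving c are incomparable, so 4 of the 6 ordered pairs count and the value is 2/3.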

Our InComparability Reduction Analysis algorithm (ICRA) is based on a simple steepest descent method: We consider objects u for which \(|\mathtt {incomp}(u) |\) is maximal and then invert a bit, i.e. an entry in the adjacency matrix of the relation under consideration, for which the drop in the overall number of incomparable pairs is maximal. This will increase the comparability of objects with respect to \(\preceq \) or, equivalently, of sets R(x) without increasing the number of intents, respectively, knowledge states. Indeed, in most cases we have looked at, the complexity of the concept lattice was significantly reduced. If one bit is inverted, so that the resulting relation is \(R'\) and \(x \preceq _{R'} y\), then there will be a path from y to x in the line diagram of \(R'\) as well, so that the new representation is closer to the data as represented by R.

Fig. 4. Pseudocode of the algorithm

The basic concept is that we assume some of the data to be faulty, but we do not know which entries. More concretely, we assume that some (or all) incomparabilities are caused by faulty data. In this sense, our proposed procedure is a trade-off measure.

The stop criterion is a predetermined relative value of incomparable pairs, i.e. a value for \(\mathtt {incomp}(\mathfrak {C})\), where \(\mathfrak {C}\) is the current context, or the point at which no more complexity reduction is possible. As a rule of thumb we suggest requiring that 50 % of pairs with different components be comparable (Median InComparability Reduction Analysis). The pseudocode of the ICRA algorithm is shown in Fig. 4.
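The greedy step and the stop criterion described in the text can be sketched as follows. This is not the pseudocode of Fig. 4, which is not reproduced here, but one possible reading of the prose: among the objects with the most incomparable partners, flip the single bit that lowers the number of incomparable pairs the most, and stop once the target is reached or no flip helps. The function names, the tie-breaking (first candidate found), and the toy context are assumptions.

```python
def count_incomparable(R):
    """Number of ordered pairs (u, v), u != v, with u # v."""
    return sum(1 for u in R for v in R
               if u != v and not R[u] <= R[v] and not R[v] <= R[u])


def icra_sketch(R, V, target=0.5):
    """Greedy bit flipping until relative incomparability is at most target."""
    R = {u: set(attrs) for u, attrs in R.items()}  # work on a copy
    total = len(R) * len(R) - len(R)
    flips = []
    while total and count_incomparable(R) / total > target:
        current = count_incomparable(R)
        # Objects with the largest number of incomparable partners.
        degree = {u: sum(1 for v in R if v != u
                         and not R[u] <= R[v] and not R[v] <= R[u])
                  for u in R}
        top = max(degree.values())
        best = None
        for u in (u for u in R if degree[u] == top):
            for s in sorted(V):
                R[u] ^= {s}                      # tentatively invert one bit
                after = count_incomparable(R)
                R[u] ^= {s}                      # undo the tentative flip
                if after < current and (best is None or after < best[0]):
                    best = (after, u, s)
        if best is None:
            break                                # no single flip reduces incomparability
        _, u, s = best
        R[u] ^= {s}
        flips.append((u, s))
    return R, flips


# Hypothetical context in which no two objects are comparable.
V = {"v1", "v2", "v3"}
R = {"x1": {"v1"}, "x2": {"v2"}, "x3": {"v3"}}
reduced, flips = icra_sketch(R, V, target=0.5)
```

For this toy context a single flip suffices: emptying one row makes that object comparable to both others, leaving 2 of 6 ordered pairs incomparable.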

5 Experiments

Even though our procedure is simple, it compares well with other simplification measures. As a case in point we shall consider the reduction using fuzzy K-Means clustering (FKM) proposed in [13]. This method is based on partitioning a set of vectors into k fuzzy clusters, specifying to what degree a vector belongs to the cluster centre. Owing to lack of space we cannot explain their method in detail and refer the reader to [13]. The context \(\mathfrak {C}\) of their first example relates documents with keywords and is shown in Fig. 5 along with its concept lattice. The relative incomparability of \(\mathfrak {C}\) is 94 %.

Fig. 5. Example from [13], p. 2699

After applying FKM based clustering with \(k = 2\), the columns D1–D4 are identified and the entry \(\langle T_i, D1\text{--}D4 \rangle \) of the resulting adjacency matrix is \(\max \{\langle T_i, D1 \rangle , \ldots , \langle T_i, D4 \rangle \}\). The simplified context \(\mathfrak {C}_1\) and its concept lattice are shown in Fig. 6.
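Identifying columns with the max rule amounts to a logical OR over the merged columns of a 0/1 matrix. The sketch below shows only this merging step, not the FKM clustering that decides which columns to merge; the function name and the data are hypothetical and are not taken from [13].

```python
def merge_attributes(R, group, merged_name):
    """Replace the attributes in group by one attribute, present if any of them was."""
    group = set(group)
    merged = {}
    for obj, attrs in R.items():
        kept = set(attrs) - group
        if set(attrs) & group:        # max over 0/1 entries is 1 iff some entry is 1
            kept.add(merged_name)
        merged[obj] = kept
    return merged


# Hypothetical rows T1..T3 over columns D1..D6.
R = {"T1": {"D1", "D5"}, "T2": {"D3"}, "T3": {"D6"}}
R1 = merge_attributes(R, ["D1", "D2", "D3", "D4"], "D1-D4")
```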

Fig. 6. Example from [13], p. 2699, reduced

Obtaining the FKM result \(\mathfrak {C}_1\) from \(\mathfrak {C}\) requires changing 15 bits for a relative incomparability of 49 %; this includes the effort to identify columns. In comparison, our algorithm needs to change only 4 bits for a 50 % incomparability, and 9 bits for 0 % incomparability. The resulting context along with its line diagram is shown in Fig. 7. It has the same number of concepts as the concept lattice obtained from FKM (9), and the same number of edges (14).

Fig. 7. Reduction of Example 1 from [13] using ICRA

In classification tasks, there is often a trade-off between the (relative) number of correctly classified objects and, for example, the (relative) cost of obtaining the classification or the clarity of a pictorial representation. In some instances, this may be expressed as the amount of errors we are prepared to allow to achieve another aim. A case in point are curves based on receiver operating characteristics (ROC), where the sensitivity (benefit) of a binary classifier is plotted as a function of its false positive (FP) rate (cost), see [5] for an overview. We can plot the relative incomparability as a function of the number of bits changed to achieve it, see the graph in Fig. 8. If we interpret (in-)comparability as sensitivity and the number of changed bits as cost to retrieve the original data, this can be interpreted as an ROC curve.
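The points of such a curve can be collected by recording the relative incomparability after each bit change. The following sketch assumes that a sequence of flips (object, attribute) is already given, for instance by a greedy reduction; the function names, the flip sequence, and the toy context are hypothetical.

```python
def incomp(R):
    """Relative incomparability of a context given as object -> image set."""
    objects = list(R)
    n = len(objects)
    if n < 2:
        return 0.0
    pairs = sum(1 for u in objects for v in objects
                if u != v and not R[u] <= R[v] and not R[v] <= R[u])
    return pairs / (n * n - n)


def reduction_curve(R, flips):
    """Pairs (number of bits changed, relative incomparability) along the flips."""
    current = {u: set(attrs) for u, attrs in R.items()}  # leave the input untouched
    curve = [(0, incomp(current))]
    for k, (u, s) in enumerate(flips, start=1):
        current[u] ^= {s}                                # invert one bit
        curve.append((k, incomp(current)))
    return curve


# Hypothetical context and flip sequence.
R = {"x1": {"v1"}, "x2": {"v2"}, "x3": {"v3"}}
curve = reduction_curve(R, [("x1", "v1"), ("x2", "v2")])
```

The resulting list can be passed to any plotting tool, with the number of changed bits on the horizontal axis.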

Fig. 8. Reducing relative incomparability with ICRA

Fig. 9. Reducing relative incomparability of the bacterial dataset with ICRA

The next example from [13] investigates a dataset consisting of various species of bacteria and 16 phenotypic characters, shown in Table 1.

Table 1. Bacterial dataset from [13]

For this context \(\mathfrak {C}\), the incomparability \(\mathtt {incomp}(\mathfrak {C})\) turns out to be \(81\,\%\). \(\mathfrak {C}\) is reduced with the FKM method for \(k = 5\) and \(k = 9\), resulting in contexts \(\mathfrak {C}_5\) and \(\mathfrak {C}_9\) with \(\mathtt {incomp}(\mathfrak {C}_5) = 34.5\,\%\) and \(\mathtt {incomp}(\mathfrak {C}_9) = 64.7\,\%\). Reducing \(\mathfrak {C}\) to \(\mathfrak {C}_5\) requires changing 40 bits, and the reduction to \(\mathfrak {C}_9\) with 64.7 % incomparability requires changing 11 bits. In contrast, our algorithm requires changing 19 bits to achieve an incomparability reduction to 34.6 %, and 8 bits for a reduction to 66.1 %. Changing 11 bits (as in the FKM reduction with \(k = 9\)) results in a reduction to 60.2 %. The ICRA reducibility graph is shown in Fig. 9.

6 Conclusion and Outlook

We have introduced a simple algorithm, ICRA, to simplify a formal context, the success criterion of which is a prescribed reduction of incomparable pairs. As a rule of thumb, we propose a relative frequency of incomparable pairs of objects of 50 %. This seems a fair compromise between closeness to the data on the one hand, and the additional structure introduced by the chosen model on the other. We have compared our algorithm with FKM on several examples of [13] and have found that it needs fewer bit changes than FKM to obtain similar incomparability ratios. Furthermore, the FKM algorithm requires much more effort and additional model assumptions, so that its cost/benefit ratio is much less favourable than that of the median comparability algorithm. Finally, it is not clear which k should be used for the reduction.

In the available space, only an indication of the impact of the median comparability algorithm could be given. Further work will include investigation of the powers and limitations of the ICRA algorithm using both theoretical and practical analysis. In particular, we shall consider its effects on implication sets and association rules.