# Scalability of Data Decomposition Based Algorithms: Attribute Reduction Problem

## Abstract

This paper studies the scalability of data decomposition based algorithms intended for attribute reduction. It investigates two approaches that decompose a decision table and use the relative discernibility matrix method to compute all reducts. The experimental results reported in this paper show that applying these approaches yields better scalability than the standard algorithm based on the relative discernibility matrix method.

## Keywords

Rough sets, Attribute reduction, Data decomposition, Scalability

## 1 Introduction

Attribute reduction is a challenging task in areas such as data mining and pattern recognition. Rough set theory [9], a mathematical tool for dealing with inconsistent data, has commonly been used to investigate this issue from both theoretical and practical viewpoints (e.g. [6, 10, 11]). Much research has been devoted to finding a reduct, especially a minimal one. Although one reduct is sufficient to reduce the attribute set, finding all reducts is still justified, since a deeper analysis of the data can be conducted when all reducts are known.

The method proposed in [10] for finding all reducts is based on a discernibility matrix and its alternative representation in the form of a discernibility function. The idea of a relative discernibility matrix/function for attribute reduction in decision tables has been intensively studied by many researchers (e.g. [2, 5, 7]).

The main problem to face when developing a method for finding reducts is the computational complexity of the attribute reduction task. Finding all reducts has been proven to be an NP-hard problem [10]. Much effort has therefore been made to accelerate the attribute reduction process (e.g. [1, 12, 14]).

Another way of making attribute reduction methods more efficient for large databases is to divide the attribute reduction problem into subproblems, which can be done by applying data decomposition based attribute reduction approaches. Such a solution makes it possible to considerably decrease the space complexity. This reduction is essential because approaches for computing all reducts are mainly based on the discernibility matrix method, whose complexity is quadratic with respect to the data size (the matrix has a cell for every pair of objects).

A few studies address the use of data decomposition for finding all reducts of a decision table. In [3], the discernibility matrix of a decision table is divided into submatrices, and the reduct set is computed from the reduct sets obtained for the submatrices. In [4], a general data decomposition based approach for computing all reducts of an information system and of a decision table is proposed. It can be treated as a generalization of the approach from [3]. A data decomposition based method proposed in [13] uses core attributes to generate all minimal reducts.

All the above approaches were verified theoretically; however, no experimental research has been reported yet. A practical verification is needed to evaluate an important property of a data decomposition based approach, i.e. its scalability compared with the approach that operates on the whole data.

The paper’s contribution is to define the notion of scalability in the context of data decomposition based algorithms. The paper also presents experimental research that shows the scalability of data decomposition based algorithms for attribute reduction. The algorithms use the relative discernibility matrix method and are constructed based on the approach introduced in [4] and on its modification proposed in this paper.

Section 2 restates basic notions related to attribute reduction in rough set theory. Section 3 introduces a data decomposition based approach for attribute reduction and proposes its modification using dual reducts. The problem of scalability of data decomposition based algorithms is investigated in Sect. 4 from the theoretical and practical viewpoints. Section 5 provides concluding remarks.

## 2 Basic Notions

This section restates basic definitions from rough set theory related to attribute reduction.

### **Definition 1**

[9] (decision table) A decision table is a pair \(DT=\left( U,A\cup \{d\}\right) \), where *U* is a non-empty finite set of objects, called the universe, *A* is a non-empty finite set of condition attributes, and \(d\not \in A\) is the decision attribute.

Each attribute \(a\in A\cup \{d\}\) is treated as a function \(a:U\rightarrow V_a\), where \(V_a\) is the value set of *a*.

For a decision table, the relative indiscernibility relation and the relative reduct of the attribute set are defined as follows.

### **Definition 2**

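(relative indiscernibility relation) The relative indiscernibility relation *IND*(*B*, *d*) generated by \(B\subseteq A\) on *U* is defined by

\(IND(B,d)=\{(x,y)\in U\times U:(\forall _{a\in B}\,a(x)=a(y))\vee d(x)=d(y)\}\).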

### **Definition 3**

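(relative reduct) A subset \(B\subseteq A\) is a relative reduct of *A* on *U* if and only if

- 1.
\(IND(B,d)=IND(A,d)\),

- 2.
\(\forall _{\emptyset \ne C\subset B}IND(C,d)\ne IND(B,d)\).

The set of all relative reducts of *A* on *U* is denoted by *RED*(*A*, *d*).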
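To make the relative indiscernibility relation and the relative reduct concrete, the following Python sketch checks both definitions by brute force on a small hypothetical decision table with objects x1 to x4, condition attributes a, b, c and a decision. The data and the function names `ind` and `is_relative_reduct` are illustrative assumptions, not taken from the paper. Because the sketch enumerates every attribute subset, it is suitable only for tiny tables.

```python
from itertools import combinations

# Hypothetical toy decision table: object -> (condition attribute values, decision).
U = {
    "x1": ({"a": 0, "b": 0, "c": 1}, 0),
    "x2": ({"a": 0, "b": 1, "c": 0}, 1),
    "x3": ({"a": 1, "b": 0, "c": 0}, 1),
    "x4": ({"a": 1, "b": 1, "c": 1}, 0),
}
A = ("a", "b", "c")


def ind(B):
    """IND(B, d): pairs (x, y) that agree on all attributes of B or have the same decision."""
    return {
        (x, y)
        for x, (vx, dx) in U.items()
        for y, (vy, dy) in U.items()
        if all(vx[a] == vy[a] for a in B) or dx == dy
    }


def is_relative_reduct(B):
    """IND(B, d) = IND(A, d) and no non-empty proper subset C of B has IND(C, d) = IND(B, d)."""
    if ind(B) != ind(A):
        return False
    return all(ind(C) != ind(B) for k in range(1, len(B)) for C in combinations(B, k))


# Brute-force RED(A, d): test every non-empty subset of A.
reducts = [B for k in range(1, len(A) + 1) for B in combinations(A, k) if is_relative_reduct(B)]
print(reducts)  # [('c',), ('a', 'b')]
```

The set {a, c}, for example, preserves IND(A, d) but is not a reduct, because its subset {c} already does.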

The relative reduct set of a decision table can be computed using a relative discernibility function.

### **Definition 4**

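(relative discernibility function) The relative discernibility function \(f_{DT}\) of a decision table *DT* is a Boolean function of *k* Boolean variables \(a_1^{*},\dots ,a_k^{*}\) that correspond, respectively, to attributes \(a_1,\dots ,a_k\in A\) and is defined by

\(f_{DT}(a_1^{*},\dots ,a_k^{*})=\mathop \bigwedge \limits _{c^d_{x,y}\ne \emptyset }\mathop \bigvee \limits _{a\in c^d_{x,y}}a^{*}\),

where \((c^d_{x,y})_{x,y\in U}\) is the relative discernibility matrix of *DT* such that

\(\forall _{x,y\in U}\ c^d_{x,y}=\{a\in A:a(x)\ne a(y), d(x)\ne d(y)\}\).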

A prime implicant^{1} \(a^{*}_{i_1}\wedge \dots \wedge a^{*}_{i_k}\) of \(f_{DT}\) is equivalent to a relative reduct \(\{a_{i_1},\dots ,a_{i_k}\}\) of *DT*. For details, see e.g. [4, 10].
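The relative discernibility matrix and function can be illustrated with a short Python sketch on a hypothetical four-object table. The function \(f_{DT}\) contains no negated variables, so it is monotone. A conjunction of variables is therefore an implicant exactly when setting only those variables to true satisfies \(f_{DT}\). Enumerating attribute subsets by increasing size and keeping those that contain no smaller implicant yields the prime implicants, i.e. the reducts. The names `discernibility_cells`, `f_dt` and `prime_implicants` are illustrative, and the exhaustive search suits only very small attribute sets.

```python
from itertools import combinations

ATTRS = ("a", "b", "c")
# Hypothetical toy decision table: (condition attribute values, decision) per object.
OBJECTS = [
    ({"a": 0, "b": 0, "c": 1}, 0),
    ({"a": 0, "b": 1, "c": 0}, 1),
    ({"a": 1, "b": 0, "c": 0}, 1),
    ({"a": 1, "b": 1, "c": 1}, 0),
]


def discernibility_cells(objects):
    """Distinct non-empty cells c^d_{x,y} = {a : a(x) != a(y)} for pairs with d(x) != d(y)."""
    cells = set()
    for (vx, dx), (vy, dy) in combinations(objects, 2):  # the matrix is symmetric
        if dx != dy:
            cell = frozenset(a for a in ATTRS if vx[a] != vy[a])
            if cell:
                cells.add(cell)
    return cells


def f_dt(valuation, cells):
    """Evaluate f_DT: the conjunction over cells of the disjunction of their variables."""
    return all(any(valuation[a] for a in cell) for cell in cells)


def prime_implicants(cells):
    """Inclusion-minimal attribute sets whose conjunction implies the monotone f_DT."""
    implicants = []
    for k in range(1, len(ATTRS) + 1):
        for B in combinations(ATTRS, k):
            # Monotonicity: B is an implicant iff the valuation "exactly B true" satisfies f_DT.
            if f_dt({a: a in B for a in ATTRS}, cells) and not any(
                set(P) <= set(B) for P in implicants
            ):
                implicants.append(B)
    return implicants


cells = discernibility_cells(OBJECTS)
print(sorted(sorted(c) for c in cells))  # [['a', 'c'], ['b', 'c']]
print(prime_implicants(cells))           # [('c',), ('a', 'b')]
```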

## 3 Decomposition of Decision Table

This section introduces two data decomposition based approaches for attribute reduction.

Let \(DT=(U,A\cup \{d\})\) be a decision table.

### 3.1 Reduct Based Approach

In this approach, the partial results are the reduct sets of subtables of the decision table.

The relative indiscernibility relation and the relative reduct of the attribute set on a subset of the universe are defined as follows.

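### **Definition 5**

(relative indiscernibility relation on a universe subset) The relative indiscernibility relation \(IND_X(B,d)\) generated by \(B\subseteq A\) on \(X\subseteq U\) is defined by

\(IND_X(B,d)=IND(B,d)\cap (X\times X)\).

### **Definition 6**

(relative reduct on a universe subset) A subset \(B\subseteq A\) is a relative reduct of *A* on \(X\subseteq U\) if and only if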

- 1.
\(IND_X(B,d)=IND_X(A,d)\),

- 2.
\(\forall _{\emptyset \ne C\subset B}IND_X(C,d)\ne IND_X(B,d)\).

The set of all relative reducts of *A* on \(X\subseteq U\) is denoted by \(RED_X(A,d)\).

To decompose a decision table (see Fig. 1), each of its decision classes (i.e., each set \(X_v=\{x\in U:d(x)=v\}\), where \(v\in V_d\)) is divided into subsets (middle subtables). Then each pair of subsets from different classes is merged into one set (final subtables). To compute the relative reduct set of the decision table, the subreduct sets (i.e. the reduct sets of all the final subtables) are joined using the following operation.

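### **Definition 7**

(operation \(\,\dot{\cup }\,\)) Let \(\mathcal {S},\mathcal {S}^\prime ,\mathcal {S}_1,\dots ,\mathcal {S}_k\) be families of sets. The operation \(\,\dot{\cup }\,\) is defined as follows.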

- 1.
\(\mathcal {S}\,\dot{\cup }\,\emptyset =\emptyset \,\dot{\cup }\,\mathcal {S}=\mathcal {S}\);

- 2.
\(\mathcal {S}\,\dot{\cup }\,\mathcal {S}^\prime =\{S\cup S^\prime : S\in \mathcal {S},S^\prime \in \mathcal {S}^\prime \}\);

- 3.
\({\dot{\bigcup }}_{i=1}^k \mathcal {S}_i=\mathcal {S}_1\,\dot{\cup }\,\mathcal {S}_2\,\dot{\cup }\,\cdots \,\dot{\cup }\,\mathcal {S}_k\), where \(k>1\).
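The \(\,\dot{\cup }\,\) operation translates directly into code. In this minimal Python sketch, a family of sets is assumed to be a Python `set` of `frozenset`s, and the function names `dot_union` and `dot_union_all` are hypothetical. The example joins two families of one-element sets. The result already contains supersets of other members (e.g. {a, c} contains {c}), which is why the joined family is minimized afterwards.

```python
from functools import reduce

# A family of sets is a Python set of frozensets; the empty family is set().


def dot_union(S, T):
    """S ∪̇ T: the empty family is neutral, otherwise all pairwise unions s ∪ t."""
    if not S:
        return set(T)
    if not T:
        return set(S)
    return {s | t for s in S for t in T}


def dot_union_all(families):
    """∪̇ over S_1, ..., S_k, folded from left to right."""
    return reduce(dot_union, families, set())


S1 = {frozenset({"b"}), frozenset({"c"})}
S2 = {frozenset({"a"}), frozenset({"c"})}
print(sorted(sorted(s) for s in dot_union_all([S1, S2])))
# [['a', 'b'], ['a', 'c'], ['b', 'c'], ['c']]
```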

The family of attribute subsets created by the above operation includes, in general, not only reducts but also supersets of them. To remove unnecessary sets, the following operation is used. Let \(min(\mathcal {S})\) be the set of minimal elements of a family \(\mathcal {S}\) of sets partially ordered by the relation \(\subseteq \).
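A direct, quadratic-time rendering of \(min(\mathcal {S})\) keeps the members that have no proper subset in the family. The Python sketch below assumes families are represented as sets of `frozenset`s. The name `minimal` is illustrative. The input is the family obtained by joining {{b}, {c}} with {{a}, {c}}.

```python
def minimal(S):
    """min(S): the members of S that have no proper subset in S (w.r.t. ⊆)."""
    return {s for s in S if not any(t < s for t in S)}


# Family obtained by joining {{b}, {c}} and {{a}, {c}} with the ∪̇ operation.
joined = {frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"b", "c"}), frozenset({"c"})}
print(sorted(sorted(s) for s in minimal(joined)))  # [['a', 'b'], ['c']]
```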

### **Theorem 1**

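Let \(V_d=\{v_1,\dots ,v_k\}\) be the set of decision values of *DT*. Let \(\mathcal {X}_{v_i}\) be a covering of \(X_{v_i}\) \((1\le i\le k)\). The following holds

\(RED(A,d)=min\Big({\dot{\bigcup }}_{1\le i<j\le k}\,{\dot{\bigcup }}_{X\in \mathcal {X}_{v_i},\,Y\in \mathcal {X}_{v_j}}RED_{X\cup Y}(A,d)\Big)\).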
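The reduct based approach can be sketched in Python as follows. Everything here is an assumption made for illustration: the toy table, the round-robin split of each decision class into at most *p* portions (a partition, which is a special case of a covering), and the brute-force embedded algorithm `reducts`. That algorithm returns the minimal attribute sets that meet every non-empty relative discernibility cell of a subtable. Comparing the result for several values of *p* with the reduct set of the whole table illustrates the theorem on this example.

```python
from functools import reduce
from itertools import combinations

ATTRS = ("a", "b", "c")
# Hypothetical toy decision table: object -> (condition attribute values, decision).
TABLE = {
    "x1": ({"a": 0, "b": 0, "c": 1}, 0),
    "x2": ({"a": 0, "b": 1, "c": 0}, 1),
    "x3": ({"a": 1, "b": 0, "c": 0}, 1),
    "x4": ({"a": 1, "b": 1, "c": 1}, 0),
}


def minimal(S):
    return {s for s in S if not any(t < s for t in S)}


def dot_union(S, T):
    if not S:
        return set(T)
    if not T:
        return set(S)
    return {s | t for s in S for t in T}


def reducts(ids):
    """Brute-force embedded algorithm: RED_X(A, d) for the subtable on objects `ids`."""
    cells = [
        frozenset(a for a in ATTRS if TABLE[x][0][a] != TABLE[y][0][a])
        for x, y in combinations(ids, 2)
        if TABLE[x][1] != TABLE[y][1]
    ]
    cells = [c for c in cells if c]
    hitting = {
        frozenset(B)
        for k in range(len(ATTRS) + 1)
        for B in combinations(ATTRS, k)
        if all(c & set(B) for c in cells)
    }
    return minimal(hitting)


def split(ids, p):
    """Divide one decision class into at most p non-empty portions (round-robin)."""
    return [ids[i::p] for i in range(p) if ids[i::p]]


def reduct_based(p):
    """Join the reduct sets of all final subtables X ∪ Y with ∪̇, then take min."""
    classes = {}
    for x, (_, d) in TABLE.items():
        classes.setdefault(d, []).append(x)
    coverings = [split(sorted(ids), p) for ids in classes.values()]
    partial = [
        reducts(X + Y)
        for Xs, Ys in combinations(coverings, 2)  # pairs of different classes
        for X in Xs
        for Y in Ys
    ]
    return minimal(reduce(dot_union, partial, set()))


for p in (1, 2):
    print(p, sorted(sorted(r) for r in reduct_based(p)))
# 1 [['a', 'b'], ['c']]
# 2 [['a', 'b'], ['c']]
```

With *p* = 2, each final subtable has only two objects, and its reduct set ({{b}, {c}} or {{a}, {c}}) is larger than the final reduct set. This matches the observation below that subreduct sets can be big.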

### 3.2 Dual Reduct Based Approach

This subsection proposes an approach that uses dual reducts of subtables of the decision table.

### **Proposition 1**

\(RED(A,d)=min\Big({\dot{\bigcup }}_{c^d_{x,y}\ne \emptyset }\{\{a\}:a\in c^d_{x,y}\}\Big)\).

### *Proof*

(sketch) We have \(RED(A,d)=PI(\mathop \bigwedge \limits _{c^d_{x,y}\ne \emptyset }\mathop \bigvee \limits _{a\in c^d_{x,y}}a^{*})\), where *PI*(*p*) is the set of all prime implicants of a Boolean expression *p*. We obtain \({\dot{\bigcup }}_{c^d_{x,y}\ne \emptyset } \{\{a\}:a\in c^d_{x,y}\}=\mathop \bigwedge \limits _{c^d_{x,y}\ne \emptyset }\mathop \bigvee \limits _{a\in c^d_{x,y}}a^{*}\) and \(PI(f_{DT}(a^*_1,\dots ,a^*_k))\Leftrightarrow min(\mathcal {S}_{f_{DT}})\), where \(\mathcal {S}_{f_{DT}}\) is the family of sets corresponding to \(f_{DT}\) (for details, see [4]).
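Proposition 1 computes all reducts with set operations only. Each non-empty cell becomes the family of its singletons, the families are joined with \(\,\dot{\cup }\,\), and the result is minimized. In the Python sketch below, families of sets are assumed to be Python sets of frozensets, and the list of cells is hypothetical.

```python
from functools import reduce


def dot_union(S, T):
    """S ∪̇ T: the empty family is neutral, otherwise all pairwise unions."""
    if not S:
        return set(T)
    if not T:
        return set(S)
    return {s | t for s in S for t in T}


def minimal(S):
    """min(S) with respect to set inclusion."""
    return {s for s in S if not any(t < s for t in S)}


# Hypothetical non-empty relative discernibility cells of a small decision table.
cells = [{"b", "c"}, {"a", "c"}, {"a", "c"}, {"b", "c"}]

# Each cell c contributes the family {{a} : a in c}; join all of them, then minimize.
families = [{frozenset({a}) for a in c} for c in cells]
red = minimal(reduce(dot_union, families, set()))
print(sorted(sorted(r) for r in red))  # [['a', 'b'], ['c']]
```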

### **Definition 8**

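(dual reduct set) The dual reduct set of *RED*(*A*, *d*) is defined by

\(\overline{RED}(A,d)=min\Big({\dot{\bigcup }}_{R\in RED(A,d)}\{\{a\}:a\in R\}\Big)\).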

### **Proposition 2**

\(\overline{RED}(A,d)=min(\{c^d_{x,y}\ne \emptyset :x,y\in U\})\).

This can be proved analogously to Proposition 1.
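Proposition 2 can be checked on a hypothetical example. Computing the reducts from the cells (as in Proposition 1) and then the minimal sets that meet every reduct gives back exactly the minimal cells. This Python sketch assumes that the dual reduct set is the family of minimal sets meeting every reduct. `transversals` is an illustrative helper name.

```python
from functools import reduce


def dot_union(S, T):
    if not S:
        return set(T)
    if not T:
        return set(S)
    return {s | t for s in S for t in T}


def minimal(S):
    return {s for s in S if not any(t < s for t in S)}


def transversals(family):
    """min(∪̇_{S in family} {{a} : a in S}): the minimal sets meeting every member."""
    return minimal(reduce(dot_union, [{frozenset({a}) for a in S} for S in family], set()))


# Hypothetical non-empty cells; {a, b, c} is a superset of the other two.
cells = {frozenset({"b", "c"}), frozenset({"a", "c"}), frozenset({"a", "b", "c"})}

red = transversals(cells)  # reducts, as in Proposition 1
dual = transversals(red)   # dual reducts: minimal sets meeting every reduct
print(sorted(sorted(s) for s in dual))  # [['a', 'c'], ['b', 'c']]
print(dual == minimal(cells))           # True
```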

### **Theorem 2**

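Let \(V_d=\{v_1,\dots ,v_k\}\) be the set of decision values of *DT*. Let \(\mathcal {X}_{v_i}\) be a covering of \(X_{v_i}\) \((1\le i\le k)\). The following holds

\(RED(A,d)=min\Big({\dot{\bigcup }}_{S\in \mathcal {S}}\{\{a\}:a\in S\}\Big)\),

where \(\mathcal {S}=min\Big(\bigcup _{1\le i<j\le k}\,\bigcup _{X\in \mathcal {X}_{v_i},\,Y\in \mathcal {X}_{v_j}}\overline{RED}_{X\cup Y}(A,d)\Big)\) and \(\overline{RED}_{X\cup Y}(A,d)\) is the dual reduct set of \(RED_{X\cup Y}(A,d)\).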

### *Proof*

(sketch) By Proposition 1 and \(\mathcal {S}=min(\{c^d_{x,y}\ne \emptyset :x,y\in U\})\).

The approach based on Theorem 2 can use any dual attribute reduction algorithm to compute all dual reducts of the subtables.
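The dual reduct based approach can be sketched in Python as follows. The sketch is an illustration under stated assumptions, not the implementation used in the experiments. The toy table and the round-robin split are hypothetical. As a stand-in embedded algorithm, `dual_reducts` takes the minimal non-empty cells of each subtable; these equal the subtable's dual reducts when dual reducts are understood as the minimal sets meeting every reduct. The partial results are united and minimized, and the reducts are then recovered from them.

```python
from functools import reduce
from itertools import combinations

ATTRS = ("a", "b", "c")
# Hypothetical toy decision table: object -> (condition attribute values, decision).
TABLE = {
    "x1": ({"a": 0, "b": 0, "c": 1}, 0),
    "x2": ({"a": 0, "b": 1, "c": 0}, 1),
    "x3": ({"a": 1, "b": 0, "c": 0}, 1),
    "x4": ({"a": 1, "b": 1, "c": 1}, 0),
}


def minimal(S):
    return {s for s in S if not any(t < s for t in S)}


def dot_union(S, T):
    if not S:
        return set(T)
    if not T:
        return set(S)
    return {s | t for s in S for t in T}


def dual_reducts(ids):
    """Stand-in embedded algorithm: minimal non-empty cells of the subtable on `ids`."""
    cells = {
        frozenset(a for a in ATTRS if TABLE[x][0][a] != TABLE[y][0][a])
        for x, y in combinations(ids, 2)
        if TABLE[x][1] != TABLE[y][1]
    }
    return minimal({c for c in cells if c})


def split(ids, p):
    """Divide one decision class into at most p non-empty portions (round-robin)."""
    return [ids[i::p] for i in range(p) if ids[i::p]]


def dual_reduct_based(p):
    classes = {}
    for x, (_, d) in TABLE.items():
        classes.setdefault(d, []).append(x)
    coverings = [split(sorted(ids), p) for ids in classes.values()]
    # Plain union of the dual reduct sets of all final subtables, then min.
    S = minimal(set().union(*(
        dual_reducts(X + Y)
        for Xs, Ys in combinations(coverings, 2)
        for X in Xs
        for Y in Ys
    )))
    # Reducts: min of the ∪̇-join of the singleton families of the sets in S.
    return minimal(reduce(dot_union, [{frozenset({a}) for a in s} for s in S], set()))


for p in (1, 2):
    print(p, sorted(sorted(r) for r in dual_reduct_based(p)))
# 1 [['a', 'b'], ['c']]
# 2 [['a', 'b'], ['c']]
```

Unlike the reduct based sketch, only the (usually small) families of minimal cells are carried between the steps. The costly \(\,\dot{\cup }\,\) join is done once, on the already minimized family.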

## 4 Scalability of Data Decomposition Based Algorithms

This section defines the notion of scalability in the context of data decomposition based algorithms. The experimental results reported here illustrate the introduced definitions.

A data decomposition based algorithm is understood in this work as follows.

### **Definition 9**

(data decomposition based algorithm) A data decomposition based algorithm (dd-algorithm for short) is an algorithm that consists of the following steps.

- 1.
Decomposition of the database into a certain number of portions such that their union is the whole database.

- 2.
The use of an additional algorithm, called the embedded algorithm, to compute partial results on the portions or on their combinations (e.g. unions).

- 3.
Merging of the partial results to obtain the final result, which coincides with the result computed by the embedded algorithm on the whole database.
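The three steps of a dd-algorithm can be read as a higher-order function. The Python sketch below is a hypothetical illustration: the name `dd_algorithm` and its parameters are assumptions, and data items are assumed to be hashable so that the covering condition of step 1 can be checked. In the toy usage, the embedded algorithm collects the distinct values of a list, and merging by set union reproduces the result computed on the whole list, as step 3 requires. For the attribute reduction algorithms of Sect. 3, `decompose` would produce the final subtables and `merge` would apply the join and minimization operations.

```python
from typing import Callable, Hashable, List, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)
R = TypeVar("R")


def dd_algorithm(
    data: Sequence[T],
    decompose: Callable[[Sequence[T]], List[Sequence[T]]],
    embedded: Callable[[Sequence[T]], R],
    merge: Callable[[List[R]], R],
) -> R:
    """Hypothetical three-step data decomposition based algorithm."""
    # Step 1: decompose the database; together the portions must cover all of it.
    portions = decompose(data)
    assert set().union(*portions) == set(data), "portions must cover the data"
    # Step 2: run the embedded algorithm on every portion.
    partial = [embedded(q) for q in portions]
    # Step 3: merge the partial results into the final result.
    return merge(partial)


def chunks(data, size=3):
    """Split a sequence into consecutive portions of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]


data = [5, 3, 5, 8, 1, 3, 9]
# Embedded algorithm: the set of distinct values; merging: union of the partial sets.
result = dd_algorithm(data, chunks, set, lambda parts: set().union(*parts))
print(result == set(data))  # True
```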

### 4.1 Scalability with Respect to the Number of Data Portions

To evaluate the scalability of a dd-algorithm, the following definition is proposed. Let *n* be the data size and *p* be the number of portions the data is divided into.

### **Definition 10**

(scalability with respect to the number of data portions) A dd-algorithm is scalable with respect to the number of portions the data is divided into if its run-time remains constant as *p* increases while *n* is fixed.

Theoretically, *p* can be increased up to *n*, i.e. each data portion includes one object. Normally, such a “dense” data decomposition is unnecessary or even undesirable, e.g. because of the large number of files, each including one object. In practice, the number of data portions can be determined indirectly by the maximum allowed size of a data portion, e.g. by a memory capacity limit.

From a practical viewpoint, scalability up to a certain extent is sufficient.

### **Definition 11**

(scalability to extent \(p^\prime \) with respect to the number of data portions) A dd-algorithm is scalable to extent \(p^\prime \) with respect to the number of portions the data is divided into if its run-time remains constant as *p* increases up to \(p^\prime \) while *n* is fixed.
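The definitions of scalability with respect to *p* leave open how to decide that a measured run-time is "constant". One hypothetical way to make this operational, sketched in Python below, treats run-times as constant when they stay within a relative tolerance of the run-time for the smallest tested *p*. The 10% tolerance and the name `scalable_extent` are assumptions. The sample values are the DRA run-times for *electricity board* (D1) and the RA run-times for *sonar, mines vs. rocks* (D10) from Table 2. The returned extent can only be one of the tested values of *p*.

```python
def scalable_extent(runtimes, tol=0.10):
    """Largest tested p' such that every run-time for p <= p' stays within a
    relative tolerance `tol` of the run-time for the smallest tested p."""
    ps = sorted(runtimes)
    base = runtimes[ps[0]]
    extent = ps[0]
    for p in ps:
        if abs(runtimes[p] - base) > tol * base:
            break
        extent = p
    return extent


# Run-times from Table 2, keyed by the number of data portions p.
dra_d1 = {1: 276.84, 2: 277.70, 5: 278.55, 10: 277.63}  # DRA, electricity board
ra_d10 = {1: 90.53, 2: 102.18, 5: 181.51, 10: 343.16}   # RA, sonar, mines vs. rocks
print(scalable_extent(dra_d1))  # 10
print(scalable_extent(ra_d10))  # 1
```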

Table 1. Characteristics of databases

| sym | db | attr | obj | cls | red |
|---|---|---|---|---|---|
| D1 | electricity board | 5 | 45781 | 31 | 2 |
| D2 | king-rook vs. king (krk) | 7 | 28056 | 18 | 1 |
| D3 | pima ind. diab | 9 | 768 | 2 | 28 |
| D4 | nursery | 9 | 12960 | 5 | 1 |
| D5 | shuttle | 10 | 43500 | 7 | 19 |
| D6 | australian credit appr | 15 | 690 | 2 | 44 |
| D7 | adult | 15 | 32561 | 2 | 2 |
| D8 | mushroom | 23 | 8124 | 7 | 4 |
| D9 | trains | 33 | 10 | 2 | 333 |
| D10 | sonar, mines vs. rocks | 61 | 208 | 2 | 1314 |

sym: symbol; db: database; attr: number of attributes; obj: number of objects; cls: number of decision classes; red: number of reducts.

Table 2. Attribute reduction with varying number of data portions

| db | RDM (p = 1) | RA (p = 1) | RA (p = 2) | RA (p = 5) | RA (p = 10) | DRA (p = 1) | DRA (p = 2) | DRA (p = 5) | DRA (p = 10) |
|---|---|---|---|---|---|---|---|---|---|
| D1 | 649.87 | 277.28 | 278.41 | 279.52 | 277.63 | 276.84 | 277.70 | 278.55 | 277.63 |
| D2 | 249.63 | 116.14 | 116.60 | 117.09 | 118.18 | 116.63 | 117.27 | 117.98 | 118.91 |
| D3 | 000.14 | 000.06 | 000.06 | 000.09 | 000.14 | 000.06 | 000.06 | 000.06 | 000.06 |
| D4 | 050.06 | 021.07 | 021.29 | 021.91 | 021.22 | 021.21 | 021.04 | 021.27 | 021.15 |
| D5 | 377.57 | 144.43 | 145.62 | 143.79 | 145.58 | 144.71 | 145.78 | 144.65 | 146.13 |
| D6 | 000.22 | 000.11 | 000.19 | 000.97 | 002.39 | 000.11 | 000.11 | 000.11 | 000.13 |
| D7 | 299.46 | 121.14 | 120.05 | 119.92 | 122.44 | 121.76 | 120.64 | 119.85 | 121.80 |
| D8 | 050.98 | 026.84 | 027.50 | 030.69 | 049.52 | 025.86 | 026.40 | 026.28 | 026.77 |
| D9 | 000.27 | 000.25 | 000.52 | 000.48 | — | 000.02 | 000.25 | 000.25 | — |
| D10 | 090.01 | 090.53 | 102.18 | 181.51 | 343.16 | 091.36 | 100.36 | 096.17 | 092.27 |

RDM: standard algorithm based on the relative discernibility matrix method; RA: reduct based approach (Sect. 3.1); DRA: dual reduct based approach (Sect. 3.2); p: number of data portions; —: no result reported.

The most interesting observation derivable from Table 2 is that the dd-algorithms are about twice as fast as the standard algorithm. The latter is more time-consuming because of operations such as loading the whole data into memory and constructing all pairs of objects to check which of them are to be used to compute the discernibility matrix cells. The only exception is the last database, i.e. *sonar, mines vs. rocks*, where the dd-algorithms need more time than the standard one (only slightly more for the DRA version). This may be caused by the large number of computations resulting from the large number of reducts (1314).

The results of the RA version are comparable with those of the DRA version for databases with a small number of attributes or reducts. In the remaining cases, the RA version is more time-consuming. The main reason is that the \(\,\dot{\cup }\,\) operation (see Definition 7 and Theorem 1) is used directly to compute the final reduct set from the subreduct sets. This solution is not efficient, since for every two subreduct sets to be joined, all pairwise unions of their reducts are constructed and only then are the minimal sets selected. Joining two subreduct sets in such a way that the minimal sets are obtained directly could considerably speed up finding the reduct set. This issue is a direction for future work.
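One possible way of obtaining the minimal sets directly is sketched in Python below; it is an assumption, not the method used in the experiments. The idea is to keep only an antichain while the pairwise unions are generated. A new union is discarded if it contains a set already kept, and kept sets that contain the new union are removed. This still examines every pair of subreducts, but it avoids storing the full, non-minimized family. Whether it yields the speed-up mentioned above has not been measured. The names `join_minimal` and `naive_join` are hypothetical.

```python
from itertools import product


def minimal(S):
    return {s for s in S if not any(t < s for t in S)}


def naive_join(S, T):
    """Direct approach: build all of S ∪̇ T first, then minimize."""
    if not S or not T:
        return minimal(set(S or T))
    return minimal({s | t for s in S for t in T})


def join_minimal(S, T):
    """min(S ∪̇ T) built incrementally: only an antichain of candidates is kept."""
    if not S or not T:
        return minimal(set(S or T))
    kept = []
    for s, t in product(S, T):
        u = s | t
        if any(k <= u for k in kept):
            continue  # u equals or contains a set that is already kept
        kept = [k for k in kept if not u < k]  # drop kept proper supersets of u
        kept.append(u)
    return set(kept)


S = {frozenset({"b"}), frozenset({"c"})}
T = {frozenset({"a"}), frozenset({"c"})}
print(sorted(sorted(r) for r in join_minimal(S, T)))  # [['a', 'b'], ['c']]
print(join_minimal(S, T) == naive_join(S, T))          # True
```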

Summing up, with respect to the number of data portions, the DRA version is more scalable than the standard one (at least to extent 10). The scalability of the RA version depends on the sizes of the attribute set and the reduct set.

### 4.2 Scalability with Respect to the Data Size

The scalability of a dd-algorithm with respect to the data size is defined as follows.

### **Definition 12**

(scalability w.r.t. the data size) A dd-algorithm with a fixed \(p>1\) is scalable with respect to the data size if its run-time, relative to the run-time of the algorithm with \(p=1\), remains constant as *n* increases^{2}.

Note that the scalability of a dd-algorithm w.r.t. the data size does not coincide with the general scalability of an algorithm w.r.t. the data size^{3}. The general scalability of a dd-algorithm depends mainly on that of the embedded algorithm.
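Scalability with respect to the data size compares a dd-algorithm with its \(p=1\) counterpart. One way of quantifying "constant in comparison with" is to compute the ratio of the two run-times for each data size and see how much it varies. This interpretation is an assumption. The Python sketch below does this for the RA version on *shuttle* (D5) and *sonar, mines vs. rocks* (D10), using the run-times from Table 3 and taking the standard algorithm (RDM) as the \(p=1\) baseline, as Table 3 does. The names and the spread measure are illustrative.

```python
# Run-times from Table 3 for ten samples of growing size; RA was run with p = 2.
RDM_D5 = [3.45, 14.13, 34.22, 61.47, 96.75, 139.22, 188.43, 243.68, 308.62, 377.57]
RA_D5 = [1.48, 5.84, 13.12, 23.30, 36.05, 52.11, 70.82, 92.87, 118.02, 145.62]
RDM_D10 = [0.27, 0.93, 4.21, 5.53, 10.17, 15.67, 27.15, 53.47, 88.75, 90.01]
RA_D10 = [0.27, 0.84, 3.63, 7.78, 14.71, 30.37, 38.17, 69.79, 101.03, 102.18]


def relative_runtimes(dd, baseline):
    """Run-time of the dd-algorithm divided by the run-time of the p = 1 baseline."""
    return [t / b for t, b in zip(dd, baseline)]


def spread(ratios):
    """Largest ratio divided by the smallest one; 1.0 would mean a perfectly constant ratio."""
    return max(ratios) / min(ratios)


print(round(spread(relative_runtimes(RA_D5, RDM_D5)), 2))    # 1.15
print(round(spread(relative_runtimes(RA_D10, RDM_D10)), 2))  # 2.25
```

On *shuttle*, the RA version stays at roughly 0.37 to 0.43 of the standard run-time across all samples. On *sonar, mines vs. rocks*, the ratio varies by more than a factor of two.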

To investigate this type of scalability, each of the three selected databases (i.e., *electricity board*, *shuttle*, and *sonar, mines vs. rocks*) was divided into ten samples. Both versions of the data decomposition based algorithm (RA and DRA) were tested for \(p=2\).

*sonar, mines vs. rocks* database is an exception). Therefore, one can conclude that the scalability of the dd-algorithms w.r.t. the data size (at least for ten times bigger databases) is comparable with that of the standard one.

Table 3. Attribute reduction with growing data size (RA and DRA with p = 2)

| db | alg | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| D1 | RDM | 006.34 | 025.51 | 058.57 | 102.60 | 159.83 | 229.84 | 313.17 | 408.66 | 525.81 | 649.87 |
| | RA | 002.92 | 011.63 | 025.67 | 046.64 | 071.33 | 102.74 | 139.87 | 182.61 | 230.99 | 283.66 |
| | DRA | 002.87 | 011.76 | 025.82 | 046.70 | 071.50 | 102.87 | 139.61 | 182.84 | 231.19 | 283.44 |
| D5 | RDM | 003.45 | 014.13 | 034.22 | 061.47 | 096.75 | 139.22 | 188.43 | 243.68 | 308.62 | 377.57 |
| | RA | 001.48 | 005.84 | 013.12 | 023.30 | 036.05 | 052.11 | 070.82 | 092.87 | 118.02 | 145.62 |
| | DRA | 001.51 | 005.90 | 013.21 | 023.43 | 036.58 | 052.25 | 071.45 | 092.98 | 117.76 | 145.78 |
| D10 | RDM | 000.27 | 000.93 | 004.21 | 005.53 | 010.17 | 015.67 | 027.15 | 053.47 | 088.75 | 090.01 |
| | RA | 000.27 | 000.84 | 003.63 | 007.78 | 014.71 | 030.37 | 038.17 | 069.79 | 101.03 | 102.18 |
| | DRA | 000.27 | 000.89 | 004.31 | 005.54 | 009.74 | 015.30 | 026.65 | 051.50 | 088.90 | 100.36 |

Columns 1–10 correspond to the ten samples of growing size.

### 4.3 Full Scalability

Using Definition 10 or 12 one can define the full scalability of a data decomposition based algorithm.

### **Definition 13**

(full scalability based on scalability w.r.t. *p*) A dd-algorithm is fully scalable if it is scalable with respect to *p* as *n* increases.

### **Definition 14**

(full scalability based on scalability w.r.t. *n*) A dd-algorithm is fully scalable if it is scalable with respect to *n* as *p* increases.

The above scalability, like that from Definition 10, is not necessary in practice. Therefore, a less strict version is proposed.

### **Definition 15**

(quasi full scalability) A dd-algorithm is quasi fully scalable if it is scalable with respect to the data size as \(p=n/n_p\), where \(n_p\) is the fixed data portion size.

Quasi full scalability is desirable when the data size grows over time and the size of a data portion is limited in advance. In such a case, the number of data portions has to be increased accordingly whenever the data size grows.
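For quasi full scalability, the number of portions follows from the data size. Since \(n/n_p\) need not be an integer, the hedged Python sketch below rounds it up so that no portion exceeds the limit \(n_p\). The limit of 5000 objects is a hypothetical value; 45781 is the size of *electricity board* (D1).

```python
import math


def portions_needed(n, n_p):
    """Smallest number of portions p such that no portion exceeds n_p objects."""
    return math.ceil(n / n_p)


# Hypothetical portion-size limit; the data size n grows as new objects arrive.
N_P = 5000
for n in (4000, 12000, 45781):
    print(n, portions_needed(n, N_P))
# 4000 1
# 12000 3
# 45781 10
```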

^{4}. For the first two databases, the results reported in Table 4 agree with those from Table 3, i.e. increasing *p* does not influence the run-time of the dd-algorithms. For the third database, the run-time remains unaffected by the increase of *p* only for the DRA version. The RA version is not efficient for the same reason as in Sect. 4.1. Therefore, the DRA version can be treated as quasi fully scalable.

Table 4. Attribute reduction with growing data size and number of data portions

| db | alg | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| D1 | RDM | 006.34 | 025.51 | 058.57 | 102.60 | 159.83 | 229.84 | 313.17 | 408.66 | 525.81 | 649.87 |
| | RA | 002.87 | 011.55 | 025.80 | 045.84 | 071.81 | 103.62 | 140.24 | 186.92 | 232.42 | 283.94 |
| | DRA | 002.89 | 011.50 | 025.79 | 045.99 | 071.64 | 102.96 | 140.42 | 185.94 | 232.07 | 284.26 |
| D5 | RDM | 003.45 | 014.13 | 034.22 | 061.47 | 096.75 | 139.22 | 188.43 | 243.68 | 308.62 | 377.57 |
| | RA | 001.50 | 005.93 | 013.32 | 023.66 | 036.98 | 053.35 | 072.92 | 095.66 | 119.67 | 148.15 |
| | DRA | 001.48 | 005.94 | 013.34 | 023.64 | 037.09 | 053.19 | 072.64 | 096.46 | 120.11 | 147.50 |
| D10 | RDM | 000.27 | 000.93 | 004.21 | 005.53 | 010.17 | 015.67 | 027.15 | 053.47 | 088.75 | 090.01 |
| | RA | 000.25 | 000.83 | 003.99 | 010.37 | 024.66 | 058.27 | 098.81 | 172.99 | 267.15 | 344.14 |
| | DRA | 000.27 | 000.90 | 004.23 | 005.48 | 009.93 | 015.88 | 026.52 | 046.70 | 082.40 | 092.70 |

Columns 1–10 correspond to the ten samples of growing size; the number of data portions grows with the sample.

## 5 Conclusion

This paper studied the problem of scalability of data decomposition based algorithms. General definitions devoted to investigating the scalability of such algorithms were proposed. They were applied to evaluate data decomposition based algorithms using the relative discernibility matrix method for computing all reducts of a decision table.

The experiments reported in this paper showed that a data decomposition based approach can give an attribute reduction algorithm the same or better scalability. The version using dual reducts is more likely to be scalable than the one using reducts themselves. In the latter case, the main reason for the increased number of computations is that the subreduct sets can be large, often considerably larger than the reduct set. However, the reduct based approach can be improved by applying a more efficient method for computing the final reduct set from the subreduct sets.

## Footnotes

- 1.
An implicant of a Boolean function is any conjunction of literals (variables or their negations) such that, if the values of these literals are true under an arbitrary valuation of variables, then the value of the function under the valuation is also true. A prime implicant is a minimal implicant (with respect to the number of literals).

- 2.
An algorithm with \(p=1\) is understood as one that is run on non-decomposed data.

- 3.
An algorithm is scalable w.r.t. the data size if its run-time grows linearly in proportion to the data size.

- 4.
The number of portions each class of the database is divided into grows proportionally to the number of taken samples.

## References

- 1. Chen, D., Zhao, S., Zhang, L., Yang, Y., Zhang, X.: Sample pair selection for attribute reduction with rough set. IEEE Trans. Knowl. Data Eng. **24**(11), 2080–2093 (2012)
- 2. Degang, C., Changzhong, W., Qinghua, H.: A new approach to attribute reduction of consistent and inconsistent covering decision systems with covering rough sets. Inf. Sci. **177**(17), 3500–3518 (2007)
- 3. Deng, D., Huang, H.-K.: A new discernibility matrix and function. In: Wang, G.-Y., Peters, J.F., Skowron, A., Yao, Y. (eds.) RSKT 2006. LNCS (LNAI), vol. 4062, pp. 114–121. Springer, Heidelberg (2006)
- 4. Hońko, P.: Attribute reduction: A horizontal data decomposition approach. Soft Comput. (2014). doi: 10.1007/s00500-014-1554-8
- 5. Hu, X., Cercone, N.: Learning in relational databases: a rough set approach. Comput. Intell. **11**(2), 323–338 (1995)
- 6. Kryszkiewicz, M.: Comparative study of alternative type of knowledge reduction in inconsistent systems. Int. J. Intell. Syst. **16**, 105–120 (2001)
- 7. Kryszkiewicz, M.: Rough set approach to incomplete information systems. Inf. Sci. **112**(1–4), 39–49 (1998)
- 8. Miao, D., Zhao, Y., Yao, Y., Li, H.X., Xu, F.: Relative reducts in consistent and inconsistent decision tables of the Pawlak rough set model. Inf. Sci. **179**(24), 4140–4150 (2009)
- 9. Pawlak, Z.: Rough Sets. Theoretical Aspects of Reasoning about Data. Kluwer Academic, Dordrecht (1991)
- 10. Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems. In: Słowiński, R. (ed.) Intelligent Decision Support. Springer, Amsterdam (1992)
- 11. Swiniarski, R.: Rough sets methods in feature reduction and classification. Int. J. Appl. Math. Comput. Sci. **11**(3), 565–582 (2001)
- 12. Thi, V.D., Giang, N.L.: A method for extracting knowledge from decision tables in terms of functional dependencies. Cybern. Inf. Technol. **13**(1), 73–82 (2013)
- 13. Ye, M., Wu, C.: Decision table decomposition using core attributes partition for attribute reduction. In: ICCSE. vol. 23, pp. 23–26. IEEE (2010)
- 14. Zhang, X., Mei, C., Chen, D., Li, J.: Multi-confidence rule acquisition oriented attribute reduction of covering decision systems via combinatorial optimization. Knowl.-Based Syst. **50**, 187–197 (2013)