1 Introduction

Similarity is one of the fundamental aspects of reasoning in artificial intelligence [1]. In this paper, we present a similarity-based approach to constructing classification and identification models for compound objects. By an object we mean an element of the real world that can be stored as a data object represented using an ontology specified in alignment with domain knowledge about a given problem [2]. By a compound object we mean an object that combines a plurality of objects within an ontology-definable structure. Such a structure can be further used to synthesize similarities based on the analysis of object components [3].

The proposed framework is based on hierarchical comparisons of investigated objects with reference sets reflecting different levels of object structure. As a case study, we consider the task of identifying components in texts representing bibliography items [4]. The process of assigning dynamically derived text fragments to particular categories relies on comparing them with a reference database of publications treated as compound objects [5]. It can be envisioned as a resemblance-based recognition method, in which similarity to labeled objects enables us to extrapolate assignments onto new items.

In our previous research, we investigated a number of applications of compound object comparators going beyond typical classification tasks [6]. We also attempted to provide a possibly complete description of how to construct networks of comparators in practice. However, a deeper analysis of the theoretical foundations of our approach has still been missing. Thus, the main focus of this paper is on the mathematical interpretation of transmitting comparisons through a network, using mainly the terminology of fuzzy sets and relations [7].

The paper is organized as follows: Sect. 2 introduces basic notions corresponding to a single comparator of compound objects. Section 3 establishes foundations for similarity-related operations occurring inside multilayered networks of comparators. Section 4 recalls and clarifies the already-mentioned experiments with the analysis of bibliography items. Section 5 summarizes our work and specifies some research directions for the nearest future.

One can think about our approach as analogous to feedforward neural networks [8]. However, comparators work with two kinds of information: information about an input object described by its possible structural characteristics and attribute values, and information about its similarities to reference objects, produced as outputs of previous network layers. The coexistence of these two components makes our model unique. On the other hand, it is certainly useful to compare it with other frameworks, such as those developed using the already-mentioned fuzzy sets [9], those based on rough sets and rough mereology [10], and others.

2 Basics of Comparators

A comparator com computes a vector of similarities of an input object \(u\in U\) to elements of a subset X(u) of a reference set ref. The aim of com is to narrow down the space of reference objects comparable to a given u. Its output can be used as the final result of a comparison module embedded into a bigger application; it can also be combined with outputs of other comparators or transmitted to comparators in the next layer of a more compound network.

Comparator com receives the value of u on an attribute a, denoted by a(u), and compares it to the values a(x) for reference objects \(x\in X(u)\). The choice of a and other parameters inside com is based on domain knowledge about a given problem. The content of \(X(u)\subseteq ref\) may depend on the outcomes of other comparators in a network. At the start of computations, we assume \(X(u)=ref\). Given an input set U, we will represent com as a function \(\mu_{com}:U \times 2^{ref}\rightarrow [0,1]^{ref}\), where \([0,1]^{ref}\) denotes all fuzzy sets over the discrete domain ref.
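To fix intuitions, the signature of a single-attribute comparator can be sketched in a few lines of Python. This is only an illustrative rendering under assumed encodings (objects as dictionaries of attribute values, fuzzy sets as lists of membership degrees); none of the names below come from the original framework, and the sketch returns only the raw relation \(\mu_a\), onto which the filtering and sharpening steps described next would be composed.

```python
from typing import Callable, Dict, List

# Assumed encodings: an object is a dict of attribute values, a reference
# set is a list of such objects, and a fuzzy set over ref is a list of
# membership degrees in [0, 1], one per reference object.
Obj = Dict[str, str]
FuzzySet = List[float]

def make_comparator(a: str, mu_a: Callable[[str, str], float]
                    ) -> Callable[[Obj, List[Obj]], FuzzySet]:
    """Builds mu_com : U x 2^ref -> [0,1]^ref for a single attribute a."""
    def mu_com(u: Obj, X_of_u: List[Obj]) -> FuzzySet:
        # Compare a(u) with a(x) for every reference object in X(u).
        return [mu_a(u[a], x[a]) for x in X_of_u]
    return mu_com
```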

Fuzzy sets \(\mu_{com}(u,X(u))\) are computed in several steps. First, \(u\in U\) is compared to each \(x\in X(u)\) separately. The result of this comparison can be represented as a fuzzy relation \(\mu_a(u,x)\) between the values (representations) of a for u and x. The quantities \(\mu_a(u,x)\) are then filtered in two stages. First, we check predefined exception rules to exclude reference objects that should not be compared with u based on the available information. Secondly, we check whether the remaining quantities are not lower than an activation threshold \(p>0\). Overall, we can modify the initial \(\mu_a\) as follows:

$$\begin{aligned} \mu^{*}_a(u,x)=\left\{ \begin{array}{ll} \mu_a(u,x) & \text{if } x\in X(u) \text{ and } \mu_a(u,x)\ge p \text{ and there are no rules}\\ & \text{which disallow comparing } u \text{ with } x\\ 0 & \text{otherwise} \end{array}\right. \end{aligned}$$
(1)
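A direct transcription of formula (1) might look as follows, assuming that exception rules are supplied as boolean predicates over pairs (u, x); this encoding of the rules is our assumption, as the text leaves their representation open.

```python
def mu_a_star(mu_a, u, X_of_u, rules, p):
    """Formula (1): keep mu_a(u, x) only for admissible reference objects.

    rules: list of predicates; rule(u, x) == True means the rule disallows
    comparing u with x (an assumed encoding of the exception rules).
    p: activation threshold, p > 0.
    """
    result = []
    for x in X_of_u:
        value = mu_a(u, x)
        blocked = any(rule(u, x) for rule in rules)
        result.append(value if value >= p and not blocked else 0.0)
    return result
```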

The next two steps, represented by a filtering function \(f:[0,1]^{ref}\rightarrow [0,1]^{ref}\) and a sharpening function \(s:[0,1]^{ref}\rightarrow [0,1]^{ref}\), aim at further filtering of the similarity coefficients through their mutual comparison, and at strengthening the highest remaining weights. For this purpose, it is more convenient to think about the vector \(\mu^{*}_{a}(u)\) with coordinates defined as \(\mu^{*}_{a}(u)[i]=\mu^{*}_a(u,x_i)\), \(i=1,...,|ref|\).

The role of f is to increase the number of zero coordinates of \(\mu^{*}_{a}(u)\). For example, one can set to 0 all elements that are not among the n highest similarity scores, for \(n\ge 1\). The calculation of most filtering functions considered in our previous research can be optimized by splitting coordinates into blocks, deriving f concurrently, and merging the results. This is particularly important for applications that require operating with large cardinalities of ref.
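As an example, the top-n variant of f mentioned above can be sketched as follows; keeping ties at the n-th score is our own choice here, as the text does not specify tie-breaking.

```python
import heapq

def f_top_n(mu_star, n):
    """Filtering function f: zero out coordinates that are not among the
    n highest similarity scores (ties at the n-th score are kept)."""
    if n >= len(mu_star):
        return list(mu_star)
    threshold = heapq.nlargest(n, mu_star)[-1]
    return [v if v >= threshold and v > 0 else 0.0 for v in mu_star]
```

For large ref, the same function lends itself to the block-wise scheme mentioned above: find the n highest scores in each block concurrently and merge the partial candidate lists before the final cut.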

The role of s is to introduce non-linearity, whose benefits can be compared to the use of exponential functions in feedforward neural networks. Let us put \(\mu^{f}_{a}(u)=f(\mu^{*}_{a}(u))\). The following formula for s works only with non-zero coefficients and keeps the maximal values of \(\mu^{f}_{a}(u)\) unchanged. These properties are important for both the speed and the accuracy of computing.

$$\begin{aligned} s(\mu^{f}_{a}(u))[i] = \left\{ \begin{array}{ll} \max_{u} \cdot \, e^{\mu^{f}_{a}(u)[i]-\max_{u}} & \text{if } \mu^{f}_{a}(u)[i]> 0\\ 0 & \text{otherwise} \end{array}\right. \end{aligned}$$
(2)

where \(\max_{u} = \max_{i}\mu^{f}_{a}(u)[i]\). The derivation of \(s(f(\mu^{*}_{a}(u)))\) can also be expressed using operations on fuzzy sets and relations, where similarities between u and reference objects correspond to fuzzy membership degrees.
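A minimal sketch of the sharpening step (2); note that coordinates equal to \(\max_u\) pass through unchanged, in line with the properties listed above.

```python
import math

def s_sharpen(mu_f):
    """Formula (2): exponential sharpening acting only on non-zero
    coefficients and leaving the maximal values unchanged."""
    max_u = max(mu_f, default=0.0)
    if max_u == 0.0:
        return list(mu_f)  # nothing was activated
    return [max_u * math.exp(v - max_u) if v > 0 else 0.0 for v in mu_f]
```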

The vector \(s(f(\mu^{*}_{a}(u)))\) can be treated as the output of a single comparator. In a larger network, there can exist several interrelated comparators looking at the same ref by means of different attributes. Let us denote such comparators by \(com_1,...,com_l\), working with attributes \(a_1,...,a_l\), respectively. We put

$$\begin{aligned} \mu_{com}(u,X(u)) = \overline{\left( s(f(\mu^{*}_{a_1}(u))),...,s(f(\mu^{*}_{a_l}(u)))\right)} \end{aligned}$$
(3)

as the output of a composite comparator com containing \(com_1,...,com_l\) as its parts. The role of the function \(\overline{(...)}\) is to synthesize local outcomes in order to forward a single signal related to ref. Such synthesis can be based on fuzzy t-norms and s-norms, statistical tools, election algorithms, and so on [11].
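Since the text leaves the synthesis operator open, a sketch can only illustrate some of the listed options; below, coordinate-wise minimum (Gödel t-norm), maximum (Zadeh s-norm) and arithmetic mean are offered as interchangeable choices, none of them mandated by the framework.

```python
def synthesize(outputs, mode="min"):
    """One possible reading of the overline operator in (3): coordinate-wise
    synthesis of the outputs of com_1, ..., com_l over the same ref."""
    if mode == "min":      # Goedel t-norm: conjunctive reading
        agg = min
    elif mode == "max":    # Zadeh s-norm: disjunctive reading
        agg = max
    else:                  # arithmetic mean: a simple statistical option
        agg = lambda col: sum(col) / len(col)
    return [agg(col) for col in zip(*outputs)]
```

Composed with the earlier sketches, a composite output would then be obtained as synthesize([s_sharpen(f_top_n(v, n)) for v in per_attribute_vectors]).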

3 Comparator Networks

Let us denote a network of comparators by net. The performance of net can be characterized analogously to a single comparator, by a function \(\mu_{net}:U\rightarrow [0,1]^{ref}\). The overall outcome can be utilized directly in an object identification process or, e.g., as an input to a similarity-based classifier, which checks the sums of weights of reference objects falling into particular decision classes.
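The similarity-based classifier mentioned above admits a very short sketch: sum the output weights per decision class and return the class with the largest total (the availability of labels for reference objects is assumed).

```python
def classify(mu_net_u, ref_labels):
    """Sum the weights of reference objects falling into each decision
    class and pick the class with the highest total."""
    sums = {}
    for weight, label in zip(mu_net_u, ref_labels):
        sums[label] = sums.get(label, 0.0) + weight
    return max(sums, key=sums.get)
```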

The structure of net is similar to multilayered feedforward neural networks, although the transmitted signals and the calculations inside nodes are different. Each layer of net contains a set of comparators and a specific translating/aggregating mechanism. Comparators run in parallel, usually based on different attributes. Thus, from a computational perspective, concurrency can be achieved both at the level of single comparators and at the level of their larger groups. The role of the translator is to convert comparator outputs into information about reference objects that will be useful for the next layer. The role of the aggregator is to choose the most likely outputs of the translator, in case there is any non-uniqueness in assigning information about input objects to comparators. We will see that those roles can be interpreted using fuzzy t-norms and fuzzy s-norms.

Each object is described using ontologies defined by concepts and relationships between them. Given a hierarchy of concepts, one can consider relationships of generalization and decomposition. Generalization is the relationship of being a sub-object of another object, while decomposition is the relationship of being a parent (super-object) of a set of sub-objects. Particular layers of net usually correspond to hierarchy levels, so transitions between them correspond to generalization or decomposition. This affects the way of handling both input and reference objects, as well as modeling similarities between them.

Consequently, in a single net, different comparators can refer to different types and levels of reference (sub-)objects, using different attributes and parameters. Thus, the first task is to extract, for a given u, its structural representation, i.e., all its parts and their corresponding attribute values. Moreover, it is not always obvious which parts of u should be compared to particular reference sets. In such cases, a single u can yield multiple possible combinations of assignments of its parts to particular comparators. All such alternative representations, denoted as \(u'\), should be processed through the first layers; later, the most probable assignments of u's parts to particular categories of reference objects can be derived. One can think about the collection of possible representations \(u'\) as an information granule g(u) created around the input object \(u\in U\) [12].

Inputs to each layer are determined by the values of attributes for \(u\in U\) or its sub-objects. However, the subsets of reference (sub-)objects to which u is going to be compared are induced dynamically by comparators in previous layers. In the simplest scenario, comparators in preliminary layers aim at reducing the subsets of potentially comparable reference objects using relatively easily computable attributes, leaving more complex calculations to further layers, where the number of reference items to be compared has already decreased. In other cases, initial layers work with attributes specified for sub-objects, producing vectors of similarities that need translation to the level of similarities between more compound objects, whose attributes are analyzed later. However, the complexity does not need to grow with consecutive layers. In some applications, the first layers can work with relatively basic attributes of compound objects, whose similarities are then translated to lower structural levels for detailed processing.

Types of reference objects can vary from layer to layer, or even within a single layer. Comparators in a given layer usually refer to entities at the same level of the ontology-based hierarchy of considered objects. However, a given hierarchy level can include multiple types of entities. Let us denote by \(\mu^{k}_{net}(u)\) the outcome of the k-th layer for an input object \(u\in U\), after applying the above-mentioned operations of translation and aggregation. Denote by \(ref^{k+1}_1,...,ref^{k+1}_{m(k+1)}\) the reference sets used by comparators in the \((k+1)\)-th layer. Our goal in this section is to specify the function \(\mu^{k}_{net}:U\rightarrow [0,1]^{ref^{k+1}_{1}}\times ... \times [0,1]^{ref^{k+1}_{m(k+1)}}\), which takes into account the similarity vectors obtained from comparators in the k-th layer. Once we have \(\mu^{k}_{net}(u)\), we can forward it as a signal granule and prepare subsets \(X(u)^{k+1}_{1}\subseteq ref^{k+1}_{1},...,X(u)^{k+1}_{m(k+1)}\subseteq ref^{k+1}_{m(k+1)}\) to be utilized by the next comparators. These two types of granules – the above signal granule and the previously-mentioned information granule g(u) – illustrate the twofold way of operating with information about objects throughout networks of comparators.

The central part of \(\mu^{k}_{net}\) is the matrix \(M^{k}_{net}\) with dimensions \(|ref^{k}_{1}|+...+|ref^{k}_{m(k)}|\) and \(|ref^{k+1}_{1}|+...+|ref^{k+1}_{m(k+1)}|\), which links the k-th and the \((k+1)\)-th layers of net. In its simplest implementation, it is a sparse boolean matrix encoding those combinations of reference (sub-)objects in sets \(ref^{k}_{1},...,ref^{k}_{m(k)}\) and \(ref^{k+1}_{1},...,ref^{k+1}_{m(k+1)}\) which structurally correspond to each other. The matrices are created during the process of defining reference sets, whose elements are decomposed according to their ontology-based specifications. Connections can also be additionally weighted with degrees expressing, e.g., to what extent particular sub-objects should influence similarities between their parents.
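Under the assumption that the ontology provides a part-of predicate between reference (sub-)objects of consecutive layers, the boolean variant of \(M^{k}_{net}\) can be sketched as follows; a dense list-of-lists stands in for the sparse format used in practice.

```python
def build_translation_matrix(refs_next, refs_curr, is_part_of):
    """Boolean M^k_net: rows range over the concatenated reference sets of
    the (k+1)-th layer, columns over those of the k-th layer, so that the
    product with G^k_net(u) in (5) is well-typed. is_part_of(sub, parent)
    is an assumed predicate derived from ontology-based decomposition."""
    return [[1.0 if is_part_of(sub, parent) else 0.0 for sub in refs_curr]
            for parent in refs_next]
```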

Translation can be executed as a product of \(M^{k}_{net}\) with concatenated vectors of similarities obtained as outputs of the comparators \(com^{k}_{1},...,com^{k}_{m(k)}\) in the k-th layer, for each of the possible representations of u gathered in the information granule g(u). Let us enumerate all such representations as \(u'_1,...,u'_{|g(u)|}\) and denote by \(G^{k}_{net}(u)\) the matrix of all possible output combinations, that is:

$$\begin{aligned} G^{k}_{net}(u) = \left[ \begin{array}{ccc} \mu_{com^{k}_{1}}(u'_{1})[1] & \dots & \mu_{com^{k}_{1}}(u'_{|g(u)|})[1]\\ \vdots & \ddots & \vdots \\ \mu_{com^{k}_{1}}(u'_{1})\left[|ref^{k}_{1}|\right] & \dots & \mu_{com^{k}_{1}}(u'_{|g(u)|})\left[|ref^{k}_{1}|\right]\\ \mu_{com^{k}_{2}}(u'_{1})[1] & \dots & \mu_{com^{k}_{2}}(u'_{|g(u)|})[1]\\ \vdots & \ddots & \vdots \\ \mu_{com^{k}_{m(k)}}(u'_{1})\left[|ref^{k}_{m(k)}|\right] & \dots & \mu_{com^{k}_{m(k)}}(u'_{|g(u)|})\left[|ref^{k}_{m(k)}|\right] \end{array} \right] \end{aligned}$$
(4)

We can represent the mechanism for computing \(\mu ^{k}_{net}(u)\) as follows:

$$\begin{aligned} \mu^{k}_{net}(u)[i] = \max_{j}\min\left( (M^{k}_{net}G^{k}_{net}(u))[i][j],\,1\right) \end{aligned}$$
(5)

where [i][j] denotes the coordinates of the matrix \(M^{k}_{net}G^{k}_{net}(u)\). Certainly, the specification of the required operations in terms of matrices and vectors helps in efficient implementation. On the other hand, as we can see below, these calculations can indeed be interpreted by means of well-known t-norms and s-norms.

Firstly, for a given \(u'_j\in g(u)\), the column \((M^{k}_{net}G^{k}_{net}(u))[j]\) represents possible similarities of u to reference objects in the \((k+1)\)-th layer. Each of those similarities is computed as a sum of similarities between components of u (distributed among comparators according to the combination \(u'_j\)) and reference objects in the k-th layer. If the sum exceeds 1, we cut it down to 1. Thus, similarities between objects at the \((k+1)\)-th layer are computed as Łukasiewicz's t-norm of similarities between the corresponding objects at the k-th layer.

Secondly, in order to finally assess the similarity of u to a given reference object in the \((k+1)\)-th layer, we look at all combinations in g(u) and choose the maximum possible score. Thus, we follow Zadeh's s-norm. Intuitively, our usage of the t-norm corresponds to taking a conjunction of component similarities in order to judge the similarity between compound objects, while our usage of the s-norm reflects a disjunction of all alternative ways of obtaining that similarity. From this perspective, our current implementation reflects only one of many possible specifications, and other settings of t-norms and s-norms could be considered as well.
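Putting the pieces together, formula (5) reduces to a few lines; the plain-Python matrix product below is for clarity only and would be replaced by sparse linear algebra in practice.

```python
def mu_k_net(M, G):
    """Formula (5): sum over components, cut down at 1, followed by
    Zadeh's max over all alternative representations u'_j in g(u).
    M is |ref^(k+1)| x |ref^k|, G is |ref^k| x |g(u)|."""
    out = []
    for i in range(len(M)):
        best = 0.0
        for j in range(len(G[0])):  # iterate over representations u'_j
            acc = sum(M[i][t] * G[t][j] for t in range(len(G)))
            best = max(best, min(acc, 1.0))
        out.append(best)
    return out
```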

Admittedly, the above layout is still a kind of simplification. As noted in Sect. 2, some comparators can comprise multiple sub-units referring to different attributes or even different types of objects. However, the function-based interpretation of network performance enables us to look at such composite cases as a recursive specification of how information flows. Moreover, it lets us better understand how to adapt existing data-based learning approaches, such as error backpropagation in neural networks transmitting compound signals [13], which might be utilized, e.g., to adjust weights in translation matrices.

Actually, the topic of learning comparator networks is far wider. For example, the parameters responsible for the synthesis of partial outcomes of composite comparators, the usage of layer outcomes to specify reference subsets for next layers, as well as the aggregation of final network results can all be tuned based on, e.g., evolutionary algorithms [14]. Moreover, some attribute and object selection methods developed within the already-mentioned framework of rough sets could be utilized to optimize the configuration of comparators and reference sets [15].

4 Illustrative Example

The methods outlined in the previous sections have been used in a number of academic and commercial projects. As a case study, let us discuss the task of analyzing bibliography items, described in more detail in [5]. The goal here is to determine structural patterns of references represented as unstructured texts, so that their fragments get identified as members of classes such as author names, paper titles, publication dates, and so on. The comparator-network-based solution aimed at this kind of text processing was designed as a component of the system responsible for indexing articles stored in scientific repositories [4].

As an example, the text “Sosnowski, Ł.: Framework of Compound Object Comparators. Intelligent Decision Technologies (2015)” should be recognized as aligned with the structural pattern ATJY, where A, T, J and Y stand for authors, title, journal and year, respectively. Also, “Sosnowski, Ł.” should be identified as an existing element of the reference set of authors, or added as a new one, etc.

Such a recognition process can be divided into preprocessing, parsing and classification. The first stage is responsible for filtering out completely useless characters (e.g., exclamation marks). The second stage splits the text into potentially meaningful parts, based on an appropriate interpretation of punctuation, with additional rules aimed at merging some of the produced parts together and final cleaning. As a result, we obtain components for further usage.
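The first two stages can be approximated by a simple sketch; the character set to drop and the splitting pattern are placeholders, as the actual rules (including the merging heuristics) are richer and application-specific.

```python
import re

def preprocess(text: str) -> str:
    """Stage 1: filter out characters assumed useless here (placeholder set)."""
    return re.sub(r"[!?]", "", text)

def parse(text: str) -> list:
    """Stage 2: split on punctuation into candidate parts. Real merging and
    cleaning rules are omitted in this sketch."""
    parts = re.split(r"[.:;]\s+", preprocess(text))
    return [p.strip() for p in parts if p.strip()]
```

On the example above, this already yields ['Sosnowski, Ł.', 'Framework of Compound Object Comparators', 'Intelligent Decision Technologies (2015)'], although in general the merging rules are needed to repair over-eager splits.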

For the third stage, we employ the network whose input layer contains comparators corresponding to the following categories of reference objects acquired from the considered repository [4]: Authors (A), Book (B), Country (C), Doi (D), Journal (J), Pages (P), Proceedings (R), Series (S), Title (T), Volume (V), Year (Y). Different comparators work with different attributes. We assume that the elements of the reference sets are already correctly classified.

The comparator dedicated to authors includes sub-comparators looking at sorted initials (si), the longest lengths of text fragments (ll), and full strings representing authors (au). Their similarity measures are as follows:

$$\begin{aligned} \mu_{si}(u,x)&= 1-\frac{d_L(u,x)}{\max \{n(u),n(x)\}}\\ \mu_{ll}(u,x)&= 1-\frac{|n(u)-n(x)|}{\max \{n(u),n(x)\}}\\ \mu_{au}(u,x)&= \frac{1+pos(u,x)-neg(u,x)}{2+pen(u,x)} \end{aligned}$$
(6)

where n(x) denotes the length of x (if both x and u are empty, we put \(\mu_{si}(u,x)=\mu_{ll}(u,x)=1\)), \(d_L(u,x)\) denotes Levenshtein's edit distance, pos(u,x) is the average similarity between tokens occurring within u and their corresponding best-matching tokens within x, neg(u,x) is the ratio of tokens within u for which we could not find any sufficiently similar tokens within x (note that neither pos nor neg can exceed 1), and pen(u,x) is the number of tokens within x which were not chosen as best-matching counterparts for any tokens within u. Similarities used in other comparators are defined analogously, sometimes also involving comparisons against regular expression patterns.
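The first two measures in (6) are standard enough to be sketched directly; here they are applied to raw strings for illustration, while in the actual comparator u and x would be the respective si/ll representations of authors.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein's edit distance d_L, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def mu_si(u: str, x: str) -> float:
    """First measure in (6): normalized edit distance."""
    if not u and not x:
        return 1.0
    return 1.0 - levenshtein(u, x) / max(len(u), len(x))

def mu_ll(u: str, x: str) -> float:
    """Second measure in (6): closeness of fragment lengths."""
    if not u and not x:
        return 1.0
    return 1.0 - abs(len(u) - len(x)) / max(len(u), len(x))
```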

For the experiments, we use training and testing data sets with 132 and 268 texts, respectively. The training data set is used to fill in the reference set of structural patterns. For each of the 132 texts, we manually detect and classify their parts into the A/B/C/D/J/P/R/S/T/V/Y categories and treat the obtained sequences of codes (such as ATJY above) as structural reference objects.

The network is initialized with default activation thresholds \(p = 0.5\) and uniform aggregation/translation weights. For each comparator and each text used for training, there is a dedicated unit test, which checks whether the comparator's output includes the correct answer. If not, then – depending on the specific situation – the reference set is enriched with a new object covering the given case, or the comparator's activation threshold is made less rigorous.
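The tuning loop can be sketched as follows; the comparator interface (matches, p, ref) and the order in which the two repair actions are tried are our assumptions, since the text only states that one of them is applied depending on the situation.

```python
def tune(comparator, training_cases, p_step=0.05, p_min=0.1):
    """Unit-test-driven tuning: for each training text u with a known
    correct answer, check whether the comparator's output contains it;
    if not, relax the activation threshold p or enrich the reference set."""
    for u, correct in training_cases:
        if correct not in comparator.matches(u):
            if comparator.p - p_step >= p_min:
                comparator.p -= p_step          # make p less rigorous
            else:
                comparator.ref.append(correct)  # cover the case explicitly
```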

Each of the parsed test texts is processed in two stages. Firstly, our network completes part classification and produces sets of candidate structural patterns. Then, the network conducts structure classification by comparing those candidates with the patterns in the structural reference set. The final result is an ordered subset of reference structural patterns.

Table 1. Upper left/right: best/worst results obtained when using part classification only. Lower left/right: best/worst results obtained using the complete process (part classification + structure classification). \(P_{*}\), \(R_{*}\) and \(F_{*}\) (where \(*\) is p or m) stand for precision, recall and \(F_{1}\)-score, respectively; the subscripts \(_{p}\) and \(_{m}\) refer to measurements related to outcomes of part classification and structure classification, respectively.

Table 1 includes results in terms of standard evaluation measures, such as precision, recall and \(F_{1}\)-score [16]. It shows the best and the worst results for part classification (the first stage only) and for the complete solution (both stages mentioned above). The global average values of the \(F_{1}\)-score are equal to 0.86 for the first case and 0.78 for the second case.

The reason for the lower \(F_{1}\)-score in the second case is that some structural patterns obtained for test texts may not be present in the structural reference set, so performing structure classification is actually a harder task. It can also happen that inputs are corrupted or wrongly created, which is a bigger problem for entire texts than for their parts. Still, the obtained results make it possible to use this solution in practice, if applied together with incremental methods for cleaning, unifying and extending reference sets.

5 Conclusion

Networks of comparators are useful for solving decision problems requiring similarity modelling. They are characterized by a common modular approach to various tasks, such as classification, identification, etc., based on comparator units and their corresponding reference sets. In this paper, we showed to what extent networks of comparators can be described using fuzzy set terminology and operations. We hope that the reported mathematical foundations will lead toward new areas of application of our methodology.