Introduction

The first motivation of this article is to extend the Closed Pattern Mining (CPM) and Formal Concept Analysis (FCA) methodologies in order to investigate attributed two-mode networks. Note that the two methodologies enumerate the same closed patterns; however, FCA is also interested in the structure of this result as a conceptual structure. The present work follows previous work in which CPM and FCA were applied to undirected and directed graphs. In what follows we recall the notions on which CPM of attributed networks relies. We then discuss the necessity of defining bi-patterns in order to mine two-mode networks.

Most of the work in social and complex network analysis considers unlabelled and undirected networks and is concerned with what may be said about the topological structure of the network. Various ways have been proposed to extract interesting subgraphs. In particular, in the core-periphery model the network is made of a core subgraph, i.e. a dense subgraph whose vertices are highly connected, together with its periphery, made of vertices highly connected to the core but poorly interconnected (Borgatti and Everett 2000). The first formal core definition was the k-core subgraph, which is the greatest subnetwork whose vertices all have degree at least k in the subnetwork (Seidman 1983). By changing the topological property we obtain various core definitions within the generalized cores framework proposed by V. Batagelj (Batagelj and Zaversnik 2011).

Various recent works on complex network analysis take into account information provided as labels on vertices or edges. The network is then called a labelled or attributed network. Recently, an approach has been presented that extends CPM and FCA to mine attributed graphs. For that purpose, the vertex subset in which an attribute pattern occurs is reduced to its core subset using some interior operator (Soldano and Santini 2014). Applying interior operators to compute closed patterns makes them abstract closed patterns, for which enumeration algorithms exist (Soldano and Ventos 2011). They are called core closed patterns when this methodology relies on core definitions (Soldano and Santini 2014; Soldano et al. 2017a).

Now, two-mode networks are made of two vertex sets, generally representing two kinds of entities, for instance actors and movies, together with edges relating entities of one kind to entities of the other kind, as for instance “G. Clooney acted in Ocean’s Eleven”. Until recently they were mostly investigated by extracting single mode networks, relating for instance actors to actors who participated in the same movies. However, in (Borgatti and Everett 1997) the authors advocated the direct investigation of two-mode networks, and a core definition for two-mode networks has recently been proposed by Cerinsek and Batagelj (Cerinsek and Batagelj 2015). Applying core closed pattern mining to such two-mode networks requires extending the methodology. The difficulty is that, when such a network is attributed, each kind of vertex is described according to its own attribute set. This means that we have to consider patterns made of two attribute subsets, which we further call bi-patterns, each of which selects two interconnected vertex subsets that we call its support set pair. This allows us, for instance, to require actors to be American and movies to be recent, while only considering vertices of a subnetwork in which each actor played in at least 2 movies and each movie is linked to at least 3 actors. Interestingly, such bi-patterns may also be defined in the directed case when considering subgraphs in which a separate pattern is associated with each of the out and in vertex roles. Finally, we will see that the methodology we propose may also apply to undirected networks as long as two different roles can be dynamically defined in the network, namely here high degree vertices on the one hand and their neighbours on the other hand.

Note that in order to properly define bi-pattern mining we also need to extract cores from subgraphs induced by vertex subset pairs. This means defining cores made of two vertex subsets, which goes beyond the generalized cores definition.

On the computational side, we adapt the general core extraction algorithm to our new core definitions and we propose a closed bi-pattern enumeration algorithm that we have implemented within the minerLC software. We have experimented with the resulting program on three networks. The first network is an epistemological two-mode network relating deep sea exploration campaigns to their participants (Bary 2018). The second network is a lawyers network in which directed links represent lawyers asking other lawyers for advice (Lazega 2001); it was previously used to illustrate closed pattern mining of attributed directed networks (Soldano et al. 2017b). The third one is an undirected co-authoring bibliographical network investigated in (Galbrun et al. 2014).

Finally, there may be a large number of bi-patterns to extract from directed and undirected networks when compared to single patterns: any pair of core closed single patterns is a candidate to be a core closed bi-pattern. We propose to focus on bi-patterns in which the two components, which are expressed in the same pattern language, are different enough. For that purpose we define a homogeneity measure and select inhomogeneous bi-patterns.

This work was presented in a workshop article (Soldano et al. 2018) in which the bi-pattern methodology was first introduced for two-mode and directed networks. The present article also introduces the star-satellite core definition for undirected networks and discusses the bi-patterns extracted and selected from a bibliographical network, exhibiting in particular some cooperation and competition examples in the pattern mining research domain. Overall, the main contributions of this work may be summarized as follows:

  • A general definition of closed bi-pattern mining.

  • A general algorithm for closed bi-pattern enumeration.

  • A new definition of the core of a network as a pair of vertex subsets.

  • A general algorithm to extract such new cores.

  • A definition of homogeneity for bi-patterns.

  • The methodology of core closed bi-pattern mining of attributed networks, including core definitions designed respectively for two-mode, directed and undirected networks.

“Related work” section discusses related work. “Preliminaries” section gives preliminary definitions and results on core Closed Pattern Mining. In “Bi-concept lattices and abstract closed bi-patterns” section we introduce abstract closed bi-pattern mining and abstract bi-concept lattices. In “Cores as subset pairs and core closed bi-pattern mining” section we extend the definition of cores in order to obtain two-component cores and consequently define core closed bi-pattern mining. In “Core definitions: two-Mode, directed and undirected networks” section we introduce such two-component cores for two-mode, directed and undirected networks. In “Computing the interior of (X1,X2) and enumerating abstract closed bi-patterns” section we provide algorithms to compute two-component cores and to enumerate the associated closed bi-patterns. Finally, in “Experiments” section we present the results obtained on the three networks mentioned above and discuss the scalability of this pattern mining methodology.

Related work

Analyzing attributed graphs has led to various ways of extracting cohesive subgraphs. First, various pattern mining works investigated mining patterns as pairs of constraints on topology and labels, and ranking them according to interestingness measures (Mougel et al. 2012; Silva et al. 2012). This includes abstract closed pattern mining mentioned above as well as work coming from the subgroup discovery field, in which selection and pruning of interesting patterns is performed during enumeration (Atzmueller et al. 2016). A second way consists in extending community detection algorithms by taking into account both topology and attribute information. Various definitions of hybrid objective functions and efficient ways to find optimal solutions have been proposed. In most cases the result is a set of non-overlapping communities (Baroni et al. 2017; Sánchez et al. 2015; Combe et al. 2015). The overlapping case has been addressed by soft clustering schemes (Xu et al. 2012), by hard clustering of the edge set (Galbrun et al. 2014) or by building generative models in such a way that a node may freely belong to several communities (Yang et al. 2013). Finally, network embedding algorithms have been proposed to learn an appropriate representation of nodes as vectors, to which standard clustering methods are then applied (Gao and Huang 2018).

In all these approaches, when considering the relationship between attributes and nodes, the latter have a unique role. This is obviously not appropriate for two-mode networks, while in single mode networks allowing nodes to have different roles within a group may lead to a more flexible way to define cohesive subnetworks. What we propose here, beyond the extension of the core closed pattern methodology to bi-patterns, is a first step in revisiting the various methodologies mentioned above.

Regarding core definitions, recent work has proposed definitions designed to investigate directed networks. In particular, a core definition has been proposed in Giatsidis et al. (2013) to investigate collaboration within directed networks. The requirement is then that both the indegree and the outdegree of every vertex be higher than given thresholds, so that all nodes in the core are required to have both the out and in roles. A different kind of core is related to the Hub-Authority idea, which considers that a vertex may be prominent in a network according to only one or both of its out and in roles (Kleinberg 1999). The HA-core has been recently defined in order to express this idea (Soldano et al. 2017b).

Preliminaries

Abstract closed pattern mining and concept lattices

The closed pattern mining and Formal Concept Analysis (FCA) frameworks consider the occurrences of patterns in a set of objects V. The pattern language L is partially ordered in such a way that if q ≤ q′, i.e. q′ is more specific than q, then whenever q′ occurs in object v, q also occurs in v. The set of occurrences ext(q) of a pattern q, i.e. the object subset in which q occurs, is called its support set or its extension in V. The purpose common to Closed Pattern Mining and FCA is then to represent, in a condensed way, the set of definable subsets of V, i.e. subsets which are pattern support sets.

Enumerating the definable subsets of V comes down to enumerating the equivalence classes of patterns, two patterns being considered equivalent when they have the same support set. Whenever the pattern language is a finite lattice there is a unique most specific pattern in each class. Recall that a lattice is such that any pair a, b of elements has both a join a∨b and a meet a∧b. The meet a∧b is the unique greatest lower bound of a and b, i.e. a∧b ≤ a, a∧b ≤ b, and there is no c > a∧b which is a lower bound of both a and b. In a dual way the join a∨b is the least upper bound of a and b. When considering an equivalence class of patterns, the most specific element of the class is then the join of all its elements. This most specific pattern then represents the whole class of patterns that occur in exactly the same object subset.

In the case of powersets, the order is the inclusion order ⊆, and join and meet respectively are set-theoretical union and intersection. In standard FCA the pattern language L is the powerset \(2^{I}\) of a set I of binary attributes and the extensional space is the powerset \(2^{V}\) of the object set V. However, for our purpose of defining and mining bi-patterns we need a more general presentation. First, we define below closure operators together with their dual interior operators.

Definition 1

Let S be an ordered set and f: S→S a self-map such that, for any x, y ∈ S, f is monotone, i.e. x ≤ y implies f(x) ≤ f(y), and idempotent, i.e. f(f(x)) = f(x). Then if f(x) ≥ x for all x ∈ S, f is called a closure operator, while if f(x) ≤ x for all x ∈ S, f is called an interior operator.
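On the powerset lattice (\(2^{S}\), ⊆), the three conditions of Definition 1 can be checked by brute force on a small ground set. The Python sketch below is only an illustration; the helper names and the two example operators (a hypothetical "keep only subsets with at least two elements" map and a hypothetical "always add element 1" map) are assumptions, not operators used in this article.

```python
from itertools import chain, combinations

def powerset(ground):
    """All subsets of `ground`, as frozensets."""
    items = list(ground)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def monotone_and_idempotent(f, subsets):
    monotone = all(f(x) <= f(y) for x in subsets for y in subsets if x <= y)
    idempotent = all(f(f(x)) == f(x) for x in subsets)
    return monotone and idempotent

def is_interior_operator(f, ground):
    subsets = powerset(ground)
    return monotone_and_idempotent(f, subsets) and all(f(x) <= x for x in subsets)

def is_closure_operator(f, ground):
    subsets = powerset(ground)
    return monotone_and_idempotent(f, subsets) and all(f(x) >= x for x in subsets)

def keep_if_large(x, k=2):
    """Hypothetical interior operator: keep x only if it has at least k elements."""
    return x if len(x) >= k else frozenset()

def add_one(x):
    """Hypothetical closure operator: always add element 1."""
    return x | frozenset({1})
```

The exhaustive check is exponential in the ground set size, so it is only meant for sanity checks on toy examples.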

Formal Concept Analysis goes beyond the enumeration of closed patterns: FCA considers knowledge discovery as the process of discovering the ordering structure of the data to analyse. It relies primarily on the Galois connection between the pattern language and the powerset of objects:

Proposition 1

Let (L, ≤) be a lattice called the pattern language, V be a set of objects and d: V→L be an operator that describes the object x as an element d(x) of L. Let ext(q) = {x ∈ V | q ≤ d(x)} be the subset of objects in which pattern q occurs. Then

  • \(\text {int}(V^{\prime })= \bigwedge _{x \in V^{\prime }} d(x)\) is the greatest element of L which occurs in all objects of V′

  • (int, ext) defines a Galois connection on (\(2^{V}\), L)
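Proposition 1 can be checked exhaustively on a small standard FCA context, where L is the powerset of an item set and the meet is set intersection. In the Python sketch below the context is hypothetical, and the intent map is named `intent` only to avoid shadowing Python's built-in `int`. The sketch verifies the Galois connection condition V′ ⊆ ext(q) ⇔ q ⊆ int(V′) and collects the closed patterns int∘ext(q).

```python
from functools import reduce
from itertools import chain, combinations

# Hypothetical toy context: each object is described by a set of binary attributes.
ITEMS = frozenset("abcd")
DESC = {1: frozenset("ab"), 2: frozenset("abc"), 3: frozenset("abd"), 4: frozenset("cd")}

def subsets(s):
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def ext(q):
    """Support set of pattern q: the objects whose description contains q."""
    return frozenset(x for x, d in DESC.items() if q <= d)

def intent(objs):
    """Meet (intersection) of the descriptions; the empty object set maps to the top element."""
    return reduce(lambda acc, x: acc & DESC[x], objs, ITEMS)

def is_galois_connection():
    # V' ⊆ ext(q)  if and only if  q ⊆ int(V')
    return all((objs <= ext(q)) == (q <= intent(objs))
               for objs in subsets(DESC) for q in subsets(ITEMS))

closed_patterns = {intent(ext(q)) for q in subsets(ITEMS)}
```

Here the 16 patterns collapse into 8 closed patterns, one per equivalence class of patterns sharing a support set.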

In what follows we will use interior operators to define the general framework of abstract concept lattices. First we recall a general result (Pernelle et al. 2002; Soldano and Ventos 2011) together with a corollary defining abstract concept lattices:

Proposition 2

Let X and L be two lattices, (int,ext) be a Galois connection on (X,L) and p be an interior operator on X. Let A=p[X] be the image of X under p, then (int,p∘ext) is a Galois connection on (A,L).

Corollary 1

i) f = int∘p∘ext is a closure operator on L;
ii) h = p∘ext∘int is a closure operator on A;
iii) the set of pairs (e, c) where c = f(c) = int(e) and e = h(e) = p∘ext(c) forms a lattice, ordered following A.

Such a pair (e, c) is called a concept: e is its (abstract) extent while c is its intent, i.e. the abstract closed pattern whose abstract support set p∘ext(c) is e. As the new equivalence relation is coarser, i.e. ext(q) = ext(q′) implies p∘ext(q) = p∘ext(q′), there are at most as many abstract closed patterns as closed patterns.

Abstract closed pattern mining is illustrated in Example 5 of Appendix 2.

Cores and closed pattern mining of attributed networks

Now, consider the object set as the vertex set V of some graph whose vertices are each labelled by a description in a pattern language. Defining the essential part of a graph, i.e. its core subgraph, relies on all vertices satisfying some boolean property. Let G = (V, E) be a graph. A core property P is defined as a mapping P: V × \(2^{V}\) → {true, false} where P(v, X) is true whenever vertex v satisfies some condition within the subgraph \(G_{X}\) induced by the vertex subset X. The core subgraph of a graph (V, E) is then defined as the subgraph \(G_{V^{\prime }}\) induced by the largest vertex subset V′, also called its core, whose vertices v all have property P(v, V′).

To define a core, we need P to be such that such a largest vertex subset with property P does exist. This is true whenever P is monotone, i.e. for any x ∈ X1 ⊆ V, P(x, X1) and X2 ⊇ X1 imply P(x, X2) (Batagelj and Zaversnik 2011; Soldano and Santini 2014). The following result then allows us to apply abstract FCA to graphs:

Proposition 3

The operator that reduces a vertex subset V′ of a graph G to the core of the subgraph \(G_{V^{\prime }}\) is an interior operator on \(2^{V}\).
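Proposition 3 suggests a direct way to compute the core of \(G_{V^{\prime }}\): start from V′ and repeatedly discard the vertices that fail P until nothing changes. Monotonicity of P guarantees that a vertex failing P with respect to the current set also fails with respect to any subset of it, so no vertex of the largest core is ever discarded. The Python sketch below uses the k-core property (degree at least k inside X); the adjacency-dict representation and the function names are assumptions made for illustration, and the usage line reuses the small undirected graph of Example 3.

```python
def core(X, P):
    """Largest subset S of X such that P(v, S) holds for every v in S.

    Assumes P is monotone: a vertex failing P for the current set (a superset
    of the final core) also fails for the core, so discarding it is safe."""
    S = frozenset(X)
    while True:
        kept = frozenset(v for v in S if P(v, S))
        if kept == S:
            return S
        S = kept

def undirected_adj(edges):
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return adj

def k_core_property(adj, k):
    """P(v, X): v has at least k neighbours inside X, i.e. degree >= k in G_X."""
    return lambda v, X: len(adj.get(v, set()) & X) >= k

adj = undirected_adj([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])
two_core = core({1, 2, 3, 4, 5}, k_core_property(adj, 2))  # the triangle {1, 2, 3}
```

Calling `core(X, P)` on any vertex subset X computes the core of the induced subgraph \(G_{X}\), which is exactly the interior operator of Proposition 3.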

As a result, abstract concept lattices together with closure operators are defined in such a way that each extent p∘ext(c) is a core while the associated intent c is the most specific pattern that occurs in this core. Abstract closed pattern mining has been applied to undirected networks (Soldano et al. 2017a) as well as directed networks (Soldano et al. 2017b).

Example 1

We consider the small attributed graph displayed in Fig. 1 and the 2-core property, which states that in a core subgraph all vertices have degree at least 2. The support set ext(a) of pattern a is then 123457. The 2-core of the subgraph induced by ext(a) is 123: when adding to 123 any vertex v among 457, the degree of v in \(G_{123v}\) is strictly less than 2. Therefore, p∘ext(a) = 123 is the core support set of a. The corresponding core closed pattern is then int(123) = ab, i.e. the greatest pattern common to the vertices of this 2-core.

Fig. 1

The pattern a 2-core subgraph of an attributed graph. The vertices 123457 of the pattern a subgraph are displayed in bold. The vertices 123 of its 2-core subgraph are colored in blue
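The computation of Example 1, namely ext, then the 2-core interior operator p, then int, can be sketched in a few lines of Python. Since the graph of Fig. 1 is not given in the text, the sketch uses the edge set of Example 3 with hypothetical vertex labels; only the shape of the computation f(q) = int∘p∘ext(q) is meant to carry over, and the function names are illustrative.

```python
from functools import reduce

# Hypothetical attributed graph: the edges of Example 3 with made-up labels
# (this is not the graph of Fig. 1).
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
LABELS = {1: frozenset("ab"), 2: frozenset("abc"), 3: frozenset("abd"),
          4: frozenset("ac"), 5: frozenset("a")}
ITEMS = frozenset("abcd")

ADJ = {v: set() for v in LABELS}
for u, w in EDGES:
    ADJ[u].add(w)
    ADJ[w].add(u)

def ext(q):
    """Vertices whose label contains pattern q."""
    return frozenset(v for v, d in LABELS.items() if q <= d)

def p_kcore(X, k=2):
    """Interior operator p: reduce X to the k-core of the induced subgraph G_X."""
    X = frozenset(X)
    while True:
        kept = frozenset(v for v in X if len(ADJ[v] & X) >= k)
        if kept == X:
            return X
        X = kept

def intent(X):
    """Intersection of the labels of X (all items when X is empty)."""
    return reduce(lambda acc, v: acc & LABELS[v], X, ITEMS)

def core_closed(q, k=2):
    """Return (core support set p∘ext(q), core closed pattern int∘p∘ext(q))."""
    support = p_kcore(ext(q), k)
    return support, intent(support)
```

With these labels, pattern a occurs in all five vertices, its 2-core support set is {1, 2, 3}, and its core closed pattern is ab; applying `core_closed` again to ab returns the same pair, as expected from a closure operator.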

Summary

We have briefly presented standard closed pattern mining and FCA together with abstract closed pattern mining, in which the support set ext(q) of a pattern q is reduced to its abstract support set p∘ext(q), where p is an interior operator. The abstract closed pattern c associated with q is then the most specific pattern with the same abstract support set. We then have c = int∘p∘ext(q), where the intersection operator int intersects the descriptions of the objects in p∘ext(q). We have then introduced core closed pattern mining, in which p reduces a vertex subset to the core of its induced subgraph. Any such core definition, including the well-known k-core, relies on a core property P such that P(v, S) holds for all vertices v of the core S. In order to be a core property, P is required to satisfy a monotonicity condition. Core closed pattern mining then consists in enumerating the set of core closed patterns in an attributed graph.

Bi-concept lattices and abstract closed bi-patterns

This section is motivated by the extension of core closed pattern mining to two-mode networks, i.e. networks in which each edge relates a vertex from a vertex set V1 to a vertex from a vertex set V2. The vertices may then be described in two different pattern languages L1 and L2. This requires extending the closed pattern mining and FCA methodology to patterns made of two components, which we call bi-patterns. A way to properly define such bi-patterns is to first extend the concept lattices of FCA. For that purpose, we need to consider lattice products, which will lead to a new Galois connection.

Lattice products are also lattices according to the so-called cartesian ordering:

Proposition 4

Let (X1,≤1,∨1,∧1) and (X2,≤2,∨2,∧2) be two lattices, and consider the cartesian product X = X1×X2 together with the binary relation ≤ defined as (x1,x2) ≤ (y1,y2) iff x1 ≤1 y1 and x2 ≤2 y2. Then (X,≤,∨,∧) is a lattice with join and meet defined as:

  • (x1,x2) ∨ (y1,y2) = (x1 ∨1 y1, x2 ∨2 y2)

  • (x1,x2) ∧ (y1,y2) = (x1 ∧1 y1, x2 ∧2 y2)
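For the case used throughout this article, where both factors are powersets, Proposition 4 boils down to componentwise set operations. The following Python sketch on pairs of frozensets is illustrative only; the `Pair` alias and function names are assumptions.

```python
from typing import FrozenSet, Tuple

Pair = Tuple[FrozenSet[str], FrozenSet[str]]

def leq(x: Pair, y: Pair) -> bool:
    """Cartesian order: x <= y iff each component of x is included in that of y."""
    return x[0] <= y[0] and x[1] <= y[1]

def join(x: Pair, y: Pair) -> Pair:
    """Componentwise join: union in each powerset."""
    return (x[0] | y[0], x[1] | y[1])

def meet(x: Pair, y: Pair) -> Pair:
    """Componentwise meet: intersection in each powerset."""
    return (x[0] & y[0], x[1] & y[1])

x = (frozenset("ab"), frozenset("w"))
y = (frozenset("a"), frozenset("wx"))
```

Here x and y are incomparable in the cartesian order, yet their join (ab, wx) and meet (a, w) are still well defined, which is what makes the product a lattice.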

We may then build a Galois connection on lattice products (see proof in Appendix 1):

Proposition 5

Let X = X1×X2 and L = L1×L2 be two lattice products, and let (int1,ext1) and (int2,ext2) be Galois connections on the respective lattice pairs (X1,L1) and (X2,L2). Consider the mappings int and ext on X and L such that:

  • int(x1,x2)=(int1(x1),int2(x2))

  • ext(l1,l2)=(ext1(l1),ext2(l2))

then (int, ext) defines a Galois connection on (X, L)
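Proposition 5 can be checked exhaustively on a tiny two-mode context in which each mode has its own item set. In the Python sketch below the context is hypothetical and the helper names are assumptions; `ext` and `intent` act componentwise, and the check compares X ≤ ext(q) with q ≤ int(X) in the cartesian order for every subset pair X and bi-pattern q.

```python
from functools import reduce
from itertools import chain, combinations

# Hypothetical two-mode context: V1 described over items I1, V2 over items I2.
I1, I2 = frozenset("ab"), frozenset("wx")
D1 = {"l1": frozenset("ab"), "l2": frozenset("a")}
D2 = {"r1": frozenset("wx"), "r2": frozenset("x")}

def subsets(s):
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def ext_1mode(q, desc):
    return frozenset(v for v, d in desc.items() if q <= d)

def int_1mode(objs, desc, items):
    return reduce(lambda acc, v: acc & desc[v], objs, items)

def ext(q):
    """ext(q1, q2) = (ext1(q1), ext2(q2))."""
    return ext_1mode(q[0], D1), ext_1mode(q[1], D2)

def intent(X):
    """int(X1, X2) = (int1(X1), int2(X2))."""
    return int_1mode(X[0], D1, I1), int_1mode(X[1], D2, I2)

def leq(x, y):
    """Cartesian (componentwise inclusion) order."""
    return x[0] <= y[0] and x[1] <= y[1]

def is_galois_connection():
    subset_pairs = [(A, B) for A in subsets(D1) for B in subsets(D2)]
    bi_patterns = [(A, B) for A in subsets(I1) for B in subsets(I2)]
    return all(leq(X, ext(q)) == leq(q, intent(X))
               for X in subset_pairs for q in bi_patterns)
```

Note that no interior operator is involved yet: without one, the two components do not interact, which is why an interior operator is introduced next.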

In what follows we consider two Galois connections as defined in Proposition 1 and use an interior operator to create the dependency between the two components of the extent which is necessary to represent cores of two-mode networks.

Proposition 2 states that applying an interior operator to a lattice involved in a Galois connection preserves the connection. The interior operator in the bi-concept case applies to a pair of object subsets, i.e. has domain \(X=2^{V_{1}}\times 2^{V_{2}}\):

Definition 2

Let (int,ext) be the Galois connection on (X,L) as defined in Proposition 5 and let p be an interior operator on X. Then, the lattice of the Galois connection (int,p∘ext) on (p[X],L) is called an abstract bi-concept lattice.

The intents of the bi-concepts defined this way are what we call abstract closed bi-patterns. In a similar way as in abstract closed (single) pattern mining, each such abstract closed bi-pattern c is the most specific bi-pattern with abstract support set pair p∘ext(c), where p is an interior operator. However, bi-pattern occurrences are gathered in object subset pairs, while closure and interior operators are self-maps on lattice products. Abstract closed bi-pattern mining is illustrated in Example 6 of Appendix 2.

In what follows we apply this methodology to attributed graphs and for that purpose we define such interior operators with respect to pairs of logical properties and use them to give a new definition of cores as vertex subset pairs.

Cores as subset pairs and core closed bi-pattern mining

In what follows we consider the subnetwork induced by a pair of vertex subsets (W1,W2). When W1 ⊆ V1 and W2 ⊆ V2 we simply write (W1,W2) ≤ (V1,V2) or call (W1,W2) a subset pair of (V1,V2). The following definition may be applied to a two-mode network (V1,V2,E) as well as to a single mode network by considering V = V1 = V2.

Definition 3

Let G = (V1,V2,E) be a network. The subnetwork induced by the subset pair (W1,W2) is the network \(G_{(W_{1},W_{2})}=(W_{1},W_{2},E^{\prime })\) where E′ is the subset of edges relating vertices from W1 to vertices from W2.

To obtain an interior operator, we need to define monotone properties in this context:

Definition 4

\(P_{1}: V_{1}\times 2^{V_{1}} \times 2^{V_{2}} \rightarrow \{true,false\}\) is said to be monotone if and only if, for any w ∈ V1 and any subset pairs (W1,W2) and \((W_{1}^{\prime },W_{2}^{\prime })\geq (W_{1},W_{2})\),

$$P_{1}(w,W_{1},W_{2})\ \text{implies}\ P_{1}\left(w,W_{1}^{\prime},W_{2}^{\prime}\right)$$

In the same way, P2 defined on \(V_{2}\times 2^{V_{1}} \times 2^{V_{2}}\) is monotone whenever, for any \(w \in V_{2}\) and any subset pairs \((W_{1}^{\prime },W_{2}^{\prime })\geq (W_{1},W_{2})\), \(P_{2}(w,W_{1},W_{2})\) implies \(P_{2}\left (w,W_{1}^{\prime },W_{2}^{\prime }\right)\).

Cores will then be defined thanks to the following result (see proof in Appendix 1):

Proposition 6

Let (P1,P2) be a pair of monotone properties, and (W1,W2) be a subset pair of (V1,V2). Then there exists a greatest subset pair (S1,S2)≤(W1,W2) such that P1(v1,S1,S2) holds for all elements v1 of S1 and P2(v2,S1,S2) holds for all elements v2 of S2.

We will further call this subset pair (S1,S2) the core subset pair of (W1,W2) and define core subgraphs accordingly:

Definition 5

Let G = (V1,V2,E) be a network, and (P1,P2) be a pair of monotone properties. The subnetwork \(G_{(S_{1},S_{2})}\) induced by the core subset pair (S1,S2) of (V1,V2) is called the core subnetwork of G.

We benefit then from a result similar to Proposition 3:

Proposition 7

The operator that reduces a subset pair (W1,W2) ≤ (V1,V2) to its core subset pair (S1,S2) is an interior operator on \(2^{V_{1}}\times 2^{V_{2}}\).

Summary

In the same way as in core closed pattern mining, given some bi-pattern q = (q1,q2) we may compute its core support set pair p∘ext(q), where p is an interior operator. This interior operator relies on a pair of core properties that are each required to satisfy a monotonicity condition. The associated core closed bi-pattern c = (c1,c2) is obtained by intersecting, componentwise, the vertex descriptions in p∘ext(q). Enumerating these core closed bi-patterns defines the bi-pattern mining task. In the next section we consider various core definitions to apply bi-pattern mining to two-mode, directed and undirected attributed graphs. Note that cores are here vertex subset pairs, which extends the previous (single) core notion referred to in “Cores and closed pattern mining of attributed networks” section.

Core definitions: two-Mode, directed and undirected networks

Two-mode network cores

According to these new definitions, we first define the h-a BHA-core of a two-mode network:

Definition 6

The h-a BHA-core of the two-mode network G is defined through the following pair of core properties:

  • P1(v,X1,X2) holds if and only if the degree of v ∈ X1 in \(G_{(X_{1}, X_{2})}\) is at least h.

  • P2(v,X1,X2) holds if and only if the degree of v ∈ X2 in \(G_{(X_{1}, X_{2})}\) is at least a.

P1 and P2 are clearly monotone and therefore the h-a BHA-core is properly defined. This core definition is equivalent to the definition presented by Cerinsek and Batagelj (2015) in which the p-q BHA-core is called the (p,q)-core. We provide hereunder an example of an attributed two-mode network together with the set of closed bi-patterns associated to its h-a BHA cores.

Example 2

We consider the two-mode network pictured on the leftmost part of Fig. 2. The two vertex sets are V1={l1,l2,l3} and V2={r1,r2,r3}. Vertices of V1 are labelled by subsets of I1={a,b,c,d} while vertices of V2 are labelled by subsets of I2={w,x,y,z}.

Fig. 2

The two 2-2 BHA-cores in bi-concepts of Example 2. The leftmost part displays the whole network. In the middle we have its 2-2 BHA-core associated to the closed bi-pattern (ab,wx). The rightmost part of the figure displays the 2-2 BHA-core associated to the other, more specific, bi-pattern (ab,wxy)

The most general bi-pattern (∅,∅) occurs in the whole network. Its 2-2 BHA-core is displayed in the middle of Fig. 2 and is induced by (l1l2,r1r2r3). The corresponding closed bi-pattern is then int(l1l2,r1r2r3) = (ab,wx). When adding attributes to this bi-pattern we obtain subnetworks whose 2-2 BHA-core is empty, except when adding y to wx. The corresponding bi-pattern (ab,wxy) occurs in (l1l2l3,r1r3), whose 2-2 BHA-core is displayed in the rightmost part of Fig. 2 and has vertex set pair (l1l2,r1r3). This bi-pattern is closed as no item can be added without losing some vertex. Furthermore, adding any item to (ab,wxy) results in an empty 2-2 BHA-core. The corresponding bi-concept lattice is therefore the total ordering of the 3 bi-concepts ((l1l2,r1r2r3),(ab,wx)), ((l1l2,r1r3),(ab,wxy)) and ((∅,∅),(abcd,wxyz)). See also in Fig. 4 the search tree developed for this example by the algorithm we propose in “Bi-pattern enumeration” section.
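Definition 6 plugs directly into a generic fixpoint computation: evaluate P1 and P2 on a snapshot (Z1, Z2), drop the failing vertices, and repeat until the pair is stable. The Python sketch below does this on a hypothetical two-mode network; it follows the snapshot-and-remove iterations traced in Example 3, but it is an illustration rather than the authors' Interior implementation, and all names are assumptions.

```python
# Hypothetical two-mode network (not the network of Fig. 2):
# V1 = {l1, l2, l3}, V2 = {r1, r2, r3}.
EDGES = [("l1", "r1"), ("l1", "r2"), ("l1", "r3"),
         ("l2", "r1"), ("l2", "r2"), ("l2", "r3"), ("l3", "r1")]

def adjacency(edges):
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return adj

def interior(X1, X2, P1, P2):
    """Greatest (S1, S2) <= (X1, X2) with P1 true on S1 and P2 true on S2 (P1, P2 monotone)."""
    S1, S2 = frozenset(X1), frozenset(X2)
    while True:
        Z1, Z2 = S1, S2  # both properties are evaluated on this snapshot
        S1 = frozenset(v for v in Z1 if P1(v, Z1, Z2))
        S2 = frozenset(v for v in Z2 if P2(v, Z1, Z2))
        if (S1, S2) == (Z1, Z2):
            return S1, S2

def bha_core(adj, X1, X2, h, a):
    """h-a BHA-core of a two-mode network (Definition 6)."""
    def P1(v, Y1, Y2):  # degree of v in G_(Y1,Y2): its neighbours lying in Y2
        return len(adj.get(v, set()) & Y2) >= h
    def P2(v, Y1, Y2):  # degree of v in G_(Y1,Y2): its neighbours lying in Y1
        return len(adj.get(v, set()) & Y1) >= a
    return interior(X1, X2, P1, P2)

adj = adjacency(EDGES)
V1, V2 = {"l1", "l2", "l3"}, {"r1", "r2", "r3"}
```

On this toy network the 2-2 BHA-core drops l3 (a single neighbour) and keeps (l1l2, r1r2r3), while the 2-3 BHA-core is empty because only r1 initially has three neighbours.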

Now, let G = (V,E) be a single mode network: we may still consider the subgraph induced by a pair of vertex subsets according to Definition 3. This leads to core definitions for undirected and directed networks in which vertices may have two roles.

Directed network cores : the hub and authority roles

In the directed case we reconsider the property pair of Definition 6 as a property pair on directed networks and obtain a BHA-core definition that extends the hub-authority core defined in Soldano et al. (2017b). We begin with a definition of the h-a BHA-core for directed networks:

Definition 7

The h-a BHA-core of the directed network G is defined through the following pair of core properties:

  • P1(v,X1,X2) holds if and only if the outdegree of v ∈ X1 in \(G_{(X_{1}, X_{2})}\) is at least h.

  • P2(v,X1,X2) holds if and only if the indegree of v ∈ X2 in \(G_{(X_{1}, X_{2})}\) is at least a.
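Definition 7 differs from the two-mode case only in the adjacency that is counted: a hub counts its out-neighbours in X2 and an authority counts its in-neighbours in X1. The minimal Python sketch below starts from (V, V) and uses hypothetical directed graphs given as (u, w) pairs meaning u → w; the function name is an assumption. Since both components start from the same vertex set, a vertex may end up in both of them.

```python
def directed_bha_core(edges, h, a):
    """h-a BHA-core (hubs, authorities) of a directed graph, starting from (V, V)."""
    out_adj, in_adj, vertices = {}, {}, set()
    for u, w in edges:  # edge u -> w
        out_adj.setdefault(u, set()).add(w)
        in_adj.setdefault(w, set()).add(u)
        vertices |= {u, w}
    S1, S2 = frozenset(vertices), frozenset(vertices)
    while True:
        Z1, Z2 = S1, S2
        # P1: outdegree of v within G_(Z1,Z2), i.e. out-neighbours lying in Z2
        S1 = frozenset(v for v in Z1 if len(out_adj.get(v, set()) & Z2) >= h)
        # P2: indegree of v within G_(Z1,Z2), i.e. in-neighbours lying in Z1
        S2 = frozenset(v for v in Z2 if len(in_adj.get(v, set()) & Z1) >= a)
        if (S1, S2) == (Z1, Z2):
            return S1, S2
```

For instance, with edges 1→3, 1→4, 2→3, 2→4 and 5→3, the 2-2 BHA-core has hubs {1, 2} and authorities {3, 4}; on a complete directed triangle every vertex is both a hub and an authority.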

The BHA-core of directed networks extends the hub authority (HA) core definition:

Proposition 8

Let G = (V,E) be a directed network, let \((S_{H},S_{A})\) be its h-a BHA-core, and let H ∪ A be its h-a HA-core, where H and A are its hub and authority vertex subsets. Let then \(p_{\text{BHA}}\) and \(p_{\text{HA}}\) be the core operators respectively associated with the BHA-core and the HA-core. We then have:

$$(S_{H}, S_{A}) = (H, A) \tag{1}$$

$$\bigcup p_{\text{BHA}}(X,X) = p_{\text{HA}}(X) \tag{2}$$

for any vertex subset X, where ⋃ applied to a subset pair denotes the union of its two components.

Undirected network cores : the star and satellite roles

The previous section showed that bi-pattern mining can be applied to directed networks as long as each bi-pattern component is associated with one of the out and in roles of the vertices. In what follows we extend the k-near-star core, defined on undirected networks (Soldano and Santini 2014), by exploiting the two roles it relies on. This new core is called the k-StSa core, referring to the “Star” and “Satellite” roles: a star vertex is required to have degree at least k while its neighbours have the satellite role. The k-StSa core subgraph is then the subgraph induced by its Star and Satellite vertex subsets as defined below:

Definition 8

The k StSa-core of the undirected network G is defined through the following pair of core properties:

  • P1(v,X1,X2) holds if and only if the degree of v ∈ X1 in \(G_{(X_{1}, X_{2})}\) is at least k.

  • P2(v,X1,X2) holds if and only if there exists some edge xv with x ∈ X1 such that P1(x,X1,X2) holds.

In the corresponding core subset pair (St,Sa), St is called the star vertex subset and Sa the satellite vertex subset.

Star-satellite bi-pattern mining will be exemplified on an undirected bibliographical network in “Star-Satellite bi-patterns in a bibliographical network” section.

Computing the interior of (X1,X2) and enumerating abstract closed bi-patterns

Computing interiors

We now present the generic algorithm Interior that computes the interior p(X1,X2) = (S1,S2) associated with the pair of monotone properties (P1,P2). In the bipartite case, i.e. when V1 ∩ V2 = ∅, the algorithm is basically a rewriting of the algorithm proposed in Cerinsek and Batagelj (2015). When considering X1 = X2, Interior is similar to the algorithm proposed in Soldano et al. (2017b) to compute the directed HA-core. Let n be the number of vertices and m be the number of edges: the algorithm performs at most n iterations, while the inner loop needs \(\mathcal {O}(m)\) operations as long as evaluating the properties only requires accessing the neighbourhood of each vertex. The overall complexity is then \(\mathcal {O}(m*n)\). A more efficient algorithm in \(\mathcal {O}(m * \max (\Delta, \log n))\), where Δ is the highest degree within the graph, is obtained by adapting the variant cited in Batagelj and Zaversnik (2011), which uses two heaps as data structures, one for the vertex subset associated with each mode.
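When the two properties are plain degree thresholds, as in the h-a BHA-core, the repeated full scans can be avoided with the usual k-core peeling trick: keep, for each vertex and role, a counter of its neighbours in the other component, and propagate removals through a queue. The Python sketch below runs in expected O(n + m) time on an undirected (symmetric) adjacency dict. It is a swapped-in technique, not the two-heap variant of Batagelj and Zaversnik (2011) cited above, and it covers neither directed graphs nor the existential satellite property of StSa cores; the function name is an assumption.

```python
from collections import deque

def bha_core_queue(adj, X1, X2, h, a):
    """h-a BHA-core of (X1, X2) by queue-based peeling.

    adj[v] is the (symmetric) neighbour set of v. Vertices are tagged by role
    (1 or 2), so X1 and X2 may overlap, as in the single mode case."""
    alive = {1: set(X1), 2: set(X2)}
    need = {1: h, 2: a}
    # deg[(r, v)]: number of neighbours of v lying in the other component
    deg = {(r, v): len(adj.get(v, set()) & alive[3 - r])
           for r in (1, 2) for v in alive[r]}
    queue = deque(node for node, d in deg.items() if d < need[node[0]])
    queued = set(queue)
    while queue:
        r, v = queue.popleft()
        alive[r].discard(v)
        other = 3 - r
        for u in adj.get(v, ()):
            node = (other, u)
            if u in alive[other] and node not in queued:
                deg[node] -= 1  # v no longer counts as a neighbour of u
                if deg[node] < need[other]:
                    queue.append(node)
                    queued.add(node)
    return frozenset(alive[1]), frozenset(alive[2])
```

Each (role, vertex) pair is queued at most once and each adjacency list is scanned once per role, which gives the linear bound; the result is the same greatest pair as the snapshot loop, since only vertices that cannot belong to the core are ever removed.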

The following example illustrates how Interior computes the St-Sa core of an undirected network:

Example 3

Let G = (V,E) be an undirected graph with V = 12345 and E = {12,13,23,34,45}. We consider its 3-StSa core. Execution of Interior(V,V) starts with S1 = S2 = 12345 and results in the following iterations:

  1. Z1 = 12345 and Z2 = 12345; vertices 1245 are then removed from S1 as their degree in G is less than 3, while 3 and 5 are removed from S2 as in G there is neither an edge x3 nor an edge x5 such that the degree of x is at least 3.

  2. Z1 = 3 and Z2 = 124, and no vertex is removed from S1 = Z1 as the degree of vertex 3 in \(G_{(3,124)}\) is still 3. In the same way no vertex is removed from S2 = Z2 as the undirected edges 31, 32, 34 are in \(G_{(3,124)}\). As a result Z1 = S1 and Z2 = S2 and the iterations stop.

We note in this example that i) only the first iteration removes vertices, the second one only checking that the pair is stable, which is always the case when computing k-StSa cores, and ii) St = 3 and Sa = 124 are disjoint, but this is not necessarily the case, as happens for instance when adding edges 46 and 47 to G. In the new graph we obtain St = 34 and Sa = 1234567, as 3 and 4 are both stars and neighbours of each other (Fig. 3).

Fig. 3

The 3-StSa cores of the two graphs from Example 3. On the left, the first graph followed by its 3-StSa core subgraph, whose edges (in plain lines) relate stars (in blue) to satellites (in red). On the right, the second graph followed by its 3-StSa core subgraph
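Definition 8 and the iterations of Example 3 can be replayed with a generic snapshot loop. In the Python sketch below, `interior` evaluates both properties on the snapshot (Z1, Z2), as in the trace of Example 3, and `stsa_core` builds the star and satellite properties from an adjacency dict; both names are illustrative assumptions. The usage lines recompute the two graphs of Example 3.

```python
def interior(X1, X2, P1, P2):
    """Greatest (S1, S2) <= (X1, X2) with P1 true on S1 and P2 true on S2."""
    S1, S2 = frozenset(X1), frozenset(X2)
    while True:
        Z1, Z2 = S1, S2  # snapshot used to evaluate both properties
        S1 = frozenset(v for v in Z1 if P1(v, Z1, Z2))
        S2 = frozenset(v for v in Z2 if P2(v, Z1, Z2))
        if (S1, S2) == (Z1, Z2):
            return S1, S2

def stsa_core(edges, k):
    """k-StSa core (stars, satellites) of an undirected graph, starting from (V, V)."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    def P1(v, X1, X2):  # star: at least k neighbours in X2
        return len(adj[v] & X2) >= k
    def P2(v, X1, X2):  # satellite: adjacent to some star x of X1
        return any(P1(x, X1, X2) for x in adj[v] & X1)
    V = frozenset(adj)
    return interior(V, V, P1, P2)

E1 = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
E2 = E1 + [(4, 6), (4, 7)]
```

`stsa_core(E1, 3)` gives stars {3} and satellites {1, 2, 4}, and `stsa_core(E2, 3)` gives stars {3, 4} and satellites {1, ..., 7}, matching Example 3.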

Bi-pattern enumeration

We now focus on abstract closed bi-pattern enumeration; building the bi-concept lattice is then a separate post-processing step. The enumeration follows the same process as abstract closed pattern enumeration, i.e. the efficient divide and conquer scheme described in Boley et al. (2010) as implemented in the minerLC software. The adaptation is straightforward: the closure operator is now \(f_{A}\) = int∘p∘ext, where p is the interior operator as defined above. To enumerate abstract closed bi-patterns, we specialize each abstract closed bi-pattern (q1,q2) by adding either an element of I1 to q1 or an element of I2 to q2.

The algorithm bi-patterns relies on the following notations:

Let q = (q1,q2) be a bi-pattern. i) add(i,q) returns either (q1 ∪ {i}, q2) when i ∈ I1 or (q1, q2 ∪ {i}) when i ∈ I2; ii) minus(I,q) returns the set of items which belong neither to the left part nor to the right part of the bi-pattern q = (q1,q2), i.e. minus(I,q) = (I1 ∖ q1) ∪ (I2 ∖ q2); iii) the exclusion pair list EL is a subset pair of (I1,I2).
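The two helpers add and minus can be sketched in Python as follows. Because in the directed and undirected cases I1 and I2 are the same attribute set, the sketch assumes that each item is tagged with its side, i.e. i = (1, x) or (2, x); this tagging is an illustrative choice, not necessarily how minerLC encodes items.

```python
def add(i, q):
    """add(i, q): put the tagged item i = (side, x) into component `side` of q = (q1, q2)."""
    side, x = i
    q1, q2 = q
    return (q1 | {x}, q2) if side == 1 else (q1, q2 | {x})

def minus(I, q):
    """minus(I, q): tagged items of I = (I1, I2) absent from the matching component of q."""
    (I1, I2), (q1, q2) = I, q
    return {(1, x) for x in I1 - q1} | {(2, y) for y in I2 - q2}

I = (frozenset("abcd"), frozenset("wxyz"))
q = (frozenset("ab"), frozenset("wx"))
```

With the item sets of Example 2 and q = (ab, wx), `minus(I, q)` yields the four tagged items c, d (left) and y, z (right), and `add((2, "y"), q)` yields (ab, wxy).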

Example 4

We follow on from Example 2 and consider s = 1 as the minimum support. The algorithm starts by computing the 2-2 BHA-core Gc of the whole graph G. G and Gc are displayed respectively on the left and in the middle of Fig. 2. Function enum is then called with the core closed bi-pattern q = int(vs(Gc)) = int(l1l2,r1r2r3) = (ab,wx); it first outputs the pair ((ab,wx),(l1l2,r1r2r3)) and then adds to q in turn each item in minus(I,q) = (cd,yz):

  • add(c,q)=(abc,wx) selects a subgraph whose core is empty. As a result the branch is pruned, as smaller subgraphs would also result in an empty core.

  • add(d,q)=(abd,wx) also selects a subgraph whose core is empty.

  • add(y,q)=(ab,wxy) selects (l1l2l3,r1r3), whose core, displayed on the right of Fig. 2, has vertex set (l1l2,r1r3). The core closed bi-pattern qx=(ab,wxy) is computed and, as it has an empty intersection with the (empty) exclusion list EL, enum is called recursively. This call outputs the pair (qx,(l1l2,r1r3)), but there are no deeper recursive calls, as 2-2 HA structures with strictly fewer than four nodes are excluded. EL is then set to {y} prior to the next iteration.

  • add(z,q)=(ab,wxz) selects a subgraph whose core is empty.

When enum returns, bi-patterns terminates as well. The two closed bi-patterns that have been output are the most specific bi-patterns occurring respectively in the 2-2 BHA-cores displayed in the middle and on the right of Fig. 2. The search tree is shown in Fig. 4.

Fig. 4

The search tree developed by minerLC during the bi-pattern enumeration of Example 4. Each box shows, on its first line, a bi-pattern q together with its support set pair e in the current vertex set and, on its second line, the core p(e) of the subgraph induced by this support set pair, preceded by the associated abstract closed bi-pattern int∘p(e). In the top box q is the empty bi-pattern, and its successors are obtained by adding an item to the abstract bi-pattern of the top box. In all leaves the core p(e) is empty.

Experiments

The first experiment concerns an original two-mode network, the second a well-known directed social network available on the minerLC web page, and the third an attributed undirected bibliographical network. The implementation, part of the minerLC suite, relies on a pre-processing step that transforms the original network into a new network. Closed bi-patterns are then represented as single patterns whose items are prefixed by a role. Note that this section contains no comparison with other programs or methods since, to the best of our knowledge, the task of bi-pattern mining is new. However, regarding the second dataset, we display a single pattern core subgraph obtained in previous work, together with a bi-pattern core subgraph sharing some nodes with the former.
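One simple way to represent a bi-pattern as a single pattern is to prefix each item with the role of the component it belongs to. The Python sketch below assumes string items and hypothetical role prefixes `H:` and `A:`; the exact encoding used in minerLC is not described here, so the prefix format and the helper names `encode` and `decode` are illustrative only.

```python
ROLES = ("H", "A")  # hypothetical role prefixes (e.g. hub / authority)


def encode(q1, q2, roles=ROLES):
    """Bi-pattern (q1, q2) -> one itemset whose items carry a role prefix."""
    return (frozenset(f"{roles[0]}:{i}" for i in q1)
            | frozenset(f"{roles[1]}:{i}" for i in q2))


def decode(items, roles=ROLES):
    """Inverse of encode: split prefixed items back into the two components."""
    parts = ([], [])
    for item in items:
        role, _, original = item.partition(":")
        parts[roles.index(role)].append(original)
    return frozenset(parts[0]), frozenset(parts[1])


q = encode({"Boston", "Litigation"}, {"Boston"})
print(sorted(q))  # ['A:Boston', 'H:Boston', 'H:Litigation']
print(decode(q))
```

Prefixing keeps an item that occurs in both components, here an office location, as two distinct items, so the single-pattern machinery never merges the two roles.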

h-a BHA bi-patterns in a two-mode network of epistemological data

We are currently investigating a two-mode network built from data related to an MNHN-IRD program (called MUSORSTOM, then Tropical Deep-Sea Benthos) of expeditions exploring the deep sea in the Indo-West Pacific region since 1976 (Bary 2018). In this network, 596 edges relate 74 campaigns (V1) to 268 participants (V2). Campaigns are described by their date and location, the type of fishing gear (dredge, trawl), the objectives of the campaign and the species described during the campaign. Participants are described by the location of the institution they belong to, their scientific domain and bibliometric attributes. In particular, we searched for bi-concepts associated with 3-4 HA cores (subnetworks in which each participant took part in at least 3 campaigns and each campaign had at least 4 participants). As an illustration, Fig. 5 displays the respective 3-4 BHA-cores S=(S1,S2) and \(S^{\prime }=\left (S^{\prime }_{1},S^{\prime }_{2}\right)\) of two bi-patterns q and \(q^{\prime }\). The corresponding core subgraphs contain respectively |S1|+|S2|=80 vertices and \(|S^{\prime }_{1}|+|S^{\prime }_{2}|=76\) vertices. Vertices are displayed at their original position in the whole network, according to a standard force-directed drawing (Kobourov 2013). The differences between the extents lie mainly in the left part of the network, i.e. the part corresponding to campaigns before 2000, which means that they concern campaigns and participants that are strongly related within the original network.

Fig. 5

Two 3-4 HA bi-concept extents from the experiments on the Participant-Campaign two-mode network. On the left, the first bi-pattern selects campaigns (prefixed with ’c’ and red-colored) whose main objective is the faunistic inventory, while on the right the second bi-pattern selects campaigns that satisfy various constraints, in particular about the species described during the campaign.

h-a BHA bi-patterns in a Lawyer Advice directed network

This dataset concerns a network study of a corporate law partnership that was carried out from 1988 to 1991 in New England (Lazega 2001). It concerns 71 attorneys (partners and associates). Vertices 1 to 36 represent partners, while vertices 37 to 71 represent associates, i.e. attorneys with a lower position in the firm. In the Advice network (Footnote 3), each attorney is described by various attributes, and each of the 892 directed edges xy relates an attorney x to an attorney y whom x goes to for basic professional advice. This network was investigated in Soldano et al. (2017b) by applying the abstract closed pattern methodology with the HA-core definition. We use here the same attributed network as found on the minerLC web page (see above).

There may be many bi-patterns when considering a single mode network, as their number may grow quadratically with the number of single patterns in the same network. We focus on bi-patterns associated with cores that are unlikely to appear as cores of single patterns; in this way, bi-pattern analysis is complementary to single pattern analysis. For that purpose we define the homogeneity of a bi-pattern as the Jaccard similarity of the support sets of its two components. Homogeneity is then 1 when q1=q2, and 0 when q1 and q2 never both occur in the same vertex. We then select bi-patterns with low homogeneity.

Definition 9

(Homogeneity of a bi-pattern q=(q1,q2))

$$h(q)=\frac{|\text{ext}_{1}(q_{1}) \cap \text{ext}_{2}(q_{2}) | }{ | \text{ext}_{1}(q_{1}) \cup \text{ext}_{2}(q_{2}) |}$$
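Definition 9 translates directly into a Jaccard index on the two support sets. In the Python sketch below the vertex attributes are hypothetical (they are not the Lazega data) and only illustrate how the supports of q1={25<Age≤50, Seniority≤25} and q2={30<Age≤65, 5<Seniority} are computed; returning 0 when both supports are empty is our own convention, since h is undefined in that case.

```python
def homogeneity(ext1, ext2):
    """Definition 9: Jaccard similarity of the two component support sets."""
    union = ext1 | ext2
    if not union:
        return 0.0  # our convention: h is undefined when both supports are empty
    return len(ext1 & ext2) / len(union)


# Hypothetical attorneys: id -> (age, seniority in years); NOT the real dataset.
people = {1: (28, 2), 2: (35, 8), 3: (45, 15), 4: (55, 20), 5: (62, 30)}
ext_q1 = {v for v, (age, sen) in people.items() if 25 < age <= 50 and sen <= 25}
ext_q2 = {v for v, (age, sen) in people.items() if 30 < age <= 65 and sen > 5}
print(sorted(ext_q1), sorted(ext_q2), homogeneity(ext_q1, ext_q2))
# [1, 2, 3] [2, 3, 4, 5] 0.4
```

Here two of the five vertices satisfy both components, giving h=2/5; identical supports would give 1 and disjoint supports 0.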

We apply our bi-pattern methodology using the 9-9 BHA-core, which coincides with the 9-9 HA-core as long as the input vertex subsets are equal, W1=W2=W (see Proposition 8). As an example, we consider the following closed bi-pattern q=(q1,q2), where

q1={ 25<Age≤50, Seniority ≤ 25} and

q2={ 30<Age≤65,5<Seniority}.

This bi-pattern is the abstract closed bi-pattern with the lowest homogeneity among the 82 abstract closed bi-patterns. It represents a group of young lawyers seeking advice from older lawyers who have been in the firm for more than five years. We observe that 68 of the 71 vertices of the whole advice network satisfy what is common to q1 and q2, i.e. the pattern {25<Age≤65}. Only 24 of these 68 vertices satisfy both q1 and q2, and the homogeneity of q is h(q)=0.368. The 9-9 BHA-core subgraph of q is displayed in Fig. 6. It is made of 33 vertices, 13 of which belong to both the H and A vertex subsets. Note that the 9-9 HA-core associated with the single abstract closed pattern {25<Age≤65} is much larger: it contains 50 vertices with |H∩A|=23, and it is also the 9-9 HA-core of the whole graph.

Fig. 6

The 9-9 BHA-core of the lawyers advice subnetwork associated with the bi-pattern ({25<Age≤50, Seniority≤25}, {30<Age≤65, 5<Seniority}). Vertices 1 to 36 are partners, the others are associates. Vertices that are both red and blue have both the hub and the authority roles.

We also experimented with a weaker 6-6 BHA-core abstraction, which resulted in 32,010 abstract closed bi-patterns, 262 of which have homogeneity below 0.1. In particular, there were 7 bi-patterns with null homogeneity, one of which represents lawyers from Boston whose law domain is litigation. In this bi-pattern, 7 associate lawyers aged between 26 and 45 with no more than 5 years of seniority go for advice to 7 older lawyers (both partners and associates) aged between 31 and 60 with more than 5 years of seniority. The associated core subgraph is displayed on the right part of Fig. 7. This bi-pattern reflects the composition and cohesion of one of the relatively stable teams of lawyers on the litigation side in this Boston office. It shows the very special proximity in this team between, on the one hand, Partners 13, 21, 24 and 26 as well as senior Associates 38, 39 and 40 (in red) and, on the other hand, the more junior Associates (in blue) who seek advice from the former. A single-pattern 4-4 HA-core, previously discussed in Soldano et al. (2017b), is displayed on the left of Fig. 7; it identifies an even stronger tie between these Partners and senior Associate 40 who, in 1991, was sought out for advice by the Partners themselves, in breach of the unspoken status rule related to advice seeking (’You do not seek advice from others lower in the social pecking order’). In [13, page 107], blockmodelling clustered Associates 38 and 40 in these Partners’ position (Position One) as structurally equivalent to them, an exceptional status heterogeneity. A year later, still exceptionally, Associate 40 (male) was made partner. More senior Associates 38 and 39 (both female) had to wait longer (Associate 38 made it to partnership two years later). Based on the up-or-out rule, Associate 39 (who was not part of Position One to begin with) had to leave the firm. Inspecting this pattern and this bi-pattern thus captures a very real process.

Fig. 7

Two related single pattern and bi-pattern core subgraphs of the lawyers advice network. On the left, the 4-4 HA-core subgraph associated with the single pattern { 30<Age≤50, 5<Seniority≤20, Gender-Man, Office-Boston}. On the right, the 6-6 BHA-core subgraph associated with the bi-pattern ({25 <Age≤45, Office-Boston, Litigation, Seniority ≤ 5, Associate}, {30 <Age≤60, Office-Boston, Litigation, 5 <Seniority≤25})

Finally, we conducted experiments with 4-4 BHA-cores, which resulted in 293,490 bi-patterns, found in a few minutes (Footnote 4), to be compared with the 930 single patterns observed in Soldano et al. (2017b).

Star-Satellite bi-patterns in a bibliographical network

We also investigated the co-authoring network DBLP.E extracted from the DBLP database. DBLP.E is part of a family of networks used in various experiments on graph mining (Galbrun et al. 2014). To build the vertex descriptions, the terms in the titles of each author’s articles were first gathered and stemmed. Stop-words as well as terms that occur with more than 60% of the authors were then removed. Finally, each researcher is labelled by the terms whose occurrence count is higher than one percent of the total volume of terms for that researcher. The network is the radius-2 ego-network of co-authors of George Karypis and has 721 authors connected by 1427 undirected co-authoring links. The maximum vertex degree is 68 and the average vertex degree is 3.95. Each vertex is described by a subset of labels among a set of 2782 labels, and the average vertex description size is 23.9. We experimented with bi-pattern mining using 20-Star-Satellite cores. The core of the whole network is made of 17 stars among a total of 589 nodes in the core. Fig. 8 displays this core subgraph, in which blue nodes represent stars and red nodes represent satellites. Note that all blue nodes are also red nodes. This means that any star, i.e. an author with at least 20 co-authors, is also a satellite of, i.e. is connected to, another star.
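The labelling procedure can be sketched as follows. This Python sketch assumes that title terms are already stemmed and that the stop-word list is given; it also assumes that the one-percent threshold is measured against each author's term volume after stop-words and overly common terms have been removed, a point the description above leaves open. The function name `build_labels` and the toy input are hypothetical.

```python
from collections import Counter


def build_labels(author_terms, stop_words, max_author_share=0.6, min_term_share=0.01):
    """author_terms: author -> list of already-stemmed title terms (with repeats)."""
    n_authors = len(author_terms)
    # Number of distinct authors using each term.
    authors_per_term = Counter(t for terms in author_terms.values() for t in set(terms))
    too_common = {t for t, c in authors_per_term.items() if c / n_authors > max_author_share}
    labels = {}
    for author, terms in author_terms.items():
        kept = Counter(t for t in terms if t not in stop_words and t not in too_common)
        volume = sum(kept.values())  # ASSUMPTION: volume measured after filtering
        labels[author] = {t for t, c in kept.items() if c > min_term_share * volume}
    return labels


toy = {"A": ["graph", "graph", "mine", "data", "the"], "B": ["data", "cluster"],
       "C": ["data", "graph"], "D": ["data", "network"], "E": ["data", "model", "model"]}
print(sorted(build_labels(toy, stop_words={"the"})["A"]))           # ['graph', 'mine']
print(build_labels(toy, stop_words={"the"}, min_term_share=0.4)["A"])  # {'graph'}
```

In the toy input "data" is used by all five authors and is therefore dropped everywhere; raising the per-author threshold to 40% also removes "mine", which accounts for only a third of author A's remaining terms.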

Fig. 8

The 20-Star-Sat core subgraph of the DBLP.E co-authoring Network. The 2D coordinates are computed on the whole DBLP.E network using a spring-electric display. Nodes in red are satellites surrounding 17 stars (in blue). Dashed edges relate satellites

We obtained 214 bi-patterns, among which we found in particular bi-patterns representing single stars with all their satellites. Most of these bi-patterns have the form (d(s),∅), where d(s) is the description of the star s, and the satellites have no common label. When considering homogeneity as defined above, these single-star bi-patterns have low homogeneity. We also found single-star bi-patterns with null homogeneity, meaning that the co-authors of this single star in the core subgraph have at least one common label that they do not share with the star. Fig. 9 displays two such bi-patterns sharing the same single star.

Fig. 9

Two bi-patterns with null homogeneity sharing the same single star. The star represents Vipin Kumar, who is labelled d={algorithm,analysy,assocy,data,graph,mine,parallel,partit,pattern,scalabl,search}. The bi-pattern q=(d, {Model}) is displayed on the left, while the bi-pattern \(q^{\prime }\)=(d, {Network}) is displayed on the right. Some co-authors belong to both bi-patterns, i.e. have both labels, while Vipin Kumar has neither of them. See Table 3 for the authors associated with these vertices.

With low homogeneity we also have a bi-pattern representing a pair of co-authors, namely Jianyong Wang and Lizhu Zhou, who are both stars and satellites (since an edge relates them). Such a bi-pattern represents a close cooperation between two senior researchers. Conversely, we have a bi-pattern with two unconnected stars, namely Mohammed J. Zaki and Jianyong Wang, who share the labels {cluster,data,databas,efficy,frequ,graph,mine,pattern} but no satellites, thus suggesting some competition on close subjects. The corresponding core subgraphs are displayed in Fig. 10.

Fig. 10

Two bi-patterns q=(q1,∅) and \(q^{\prime }=(q^{\prime }_{1}, \emptyset)\) with homogeneity less than 0.003. On the left, the core subgraph represents the two cooperating stars Jianyong Wang and Lizhu Zhou, who share the labels q1={data,databas,efficy,graph,keyword,mine,query,search,web,xml}. The core subgraph on the right represents the two stars Jianyong Wang and Mohammed Zaki sharing the labels \(q^{\prime }_{1}=\){cluster,data,efficy,frequ,graph,mine,pattern,query} in a competitive configuration. See Table 4 for the authors associated with these vertices.

Scalability

First note that we did not use any constraint on the core size, i.e. we considered s=1 as the minimum size threshold. This is a rather general situation: the topological constraint associated with the core definition allows a better exploration of pattern occurrences, since strengthening the constraint, i.e. increasing h or a, decreases the number of closed patterns, therefore allowing us to find infrequent patterns. Now, the first two networks in our experiments are rather small and dense networks whose vertices have a detailed description. Scalability of the enumeration depends on the cost of core computation as well as on the number of bi-patterns to output. Core computation is efficient as long as the logical property P only depends on the neighbours of the considered vertex (Batagelj and Zaversnik 2011), and has been performed on very large networks. Regarding closed pattern enumeration, our algorithm is based on an efficient top-down general algorithm (Boley et al. 2010), and the implementation uses data reduction techniques borrowed from Negrevergne et al. (2013). However, as mentioned above, scalability depends on the number of bi-patterns to generate. This number depends on the size of the pattern language, and bi-pattern mining involves a pattern space whose size is the product of the sizes of the single pattern spaces. Note that although the vertices of the undirected network DBLP.E are described in a large language, each vertex is described by a small number of terms. As a consequence, the number of bi-patterns with different cores is limited and the enumeration stops after a few minutes (namely 470 s). We have yet to experiment with bi-pattern mining on large attributed networks with hundreds of thousands of nodes and edges. The DBLP.E case shows that, provided the core definitions are strong enough, a large network may be investigated in reasonable time.
In the general case there may be a large number of bi-patterns to investigate (see, for instance, the 4-4 BHA experiments at the end of the “h-a BHA bi-patterns in a Lawyer Advice directed network” section). Considering only bi-patterns with low homogeneity, as a post-processing step, reduces the number of patterns to examine, while selecting unexpected patterns by adapting the method of Soldano et al. (2017b) should also be efficient. Finally, in order to present a limited number of interesting patterns to domain experts, we still need some way, such as the Minimum Description Length pattern selection scheme (see for instance Spyropoulou et al. (2014)), to sample among bi-patterns associated with similar cores.

Summary

Table 1 summarizes the various two-component cores used in the bi-pattern mining problems we have investigated. The definitions are very close but concern different types of networks. More core definitions are obviously possible, as long as the pair of properties is monotone, as defined in Definition 4. For the sake of simplicity, the BHA and StSa cores have been defined using subgraphs induced by vertex subset pairs, according to Definition 3. However, this restriction is not mandatory and could preclude some interesting core definitions. For instance, core definitions designed to constrain some core-periphery structure should also take into account edges relating nodes within one of the two vertex subsets.

Table 1 Two-component core definitions for various kinds of networks

Conclusion

In this article we have extended the core closed pattern methodology in order to address two-mode attributed networks. To the best of our knowledge, there was no methodology for such networks to extract subnetworks according to constraints on both topology and attributes. For that purpose, we first extended the core notion: a core subgraph is now induced by a pair of vertex subsets, and in each vertex subset the nodes have to satisfy an associated topological property. We may then start from any vertex subset pair and reduce it to its core, according to this new definition. We then defined a bi-pattern as a pattern pair, each component of which selects a vertex subset. This leads to core closed bi-pattern mining, a new and natural way to investigate attributed two-mode networks: each component of a bi-pattern selects the nodes of one mode. We have also provided efficient algorithms to extract cores and enumerate core closed bi-patterns.

Closed bi-pattern mining as defined here may also be applied to single mode networks, by considering nodes separately according to two different roles. In directed networks, we may then straightforwardly consider the in and out roles of nodes. In undirected networks, we may still apply bi-pattern mining as long as the core definition relies on two different roles, as exemplified by the star-satellite core. In these single mode networks, bi-pattern mining allows us to extract information that is not accessible using standard pattern mining: we may rank or select bi-patterns with low homogeneity, i.e. whose components select vertex subsets with a limited or null overlap. This allows us, for instance, to extract bi-patterns representing young lawyers asking older lawyers for advice, or a group of co-authors made of senior researchers sharing a large list of keywords together with a set of junior co-authors who share few or no keywords.

It should be emphasized i) that the results and definitions presented in this article may be extended to multiple patterns, i.e. tuples rather than pairs, and therefore to the analysis of multi-mode or multi-role networks, and ii) that by using appropriate core and multi-pattern definitions, the methodology may also be extended to multiplex networks, i.e. basically to general linked data. For instance, the core of a multiplex network may be obtained in the same way as the BHA core of a directed network: as edges have a type, we may associate a node degree with each edge type, associate a role with each edge type, and require nodes to have a sufficient degree to belong to the corresponding role component in the core. We could then investigate, for instance, gene regulation networks by considering two different types of regulation: a regulator may either increase or decrease gene expression. Note that in this case edges have both a direction and a type. There is no technical difficulty in defining appropriate cores in such situations but, of course, core definitions as well as multi-pattern definitions should be carefully designed according to the questions we intend to investigate: we may or may not be interested in the direction, depending on the specific biological question we consider.

Appendix 1: Notations, Definitions and Proofs

Table 2 summarizes the main notations regarding bi-pattern mining on attributed graphs.

Table 2 Notations

Closed bi-patterns are ordered in a bi-concept lattice whose definition relies, like the concept lattice definition, on the Galois connection between an extensional and an intensional space. We denote both order relations by the set-theoretic inclusion symbols.

Definition 10

Let (L,⊆) and (X,⊆) be two lattices, and let int and ext be two maps defined respectively on X and L by

int: X→L

ext: L→X

and such that:

C1- ∀e,e′∈X, e⊆e′ implies int(e)⊇int(e′)

C2- ∀c,c′∈L, c⊆c′ implies ext(c)⊇ext(c′)

C3- ∀c∈L, c⊆int(ext(c)), and ∀e∈X, e⊆ext(int(e))

Then (int,ext) defines a Galois connection on (X,L).

Proposition 5 is then straightforward from the componentwise definition of the orders on the pairs X=(X1,X2) and L=(L1,L2).
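For finite examples, conditions C1 to C3 can be checked exhaustively. The Python sketch below does so for the componentwise maps int and ext built on the data of Example 6 (Appendix 2), with componentwise inclusion as the order on pairs; it is a sanity check of Proposition 5 on a single instance, not a proof. The helper names are ours.

```python
from itertools import combinations

# Data of Example 6 (Appendix 2).
D1 = {1: frozenset("ab"), 2: frozenset("b")}
D2 = {3: frozenset("wx"), 4: frozenset("x")}
I1, I2 = frozenset("abc"), frozenset("wx")


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def leq(x, y):
    """Componentwise inclusion on pairs (the order used in Proposition 5)."""
    return x[0] <= y[0] and x[1] <= y[1]


def ext(c):
    return (frozenset(v for v, d in D1.items() if c[0] <= d),
            frozenset(v for v, d in D2.items() if c[1] <= d))


def intent(e):
    return (I1.intersection(*(D1[v] for v in e[0])),
            I2.intersection(*(D2[v] for v in e[1])))


X = [(e1, e2) for e1 in subsets(D1) for e2 in subsets(D2)]  # extensional pairs
L = [(c1, c2) for c1 in subsets(I1) for c2 in subsets(I2)]  # intensional pairs

c1_ok = all(leq(intent(f), intent(e)) for e in X for f in X if leq(e, f))
c2_ok = all(leq(ext(d), ext(c)) for c in L for d in L if leq(c, d))
c3_ok = (all(leq(c, intent(ext(c))) for c in L)
         and all(leq(e, ext(intent(e))) for e in X))
print(c1_ok, c2_ok, c3_ok)  # True True True
```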

Note that in closed pattern mining the Galois connection is not always mentioned as such, since results focus on the closure operator on the pattern language. Still, together with Propositions 5 and 2, it provides a simple way to obtain abstract closed bi-patterns as well as their partial ordering.

The proof of Proposition 6 is also straightforward:

Proof

Let (P1,P2) be a pair of monotone properties, and (W1,W2) be a subset pair of (V1,V2). Then there exists a greatest subset pair (S1,S2)≤(W1,W2) such that P1(v1,S1,S2) holds for all elements v1 of S1 and P2(v2,S1,S2) holds for all elements v2 of S2.

As (∅,∅) satisfies the required condition (referred to as C) vacuously and we consider the finite case, maximal subset pairs of (W1,W2) satisfying C exist. Assume that there are two maximal pairs (S1,S2) and \(\left (S_{1}^{\prime },S_{2}^{\prime }\right) \) that satisfy C. i) For any element v of S1, P1(v,S1,S2) holds and, as P1 is monotone, \(P_{1}\left (v,S_{1}\cup S_{1}^{\prime },S_{2}\cup S_{2}^{\prime }\right)\) also holds. In the same way, for any element v of \(S_{1}^{\prime }\), \(P_{1}\left (v,S_{1}\cup S_{1}^{\prime },S_{2}\cup S_{2}^{\prime }\right)\) holds. Hence \(P_{1}\left (v,S_{1}\cup S_{1}^{\prime },S_{2}\cup S_{2}^{\prime }\right)\) holds for any element v of \(S_{1} \cup S_{1}^{\prime }\). ii) The same reasoning applied to \(S_{2}, S_{2}^{\prime }\) and P2 shows that \(P_{2}\left (v,S_{1}\cup S_{1}^{\prime },S_{2}\cup S_{2}^{\prime }\right)\) holds for any element v of \(S_{2} \cup S_{2}^{\prime }\). From i) and ii) we conclude that \(\left (S_{1}\cup S_{1}^{\prime }, S_{2} \cup S_{2}^{\prime }\right)\), which is still a subset pair of (W1,W2), satisfies condition C and is therefore greater than or equal to both (S1,S2) and \(\left (S_{1}^{\prime },S_{2}^{\prime }\right)\). As both pairs are maximal subset pairs satisfying C, this means that \(S_{1}=S_{1}^{\prime }\) and \(S_{2}=S_{2}^{\prime }\). The maximal pair is therefore unique, and it is the greatest subset pair satisfying C. □
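The proof is not constructive, but for degree-based properties the greatest subset pair can be computed by peeling: since P1 and P2 are monotone, a vertex that violates its property in the current pair also violates it in any smaller pair, so it can safely be removed. The Python sketch below assumes the h-a BHA definition in which every hub has at least h out-neighbours among the authorities and every authority has at least a in-neighbours among the hubs; the toy directed graph and the function name `bha_core` are hypothetical. With degree counters and a work queue, the running time is roughly linear in the number of edges.

```python
from collections import deque


def bha_core(out_nbrs, h, a, W1, W2):
    """Greatest (H, A) <= (W1, W2) where every hub has >= h out-neighbours in A
    and every authority has >= a in-neighbours in H (assumed h-a definition).
    out_nbrs must have every vertex as a key."""
    in_nbrs = {v: set() for v in out_nbrs}
    for u, targets in out_nbrs.items():
        for v in targets:
            in_nbrs[v].add(u)
    H, A = set(W1), set(W2)
    out_deg = {u: len(out_nbrs[u] & A) for u in H}
    in_deg = {v: len(in_nbrs[v] & H) for v in A}
    queue = deque([("hub", u) for u in H if out_deg[u] < h] +
                  [("auth", v) for v in A if in_deg[v] < a])
    while queue:
        role, x = queue.popleft()
        if role == "hub" and x in H:
            H.remove(x)
            for v in out_nbrs[x] & A:  # x no longer supports these authorities
                in_deg[v] -= 1
                if in_deg[v] < a:
                    queue.append(("auth", v))
        elif role == "auth" and x in A:
            A.remove(x)
            for u in in_nbrs[x] & H:  # x no longer supports these hubs
                out_deg[u] -= 1
                if out_deg[u] < h:
                    queue.append(("hub", u))
    return H, A


# Hypothetical digraph: 1 and 2 both point to 3 and 4; 5 points to 3 only.
g = {1: {3, 4}, 2: {3, 4}, 3: set(), 4: set(), 5: {3}}
for a_min in (2, 3):
    hubs, auths = bha_core(g, 2, a_min, set(g), set(g))
    print(a_min, sorted(hubs), sorted(auths))
# 2 [1, 2] [3, 4]
# 3 [] []
```

With a=3 the removal of hub 5 cascades: authority 3 loses its third supporting hub, hubs 1 and 2 then lose their out-neighbours, and the core becomes empty.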

Appendix 2: Examples of abstract closed pattern and bi-pattern mining

In this section, we exemplify the abstract closed pattern mining discussed in the “Abstract closed pattern mining and concept lattices” section and the abstract closed bi-pattern mining presented in the “Bi-concept lattices and abstract closed bi-patterns” section. We first note a useful one-to-one correspondence between interior operators on a lattice and their ranges (see (Blyth 2005) for the dual result on closure operators):

Proposition 9

Let X be a complete lattice. A subset A of X is the range of an interior operator on X if and only if A is closed under join. The interior operator f: X→X is then unique and defined by f(x)=⋁{a∈A ∣ a≤x}.

We further call A an abstraction of X; hence we may define abstract concept lattices through interior operators as well as through abstractions. By “A is closed under join” we mean that the join of any subset {W1,…,Wn} of A, including the empty subset ∅, belongs to A. In the bi-pattern case, X is the pair \(\left (2^{V_{1}},2^{V_{2}}\right)\) of powersets, ordered componentwise, and an element W of A is a pair of object subsets.
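Proposition 9 gives a direct recipe for p. The Python sketch below builds p from a finite join-closed family of subsets and checks the interior operator properties exhaustively, using as A the abstraction {∅,12,13,23,123} of Example 5. For a finite family, containing ∅ and being closed under pairwise union is equivalent to being closed under arbitrary joins, which is what `is_join_closed` tests; all names are ours.

```python
from itertools import combinations


def subsets(s):
    """All subsets of a finite set, as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_join_closed(A):
    """Finite case: containing the empty join and all pairwise unions suffices."""
    return frozenset() in A and all((x | y) in A for x in A for y in A)


def interior_from(A):
    """Proposition 9: p(x) is the join (union) of all elements of A below x."""
    def p(x):
        return frozenset().union(*(a for a in A if a <= x))
    return p


V = {1, 2, 3}
A = {frozenset(), frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3}), frozenset(V)}
p = interior_from(A)
X = subsets(V)

print(is_join_closed(A))                                  # True
print(all(p(x) <= x for x in X))                          # contractive: True
print(all(p(x) <= p(y) for x in X for y in X if x <= y))  # monotone: True
print(all(p(p(x)) == p(x) for x in X))                    # idempotent: True
print(sorted(sorted(y) for y in {p(x) for x in X}))       # [[], [1, 2], [1, 2, 3], [1, 3], [2, 3]]
```

The range of p is exactly A, and singletons are mapped to the empty set, which is the operator used in Example 5.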

We give now a simple example of abstract closed pattern mining.

Example 5

We exemplify the closure operator f=int∘ext returning closed patterns in the standard closed itemset mining case. We further write subsets as strings, i.e. 12 stands for {1,2}. Patterns are subsets of I={a,b,c,d}, and the objects in V={1,2,3} are described by d(1)=a, d(2)=ab and d(3)=abc. We then have ext(b)=23 and, as a consequence, f(b)=d(2)∩d(3)=ab∩abc=ab, f(abc)=d(3)=abc and f(d)=abcd. The latter closure means that d belongs to the set of patterns with empty support set, whose greatest element is abcd.

Now, to exemplify abstract closed patterns, we consider the operator p on 2^V such that p(e)=e except for singletons, whose images are the empty set: p(1)=p(2)=p(3)=∅. It follows straightforwardly from Definition 1 that p is an interior operator and, as a consequence of Proposition 2, f=int∘p∘ext is a closure operator. As p∘ext(ab)=p(23)=23, we obtain f(ab)=ab, as in the non-abstract case. However, p∘ext(abc)=p(3)=∅, and now f(abc)=abcd is the greatest element with empty abstract support set.

The corresponding abstraction A=p[2^{123}] is generated by union closure of the size-2 subsets {12,23,13}, i.e. A={∅,12,13,23,123}, and it is straightforward that, for any e, p(e) is the greatest element of A included in e. For instance, p(12)=12 as 12 belongs to A, while p(1)=∅ as no element of A except ∅ is included in 1.
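The computations of Example 5 can be reproduced by brute force over the 16 patterns on I. In this Python sketch `f` is the standard closure int∘ext and `f_abstract` is the abstract closure int∘p∘ext with the interior operator p of the example; the helper names are ours.

```python
from itertools import combinations

I = frozenset("abcd")
d = {1: frozenset("a"), 2: frozenset("ab"), 3: frozenset("abc")}


def ext(q):
    return frozenset(v for v, dv in d.items() if q <= dv)


def intent(e):
    return I.intersection(*(d[v] for v in e))  # all of I when e is empty


def p(e):
    # Interior operator of Example 5: singletons are mapped to the empty set.
    return frozenset() if len(e) == 1 else e


def f(q):  # standard closure int∘ext
    return intent(ext(q))


def f_abstract(q):  # abstract closure int∘p∘ext
    return intent(p(ext(q)))


def show(s):
    return "".join(sorted(s))


patterns = [frozenset(c) for r in range(len(I) + 1) for c in combinations(sorted(I), r)]
print(sorted({show(f(q)) for q in patterns}))           # ['a', 'ab', 'abc', 'abcd']
print(sorted({show(f_abstract(q)) for q in patterns}))  # ['a', 'ab', 'abcd']
```

Only abc disappears from the closed patterns: its support {3} is a singleton, which p maps to the empty set.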

We provide hereunder an example of closed bi-pattern mining that makes use of Proposition 9 to represent the interior operator.

Example 6

Let V1={1,2} and V2={3,4} be two object sets, and \(X_{1}=2^{V_{1}}=\{\emptyset, 1,2,12\}\) while \(X_{2}=2^{V_{2}} =\{\emptyset, 3,4,34\}\). Objects of V1 are labelled by subsets of I1={a,b,c}, while objects of V2 are labelled by subsets of I2={w,x}. The descriptions of the objects of V1 and V2, respectively as subsets of I1 and I2, are as follows:

  • d1(1)=ab, d1(2)=b, d2(3)=wx, d2(4)=x

Consider the abstraction {(∅,∅),(1,4),(2,3),(12,34)} and the associated interior operator p. We then have:

  • p(12,34)=(12,34) and int(12,34)=(b,x),

  • p(1,34)=(1,4) and int(1,4)=(ab,x),

  • p(12,3)=(2,3) and int(2,3)=(b,wx),

  • p(1,3)=p(∅,3)=(∅,∅) and int(∅,∅)=(abc,wx).

We then obtain the abstract bi-concept lattice displayed in Fig. 11. The set of abstract closed bi-patterns with extent different from (∅,∅) is then {(b,x),(ab,x),(b,wx)}.

Fig. 11

The abstract bi-concept lattice of Example 6. Each node represents a closed bi-pattern (on the left) together with the associated extent (on the right)

Appendix 3: Supplementary details on experimental results

Table 3 Authors from the DBLP.E dataset and their index as it appears on the core subgraphs of the bi-patterns q and \(q^{\prime }\) depicted in the left and right parts of Fig. 9
Table 4 Authors from the DBLP.E dataset and their index as it appears on the core subgraphs of the bi-patterns q and \(q^{\prime }\) depicted in the left and right parts of Fig. 10