Keywords

1 Introduction

Networks of company branches operating within the city arise in different domains like retail industry, banking and finance, car manufacturing and many other. Different units, e.g., shops or bank branches, may be located in different areas, serve different number of customers and may have different efficiency of functioning related to actions of line staff and managers. In this regard, modification of the network of branches (closing or opening units) is considered as multi-criteria optimization problem with at least two criteria: (i) profitability of a branch (taking into account cost of maintenance, remuneration, rental charges etc.), (ii) Quality-of-Service (QoS) requirements (related to the business standards of customer care).

Collection of massive data sets about customer behavior makes possible to conclude about implicit factors of customer satisfaction (QoS criteria). For the set of branches distributed within the city and having identical functionality, one of the determinant factors of customer satisfaction is spatial availability of the branches. To decide if a certain branch is located conveniently for a certain customer, one needs to know the areas of his or her frequent visits. After that, we can optimize branches network to achieve better spatial availability for the population of customers or to reduce the effect of closing the branches.

This study is focused on the latter case when a decision maker is aimed to close a fixed number of branches to reduce operational costs of a network. As a baseline business case, we consider a problem of optimizing bank branches network under constraint on maximum desirable number of units after modification of the network. We assume that each customer is assigned with a number of points of interest, and available branches may be located in the neighborhood of each of these points. We formulate a problem of optimizing spatial availability of branches networks and propose several greedy algorithms to solve it. To show that spatial availability is related to actual choice of branches by customers, we make a prediction on preferable set of branches for each customer according to her points of interests, and compare the predictions with the actual places of visits after closing the branch. As the research is conducted in partnership with one of the largest regional banks in Russian Federation, for the experimental study we use the data about visits and transactions of more than 800 000 customers for 2 years.

The outline of the paper is as follows. In the next section, related works are discussed. In Sect. 3 a formulation of the problem is given. In Sect. 4 algorithms for solving the considered problem are proposed. Section 5 contains a description of the data set and the results of computational experiments. In the last section, conclusions are drawn.

2 Related Work

The problem of branch location evaluation has been widely studied. In [12], this problem in application for shopping centers is solved. The following characteristics are taken into account: diversity of the tenant inside the shopping center, retail agglomeration near the shopping center, distance to metro stations, and distance between consumers and shopping center. To calculate the last two characteristics, the Euclidean distance is used. The problem of branch network optimization was studied in [14]. In that paper, a Lagrangian relaxation optimization method is proposed to optimize location of locomotive maintenance shops. In [10], a multi-agent model for optimizing supermarkets location is proposed. Using the Particle Swarm Optimization heuristic, the model iteratively determines a place maximizing the sales volume of the new supermarket.

The problem of branch location evaluation in application to a bank network is studied in a number of papers. In [2] the problem is solved by an integrated method, based on analytic hierarchy process, geographic information system and maximal covering location problem. In [13], this problem is formulated and studied as a Markov decision process. In [6] the problem was studied using demand–covering models which determine the locations that achieve the maximum capture of the demand.

The problem of optimizing a bank branches network has been also studied recently. In [9], the problem is solved for the branch network of a German retail bank. A decision support system us used for this purpose. In [3] the problem was studied by a tabu search optimization algorithm [5] in application to a Turkish bank’s branches network. In [7] the problem was solved by the following algorithm. A related linear problem is solved first, then the obtained solution is iteratively improved by a local search algorithm. In [11], three heuristic algorithms are suggested to solve the problem: greedy interchange, tabu search and Lagrangian relaxation approximation. A bank’s branches network in a large-size town Amherst, USA was analyzed by these algorithms.

In this study, due to the availability of digital traces of customers of our industrial partner, we present new statement of the problem of optimizing branch location which uses zones of interests of individuals instead of city-wide information about popularity of different areas. As a result, we formulate target function as spatial availability estimated directly from points of interests of different customers. Thus, the present study is a step forward in branch network optimization and, in particular, in a bank branches network optimization.

3 Problem Statement

Suppose we have a company with branches network of size M within the city, and \(b_1,...,b_k,...,b_M\) – branches represented with pairs of geo coordinates. This company has N associated customers \(c_1,...,c_i,...,c_N\). Customers are described with the following attributes: (i) place of living L (geo coordinates); (ii) place of work W (geo coordinates); (iii) ordered sequence of pairs \(\langle date, branch\rangle \) of visits to branches; (iv) ordered sequence of pairs \(\langle date, P_{ij}\rangle \) of visits to different locations within the city where \(P_{ij}\) is j-th known location for i-th customer. If the considered company is a bank, then (iv) may be extracted, from transactional data, because transactions have the field ‘Address’, and one can extract from this field geo coordinates of a location where purchase was made. In this case, (iv) for a fixed date is a daily path of payments of a user within the city.

Let’s assume that a company decided to close K branches of M (e.g. for optimizing the expenses). The problem is to find K of M branches which closing leads to the smallest reduction in spatial availability of branches network for the customers.

In the simplest case, all branches for customer i can be divided into two types, accessible and non-accessible. By the accessible branch for a customer i, we mean a branch that is within the \(\delta \)-neighborhood of any of points \(S = \{H,L,P_{ij}\}\). Here \(\delta \) may be chosen as a maximum walking distance still comfortable for a customer, e.g. 1 km.

Let’s introduce the accessibility coefficient \(a_{ik}\) as a measure of accessibility of a branch k for a customer i. This coefficient may be defined in several ways as follows:

  1. (i)

    \(a_{ik} = \mathbb {I}_{\delta }(c_i, b_k)\) where \(\mathbb {I}_{\delta }(c_i, b_k)\) is a binary variable. If its value is 0, it denotes that k-th branch is not \(\delta \)-accessible for customer i, and 1 denotes opposite situation;

  2. (ii)

    \(a_{ik} = \Big ({\frac{\mathbb {I}_{\delta }(c_i,b_k)}{\mathbb {B}_{\delta }(c_i)}}\Big )^\gamma \) for \(\mathbb {B}_{\delta }(c_i)=\sum _k{\mathbb {I}_{\delta }(c_i,b_k)} > 0\), and 0 otherwise (\(\gamma >0\)). The meaning of this expression is as follows. When \(b_k\) is not accessible for customer i (or customer i does not have any accessible branch in \(\delta \)-neighborhood), the accessibility coefficient is equal to zero. If \(b_k\) is accessible, \(a_{ik}\) is the greater, the smaller is the number of accessible branches for customer i, \(\mathbb {B}_{\delta }(c_i)\). The logic behind that is that customers with a higher number of accessible branches should influence the decision about closing one of them to a smaller extent than customers for which the accessible branch is unique. Here, coefficient \(\gamma \) allows to tune the extent to which customers with increasing number of accessible branches will influence the results.

  3. (iii)

    \(a_{ik} = e^{-\lambda \cdot d_{min}(c_i,b_k)}, \lambda > 0\), where \(d_{min}(c_i,b_k)=\min _j d(P_{ij},b_k)\)—minimum distance to branch \(b_k\) from any of the points of interests of customer i. In such a case, accessibility coefficient exponentially decreases with an increase of a distance of branch \(b_k\) from the nearest of points of customer i.

Further, we create an accessibility matrix \(A = \{a_{ik}\}\), in which the sum of elements for i-th row (i-th customer) means the potential coverage of a customer, that is, a large value of the sum corresponds to the case when a customer has a lot of appropriate branches; and a small value of the sum denotes critical customers. For the case (ii), this value for the customers with at least one accessible branch, is equal to 1. The sum by columns for all three cases means the potential popularity of the branch.

Considering this matrix, we may also come to conclusion that if \(\mathbb {B}_{\delta }(c_i)>K\) that customer \(c_i\) will be satisfied after elimination of each K columns, and then, to exclude them from consideration.

The optimization problem can be stated as follows: to find K columns of accessibility matrix A which deletion will minimize the number of customers with \(\mathbb {B}_{\delta }(c_i)=0\) in the case (i) or minimize the decrease of \(\sum _i{a_{ik}}\) in the cases (ii) and (iii). In other words, we would like to reduce the number of branches keeping the number of potentially satisfied customers as large as it is possible. Further we shall focus on the case (i) and by potentially satisfied customer, we mean customer having at least one branch in the \(\delta \)-neighborhood of any of his or her visiting points.

4 Proposed Algorithms

In order to solve the problem considered in the previous section, algorithms of three types were developed: (i) a brute-force algorithm, that guarantees to find exact solutions, but work very slowly; (ii) greedy algorithms, which are able to quickly find approximate solutions; (iii) a local search algorithm that can be used to find an approximate solution or to improve a solution found by a greedy algorithm. All these algorithms are described below. Also, in Subsect. 4.4 several examples of the accessibility matrices and results obtained by the proposed algorithms on them are shown.

4.1 Brute-Force Algorithm

The optimal solution of the considered problem can be found by the brute-force approach. There are two possible variants. According to the first one, \(\left( {\begin{array}{c}M\\ M-K\end{array}}\right) \) different combinations of choice of \(M-K\) ‘good’ branches from M branches are generated, then for each of them the number of potentially satisfied customers is calculated, and the optimal combination (with the greatest number of satisfied customers) of \(M-K\) branches is chosen. According to the second variant, K ‘bad’ branches should be found, so \(\left( {\begin{array}{c}M\\ K\end{array}}\right) \) combinations are generated, then for each of them the number of potentially unsatisfied customers is calculated, and the optimal combination (with the smallest number of unsatisfied customers) of K branches is chosen.

In practice, \(M \gg K\), and also K is quite small (as one wants to close just several branches), so in the following computational experiments the second variant (choosing K ‘bad’ branches) is used. Since \(\left( {\begin{array}{c}M\\ K\end{array}}\right) = O(M^K)\), the algorithm works in polynomial time. The pseudo-code is shown in Algorithm 1.

figure a

4.2 Greedy Algorithms

Despite the fact, that the proposed brute-force algorithm works in polynomial time, on some hard instances of the considered problem it cannot be executed in reasonable time. A greedy algorithm can be used instead to quickly find an approximate solution, which can be similar to the optimal one (but it is not guaranteed).

The following greedy algorithm is based on closing K branches with the smallest number of unsatisfied customers one by one. Further this algorithm is denoted as GreedyLP (LP stands for Lowest Popularity). The pseudo-code of this algorithm is shown in Algorithm 2.

figure b

The following greedy algorithm is based on keeping branches with the greatest number of satisfied customers. Further this algorithm is denoted as GreedyHP (HP stands for Highest Popularity). The pseudo-code of this algorithm is shown in Algorithm 3.

figure c

Both proposed greedy algorithms have linear time complexity. If \(K<\frac{M}{2}\), GreedyLP requires less iterations than GreedyHP.

4.3 Simple Hill Climbing Algorithm

To find an approximate solution, it is also possible to use local search heuristics. A set of K branches corresponds to a Boolean vector of size M, where \(i-th\) component is 1 if the set contains branch number i and 0 otherwise. Such a vector, in turn, corresponds to a point in a search space S. This search space consists of all different Boolean vectors of size M and with exactly K 1s, so its size is \(\left( {\begin{array}{c}M\\ K\end{array}}\right) \). For an arbitrary point \(\chi \in S\), a neighborhood \(Nh(\chi )\) of radius R is defined as a set of such points \(\chi '\), \(\chi ^{'} \in S\), that \(d_H(\chi ,\chi ') = R\), where \(d_H(\chi ,\chi ')\) stands for Hamming distance between \(\chi \) and \(\chi '\).

Let’s consider an objective function which operates in this search space. Given a point from the search space, it calculates the number of customers that would be potentially unsatisfied after the closure of all corresponding branches. Any discrete black-box optimization algorithm can be employed to minimize it, e.g., the genetic algorithm [1] or the tabu search [5]. As an optimization algorithm, the simple hill climbing algorithm [8] was chosen.

Simple hill climbing starts from a given point, in the role of which it is possible to use a random point from the search space. Then the algorithm checks points from the neighborhood of the given point. If a better point (i.e. a point with lower value of the objective function) is found, then this new point is considered as a new best point, and checking of its neighborhood is started. If all points from a current neighborhood are worse than a current best point, then it means that a local minimum is reached. The pseudo-code of the proposed algorithm is shown in Algorithm 4. Here \(\chi _{best}\) is the point with the best found value of the objective function.

figure d

In the proposed algorithm, checking a neighborhood of a given point has a linear time complexity. The amount of such checks cannot be predicted, but in practice for the considered problem it is usually less than 10.

4.4 Examples

Let’s consider two instances of the proposed problem. In the first one, \(M=4, K=2\), the accessibility matrix is \(A_1\), see Table 1. In the second one, \(M=3, K=1\), the accessibility matrix is \(A_2\).

Table 1. Examples of the accessibility matrices

GreedyLP finds a non-optimal solution for the first instance. It removes branch 1 on step 1 and branch 2 on step 2. As a result, one customer which has only one accessible branch is unsatisfied. As for GreedyHP, it keeps branch 1 on step 1 and branch 4 on step 2, and as a result all customers are satisfied. However, GreedyHP finds a non-optimal solution for the second instance. GreedyHP keeps branch 3 at step 1, and branch 2 at step 2, leaving customer 1 unsatisfied. However, if branches 1 and 2 are kept, all customers will be satisfied.

As for simple hill climbing, it finds the optimal solution for both considered instances if it starts from a solution found by any proposed greedy algorithm.

5 Computational Experiments

In this section, a data set used for computational experiments is described first. Then the influence of the spatial accessibility on customers choice of bank branches is studied. Finally, the experimental study of the algorithms, proposed in Sect. 4 is presented.

5.1 Data Set Description

We consider a data set from one of the largest regional banks in Russia. This data set contains data about 58 bank branches, 844 864 customers and 3 132 296 customers’ visits. Customers visits in one year (from September 2017 to September 2018) were taken into account in this study. Note, that only 51 out of 58 bank branches were active at the moment of this study. In Fig. 1a the distribution of bank branches by the amount of customers visits is presented. In Fig. 1b the distribution of bank branches by customers visits per month is shown. Seasonal effects can be seen: the activity during the summer is decreasing, while March and December are highly popular months.

Fig. 1.
figure 1

Customers visits to bank branches

According to Sect. 3, three types of points of interests (POIs) are used for each customer. The first two types (home address and work address) were provided by the industrial partner. POIs of the third type (places of visits) were taken from history of debit card transactions (the text field “Address” was used). This allows us to supplement information about home and work POIs which a customer reports to a bank because this information may become outdated.

To transform address data into pairs of geo coordinates, we use OpenStreetMap open database. OpenStreetMap is a map service, in which data are updated by its users in a collaborative manner. Using OpenStreetMap, we have compiled a database of addresses of St. Petersburg. This database matches text representation of addresses to their coordinates. Then, we were able to identify geo coordinates of customers’ POI.

POI extracted from transactions may contain outliers—points that were visited only once and that are off the usual routes of a customer. As our goal is to identify the frequently visited locations, we propose the following approach of getting POI from traces of visits to locations for a given customer \(c_i\):

  1. 1.

    Collect a set of initial POI \(\{l_{iv}\}, v=1\dots V_i\) where \(V_i\) is a number of distinct locations of a customer i, \(l_{iv}\) is a location of v-th point.

  2. 2.

    Group a set of locations \(\{l_{iv}\}\) into \(\hat{V_i}\) clusters.

  3. 3.

    Remove clusters which contain a single point to get \(V_i^*\) clusters.

  4. 4.

    Find centers of \(V_i^*\) clusters and replace initial set of POI with \(V_i^*\) points which stand for the centers of payment interests of customer \(c_i\).

We used this algorithm to get POI from transactional data. For step 2, DBSCAN clustering method was used [4].

5.2 On the Importance of Spatial Accessibility

We tested the hypothesis that the spatial accessibility of bank branches is important for customers. Two bank branches closed in Spring of 2018 were chosen for this purpose. Hereinafter they are called Branch 1 and Branch 2. For each of these branches, customers visiting it at least twice during the last 6 months before the closure were considered. It turned out, that for Branch 1 there were 964 such customers, while for Branch 2 there were 1483 of them. It was counted, how many customers visited other bank branches during 6 months after the closure. For Branch 1, there were 252 such customers, for Branch 2 – 409 customers. Then it was analyzed how many of these customers visited the most accessible active branches. The results are presented in Table 2. Here ‘Top-i’ stands for the amount of customers visited at least one of the i most accessible active branches (in accordance with case (iii), see Sect. 3). For instance, value 109 of Top-2 for Branch 1 means, that after the closure of Branch 1, 109 out of 252 of its customers visited one of two most accessible branches. Note, that each customer has its own list of the most accessible branches.

Table 2. The comparison of bank branches spatial accessibility and real visits for Branch 1 and Branch 2 customers

It turned out, that 65%–75% of customers chose one of top 5 branches by accessibility (out of 51) after the closure. Thus, it is can be concluded, that the hypothesis on spatial accessibility influence was confirmed.

5.3 Experimental Study of the Proposed Algorithms

The algorithms described in Sect. 4 were implemented in a form of a sequential C++ program. While in almost all cases K was significantly smaller than M, only one greedy algorithm, GreedyLP, was used. It was done because GreedyLP suits better for small K than GreedyHP. Hereinafter ‘GreedyLP+SHC’ stands for simple hill climbing which starts from a point found by the GreedyLP algorithm. In simple hill climbing, radius R was equal to 2 (see Subsect. 4.3).

Computational experiments were held on a computer equipped with the 6-core processor Intel i7-3930K and 16 GB of RAM. Two series of experiments were conducted. In the first one, all 51 branches were available for closing (i.e. M was equal to 51), K (the amount of branches for closing) was varied from 1 to 10. The obtained results are presented in Table 3. Here ‘calc.’ stands for the number of the objective function calculations, while ‘unsat.’ stands for the number of unsatisfied customers in the solution found by the corresponding algorithm. Hereinafter the best found solutions are marked with bold, while solutions which are worse than the best ones, are marked with Italic. It should be noted, that the brute-force algorithm was not launched for \(K=5,..,10\) because the corresponding experiments could not be finished in reasonable time. For instance, for \(K=5\) it would take about 10 days.

Table 3. Results for \(M=51, K=1,..,10\), time in seconds

According to Table 3, for \(K=1,..4\) all three algorithms found optimal solutions. As for \(K=5,..10\), the best solutions are unknown, because, as it was mentioned above, the brute-force algorithm was not launched in these cases. Nevertheless, for \(K=6,7,8\) GreedyLP+SHC found better solutions than GreedyLP. It is not guaranteed that solutions, found by GreedyLP+SHC, are optimal. However, these solutions are local minima (in the sense described in Subsect. 4.3). It means that each of them can not be improved by replacing any branch from the corresponding set by any other possible branch. Note that GreedyLP+SHC takes a little more time compared to GreedyLP. However, it takes reasonable time even for the most difficult instances of the problem. Note, the in all cases GreedyLP+SHC required from 1 up to 3 neighborhood checks (see Subsect. 4.3) to reach a local minimum. These results show, that a solution found by GreedyLP is a very good start point for GreedyLP+SHC.

The results from Table 3 are also shown in Fig. 2. In Fig. 2a the solving time in seconds are shown on a logarithmic scale, while in Fig. 2b the objective function values are shown on a linear scale.

In the second series of experiments, a constraint was added. Only 17 out of 51 branches were available for closing (i.e. M was equal to 17). These branches were chosen because other 34 branches cannot be closed in accordance with the bank’s strategy. K was varied from 1 to 10. Note, that in the accessibility matrix all 51 branches were presented, so the whole bank branches network was taken into account during the objective function calculations. The obtained results are presented in Table 4. The brute-force algorithm managed to finish its work within the runtime limit of 1 day for all K, so it was possible to compare solutions found by GreedyLP and GreedyLP+SHC with optimal solutions in all cases.

Fig. 2.
figure 2

Results for \(M=51,K=1,..,10\)

It turned out, that the branches, recommended for closing in accordance with the first experiment (without any constraint) significantly differ from the ones recommended in accordance with the second experiment. In particular, for \(K=1\) in the first experiment the closure of the recommended branch will lead to 17 unsatisfied customers (see Table 3), while in the second experiment the closure of the recommended branch will lead to 498 unsatisfied customers. It means that the employed constraint is very strong and some branches which suit well for closing from the spatial accessibility point of view, cannot be closed due to some additional reasons.

Table 4. Results for \(M=17, K=1,..,10\), time in seconds

According to Table 4, both Brute-force and GreedyLP+SHC found optimal solutions in all cases. As for GreedyLP, it found optimal solutions in all cases except \(K=5\). The results from Table 4 are also shown in Fig. 3. In Fig. 3a the solving time are shown on a logarithmic scale, while in Fig. 3b the found objective function values are shown on a linear scale.

Fig. 3.
figure 3

Results for \(M=17,K=1,..,10\)

The obtained results show, that GreedyLP suits very well for the considered problem. It works fast and in almost all cases finds an optimal solution. GreedyLP+SHC, being a little slower than GreedyLP, finds optimal solutions in all practical cases.

6 Conclusions

In this paper, the problem of company branches location optimization was considered. It was considered as a problem of minimizing the number of potentially dissatisfied customers according to the distances between points of individuals and branches. According to the conducted experiments, a greedy algorithm found exact solutions of the considered problem in almost all cases. A local search algorithm, which starts from a solution found by the greedy algorithm, managed to find exact solutions in all cases, where it was possible to verify by the brute-force algorithm. Meanwhile, both suggested algorithms work much faster than the brute-force algorithm. We have also shown that in the case of constraints related to the selection of branches for closing, the number of unsatisfied customers can increase significantly. In the future, we are planning to apply the suggested approach to other data sets.