Introduction

The team formation (TF) problem plays a crucial role in many real-life applications, ranging from software project development to various participatory tasks in social networks. In such applications, collaboration among experts is required: there are a number of experts, each associated with a set of capabilities (i.e., skills), and a collaborative task (i.e., project) that requires a set of skills to be accomplished. The problem is to find an effective team of experts that covers all the skills required by a given task with the least communication cost. This problem is known to be NP-hard (Lappas et al. 2009); hence, it is interesting to develop heuristic search methods to solve it.

It is well known that swarm-based algorithms such as particle swarm optimization (PSO) (Vallade and Nakashima 2013) are capable of reaching solutions quickly and efficiently because they can generate different outputs from the same sample inputs. PSO is a heuristic method based on evaluating various alternative solutions over iterations to find the best solution. Another adaptive heuristic method is the genetic algorithm (GA) (Holland 1975; Kalita et al. 2017), which is based on the natural law of evolution through natural selection and the exchange of genetic information. Generally speaking, the goal of optimization methods is to find an adequate combination of a set of parameters that achieves the most satisfactory value (e.g., minimum or maximum), depending on the requirements of the problem.

Therefore, the main objective of this research is to form an effective team of experts with minimum communication cost by using an improved PSO hybridized with a new swap operator and the main operator of GA (i.e., the crossover operator). We call the proposed algorithm improved particle swarm optimization with a new swap operator (IPSONSO).

The problem in Karduck and Sienou (2004) is defined as the process anterior to the forming stage of group development theory. The key problem is the selection of the best candidates that fulfill the requirement specification for achieving the goal. Most existing team formation work based on approximation algorithms (Anagnostopoulos et al. 2012; Kargar et al. 2013) considers different communication costs, such as the diameter and minimum spanning tree (Lappas et al. 2009) and the sum of distances from the team leader (Kargar and An 2011).

A generalization of the team formation problem is given in Appel et al. (2014), Li and Shan (2010) and Li et al. (2015) by assigning each skill to a specific number of experts. The maximum load of experts across different tasks is considered in Anagnostopoulos et al. (2010), but without taking the minimum communication cost for team formation into account.

On the other hand, comparatively little work on the team formation problem has been based on meta-heuristic algorithms such as PSO and GA (Haupt and Haupt 2004). These algorithms have been successfully applied as optimization methods, as in Blum and Roli (2003), Pashaei et al. (2015) and Sedighizadeh and Masehian (2009), for many real-world applications.

A group formation method using a genetic algorithm is presented in Zhang and Si (2010), where the members of each group are selected based on the students’ programming skill. A genetic algorithm is used for team formation in Nadershahi and Moghaddam (2012) based on the Belbin team role model, which categorizes individuals into nine roles according to their specialty and attitude toward teamwork.

A team formation problem based on a sociometric matrix is presented in Gutiérrez et al. (2016), in which a mathematical programming model maximizes the efficiency of the relationships among people who share a multidisciplinary work cell. A variable neighborhood local search meta-heuristic is applied in Gutiérrez et al. (2016) to solve the team formation problem and proved the most efficient in almost all cases. In our work, by contrast, a global search meta-heuristic is considered, which seeks the least communication cost among all local solutions over the whole search.

A team formation is considered in Huang et al. (2017) based on the available work time and the set of skills of each expert in order to build an effective team, where each expert is associated with a skill level indicating his competence in that skill. In our research, all experts that have the ability to perform a skill are candidates to join a collaborative group in order to achieve the goal.

A mathematical framework for dealing with the team formation problem is proposed in Farasat and Nikolaev (2016), explicitly incorporating the social structure among experts; an LK-TFP heuristic is used to perform variable-depth neighborhood search, and the results are compared with a standard genetic algorithm. In our paper, given a pool of individuals, an improved PSO algorithm for the team formation problem is proposed and its results are compared with the standard PSO.

Finally, in Fathian et al. (2017) a mathematical model is proposed to maximize team reliability by considering that unreliable experts may leave the team with some probability and by preparing a backup for each unreliable expert. In that case, each team must maintain two sets of members, namely main and backup members, which is effective only in some specific situations. In contrast, in our research the most feasible team is chosen from among all the available members, and its members have no incentive to leave the team.

The rest of the paper is organized as follows. Section 2 illustrates the definition of team formation problem. Section 3 introduces the formulation of proposed algorithm and how it works. Section 4 discusses the experimental results of the proposed algorithm. Finally, Sect. 5 concludes the work and highlights the future work.

Team formation problem

The team formation problem in a social network can be formulated as finding a set of experts from a social network graph G(V, E) to accomplish a given task (i.e., project). There are n experts \(V =\{v_1,v_2,\ldots ,v_n\}\) and a set of m skills \(S =\{s_1,s_2,\ldots ,s_m\}\) representing their abilities. Each expert \(v_i\) is associated with a set of specific skills \(s(v_i)\subset S\). The set of experts that have skill \(s_k\) is denoted by \(C(s_k)\subset V\). A given task T is formed by a set of required skills (i.e., \(T=\{s_i,\ldots ,s_j\} \subseteq S\)) that can be covered by a set of experts forming a team. The set of possible teams that can achieve the task is denoted by \(X=\{x_1,x_2,\ldots ,x_k\}\); a team \(x_k\) must satisfy \(T \subseteq \bigcup _{v_i \in x_k} s(v_i)\). The collaboration cost (i.e., communication cost) between any two experts \(v_i\) and \(v_j\) is denoted by \(e_{ij}\in E\) and is computed according to Eq. 1.

$$\begin{aligned} e_{ij}=1- \frac{|s(v_i)\cap s(v_j)|}{|s(v_i)\cup s(v_j)|} \end{aligned}$$
(1)

The goal is to find a team with least communication cost among team members \(CC(x_k)\) according to Eq. 2.

$$\begin{aligned} CC(x_k )= \sum _{i=1}^{|x_k|} \sum _{j=i+1}^{|x_k|} e_{ij} \end{aligned}$$
(2)

where \(|x_k |\) is the cardinality of team \(x_k\).
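As a concrete sketch, Eqs. (1) and (2) can be computed as follows (Python); the skill sets used in the usage note are illustrative only:

```python
def comm_cost(skills_i, skills_j):
    # Eq. (1): e_ij = 1 - |s(v_i) ∩ s(v_j)| / |s(v_i) ∪ s(v_j)|
    union = skills_i | skills_j
    if not union:
        return 1.0  # experts with no skills share nothing
    return 1.0 - len(skills_i & skills_j) / len(union)

def team_cost(team, skills):
    # Eq. (2): CC(x_k) sums e_ij over every unordered pair of team members
    members = list(team)
    return sum(comm_cost(skills[members[i]], skills[members[j]])
               for i in range(len(members))
               for j in range(i + 1, len(members)))
```

For example, two experts sharing one of three distinct skills have \(e_{ij} = 1 - 1/3 \approx 0.67\), and a two-member team with disjoint skill sets has communication cost 1.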

The team formation problem can be considered an optimization problem: form a feasible team \(x^*\), among the set of possible teams, that covers the required skills for a given task with minimum communication cost among the team’s experts. \(x^*\) can be obtained as follows

$$\begin{aligned}&{\text {Min}}_{(x_i\in X)} CC(x_i)= \sum _{l=1}^{|x_i|} \sum _{j=l+1}^{|x_i|} e_{ij} \end{aligned}$$
(3)
$$\begin{aligned}&{\text {subject~to}} \nonumber \\&\forall v_i,v_j : e_{ij} \in [0,1] \nonumber \\&\forall s_i \in T : |C(s_i)|\ge 1 \end{aligned}$$
(4)

where the communication cost between any pair of experts lies in the range [0, 1], and for each required skill in the given task there exists at least one expert who has that skill. All the required skills of the given task must be covered to obtain a feasible team \(x^*\).
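The feasibility constraint of Eq. (4) amounts to a simple coverage check, sketched below (Python; the expert names and task used in the test are hypothetical):

```python
def is_feasible(team, skills, task):
    # A team x* is feasible iff every required skill of the task
    # is held by at least one team member (Eq. 4).
    covered = set()
    for member in team:
        covered |= skills[member]
    return set(task) <= covered
```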

The notations of the team formation problem are summarized in Table 1.

Table 1 Notations of team formation problem

Remark

The set covering problem is one of the classical problems in complexity theory and computer science. It is regarded as one of the most important discrete optimization problems because it can model various real-life problems, e.g., vehicle routing, resource allocation, nurse scheduling, airline crew scheduling, and facility location. The name arises from covering the rows of an m-row/n-column zero-one matrix with a subset of the columns at minimal cost (Beasley and Chu 1996). The problem can be modeled as follows:

$$\begin{aligned}&{\text {Min} }\sum _{j=1}^{n} c_j x_j \end{aligned}$$
(5)
$$\begin{aligned}&{\text {subject~to}} \nonumber \\&\sum _{j=1}^{n} a_{ij} x_j \ge 1\qquad i=1,\ldots , m \end{aligned}$$
(6)
$$\begin{aligned}&\forall x_j \in \{0,1\}\qquad j=1,\ldots , n \end{aligned}$$
(7)

Equation (5) is the objective function of the set covering problem, where \(x_j\) is a decision variable and \(c_j\) denotes the weight or cost of covering column j. Equation (6) is a constraint ensuring that each row is covered by at least one column, where \(a_{ij}\) is an element of the \(m \times n\) constraint coefficient matrix whose entries are either “1” or “0.” Equation (7) is the integrality constraint, whose value is expressed as follows

$$\begin{aligned} x_j= {\left\{ \begin{array}{ll} 1, &{}{\text {if }} j \in S ;\\ 0, &{}{\text {otherwise.} }\end{array}\right. } \end{aligned}$$

Although it may look easy from its objective function and constraints, the set covering problem is a combinatorial optimization problem whose decision version is NP-complete (Lappas et al. 2009).
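For intuition, the classical greedy approximation for set covering (not part of the proposed algorithm) can be sketched as follows; the subset names and unit costs in the example are illustrative:

```python
def greedy_set_cover(universe, subsets, costs):
    # Repeatedly pick the column (subset) with the lowest cost per
    # newly covered row until every row is covered.
    # Assumes the universe is actually coverable by the given subsets.
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = min((j for j in subsets if subsets[j] & uncovered),
                   key=lambda j: costs[j] / len(subsets[j] & uncovered))
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen
```

This greedy rule achieves the well-known logarithmic approximation guarantee for set covering.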

As mentioned in the literature, e.g., Kargar and An (2011), team formation problem is a special instance of the minimum set cover problem.

An example of the team formation problem

We describe an example of the team formation problem in Fig. 1.

Fig. 1
figure 1

An example of team formation problem

In Fig. 1, a network of experts \(V=\{v_1,v_2,v_3,v_4,v_5,v_6\}\) is considered, where each expert has a set of skills S and there is a communication cost between every two adjacent experts \(v_i,v_j\), represented as the weight of the edge \((v_i,v_j)\) (e.g., \(w(v_1,v_2)=0.2\)). The communication cost between non-adjacent experts is given by the shortest path between them.

The aim is to find team X of experts V with the required skills S with a minimum communication cost. In Fig. 1, two teams with the required skills \(X_1=\{v_1,v_2,v_3,v_4\}\) and \(X_2=\{v_2,v_4,v_5,v_6\}\) are obtained.
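The shortest-path cost between non-adjacent experts can be computed with Dijkstra’s algorithm, sketched below (Python); the small graph in the test is illustrative, with only the edge \(w(v_1,v_2)=0.2\) taken from Fig. 1:

```python
import heapq

def shortest_cost(graph, src, dst):
    # Dijkstra: the communication cost between non-adjacent experts is
    # the total weight of the cheapest path between them.
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # unreachable: infinite communication cost
```

An infinite return value corresponds to the \(\infty \) costs reported later for teams whose members cannot reach each other.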

The proposed algorithm

In the following subsections, the main processes of the standard particle swarm optimization (PSO), single-point crossover, and the improved swap operator are highlighted and invoking them in the proposed algorithm is described.

Particle swarm optimization

Particle swarm optimization (PSO) is a population-based meta-heuristic method developed by Kennedy and Eberhart in 1995 (Eberhart et al. 2001). The main process of the PSO is shown in Fig. 2. The PSO population is called a swarm SW; the swarm contains particles (individuals), and each particle is represented by an n-dimensional vector as shown in Eq. 8

$$\begin{aligned} {\mathbf {x}}_{i}= (x_{i1}, x_{i2},\ldots ,x_{in}) \in SW. \end{aligned}$$
(8)

Each particle has a velocity, which is generated randomly as shown in Eq. 9.

$$\begin{aligned} {\mathbf {v}}_i= (v_{i1}, v_{i2},\ldots ,v_{in}). \end{aligned}$$
(9)

The best personal (\(P_{{best}}\)) and global positions (\(g_{{best}}\)) of each particle are assigned according to Eq. 10.

$$\begin{aligned} {\mathbf {p}}_i= (p_{i1}, p_{i2},\ldots ,p_{in}) \in SW. \end{aligned}$$
(10)

At each iteration, each particle updates its position and velocity, guided by its personal best position (\(P_{{best}}\)) and the global best position (\(g_{{best}}\)) among the particles in the neighborhood, as shown in Eqs. 11 and 12, respectively.

$$\begin{aligned} {\mathbf {x}}_i^{(t+1)}= & {} {\mathbf {x}}_i^{(t)}+ {\mathbf {v}}_i^{(t+1)}, \;\;\;i=\{1,\ldots , SW\} \end{aligned}$$
(11)
$$\begin{aligned} {\mathbf {v}}_i^{(t+1)}= & {} {\mathbf {v}}_i^{(t)}+c_1r_{i1}\times ({\mathbf {p}}_{best_i}^{(t)}-{\mathbf {x}}_i^{(t)})+c_2r_{i2}\times ({\mathbf {g}}_{best}-{\mathbf {x}}_i^{(t)}). \end{aligned}$$
(12)

where \(c_1\) and \(c_2\) are the cognitive and social parameters, respectively, and \(r_1\) and \(r_2\) are random vectors with components in \([0,1]\). The process is repeated until the termination criteria are satisfied.
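A minimal sketch of one update of Eqs. (11) and (12) for a single particle (Python; the parameter values \(c_1=c_2=2.0\) are illustrative defaults, not the paper’s settings):

```python
import random

def pso_step(x, v, pbest, gbest, c1=2.0, c2=2.0):
    # Eq. (12): new velocity from inertia, cognitive and social terms.
    new_v = [vi
             + c1 * random.random() * (pi - xi)
             + c2 * random.random() * (gi - xi)
             for xi, vi, pi, gi in zip(x, v, pbest, gbest)]
    # Eq. (11): new position is the old position plus the new velocity.
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

Note that when a particle already sits at both its personal and global best, the cognitive and social terms vanish and only inertia moves it.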

Fig. 2
figure 2

Particle swarm operator processes

Single-point crossover

Crossover is one of the most important operators in GA. It creates one or more offspring from the selected parents. The single-point crossover (Goldberg 1989) is one of the most widely used operators in GA. The process starts by selecting a random point k in the parents between the first gene and the last gene. The two parents then swap all the genes between the point k and the last gene. The process of the single-point crossover is shown in Fig. 3.
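A sketch of single-point crossover on list-encoded parents (Python; the cut point k can be fixed for reproducibility):

```python
import random

def single_point_crossover(parent1, parent2, k=None):
    # Pick a random cut point strictly inside the chromosome, then
    # swap all genes from position k to the end between the parents.
    if k is None:
        k = random.randint(1, len(parent1) - 1)
    return parent1[:k] + parent2[k:], parent2[:k] + parent1[k:]
```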

Fig. 3
figure 3

Single-point crossover

A new swap operator

A swap operator (SO) in Wang et al. (2003), Wei et al. (2009) and Zhang and Si (2010) consists of two indices SO(a, b), which is applied to the current solution to produce a new solution. For example, if we have a solution \(S=(1-2-3-4-5)\) and \({\text {SO}}=(2,3)\), then the new solution is \(S^\prime =S+{\text {SO}}(2,3)=(1-2-3-4-5)+{\text {SO}}(2,3)=(1-3-2-4-5)\). A collection of one or more swap operators, which can be applied sequentially, is called a swap sequence (SS). A swap sequence \({\text {SS}}=({\text {SO}}_1,{\text {SO}}_2,\ldots ,{\text {SO}}_n)\) is applied to a solution by applying all its swap operators in order to produce a final solution.

In our proposed algorithm, the proposed swap operator \({{\text {NSO}}}(a,b,c)\) contains three indices: the first argument a is the \({{ {skill}}}_{id}\), and the second and third arguments b, c are the indices of the current and the new expert, respectively, which are selected randomly from the experts that have the same \({ {skill}}_{id}\), where \(b \ne c\). For example, \({\text {NSO}}(2,1,3)\) means that for \({ {skill}}_{id}=2\) there is a swap between \({{expert}}_{id} = 1\) and \({ {expert}}_{id} = 3\).
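The new swap operator and a swap sequence can be sketched as follows (Python); encoding a particle as a dict mapping each \(skill_{id}\) to the assigned expert index is one possible reading of the representation, assumed here for illustration:

```python
def apply_nso(assignment, nso):
    # NSO(a, b, c): for skill a, replace the currently assigned expert
    # index b with expert index c (both experts must hold skill a).
    skill_id, cur_idx, new_idx = nso
    out = dict(assignment)
    if out.get(skill_id) == cur_idx:
        out[skill_id] = new_idx
    return out

def apply_swap_sequence(assignment, seq):
    # A swap sequence SS = (NSO_1, ..., NSO_n) is applied in order.
    for nso in seq:
        assignment = apply_nso(assignment, nso)
    return assignment
```

Applying a whole swap sequence to a particle in this way is exactly the position update of Eq. (14) under this encoding.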

Improved Particle Swarm Optimization with New Swap Operator (IPSONSO)

In this subsection, the main structure of the proposed IPSONSO is explained and shown in Algorithm 1.

figure a

In the following subsections, the application of the proposed IPSONSO to the team formation problem is explained.

Initialization and representation

IPSONSO starts by setting the initial values of its main parameters, such as the population size P, the social and cognitive coefficients \(c_1, c_2\) and the maximum number of iterations \({\text {max}}_{itr}\). Given a project Pr that contains a set of skills \(s_i\), \(i=\{1,2,\ldots ,d\}\), where d is the number of skills requested in the project, IPSONSO initializes the positions and the velocities of all particles randomly. Each particle represents a vector of random skills to form the project, and the velocity is a sequence of random swap operators, each represented by a new swap operator \({\text {NSO}}(x,y,z)\), where x is the \({ {skill}}_{id}\) and y, z are the indices of experts that have the skill from the experts’ list \(C(s_i)=\{1,2,\ldots ,E_i\}\).

Particle evaluation

The relationship between experts is represented by a social network, where nodes represent experts and edges represent the communication cost (i.e., weight) between two experts. The weight between expert i and expert j is represented in Eq. 1.

The least communication cost among team members \(CC(x_k)\) can be computed according to Eq. 2. The particle with minimum weight among all evaluated particles is considered as a gbest (global best particle), where the local best is assigned for each particle as pbest.

Particle velocity update

The initial particles’ velocities contain a set of random new swap operators \(({\text {NSO}}(s))\). Each particle updates its velocity as shown in Eq. 13.

The single-point crossover operator is used to produce new individuals by combining sub-individuals from the current individual and the global best individual (gbest) in the whole population. After applying the crossover operator, two new individuals with mixed expert assignments are obtained, and one of them is selected as the team configuration \({\mathbf {x}}_{cross}^{(t)}\)

$$\begin{aligned} {\mathbf {v}}_i^{(t+1)}={\mathbf {v}}_i^{(t)} \oplus \alpha \otimes ({\mathbf {p}}_{best_i}^{(t)}-{\mathbf {x}}_i^{(t)}) \oplus \beta \otimes ({\mathbf {x}}_{cross}^{(t)}-{\mathbf {x}}_i^{(t)}). \end{aligned}$$
(13)

where \(\alpha , \beta \) are random numbers in [0,1] and the mark “\(\oplus \)” combines two swap sequences. The mark “\(\otimes \)” denotes the probability \(\alpha \) with which each swap operator in the swap sequence \(({\mathbf {P}}_{best_i}^{(t)}-{\mathbf {x}}_i^{(t)})\) is selected, and the probability \(\beta \) with which each swap operator in the swap sequence (\({\mathbf {x}}_{cross}^{(t)}-{\mathbf {x}}_i^{(t)}\)) is selected, for inclusion in the updated velocity.
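Under a dict-based particle encoding (skill to expert index), Eq. (13) can be sketched as follows; reading the difference of two assignments as the swap sequence that converts one into the other, with each swap kept with probability \(\alpha \) or \(\beta \), is an assumption consistent with the description of \(\otimes \):

```python
import random

def diff_sequence(target, current):
    # (target - current): one NSO per skill slot where the assignments differ.
    return [(s, current[s], target[s])
            for s in current if current[s] != target[s]]

def update_velocity(v, x, pbest, x_cross, alpha=0.5, beta=0.5):
    # Eq. (13): v ⊕ α⊗(pbest - x) ⊕ β⊗(x_cross - x).
    # "⊕" concatenates swap sequences; "⊗" keeps each swap with the
    # given probability.
    new_v = list(v)
    new_v += [so for so in diff_sequence(pbest, x) if random.random() < alpha]
    new_v += [so for so in diff_sequence(x_cross, x) if random.random() < beta]
    return new_v
```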

Particle position update

Particle positions are updated according to Eq. 14 by applying the sequence of new swap operators \([{\text {NSO}}(s)]\) to the current particle in order to obtain a new particle with a new position. All previous processes are repeated until the maximum number of iterations is reached.

$$\begin{aligned} {\mathbf {x}}_i^{(t+1)}={\mathbf {x}}_i^{(t)}\oplus {\mathbf {v}}_i^{(t+1)}, \;\;\;i=\{1,\ldots , SW\} \end{aligned}$$
(14)

Example of IPSONSO for team formation problem

In the following example, we consider a given project Pr which requires a set of skills to be accomplished, i.e., \(Pr=\{Network,Analysis,Algorithm\}\). Also, assume there exists a set of five experts (a, b, c, d, e) associated with the following skills: \(s(a)=\{Network,Algorithm,Search\}\), \(s(b)=\{Algorithm,Classification,Network\}\), \(s(c)=\{Detection,Analysis\}\), \(s(d)=\{Analysis,Graph\}\), \(s(e)=\{Network,Analysis\}\).

The relationship between experts is represented by a social network where the nodes represent experts and the edges represent the communication cost (i.e., weight) between two experts as shown in Fig. 4.

Fig. 4
figure 4

The relationship between experts

The weight between experts can be computed as shown in Eq. (1).

Some of the teams that have the required skills can be formed, such as \(T_1=\{a,c\}\), \(T_2=\{a,d\}\), \(T_3=\{a,e\}\), \(T_4=\{b,c\}\), \(T_5=\{a,b,c\}\) and \(T_6=\{a,d,e\}\). The communication costs of the formed teams are as follows: \(C(T_1)=\infty \), \(C(T_2)=0.8\), \(C(T_3)=0.8\), \(C(T_4)=\infty \), \(C(T_5)=0.66\), \(C(T_6)=1.6\).

A particle in the IPSONSO algorithm is an array list of size \(1 \times 3\), where the first needed skill is “Network,” the second is “Analysis,” and the third is “Algorithm,” as shown in Fig. 5. Figure 5 represents the possible values for each index of a particle in the IPSONSO algorithm. For the required \({ {skill}}_{id}=1\), for instance, there are three experts that have this skill, i.e., (a, b, e).

Fig. 5
figure 5

Particle representation in the IPSONSO algorithm

Table 2 Example of two particles and their velocities
Fig. 6
figure 6

Example of single-point crossover

In the following subsection, the main steps of the proposed algorithm are highlighted when it is applied on the random dataset as described in Sect. 3.5 and shown in Figs. 4 and 5.

  • Initialization In the IPSONSO algorithm, the initial population (particles) and their velocities are generated randomly. Each velocity is a swap sequence (i.e., a sequence of swap operators), each represented by a tuple \(<x,y,z>\), where x is the \({ {skill}}_{id}\) and y and z are the indices of the current and the new expert, respectively. An example of the initialization of two particles A, B is shown in Table 2.

  • Particles evaluation The communication cost for each particle is computed as: \(C(A) = \infty \), and \(C(B) = 1.55\).

  • Particle positions and velocities update The particle with minimum weight among all evaluated particles is considered as a gbest (particle B in our example), and the local best is assigned for each particle as pbest. In each iteration, the updated velocities and particle positions are computed as shown in Equations 13 and 14, respectively.

  • Crossover The single-point crossover is applied between the gbest and particle A as shown in Fig. 6. The particle with minimum weight is chosen as a result of crossover, in our example \(C(A1)= 1.55\) and \(C( A2)=0.66\). Therefore, the \({\mathbf {x}}_{cross}^{(t)}\) particle is \(A_2\) = (a,c,b).

  • Velocity update. The velocity of particle A is calculated as follows. \(v^{(t)} = ((1,1,2),(2,1,2)) \oplus (3,1,0)= ((1,1,2),(2,1,2),(3,1,0))\).

  • Particle position update. \(A^{(t=1)}= (a,c,a) + ((1,1,2),(2,1,2),(3,1,0)) = (e,c,a)+ ((2,1,2),(3,1,0)) = (e,e,a)+((3,1,0))=(e,e,a)\)

  • Particles evaluation. Particle A (a,c,a) is updated to (e,e,a), and its communication cost is \(C(A)=0.8\).

    The same processes are applied for particle B. In the next iteration, the pbest of particle A is updated, having changed from \(\infty \) to 0.8, and likewise gbest is updated to the particle with the minimum communication cost. After a number of iterations, the most feasible team found so far for the required skills is formed (i.e., the global best particle gbest so far).

Numerical experiments

Ten experiments are performed on a random dataset, as described in Sect. 3.5, with different numbers of skills and experts to evaluate the performance of the proposed algorithm, which iteratively minimizes the communication cost among team members. The proposed algorithm is compared against the standard PSO (SPSO), and its performance is also investigated on the real-life DBLP dataset. The experiments are implemented in Eclipse Java EE IDE V-1.2 running on an Intel(R) Core i3 CPU at 2.53 GHz with 8 GB RAM under Windows 7.

Parameter setting

In this subsection, the parameter setting of the proposed algorithm is highlighted, which is used in the ten experiments for a random dataset. The parameters are reported in Table 3.

Table 3 Parameter setting

Random dataset

In this subsection, the performance of the proposed algorithm is investigated on the random dataset described in Sect. 3.5. The proposed algorithm is applied with different numbers of experts and skills, and its results are reported in the subsequent subsections.

Comparison between SPSO and IPSONSO on random data

The first test of the proposed algorithm is to compare it against the standard PSO (SPSO) to verify its efficiency. The results are reported in Table 4. In Table 4, the minimum (min), maximum (max), average (mean) and the standard deviation (SD) of the results are reported over 50 random runs. The best results are reported in bold font. The results in Table 4 and Fig. 7 show that the proposed algorithm is better than the standard PSO.

Table 4 Comparison between SPSO and IPSONSO on random data (numerical results)
Fig. 7
figure 7

Comparison between SPSO and IPSONSO on random dataset

Also, the performance of the SPSO and the IPSONSO is shown in Fig. 8 by plotting the number of iterations versus the communication costs. The solid line represents the results of the proposed algorithm, while the dotted line represents the results of the standard PSO (SPSO). The results in Fig. 8 show that the proposed algorithm can obtain minimum communication cost faster than the standard PSO.

Fig. 8
figure 8

Comparison between SPSO and IPSONSO on random dataset on average communication cost

DBLP: real-life data

In this work, the DBLP dataset is used, extracted from the DBLP XML released in July 2017. The DBLP dataset is one of the most popular sources of open bibliographic information on computer science journals and proceedings, and it can be extracted in the form of an XML document type definition (DTD). Four tables are constructed as follows.

  1. Author (name, paper_key), 6054672 records.

  2. Citation (paper_cite_key, paper_cited_key), 79002 records.

  3. Conference (conf_key, name, detail), 33953 records.

  4. Paper (title, year, conference, paper_key), 1992157 records.

Our attention is focused on papers published in 2017 only (22364 records). The DBLP dataset is then restricted to the following 5 fields of computer science: databases (DB), theory (T), data mining (DM), artificial intelligence (AI), and software engineering (SE).

In order to construct the DBLP graph, the following steps are applied.

  • The expert set consists of the authors who have at least three papers in DBLP (77 such authors).

  • Two experts are connected if they share papers’ skills. The communication cost \(c_{ij}\) of expert i and j is estimated as shown in Eq. (1).

  • The most important shared skills among the experts are extracted from the titles of 267 papers using StringTokenizer in Java.

It is worth mentioning that the papers of the 10 major conferences in computer science (1707 records) are included. Five experiments are conducted, and the average results are taken over 50 runs. The skills are selected randomly from the most shared skills among authors, with an initial population of 3 and 10 iterations.

Comparison between SPSO and IPSONSO on DBLP dataset

In this subsection, the proposed algorithm is compared against the standard PSO (SPSO) with different numbers of experts and skills for DBLP dataset by reporting the maximum (max), average (mean) and standard deviation (SD) in Table 5.

Table 5 Comparison between SPSO and IPSONSO on DBLP data (numerical results)

Also, in Fig. 9, the results of the standard PSO (SPSO) and the proposed IPSONSO are presented by plotting the number of iterations versus the CI of average communication cost. The solid line represents the results of the proposed IPSONSO, while the dotted line represents the results of the SPSO. The result in Fig. 9 shows that the performance of the proposed algorithm is better than the performance of SPSO.

Fig. 9
figure 9

Comparison between SPSO and IPSONSO on DBLP data on average communication cost

Confidence interval (CI)

A confidence interval (CI) measures the probability that a population parameter falls between two set values (an upper and a lower bound). It is constructed at a confidence level (C) such as 95% (i.e., a 95% CI). The 95% confidence interval uses the sample’s mean and standard deviation, assuming a normal distribution. The CI can be computed as follows.

$$\begin{aligned} {\text{CI}} = {{mean}} \pm {{confidence}}(\gamma ,\, {\text{SD}},\, {\text{sample~size}}) \end{aligned}$$
(15)

where \(\gamma \) depends on the confidence level (i.e., \(\gamma =1-C\)), SD is the standard deviation of the sample, and sample size is the size of the population. In the case of a 95% CI, \(\gamma =(1-0.95)=0.05\). The CI is used to approximate the mean of the population.

The performance (\(\%\)) of the compared algorithms can be computed as shown in Eq. (16).

$$\begin{aligned} {{Performance}} (\%) = \frac{(Avg_{({\text{SPSO}})}- Avg_{({\text{IPSONSO}})})}{Avg_{({\text{SPSO}})}} \end{aligned}$$
(16)

where \(Avg_{({\text{SPSO}})}\) and \(Avg_{({\text{IPSONSO}})}\) are the average results obtained from the SPSO and IPSONSO algorithms, respectively.
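As a sketch, the 95% CI and the performance measure of Eq. (16) can be computed as follows (Python; z = 1.96 is the standard normal quantile for 95% confidence, an assumption consistent with the normal-distribution remark above):

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    # CI = mean ± z * SD / sqrt(sample size), assuming normality (Eq. 15).
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

def performance(avg_spso, avg_ipsonso):
    # Eq. (16): relative reduction of IPSONSO's average communication
    # cost with respect to SPSO's.
    return (avg_spso - avg_ipsonso) / avg_spso
```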

Confidence interval (CI) for random data

In the following tables, the CI of the average communication cost is presented for 10 experiments on randomly generated data. The results in Table 6 show the average communication cost for experiments 1 and 2. In Table 6, the communication cost of IPSONSO decreases over the iterations faster than that of SPSO, achieving a better performance ranging from 8% at the second iteration to 19% at the last iteration for experiment 1, while the improvement over SPSO ranges from 4 to 10% in experiment 2.

Table 6 CI on average communication cost for experiments 1 and 2

The results of experiments 3 and 4 are reported in Table 7. In Table 7, IPSONSO is better and more efficient than SPSO: its average communication cost is reduced by 5 to 13% over the iterations in experiment 3, and by 2 to 9% by the end of the iterations in experiment 4, compared with the SPSO results.

Table 7 CI on average communication cost for experiments 3 and 4

In Table 8, for experiment 5, the average communication cost of the IPSONSO solution improves by 2 to 8% compared with SPSO over the iterations, and for experiment 6 the proposed IPSONSO proves its efficiency for team formation with a communication cost 3 to 7% lower than SPSO.

Table 8 CI on average communication cost for experiments 5 and 6

In Table 9, the results of experiment 7 show that IPSONSO achieves a performance improvement of up to 8% over SPSO with respect to the average communication cost along the iterations, and for experiment 8 the average communication cost of the proposed IPSONSO is reduced by 8% over the 20 iterations when compared with SPSO.

Table 9 CI on average communication cost for experiments 7 and 8

In Table 10, the results of experiment 9 show that the average communication cost of IPSONSO improves by 3 to 9% over the iterations compared with the SPSO solution, and the results of experiment 10 show that the average communication cost of IPSONSO is reduced iteratively, achieving a performance 8% better than SPSO even for large numbers of experts and skills.

Table 10 CI on average communication cost for experiments 9 and 10

In Fig. 10, the CI of the proposed algorithm is presented against the standard PSO for different skill numbers by plotting the number of iterations against the CI on average communication cost. The solid line represents the results of the proposed algorithm, while the dotted line represents the standard PSO. The results in Fig. 10 show that the proposed algorithm is better than the standard PSO.

Fig. 10
figure 10

Confidence interval of SPSO and IPSONSO on random data

Confidence interval (CI) of SPSO and IPSONSO for DBLP dataset

In this subsection, the CI of SPSO and IPSONSO for the DBLP dataset is reported with different numbers of skills, as shown in Tables 11, 12 and 13. The results in Table 11 show that the average communication cost of IPSONSO is better than that of SPSO over the iterations, with improvements of up to 5 and 8% for 2 and 4 skills, respectively, when compared with SPSO (Fig. 11).

Table 11 CI on average communication cost for 2 and 4 skills in DBLP dataset
Fig. 11
figure 11

Confidence interval of SPSO and IPSONSO on DBLP dataset

In Table 12, the results of SPSO and IPSONSO are reported for 6 and 8 skills. The results in Table 12 show that IPSONSO obtains better and more efficient results than SPSO: its average communication cost goes down by 3 to 5% during the iterations for 6 skills, and by up to 6% compared with SPSO for 8 skills.

Table 12 CI on average communication cost for 6 and 8 skills in DBLP dataset

Finally, the IPSONSO algorithm achieves performance improvements over SPSO ranging from 2 to 7% with respect to the average communication cost, across the iterations and the numbers of skills.

Table 13 CI on average communication cost for 10 skills in DBLP dataset

The results in Tables 11, 12 and 13 and Fig. 11 show that the performance of the proposed algorithm is better than that of the standard PSO algorithm.

Average processing time of SPSO and IPSONSO on DBLP dataset

The average processing time (in seconds) of SPSO and IPSONSO is reported in Table 14 over 30 runs. The time for forming a team using the proposed IPSONSO increases almost linearly with the number of skills; IPSONSO requires 8 to 34% more processing time than SPSO due to processing factors such as the crossover and swap sequence operators.

Table 14 Average processing time of SPSO and IPSONSO on DBLP dataset

Conclusion and future work

The team formation problem is the problem of finding a group of team members with the required skills to perform a specific task. In this study, a new particle swarm optimization algorithm with a new swap operator is investigated to solve the team formation problem. The proposed algorithm is called improved particle swarm optimization with a new swap operator (IPSONSO). In the IPSONSO algorithm, a new swap operator NSO(x, y, z) is proposed, where x is the \({ {skill}}_{id}\) and y, z are the indices of experts that have the skill from the experts’ list. Invoking the single-point crossover in the proposed algorithm exploits the promising areas of the solution space and accelerates convergence by mating the global best solution with a randomly selected solution. The performance of the proposed algorithm is investigated in ten experiments with different numbers of skills and experts and five experiments on the real-life DBLP dataset. The results show that the proposed algorithm can obtain promising results in reasonable time. In future work, the combination of the proposed algorithm with other swarm intelligence algorithms will be considered to accelerate its convergence and avoid premature convergence. It is also worthwhile to test the proposed algorithm on various benchmark nonlinear mixed integer programming problems.