Introduction

Frequently changing demands, global competition and technological advances pose new challenges to the manufacturing sector. Traditional centralized hierarchical organizations cannot respond effectively to rapidly changing demands, innovative production processes and highly dynamic business partnerships in supply chains. Taking advantage of technological advances to support operations and create competitive advantage is critical for enterprises to survive. To reap the potential technological benefits, new organizational structures and strategies must be developed to effectively manage business processes/workflows, resources and changes in the business environment in support of inter-enterprise collaboration.

Manufacturers rely on cooperation and collaboration with business partners to share costs, risks and expertise, as no single company has all the expertise needed. The partners collaborate to achieve a business goal by forming a supply chain network. A supply chain is a system of organizations, people, activities, information and resources involved in transforming resources, materials and components into a finished product that is delivered to the end customer. Readers may refer to the book by Hugos (2018) for an introduction to supply chain management, to the recent works by Moktadir et al. (2018a, b) on sustainable supply chain management and to Monostori (2018) on robustness of supply chains. The essence of supply chain management is the systemic, strategic coordination of the traditional business functions and the tactics across these business functions, within a particular company and across businesses within the supply chain, for improving long-term performance (Mentzer et al. 2001). Lambert et al. defined supply chain management as “the integration of key business processes from end user through original suppliers, that provides products, services, and information that add value for customers and other stakeholders” (Lambert et al. 2006). The concept of virtual enterprises (VE) makes it possible to achieve business goals through dynamic coalition and sharing of core competencies and resources in supply chains. It also poses new challenges and issues (Petrie and Bussler 2003). A wide variety of research issues and topics of VE have been studied, including cooperation/coordination by Camarinha-Matos and Pantoja-Lima (2001), formation by Afsarmanesh and Analide (2009), Camarinha-Matos et al. (2009), Hoffner et al. (2001), partner selection by Camarinha-Matos and Cardoso (1999), Hsieh and Lin (2012), planning and control by Soares et al. (2000), McFarlane and Bussmann (2000), dynamic network process management by Grefen et al. (2009), dynamic process composition by Hsieh and Chiang (2011) and design and implementation of automated procurement systems by Jagdev et al. (2008) in VE. Recently, Samdantsoodol et al. (2017) studied how to predict the relationships between VE and agility in supply chains. Kovács and Kot (2017) studied economic and social effects of novel supply chain concepts and VE. However, these works do not address the scalability of supply chains and VE. Scalability is an important issue, but it has received little study in the context of supply chains and VE. Putnik et al. (2013) define scalability as the design of a system with an adjustable structure that enables system adjustment in response to market demand changes. Scalability is a system feature that offers potential for resolving a number of problems in the design of supply chains and the operation of VE.

In the existing literature, several issues of supply chain management have been studied. Liu and Chung (2017) considered a two-stage supply chain problem in which the first stage is to produce jobs by several suppliers and the second stage is to transport those jobs by a number of vehicles. They established a centralized mathematical model to describe the problem and developed a solution algorithm. However, the scheduling problem for multistage supply chains was not addressed by Liu and Chung (2017). Ivanov et al. (2016) studied a two-stage supply chain with job shop processes at each supplier stage. They introduced a robust analysis of schedule coordination in the presence of disruptions in capacities and supply to derive managerial insights for the scheduling problem and dynamic control methods for supply chain coordination. However, the study of Ivanov et al. (2016) needs to be extended to multistage supply chains. To respond to business opportunities, an important issue is to develop a flexible, reconfigurable and scalable technology for integrating the collaborative workflows or processes of partners in supply chains.

In this paper, we focus on dynamic configuration and collaborative workflow scheduling in VE to attain flexibility, reconfigurability and scalability. The problem is to develop a solution methodology for the partners to configure their resources and create workflow schedules that fulfill customers’ orders on time under workflow and resource constraints. An effective scheme for managing collaborative workflows in supply chain networks should provide a methodology that is flexible, reconfigurable and scalable to respond to business opportunities. We propose an architecture and a design methodology to reduce the cost and time of developing software for managing collaborative workflows. In addition to developing software to support workflow scheduling in supply chains, we also study the scalability of our approach in terms of response time. Response time is an important performance index for measuring the scalability of a supply chain management method. To be applicable in supply chains, response time should remain acceptable as the number of partners in the network grows. We exploit recent advancements in multi-agent systems, scheduling theory and algorithms to propose a scalable method to dynamically and collaboratively configure and schedule workflows in supply chains.

To achieve flexibility, reconfigurability and scalability, several requirements must be met. First, the workflow of a company in supply chains must be described and specified by a standard format. Second, the multi-agent system platform used for implementation must also support information infrastructure and interaction protocols/mechanism defined by industrial standard organization to attain interoperability between agents. Third, the scheduling algorithm must be developed based on dynamic publication and discovery of services. Fourth, the scheduling method must be scalable by taking advantage of distributed computing architecture based on a divide-and-conquer strategy. Fifth, to achieve interoperability, all information in the negotiation processes, including call for proposals, proposals, awarding of contracts and establishment of contracts, must be described based on a standard format such as XML.

To propose a pragmatic, sustainable, flexible and scalable methodology for solving the workflow management problem, a proper architecture and suitable models must be adopted. Nilsson (1998) and Ferber (1999) indicated that the distributed architecture of multi-agent systems (MAS) and agents’ characteristics of autonomy and cooperation make MAS a potential model for managing collaborative workflows. Cooperative distributed problem solving (CDPS) by Durfee et al. (1989) is a technique for loosely coupled network of problem solvers to work together to solve problems that are beyond their individual capabilities. It is an approach for solving a problem based on coordination and cooperation in multi-agent systems (MAS). Our solution methodology combines MAS architecture by Ferber (1999) and Nilsson (1998) with Petri net models (Murata 1989) and CDPS.

In our architecture, the workflow to be performed by an agent is represented by a workflow agent and each resource is modeled by a resource agent. The study by Wang et al. (2007) shows that MAS provide a flexible architecture for capturing the main features of VE and agent-based computer integrated manufacturing systems. In MAS, the most well-known protocol for coordination and negotiation is the contract net protocol (CNP) by Smith (1980). Many works, such as those by Parunak (1987), Ramos (1996), Neligwa and Fletcher (2003) and Hsieh and Lin (2014a, b), have studied the distribution of tasks in MAS with CNP. Our approach takes advantage of CNP and the service publication/discovery capabilities of MAS defined by FIPA. To facilitate representation of workflows, a modeling tool or specification language is required to represent atomic services. In the existing literature, many workflow specification languages have been proposed, for example XPDL by Workflow Management Coalition (1999) and Web Services Business Process Execution Language (WS-BPEL) by OASIS (2009). However, these workflow specification languages lack formal analysis methods. The works by van der Aalst (1998), van der Aalst and Kumar (2001) and Weske et al. (2004) indicate that Petri nets are an effective model for modeling and analysis of workflows. To endow each agent with the knowledge to perform operations in the workflows, we construct a timed Petri net (TPN) model for each workflow agent and resource agent. The Petri Net Markup Language (PNML) by Weber and Kindler (2002) and Billington et al. (2003) is an XML-based interchange standard for Petri nets. Therefore, we adopt PNML as the format for representing the Petri net models. Our approach uses a software module to formulate the scheduling problem based on the Petri net models and the order requirements. As a result, the cost and time involved in the development of scheduling software can be significantly reduced.
The collaborative workflow scheduling problem can be decomposed into a number of interrelated workflow scheduling subproblems that are solved by individual agents. To schedule workflows, we first transform the TPN models into network models and then develop a scheduling algorithm by combining network models, Lagrangian relaxation and subgradient algorithm. We illustrate our dynamic configuration and collaborative workflow scheduling method by examples.

Based on the proposed dynamic configuration and workflow scheduling method, we will present the analysis to show that our method is scalable in terms of response time as the size of supply chain network grows. The response time is the longest response time of all directed paths that start with a leaf node and end with the final node of a supply chain network. We also illustrate scalability of our approach by examples. We have compared our approach with an industrial centralized problem solver used in the existing literature. In the work by Liu and Chung (2017), a two-stage supply chain is considered and a centralized mathematical model is established to describe the problem and develop a solution algorithm. Both our analysis and numerical results in “Scalability analysis and verification by examples” section indicate that our approach is much more efficient than the centralized industrial problem solver as the supply chains grow.
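The response time definition above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the response time of a path is the sum of the response times of the nodes on it; the six-node network, the node names and the per-node times are hypothetical and are not taken from the paper.

```python
from functools import lru_cache


def network_response_time(upstream, node_time, final_node):
    """Longest cumulative response time over all leaf-to-final directed paths.

    upstream:  dict mapping a node to the list of its upstream nodes
               (a node without an entry is a leaf).
    node_time: dict mapping a node to its own response time (hypothetical values).
    """

    @lru_cache(maxsize=None)
    def longest(node):
        preds = upstream.get(node, [])
        # A leaf contributes only its own time; any other node must wait
        # for the slowest path that reaches it from upstream.
        slowest_upstream = max((longest(p) for p in preds), default=0.0)
        return node_time[node] + slowest_upstream

    return longest(final_node)


# Hypothetical network: C1 is the final node; C4, C5 and C6 are leaves.
upstream = {"C1": ["C2", "C3"], "C2": ["C4", "C5"], "C3": ["C6"]}
node_time = {"C1": 1.0, "C2": 2.0, "C3": 0.5, "C4": 1.5, "C5": 3.0, "C6": 4.0}
print(network_response_time(upstream, node_time, "C1"))
```

Under this assumption the response time is governed by the slowest leaf-to-final path (here C5, C2, C1), not by the total number of nodes in the network.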

The remainder of this paper is organized as follows. In “Dynamic configuration and workflow scheduling of supply chain networks” section, we describe the dynamic configuration and workflow scheduling problem in supply chains, and we introduce our approach in “A model-based collaborative scheduling approach” section. In “Subgradient method for scheduling collaborative workflows” section, we propose a subgradient method for scheduling collaborative workflows. In “Agent interaction model for collaborative scheduling” section, we introduce our agent interaction model for configuring and scheduling collaborative workflows. We present the experimental results by examples in “Numerical results” section. In “Scalability analysis and verification by examples” section, we conduct scalability analysis and verify our analysis by examples. We conclude this paper in “Conclusions” section.

Dynamic configuration and workflow scheduling of supply chain networks

Figure 1 illustrates a supply chain formed by six companies. For a manufacturer, usually there are several activities performed in their operations, from order management, product design, process design, manufacturing to delivery. Although there are a variety of tools that support these activities individually, there is still a lack of methodology that addresses how to effectively configure supply chain networks and manage relevant workflows of partners. As the above-mentioned activities are usually distributed, an urgent need is to develop a framework to support dynamic configuration of supply chain networks and optimization of workflows in collaborative and distributed environment. In this paper, we focus on the development of a software platform to support dynamic supply chain network configuration and collaborative workflow management.

Fig. 1 Manufacturers in a supply chain

Note that formation of a supply chain network and scheduling of workflows in the supply chain network are two related problems. In practice, a two-stage process is usually adopted to find a solution. At the first stage, generation of a supply chain network is done first. At the second stage, scheduling of workflows is then done for the generated supply chain network. However, such a two-stage process may not lead to satisfactory results as the configuration of a supply chain network will influence the performance of workflows. In addition, a supply chain network formed at the first stage may not be able to generate feasible schedules to meet the order requirements due to insufficient capacity of the resources provided by the partners. Therefore, an effective approach should generate the supply chain network and schedules simultaneously and dynamically. Workflow scheduling in a supply chain is a complex issue. In supply chains, the requirements of an order are specified by the product demands, price and due date. The decisions of a company depend on those of its upstream partners and have influence on its downstream partners. The decisions of different companies in a supply chain must be coherent so that the customers’ order requirements can be met timely and cost effectively. Development of an effective workflow scheduling method is an important research issue in supply chain management.

Figure 2 illustrates the collaboration of six companies \(C_{1}\) through \(C_{6}\). Company \(C_{n}\) must create an associated workflow schedule \(S_{n}\) to meet the order requirements. To develop an effective scheme to schedule workflows based on collaboration of partners, a problem formulation is required. Let \(\{ 1,2, \ldots ,N\}\) be the set of companies involved in the scheduling decisions. We use \(w_{n}\) to represent workflow agent corresponding to \(C_{n}\). The workflow of company \(n \in \{ 1,2, \ldots ,N\}\) is described by a workflow model. Let \(Q\) denote the product demand of order \(o\) placed to a supply chain. Let \(\psi\) denote the due date for completing an order \(o\). The set of all workflow agents is denoted by WA. The operations in the workflows need to be performed by some resources. Let \(\Re\) denote the set of all resources in the system. The activities of a resource are described by a resource activity model \(a_{r}\), \(r \in \Re\). In our workflow scheduling system, we use \(a_{r}\) to represent resource agent corresponding to resource \(r \in \Re\). Let WR denote the set of all resource agents in the system.

Fig. 2 Dependency of workflow scheduling decisions in a supply chain

A supply chain network for handling an order \(o\) is denoted by a digraph \({\text{SCM}}({\text{WA}} \cup {\text{RA}},E)\), where \({\text{WA}} \subseteq {\mathbf{WA}}\) is the set of nodes of workflow agents in \({\text{SCM}}\), \({\text{RA}} \subseteq {\mathbf{WR}}\) denotes the set of nodes of resource agents that take part in the activities in \({\text{SCM}}\) and \(E\) is the set of arcs connecting nodes. An arc in \(E\) represents the dependency between two workflow agents. Figure 3 shows the digraph representation of a supply chain network associated with Fig. 2.

Fig. 3 Digraph representation of a supply chain

Note that \({\text{SCM}}({\text{WA}} \cup {\text{RA}},E)\) only defines the structure of a supply chain. A supply chain must be constructed dynamically to respond to business opportunities. In addition, the operations of \({\text{SCM}}\) must be scheduled properly for each \(a \in {\text{WA}} \cup {\text{RA}}\). Therefore, the workflow management problem in \({\text{SCM}}\) can be broken down into two related subproblems: (1) configuration/formation of \({\text{SCM}}({\text{WA}} \cup {\text{RA}},E)\) and (2) scheduling of operations/workflows for each agent involved in \({\text{SCM}}({\text{WA}} \cup {\text{RA}},E)\) to meet the order requirements, including product demand and due date. The aforementioned problem calls for the development of a problem-solving mechanism to determine whether there exist \({\text{WA}}\), \({\text{RA}}\), where \({\text{WA}} \subseteq {\mathbf{WA}}\) and \({\text{RA}} \subseteq {\mathbf{WR}}\), and associated schedules \(S_{a}\) for each \(a \in {\text{WA}} \cup {\text{RA}}\) such that the order requirements can be met. To solve this problem, a divide-and-conquer approach that combines distributed computation capability of MAS, formal models of agents’ workflows/activities and optimization theory is adopted. We develop a solution methodology to solve the dynamic supply chain configuration and collaborative workflow scheduling problem based on interaction of agents and application of optimization technique to individual agents. To facilitate negotiation between order agents, workflow agents and resource agents, formal models for workflow agents and resource agents are proposed in the next section.

A model-based collaborative scheduling approach

Our approach to workflow scheduling relies on interaction between order agents, manager agent, workflow agents, resource agents and collaborative scheduling agents. Figure 4 shows the connection between an order agent, workflow agents, resource agents and collaborative scheduling agents, where a collaborative scheduling agent consists of two procedures: automated scheduling problem formulation based on network flow model construction and a subgradient-based algorithm for solving the scheduling problem. In Fig. 4, Order Agent 1 places an order to Workflow Agent 1, which invokes Collaborative Scheduling Agent 1. Note that in the process of scheduling, Collaborative Scheduling Agent 1 interacts with Resource Agent 1 and Resource Agent 2. Workflow Agent 1 then requests Workflow Agent 2, which is upstream of Workflow Agent 1, to invoke Collaborative Scheduling Agent 2 to schedule operations. Collaborative Scheduling Agent 2 then interacts with Resource Agent 3 and Resource Agent 4. Interactions among different types of agents are based on a negotiation mechanism that extends the well-known CNP by Smith (1980). CNP relies on an infrastructure that lets individual agents publish and discover services, and thereby find other related agents, and communicate with each other in the agent communication language (ACL) defined by the FIPA international standard. To realize the proposed idea, a platform that supports the development of multi-agent systems and the publishing/discovery of agent services is required. The Java Agent Development Framework (JADE) is a multi-agent platform that fulfills the aforementioned requirements. Therefore, we develop a system based on JADE to realize our methodology.

Fig. 4 Architecture for scheduling collaborative workflows

Petri nets are a powerful tool to model workflows and activities in a supply chain, and we use them to model the workflow agents and resource agents. A brief introduction to Petri nets can be found in the paper by Murata (1989). A timed Petri net (TPN) \(G\) is a five-tuple \(G = (P,T,F,m_{0} ,\mu )\), where \(P\) is a finite set of places, \(T\) is a finite set of transitions, \(F \subseteq (P \times T) \cup (T \times P)\) is the flow relation, \(m_{0} :P \to Z^{\left| P \right|}\) is the initial marking of the TPN with \(Z\) as the set of nonnegative integers and \(\mu :T \to R^{ + }\) is a mapping that specifies the firing time for each transition performed by \({\text{RA}}\). The marking of \(G\) is a vector \(m \in Z^{\left| P \right|}\) that indicates the number of tokens in each place and represents the state of the system. In a TPN, \({}^{ \cdot }t\) denotes the set of input places of transition \(t\) and \(t^{ \cdot }\) denotes the set of output places of transition \(t\). A transition \(t\) is enabled and can be fired under a marking \(m\) if and only if \(m(p) \ge F(p,t)\;\forall p \in {}^{ \cdot }t\). Firing a transition once removes one token from each of its input places and adds one token to each of its output places. To model the activity of a workflow agent as a Petri net, we use a place to represent a state in the workflow and a transition to represent an event or operation that brings the workflow from one state to another.
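The enabling and firing rules can be made concrete with a minimal Python sketch. It assumes unit arc weights, so a transition is enabled when each of its input places holds at least one token. The class name `TimedPetriNet`, its field names and the two-transition example net are illustrative assumptions, not part of the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class TimedPetriNet:
    pre: dict          # transition -> set of input places
    post: dict         # transition -> set of output places
    marking: dict      # place -> current number of tokens
    firing_time: dict  # transition -> firing time mu(t)

    def enabled(self, t):
        # Unit arc weights: every input place needs at least one token.
        return all(self.marking.get(p, 0) >= 1 for p in self.pre[t])

    def fire(self, t):
        """Fire t once, update the marking and return the firing time of t."""
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        for p in self.pre[t]:
            self.marking[p] -= 1                         # remove one token per input place
        for p in self.post[t]:
            self.marking[p] = self.marking.get(p, 0) + 1  # add one token per output place
        return self.firing_time[t]


# Hypothetical sequential workflow: p1 -> t1 -> p2 -> t2 -> p3.
net = TimedPetriNet(
    pre={"t1": {"p1"}, "t2": {"p2"}},
    post={"t1": {"p2"}, "t2": {"p3"}},
    marking={"p1": 1, "p2": 0, "p3": 0},
    firing_time={"t1": 2.0, "t2": 3.0},
)
elapsed = net.fire("t1") + net.fire("t2")
print(net.marking, elapsed)
```

In this sketch the marking dictionary plays the role of the vector \(m\), and the returned firing times correspond to \(\mu(t)\).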

Definition 3.1

The workflow of a workflow agent \(w_{n}\) is an acyclic timed marked graph (ATMG) \(W_{n} = (P_{n} ,T_{n} ,F_{n} ,m_{n0} ,\mu_{n} )\). As each transition represents a distinct operation in a task, \(T_{j} \cap T_{k} = \emptyset\) for \(j \ne k\). Each individual workflow must satisfy certain timing constraints so that the overall collaborative workflow can meet the timing requirements. The timing constraints for a workflow agent are determined by the timing constraints imposed on its downstream workflow agent.

A workflow cannot be performed without using the required resources. In each step of the workflow, specific resource requirements must be met to start its operation. Each operation in a workflow consumes a number of different types of resources. An activity is a sequence of operations to be performed by certain type of resources. A cycle indicates that the resource activity includes resource allocation and de-allocation. Each resource has an idle state and each resource activity starts and ends with an idle state. The Petri net model for the kth activity of resource agent \(a_{r}\) is described by a Petri net \(A_{r}^{k}\) that starts and ends with the resource idle state place \(p_{r}\) as follows.

Definition 3.2

Petri net \(A_{r}^{k} = (P_{r}^{k} ,T_{r}^{k} ,F_{r}^{k} ,m_{r0}^{k} ,\mu_{r}^{k} )\) denotes the activity model of the \(k\)th activity of resource agent \(a_{r}\), where \(a_{r} \in {\text{RA}}\). The initial marking \(m_{r0}^{k}\) is determined based on the number of resources allocated to the \(k\)th activity. There is no common transition between \(A_{r}^{k}\) and \(A_{r}^{{k{\prime }}}\) for \(k \ne k^{{\prime }}\). Note that \(\mu_{r}^{k}\) only specifies the firing time for each transition in \(A_{r}^{k}\). Figure 5a shows the model of a workflow agent. Figure 5b–d illustrates the resource activity models associated with the workflow in Fig. 5a.

Fig. 5 (a) An example of a Petri net model for a workflow agent \(w_{n}\); (b) Petri net models for three activities of two resource agents \(a_{{r_{1} }}\) and \(a_{{r_{2} }}\)

To formulate the scheduling problem for a collaborative scheduling agent, we first obtain the parameters from the corresponding timed Petri net models. In our system, each order agent places only one order. To formulate the problem, we define the following notations.

Notations:

\(O\): The number of order agents, i.e., \(O = \left| {\mathbf{OA}} \right|\).

\(N\): The number of workflow agents, i.e., \(N = \left| {\mathbf{WA}} \right|\).

\(K_{n}\): The number of different resource activities involved in \(W_{n}\).

\(k\): The index of the \(k\)th resource activity, \(A_{r}^{k}\), in \(W_{n}\); \(k \in \left\{ {1,2, \ldots ,K_{n} } \right\}\).

\(T\): The total number of time periods.

\(t\): A time period index; \(t \in \left\{ {1,2,3, \ldots ,T} \right\}\).

\(\Re\): The set of all resources in the system.

\(C_{rt}\): The capacity of resource \(r\) at time period \(t\), where \(C_{rt} = m_{r0}^{k} (r)\).

\(r_{k}\): The resource agent that performs the \(k\)th resource activity, \(A_{r}^{k}\), in \(W_{n}\).

\(\mu_{r}^{k} (t_{s}^{k} )\): The firing time for the starting transition \(t_{s}^{k}\) of the \(k\)th activity, \(A_{r}^{k}\).

\(\mu_{r}^{k} (t_{e}^{k} )\): The firing time for the ending transition \(t_{e}^{k}\) of the \(k\)th activity, \(A_{r}^{k}\).

\(\pi_{nk}\): The processing time of the \(k\)th resource activity \(A_{r}^{k}\) in \(W_{n}\).

\(D_{n}\): The quantity of products demanded by order \(o\).

\(d_{n}\): The due date of order \(o\) that is placed to workflow agent \(w_{n}\).

\(S_{nkt}\): The input buffer constraint of the \(k\)th resource activity of \(W_{n}\) at time period \(t \in \left\{ {1,2,3, \ldots ,T} \right\}\).

\(u_{onkt}\): The number of parts of order \(o\) loaded onto the corresponding resource \(r_{k}\) for processing the \(k\)th resource activity in \(W_{n}\) during time period \(t\), where \(u_{onkt} \in Z^{ + }\) and \(Z^{ + }\) is the set of nonnegative integers.

\(z_{ont}\): The number of parts of order \(o\) in workflow \(W_{n}\) finished during time period \(t\).

\(x_{onkt}\): The number of parts of order \(o\) at the input buffer of the \(k\)th resource activity in \(W_{n}\) at the beginning of period \(t\), where \(x_{onkt} \in Z^{ + }\).

Note that the due date \(d_{n}\) for workflow agent \(w_{n}\) is set by its downstream workflow agents in the negotiation processes.

Given \(W_{n}\), \(A_{r}^{k}\), \(\pi_{nk}\), \(D_{n}\) and \(d_{n}\), where \(n \in \left\{ {1,2,3, \ldots ,N} \right\}\) and \(k \in \left\{ {1,2,3, \ldots ,K_{n} } \right\}\), the problem for scheduling the parts in \(W_{n}\) requested by the orders is formulated as follows.

We now define the earliness/lateness penalty coefficient \(\theta_{ont}\) for each product of workflow \(W_{n}\) completed at time \(t\) as

$$\theta_{ont} = \begin{cases} 0 & \text{if } t = d_{n} \\ \psi (t - d_{n}) & \text{otherwise} \end{cases}, \quad \text{where } \psi (t - d_{n}) > 0\;\forall t \ne d_{n} \text{ and } d_{n} > 0$$
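As a small worked example of the penalty coefficient, the Python sketch below assumes a linear form for \(\psi\) with separate earliness and lateness weights. The paper does not specify \(\psi\); the function name `penalty_coefficient` and the default weights are hypothetical.

```python
def penalty_coefficient(t, due, early_weight=1.0, late_weight=2.0):
    """Earliness/lateness penalty per product completed in period t.

    Assumed form: psi(t - due) is proportional to the distance from the due
    date, so it is zero at t == due and strictly positive elsewhere.
    """
    if t == due:
        return 0.0
    gap = t - due
    return late_weight * gap if gap > 0 else early_weight * (-gap)


# With a due date of period 5, finishing in period 3 is early and period 7 is late.
print([penalty_coefficient(t, due=5) for t in (3, 5, 7)])
```

Any other choice of \(\psi\) that is positive for \(t \ne d_{n}\) fits the definition equally well; the linear form is used here only to keep the arithmetic easy to follow.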

The scheduling problem for workflow \(W_{n}\) is to find an allocation of resource capacities over the scheduling horizon that minimizes the total production costs while satisfying all production constraints. Mathematically, it is formulated as

$$(OP_{n} )\quad \min \sum_{o = 1}^{O} \sum_{n = 1}^{N} \sum_{t = 1}^{T} \theta_{ont} z_{ont}$$

subject to

$$\sum_{o = 1}^{O} \sum_{n = 1}^{N} \sum_{\tau = t - \pi_{nk} + 1}^{t} u_{onk\tau } \le C_{r_{k} t} \quad \forall k \in \left\{ 1,2, \ldots ,K_{n} \right\},\;\forall t \tag{3.1}$$

$$z_{ont} = u_{onK_{n} (t - \pi_{nK_{n}} )} \quad \forall o,\;\forall t \tag{3.2}$$

$$x_{on11} = D_{n} \tag{3.3}$$

$$\begin{aligned} x_{on1(t + 1)} &= x_{on1t} - u_{on1t} \\ x_{onk(t + 1)} &= x_{onkt} - u_{onkt} + u_{on(k - 1)(t - \pi_{n(k - 1)} )} \quad \forall k \in \left\{ 2,3, \ldots ,K_{n} \right\} \end{aligned} \tag{3.4}$$

$$x_{on(K_{n} + 1)(t + 1)} = x_{on(K_{n} + 1)t} + u_{onK_{n} (t - \pi_{nK_{n}} )} \tag{3.5}$$

$$x_{onkt} \le S_{nkt} \quad \forall k \in \left\{ 1,2, \ldots ,K_{n} \right\},\;\forall t \tag{3.6}$$

Note that constraints (3.1) are the capacity constraints, constraints (3.2) state the number of parts of order \(o\) in workflow \(W_{n}\) finished during time period \(t\) and constraints (3.3)–(3.5) are the flow balance equations. Constraints (3.6) state the buffer level of intermediate parts.
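To show how the constraints interact, the following Python sketch propagates buffer levels and finished parts for a given loading plan and checks the capacity constraints. It is a simplified illustration for a single order in a single workflow with capacities that are constant over time; the function names and the two-activity data are hypothetical, and the output buffer is assumed to accumulate the parts finished by the last activity.

```python
def load(u, k, t, horizon):
    """Parts loaded onto activity k (1-based) in period t; zero outside 1..horizon."""
    return u[k - 1][t - 1] if 1 <= t <= horizon else 0


def simulate_workflow(u, pi, demand, horizon):
    """Propagate buffer levels x and finished parts z for one order in one workflow."""
    K = len(pi)
    # x[k][t]: input buffer of activity k at the start of period t; row K + 1 is the output buffer.
    x = [[0] * (horizon + 2) for _ in range(K + 2)]
    x[1][1] = demand                                           # (3.3)
    for t in range(1, horizon + 1):
        x[1][t + 1] = x[1][t] - load(u, 1, t, horizon)         # (3.4), k = 1
        for k in range(2, K + 1):                              # (3.4), k >= 2
            arrivals = load(u, k - 1, t - pi[k - 2], horizon)
            x[k][t + 1] = x[k][t] - load(u, k, t, horizon) + arrivals
        # (3.5): finished parts accumulate in the output buffer
        x[K + 1][t + 1] = x[K + 1][t] + load(u, K, t - pi[K - 1], horizon)
    z = [load(u, K, t - pi[K - 1], horizon) for t in range(1, horizon + 1)]  # (3.2)
    return x, z


def capacity_ok(u, pi, capacity, horizon):
    """Check (3.1): parts in process on each activity never exceed its resource capacity."""
    for k in range(1, len(pi) + 1):
        for t in range(1, horizon + 1):
            in_process = sum(load(u, k, tau, horizon) for tau in range(t - pi[k - 1] + 1, t + 1))
            if in_process > capacity[k - 1]:
                return False
    return True


pi = [2, 1]                    # hypothetical processing times of activities 1 and 2
u = [[1, 0, 1, 0, 0, 0, 0],    # loads of activity 1 in periods 1..7
     [0, 0, 0, 1, 0, 1, 0]]    # loads of activity 2 in periods 1..7
x, z = simulate_workflow(u, pi, demand=2, horizon=7)
print(z, x[3][8], capacity_ok(u, pi, capacity=[1, 1], horizon=7))
```

In this plan the two parts finish in periods 5 and 7, and the window sum in `capacity_ok` counts every part that is still being processed in period \(t\), which is what the summation over \(\tau\) in (3.1) expresses.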

Subgradient method for scheduling collaborative workflows

In this paper, we combine optimization theory with multi-agent system architecture to allocate resources and perform the operations of workflows. We adopt a divide-and-conquer approach to perform optimization locally by each workflow agent involved and determine whether the temporal constraint can be satisfied based on the solutions of individual workflow agents. Optimization is achieved by each workflow agent that applies the Lagrangian relaxation technique to develop a solution algorithm for workflow scheduling. To apply the optimization scheme, an optimization problem is formulated based on transformation of the corresponding timed Petri net model. The structure of the scheduling problem faced by each workflow agent can be represented by a minimum cost flow (MCF) problem. The Lagrangian relaxation technique provides a systematic way to determine the cost of each arc in MCF.

In problem \(OP_{n}\), we observe that the coupling among production flows of different product types is caused by contention for resources. Based on this observation, we apply Lagrangian relaxation to relax resource capacity constraints (3.1) and form the Lagrangian function as

$$L(\lambda ) = \min \sum_{o = 1}^{O} \sum_{n = 1}^{N} \sum_{t = 1}^{T} \theta_{ont} z_{ont} + \sum_{k = 1}^{K_{n}} \sum_{t = 1}^{T} \lambda_{onkt} \left( \sum_{o = 1}^{O} \sum_{n = 1}^{N} \sum_{\tau = t - \pi_{nk} + 1}^{t} u_{onk\tau } - C_{r_{k} t} \right)$$

s.t. constraints (3.2), (3.3), (3.4), (3.5) and (3.6), where \(\lambda_{onkt}\) are the associated Lagrange multipliers, which must be nonnegative.

We define the optimization problem for type \(n\) workflow as follows:

$$MCF_{on} (v_{on} ,\lambda_{on} ) \equiv \min \left[ \sum_{t = 1}^{T} \left\{ \theta_{ont} z_{ont} + \sum_{k = 1}^{K_{n}} \lambda_{onkt} \sum_{n = 1}^{N} \sum_{\tau = t - \pi_{nk} + 1}^{t} u_{onk\tau } \right\} \right]$$

s.t. constraints (3.2), (3.3), (3.4), (3.5) and (3.6). Note that the flow balance equations described by constraints (3.2), (3.3), (3.4), (3.5) and (3.6) can be represented by a network flow model. For details, please refer to the work by Hsieh and Lin (2014b).

Subgradient-based algorithm to find a solution


Our approach to finding a solution of \(\max_{\lambda \ge 0} L(\lambda )\) is based on an iterative scheme for adjusting Lagrangian multipliers according to the solutions of MCF subproblems.

Let \(l\) be the iteration index. Let \(v^{l}\) denote the optimal solution to MCF subproblems for given Lagrange multipliers \(\lambda^{l}\) at iteration \(l\). We define the subgradients of \(L(\lambda )\) with respect to Lagrangian multipliers \(\lambda^{l}\) as follows:

$$g_{onkt}^{l} = \sum_{o = 1}^{O} \sum_{n = 1}^{N} \sum_{\tau = t - \pi_{nk} + 1}^{t} u_{onk\tau } - C_{r_{k} t} ,\quad \forall k = 1, \ldots ,K_{n} ,\;\forall t = 1, \ldots ,T$$

The subgradient method proposed by Polyak (1969) is adopted to update \(\lambda\) as follows:

$$\lambda_{onkt}^{l + 1} = \begin{cases} \lambda_{onkt}^{l} + \alpha^{l} g_{onkt}^{l} & \text{if } \lambda_{onkt}^{l} + \alpha^{l} g_{onkt}^{l} \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

where \(\alpha^{l} = \frac{\beta [\bar{L}(\lambda^{*} ) - L(\lambda^{l} )]}{\sum\nolimits_{k,t} (g_{onkt}^{l} )^{2}}\) and \(\bar{L}\) is an estimate of the optimal dual cost and \(0 < \beta < 2\).
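The multiplier update can be sketched in a few lines of Python. This is a minimal illustration of one projected subgradient step with the step size defined above, not the paper's implementation: multipliers and subgradients are flattened into plain lists over \((k,t)\), and the function name `subgradient_step` and the numerical values are hypothetical.

```python
def subgradient_step(lam, g, dual_estimate, dual_value, beta=1.0):
    """One projected subgradient update of the Lagrange multipliers.

    lam:           current multipliers, one per relaxed capacity constraint (k, t)
    g:             subgradients (resource usage minus capacity) at the current solution
    dual_estimate: estimate of the optimal dual cost
    dual_value:    dual cost L(lambda) at the current multipliers
    beta:          step-size parameter, expected to satisfy 0 < beta < 2
    """
    norm_sq = sum(gi * gi for gi in g)
    if norm_sq == 0:
        return list(lam)  # zero subgradient: the step size is undefined, keep the multipliers
    alpha = beta * (dual_estimate - dual_value) / norm_sq
    # Projection onto the nonnegative orthant keeps every multiplier >= 0.
    return [max(0.0, li + alpha * gi) for li, gi in zip(lam, g)]


# Hypothetical values: the first constraint is violated (g > 0), the second has slack (g < 0).
print(subgradient_step([0.0, 1.0], [2.0, -1.0], dual_estimate=10.0, dual_value=5.0))
```

A multiplier rises when its capacity constraint is violated, which makes the overloaded resource more expensive in the next round of MCF subproblems, and falls toward zero when the constraint has slack.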

A heuristic algorithm to adjust a dual solution


Iterative application of the subgradient algorithm will converge to an optimal dual solution (\(u^{*}\), \(\lambda^{*}\)). It should be emphasized that Lagrangian relaxation does not guarantee an optimal solution to the underlying problem. Thus, the solution generated may not satisfy the complementary slackness conditions. If the solution is not feasible, a heuristic algorithm is needed to find a feasible solution. In our system, we implement a simple heuristic algorithm that removes the excessive flows from the arcs with capacity violation by setting the arc capacity to zero and reroutes the excessive flows to other parts of the network based on an MCF algorithm.

Agent interaction model for collaborative scheduling

Resource agents, workflow agents, order agents and collaborative scheduling agents interact through a mechanism that extends the well-known contract net protocol, originally proposed by Smith (1980), by taking into account the dependency between workflows in supply chains. In the contract net protocol, an agent can play one of two roles: manager or bidder. Four stages are involved in establishing a contract between a manager and one or more bidders: (1) Call for proposals (CFP): The manager announces a task to all potential bidders. The announcement contains the description of the task. (2) Submission of proposals: On receiving the announcement, bidders capable of performing the task draw up proposals and submit them to the manager. (3) Awarding of contract: After receiving and evaluating the submitted proposals, the manager awards the contract to the best bidder. (4) Establishment of contract: The awarded bidder either commits itself to carrying out the task or refuses the contract by sending a message to the manager. In the latter case, the manager reevaluates the bids and awards the contract to another bidder.
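The four stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not use JADE or FIPA messages, "best" is taken to mean the lowest offered price, and the names `Bidder`, `contract_net`, the task labels and the prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bidder:
    name: str
    skills: frozenset      # tasks this bidder is able to perform
    price: float           # assumed bid value; lower is better
    commits: bool = True   # whether the bidder accepts an awarded contract

    def propose(self, task: str) -> Optional[float]:
        """Stage 2: submit a proposal only if capable of the task."""
        return self.price if task in self.skills else None

    def confirm(self, task: str) -> bool:
        """Stage 4: commit to the awarded contract or refuse it."""
        return self.commits


def contract_net(task, bidders):
    """Manager side: return the name of the contracted bidder, or None."""
    # Stage 1: the call for proposals reaches every potential bidder.
    proposals = []
    for bidder in bidders:
        offer = bidder.propose(task)
        if offer is not None:
            proposals.append((offer, bidder))
    # Stage 3: rank the proposals and award the contract to the best one.
    proposals.sort(key=lambda pair: pair[0])
    for _, bidder in proposals:
        # Stage 4: on refusal, fall back to the next best proposal.
        if bidder.confirm(task):
            return bidder.name
    return None


bidders = [
    Bidder("R1", frozenset({"op_a"}), 10.0),
    Bidder("R4", frozenset({"op_c"}), 30.0, commits=False),
    Bidder("R5", frozenset({"op_c"}), 40.0),
]
winner = contract_net("op_c", bidders)
```

Here the cheapest capable bidder refuses the contract, so the manager reevaluates the remaining bids and contracts the next one.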

Each workflow may rely on some type of products from other workflows and may produce some other type of products. Each workflow agent has an internal process flow, the required input types and output types. This leads to dependency between workflows. Due to the dependency between workflows, the original contract net protocol must be extended to be applied to solve the collaborative scheduling problem in supply chains.

Figure 6 shows the flowchart of a workflow agent. Initially, a workflow agent waits for a request from an order agent or from a downstream workflow agent. If the requested part type is not supported, the workflow agent does not respond to the request. If the part type is supported, the workflow agent queries the Directory Facilitator (DF) agent, which provides directory services on the JADE platform, to find the potential resource agents and applies the contract net protocol to determine the best proposals. The workflow agent then invokes the collaborative scheduling agent to schedule the workflow. Based on the schedule, the workflow agent requests its upstream workflow agents to schedule their workflows and waits for their confirmation messages. If it receives a negative confirmation, indicating that no feasible schedule exists, the negotiation process is aborted. Otherwise, it accepts the proposals of the resource agents.

Fig. 6
figure 6

Flowchart of a workflow agent
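The control flow of Fig. 6 can be approximated by the following Python sketch. It is a simplification under stated assumptions, not the JADE implementation: the contract net round with resource agents and the call to the collaborative scheduling agent are collapsed into a single hypothetical flag `schedulable`, and a plain dictionary `directory` stands in for the DF agent.

```python
class WorkflowAgent:
    def __init__(self, name, output_type, input_types, schedulable=True):
        self.name = name
        self.output_type = output_type   # part type this workflow produces
        self.input_types = input_types   # part types needed from upstream
        self.schedulable = schedulable   # stand-in for CNP plus scheduling

    def handle_request(self, part_type, directory, trace):
        """Return True (confirmed), False (infeasible) or None (no response)."""
        if part_type != self.output_type:
            return None                  # unsupported part type: stay silent
        if not self.schedulable:
            return False                 # no feasible local schedule
        trace.append(self.name)          # local schedule found
        for needed in self.input_types:
            suppliers = directory.get(needed, [])
            # Ask upstream workflow agents; one positive confirmation suffices.
            if not any(s.handle_request(needed, directory, trace) for s in suppliers):
                return False             # negative confirmation: abort
        return True


# The three workflows of a small chain: W3 needs the outputs of W1 and W2.
w1 = WorkflowAgent("W1", 1, [])
w2 = WorkflowAgent("W2", 2, [])
w3 = WorkflowAgent("W3", 3, [1, 2])
directory = {1: [w1], 2: [w2], 3: [w3]}
trace = []
confirmed = w3.handle_request(3, directory, trace)
```

The `trace` list records the order in which workflows obtain a local schedule, which mirrors the downstream-to-upstream propagation of requests.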

We have implemented a software system based on the methodology proposed in this paper. Each agent has a graphical user interface (GUI) and a software module to interact with other agents in the system. The requirements of an order agent are specified through its GUI and stored in XML format. A workflow agent provides a GUI to specify its properties, which are described by an XML file, and to represent its workflow model, which is a timed Petri net. The activities of the workflow to be performed by resource agents are also represented by timed Petri net models. The capabilities of a resource agent are defined through its GUI and are also described by an XML file. In summary, the inputs of order agents are the order requirements represented in XML, the inputs of workflow agents are the workflow Petri net models represented in PNML, and the inputs of resource agents are the activity Petri net models represented in PNML.
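To make the XML representation of order requirements concrete, the following Python sketch parses a small order document. The element and attribute names (`order`, `productType`, `quantity`, `dueDate`, `earlinessPenalty`, `latenessPenalty`) are hypothetical and are not the schema used by our system; only the kinds of fields (product type, quantity, due date and penalty costs) follow the order description in this paper.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical order document; the tag names are illustrative assumptions.
ORDER_XML = """
<order id="O1">
  <productType>3</productType>
  <quantity>5</quantity>
  <dueDate>2015-04-26T16:40</dueDate>
  <earlinessPenalty>20</earlinessPenalty>
  <latenessPenalty>40</latenessPenalty>
</order>
"""


def parse_order(xml_text):
    """Read the order requirements into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "product_type": int(root.findtext("productType")),
        "quantity": int(root.findtext("quantity")),
        "due": datetime.fromisoformat(root.findtext("dueDate")),
        "earliness_penalty": float(root.findtext("earlinessPenalty")),
        "lateness_penalty": float(root.findtext("latenessPenalty")),
    }


order = parse_order(ORDER_XML)
```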

Numerical results

Based on the algorithms proposed in the previous section, we verify our method by examples. We first use a small example to illustrate the functions of the software developed in this paper. We then present the results for several examples by applying our software.

Example 1

Consider three companies, A, B and C, which may cooperate to form a supply chain, as shown in Fig. 7. Company A produces type 1 parts, whereas Company B produces type 2 parts. Company C depends on type 1 parts from Company A and type 2 parts from Company B to produce the products (type 3 parts). Suppose Company C receives an order. The order requires five units of type 3 parts to be produced by 16:40 on April 26, 2015 (\(d_{1}\)). To apply our software, Company A, Company B and Company C must first define their workflows and resources. The GUI for defining the properties of a workflow agent is shown in Fig. 8. “Appendix A.1” shows the PNML models for workflow agents \(W_{1}\), \(W_{2}\) and \(W_{3}\). “Appendix A.2” shows the PNML models for activities \(A_{r1}^{k}\), \(A_{r2}^{k}\) and \(A_{r3}^{k}\) of resource agents. The firing times of the transitions are listed in Table 1.

Fig. 7
figure 7

Three companies, A, B and C, in a supply chain

Fig. 8
figure 8

GUI for a workflow agent

Table 1 Firing time of transitions

Figure 9 shows the GUI for a resource agent. Company C first defines an order agent. Figure 10 illustrates the graphical user interface (GUI) for an order agent. An order is specified by a due date, a product type, quantity and the penalty cost (earliness penalty cost and lateness penalty cost) to achieve just-in-time production. For this example, the earliness/lateness penalty coefficient \(\theta_{11t}\) is as follows:

Fig. 9
figure 9

GUI for a resource agent

Fig. 10
figure 10

GUI for an order agent

$$\theta_{11t} = \left\{ {\begin{array}{*{20}l} 0 \hfill & {{\text{if}}\quad t = d_{1} } \hfill \\ {40} \hfill & {{\text{if}}\quad t > d_{1} } \hfill \\ {20} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right.$$
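The piecewise coefficient above translates directly into a small Python function. This is a sketch only: the function name `penalty_coefficient` is hypothetical, and representing \(t\) and \(d_{1}\) as `datetime` values rather than discrete period indices is an assumption made for readability.

```python
from datetime import datetime


def penalty_coefficient(t, due, early=20, late=40):
    """Return 0 at the due date, `late` after it and `early` before it."""
    if t == due:
        return 0
    return late if t > due else early


due = datetime(2015, 4, 26, 16, 40)  # due date d_1 of this example
on_time = penalty_coefficient(due, due)
tardy = penalty_coefficient(datetime(2015, 4, 26, 17, 0), due)
premature = penalty_coefficient(datetime(2015, 4, 26, 16, 0), due)
```

Because lateness is penalized twice as heavily as earliness, a schedule that cannot hit the due date exactly is pushed toward finishing early rather than late.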

For this example, three workflow agents (\(W_{1}\), \(W_{2}\), \(W_{3}\)), five resource agents (\(R_{1}\), \(R_{2}\), \(R_{3}\), \(R_{4}\), \(R_{5}\)) and one order agent (\(O_{1}\)) need to be defined and created.

Handling an order relies on the collaboration of a number of agents. Our software solves the workflow scheduling problem based on the interaction of different types of agents in the system. Figure 11 shows the interactions between agents. Order agent \(O_{1}\) issues a request to the potential workflow agents. As workflow agent \(W_{3}\) can produce type 3 parts, it sends a CFP to the potential resource agents. As resource agents \(R_{4}\) and \(R_{5}\) can perform the operations, they submit proposals to workflow agent \(W_{3}\). Note that resource agents \(R_{1}\), \(R_{2}\) and \(R_{3}\) cannot perform the operations, so they do not submit any proposal to workflow agent \(W_{3}\). Once the proposals have been received by workflow agent \(W_{3}\), it requests a collaborative scheduling agent \(Opt1\) to optimize the schedule. Figure 12 shows the network model constructed by the collaborative scheduling agent for \(W_{3}\) in the process of optimization. The collaborative scheduling agent then sends a message to confirm the feasibility of the solution. On receiving the message, workflow agent \(W_{3}\) requests its potential upstream workflow agents, \(W_{1}\) and \(W_{2}\), to optimize their schedules. Workflow agents \(W_{1}\) and \(W_{2}\) send CFPs to the potential resource agents, wait for the proposals from resource agents \(R_{1}\), \(R_{2}\) and \(R_{3}\) and invoke collaborative scheduling agents to optimize the schedules based on the proposals. Figures 13 and 14 show the network models constructed by the collaborative scheduling agent for \(W_{2}\) and \(W_{1}\) in the process of optimization. The allocation of resource agents \(R_{1}\) through \(R_{5}\) to perform the operations in the associated workflows for the order agent is depicted in Figs. 15, 16, 17, 18 and 19, respectively. For this example, the order due date can be met. The contracts established between agents are shown in Fig. 20. The schedules for each resource agent are shown in Table 2.

Fig. 11
figure 11

Interaction between agents

Fig. 12
figure 12

Network model of workflow agent \(W_{3}\)

Fig. 13
figure 13

Network model of workflow agent \(W_{2}\)

Fig. 14
figure 14

Network model of workflow agent \(W_{1}\)

Fig. 15
figure 15

Calendar of resource agent R1

Fig. 16
figure 16

Calendar of resource agent R2

Fig. 17
figure 17

Calendar of resource agent R3

Fig. 18
figure 18

Calendar of resource agent R4

Fig. 19
figure 19

Calendar of resource agent R5

Fig. 20
figure 20

Contracts established between agents for Example 1

Table 2 Schedules for resource agents

Example 2

Consider six companies, A, B, C, D, E and F, which may cooperate to form a supply chain, as shown in Fig. 21. Company A produces type 1 parts, whereas Company B produces type 2 parts. Company D produces type 4 parts and Company E produces type 5 parts. Company C depends on type 1 parts from Company A and type 2 parts from Company B to produce the products (type 3 parts). Company F depends on type 3 parts from Company C, type 4 parts from Company D and type 5 parts from Company E to produce the products (type 6 parts).

Fig. 21
figure 21

Six companies in a supply chain

Suppose Company F receives an order. The order requires 150 units of type 6 parts to be produced by 12:00 AM on April 23, 2017 (\(d_{1}\)). To apply our software, Companies A through F must first define their workflows and resources. “Appendix B.1” shows the PNML models for workflow agents \(W_{1}\), \(W_{2}\), \(W_{3}\), \(W_{4}\), \(W_{5}\) and \(W_{6}\). “Appendix B.2” shows the PNML models for activities \(A_{r1}^{k}\), \(A_{r2}^{k}\), \(A_{r3}^{k}\), \(A_{r4}^{k}\), \(A_{r5}^{k}\), \(A_{r6}^{k}\) and \(A_{r7}^{k}\) of resource agents. The firing times of the transitions are listed in Table 3.

Table 3 Firing time of transitions

For this example, the earliness/lateness penalty coefficient \(\theta_{11t}\) is as follows:

$$\theta_{11t} = \left\{ {\begin{array}{*{20}l} 0 \hfill & {{\text{if}}\quad t = d_{1} } \hfill \\ {40} \hfill & {{\text{if}}\quad t > d_{1} } \hfill \\ {20} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right.$$

For this example, six workflow agents (\(W_{1}\), \(W_{2}\), \(W_{3}\), \(W_{4}\), \(W_{5}\) and \(W_{6}\)), seven resource agents (\(R_{1}\), \(R_{2}\), \(R_{3}\), \(R_{4}\), \(R_{5}\), \(R_{6}\) and \(R_{7}\)) and one order agent (\(O_{1}\)) need to be defined and created.

The output of our collaborative workflow management system includes the contracts established between agents for handling orders, the schedules for executing each workflow and the schedules for performing the operations of workflows by resource agents. Figure 22 shows the contracts established between agents for handling Order 1. It indicates that resource agents \(R_{1}\) through \(R_{6}\) take part in the operations of workflows \(W_{1}\) through \(W_{6}\) required for Order 1. Our system also shows the assignment of resources to process workflows and orders. The schedules for each resource agent are shown in Table 4.

Fig. 22
figure 22

Contracts established between agents for Example 2

Table 4 Schedules for resource agents

Scalability analysis and verification by examples

Response time is an important performance index in supply chain management. For a method to be applicable in supply chains, its response time should remain acceptable as the number of partners in the network grows. In the remainder of this paper, we present analysis and numerical results to show that our method is scalable in terms of response time as the size of the supply chain network grows. Response time is the total amount of time it takes to respond to a request for service. The response time for solving the problem defined in this paper consists of two parts: computation time and transmission time. With the widespread adoption of broadband networks, transmission time is much less than computation time for solving the problem defined in this paper. To evaluate the scalability of our proposed methodology in terms of response time, we first analyze and compare the computational complexity of our proposed method and that of an industrial centralized optimizer. We then conduct experiments based on the developed software system.

We focus on assembly supply chain networks. In this type of network, the response time can be measured based on the concept of the “depth” of a supply chain. In a typical assembly supply chain network, we call a node without any upstream node a leaf node and a node without any downstream node a final node. Note that there is one and only one final node in an assembly supply chain network. For each directed path that starts with a leaf node and ends with the final node, the number of nodes in the path is called the depth of the path. The response time of a supply chain network is the longest response time over all directed paths that start with a leaf node and end with the final node.
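The notions of depth and path response time can be illustrated with a short Python sketch. The network below follows the six-company structure of Example 2, but the function names and the per-node computation times in `local_time` are hypothetical values chosen only for illustration. The sketch also assumes that the time along a path is the sum of the local times of its nodes.

```python
def depth(node, upstream):
    """Number of nodes on the longest leaf-to-node path ending at `node`."""
    suppliers = upstream.get(node, [])
    if not suppliers:
        return 1  # a leaf node has no upstream node
    return 1 + max(depth(s, upstream) for s in suppliers)


def response_time(node, upstream, local_time):
    """Longest accumulated time over all leaf-to-node paths ending at `node`."""
    suppliers = upstream.get(node, [])
    slowest = max(
        (response_time(s, upstream, local_time) for s in suppliers),
        default=0.0,
    )
    return local_time[node] + slowest


# Assembly network: F is the final node; A, B, D and E are leaf nodes.
upstream = {"F": ["C", "D", "E"], "C": ["A", "B"]}
# Hypothetical computation time of each node, in seconds.
local_time = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 1.0, "F": 2.0}

network_depth = depth("F", upstream)
network_time = response_time("F", upstream, local_time)
```

With these numbers the critical path is B, C, F: it is both the deepest path and the slowest one, although in general the slowest path need not be the deepest.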

We compare the response time of supply chains based on the distributed MAS architecture used in this paper and a centralized architecture as follows. To compute \(L(\lambda )\) for a workflow agent \(w_{n}\) with given \(\lambda\), it is necessary to solve the minimum cost flow problem. The computational complexity to solve a minimum cost flow problem with \(n\) nodes and flow of \(f\) is \(O(n^{2} f)\). As the number of nodes in the network associated with \(L(\lambda )\) is proportional to \(K_{n} T\) and the flow is \(D_{n}\), the computational complexity is \(O(K_{n}^{2} T^{2} D_{n} )\). Note that the Lagrange multipliers are updated as follows:

$$\begin{aligned} g_{onkt}^{l} & = \sum\limits_{o = 1}^{O} \sum\limits_{{\tau = t - \pi_{nr} + 1}}^{t} u_{onk\tau }^{l} - C_{{or_{k} t}} \quad {\text{for}}\;{\text{each}}\quad k \in \{ 1, \ldots ,K_{n} \} ,\quad t \in \{ 1, \ldots ,T\} \\ \lambda_{onkt}^{l + 1} & = \left\{ {\begin{array}{*{20}l} {\lambda_{onkt}^{l} + \alpha^{l} g_{onkt}^{l} } \hfill & {{\text{if}}\quad \lambda_{onkt}^{l} + \alpha^{l} g_{onkt}^{l} \ge 0} \hfill \\ 0 \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. \\ \end{aligned}$$

As the number of Lagrange multipliers is proportional to \(K_{n}\) and \(T\), the computation time involved in updating \(\lambda\) grows as \(O(K_{n} T)\). Therefore, the overall computational complexity is \(O(K_{n}^{2} T^{2} D_{n} )\). This indicates that the computational complexity of our algorithm is polynomial with respect to problem size. Suppose the depth is \(V\). In a distributed MAS architecture, a supply chain with depth \(V\) has at most \(V\) nodes in each directed path from a leaf node to the final node. Therefore, the overall response time will be \(O(\sum_{n} K_{n}^{2} T^{2} D_{n} )\), where the sum runs over the nodes of the longest path. Let \(K = \max_{n} K_{n}\). Then, the overall response time is bounded by \(O(VK^{2} T^{2} D_{n} )\).

Let us now analyze the overall response time for a centralized computing architecture. If we solve the scheduling problem for a \(V\)-echelon supply chain based on a centralized computing architecture, the Petri net models of all \(V\) workflow agents will be merged first. As the number of nodes in the network associated with \(L(\lambda )\) is proportional to \((\sum_{n} K_{n} )T\) and the flow is \(D_{n}\), the computational complexity to compute \(L(\lambda )\) is bounded by \(O(V^{2} K^{2} T^{2} D_{n} )\). As the number of Lagrange multipliers is proportional to \((\sum_{n} K_{n} )T\), the computation time involved in updating \(\lambda\) will increase approximately with \(O((\sum_{n} K_{n} )T)\) and is bounded by \(O(VKT)\). Therefore, the overall response time for a centralized architecture will be bounded by \(O(V^{2} K^{2} T^{2} D_{n} )\).
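As a rough illustration of what the two bounds imply, and taking them at face value with constant factors ignored, the ratio of the centralized bound to the distributed bound is

$$\frac{V^{2} K^{2} T^{2} D_{n} }{V K^{2} T^{2} D_{n} } = V.$$

For instance, doubling the depth from \(V = 2\) to \(V = 4\) doubles the distributed bound but quadruples the centralized one. This compares upper bounds only, so it indicates a trend rather than predicting measured response times.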

To verify the analysis above, we conduct experiments by increasing the depth of a supply chain network and comparing the response time. All the experiments are conducted for multiple echelon supply chains. Figure 23 shows and compares the response time obtained based on MAS and a centralized problem solver, CPLEX (CPLEX integer programming solver 2015), as the depth of supply chains grows. As expected, the response time of our agent-based approach is significantly less than that of the centralized CPLEX problem solver as the depth of supply chains grows. Figure 24 shows and compares the response time with respect to demand in supply chains based on MAS and the CPLEX centralized problem solver. It also indicates that MAS architecture is much more efficient than the centralized architecture.

Fig. 23
figure 23

Response time with respect to the depth of supply chain (in seconds). A: MAS architecture, B: centralized architecture

Fig. 24
figure 24

Response time with respect to demand (in seconds). A: MAS architecture, B: centralized architecture

Conclusions

Management of collaborative workflows in supply chains is an important issue. In supply chains, the workflows of a company depend on those of its upstream partners and have influence on those of its downstream partners. Such dependency complicates the workflow management problem in supply chains. The workflow scheduling problem in supply chains is a dynamic and challenging one. Scheduling workflows in supply chains relies on the development of a methodology that is flexible, reconfigurable and scalable. In this paper, we propose a reconfigurable, flexible and scalable architecture for scheduling workflows in supply chains based on MAS, workflow specification language and optimization theories. Our proposed methodology achieves reconfigurability and flexibility by using Petri net as the workflow specification language, specifying all the messages of CNP in XML and adopting a FIPA compliant multi-agent platform that supports ACL, contract net protocol (CNP) and publication/discovery infrastructure. Our approach attains scalability by developing algorithms based on distributed computing architecture to solve collaborative workflow scheduling problem. To take into account the dependency among the different workflow scheduling subproblems, a multi-level contract net protocol is applied in this paper to facilitate negotiation of different companies. A divide-and-conquer approach is adopted to take advantage of the distributed computation provided by MAS to optimize the workflow schedules. The original workflow scheduling problem is decomposed into a number of agents’ subproblems that can be solved efficiently based on the collaboration of agents. A prototype system has been implemented based on our proposed methodology. As Petri net is adopted as a process specification language in our scheduling system, our approach formulates the scheduling problem based on the Petri net models and the order requirements dynamically. 
As a result, the cost and time involved in the development of scheduling software can be significantly reduced. In addition to this reduction in development cost and time, we illustrate the effectiveness and analyze the scalability of our approach by examples. To study the scalability of our approach, we analyze response time with respect to the depth and the demand of supply chains. Both our analysis and numerical results indicate that our approach is much more efficient than an industrial centralized problem solver as the depth and the demand of supply chains grow. Our approach relies on the service discovery function, which is provided by many multi-agent platforms. It also relies on the construction of workflow models and activity models for the individual partners represented by agents in the supply chains. Currently, the class of workflow models used in the proposed approach is the acyclic timed marked graph. Extension of the workflow models to more general classes of Petri nets is one of our current research directions. As the ability to cope with external and internal disruptions and disturbances gains more and more importance, another research direction relevant to this paper is to study mechanisms to deal with uncertainties in supply chains.