1 Introduction

Planning and decision-making processes in contemporary companies generally use deterministic methods, without taking into account the conditions of uncertainty [2, 6]. This increases the risk, because there is no information about the possible occurrence of threats and the resulting effects. To mitigate the risk and increase the probability of making correct decisions, actions should be taken to identify the area of risk, its extent and its impact on the operations of the organization, as well as to search for measures for eliminating the risk. The awareness of the omnipresence of various types of risk raises the need to identify it in terms of the place of its occurrence and the strength of its impact on the company.

In response to the lack of standards for understanding and managing risk, the International Organization for Standardization created a standard, which was translated into Polish in March 2012. The standard ISO 31000:2012 “Risk Management - principles and guidelines” defines risk as “effect of uncertainty on objectives”, while uncertainty is “the state, even partial, of deficiency of information related to understanding or knowledge of an event, its consequences, or likelihood” [8]. From the engineering point of view, risk is the probability that the system, at a certain moment of time, will not perform the function for which it has been designed [1]. Therefore, in order to identify a risk, the hazard that causes it should be located.

2 Risk in Production Systems

Failure is an unforeseen and undesirable phenomenon that occurs in every production process or technical object. It is a degree of malfunctioning which prevents the correct operation of a device or results in its complete shut-down. The risk of failure cannot be completely eliminated; it is only possible to determine the risk level and the probability of occurrence, and to prepare adequate preventive measures [7, 10, 13].

Reliability is a quantitative measure of the failure rate, which can be defined as the probability of correct operation of a technical object in specific operating conditions and over a specific period of time [11, 12, 14]. Reliability is not a constant value, as the probability of occurrence of a failure increases over time.

The specific character of today’s production systems and, in particular, their complexity allow them to be treated as operation systems, in which reliability is one of the features, measured by the extent to which the determined indicators, parameters and characteristics are realized. At the same time, production systems must operate in an environment which continuously affects the system and causes disturbances. For this reason, reliability in real conditions is of a random nature [3, 12, 13].

The general reliability theory can be transposed to the sphere of production systems by treating the unreliability (Z), i.e. the opposite of the reliability, as a synonym of risk (R) [3]:

$$ R = Z $$
(1)

The risk (unreliability) of a system (e.g. a production system) interpreted in this way will represent the probability that the system will not perform the functions, for which it has been designed, or the probability of occurrence of losses in this system. For this interpretation, the following equation should be true:

$$ N + Z = 1 $$
(2)

The concept of reliability is often compared with the survival ability of a system. Reliability (N) can be represented by a reliability function N(t), which determines the probability that the system will be operational within a specified time interval [17].

Thus, on the interval from zero to infinity, N(t) is a decreasing function. If the variable Z(t) is adopted as a measure of unreliability, the probability of malfunction is related to the reliability by the formula [1]:

$$ Z(t) + N(t) = 1 $$
(3)

From the viewpoint of the reliability engineering, an object can be treated as an element (selected from a system) or as a system (a set of interoperating elements). Individual elements in a system can be linked to each other, but it is usually assumed in practice that there are no links between them.

3 Classical Method for Determining the Risk in Systems with a Parallel Structure

According to the classical theory of reliability, a system with a parallel structure is fit for operation if at least one of its objects is fit for operation; in other words, the correct functioning of one element is sufficient for the system to function. An example of a diagram of a parallel reliability structure of a system with n objects is shown in Fig. 1.

Fig. 1. An example of a parallel reliability structure of a system with n objects

The reliability \( N_{S} \) of the system presented in Fig. 1 is determined by the formula [4]:

$$ N_{S} = 1 - [\left( {1 - N_{1} } \right) \cdot \left( {1 - N_{2} } \right) \cdot \ldots \cdot \left( {1 - N_{n} } \right)] $$
(4)

where \( N_{1} , N_{2} , \ldots , N_{n} \) - reliability of individual objects/subsystems of the system.

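Since risk is identified with unreliability, \( R_{i} = 1 - N_{i} \) for each object and \( R_{C} = 1 - N_{S} \) for the whole system. The total risk \( R_{C} \) of the system can therefore be determined from formulas (3) and (4):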

$$ R_{C} = R_{1} \cdot R_{2} \cdot \ldots \cdot R_{n} = \prod\nolimits_{i = 1}^{n} R_{i} $$
(5)

where \( R_{1} , R_{2} , \ldots , R_{n} \) - the risk occurring in individual elements of the system.
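Formulas (4) and (5) can be checked numerically with a short sketch. The reliability values below are hypothetical and serve only as an illustration; the function names are likewise assumptions, not part of the cited method.

```python
from math import prod

def parallel_reliability(reliabilities):
    """Formula (4): N_S = 1 - (1 - N_1)(1 - N_2)...(1 - N_n)."""
    return 1.0 - prod(1.0 - n for n in reliabilities)

def parallel_risk(reliabilities):
    """Formula (5): R_C = R_1 * R_2 * ... * R_n, with R_i = 1 - N_i."""
    return prod(1.0 - n for n in reliabilities)

# Hypothetical reliabilities of three objects working in parallel.
N = [0.90, 0.80, 0.95]
N_S = parallel_reliability(N)
R_C = parallel_risk(N)
print(f"N_S = {N_S:.4f}, R_C = {R_C:.4f}")
```

In this classical interpretation each additional parallel object multiplies the total risk by a factor below 1, so the risk of the system is never higher than the risk of its most reliable object.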

Parallel structures occur in production practice; however, the nature of the production process does not allow for such an interpretation of the reliability structure. The classical theory of reliability considers only 0/1 states of technical equipment. This means that, under the classical interpretation, a production system would be recognized as reliable if at least one element functioned correctly. In production systems, such a situation occurs only in redundant systems [4], i.e. systems with an excess of elements. In reality, redundant systems occur very rarely, because an excess of elements (e.g. machines, workers, means of transport) results in unused resources, which increases the costs.

Considering the structure of the system from Fig. 1 as a parallel production structure, the formula of the risk for this system should be as follows:

$$ R_{C} = R_{1} + R_{2} + \ldots + R_{n} = \sum\nolimits_{i = 1}^{n} {R_{i} } $$
(6)

where \( R_{1} , R_{2} , \ldots , R_{n} \) - the risk occurring in individual objects/subsystems of the system.

If \( R_{C} > 1 \) is obtained as the result of such calculations, then:

$$ R_{C} = \max_{i} R_{i} $$
(7)
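A minimal sketch of formulas (6) and (7) is given below. The risk values are hypothetical, and the function name is an assumption introduced only for this illustration.

```python
def production_parallel_risk(risks):
    """Formulas (6)-(7): sum the individual risks R_i;
    if the sum exceeds 1, take the largest individual risk instead."""
    total = sum(risks)
    return max(risks) if total > 1.0 else total

# Hypothetical risks in three objects of a parallel production structure.
low = production_parallel_risk([0.10, 0.25, 0.30])   # sum stays below 1
high = production_parallel_risk([0.50, 0.40, 0.30])  # sum exceeds 1, maximum is used
print(low, high)
```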

Individual risks \( R_{i} \) for n areas, depending on the amount of losses \( S_{i} \) incurred in these areas, will be as follows [3, 4]:

$$ R_{1} = \frac{{S_{1} }}{{W_{teoret} }} $$
(8)
$$ R_{2} = \frac{{S_{2} }}{{W_{teoret} }} $$
(9)
$$ R_{n} = \frac{{S_{n} }}{{W_{teoret} }} $$
(10)

If the areas differ from each other, this type of structure requires the theoretical value of the indicator (\( W_{teoret}^{i} \)) to be determined for each area. Once these values have been determined for each of the n areas examined, the individual losses \( S_{i} \), which depend on the time losses caused by the occurrence of risk factors in the individual areas, are as follows:

$$ S_{1} = W_{teoret}^{1} \frac{{\Delta t_{1} }}{T} $$
(11)
$$ S_{2} = W_{teoret}^{2} \frac{{\Delta t_{2} }}{T} $$
(12)
$$ S_{\text{n}} = W_{teoret}^{n} \frac{{\Delta t_{\text{n}} }}{T} $$
(13)

where: \( W_{teoret}^{i} \) - theoretical value of the indicator in individual areas of the system.

\( \Delta t_{i} \) - time losses in individual areas caused by risk factors.

Thus the total risk \( R_{C} \) for a system with n areas and a parallel structure of production will be as follows:

$$ R_{\text{C}} = \frac{{W_{teoret}^{1}\Delta t_{1} + W_{teoret}^{2}\Delta t_{2} + \ldots + W_{teoret}^{n}\Delta t_{n} }}{{W_{teoret} T}} $$
(14)
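Formulas (11)-(14) can be sketched as follows. All numerical values are hypothetical; in particular, taking the system-level \( W_{teoret} \) as the sum of the area values and expressing T and \( \Delta t_{i} \) in hours are assumptions made only for this example.

```python
def area_loss(w_teoret_i, delta_t_i, T):
    """Formulas (11)-(13): S_i = W_teoret^i * delta_t_i / T."""
    return w_teoret_i * delta_t_i / T

def total_risk_from_time_losses(w_teoret_areas, delta_ts, w_teoret, T):
    """Formula (14): R_C = sum(W_teoret^i * delta_t_i) / (W_teoret * T)."""
    weighted = sum(w * dt for w, dt in zip(w_teoret_areas, delta_ts))
    return weighted / (w_teoret * T)

# Hypothetical data for three areas.
T = 160.0                         # assumed length of the analysed period
w_areas = [100.0, 200.0, 150.0]   # assumed W_teoret^i for each area
delta_ts = [8.0, 4.0, 16.0]       # assumed time losses in each area
w_system = sum(w_areas)           # assumption: system-level W_teoret

losses = [area_loss(w, dt, T) for w, dt in zip(w_areas, delta_ts)]
R_C = total_risk_from_time_losses(w_areas, delta_ts, w_system, T)
print(losses, round(R_C, 4))
```

Summing the individual risks \( S_{i} / W_{teoret} \) from formulas (8)-(10) gives the same value as formula (14), which is a convenient consistency check.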

4 The Application of the FMEA Method and Linguistic Variables for Determining the Risk in Production Systems with a Parallel Reliability Structure

Failure Mode and Effects Analysis (FMEA) is one of many methods belonging to the group of quality control methods. It is described in the standard PN-IEC 812: 1994 – Procedure for Failure Mode and Effects Analysis – FMEA.

In order to reduce the level of risk in a production system, a series of actions must be taken. The first of them is the risk identification, which determines the threats that might occur during realization of company’s goals. Due to a potential possibility that many risk factors may occur, it is important to find the source risk, which is the key cause of the problems. During the identification, it is important to search for the answers to the following questions: in which area of the production system the risk occurs and which area is affected by the highest risk.

The next step in reducing the risk level is measuring the risk and determining the extent of the impact on the production system. Failure Mode and Effect Analysis (FMEA) is one of the methods which allow determining the extent of risk in the designated area of a production process or in a product, as well as the resulting effects [9]. Thanks to this, corrective actions aiming at mitigation of the risk can be found subsequently [14]. “One of the key factors in proper implementation of the FMEA program is to act before an event occurs and not to gain experience after the event. In order to obtain the best results, FMEA should be performed before a particular type of construction or process defect is “designed” for a given product.” [5].

When assessing the risk in a production process with the use of the FMEA method, the first step is to detail the operations in the process, then to identify the risk factors present in the process, determine the effects caused by their presence, and find possible causes. The next step in the analysis is to assign numerical values to the parameters shown in Table 1.

Table 1. Characteristics of the parameters used in the FMEA method for determining RPN

The risk priority number (RPN) is the standard and most frequently used measure of the risk level of potential failures in the FMEA analysis [1, 17]. It is calculated for each of the selected areas of the production system using the formula [5]:

$$ RPN = S\,(\text{severity}) \times O\,(\text{occurrence}) \times D\,(\text{detectability}) $$
(15)

The numerical values of S, O, and D represent the numerical values of the linguistic terms. They are usually in the range of 1–5 or 1–7 [15]. These criteria are defined by a team which conducts the analysis based on the data and the previous experience on the behavior of the system and the frequency of occurrence and the adverse effects of the machine parts failures on the system.

With a 1–7 scale, the value of RPN lies between 1 and 343 (the product of three ratings of 7). A high value of RPN corresponds to a high risk in the process. If the RPN value is high, efforts should be made to mitigate the risk using corrective actions [5]. The corrective actions should be taken first in the areas with the highest RPN level.
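Formula (15) can be illustrated with a short sketch. The ratings used below are hypothetical, and the default upper bound of 7 for each parameter is an assumption inferred from the stated maximum RPN of 343.

```python
def rpn(severity, occurrence, detectability, scale_max=7):
    """Formula (15): RPN = S * O * D, with each rating on a 1..scale_max scale."""
    ratings = {"severity": severity,
               "occurrence": occurrence,
               "detectability": detectability}
    for name, value in ratings.items():
        if not 1 <= value <= scale_max:
            raise ValueError(f"{name} must be between 1 and {scale_max}, got {value}")
    return severity * occurrence * detectability

print(rpn(1, 1, 1))  # lowest possible value
print(rpn(7, 7, 7))  # highest possible value on a 1-7 scale
print(rpn(6, 5, 4))  # hypothetical ratings assigned by the team
```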

It is also needed to categorize the values of RPN and then, on the basis of the obtained values for RPN, take the measures necessary to reduce the risk level. Table 2 presents the range of five risk levels, with marginal values and measures to be taken to reduce the level of risk to an acceptable value.

Table 2. Matrix for RPN in the FMEA method

Table 3 lists the measures to be taken at each of these risk levels to reduce the level of risk to an acceptable value.

Table 3. Measures to reduce the level of risk [15]

Determining a general threshold for a high RPN value is not easy. Each FMEA analysis is unique, and the risk estimates obtained in one analysis cannot be compared with those of another. This is because the assessment is to some extent subjective and depends on the decisions made by the person performing the analysis. Therefore, for each FMEA analysis a system of criteria should be developed, and the RPN value from which corrective actions should be taken should be determined.

The values of risk in individual system elements defined in the FMEA method are not smaller than 1 and lie within the range [1, 343] (see Table 2). Therefore, in the next step of applying the method for estimating the risk in a parallel production system, the RPN values should be normalized to the interval [0, 1] using the formula:

$$ RPN_{i}^{'} = \frac{{RPN_{i} - RPN_{min} }}{{RPN_{max} - RPN_{min} }} $$
(16)

where \( RPN_{max} \) is the maximum value obtained in the FMEA table based on the product of the values of parameters S, O, D, while \( RPN_{min} \) is the minimum value.
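A minimal sketch of the normalization in formula (16) is shown below. The default bounds of 1 and 343 follow the range quoted in the text; the function name and the guard against an empty range are assumptions added for illustration.

```python
def normalize_rpn(rpn_i, rpn_min=1, rpn_max=343):
    """Formula (16): RPN'_i = (RPN_i - RPN_min) / (RPN_max - RPN_min)."""
    if rpn_max <= rpn_min:
        raise ValueError("rpn_max must be greater than rpn_min")
    return (rpn_i - rpn_min) / (rpn_max - rpn_min)

print(normalize_rpn(1))    # lower bound maps to 0.0
print(normalize_rpn(343))  # upper bound maps to 1.0
print(normalize_rpn(172))  # midpoint of the range maps to 0.5
```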

In order to determine the risk in a production system with a parallel structure using the FMEA method, the normalized RPN values should be substituted into formula (6). The formula for the total risk of a system with n elements will then be as follows:

$$ R_{C} = RPN_{1}^{ '} + RPN_{2}^{ '} + \ldots + RPN_{n}^{ '} = \mathop \sum \limits_{i = 1}^{n} RPN_{i}^{ '} $$
(17)

If \( R_{C} > 1 \) is obtained as the result of such calculations, then:

$$ R_{C} = \max_{i} RPN_{i}^{'} $$
(18)

An advantage of using the FMEA method for determining the risk in a production system with a parallel structure is that a team established for this purpose can assign linguistic variables to the values of the individual parameters S, O and D. Unlike in the case of the classical method presented in Sect. 3, there is no need to measure and determine the extent of losses (\( S_{i} \)) and increases of production times (\( \Delta t_{i} \)) caused by the occurrence of risk factors in a production system (compare formulas (8)–(14)).

5 Characteristics of the Production System and the Assessment of Risk with Use of the Method Proposed

The company whose production data were used to verify the proposed method manufactures steel products. The factory has 4 production lines with the layout shown in Fig. 2.

Fig. 2. Layout of production lines in the production floor

The production lines differ from each other by type and age of machines. Each production line consists of three workstations: a cutter, a press and a finishing workstation where quality control is also performed (Fig. 3).

Fig. 3. Diagram of the production system

The factory makes products to individual orders, while individual production orders vary considerably in size of the products and the degree of their complexity. When planning the production, individual orders are assigned to different production lines depending on the size of products and the degree of their complexity. Table 4 summarizes the production lines and compares them in terms of the same parameters.

Table 4. Summary of basic parameters of production lines

LP1 and LP2 are the oldest production lines and are also the most prone to failures. However, an inventory of spare parts for the elements that fail most frequently is kept for them. Therefore, the time of repair of most failures on the LP1 and LP2 lines is relatively short. Repairs are performed by the maintenance department. The situation is worse for the LP3 and LP4 production lines: in the event of a failure an external company is called to perform the service, and thus the time of repair is much longer.

Due to the specific character of production described above, it is impossible to determine the average daily production volume for individual production lines. For example, the LP1 production line can manufacture products with a length from 2 to 14 m. The level of their complexity also varies widely. Therefore, the potential losses in production volume caused by the occurrence of risk factors will vary considerably. For this reason, a decision was made to use the FMEA method.

In order to analyze and assess the risk in the factory with the use of the FMEA method, all 4 production lines have been subjected to detailed observation. Throughout July, the employees used forms prepared especially for this purpose to collect data on the random factors occurring in individual production lines and recorded the information on the type of risk factor as well as its severity (S), occurrence (O) and detectability (D).

For this purpose a team of production workers was set up. The task of this team was to assign values to the S, O, D (severity, occurrence and detectability) parameters and to determine the RPN value. In order to parameterize the values of individual risk factors, auxiliary tables were prepared, which are presented in brief in Table 5.

Table 5. Auxiliary table for determining the FMEA table in the company in question

Then, FMEA tables were prepared for all the identified risk factors. The results from the FMEA tables obtained for all 3 machines located on the production lines have shown that the following factors are of key importance for the processes taking place in the factory:

  • frequent failures of machines on the LP1 and LP2 production lines,

  • long time of repair of the machines, and

  • the necessity of additional setting or changeover of the machines.

Table 6 shows the synthetic values of RPN for these key risk factors. Since individual workstations at the production lines operate in a serial manner, the workstations with the largest RPN values were selected for the summary and further calculations.

Table 6. Abridged FMEA table

In order to use the formula for the total risk of the system (6), the RPN value must be normalized first to the interval [0,1] using the formula (16).

$$ R_{LP1}^{'} = \frac{210 - 1}{343 - 1} = 0.61 $$
$$ R_{LP2}^{'} = \frac{210 - 1}{343 - 1} = 0.61 $$
$$ R_{LP3}^{'} = \frac{150 - 1}{343 - 1} = 0.44 $$
$$ R_{LP4}^{'} = \frac{180 - 1}{343 - 1} = 0.52 $$

The resulting risk values were substituted into the formula for the total risk (17):

$$ R_{C} = R_{LP1}^{'} + R_{LP2}^{'} + R_{LP3}^{'} + R_{LP4}^{'} = 0.61 + 0.61 + 0.44 + 0.52 = 2.18 $$

Since \( R_{C} > 1 \), then:

$$ R_{C} = \max_{i} R_{i}^{'} = 0.61 $$

The resulting value of the total risk in the production system indicates a 61% probability that the production plan for July will not be executed. This result coincides with the extent of delays in the execution of production orders in the factory.
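The calculation for the four production lines can be reproduced with a short script. The RPN values (210, 210, 150, 180) and the bounds 1 and 343 are taken from the text; the variable names and the rounding to two decimal places, which mirrors the values reported above, are assumptions of this sketch.

```python
# RPN values of the production lines, as used in the normalization above.
rpn_values = {"LP1": 210, "LP2": 210, "LP3": 150, "LP4": 180}
RPN_MIN, RPN_MAX = 1, 343

# Formula (16): normalize each RPN to the interval [0, 1].
normalized = {
    line: round((value - RPN_MIN) / (RPN_MAX - RPN_MIN), 2)
    for line, value in rpn_values.items()
}

# Formula (17): sum of the normalized values.
total = round(sum(normalized.values()), 2)

# Formula (18): if the sum exceeds 1, the largest normalized value is used.
r_c = max(normalized.values()) if total > 1 else total

print(normalized)
print(total, r_c)
```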

6 Summary

The paper presents a method for assessing the risk in a parallel production system with the use of the FMEA method and linguistic variables. It has several advantages over the classical method described in Sect. 3. In order to assess the amount of losses caused by the occurrence of risk factors in individual elements of the system, it is enough to establish a team composed of employees who are familiar with the system. These employees verbally provide the information on the type of risk factor as well as its severity, occurrence and detectability. In the next step, values are assigned to individual parameters with the use of auxiliary tables of the FMEA method and the RPN is calculated. After normalization of RPN, the classical method for analyzing and assessing the risk in production systems with a parallel structure can be used.