
1 Introduction

Quality-related errors in manufacturing create significant problems for the industrial sector, mainly because they lead to a great waste of resources in terms of the time, money, and effort spent to identify and solve them [17]. For this reason, researchers and practitioners aim to find novel solutions capable of tracking and analyzing manufacturing products in an appropriate and easy-to-use manner [14]. Several approaches already exist in different manufacturing domains: additive manufacturing [16], toy manufacturing [4], electrical equipment [12], and the fabrication of cylindrical markers [11].

In this context, one of the challenges of modern manufacturing is that a final product usually consists of different components, which in turn can consist of further components, and so forth. At the lowest level, there is raw material (e.g., steel coils, sheet metal, steel rolls), whose nature is also highly relevant to the quality of the final product. Therefore, tracking and connecting all the data of the different manufacturing stages is crucial for finding the causes of quality-related errors in the final product.

If components and raw materials have one-to-one or one-to-many relationships to the final products, the tracking process is quite straightforward and has already been implemented satisfactorily in some existing solutions. However, manufacturing products can also be made up of parts that come in lots, and lots can be combined to make other lots. Moreover, many different lots can be combined into a new lot that can, in turn, become part of many other lots. This means that some manufacturers have to deal with many-to-many relationships in which the relationships among the individual parts are blurred.

Making these relationships available is very important for both operators and quality engineers, so that they can drill down from the final product to the assembled components and ultimately to the raw materials used, and can identify causes of problems that are not obvious at first glance.

Unfortunately, the existing solutions based on relational databases are not very useful for the people in charge of examining these lots. On the one hand, they have slow response times when dealing with this problem; on the other hand, they provide no meaningful probability values indicating how likely it is that a given lot is related to a given product. Consequently, quality engineers must invest a lot of time analyzing all possibly related data instead of focusing on the relevant data.

To alleviate this problem, we propose a solution that makes it possible to track all items from the different lots that were used in final products. Our solution is based on the exploitation of graph databases. The major advantage of this approach over existing ones is that it allows for informed queries, i.e., queries that can terminate early when they reach nodes with no compatible outgoing relations. As a result, we obtain a software system with lower execution times for most of the item-tracking use cases.
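To make the notion of an informed query more concrete, the following Python sketch matches a required sequence of edge labels from a start node and abandons a branch as soon as a node offers no outgoing edge with the next label. The graph, the node names, and the labels `assembled_from` and `made_from` are hypothetical; this is a minimal illustration of early termination, not the query engine used in our system.

```python
# Hypothetical edge-labelled graph: node -> list of (edge_label, target_node).
GRAPH = {
    "final-1": [("assembled_from", "comp-A"), ("assembled_from", "comp-B")],
    "comp-A": [("made_from", "coil-7")],
    "comp-B": [],  # no outgoing relations: branches reaching it stop early
    "coil-7": [],
}


def match_path(graph, start, labels):
    """Return every node path from `start` whose edge labels follow `labels`.

    Depth-first search that abandons a branch as soon as the current node has
    no outgoing edge carrying the next required label (early termination).
    """
    results = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        step = len(path) - 1  # number of edges matched so far
        if step == len(labels):
            results.append(path)
            continue
        wanted = labels[step]
        # Only compatible relations are expanded; if there are none,
        # nothing is pushed and this branch simply ends here.
        for label, target in graph.get(node, []):
            if label == wanted:
                stack.append((target, path + [target]))
    return results
```

For example, `match_path(GRAPH, "final-1", ["assembled_from", "made_from"])` returns only the path through `comp-A`, because the branch through `comp-B` ends immediately.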

In our previous work [19], we presented the design of a tailored solution for a manufacturing company located in Upper Austria. The major contribution of the present work is an extension of that work into a general solution for appropriately tracking manufacturing products that have different kinds of dependencies (evolution, distribution, and so on). Our solution is intended to outperform traditional systems based on relational databases in the specific context of tracking defective items in lots of manufacturing products. In addition, to illustrate our proposal, we include a complete use case that shows some of the functionality that a solution of this kind can provide.

The rest of this work is structured as follows: Sect. 2 presents the state of the art concerning current graph-based solutions for the manufacturing industry. Section 3 describes the design and implementation of our solution, as well as a use case in which our solution outperforms traditional tracking systems. Section 4 discusses the lessons learned from this research. Finally, Sect. 5 summarizes the conclusions that can be drawn from this work and outlines possible future lines of research.

2 State-of-the-art

Despite the great need for tracking solutions in manufacturing environments, it seems that most quality assurance processes that require controlling and supervising the whole production chain to detect human errors and defective materials in a timely manner still need to be further automated. The major reason is that little attention has been paid to this problem due to technical limitations; as a result, only a few works have been proposed in the manufacturing domain to date [9, 20, 24, 25].

The problem can be aggravated even further, because products that pass all quality assurance controls can still be rejected by end users or other manufacturers if previously unknown problems appear. Therefore, the capability to track each part of a lot back to its origin is of vital importance for the manufacturing industry.

In this context, it is worth remarking that existing manufacturing systems are far from trivial, since the data models they work with often consist of many data types and data sources that bear no relation to each other. Due to this, modeling and optimization, as well as process analysis, are often hard tasks [26].

One of the major factors limiting the solution of this problem is that traditional relational database systems (i.e., the systems mostly used in industry so far) are unable to model this specific problem effectively. Therefore, we have focused our research on graph databases [23]. Graph databases store data in nodes and edges rather than in tables, as relational databases do [3]. Each node represents an entity, and each edge represents a relationship between two nodes [5]. This way of modeling is much more natural for a problem that arises from dependencies such as this one.

It is generally assumed that graph databases have some key advantages over relational databases in this context. Unlike relational databases, graph databases are designed to store interconnected data, which makes it easier to work with such data, because relationships can be followed directly without intermediate index lookups at every step, and which also makes it easier to accommodate the evolution of the data.

The literature repeatedly mentions some of the advantages of graph databases over traditional relational models. We have identified three major advantages of graph databases over traditional relational databases for tackling this problem:

  1. In the relational world, foreign-key relationships are not relations in the sense of graph edges; they have to be reconstructed at query time through joins, whereas a graph database stores relationships explicitly as edges.

  2. In relational databases, it is not possible to assign properties or labels to relationships (e.g., derivations, transformations). A foreign-key constraint can be given a name in the database, but it cannot carry such information or be visualized as a relationship, whereas labeled relationships with properties are the natural way of modeling data in graph databases.

  3. Last but not least, traditional systems based on relational databases do not scale as well as graph databases when answering relationship-heavy queries [2].

These advantages are helping graph databases gain popularity among big industrial players, and their application domain is very broad [1]. In fact, many organizations already use databases of this kind for detecting fraud in monetary transactions, providing product and service recommendations, documenting use cases and lessons learned in a wide range of domains, managing access control to restricted places, and monitoring networks to identify potential risks and hazards. Focusing strictly on the manufacturing industry, graph-based solutions have already been proposed for forecasting and recommendation [22].

3 General Model for Tracking Manufacturing Products Using Graph Databases

To illustrate the problem we are facing with an example, consider a situation in which a finished part of a product is rejected by the customer because of a number of quality errors. In that case, it is quite common that the manufacturer of the finished part must state within 24 h whether the error is restricted to the single rejected product or whether a larger number of parts is affected. In the latter case, the exact affected lots must be reported to the customer.

Situations of this kind happen very often and, worse, they are very expensive in terms of effort, time, money, and brand image. Their impact should therefore be minimized by all means.

To do so, the manufacturer's first task is to find the cause of the quality error. Therefore, it must be possible to analyze not only the captured data of the finished part but also the captured data of all assembled components and raw materials. If the cause of the quality error lies in a specific component or raw material lot, the manufacturer must find all finished parts that contain this lot. Finding the affected lots in a short period of time can thus decide whether the manufacturer must recall millions of finished parts or just a few: if the affected lots can be identified quickly, it is not necessary to recall all delivered lots.

To perform this analysis, the manufacturer typically needs to search back and forth across all the data of the finished parts. With a relational database, this means that many queries and their responses have to be combined. Our approach is much more intuitive, since queries that traverse the data in any direction are generally easy to write, as we will see later in this paper. The capability to discover and see the connections between the different parts of a product allows a human operator to perform this tracking effectively.
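As a minimal sketch of querying in both directions, assuming the lot relations are stored as hypothetical (consumer, source) pairs, a single edge list can be indexed once per direction: upstream to find the components and raw materials of a finished part, and downstream to find every finished part that may contain a defective lot. Blurring within machine buffers is ignored here for brevity.

```python
from collections import deque

# Hypothetical lot relations: (consumer_lot, source_lot) means that parts of
# source_lot were built into consumer_lot.
USAGE = [
    ("final-1", "comp-A"), ("final-2", "comp-A"), ("final-3", "comp-B"),
    ("comp-A", "coil-7"), ("comp-B", "coil-8"),
]


def build_index(edges):
    """Index the same edges in both directions."""
    sources, consumers = {}, {}
    for consumer, source in edges:
        sources.setdefault(consumer, []).append(source)    # upstream
        consumers.setdefault(source, []).append(consumer)  # downstream
    return sources, consumers


def reachable(adjacency, start):
    """Breadth-first search: all nodes reachable from `start`, excluding it."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}
```

With `sources, consumers = build_index(USAGE)`, the call `reachable(consumers, "coil-7")` yields `{"comp-A", "final-1", "final-2"}`, i.e., the finished parts that would be affected if the coil lot `coil-7` turned out to be defective.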

3.1 Notation

Let \(L_m^t\) be a lot of parts produced on machine m at time step t. Every machine m has a buffer \(b_m\) into which the lots to be processed on this machine are poured, i.e., where they become blurred. The relation \(usage: (L_m^{t2}, L_n^{t1})\) then states that at time slice t2 the lot \(L_n^{t1}\) has been poured into the buffer of machine m for processing. Thus, from time slice t2 onward, parts of lot \(L_n^{t1}\) are installed, with a certain probability, into parts of lot \(L_m^{t2}\).

The relation \(pred: (L_m^{t2}, L_m^{t1})\) states that lot \(L_m^{t1}\) was produced before lot \(L_m^{t2}\) on machine m and that the buffer \(b_m\) of machine m was not empty when the production of \(L_m^{t2}\) started. Lots delivered by other suppliers, i.e., raw materials, are treated the same way: they are assigned a virtual production machine number and time slice that together uniquely identify the supplier's batch number.

Fig. 1.
figure 1

One-to-one relationship between lots. This is the simplest relationship that we can find in the manufacturing industry. In principle, a solution for dealing with this type of relationship is quite simple and can be implemented efficiently in a wide range of database systems, including those databases making use of the traditional model.

Using the same mathematical notation, simple examples of the one-to-one and one-to-many relationships between lots without blurring are depicted in Figs. 1 and 2. An example of a many-to-many relationship that includes blurring of lots is given in Fig. 3. By following the edges of the graph, the lots that are built into other lots can be determined easily; e.g., lot \(L_3^4\) may include parts of the lots \(\{L_1^1, L_1^2, L_2^2, L_2^3\}\), and lot \(L_3^5\) may include parts of the lots \(\{L_1^2, L_1^3, L_2^3, L_2^4\}\).
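The following Python sketch shows one way of following these edges, assuming lots are encoded as (machine, time step) tuples, `usage` maps a lot to the lots poured into the buffer for it, and `pred` maps a lot to its predecessor on the same machine. Predecessor lots are traversed (their buffer content may be blurred into the current lot) but are not reported as inputs themselves. The function name and the edge data are hypothetical and are not read off Fig. 3.

```python
from collections import deque

# Hypothetical edges; a lot L_m^t is encoded as the tuple (m, t).
USAGE_EDGES = {
    (3, 3): [(1, 1), (2, 2)],
    (3, 4): [(1, 2), (2, 3)],
    (3, 5): [(1, 3), (2, 4)],
}
PRED_EDGES = {(3, 4): (3, 3)}  # no entry for (3, 5): buffer of machine 3 was empty


def candidate_lots(start, usage, pred):
    """Return {lot: edge distance} for every lot whose parts MAY be in `start`."""
    dist = {start: 0}   # shortest edge distance of every visited lot
    found = {}          # lots reached through a usage edge, i.e. possible inputs
    queue = deque([start])
    while queue:
        lot = queue.popleft()
        d = dist[lot]
        # pred edge: same machine, leftover buffer content may be blurred into `lot`
        edges = [(pred[lot], False)] if lot in pred else []
        # usage edges: lots poured into the buffer for `lot`
        edges += [(src, True) for src in usage.get(lot, [])]
        for nxt, is_input in edges:
            if is_input and nxt not in found:
                found[nxt] = d + 1  # BFS order gives the minimal distance
            if nxt not in dist:
                dist[nxt] = d + 1
                queue.append(nxt)
    return found
```

Because lot (3, 5) has no pred edge, the search for it stops at the two lots poured directly into its buffer, mirroring the empty-buffer case discussed in the text.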

The missing relation pred between \(L_3^4\) and \(L_3^5\) indicates that the buffer of machine 3 was empty before the production of lot \(L_3^5\) started, so no blurring of lots could have occurred. The distance between lots can be used as a basis for determining the probability that a certain part of a lot is used in another lot. A more precise determination of probabilities would also require considering buffer levels during manufacturing, but that is beyond the scope of this work.
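One possible heuristic along these lines, given here only as an assumption since the text does not fix a formula, is a geometric decay of a score with the edge distance. The sketch below ranks candidate lots by `decay ** (d - 1)` with a hypothetical `decay` parameter; like the text, it ignores buffer levels, so the scores are a relative ranking rather than calibrated probabilities.

```python
def rank_by_distance(distances, decay=0.5):
    """Rank candidate lots by the heuristic score decay ** (d - 1).

    `distances` maps a lot to its edge distance d >= 1 from the lot under
    investigation; `decay` is an assumed tuning parameter in (0, 1].
    Lots with equal scores keep their input order (Python's sort is stable).
    """
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    scores = {lot: decay ** (d - 1) for lot, d in distances.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For instance, `rank_by_distance({(1, 2): 1, (1, 1): 2})` returns `[((1, 2), 1.0), ((1, 1), 0.5)]`: the directly poured lot ranks first, the lot reached via a pred edge second.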

Fig. 2.
figure 2

One-to-many relationship between lots. It is a fairly common type of relationship in the manufacturing industry. The content of one batch is distributed or transformed in turn into other batches. This type of relationship can also be implemented efficiently in most current database systems, including relational systems.

Fig. 3.
figure 3

Many-to-many relationship between lots, i.e., blurred lots. It is a complex yet fairly common type of relationship in the manufacturing industry. It occurs when many lots evolve or are distributed among other lots, which makes their trace very difficult to follow. We hypothesize that only graph database systems can model and implement a solution efficiently. This type of relationship is the central problem of our research work.

3.2 Implementation

We have implemented a prototypical software solution to show how a tracking system could use the graph database OrientDB to implement common operations. We chose OrientDB because it is a multi-model NoSQL ("not only SQL") database that combines properties of document-oriented and graph databases [7]. It allows users to define graph structures using concepts for nodes and edges, and it also allows complex data to be appended to nodes in the form of documents. Nodes and edges can have attributes (e.g., an edge weight).

Moreover, inheritable classes can be defined for nodes and edges, and these classes can be extended flexibly. The query language is an adapted form of SQL that is very intuitive and easy to use. In fact, queries can easily be expressed through a user interface, as shown in Fig. 4.
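To illustrate inheritable node and edge classes, the Python sketch below only generates OrientDB SQL command strings for a hypothetical lot-tracking schema. The class names (`Lot`, `RawMaterialLot`, `Usage`, `Pred`) and properties (`machine`, `timeStep`) are our assumptions, and the commands would still have to be sent through the OrientDB console or a client driver, which is not shown.

```python
def schema_commands():
    """OrientDB SQL commands for a hypothetical lot-tracking schema."""
    return [
        "CREATE CLASS Lot EXTENDS V",  # lots are vertices
        "CREATE PROPERTY Lot.machine INTEGER",
        "CREATE PROPERTY Lot.timeStep INTEGER",
        # Raw-material lots inherit from Lot; they carry a virtual machine number.
        "CREATE CLASS RawMaterialLot EXTENDS Lot",
        "CREATE CLASS Usage EXTENDS E",  # usage relation between lots
        "CREATE CLASS Pred EXTENDS E",   # pred relation on the same machine
    ]


def lot_selector(lot):
    """Sub-query selecting the lot given as a (machine, time step) tuple."""
    machine, time_step = (int(x) for x in lot)  # integers only, no string injection
    return f"(SELECT FROM Lot WHERE machine = {machine} AND timeStep = {time_step})"


def usage_edge(consumer, source):
    """CREATE EDGE command for usage(consumer, source)."""
    return f"CREATE EDGE Usage FROM {lot_selector(consumer)} TO {lot_selector(source)}"


def pred_edge(later, earlier):
    """CREATE EDGE command for pred(later, earlier) on the same machine."""
    return f"CREATE EDGE Pred FROM {lot_selector(later)} TO {lot_selector(earlier)}"
```

Keeping raw-material lots as a subclass of `Lot` lets them share the same properties and edges while remaining distinguishable by class.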

Fig. 4.
figure 4

OrientDB graphical interface that allows users to design and launch queries related to the distribution and/or evolution of the different lots through the manufacturing process.

It is important to note that the smallest unit that can be loaded from and saved to the database is a record. OrientDB distinguishes between four types of records: a record can be a document, a RecordBytes (BLOB), a node (Vertex), or an edge (Edge). In our specific case, working with the Vertex and Edge data types is enough. As future work, however, we envision that the document data type could be used to offer explicit support for situations that have already occurred before. Overall, OrientDB represents an efficient and scalable alternative, which is why we based our implementation on it.

The rationale behind this choice is that, compared to implementations based on relational database systems, using a graph database leads to an efficient and scalable solution in which the problem at hand can be modeled easily [13]. Perhaps the clearest example is graph traversal: traversing graphs modeled in relational database systems requires writing nested, recursive queries that are difficult to maintain and perform comparatively poorly.

To illustrate our viewpoint, the following SQL source code shows a brief example of how the problem can be modeled using a traditional relational database approach.

figure a

This source code shows how workpieces and their relations (in the form of belonging to a lot) can be modeled. We need one table for workpieces and another for lots; then, by means of insertions, we can store the corresponding data. Finally, a query has to be designed to retrieve the results. Our graph-based approach, in contrast, represents the problem in a very natural way and leads to an implementation that is both fast and efficient.
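Since the original listing is only available as a figure, the following self-contained Python sketch uses the standard `sqlite3` module to show, under assumed table and column names, what such a relational model can look like and why tracing lots across several levels needs a recursive query (a `WITH RECURSIVE` common table expression). The schema and data are hypothetical, not the authors' listing.

```python
import sqlite3

SCHEMA = """
CREATE TABLE lot (id TEXT PRIMARY KEY, machine INTEGER, time_step INTEGER);
CREATE TABLE workpiece (id TEXT PRIMARY KEY, lot_id TEXT REFERENCES lot(id));
-- many-to-many: which source lots were used to produce a lot
CREATE TABLE lot_usage (lot_id TEXT REFERENCES lot(id),
                        source_lot_id TEXT REFERENCES lot(id));
"""

# All upstream lots of a workpiece with their minimal depth. UNION removes
# duplicate rows; termination relies on lot relations being acyclic.
UPSTREAM = """
WITH RECURSIVE upstream(id, depth) AS (
    SELECT u.source_lot_id, 1
    FROM workpiece AS w JOIN lot_usage AS u ON u.lot_id = w.lot_id
    WHERE w.id = ?
    UNION
    SELECT u.source_lot_id, up.depth + 1
    FROM lot_usage AS u JOIN upstream AS up ON u.lot_id = up.id
)
SELECT id, MIN(depth) FROM upstream GROUP BY id ORDER BY id
"""


def build_demo():
    """In-memory database with one finished lot, two components, one raw lot."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO lot VALUES (?, ?, ?)",
                     [("R1", 0, 1), ("B1", 1, 2), ("C1", 2, 2), ("F1", 3, 3)])
    conn.execute("INSERT INTO workpiece VALUES (?, ?)", ("wp-1", "F1"))
    conn.executemany("INSERT INTO lot_usage VALUES (?, ?)",
                     [("F1", "B1"), ("F1", "C1"), ("B1", "R1"), ("C1", "R1")])
    return conn


def upstream_lots(conn, workpiece_id):
    return conn.execute(UPSTREAM, (workpiece_id,)).fetchall()
```

Here `upstream_lots(build_demo(), "wp-1")` returns `[("B1", 1), ("C1", 1), ("R1", 2)]`; even this small case needs a recursive query and a grouping step, which is the kind of query a graph traversal expresses directly.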

It is important to remark that traversing a graph is the act of visiting the nodes in the graph. For graph databases, traversing to nodes via their relationships can be compared to the join operations on the relational database tables.

The great advantage of this solution is that traversing a graph is much faster than performing the traditional joins of the relational world. The reason is that a traversal only visits the data that is actually needed, i.e., the nodes reachable from the starting point, without performing join or grouping operations over the entire data, as happens in traditional relational databases.

3.3 Use Cases

Based on the implementation described above, the graph-based approach is expected to have a positive impact on the daily operations of the manufacturing industry. In particular, the depth calculation, i.e., the distance between lots, could be used as a basis for calculating the dependency path of a part in a product.

To illustrate our approach, we show here some examples of queries that our system supports. The following code shows how to calculate the shortest path between nodes #33:27050 and #29:2667 regardless of edge direction. Note that the UNWIND directive is useful for unfolding the nodes of the path into individual result records.

figure b

Figure 5 shows the graph resulting from the calculation of the shortest path between nodes #33:27050 and #29:2667 regardless of edge direction.

Fig. 5.
figure 5

Result of calculating the shortest path between nodes #33:27050 and #29:2667 regardless of edge direction.
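The following Python sketch shows the underlying computation: a breadth-first search that treats every edge as traversable in both directions, which is roughly what OrientDB's `shortestPath()` function computes when the direction `BOTH` is used. The edge list and node names are hypothetical.

```python
from collections import deque

# Hypothetical directed edges; "other" points INTO "coil", so the path from
# "final" to "raw" only exists if edge direction is ignored.
EDGES = [("final", "compA"), ("compA", "coil"), ("other", "coil"), ("other", "raw")]


def shortest_path_undirected(edges, source, target):
    """BFS shortest path ignoring edge direction; returns a node list or None."""
    neighbours = {}
    for a, b in edges:  # each directed edge a -> b is usable both ways
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back along parent links
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

`shortest_path_undirected(EDGES, "final", "raw")` returns `["final", "compA", "coil", "other", "raw"]`; its number of edges is the distance used below as a basic notion of probability.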

OrientDB already implements breadth-first search (the BREADTH_FIRST traversal strategy), which reports the actual depth of each visited node in the graph. Intuitively, the greater the depth, the less likely it is that a given lot has been incorporated into the end product. The following code, whose result is shown in Fig. 6, traverses the graph along the edge direction from node #33:27050 to a depth of 10.

figure c
Fig. 6.
figure 6

Result of traversing the graph from node #33:27050 to a depth of 10.
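The depth-limited, breadth-first traversal behind Fig. 6 can be sketched in Python as follows, assuming a hypothetical adjacency map of outgoing edges. The returned depth plays a role similar to the `$depth` variable of OrientDB's `TRAVERSE ... MAXDEPTH ... STRATEGY BREADTH_FIRST` command; the function itself is an illustration, not OrientDB's implementation.

```python
from collections import deque

# Hypothetical outgoing edges (finished part -> components -> raw material).
OUT_EDGES = {"final": ["compA", "compB"], "compA": ["coil"], "compB": ["coil"], "coil": ["raw"]}


def traverse_out(out_edges, start, max_depth):
    """Breadth-first traversal along edge direction, up to `max_depth` hops.

    Returns (node, depth) pairs in visiting order; each node appears once,
    with its minimal depth.
    """
    depth = {start: 0}
    order = [(start, 0)]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in out_edges.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                order.append((nxt, depth[nxt]))
                queue.append(nxt)
    return order
```

With `max_depth=2`, the raw material at depth 3 is not reached, which shows how the depth limit bounds the portion of the graph that is examined.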

An example of an SQL-like query to get the shortest path between lots #33:27050 and #29:2667 can easily be written as follows:

figure d

The result of this query is depicted in Fig. 7. The number of edges between the lots gives a basic notion of the probability that parts of lot #29:2667 are built into lot #33:27050. As we have seen before, the higher the number of nodes visited by the query, the lower the probability that each specific node is affected by the error or failure that the operator or quality engineer is looking for.

Fig. 7.
figure 7

Result of calculating the shortest path between the nodes #33:27050 and #29:2667.

4 Discussion

In recent years, the ever-increasing technical literature on graph-based research has clearly shown that one of the main current challenges of computer science consists of finding proper ways to model the knowledge generated in a specific domain [18].

In this regard, Knowledge Graphs [21] have recently gained popularity, and we believe that the manufacturing domain is a very good fit for them. This opinion is shared by other researchers who are already working on exploiting knowledge graphs in industry [15].

In this context, software solutions that facilitate the handling of product-related production information in manufacturing enterprises are becoming more popular [10]. Many of the most popular models for representing domain knowledge are based on so-called knowledge graphs [8]. Knowledge graphs are intended to represent entities and the relationships between them within a particular universe of discourse. The advantage of this kind of knowledge representation is that it is easy to understand for humans and computers alike.

Knowledge graphs originate from the combination of knowledge bases with inference engines, which is also referred to as Knowledge-Based Systems in the literature. One of the first approaches that comes to mind is a system that represents knowledge with uncertainty using a set of rules, each of which is given a certainty factor. However, such rule-based systems are not very robust, so they have progressively been replaced by more efficient systems; at present, Bayesian networks are the most widely used way of representing knowledge and performing inference under uncertainty.

Building on this idea of knowledge-based systems, Google launched the so-called Google Knowledge Graph [6] several years ago. This graph aims at a universal domain, representing all existing entities and the relationships between them without being restricted to a single context. Such a broad domain is highly complex and inevitably incomplete: it is difficult to introduce a new entity and determine which other entities it relates to and what type of relationship connects them.

This task of adding new entities is not currently automated; instead, the users themselves introduce new entities into the network and determine how they relate to other entities. This technology is expected to develop further in the coming years, and some of its foundations may be applicable to the manufacturing domain.

5 Conclusions and Future Work

In this work, we have presented the design of a general solution for tracking each component of manufacturing products through the diverse stages of the production chain. Our solution is modeled using graph databases, as opposed to most existing solutions, which use relational databases. In this way, our approach provides an improved level of both transparency and traceability, since we consider a graph to be the natural way to model a problem involving dependencies of this kind.

Transparency is given by the fact that a human operator can see which actions have been performed during the manufacturing process. Traceability is given by the fact that the whole development process followed by the manufacturer can be monitored. In addition, these two properties are assumed to facilitate the analysis of all manufacturing products as well as the search for final products that could be affected by specific problems. In this way, our approach offers more efficient modeling and querying mechanisms than traditional approaches based on relational databases.

As future research work, we envision that graph databases hold a lot of unrealized potential for the coming years, as companies move towards approaches that enable better data analysis and exploration. More specifically, we think that a number of challenges remain open. For example, it is important to further investigate whether past errors can be documented to facilitate the discovery of future errors. To do this, it is necessary to investigate whether documentation can be associated with certain error patterns that have occurred recurrently during the manufacturing process in the past. We also believe that Knowledge Graphs can play a decisive role, since they allow us not only to model the problem in a natural way but also to reason about the data we work with.