Hybrid Approach Using Ontology-Supported Case-Based Reasoning and Machine Learning for Defect Rate Prediction

Ji, Bongjun; Ameri, Farhad; Choi, Junhyuk; Cho, Hyunbo

doi:10.1007/978-3-030-30000-5_37

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 566)

Included in the following conference series:

IFIP International Conference on Advances in Production Management Systems

Abstract

Manufacturers strive to eliminate defects using a variety of quality assurance tools and methods, but some defects are often unavoidable. To compensate for defective products, surplus batches are produced. However, surplus production is costly and results in waste. In this paper, we propose an approach to predict the defect rate and to set an appropriate amount of surplus production to replace defective products, thereby reducing overproduction and underproduction costs. In the proposed work, each production order is represented ontologically. A formal ontology enables building clusters of similar production orders. A defect prediction model is developed for each cluster using Mixture Density Networks. When a new order is received, the most similar production order and its related cluster are retrieved. The prediction model of the retrieved cluster is then applied to the new production order. Accordingly, the optimal production amount is calculated based on the defect rate, the overproduction cost, and the underproduction cost. The proposed approach was validated based on a use case from the cosmetic packaging industry.

1 Introduction

Overproduction is one of the major sources of waste in manufacturing. Overproduction is the production of parts and products beyond the actual need. In the lean literature, overproduction is known as the first type of waste because it induces another major waste, namely inventory. One of the most common causes of overproduction is the expectation of defects. Typically, underproduction cost is much larger than overproduction cost, so to compensate for defective items, manufacturers tend to produce more than the needed amount of a product. For example, if the order is for 95 units and the process has a 5% defect rate, then the manufacturer would produce at least 100 units to cover the expected defects. The surplus production might increase even more if (1) the manufacturing process has not yet reached a stable state, (2) production proceeds in small-quantity batches, or (3) failure to produce the required amount causes stoppage of subsequent processes.
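The arithmetic behind the 95-unit example can be written as a small helper. This is a minimal sketch, assuming the defect rate is a known constant and that the batch size is chosen so that the expected number of good units covers the order; the function name `required_production` is hypothetical and not part of the original work.

```python
import math


def required_production(order_qty: int, defect_rate: float) -> int:
    """Smallest batch size whose expected number of good units covers the order."""
    if not 0.0 <= defect_rate < 1.0:
        raise ValueError("defect_rate must be in [0, 1)")
    # The small epsilon guards against floating-point round-off in the division.
    return math.ceil(order_qty / (1.0 - defect_rate) - 1e-9)


batch = required_production(95, 0.05)
surplus = batch - 95
print(batch, surplus)  # 100 5
```

The rest of the paper replaces the fixed defect rate in this calculation with a predicted distribution and weighs the costs of producing too much or too little.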

Kanban is a type of pull production system that is used to prevent overproduction in lean manufacturing [1]. In a Kanban system, work can be started only when a production approval card, called a Kanban, is available [2]. However, the Kanban system assumes stable and repetitive production. Therefore, it is not suitable for fluctuating demand and product mix.

Cosmetic packaging is an example of an industry with fluctuating demand and product mix. For this reason, overproduction occurs frequently in the cosmetic packaging industry. Customer needs change rapidly, and therefore product life continues to shrink. Frequent introduction of new products hinders stabilization of the manufacturing process, which often calls for production of small batches. Another complicating factor is the number of steps involved in cosmetic packaging. To improve the aesthetics of the products, several processes are required, so overproduction is often propagated to several downstream processes. Therefore, more intelligent and data-driven methods are required to predict the defect rate accurately and, consequently, avoid overproduction.

The objective of this paper is to propose a systematic approach for defect rate prediction using ontology-supported case-based reasoning and machine learning (ML) techniques in the cosmetic packaging industry. This paper is organized as follows. Section 2 provides a brief review of the related work. Section 3 introduces the concept of an ontology-supported case-based reasoning approach for defect rate prediction in a cosmetic manufacturing enterprise. Finally, Sect. 4 provides the concluding remarks and identifies future work.

2 Related Work

In this section, we review methods that have been proposed for predicting yield and the number of defects in the manufacturing domain.

2.1 Yield Prediction

Yield refers to the percentage of non-defective products; i.e., it is the complement of the defect rate. Yield prediction is widely used in semiconductor manufacturing to improve yield by providing early alerts of nonconforming wafers, thereby decreasing monetary losses. Semiconductor wafer yield is affected by many factors, so traditional statistical analysis models do not predict it well [3].

Neural networks (NNs) have been used to predict yield. Tong et al. [4] proposed an NN-based approach and also used fuzzy adaptive resonance theory to group patterns into the appropriate number of clusters. Tong and Chao [5] used a general regression neural network (GRNN) because it can process both continuous and categorical output, and can be used if the linearity assumption is violated. Chen and Lin [6] proposed a fuzzy NN system, but it does not consider electrical parameters even though they are critical. Wu and Zhang [7] considered electrical parameters along with key attribute parameters and physical parameters; the authors conducted statistical correlation analysis to identify the electrical parameters. Lee and Ha [8] integrated a back-propagation network with case-based reasoning. The approach consists of four phases: learning relations between case variables and yield, weighting of features, extracting similar cases, and calculating the weighted averages of extracted yield. The paper was the first attempt to hybridize machine learning with case-based reasoning for yield prediction. Pak et al. [9] used a support-vector machine to predict yield; they also used an under-sampling method to eliminate imbalance from the data.

2.2 Defect Rate Prediction in Assembly Process

Various prediction approaches have been proposed based on the design characteristics of products and on ergonomics. A Design for Assembly (DFA) technique allows a manufacturer to examine design alternatives in the early design stage in order to reduce assembly cost [10]. It is also used to evaluate the likelihood of mistakes and to identify potential failures [11]. The Hinckley model [12] is based on the idea that defect rate is positively correlated with assembly time and negatively correlated with the number of assembly operations. This model provides insight, but the real world is not that simple. Shibata [13] suggested a model that considers process and design factors, and Antani [14] considered human factors by developing a regression-based defect rate prediction model in automated and semi-automated assembly; this model was then validated in a manual automobile-assembly process [15].

Many of the prediction models reviewed in Sect. 2.1 are well suited to large-scale manufacturing. However, most small and medium-sized enterprises (SMEs) cannot afford to introduce equipment with real-time sensors or to install sensors on all existing equipment, so these models are hard for SMEs to implement. In contrast, the prediction approaches in Sect. 2.2 are based on characteristics of the product and process and do not require additional investment in equipment; hence, they are relatively easy to apply. However, previous approaches focus only on accuracy and do not consider the cost of prediction error. To overcome this issue, this paper uses the probability distribution of the defect rate to calculate the expected cost.

3 The Proposed Framework for Defect Rate Prediction

The proposed framework is composed of three main phases, as shown in Fig. 1. The first phase is the off-line phase, in which clusters of existing work orders are created and a prediction model is developed for each cluster. The second phase (the on-line phase) predicts the defect rate for new work orders. The third phase is a continuous phase in which the actual and predicted defect rates are compared and the prediction models are further tuned and updated.

3.1 Ontology Development

One innovative aspect of the proposed framework is the use of an ontological approach for representing the data related to previous production work orders. An ontology-based approach can be used to determine the similarity between production orders. When data is annotated with ontological entities, one can easily and accurately retrieve the most similar production orders from the repository of existing orders and reuse their related defect rate prediction models. An ontology also helps users understand, communicate, and manage information effectively by standardizing the terminology used for production order description. Some examples of the key notions in production orders include product, customer, production month, manufacturing process, production team, and production quantity. Figure 2 shows the major classes and the relationships between the classes for the Work Order Ontology (WOO).

The Work Order Ontology is publicly available at https://github.com/corori/Ontology1.

3.2 Phase 1: Prediction Model Development

In this phase, the ontologically represented production orders are clustered based on their similarities. Next, historical data is collected for use in developing the prediction model for each cluster of production orders. Sensitivity analysis and data wrangling are used to verify the relationship between various properties of the production order class and the defect rate. For example, for some types of production order, the defect rate may decrease as production volume increases (Fig. 3).
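The clustering step could be sketched as follows. The paper does not name a specific clustering algorithm, so this example assumes a simple threshold-based "leader" clustering with Jaccard similarity over (concept, property value) pairs; the order identifiers, feature values, and the 0.5 threshold are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of features that two orders have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def leader_clustering(orders: dict, threshold: float = 0.5) -> list:
    """Assign each order to the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of order ids; the first id is the leader
    for order_id, features in orders.items():
        for cluster in clusters:
            if jaccard(features, orders[cluster[0]]) >= threshold:
                cluster.append(order_id)
                break
        else:
            # No existing leader is similar enough, so start a new cluster.
            clusters.append([order_id])
    return clusters


# Hypothetical production orders described by (concept, value) pairs
orders = {
    "WO-1": {("Product", "tube"), ("Process", "extrusion"), ("Team", "A")},
    "WO-2": {("Product", "tube"), ("Process", "extrusion"), ("Team", "B")},
    "WO-3": {("Product", "bottle set"), ("Process", "assembly"), ("Team", "A")},
}
clusters = leader_clustering(orders)
print(clusters)  # [['WO-1', 'WO-2'], ['WO-3']]
```

A separate prediction model would then be fitted on the historical defect rates of each resulting cluster.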

Prediction models are developed for clusters of similar production orders because similar production orders show similar defect rate trends. For example, the defect rate trend of a tube, which is made by extrusion, is entirely different from that of a bottle set, which is made by an assembly process. In the clustering step, the concepts are treated as features and the properties are treated as feature values. We should also consider that even when the manufacturer produces products with the same production order properties, the defect rate is in some cases not a single point but a distribution (Fig. 4). For this type of problem, a Mixture Density Network (MDN) is a suitable prediction algorithm: it can model general conditional probability densities and outputs a distribution [16]. The distribution output by the MDN can also be used when the expected cost associated with the defect rate is calculated.
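To illustrate what "outputs a distribution" means, the sketch below shows only the output stage of an MDN with Gaussian components: raw network outputs are mapped to mixing coefficients, means, and standard deviations, and the resulting conditional density is evaluated. The network itself and its training are omitted, and the raw output values and the two-component setup are hypothetical.

```python
import math


def to_mixture_params(logits, means, log_sigmas):
    """Map raw network outputs to valid Gaussian-mixture parameters."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]     # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]          # mixing coefficients sum to 1
    sigmas = [math.exp(s) for s in log_sigmas]   # exp keeps std devs positive
    return weights, list(means), sigmas


def mixture_pdf(y, weights, means, sigmas):
    """Conditional density p(y | x) of a one-dimensional Gaussian mixture."""
    return sum(
        w * math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, mu, s in zip(weights, means, sigmas)
    )


# Hypothetical raw outputs for one production order: two equally likely modes
weights, means, sigmas = to_mixture_params(
    logits=[0.0, 0.0],
    means=[0.02, 0.08],
    log_sigmas=[math.log(0.01), math.log(0.01)],
)
expected_defect_rate = sum(w * mu for w, mu in zip(weights, means))
print(round(expected_defect_rate, 4))
print(mixture_pdf(0.02, weights, means, sigmas) > mixture_pdf(0.05, weights, means, sigmas))
```

In this example the mean defect rate lies between the two modes, where the density is low, which is why a single point estimate can be misleading and the full distribution is kept for the cost calculation.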

3.3 Phase 2: Prediction Model Deployment

In this phase, the developed prediction model is deployed for use. When a new production order is received, the feature values of the production order are recorded in the work order (WO) description step. Then the most similar production order is identified by calculating the similarity between the new production order and all existing production orders. The similarity of the new production order to the stored production orders is determined by calculating the similarity between production order features. Three major methods can be used to determine the similarity between production orders: the edge-based method, the information-content-based method, and the feature-based method [17]. In the edge-based method, the path length between terms in an is-a taxonomy represents the similarity [18]. In the information-content-based method, the similarity of two production orders depends on the degree of informativeness of the superclass that includes both production orders [19]. The similarity is defined as

$$ \mathrm{Sim}(c_{1}, c_{2}) = \max_{c \in S(c_{1}, c_{2})} \left[ -\log p(c) \right], $$

(1)

where $ c_{1} $ and $ c_{2} $ are production orders, and $ S(c_{1}, c_{2}) $ is the set of concepts that subsume both production orders. The negative log-likelihood $ -\log p(c) $ is the information content of a concept $ c $ according to information theory [20]. In the feature-based method, the similarity between production orders $ c_{1} $ and $ c_{2} $, with feature sets $ C_{1} $ and $ C_{2} $, is a function of their common and distinctive features [21]:

$$ \mathrm{Sim}(c_{1}, c_{2}) = \frac{n_{C_{1} \cap C_{2}}}{n_{C_{1} \cap C_{2}} + \mu\, n_{C_{1} - C_{2}} + \nu\, n_{C_{2} - C_{1}}}, $$

(2)

where $ \mu, \nu \in \mathbb{R} $ are constant weighting factors, $ n_{C_{1} \cap C_{2}} $ is the number of common features, and $ n_{C_{1} - C_{2}} $ and $ n_{C_{2} - C_{1}} $ are the numbers of distinctive features of $ C_{1} $ and $ C_{2} $, respectively.
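Equations (1) and (2) can be sketched in a few lines. The taxonomy, the concept probabilities, the stored orders, and the weights $ \mu = \nu = 0.5 $ below are hypothetical values chosen only for illustration; they are not taken from the Work Order Ontology.

```python
import math

# Hypothetical is-a taxonomy: child -> parent (the root has parent None)
PARENT = {
    "Product": None,
    "Container": "Product",
    "Tube": "Container",
    "Bottle": "Container",
    "Cap": "Product",
}
# Hypothetical probability of encountering each concept or one of its descendants
P = {"Product": 1.0, "Container": 0.6, "Tube": 0.25, "Bottle": 0.35, "Cap": 0.4}


def ancestors(concept):
    """Return the concept together with all of its superclasses."""
    found = set()
    while concept is not None:
        found.add(concept)
        concept = PARENT[concept]
    return found


def information_content_similarity(c1, c2):
    """Eq. (1): highest information content among the common subsumers."""
    common = ancestors(c1) & ancestors(c2)
    return max(-math.log(P[c]) for c in common)


def feature_similarity(f1, f2, mu=0.5, nu=0.5):
    """Eq. (2): common features over common plus weighted distinctive features."""
    common = len(f1 & f2)
    denom = common + mu * len(f1 - f2) + nu * len(f2 - f1)
    return common / denom if denom else 0.0


def most_similar(new_order, stored_orders):
    """Retrieve the id of the stored order closest to the new order under Eq. (2)."""
    return max(stored_orders, key=lambda k: feature_similarity(new_order, stored_orders[k]))


stored = {
    "WO-1": {("Product", "tube"), ("Process", "extrusion"), ("Month", "Mar")},
    "WO-2": {("Product", "bottle set"), ("Process", "assembly"), ("Month", "Mar")},
}
new = {("Product", "tube"), ("Process", "extrusion"), ("Month", "Apr")}
print(most_similar(new, stored))  # WO-1
```

The feature-based function needs only the sets of feature values, whereas the information-content function depends on a taxonomy and concept probabilities being available.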

Each of these methods has its strengths and weaknesses. The first and second similarity measures require a taxonomy [17]. In the cosmetic industry, such a taxonomy is not available for production orders, so the third method is preferred. The most similar production order is retrieved using the third method, and the prediction model of the cluster that includes it is then applied. The last step in phase 2 is to set an optimal defect rate that considers the cost. The expected costs of underproduction and overproduction are calculated as

$$ \sum_{i} p(\text{Defect rate} = i \mid z_{i}, x) \times \text{unit cost}_{\text{underproduction}}\,(\text{order} - z_{i}), $$

(3)

$$ \sum_{i} p(\text{Defect rate} = i \mid z_{i}, x) \times \text{unit cost}_{\text{overproduction}}\,(\text{order} - z_{i}), $$

(4)

where $ x $ is the set of features and $ z_{i} $ is the production amount when defect rate $ i $ is applied. Underproduction cost includes a delivery-delay penalty, an additional transfer fee to meet the delivery deadline, the cost of additional production, and the cost of adjusting production planning. Overproduction cost includes additional production cost and inventory cost. Unit costs vary; examples of units are minimum lot size, production order, and date. The expected cost of each defect rate is the sum of the underproduction cost, the overproduction cost, and the production cost. The optimal defect rate is the one that has the lowest expected cost. When this rate is determined, the production amount can be determined.
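One possible reading of this cost trade-off is sketched below. It assumes that the predicted distribution has been discretized into (defect rate, probability) pairs, that the shortfall and surplus are measured in good units relative to the order, and that the search runs directly over candidate production amounts rather than over defect rates. All function names, unit costs, and the search range are hypothetical.

```python
def expected_cost(z, order, defect_dist, c_under, c_over, c_prod):
    """Expected total cost of producing z units.

    defect_dist is a list of (defect_rate, probability) pairs, for example a
    discretized version of the distribution predicted for the new order.
    """
    total = c_prod * z
    for rate, prob in defect_dist:
        good = z * (1.0 - rate)             # non-defective units at this defect rate
        shortfall = max(0.0, order - good)  # assumed underproduction quantity
        surplus = max(0.0, good - order)    # assumed overproduction quantity
        total += prob * (c_under * shortfall + c_over * surplus)
    return total


def optimal_production(order, defect_dist, c_under, c_over, c_prod, max_extra=0.5):
    """Search candidate amounts from the order size up to (1 + max_extra) * order."""
    candidates = range(order, int(order * (1.0 + max_extra)) + 1)
    return min(
        candidates,
        key=lambda z: expected_cost(z, order, defect_dist, c_under, c_over, c_prod),
    )


# Hypothetical bimodal prediction: a 2% or an 8% defect rate, equally likely
dist = [(0.02, 0.5), (0.08, 0.5)]
z_star = optimal_production(order=1000, defect_dist=dist, c_under=10.0, c_over=1.0, c_prod=0.5)
print(z_star, expected_cost(z_star, 1000, dist, 10.0, 1.0, 0.5))
```

Because underproduction is assumed to be far more expensive than overproduction in this example, the selected amount leans towards covering the higher defect rate rather than the average one.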

3.4 Phase 3: Monitoring

Deployment is not the final step. In many cases, the users of the model will be manufacturing operators rather than data analysts. For the model to remain effective and efficient, it should be updated at appropriate times. Hence, a threshold is set such that if the difference between the predicted and actual defect rates exceeds it, the model is re-trained.
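The re-training trigger could look like the following. The paper does not specify how the difference is aggregated, so this sketch assumes the mean absolute error over a window of recent orders; the function name and the threshold of 0.02 are hypothetical.

```python
def needs_retraining(predicted, actual, threshold=0.02):
    """Flag a model for re-training when the mean absolute error exceeds the threshold."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two equally sized, non-empty sequences")
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
    return mae > threshold


# Hypothetical predicted and observed defect rates for recent orders of one cluster
print(needs_retraining([0.05, 0.05], [0.05, 0.06]))  # False
print(needs_retraining([0.05], [0.10]))              # True
```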

4 Conclusion and Future Work

This paper proposes an approach to predict the defect rate in order to minimize the costs of underproduction and overproduction. Defects are unavoidable, so to compensate for expected defects, a manufacturer tends to produce more products than the quantity ordered. Overproduction wastes production resources and increases inventory cost. Underproduction causes delivery delays and adds the cost of adjusting production planning; underproduction can even cause overproduction, because manufacturers must re-produce at least a minimum production quantity. We develop a method to predict the defect rate by combining an ontology-supported case-based reasoning approach with a machine-learning approach. Existing methods to predict defect rates have not considered the costs of underproduction and overproduction. The proposed approach has two main advantages: (1) it combines ontology-supported case-based reasoning and machine learning to improve the accuracy of the prediction, and (2) it considers probability to minimize the costs caused by both overproduction and underproduction. Although this approach is still conceptual at present, and must be developed and verified, it is an important step towards efficient production. Further study will include experiments, validation, and verification of the proposed approach.

References

1. Sendil Kumar, C., Panneerselvam, R.: Literature review of JIT-KANBAN system. Int. J. Adv. Manuf. Technol. 32(3–4), 393–408 (2007)
2. Dallery, Y., Liberopoulos, G.: Extended kanban control system: combining kanban and base stock. IIE Trans. 32(4), 369–386 (2000)
3. Li, T.S., Huang, C.L., Wu, Z.Y.: Data mining using genetic programming for construction of a semiconductor manufacturing yield rate prediction system. J. Intell. Manuf. 17, 355–361 (2006)
4. Tong, L.I., Lee, W.I., Su, C.T.: Using a neural network-based approach to predict the wafer yield in integrated circuit manufacturing. IEEE Trans. Compon. Packag. Manuf. Technol. Part C 20(4), 288–294 (1997)
5. Tong, L.I., Chao, L.C.: Novel yield model for integrated circuits with clustered defects. Expert Syst. Appl. 34(4), 2334–2341 (2008)
6. Chen, T., Lin, Y.C.: A fuzzy-neural system incorporating unequally important expert opinions for semiconductor yield forecasting. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 16(01), 35–58 (2008)
7. Wu, L., Zhang, J.: Fuzzy neural network based yield prediction model for semiconductor manufacturing system. Int. J. Prod. Res. 48(11), 3225–3243 (2010)
8. Lee, J.H., Ha, S.H.: Recognizing yield patterns through hybrid applications of machine learning techniques. Inf. Sci. 179(6), 844–850 (2009)
9. Pak, S.R., et al.: Yield prediction using support vectors based under-sampling in semiconductor process. Int. Sch. Sci. Res. Innov. 6(12), 2755–2759 (2012)
10. Boothroyd, G., Dewhurst, P.: Product Design for Assembly, 3rd edn. Boothroyd Dewhurst Incorporated, Wakefield (1986)
11. Su, Q., Liu, L., Lai, S.: Measuring the assembly quality from the operator mistake view: a case study. Assembly Autom. 29(4), 332–340 (2009)
12. Hinckley, C.M.: A global conformance quality model: a new strategic tool for minimizing defects caused by variation, error, and complexity. Stanford University, California (1993)
13. Shibata, H.: Global assembly quality methodology: a new method for evaluating assembly complexities in globally distributed manufacturing. Ph.D. dissertation, Stanford University (2002)
14. Antani, K.R.: A study of the effects of manufacturing complexity on product quality in mixed-model automotive assembly. Ph.D. dissertation, Clemson University (2014)
15. Krugh, M., Antani, K., Mears, L., Schulte, J.: Prediction of defect propensity for the manual assembly of automotive electrical connectors. Procedia Manuf. 5, 144–157 (2016)
16. Bishop, C.M.: Mixture density networks. Technical report. Aston University, Birmingham (1994)
17. Ameri, F., Dutta, D.: A matchmaking methodology for supply chain deployment in distributed manufacturing environments. J. Comput. Inf. Sci. Eng. 8(1), 011002 (2008)
18. Rada, R., Mili, H., Bicknell, E., Blettner, M.: Development and application of a metric on semantic nets. IEEE Trans. Syst. Man Cybern. 19(1), 17–30 (1989)
19. Resnik, P.: Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intell. Res. 11, 95–130 (1999)
20. Ross, S.: A First Course in Probability, 9th edn. Pearson, Boston (2014)
21. Tversky, A.: Features of similarity. Psychol. Rev. 84(4), 327 (1977)


Author information

Authors and Affiliations

Texas State University, San Marcos, TX, 78666, USA
Bongjun Ji & Farhad Ameri
Pohang University of Science and Technology, Pohang, Gyeongsangbuk-do, 37673, Republic of Korea
Bongjun Ji, Junhyuk Choi & Hyunbo Cho

Corresponding author

Correspondence to Hyunbo Cho.

Editor information

Editors and Affiliations

Texas State University, San Marcos, TX, USA
Farhad Ameri
The University of Texas at Dallas, Richardson, TX, USA
Kathryn E. Stecke
ZF Friedrichshafen AG, Friedrichshafen, Germany
Gregor von Cieminski
EPFL, SCI-STI-DK, Lausanne, Switzerland
Dimitris Kiritsis

Cite this paper

Ji, B., Ameri, F., Choi, J., Cho, H. (2019). Hybrid Approach Using Ontology-Supported Case-Based Reasoning and Machine Learning for Defect Rate Prediction. In: Ameri, F., Stecke, K., von Cieminski, G., Kiritsis, D. (eds) Advances in Production Management Systems. Production Management for the Factory of the Future. APMS 2019. IFIP Advances in Information and Communication Technology, vol 566. Springer, Cham. https://doi.org/10.1007/978-3-030-30000-5_37

DOI: https://doi.org/10.1007/978-3-030-30000-5_37
Published: 24 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29999-6
Online ISBN: 978-3-030-30000-5
eBook Packages: Computer Science, Computer Science (R0)
