An effective approach for causal variables analysis in diesel engine production by using mutual information and network deconvolution



The effective control of power consistency, one of the most important quality indicators of a diesel engine, plays a decisive role in improving product competitiveness. The widespread use of sensors and other data acquisition equipment makes data-driven quality control possible. However, identifying the parameters most highly related to engine power from massive amounts of captured manufacturing data, and effectively discriminating between direct and indirect dependencies among these variables, remain challenging. This paper proposes a feature selection algorithm named NMI-ND, which uses network deconvolution (ND) to infer causal correlations among various diesel engine manufacturing parameters from the observed correlations, quantified by normalized mutual information (NMI). The proposed algorithm is thoroughly evaluated in an experimental study that compares it with other representative feature selection algorithms. The comparison demonstrates that NMI-ND performs better in both effectiveness and efficiency.
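As a rough illustration of the NMI-ND idea summarized above, the following Python sketch builds a pairwise NMI matrix over discretized manufacturing parameters, applies network deconvolution to it, and ranks parameters by their deconvolved dependence on a target column (for example, engine power). It is not the authors' implementation. Several choices are assumptions: equal-width binning with 10 bins, the normalization I(X;Y)/sqrt(H(X)H(Y)) (the paper's exact NMI definition is not stated here), zeroing the diagonal, and the eigenvalue bound `beta = 0.9`. The deconvolution uses the closed form G_dir = G_obs (I + G_obs)^-1 from Feizi et al. (2013), applied through the eigendecomposition of a scaled, symmetric observed matrix. Helper names such as `rank_features` are hypothetical.

```python
import numpy as np


def discretize(x, n_bins=10):
    """Equal-width binning of a continuous column (assumed preprocessing)."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Inner edges only, so labels run from 0 to n_bins - 1.
    return np.digitize(x, edges[1:-1])


def entropy(labels):
    """Shannon entropy, in nats, of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))


def normalized_mutual_information(a, b):
    """NMI = I(A;B) / sqrt(H(A) H(B)); this normalization is an assumption.

    a and b must be non-negative integer label arrays of equal length.
    """
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0
    # Encode each (a, b) pair as one integer symbol to get the joint entropy.
    joint = a.astype(np.int64) * (int(b.max()) + 1) + b
    mi = h_a + h_b - entropy(joint)
    return mi / np.sqrt(h_a * h_b)


def network_deconvolution(g_obs, beta=0.9):
    """Closed-form ND: G_dir = G_obs (I + G_obs)^-1 after eigenvalue scaling.

    beta (assumed 0.9) bounds the eigenvalues of the direct matrix.
    """
    g = (g_obs + g_obs.T) / 2.0          # symmetrize (copy, input untouched)
    np.fill_diagonal(g, 0.0)             # ignore self-dependence (assumed)
    eigvals, eigvecs = np.linalg.eigh(g)
    lam_p = max(eigvals.max(), 0.0)
    lam_n = abs(min(eigvals.min(), 0.0))
    # Scale factor m so the mapped eigenvalues stay within [-beta, beta].
    m = max(lam_p * (1 - beta) / beta, lam_n * (1 + beta) / beta)
    if m == 0.0:
        return np.zeros_like(g)
    # lambda_dir = lambda_obs / (m + lambda_obs), i.e. ND applied to G / m.
    d_dir = eigvals / (m + eigvals)
    return eigvecs @ np.diag(d_dir) @ eigvecs.T


def rank_features(data, target_col, n_bins=10, beta=0.9):
    """Hypothetical NMI-ND-style ranking of columns against a target column."""
    disc = np.column_stack(
        [discretize(data[:, j], n_bins) for j in range(data.shape[1])]
    )
    n = disc.shape[1]
    g_obs = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            g_obs[i, j] = g_obs[j, i] = normalized_mutual_information(
                disc[:, i], disc[:, j]
            )
    g_dir = network_deconvolution(g_obs, beta)
    scores = g_dir[:, target_col].copy()
    scores[target_col] = -np.inf         # exclude the target itself
    order = np.argsort(scores)[::-1]     # strongest direct dependence first
    return [int(k) for k in order if k != target_col], g_dir
```

For instance, `rank_features(data, target_col=power_idx)` (with `power_idx` a hypothetical index of the power column) returns the remaining parameters ordered from strongest to weakest deconvolved dependence on power. The eigendecomposition route is used here because the NMI matrix is small and symmetric; in a transitive chain A–B–C, it shrinks the indirect A–C entry relative to the direct A–B entry.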


Keywords: Power consistency; Causal variables analysis; Transitive effects; Mutual information; Network deconvolution



This work was financially supported by the National Science Foundation of China (Nos. 51435009 and 51775348), the National Technology Support Program of China (No. 2015BAF12B02), and the Shanghai Aerospace Science and Technology Innovation Fund (No. SAST2016048).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. College of Mechanical Engineering, Donghua University, Shanghai, China
