Truth Discovery in Material Science Databases

  • Eve Bélisle
  • Zi Huang
  • Aimen Gheribi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9093)


Instead of performing expensive experiments, industry commonly predicts important material properties from existing experimental results. Databases of experimental observations are widely used in Materials Science and Engineering. However, these databases are expected to be noisy, both because they rely on human measurements and because they amalgamate various independent sources (research papers). As a result, different sources can report conflicting information. In this paper, we introduce a novel truth discovery approach that reduces the amount of noise and filters out the incorrect conflicting information hidden in scientific databases. Our method ranks the data sources by considering the relationships between them, i.e., the amount of conflicting information and the amount of agreement, and also eliminates the conflicting information. The scalable Gaussian process interpolation technique (SGP) is then applied to the cleaned dataset to predict material properties. A comprehensive performance study has been carried out on a real-life scientific database. With our new approach, we are able to substantially improve the accuracy of SGP predictions and provide a more reliable database.
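The abstract does not spell out the ranking algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each observation is a hypothetical `(source, measurement point, value)` triple, that two sources agree at a point when their values differ by at most a relative tolerance `rel_tol`, and that a source's score is simply its share of agreements among all pairwise comparisons. The names `score_sources` and `resolve_conflicts`, the tolerance, and the sample data are all illustrative.

```python
from collections import defaultdict
from itertools import combinations


def _close(v1, v2, rel_tol):
    """True when two values differ by at most rel_tol, relative to the larger one."""
    scale = max(abs(v1), abs(v2), 1e-12)
    return abs(v1 - v2) / scale <= rel_tol


def score_sources(observations, rel_tol=0.05):
    """Score each source as agreements / (agreements + conflicts).

    Assumption: agreement is judged pairwise between different sources
    that report a value for the same measurement point.
    """
    by_point = defaultdict(list)
    for source, point, value in observations:
        by_point[point].append((source, value))

    agree = defaultdict(int)
    conflict = defaultdict(int)
    for entries in by_point.values():
        for (s1, v1), (s2, v2) in combinations(entries, 2):
            if s1 == s2:
                continue
            bucket = agree if _close(v1, v2, rel_tol) else conflict
            bucket[s1] += 1
            bucket[s2] += 1

    scores = {}
    for source in {s for s, _, _ in observations}:
        total = agree[source] + conflict[source]
        # A source that never overlaps with another gets a neutral score.
        scores[source] = agree[source] / total if total else 0.5
    return scores


def resolve_conflicts(observations, scores, rel_tol=0.05):
    """Per point, keep the top-scored source's value and the values that agree with it."""
    by_point = defaultdict(list)
    for obs in observations:
        by_point[obs[1]].append(obs)

    cleaned = []
    for entries in by_point.values():
        best = max(entries, key=lambda o: scores[o[0]])
        cleaned.extend(o for o in entries if _close(o[2], best[2], rel_tol))
    return cleaned


# Made-up sample data: three papers reporting a property at three compositions.
observations = [
    ("paper_A", "x=0.1", 1.00),
    ("paper_B", "x=0.1", 1.02),
    ("paper_C", "x=0.1", 1.60),
    ("paper_A", "x=0.2", 2.00),
    ("paper_B", "x=0.2", 2.05),
    ("paper_C", "x=0.2", 2.04),
    ("paper_A", "x=0.3", 3.00),
    ("paper_C", "x=0.3", 3.90),
]

scores = score_sources(observations)
cleaned = resolve_conflicts(observations, scores)
```

With this sample data, `paper_C` receives the lowest score because it conflicts with the other sources at two of the three points, and its two conflicting values are dropped from `cleaned`. In a pipeline like the one the abstract describes, the cleaned observations would then be handed to the interpolation step, which is not shown here.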


Noisy Data · Interpolation Technique · Truth Discovery · Gaussian Process Regression · Training Database
These keywords were added by machine and not by the authors.




Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. University of Queensland, Brisbane, Australia
  2. École Polytechnique de Montréal, Montréal, Canada
