Abstract
In industry, it is common to predict important material properties from existing experimental results rather than perform expensive new experiments. Databases of experimental observations are widely used in materials science and engineering. These databases are expected to be noisy, both because they rely on human measurements and because they combine many independent sources (research papers), so different sources can report conflicting information. In this paper, we introduce a novel truth discovery approach that reduces the noise and filters out the incorrect, conflicting information hidden in scientific databases. Our method ranks the data sources by considering the relationships between them, i.e., the amount of conflict and the amount of agreement, and also eliminates the conflicting information. The scalable Gaussian process interpolation technique (SGP) is then applied to the cleaned dataset to predict material properties. A comprehensive performance study was carried out on a real-life scientific database. With our new approach, we substantially improve the accuracy of SGP predictions and provide a more reliable database.
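The general idea of ranking sources by agreement and conflict, then filtering conflicting values, can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details are not given in the abstract. It assumes that observations are `(source, key, value)` triples, that two values agree when they fall within a relative tolerance `rel_tol`, and that a source's score is simply its share of agreeing pairwise comparisons. The function names `rank_sources` and `resolve_conflicts` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def _close(v1, v2, rel_tol):
    """Assumed agreement rule: values within a relative tolerance of each other."""
    return abs(v1 - v2) <= rel_tol * max(abs(v1), abs(v2))


def _group_by_key(observations):
    """Group (source, key, value) triples by key (e.g. an experimental condition)."""
    by_key = defaultdict(list)
    for source, key, value in observations:
        by_key[key].append((source, value))
    return by_key


def rank_sources(observations, rel_tol=0.05):
    """Score each source by the fraction of its pairwise comparisons that agree."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for entries in _group_by_key(observations).values():
        for (s1, v1), (s2, v2) in combinations(entries, 2):
            if s1 == s2:
                continue  # a source is not compared with itself
            ok = _close(v1, v2, rel_tol)
            for s in (s1, s2):
                total[s] += 1
                agree[s] += ok
    sources = {s for s, _, _ in observations}
    # Sources that were never compared get a neutral score of 0.5 (an assumption).
    return {s: (agree[s] / total[s] if total[s] else 0.5) for s in sources}


def resolve_conflicts(observations, scores, rel_tol=0.05):
    """For each key, keep only values consistent with the top-scoring source's value."""
    cleaned = []
    for key, entries in _group_by_key(observations).items():
        _, best_value = max(entries, key=lambda e: scores[e[0]])
        for source, value in entries:
            if _close(value, best_value, rel_tol):
                cleaned.append((source, key, value))
    return cleaned


# Toy data: source C disagrees with A and B on x1, and with A on x3.
observations = [
    ("A", "x1", 1.00), ("B", "x1", 1.02), ("C", "x1", 1.50),
    ("A", "x2", 2.00), ("B", "x2", 2.05), ("C", "x2", 2.02),
    ("A", "x3", 3.00), ("C", "x3", 4.00),
]
scores = rank_sources(observations)
cleaned = resolve_conflicts(observations, scores)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
print(cleaned)
```

In a pipeline like the one described, the cleaned triples would then be passed to the interpolation step. A single pass is used here for brevity; truth discovery methods often iterate between scoring sources and re-estimating values.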
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bélisle, E., Huang, Z., Gheribi, A. (2015). Truth Discovery in Material Science Databases. In: Sharaf, M., Cheema, M., Qi, J. (eds) Databases Theory and Applications. ADC 2015. Lecture Notes in Computer Science, vol 9093. Springer, Cham. https://doi.org/10.1007/978-3-319-19548-3_22
DOI: https://doi.org/10.1007/978-3-319-19548-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19547-6
Online ISBN: 978-3-319-19548-3
eBook Packages: Computer Science (R0)