Identify crystal structures by a new paradigm based on graph theory for building materials big data
Material identification techniques are crucial to the development of structural chemistry and the materials genome project. Current methods are promising for identifying structures effectively, but their ability to handle every structure in a big materials database accurately and automatically is limited, because data from different material sources and various measurement errors cause variations in bond lengths and bond angles. To address this issue, we propose a new paradigm based on graph theory (GTscheme) that improves the efficiency and accuracy of material identification by processing the “topological relationships” among structures rather than the values of their bond lengths and bond angles. Using this method, we achieve automatic deduplication of a big materials database for the first time, identifying 626,772 unique structures among 865,458 original structures. Moreover, we have modified the graph theory scheme to solve more advanced problems, such as identifying highly distorted structures, distinguishing structures with strong similarity, and classifying complex crystal structures in materials big data.
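The central idea, comparing structures by their connectivity instead of by bond-length and bond-angle values, can be illustrated with a minimal sketch. This is not the authors' GTscheme. It assumes that two atoms are bonded whenever their distance falls below a single hypothetical cutoff, it ignores periodic boundary conditions (a real crystal graph must include bonds across cell faces), and it uses a Weisfeiler-Lehman style hash, a swapped-in technique, as the graph fingerprint. The names `build_bond_graph`, `topology_hash`, `deduplicate` and the toy structures are illustrative only.

```python
import hashlib
import math
from typing import Dict, List, Sequence, Tuple

# (element symbol, Cartesian position in angstroms); periodicity is ignored here
Atom = Tuple[str, Tuple[float, float, float]]


def build_bond_graph(atoms: Sequence[Atom], cutoff: float) -> Dict[int, List[int]]:
    """Adjacency lists: atoms i and j are bonded if their distance is below `cutoff`.

    A single global cutoff is a simplifying assumption, not the paper's bonding rule.
    """
    graph: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if math.dist(atoms[i][1], atoms[j][1]) < cutoff:
                graph[i].append(j)
                graph[j].append(i)
    return graph


def _short_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()[:16]


def topology_hash(atoms: Sequence[Atom], cutoff: float, rounds: int = 3) -> str:
    """Weisfeiler-Lehman style fingerprint that depends only on elements and
    connectivity, so exact bond lengths and angles do not affect it."""
    graph = build_bond_graph(atoms, cutoff)
    labels = {i: atoms[i][0] for i in graph}  # start from element symbols
    for _ in range(rounds):
        # New label = own label + sorted multiset of neighbour labels (previous round).
        labels = {
            i: _short_hash(labels[i] + "(" + ",".join(sorted(labels[j] for j in graph[i])) + ")")
            for i in graph
        }
    # Sorting the final labels makes the fingerprint independent of atom ordering.
    return _short_hash("|".join(sorted(labels.values())))


def deduplicate(structures: Dict[str, Sequence[Atom]], cutoff: float) -> List[List[str]]:
    """Group structure names whose topology fingerprints coincide."""
    groups: Dict[str, List[str]] = {}
    for name, atoms in structures.items():
        groups.setdefault(topology_hash(atoms, cutoff), []).append(name)
    return list(groups.values())


# Hypothetical toy data: two 4-membered Si rings with different bond lengths,
# and a 4-atom Si chain with a different topology.
EXAMPLES: Dict[str, List[Atom]] = {
    "square_A": [("Si", (0.0, 0.0, 0.0)), ("Si", (2.3, 0.0, 0.0)),
                 ("Si", (2.3, 2.3, 0.0)), ("Si", (0.0, 2.3, 0.0))],
    "square_B_distorted": [("Si", (0.0, 0.0, 0.0)), ("Si", (2.4, 0.0, 0.0)),
                           ("Si", (2.35, 2.2, 0.0)), ("Si", (0.1, 2.45, 0.0))],
    "chain_C": [("Si", (0.0, 0.0, 0.0)), ("Si", (2.3, 0.0, 0.0)),
                ("Si", (4.6, 0.0, 0.0)), ("Si", (6.9, 0.0, 0.0))],
}

if __name__ == "__main__":
    print(deduplicate(EXAMPLES, cutoff=2.8))
    # prints [['square_A', 'square_B_distorted'], ['chain_C']]
```

Because the two rings differ only in bond lengths, they share a fingerprint and fall into one group, while the chain does not. Equal Weisfeiler-Lehman hashes are necessary but not sufficient for graph isomorphism, so a practical pipeline would confirm each group with an exact isomorphism check before discarding duplicates.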
Keywords: structure identification, graph theory, big data, topological relationship, materials database
The authors thank Dr. Lin-Wang Wang from Lawrence Berkeley National Laboratory and Dr. Wenfei Fan from the University of Edinburgh for their helpful discussions. This work was supported by the National Key R&D Program of China (2016YFB0700600), the National Natural Science Foundation of China (21603007, 51672012), Soft Science Research Project of Guangdong Province (2017B030301013), and New Energy Materials Genome Preparation & Test Key-Laboratory Project of Shenzhen (ZDSYS201707281026184).