New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships


Data mining has revolutionized sectors as diverse as pharmaceutical drug discovery, finance, medicine, and marketing, and has the potential to similarly advance materials science. In this paper, we describe advances in simulation-based materials databases, open-source software tools, and machine learning algorithms that are converging to create new opportunities for materials informatics. We discuss the data mining techniques of exploratory data analysis, clustering, linear models, kernel ridge regression, tree-based regression, and recommendation engines. We present these techniques in the context of several materials application areas, including compound prediction, Li-ion battery design, piezoelectric materials, photocatalysts, and thermoelectric materials. Finally, we demonstrate how new data and tools are making data mining easier and more accessible than ever. We do so through a new analysis that learns trends in the valence and conduction band character of compounds in the Materials Project database, using data on over 2500 compounds.





This work was intellectually led by the Materials Project (DOE Basic Energy Sciences Grant No. EDCBEE). Work at the Lawrence Berkeley National Laboratory was supported by the U.S. Department of Energy Office of Science, Office of Basic Energy Sciences Department under Contract No. DE-AC02-05CH11231. GH acknowledges financial support from the European Union Marie Curie Career Integration (CIG) grant HT4TCOs PCIG11-GA-2012-321988. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility.

Author information



Corresponding author

Correspondence to Anubhav Jain.

Cite this article

Jain, A., Hautier, G., Ong, S.P. et al. New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships. Journal of Materials Research 31, 977–994 (2016).
