INVITED: Accelerating the Discovery of Solid State Materials: From Traditional to Machine-Learning Approaches

Wednesday, October 28, 2020: 11:20 AM
Prof. Anton Oliynyk, Manhattan College, Riverdale, NY
Traditional approaches to the search for new materials (systematic investigations, serendipitous discoveries, or data-driven rational strategies) can carry high risks in time and cost. To discover, develop, and deploy new materials faster and more efficiently, we are applying high-throughput machine-learning methods to predict the structures of new compounds and optimize the properties of materials. Typically, machine-learning models are trained on millions or billions of data points. Examples include sorting algorithms used at the Large Hadron Collider, dialogue algorithms used by Facebook, and recommendation algorithms implemented by YouTube. These applications rely heavily on a large number of training data points and have the advantage of employing algorithms that benefit from a large dataset, such as artificial neural networks. In the field of materials informatics, the training data is typically limited to hundreds or thousands of data points. Often, the problem is approached with the assumption that the data in a database is reliable, but this is not always the case, and a generic approach is not suitable for this type of problem. Feature selection and data cleansing are essential steps in creating a machine-learning model with materials data. We have extensive expertise in applying machine-learning methods, with subsequent experimental validation, to predicting novel, unexpected thermoelectrics; optimizing organic photovoltaics; screening for efficient phosphors; accelerating the discovery of high-hardness materials; and classifying the crystal structures of inorganic compounds. The presentation will focus on how to make machine learning and data science work in the chemistry and materials domain. Methodology for data preparation, sanitization, and feature selection will be discussed.