Literature-ML-Validation Automation in Ecosystem of a HEA Database

Monday, September 12, 2022: 4:20 PM
Convention Center: 273 (Ernest N. Morial Convention Center)
Mr. Adam M. Krajewski, The Pennsylvania State University, University Park, PA
Prof. Zi-Kui Liu, The Pennsylvania State University, University Park, PA
The quality of materials design has always depended on the availability and quality of starting data. In recent years, advances in machine learning have further complicated the task of merging data from many sources into a useful, homogeneous structure. In this work, we show an implementation of an automated cyclic data ecosystem that alleviates many of the commonly encountered challenges, is highly efficient, and can be transferred to different types of materials with minor data ontology adjustments.

The ULtrahigh TEmperature Refractory Alloys (ULTERA) database, developed under ARPA-E's ULTIMATE program, is focused on high entropy alloys (HEAs). Its main purpose is to automate the integration of data from sources such as literature extraction (manual and natural language processing), generative modeling of hypothetical HEAs, predictive modeling, and experimental or computational validation. Furthermore, it connects a wide range of data sources, including manual collection by researchers, external open databases, and contributions from our industry partners. The data are merged in real time, fully automatically, in the cloud, allowing any project component to operate on the best available dataset. Thus, at any given time, generative modeling is done on the best starting dataset, and experiments/simulations can be run on the most likely candidate materials.
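
The merging step described above might look roughly like the following minimal Python sketch. It is an illustration only, not the ULTERA implementation: the `Record` fields, the `SOURCE_PRIORITY` ranking, and the rule of keeping a single record per composition and property are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical ranking of data origins; a lower number wins on conflict.
SOURCE_PRIORITY = {
    "experiment": 0,
    "literature": 1,
    "simulation": 2,
    "ml_prediction": 3,
}


@dataclass
class Record:
    composition: dict  # element symbol -> amount, on any positive scale
    prop: str          # property name, e.g. "hardness"
    value: float
    source: str        # one of the SOURCE_PRIORITY keys


def composition_key(composition, digits=3):
    """Normalize amounts to fractions and sort by element, so that
    {"Mo": 1, "W": 1} and {"W": 50, "Mo": 50} map to the same key."""
    total = sum(composition.values())
    return tuple(
        sorted((el, round(x / total, digits)) for el, x in composition.items())
    )


def merge(*batches):
    """Combine batches from several sources, keeping for each
    (composition, property) pair the record from the highest-ranked source."""
    best = {}
    for batch in batches:
        for rec in batch:
            key = (composition_key(rec.composition), rec.prop)
            held = best.get(key)
            if (held is None
                    or SOURCE_PRIORITY[rec.source] < SOURCE_PRIORITY[held.source]):
                best[key] = rec
    return list(best.values())
```

In this sketch, re-running `merge` whenever a new batch arrives would give every downstream component the same deduplicated view of the data. A production system would likely also need provenance tracking and unit harmonization, which are omitted here.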