Intellegens Alchemite engine
Alchemite is a deep learning method that builds predictive models from sparse, noisy datasets, such as the experimental data typically found in the materials or manufacturing sectors.
It uses a deep and iterative multiple-imputation method to bridge gaps in the dataset and build a model where most other machine learning methods would fail, require significant data preparation, or impose limitations on the outputs.
In addition to handling sparse, noisy data, Alchemite uses an advanced method (non-parametric probability distributions) to quantify uncertainty when imputing or predicting data. This provides valuable guidance on data quality and on which predictions can be relied upon.
Key features of the engine
- Exploit sparse, noisy data where other machine learning methods fail.
- Quickly generate a model with little need for data cleaning or manual input of starting assumptions.
- Build a model that encompasses all available information, finding inter-relationships missed by traditional approaches that treat design, test and process data separately.
- Quantify uncertainty, allowing you to make rational decisions about where to focus time and effort.
- Optimize multiple targets to design practical processes that will lead to commercially successful solutions.
Overview of the algorithm
The Alchemite algorithm combines a neural network with an iterative multiple-imputation method. The feature matrix is split into training and test datasets; the latter is used to validate the model once it reaches convergence.
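Alchemite itself is a proprietary engine, so the short Python sketches in this section are illustrative only, not the real implementation. As a minimal sketch of the split described above (the matrix size, the roughly 40% missingness, and the 80/20 row split are all our assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A sparse, noisy feature matrix: rows are experiments, columns are
# input and output features; NaN marks a missing measurement.
data = rng.normal(size=(200, 12))
data[rng.random(data.shape) < 0.4] = np.nan  # ~40% of entries missing

# Hold out whole rows as a test set to validate the converged model.
order = rng.permutation(len(data))
n_test = len(data) // 5
test, train = data[order[:n_test]], data[order[n_test:]]
```

The algorithm then proceeds through the following steps: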
- Input a complete set of descriptors for the system (input features, for example, process parameters) and a sparse set of endpoints (output features, for example, material properties).
- Estimate the missing values. The algorithm starts from a simple estimate: the mean of the observed values for that property in the training dataset (see the imputation sketch after this list).
- Use all values to train a standard neural network, treating the parameters that define the network as hyperparameters of the model. The network takes all input and output feature values as inputs and predicts all output feature values, taking care never to use a given property value to predict itself. Because it sees both inputs and outputs, it learns input-output as well as output-output correlations, which trains an accurate model and guides extrapolation.
- Train multiple neural networks, each placing different weights on the rows of the training data, using a bootstrapping approach. The average prediction of these networks is used as the model prediction, with the standard deviation between predictions providing an estimate of uncertainty (see the ensemble sketch after this list).
- Iteratively predict values to impute the gaps in the original sparse endpoint data using the trained network, then train an improved network using these predictions in place of the initial estimates.
- At each iterative pass, predictions are 'softened' by blending them with those from the previous pass. This iterative improvement of the model and the imputed values is repeated until convergence is reached (see the softening loop after this list).
- Validate results using the test dataset. The algorithm tests how accurately the model can predict the values in the test set (as sketched in the validation example below) to select optimal model hyperparameters, such as the size of the network.
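A minimal numpy sketch of the starting estimate, continuing the illustrative setup after the introduction; the helper name initial_impute is ours, not part of any Alchemite API:

```python
import numpy as np

def initial_impute(matrix):
    """Replace each missing entry with the column mean of the observed values."""
    filled = matrix.copy()
    missing = np.isnan(filled)
    col_means = np.nanmean(matrix, axis=0)  # mean of observed values per property
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])
    return filled, missing
```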
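The masked network and bootstrap ensemble can be sketched as follows. For brevity this trains one small scikit-learn MLPRegressor per output property, using every other feature as input, so that a property is never used to predict itself; Alchemite instead trains a single network predicting all outputs at once, and reweights rows rather than resampling them. The ensemble size and network width here are arbitrary assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_ensemble(filled, output_cols, n_members=8, seed=0):
    """Bootstrap an ensemble; each member predicts one output property
    from every *other* feature, so no property predicts itself."""
    rng = np.random.default_rng(seed)
    ensemble = {col: [] for col in output_cols}
    n_rows = len(filled)
    for _ in range(n_members):
        rows = rng.integers(0, n_rows, size=n_rows)  # bootstrap resample of rows
        sample = filled[rows]
        for col in output_cols:
            X = np.delete(sample, col, axis=1)       # all features except the target
            net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500,
                               random_state=int(rng.integers(2**31 - 1)))
            ensemble[col].append(net.fit(X, sample[:, col]))
    return ensemble

def predict_with_uncertainty(ensemble, filled, col):
    """Ensemble mean is the prediction; the spread estimates its uncertainty."""
    X = np.delete(filled, col, axis=1)
    preds = np.stack([net.predict(X) for net in ensemble[col]])
    return preds.mean(axis=0), preds.std(axis=0)
```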
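The iterative refinement with 'softening' then becomes a damped fixed-point loop over the helpers sketched above. The blending weight alpha = 0.5 and the convergence tolerance are our assumptions; the source states only that each pass is combined with the previous one:

```python
import numpy as np

def iterate_impute(train, output_cols, alpha=0.5, tol=1e-3, max_iter=20):
    """Alternate between refitting the ensemble and re-imputing the gaps,
    softening each pass by blending it with the previous one."""
    filled, missing = initial_impute(train)
    for _ in range(max_iter):
        ensemble = fit_ensemble(filled, output_cols)
        previous = filled.copy()
        for col in output_cols:
            pred, _ = predict_with_uncertainty(ensemble, filled, col)
            gaps = missing[:, col]
            # 'Soften' the update: blend new predictions with the previous pass
            filled[gaps, col] = alpha * pred[gaps] + (1 - alpha) * previous[gaps, col]
        if np.max(np.abs(filled - previous)) < tol:  # imputations have converged
            break
    return filled, ensemble
```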
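Finally, validation can be sketched by predicting the observed values in the held-out test rows; repeating this for different hyperparameters (for example, different hidden_layer_sizes in the ensemble sketch) selects the final model. The RMSE metric is our choice for illustration:

```python
import numpy as np

def validation_rmse(ensemble, test, output_cols):
    """Predict the *observed* test values the model never saw, per property."""
    # Simple fill for the test inputs (for brevity, uses the test set's own means)
    filled_test, missing_test = initial_impute(test)
    scores = {}
    for col in output_cols:
        pred, _ = predict_with_uncertainty(ensemble, filled_test, col)
        seen = ~missing_test[:, col]                 # score only observed values
        scores[col] = float(np.sqrt(np.mean((pred[seen] - test[seen, col]) ** 2)))
    return scores
```

Tying the sketches together, with the last six columns taken (hypothetically) as output properties: `filled, ensemble = iterate_impute(train, output_cols=range(6, 12))` followed by `validation_rmse(ensemble, test, range(6, 12))`.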
