Key terminology in Granta MI

A glossary of general machine learning concepts and the corresponding terminology and objects in Granta MI.

Name Description

algorithm

Within MI Machine Learning, this always refers to a machine learning algorithm: a computer algorithm whose predictions or results improve through exposure to data.

neural network

Type of machine learning algorithm based on the behavior of neurons in living organisms. The Alchemite algorithm used by MI Machine Learning is a neural network.

model

Within MI Machine Learning, model generally means a specific feature matrix, engine, and the Granta MI data (to be) used as the training and test dataset. The term covers partially defined models (where one or more of these is still to be defined), untrained models, and trained models.

untrained model

A model which hasn't been sent to the machine learning engine to be trained. Partially-defined models are also listed as untrained.

trained model

An algorithm trained on a specific feature matrix and training/test dataset.

cloned model

An untrained copy of a trained or untrained model.

engine

Third-party machine learning algorithm and software able to communicate with Granta MI.

hyperparameters

Settings for the machine learning algorithm or engine, typically relating to algorithm architecture or the training process. These are not currently accessible through MI Machine Learning; the Alchemite engine uses default or pre-optimized values as recommended by Intellegens.

feature

Attribute, property or category included in a model. Within MI Machine Learning, these variables are typically a material property.

feature matrix

Specially formatted input data for a machine learning algorithm, where each column corresponds to a feature of the model, and each row to a single sample or set of measurements.

Also called: Dataset
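
As a rough illustration of the row-and-column layout described above, the sketch below holds a feature matrix as a list of rows that share a list of column names. The feature names, the values, and the column_values helper are hypothetical; this is not the format that MI Machine Learning sends to the engine.

```python
# Hypothetical feature names and values, for illustration only.
columns = ["laser_power_W", "scan_speed_mm_s", "tensile_strength_MPa"]

# Each row is one sample; each position in a row belongs to one column.
feature_matrix = [
    [200.0, 800.0, 1050.0],
    [250.0, 900.0, 1100.0],
    [300.0, 1000.0, 1080.0],
]


def column_values(matrix, names, name):
    """Return every value in the named column, one per row."""
    index = names.index(name)
    return [row[index] for row in matrix]


print(column_values(feature_matrix, columns, "laser_power_W"))
```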

feature matrix column

Each column of a feature matrix corresponds to a feature of the model (one of the attributes, properties or categories whose relationships the model will explore). Columns are axes in the model's parameter space.

In MI Machine Learning, each column is created from an attribute stored in Granta MI plus a transform specifying how to convert the attribute's value into a data point in the feature matrix.

feature matrix row

Each row of a feature matrix represents a real-life sample or example, which is also a single point in the model's parameter space.

In MI Machine Learning, each record added to the training/test dataset corresponds to one or more rows (depending on whether the record contains one-to-one or one-to-many links in attributes included in the Feature Set).
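
One way to picture how a single record can become several rows is sketched below. It assumes that a one-to-many link produces one row per linked value, with the record's own values repeated on each row. The record layout, the field names, and the expand_record helper are hypothetical, and the actual expansion rules in MI Machine Learning may differ.

```python
def expand_record(record):
    """Return one row per linked test result, repeating the record's own value.

    Assumption for illustration: a one-to-many link yields one row per linked value.
    """
    base = [record["density"]]
    return [base + [result] for result in record["linked_test_results"]]


# Hypothetical record with a one-to-many link to three test results.
record = {"density": 4.43, "linked_test_results": [950.0, 970.0, 990.0]}
rows = expand_record(record)
print(len(rows))
```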

Input, input feature

Represents a property which is already known or controllable, and can be used to predict other properties. For example, process parameters or AM build data.

Also called: Independent feature, Descriptor, Predictor

Output, output feature

Represents a property which can be predicted using, or is influenced by, input features. For example, test results or final properties of an AM part.

Also called: Dependent feature, Result

variable feature

A feature that a machine learning algorithm can use as either an input or an output feature, depending on the data available. Not currently supported in MI Machine Learning.

Matrix

A set of tables, attributes and records stored in Granta MI, plus the transforms required to convert them into a true feature matrix.

primary table

The Granta MI table containing the records you want to include in your Matrix. All attributes you want to include should be in this table, or tables linked to it.

Feature Set

The set of Granta MI attributes and their transforms which form the columns of the Matrix (and will become the columns of the feature matrix sent to the engine).

purpose

Whether the feature defined in the Feature Set will be an Input or Output.

transform

The transform you want to apply to an attribute's value when the Matrix is converted into a feature matrix. For example, you may want to use the average value or the maximum/minimum value of a range attribute.
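
The range example above can be sketched as a small function that reduces a (low, high) range to a single data point. The transform names and the apply_transform helper are hypothetical and are not the identifiers used by MI Machine Learning.

```python
def apply_transform(range_value, transform):
    """Reduce a (low, high) range attribute value to one number."""
    low, high = range_value
    if transform == "average":
        return (low + high) / 2
    if transform == "minimum":
        return low
    if transform == "maximum":
        return high
    raise ValueError(f"unknown transform: {transform}")


# Hypothetical range attribute value.
strength_range = (100.0, 120.0)
print(apply_transform(strength_range, "average"))
```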

training dataset

The initial feature matrix used to define a model's parameter space and "train" the machine learning algorithm. The size and quality of the training dataset directly affect the quality of the model's predictions.

In MI Machine Learning, a single feature matrix is submitted to the engine, and split into training and test datasets.

Also called: Input dataset, training feature matrix
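
The idea of splitting one feature matrix into training and test datasets can be sketched as follows. The 80/20 ratio, the random shuffle, and the split_rows helper are assumptions made for illustration; the source does not state how the engine performs the split.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows, then return (training_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one row for testing.
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


training_rows, test_rows = split_rows(list(range(10)))
print(len(training_rows), len(test_rows))
```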

test dataset

After training, a second input dataset is used to validate the newly-created model. This dataset is typically much smaller than the training dataset, and does not change the model.

In MI Machine Learning, a single feature matrix is submitted to the engine, and split into training and test datasets.

model range

In MI Machine Learning, the model range for a feature is the range of the data the model was trained on (the maximum and minimum values in that column of the training dataset). Outside this range, the model relies on extrapolation and predictions will not be as reliable.
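
The definition above translates directly into a minimum/maximum check. In this minimal sketch, the function names and the sample values are hypothetical.

```python
def model_range(training_column):
    """Return (minimum, maximum) of one column of the training dataset."""
    return min(training_column), max(training_column)


def is_extrapolation(value, training_column):
    """True if the value lies outside the range the model was trained on."""
    low, high = model_range(training_column)
    return value < low or value > high


# Hypothetical training values for one feature.
laser_power = [200.0, 250.0, 300.0]
print(model_range(laser_power))
print(is_extrapolation(350.0, laser_power))
```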

Model Quality

The Alchemite engine includes an estimate of the ability of a trained model to predict output features solely from input features. An R² value is calculated for each Output by comparing its values in the test dataset with the values generated by the model; Model Quality is the median of these R² values.

Also called: Network error
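
A median-of-R² score of this kind can be sketched as below. The sketch assumes that the R² value is the coefficient of determination (1 - SS_res / SS_tot); the source does not say which definition the Alchemite engine uses, and the function names and data are hypothetical.

```python
from statistics import mean, median


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    average = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - average) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def model_quality(actual_by_output, predicted_by_output):
    """Median of the per-Output R² values."""
    scores = [
        r_squared(actual_by_output[name], predicted_by_output[name])
        for name in actual_by_output
    ]
    return median(scores)


# Hypothetical test-dataset values and model predictions for two Outputs.
actual = {"strength": [1.0, 2.0, 3.0], "hardness": [1.0, 2.0, 3.0]}
predicted = {"strength": [1.0, 2.0, 3.0], "hardness": [2.0, 2.0, 2.0]}
print(model_quality(actual, predicted))
```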

Correlation

The correlation (R² value) between two features of a trained model.

correlation matrix

All correlations (R² values) between all features in a trained model.

Also called: Correlation analysis, Sensitivity analysis, Importance
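
The shape of a correlation matrix can be sketched by computing a pairwise R² for every pair of features. This sketch assumes the squared Pearson correlation coefficient and works on raw data columns; the engine derives its correlations from the trained model, and the source does not give the calculation. All names and values are hypothetical.

```python
from statistics import mean


def pearson_r_squared(xs, ys):
    """Squared Pearson correlation coefficient between two columns."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance * covariance / (var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict of R² values for every pair of features."""
    names = list(columns)
    return {
        a: {b: pearson_r_squared(columns[a], columns[b]) for b in names}
        for a in names
    }


# Hypothetical feature columns.
data = {"power": [1.0, 2.0, 3.0], "strength": [2.0, 4.0, 6.0]}
matrix = correlation_matrix(data)
print(matrix["power"]["strength"])
```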

Criteria Set

A model-specific set of constraints on input features and target values (goals) for output features, which defines a process optimization problem to be sent to the engine.

unoptimized Criteria Set

A Criteria Set which hasn't been sent to the engine to be solved.

optimized Criteria Set

A Criteria Set plus the results returned by the engine. Predicted values meet the criteria, but do not represent maxima or minima of the parameter space (the input values of the system are optimized to produce a specified result, but the output values are not mathematically optimized).

Weight

Relative importance of the specified Goals in a Criteria Set.

Probability

Part of a Criteria Set's results. An estimate from the model of the overall likelihood of achieving the stated output feature values from the stated input feature values.

Also called: Likelihood, Cost function

Uncertainty

Part of a Criteria Set's results. Each feature has a predicted value and an uncertainty, expressed as one standard deviation.

Also called: Erro