How Granta MI converts data into a feature matrix

When you send a model to the engine for training, the Matrix of attributes, transforms and records you have selected is converted into a true feature matrix.

The conversion process is as follows:

1. A feature matrix column is created for each feature (attribute, transform and purpose) in the model's Feature Set, including one for Purpose (whether the feature is an Input or Output).
2. Starting from the primary table, each record's links are identified.
3. For each unique "chain" of links:
- A row is created in the feature matrix.
- The row is populated with data from all attributes that exist in tables in that link chain. Transforms are applied to the attribute value.
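
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not Granta MI's implementation or API: the names `link_chains` and `feature_matrix`, the dictionary-based records and links, and the sample values are all hypothetical. A link chain is assumed to be a path from a primary table record through its links to a record with no further links, links are assumed to contain no cycles, and the Purpose of each feature is left out for brevity.

```python
import math
from typing import Callable, Dict, List, Optional

Record = Dict[str, float]


def link_chains(record_id: str, links: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path of links from record_id down to a record with no links."""
    children = links.get(record_id, [])
    if not children:
        # No linked records: the chain is just this record.
        return [[record_id]]
    return [
        [record_id] + tail
        for child in children
        for tail in link_chains(child, links)
    ]


def feature_matrix(
    primary_ids: List[str],
    records: Dict[str, Record],
    links: Dict[str, List[str]],
    columns: List[str],
    transforms: Optional[Dict[str, Callable[[float], float]]] = None,
) -> List[Dict[str, Optional[float]]]:
    """Build one row per link chain, with one column per feature."""
    transforms = transforms or {}
    rows = []
    for primary_id in primary_ids:
        for chain in link_chains(primary_id, links):
            # Gather attribute values from every record in the chain.
            merged: Record = {}
            for record_id in chain:
                merged.update(records[record_id])
            row: Dict[str, Optional[float]] = {}
            for column in columns:
                value = merged.get(column)  # None marks an empty cell
                if value is not None and column in transforms:
                    value = transforms[column](value)
                row[column] = value
            rows.append(row)
    return rows


# Hypothetical sample data: one primary record linked to two records.
records = {
    "P1": {"density": 7.8},
    "S1": {"strength": 400.0},
    "S2": {"strength": 450.0},
}
links = {"P1": ["S1", "S2"]}

matrix = feature_matrix(
    ["P1"], records, links,
    columns=["density", "strength"],
    transforms={"strength": math.log10},
)
for row in matrix:
    print(row)
```

A record with no links produces a single chain containing only itself, so the same sketch also covers the case where every attribute belongs to the primary table.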

Examples - Primary table only

When all attributes in the Feature Set belong to a single table, there are no linked tables and so no link chains. Each record in the Matrix corresponds to a single row in the feature matrix.

Examples - Multiple tables

When the primary table is linked to other tables, each primary table record can correspond to multiple rows in the feature matrix, one for each of its link chains. The columns are based on attributes from all linked tables.

For example, if the primary table is linked to a second table containing five linked records, each record in the primary table would be converted into five rows, each populated with data from the primary table record and data from one of the five linked records.
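
A minimal Python sketch of this two-table case, using made-up attribute names and values, shows the primary record's data repeated on each of the five rows:

```python
# Hypothetical data: one primary table record and its five linked records.
primary = {"material": "Steel A", "density": 7.8}
linked = [
    {"test_temperature": temperature, "strength": strength}
    for temperature, strength in [
        (20, 400), (100, 390), (200, 370), (300, 340), (400, 300)
    ]
]

# One row per linked record; the primary record's data appears on every row.
rows = [{**primary, **record} for record in linked]

for row in rows:
    print(row)
```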

Multiple tables in series

If a third table is added to the example, linked to the second, the feature matrix also has rows corresponding to each record in the third table. If that table has three linked records, each of the five rows generated in the two-table example above would become three rows, giving 5 × 3 = 15 rows for each primary table record. Each of the three rows is populated with the same data from the primary and second tables, plus data from one of the three linked records in the third. All link chains include all three tables, so every row in the feature matrix contains data from all three tables.
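
The row count for tables in series can be checked with a short Python sketch. The data and attribute names are hypothetical, and the sketch assumes that every record in the second table links to three records in the third:

```python
# Hypothetical data: the primary record links to five second-table records,
# and each of those links to three third-table records.
primary = {"material": "Steel A"}
second = [
    {
        "batch": f"B{i}",
        "third": [{"specimen": f"B{i}-S{j}"} for j in range(1, 4)],
    }
    for i in range(1, 6)
]

rows = []
for batch in second:
    batch_data = {key: value for key, value in batch.items() if key != "third"}
    for specimen in batch["third"]:
        # Each link chain spans all three tables, so each row has data from all three.
        rows.append({**primary, **batch_data, **specimen})

print(len(rows))
```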

Multiple tables in parallel

If the third table were linked directly to the primary table instead, there would be two 'sets' of link chains: one that includes the second table but not the third, and one that includes the third table but not the second.

Each primary table record in our example would then correspond to 8 rows in the feature matrix: one for each of the 5 linked records in the second table, and one for each of the 3 linked records in the third. Each row would contain data only from the tables in its link chain, and columns for attributes belonging to the other table would be left empty.
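
The parallel case can be sketched in the same way. The data and column names below are hypothetical, and `None` stands in for an empty cell:

```python
# Hypothetical data: the primary record links directly to both other tables.
columns = ["material", "batch", "supplier"]
primary = {"material": "Steel A"}
second = [{"batch": f"B{i}"} for i in range(1, 6)]            # 5 linked records
third = [{"supplier": f"Supplier {i}"} for i in range(1, 4)]  # 3 linked records

# One row per link chain: primary + second-table record, or primary + third-table record.
rows = [
    {column: {**primary, **linked}.get(column) for column in columns}
    for linked in second + third
]

for row in rows:
    print(row)
```

The first five rows leave the third table's column empty and the last three leave the second table's column empty, while the primary table's data appears on all eight.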