2.3.2. Coefficient of Determination

The Coefficient of Determination (CoD) can be used to assess the approximation quality of a polynomial regression model. This measure is defined as the fraction of the total variation that is explained by the approximation (Montgomery and Runger 2003):

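$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad 0 \le R^2 \le 1 \tag{2–9}$$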

where SST is equivalent to the total variation of the output Y, SSR represents the variation due to the regression, and SSE quantifies the unexplained variation,

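$$\mathrm{SST} = \sum_{i=1}^{N} (y_i - \bar{y})^2, \qquad \mathrm{SSR} = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2, \qquad \mathrm{SSE} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \tag{2–10}$$

Here $N$ is the number of support points, $y_i$ the output value at support point $i$, $\hat{y}_i$ the corresponding value of the polynomial approximation, and $\bar{y}$ the mean of the $y_i$.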
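The three sums of squares can be evaluated directly from the support point values and the model predictions. The following is a minimal sketch, assuming the standard least-squares definitions (squared deviations from the output mean) and an illustrative one-dimensional quadratic data set; the function name `cod` and the synthetic data are assumptions made for this example, not part of the original text.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def cod(y, y_hat):
    """Return (CoD, SST, SSR, SSE) for outputs y and predictions y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    y_mean = y.mean()
    sst = np.sum((y - y_mean) ** 2)      # total variation of the output
    ssr = np.sum((y_hat - y_mean) ** 2)  # variation due to the regression
    sse = np.sum((y - y_hat) ** 2)       # unexplained variation
    return 1.0 - sse / sst, sst, ssr, sse


# Illustrative data (assumption): noisy quadratic in one input variable.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 0.1, size=50)

# Least-squares fit of a quadratic polynomial (p = 3 coefficients).
coeffs = P.polyfit(x, y, deg=2)
y_hat = P.polyval(x, coeffs)

r2, sst, ssr, sse = cod(y, y_hat)
print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, CoD = {r2:.4f}")
```

For a least-squares fit that includes a constant term, SST = SSR + SSE holds up to rounding, so SSR/SST and 1 - SSE/SST give the same value.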

If the CoD is close to one, the polynomial approximation represents the support point values with small errors. However, the polynomial model fits exactly through the support points if their number equals the number of coefficients p. In this case, the CoD is equal to one, independent of the true approximation quality. To penalize this over-fitting, the adjusted Coefficient of Determination was introduced (Montgomery and Runger 2003):

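$$R^2_{adj} = 1 - \frac{N-1}{N-p}\left(1 - R^2\right) \tag{2–11}$$

where $R^2$ denotes the CoD, $N$ the number of support points, and $p$ the number of polynomial coefficients.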

However, the over-estimation of the approximation quality cannot be avoided completely.
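The effect can be reproduced with a small numerical experiment. The sketch below assumes the common form of the adjusted CoD, 1 - (N - 1)/(N - p) (1 - CoD), and fits a quadratic polynomial (p = 3) to pure noise, so the true approximation quality is zero by construction. The helper names and the data are hypothetical and chosen only for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def cod(y, y_hat):
    """Standard CoD, 1 - SSE/SST, evaluated on the support points."""
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - sse / sst


def adjusted_cod(r2, n, p):
    """Adjusted CoD in the assumed form; only defined for n > p."""
    if n <= p:
        raise ValueError("adjusted CoD needs more support points than coefficients")
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2)


def quadratic_fit_cod(x, y):
    """Fit a quadratic polynomial by least squares and return its CoD."""
    coeffs = P.polyfit(x, y, deg=2)
    return cod(y, P.polyval(x, coeffs))


rng = np.random.default_rng(1)
p = 3  # coefficients of a quadratic polynomial in one variable

# Saturated case: N = p support points, outputs are pure noise.
x_sat = np.array([-1.0, 0.0, 1.0])
y_sat = rng.normal(size=p)
r2_sat = quadratic_fit_cod(x_sat, y_sat)

# Slightly over-determined case: N = 5 support points, still pure noise.
n_few = 5
x_few = np.linspace(-1.0, 1.0, n_few)
y_few = rng.normal(size=n_few)
r2_few = quadratic_fit_cod(x_few, y_few)
r2_adj_few = adjusted_cod(r2_few, n_few, p)

print(f"N = p:  CoD = {r2_sat:.6f}")
print(f"N = 5:  CoD = {r2_few:.3f}, adjusted CoD = {r2_adj_few:.3f}")
```

With N = p the fit interpolates the noise and the CoD is one up to rounding, while the adjusted measure is undefined. With N = 5 the adjusted value is never larger than the standard CoD, but it can still be well above zero.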

Figure 2.5: Subspace Plot and Convergence of the CoD Measures

Subspace plot of the investigated nonlinear function (Equation 2–12) and convergence of the CoD measures with increasing number of support points.

To demonstrate this remaining over-estimation, a nonlinear analytical function is investigated. The function of five independent, uniformly distributed input variables reads

(2–12)

where the contributions of the five inputs to the total variance are X1: 18.0%, X2: 30.6%, X3: 64.3%, X4: 0.7%, X5: 0.2%. This means that the three variables X1, X2 and X3 are the most important.

Figure 2.5 shows the convergence of the standard CoD for linear and quadratic response surfaces. When the number of samples is relatively small, the approximation quality is strongly over-estimated, and even the adjusted CoD shows similar behavior. This limits the CoD to cases where the number of support points is large compared to the number of polynomial coefficients, which is often not the case in industrial applications. Another disadvantage of the CoD measure is its limitation to polynomial regression. For other, local approximation models such as interpolating Kriging, the measure may be equal or close to one even though the approximation quality is poor.
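A convergence study of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions: `stand_in_function` is a hypothetical nonlinear function of five uniformly distributed inputs and is not the function of Equation 2–12; the input range and the sample sizes are likewise assumed. A linear response surface (p = 6 coefficients) is fitted by least squares, and the CoD is evaluated on the support points for an increasing number of samples.

```python
import numpy as np


def stand_in_function(x):
    """Hypothetical nonlinear test function of five inputs (NOT Equation 2-12)."""
    return (x[:, 0] + 2.0 * x[:, 1] + x[:, 0] * x[:, 1]
            + 4.0 * np.sin(x[:, 2]) + 0.1 * x[:, 3] + 0.05 * x[:, 4])


def linear_basis(x):
    """Design matrix of a linear polynomial: constant term plus one column per input."""
    return np.column_stack([np.ones(len(x)), x])


def cod_on_support_points(n, rng):
    """Sample n support points, fit the linear model, return the CoD on those points."""
    x = rng.uniform(-np.pi, np.pi, size=(n, 5))  # assumed input range
    y = stand_in_function(x)
    a = linear_basis(x)
    coeffs, *_ = np.linalg.lstsq(a, y, rcond=None)
    y_hat = a @ coeffs
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst


rng = np.random.default_rng(42)
sample_sizes = [6, 10, 20, 50, 200, 2000]  # 6 equals the number of coefficients
cod_by_n = {n: cod_on_support_points(n, rng) for n in sample_sizes}

for n, r2 in cod_by_n.items():
    print(f"N = {n:5d}  CoD = {r2:.3f}")
```

With N = 6 the linear model passes through all support points and the CoD is one, although a linear model cannot represent the sine and interaction terms. Only for large N does the CoD settle at a value that reflects the share of variation a linear model can actually explain.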