2.2.2. Coefficient of Correlation

The coefficient of correlation is the standardized covariance between two random variables X and Y:

$$\rho(X, Y) = \frac{\mathrm{COV}(X, Y)}{\sigma_X \, \sigma_Y} \qquad \text{(2–4)}$$

where COV(X, Y) is the covariance and σ is the standard deviation. This quantity, known as the linear correlation coefficient, measures the strength and the direction of a linear relationship between two variables. It can be estimated from a given sampling set as follows

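$$\rho(X, Y) \approx \frac{1}{N - 1} \, \frac{\sum_{i=1}^{N} (x_i - \hat{\mu}_X)(y_i - \hat{\mu}_Y)}{\hat{\sigma}_X \, \hat{\sigma}_Y} \qquad \text{(2–5)}$$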

where N is the number of samples, $x_i$ and $y_i$ are the sample values, and $\hat{\mu}$ and $\hat{\sigma}$ are the estimates of the mean value and the standard deviation, respectively. The estimated correlation coefficient becomes less accurate the closer its value is to zero, which may lead to the wrong deselection of apparently unimportant variables.
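As an illustration, the estimate (2–5) can be sketched in a few lines of Python. This is a minimal sketch and not a reference implementation: the function name `sample_correlation` is hypothetical, and the unbiased 1/(N − 1) normalization for both the covariance and the standard deviations is an assumption. The factor cancels in the ratio, so the result does not depend on that choice as long as it is applied consistently.

```python
import math


def sample_correlation(xs, ys):
    """Estimate the linear correlation coefficient from paired samples.

    Hypothetical helper; assumes the 1/(N - 1) normalization throughout.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 values")

    # Estimates of the mean values
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Estimates of the standard deviations
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")

    # Estimated covariance, standardized by both standard deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (sd_x * sd_y)
```

Calling `sample_correlation([1, 2, 3, 4], [2, 4, 6, 8])` on a perfectly linear, increasing data set gives a value of one up to rounding, and reversing the second list gives minus one.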

If both variables have a strong positive correlation, the correlation coefficient ρ is close to one. For a strong negative correlation, ρ is close to minus one. Under the assumption of a linear dependence, the squared correlation coefficient can be interpreted as the first-order sensitivity index. The drawback of the linear correlation coefficient is precisely this assumption of a purely linear dependence: based on the estimated coefficients alone, it is not possible to decide whether the assumption is valid. Correlation coefficients that assume a higher-order dependence or use rank transformations solve this problem only partially. In addition, interactions between the input variables are often important, and these interactions cannot be quantified with either the linear or the higher-order correlation coefficients.
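Both limitations can be made concrete with a small constructed example. The sketch below uses hypothetical data on a symmetric grid and a hypothetical helper `pearson`; it is not taken from the original study. It evaluates the linear correlation coefficient for a purely quadratic dependence y = x² and for a pure interaction y = x₁·x₂. In both cases the output depends completely on the inputs, yet the coefficient vanishes.

```python
import math


def pearson(xs, ys):
    """Linear correlation coefficient of two equally long samples (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Symmetric design points (assumed for illustration)
levels = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Case 1: purely quadratic dependence y = x^2
y_quadratic = [x ** 2 for x in levels]
rho_quadratic = pearson(levels, y_quadratic)

# Case 2: pure interaction y = x1 * x2 on a full factorial grid
grid = [(a, b) for a in levels for b in levels]
x1 = [a for a, _ in grid]
x2 = [b for _, b in grid]
y_interaction = [a * b for a, b in grid]
rho_interaction_x1 = pearson(x1, y_interaction)
rho_interaction_x2 = pearson(x2, y_interaction)

print(rho_quadratic, rho_interaction_x1, rho_interaction_x2)
```

All three printed coefficients are zero up to rounding, so a variable selection based on ρ alone would discard inputs that fully determine the output.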

In summary, although the correlation coefficient can be estimated easily from a single sampling set, it can only quantify first-order effects under an assumed form of dependence, and it provides no means of checking whether this assumption holds.