Principal Components Analysis
A: PCA without “robustness” or scaling is a simple technique (which rotation of the data maximizes the variance along the new axes?) and the answer is unambiguous. However, once scaling is introduced there are many options, and most other statistics tools are very complex to use. In ioGAS the only scaling we allow is z-score scaling.
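As an illustration of z-score scaling, the sketch below checks numerically that the covariance matrix of z-scored data equals the correlation matrix of the original data, and then takes the eigen-decomposition that a PCA would use. The data, the variable scales and the helper name `zscore` are made up for this example; this is not ioGAS code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 3 variables on very different scales.
X = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 0.01])
X[:, 1] += 20.0 * X[:, 0]  # make two of the variables correlated


def zscore(X):
    """Centre each column and divide it by its sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


Z = zscore(X)
cov_of_z = np.cov(Z, rowvar=False)        # covariance of the z-scored data
corr_of_x = np.corrcoef(X, rowvar=False)  # correlation of the raw data

print(np.allclose(cov_of_z, corr_of_x))   # True

# PCA on z-scored data is an eigen-decomposition of the correlation matrix.
# eigh returns eigenvalues in ascending order, so reverse them.
eigvals, eigvecs = np.linalg.eigh(corr_of_x)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
print(eigvals / eigvals.sum())            # fraction of variance per component
```

Because every z-scored variable has unit variance, the eigenvalues sum to the number of variables, so no single variable dominates the result purely because of its units.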
What is happening is that each variable is ‘stretched’ or ‘shrunk’ along its own axis (by a factor of 1/SD) before the ‘rotation’ of the PCA. When data are z-scaled it is a mathematical fact that the covariance matrix (used by PCA) becomes a correlation matrix.
For robustness we have taken a simple approach, as described in the publication Campbell, N.A. 1980. Robust procedures in multivariate analysis. I: Robust covariance estimation. Appl. Statist., 29, 231-237. What is happening here is that ‘outliers’ are down-weighted in the calculation of the covariance matrix: the weighted covariance is fed back into the Mahalanobis distance calculation that produces the weights, and this is repeated until convergence. This results in a final covariance matrix that can be used to perform a PCA (or other analyses) that “ignores” the outliers.
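The iterative down-weighting described above can be sketched as follows. This is a minimal illustration, not the ioGAS implementation: the function name `robust_covariance`, the weight function, the constants `b1` and `b2`, the tolerance and the test data are all assumptions chosen for the example, in the spirit of the Campbell (1980) approach.

```python
import numpy as np


def robust_covariance(X, b1=2.0, b2=1.25, tol=1e-6, max_iter=100):
    """Iteratively reweighted mean and covariance (illustrative sketch).

    Rows with a large Mahalanobis distance receive a small weight.
    The weight function and the constants b1, b2 are assumptions.
    """
    n, p = X.shape
    d0 = np.sqrt(p) + b1 / np.sqrt(2.0)  # distances beyond d0 are down-weighted
    w = np.ones(n)                       # start from the ordinary estimates
    for _ in range(max_iter):
        # Weighted mean and covariance using the current weights.
        mean = (w[:, None] * X).sum(axis=0) / w.sum()
        R = X - mean
        cov = (w[:, None] ** 2 * R).T @ R / ((w ** 2).sum() - 1.0)
        # Mahalanobis distance of every row from the current centre.
        d = np.sqrt(np.einsum("ij,ij->i", R @ np.linalg.inv(cov), R))
        # Full weight up to d0, rapidly decaying weight beyond it.
        new_w = np.ones(n)
        far = d > d0
        new_w[far] = d0 * np.exp(-0.5 * ((d[far] - d0) / b2) ** 2) / d[far]
        converged = np.max(np.abs(new_w - w)) < tol
        w = new_w
        if converged:
            break
    # mean and cov come from the last pass through the loop.
    return mean, cov, w


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
X[:5] += 15.0                            # five gross outliers (hypothetical)

mean, cov, w = robust_covariance(X)
classical = np.cov(X, rowvar=False)
print(np.diag(classical))                # inflated by the outliers
print(np.diag(cov))                      # close to the variance of the bulk
print(w[:5])                             # weights given to the outliers

# The robust matrix can then be used for the PCA rotation.
eigvals, eigvecs = np.linalg.eigh(cov)
```

The outliers stay in the data set; they simply contribute almost nothing to the covariance matrix, which is why the resulting PCA appears to “ignore” them.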
In version 3.4, step one was to z-scale the data, step two was to compute the robust covariance matrix and step three was to use this matrix for the PCA. The result of this was a covariance matrix that is not a correlation matrix. The scaling was described as ‘z scaling’ rather than ‘correlation matrix’ in the robust user interface. In version 4.0 the processing order was changed so that the robustness algorithm is used to down-weight the outlying data as the first step. The second step is then to perform a correlation-based PCA in the same way as the non-robust case. This is a neater combination of steps, and the terms ‘correlation matrix’ and ‘z-scaling’ can be used interchangeably in all cases.
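The difference between the two processing orders can be seen in a small numerical sketch. To keep it short, a fixed weight vector stands in for the converged robust weights, and the version 3.4 z-scaling is assumed to use the ordinary mean and SD. The helper names, the data and the weights are all hypothetical; this is not the ioGAS code.

```python
import numpy as np


def weighted_cov(X, w):
    """Weighted covariance; w stands in for converged robust weights."""
    mean = (w[:, None] * X).sum(axis=0) / w.sum()
    R = X - mean
    return (w[:, None] ** 2 * R).T @ R / ((w ** 2).sum() - 1.0)


def cov_to_corr(C):
    """Rescale a covariance matrix so that its diagonal is exactly 1."""
    s = np.sqrt(np.diag(C))
    return C / np.outer(s, s)


rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])
X[0] = [30.0, -300.0, 3000.0]  # one gross outlier (hypothetical)
w = np.ones(100)
w[0] = 0.0                     # assumed: the robust step gave it zero weight

# Version 3.4 order: z-scale with the ordinary mean and SD (assumed),
# then compute the weighted covariance of the scaled data.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_v34 = weighted_cov(Z, w)

# Version 4.0 order: down-weight first, then form a correlation matrix.
corr_v40 = cov_to_corr(weighted_cov(X, w))

print(np.diag(cov_v34))   # not 1: the outlier inflated the SD used for scaling
print(np.diag(corr_v40))  # exactly 1: a true correlation matrix
```

In the first order the SDs used for scaling still include the outlier, so the down-weighted data no longer have unit variance and the matrix passed to the PCA is a covariance matrix rather than a correlation matrix.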
A: This message is mainly in place to prevent users from performing a PCA calculation and then running it again without realising that the Select Variables dialog has been updated with the PCA columns from the previous calculation. The new PCA calculation would then be using the previous PCA columns as its input columns. Go back into the Select Variables dialog, make sure that the correct variables are selected, and then continue.
This message generally indicates that the selection contains a variable with zero variance, i.e. every value is the same, for example because all values are at the detection limit.
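A zero-variance variable cannot be z-scaled, because its SD is zero. Outside ioGAS, a quick way to find such columns before running a PCA might look like the sketch below; the function name, column names and values are invented for the example, and missing values are not handled.

```python
import numpy as np


def zero_variance_columns(X, names):
    """Return the names of columns whose values are all identical."""
    X = np.asarray(X, dtype=float)
    constant = X.max(axis=0) == X.min(axis=0)
    return [name for name, is_const in zip(names, constant) if is_const]


# Hypothetical assay table: every Au value sits at a detection limit of 0.005.
names = ["Cu_ppm", "Au_ppm", "Zn_ppm"]
data = np.array([
    [1.2, 0.005, 30.0],
    [0.8, 0.005, 42.0],
    [1.5, 0.005, 37.0],
])

flagged = zero_variance_columns(data, names)
print(flagged)  # ['Au_ppm']
```

Any variable flagged this way would need to be removed from the selection before the PCA is run.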