Draft:Invariant coordinate selection
Invariant coordinate selection (ICS) is a multivariate statistical technique used to identify interesting structures in high-dimensional data. It is commonly applied in outlier detection, independent component analysis (ICA), and robust dimension reduction. ICS generalizes principal component analysis (PCA) by simultaneously diagonalizing two scatter matrices, typically a covariance matrix and a higher-order or robust scatter matrix.
Overview
ICS builds on ideas from the 1990s for identifying affine-invariant directions in multivariate data, and was formalized under its current name by Tyler et al. (2009). Unlike PCA, whose components change under general linear transformations (for example, rescaling a single variable) and which only captures second-order structure, ICS can reveal non-Gaussian features and is invariant under full-rank affine transformations.
The core idea behind ICS is to find a linear transformation that simultaneously diagonalizes two scatter matrices, thereby revealing directions of interest based on their statistical properties.
Intuition
The intuition behind invariant coordinate selection (ICS) lies in comparing two different measures of multivariate scatter to detect departures from ellipticity in the data. A scatter matrix is a generalization of the covariance matrix that captures the spread or variability of multivariate data, and different scatter matrices emphasize different aspects of the data distribution. In ICS, one typically uses the regular covariance matrix along with a second scatter matrix that is either robust to outliers or sensitive to higher-order moments.
For purely elliptical distributions (e.g., the multivariate normal), any two affine-equivariant scatter matrices are proportional to each other at the population level, so comparing them reveals no particularly informative direction. However, when the data contain non-elliptical structures such as clusters, skewness, or outliers, different scatter measures react differently. By simultaneously diagonalizing two such scatter matrices, ICS identifies the directions in which these discrepancies are most pronounced. These directions are expected to correspond to sources of non-Gaussianity or structural deviations in the data, which makes ICS useful for dimension reduction, clustering, and outlier detection.
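The elliptical case can be made concrete with a short worked statement. This is a sketch that assumes two affine-equivariant scatter functionals, written here as $S_1$ and $S_2$, with a proportionality constant $c$ and generalized eigenvalues $\rho_i$ (notation chosen for illustration):

$$
S_2 = c\,S_1 \;\Longrightarrow\; S_1^{-1} S_2 = c\,I_p \;\Longrightarrow\; \rho_1 = \cdots = \rho_p = c .
$$

When $S_1$ is the covariance matrix and $S_2$ is the fourth-moment scatter normalized by $p + 2$, the constant is $c = 1$ for a multivariate normal distribution. Eigenvalues that differ clearly from the common value therefore point to directions in which the data depart from ellipticity.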
Methodology
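Let the data consist of a matrix $X \in \mathbb{R}^{n \times p}$, where each row is an observation in $p$-dimensional space.
- Choose two scatter matrices, typically:
  - $S_1 = \operatorname{Cov}(X)$ (the sample covariance matrix)
  - $S_2$, a robust or higher-order scatter matrix (e.g., one based on kurtosis or fourth moments).
- Solve the generalized eigenvalue problem:
  $$S_2 b_i = \rho_i S_1 b_i, \qquad i = 1, \dots, p.$$
The solutions to this problem provide the eigenvalues $\rho_1 \ge \dots \ge \rho_p$ and eigenvectors $b_1, \dots, b_p$. The eigenvectors form the rows of the transformation matrix $B = (b_1, \dots, b_p)^{\top}$, which defines the new coordinates: $Z = X B^{\top}$, with $X$ centered beforehand.
The matrix $Z$ contains the ICS components, which are invariant under affine transformations of the original data, up to the sign and scale of each component.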
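The simultaneous diagonalization step can be sketched in Python with NumPy. This is a minimal illustration rather than a reference implementation: the function name `simultaneous_diagonalization` and the example matrices are hypothetical, and the sketch assumes the first scatter matrix is symmetric positive definite so that a Cholesky factor exists.

```python
import numpy as np

def simultaneous_diagonalization(S1, S2):
    """Solve S2 b = rho S1 b for symmetric S2 and symmetric positive definite S1.

    Returns (rho, B): eigenvalues in decreasing order and a matrix B whose
    rows are the eigenvectors, scaled so that B @ S1 @ B.T is the identity.
    """
    # Whiten with the Cholesky factor of S1, where S1 = L @ L.T
    L_inv = np.linalg.inv(np.linalg.cholesky(S1))
    # In whitened coordinates the problem is an ordinary symmetric one
    M = L_inv @ S2 @ L_inv.T
    M = (M + M.T) / 2  # remove rounding asymmetry
    rho, V = np.linalg.eigh(M)  # ascending eigenvalues, orthonormal columns
    order = np.argsort(rho)[::-1]  # reorder to decreasing eigenvalues
    rho, V = rho[order], V[:, order]
    # Map the eigenvectors back to the original coordinates
    B = V.T @ L_inv
    return rho, B

# Hypothetical 2 x 2 scatter matrices, used only to exercise the function
S1 = np.array([[2.0, 0.5], [0.5, 1.0]])
S2 = np.array([[3.0, 0.2], [0.2, 0.5]])
rho, B = simultaneous_diagonalization(S1, S2)
```

With this scaling, `B @ S1 @ B.T` is the identity and `B @ S2 @ B.T` is the diagonal matrix of eigenvalues, which is what "simultaneously diagonalizing" the two scatter matrices means in practice.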
Implementation
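The steps for implementing ICS are:
- Center the data: subtract the sample mean $\bar{x}$ from each observation $x_i$.
- Compute scatter matrices:
  - $S_1 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}$ is the sample covariance matrix.
  - $S_2$ depends on the chosen method; for example, the fourth-moment scatter matrix is $S_2 = \frac{1}{n(p+2)} \sum_{i=1}^{n} r_i^2 \, (x_i - \bar{x})(x_i - \bar{x})^{\top}$, where $r_i^2 = (x_i - \bar{x})^{\top} S_1^{-1} (x_i - \bar{x})$ is the squared Mahalanobis distance of observation $i$.
- Simultaneous diagonalization: solve the generalized eigenvalue problem $S_2 b = \rho S_1 b$.
- Select components: retain those that reveal clusters, outliers, or non-Gaussianity, typically the components with the most extreme eigenvalues.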
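The steps above can be sketched end to end in Python with NumPy. This is an illustrative sketch, not the R package's implementation: the function name `ics_cov_cov4`, the synthetic data, and the contamination pattern are all assumptions made for the example, and the second scatter is taken to be the fourth-moment scatter normalized by $p + 2$.

```python
import numpy as np

def ics_cov_cov4(X):
    """ICS using the covariance matrix and the fourth-moment scatter.

    X is an (n, p) array with one observation per row. Returns (rho, B, Z):
    eigenvalues in decreasing order, the (p, p) transformation matrix, and
    the (n, p) matrix of invariant coordinates.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Step 1: center the data
    Xc = X - X.mean(axis=0)
    # Step 2a: sample covariance matrix S1
    S1 = Xc.T @ Xc / (n - 1)
    # Step 2b: fourth-moment scatter S2, weighting each observation by its
    # squared Mahalanobis distance with respect to S1
    r2 = np.einsum("ij,ij->i", Xc @ np.linalg.inv(S1), Xc)
    S2 = (Xc * r2[:, None]).T @ Xc / (n * (p + 2))
    # Step 3: solve S2 b = rho S1 b by whitening with the Cholesky factor of S1
    L_inv = np.linalg.inv(np.linalg.cholesky(S1))
    M = L_inv @ S2 @ L_inv.T
    rho, V = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(rho)[::-1]
    rho, B = rho[order], V[:, order].T @ L_inv
    # Invariant coordinates: apply B to each centered observation
    Z = Xc @ B.T
    return rho, B, Z

# Synthetic example (hypothetical data): Gaussian noise with ten shifted points
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[:10, 0] += 8.0
rho, B, Z = ics_cov_cov4(X)
```

For the final selection step, one would inspect `rho` and keep the columns of `Z` whose eigenvalues stand apart from the rest. With a small group of outliers such as the one simulated here, the leading component would be expected to carry most of the outlying structure, though this depends on the data and on the chosen scatter pair.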
ICS can be implemented in various statistical software environments. In R, the ICS package provides ready-to-use functions for performing the analysis.
Applications
ICS has been used in several areas:
- Outlier detection: identifying multivariate outliers that may not be visible with PCA.
- Independent component analysis (ICA): ICS can serve as an alternative or a pre-processing step to ICA.
- Financial data analysis: detecting regime changes or anomalies.
- Bioinformatics: reducing noise in gene expression datasets.
Advantages and limitations
Advantages
- Components are invariant under affine transformations of the data, up to the sign and scale of each component.
- Can detect higher-order structures, not just variance.
- Robust scatter matrices improve performance on contaminated or heavy-tailed data.
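The affine invariance property can be checked with a short derivation. This is a sketch under stated assumptions: both scatter matrices are affine equivariant, the generalized eigenvalues $\rho_i$ are distinct, $A$ is a nonsingular $p \times p$ matrix, $t$ is a shift vector, $b_i$ are the generalized eigenvectors forming the rows of $B$, and $X_c$ denotes the centered data (the symbols $A$, $t$, and the starred quantities are introduced here for illustration).

$$
\begin{aligned}
X^{*} &= X A^{\top} + \mathbf{1}_n t^{\top}, \qquad S_k^{*} = A\,S_k\,A^{\top} \quad (k = 1, 2),\\
S_2^{*}\,(A^{-\top} b_i) &= A\,S_2\,b_i = \rho_i\,A\,S_1\,b_i = \rho_i\,S_1^{*}\,(A^{-\top} b_i),\\
B^{*} &= B A^{-1} \;\Longrightarrow\; Z^{*} = X_c^{*} B^{*\top} = X_c A^{\top} A^{-\top} B^{\top} = X_c B^{\top} = Z .
\end{aligned}
$$

The eigenvalues are unchanged and the eigenvectors transform as $A^{-\top} b_i$, so the transformed data yield the same coordinates. Because each eigenvector is only determined up to a nonzero multiple, the equality holds up to the sign and scale of each component.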
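Limitations
- The choice of scatter matrices significantly affects the results.
- Invariant components may be less intuitive to interpret than principal components.
- More computationally demanding than PCA, since two scatter matrices must be estimated and robust scatter estimators can be costly to compute.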
See also
- Principal component analysis
- Independent component analysis
- Robust statistics
- Multivariate statistics
References
- Tyler, D. E., Critchley, F., Dümbgen, L., & Oja, H. (2009). Invariant co-ordinate selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3), 549–592.
- Nordhausen, K., Ilmonen, P., Mandal, A., & Oja, H. (2011). ICS and ICA with Robust Scatter Matrices. Canadian Journal of Statistics, 39(3), 398–416.