
Multiple factor analysis

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Statistix35 at 12:30, 27 May 2014. It may differ significantly from the current revision.


Introduction

Multiple Factor Analysis (MFA) is a factorial method devoted to the study of tables in which a group of individuals is described by a set of variables (quantitative and/or qualitative) structured in groups. It may be seen as an extension of:

  • Principal component analysis (PCA) when the variables are quantitative,
  • Multiple correspondence analysis (MCA) when the variables are qualitative,
  • Factorial Analysis of Mixed Data (FAMD) when the active variables belong to both types.

Introductory Example

Why introduce several groups of active variables in the same factorial analysis?

Data

Let us consider the case of quantitative variables, that is to say, the framework of PCA. A data set from ecological research provides a useful illustration. Two types of measurements are available for 72 stations.

  1. The abundance-dominance coefficient of 50 plant species (coefficient ranging from 0 = the plant is absent, to 9 = the species covers more than three-quarters of the surface). The whole set of the 50 coefficients defines the floristic profile of a station.
  2. Eleven pedological measurements (pedology = soil science): particle size, physical properties, chemical properties, etc. The set of these eleven measurements defines the pedological profile of a station.

Three possible analyses

PCA of flora (pedology as supplementary)

This analysis focuses on the variability of the floristic profiles. Two stations are close to one another if they have similar floristic profiles. In a second step, the main dimensions of this variability (i.e. the principal components) are related to the pedological variables, which are introduced as supplementary.

PCA of pedology (flora as supplementary)

This analysis focuses on the variability of the soil profiles. Two stations are close to one another if they have similar soil profiles. The main dimensions of this variability (i.e. the principal components) are then related to the abundance of plants.

PCA of the two groups of variables as active

One may want to study the variability of the stations from the point of view of both flora and soil. In this approach, two stations should be close to one another if they have both similar flora and similar soils.
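The first two analyses, in which one group is active and the other is supplementary, can be sketched as follows. This is only a minimal illustration with NumPy: the data are randomly generated stand-ins for the 72 stations (not the ecological data described above), and the helper names `standardize`, `pca_scores` and `supplementary_correlations` are hypothetical. Supplementary variables are assumed here to be related to the components through simple correlation coefficients.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_scores(Z, n_components=2):
    """Coordinates of the individuals on the first principal components of Z."""
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def supplementary_correlations(scores, X_sup):
    """Correlation of each supplementary variable with each component."""
    Zs = standardize(X_sup)
    F = standardize(scores)
    return Zs.T @ F / len(F)

# Simulated stand-in data (assumption): 72 stations, 50 species, 11 soil measures.
rng = np.random.default_rng(0)
flora = rng.integers(0, 10, size=(72, 50)).astype(float)  # coefficients 0..9
soil = rng.normal(size=(72, 11))

# PCA of flora, with the soil variables projected as supplementary.
scores = pca_scores(standardize(flora))
corr = supplementary_correlations(scores, soil)  # one row per soil variable
```

Swapping the roles of `flora` and `soil` gives the second analysis. In both cases only the active group determines the axes; the supplementary group is merely described by them.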

Balance between groups of variables

Methodology

The third analysis of the introductory example implicitly assumes a balance between flora and soil. However, in this example, the mere fact that the flora is represented by 50 variables and the soil by 11 variables implies that the PCA with 61 active variables will be influenced mainly by the flora (at least on the first axis). This is not desirable: there is no reason to want one group to play a more important role than the other in the analysis.

The core of MFA is a factorial analysis (PCA in the case of quantitative variables, MCA in the case of qualitative variables) in which the variables are weighted. These weights are identical for the variables of the same group (and vary from one group to another). They are such that the maximum axial inertia of each group is equal to 1: in other words, applying the PCA (or, where applicable, the MCA) to a single group with this weighting yields a first eigenvalue equal to 1. To obtain this property, MFA assigns to each variable of group $j$ a weight equal to the inverse of the first eigenvalue of the analysis (PCA or MCA according to the type of variable) of group $j$. Formally, denoting by $\lambda_1^j$ the first eigenvalue of the factorial analysis of group $j$, MFA assigns the weight $1/\lambda_1^j$ to each variable of group $j$.
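The weighting described above can be sketched in a few lines for quantitative groups. This is a minimal illustration, not a reference implementation: the data are simulated stand-ins for the flora and soil groups, the function names are hypothetical, and weighting a variable by $w$ is implemented (as an assumption about one convenient way to do it) by multiplying its standardized column by $\sqrt{w}$ before an ordinary PCA.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance (as in standard PCA)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def first_eigenvalue(Z):
    """Largest eigenvalue of the PCA of Z (uniform weights on individuals)."""
    # Squared singular values of Z divided by n are the eigenvalues of Z'Z / n.
    s = np.linalg.svd(Z, compute_uv=False)
    return s[0] ** 2 / Z.shape[0]

def mfa_weights(groups):
    """One weight per group: the inverse of the group's first eigenvalue."""
    return [1.0 / first_eigenvalue(standardize(X)) for X in groups]

def mfa_global_table(groups):
    """Juxtapose the standardized groups, each column scaled by sqrt(weight)."""
    blocks = []
    for X, w in zip(groups, mfa_weights(groups)):
        blocks.append(standardize(X) * np.sqrt(w))
    return np.hstack(blocks)

# Simulated stand-in data (assumption): 72 stations, 50 species, 11 soil measures.
rng = np.random.default_rng(0)
flora = rng.normal(size=(72, 50))
soil = rng.normal(size=(72, 11))

weights = mfa_weights([flora, soil])
global_table = mfa_global_table([flora, soil])

# After weighting, the separate PCA of each group has a first eigenvalue of 1.
lam_flora = first_eigenvalue(standardize(flora) * np.sqrt(weights[0]))
lam_soil = first_eigenvalue(standardize(soil) * np.sqrt(weights[1]))
```

An ordinary PCA of `global_table` then plays the role of the weighted analysis: no single group can contribute more than 1 to the inertia of any axis, whatever its number of variables.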

Balancing the maximum axial inertia rather than the total inertia (= the number of variables in standard PCA) gives MFA several properties that are important for the user. Its interest appears most directly in the following example.

Example

Consider two groups of variables defined on the same set of individuals.

  1. Group 1 is composed of two uncorrelated variables A and B.
  2. Group 2 is composed of two variables {C1, C2}, both identical to the same variable C, which is uncorrelated with the first two.

This example is not completely unrealistic. It is often necessary to analyse multi-dimensional and (nearly) one-dimensional groups simultaneously.

Since the two groups have the same number of variables, they have the same total inertia.

In this example the first axis of the unweighted PCA of the four variables is almost coincident with C. Indeed, in the space of variables, there are two variables in the direction of C: group 2, with all its inertia concentrated in one direction, predominantly influences the first axis. For its part, group 1, consisting of two orthogonal (= uncorrelated) variables, has its inertia uniformly distributed in a plane (the plane generated by the two variables) and hardly weighs on the first axis.
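The behaviour described in this example can be checked on simulated data. The sketch below is an illustration under stated assumptions: A, B and C are artificial variables constructed to be exactly uncorrelated and standardized, and the helper `eigen` is a hypothetical name. It compares the eigenvalues of the unweighted PCA of the four variables with those obtained after dividing each group by the square root of its first eigenvalue (the MFA weighting).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Build three exactly uncorrelated variables with mean 0 and variance 1.
raw = rng.normal(size=(n, 3))
raw -= raw.mean(axis=0)
Q, _ = np.linalg.qr(raw)            # orthonormal columns, still centered
A, B, C = (Q * np.sqrt(n)).T

group1 = np.column_stack([A, B])    # two uncorrelated variables
group2 = np.column_stack([C, C])    # C1 and C2, both identical to C

def eigen(Z):
    """Eigenvalues (in decreasing order) and eigenvectors of the PCA of Z."""
    vals, vecs = np.linalg.eigh(Z.T @ Z / len(Z))
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Unweighted PCA of the four variables: eigenvalues 2, 1, 1, 0.
X_all = np.hstack([group1, group2])
vals_pca, vecs_pca = eigen(X_all)
pc1 = X_all @ vecs_pca[:, 0]
corr_pc1_C = np.corrcoef(pc1, C)[0, 1]   # +1 or -1: the first axis follows C

# MFA weighting: each group is divided by the square root of its first eigenvalue.
lam1 = eigen(group1)[0][0]          # 1 for group 1
lam2 = eigen(group2)[0][0]          # 2 for group 2
X_mfa = np.hstack([group1 / np.sqrt(lam1), group2 / np.sqrt(lam2)])
vals_mfa, _ = eigen(X_mfa)          # eigenvalues 1, 1, 1, 0
```

In this idealized construction the unweighted first axis is carried entirely by group 2, whereas after weighting the three directions A, B and C carry the same inertia, so that neither group dominates the first axis.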

Numerical Example


References