Oblique Rotation Methods in Factor Analysis



This paper gives an overview of oblique rotation methods, with particular attention to the direct oblimin rotation procedure. It is assembled from several published treatments of these topics and should be read as a summary rather than as an original contribution to the statistical literature. The sources are given in the list of references.


(mainly from Abdi, 2003)

The different methods of factor analysis first extract a set of factors from a data set. These factors are almost always orthogonal and are ordered according to the proportion of the variance of the original data that they explain. In general, only a (small) subset of factors is kept for further consideration, and the remaining factors are considered either irrelevant or nonexistent (i.e., they are assumed to reflect measurement error or noise).

To make the factors that are considered relevant easier to interpret, the first selection step is generally followed by a rotation of the retained factors. Two main types of rotation are used: orthogonal, when the new axes are also orthogonal to each other, and oblique, when the new axes are not required to be orthogonal. Because the rotations are always performed in a subspace (the so-called factor space), the new axes will always explain less variance than the original factors (which are computed to be optimal), but the part of the variance explained by the total subspace is the same after rotation as before (only the partition of the variance has changed). The rotated axes are not defined by a statistical criterion; their purpose is to facilitate interpretation.
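The statement that a rotation only repartitions the variance inside the retained subspace can be illustrated with a small numerical sketch. The loading matrix and the rotation angle below are made-up values (the columns of A are chosen to be orthogonal, as unrotated principal-component loadings would be); they are not taken from any of the cited sources.

```python
import numpy as np

# Hypothetical loadings of 4 variables on 2 retained factors.
# The two columns are orthogonal, as for unrotated PCA loadings.
A = np.array([[0.8,  0.3],
              [0.8,  0.3],
              [0.6, -0.4],
              [0.6, -0.4]])

theta = np.deg2rad(30.0)  # arbitrary angle, chosen only for illustration
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

B = A @ R  # orthogonally rotated loadings

var_before = (A ** 2).sum(axis=0)  # variance attributed to each factor
var_after = (B ** 2).sum(axis=0)

print("per-factor variance before:", var_before)
print("per-factor variance after: ", var_after)
print("total before / after:", var_before.sum(), var_after.sum())
```

The per-factor values change while their sum, and the communality of each variable (the row sums of squared loadings), stay the same.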

Before proceeding further, it is important to stress that because the rotations always take place in a subspace (i.e., the space of the retained factors), the choice of this subspace strongly influences the result of the rotation. Therefore, in the practice of rotation in factor analysis, it is strongly recommended to try several sizes for the subspace of the retained factors in order to assess the robustness of the interpretation of the rotation.

The Idea of a Simple Structure

(mainly from Abdi, 2003)

Most of the rationale for rotating factors comes from Thurstone (1947) and Cattell (1978) who defended its use because this procedure simplifies the factor structure and therefore makes its interpretation easier and more reliable (i.e., easier to replicate with different data samples).

Thurstone suggested five criteria to identify a simple structure. According to these criteria, still often reported in the literature, a matrix of loadings (where the rows correspond to the original variables and the columns to the factors) is simple if:

  1. each row contains at least one zero;
  2. for each column, there are at least as many zeros as there are columns (i.e., number of factors kept);
  3. for any pair of factors, there are some variables with zero loadings on one factor and large loadings on the other factor;
  4. for any pair of factors, there is a sizable proportion of zero loadings;
  5. for any pair of factors, there is only a small number of large loadings.

Figure: A simple-structure solution vs. a unifactor pattern (image "Tabelle 1.jpg" not reproduced). In the figure, x denotes a high factor loading.
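The first two of Thurstone's criteria are precise enough to be checked mechanically. The following is a minimal sketch; the function name check_simple_structure and the cut-off zero_tol below which a loading counts as "zero" are assumptions made for this illustration, since the criteria themselves do not fix a threshold.

```python
import numpy as np

def check_simple_structure(loadings, zero_tol=0.1):
    """Check Thurstone's criteria 1 and 2 on a loading matrix.

    Rows are variables, columns are factors. zero_tol is an assumed
    cut-off: loadings smaller in absolute value count as zero.
    Returns (criterion_1_ok, criterion_2_ok).
    """
    L = np.asarray(loadings, dtype=float)
    n_factors = L.shape[1]
    is_zero = np.abs(L) < zero_tol
    # Criterion 1: each row contains at least one zero.
    rows_ok = is_zero.any(axis=1).all()
    # Criterion 2: each column has at least as many zeros as there are factors.
    cols_ok = (is_zero.sum(axis=0) >= n_factors).all()
    return bool(rows_ok), bool(cols_ok)

# Hypothetical examples: a clean two-factor pattern and a diffuse one.
simple = [[0.80, 0.00], [0.70, 0.05], [0.00, 0.90], [0.02, 0.60]]
diffuse = [[0.6, 0.5], [0.7, 0.4], [0.5, 0.6], [0.4, 0.7]]

print(check_simple_structure(simple))
print(check_simple_structure(diffuse))
```

Criteria 3 to 5 speak of "some", "sizable" and "small" and would need further thresholds, so they are left out of the sketch.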

Rotations of factors can be (and used to be) done graphically, but today they are mostly obtained analytically, which requires specifying the notion of simple structure mathematically so that it can be implemented in a computer program. To make the discussion more concrete, the rotations will be described within the principal component analysis (PCA) framework.

Oblique Rotations

(mainly from Basilevsky, 1994)

In oblique rotations the assumption of independent factors is relaxed and the new axes are free to take any position in the factor space. The degree of correlation allowed among factors is, in general, small, because two highly correlated factors are better interpreted as a single factor. Oblique rotations relax the orthogonality constraint in order to gain simplicity in the interpretation. They were strongly recommended by Thurstone, but are used more rarely than their orthogonal counterparts. If the clusters of variables happen to be orthogonal, an oblique rotation will still yield orthogonal (or approximately orthogonal) axes as a special case. Oblique rotations make possible a particularly straightforward interpretation of PC factors in terms of the variables forming the clusters that represent the components.

Let Eq. (1) be a PC model with respect to a basis of oblique components  G_1,…, G_r:

X = G\Lambda' + \epsilon \,\, (1)

where G is a matrix of obliquely rotated standardized factor scores, \Lambda is a matrix of factor loadings after rotation (of the initial correlation loading coefficients A), and \epsilon stands for random error.

Then the following properties hold in the r < n space:

i. \overline{X}'\overline{X} = \Lambda\Phi\Lambda'

where \Phi = G'G = T'T is the correlation matrix of the oblique components, T is the oblique transformation matrix, and \overline{X} = G\Lambda' denotes the part of X reproduced by the r components.

ii. \overline{X}\,\overline{X}' = G\Lambda'\Lambda G'

iii. \Lambda' = (G'G)^{-1}G'X and \overline{X} = G\Lambda' = P_G X

where P_G = G(G'G)^{-1}G' is the projection matrix onto the space spanned by the oblique components, which is idempotent and symmetric.


i. From Eq. (1) we have

X'X = (G\Lambda' + \epsilon)'(G\Lambda' + \epsilon)

= \Lambda G'G\Lambda' + \Lambda G'\epsilon + \epsilon'G\Lambda' + \epsilon'\epsilon

= \Lambda\Phi\Lambda' + \epsilon'\epsilon

where G'\epsilon = 0 and \epsilon'G = 0 owing to the initial orthogonality of the PCs. Since \epsilon'\epsilon is the matrix of residual errors, \overline{X}'\overline{X} = \Lambda\Phi\Lambda' represents the variance/covariance structure accounted for by the first r components.

ii. XX' = (G\Lambda' + \epsilon)(G\Lambda' + \epsilon)' = G\Lambda'\Lambda G' + \epsilon\epsilon'

where the cross terms vanish because the residual is orthogonal to the retained loadings (\epsilon\Lambda = 0). The explained part is then \overline{X}\,\overline{X}' = G\Lambda'\Lambda G'.

iii. Premultiplying Eq. (1) by G' we have

G'X = G'G\Lambda' + G'\epsilon = G'G\Lambda'

so that

\Lambda' = (G'G)^{-1}G'X \,\, (2)

is the (r x n) matrix of regression coefficients of the original random variables on the r oblique PCs. Premultiplying Eq. (2) by G we then obtain

G\Lambda' = G(G'G)^{-1}G'X = P_G X

= \overline{X} \,\, (3)

the predicted values of X.
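The regression reading of Eqs. (2) and (3) can be checked numerically. The sketch below simulates oblique component scores and loadings; all sizes, the seed and the simulated values are assumptions for illustration, and P_G is taken to be G(G'G)^{-1}G'. The residual is constructed so that G'\epsilon = 0, as the derivation requires.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, r = 200, 5, 2  # observations, variables, oblique components (assumed sizes)

# Correlated component scores, each column scaled to unit length.
Z = rng.standard_normal((N, r))
G = Z @ np.array([[1.0, 0.4],
                  [0.0, 1.0]])
G /= np.linalg.norm(G, axis=0)

Lam = rng.uniform(-1.0, 1.0, size=(n, r))  # hypothetical pattern loadings

# Small residual, made orthogonal to the components so that G' eps = 0.
eps = 0.01 * rng.standard_normal((N, n))
eps -= G @ np.linalg.solve(G.T @ G, G.T @ eps)

X = G @ Lam.T + eps                            # Eq. (1)

Lam_T_hat = np.linalg.solve(G.T @ G, G.T @ X)  # Eq. (2): (G'G)^{-1} G'X
P_G = G @ np.linalg.solve(G.T @ G, G.T)        # projection onto span(G)
X_bar = P_G @ X                                # Eq. (3): predicted values

print("max error in recovered loadings:", np.abs(Lam_T_hat - Lam.T).max())
print("P_G idempotent:", np.allclose(P_G @ P_G, P_G))
print("P_G symmetric: ", np.allclose(P_G, P_G.T))
```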

Equation (2) yields an important relationship between the oblique correlation loading coefficients, component correlations, and correlations between the oblique components and observed random variables. Since components are no longer orthogonal, the correlation loadings obtained from the correlation matrix need not be equal to the correlation coefficients between the variates and the PCs. In the psychometric literature the former are known as “pattern” (regression loading coefficients) and the latter as “structure” (correlation loading coefficients).

From Eq. (2) we have

ΦΛ’ = G’X (3a)

where \Phi is the (r x r) component correlation matrix and \Lambda is the (n x r) matrix of regression loading coefficients (the pattern matrix), i.e. the coordinates of the variables X with respect to the oblique components G. G'X is the matrix of correlations between the oblique components and the variables (the structure matrix). For the sake of simplicity it is assumed that both variables and components are standardized to unit length. In general, the oblique PC model can be interpreted in terms of least squares regression theory where both the coefficients and the independent variables (i.e., the components) are to be estimated from the data. However, since the elements of the matrix of factor loadings \Lambda do not necessarily lie in the interval [-1, 1], component identification is usually made easier by consulting the structure matrix G'X.

As is the case for orthogonal rotations, the oblique transformation matrix T is arbitrary from a general mathematical point of view: there exist infinitely many matrices that can rotate orthogonal components to an oblique form. Since the aim is to rotate the PCs so as to obtain the clearest identification possible (given the data), a further optimization criterion is necessary. The subsequent section presents the frequently used oblimin criterion.
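To make the pattern/structure distinction of Eq. (3a) concrete, the following sketch computes a structure matrix from a pattern matrix and a component correlation matrix. All numbers are invented for illustration. The first variable has a pattern loading above 1, while its correlations with the components stay inside [-1, 1].

```python
import numpy as np

# Assumed component correlation matrix (unit diagonal).
Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])

# Hypothetical pattern (regression) loadings; rows are variables.
pattern = np.array([[1.1, -0.3],
                    [0.0,  0.9],
                    [0.6,  0.3]])

# Structure matrix: correlations between variables and components.
# This is the transpose of Phi Lambda' in Eq. (3a).
structure = pattern @ Phi

# Variance of each variable reproduced by the components: lambda_i' Phi lambda_i.
communality = np.einsum("ij,jk,ik->i", pattern, Phi, pattern)

print("structure matrix:\n", structure)
print("communalities:", communality)
```

Note also that the second variable has a zero pattern loading on the first component but a nonzero correlation with it, which is why the two matrices have to be read differently.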

The Way to Oblimin and the Special Case of Direct Oblimin

(mainly from Clarkson and Jennrich, 1988; and Harman, 1976)

The term oblimin describes a class of methods that involve oblique factors and a minimizing criterion. The starting point of the rotation is the (p x m) matrix \Lambda of factor loadings with elements \lambda_{ir}, where i indexes the p variables and r the m factors. In the criteria below, n denotes the number of variables.

The initial development of analytic rotation in the oblique case was done by Carroll (1953), who introduced the quartimin criterion:

QMIN =\sum_{r\neq s} \sum_i \lambda_{ir}^2 \lambda_{is}^2 \,\,. (4)

In Carroll (1953), and in other early work on oblique rotation, the \lambda_{ir} represent covariances between the observed variables and what are called reference factors (see, e.g., Harman, 1976, p. 270). The quartimin criterion is a natural generalization of the quartimax criterion to the oblique case. As an oblique generalization of the varimax criterion, Kaiser (1958) proposed:

CMIN = \sum_{r\neq s}(\sum_{i}\lambda_{ir}^2 \lambda_{is}^2 - {1 \over n}\sum_{i}\lambda_{ir}^2 \sum_{i} \lambda_{is}^2) \,\,. (5)

which Carroll (1957) called the covarimin criterion.

Carroll (1957, 1960) generalized both criteria by forming a weighted combination of "quartimin" and "covarimin" in which the weights may vary. He thereby introduced the oblimin family (general oblimin):

OBMIN = \sum_{r\neq s}(n \sum_{i}\lambda_{ir}^2 \lambda_{is}^2 - \gamma  \sum_{i}\lambda_{ir}^2 \sum_{i} \lambda_{is}^2) \,\,. (6)

Special instances of the general oblimin criterion are:

  • Quartimin: \gamma = 0 (most oblique)
  • Biquartimin: \gamma = 0.5 (less oblique)
  • Covarimin: \gamma = 1 (least oblique)
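The relation between Eqs. (4), (5) and (6) can be sketched in a few lines. The function names and the example loading matrix are assumptions for illustration. With the scaling used in Eq. (6), the general criterion equals n times quartimin at \gamma = 0 and n times covarimin at \gamma = 1.

```python
import numpy as np

def quartimin(L):
    """Eq. (4): sum over r != s of sum_i l_ir^2 l_is^2."""
    S = np.asarray(L, dtype=float) ** 2
    C = S.T @ S                    # C[r, s] = sum_i l_ir^2 l_is^2
    return C.sum() - np.trace(C)   # drop the r == s terms

def covarimin(L):
    """Eq. (5), with n taken as the number of variables (rows)."""
    S = np.asarray(L, dtype=float) ** 2
    n = S.shape[0]
    col = S.sum(axis=0)            # column sums of squared loadings
    D = S.T @ S - np.outer(col, col) / n
    return D.sum() - np.trace(D)

def general_oblimin(L, gamma):
    """Eq. (6): weighted family containing quartimin and covarimin."""
    S = np.asarray(L, dtype=float) ** 2
    n = S.shape[0]
    col = S.sum(axis=0)
    D = n * (S.T @ S) - gamma * np.outer(col, col)
    return D.sum() - np.trace(D)

# Hypothetical loadings: 5 variables, 2 factors.
L = np.array([[0.7, 0.2], [0.6, 0.1], [0.5, 0.4], [0.1, 0.8], [0.2, 0.6]])
n = L.shape[0]
for gamma in (0.0, 0.5, 1.0):
    print(gamma, general_oblimin(L, gamma))
```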

Jennrich and Sampson (1966) derived an analytical procedure to go directly from an initial solution to a primary-factor pattern. Their procedure also involves a parameter, different values of which lead to a whole class of oblimin-like solutions. Because a primary-factor pattern is obtained directly (from the factor loading matrix), without involving an intermediate reference structure, and because the method involves oblique factors and a minimizing criterion, it is designated "direct oblimin" in keeping with the previous names. Jennrich and Sampson themselves use the term "simple loadings."

The point of departure in the Jennrich and Sampson approach is to seek a simple structure solution directly by minimizing a function of the primary-factor-pattern coefficients:

 min F(\Lambda) = \sum_{r\neq s}(\sum_{i}\lambda_{ir}^2 \lambda_{is}^2 - {\gamma \over n} \sum_{i}\lambda_{ir}^2 \sum_{i} \lambda_{is}^2) \,\, (7)

where Λ is a primary-factor-pattern matrix with elements \lambda_{ir}. Algebraically, direct oblimin differs from general oblimin in that the terms are divided by n. Although the formulas are similar, the choice of γ/n in direct oblimin has no known correspondence with the choice of γ in indirect oblimin.

In general, greater values of γ/n produce more oblique solutions, and smaller (negative) values produce more orthogonal solutions. If the factor pattern is unifactorial (the simplest possible), specifying γ/n = 0 identifies the correct pattern.
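The remark about unifactorial patterns can be illustrated as follows. With \gamma = 0 the criterion in Eq. (7) is a sum of non-negative terms, so it cannot go below zero, and a pattern in which every variable loads on exactly one factor reaches that bound. The function name and both example matrices are assumptions for this sketch.

```python
import numpy as np

def direct_oblimin(L, gamma=0.0):
    """Eq. (7) evaluated on a pattern matrix L (rows: variables)."""
    S = np.asarray(L, dtype=float) ** 2
    n = S.shape[0]
    col = S.sum(axis=0)
    D = S.T @ S - (gamma / n) * np.outer(col, col)
    return D.sum() - np.trace(D)   # keep only r != s

# Hypothetical unifactorial pattern: each variable loads on one factor only.
unifactorial = np.array([[0.8, 0.0],
                         [0.7, 0.0],
                         [0.0, 0.9],
                         [0.0, 0.6]])

# Hypothetical pattern with cross-loadings.
mixed = np.array([[0.8, 0.3],
                  [0.7, 0.2],
                  [0.3, 0.9],
                  [0.2, 0.6]])

print("unifactorial:", direct_oblimin(unifactorial, gamma=0.0))
print("mixed:       ", direct_oblimin(mixed, gamma=0.0))
```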

The direct oblimin solution is obtained by minimizing F(Λ), where:

 \Lambda = A(T')^{-1}\,\, (8)

so that the problem amounts to finding a transformation matrix T that minimizes F(A(T')^{-1}), where A is the initial loading matrix, under the side condition

Diag (T'T) = I \,\,. (9)
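A small sketch of Eqs. (8) and (9): any T with unit-length columns satisfies the side condition, T'T then plays the role of the factor correlation matrix, and the reproduced matrix \Lambda\Phi\Lambda' equals AA' whatever T is chosen. The matrices A and T below are arbitrary illustrative values.

```python
import numpy as np

# Hypothetical initial (orthogonal) loading matrix A.
A = np.array([[0.8,  0.3],
              [0.8,  0.3],
              [0.6, -0.4],
              [0.6, -0.4]])

# An arbitrary nonsingular matrix, rescaled so each column has unit length.
T = np.array([[1.0, 0.3],
              [0.2, 1.0]])
T /= np.linalg.norm(T, axis=0)   # enforces Diag(T'T) = I, Eq. (9)

Phi = T.T @ T                    # correlations among the oblique factors
Lam = A @ np.linalg.inv(T.T)     # Eq. (8): Lambda = A (T')^{-1}

print("diag(T'T):", np.diag(Phi))
print("factor correlation:", Phi[0, 1])
print("reproduced matrix unchanged:", np.allclose(Lam @ Phi @ Lam.T, A @ A.T))
```

This is why a further criterion such as F(\Lambda) is needed: the fit does not distinguish between admissible choices of T.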

In the next section, a rotation algorithm for oblimin is presented in detail.

A Planar Rotation Algorithm for Oblimin

(mainly from Clarkson and Jennrich 1988)

Most analytic rotation criteria for simple loadings are quartic functions of the loadings (polynomials of degree four). If these functions are homogeneous quadratic functions of the squares of the loadings and are row and column symmetric (i.e. invariant under permutations of the rows and columns of the loading matrix), then they must have the form:

F = k_1F_1 + k_2F_2 + k_3F_3 + k_4F_4 \,\, (10)

for constants k_1,\,  k_2,\,  k_3, \, k_4 where:


F_1 = (\sum_{i=1}^p \sum_{r=1}^m \lambda_{ir}^2)^2 ,

F_2 = \sum_{i=1}^p (\sum_{r=1}^m \lambda_{ir}^2)^2 ,

F_3 = \sum_{r=1}^m (\sum_{i=1}^p \lambda_{ir}^2)^2 ,

F_4 = \sum_{i=1}^p \sum_{r=1}^m \lambda_{ir}^4 \,\,. (11)
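As an illustration of how a familiar criterion fits the form (10), the sketch below evaluates the four terms, reading F_2 as a sum over variables (rows) and F_3 as a sum over factors (columns), and compares them with the double-sum form of the direct oblimin criterion (7). The identity used, F = (F_2 - F_4) - (\gamma/p)(F_1 - F_3), is derived here by expanding the sums over r \neq s; it is not copied from the table of the source. Function names and test data are assumptions.

```python
import numpy as np

def quartic_terms(L):
    """Return (F1, F2, F3, F4) for a p x m loading matrix L."""
    S = np.asarray(L, dtype=float) ** 2
    F1 = S.sum() ** 2                   # squared total sum of squares
    F2 = (S.sum(axis=1) ** 2).sum()     # squared row sums (variables)
    F3 = (S.sum(axis=0) ** 2).sum()     # squared column sums (factors)
    F4 = (S ** 2).sum()                 # sum of fourth powers
    return F1, F2, F3, F4

def direct_oblimin_double_sum(L, gamma):
    """Direct oblimin criterion written as an explicit sum over r != s."""
    S = np.asarray(L, dtype=float) ** 2
    p, m = S.shape
    total = 0.0
    for r in range(m):
        for s in range(m):
            if r != s:
                total += S[:, r] @ S[:, s]
                total -= (gamma / p) * S[:, r].sum() * S[:, s].sum()
    return total

def direct_oblimin_from_terms(L, gamma):
    """Same criterion as a linear combination of F1..F4."""
    p = np.asarray(L).shape[0]
    F1, F2, F3, F4 = quartic_terms(L)
    return (F2 - F4) - (gamma / p) * (F1 - F3)

rng = np.random.default_rng(2)
L_demo = rng.uniform(-1.0, 1.0, size=(6, 3))  # arbitrary test loadings
print(direct_oblimin_double_sum(L_demo, 0.5))
print(direct_oblimin_from_terms(L_demo, 0.5))
```

Under this reading, direct oblimin corresponds to k_1 = -\gamma/p, k_2 = 1, k_3 = \gamma/p and k_4 = -1.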

This is called the general symmetric family of quartic criteria. Table 1 summarizes different choices of k_1,\, k_2,\, k_3, \,k_4 leading to different criteria for the oblique case.

Table 1: Specific criteria in the general symmetric family, oblique case (image "Tabelle 2.jpg" not reproduced).

Current planar algorithms for oblique rotation are based on ordered pairs of factors. Following Jennrich and Sampson (1966), the first factor in each pair is rotated in the plane defined by the pair. In terms of the loading matrix Λ this means selecting a pair of columns \lambda_r and \lambda_s and applying a transformation of the form

 \bar \lambda_{ir} = \gamma \lambda_{ir} ,
 \bar \lambda_{is} = - \delta \lambda_{ir} + \lambda_{is} \,\,, (12)

 \gamma^2 = 1 + 2 \phi_{rs} \delta + \delta^2 \,\,, (13)

where \phi_{rs} is the rs-element of the factor correlation matrix \Phi. Note that \gamma here is the scaling factor defined by (13), not the oblimin weight of Eqs. (6) and (7). One finds a value of δ that minimizes the resulting value of the criterion F. Then Λ is updated using (12), and the r-th row and column of \Phi are updated using

 \bar \phi_{tr} = \bar \phi_{rt} = {1 \over \gamma} \phi_{rt} + {\delta \over \gamma} \phi_{st}, \,\,\, t \neq r \,\,. (14)

Beginning with initial values for Λ and Φ one proceeds by stepping uniformly through all ordered pairs \lambda_r \,\,and\,\, \lambda_s until the algorithm converges. It is assumed, without loss of generality, that the minimum of F is required.
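A single planar step, i.e. the updates (12) to (14) for one ordered pair and a given δ, might be sketched as below. The function name planar_step and the example values are assumptions; the positive root is taken for the scaling factor in (13). The check at the end relies on the fact that such a step changes the factors but not the reproduced matrix \Lambda\Phi\Lambda'.

```python
import numpy as np

def planar_step(L, Phi, r, s, delta):
    """Apply updates (12)-(14) for the ordered pair (r, s); returns copies."""
    L = np.array(L, dtype=float)
    Phi = np.array(Phi, dtype=float)
    # Eq. (13), positive root chosen.
    scale = np.sqrt(1.0 + 2.0 * Phi[r, s] * delta + delta ** 2)
    lam_r = L[:, r].copy()
    L[:, r] = scale * lam_r                 # Eq. (12), first line
    L[:, s] = L[:, s] - delta * lam_r       # Eq. (12), second line
    # Eq. (14): new r-th row and column, built from the old Phi.
    new_row = (Phi[r, :] + delta * Phi[s, :]) / scale
    Phi[r, :] = new_row
    Phi[:, r] = new_row
    Phi[r, r] = 1.0                         # diagonal stays at one
    return L, Phi

# Hypothetical starting values: 4 variables, 3 factors.
L0 = np.array([[0.8, 0.1, 0.2],
               [0.7, 0.3, 0.1],
               [0.2, 0.8, 0.3],
               [0.1, 0.2, 0.9]])
Phi0 = np.array([[1.0, 0.3, 0.2],
                 [0.3, 1.0, 0.1],
                 [0.2, 0.1, 1.0]])

L1, Phi1 = planar_step(L0, Phi0, r=0, s=2, delta=0.25)
print("Phi after the step:\n", Phi1)
print("fit unchanged:", np.allclose(L1 @ Phi1 @ L1.T, L0 @ Phi0 @ L0.T))
```

A full algorithm would choose δ by minimizing the criterion for each pair and cycle through all ordered pairs until convergence; the sketch only shows the bookkeeping of one step.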

Let F(\delta) be the value of F resulting from transformation (12). Following Jennrich and Sampson (1966), it has the following form:

F(\delta) = a + b \delta + c \delta^2 + d \delta^3 + e \delta^4 \,\,. (15)

This reduces the problem of finding an optimal "rotation" (12) to the relatively simple and routine problem of minimizing a quartic in one variable. The main problem is in expressing a through e in (15) in terms of the components of Λ and Φ.
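Minimizing a quartic in one variable can be done by examining the real roots of its cubic derivative. The sketch below uses numpy.roots for this; the function name, the tolerance for treating a root as real, and the example coefficients are assumptions. It presumes a positive leading coefficient e, so that a global minimum exists.

```python
import numpy as np

def minimize_quartic(b, c, d, e):
    """Return the real delta minimizing b*x + c*x**2 + d*x**3 + e*x**4.

    The constant term a is irrelevant for the location of the minimum.
    Assumes e > 0.
    """
    # Stationary points are the roots of b + 2c x + 3d x^2 + 4e x^3.
    roots = np.roots([4.0 * e, 3.0 * d, 2.0 * c, b])
    real = roots[np.abs(roots.imag) < 1e-9].real
    values = b * real + c * real ** 2 + d * real ** 3 + e * real ** 4
    return float(real[np.argmin(values)])

# Example: a tilted double well with two local minima near -1 and +1.
b, c, d, e = 0.1, -2.0, 0.0, 1.0
delta_opt = minimize_quartic(b, c, d, e)
print("optimal delta:", delta_opt)
```

Comparing the function values at all real stationary points, rather than stopping at the first one found, avoids settling on a local minimum of the quartic.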

From (10) and (15)

a = k_1a_1 + k_2a_2 + k_3a_3 + k_4a_4 ,
 .
 .
 .
e = k_1e_1 + k_2e_2 + k_3e_3 + k_4e_4 , (16)

where a_i through e_i are the "a" through "e" coefficients for the quartic functions F_i in (11). Since the optimal δ in (15) does not depend on a, all that is really required are expressions for b_i through e_i for i = 1,…,4. Let \lambda_+^2 denote the vector of row sums of squared loadings, so that (\lambda_+^2)_i = \sum_r \lambda_{ir}^2. In the following, (x, y) denotes the inner product of the vectors x and y, 1 is a vector of ones, and powers and products of the column vectors \lambda_r and \lambda_s are taken elementwise.

Using (11), (12), and (13) one finds that


b_1 = 4 \phi_{rs}(1, \lambda_+^2)(\lambda_r, \lambda_r) - 4(1, \lambda_+^2)(\lambda_r, \lambda_s),
c_1 = 4(1, \lambda_+^2)(\lambda_r, \lambda_r) + 4 \phi_{rs}^2 (\lambda_r, \lambda_r)^2 - 8 \phi_{rs}(\lambda_r, \lambda_r)(\lambda_r, \lambda_s) + 4(\lambda_r, \lambda_s)^2,
d_1 = 8 \phi_{rs}(\lambda_r, \lambda_r)^2 - 8(\lambda_r, \lambda_r)(\lambda_r, \lambda_s),
e_1 = 4(\lambda_r, \lambda_r)^2, (17)

b_2 = 4 \phi_{rs}(\lambda_+^2, \lambda_r^2) - 4(\lambda_+^2, \lambda_r \lambda_s),
c_2 = 4(\lambda_+^2, \lambda_r^2) + 4 \phi_{rs}^2(\lambda_r^2, \lambda_r^2) - 8 \phi_{rs}(\lambda_r^3, \lambda_s) + 4(\lambda_r^2, \lambda_s^2),
d_2 = 8 \phi_{rs}(\lambda_r^2, \lambda_r^2) - 8(\lambda_r^3, \lambda_s),
e_2 = 4(\lambda_r^2, \lambda_r^2), (18)

b_3 = 4 \phi_{rs}(\lambda_r, \lambda_r)^2 - 4(\lambda_r, \lambda_s)(\lambda_s, \lambda_s),
c_3 = (2 + 4 \phi_{rs}^2)(\lambda_r, \lambda_r)^2 + 2(\lambda_r, \lambda_r)(\lambda_s, \lambda_s) + 4(\lambda_r, \lambda_s)^2,
d_3 = 4 \phi_{rs}(\lambda_r, \lambda_r)^2 - 4(\lambda_r, \lambda_s)(\lambda_r, \lambda_r),
e_3 = 2(\lambda_r, \lambda_r)^2, (19)

and that

b_4 = 4 \phi_{rs}(\lambda_r^2, \lambda_r^2) - 4(\lambda_r, \lambda_s^3),
c_4 = (2 + 4 \phi_{rs}^2)(\lambda_r^2, \lambda_r^2) + 6(\lambda_r^2, \lambda_s^2),
d_4 = 4 \phi_{rs}(\lambda_r^2, \lambda_r^2) - 4(\lambda_r^3, \lambda_s),
e_4 = 2(\lambda_r^2, \lambda_r^2). (20)

Equations (16) through (20) are used to obtain values for b, c, d, and e in (15). The optimal δ is then found by minimizing the quartic F(δ) and the corresponding γ is given by (13) up to a choice of sign. Either choice will give an optimal rotation which is then defined by (12) and (14). In implementing the algorithms, it is more efficient to update, rather than recompute, quantities such as Λ, Φ, and  \lambda_+^2.
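The coefficient formulas can be checked numerically by comparing the polynomial they define with a direct evaluation of F_1 through F_4 after applying transformation (12). The sketch below does this for random test data; function names, sizes and the seed are assumptions, (x, y) is implemented as an inner product with elementwise powers, and the last term of c_1 is read as 4(\lambda_r, \lambda_s)^2.

```python
import numpy as np

def quartic_terms(L):
    """F_1..F_4 of the general symmetric family for a loading matrix L."""
    S = L ** 2
    return np.array([S.sum() ** 2,
                     (S.sum(axis=1) ** 2).sum(),
                     (S.sum(axis=0) ** 2).sum(),
                     (S ** 2).sum()])

def planar_coefficients(L, phi, r, s):
    """Rows: F_1..F_4; columns: coefficients b, c, d, e of F_i(delta)."""
    lr, ls = L[:, r], L[:, s]
    rowsq = (L ** 2).sum(axis=1)             # lambda_+^2
    tot = rowsq.sum()                        # (1, lambda_+^2)
    rr, rs, ss = lr @ lr, lr @ ls, ls @ ls   # plain inner products
    r2r2 = (lr ** 2) @ (lr ** 2)
    r2s2 = (lr ** 2) @ (ls ** 2)
    r3s = (lr ** 3) @ ls
    rs3 = lr @ (ls ** 3)
    pr2 = rowsq @ (lr ** 2)
    prs = rowsq @ (lr * ls)
    return np.array([
        [4 * phi * tot * rr - 4 * tot * rs,
         4 * tot * rr + 4 * phi ** 2 * rr ** 2 - 8 * phi * rr * rs + 4 * rs ** 2,
         8 * phi * rr ** 2 - 8 * rr * rs,
         4 * rr ** 2],
        [4 * phi * pr2 - 4 * prs,
         4 * pr2 + 4 * phi ** 2 * r2r2 - 8 * phi * r3s + 4 * r2s2,
         8 * phi * r2r2 - 8 * r3s,
         4 * r2r2],
        [4 * phi * rr ** 2 - 4 * rs * ss,
         (2 + 4 * phi ** 2) * rr ** 2 + 2 * rr * ss + 4 * rs ** 2,
         4 * phi * rr ** 2 - 4 * rs * rr,
         2 * rr ** 2],
        [4 * phi * r2r2 - 4 * rs3,
         (2 + 4 * phi ** 2) * r2r2 + 6 * r2s2,
         4 * phi * r2r2 - 4 * r3s,
         2 * r2r2],
    ])

rng = np.random.default_rng(1)
L = rng.uniform(-1.0, 1.0, size=(6, 3))   # arbitrary test loadings
phi_rs, delta, r, s = 0.3, 0.25, 0, 2     # arbitrary test values

# Direct evaluation: apply (12) with the scaling factor from (13).
scale = np.sqrt(1.0 + 2.0 * phi_rs * delta + delta ** 2)
L_new = L.copy()
L_new[:, r] = scale * L[:, r]
L_new[:, s] = L[:, s] - delta * L[:, r]
direct = quartic_terms(L_new) - quartic_terms(L)   # F_i(delta) - a_i

powers = np.array([delta, delta ** 2, delta ** 3, delta ** 4])
poly = planar_coefficients(L, phi_rs, r, s) @ powers

print("direct:    ", direct)
print("polynomial:", poly)
```

Weighting the four rows with k_1 through k_4 as in (16) then gives the coefficients b through e of the criterion actually being minimized.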


References

  • Abdi, H. (2003). Factor Rotations in Factor Analyses. Encyclopedia of Social Sciences Research Methods (as of June 2006), pp. 1-8.
  • Basilevsky, A. (1994). Statistical Factor Analysis and Related Methods: Theory and Applications. Wiley Series in Probability and Mathematical Statistics, pp. 250-269.
  • Carroll, J.B. (1953). An Analytical Solution for Approximating Simple Structure in Factor Analysis. Psychometrika 18, pp. 23-38.
  • Carroll, J.B. (1957). Biquartimin Criterion for Rotation to Oblique Simple Structure in Factor Analysis. Science 126, pp. 1114-1115.
  • Carroll, J.B. (1960). IBM 704 Program for Generalized Analytic Rotation Solution in Factor Analysis. Unpublished manuscript, Harvard University.
  • Cattell, R.B. (1978). The Scientific Use of Factor Analysis. New York: Plenum.
  • Clarkson, D.B. and Jennrich, R.I. (1988). Psychometrika 53, No. 2, pp. 251-259.
  • Harman, H.H. (1976). Modern Factor Analysis. Chicago: University Press, pp. 300-327.
  • Hildebrandt. Lecture notes "Einführung in die Marktforschung", winter semester 2005/06, ch. 5.3, p. 49.
  • Jennrich, R.I. and Sampson, P.F. (1966). Rotation for Simple Loadings. Psychometrika 31, pp. 313-323.
  • Kaiser, H.F. (1958). The Varimax Criterion for Analytic Rotation in Factor Analysis. Psychometrika 23, pp. 187-200.
  • Kim, J.O. and Mueller, C.W. (1990). Statistical Methods and Practical Issues. Sage University Press, p. 39.
  • Thurstone, L.L. (1947). Multiple-Factor Analysis. Chicago: University Press.

