# Anacor Algorithm

**Topics:** Singular value decomposition, Linear algebra, Matrix

**Pages:** 7 (1200 words)

**Published:** February 28, 2013

The ANACOR algorithm consists of three major parts:

1. A singular value decomposition (SVD)
2. Centering and rescaling of the data and various rescalings of the results
3. Variance estimation by the delta method

Other names for SVD are “Eckart-Young decomposition” after Eckart and Young (1936), who introduced the technique in psychometrics, and “basic structure” (Horst, 1963). The rescalings and centering, including their rationale, are well explained in Benzécri (1969), Nishisato (1980), Gifi (1981), and Greenacre (1984). Those who are interested in the general framework of matrix approximation and reduction of dimensionality with positive definite row and column metrics are referred to Rao (1980). The delta method is a method that can be used for the derivation of asymptotic distributions and is particularly useful for the approximation of the variance of complex statistics. There are many versions of the delta method, differing in the assumptions made and in the strength of the approximation (Rao, 1973, ch. 6; Bishop et al., 1975, ch. 14; Wolter, 1985, ch. 6).

## Notation

The following notation is used throughout this chapter unless otherwise stated:

| Symbol | Description |
|---|---|
| $k_1$ | Number of rows (row objects) |
| $k_2$ | Number of columns (column objects) |
| $p$ | Number of dimensions |

### Data-Related Quantities

| Symbol | Description |
|---|---|
| $f_{ij}$ | Nonnegative data value for row $i$ and column $j$, collected in table $F$ |
| $f_{i+}$ | Marginal total of row $i$, $i = 1, \ldots, k_1$ |
| $f_{+j}$ | Marginal total of column $j$, $j = 1, \ldots, k_2$ |
| $N$ | Grand total of $F$ |


### Scores and Statistics

| Symbol | Description |
|---|---|
| $r_{is}$ | Score of row object $i$ on dimension $s$ |
| $c_{js}$ | Score of column object $j$ on dimension $s$ |
| $I$ | Total inertia |

## Basic Calculations

One way to phrase the ANACOR objective (cf. Heiser, 1981) is to say that we wish to find row scores $\{r_{is}\}$ and column scores $\{c_{js}\}$ so that the function

$$\sigma\bigl(\{r_{is}\};\{c_{js}\}\bigr) = \sum_i \sum_j f_{ij} \sum_s \left(r_{is} - c_{js}\right)^2$$

is minimal, under the standardization restriction either that

$$\sum_i \frac{f_{i+}}{N}\, r_{is} r_{it} = \delta_{st}$$

or

$$\sum_j \frac{f_{+j}}{N}\, c_{js} c_{jt} = \delta_{st}$$

where $\delta_{st}$ is Kronecker's delta and $t$ is an alternative index for dimensions. The trivial set of scores $(\{1\},\{1\})$ is excluded. The ANACOR algorithm can be subdivided into five steps, as explained below.
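As a concrete illustration, the objective function above can be evaluated directly with NumPy. The table and scores below are invented values chosen only to show the computation, not part of the algorithm description:

```python
import numpy as np

def anacor_loss(F, R, C):
    """sigma({r_is};{c_js}) = sum_i sum_j f_ij sum_s (r_is - c_js)^2.

    F is a k1 x k2 table, R a k1 x p row-score array,
    C a k2 x p column-score array.
    """
    # diff[i, j, s] = r_is - c_js
    diff = R[:, None, :] - C[None, :, :]
    return np.sum(F[:, :, None] * diff**2)

# A made-up 3x2 table and one-dimensional scores (p = 1).
F = np.array([[10.0, 5.0],
              [4.0, 8.0],
              [6.0, 7.0]])
R = np.array([[1.0], [-1.0], [0.5]])
C = np.array([[0.8], [-0.6]])
loss = anacor_loss(F, R, C)
```

Minimizing this loss over row and column scores, subject to the standardization restrictions, is exactly what the five steps below accomplish.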


### 1. Data scaling and centering

The first step is to form the auxiliary matrix $Z$ with general element

$$z_{ij} = \frac{f_{ij}}{\sqrt{f_{i+} f_{+j}}} - \frac{\sqrt{f_{i+} f_{+j}}}{N}$$
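A minimal sketch of this step in NumPy; the 2×3 table of counts is invented for illustration:

```python
import numpy as np

def auxiliary_matrix(F):
    """z_ij = f_ij / sqrt(f_i+ f_+j) - sqrt(f_i+ f_+j) / N."""
    row = F.sum(axis=1)                  # marginal totals f_i+
    col = F.sum(axis=0)                  # marginal totals f_+j
    N = F.sum()                          # grand total
    root = np.sqrt(np.outer(row, col))   # sqrt(f_i+ f_+j), element-wise
    return F / root - root / N

# Hypothetical contingency table.
F = np.array([[10.0, 5.0, 3.0],
              [4.0, 8.0, 6.0]])
Z = auxiliary_matrix(F)
```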

### 2. Singular value decomposition

Let the singular value decomposition of $Z$ be denoted by

$$Z = K \Lambda L'$$

with $K'K = I$, $L'L = I$, and $\Lambda$ diagonal. This decomposition is calculated by a routine based on Golub and Reinsch (1971). It involves Householder reduction to bidiagonal form and diagonalization by a QR procedure with shifts. The routine requires an array with more rows than columns, so when $k_1 < k_2$ the original table is transposed and the parameter transfer is permuted accordingly.

### 3. Adjustment to the row and column metric

The arrays of both the left-hand singular vectors and the right-hand singular vectors are adjusted row-wise to form scores that are standardized in the row and in the column marginal proportions, respectively:

$$r_{is} = k_{is} \sqrt{N / f_{i+}}\,, \qquad c_{js} = l_{js} \sqrt{N / f_{+j}}\,.$$

This way, both sets of scores satisfy the standardization restrictions simultaneously.
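Steps 1 through 3 can be sketched together in NumPy. Here `numpy.linalg.svd` stands in for the Golub-Reinsch routine (NumPy handles $k_1 < k_2$ directly, so no explicit transposition is needed), and the 3×3 table is invented for illustration:

```python
import numpy as np

# Hypothetical contingency table; in practice F holds the observed counts.
F = np.array([[10.0, 5.0, 3.0],
              [4.0, 8.0, 6.0],
              [2.0, 6.0, 9.0]])
row, col, N = F.sum(axis=1), F.sum(axis=0), F.sum()

# Step 1: auxiliary matrix Z
root = np.sqrt(np.outer(row, col))
Z = F / root - root / N

# Step 2: singular value decomposition Z = K diag(lam) L'
K, lam, Lt = np.linalg.svd(Z)
L = Lt.T

# Step 3: adjust to the row and column metric
R = K * np.sqrt(N / row)[:, None]   # r_is = k_is * sqrt(N / f_i+)
C = L * np.sqrt(N / col)[:, None]   # c_js = l_js * sqrt(N / f_+j)

# Both sets of scores are now standardized in the marginal proportions:
# sum_i (f_i+ / N) r_is r_it = delta_st, and likewise for the columns.
```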


### 4. Determination of variances and covariances

For the application of the delta method to the results of generalized eigenvalue methods under multinomial sampling, the reader is referred to Gifi (1981, ch. 12) and Israëls (1987, Appendix B). It is shown there that $N$ times the variance-covariance matrix of a function $\phi$ of the observed cell proportions $p_{ij} = f_{ij}/N$ asymptotically takes the form

$$N \times \operatorname{cov}\bigl(\phi(p)\bigr) \simeq \sum_i \sum_j \pi_{ij}\, \frac{\partial \phi}{\partial p_{ij}} \left(\frac{\partial \phi}{\partial p_{ij}}\right)' - \left(\sum_i \sum_j \pi_{ij}\, \frac{\partial \phi}{\partial p_{ij}}\right) \left(\sum_i \sum_j \pi_{ij}\, \frac{\partial \phi}{\partial p_{ij}}\right)'$$

Here the quantities $\pi_{ij}$ are the cell probabilities of the multinomial distribution, and $\partial \phi / \partial p_{ij}$ are the partial derivatives of $\phi$...
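For a scalar statistic the expression above reduces to a weighted second moment of the gradient minus the square of its weighted mean. A small sketch with invented cell probabilities and a linear statistic, for which the delta method happens to be exact:

```python
import numpy as np

def delta_variance(pi, grad):
    """N x asymptotic variance of a scalar statistic phi(p) under
    multinomial sampling: sum_ij pi_ij g_ij^2 - (sum_ij pi_ij g_ij)^2,
    where g_ij = dphi/dp_ij."""
    pi, grad = np.asarray(pi), np.asarray(grad)
    return np.sum(pi * grad**2) - np.sum(pi * grad)**2

# Invented cell probabilities pi_ij and a linear statistic
# phi(p) = sum_ij a_ij p_ij, whose gradient dphi/dp_ij = a_ij is
# constant, so no approximation error is incurred in this case.
pi = np.array([[0.2, 0.1],
               [0.3, 0.4]])
a = np.array([[1.0, 0.0],
              [0.0, -1.0]])
nvar = delta_variance(pi, a)
```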

## References

Benzécri, J. P. 1969. Statistical analysis as a tool to make patterns emerge from data. In: Methodologies of Pattern Recognition, S. Watanabe, ed. New York: Academic Press.

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. 1975. Discrete multivariate analysis: Theory and practice. Cambridge, Mass.: MIT Press.

Eckart, C., and Young, G. 1936. The approximation of one matrix by another one of lower rank. Psychometrika, 1: 211–218.

Gifi, A. 1981. Nonlinear multivariate analysis. Leiden: Department of Data Theory.

Golub, G. H., and Reinsch, C. 1971. Linear algebra, Chapter I.10. In: Handbook for Automatic Computation, Volume II, J. H. Wilkinson and C. Reinsch, eds. New York: Springer-Verlag.

Greenacre, M. J. 1984. Theory and applications of correspondence analysis. London: Academic Press.

Heiser, W. J. 1981. Unfolding analysis of proximal data. Doctoral dissertation. Department of Data Theory, University of Leiden.

Horst, P. 1963. Matrix algebra for social scientists. New York: Holt, Rinehart, and Winston.

Israëls, A. 1987. Eigenvalue techniques for qualitative data. Leiden: DSWO Press.

Nishisato, S. 1980. Analysis of categorical data: dual scaling and its applications. Toronto: University of Toronto Press.

Rao, C. R. 1973. Linear statistical inference and its applications, 2nd ed. New York: John Wiley & Sons, Inc.

Rao, C. R. 1980. Matrix approximations and reduction of dimensionality in multivariate statistical analysis. In: Multivariate Analysis, Vol. 5, P. R. Krishnaiah, ed. Amsterdam: North-Holland.

Wolter, K. M. 1985. Introduction to variance estimation. Berlin: Springer-Verlag.
