1 Introduction

A central problem in neural network research, as well as in statistics and signal processing, is finding a suitable representation or transformation of the data. For computational and conceptual simplicity, the representation is often sought as a linear transformation of the original data. Let us denote by $x = (x_1, x_2, \ldots, x_m)^T$ a zero-mean $m$-dimensional random variable that can be observed, and by $s = (s_1, s_2, \ldots, s_n)^T$ its $n$-dimensional transform. Then the problem is to determine a constant (weight) matrix $W$ so that the linear transformation of the observed variables

$$s = Wx \tag{1}$$

has some suitable properties. Several principles and methods have been developed to find such a linear representation, including principal component analysis [30], factor analysis [15], projection pursuit [12, 16], and independent component analysis [27]. The transformation may be defined using such criteria as optimal dimension reduction, statistical 'interestingness' of the resulting components $s_i$, simplicity of the transformation, or other criteria, including application-oriented ones.

In this paper, we treat the problem of estimating the transformation given by (linear) independent component analysis (ICA) [7, 27]. As the name implies, the basic goal in determining the transformation is to find a representation in which the transformed components $s_i$ are as statistically independent of each other as possible. Thus this method is a special case of redundancy reduction [2].

Two promising applications of ICA are blind source separation and feature extraction. In blind source separation [27], the observed values of $x$ correspond to a realization of an $m$-dimensional discrete-time signal $x(t)$, $t = 1, 2, \ldots$. Then the components $s_i(t)$ are called source signals, which are usually original, uncorrupted signals or noise sources. Often such sources are statistically independent of each other, and thus the signals can be recovered from the linear mixtures $x_i$ by finding a transformation in which the transformed signals are as independent as possible, as in ICA. In feature extraction [4, 25], $s_i$ is the coefficient of the $i$-th feature in the observed data vector $x$. The use of ICA for feature extraction is motivated by results in neuroscience suggesting that a similar principle of redundancy reduction [2, 32] explains some aspects of the early processing of sensory data by the brain. ICA also has applications in exploratory data analysis, in the same way as the closely related method of projection pursuit [16, 12].

In this paper, new objective...