Theoretical and Empirical Analysis of ReliefF and RReliefF
Marko Robnik-Šikonja (firstname.lastname@example.org)
Igor Kononenko (email@example.com)
University of Ljubljana, Faculty of Computer and Information Science,
Tržaška 25, 1001 Ljubljana, Slovenia
tel.: +386 1 4768386, fax: +386 1 4264647

Abstract. Relief algorithms are general and successful attribute estimators. They are able to detect conditional dependencies between attributes and provide a unified view on attribute estimation in regression and classification. In addition, their quality estimates have a natural interpretation. While they have commonly been viewed as feature subset selection methods applied in a preprocessing step before a model is learned, they have actually been used successfully in a variety of settings, e.g., to select splits or to guide constructive induction in the building phase of decision or regression tree learning, as an attribute weighting method, and also in inductive logic programming. Such a broad spectrum of successful uses calls for an especially careful investigation of the various properties of Relief algorithms. In this paper we theoretically and empirically investigate and discuss how and why they work, their theoretical and practical properties, their parameters, what kind of dependencies they detect, how they scale up to large numbers of examples and features, how to sample data for them, how robust they are with respect to noise, how irrelevant and redundant attributes influence their output, and how different metrics influence them.

Keywords: attribute estimation, feature selection, Relief algorithm, classification, regression
1. Introduction

The problem of estimating the quality of attributes (features) is an important issue in machine learning. Several important tasks in the process of machine learning, e.g., feature subset selection, constructive induction, and decision and regression tree building, contain an attribute estimation procedure as their (crucial) ingredient. In many learning problems there are hundreds or thousands of potential features describing each input object. The majority of learning methods do not behave well in these circumstances because, from a statistical point of view, examples with many irrelevant, noisy features provide very little information. Feature subset selection is the task of choosing a small subset of features that ideally is necessary and sufficient to describe the target concept. To decide which features to retain and which to discard we need a reliable and practically efficient method of estimating their relevance to the target concept.

In constructive induction we face a similar problem. In order to enhance the power of the representation language and construct new knowledge we introduce new features. Typically many candidate features are generated, and again we need to decide which features to retain and which to discard. Estimating the relevance of the features to the target concept is certainly one of the major components of such a decision procedure.

Decision and regression trees are popular description languages for representing knowledge in machine learning. While constructing a tree, the learning algorithm at each interior node selects the splitting rule (feature) which divides the problem space into two separate subspaces. To select an appropriate splitting rule the learning algorithm has to evaluate several possibilities and decide which would partition the given (sub)problem most appropriately. The estimation of the quality of the splitting rules is therefore of principal importance.
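To make the attribute estimation task concrete, the following is a minimal sketch of the basic Relief idea for a two-class problem with numeric attributes (not the full ReliefF or RReliefF algorithms analyzed later in the paper): for each sampled instance, the weight of an attribute is decreased by its distance to the nearest instance of the same class (nearest hit) and increased by its distance to the nearest instance of the other class (nearest miss). All function and parameter names here are illustrative choices, not part of the original formulation.

```python
import numpy as np

def relief(X, y, n_iters=100, rng=None):
    """Basic two-class Relief sketch for numeric attributes.

    For each randomly sampled instance R, find its nearest hit H
    (same class) and nearest miss M (other class), then update the
    weight of every attribute A roughly as
        W[A] := W[A] - diff(A, R, H)/m + diff(A, R, M)/m .
    Attributes that separate the classes locally accumulate
    positive weights; irrelevant attributes stay near zero.
    """
    rng = np.random.default_rng(rng)
    n, _ = X.shape
    # scale each attribute to [0, 1] so diff() is comparable across them
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        i = rng.integers(n)
        d = np.abs(Xs - Xs[i]).sum(axis=1)   # Manhattan distance to all instances
        d[i] = np.inf                        # exclude the sampled instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, d, np.inf))
        miss = np.argmin(np.where(~same, d, np.inf))
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / n_iters
    return w
```

On a toy dataset where the first attribute tracks the class and the second is pure noise, the first attribute receives a clearly larger weight, illustrating why such estimates are useful for feature subset selection and split evaluation.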
The problem of feature (attribute) estimation has received much attention in the literature. There are several measures for estimating attributes’ quality. If the target concept is a discrete variable (the classiﬁcation problem) these are e.g., information...