rpart.select {StabPerf} | R Documentation |
Uses a decision tree (from rpart) as the basis for a greedy feature selection algorithm. Returns features from the root of the tree down, in order.
rpart.select(data, labels, best=NULL, thresh=0.1, start.indices=1:dim(data)[2], minsplit=10, ...)
data |
matrix or data.frame. Features in columns, samples in rows |
labels |
factor or integer. Labels of the samples of data |
best |
integer. How many features to return. Iterates training until fulfilled. |
thresh |
numeric. Minimum complexity score a feature must exceed to be returned. |
start.indices |
integer. Column indices of data to which the search is restricted (e.g. features preselected by a t-test) |
minsplit |
See rpart |
... |
Other parameters to pass to rpart |
Implements a greedy (i.e. non-optimal) feature selection algorithm, which trains a decision tree and plucks the features from the tree starting at the root and proceeding toward the leaves. A new tree is then built with those features removed, and the best features from this new tree are collected and removed in the same way. The process is iterated until enough features have been found or until the tree is unable to split the data any further. If best is given, the top best features are returned, regardless of how poor they are. If thresh is given, only features with a complexity score above thresh are returned, even if this results in returning an empty list. See rpart for information on the complexity of a feature.
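The loop described above can be sketched roughly as follows. This is not the StabPerf source: greedy_tree_select is a hypothetical helper, and it assumes that the "complexity score" corresponds to the complexity column of the fitted tree's frame and that node numbers (the frame's row names) can be used to order splits from the root downward.

```r
library(rpart)

# Hypothetical sketch, not the StabPerf implementation: fit a tree, harvest
# its split variables from the root downward, drop them, and refit.
greedy_tree_select <- function(data, labels, best = NULL, thresh = 0.1,
                               minsplit = 10) {
  data <- as.data.frame(data)
  # Force syntactic names so that names reported by rpart match ours.
  colnames(data) <- make.names(colnames(data), unique = TRUE)
  labels <- as.factor(labels)
  remaining <- colnames(data)
  selected <- character(0)
  while (length(remaining) > 0 &&
         (is.null(best) || length(selected) < best)) {
    df <- data.frame(.y = labels, data[, remaining, drop = FALSE])
    fit <- rpart(.y ~ ., data = df, method = "class",
                 control = rpart.control(minsplit = minsplit))
    fr <- fit$frame
    splits <- fr[fr$var != "<leaf>", , drop = FALSE]
    if (nrow(splits) == 0) break   # the tree could not split any further
    # Node numbers grow with depth, so sorting them walks root -> leaves.
    splits <- splits[order(as.integer(rownames(splits))), , drop = FALSE]
    if (is.null(best)) {
      # Assumption: the threshold only applies when no count is requested.
      splits <- splits[splits$complexity > thresh, , drop = FALSE]
      if (nrow(splits) == 0) break
    }
    new <- unique(as.character(splits$var))
    selected <- c(selected, new)
    remaining <- setdiff(remaining, new)
  }
  if (!is.null(best)) selected <- head(selected, best)
  selected
}

# Usage on a built-in data set: ask for the two best features.
sel <- greedy_tree_select(iris[, 1:4], iris$Species, best = 2)
```

Each pass either stops or removes at least one feature from the pool, so the loop always terminates. With best left as NULL the result may be empty if no split exceeds thresh.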
Note: If the data are difficult to split, even the first feature returned may not be meaningful.
Note: The speed of rpart depends much more on the number of features than on the number of samples. Use start.indices to reduce the feature space from the start using, for example, a t-test or an F-test to fetch a couple hundred of the most significant features.
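One possible way to build such a prefilter is sketched below, assuming a two-class problem and simulated data. top_ttest_indices is a hypothetical helper and is not part of StabPerf; it ranks columns by two-sample t-test p-value and returns the indices of the k most significant ones.

```r
# Hypothetical helper: rank columns by two-sample t-test p-value and keep
# the indices of the k most significant columns.
top_ttest_indices <- function(data, labels, k = 200) {
  labels <- as.factor(labels)
  stopifnot(nlevels(labels) == 2)
  pvals <- apply(data, 2, function(x) t.test(x ~ labels)$p.value)
  head(order(pvals), k)
}

# Simulated data: 40 samples in rows, 500 features in columns.
set.seed(1)
x <- matrix(rnorm(40 * 500), nrow = 40)
y <- factor(rep(c("a", "b"), each = 20))
x[y == "b", 1:5] <- x[y == "b", 1:5] + 3   # make the first five informative

idx <- top_ttest_indices(x, y, k = 50)
```

The resulting idx vector could then be supplied as start.indices, so that the tree is only ever trained on the preselected columns.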
features |
list. Selected features. |
rpart, t.test, wilcox.test, mt.teststat
fets <- rpart.select(t(expr_data), some.factors, start.indices=c(1:50,88,132))