R: Feature selection based on ('rpart') decision trees

rpart.select {StabPerf}

R Documentation

Feature selection based on ('rpart') decision trees

Description

Uses a decision tree (from rpart) as the basis for a greedy feature selection algorithm. Returns features from the root of the tree down, in order.

Usage

rpart.select(data, labels, best=NULL, thresh=0.1,
 start.indices=1:dim(data)[2], minsplit=10, ...)

Arguments

`data`	matrix or data.frame. Features in columns, samples in rows
`labels`	factor or integer. Labels of the samples of `data`
`best`	integer. Number of features to return. Trees are retrained until this many features have been found.
`thresh`	numeric. Minimum complexity score a feature must have to be accepted.
`start.indices`	integer. Restrict the search to these columns of `data` (e.g. columns preselected with a t-test)
`minsplit`	See `rpart`
`...`	Other parameters to pass to `rpart`

Details

Implements a greedy (i.e. not guaranteed optimal) feature selection algorithm. A decision tree is trained and its features are taken in order, starting at the root and proceeding toward the leaves. A new tree is then built with those features removed, and the features of this tree are taken in the same way. The process is repeated until enough features have been found or until the tree is unable to split the data any further. If `best` is given, the top `best` features are returned, regardless of how poor they are. If `thresh` is given, only features with a complexity score above `thresh` are returned, even if this results in an empty list. See `rpart` for information on the complexity of a feature.
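The loop described above can be sketched roughly as follows. This is an illustration only, not the StabPerf source: the helper name `greedy_tree_select`, the use of the `complexity` column of the fitted tree's `frame` as the feature score, and the rule that `best` overrides `thresh` are all assumptions made for this sketch.

```r
library(rpart)

# Hypothetical helper illustrating the greedy loop; not the package's code.
greedy_tree_select <- function(data, labels, best = NULL, thresh = 0.1,
                               minsplit = 10) {
  data <- as.data.frame(data)
  remaining <- colnames(data)
  selected <- character(0)

  while (length(remaining) > 0) {
    # Train a classification tree on the features not yet selected.
    train <- data[, remaining, drop = FALSE]
    train$label_ <- factor(labels)  # assumes no feature is named "label_"
    fit <- rpart(label_ ~ ., data = train, method = "class",
                 control = rpart.control(minsplit = minsplit))

    # Node numbers are the row names of fit$frame; sorting them walks the
    # tree from the root toward the leaves.
    frame <- fit$frame[order(as.numeric(rownames(fit$frame))), ]
    splits <- frame[frame$var != "<leaf>", , drop = FALSE]
    if (nrow(splits) == 0) break  # the tree could not split the data

    # Assumption: without `best`, keep only splits scoring above `thresh`.
    if (is.null(best)) {
      splits <- splits[splits$complexity > thresh, , drop = FALSE]
    }
    found <- unique(as.character(splits$var))
    if (length(found) == 0) break

    selected <- c(selected, found)
    remaining <- setdiff(remaining, found)
    if (!is.null(best) && length(selected) >= best) break
  }

  if (!is.null(best)) selected <- head(selected, best)
  selected
}

# Usage on a built-in data set: ask for the two highest-ranked features.
top2 <- greedy_tree_select(iris[, 1:4], iris$Species, best = 2)
```

Removing the selected features before each retraining is what makes the procedure greedy: later trees never revisit a feature that an earlier tree already used.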

Note: If the data are difficult to split, even the first feature returned may not be meaningful.

Note: The speed of rpart depends much more on the number of features than on the number of samples. Use start.indices to reduce the feature space from the start using, for example, a t-test or an F-test to fetch a couple hundred of the most significant features.
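One possible way to build such a preselection is sketched below with a per-feature Welch t-test from base R. The simulated matrix `x`, the labels `y`, and the cutoff of 200 features are made up for illustration; only the idea of passing the resulting indices as `start.indices` comes from this page.

```r
set.seed(1)

# Simulated data (illustrative only): 40 samples in rows, 500 features in columns.
x <- matrix(rnorm(40 * 500), nrow = 40)
y <- factor(rep(c("a", "b"), each = 20))
# Shift the first five features in class "b" so that they are informative.
x[y == "b", 1:5] <- x[y == "b", 1:5] + 2

# Two-sample t-test p-value for every feature (column).
pvals <- apply(x, 2, function(col) {
  t.test(col[y == "a"], col[y == "b"])$p.value
})

# Column indices of the 200 features with the smallest p-values.
keep <- order(pvals)[1:200]

# These indices could then be handed over as the reduced search space, e.g.
# rpart.select(x, y, start.indices = keep)
```

For labels with more than two classes, an F-test (for example via `oneway.test`) could replace the t-test in the same pattern.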

Value

`features`	list. Selected features, ordered from the root of the tree down.

Examples

fets <- rpart.select(t(expr_data), some.factors, start.indices=c(1:50,88,132))

[Package StabPerf version 0.5 Index]