Data filtering menu

Filtering the Data with the GUI

To work with a subset of your data rather than the complete content of the data file, use the filtering menu to reduce the set of probesets with one or more filtering methods.

Opening the menu

After loading your data, you can open the filtering menu by clicking the Filter Data option in the menu bar.

Filtering methods

Version 1.3 of HALO provides four filtering methods. Each is described briefly below, together with any attributes it requires.

Threshold
The Threshold method filters your data according to a numerical threshold. Probesets with at least one RNA value below this threshold are discarded.
Present/Absent calls
This method uses present/absent calls to filter the data. The list of calls has to be loaded as an attribute, either from a separate file or together with the original data. If you did not include the calls when loading the data, you can still load them from the original data file afterwards.
If you choose this method, you select a call type and a threshold. Every probeset for which the number of calls of the selected type reaches or exceeds the threshold is discarded.
Probeset quality score - Threshold
This method is based on the quality score of each probeset. You enter a numerical threshold, and every probeset with a quality score exceeding this number is discarded.
Calculating quality scores requires normalized data; if you have not yet normalized your data, normalization is started automatically.
You can also save the calculated quality scores or plot them as a histogram.
Probeset quality score - Optimal probeset
Like the previous method, this method is based on quality scores. For each gene, the probeset with the lowest quality score is kept and all others are discarded, so the result contains exactly one probeset per gene.
You can limit the calculation to one replicate; otherwise the average over all replicates is used. Before using this method, normalization has to be performed (see above) and gene names have to be loaded as an attribute.
As before, you can save the quality scores and plot them as a histogram.
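
The selection rules of the four methods can be illustrated with a short Python sketch. This is not HALO's implementation: the function names, the dictionary-based data layout and the sample values are assumptions made for illustration only, and the quality scores are taken as given because their calculation is not described here.

```python
# Illustrative sketch only; all names and data layouts are hypothetical,
# not part of HALO. Each function returns the ids of the probesets kept.

def filter_by_threshold(values, threshold):
    """Discard probesets with at least one RNA value below `threshold`."""
    return {p for p, v in values.items() if min(v) >= threshold}

def filter_by_calls(calls, call, threshold):
    """Discard probesets with at least `threshold` calls of type `call`."""
    return {p for p, c in calls.items() if c.count(call) < threshold}

def filter_by_quality(scores, threshold):
    """Discard probesets whose quality score exceeds `threshold`."""
    return {p for p, s in scores.items() if s <= threshold}

def optimal_probesets(scores, genes, replicate=None):
    """Keep, per gene, the probeset with the lowest quality score.

    `scores` maps a probeset id to one score per replicate. If `replicate`
    is None, the average over all replicates is compared.
    """
    best = {}  # gene name -> (probeset id, score)
    for probeset, per_replicate in scores.items():
        if replicate is None:
            score = sum(per_replicate) / len(per_replicate)
        else:
            score = per_replicate[replicate]
        gene = genes[probeset]
        if gene not in best or score < best[gene][1]:
            best[gene] = (probeset, score)
    return {probeset for probeset, _ in best.values()}

# Made-up sample data for three probesets.
values = {"ps1": [120.0, 95.0, 110.0],
          "ps2": [40.0, 130.0, 125.0],
          "ps3": [200.0, 210.0, 190.0]}
calls = {"ps1": ["P", "P", "A"],
         "ps2": ["A", "A", "P"],
         "ps3": ["P", "P", "P"]}
scores = {"ps1": [0.2, 0.4], "ps2": [0.9, 0.1], "ps3": [0.1, 0.3]}
genes = {"ps1": "geneA", "ps2": "geneA", "ps3": "geneB"}

print(sorted(filter_by_threshold(values, 50.0)))         # ['ps1', 'ps3']
print(sorted(filter_by_calls(calls, "A", 2)))            # ['ps1', 'ps3']
print(sorted(optimal_probesets(scores, genes)))          # ['ps1', 'ps3']
print(sorted(optimal_probesets(scores, genes, 1)))       # ['ps2', 'ps3']
```

In this example, limiting the optimal-probeset calculation to the second replicate changes the probeset chosen for geneA, which shows why the replicate setting can matter.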

You can choose more than one of these filtering methods; the selected methods are then applied one after another to filter your data. To start the process, click the Start button. If you are not satisfied with the result, you can repeat the filtering with stricter thresholds or other parameters. Please note that these new filtering steps are applied in addition to those already performed. If you want to restart the complete process, you can reset the data with the Clear button.
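
The cumulative behaviour of repeated filtering can be pictured with the following Python sketch. The FilterSession class and its methods are hypothetical and only mimic the Start and Clear buttons as described above; they are not part of HALO.

```python
# Illustrative sketch only; FilterSession is a hypothetical name, not a HALO API.

class FilterSession:
    def __init__(self, probesets):
        self._original = dict(probesets)  # untouched copy used by clear()
        self.current = dict(probesets)    # data after all filters applied so far

    def start(self, *keep_rules):
        """Apply each rule in turn to the already filtered data."""
        for keep in keep_rules:
            self.current = {p: v for p, v in self.current.items() if keep(v)}
        return self.current

    def clear(self):
        """Discard all filtering performed so far and restore the loaded data."""
        self.current = dict(self._original)

session = FilterSession({"ps1": [120, 95], "ps2": [40, 130], "ps3": [60, 70]})

session.start(lambda v: min(v) >= 50)   # first run: ps2 is discarded
print(sorted(session.current))          # ['ps1', 'ps3']

session.start(lambda v: min(v) >= 90)   # second run works on the reduced data
print(sorted(session.current))          # ['ps1']

session.clear()                         # back to the data as loaded
print(sorted(session.current))          # ['ps1', 'ps2', 'ps3']
```

A second run never brings back probesets removed by an earlier run; only the reset restores them.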

Saving the data

To save the data after the filtering step, click the Save button. In the dialog that opens, you can select the data columns to include in the output file and choose a destination and a file name.

Subsequent steps

Filtering is an optional procedure that does not unlock any additional steps.

HALO documentation