Filtering the Data with the GUI
To work with a subset of your data rather than the complete contents of the data file,
use the filtering menu to reduce the set of probesets with one or more filtering methods.
Extending the menu
After loading the data, you can access the filtering menu by clicking the menu option
labeled Filter Data in the menu bar.
Filtering methods
Version 1.3 of HALO provides four filtering methods. Each is described briefly below,
together with the attributes it may require.
- Threshold
The Threshold method filters your data according to a numerical threshold. Probesets with
at least one RNA value below this threshold are discarded in this step.
- Present/Absent calls
This method uses present/absent calls to filter the data. The calls have to be loaded as an
attribute, either from a separate file or together with the original data. If you did not include
them when loading the data, you can still load them from the original data file afterwards.
If you choose this method you can choose a call and a threshold for filtering. Your data will be
filtered in such a way that every probeset with at least threshold calls of the type call
will be discarded.
- Probeset quality score - Threshold
This method is based on the quality scores of each probeset. You can enter a numerical threshold,
and every probeset with a quality score exceeding this number will be discarded.
Calculating quality scores requires normalized data; if you have not yet performed normalization,
it is started automatically.
You can also save the calculated quality scores or plot them in a histogram.
- Probeset quality score - Optimal probeset
Like the previous method, this filtering method is based on the quality scores. For each gene,
the probeset with the minimal quality score is kept and all others are discarded. The method thus
results in one probeset per gene.
You can limit the calculations to one replicate; otherwise an average over all replicates is used.
Before using this method, normalization has to be performed (see above) and gene names have to be
loaded as an attribute.
You can again save the quality scores and create a histogram from them.
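The four filtering rules described above can be illustrated with a minimal Python sketch. This is
not HALO's implementation: the function names, the dictionary layout (probeset ID mapped to values,
calls, scores and gene names) and the example numbers are all assumptions made for illustration, and
the quality scores are taken as given rather than calculated.

```python
from statistics import mean

# Hypothetical example data; none of these values come from HALO.
values = {"p1": [5.0, 6.0], "p2": [0.5, 7.0], "p3": [4.0, 4.5]}      # RNA values
calls = {"p1": ["P", "P"], "p2": ["A", "P"], "p3": ["A", "A"]}       # present/absent
scores = {"p1": 0.1, "p2": 0.4, "p3": 0.2}                           # quality scores
replicate_scores = {"p1": [0.1, 0.5], "p2": [0.4, 0.1], "p3": [0.2, 0.2]}
genes = {"p1": "geneA", "p2": "geneA", "p3": "geneB"}


def filter_by_threshold(values, threshold):
    """Discard probesets with at least one RNA value below the threshold."""
    return {pid: v for pid, v in values.items() if min(v) >= threshold}


def filter_by_calls(values, calls, call, threshold):
    """Discard probesets with at least `threshold` calls of type `call`."""
    return {pid: v for pid, v in values.items()
            if calls[pid].count(call) < threshold}


def filter_by_quality(values, scores, threshold):
    """Discard probesets whose quality score exceeds the threshold."""
    return {pid: v for pid, v in values.items() if scores[pid] <= threshold}


def optimal_probeset_per_gene(values, replicate_scores, genes, replicate=None):
    """Keep, for each gene, only the probeset with the minimal quality score.

    Uses the score of one replicate if `replicate` is given, otherwise the
    average over all replicates.
    """
    def score(pid):
        per_replicate = replicate_scores[pid]
        return mean(per_replicate) if replicate is None else per_replicate[replicate]

    best = {}  # gene name -> ID of the best probeset seen so far
    for pid in values:
        gene = genes[pid]
        if gene not in best or score(pid) < score(best[gene]):
            best[gene] = pid
    kept = set(best.values())
    return {pid: v for pid, v in values.items() if pid in kept}


print(sorted(filter_by_threshold(values, 1.0)))
print(sorted(filter_by_calls(values, calls, "A", 2)))
print(sorted(filter_by_quality(values, scores, 0.3)))
print(sorted(optimal_probeset_per_gene(values, replicate_scores, genes)))
print(sorted(optimal_probeset_per_gene(values, replicate_scores, genes, replicate=0)))
```

In this made-up example, limiting the optimal-probeset calculation to the first replicate selects a
different probeset for geneA than the average over both replicates does.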
You can select more than one of these filtering methods; the selected methods are then applied one
after another to filter your data. To start the process, click the Start button.
If you are not satisfied with your data, you can always repeat filtering with stricter thresholds or other parameters.
Please note that these new filtering steps are then applied in addition to those already performed. If you
want to restart the complete process, you can reset the data with the Clear button.
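The cumulative behavior of repeated filtering can be sketched as follows. This is an illustration
only, not HALO code: the helper apply_filters, the predicate functions and the example values are
hypothetical, and the last step merely imitates what the Clear button is described as doing.

```python
def apply_filters(probesets, filters):
    """Apply each filter in order; each one only sees what the previous kept."""
    for keep in filters:
        probesets = {pid: v for pid, v in probesets.items() if keep(pid, v)}
    return probesets


# Hypothetical example data: probeset ID -> RNA values.
original = {"p1": [5.0, 6.0], "p2": [0.5, 7.0], "p3": [4.0, 4.5]}

# First run: discard probesets with a value below 1.0.
first_run = apply_filters(original, [lambda pid, v: min(v) >= 1.0])

# Second run with a stricter threshold: it operates on the already
# filtered data, so both filtering steps are in effect afterwards.
second_run = apply_filters(first_run, [lambda pid, v: min(v) >= 4.5])

# Resetting restores the complete, unfiltered data.
cleared = dict(original)

print(sorted(first_run), sorted(second_run), sorted(cleared))
```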
Saving the data
To save the data after the filtering step, click the Save button. In the pop-up dialog you can
then select the data columns to include in the output file and choose a destination and a
file name.
Subsequent steps
Filtering is an optional procedure that does not unlock any additional steps.