Example case - Using the HALO GUI
This example case demonstrates how to use the HALO GUI on the example data provided in HALO/data.
Analogous examples are given for the command line and the
Java API. The data used in the examples is part of the HALO package and can be found in the data
folder.
Loading the data
Please start the HALO GUI as described under Graphical user interface and
click OK in the greeting pop-up. You will now see the main HALO window.
For this example we want to use the data provided in the corresponding HALO/data folder.
If you are using the webstart version, please download the example files from the
HALO website.
To load the data into the application, use the Load-Data-panel, which is always shown at the top.
- Click Browse and choose the file HALO/data/Example_mouse.txt in the directory where you extracted HALO. (Please note that all your data files have to be '.txt' files.)
- In the following three popups, choose first the labels of newly transcribed RNA (E1, E2 and E3), then the labels of pre-existing RNA (U1, U2 and U3) and finally the labels of total RNA (T1, T2 and T3). Click Enter at the bottom of each popup to proceed to the next one.
- In the next popup you will be asked for the name of the column that contains
the probeset IDs. Please check the box named Column1.
- When asked what scale your data is in, you can leave the default linear scale.
- You will now be asked if you want to load additional attributes. Since we want to
load present/absent calls from the original file, please click Yes.
- Now check the labels that begin with Call_.
- When asked if these labels are present calls, confirm with Yes.
- Click Load to complete the data loading.
- If you have chosen the correct labels, HALO will now inform you that it has loaded
31,451 probesets from the data file.
- We now want to load the gene name attribute from the original data file via the Add attributes/sequences button. Choose Attributes from the original data file and check Gene Symbol in the following popup. When asked if these labels are present calls, click No and the attribute will be loaded.
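The label choices made in the popups above can be illustrated with a minimal Python sketch. It is not part of HALO (whose programmatic interface is the Java API); `group_labels` and `read_header` are hypothetical helpers, and the sketch assumes that Example_mouse.txt is tab-separated with a single header row and uses the label prefixes E, U, T and Call_ seen in this example.

```python
import csv


def read_header(path):
    # Assumption: the data file is tab-separated with one header row.
    with open(path, newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"))


def group_labels(header):
    """Sort column labels into the groups chosen in the loading popups.

    Assumed conventions of the example file: E<n> = newly transcribed,
    U<n> = pre-existing, T<n> = total RNA, Call_* = present/absent calls.
    Other columns (probeset IDs, Gene Symbol, ...) are ignored.
    """
    groups = {"newly": [], "preexisting": [], "total": [], "calls": []}
    prefixes = {"E": "newly", "U": "preexisting", "T": "total"}
    for label in header:
        if label.startswith("Call_"):
            groups["calls"].append(label)
        elif label[:1] in prefixes and label[1:].isdigit():
            groups[prefixes[label[0]]].append(label)
    return groups
```

For example, `group_labels(read_header("HALO/data/Example_mouse.txt"))` would list the columns to tick in each popup, provided the file follows the assumed layout.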
Filtering the data
We now want to filter the data according to a numerical threshold and present/absent calls.
- Click the menu called Filter data in the menu bar to open the submenu.
- Choose the option Threshold. A popup will ask you for the threshold. Enter 50 and click OK.
- Start the filtering by clicking Start. HALO will inform you that 11,031 probesets are left.
- We also want to filter according to present/absent calls. For this, click the
Present/Absent calls option. You will be asked which labels describe these calls, so choose
every label beginning with Call_.
You will then be asked for a call and a threshold. As the call we use A
for absent; the threshold can be left at the default value 1. This removes all probesets with at least 1 absent call.
- Start the filtering by clicking Start. HALO will inform you that 10,984 probesets are left.
- We want to filter again according to present/absent calls. For this, repeat the procedure
from two steps above and enter M (marginal) as the call.
- Click Start again. There should now be 10,937 probesets left.
- Finally, we use the present/absent filter a third time to remove probesets that have no annotated gene. For this, repeat the present/absent filtering, but this time choose Gene Symbol instead of
the present/absent calls as the relevant attribute and enter --- as the call.
This removes all probesets that have no gene name at all.
- Click Start. There should be 10,731 probesets left.
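The three kinds of filters used above can be sketched in Python on probesets stored as dictionaries keyed by column label. This is an illustration only, not HALO code; `threshold_filter` and `call_filter` are hypothetical names. In particular, the rule that a probeset passes the threshold filter only if every selected value is at least the threshold is an assumption, since the text does not say how HALO combines the values.

```python
def threshold_filter(rows, value_labels, threshold):
    # Assumption (not confirmed for HALO): keep a probeset only if every
    # selected expression value is at least `threshold`.
    return [r for r in rows if all(r[label] >= threshold for label in value_labels)]


def call_filter(rows, call_labels, call, min_count=1):
    # Remove probesets in which `call` occurs in at least `min_count` of the
    # given columns (min_count=1 matches the default threshold of 1 above).
    return [
        r for r in rows
        if sum(r[label] == call for label in call_labels) < min_count
    ]
```

The same `call_filter` covers the gene annotation step: `call_filter(rows, ["Gene Symbol"], "---")` drops probesets without a gene name, mirroring how the GUI reuses the present/absent filter for that purpose.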
Normalizing the data
Before calculating half-lives we have to normalize the data.
- Go to the Normalization menu in the menu bar and choose Normalization in the submenu.
- Click Normalization method to choose a method. In the popup, choose Linear Regression
(since you don't know the median half-life yet) and click OK.
- You will now be able to choose a method for ratio calculation.
You can leave the default and click OK.
- Start normalization by clicking on Start.
- The reported correction factors should be:
c_l: 0.11605928227524738
c_u: 0.8326610520522192
c_lu: 0.13938358470016307
- You can now also plot the linear regression line against the unnormalized data by clicking
Plot. This may take a moment.
- You can choose to save the figure or the plotting file.
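To give an intuition for what the correction factors mean, here is a hedged NumPy sketch of a linear regression normalization. It assumes that, per probeset, total RNA is modeled as T ≈ c_l·E + c_u·U and that c_l and c_u are fitted by ordinary least squares without an intercept. This model and the helper `fit_correction_factors` are assumptions for illustration; HALO's actual fit (for example its scale or replicate handling) may differ.

```python
import numpy as np


def fit_correction_factors(E, U, T):
    """Least-squares fit of the assumed model T ≈ c_l * E + c_u * U (no intercept).

    E, U, T: 1-D arrays of newly transcribed, pre-existing and total
    RNA values for the same probesets. Returns (c_l, c_u, c_lu).
    """
    X = np.column_stack([E, U])
    (c_l, c_u), *_ = np.linalg.lstsq(X, T, rcond=None)
    # c_lu expresses the newly transcribed factor relative to the pre-existing one
    return c_l, c_u, c_l / c_u
```

The reported values above are consistent with c_lu = c_l / c_u: 0.11605928227524738 / 0.8326610520522192 ≈ 0.139384, which matches the reported c_lu.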
Calculating the median half-life
Based on the normalized data we can now calculate the median half-life.
- Go to the menu bar and choose Half-Life. In the submenu that opens, choose Calculate median half-life.
- You will be asked for a labeling time; enter 55.
- The median half-life will now be calculated. The result should be 320.3161021500094.
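The relation between a newly transcribed/total ratio and a half-life can be sketched under a first-order decay assumption: after labeling time t, the pre-existing fraction is 1 - r = e^(-λt), so t_1/2 = ln 2 / λ = -t·ln 2 / ln(1 - r). The Python below is an illustrative sketch of that model, not HALO's implementation; `half_life` and `median_half_life` are hypothetical helpers, and the input ratios are assumed to be already normalized.

```python
import math
import statistics


def half_life(ratio, labeling_time):
    """Half-life from a normalized newly transcribed/total ratio.

    Assumes first-order decay: 1 - ratio = exp(-lambda * t), hence
    t_1/2 = -t * ln(2) / ln(1 - ratio).
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    return -labeling_time * math.log(2) / math.log(1.0 - ratio)


def median_half_life(ratios, labeling_time):
    # Median of the per-probeset half-lives
    return statistics.median([half_life(r, labeling_time) for r in ratios])
```

As a sanity check of the formula, a ratio of 0.5 means half of the RNA was produced during the labeling time, so `half_life(0.5, 55)` gives 55 (up to rounding).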
Filtering with probeset quality scores
Before starting the half-life calculation we want to filter again according to the probeset quality scores
(PQS). If you want to compare these scores before and after filtering, you can calculate them
via the menu bar under Quality control → Calculate the probeset quality control scores.
If you compare the histogram produced there with the histogram generated after filtering, you should see
a marked decrease in the number of probesets with scores higher than 1.
- Go back to the filtering panel.
- We want to filter the data using the probeset quality scores. This method requires
gene names as an attribute, which we have already loaded here.
- Choose Probeset quality score - optimal probeset in the filtering panel.
This method can only be performed after normalization. If normalization by linear regression has not been performed yet, it will be done when you click the optimal probeset option.
- You will be asked to provide the label of the gene names. By default this is gene_name,
but we want to change it, so choose Gene Symbol in the following popup.
- You can enter a replicate now, but we will leave this field empty. Also, we don't want
to save or plot probeset quality scores, so click No when asked. Afterwards click
Start; you should be informed that there are now 7,208 probesets.
- We can save the results by clicking Save. Choose a location and save
the filtered probesets under the name Examples_mouse_filtered_pqs.txt.
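The "optimal probeset" option keeps one probeset per gene, which is why the gene name attribute is required. A minimal Python sketch of that selection step, assuming the quality scores have already been computed and that a lower score indicates a better probeset (as suggested by the histogram comparison above). How HALO computes the scores themselves is not shown here, and `optimal_probesets` is a hypothetical helper.

```python
def optimal_probesets(rows, gene_label, score_label):
    """Keep, for each gene, the probeset with the lowest quality score.

    rows: list of dicts, one per probeset.
    Assumption: lower scores mean better probesets.
    """
    best = {}
    for row in rows:
        gene = row[gene_label]
        if gene not in best or row[score_label] < best[gene][score_label]:
            best[gene] = row
    return list(best.values())
```

Collapsing several probesets per gene to a single one is consistent with the drop from 10,731 to 7,208 probesets reported above.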
Calculating half-lives
We are now ready to calculate our half-lives. We will demonstrate two normalization methods as the basis for half-life calculation:
linear regression and normalization based on the median half-life.
- First, we want to calculate the half-lives with the method based on Newly transcribed/Total RNA, using data normalized with the linear regression method. You can open the Half-life panel via Ctrl+H or via the Half-Life →
Half-Life calculation menu.
- Click the Add calculation button in the Half-life panel. Enter 55 as labeling time and choose Newly transcribed/Total based as the method. Confirm your choice and click Start, and HALO will calculate your half-lives.
- You can now save them by clicking on the Save button.
- Now we want to calculate the half-lives for data normalized with the median half-life. For this we have to repeat the normalization: click
Normalization, confirm that you want to repeat it and choose Median half-life based.
Now enter the median that we calculated earlier: 320.3161021500094.
You will be asked for a half-life calculation method, so choose Newly transcribed/Total based,
enter 55 as labeling time, confirm your choice and start the normalization.
The resulting correction factor should be c_l: 0.11748556305145565.
- When you now go to the Half-life panel, you can only click Start, because
you already chose the half-life method during normalization. Click Start and HALO will
calculate your half-lives.
- You can now save your half-lives individually with attributes or all together, and also plot them.
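One way to see how a given median half-life can determine a correction factor: under the first-order decay model t_1/2 = -t·ln 2 / ln(1 - r), the median half-life t_m corresponds to the ratio r_m = 1 - 2^(-t/t_m). Scaling the raw newly transcribed/total ratios by c_l = r_m / median(E/T) then makes the median half-life equal t_m. This is a hedged sketch of the idea under that assumed model, not HALO's documented procedure, and the helper names are hypothetical.

```python
import math
import statistics


def half_life(ratio, labeling_time):
    # Assumed first-order decay model: t_1/2 = -t * ln(2) / ln(1 - ratio)
    return -labeling_time * math.log(2) / math.log(1.0 - ratio)


def median_based_c_l(raw_ratios, labeling_time, median_hl):
    """Hypothetical helper: scale factor mapping the median raw ratio to median_hl.

    raw_ratios: uncorrected newly transcribed/total ratios (E/T).
    """
    # Ratio that corresponds to the median half-life: 1 - r = 2 ** (-t / t_median)
    target_ratio = 1.0 - 2.0 ** (-labeling_time / median_hl)
    return target_ratio / statistics.median(raw_ratios)
```

Because the half-life decreases monotonically with the ratio, the median probeset keeps its rank after scaling, so for an odd number of probesets the median corrected half-life equals `median_hl` exactly (up to rounding).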
Quality control
We can use RNA sequences to evaluate the quality of our input data. This is based on a comparison of
the number of uracils per sequence with the logarithm of a chosen ratio, e.g. newly transcribed/total.
- First load a multiple FASTA file containing the sequences corresponding to your
data. For this, go to the data panel and click Add attributes/sequences. Then choose
Multiple Fasta file and locate the file sequences_mouse.txt in your file system. HALO will ask you which of a series of labels is the gene name.
Choose the correct label (in our case the third label) and confirm.
- To map sequences to probesets, the gene name attribute needs to be specified. By default this is
gene_name, but you can change this via the Settings menu or directly
after loading your sequences. Since we already set the gene name attribute to Gene Symbol
when filtering probesets by probeset quality score, you will not be asked for the label now. If you skipped the probeset quality step, you will be asked for the label; choose Gene Symbol
from the list.
- When you have loaded your sequences, go to the menu bar and choose Quality control.
Under Calculation of Uracil number you can perform the quality control.
- Choose a ratio for your comparison. We want to use newly transcribed to total, so click
log(e'/n') and confirm your choice.
- When asked if you want to save the plotting data, choose Yes and save the file
under the name Example_mouse_quality-scores.txt.
- The evaluation will start automatically, and you will be shown a plot and the results:
Average uracil number: 1279.1442745032682
Average log(e'/n'): -0.11168352570969527
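The two averages reported by the quality control can be illustrated with a small Python sketch: parse a multiple FASTA file, count uracils per sequence, and average both the counts and the log ratios over the probesets that have a sequence. Assumptions, all hypothetical rather than taken from HALO: sequences may be written in the DNA alphabet, so T is counted as U; the logarithm is natural; and sequences and ratios are keyed by the same identifier, whereas HALO maps them via the gene name attribute.

```python
import math
import statistics


def read_fasta(lines):
    """Parse multiple FASTA lines into {header: sequence}."""
    sequences, header, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                sequences[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        sequences[header] = "".join(chunks)
    return sequences


def uracil_count(seq):
    # Assumption: DNA-alphabet sequences are allowed, so T is counted as U.
    s = seq.upper()
    return s.count("U") + s.count("T")


def qc_summary(sequences, ratios):
    """Average uracil number and average natural-log ratio over shared keys."""
    keys = [k for k in sequences if k in ratios]
    avg_u = statistics.mean([uracil_count(sequences[k]) for k in keys])
    avg_log = statistics.mean([math.log(ratios[k]) for k in keys])
    return avg_u, avg_log
```

With real data, `read_fasta(open("sequences_mouse.txt"))` would supply the sequences, assuming the file is plain multiple FASTA text.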