Example case - Using the HALO GUI

This example case demonstrates the use of the HALO GUI on the example data provided in HALO/data, which is part of the HALO package. Analogous examples are given for the command line and the Java API.

Table of contents:

Loading the data
Filtering the data
Normalizing the data
Calculating the median half-life
Filtering with probeset quality scores
Calculating half-lives
Quality control
Bias correction

Loading the data

Please start the HALO GUI as described under Graphical user interface and click OK in the greeting pop-up. You will now see the HALO GUI. For this example we want to use the data provided in the corresponding HALO/data folder. If you are using the webstart version, please download the example files from the HALO website.
To load the data into the application, use the Load-Data-panel, which is always shown at the top.

Click Browse and choose the file HALO/data/Example_mouse.txt in the directory where you extracted HALO. (Please note that all your data files have to be '.txt' files.)
In the following three popups, first choose the labels of newly transcribed RNA (E1, E2 and E3), then the labels of pre-existing RNA (U1, U2 and U3) and finally the labels of total RNA (T1, T2 and T3). You will be taken to the next popup when you click on Enter at the bottom of the popup.
In the next popup you will be asked to define the name of the column that contains the probeset ids. Please check the box named Column1.
When asked what scale your data is in, you can leave the default linear scale.
You will now be asked if you want to load additional attributes. Since we want to load present/absent calls from the original file, please click Yes.
Now you can check the labels that begin with Call_.
When asked if these labels are present calls, confirm with Yes.
Click Load to complete the data loading.
If you have chosen the correct labels, you will now be informed that HALO has loaded 31,451 probesets from the data file.
We now want to load the gene name attribute from the original data file via the Add attributes/sequences button. Choose Attributes from the original data file and check Gene Symbol in the following popup. When asked if these labels are present calls, click No and the attribute will be loaded.
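
For readers who want to inspect the same kind of file outside the GUI, the column selection described above can be mirrored in a few lines of Python. This is only a sketch and does not use any HALO API: it assumes the data file is tab-delimited with a header row, and the two-row table below is invented for illustration. Only the label names (Column1, E1, U1, T1, the Call_ prefix and Gene Symbol) are taken from the steps above.

```python
import csv
import io

# Hypothetical two-row excerpt; the real Example_mouse.txt layout may differ.
SAMPLE = (
    "Column1\tE1\tU1\tT1\tCall_E1\tGene Symbol\n"
    "p1\t120.5\t300.0\t410.2\tP\tActb\n"
    "p2\t15.0\t40.1\t52.3\tA\t---\n"
)

def load_table(handle, id_column="Column1"):
    """Read a tab-delimited table into {probeset_id: row_dict}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_column]: row for row in reader}

def columns_with_prefix(table, prefix):
    """Return the column labels that start with the given prefix."""
    first_row = next(iter(table.values()))
    return [name for name in first_row if name.startswith(prefix)]

table = load_table(io.StringIO(SAMPLE))
call_columns = columns_with_prefix(table, "Call_")
print(len(table), call_columns)
```

Selecting the call columns by prefix corresponds to checking every label that begins with Call_ in the popup.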

Filtering the data
We now want to filter the data according to a numerical threshold and present/absent calls.

Click the menu called Filter data in the menu bar to open the sub menu.
Choose the option Threshold. A popup will ask you for your chosen threshold. You can now enter 50 and click OK.
Now you can start the filtering by clicking on Start. HALO will inform you that you now have 11,031 probesets left.
We also want to filter according to present/absent calls. For this you can click the Present/Absent calls option. You will be asked which labels describe these calls, so choose every label beginning with Call_.
After this you will be asked for a call and a threshold. As a call we want to use A for absent, threshold can be left at the default value 1. This will remove all probe sets with at least 1 absent call.
Now you can start the filtering by clicking on Start. HALO will inform you that you now have 10,984 probesets left.
We want to filter again according to present/absent calls. For this you can repeat the present/absent filtering described above and enter M as the call.
Click Start again. There should now be 10,937 probesets.
We want to filter a third time to remove those probesets that have no annotated gene. For this we repeat the present/absent filtering, but this time we choose Gene Symbol instead of the present/absent calls as the relevant attribute and enter --- as the call, thereby removing all probesets that have no gene name at all.
Click Start. There should be 10,731 probesets left.
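
The filter chain above can be sketched as a set of simple predicates. The rule for the call filter follows the description above (a probeset is removed when at least the given number of its calls match). The rule for the numerical threshold is an assumption, since this page does not state how HALO applies it; here a probeset is kept only if all of its values reach the threshold. The probesets are invented for illustration.

```python
def passes_threshold(values, threshold):
    """Assumed rule: keep a probeset only if every value reaches the threshold."""
    return all(v >= threshold for v in values)

def passes_call_filter(calls, call, min_count=1):
    """Keep a probeset unless at least min_count of its calls equal `call`."""
    return sum(1 for c in calls if c == call) < min_count

# Hypothetical probesets: (expression values, present/absent calls, gene symbol)
probesets = {
    "p1": ([120.0, 300.0, 410.0], ["P", "P", "P"], "Actb"),
    "p2": ([15.0, 40.0, 52.0], ["P", "P", "P"], "Gapdh"),
    "p3": ([90.0, 200.0, 260.0], ["P", "A", "P"], "Myc"),
    "p4": ([80.0, 150.0, 210.0], ["P", "M", "P"], "Fos"),
    "p5": ([70.0, 160.0, 220.0], ["P", "P", "P"], "---"),
}

kept = {
    pid
    for pid, (values, calls, gene) in probesets.items()
    if passes_threshold(values, 50)          # numerical threshold
    and passes_call_filter(calls, "A")       # no absent calls
    and passes_call_filter(calls, "M")       # no calls marked M
    and passes_call_filter([gene], "---")    # gene symbol must be annotated
}
print(sorted(kept))
```

Reusing the call filter for the gene symbol mirrors the GUI, where the same present/absent dialog is applied to the Gene Symbol attribute.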

Normalizing the data
Before calculating half-lives we have to normalize the data.

Go to the Normalization menu in the menu bar, and choose Normalization in the sub menu.
Click Normalization method to choose a method. In the pop-up you should choose Linear Regression (since you don't know the median half-life yet) and click OK.
You will now be provided with the possibility to choose a method for ratio calculation. You can leave this as default and click OK.
Start normalization by clicking on Start.
These should be your reported correction factors:
c_l: 0.11605928227524738 c_u: 0.8326610520522192 c_lu: 0.13938358470016307
You can now also plot the linear regression line compared to the unnormalized data by clicking on Plot. This procedure may take a short moment.
You can choose to save the figure or the plotting file.
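
Note that the reported factors satisfy c_lu = c_l / c_u (0.11606 / 0.83266 ≈ 0.13938). The sketch below shows one way such factors could be obtained from a linear regression. It assumes the model c_l * E/T + c_u * U/T = 1 for each probeset, so that regressing U/T on E/T gives an intercept of 1/c_u and a slope of -c_l/c_u. This model is an assumption about the method and is not taken from this page; the ratios are synthetic and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def correction_factors(e_over_t, u_over_t):
    """Assumed model: c_l * E/T + c_u * U/T = 1 for every probeset."""
    slope, intercept = fit_line(e_over_t, u_over_t)
    c_u = 1.0 / intercept
    c_l = -slope * c_u
    return c_l, c_u

# Synthetic ratios generated from known factors, so the fit should recover them.
true_c_l, true_c_u = 0.12, 0.83
e_over_t = [0.5, 1.0, 1.5, 2.0, 3.0]
u_over_t = [(1.0 - true_c_l * x) / true_c_u for x in e_over_t]

c_l, c_u = correction_factors(e_over_t, u_over_t)
print(round(c_l, 6), round(c_u, 6), round(c_l / c_u, 6))
```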

Calculating the median half-life
Based on the normalized data we can now calculate the median half-life.

Go to the menu bar and choose Half-Life. In the opening sub menu choose Calculate median half-life.
You will be asked for a labeling time, where you can enter 55.
Your median half-life will now be calculated. The result should be 320.3161021500094.
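
As a rough illustration of what this step computes, the sketch below derives a half-life for each probeset from its normalized newly transcribed/total ratio and then takes the median. It assumes the commonly used relation t1/2 = -t * ln(2) / ln(1 - ratio), where t is the labeling time; whether HALO computes the median in exactly this way is not stated here. The ratios are invented, and only the labeling time of 55 comes from the text.

```python
import math
from statistics import median

def half_life(ratio_new_total, labeling_time):
    """Half-life from the normalized newly transcribed/total ratio.

    Assumed model: the fraction of RNA made during labeling is
    1 - 2 ** (-labeling_time / half_life).
    """
    if not 0.0 < ratio_new_total < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    return -labeling_time * math.log(2) / math.log(1.0 - ratio_new_total)

# Hypothetical normalized ratios (c_l * E/T) for five probesets.
ratios = [0.05, 0.08, 0.11, 0.20, 0.35]
half_lives = [half_life(r, 55) for r in ratios]
print(round(median(half_lives), 1))
```

Under this model a ratio of 0.5 gives a half-life equal to the labeling time, which is a quick sanity check for the formula.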

Filtering with probeset quality scores
Before starting the half-life calculation we want to filter again, this time according to the probeset quality scores (PQS). If you want to compare these scores before and after filtering, you can calculate them via the menu bar: choose Quality control and then Calculate the probeset quality control scores. If you compare the histogram produced here with the histogram generated after filtering, you should see a marked decrease in the number of probesets with scores higher than 1.

Go back to the filtering sub panel.
We want to filter the data using the probeset quality scores. This method requires gene names as an attribute, which we have already loaded here.
Choose Probeset quality score - optimal probeset in the filtering panel. This method can only be performed after normalization! If normalization by linear regression was not performed yet, it will be done when you click the optimal probe set option.
You will be asked to provide the label of the gene names. By default this is gene_name, but we want to change it and choose Gene Symbol in the following popup.
You can enter a replicate now, but we will leave this field empty. Also, we don't want to save or plot probeset quality scores, so click No when asked. Afterwards you can click Start and should be informed that there are now 7,208 probesets.
We can save the results by clicking on Save. Please choose a location and save the filtered probesets under the name Examples_mouse_filtered_pqs.txt.

Calculating half-lives
We are now ready to calculate our half-lives. We will demonstrate half-life calculation on data normalized in two ways: with linear regression and with the median half-life.

First, we want to calculate the half-lives with the method based on Newly transcribed/Total RNA and data normalized with the linear regression method. You can reach the Half-life panel via Ctrl+H or via the Half-Life → Half-Life calculation menu.
For this click on the Add calculation button in the Half-life panel. Enter 55 as labeling time and choose Newly transcribed/Total based as method. Confirm your choice and click Start and HALO will calculate your half-lives.
You can now save them by clicking on the Save button.
Now we want to calculate the half-lives for data normalized with the median half-life. For this we have to repeat the normalization: Click on Normalization, confirm that you want to repeat it and choose Median half-life based. Now enter the median that we calculated earlier: 320.3161021500094. You will be asked to enter a half-life calculation method, so choose Newly transcribed/Total based, enter 55 as labeling time, confirm your choice and start normalization. The resulting correction factor should be c_l: 0.11748556305145565.
When you now go to the Half-life panel, you can only click Start. This is because you already chose half-life methods when using normalization. Click Start and HALO will calculate your half-lives.
You can now save your half-lives individually with attributes or all together, and also plot them.
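
The median-based normalization can be read as the inverse of the half-life calculation: the correction factor is chosen so that the probeset with the median ratio ends up with the given median half-life. The sketch below illustrates this idea under the assumption that the newly transcribed fraction after labeling is 1 - 2^(-t / t1/2). It is not HALO's implementation, the function names are hypothetical, and the E/T ratios are invented; only the labeling time 55 and the median half-life are taken from the steps above.

```python
from statistics import median

def expected_new_fraction(half_life, labeling_time):
    """Newly transcribed fraction of total RNA after labeling (assumed decay model)."""
    return 1.0 - 2.0 ** (-labeling_time / half_life)

def median_based_c_l(e_over_t, median_half_life, labeling_time):
    """Choose c_l so that the median E/T ratio maps onto the median half-life."""
    target = expected_new_fraction(median_half_life, labeling_time)
    return target / median(e_over_t)

# Hypothetical unnormalized E/T ratios for five probesets.
e_over_t = [0.4, 0.7, 0.95, 1.3, 2.1]
c_l = median_based_c_l(e_over_t, 320.3161021500094, 55)
print(c_l)
```

With real data the resulting factor depends on the probesets that remain after filtering, so it will generally differ from the value obtained by linear regression.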

Quality control
We can use RNA sequences to evaluate the quality of our input data. This is based on a comparison of the number of uracils per sequence with the logarithm of a certain ratio, e.g. newly transcribed/total.

You first have to load a multiple fasta file containing the sequences corresponding to your data. For this go to the data panel and click on Add attributes/sequences. Then choose Multiple Fasta file and locate the file sequences_mouse.txt in your file system. HALO will ask you which of a series of labels is the gene name. Choose the correct label (in our case the third label) and confirm.
To map sequences to probe sets, the gene name attribute needs to be specified. By default this is gene_name, but you can change this via the Settings-menu or directly after loading your sequences. Since we already set the gene name attribute to Gene Symbol when filtering probe sets according to the probe set quality score, you will not be asked for the label now. If you skipped the probe set quality step, you will be asked for the label, so choose Gene Symbol from the following list.
When you have loaded your sequences, go to the menu bar and choose Quality control. Under Calculation of Uracil number you can perform the quality control.
Choose a ratio for your comparison. We want to use newly transcribed to total, so please click on log(e'/n') and confirm your choice.
When asked if you want to save the plotting data choose Yes and save the file under the name Example_mouse_quality-scores.txt.
The evaluation will start automatically and you will be prompted with a plot and the results:
Average uracil number: 1279.1442745032682 Average log(e'/n'): -0.11168352570969527
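
The two averages reported above can be illustrated with a small sketch that counts uracils per sequence and takes the logarithm of a ratio per gene. Several points are assumptions: T is counted as U because fasta files often store cDNA sequences, the natural logarithm is used because the base is not stated here, and the sequences and ratios are invented.

```python
import math

def uracil_count(sequence):
    """Count uracils; T is treated as U because fasta files often store cDNA."""
    seq = sequence.upper()
    return seq.count("U") + seq.count("T")

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Hypothetical sequences and normalized ratios keyed by gene symbol.
sequences = {"GeneA": "AUGUUUCCAU", "GeneB": "ATGCCGTA", "GeneC": "GGGCCC"}
ratios = {"GeneA": 0.9, "GeneB": 1.0, "GeneC": 1.2}  # stands in for e'/n'

uracils = [uracil_count(sequences[g]) for g in sequences]
log_ratios = [math.log(ratios[g]) for g in sequences]  # natural log assumed
print(mean(uracils), mean(log_ratios))
```

Plotting the log ratios against the uracil counts would then show whether the ratio depends on uracil content.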
