Command Line Use Case



Important note: This use case demonstrates the command line tool with a simple example and covers most of the tool's features. Demonstrations with the same steps and example values are also available for the GUI and the API.
All variable parameters (methods, thresholds, etc.) used in this example were chosen arbitrarily and serve only as illustration. For practical use, these parameters have to be chosen carefully, depending on your data and goals.



Loading and filtering the data

This step shows you how to load and filter data from an example file provided with the HALO package and how to save the filtered data. You need to define the labels of the input columns (-ct for total, -cp for pre-existing, -cn for newly transcribed RNA, -ca for attributes), the input file and the filtering methods. We filter the data with a numerical threshold of 50 and by present/absent calls, using two different calls: 'A' and 'M'. In addition, we apply the present/absent call method to the gene names with the call '---' to filter out the probesets without annotated genes. Because the attributes we load are present/absent calls, we set -pc to TRUE to speed up the process.

You will use the following parameters (for a detailed explanation of the parameters, see here):

-i data/Example_mouse.txt
-ct T1,T2,T3
-cp U1,U2,U3
-cn E1,E2,E3
-ca Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3
-pc TRUE
-f threshold=50
-f present=Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3:A:1
-f present=Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3:M:1
-f present=Gene~Symbol:---:1
-of Example_mouse_filtered.txt

Your complete call will look like this:

-i data/Example_mouse.txt -ct T1,T2,T3 -cp U1,U2,U3 -cn E1,E2,E3 -ca Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3 -pc TRUE -f threshold=50 -f present=Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3:A:1 -f present=Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3:M:1 -f present=Gene~Symbol:---:1 -of Example_mouse_filtered.txt
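
Because the list of call columns is repeated several times in this call, it can be convenient to generate the arguments with a small script. The following is a minimal sketch in Python that only assembles the argument string shown above; the function name build_filter_args is hypothetical, and the sketch does not start HALO itself, since the launcher command is not part of this example.

```python
# Sketch: assemble the filtering arguments from the column labels.
# Only flags that appear in the example above are used.

TOTAL = ["T1", "T2", "T3"]   # total RNA columns (-ct)
PRE = ["U1", "U2", "U3"]     # pre-existing RNA columns (-cp)
NEW = ["E1", "E2", "E3"]     # newly transcribed RNA columns (-cn)


def build_filter_args(input_file, output_file, threshold=50, calls=("A", "M")):
    """Return the argument list for the loading and filtering step."""
    # One present/absent call column per measurement column.
    call_cols = ",".join(f"Call_{c}" for c in TOTAL + PRE + NEW)
    args = [
        "-i", input_file,
        "-ct", ",".join(TOTAL),
        "-cp", ",".join(PRE),
        "-cn", ",".join(NEW),
        "-ca", call_cols,
        "-pc", "TRUE",
        "-f", f"threshold={threshold}",
    ]
    # One present/absent filter per call value.
    for call in calls:
        args += ["-f", f"present={call_cols}:{call}:1"]
    # Filter out probesets without an annotated gene, then set the output file.
    args += ["-f", "present=Gene~Symbol:---:1", "-of", output_file]
    return args


args = build_filter_args("data/Example_mouse.txt", "Example_mouse_filtered.txt")
print(" ".join(args))
```

The printed string can be appended to whatever command starts the tool on your system. If any value contained characters that are special to your shell, shlex.join from the Python standard library would be a safer choice than a plain join.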



Normalizing the data and calculating the half-lives

In the next step you normalize your data and calculate the half-lives. You need all previous parameters (except for the output) to load and filter the data before normalization. In addition, you have to define the normalization method (in this case standard, for linear regression), the half-life calculation methods and a labeling time, as well as several flags that are required for saving the results.
This example uses two half-life methods, one based on newly transcribed/total RNA and one based on pre-existing/total RNA, with a labeling time of 55.
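
To give an idea of what the two ratio-based methods compute, the sketch below assumes simple first-order RNA decay at steady state and ignores cell growth. Under these assumptions, the pre-existing/total ratio after labeling time t equals exp(-k t) and the newly transcribed/total ratio equals 1 - exp(-k t), with half-life ln(2)/k. This is a textbook formulation and not necessarily the exact calculation HALO performs; the function names are hypothetical, and the ratios are assumed to be normalized already.

```python
import math


def half_life_from_new(new_total_ratio, labeling_time):
    """Half-life from the normalized newly transcribed/total ratio.

    Assumes new/total = 1 - exp(-k * t), so the ratio must lie in (0, 1).
    """
    return -labeling_time * math.log(2) / math.log(1.0 - new_total_ratio)


def half_life_from_pre(pre_total_ratio, labeling_time):
    """Half-life from the normalized pre-existing/total ratio.

    Assumes pre/total = exp(-k * t), so the ratio must lie in (0, 1).
    """
    return -labeling_time * math.log(2) / math.log(pre_total_ratio)


# With a labeling time of 55, a transcript for which half of the total RNA
# is newly transcribed has a half-life equal to the labeling time.
t = 55
print(round(half_life_from_new(0.5, t), 6))
print(round(half_life_from_pre(0.5, t), 6))
```

The result is expressed in the same unit as the labeling time. Ratios outside the open interval (0, 1) have no valid half-life under this model, which is one reason the filtering and normalization steps come first.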

Parameters:

-i data/Example_mouse.txt
-ct T1,T2,T3
-cp U1,U2,U3
-cn E1,E2,E3
-ca Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3
-pc TRUE
-f threshold=50
-f present=Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3:A:1
-f present=Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3:M:1
-f present=Gene~Symbol:---:1
-l standard
-h1 new
-h2 pre
-t 55
-o Example_mouse_halflives.txt
-w halflife
-m new,pre
-plot TRUE

Your complete call will look like this:

-i data/Example_mouse.txt -ct T1,T2,T3 -cp U1,U2,U3 -cn E1,E2,E3 -ca Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3 -pc TRUE -f threshold=50 -f present=Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3:A:1 -f present=Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3:M:1 -f present=Gene~Symbol:---:1 -of Example_mouse_filtered.txt -l standard -h1 new -h2 pre -t 55 -o Example_mouse_halflives.txt -w halflife -m new,pre -plot TRUE



Quality control

The command line tool lets you assess the quality of your data if you have sequence information. The example below shows you how to load a sequence file (with the flag -uf) and how to load separate attributes from the original file (with the flag -ca2). Please note that any whitespace characters in our values have to be replaced with '~' for the program to run correctly, so we call -ca2 with the label Gene~Symbol instead of Gene Symbol. You also have to define the column of the FASTA header that contains the gene name, and a ratio (e.g. log(e'/n')) that is compared with the uracil numbers. We will also print the probeset quality score to a file called Example_mouse_quality.txt and plot a histogram of the preceding quality control (with the flag -pp).
In addition, we will use the probeset quality score to find the best probeset for each gene.
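
The whitespace replacement described above is easy to get wrong by hand when many labels are involved. A minimal sketch in Python (the helper name escape_label is hypothetical) that replaces every whitespace character in a label with '~':

```python
import re


def escape_label(label):
    """Replace each whitespace character in a column label with '~'."""
    return re.sub(r"\s", "~", label)


print(escape_label("Gene Symbol"))  # Gene~Symbol
```

Labels without whitespace, such as Call_T1, pass through unchanged.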

You will use the following parameters (for a detailed explanation of the parameters, see here):

-i data/Example_mouse.txt
-ct T1,T2,T3
-cp U1,U2,U3
-cn E1,E2,E3
-ca Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3
-pc TRUE
-ca2 Gene~Symbol
-genelabel Gene~Symbol
-f threshold=50
-f present=Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3:A:1
-f present=Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3:M:1
-f present=Gene~Symbol:---:1
-f pqs=min
-uf data/sequences_mouse.txt
-uc 3
-ur "log(e'/n')"
-pqs Example_mouse_quality.txt
-pp TRUE

Your complete call will look like this:

-i data/Example_mouse.txt -ct T1,T2,T3 -cp U1,U2,U3 -cn E1,E2,E3 -ca Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3 -pc TRUE -ca2 Gene~Symbol -genelabel Gene~Symbol -f threshold=50 -f present=Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3:A:1 -f present=Call_T1,Call_T2,Call_T3,Call_U1,Call_U2,Call_U3,Call_E1,Call_E2,Call_E3:M:1 -f present=Gene~Symbol:---:1 -f pqs=min -uf data/sequences_mouse.txt -uc 3 -ur "log(e'/n')" -pqs Example_mouse_quality.txt -pp TRUE


Output

The output produced by HALO should look like this:

Data loading and filtering

Reading data...
Loading data...
Done loading data.
You have 31451 probesets.
------------------------------
Done reading data
Filtering data...
Filtering data...
Done filtering data.
You have 11031 probesets.
------------------------------
Filtering data...
Done filtering data.
You have 10984 probesets.
------------------------------
Filtering data...
Done filtering data.
You have 10937 probesets.
------------------------------
Filtering data...
Done filtering data.
You have 10731 probesets.
------------------------------
Done filtering data
Writing filtered data into file...
Done writing filtered data
Writing filtered data into file...
Done writing filtered data
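
To see how many probesets each filter removes, the counts can be extracted from a log like the one above. The following Python sketch assumes the log has been captured as text and that the counts always appear in lines of the form "You have N probesets."; the function name probeset_counts is hypothetical.

```python
import re

# Excerpt of the log shown above, reduced to the lines that matter here.
LOG = """\
Done loading data.
You have 31451 probesets.
Filtering data...
You have 11031 probesets.
Filtering data...
You have 10984 probesets.
Filtering data...
You have 10937 probesets.
Filtering data...
You have 10731 probesets.
"""


def probeset_counts(log_text):
    """Return all probeset counts in the order they appear in the log."""
    return [int(n) for n in re.findall(r"You have (\d+) probesets\.", log_text)]


counts = probeset_counts(LOG)
# Difference between consecutive counts = probesets removed by each filter.
removed = [before - after for before, after in zip(counts, counts[1:])]
print(counts)
print(removed)
```

In this example the numerical threshold removes by far the most probesets, while the three present/absent filters each remove comparatively few.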


Normalizing and half-life calculation

Reading data...
Loading data...
Done loading data.
You have 31451 probesets.
------------------------------
Done reading data
Filtering data...
Filtering data...
Done filtering data.
You have 11031 probesets.
------------------------------
Filtering data...
Done filtering data.
You have 10984 probesets.
------------------------------
Filtering data...
Done filtering data.
You have 10937 probesets.
------------------------------
Filtering data...
Done filtering data.
You have 10731 probesets.
------------------------------
Done filtering data
Writing filtered data into file...
Done writing filtered data
Performing normalization...
Starting linear regression...
Done with linear regression.
These are your correction factors:
c_u: 0.8326610520522192
c_l: 0.11605928227524738
------------------------------
Done with normalization
Calculating half-lives...
Starting half-life calculation...
Done calculating half-lives.
------------------------------
Starting half-life calculation...
Done calculating half-lives.
------------------------------
Done calculating half-lives
Writing results into file...
Writing results into file...
Done writing results.
Done writing results
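
The log above reports two correction factors, c_u and c_l, as the result of the linear regression. One common formulation for this kind of normalization, assumed here and not confirmed to be what HALO implements, is that for every probeset the corrected pre-existing and newly transcribed fractions add up to the total: c_l * (new/total) + c_u * (pre/total) = 1. Rearranged, pre/total is a linear function of new/total with intercept 1/c_u and slope -c_l/c_u, so both factors follow from an ordinary least-squares fit. The sketch below illustrates this on synthetic data; the function name is hypothetical.

```python
def fit_correction_factors(new_total, pre_total):
    """Estimate (c_u, c_l) by least squares, assuming
    c_l * new/total + c_u * pre/total = 1 for every probeset."""
    n = len(new_total)
    mean_x = sum(new_total) / n
    mean_y = sum(pre_total) / n
    sxx = sum((x - mean_x) ** 2 for x in new_total)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(new_total, pre_total))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    c_u = 1.0 / intercept      # intercept = 1 / c_u
    c_l = -slope / intercept   # slope = -c_l / c_u
    return c_u, c_l


# Synthetic, noise-free ratios generated with c_u = 0.8 and c_l = 0.1.
new_ratios = [0.5, 1.0, 2.0, 4.0]
pre_ratios = [(1.0 - 0.1 * x) / 0.8 for x in new_ratios]
c_u, c_l = fit_correction_factors(new_ratios, pre_ratios)
print(round(c_u, 6), round(c_l, 6))
```

With real data the points scatter around the line, and the fit is only as good as the filtering that precedes it.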


Quality control

Reading data...
Loading data...
Done loading data.
You have 31451 probesets.
------------------------------
Loading attributes...
Done loading attributes.
------------------------------
Done reading data
Evaluating data...
------------------------------
Starting quality control...
13571 probesets had to be discarded, because no sequence data was available for them.
You have 17881 probesets.
Done with quality control
------------------------------
Done evaluating
Filtering data...
Filtering data...
Done filtering data.
You have 11031 probesets.
------------------------------
Filtering data...
Done filtering data.
You have 10984 probesets.
------------------------------
Filtering data...
Done filtering data.
You have 10937 probesets.
------------------------------
Filtering data...
Done filtering data.
You have 10731 probesets.
------------------------------
Done filtering data



The following files should be produced:



