File formats

File format descriptions

Below you can find a short description of the file formats used for and produced by HALO. Note that <TAB> represents the tab character.

Overview

Input files:
Data from microarray/RNA-seq experiments
Multiple fasta files
Attribute files

Output files:
Filtered data
Half-life/Ratio values
Half-lives with attributes
Probeset quality scores
Normalization plotting file

Input files

Data from microarray or RNA-seq experiments

The data from your microarray or RNA-seq experiments can be loaded into HALO as a single '.txt'-file of the following format:


##############################################################################

#########Information that is no data has to be masked with a sharp############

##############################################################################

####First line after the commentary has to contain labels for data columns####

probeset_id<TAB>label1<TAB>label2<TAB>label3<TAB>....<TAB>labeln

After the header, the values follow, tab-separated, as in the following example:


123456<TAB>140.014<TAB>30.12<TAB>213.5<TAB>...<TAB>54.12

There can be only one probeset id per line.
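As an illustration of the layout described above, the following is a minimal Python sketch of a reader for such a file. It is not part of HALO; the function name read_halo_table is hypothetical, and the sketch assumes that every data value is numeric and that the probeset id is in the first column, as in the example.

```python
import io


def read_halo_table(handle):
    """Read a tab-separated data table whose non-data lines start with '#'.

    Returns (labels, rows): labels is the list of column labels from the
    header line, rows is a list of (probeset_id, values) tuples.
    Assumption: all data values can be parsed as floats.
    """
    labels = None
    rows = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and lines masked with a sharp
        fields = line.split("\t")
        if labels is None:
            labels = fields  # first non-comment line holds the column labels
            continue
        rows.append((fields[0], [float(v) for v in fields[1:]]))
    return labels, rows


# Usage with a small in-memory example (labels are made up for illustration).
example = (
    "#### commentary\n"
    "probeset_id\tlabel1\tlabel2\tlabel3\n"
    "123456\t140.014\t30.12\t213.5\n"
)
labels, rows = read_halo_table(io.StringIO(example))
print(labels)
print(rows)
```

Returning a list of tuples rather than a dictionary keeps the sketch free of any assumption about whether probeset ids are unique across lines.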

Multiple fasta files

Gene sequences have to be provided in a multiple fasta file in which one column of each fasta header contains the gene name. The header columns are separated by '|', as in the example entry below.


> gene name|attribute|description|another attribute

ATCGTCAGAGATTATTACAGATACATTGAGATGAGTACGATGATAATGACATG

The sequence should be provided without any newline characters. It does not matter which column of the header contains the gene name, but it has to be the same column for every entry.
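To make the role of the header column concrete, here is a minimal Python sketch that maps gene names to sequences. It is an illustration only, not HALO's own parser; the function name read_gene_sequences and the parameters name_column and separator are hypothetical, and the '|' separator is taken from the example header above.

```python
import io


def read_gene_sequences(handle, name_column=0, separator="|"):
    """Map gene names to sequences from a multiple fasta file.

    name_column is the zero-based index of the header column that holds
    the gene name; it must be the same for every entry.
    """
    sequences = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Split the header into its columns and pick the gene name.
            fields = [field.strip() for field in line[1:].split(separator)]
            name = fields[name_column]
            sequences[name] = ""
        elif name is not None:
            # The format expects one line per sequence; appending also
            # tolerates sequences that were wrapped over several lines.
            sequences[name] += line
    return sequences


example = (
    "> gene name|attribute|description|another attribute\n"
    "ATCGTCAGAGATTATTACAGATACATTGAGATGAGTACGATGATAATGACATG\n"
)
print(read_gene_sequences(io.StringIO(example), name_column=0))
```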

Attribute files

You can load additional attributes corresponding to your data. These files can contain one or more attributes, but they have to contain one column with probeset ids to map the attributes to the data. An example attribute file looks like this:


############################################################################

##########################Commentary can be given###########################

############################################################################

probeset_id<TAB>attribute1<TAB>attribute2<TAB>....<TAB>attributen

123456<TAB>gene1<TAB>A<TAB>...<TAB>1.23

The order of the columns is not important, but you have to specify the label of the probeset id column beforehand if it differs from the label used in the original data.
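Because the probeset id column is identified by its label rather than its position, an attribute file can be read by label. The following Python sketch shows one way to do this outside of HALO; the function name read_attributes and the parameter id_label are hypothetical, and the column labels in the usage example are made up.

```python
import csv
import io


def read_attributes(handle, id_label="probeset_id"):
    """Map each probeset id to a dict of its attributes.

    Lines starting with '#' are treated as commentary and skipped.
    id_label is the label of the column that holds the probeset ids.
    """
    data_lines = (
        line for line in handle
        if line.strip() and not line.startswith("#")
    )
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    attributes = {}
    for row in reader:
        probeset_id = row.pop(id_label)  # raises KeyError if the label is wrong
        attributes[probeset_id] = dict(row)
    return attributes


# The id column does not have to come first.
example = (
    "## commentary can be given\n"
    "attribute1\tprobeset_id\tattribute2\n"
    "gene1\t123456\tA\n"
)
print(read_attributes(io.StringIO(example)))
```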

Output files

Filtered data

The filtered data is in the same format as the input data. A header is written, and the columns appear in the output file in the following order: probeset id, total RNA replicates, newly transcribed RNA replicates, pre-existing RNA replicates, attributes.

Half-life/Ratio values

When you have calculated the half-lives, the program offers you three ways of saving the results: you can save only the half-lives, only the ratios on which the half-lives are based, or both.

Half-lives only or ratios only:

spotid<TAB>half-life (method1)<TAB>half-life (method2)

123456<TAB>1.23<TAB>2.34

Both half-lives and Ratios:
Here you are presented with half-life values first and ratio values second, separated by a <TAB>.

spotid<TAB>half-life (method1)<TAB>half-life (method2)<TAB>method1 (ratios)<TAB>method2 (ratios)

12345<TAB>1.23<TAB>2.34<TAB>3.45<TAB>4.56
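Since the combined file lists all half-life columns first and all ratio columns second, the two groups can be separated by position. The Python sketch below illustrates this; it is not part of HALO, the function name read_half_lives_and_ratios is hypothetical, and it assumes that the file contains exactly one ratio column per half-life column and only numeric values.

```python
import io


def read_half_lives_and_ratios(handle):
    """Read a combined half-life/ratio file.

    Returns (header, records): records maps each spot id to a dict with
    the keys 'half_lives' and 'ratios'.
    Assumption: the value columns split evenly, half-lives first.
    """
    header = None
    records = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if header is None:
            header = fields  # first line holds the column labels
            continue
        values = [float(v) for v in fields[1:]]
        half = len(values) // 2
        records[fields[0]] = {
            "half_lives": values[:half],
            "ratios": values[half:],
        }
    return header, records


example = (
    "spotid\thalf-life (method1)\thalf-life (method2)"
    "\tmethod1 (ratios)\tmethod2 (ratios)\n"
    "12345\t1.23\t2.34\t3.45\t4.56\n"
)
header, records = read_half_lives_and_ratios(io.StringIO(example))
print(records["12345"])
```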

Half-lives with attributes

You can also save the results of a single half-life calculation, combined with one or more attributes. The file format of these output files looks like this:

spotid<TAB>half-life<TAB>attr_1<TAB>...<TAB>attr_n

12345<TAB>1.23<TAB>gene_1<TAB>...<TAB>A

Probeset quality scores

You can calculate the probeset quality score for each probeset. The output is structured as follows:

spotid<TAB>value

For example:

123456<TAB>0.123

Normalization plotting file

HALO allows you not only to plot your normalization results, but also to save the plotting information in a file for plotting with a program of your choice. Normalization plots display two types of information: the data points in a scatter plot and the linear regression line. Since these two types of values seldom have overlapping x-values, the plotting file is NOT structured in the usual format

x-value<TAB>y1-value<TAB>y2-value

but is instead divided into two parts. The first part contains the data for the scatter plot, structured as x-value<TAB>y-value; the second part contains the values for the regression line, structured in the same way. A header line inserted before each part lets you distinguish between the two. The file will look something like this:

#Data points:x<TAB>y

1.2155489028028954<TAB>1.0138011284375397

2.3153839627405824<TAB>0.9239562391183962

#Regression line:x<TAB>y

0.0<TAB>0.07405405303601431

0.0010<TAB>0.07483151974610898

0.0020<TAB>0.07560898645620365

0.0030<TAB>0.07638645316629832

0.0040<TAB>0.07716391987639298

0.0050<TAB>0.07794138658648765

0.0060<TAB>0.07871885329658232

You can now go on and parse the two parts into separate files or a format fitting your needs.
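One possible way to separate the two parts is sketched below in Python. This is an illustration under the assumption that each part starts with a header line of the form shown above ('#', a name, ':', then the column labels); the function name split_plot_file is hypothetical.

```python
import io


def split_plot_file(handle):
    """Split a normalization plotting file into its named parts.

    Returns a dict mapping each part name (the header text between '#'
    and ':') to a list of (x, y) tuples.
    """
    sections = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header such as '#Data points:x<TAB>y' starts a new part.
            current = line[1:].split(":", 1)[0]
            sections[current] = []
        elif current is not None:
            x, y = line.split("\t")
            sections[current].append((float(x), float(y)))
    return sections


example = (
    "#Data points:x\ty\n"
    "1.2155489028028954\t1.0138011284375397\n"
    "2.3153839627405824\t0.9239562391183962\n"
    "#Regression line:x\ty\n"
    "0.0\t0.07405405303601431\n"
    "0.0010\t0.07483151974610898\n"
)
sections = split_plot_file(io.StringIO(example))
for name, points in sections.items():
    print(name, len(points))
```

Each list of points can then be written to its own file or passed directly to a plotting library.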
