Command line tools - Loading data

The data-loading flags described here are required for every subsequent step provided by HALO. Below is a list of the necessary and optional flags for extracting data from your data file, filtering it, evaluating it and writing it to an output file. You can combine these flags with the subsequent commands in any order.

List of necessary flags

-i inputfile A file containing expression data for newly transcribed, pre-existing and total RNA in different columns; more than 1 replicate is possible
-of outputfile The filename for the filtered data
-f filtermethod One of several filtering methods, given as 'method=value', e.g. threshold=50. For more methods see the JavaDocs. To use more than one method, add another -f flag.
-ct column labels The labels of the columns that contain expression data for total RNA; you can give only the labels of the replicates you want to use. Separate multiple labels with ','.
-cn column labels Column labels for newly transcribed RNA; for details see '-ct'
-cp column labels Column labels for pre-existing RNA; for details see '-ct'
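
The two conventions above (one -f flag per filter method, and ',' between column labels) can be sketched as a small Python helper that assembles the argument list. This is only an illustration: the function name build_load_args is hypothetical and not part of HALO, and the file names and column labels are placeholders.

```python
def build_load_args(inputfile, outputfile, filters, total, newly, preexisting):
    """Assemble the necessary loading flags as a list of strings.

    Hypothetical helper for illustration only; not part of HALO.
    """
    args = ["-i", inputfile, "-of", outputfile]
    # Each filter method gets its own -f flag, written as 'method=value'.
    for method, value in filters:
        args += ["-f", f"{method}={value}"]
    # Several column labels for one RNA type are joined with ','.
    args += ["-ct", ",".join(total)]
    args += ["-cn", ",".join(newly)]
    args += ["-cp", ",".join(preexisting)]
    return args


args = build_load_args(
    "data.txt", "output.txt",
    filters=[("threshold", 50), ("pqs", "min")],
    total=["totalRNA1", "totalRNA2"],          # placeholder labels
    newly=["newlytranscribed1", "newlytranscribed2"],
    preexisting=["preexistingRNA1", "preexistingRNA2"],
)
print(" ".join(args))
```

Keeping the arguments in a list until the last moment makes it easy to add or drop replicates without editing a long command string by hand.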

List of optional flags

-log BOOLEAN TRUE if your data is in logarithmic scale and values should be loaded as 2 to the power of [value].
-ps column labels The label of the column containing the probeset_id; default = 'probeset_id'
-genelabel column label If your gene label is not 'gene_name' you have to call this flag with the correct attribute label for genes.
-cto column labels The labels of the columns containing total RNA that will be written into the output file
-cno column labels Output labels for newly transcribed RNA
-cpo column labels Output labels for pre-existing RNA
-ca column labels Column labels for attributes from the original file, separated by ','
-pc BOOLEAN TRUE if the attributes given with -ca are present/absent calls. This is necessary to speed up the loading procedure.
-ca2 column labels Column labels, separated by ',', for attributes from the original file that will be loaded separately (e.g. if both present calls and other attributes should be loaded from the file, load them as two separate lists).
-pc2 BOOLEAN TRUE if the second list of attributes (-ca2) contains present/absent calls
-pqs filename Name of the file in which the quality control values will be written
-pp BOOLEAN TRUE if a histogram of probeset quality scores should be created after filtering with probeset quality scores.
-R System path The path to your R bin directory, which is needed for flags -correl and -bias.
-correl method If you define this flag, a correlation coefficient will be calculated. For this to work you also have to use the -ur, -ufo and -R flags. Allowed methods: 'spearman', 'pearson' or 'kendal'.
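
As a small illustration of the -log flag in the list above: "loaded as 2 to the power of [value]" means that a value v on a log2 scale is converted back to 2^v. The sketch below shows only that arithmetic; the function name from_log2 is hypothetical and HALO's own loading code may differ.

```python
def from_log2(values):
    """Convert log2-scale expression values back to linear scale (2 ** v)."""
    return [2.0 ** v for v in values]


print(from_log2([0.0, 1.0, 3.0, 10.0]))  # [1.0, 2.0, 8.0, 1024.0]
```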

Flags for data quality evaluation

-map filename Name of the file containing additional attributes that will be added to the data. The file must have probeset ids in the first column and the corresponding attribute in the second column. The first line should describe the columns, e.g. '#spotid attribute1'. You can give multiple attribute files separated by commas, e.g. '-map filename1,filename2,filename3'.
-uf filename Name of the fasta file containing the sequences corresponding to the data
-uc column number Number of the column of the fasta header that contains the gene name; e.g. '> genename|attribute|attribute' would result in '-uc 1'.
-ur method Either 'log(e'/n')', 'log(u'/n')' or 'log(e'/u')'; if -uf, -uc and -ur are given, the average uracil number and the average of the chosen ratio are calculated and a file containing information for plotting is written.
-ufo filename Name of the output file for the plotting information (ratio and uracil number).
-up BOOLEAN TRUE if a plot of uracil number vs. ratio should be created.
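
To make the -uc numbering and the notion of a uracil number concrete, here is a minimal Python sketch. It assumes that header fields are separated by '|' and counted from 1, as in the -uc example above, and that 'T' in a fasta sequence stands for uracil. Both function names are hypothetical, and HALO's actual parsing and counting may differ.

```python
def gene_name_from_header(header, uc):
    """Return the fasta header field selected by a 1-based column number.

    Assumes '|' separates the header fields, as in the -uc example.
    """
    fields = header.lstrip(">").strip().split("|")
    return fields[uc - 1].strip()


def uracil_count(sequence):
    """Count uracils; 'T' is counted too, assuming the file may use DNA letters."""
    seq = sequence.upper()
    return seq.count("U") + seq.count("T")


print(gene_name_from_header("> genename|attribute|attribute", 1))  # genename
print(uracil_count("AUGGCUUAA"))  # 3
```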

Example call

-i data.txt -f pqs=min -of output.txt -ct totalRNA2 -cp preexistingRNA2 -cn newlytranscribed_HumanExon2 -map genenames.txt -uf genenames.fasta -uc 1 -ur log(e'/n') -pqs quality.txt -pp TRUE
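
Note that the -ur value log(e'/n') contains parentheses and apostrophes, which are special characters in POSIX shells, so it has to be quoted when the call is typed on a command line. The Python sketch below uses the standard library's shlex.join to produce a safely quoted version of the example flags. The program that these flags are passed to is not named in this section, so it is deliberately left out.

```python
import shlex

# Flags from the example call, one list element per token.
example_args = [
    "-i", "data.txt", "-f", "pqs=min", "-of", "output.txt",
    "-ct", "totalRNA2", "-cp", "preexistingRNA2",
    "-cn", "newlytranscribed_HumanExon2",
    "-map", "genenames.txt", "-uf", "genenames.fasta", "-uc", "1",
    "-ur", "log(e'/n')", "-pqs", "quality.txt", "-pp", "TRUE",
]

# shlex.join quotes only the tokens that need it for a POSIX shell.
quoted = shlex.join(example_args)
print(quoted)
```

If the call is started from a script instead, passing the list directly (for example to subprocess.run) avoids shell quoting altogether.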
