Command line tools - Loading data

The data-loading flags described here are required for every subsequent step provided by HALO. Below is a list of the necessary and optional flags for extracting data from your data file, filtering it, evaluating it, and writing it to an output file. You can combine these flags with the subsequent commands in any order.

List of necessary flags

`-i`	`inputfile`	A file containing expression data for newly transcribed, pre-existing and total RNA in different columns; more than one replicate is possible
`-of`	`outputfile`	The filename for the filtered data
`-f`	`filtermethod`	One of several filtering methods, given in the form 'method=value', e.g. threshold=50. For more methods see the JavaDocs. To use more than one method, add another -f flag.
`-ct`	`column labels`	The labels of the columns that contain expression data for total RNA; you can give only the labels of the replicates you want to use. Separate multiple labels with ','.
`-cn`	`column labels`	Column labels for newly transcribed RNA; for details see '-ct'
`-cp`	`column labels`	Column labels for pre-existing RNA; for details see '-ct'
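
The 'method=value' form of `-f` and the comma-separated label lists of `-ct`, `-cn` and `-cp` can be illustrated with a minimal Python sketch. This is not HALO's own parser; the function names `parse_filter` and `parse_labels` are hypothetical and only show how such arguments are structured.

```python
def parse_filter(spec):
    """Split one -f argument of the form 'method=value' into (method, value)."""
    method, sep, value = spec.partition("=")
    method, value = method.strip(), value.strip()
    if not sep or not method or not value:
        raise ValueError(f"expected 'method=value', got {spec!r}")
    return method, value


def parse_labels(arg):
    """Split a comma-separated label list such as 'totalRNA1,totalRNA2'."""
    return [label.strip() for label in arg.split(",") if label.strip()]


# Two -f flags, as in '-f threshold=50 -f pqs=min'
filters = [parse_filter(s) for s in ["threshold=50", "pqs=min"]]
labels = parse_labels("totalRNA1,totalRNA2")
```

Values are kept as strings here because the set of filter methods and their value types is documented in the JavaDocs, not on this page.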



List of optional flags


  
  
`-log`	`BOOLEAN`	TRUE if your data is in logarithmic scale and values should be loaded as 2 to the power of [value].
`-ps`	`column labels`	The label of the column containing the probeset_id; default = 'probeset_id'
`-genelabel`	`column label`	If your gene label is not 'gene_name' you have to call this flag with the correct attribute label for genes.
`-cto`	`column labels`	The labels of the columns containing total RNA that will be written into the output file
`-cno`	`column labels`	Output labels for newly transcribed RNA
`-cpo`	`column labels`	Output labels for pre-existing RNA
`-ca`	`column labels`	Column labels for attributes from the original file, separated by ','
`-pc`	`BOOLEAN`	TRUE if attributes with `-ca` are present/absent calls. This is necessary to speed up the loading procedure.
`-ca2`	`column labels`	Column labels for attributes from the original file that will be loaded separately (e.g. if present calls and other attributes should be loaded from the file, you should load them separately), separated by ','.
`-pc2`	`BOOLEAN`	TRUE if second list of attributes are present calls
`-pqs`	`filename`	Name of the file in which the quality control values will be written
`-pp`	`BOOLEAN`	TRUE if histogram of probeset quality scores should be created after filtering with probeset quality scores.
`-R`	`System path`	The path to your R bin directory, which is needed for flags `-correl` and `-bias`.
`-correl`	`method`	If you define this flag, a correlation coefficient will be calculated. In order for this to work you have to use the `-ur`, `-ufo` and `-R` flags also. Allowed methods: 'spearman', 'pearson' or 'kendal'.
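
To make the effect of `-log TRUE` concrete: the description says log-scale values are loaded as 2 to the power of the value. The sketch below shows that back-transformation in Python; the helper name `from_log2` is hypothetical and the code is not taken from HALO.

```python
def from_log2(values):
    """Convert log2-scale expression values back to linear scale (2 ** value)."""
    return [2.0 ** v for v in values]


# A log2 value of 0 corresponds to 1 on the linear scale, 3 corresponds to 8.
linear = from_log2([0, 1, 3])
```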
    
   
 


Flags for data quality evaluation


  
    
`-map`	`filename`	Name of the file containing additional attributes that will be added to the data. The file must have probeset ids in the first column and the corresponding attribute in the second column. The first line should describe the columns, e.g. '#spotid attribute1'. You can give multiple attribute files separated by commas, e.g. '-map filename1,filename2,filename3'.
`-uf`	`filename`	Name of the fasta file containing the sequences corresponding to the data
`-uc`	`column number`	Number of the column of the fasta header that contains the gene name; e.g. '> genename|attribute|attribute' would result in '-uc 1'.
`-ur`	`method`	Either 'log(e'/n')', 'log(u'/n')' or 'log(e'/u')'; if `-uf`, `-uc` and `-ur` are given, the average uracil number and the average of the defined ratio are calculated and a file containing information for plotting is provided.
`-ufo`	`filename`	Name of the output file for the plotting information (ratio and uracil number).
`-up`	`BOOLEAN`	TRUE if a plot of uracil number vs. ratio should be created.
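
The column numbering used by `-uc` and the notion of a uracil number can be sketched as follows. This is an illustrative Python example, not HALO's implementation. It assumes that header columns are separated by '|' and counted from 1 (as in the '-uc 1' example), and that thymine in a DNA fasta file counts as uracil; the function names are hypothetical.

```python
def gene_name_from_header(header, column):
    """Return the 1-based '|'-separated column of a fasta header line."""
    fields = header.lstrip(">").strip().split("|")
    return fields[column - 1].strip()


def uracil_count(sequence):
    """Count uracils in a sequence; 'T' is counted too (assumption for DNA input)."""
    seq = sequence.upper()
    return seq.count("U") + seq.count("T")


name = gene_name_from_header("> genename|attribute|attribute", 1)
n_uracil = uracil_count("AUGUUT")
```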
    
   
 



Example call

-i data.txt -f pqs=min -of output.txt -ct totalRNA2 -cp preexistingRNA2 -cn newlytranscribed_HumanExon2 -map genenames.txt  -uf genenames.fasta -uc 1 -ur log(e'/n') -pqs quality.txt -pp TRUE
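
Note that the `-ur` value contains parentheses and single quotes, which a POSIX shell interprets if the argument is typed unquoted. The following sketch, which assumes such a shell and is not part of HALO, uses Python's `shlex` module to produce a safely quoted version of the flags above. How HALO itself is launched is not covered on this page, so only the flag list is shown.

```python
import shlex

# The flags from the example call, one list element per argument.
args = [
    "-i", "data.txt",
    "-f", "pqs=min",
    "-of", "output.txt",
    "-ct", "totalRNA2",
    "-cp", "preexistingRNA2",
    "-cn", "newlytranscribed_HumanExon2",
    "-map", "genenames.txt",
    "-uf", "genenames.fasta",
    "-uc", "1",
    "-ur", "log(e'/n')",
    "-pqs", "quality.txt",
    "-pp", "TRUE",
]

# shlex.join quotes every argument that needs it for a POSIX shell.
command_line = shlex.join(args)
print(command_line)
```

Passing `args` directly as a list to a process launcher avoids the quoting problem altogether, since no shell parses the string.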





