Sample Use Case 4: Working with additional tools

Besides calculating half-lives, we can also evaluate the quality of the original expression data based on the number of uracils in the corresponding sequences, or calculate the median half-life. The source code below reproduces these steps.

Important note: All variable parameters (methods, thresholds, etc.) used in this example are chosen arbitrarily and serve only as illustration. For practical use, choose these parameters carefully depending on your data and goals.

Quality control of data

Based on the sequences corresponding to our probesets, we can evaluate the quality of our data measurements. For this we first have to define a small set of parameters, e.g. the ratio method against which the number of uracils is compared. From these parameters a probeset quality score is calculated for each probeset. The scores can be written to a file and plotted as a histogram.

//Define parameters
int column    = 1; //The column of the fasta header that contains gene names
String method = "log(e'/n')"; //The method used for comparison against uracil numbers
String output = "Example_mouse_quality_uracil.txt"; //Path for the output file (optional)
boolean histo = true; //true if histogram should be plotted
HashMap biasCorrection = null; //Bias correction factors

//Start quality control
data.evaluate("data/sequences_mouse.txt", column, method, output, biasCorrection, histo);
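
To make the two quantities compared in this step more concrete, the following minimal sketch counts the uracils in a sequence and computes a log ratio of two expression values. It is an illustration only: the class `UracilSketch` and its methods are hypothetical and not part of HALO. The sketch assumes that uracils may appear as `U` or, in DNA notation, as `T`, and that `log` means the natural logarithm. It does not reproduce how HALO derives the probeset quality score.

```java
import java.util.Locale;

class UracilSketch {

    // Counts uracils in a sequence. 'T' is counted as well, because
    // sequence files often use DNA notation (an assumption of this sketch).
    static int countUracils(String sequence) {
        int count = 0;
        for (char c : sequence.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c == 'U' || c == 'T') {
                count++;
            }
        }
        return count;
    }

    // Natural logarithm of the ratio e/n, where e and n stand for the two
    // expression values named in the method string "log(e'/n')".
    static double logRatio(double e, double n) {
        if (e <= 0 || n <= 0) {
            throw new IllegalArgumentException("Expression values must be positive");
        }
        return Math.log(e / n);
    }

    public static void main(String[] args) {
        System.out.println(countUracils("ACGUUTAG")); // prints 3
        System.out.println(logRatio(2.0, 2.0));       // prints 0.0
    }
}
```

Plotting the uracil count against the log ratio for every probeset is one way to see whether the measurements depend on the uracil content.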


Bias correction with R

If you have R installed, you can perform a bias correction on your data. The correction is based on the output file of the quality control step described above. You can calculate the correlation coefficient as well as a bias correction for each value, so that you can subsequently repeat all previous analyses with the corrected values.
First, set the path to your R bin directory:
public static final String PATHTOR = ""; //Path to the R bin directory

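//Output file of the quality control step (assumed to be the "output" path used above)
String evalOut = "Example_mouse_quality_uracil.txt";

//calculate the correlation coefficient
RCorrelationCoefficient cor = new RCorrelationCoefficient(PATHTOR, evalOut);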
cor.setMethod(RCorrelationCoefficient.PEARSON);
//start correlation coefficient calculation
double coefficient = cor.calculateCorrelationCoefficient();
System.out.println("The correlation coefficient: "+coefficient);
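
For readers who want to see what a Pearson correlation coefficient measures, here is a minimal plain-Java sketch of the standard formula. The class `PearsonSketch` is hypothetical and not part of HALO. HALO delegates the calculation to R, so this is only assumed to match what `RCorrelationCoefficient.PEARSON` returns.

```java
class PearsonSketch {

    // Pearson correlation: covariance of x and y divided by the
    // product of their standard deviations.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy; // co-variation of x and y
            sxx += dx * dx; // variation of x
            syy += dy * dy; // variation of y
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] uracils = {1, 2, 3, 4};
        double[] ratios  = {2, 4, 6, 8};
        System.out.println(pearson(uracils, ratios)); // perfectly linear data
    }
}
```

A coefficient close to 0 suggests little linear dependence between uracil number and the chosen ratio, while values near 1 or -1 indicate a strong one.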

//fill mapping and corresponding arrays
String[] spots = data.getSpot();
double[] datas = data.getDat();


//Start the loess regression with R
RLoessRegression loess = new RLoessRegression(PATHTOR, evalOut, spots, datas);
HashMap lo = loess.calculateLoessRegression();
data.setCorrNewTot(lo);
data = Filter.filterCorrectionBias(data, lo);
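
The idea behind a loess regression can be sketched in plain Java as a locally weighted linear fit. This is a simplified illustration under stated assumptions, not HALO's or R's implementation. The class `LoessSketch` and its `span` parameter are hypothetical, the fit uses degree 1 with tricube weights and no robustness iterations, and it does not show how HALO turns the fitted curve into correction factors.

```java
import java.util.Arrays;

class LoessSketch {

    // Estimates the smoothed value at x0 by fitting a weighted straight
    // line through the fraction `span` of points nearest to x0.
    static double fitAt(double[] x, double[] y, double x0, double span) {
        int n = x.length;
        int k = Math.min(n, Math.max(2, (int) Math.ceil(span * n)));

        // Distance of every point to x0; the k-th smallest is the bandwidth
        double[] dist = new double[n];
        for (int i = 0; i < n; i++) {
            dist[i] = Math.abs(x[i] - x0);
        }
        double[] sorted = dist.clone();
        Arrays.sort(sorted);
        double h = sorted[k - 1];

        // Accumulate the weighted sums needed for a least-squares line
        double sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;
        for (int i = 0; i < n; i++) {
            double u = h > 0 ? dist[i] / h : (dist[i] == 0 ? 0 : 1);
            if (u >= 1) {
                continue; // tricube weight is zero at and beyond the bandwidth
            }
            double w = Math.pow(1 - u * u * u, 3);
            sw += w;
            swx += w * x[i];
            swy += w * y[i];
            swxx += w * x[i] * x[i];
            swxy += w * x[i] * y[i];
        }
        if (sw == 0) {
            throw new IllegalArgumentException("span too small for these data");
        }
        double denom = sw * swxx - swx * swx;
        if (Math.abs(denom) < 1e-12) {
            return swy / sw; // all weighted points share one x: use weighted mean
        }
        double slope = (sw * swxy - swx * swy) / denom;
        double intercept = (swy - slope * swx) / sw;
        return intercept + slope * x0;
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = 2 * x[i] + 1; // noise-free line, so the fit should follow it
        }
        System.out.println(fitAt(x, y, 4.5, 0.5));
    }
}
```

In a bias correction, a curve fitted this way would describe the trend of the ratio as a function of uracil number. Removing that trend from each value is the general idea of the correction.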


Output

The output produced by HALO should look like this:

Loading data...
Done loading data.
You have 31451 probesets.
------------------------------
Loading attributes...
Done loading attributes.
------------------------------
Filtering data...
Done filtering data.
You have 11031 probesets.
------------------------------
Filtering data...
Done filtering data.
You have 10984 probesets.
------------------------------
Filtering data...
Done filtering data.
You have 10937 probesets.
------------------------------
Filtering data...
Done filtering data.
You have 10731 probesets.
------------------------------
------------------------------
Starting quality control...
Graph generated
Average uracil number: 1372.0303373025854; Average log(e'/n'): -0.14055195374872945
You have 8814 probesets.

1918 probesets had to be discarded, because no sequence data was available for them.Done with quality control
------------------------------
------------------------------
Calculating correlation coefficient...
0
Done calculating correlation coefficient
------------------------------
The correlation coefficient: 0.3516099
------------------------------
Calculate loess regression...
0
Done calculating loess regression
------------------------------
Filtering data...
Done filtering data.
You have 8813 probesets.
------------------------------

In addition to this console output, the run should produce the quality control output file (Example_mouse_quality_uracil.txt) and the histogram plot.





HALO documentation