Sample Use Case 4: Working with additional tools

Besides calculating half-lives, we can also evaluate the quality of the original expression data based on the number of uracils in the corresponding sequences, or calculate the median half-life. The source code below lets you reproduce these steps.

Important note: All variable parameters (methods, thresholds, etc.) used in this example were chosen arbitrarily and serve only as an illustration. For practical use, these parameters have to be chosen carefully depending on your data and goals.

Table of contents:

Quality control of data
Bias correction with R
Output

Quality control of data

Using the sequences that correspond to our probesets, we can evaluate the quality of our measurements. To do so, we first define a small set of parameters, e.g. the ratio method against which the number of uracils is compared. From these parameters a quality score is calculated for each probeset. The scores can be written to a file and plotted as a histogram.

//Define parameters
int column = 1; //The column of the fasta header that contains gene names
String method = "log(e'/n')"; //The method used for comparison against uracil numbers
String output = "Example_mouse_quality_uracil.txt"; //Path for the output file (optional)
boolean histo = true; //true if histogram should be plotted
HashMap biasCorrection = null; //Bias correction factors

//Start quality control
data.evaluate("data/sequences_mouse.txt", column, method, output, biasCorrection, histo);
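To make the quantity behind this step more concrete, the following minimal sketch counts the uracils in a single sequence. It is not HALO's implementation: the class and method names (`UracilCountSketch`, `countUracils`) are hypothetical, the example sequence is made up, and the sketch assumes that a sequence may be written either in RNA notation (U) or in cDNA notation (T).

```java
// Hypothetical helper for illustration only; not part of the HALO API.
final class UracilCountSketch {

    // Counts uracils in a sequence. Assumption: both 'U' (RNA notation)
    // and 'T' (cDNA notation) stand for a uracil; case is ignored.
    static int countUracils(String sequence) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (c == 'U' || c == 'T') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sequence = "ATGTTUACG"; // made-up example sequence
        System.out.println("Uracils: " + countUracils(sequence));
    }
}
```

In the actual workflow, HALO derives these counts from the sequence file passed to `data.evaluate`, so a helper like this is only useful for checking individual sequences by hand.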

Bias correction with R

If you have R installed, you can perform a bias correction on your data. It is based on the output file of the quality control step described above. You can calculate the correlation coefficient as well as a bias correction for each value, so that you can subsequently repeat all previous analyses with the corrected values.
First, set the path to your R bin directory:
public static final String PATHTOR = "";

//Calculate the correlation coefficient
//Note: evalOut is not defined in this example; presumably it is the path to the
//quality control output file written in the previous step
RCorrelationCoefficient cor = new RCorrelationCoefficient(PATHTOR, evalOut);
cor.setMethod(RCorrelationCoefficient.PEARSON);
//Start correlation coefficient calculation
double coefficient = cor.calculateCorrelationCoefficient();
System.out.println("The correlation coefficient: "+coefficient);

//Fill mapping and corresponding arrays
String[] spots = data.getSpot();
double[] datas = data.getDat();

//Start the loess regression with R
RLoessRegression loess = new RLoessRegression(PATHTOR, evalOut, spots, datas);
HashMap lo = loess.calculateLoessRegression();
data.setCorrNewTot(lo);
data = Filter.filterCorrectionBias(data, lo);
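HALO delegates the correlation calculation to R. As an illustration of what the `PEARSON` method measures, the standalone sketch below computes a Pearson correlation coefficient for two arrays of equal length, for example uracil numbers and the corresponding ratio values. The class name `PearsonSketch`, the method `pearson`, and the sample values are assumptions made for this example and are not part of HALO.

```java
// Hypothetical illustration of the Pearson correlation coefficient;
// HALO itself computes the coefficient through R, not with this code.
final class PearsonSketch {

    // Returns the Pearson correlation coefficient of x and y.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Need two arrays of equal length >= 2");
        }
        // Compute the means of both arrays.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        // Accumulate the covariance and variance terms (unnormalized).
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // The normalization factors cancel, so they are omitted.
        // The result is NaN if one of the arrays is constant.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        double[] uracils = {900, 1200, 1500, 1800}; // made-up values
        double[] ratios = {-0.30, -0.10, -0.15, 0.05}; // made-up values
        System.out.println("Pearson r: " + pearson(uracils, ratios));
    }
}
```

A coefficient close to 0 suggests little linear dependence of the ratio on the uracil number, while a clearly positive or negative value points to a bias that the loess regression step is meant to correct.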

Output

The output produced by HALO should look like this:

Loading data...
Done loading data. You have 31451 probesets.
------------------------------
Loading attributes...
Done loading attributes.
------------------------------
Filtering data...
Done filtering data. You have 11031 probesets.
------------------------------
Filtering data...
Done filtering data. You have 10984 probesets.
------------------------------
Filtering data...
Done filtering data. You have 10937 probesets.
------------------------------
Filtering data...
Done filtering data. You have 10731 probesets.
------------------------------
------------------------------
Starting quality control...
Graph generated
Average uracil number: 1372.0303373025854; Average log(e'/n'): -0.14055195374872945
You have 8814 probesets. 1918 probesets had to be discarded, because no sequence data was available for them.
Done with quality control
------------------------------
------------------------------
Calculating correlation coefficient...
0
Done calculating correlation coefficient
------------------------------
The correlation coefficient: 0.3516099
------------------------------
Calculate loess regression...
0
Done calculating loess regression
------------------------------
Filtering data...
Done filtering data. You have 8813 probesets.
------------------------------

The following files should be produced:

Example_mouse_quality_uracil.txt

The following plots should be produced:

Evaluation of data quality
