Sample Use Case 2: Working with score networks

This example code demonstrates how to manipulate, evaluate, merge, filter and iterate over score networks. The full source code of this example is available in the procope.examples package of the src/ folder.

Loading score networks

First we load two existing score networks from the filesystem. The network reading functionality is provided by the NetworkReader class.

ProteinNetwork hart=null, pe=null;
try {
    hart = NetworkReader.readNetwork(new GZIPInputStream(
            new FileInputStream("data/scores/hart_scores.txt.gz")));
    pe = NetworkReader.readNetwork(new GZIPInputStream(
            new FileInputStream("data/scores/pe_combined.txt.gz")));
} catch (Exception e) {
    // something went wrong, output error message
    System.err.println("Could not load score networks:");
    System.err.println(e.getMessage());
    System.exit(1);
}

Note that network data can be read from any InputStream. Here we wrap each FileInputStream in a GZIPInputStream to decompress the gzipped input files on the fly.
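The following stand-alone sketch illustrates why accepting an InputStream is convenient: the same reading code works on plain and on gzipped data. It does not use ProCope. The class StreamSourceSketch, its helper methods and the tab-separated sample lines are assumptions made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StreamSourceSketch {

    // Reads all non-empty lines from any InputStream (hypothetical stand-in for a reader).
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    lines.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Gzips text in memory so that the example needs no files on disk.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Wraps gzipped bytes in a stream that decompresses on the fly.
    static InputStream gunzip(byte[] compressed) {
        try {
            return new GZIPInputStream(new ByteArrayInputStream(compressed));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "p1\tp2\t1.5\np2\tp3\t0.5\n"; // made-up sample data
        InputStream plain = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        InputStream zipped = gunzip(gzip(text));
        // Both sources yield the same lines through the same reading code.
        System.out.println(readLines(plain).equals(readLines(zipped)));
    }
}
```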


Comparing the networks

Next we compare the edge weights of both networks to find out whether they are significantly correlated. The NetworkComparison.weightsOverlap function returns a list of number pairs. Each pair holds the weights of the same edge in the two networks, and an edge missing from one network gets an implicit weight of zero there.

List<Point> overlap = NetworkComparison.weightsOverlap(hart, pe, false);
CorrelationCoefficient coeff = new PearsonCoefficient(); // could also use Spearman
coeff.feedData(overlap);
System.out.println("Correlation between the networks: " + coeff.getCorrelationCoefficient());
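To make the comparison concrete, here is a minimal stand-alone sketch of a Pearson correlation over paired edge weights in which a missing edge counts as zero. It is not ProCope's implementation. The class WeightCorrelationSketch, the map-based network representation and the sample weights are assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WeightCorrelationSketch {

    // Order-independent key for an undirected edge.
    static String edgeKey(String p1, String p2) {
        return p1.compareTo(p2) <= 0 ? p1 + "|" + p2 : p2 + "|" + p1;
    }

    // Pearson correlation over the union of edges; an edge missing from one
    // map contributes the weight 0 for that map. Returns NaN if either side
    // has zero variance.
    static double pearson(Map<String, Double> first, Map<String, Double> second) {
        Set<String> edges = new HashSet<>(first.keySet());
        edges.addAll(second.keySet());
        int n = edges.size();

        double meanX = 0, meanY = 0;
        for (String e : edges) {
            meanX += first.getOrDefault(e, 0.0);
            meanY += second.getOrDefault(e, 0.0);
        }
        meanX /= n;
        meanY /= n;

        double cov = 0, varX = 0, varY = 0;
        for (String e : edges) {
            double dx = first.getOrDefault(e, 0.0) - meanX;
            double dy = second.getOrDefault(e, 0.0) - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        Map<String, Double> a = new HashMap<>();
        a.put(edgeKey("p1", "p2"), 4.0);
        a.put(edgeKey("p2", "p3"), 2.0);

        Map<String, Double> b = new HashMap<>();
        b.put(edgeKey("p2", "p1"), 8.0); // same edge as p1|p2
        b.put(edgeKey("p3", "p4"), 1.0); // missing in a, counts as 0 there

        System.out.println("Correlation: " + pearson(a, b));
    }
}
```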


Cutting off the networks and creating a randomized network

To keep the ROC curves from becoming too large, we remove all edges whose weight is below a given threshold from each network (3 and 2 here; both values are arbitrary). Then we create a randomized version of one of the networks, which we will need later on.

hart = hart.getCutOffNetwork(3);
pe = pe.getCutOffNetwork(2);

ProteinNetwork randomized = hart.randomizeByRewiring();
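The two operations can be pictured with the following stand-alone sketch. It is only an illustration and makes no claim about how getCutOffNetwork or randomizeByRewiring work internally. In particular, the endpoint-swapping scheme shown here is one common degree-preserving way to rewire a network, and the class RewiringSketch with all its members and weights is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

class RewiringSketch {

    // Undirected weighted edge between two integer protein IDs.
    static final class Edge {
        final int a, b;
        final double weight;
        Edge(int a, int b, double weight) { this.a = a; this.b = b; this.weight = weight; }
    }

    static String key(int a, int b) {
        return Math.min(a, b) + "-" + Math.max(a, b);
    }

    // Keeps only edges whose weight is at least the threshold.
    static List<Edge> cutOff(List<Edge> edges, double threshold) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            if (e.weight >= threshold) kept.add(e);
        }
        return kept;
    }

    // Tries the given number of endpoint swaps: (a,b),(c,d) becomes (a,d),(c,b).
    // Swaps that would create a self-loop or a duplicate edge are skipped,
    // so every protein keeps its number of edges.
    static List<Edge> rewire(List<Edge> edges, int attempts, Random rnd) {
        List<Edge> result = new ArrayList<>(edges);
        Set<String> present = new HashSet<>();
        for (Edge e : result) present.add(key(e.a, e.b));
        if (result.size() < 2) return result;

        for (int i = 0; i < attempts; i++) {
            int i1 = rnd.nextInt(result.size());
            int i2 = rnd.nextInt(result.size());
            if (i1 == i2) continue;
            Edge e1 = result.get(i1), e2 = result.get(i2);
            if (e1.a == e2.b || e2.a == e1.b) continue; // would be a self-loop
            String k1 = key(e1.a, e2.b), k2 = key(e2.a, e1.b);
            if (k1.equals(k2) || present.contains(k1) || present.contains(k2)) continue;
            present.remove(key(e1.a, e1.b));
            present.remove(key(e2.a, e2.b));
            present.add(k1);
            present.add(k2);
            result.set(i1, new Edge(e1.a, e2.b, e1.weight));
            result.set(i2, new Edge(e2.a, e1.b, e2.weight));
        }
        return result;
    }

    // Number of edges touching each protein.
    static Map<Integer, Integer> degrees(List<Edge> edges) {
        Map<Integer, Integer> d = new HashMap<>();
        for (Edge e : edges) {
            d.merge(e.a, 1, Integer::sum);
            d.merge(e.b, 1, Integer::sum);
        }
        return d;
    }

    // Made-up sample network.
    static List<Edge> demoEdges() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge(1, 2, 5.0));
        edges.add(new Edge(3, 4, 4.0));
        edges.add(new Edge(5, 6, 3.5));
        edges.add(new Edge(7, 8, 3.0));
        edges.add(new Edge(1, 3, 1.0));
        edges.add(new Edge(2, 4, 2.5));
        return edges;
    }

    public static void main(String[] args) {
        List<Edge> kept = cutOff(demoEdges(), 3.0);
        List<Edge> randomized = rewire(kept, 100, new Random(42));
        System.out.println("Degrees preserved: "
                + degrees(kept).equals(degrees(randomized)));
    }
}
```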


Loading the reference complex set

We need a reference complex set to evaluate the networks using the complex enrichment and ROC curve methods. Again, we use the MIPS reference protein complex set.

ComplexSet mips=null;
try {
    mips = ComplexSetReader.readComplexes("data/complexes/mips_complexes.txt");
} catch (Exception e) {
    // something went wrong, output error message
    System.err.println("Could not load reference set:");
    System.err.println(e.getMessage());
    System.exit(1);
}


Complex enrichment

Next we calculate and output the complex enrichment of all score networks with respect to the given reference complex set. We use 100 randomizations to minimize variations. See also: Network evaluation

System.out.println("Complex enrichment of Hart: " +
        ComplexEnrichment.calculateComplexEnrichment(hart, mips, 100, true));
System.out.println("Complex enrichment of PE: " +
        ComplexEnrichment.calculateComplexEnrichment(pe, mips, 100, true));
System.out.println("Complex enrichment of Hart-randomized: " +
        ComplexEnrichment.calculateComplexEnrichment(randomized, mips, 100, true));

The result shows that, in the randomized network, edge weights within the reference complexes are not significantly higher than edge weights between the complexes.


ROC preparation

We want to calculate ROC curves which demonstrate the scoring performance of the given score networks (see also: Network evaluation). As the original MIPS complex set contains two very large complexes which could generate too many positive edges, we remove all complexes larger than 50 proteins from this set.

Additionally we generate a list of networks and a list of network names which will be used in the next step for ROC curve calculation and plot generation.

ComplexSet mips_below50 = mips.copy();
mips_below50.removeComplexesBySize(50, false);

ArrayList<ProteinNetwork> networks = new ArrayList<ProteinNetwork>();
networks.add(hart);
networks.add(randomized);
networks.add(pe);
ArrayList<String> names = new ArrayList<String>();
names.add("Hart");
names.add("Randomized");
names.add("PE");


ROC curves

Now we actually generate the ROC curves by calling the corresponding function in the ROC class. The result is a list of ROCCurve objects which we pass into the chart generation function of the ROCCurveHandler class to create the plot and save it into a file. For the generation of the negative set, we use the original MIPS set.

List<ROCCurve> curves = ROC.calculateROCCurves(networks, mips_below50, mips, null, false);
// plot them & write to file
JFreeChart chart = ROCCurveHandler.generateChart(curves, names);
try {
    ChartTools.writeChartToPNG(chart, new File("roc.png"), 800, 600);
} catch (IOException e) {
    // could not write the image
    System.err.println("Could not write image: roc.png\n\n" + e.getMessage());
    System.exit(1);
}
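For readers unfamiliar with ROC curves, the following stand-alone sketch shows the general idea: edges are sorted by descending score, and each distinct score threshold yields one point of false-positive rate against true-positive rate. It is not the code behind ROC.calculateROCCurves. The class RocSketch, the pre-labelled input edges and the scores are assumptions for illustration, and the step that derives positive and negative edges from the complex sets is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RocSketch {

    // A scored edge with its reference label: true = positive, false = negative.
    static final class Scored {
        final double score;
        final boolean positive;
        Scored(double score, boolean positive) { this.score = score; this.positive = positive; }
    }

    // Returns ROC points as {falsePositiveRate, truePositiveRate}, starting at (0,0).
    // Edges with equal scores are added together, giving one point per distinct score.
    static List<double[]> rocPoints(List<Scored> edges) {
        List<Scored> sorted = new ArrayList<>(edges);
        sorted.sort((x, y) -> Double.compare(y.score, x.score)); // descending

        int totalPos = 0, totalNeg = 0;
        for (Scored s : sorted) {
            if (s.positive) totalPos++; else totalNeg++;
        }
        if (totalPos == 0 || totalNeg == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative edge");
        }

        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0});
        int tp = 0, fp = 0;
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i).positive) tp++; else fp++;
            boolean lastOfTie = i + 1 == sorted.size()
                    || sorted.get(i + 1).score != sorted.get(i).score;
            if (lastOfTie) {
                points.add(new double[] {(double) fp / totalNeg, (double) tp / totalPos});
            }
        }
        return points;
    }

    // Area under the curve by the trapezoidal rule.
    static double auc(List<double[]> points) {
        double area = 0;
        for (int i = 1; i < points.size(); i++) {
            double width = points.get(i)[0] - points.get(i - 1)[0];
            double height = (points.get(i)[1] + points.get(i - 1)[1]) / 2;
            area += width * height;
        }
        return area;
    }

    public static void main(String[] args) {
        // Made-up scores and labels.
        List<Scored> edges = Arrays.asList(
                new Scored(0.9, true), new Scored(0.8, false),
                new Scored(0.7, true), new Scored(0.1, false));
        System.out.println("AUC: " + auc(rocPoints(edges)));
    }
}
```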


Network merging

Next we merge the two score networks. The final network does not have any edge weights, but both weights of the original networks are stored as annotations to the edges.

CombinationRules rules = new CombinationRules(CombinationRules.CombinationType.INTERSECT);
rules.setWeightMergePolicy(CombinationRules.WeightMergePolicy.ANNOTATE_WEIGHTS, "Hart", "PE");
ProteinNetwork merged = hart.combineWith(pe, rules);


Filtering

We now keep only those edges of the merged network for which both annotated scores reach certain thresholds (arbitrary values used), which leaves very high-confidence edges; every edge with at least one score below its threshold is filtered out. To filter a network we first create a BooleanExpression and then pass this expression to the getFilteredNetwork method of the ProteinNetwork object. See also: Annotations and filtering

BooleanExpression expression=null;
try {
    expression = new BooleanExpression("Hart>=70 & PE>=25");
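} catch (InvalidExpressionException e) {
    // should not happen as the expression above is known to be valid;
    // fail loudly instead of continuing with a null expression
    throw new IllegalStateException(e);
}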
ProteinNetwork filtered = merged.getFilteredNetwork(expression);
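The merge and filter steps can be sketched without ProCope as follows. The sketch assumes that an intersection merge keeps only edges present in both networks and stores each original weight under an annotation name, and it expresses the condition Hart>=70 & PE>=25 as a plain Java predicate. The class MergeFilterSketch, the map-based representation and the sample weights are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

class MergeFilterSketch {

    // Keeps edges found in both maps and stores both weights as named annotations.
    static Map<String, Map<String, Double>> intersect(
            Map<String, Double> first, String firstName,
            Map<String, Double> second, String secondName) {
        Map<String, Map<String, Double>> merged = new TreeMap<>();
        for (Map.Entry<String, Double> entry : first.entrySet()) {
            Double other = second.get(entry.getKey());
            if (other == null) continue; // edge is missing in the second network
            Map<String, Double> annotations = new HashMap<>();
            annotations.put(firstName, entry.getValue());
            annotations.put(secondName, other);
            merged.put(entry.getKey(), annotations);
        }
        return merged;
    }

    // Keeps the edges whose annotations satisfy the condition.
    static Map<String, Map<String, Double>> filter(
            Map<String, Map<String, Double>> edges,
            Predicate<Map<String, Double>> condition) {
        Map<String, Map<String, Double>> kept = new TreeMap<>();
        for (Map.Entry<String, Map<String, Double>> entry : edges.entrySet()) {
            if (condition.test(entry.getValue())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up edge weights, keyed by "protein1|protein2".
        Map<String, Double> hart = new HashMap<>();
        hart.put("p1|p2", 80.0);
        hart.put("p2|p3", 75.0);
        hart.put("p3|p4", 90.0);

        Map<String, Double> pe = new HashMap<>();
        pe.put("p1|p2", 30.0);
        pe.put("p2|p3", 10.0);
        pe.put("p4|p5", 50.0);

        Map<String, Map<String, Double>> merged = intersect(hart, "Hart", pe, "PE");
        // Predicate equivalent of the expression "Hart>=70 & PE>=25".
        Map<String, Map<String, Double>> filtered =
                filter(merged, ann -> ann.get("Hart") >= 70 && ann.get("PE") >= 25);
        System.out.println(filtered.keySet());
    }
}
```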


Iterating over the network

Finally we iterate over the filtered network and output the high-confidence edges. A ProteinNetwork object is iterable and yields NetworkEdge objects. Note how the ProteinManager is used to map internal protein IDs back to their String labels.

for (NetworkEdge edge : filtered) {
    int protein1 = edge.getSource(); // source and target are not relevant
    int protein2 = edge.getTarget(); // as this is an undirected network
    System.out.println(ProteinManager.getLabel(protein1) +
            "\t" + ProteinManager.getLabel(protein2));
}
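The idea of mapping String labels to compact integer IDs and back can be sketched as below. This is a hypothetical illustration and not ProCope's ProteinManager, which is called through static methods in the example above; the class LabelRegistrySketch and its methods are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LabelRegistrySketch {

    private final Map<String, Integer> idsByLabel = new HashMap<>();
    private final List<String> labelsById = new ArrayList<>();

    // Returns the ID of a label, assigning the next free ID on first use.
    int getId(String label) {
        Integer id = idsByLabel.get(label);
        if (id == null) {
            id = labelsById.size();
            idsByLabel.put(label, id);
            labelsById.add(label);
        }
        return id;
    }

    // Maps an ID back to its label.
    String getLabel(int id) {
        return labelsById.get(id);
    }

    public static void main(String[] args) {
        LabelRegistrySketch registry = new LabelRegistrySketch();
        int source = registry.getId("ykl012w");
        int target = registry.getId("ygr013w");
        // Edges can store the two ints; labels are only looked up for output.
        System.out.println(registry.getLabel(source) + "\t" + registry.getLabel(target));
    }
}
```

Storing integers instead of strings on every edge keeps large networks small in memory, at the price of one lookup whenever a label has to be printed.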


Output

The output of the program should look like this:

Loading networks...
Comparing networks...
Correlation between the networks: 0.7263753
Cutting off networks...
Generating randomized network...
Complex enrichment of Hart:83.504555
Complex enrichment of PE:61.335667
Complex enrichment of Hart-randomized:1.0372198
Calculating and saving ROC curve...
Combining networks...
High-confidence interactions:
ykl012w    ygr013w
yfr052w    yhr200w
yfr004w    yhr200w
ygl019w    yor061w
yor061w    yil035c
ylr277c    ykr002w
ylr115w    ylr277c



