Sample Use Case 3: Working with complex sets


This sample use case shows how to decompose, filter, iterate over and write out a complex set. The full source code of this example is available in the procope.examples package of the src/ folder.

Loading the data

First we load one of the bootstrap clusterings (BT_893) and the bootstrap score network, which is stored as a gzip-compressed file.

ComplexSet BT893=null;
ProteinNetwork bootstrap=null;
try {
    BT893 = ComplexSetReader.readComplexes("data/complexes/BT_893.txt");
    bootstrap = NetworkReader.readNetwork(new GZIPInputStream(
            new FileInputStream("data/scores/bootstrap_combined.txt.gz")));
} catch (Exception e) {
    // we do not do any further error handling here
    System.err.println("Could not read file.");
    System.err.println(e.getMessage());
    System.exit(1);
}
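
The snippet above wraps a FileInputStream in a GZIPInputStream and leaves closing the stream to the library or to program exit. As a side note, a minimal sketch of reading a gzip-compressed text file with try-with-resources, using only the Java standard library, might look as follows. The class GzipReadSketch, its methods and the tab-separated sample lines are made up for illustration and say nothing about the actual format of the ProCope score files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch only; not part of ProCope.
class GzipReadSketch {

    // Reads all lines of a gzip-compressed text file. The reader (and the
    // streams it wraps) is closed automatically, even if reading fails.
    static List<String> readGzipLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Writes a tiny gzip file and reads it back.
    // The tab-separated line layout is hypothetical, not ProCope's format.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("scores", ".txt.gz");
            try (Writer writer = new OutputStreamWriter(
                    new GZIPOutputStream(Files.newOutputStream(tmp)),
                    StandardCharsets.UTF_8)) {
                writer.write("p1\tp2\t0.9\np2\tp3\t0.4\n");
            }
            List<String> lines = readGzipLines(tmp);
            Files.delete(tmp);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```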


Decomposition


We now decompose the complex set. The method treats each complex as a graph whose edge weights are taken from a given score network. All edges with a score below a given threshold are deleted, so a complex may fall apart into two or more connected subgraphs. These subgraphs form the complexes of the resulting set. (Note: the threshold of 0.5 is again chosen arbitrarily.)

ComplexSet decomposed = BT893.decompose(bootstrap, 0.5f);

This method often produces a lot of singletons (clusters with only one protein). We remove these singletons and print how many complexes there are before decomposition, after decomposition and after removing the singletons.

System.out.println("Original complexes:  " + BT893.getComplexCount());
System.out.println("After decomposition: " + decomposed.getComplexCount());
System.out.println("Without singletons:  " + decomposed.getComplexCount());
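
To make the idea behind decomposition concrete, here is a minimal, self-contained sketch in plain Java. It is not ProCope's implementation: the class DecomposeSketch, the helpers edgeKey, decompose and removeSingletons, and the map-based score lookup are all hypothetical, and the sketch assumes that a protein pair without a score counts as 0.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; this is not ProCope's implementation.
class DecomposeSketch {

    // Builds an order-independent key for the undirected edge (a, b).
    static String edgeKey(int a, int b) {
        return Math.min(a, b) + "-" + Math.max(a, b);
    }

    // Splits one complex into the connected components that remain when
    // only edges with score >= threshold are kept.
    // Assumption: a pair without an entry in the map has score 0.
    static List<Set<Integer>> decompose(Set<Integer> complex,
                                        Map<String, Float> scores,
                                        float threshold) {
        List<Set<Integer>> parts = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        for (int start : complex) {
            if (!visited.add(start)) {
                continue; // already assigned to a component
            }
            Set<Integer> component = new TreeSet<>();
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                int current = queue.poll();
                component.add(current);
                for (int other : complex) {
                    if (visited.contains(other)) {
                        continue;
                    }
                    float score = scores.getOrDefault(edgeKey(current, other), 0f);
                    if (score >= threshold) {
                        visited.add(other);
                        queue.add(other);
                    }
                }
            }
            parts.add(component);
        }
        return parts;
    }

    // Drops every component that consists of a single protein.
    static void removeSingletons(List<Set<Integer>> parts) {
        parts.removeIf(part -> part.size() < 2);
    }

    public static void main(String[] args) {
        Map<String, Float> scores = new HashMap<>();
        scores.put(edgeKey(1, 2), 0.9f);
        scores.put(edgeKey(2, 3), 0.2f); // below 0.5, so this edge is cut
        scores.put(edgeKey(3, 4), 0.7f);
        Set<Integer> complex = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5));

        List<Set<Integer>> parts = decompose(complex, scores, 0.5f);
        System.out.println(parts); // [[1, 2], [3, 4], [5]]
        removeSingletons(parts);
        System.out.println(parts); // [[1, 2], [3, 4]]
    }
}
```

In this toy example the weak edge between proteins 2 and 3 is cut, so the complex splits into two pairs plus the isolated protein 5, which is then dropped as a singleton.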


Filtering complexes by score

We now restrict the set to high-confidence complexes by removing every complex whose average edge score is below a given threshold (0.8 here).

BT893.removeComplexesByScore(bootstrap, 0.8f, false);
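
The filtering criterion can be illustrated with a small stand-alone sketch. It is only an approximation under stated assumptions: the class ScoreFilterSketch and its methods are hypothetical, the average is taken over all protein pairs of a complex, pairs without a score count as 0, and the boolean argument of removeComplexesByScore is not modelled at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; this is not ProCope's implementation.
class ScoreFilterSketch {

    // Builds an order-independent key for the undirected edge (a, b).
    static String edgeKey(int a, int b) {
        return Math.min(a, b) + "-" + Math.max(a, b);
    }

    // Mean score over all protein pairs of the complex.
    // Assumption: a pair without an entry in the map has score 0.
    static float averageEdgeScore(List<Integer> complex, Map<String, Float> scores) {
        int pairs = 0;
        float sum = 0f;
        for (int i = 0; i < complex.size(); i++) {
            for (int j = i + 1; j < complex.size(); j++) {
                sum += scores.getOrDefault(
                        edgeKey(complex.get(i), complex.get(j)), 0f);
                pairs++;
            }
        }
        return pairs == 0 ? 0f : sum / pairs;
    }

    // Removes, in place, every complex whose average is below the threshold.
    static void removeByScore(List<List<Integer>> complexes,
                              Map<String, Float> scores, float threshold) {
        complexes.removeIf(c -> averageEdgeScore(c, scores) < threshold);
    }

    public static void main(String[] args) {
        Map<String, Float> scores = new HashMap<>();
        scores.put(edgeKey(1, 2), 0.9f);
        scores.put(edgeKey(1, 3), 0.9f);
        scores.put(edgeKey(2, 3), 0.9f);
        scores.put(edgeKey(4, 5), 0.5f);

        List<List<Integer>> complexes = new ArrayList<>();
        complexes.add(Arrays.asList(1, 2, 3)); // average 0.9, kept
        complexes.add(Arrays.asList(4, 5));    // average 0.5, removed

        removeByScore(complexes, scores, 0.8f);
        System.out.println(complexes); // [[1, 2, 3]]
    }
}
```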


Iterating over the complex set

Next we iterate over the filtered complex set and print each complex to the console. A ComplexSet can be iterated over its Complex objects, and each Complex over the internal integer IDs of its proteins. Note how the ProteinManager is used to map these internal IDs back to their String labels.

System.out.println("High-confidence complexes:");
for (Complex complex : BT893) {
    // iterate over all proteins
    for (int protein : complex) {
        System.out.print(ProteinManager.getLabel(protein)+" ");
    }
    System.out.println();
}
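
The two mechanisms used in this loop, iteration via the enhanced for statement and a mapping between integer IDs and labels, can be sketched independently of ProCope. The classes below (IterationSketch, LabelRegistry, SimpleComplex) are hypothetical stand-ins written for this illustration; they do not reproduce how ProteinManager or Complex are actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not part of ProCope.
class IterationSketch {

    // Hypothetical label registry: hands out consecutive integer IDs
    // and maps them back to their labels.
    static class LabelRegistry {
        private final Map<String, Integer> idsByLabel = new HashMap<>();
        private final List<String> labelsById = new ArrayList<>();

        // Returns the ID of a label, assigning a new one on first use.
        int getId(String label) {
            Integer id = idsByLabel.get(label);
            if (id == null) {
                id = labelsById.size();
                idsByLabel.put(label, id);
                labelsById.add(label);
            }
            return id;
        }

        String getLabel(int id) {
            return labelsById.get(id);
        }
    }

    // Hypothetical complex: implementing Iterable<Integer> is what makes
    // "for (int protein : complex)" compile.
    static class SimpleComplex implements Iterable<Integer> {
        private final List<Integer> proteins;

        SimpleComplex(Integer... ids) {
            this.proteins = Arrays.asList(ids);
        }

        @Override
        public Iterator<Integer> iterator() {
            return proteins.iterator();
        }
    }

    // Joins the labels of all proteins in a complex with single spaces.
    static String render(SimpleComplex complex, LabelRegistry registry) {
        StringBuilder sb = new StringBuilder();
        for (int protein : complex) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(registry.getLabel(protein));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LabelRegistry registry = new LabelRegistry();
        SimpleComplex complex = new SimpleComplex(
                registry.getId("ynr051c"), registry.getId("yer151c"));
        System.out.println(render(complex, registry)); // ynr051c yer151c
    }
}
```

Storing proteins as integers keeps the complexes compact and makes comparisons cheap; labels are only looked up when something has to be shown to the user.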


Writing the complexes to a file

Finally we write the filtered complex set to a file using the ComplexSetWriter class.

try {
    ComplexSetWriter.writeComplexes(BT893, "bt_highconfidence.txt");
} catch (IOException e) {
    System.err.println("Could not write file.");
    System.err.println(e.getMessage());
    System.exit(1);
}


Output

The output of the program should look like this:

Loading datasets...
Decomposing...
Original complexes:  893
After decomposition: 302
Without singletons:  302
High-confidence complexes:
ymr125w ypl178w ykl214c
yjl006c ykl139w yml112w
ypl171c yhr179w
ynr051c yer151c
yfl022c ylr060w



