Protein manager


The Protein manager maps protein string identifiers like YAL016W or YBR148W to internal integer IDs for performance reasons. Equal strings will always get the same internal ID. All methods and data objects exclusively use these internal identifiers. Name mappings and the automatic extraction of substrings using regular expressions are also implemented within the Protein manager.

All mentioned features are implemented as static methods in the ProteinManager class.
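To illustrate the idea of mapping strings to internal integer IDs, here is a minimal sketch of such a registry. It is not ProCope's implementation. The class name IdRegistry is hypothetical, and the choice to start numbering at 1 is an assumption that happens to match the output shown at the end of this page. A forward map gives string-to-ID lookups and a reverse list gives ID-to-string lookups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a string-to-integer ID registry; not ProCope's source.
public class IdRegistry {
    // Forward map: identifier string -> internal ID
    private static final Map<String, Integer> idByLabel = new HashMap<>();
    // Reverse list: index = internal ID, element = identifier string
    private static final List<String> labelById = new ArrayList<>();

    static {
        labelById.add(null); // reserve index 0 so the first real ID is 1 (assumption)
    }

    // Returns the existing ID for a label, or assigns the next free one.
    public static int getInternalID(String label) {
        Integer id = idByLabel.get(label);
        if (id == null) {
            id = labelById.size();
            labelById.add(label);
            idByLabel.put(label, id);
        }
        return id;
    }

    // Maps an internal ID back to its string identifier.
    public static String getLabel(int id) {
        return labelById.get(id);
    }

    public static void main(String[] args) {
        int a = getInternalID("YAL016W");
        int b = getInternalID("YBR148W");
        int c = getInternalID("YAL016W");
        System.out.println(a + ", " + b + ", " + c + " -> " + getLabel(b)); // 1, 2, 1 -> YBR148W
    }
}
```

Keeping both directions means lookups either way are constant-time, at the cost of storing each label twice.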

Example code

Below are some code examples that demonstrate how to work with the ProteinManager class. The full source code can be found in the procope.examples.ProtManager class. Note that name mappings are not covered here; examples of them can be found in the Use case 1 code.

Mapping string identifiers to internal IDs

The ProteinManager.getInternalID method maps string identifiers to internal integer IDs; the same string always yields the same internal ID. By default, the protein manager is not case-sensitive, i.e. "ABC", "abc" and "Abc" map to the same integer. In this example we explicitly activate case sensitivity using the ProteinManager.setCaseSensitivity method.

ProteinManager.setCaseSensitivity(true);
int id1 = ProteinManager.getInternalID("PROTID_A");
int id2 = ProteinManager.getInternalID("PROTID_B");
int id3 = ProteinManager.getInternalID("PROTID_A");
int id4 = ProteinManager.getInternalID("protid_b");

System.out.println(id1+", " + id2 + ", " + id3 + ", " + id4); 

Note that id1==id3 but id2!=id4 (due to case sensitivity).
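One plausible way to get the documented case-insensitive default is to normalize each identifier before looking it up. The sketch below assumes upper-case folding with Locale.ROOT; the class CaseAwareIds is hypothetical, and ProCope may normalize differently. It replays the four calls from the example above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of optional case sensitivity via key normalization.
public class CaseAwareIds {
    private static final Map<String, Integer> ids = new HashMap<>();
    private static boolean caseSensitive = false; // documented default: case-insensitive

    public static void setCaseSensitivity(boolean enabled) {
        caseSensitive = enabled;
    }

    public static int getInternalID(String label) {
        // When case-insensitive, "ABC", "abc" and "Abc" all collapse to the key "ABC"
        String key = caseSensitive ? label : label.toUpperCase(Locale.ROOT);
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size() + 1; // IDs start at 1
            ids.put(key, id);
        }
        return id;
    }

    public static void main(String[] args) {
        setCaseSensitivity(true);
        int id1 = getInternalID("PROTID_A");
        int id2 = getInternalID("PROTID_B");
        int id3 = getInternalID("PROTID_A");
        int id4 = getInternalID("protid_b");
        System.out.println(id1 + ", " + id2 + ", " + id3 + ", " + id4); // 1, 2, 1, 3
    }
}
```

With case sensitivity switched off, "protid_b" would instead fold to "PROTID_B" and reuse ID 2.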

Regular expressions

Sometimes you might want to extract certain parts of string identifiers read from the file system. In the ProCope API you can use regular expressions for this:

// Capture the non-whitespace core of the identifier, dropping surrounding blanks
ProteinManager.setRegularExpression("\\s*(\\S+)\\s*");
int id5 = ProteinManager.getInternalID("     PROTID_C     ");

ProCope applies the regular expression to each identifier string and uses the first capturing group as the result. We verify below that the extraction worked.

First, we deactivate the regular expression again so that subsequent identifiers are used unchanged:

ProteinManager.unsetRegularExpression();

Now display the result of the extraction:

System.out.println("String for " + id5 + ": " + ProteinManager.getLabel(id5));

This prints "PROTID_C" (not "     PROTID_C     "). The line also shows how to map internal IDs back to string identifiers using the getLabel method.
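Whether trailing whitespace ends up in the first capturing group depends on the pattern. The plain java.util.regex sketch below, independent of ProCope, compares two patterns. It assumes the whole identifier must match (Matcher.matches()); whether ProCope uses matches() or find() internally is not stated here, and the fallback of returning the input unchanged is likewise a hypothetical choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain java.util.regex illustration of "use the first capturing group".
// Assumption: the whole input must match; on no match the input is returned as-is.
public class CaptureDemo {
    public static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : input;
    }

    public static void main(String[] args) {
        String raw = "     PROTID_C     ";
        // Greedy (.*) gives back only what \s+ needs (one blank), so 4 trailing blanks remain
        System.out.println("[" + extract("\\s*(.*)\\s+", raw) + "]");  // [PROTID_C    ]
        // (\S+) stops at the first whitespace character
        System.out.println("[" + extract("\\s*(\\S+)\\s*", raw) + "]"); // [PROTID_C]
    }
}
```

A reluctant group such as (.*?) also works with matches(), but (\S+) avoids depending on how the pattern is applied, provided identifiers contain no internal whitespace.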

Adding annotations

We add some arbitrary numeric annotations to some proteins in our set:

ProteinManager.addAnnotation(id1, "value", 0.2f);
ProteinManager.addAnnotation(id2, "value", 0.4f);
ProteinManager.addAnnotation(id5, "value", 0.6f);
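Conceptually, each protein carries its own set of key/value annotations. The sketch below models that as a nested map from protein ID to annotation key to value; the class AnnotationStore and its getAnnotation method are hypothetical and do not describe ProCope's internal storage.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-protein key/value annotations; not ProCope's source.
public class AnnotationStore {
    // protein ID -> (annotation key -> value)
    private static final Map<Integer, Map<String, Object>> annotations = new HashMap<>();

    public static void addAnnotation(int protein, String key, Object value) {
        // Create the per-protein map on first use, then store or overwrite the value
        annotations.computeIfAbsent(protein, p -> new HashMap<>()).put(key, value);
    }

    // Returns null if the protein has no annotation under this key.
    public static Object getAnnotation(int protein, String key) {
        Map<String, Object> perProtein = annotations.get(protein);
        return perProtein == null ? null : perProtein.get(key);
    }

    public static void clearAnnotations() {
        annotations.clear();
    }

    public static void main(String[] args) {
        addAnnotation(1, "value", 0.2f);
        addAnnotation(2, "value", 0.4f);
        addAnnotation(4, "value", 0.6f);
        System.out.println(getAnnotation(2, "value")); // 0.4
        System.out.println(getAnnotation(3, "value")); // null
    }
}
```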

Saving and loading annotations

For demonstration purposes, we save the annotations to a file, delete all annotations from the protein manager and reload them from the file we just wrote.

ProteinManager.saveProteinAnnotations("annotations");
ProteinManager.clearAnnotations();
ProteinManager.loadProteinAnnotations("annotations");
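The save/clear/reload cycle is a round trip: what is loaded should equal what was saved. The sketch below shows such a round trip with a hypothetical tab-separated format ("label, key, value" per line); ProCope's actual annotation file format is not described here and may differ. The class AnnotationFile is likewise hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical format "label<TAB>key<TAB>value"; ProCope's real file format may differ.
public class AnnotationFile {
    // data: protein label -> (annotation key -> value)
    public static void save(Map<String, Map<String, Float>> data, Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Map<String, Float>> protein : data.entrySet()) {
            for (Map.Entry<String, Float> ann : protein.getValue().entrySet()) {
                lines.add(protein.getKey() + "\t" + ann.getKey() + "\t" + ann.getValue());
            }
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static Map<String, Map<String, Float>> load(Path file) throws IOException {
        Map<String, Map<String, Float>> data = new HashMap<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] parts = line.split("\t");
            if (parts.length != 3) continue; // skip malformed lines
            data.computeIfAbsent(parts[0], k -> new HashMap<>())
                .put(parts[1], Float.parseFloat(parts[2]));
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Map<String, Float>> data = new HashMap<>();
        data.computeIfAbsent("PROTID_A", k -> new HashMap<>()).put("value", 0.2f);
        data.computeIfAbsent("PROTID_C", k -> new HashMap<>()).put("value", 0.6f);
        Path file = Files.createTempFile("annotations", ".tsv");
        save(data, file);
        System.out.println(load(file).equals(data)); // true
        Files.delete(file);
    }
}
```

The sketch writes string labels rather than internal IDs, since internal IDs are assigned at runtime and need not be the same in a later session.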

Filtering annotations

Finally, we determine all proteins whose "value" annotation is greater than or equal to 0.3. Note that proteins without this annotation never match the boolean expression and are therefore not contained in the result.

Set<Integer> filtered =
    ProteinManager.getFilteredProteins(new BooleanExpression("value >= 0.3"));

Output the result to the console:

for (int protein : filtered) {
    System.out.println(ProteinManager.getLabel(protein));
}
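The rule that proteins lacking the annotation never match can be made explicit with a plain loop. This is a hypothetical sketch of a single ">= threshold" test over a nested annotation map, not ProCope's BooleanExpression implementation; the class FilterDemo and method filterAtLeast are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of ">= threshold" filtering; not ProCope's BooleanExpression.
public class FilterDemo {
    // Returns all proteins whose annotation `key` is a number >= threshold.
    // Proteins without that annotation are skipped, so they never match.
    public static Set<Integer> filterAtLeast(Map<Integer, Map<String, Object>> annotations,
                                             String key, double threshold) {
        Set<Integer> result = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<Integer, Map<String, Object>> e : annotations.entrySet()) {
            Object v = e.getValue().get(key);
            if (v instanceof Number && ((Number) v).doubleValue() >= threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, Object>> annotations = new HashMap<>();
        annotations.put(1, Map.<String, Object>of("value", 0.2f));
        annotations.put(2, Map.<String, Object>of("value", 0.4f));
        annotations.put(3, Map.<String, Object>of("other", 1.0f)); // no "value" annotation
        annotations.put(4, Map.<String, Object>of("value", 0.6f));
        System.out.println(filterAtLeast(annotations, "value", 0.3)); // [2, 4]
    }
}
```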

Output

The output of the program should look like this:

1, 2, 1, 3
String for 4: PROTID_C
PROTID_B
PROTID_C





ProCope documentation