Name mappings

There are often multiple identifiers for a single protein. For instance, the Saccharomyces Genome Database (SGD) lists standard names, systematic names and SGDIDs. This causes problems when combining data from different sources and authors. ProCope can load so-called name mappings, which are lists of protein identifiers together with their synonyms. When data objects are read from the file system, synonyms are automatically translated to their target identifiers, so all names of a protein are treated as the same protein.
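A minimal Java sketch of the idea, assuming the mapping is already available as a synonym-to-target map. The class SynonymResolver and its methods are hypothetical illustrations, not part of ProCope's API. The two map entries come from the yeast mapping file described in this section. Proteins that two sources report under different identifiers collapse onto one entry once every identifier is resolved. Map.of and List.of require Java 9 or later.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical illustration (not ProCope's API): resolves identifiers against
 * a synonym -> target map so that synonyms collapse onto the same protein.
 */
final class SynonymResolver {
    private final Map<String, String> synonymToTarget;

    SynonymResolver(Map<String, String> synonymToTarget) {
        this.synonymToTarget = synonymToTarget;
    }

    /** Returns the target identifier, or the identifier itself if it has no mapping. */
    String resolve(String id) {
        return synonymToTarget.getOrDefault(id, id);
    }

    /** Merges protein lists from several sources into one set of resolved identifiers. */
    Set<String> mergeSources(List<List<String>> sources) {
        Set<String> proteins = new LinkedHashSet<>(); // keeps first-seen order
        for (List<String> source : sources) {
            for (String id : source) {
                proteins.add(resolve(id));
            }
        }
        return proteins;
    }

    public static void main(String[] args) {
        SynonymResolver resolver = new SynonymResolver(
                Map.of("S000002143", "YAL069W", "S000002142", "YAL068C"));
        // Source A uses SGDIDs, source B uses systematic names.
        Set<String> merged = resolver.mergeSources(List.of(
                List.of("S000002143", "S000002142"),
                List.of("YAL069W", "YAL067C")));
        System.out.println(merged); // [YAL069W, YAL068C, YAL067C]
    }
}
```

Unmapped identifiers such as YAL067C pass through unchanged here; that is an assumption of the sketch about how unknown names are handled.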

Name mappings are loaded from files as directed networks. A mapping file contains one mapping per line; each line contains a target identifier and one of its synonyms, separated by a TAB character.
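The file format can be sketched as follows. MappingFileReader and readPairs are hypothetical names, not ProCope's API. The sketch reads each non-empty line as a directed edge from the first column to the second and rejects lines that do not hold exactly two non-empty, TAB-separated identifiers; deciding which column is the synonym is left to the caller. Map.entry requires Java 9 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not ProCope's API): reads a name mapping file with one
 * TAB-separated identifier pair per line.
 */
final class MappingFileReader {

    /** Returns each line as a directed edge: first column -> second column. */
    static List<Map.Entry<String, String>> readPairs(BufferedReader in) {
        List<Map.Entry<String, String>> edges = new ArrayList<>();
        int lineNo = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (line.trim().isEmpty()) {
                    continue; // skip empty lines, e.g. at the end of the file
                }
                String[] fields = line.split("\t");
                if (fields.length != 2 || fields[0].trim().isEmpty() || fields[1].trim().isEmpty()) {
                    throw new IllegalArgumentException("line " + lineNo
                            + ": expected two non-empty TAB-separated identifiers");
                }
                edges.add(Map.entry(fields[0].trim(), fields[1].trim()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return edges;
    }

    public static void main(String[] args) {
        String text = "S000002143\tYAL069W\nS000002142\tYAL068C\n";
        for (Map.Entry<String, String> e : readPairs(new BufferedReader(new StringReader(text)))) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

For a real file, the reader could be obtained with java.nio.file.Files.newBufferedReader instead of wrapping a string.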

All parts of ProCope support name mappings.

Example mappings

The ProCope package contains ready-made name mappings for yeast in the file data/yeastmappings_YYMMDD.txt. They map different kinds of identifiers (UniProt identifiers, SGDIDs, standard names) to systematic names. The file was assembled from the UniProt yeast data and from Yeast Genome Database flat files.


Some example entries from the file:

S000002143      YAL069W
S000028594      YAL068W-A
S000002142      YAL068C
S000028593      YAL067W-A
S000000062      YAL067C
S000000061      YAL066W
S000001817      YAL065C


When this name mapping is loaded, every occurrence of S000002143 is treated as YAL069W, every occurrence of S000028594 as YAL068W-A, and so on.
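To illustrate that the translation applies to every occurrence, the following hypothetical sketch (PairTranslator is not part of ProCope) rewrites both endpoints of a list of protein pairs, such as interactions, using two entries from the example file above. The pairs themselves are made-up input for illustration. Map.of and List.of require Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not ProCope's API): translates every identifier in a
 * list of protein pairs using a synonym -> target map.
 */
final class PairTranslator {

    static List<String[]> translate(List<String[]> pairs, Map<String, String> synonymToTarget) {
        List<String[]> result = new ArrayList<>();
        for (String[] pair : pairs) {
            // Both endpoints are looked up; unmapped identifiers pass through unchanged.
            result.add(new String[] {
                    synonymToTarget.getOrDefault(pair[0], pair[0]),
                    synonymToTarget.getOrDefault(pair[1], pair[1])
            });
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = Map.of(
                "S000002143", "YAL069W",
                "S000028594", "YAL068W-A");
        List<String[]> pairs = List.of(
                new String[] {"S000002143", "S000028594"}, // made-up pair using SGDIDs
                new String[] {"YAL069W", "YAL067C"});      // made-up pair using systematic names
        for (String[] p : translate(pairs, mapping)) {
            System.out.println(p[0] + " " + p[1]);
        }
        // prints:
        // YAL069W YAL068W-A
        // YAL069W YAL067C
    }
}
```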

Identifier order in the file

You can tell ProCope whether the first identifier in each line of a mapping file is the synonym and the second one the target, or vice versa. The yeast mapping file shown above lists the synonym first.
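A minimal sketch of how the orientation choice could be handled when building the lookup table. The class MappingOrientation and the flag synonymFirst are hypothetical and do not correspond to an actual ProCope option. For a file like the yeast example, which lists the synonym first, the flag would be true. List.of requires Java 9 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not ProCope's API): builds a synonym -> target lookup
 * from identifier pairs, where the caller states which column is the synonym.
 */
final class MappingOrientation {

    static Map<String, String> buildLookup(List<String[]> lines, boolean synonymFirst) {
        Map<String, String> synonymToTarget = new HashMap<>();
        for (String[] fields : lines) {
            String synonym = synonymFirst ? fields[0] : fields[1];
            String target = synonymFirst ? fields[1] : fields[0];
            // If a synonym occurs on several lines, the last target wins (sketch assumption).
            synonymToTarget.put(synonym, target);
        }
        return synonymToTarget;
    }

    public static void main(String[] args) {
        List<String[]> lines = List.of(
                new String[] {"S000002143", "YAL069W"},
                new String[] {"S000002142", "YAL068C"});
        // Synonym first, as in the yeast example file: SGDID -> systematic name.
        System.out.println(buildLookup(lines, true).get("S000002143")); // YAL069W
        // Same lines read the other way round: systematic name -> SGDID.
        System.out.println(buildLookup(lines, false).get("YAL069W"));   // S000002143
    }
}
```

How ProCope itself resolves a synonym that maps to several targets is not covered by this sketch.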



ProCope documentation