Name mapping

Name mappings

A single protein often has multiple identifiers. For instance, the Saccharomyces Genome Database (SGD) provides standard names, systematic names and SGDIDs. This causes problems when working with data from different sources and authors. In ProCope you can load so-called name mappings, which are lists of protein identifiers along with their synonyms. When data objects are read from the file system, synonyms are automatically translated to their target identifiers and are thus treated as the same proteins.

Name mappings are loaded from files as directed networks. A mapping file contains one mapping per line; each line contains the target identifier and its synonym, separated by a TAB character.
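The file format described above can be illustrated with a minimal Python sketch. This is not ProCope code: the function names `parse_mappings` and `resolve` are hypothetical, and following chains of synonyms is only an assumption about what loading the mappings as a directed network implies.

```python
def parse_mappings(lines):
    """Parse 'target<TAB>synonym' lines into a synonym -> target dict."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        target, synonym = line.split("\t")
        mapping[synonym.strip()] = target.strip()
    return mapping


def resolve(identifier, mapping):
    """Follow synonym -> target edges until an unmapped identifier is reached."""
    seen = set()  # guards against cyclic mappings (assumption: cycles may occur)
    while identifier in mapping and identifier not in seen:
        seen.add(identifier)
        identifier = mapping[identifier]
    return identifier


# Default column order: target first, synonym second.
lines = ["YAL069W\tS000002143", "YAL068C\tS000002142"]
mapping = parse_mappings(lines)
print(resolve("S000002143", mapping))  # YAL069W
print(resolve("YAL001C", mapping))     # unmapped identifiers are returned unchanged
```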

All parts of ProCope support name mappings:

In the GUI, use Name mappings from the Tools menu
All command line tools which load data containing protein identifiers support the -namemap option
For the Java API, read the documentation of the ProteinManager class or check out Use Case 1.

Example mappings

The ProCope package contains existing name mappings for yeast in the file data/yeastmappings_YYMMDD.txt. It maps different kinds of identifiers (UniProt, SGDIDs, standard names) to systematic names. The file was assembled from the UniProt Yeast data and from Yeast Genome Database flat files.

Some example entries from the file:

S000002143      YAL069W

S000028594      YAL068W-A

S000002142      YAL068C

S000028593      YAL067W-A

S000000062      YAL067C

S000000061      YAL066W

S000001817      YAL065C

When this name mapping is loaded, each occurrence of S000002143 will be treated as YAL069W, each occurrence of S000028594 will be treated as YAL068W-A, and so on.
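The effect of such a translation can be sketched in a few lines of Python. This only illustrates the idea and does not show how ProCope implements it; the `proteins` list is made-up input data, and the entries are written with an explicit TAB separator.

```python
# Two of the sample entries, written as synonym<TAB>target.
entries = [
    "S000002143\tYAL069W",
    "S000028594\tYAL068W-A",
]

# Build a synonym -> target lookup table.
synonym_to_target = dict(line.split("\t") for line in entries)

# Hypothetical input data that mixes SGDIDs and systematic names.
proteins = ["S000002143", "YAL069W", "S000028594", "YBR160W"]

# Replace each synonym by its target; unknown identifiers stay unchanged.
translated = [synonym_to_target.get(p, p) for p in proteins]
print(translated)
# ['YAL069W', 'YAL069W', 'YAL068W-A', 'YBR160W']

# Four distinct identifiers collapse to three distinct proteins.
print(len(set(proteins)), "->", len(set(translated)))  # 4 -> 3
```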

Identifier order in the file

You can tell ProCope whether you want the first identifier in each name mapping line to be the synonym and the second one the target or vice versa.

The GUI will ask you for this information automatically
The command line tools which support the -namemap option also support the -synfirst switch

Without the -synfirst switch, the first identifier will be the target and the second the synonym
Accordingly, when using the -synfirst switch, the first identifier will be the synonym

For the Java API, read the documentation of the ProteinManager class or check out Use Case 1.
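The two column orders can be illustrated with a small Python sketch. The function `parse_line` and its `syn_first` parameter are hypothetical names that merely mirror the behaviour of the -synfirst switch described above; they are not part of ProCope.

```python
def parse_line(line, syn_first=False):
    """Return (synonym, target) for one TAB-separated mapping line."""
    first, second = line.rstrip("\n").split("\t")
    if syn_first:
        return first, second  # first column is the synonym
    return second, first      # default: first column is the target


# Both calls describe the same mapping, written in the two possible orders.
print(parse_line("YAL069W\tS000002143"))                  # ('S000002143', 'YAL069W')
print(parse_line("S000002143\tYAL069W", syn_first=True))  # ('S000002143', 'YAL069W')
```

The sample entries shown earlier list the SGDID first and the systematic name second, so in this sketch they would be read with `syn_first=True`.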
