Name mappings
There are often multiple identifiers for a single protein. For instance, in the Saccharomyces Genome Database (SGD) you will find standard names, systematic names and SGDIDs.
This causes problems when working with data from different sources and
authors. In ProCope you can load so-called name mappings, which are
lists of protein identifiers along with their synonyms. When data objects
are read from the file system, synonyms are automatically translated to
their original identifiers and are thus treated as the same proteins.
Name mappings are loaded from files as a directed network. A mapping file
contains one mapping per line. Each line contains the target identifier and its synonym,
separated by a TAB character.
All parts of ProCope support name mappings:
- In the GUI, use Name mappings from the Tools menu
- All command line tools which load data containing protein identifiers support the -namemap option
- For the Java API, read the documentation of the ProteinManager class or check out Use Case 1.
Example mappings
The ProCope package contains existing name mappings for yeast in the file data/yeastmappings_YYMMDD.txt. It maps different kinds of identifiers (Uniprot, SGDIDs, standard names) to systematic names. The file was assembled from the Uniprot Yeast data and from Yeast Genome Database flat files.
Some example entries from the file:
S000002143 YAL069W
S000028594 YAL068W-A
S000002142 YAL068C
S000028593 YAL067W-A
S000000062 YAL067C
S000000061 YAL066W
S000001817 YAL065C
When this name mapping is loaded, each occurrence of S000002143 will be treated as YAL069W, each occurrence of S000028594 will be treated as YAL068W-A, and so on.
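The translation described above can be sketched in a few lines of Java. This is only an illustration of the idea, not ProCope's implementation: the class NameMappingSketch and its methods are hypothetical names and are not part of the ProCope API. The sketch assumes that the first column of each line is the synonym and the second the target, as the example entries suggest, and that the two columns are separated by a TAB character.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; hypothetical class, not part of the ProCope API. */
final class NameMappingSketch {

    /** Parses "synonym<TAB>target" lines into a synonym-to-target lookup table. */
    static Map<String, String> parseSynonymFirst(List<String> lines) {
        Map<String, String> synonymToTarget = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] fields = line.split("\t");
            if (fields.length != 2) {
                throw new IllegalArgumentException(
                        "Expected two TAB-separated identifiers: " + line);
            }
            synonymToTarget.put(fields[0].trim(), fields[1].trim());
        }
        return synonymToTarget;
    }

    /** Returns the target of a known synonym; other identifiers are returned unchanged. */
    static String translate(Map<String, String> synonymToTarget, String identifier) {
        return synonymToTarget.getOrDefault(identifier, identifier);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "S000002143\tYAL069W",
                "S000028594\tYAL068W-A");
        Map<String, String> mapping = parseSynonymFirst(lines);
        System.out.println(translate(mapping, "S000002143")); // prints YAL069W
        System.out.println(translate(mapping, "YAL068C"));    // not a synonym, prints YAL068C
    }
}
```

Identifiers that do not appear as synonyms pass through unchanged, so data that already uses systematic names is not affected by loading the mapping.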
Identifier order in the file
You can tell ProCope whether you want the first identifier in each name
mapping line to be the synonym and the second one the target or vice
versa.
- The GUI will ask you for this information automatically
- The command line tools which support the -namemap option also support the -synfirst switch
- Without the -synfirst switch, the first identifier will be the target and the second the synonym
- Accordingly, when using the -synfirst switch, the first identifier will be the synonym and the second the target
- For the Java API, read the documentation of the ProteinManager class or check out Use Case 1.
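The effect of the identifier order can be illustrated with a small Java sketch. The class IdentifierOrderSketch and the boolean parameter synonymFirst are hypothetical and only mirror the behaviour described for the -synfirst switch. They are not taken from the ProCope API.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; hypothetical class, not part of the ProCope API. */
final class IdentifierOrderSketch {

    /** Adds one TAB-separated mapping line to the table, honouring the column order. */
    static void addLine(Map<String, String> synonymToTarget, String line, boolean synonymFirst) {
        String[] fields = line.split("\t");
        if (fields.length != 2) {
            throw new IllegalArgumentException(
                    "Expected two TAB-separated identifiers: " + line);
        }
        // Default order: target first, synonym second. Reversed if synonymFirst is set.
        String synonym = synonymFirst ? fields[0] : fields[1];
        String target = synonymFirst ? fields[1] : fields[0];
        synonymToTarget.put(synonym, target);
    }

    public static void main(String[] args) {
        Map<String, String> targetFirst = new HashMap<>();
        addLine(targetFirst, "YAL069W\tS000002143", false); // default order

        Map<String, String> synFirst = new HashMap<>();
        addLine(synFirst, "S000002143\tYAL069W", true);     // synonym-first order

        // Both tables map the synonym S000002143 to the target YAL069W.
        System.out.println(targetFirst.equals(synFirst));   // prints true
    }
}
```

Choosing the wrong order silently reverses every mapping, so it is worth checking a few entries of a mapping file before loading it.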