Datasets
The package uses three main types of dataset objects: complex sets,
purification data sets and protein-protein interaction (PPI) networks.
Below is an introduction to what these objects represent and how they
relate to each other.
Complex sets
A complex set represents a list of protein complexes. Such a set can
either be the result of a prediction based on experimental
data, an experimentally determined reference set (see below) or any
other collection of protein sets (e.g. complexes derived from GO
annotations). The creation of a correct and complete set of complexes
for a given organism or cell (in a specific state) is the main goal for
the methods contained in this package.
Two reference sets of protein complexes are commonly used to evaluate
prediction results. Both sets are located in the data/complexes/
folder of this package.
- 102 complexes by Aloy et al., 2004
- A set of 217 complexes from the MIPS database (Mewes et al., 2004)
Note that these reference sets do not provide full coverage of the yeast proteome.
ProCope contains different methods to evaluate the quality
of predicted complex sets using reference complex sets or other
data such as GO annotations and localization data. See also: Complex evaluation
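As an illustration of what comparing a predicted complex set against a reference set can look like, here is a minimal sketch in Python. It is not ProCope's API and not one of its evaluation measures: the function names, the Jaccard-based matching and the `min_overlap` cutoff are all assumptions chosen for simplicity, and the protein names are placeholders.

```python
def jaccard(a, b):
    """Jaccard index of two protein sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def matched_fraction(predicted, reference, min_overlap=0.5):
    """Fraction of reference complexes that have at least one predicted
    complex with a Jaccard index >= min_overlap (hypothetical cutoff)."""
    if not reference:
        return 0.0
    hits = sum(
        1
        for ref in reference
        if any(jaccard(ref, pred) >= min_overlap for pred in predicted)
    )
    return hits / len(reference)


# Toy complex sets with placeholder protein names.
reference = [{"A", "B", "C"}, {"D", "E"}]
predicted = [{"A", "B"}, {"D", "F", "G"}]
score = matched_fraction(predicted, reference)
```

In this toy case only the first reference complex finds a sufficiently similar prediction, so half of the reference set counts as recovered. A measure like this only rewards overlap with known complexes, which is one reason incomplete reference sets limit what such an evaluation can show.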
Purification data sets
The main source of information used for the prediction of complexes in
ProCope comes from protein purification assays. A purification data
set contains a number of experiments, each recording the interactions
between the bait protein of that experiment and the set of preys it
purified. Two large data sets created using the Tandem Affinity
Purification (TAP) method were published in 2006 by Gavin et al. and
Krogan et al. Another set, produced with a different method, was
published earlier, in 2002, by Ho et al.
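The structure of such a data set can be sketched as a list of (bait, preys) records. The following Python snippet is only an illustration under that assumption: the `Purification` type and the `bait_prey_pairs` helper are hypothetical names, not part of ProCope, and the protein names are placeholders.

```python
from collections import namedtuple

# One purification experiment: a bait protein and the preys it pulled down.
Purification = namedtuple("Purification", ["bait", "preys"])

# Toy data set; a bait may be used in more than one experiment.
purifications = [
    Purification("P1", {"P2", "P3"}),
    Purification("P2", {"P1", "P4"}),
    Purification("P1", {"P2"}),
]


def bait_prey_pairs(experiments):
    """Yield one unordered {bait, prey} pair per observation,
    skipping the case where a bait purifies itself."""
    for exp in experiments:
        for prey in exp.preys:
            if prey != exp.bait:
                yield frozenset((exp.bait, prey))


pairs = list(bait_prey_pairs(purifications))
```

Using unordered pairs means that P1 purifying P2 and P2 purifying P1 count as observations of the same potential interaction.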
The bait-prey interactions in these purification data sets are used to
derive confidence values for potential interactions in the
proteome of an organism. The resulting PPI score network can then be
clustered to derive a set of predicted complexes. See also: Network evaluation
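The two steps described above, scoring and clustering, can be sketched end to end as follows. This is a deliberately naive illustration and not the scoring or clustering used by ProCope or by the published methods: the score (observations divided by the number of purifications in which either protein was the bait), the cutoff and the connected-components clustering are all assumptions, and every name in the code is hypothetical.

```python
from collections import Counter, defaultdict

# Toy purification data as (bait, preys) tuples with placeholder names.
purifications = [
    ("P1", {"P2", "P3"}),
    ("P2", {"P1"}),
    ("P3", {"P1", "P4"}),
    ("P3", {"P1"}),
    ("P5", {"P6"}),
]


def naive_scores(experiments):
    """Score each pair as (bait-prey observations) / (purifications in
    which either protein served as bait). Illustrative only."""
    observed, bait_runs = Counter(), Counter()
    for bait, preys in experiments:
        bait_runs[bait] += 1
        for prey in preys:
            if prey != bait:
                observed[frozenset((bait, prey))] += 1
    return {
        pair: count / sum(bait_runs[p] for p in pair)
        for pair, count in observed.items()
    }


def cluster(scores, cutoff):
    """Connected components of the graph of edges scoring >= cutoff."""
    neighbours = defaultdict(set)
    for pair, score in scores.items():
        if score >= cutoff:
            a, b = tuple(pair)
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, complexes = set(), []
    for start in neighbours:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal from the start protein
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        complexes.append(component)
    return complexes


scores = naive_scores(purifications)
complexes = cluster(scores, cutoff=0.6)
```

With this toy input the P3-P4 pair is seen in only one of the two P3 purifications and falls below the cutoff, so P4 is left out and the remaining proteins form two predicted complexes.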
Protein interaction networks
As described above, protein interaction score networks can be derived
from purification data. Such a network consists of a set of nodes
which represent the interacting proteins and a set of edges describing
interactions between those proteins. In ProCope you can associate
numeric weights with each edge as well as arbitrary annotations given
as key/value pairs. This allows you to integrate many different kinds
of information in your PPI networks (aside from the actual confidence
scores, this could be literature reference information, user comments,
etc.).
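To make the data model concrete, here is a small Python sketch of an undirected network whose edges carry a numeric weight plus key/value annotations. It mirrors the description above but is not ProCope's implementation: the `ScoreNetwork` class, its method names and the annotation keys are assumptions made for this example.

```python
class ScoreNetwork:
    """Undirected protein network; each edge has a numeric weight and a
    dictionary of free-form key/value annotations."""

    def __init__(self):
        # Maps frozenset({a, b}) to {"weight": float, "annotations": dict}.
        self._edges = {}

    def set_edge(self, a, b, weight, **annotations):
        """Create the edge or update its weight; merge in any annotations."""
        key = frozenset((a, b))
        edge = self._edges.setdefault(key, {"weight": weight, "annotations": {}})
        edge["weight"] = weight
        edge["annotations"].update(annotations)

    def weight(self, a, b):
        return self._edges[frozenset((a, b))]["weight"]

    def annotations(self, a, b):
        return self._edges[frozenset((a, b))]["annotations"]

    def proteins(self):
        """All proteins that take part in at least one edge."""
        return set().union(*self._edges) if self._edges else set()


net = ScoreNetwork()
net.set_edge("P1", "P2", 0.87, source="literature", comment="checked by hand")
net.set_edge("P2", "P1", 0.90)  # same undirected edge: only the weight changes
```

Keying edges by an unordered pair keeps the network undirected, so a lookup works regardless of the order in which the two proteins are given.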
Usually different steps of manipulation, filtering and processing are
required to create a final network which can then be clustered.
ProCope contains various methods for working with protein networks. See also: Networks
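A processing chain of the kind mentioned above might look like the following Python sketch, which applies a score cutoff and then restricts the network to a set of proteins of interest. The two helper functions and the dictionary-based network representation are assumptions for illustration and do not correspond to ProCope's own methods.

```python
def filter_by_cutoff(network, cutoff):
    """Keep only edges whose score is at least the cutoff."""
    return {pair: s for pair, s in network.items() if s >= cutoff}


def restrict_to(network, proteins):
    """Keep only edges whose two endpoints both lie in the given protein set."""
    proteins = set(proteins)
    return {pair: s for pair, s in network.items() if pair <= proteins}


# Toy score network: unordered protein pair -> confidence score.
network = {
    frozenset(("P1", "P2")): 0.9,
    frozenset(("P2", "P3")): 0.2,
    frozenset(("P3", "P4")): 0.7,
}

# Each step returns a new network, so steps can be chained in any order.
confident = filter_by_cutoff(network, 0.5)
final = restrict_to(confident, {"P1", "P2", "P3"})
```

Because every step leaves its input untouched, intermediate networks remain available for inspection before the final one is passed on to clustering.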