Datasets

The package uses three main types of dataset objects: complex sets, purification data sets and protein-protein interaction networks. Below is an introduction to what these objects represent and how they relate to each other.

Complex sets

A complex set represents a list of protein complexes. Such a set can be the result of a prediction based on experimental data, an experimentally determined reference set (see below) or any other collection of protein sets (e.g. complexes derived from GO annotations). Creating a correct and complete set of complexes for a given organism or cell (in a specific state) is the main goal of the methods contained in this package.

Two reference sets of protein complexes are commonly used to evaluate prediction results. You can find both sets in the data/complexes/ folder of this package.

A set of 102 complexes by Aloy et al., 2004
A set of 217 complexes from the MIPS database (Mewes et al., 2004)

Note that these reference sets do not provide full coverage of the yeast proteome.

ProCope contains different methods to evaluate the quality of predicted complex sets using reference complex sets or other data like GO annotations and localization data. See also: Complex evaluation
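
To make the idea of comparing a predicted complex set against a reference set concrete, the following minimal Python sketch models a complex set as a list of protein sets and reports, for each predicted complex, its best Jaccard overlap with any reference complex. It is an illustration only: it does not use ProCope's API and is not necessarily one of ProCope's evaluation measures. The function names and protein identifiers are made up for the example.

```python
def jaccard(a, b):
    """Jaccard index of two protein sets: shared proteins divided by all proteins."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(predicted, reference):
    """For each predicted complex, return its highest overlap with any reference complex."""
    return [max((jaccard(p, r) for r in reference), default=0.0) for p in predicted]


# Hypothetical complex sets: each complex is a set of protein identifiers.
reference = [frozenset({"A", "B", "C"}), frozenset({"D", "E"})]
predicted = [frozenset({"A", "B"}), frozenset({"D", "E"}), frozenset({"X", "Y"})]

scores = best_matches(predicted, reference)
for complex_, score in zip(predicted, scores):
    print(sorted(complex_), round(score, 3))
```

A predicted complex that exactly reproduces a reference complex scores 1, while one that shares no protein with any reference complex scores 0.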

Purification data sets

The main information used for the prediction of complexes in ProCope comes from protein purification assays. Such a set of purifications contains a number of experiments, each recording interactions between the bait protein of that experiment and the set of preys it purified. Two large data sets created using the Tandem Affinity Purification (TAP) method were published in 2006 by Gavin et al. and Krogan et al. Another set, produced with a different method, was published earlier, in 2002, by Ho et al.
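
As a rough picture of what such a data set holds, the sketch below represents each experiment as a bait plus the set of preys it purified. The `Purification` class and the protein identifiers are hypothetical and only serve the illustration; they are not ProCope's classes or file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Purification:
    """One purification experiment: a bait and the preys it pulled down."""
    bait: str
    preys: frozenset


# Hypothetical data set; in this sketch the same bait appears in two experiments.
purifications = [
    Purification("A", frozenset({"B", "C"})),
    Purification("B", frozenset({"A", "C", "D"})),
    Purification("A", frozenset({"B"})),
]

# Collect every protein observed as bait or prey.
proteins = set()
for exp in purifications:
    proteins.add(exp.bait)
    proteins.update(exp.preys)

# Group the experiments by their bait protein.
by_bait = {}
for exp in purifications:
    by_bait.setdefault(exp.bait, []).append(exp)
```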

The bait-prey interactions in these purification data sets are used to derive confidence values for potential interactions in the proteome of an organism. The resulting PPI score network can then be clustered to derive a set of predicted complexes. See also: Network evaluation
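
The step from bait-prey observations to a score network can be sketched as follows. The scoring used here is a deliberately simple stand-in, assumed purely for illustration: it counts how often two proteins are seen as bait and prey of each other and scales the counts by the largest count. It is not one of the scoring schemes implemented in ProCope, and all names are hypothetical.

```python
from collections import Counter

# Hypothetical purifications as (bait, preys) pairs.
purifications = [
    ("A", {"B", "C"}),
    ("B", {"A", "C", "D"}),
    ("A", {"B"}),
]


def spoke_scores(purifications):
    """Count bait-prey observations per unordered protein pair, scaled by the largest count."""
    counts = Counter()
    for bait, preys in purifications:
        for prey in preys:
            if prey != bait:  # ignore a bait that purifies itself
                counts[frozenset((bait, prey))] += 1
    top = max(counts.values(), default=1)
    return {pair: n / top for pair, n in counts.items()}


scores = spoke_scores(purifications)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(sorted(pair), round(score, 3))
```

Only bait-prey pairs receive a score in this sketch; two preys of the same bait (here C and D) get no edge unless one of them is also observed as a bait for the other.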

Protein interaction networks

As described above, protein interaction score networks can be derived from purification data. Such a network consists of a set of nodes, which represent the interacting proteins, and a set of edges describing interactions between those proteins. In ProCope you can associate a numeric weight with each edge as well as arbitrary annotations given as key/value pairs. This allows you to integrate many different kinds of information in your PPI networks (aside from the actual confidence scores, this could be literature references, user comments etc.).
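
The following sketch shows one way such a network could be modeled: an undirected edge map that stores a weight and a dictionary of key/value annotations per edge. The `ScoreNetwork` class, its methods and the annotation keys are assumptions made for this example and do not mirror ProCope's own network classes. The `filtered` method illustrates one simple kind of processing step, a weight cutoff.

```python
class ScoreNetwork:
    """Undirected network with one weight and free-form annotations per edge."""

    def __init__(self):
        # Maps frozenset({p1, p2}) to {"weight": float, "annotations": dict}.
        self._edges = {}

    def set_edge(self, p1, p2, weight, **annotations):
        """Add or overwrite the edge between p1 and p2."""
        self._edges[frozenset((p1, p2))] = {
            "weight": weight,
            "annotations": dict(annotations),
        }

    def weight(self, p1, p2):
        """Return the edge weight, or None if the proteins are not connected."""
        edge = self._edges.get(frozenset((p1, p2)))
        return None if edge is None else edge["weight"]

    def annotations(self, p1, p2):
        """Return the annotations of the edge, or an empty dict if there is none."""
        edge = self._edges.get(frozenset((p1, p2)))
        return {} if edge is None else edge["annotations"]

    def nodes(self):
        """Return all proteins that take part in at least one edge."""
        return set().union(*self._edges)

    def filtered(self, cutoff):
        """Return a new network containing only edges with weight >= cutoff."""
        result = ScoreNetwork()
        result._edges = {k: v for k, v in self._edges.items() if v["weight"] >= cutoff}
        return result


net = ScoreNetwork()
net.set_edge("A", "B", 0.9, source="hypothetical reference", comment="manually checked")
net.set_edge("B", "C", 0.2)
strong = net.filtered(0.5)
```

Because edges are keyed by an unordered pair, `net.weight("A", "B")` and `net.weight("B", "A")` return the same value.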

Usually, several steps of manipulation, filtering and processing are required to create a final network which can then be clustered. ProCope contains various methods for working with protein networks. See also: Networks
