Complex set prediction
After generating a score network from a set of purification data, we
need to derive a set of protein complexes from this network. This is
achieved by clustering the network. Gavin et al., 2006 used
hierarchical agglomerative clustering, whereas Brohee et al., 2006 have
shown that Markov Clustering is the most efficient method for complex
set prediction. Both clustering methods are implemented in ProCope and
can easily be applied to any given score network.
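
To illustrate the idea behind Markov Clustering, the following is a minimal pure-Python sketch of the expansion/inflation loop on a toy score network. It is not ProCope's implementation or API: the function name `markov_cluster`, the dictionary-based network format, the self-loop weighting and the convergence tolerance are all assumptions made for this example.

```python
def markov_cluster(scores, inflation=2.0, max_iter=100, tol=1e-6):
    """Toy Markov Clustering on an undirected, positively weighted network.

    scores: dict mapping (protein_a, protein_b) -> positive score.
    Hypothetical helper for illustration only, not part of ProCope.
    """
    nodes = sorted({p for pair in scores for p in pair})
    idx = {p: i for i, p in enumerate(nodes)}
    n = len(nodes)

    # Symmetric weight matrix built from the score network.
    m = [[0.0] * n for _ in range(n)]
    for (a, b), w in scores.items():
        m[idx[a]][idx[b]] = w
        m[idx[b]][idx[a]] = w
    # Self loops (assumption: weight of the strongest incident edge).
    for i in range(n):
        m[i][i] = max(m[i])

    def normalize(mat):
        # Scale every column to sum to 1 (column-stochastic matrix).
        for j in range(n):
            s = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= s
        return mat

    m = normalize(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: raise entries to a power and renormalize columns.
        inflated = normalize([[v ** inflation for v in row]
                              for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break

    # Each row with remaining flow is an attractor; its non-zero columns
    # are the members of one cluster. Duplicate rows collapse in the set.
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two tightly connected triples joined by one weak score.
toy_network = {
    ("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): 1.0,
    ("D", "E"): 1.0, ("D", "F"): 1.0, ("E", "F"): 1.0,
    ("C", "D"): 0.1,
}
print(markov_cluster(toy_network, inflation=2.0))
```

In this sketch a larger inflation value sharpens the flow more aggressively and tends to yield smaller clusters. The dense matrix representation is only suitable for small examples.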
Clustering algorithms produce disjoint sets of proteins, whereas in vivo
a protein can be contained in multiple complexes. Authors who have
published sets of predicted protein complexes account for this in
different ways. For instance, Gavin et al. used iterative clustering and
introduced a core/module/attachment terminology. Our bootstrap approach
calculates such shared proteins based on the underlying score network.
ProCope also implements a shared protein computation method proposed by
Pu et al.
All clustering algorithms take parameters, e.g. the inflation coefficient
for Markov Clustering and the cutoff value for hierarchical clustering.
These parameters need to be tuned carefully in order to generate a
reliable set of protein complexes.
See also: Complex evaluation
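
As a rough illustration of how the cutoff value influences hierarchical clustering, the sketch below runs agglomerative clustering on a toy score network and stops merging once the best inter-cluster score falls below the cutoff. It is not ProCope's implementation: the function name `hierarchical_clusters`, the network format and the choice of average linkage (with missing edges counted as score 0) are assumptions made for this example.

```python
def hierarchical_clusters(scores, cutoff):
    """Toy average-linkage agglomerative clustering on a score network.

    scores: dict mapping (protein_a, protein_b) -> similarity score.
    cutoff: merging stops when the best linkage score is below this value.
    Hypothetical helper for illustration only, not part of ProCope.
    """
    sim = {frozenset(pair): w for pair, w in scores.items()}
    proteins = sorted({p for pair in sim for p in pair})
    # Start with every protein in its own cluster.
    clusters = [frozenset([p]) for p in proteins]

    def linkage(c1, c2):
        # Average score over all cross pairs; missing edges count as 0.
        total = sum(sim.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the pair of clusters with the highest linkage score.
        best, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < cutoff:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Two tightly connected triples joined by one weak score.
toy_network = {
    ("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): 1.0,
    ("D", "E"): 1.0, ("D", "F"): 1.0, ("E", "F"): 1.0,
    ("C", "D"): 0.1,
}
strict = hierarchical_clusters(toy_network, cutoff=0.5)
loose = hierarchical_clusters(toy_network, cutoff=0.01)
print(strict)
print(loose)
```

With the stricter cutoff the two triples stay separate, while the very low cutoff lets the single weak score merge everything into one cluster. This is why the parameter has to be chosen with care.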