Complex set prediction

After generating a score network from a set of purification data, we need to derive a set of protein complexes from this network. This is achieved by clustering the network. Gavin et al., 2006 used hierarchical agglomerative clustering, whereas Brohee et al., 2006 have shown that Markov Clustering is the most efficient method for complex set prediction. Both clustering methods are implemented in ProCope and can easily be applied to any given score network.
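To illustrate how Markov Clustering turns a weighted network into clusters, here is a minimal sketch in plain Python. It is not ProCope's implementation and does not use the ProCope API. The function names, the self-loop weight of 1.0, the convergence tolerance and the toy network are all assumptions made for this example.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def markov_cluster(nodes, edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected weighted network with a basic MCL iteration.

    nodes: list of protein names
    edges: dict mapping (protein_a, protein_b) to a positive score
    """
    n = len(nodes)
    index = {name: i for i, name in enumerate(nodes)}

    # Build a symmetric score matrix and add self-loops
    # (the self-loop weight of 1.0 is an assumption of this sketch).
    m = [[0.0] * n for _ in range(n)]
    for (a, b), weight in edges.items():
        m[index[a]][index[b]] = m[index[b]][index[a]] = weight
    for i in range(n):
        m[i][i] = 1.0
    m = normalize_columns(m)

    for _ in range(max_iter):
        # Expansion: one matrix squaring, i.e. two steps of a random walk.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong connections and weakens weak ones.
        inflated = normalize_columns(
            [[value ** inflation for value in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break

    # Rows with a positive diagonal entry are attractors; the columns with
    # positive entries in such a row form one cluster.
    clusters = {frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
                for i in range(n) if m[i][i] > 1e-6}
    return sorted(sorted(cluster) for cluster in clusters)


# Toy network: two tightly connected triangles joined by one weak edge.
toy_nodes = ["A", "B", "C", "D", "E", "F"]
toy_edges = {
    ("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): 1.0,
    ("D", "E"): 1.0, ("D", "F"): 1.0, ("E", "F"): 1.0,
    ("C", "D"): 0.1,
}
toy_clusters = markov_cluster(toy_nodes, toy_edges, inflation=2.0)
print(toy_clusters)
```

In this toy network the weak edge between C and D is suppressed by the inflation step, so the two triangles end up in separate clusters.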

Clustering algorithms produce disjoint sets of proteins, whereas in vivo a protein can be part of multiple complexes. Authors who have published sets of predicted protein complexes account for this in different ways. For instance, Gavin et al. used iterative clustering and introduced a core/module/attachment terminology. Our bootstrap approach calculates such shared proteins based on the underlying score network. ProCope also implements a shared protein computation method proposed by Pu et al.
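The general idea of deriving shared proteins from a score network can be sketched as follows. This is a generic illustration only: it is neither the bootstrap approach nor the method of Pu et al., and it does not use the ProCope API. The mean-score rule, the threshold and all names are assumptions made for this example.

```python
def add_shared_proteins(clusters, scores, threshold):
    """Add a protein to every other complex it is strongly connected to.

    clusters:  list of disjoint protein lists (the clustering result)
    scores:    dict mapping frozenset({protein_a, protein_b}) to a score
    threshold: minimum mean score between a protein and the members of
               another complex for the protein to be counted as shared
               (hypothetical rule for this sketch)
    """
    result = [set(cluster) for cluster in clusters]
    for home_idx, home in enumerate(clusters):
        for protein in home:
            for other_idx, other in enumerate(clusters):
                if other_idx == home_idx:
                    continue
                # Mean score between the protein and the original members
                # of the other complex; missing edges count as 0.
                mean_score = sum(scores.get(frozenset((protein, member)), 0.0)
                                 for member in other) / len(other)
                if mean_score >= threshold:
                    result[other_idx].add(protein)
    return [sorted(cluster) for cluster in result]


# Toy example: P3 belongs to the first complex but also scores well
# against both members of the second one.
example_clusters = [["P1", "P2", "P3"], ["P4", "P5"]]
example_scores = {
    frozenset(("P1", "P2")): 0.9,
    frozenset(("P2", "P3")): 0.8,
    frozenset(("P1", "P3")): 0.7,
    frozenset(("P4", "P5")): 0.6,
    frozenset(("P3", "P4")): 0.6,
    frozenset(("P3", "P5")): 0.5,
}
overlapping = add_shared_proteins(example_clusters, example_scores, 0.5)
print(overlapping)  # [['P1', 'P2', 'P3'], ['P3', 'P4', 'P5']]
```

Here P3 is reported in both complexes because its mean score to P4 and P5 is 0.55, which is above the chosen threshold of 0.5.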

All clustering algorithms have parameters, e.g. the inflation coefficient for Markov Clustering and the cutoff value for hierarchical clustering. These parameters need to be tuned carefully in order to generate a reliable set of protein complexes. See also: Complex evaluation
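The effect of the cutoff value can be seen in a small sketch of hierarchical agglomerative clustering. This is not ProCope's implementation. The choice of average linkage, the function names and the toy scores are assumptions made for this example.

```python
from itertools import combinations


def average_linkage(scores, cluster_a, cluster_b):
    """Mean score over all protein pairs between two clusters (missing = 0)."""
    total = sum(scores.get(frozenset((x, y)), 0.0)
                for x in cluster_a for y in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_clustering(proteins, scores, cutoff):
    """Merge the most similar pair of clusters until no pair reaches the cutoff."""
    clusters = [frozenset([protein]) for protein in proteins]
    while len(clusters) > 1:
        best_pair = max(combinations(clusters, 2),
                        key=lambda pair: average_linkage(scores, *pair))
        if average_linkage(scores, *best_pair) < cutoff:
            break  # remaining clusters are too dissimilar to merge
        merged = best_pair[0] | best_pair[1]
        clusters = [c for c in clusters if c not in best_pair] + [merged]
    return sorted(sorted(cluster) for cluster in clusters)


toy_proteins = ["P1", "P2", "P3", "P4", "P5"]
toy_scores = {
    frozenset(("P1", "P2")): 0.9,
    frozenset(("P2", "P3")): 0.8,
    frozenset(("P1", "P3")): 0.7,
    frozenset(("P4", "P5")): 0.6,
    frozenset(("P3", "P4")): 0.2,
}

# The same network yields different complex sets for different cutoffs.
for cutoff in (0.5, 0.7, 0.85):
    print(cutoff, hierarchical_clustering(toy_proteins, toy_scores, cutoff))
# 0.5  [['P1', 'P2', 'P3'], ['P4', 'P5']]
# 0.7  [['P1', 'P2', 'P3'], ['P4'], ['P5']]
# 0.85 [['P1', 'P2'], ['P3'], ['P4'], ['P5']]
```

A low cutoff merges weakly connected proteins into larger complexes, while a high cutoff leaves more proteins as singletons, which is why the resulting complex sets should be evaluated for several parameter values.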


ProCope documentation