Network evaluation

The PPI score networks should reflect the interactions that actually occur in living cells. Ideally, edges between interacting proteins would have a weight of 1 in the network, whereas edges between non-interacting protein pairs would have a weight of 0. Experimental data can be used not only to evaluate predicted complexes but also to assess the quality of a score network directly. ProCope currently contains two methods for this task, both based on reference complex sets.

Complex enrichment

A high-quality PPI network should have many high-scoring edges within the complexes of a reference dataset, whereas inter-complex edges should have low weights. The complex enrichment of a score network captures this property. The method uses the network to calculate the quotient of the average inner-complex edge score of the reference complex set and the average inner-complex edge score of a randomized version of that set. The higher this value, the more high-scoring edges lie within the correct complexes.

To reduce variation in the result caused by the random character of the method, the score should be averaged over multiple randomization runs. The ProCope GUI, for instance, performs 100 randomizations by default to calculate a complex enrichment score.
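The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ProCope's own implementation: the function names are hypothetical, the network is a plain dictionary keyed by protein pairs, pairs missing from the network are scored as 0, and the randomization shuffles protein labels across complexes while keeping the complex sizes. How ProCope itself randomizes the reference set is not specified here.

```python
import random
from itertools import combinations


def mean_inner_score(complexes, scores):
    """Average score over all protein pairs that share a complex.

    Assumption: pairs that are missing from the network count as score 0.
    """
    total, count = 0.0, 0
    for members in complexes:
        for a, b in combinations(sorted(members), 2):
            total += scores.get(frozenset((a, b)), 0.0)
            count += 1
    return total / count if count else 0.0


def randomize(complexes, rng):
    """Shuffle protein labels across complexes, keeping the complex sizes.

    Assumption: this is only one possible randomization scheme.
    """
    proteins = sorted({p for members in complexes for p in members})
    shuffled = proteins[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(proteins, shuffled))
    return [{mapping[p] for p in members} for members in complexes]


def complex_enrichment(complexes, scores, runs=100, seed=0):
    """Average the quotient reference/randomized over several runs."""
    rng = random.Random(seed)
    reference = mean_inner_score(complexes, scores)
    ratios = []
    for _ in range(runs):
        randomized = mean_inner_score(randomize(complexes, rng), scores)
        if randomized > 0:  # skip runs where the quotient is undefined
            ratios.append(reference / randomized)
    return sum(ratios) / len(ratios) if ratios else float("inf")
```

A value close to 1 would mean the reference complexes contain no more high-scoring edges than random groupings of the same proteins; larger values indicate enrichment.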

Receiver operating characteristic (ROC) curves

The quality of a score network can also be evaluated and compared using a ROC curve. In this diagram, the true-positive rate is plotted against the false-positive rate for decreasing score thresholds. To determine the true- and false-positive rates, each edge in the network is first assigned one of the following states:

true positive edge: the edge connects two proteins which are in the same complex.
false positive edge: the edge connects proteins from different complexes and the proteins are not colocalized (see also: Colocalization). Note that only a limited number of such negative edges is randomly sampled from the network, as there are too many possible negative edges. Our method generates 10 times as many negative edges as there are positive edges.
unknown: otherwise.

The negative set generation will also work without any localization data.
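The labeling rules above can be sketched as follows. This is an illustrative Python sketch, not ProCope's API: `label_edges` and its parameters are hypothetical names, "colocalized" is assumed to mean that two proteins share at least one annotated compartment, and edges involving proteins outside the reference set are assumed to be unknown. When no localization data is passed, no pair counts as colocalized, so every inter-complex edge is a negative candidate.

```python
import random


def label_edges(edges, complexes, localizations=None, ratio=10, seed=0):
    """Map each labeled edge to "TP" or "FP"; unlabeled edges are unknown.

    edges: list of (protein_a, protein_b) tuples from the score network
    complexes: list of sets of proteins (the reference complex set)
    localizations: optional {protein: set of compartments}
    ratio: negatives sampled per positive edge (10 in the text above)
    """
    localizations = localizations or {}
    # Index every reference protein by the complexes it belongs to.
    membership = {}
    for index, members in enumerate(complexes):
        for protein in members:
            membership.setdefault(protein, set()).add(index)

    positives, candidates = [], []
    for a, b in edges:
        in_a, in_b = membership.get(a), membership.get(b)
        if not in_a or not in_b:
            continue  # assumption: proteins outside the reference set -> unknown
        if in_a & in_b:
            positives.append((a, b))  # both proteins share a complex
        elif not (localizations.get(a, set()) & localizations.get(b, set())):
            candidates.append((a, b))  # different complexes, not colocalized

    # Sample at most `ratio` negative edges per positive edge.
    rng = random.Random(seed)
    wanted = min(len(candidates), ratio * len(positives))
    negatives = rng.sample(candidates, wanted)

    labels = {edge: "TP" for edge in positives}
    labels.update({edge: "FP" for edge in negatives})
    return labels
```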

To generate the ROC curve, we sort the list of edges by weight and iterate over the edges, beginning with the highest-scoring edge. The true-positive and false-positive counts are increased according to the states of the edges (see above); no count is increased for unknown edges. The true-positive rate is the quotient of the current true-positive count and the total number of true-positive edges. The false-positive rate is calculated analogously.
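The iteration just described can be written as a short Python sketch. The function name `roc_points` and the dictionary-based inputs are assumptions made for illustration and do not correspond to ProCope's classes. Edges without a label are treated as unknown, and tied weights are processed in arbitrary order rather than being merged into a single point.

```python
def roc_points(scored_edges, labels):
    """Return ROC points as (false-positive rate, true-positive rate) pairs.

    scored_edges: {edge: weight}
    labels: {edge: "TP" | "FP"}; edges without a label are unknown
    """
    # Totals are counted over labeled edges that occur in the network.
    total_pos = sum(1 for e in scored_edges if labels.get(e) == "TP")
    total_neg = sum(1 for e in scored_edges if labels.get(e) == "FP")
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative edge")

    tp = fp = 0
    points = [(0.0, 0.0)]
    # Walk the edges from the highest to the lowest weight.
    ranked = sorted(scored_edges.items(), key=lambda item: item[1], reverse=True)
    for edge, _weight in ranked:
        state = labels.get(edge)
        if state == "TP":
            tp += 1
        elif state == "FP":
            fp += 1
        else:
            continue  # unknown edges change neither count
        points.append((fp / total_neg, tp / total_pos))
    return points
```

Each point corresponds to the threshold at the weight of the edge just processed, so the curve runs from (0, 0) to (1, 1) as the threshold decreases.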

ProCope documentation