Network evaluation
A PPI score network should reflect the interactions that actually occur
in living cells as closely as possible. Ideally, edges between
interacting proteins would have a weight of 1, whereas edges between
non-interacting protein pairs would have a weight of 0. Experimental
data can be used to evaluate not only predicted complexes but also the
quality of a score network directly. ProCope currently provides two
methods for this task, both based on reference complex sets.
Complex enrichment
A high-quality PPI network should have many high-scoring edges within
the complexes of a reference dataset whereas inter-complex edges should
have a low weight. The complex enrichment of a score network measures
this property. It is the quotient of the average inner-complex edge
score for the reference complex set and the average inner-complex edge
score for a randomized version of that set. The higher this value, the
more high-scoring edges lie within the correct complexes.
To minimize variations in the result due to the random character of the
method, the average score of multiple randomization runs should be
taken. The ProCope GUI for instance performs 100 randomizations by
default to calculate a complex enrichment score.
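The calculation described above can be sketched in a few lines of Python. This is only an illustration, not ProCope's implementation: the function names, the dictionary-based network representation, the treatment of missing edges as weight 0, and the randomization scheme (shuffling proteins across complexes while keeping complex sizes) are all assumptions made for the example. The sketch averages the randomized baseline over all runs before forming the quotient, which is one possible reading of the averaging step.

```python
import random
from itertools import combinations


def avg_inner_score(network, complexes):
    """Mean edge weight over all protein pairs inside the given complexes.

    network maps frozenset({a, b}) to a weight; pairs without an edge
    count as 0.0 (an assumption of this sketch).
    """
    scores = [network.get(frozenset(pair), 0.0)
              for cplx in complexes
              for pair in combinations(cplx, 2)]
    return sum(scores) / len(scores) if scores else 0.0


def randomize(complexes, rng):
    """Shuffle proteins across complexes, keeping every complex size.

    This scheme is assumed for illustration only.
    """
    pool = [p for cplx in complexes for p in cplx]
    rng.shuffle(pool)
    shuffled, start = [], 0
    for cplx in complexes:
        shuffled.append(pool[start:start + len(cplx)])
        start += len(cplx)
    return shuffled


def complex_enrichment(network, complexes, runs=100, seed=0):
    """Quotient of the real and the averaged randomized inner-complex score."""
    rng = random.Random(seed)
    real = avg_inner_score(network, complexes)
    baseline = sum(avg_inner_score(network, randomize(complexes, rng))
                   for _ in range(runs)) / runs
    return real / baseline if baseline > 0 else float("inf")
```

A value clearly above 1 indicates that edges inside the reference complexes score higher than edges inside randomly composed complexes of the same sizes.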
Receiver operating characteristic (ROC) curves
The quality of a score network can also be evaluated and compared using a ROC curve.
In this diagram the true-positive rate is plotted against the
false-positive rate for decreasing score thresholds. To determine the
true- and false-positive rates, each edge in the network is first
assigned one of the following states:
- true positive edge: the edge connects two proteins that are in the same complex.
- false positive edge: the edge connects proteins from different complexes and the proteins are not colocalized (see also: Colocalization).
  Note that only a limited number of such negative edges is randomly
  sampled from the network, as there are too many possible negative
  edges. The method generates 10 times as many negative edges as there
  are positive edges.
- unknown: all other edges.
Negative set generation also works without any localization data.
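The edge classification can be illustrated with a short Python sketch. It is not ProCope's code: the function name `label_edges`, its parameters, and the data layout are hypothetical. The sketch further assumes that "between complexes" means both proteins belong to reference complexes but share none, and that two proteins count as colocalized only when both have localization annotations with at least one compartment in common.

```python
import random


def label_edges(edges, complexes, localization=None, neg_ratio=10, seed=0):
    """Return {edge: "TP" or "FP"}; edges missing from the result are unknown.

    edges: iterable of (protein_a, protein_b) tuples
    complexes: iterable of protein sets (the reference complex set)
    localization: optional {protein: set of compartments}
    """
    # Map every reference protein to the indices of its complexes.
    member = {}
    for idx, cplx in enumerate(complexes):
        for protein in cplx:
            member.setdefault(protein, set()).add(idx)

    def colocalized(a, b):
        if localization is None:
            return False  # no localization data: the filter is skipped
        return bool(localization.get(a, set()) & localization.get(b, set()))

    positives, candidates = [], []
    for a, b in edges:
        if a not in member or b not in member:
            continue  # protein outside the reference set: unknown
        if member[a] & member[b]:
            positives.append((a, b))
        elif not colocalized(a, b):
            candidates.append((a, b))

    # Sample at most neg_ratio negatives per positive edge.
    rng = random.Random(seed)
    k = min(len(candidates), neg_ratio * len(positives))
    labels = {edge: "TP" for edge in positives}
    labels.update({edge: "FP" for edge in rng.sample(candidates, k)})
    return labels
```

Passing `localization=None` shows how the negative set can still be built when no localization data is available: every inter-complex edge then becomes a candidate.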
To generate the ROC curve, we sort the edges by weight and iterate over
them, beginning with the highest-scoring edge. The true-positive and
false-positive counts are increased according to the states of the
edges (see above); neither count is increased for unknown edges. The
true-positive rate is the quotient of the current true-positive count
and the total number of true-positive edges. The false-positive rate is
calculated analogously.
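As a minimal sketch of this procedure, the following Python function walks over the sorted edges and records one curve point per labeled edge. The function name `roc_points` and the input layout (a list of `(edge, weight)` pairs plus a dictionary of edge states) are assumptions for the example. Tied weights are not treated specially here.

```python
def roc_points(weighted_edges, labels):
    """Return ROC points as (false-positive rate, true-positive rate) pairs.

    weighted_edges: iterable of (edge, weight) pairs
    labels: {edge: "TP" or "FP"}; any other edge is treated as unknown
    """
    total_tp = sum(1 for state in labels.values() if state == "TP")
    total_fp = sum(1 for state in labels.values() if state == "FP")
    if total_tp == 0 or total_fp == 0:
        raise ValueError("need at least one positive and one negative edge")

    tp = fp = 0
    points = [(0.0, 0.0)]
    # Visit edges from the highest to the lowest weight.
    for edge, _weight in sorted(weighted_edges, key=lambda e: e[1], reverse=True):
        state = labels.get(edge)
        if state == "TP":
            tp += 1
        elif state == "FP":
            fp += 1
        else:
            continue  # unknown edges change neither count
        points.append((fp / total_fp, tp / total_tp))
    return points
```

A network that ranks all true-positive edges above the false-positive ones yields a curve that rises to a true-positive rate of 1 before the false-positive rate leaves 0.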