The benchmark set contains the consistently defined and classified domain pairs in CATH 3.1.0 and SCOP 1.73 (the green and red cells in those tables).
The domains used are filtered so that the maximal sequence identity between any domain pair is below 50%.
The benchmark set contains domains from 2043 (out of 3751) SCOP families, 1139 (out of 2034) SCOP superfamilies and 672 (out of 1283) SCOP folds.
- similar pairs (same SCOP fold and same CATH topology, green cells in the tables): 129436 domain pairs
- non-similar pairs (different folds in SCOP and different topologies in CATH, red cells in the tables): 1740476 domain pairs
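The two pair classes above can be expressed as a small labelling rule. The following is only a minimal sketch: the dictionary keys `scop_fold` and `cath_topology`, the function name `label_pair`, and the toy domains with placeholder labels are assumptions made for illustration and are not part of the benchmark files. It also assumes that pairs on which the two classifications disagree are left out, which is how "consistently classified" is read here.

```python
from itertools import combinations


def label_pair(a, b):
    """Label a domain pair as 'similar', 'non-similar', or None.

    Each domain is a dict with the hypothetical keys
    'scop_fold' and 'cath_topology'.
    """
    same_fold = a["scop_fold"] == b["scop_fold"]
    same_topology = a["cath_topology"] == b["cath_topology"]
    if same_fold and same_topology:
        return "similar"        # green cells
    if not same_fold and not same_topology:
        return "non-similar"    # red cells
    return None                 # SCOP and CATH disagree: pair is not used


# Toy domains with placeholder labels (not real SCOP/CATH identifiers).
domains = {
    "d1": {"scop_fold": "foldA", "cath_topology": "topoX"},
    "d2": {"scop_fold": "foldA", "cath_topology": "topoX"},
    "d3": {"scop_fold": "foldB", "cath_topology": "topoY"},
    "d4": {"scop_fold": "foldA", "cath_topology": "topoY"},
}

labels = {
    (x, y): label_pair(domains[x], domains[y])
    for x, y in combinations(sorted(domains), 2)
}
```

In this toy example only the pairs (d1, d2), (d1, d3) and (d2, d3) receive a label; every pair involving d4 is dropped because d4 agrees with its partner in only one of the two classifications.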
The clusters of domains with at least 50% sequence identity can be downloaded here.
The file lists, for each representative domain used in the benchmark set, the domains that share at least 50% sequence identity with that representative (if any). It allows adding domains to the benchmark set or exchanging domains in it (with respect to the mapping). The full mapping can be downloaded here.
The pairwise similarities (if sequence identity > 0.4) between the domains in mappable subsets of CATH superfamilies can be downloaded here.
(Similarities > 0.95 are clustered online, i.e. if for a given domain d1 a domain d2 with sequence identity > 0.95 is detected, d2 is excluded from further processing and put into the same cluster as d1.)
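The online clustering step for similarities > 0.95 can be sketched as a single pass over the domains. This is a rough illustration under stated assumptions, not the procedure actually used to build the files: the function name `online_cluster`, the `identity` callback, and the toy identity values are hypothetical, and the order in which domains are visited is assumed to decide which domain becomes the representative.

```python
def online_cluster(domain_ids, identity, threshold=0.95):
    """Single-pass clustering: each domain joins the first representative
    it matches above the threshold, otherwise it starts a new cluster.

    `identity(a, b)` is a hypothetical callback returning the sequence
    identity of two domains as a fraction between 0 and 1.
    """
    clusters = {}  # representative id -> list of member ids
    for d in domain_ids:
        for rep in clusters:
            if identity(rep, d) > threshold:
                # d is absorbed by rep and never compared again as a representative
                clusters[rep].append(d)
                break
        else:
            # no representative was similar enough: d becomes a new representative
            clusters[d] = [d]
    return clusters


# Toy identity values, invented for illustration only.
toy_identity = {
    frozenset(("d1", "d2")): 0.97,
    frozenset(("d1", "d3")): 0.50,
    frozenset(("d2", "d3")): 0.52,
}


def lookup(a, b):
    return toy_identity[frozenset((a, b))]


clusters = online_cluster(["d1", "d2", "d3"], lookup)
```

With these toy values, d2 is placed in the cluster of d1 and d3 starts its own cluster. Because absorbed domains are never compared again, the number of identity computations grows with the number of representatives rather than with the total number of domains.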