procope.data.complexes
Class ComplexSet

java.lang.Object
  extended by procope.data.complexes.ComplexSet
All Implemented Interfaces:
Iterable<Complex>, ProteinSet

public class ComplexSet
extends Object
implements Iterable<Complex>, ProteinSet

A complex set is a list of complexes. This class contains various methods which work on this complex list, e.g. methods for manipulating, selecting and randomizing complexes.

ComplexSet is one of the major classes of this software library and is used in many different classes and methods.

Author:
Jan Krumsiek
See Also:
Complex, ComplexSetReader, ComplexSetWriter

Constructor Summary
ComplexSet()
          Creates an empty complex set
ComplexSet(Collection<? extends Collection<Integer>> newComplexes)
          Creates a complex set from a given list of lists of internal IDs.
 
Method Summary
 void addComplex(Complex toAdd)
          Adds a complex to the complex set.
 void addComplexes(Collection<Complex> toadd)
          Adds a list of complexes to the set
 ComplexSet calculateSharedProteinsBootstrap(ProteinNetwork scores, float lambda)
          Calculates the complex set with added shared proteins using a given scores network.
 ComplexSet calculateSharedProteinsPu(ProteinNetwork interactions, float a, float b)
          Calculates shared proteins with respect to a given interaction network as proposed by Pu et al., 2007 (PubMed: 17370254).
 boolean contains(Complex complex)
          Checks if a specified complex is contained in this set.
 ComplexSet copy()
          Creates a copy of this complex set
 ComplexSet decompose(ProteinNetwork scores, float cutoff)
          Decomposes a complex set with respect to a given scores network.
 boolean equals(Object obj)
          Checks if two complex sets are equal.
 Complex getComplex(int index)
          Returns the complex at a given index in the complex set list.
 int getComplexCount()
          Returns the number of complexes in this set
 List<Complex> getComplexes()
          Returns the list of complexes backing this complex set.
 ProteinNetwork getComplexInducedNetwork()
          Creates a network which contains a fully connected subgraph for each complex.
 int getProteinCount()
          Returns the number of proteins in this set as the sum of the single complex sizes.
 Set<Integer> getProteins()
          Returns the set of proteins involved in this complex set.
 Iterator<Complex> iterator()
          Returns an iterator over the Complex objects in this set
 ComplexSet randomizeByExchanging()
          Returns a randomized copy of the complex set.
 ComplexSet randomizeByRemapping()
          Returns a randomized copy of the complex set.
 boolean removeComplex(Complex toRemove)
          Removes a given complex from the complex set.
 void removeComplex(int toRemove)
          Removes the complex at the specified index from the complex set.
 ComplexSet removeComplexesByScore(ProteinNetwork scores, float cutoff)
          Removes all complexes from the complex set whose average edge score between all proteins of the complex regarding a given scores network is below the cutoff.
 ComplexSet removeComplexesByScore(ProteinNetwork scores, float cutoff, boolean ignoreMissing)
          Removes all complexes from the complex set whose average edge score between all proteins of the complex regarding a given scores network is below the cutoff.
 ComplexSet removeComplexesBySize(int cutoffSize, boolean below)
          Removes all complexes smaller or larger than a given threshold from the complex set.
 ComplexSet removeSingletons()
          Removes singletons from the complex set.
 ComplexSet restrictToProteins(Set<Integer> proteins, boolean fullCoverage)
          Returns a new complex set containing only those complexes whose proteins are completely or partially contained in a given set of proteins.
 ComplexSet restrictToProteinSpace(ProteinSet proteins, boolean fullCoverage)
          Returns a new complex set containing only those complexes whose proteins are completely or partially contained in a given set of proteins.
 void sortBySize(boolean ascendingly)
          Sorts the list of complexes by their size.
 String toString()
          Returns a string representation of this complex set constructed by a list of string representations of the contained Complex objects
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ComplexSet

public ComplexSet()
Creates an empty complex set


ComplexSet

public ComplexSet(Collection<? extends Collection<Integer>> newComplexes)
Creates a complex set from a given list of lists of internal IDs. Each of these lists of internal IDs represents one complex.

The constructor creates copies of the collections; the original objects are not altered.

Parameters:
newComplexes - List of lists of internal IDs
Method Detail

getComplexes

public List<Complex> getComplexes()
Returns the list of complexes backing this complex set. Note: A reference to the original backing list is returned, so changes to the returned list will also affect the complex set.

Returns:
The list of complexes contained in this set

getComplexCount

public int getComplexCount()
Returns the number of complexes in this set

Returns:
number of complexes in this set

getProteinCount

public int getProteinCount()
Returns the number of proteins in this set as the sum of the single complex sizes. Note: For complex sets where proteins are contained in multiple complexes, this value will be larger than getProteins().size().

Returns:
number of proteins in this complex set
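
The difference between the two counts can be illustrated with plain collections. The following is a minimal sketch and not ProCope code: the class ProteinCountSketch and its methods are hypothetical stand-ins, and complexes are modeled as lists of internal IDs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, not part of ProCope: complexes are plain lists of internal IDs.
class ProteinCountSketch {

    // Corresponds to the documented getProteinCount(): sum of the individual complex sizes.
    static int summedSize(List<List<Integer>> complexes) {
        int total = 0;
        for (List<Integer> complex : complexes) {
            total += complex.size();
        }
        return total;
    }

    // Corresponds to getProteins().size(): every protein is counted once.
    static int distinctCount(List<List<Integer>> complexes) {
        Set<Integer> proteins = new HashSet<>();
        for (List<Integer> complex : complexes) {
            proteins.addAll(complex);
        }
        return proteins.size();
    }
}
```

For the complexes {1, 2, 3} and {3, 4}, the summed size is 5 while only 4 distinct proteins are involved, because protein 3 is shared.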

randomizeByRemapping

public ComplexSet randomizeByRemapping()
Returns a randomized copy of the complex set. Randomization is achieved using a random permutation of the proteins. That means, for instance, that all occurrences of Protein1 will be replaced by Protein4, all occurrences of Protein2 will be replaced by Protein3 and so on. The sizes of the complexes will be preserved.

Returns:
a randomized copy of the complex set
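
The remapping idea can be sketched with plain collections. This is an illustration only, not the library's implementation: the class RemappingSketch is hypothetical, complexes are modeled as lists of internal IDs, and the sketch assumes that the permutation is drawn over the proteins occurring in the set itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical stand-in, not part of ProCope: complexes are plain lists of internal IDs.
class RemappingSketch {

    static List<List<Integer>> remap(List<List<Integer>> complexes, Random rnd) {
        // Collect every distinct protein ID that occurs in the set.
        Set<Integer> distinct = new LinkedHashSet<>();
        for (List<Integer> complex : complexes) {
            distinct.addAll(complex);
        }
        List<Integer> original = new ArrayList<>(distinct);
        List<Integer> shuffled = new ArrayList<>(original);
        Collections.shuffle(shuffled, rnd);

        // Build the permutation: original ID -> replacement ID.
        Map<Integer, Integer> mapping = new HashMap<>();
        for (int i = 0; i < original.size(); i++) {
            mapping.put(original.get(i), shuffled.get(i));
        }

        // Apply the same mapping to every occurrence; the input is left untouched.
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> complex : complexes) {
            List<Integer> mapped = new ArrayList<>();
            for (Integer protein : complex) {
                mapped.add(mapping.get(protein));
            }
            result.add(mapped);
        }
        return result;
    }
}
```

Because the mapping is a bijection, complex sizes and the overlap structure between complexes stay the same; only the identities of the proteins change.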

randomizeByExchanging

public ComplexSet randomizeByExchanging()
Returns a randomized copy of the complex set. Randomization is achieved by randomly exchanging proteins between complexes. The size of the complexes as well as the number of complexes a protein is contained in will be preserved.

Returns:
a randomized copy of the complex set
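
One way to exchange proteins while keeping both invariants is to swap single proteins between two randomly chosen complexes. The sketch below is an assumption about how such a procedure could look, not the procedure ProCope actually uses: the class ExchangingSketch and the attempts parameter are hypothetical, and complexes are modeled as non-empty lists of internal IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in, not part of ProCope: complexes are plain lists of internal IDs.
class ExchangingSketch {

    static List<List<Integer>> exchange(List<List<Integer>> complexes, int attempts, Random rnd) {
        // Work on a deep copy so the input stays unchanged.
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> complex : complexes) {
            result.add(new ArrayList<>(complex));
        }
        if (result.size() < 2) {
            return result;
        }
        for (int n = 0; n < attempts; n++) {
            List<Integer> first = result.get(rnd.nextInt(result.size()));
            List<Integer> second = result.get(rnd.nextInt(result.size()));
            if (first == second || first.isEmpty() || second.isEmpty()) {
                continue;
            }
            int i = rnd.nextInt(first.size());
            int j = rnd.nextInt(second.size());
            Integer p = first.get(i);
            Integer q = second.get(j);
            // Skip swaps that would put the same protein into one complex twice.
            if (first.contains(q) || second.contains(p)) {
                continue;
            }
            // Swapping one protein for another keeps both complex sizes and
            // the number of complexes each protein belongs to.
            first.set(i, q);
            second.set(j, p);
        }
        return result;
    }
}
```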

copy

public ComplexSet copy()
Creates a copy of this complex set

Returns:
a copy of the complex set

iterator

public Iterator<Complex> iterator()
Returns an iterator over the Complex objects in this set

Specified by:
iterator in interface Iterable<Complex>

getComplex

public Complex getComplex(int index)
                   throws IndexOutOfBoundsException
Returns the complex at a given index in the complex set list.

Parameters:
index - index of the complex which will be retrieved
Returns:
Complex at the given index
Throws:
IndexOutOfBoundsException - if the complex index is invalid for this complex set

getProteins

public Set<Integer> getProteins()
Returns the set of proteins involved in this complex set.

Specified by:
getProteins in interface ProteinSet
Returns:
a set of internal IDs

addComplex

public void addComplex(Complex toAdd)
                throws ProCopeException
Adds a complex to the complex set.

Parameters:
toAdd - Complex to be added
Throws:
ProCopeException - if toAdd is an empty complex

removeComplex

public void removeComplex(int toRemove)
                   throws IndexOutOfBoundsException
Removes the complex at the specified index from the complex set.

Parameters:
toRemove - index of complex to be removed from the set
Throws:
IndexOutOfBoundsException - if the complex index is invalid for this complex set

removeComplex

public boolean removeComplex(Complex toRemove)
Removes a given complex from the complex set. A complex o from the set will be removed if and only if toRemove.equals(o), i.e. if the complexes are equal.

Parameters:
toRemove - Complex to be removed
Returns:
true if a complex was removed from the set, false otherwise

removeSingletons

public ComplexSet removeSingletons()
Removes singletons from the complex set. Singletons are complexes containing only one protein. Note: This method will alter the complex set object directly and return the removed singletons as a new complex set object.

Returns:
removed singletons

removeComplexesBySize

public ComplexSet removeComplexesBySize(int cutoffSize,
                                        boolean below)
Removes all complexes smaller or larger than a given threshold from the complex set. Note: This method will alter the complex set object directly and return the removed complexes as a new complex set object.

Parameters:
cutoffSize - size threshold
below - If this is true, all complexes with size < cutoffSize will be removed. If below is false, all complexes having > cutoffSize proteins will be removed.
Returns:
set of removed complexes
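
The strict comparisons and the "modify in place, return what was removed" contract can be sketched as follows. This is an illustration with plain collections, not ProCope code; the class SizeFilterSketch is a hypothetical stand-in.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in, not part of ProCope: complexes are plain lists of internal IDs.
class SizeFilterSketch {

    // Removes matching complexes from the given list and returns the removed ones.
    static List<List<Integer>> removeBySize(List<List<Integer>> complexes,
                                            int cutoffSize, boolean below) {
        List<List<Integer>> removed = new ArrayList<>();
        Iterator<List<Integer>> it = complexes.iterator();
        while (it.hasNext()) {
            List<Integer> complex = it.next();
            // Both comparisons are strict, as in the parameter description above.
            boolean drop = below ? complex.size() < cutoffSize
                                 : complex.size() > cutoffSize;
            if (drop) {
                removed.add(complex);
                it.remove();
            }
        }
        return removed;
    }
}
```

With this reading, a complex whose size equals cutoffSize is kept for either value of below.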

addComplexes

public void addComplexes(Collection<Complex> toadd)
Adds a list of complexes to the set

Parameters:
toadd - list of complexes to add

toString

public String toString()
Returns a string representation of this complex set constructed by a list of string representations of the contained Complex objects

Overrides:
toString in class Object

equals

public boolean equals(Object obj)
Checks if two complex sets are equal. Returns true if and only if (1) the specified object is also a complex set, (2) both sets have the same number of complexes and (3) each complex c1 in this set has a complex c2 in the other set such that c1.equals(c2) == true.

Overrides:
equals in class Object

contains

public boolean contains(Complex complex)
Checks if a specified complex is contained in this set. That means there exists a complex c1 in this set such that c1.equals(complex) == true

Parameters:
complex - Complex object which will be searched in the set
Returns:
true if the complex is contained in the set, false otherwise

getComplexInducedNetwork

public ProteinNetwork getComplexInducedNetwork()
Creates a network which contains a fully connected subgraph for each complex. A fully connected subgraph is a subgraph where all nodes are connected to each other. All edges get a weight of 1.0.

Returns:
network containing a fully connected subgraph for each complex
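
The construction can be sketched with plain collections. This is not ProCope code: the class InducedNetworkSketch is hypothetical, and edges are modeled as a map from sorted ID pairs to weights instead of a ProteinNetwork.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of ProCope: edges are keyed by sorted ID pairs.
class InducedNetworkSketch {

    static Map<List<Integer>, Float> inducedEdges(List<List<Integer>> complexes) {
        Map<List<Integer>, Float> edges = new HashMap<>();
        for (List<Integer> complex : complexes) {
            // Connect every pair of proteins within the same complex.
            for (int i = 0; i < complex.size(); i++) {
                for (int j = i + 1; j < complex.size(); j++) {
                    int a = complex.get(i);
                    int b = complex.get(j);
                    if (a == b) {
                        continue; // no self-loops
                    }
                    edges.put(Arrays.asList(Math.min(a, b), Math.max(a, b)), 1.0f);
                }
            }
        }
        return edges;
    }
}
```

A complex of n distinct proteins contributes n(n-1)/2 edges; edges shared by overlapping complexes appear only once.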

decompose

public ComplexSet decompose(ProteinNetwork scores,
                            float cutoff)
Decomposes a complex set with respect to a given scores network. All complexes are treated as subgraphs whose edge weights are taken from the scores network. All edges with a weight below the given cutoff are deleted, and the smaller components resulting from any decomposition are treated as separate complexes.

The result will contain at least as many complexes as the original complex set.

Parameters:
scores - scores network to be used for decomposition
cutoff - value below which edges will be deleted from the complex graphs
Returns:
decomposed complex set
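
The decomposition described above amounts to finding connected components after thresholding. The following is a minimal sketch under stated assumptions, not the library's implementation: the class DecomposeSketch and its key helper are hypothetical, scores are modeled as a map from sorted ID pairs to weights, and pairs without an entry in the map are assumed to be unconnected.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, not part of ProCope.
class DecomposeSketch {

    static List<List<Integer>> decompose(List<List<Integer>> complexes,
                                         Map<List<Integer>, Float> scores, float cutoff) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> complex : complexes) {
            Set<Integer> unvisited = new LinkedHashSet<>(complex);
            while (!unvisited.isEmpty()) {
                // Start a breadth-first search from any protein not yet assigned.
                Integer start = unvisited.iterator().next();
                unvisited.remove(start);
                List<Integer> component = new ArrayList<>();
                Deque<Integer> queue = new ArrayDeque<>();
                queue.add(start);
                while (!queue.isEmpty()) {
                    Integer current = queue.poll();
                    component.add(current);
                    for (Integer other : new ArrayList<>(unvisited)) {
                        Float w = scores.get(key(current, other));
                        // Assumption: a missing score means "no edge".
                        if (w != null && w >= cutoff) {
                            unvisited.remove(other);
                            queue.add(other);
                        }
                    }
                }
                // Each connected component becomes a complex of its own.
                result.add(component);
            }
        }
        return result;
    }

    // Builds an order-independent key for an undirected edge.
    static List<Integer> key(int a, int b) {
        return Arrays.asList(Math.min(a, b), Math.max(a, b));
    }
}
```

Every input complex yields at least one component, which is why the result never has fewer complexes than the input.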

restrictToProteinSpace

public ComplexSet restrictToProteinSpace(ProteinSet proteins,
                                         boolean fullCoverage)
Returns a new complex set containing only those complexes whose proteins are completely or partially contained in a given set of proteins.

Parameters:
proteins - set of proteins for restriction
fullCoverage - if true, all proteins of a complex have to be contained in the reference set for the complex to be returned; if false, it is sufficient that at least one protein of the complex is contained in the protein set.
Returns:
Complex set restricted to the given protein space
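
The effect of the fullCoverage flag can be sketched with plain collections. This is an illustration only, not ProCope code: the class RestrictSketch is hypothetical, and it assumes that matching complexes are returned whole rather than trimmed to the reference set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, not part of ProCope: complexes are plain lists of internal IDs.
class RestrictSketch {

    static List<List<Integer>> restrict(List<List<Integer>> complexes,
                                        Set<Integer> proteins, boolean fullCoverage) {
        List<List<Integer>> kept = new ArrayList<>();
        for (List<Integer> complex : complexes) {
            boolean keep;
            if (fullCoverage) {
                // Every protein of the complex must be in the reference set.
                keep = proteins.containsAll(complex);
            } else {
                // One protein in the reference set is enough.
                keep = false;
                for (Integer protein : complex) {
                    if (proteins.contains(protein)) {
                        keep = true;
                        break;
                    }
                }
            }
            if (keep) {
                kept.add(new ArrayList<>(complex)); // new list, input untouched
            }
        }
        return kept;
    }
}
```

With the complexes {1, 2}, {2, 3}, {4, 5} and the reference set {1, 2}, full coverage keeps only {1, 2}, while partial coverage keeps {1, 2} and {2, 3}.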

restrictToProteins

public ComplexSet restrictToProteins(Set<Integer> proteins,
                                     boolean fullCoverage)
Returns a new complex set containing only those complexes whose proteins are completely or partially contained in a given set of proteins.

Parameters:
proteins - set of proteins for restriction
fullCoverage - if true, all proteins of a complex have to be contained in the reference set for the complex to be returned; if false, it is sufficient that at least one protein of the complex is contained in the protein set.
Returns:
Complex set restricted to the given protein space

removeComplexesByScore

public ComplexSet removeComplexesByScore(ProteinNetwork scores,
                                         float cutoff)
Removes all complexes from the complex set whose average edge score between all proteins of the complex regarding a given scores network is below the cutoff.

Parameters:
scores - scores network
cutoff - cutoff
Returns:
the complexes which were removed from the set

removeComplexesByScore

public ComplexSet removeComplexesByScore(ProteinNetwork scores,
                                         float cutoff,
                                         boolean ignoreMissing)
Removes all complexes from the complex set whose average edge score between all proteins of the complex regarding a given scores network is below the cutoff.

Parameters:
scores - scores network
cutoff - cutoff
ignoreMissing - true: scores of edges which do not exist in the score network are not considered for average calculation; false: missing scores are treated as 0
Returns:
the complexes which were removed from the set
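
The two ways of handling missing edges lead to different averages. The sketch below illustrates this with plain collections; it is not ProCope code. The class AverageScoreSketch is hypothetical, scores are modeled as a map from sorted ID pairs to weights, and returning 0 when no edge is counted at all is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of ProCope.
class AverageScoreSketch {

    static float averageScore(List<Integer> complex,
                              Map<List<Integer>, Float> scores, boolean ignoreMissing) {
        float sum = 0f;
        int counted = 0;
        for (int i = 0; i < complex.size(); i++) {
            for (int j = i + 1; j < complex.size(); j++) {
                int a = complex.get(i);
                int b = complex.get(j);
                Float w = scores.get(Arrays.asList(Math.min(a, b), Math.max(a, b)));
                if (w == null) {
                    if (ignoreMissing) {
                        continue; // pair is left out of the average
                    }
                    w = 0f; // pair counts with a score of 0
                }
                sum += w;
                counted++;
            }
        }
        // Assumption of this sketch: no counted pairs yields an average of 0.
        return counted == 0 ? 0f : sum / counted;
    }
}
```

For a complex {1, 2, 3} with scores 0.8 for (1, 2) and 0.4 for (1, 3) and no score for (2, 3), the average is about 0.6 when missing edges are ignored and about 0.4 when they are treated as 0.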

sortBySize

public void sortBySize(boolean ascendingly)
Sorts the list of complexes by their size.

Parameters:
ascendingly - if true, the smallest complex will be the first complex in the list after sorting; if false, the largest complex will be first

calculateSharedProteinsPu

public ComplexSet calculateSharedProteinsPu(ProteinNetwork interactions,
                                            float a,
                                            float b)
Calculates shared proteins with respect to a given interaction network as proposed by Pu et al., 2007 (PubMed: 17370254). Parameters proposed in the paper: a=1.5, b=-0.5.

A protein is added to a given complex if it interacts with a minimum fraction of a*S**b proteins of that complex (where ** is the power function and S is the size of the acceptor complex).

Parameters:
interactions - protein interaction network to be used
a - a parameter
b - b parameter
Returns:
complex set containing shared proteins
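
The threshold formula can be checked with a short calculation. The following is a sketch of the arithmetic only, not of the library's algorithm: the class PuThresholdSketch and its methods are hypothetical, and treating the threshold as inclusive (>=) is an assumption based on the phrase "minimum fraction".

```java
// Hypothetical stand-in, not part of ProCope.
class PuThresholdSketch {

    // Minimum fraction of complex members a candidate must interact with: a * S^b.
    static double requiredFraction(int complexSize, double a, double b) {
        return a * Math.pow(complexSize, b);
    }

    // True if the candidate interacts with enough members of the acceptor complex.
    static boolean qualifies(int interactingMembers, int complexSize, double a, double b) {
        double fraction = (double) interactingMembers / complexSize;
        return fraction >= requiredFraction(complexSize, a, b);
    }
}
```

With the parameters from the paper and an acceptor complex of 9 proteins, the required fraction is 1.5 * 9^-0.5 = 0.5, so a candidate interacting with 5 of the 9 members qualifies while one interacting with 4 does not.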

calculateSharedProteinsBootstrap

public ComplexSet calculateSharedProteinsBootstrap(ProteinNetwork scores,
                                                   float lambda)
Calculates the complex set with added shared proteins using a given scores network. The lambda parameter describes the score threshold for the newly created complex. For example, if lambda = 0.95 (the recommended value), the complex set with shared proteins is calculated such that its average complex score is about 95% of that of the original complex set. See the documentation for more details.

Parameters:
scores - scores network to be used for evaluation
lambda - lambda parameter (see description of this method)
Returns:
the complex set containing added shared proteins