procope.data.networks
Class ProteinNetwork

java.lang.Object
  extended by procope.data.networks.ProteinNetwork
All Implemented Interfaces:
Iterable<NetworkEdge>, ProteinSet

public class ProteinNetwork
extends Object
implements ProteinSet, Iterable<NetworkEdge>

Represents a network of binary protein interactions. For each edge, an edge weight as well as edge annotations in the form of key/value pairs can be stored. This class is one of the central objects in this library and provides high-performance access and manipulation methods. Below you will find some basic information and examples for these networks:

Edges in the network

There are two ways to create an edge between two proteins in the network. (1) You can assign a numeric weight to the edge using setEdge(int, int, float). This weight is used by all scoring functions delivered with this library. (2) Alternatively, you can assign a set of key/value pairs to an edge, which allows you to store virtually any kind of information for an edge.

Notes:

Undirected and directed networks

By default networks are undirected, but you can create directed networks using the constructor ProteinNetwork(boolean). For undirected networks the edges (a,b) and (b,a) are identical and only stored once.

The directedness of a network affects several functionalities, e.g. neighbor detection or network searching. Further information is provided in the documentation of the respective functions.
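
To make the "only stored once" behaviour concrete, here is a minimal, self-contained sketch. It is not the ProteinNetwork implementation; the class UndirectedEdgeStoreSketch and its key packing are assumptions made purely for illustration. It only shows one common way to let (a,b) and (b,a) refer to the same stored edge in the undirected case while keeping them separate in the directed case.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: a hypothetical edge store, not the ProCope implementation. */
final class UndirectedEdgeStoreSketch {
    private final boolean directed;
    private final Map<Long, Float> weights = new HashMap<>();

    UndirectedEdgeStoreSketch(boolean directed) {
        this.directed = directed;
    }

    // Packs two int IDs into one long key. For undirected stores the smaller
    // ID always goes first, so (a,b) and (b,a) produce the same key.
    private long key(int a, int b) {
        if (!directed && a > b) {
            int tmp = a;
            a = b;
            b = tmp;
        }
        return ((long) a << 32) | (b & 0xFFFFFFFFL);
    }

    void setEdge(int a, int b, float weight) {
        weights.put(key(a, b), weight);
    }

    boolean hasEdge(int a, int b) {
        return weights.containsKey(key(a, b));
    }

    int getEdgeCount() {
        return weights.size();
    }

    public static void main(String[] args) {
        UndirectedEdgeStoreSketch undirected = new UndirectedEdgeStoreSketch(false);
        undirected.setEdge(1, 2, 0.5f);
        undirected.setEdge(2, 1, 0.7f); // same edge, weight is overwritten
        System.out.println("undirected edges: " + undirected.getEdgeCount());

        UndirectedEdgeStoreSketch directedStore = new UndirectedEdgeStoreSketch(true);
        directedStore.setEdge(1, 2, 0.5f);
        directedStore.setEdge(2, 1, 0.7f); // a second, separate edge
        System.out.println("directed edges: " + directedStore.getEdgeCount());
    }
}
```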

Iterating over networks

There are different ways of iterating over all edges of a network. The methods offer different tradeoffs between convenience and efficiency. For the examples below we assume that there is an existing network object called net.

  1. The most convenient and Java-like way to iterate over a network is to use the iterator() function:
      for (NetworkEdge edge : net) {
           // do something with 'edge'
      }
    Note: This method is very easy to use but for each edge a NetworkEdge object has to be created which might lead to efficiency problems for large networks.

  2. A more efficient way to iterate over a network is to use getEdgesArray(). It returns an array which alternately contains the two partners of each edge:
      int[] edges = net.getEdgesArray();
      for (int i=0; i<edges.length; i+=2) {
          int protein1 = edges[i];
          int protein2 = edges[i+1];
          // protein1 and protein2 have an edge
      }
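
As a self-contained illustration of the flat array layout described in item 2, the following sketch counts how many edges each protein takes part in. The int[] literal stands in for the result of net.getEdgesArray(); the class EdgesArraySketch and its degrees method are hypothetical helpers and not part of the library.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: works on a plain int[] in the layout [p1, p2, p1, p2, ...]. */
final class EdgesArraySketch {

    // Counts, for every protein ID, the number of edges it appears in.
    static Map<Integer, Integer> degrees(int[] edges) {
        if (edges.length % 2 != 0) {
            throw new IllegalArgumentException("edge array must have even length");
        }
        Map<Integer, Integer> degree = new HashMap<>();
        for (int i = 0; i < edges.length; i += 2) {
            degree.merge(edges[i], 1, Integer::sum);     // first partner
            degree.merge(edges[i + 1], 1, Integer::sum); // second partner
        }
        return degree;
    }

    public static void main(String[] args) {
        // Stand-in for net.getEdgesArray(): edges (1,2), (2,3), (1,3), (3,4)
        int[] edges = {1, 2, 2, 3, 1, 3, 3, 4};
        System.out.println("edges: " + edges.length / 2);
        System.out.println("degrees: " + degrees(edges));
    }
}
```

No NetworkEdge objects are created in this loop, which is the efficiency argument made above for the array-based iteration.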

Author:
Jan Krumsiek

Constructor Summary
ProteinNetwork()
          Creates an empty undirected network.
ProteinNetwork(boolean directed)
          Creates an empty network.
 
Method Summary
 void breadthFirstSearch(int start, NetworkSearchCallback callback)
          Performs a breadth-first search on the network.
 ProteinNetwork combineWith(ProteinNetwork other, CombinationRules rules)
          Combines two networks using given combination rules.
 ProteinNetwork copy()
          Create a copy of the network.
 boolean deleteEdge(int prot1, int prot2)
          Deletes an edge from the network along with all of its annotations.
 void depthFirstSearch(int start, NetworkSearchCallback callback)
          Performs a depth-first search on the network.
 PurificationData derivePurificationData(boolean poolBaits)
          Creates a purification data object from a directed network.
 boolean equals(Object obj)
          Returns true if and only if obj is also of type ProteinNetwork, both networks have the same directedness and all edge weights and annotations of the networks are identical.
 boolean equalScores(ProteinNetwork compare)
          Checks whether two networks are equal regarding their edge weights.
 Set<String> getAnnotationKeys()
          Returns a set of all distinct annotation keys used in the network.
 ProteinNetwork getCutOffNetwork(float cutOff)
          Returns a network containing only edges with a weight greater than or equal to a given cutoff value.
 ProteinNetwork getCutOffNetwork(float cutOff, boolean cutBelow)
          Returns a network containing only edges with a weight above or below a given cutoff value.
 Collection<NetworkEdge> getDirectedNeighbors(int protein, boolean fromProtein)
          Returns all incident edges of a given direction from a directed network.
 float getEdge(int prot1, int prot2)
          Returns the weight of an edge between two given proteins.
 Object getEdgeAnnotation(int prot1, int prot2, String key)
          Retrieves an annotation from a given edge.
 Map<String,Object> getEdgeAnnotations(int prot1, int prot2)
          Returns all annotations associated with a given edge.
 int getEdgeCount()
          Returns the number of edges in this network.
 int[] getEdgesArray()
          Returns an array containing all edges of the network.
 ProteinNetwork getFilteredNetwork(BooleanExpression expression)
          Filters the network using a given boolean expression.
 List<NetworkEdge> getNearestNeighbors(int protein)
          Basically the same as getNeighbors(int), but returns the incident edges sorted by descending weight.
 int[] getNeighborArray(int protein)
          Returns an array of proteins which contains all neighbors in the network of a given protein.
 List<NetworkEdge> getNeighbors(int protein)
          Returns all incident edges for a given protein in the network.
 int getNodeCount()
          Returns the number of nodes in this network.
 Set<Integer> getNodes()
          Returns the set of proteins in this network.
 Set<Integer> getProteins()
          Returns the set of proteins which are contained as nodes in this network
 boolean hasEdge(int prot1, int prot2)
          Checks whether there is an edge in the network between two given proteins.
 boolean isDirected()
          Returns whether the network is a directed network.
 Iterator<NetworkEdge> iterator()
          Returns an iterator over all edges of the network.
 ProteinNetwork randomizeByRewiring()
          Randomizes a network by rewiring.
 ProteinNetwork randomizeByRewiring(int rewirings)
          Randomizes a network by rewiring.
 ProteinNetwork restrictToProteins(ProteinSet proteins, boolean fullCoverage)
          Returns a new network object which contains only those edges where one or both adjacent proteins are contained in a given set of proteins.
 ProteinNetwork restrictToProteins(Set<Integer> proteinIDs, boolean fullCoverage)
          Returns a new network object which contains only those edges where one or both adjacent proteins are contained in a given set of proteins.
 void scalarMultiplication(float factor)
          Multiplies all existing edge weights of the network with a given value.
 void setEdge(int prot1, int prot2)
          Sets an edge between two given proteins to a standard weight of 1.0.
 void setEdge(int prot1, int prot2, float weight)
          Sets a weighted edge between two given proteins.
 void setEdgeAnnotation(int prot1, int prot2, String key, Object value)
          Labels an edge between two proteins with a given key/value pair.
 void setEdgeAnnotations(int prot1, int prot2, Map<String,Object> annotations)
          Labels an edge in the network with a given set of annotations.
 void setFullEdge(NetworkEdge edge)
          Takes an existing NetworkEdge object and inserts the edge into this network.
 void setIterateEdgesTwice(boolean iterateTwice)
          For undirected networks this function determines the iterator behaviour.
 ProteinNetwork undirectedCopy()
          Create an explicitly undirected copy of the network.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ProteinNetwork

public ProteinNetwork()
Creates an empty undirected network.


ProteinNetwork

public ProteinNetwork(boolean directed)
Creates an empty network.

Parameters:
directed - specifies if this will be a directed or an undirected network

Method Detail

hasEdge

public boolean hasEdge(int prot1,
                       int prot2)
Checks whether there is an edge in the network between two given proteins.

Parameters:
prot1 - first protein
prot2 - second protein
Returns:
true if there is an edge between the two proteins

getEdge

public float getEdge(int prot1,
                     int prot2)
Returns the weight of an edge between two given proteins. If there is no weighted edge between the two proteins this function will return Float.NaN

Parameters:
prot1 - first protein
prot2 - second protein
Returns:
the weight of the edge or Float.NaN if no weighted edge exists between those proteins in the network
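
When checking the returned weight, keep in mind that in Java NaN never compares equal to anything, including itself, so a test like weight == Float.NaN is always false. The sketch below shows the usual check with Float.isNaN. The lookup method and its map are hypothetical stand-ins for a weight query that reports a missing edge as Float.NaN; they are not library code.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: how to test a float result for the "no edge" marker NaN. */
final class NaNWeightSketch {

    // Hypothetical stand-in for a weight query: returns Float.NaN if no weight is stored.
    static float lookup(Map<String, Float> weights, String edgeKey) {
        Float weight = weights.get(edgeKey);
        return weight == null ? Float.NaN : weight;
    }

    // The reliable way to detect a missing weight.
    static boolean isMissing(float weight) {
        return Float.isNaN(weight);
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new HashMap<>();
        weights.put("1-2", 0.8f);

        float present = lookup(weights, "1-2");
        float absent = lookup(weights, "1-3");

        System.out.println("1-2 missing? " + isMissing(present));
        System.out.println("1-3 missing? " + isMissing(absent));
        // This comparison is always false, even though 'absent' is NaN:
        System.out.println("absent == Float.NaN: " + (absent == Float.NaN));
    }
}
```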

setEdge

public void setEdge(int prot1,
                    int prot2,
                    float weight)
             throws ProCopeException
Sets a weighted edge between two given proteins. The edge weight can be an arbitrary float value.

Parameters:
prot1 - first protein
prot2 - second protein
weight - weight to assign to the edge
Throws:
ProCopeException - if weight is NaN (Float.NaN)

setFullEdge

public void setFullEdge(NetworkEdge edge)
Takes an existing NetworkEdge object and inserts the edge into this network.

Parameters:
edge - network edge to be inserted

setEdge

public void setEdge(int prot1,
                    int prot2)
Sets an edge between two given proteins to a standard weight of 1.0.

Parameters:
prot1 - first protein
prot2 - second protein

setEdgeAnnotation

public void setEdgeAnnotation(int prot1,
                              int prot2,
                              String key,
                              Object value)
                       throws ProCopeException
Labels an edge between two proteins with a given key/value pair. If there already is an annotation with this key it will be overwritten.

Note: An edge does not have to be created using setEdge in order to add an annotation. Adding an annotation will also create a new edge in the network.

Parameters:
prot1 - first protein
prot2 - second protein
key - key of the annotation
value - value of the annotation
Throws:
ProCopeException - if value is not an Integer, Float, String or List
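
The documented restriction on annotation values can be mirrored on the caller side before an annotation is set. The following sketch is an assumption about how such a pre-check could look; isSupportedValue is a hypothetical helper and says nothing about how the library performs its own validation.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: mirrors the documented rule "Integer, Float, String or List". */
final class AnnotationTypeSketch {

    // Returns true if the value has one of the documented annotation types.
    static boolean isSupportedValue(Object value) {
        return value instanceof Integer
                || value instanceof Float
                || value instanceof String
                || value instanceof List;
    }

    public static void main(String[] args) {
        Object[] candidates = {42, 0.5f, "yeast two-hybrid", Arrays.asList(1, 2), 0.5, null};
        for (Object candidate : candidates) {
            System.out.println(candidate + " supported? " + isSupportedValue(candidate));
        }
    }
}
```

Note that a double literal such as 0.5 is boxed to Double, not Float, and would therefore fall outside the documented types.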

setEdgeAnnotations

public void setEdgeAnnotations(int prot1,
                               int prot2,
                               Map<String,Object> annotations)
                        throws ProCopeException
Labels an edge in the network with a given set of annotations.

Note: An edge does not have to be created using setEdge in order to add an annotation. Adding an annotation will also create a new edge in the network.

Parameters:
prot1 - first protein
prot2 - second protein
annotations - map of key=>value pairs to add to the edge, already existing annotations with identical keys will be overwritten
Throws:
ProCopeException - if one or more of the values is not an Integer, Float, String or List

getEdgeAnnotation

public Object getEdgeAnnotation(int prot1,
                                int prot2,
                                String key)
Retrieves an annotation from a given edge.

Parameters:
prot1 - first protein
prot2 - second protein
key - the key for which the value will be read
Returns:
the value belonging to this key (of type Integer, Float, String or List), or null if there is no value associated with this key

deleteEdge

public boolean deleteEdge(int prot1,
                          int prot2)
Deletes an edge from the network along with all of its annotations.

Parameters:
prot1 - first protein
prot2 - second protein
Returns:
true if the edge existed and was deleted, false if no such edge exists in the network

getNeighborArray

public int[] getNeighborArray(int protein)
Returns an array of proteins which contains all neighbors in the network of a given protein. For directed networks, this will contain neighbors which are sources as well as neighbors which are targets of directed edges.

Parameters:
protein - protein for which the neighbors will be retrieved
Returns:
array of neighbor proteins

getNearestNeighbors

public List<NetworkEdge> getNearestNeighbors(int protein)
Basically the same as getNeighbors(int), but returns the incident edges sorted by descending weight. That is, the edge to the most similar neighbor (with the highest score) will be the first one in the result list.

Parameters:
protein - protein for which the sorted neighbor list will be returned

getEdgesArray

public int[] getEdgesArray()
Returns an array containing all edges of the network. Note: This method is intended for high-performance iteration over all edges of a network. If this array has length n, there are n/2 edges in the network. The array alternately contains the first and second protein of each edge.

See above for examples of iterating over a network.

Returns:
array with edges in this network consisting of the first and second proteins of each edge alternatingly

getEdgeAnnotations

public Map<String,Object> getEdgeAnnotations(int prot1,
                                             int prot2)
Returns all annotations associated with a given edge.

Parameters:
prot1 - first protein
prot2 - second protein
Returns:
a Map of key/value pairs associated with this edge, all values are of type Integer, Float, String or List

getNeighbors

public List<NetworkEdge> getNeighbors(int protein)
Returns all incident edges for a given protein in the network. For undirected networks this query protein will always be the source of the returned edges which means that edge.getSource() == protein. For directed networks of course edge.getSource() will always be the actual source of the directed edge.

Parameters:
protein - protein to retrieve neighbors for
Returns:
collection of network edges incident with the given protein

getDirectedNeighbors

public Collection<NetworkEdge> getDirectedNeighbors(int protein,
                                                    boolean fromProtein)
Returns all incident edges of a given direction from a directed network. For fromProtein==true it will only return edges where protein is the source protein, for fromProtein==false only those edges where protein is the target will be returned.

The function should not be called for undirected networks and will output a warning on stderr if you do so.

Parameters:
protein - protein for which neighbors will be retrieved from the network
fromProtein - get directed edges where protein is the source (fromProtein==true) or the target (fromProtein==false)
Returns:
list of incident edges according to the settings

getProteins

public Set<Integer> getProteins()
Returns the set of proteins which are contained as nodes in this network

Specified by:
getProteins in interface ProteinSet
Returns:
a set of internal IDs

getEdgeCount

public int getEdgeCount()
Returns the number of edges in this network. Note that for directed networks the edges (a,b) and (b,a) will both increase this count.

Returns:
number of edges in the network

getNodeCount

public int getNodeCount()
Returns the number of nodes in this network. This value is equivalent to getProteins().size().

Returns:
number of nodes in the network

getAnnotationKeys

public Set<String> getAnnotationKeys()
Returns a set of all distinct annotation keys used in the network.

Returns:
set of distinct annotation keys used in the network

restrictToProteins

public ProteinNetwork restrictToProteins(ProteinSet proteins,
                                         boolean fullCoverage)
Returns a new network object which contains only those edges where one or both adjacent proteins are contained in a given set of proteins.

Parameters:
proteins - set of proteins to which this network will be restricted
fullCoverage - if true then both proteins of an edge have to be in the restriction set, if false then one protein is sufficient
Returns:
network restricted to the given set of proteins

restrictToProteins

public ProteinNetwork restrictToProteins(Set<Integer> proteinIDs,
                                         boolean fullCoverage)
Returns a new network object which contains only those edges where one or both adjacent proteins are contained in a given set of proteins. The proteinIDs set should be a quickly searchable Set implementation like HashSet.

Parameters:
proteinIDs - set of proteins to which this network will be restricted
fullCoverage - if true then both proteins of an edge have to be in the restriction set, if false then one protein is sufficient
Returns:
network restricted to the given set of proteins
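
The effect of the fullCoverage flag can be sketched on a flat edge array (the layout returned by getEdgesArray()). The class RestrictSketch below is a hypothetical, self-contained illustration of the documented semantics; it ignores weights and annotations and is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustration only: restricts a flat edge array [p1, p2, p1, p2, ...] to a protein set. */
final class RestrictSketch {

    static int[] restrict(int[] edges, Set<Integer> proteinIDs, boolean fullCoverage) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i + 1 < edges.length; i += 2) {
            boolean first = proteinIDs.contains(edges[i]);
            boolean second = proteinIDs.contains(edges[i + 1]);
            // fullCoverage: both partners must be in the set; otherwise one is enough
            boolean accept = fullCoverage ? (first && second) : (first || second);
            if (accept) {
                kept.add(edges[i]);
                kept.add(edges[i + 1]);
            }
        }
        int[] result = new int[kept.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = kept.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] edges = {1, 2, 2, 3, 3, 4};
        // A HashSet gives fast contains() lookups, as recommended above
        Set<Integer> proteins = new HashSet<>(Arrays.asList(1, 2));
        System.out.println(Arrays.toString(restrict(edges, proteins, true)));
        System.out.println(Arrays.toString(restrict(edges, proteins, false)));
    }
}
```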

isDirected

public boolean isDirected()
Returns whether the network is a directed network.

Returns:
true if this is a directed network, false otherwise

getFilteredNetwork

public ProteinNetwork getFilteredNetwork(BooleanExpression expression)
Filters the network using a given boolean expression. All edges whose weight and annotations fulfill the expression will be contained in the resulting network. Note: The weight of an edge can be addressed using the variable name @weight in the expression.

Parameters:
expression - boolean expression used for edge evaluation
Returns:
filtered network according to the boolean expression
See Also:
BooleanExpression

getNodes

public Set<Integer> getNodes()
Returns the set of proteins in this network. Equivalent to getProteins().

Returns:
set of proteins in the network

setIterateEdgesTwice

public void setIterateEdgesTwice(boolean iterateTwice)
For undirected networks this function determines the iterator behaviour. When iterating over such a network using iterator() this value sets if each undirected edge will appear just once, or if it will appear twice in each iteration with both neighbors acting as source one time and as target the second time.

If iterating once the protein with the smaller internal ID will be the source of the edge.

Note: This setting has no effect on directed networks and will output a warning on stderr if called on such a network.

By default this value is set to false.

Parameters:
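iterateTwice - true if each undirected edge should appear twice per iteration, false if it should appear only once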

iterator

public Iterator<NetworkEdge> iterator()
Returns an iterator over all edges of the network. Be sure to check out the setIterateEdgesTwice(boolean) setting and the "Iterating over networks" section above.

Specified by:
iterator in interface Iterable<NetworkEdge>

equalScores

public boolean equalScores(ProteinNetwork compare)
Checks whether two networks are equal regarding their edge weights.

Parameters:
compare - other network for comparison
Returns:
true if and only if both networks have the same edges and all of these edges have the same weight

equals

public boolean equals(Object obj)
Returns true if and only if obj is also of type ProteinNetwork, both networks have the same directedness and all edge weights and annotations of the networks are identical.

Overrides:
equals in class Object

depthFirstSearch

public void depthFirstSearch(int start,
                             NetworkSearchCallback callback)
Performs a depth-first search on the network. Check out NetworkSearchCallback for more information.

Parameters:
start - the protein node where to start the search
callback - callback object to which all nodes passed in the search are reported

breadthFirstSearch

public void breadthFirstSearch(int start,
                               NetworkSearchCallback callback)
Performs a breadth-first search on the network. Check out NetworkSearchCallback for more information.

Parameters:
start - the protein node where to start the search
callback - callback object to which all nodes passed in the search are reported
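
The general pattern of a search that reports visited nodes to a callback can be sketched as follows. This is a generic breadth-first search over an undirected adjacency map built from a flat edge array. The NetworkSearchCallback interface is not described on this page, so java.util.function.IntConsumer is used here as a stand-in; the real callback may offer more (for example a way to influence the search), and the class BfsSketch is purely hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.IntConsumer;

/** Illustration only: breadth-first search that reports each visited node to a callback. */
final class BfsSketch {

    // Builds an undirected adjacency map from a flat edge array [p1, p2, p1, p2, ...].
    static Map<Integer, List<Integer>> adjacency(int[] edges) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (int i = 0; i + 1 < edges.length; i += 2) {
            adj.computeIfAbsent(edges[i], k -> new ArrayList<>()).add(edges[i + 1]);
            adj.computeIfAbsent(edges[i + 1], k -> new ArrayList<>()).add(edges[i]);
        }
        return adj;
    }

    // Visits all nodes reachable from 'start' level by level; each node is reported once.
    static void breadthFirst(Map<Integer, List<Integer>> adj, int start, IntConsumer callback) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            callback.accept(node);
            for (int next : adj.getOrDefault(node, Collections.<Integer>emptyList())) {
                if (seen.add(next)) { // add() returns false if the node was already seen
                    queue.add(next);
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] edges = {1, 2, 1, 3, 2, 4, 5, 6};
        breadthFirst(adjacency(edges), 1, node -> System.out.println("visited " + node));
    }
}
```

A depth-first variant would follow the same shape with a stack in place of the queue.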

combineWith

public ProteinNetwork combineWith(ProteinNetwork other,
                                  CombinationRules rules)
Combines two networks using given combination rules. If one network is directed and the other one undirected, the result will be undirected.

For more information check out the CombinationRules API docs.

Parameters:
other - the network to combine this one with
rules - combination rules
Returns:
the combined network

getCutOffNetwork

public ProteinNetwork getCutOffNetwork(float cutOff)
Returns a network containing only edges with a weight greater than or equal to a given cutoff value. Missing edge weights (edges which only have an annotation) are treated as zero.

Parameters:
cutOff - the cutoff value
Returns:
network containing only edges with a weight greater than or equal to the cutoff
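
The per-edge decision described for this method can be written down as a small predicate. The sketch assumes that a missing weight is represented as Float.NaN (as reported by getEdge) and treats it as zero, as stated above; the class CutOffSketch and the method keeps are hypothetical and only illustrate the documented rule.

```java
/** Illustration only: the documented "greater than or equal to the cutoff" rule for one edge. */
final class CutOffSketch {

    // Returns true if an edge with this weight would remain in the cutoff network.
    // Assumption: a missing weight arrives as Float.NaN and counts as 0.
    static boolean keeps(float weight, float cutOff) {
        float effective = Float.isNaN(weight) ? 0f : weight;
        return effective >= cutOff;
    }

    public static void main(String[] args) {
        float cutOff = 0.5f;
        float[] weights = {0.9f, 0.5f, 0.4f, Float.NaN};
        for (float weight : weights) {
            System.out.println(weight + " kept at cutoff " + cutOff + "? " + keeps(weight, cutOff));
        }
    }
}
```

One consequence of the zero rule: annotation-only edges survive any cutoff that is zero or negative and are dropped by any positive cutoff.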

getCutOffNetwork

public ProteinNetwork getCutOffNetwork(float cutOff,
                                       boolean cutBelow)
Returns a network containing only edges with a weight above or below a given cutoff value. Above means greater than or equal to, whereas below means less than or equal to. Missing edge weights (edges which only have an annotation) are treated as zero.

Parameters:
cutOff - the cutoff value
cutBelow - true to cut weights below the threshold, false to cut weights above the threshold.
Returns:
network only containing edges with a weight above or below the cutoff

copy

public ProteinNetwork copy()
Create a copy of the network.

Returns:
copy of the network

undirectedCopy

public ProteinNetwork undirectedCopy()
Create an explicitly undirected copy of the network. That is, each edge of a directed network will be turned into an undirected edge in the resulting network. Note: If the directed network contains edges in both directions, e.g. (a,b) and (b,a), the second edge will overwrite the first one with respect to weight and annotations.

For undirected networks this function is the same as copy()

Returns:
undirected copy of the network

scalarMultiplication

public void scalarMultiplication(float factor)
Multiplies all existing edge weights of the network with a given value.

Parameters:
factor - multiplication factor

randomizeByRewiring

public ProteinNetwork randomizeByRewiring()
Randomizes a network by rewiring. For each rewiring step, two edges (a,b) and (c,d) are selected such that a != b != c != d. These 4 nodes are rewired to create two new edges (a,d) and (c,b).

This function will do 10 times as many rewirings as there are edges in the network.

Returns:
rewired network

randomizeByRewiring

public ProteinNetwork randomizeByRewiring(int rewirings)
Randomizes a network by rewiring. For each rewiring step, two edges (a,b) and (c,d) are selected such that a != b != c != d. These 4 nodes are rewired to create two new edges (a,d) and (c,b).

Parameters:
rewirings - number of rewirings which will be performed
Returns:
rewired network
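
A single rewiring step as described above can be sketched on a flat edge array. The sketch makes several assumptions that are not confirmed by this page: the condition a != b != c != d is read as "all four nodes are pairwise distinct", the given count is treated as a number of attempts rather than successful swaps, and no check is made for duplicate edges created by a swap. The class RewiringSketch is hypothetical and ignores weights and annotations.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustration only: target-swapping rewiring on a flat edge array [a, b, c, d, ...]. */
final class RewiringSketch {

    // Tries to rewire the edges with indices i and j (edge indices, not array offsets):
    // (a,b) and (c,d) become (a,d) and (c,b). Returns true if the swap was performed.
    static boolean rewire(int[] edges, int i, int j) {
        int a = edges[2 * i], b = edges[2 * i + 1];
        int c = edges[2 * j], d = edges[2 * j + 1];
        boolean distinct = a != b && a != c && a != d && b != c && b != d && c != d;
        if (!distinct) {
            return false; // assumption: skip unless all four nodes differ
        }
        edges[2 * i + 1] = d; // (a,b) becomes (a,d)
        edges[2 * j + 1] = b; // (c,d) becomes (c,b)
        return true;
    }

    // Returns a rewired copy after the given number of attempts; the input is not modified.
    static int[] randomize(int[] edges, int attempts, Random rnd) {
        int[] copy = edges.clone();
        int edgeCount = copy.length / 2;
        if (edgeCount < 2) {
            return copy;
        }
        for (int k = 0; k < attempts; k++) {
            rewire(copy, rnd.nextInt(edgeCount), rnd.nextInt(edgeCount));
        }
        return copy;
    }

    public static void main(String[] args) {
        int[] edges = {1, 2, 3, 4, 5, 6, 7, 8};
        // 10 attempts per edge, following the default described for the no-argument variant
        int[] rewired = randomize(edges, 10 * (edges.length / 2), new Random());
        System.out.println(Arrays.toString(rewired));
    }
}
```

Because only the targets of two edges are exchanged, the number of edges and the number of times each node occurs as an endpoint stay the same.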

derivePurificationData

public PurificationData derivePurificationData(boolean poolBaits)
                                        throws ProCopeException
Creates a purification data object from a directed network. The source protein of each edge will be treated as the bait protein, whereas the target of each directed edge is a prey.

If poolBaits==true then one PurificationExperiment for each source protein will be created. With poolBaits==false each single edge will be treated as one purification experiment. Note: poolBaits==false might produce a very large and memory-intensive PurificationData object.

Parameters:
poolBaits - if true one PurificationExperiment is created per bait, otherwise each edge is treated as a single experiment.
Returns:
PurificationData object derived from the network
Throws:
ProCopeException - if the network is undirected
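
The difference between the two poolBaits settings can be illustrated with plain collections. In the sketch below each "experiment" is simply a list holding the bait followed by its preys; this representation, the class BaitPoolingSketch and its experiments method are assumptions for illustration and do not reflect the actual PurificationData or PurificationExperiment classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: groups directed edges (bait, prey) into experiments. */
final class BaitPoolingSketch {

    // Input: flat array of directed edges [source, target, source, target, ...].
    // Output: one list per experiment, bait first, then its preys.
    static List<List<Integer>> experiments(int[] directedEdges, boolean poolBaits) {
        List<List<Integer>> result = new ArrayList<>();
        if (!poolBaits) {
            // every edge becomes its own experiment
            for (int i = 0; i + 1 < directedEdges.length; i += 2) {
                result.add(Arrays.asList(directedEdges[i], directedEdges[i + 1]));
            }
            return result;
        }
        // one experiment per bait, collecting all of its preys
        Map<Integer, List<Integer>> byBait = new LinkedHashMap<>();
        for (int i = 0; i + 1 < directedEdges.length; i += 2) {
            int bait = directedEdges[i];
            List<Integer> experiment = byBait.get(bait);
            if (experiment == null) {
                experiment = new ArrayList<>();
                experiment.add(bait);
                byBait.put(bait, experiment);
            }
            experiment.add(directedEdges[i + 1]);
        }
        result.addAll(byBait.values());
        return result;
    }

    public static void main(String[] args) {
        int[] edges = {1, 2, 1, 3, 4, 5}; // 1->2, 1->3, 4->5
        System.out.println("pooled:   " + experiments(edges, true));
        System.out.println("unpooled: " + experiments(edges, false));
    }
}
```

The unpooled variant creates as many experiments as there are edges, which matches the memory note above.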