Module prune

'prune.py' provides all the functions required to prune a set of BPMs after they have been generated.
Functions
 
prune(bpms)
After all BPMs are generated, two different pruning mechanisms are applied.
 
interweight((A, B))
Calculates the interaction weight of a BPM: the sum of the interaction scores within each module, minus the sum of the interaction scores between the two modules, divided by the total number of genes in the BPM.
 
jaccard_index(bpm1, bpm2)
The Jaccard index of two BPMs: the number of genes in the intersection of bpm1 and bpm2 divided by the number of genes in the union of bpm1 and bpm2.
 
constraint_min(A, B)
Whether a BPM's modules BOTH satisfy the minimum size constraint.
 
constraint_max(A, B)
Whether a BPM's modules BOTH satisfy the maximum size constraint.
 
satisfy_min_max(A, B)
Whether a BPM's modules satisfy both the min and max size constraints. A disabled constraint counts as satisfied.
Variables
  __package__ = 'bpm'
Function Details

prune(bpms)

After all BPMs are generated, two different pruning mechanisms are applied.

The first removes every BPM that has a module smaller than the minimum size or larger than the maximum size. If either limit is 0, pruning for that constraint is skipped.
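
The size check can be sketched as follows. This is a minimal Python 3 illustration, not the module's actual code: the MIN_SIZE and MAX_SIZE constants, the extra keyword parameters, and the prune_by_size helper are hypothetical, and the sketch assumes a BPM is a pair (A, B) of gene collections.

```python
# Hypothetical limits; the real module presumably reads them from its
# own configuration. A value of 0 disables the corresponding check.
MIN_SIZE = 3
MAX_SIZE = 25

def constraint_min(A, B, min_size=MIN_SIZE):
    # Both modules must contain at least min_size genes.
    return min_size == 0 or (len(A) >= min_size and len(B) >= min_size)

def constraint_max(A, B, max_size=MAX_SIZE):
    # Both modules must contain at most max_size genes.
    return max_size == 0 or (len(A) <= max_size and len(B) <= max_size)

def satisfy_min_max(A, B, min_size=MIN_SIZE, max_size=MAX_SIZE):
    return constraint_min(A, B, min_size) and constraint_max(A, B, max_size)

def prune_by_size(bpms, min_size=MIN_SIZE, max_size=MAX_SIZE):
    # Keep only the BPMs whose two modules both fall within the limits.
    return [(A, B) for A, B in bpms
            if satisfy_min_max(A, B, min_size, max_size)]
```

Modules whose size equals a limit are kept, since only modules strictly smaller than the minimum or strictly larger than the maximum are pruned.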

The second pruning mechanism is more complex. The interaction weight of each BPM is calculated (see 'interweight') and the list of BPMs is sorted by that weight in descending order. Starting from the beginning of the sorted list, a BPM is added to the final set if and only if its Jaccard index with every BPM already in the final set is less than the threshold.
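
The greedy selection described above can be sketched as follows. This is a minimal Python 3 illustration under stated assumptions: the input is a list of (weight, (A, B)) pairs, and the names prune_by_overlap, _genes, and _jaccard are hypothetical rather than taken from the module.

```python
def _genes(bpm):
    # Treat a BPM as the union of the genes in its two modules.
    A, B = bpm
    return set(A) | set(B)

def _jaccard(bpm1, bpm2):
    g1, g2 = _genes(bpm1), _genes(bpm2)
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 0.0

def prune_by_overlap(weighted_bpms, threshold):
    # Sort on the weight only, highest first.
    ranked = sorted(weighted_bpms, key=lambda pair: pair[0], reverse=True)
    kept = []
    for _, bpm in ranked:
        # Accept a BPM only if it overlaps every accepted BPM
        # by less than the threshold.
        if all(_jaccard(bpm, other) < threshold for other in kept):
            kept.append(bpm)
    return kept
```

Because the list is processed in descending weight order, whenever two BPMs overlap too much the one with the higher interaction weight is the one that survives.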

interweight((A, B))

Calculates the interaction weight of a BPM: the sum of the interaction scores within each module, minus the sum of the interaction scores between the two modules, divided by the total number of genes in the BPM.

The value returned is the BPM "decorated" with its interaction weight so that the list of BPMs can be sorted on it. The decoration is done explicitly, rather than by computing the weight inside a key function passed to 'sorted', so that the interaction weights can be calculated in parallel.
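
The decorate-then-sort pattern can be sketched as follows. This is a minimal Python 3 illustration, not the module's implementation: the scores dictionary (keyed by a frozenset of two gene names), the gi and decorate_all helpers, and the extra scores parameter are hypothetical, and the formula follows one reading of the definition above, (within - between) / number of genes. Both modules are assumed to be non-empty.

```python
from functools import partial
from itertools import combinations, product

def gi(scores, g1, g2):
    # Hypothetical lookup: pairs without a recorded score count as 0.
    return scores.get(frozenset((g1, g2)), 0.0)

def interweight(bpm, scores):
    A, B = bpm
    # Scores between pairs of genes inside the same module.
    within = sum(gi(scores, g1, g2)
                 for mod in (A, B)
                 for g1, g2 in combinations(mod, 2))
    # Scores between pairs of genes that span the two modules.
    between = sum(gi(scores, g1, g2) for g1, g2 in product(A, B))
    weight = (within - between) / (len(A) + len(B))
    # Return the BPM decorated with its weight.
    return (weight, bpm)

def decorate_all(bpms, scores, map_fn=map):
    # map_fn can be swapped for a parallel map with the same
    # (function, iterable) calling convention.
    return list(map_fn(partial(interweight, scores=scores), bpms))
```

Since each BPM is decorated independently, the built-in map can be replaced by, for example, the map method of a multiprocessing.Pool, and the decorated list can then be ordered with sorted(decorated, key=lambda pair: pair[0], reverse=True).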

jaccard_index(bpm1, bpm2)

The Jaccard index of two BPMs: the number of genes in the intersection of bpm1 and bpm2 divided by the number of genes in the union of bpm1 and bpm2.

For this calculation, a BPM is simply the union of the genes in its two modules.
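
A worked example may help. The following is a minimal Python 3 sketch, assuming each BPM is a pair (A, B) of gene collections; returning 0.0 for two empty BPMs is an assumption made here to avoid dividing by zero, not documented behaviour.

```python
def jaccard_index(bpm1, bpm2):
    # Flatten each BPM into the set of all genes in its two modules.
    genes1 = set(bpm1[0]) | set(bpm1[1])
    genes2 = set(bpm2[0]) | set(bpm2[1])
    union = genes1 | genes2
    if not union:
        return 0.0  # assumption: two empty BPMs do not overlap
    return len(genes1 & genes2) / len(union)

# The two BPMs share genes b and c out of four distinct genes overall.
bpm1 = ({"a", "b"}, {"c"})
bpm2 = ({"b"}, {"c", "d"})
print(jaccard_index(bpm1, bpm2))  # 2 / 4 = 0.5
```

Note that which module a gene belongs to does not matter here: gene b is in the first module of both BPMs and gene c in the second, but only membership in the BPM as a whole is compared.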