
Module geneinter

'geneinter.py' is the module that processes the genetic interaction input file.
Functions
 
load_genes(geneinter_file='', ignore_file='', squaring=True)
Loads all of the gene pairs and their corresponding interaction scores into memory. It also keeps a set of all genes so that they can be iterated over.
 
genecount()
A simple method to fetch the total number of genes. It uses a pretty shoddy memoization technique.
 
gi(g1, g2)
Returns the interaction score for the gene pair (g1, g2). The lookup used to be a bit more complex, but the dict now contains both (g1, g2) and (g2, g1), which uses more memory but speeds up execution.
Variables
  gis = defaultdict(<type 'float'>, {})
  genes = set([])
  genespace = set([])
  numgenes = 0
  __package__ = 'bpm'
Function Details

load_genes(geneinter_file='', ignore_file='', squaring=True)

Loads all of the gene pairs and their corresponding interaction scores into memory. It also keeps a set of all genes so that they can be iterated over.

There are two criteria governing which genes are excluded from this process:
  1. If an ignore gene list file is provided, any gene in that file is excluded from the set of genes used.
  2. A gene pair with an interaction score of zero is KEPT; its genes stay in the set used to generate BPMs, and the pair carries an interaction score of 0.

This gene information is then available at the 'geneinter' module level, since both the interaction scores and the gene set are used pervasively throughout BPM generation.

Finally, if we add the gene pair (g1, g2) with score S to the dictionary, then we also add (g2, g1) with score S. This increases memory usage but saves CPU cycles when looking up interaction scores. In effect, we force the dictionary to behave as a symmetric matrix.
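
The storage scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the module's actual code: the helper name load_pairs, the pre-parsed (g1, g2, score) input, and the choice to drop a pair when either of its genes is on the ignore list are all hypothetical. The input file format and the squaring parameter are not modeled.

```python
from collections import defaultdict


def load_pairs(rows, ignore=()):
    """Hypothetical helper: build a symmetric score dict from (g1, g2, score) rows."""
    ignore = set(ignore)
    gis = defaultdict(float)  # missing pairs read as 0.0
    genes = set()             # every gene seen in a kept pair
    for g1, g2, score in rows:
        # Assumption: a pair is dropped if either gene is on the ignore list.
        if g1 in ignore or g2 in ignore:
            continue
        # Store both orderings so a lookup never has to normalize the key.
        gis[(g1, g2)] = score
        gis[(g2, g1)] = score
        genes.update((g1, g2))
    return gis, genes


# Made-up example data; note the zero-score pair is kept.
rows = [("A", "B", -1.5), ("B", "C", 0.0), ("C", "D", 2.0)]
gis, genes = load_pairs(rows, ignore={"D"})
```

With this example data, the pair (B, C) stays in the dict with score 0, while (C, D) is skipped because D is ignored.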

genecount()

A simple method to fetch the total number of genes. It uses a pretty shoddy memoization technique.

Actually, the memoization is probably unnecessary: taking the length of a Python set is an O(1) operation.
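
To illustrate the point, here is a small sketch comparing a cached count with a direct len() call. The caching logic is a guess at what such memoization might look like, based only on the module-level variables genes and numgenes; the function names genecount_memoized and genecount_direct are hypothetical.

```python
genes = set()
numgenes = 0  # cached count; 0 is taken to mean "not computed yet"


def genecount_memoized():
    """Assumed form of the memoized version: compute once, then reuse."""
    global numgenes
    if numgenes == 0:
        numgenes = len(genes)
    return numgenes


def genecount_direct():
    """len() on a set is O(1), so no cache is needed."""
    return len(genes)


genes.update(["A", "B", "C"])
first = genecount_memoized()  # caches the current size
genes.add("D")
stale = genecount_memoized()  # still the cached value
fresh = genecount_direct()    # reflects the added gene
```

A cache of this kind can also go stale if the gene set changes after the first call, which is one more reason the direct call may be preferable.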

gi(g1, g2)

Returns the interaction score for the gene pair (g1, g2). The lookup used to be a bit more complex, but the dict now contains both (g1, g2) and (g2, g1), which uses more memory but speeds up execution.

Note that gis is a defaultdict of floats, so if a particular gene pair is absent, 0.0 is returned.
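
One consequence of the defaultdict behavior is worth seeing in a short sketch: indexing a defaultdict with a missing key inserts that key with the default value, whereas dict.get does not. The gi body below is assumed from the description, and gi_no_insert is a hypothetical alternative shown only for comparison.

```python
from collections import defaultdict

gis = defaultdict(float)
gis[("A", "B")] = gis[("B", "A")] = -1.5  # made-up score, stored both ways


def gi(g1, g2):
    """Assumed form of the lookup: plain indexing into the defaultdict."""
    return gis[(g1, g2)]


def gi_no_insert(g1, g2):
    """Hypothetical variant: same result, but never adds missing keys."""
    return gis.get((g1, g2), 0.0)


before = len(gis)
missing = gi_no_insert("A", "Z")  # 0.0, and the dict is unchanged
after_get = len(gis)
missing_again = gi("A", "Z")      # 0.0, but ("A", "Z") is now stored
after_index = len(gis)
```

If many absent pairs are queried, the indexing form grows the dict over time; whether that matters depends on how the lookup is used during BPM generation.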