Genecentric - A package to uncover graph-theoretic structure in high-throughput epistasis data.

Examples

A typical example
Generate BPMs without pruning
Change the genespace used by FuncAssociate
Finding available species and namespaces for GO annotation
Convert an E-MAP file to a genetic interaction file

For all examples below, we use E-MAP data from the Collins et al. dataset in the Krogan Lab Interactome Database, along with a list of essential Saccharomyces cerevisiae genes from the Saccharomyces Genome Deletion Project. Both files should come with the Genecentric distribution, but you can also download yeast_emap.gi and essentials from us.

If you'd like to use other data with Genecentric, please see our documentation for instructions on how to convert existing data into a format that Genecentric can read.

A typical example

This example uses the default parameters to generate between-pathway models (BPMs) while excluding essential yeast genes, and then performs GO enrichment on those BPMs.

To generate BPMs:

genecentric-bpms -e essentials yeast_emap.gi output.bpm

And to perform GO enrichment on the BPMs in output.bpm:

genecentric-go -e essentials yeast_emap.gi output.bpm enrichment.gobpm

GO enrichment results will now be in the enrichment.gobpm file.

Generate BPMs without pruning

This example generates BPMs without any pruning. In other words, the number of BPMs produced will equal the number of unique genes in the genetic interaction data.

genecentric-bpms -e essentials --no-jaccard --minimum-size 1 --maximum-size 0 yeast_emap.gi notpruned.bpm

The --no-jaccard option is used to disable Jaccard-style pruning, and the --minimum-size 1 and --maximum-size 0 options are used to prevent pruning of BPMs that are either too small or too big.
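For intuition about the similarity measure behind Jaccard-style pruning: the Jaccard index of two gene sets A and B is |A ∩ B| / |A ∪ B|, which is 0 when the sets share no genes and 1 when they are identical. The shell sketch below computes it for two gene lists stored one gene per line. It is an illustration only: the `jaccard` function and its input format are hypothetical, and Genecentric's actual pruning rule and threshold are not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper: Jaccard index of two gene lists (one gene per line),
# i.e. |intersection| / |union| of the two sets of genes.
# This only illustrates the similarity measure; it is not Genecentric's code.
jaccard() {
    _ja=$(mktemp) && _jb=$(mktemp) || return 1
    # Keep the first field of non-blank lines, then sort and de-duplicate
    # each list (comm requires sorted input).
    awk 'NF { print $1 }' "$1" | sort -u > "$_ja"
    awk 'NF { print $1 }' "$2" | sort -u > "$_jb"
    _inter=$(comm -12 "$_ja" "$_jb" | wc -l)   # genes present in both lists
    _union=$(sort -u "$_ja" "$_jb" | wc -l)    # genes present in either list
    rm -f "$_ja" "$_jb"
    # sh has no floating-point arithmetic, so let awk do the division.
    awk -v i="$_inter" -v u="$_union" \
        'BEGIN { if (u + 0 == 0) print "0.0000"; else printf "%.4f\n", i / u }'
}
```

For example, if two hypothetical files bpm1.genes and bpm2.genes share two genes out of four distinct ones, `jaccard bpm1.genes bpm2.genes` prints 0.5000.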

Change the genespace used by FuncAssociate

By default, Genecentric tells FuncAssociate to use only the genes in the genetic interaction data as the genespace. In some cases, you may want FuncAssociate's default genespace instead, which consists of all genes FuncAssociate records for the species. This can be done with the --fa-species-genespace option:

genecentric-go --fa-species-genespace yeast_emap.gi output.bpm enrichment.gobpm
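Under the default behaviour, the genespace is the set of genes appearing in the genetic interaction data. The sketch below lists that set, which can be handy for checking how large the default genespace is. It assumes, without confirmation from this page, that each line of a .gi file holds two gene identifiers and a score separated by tabs; the function name `gi_genes` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct genes in a genetic interaction file,
# one per line. ASSUMPTION: each line looks like "gene1<TAB>gene2<TAB>score".
gi_genes() {
    # Emit both gene columns of every line with at least three fields,
    # then sort and de-duplicate.
    awk -F '\t' 'NF >= 3 { print $1; print $2 }' "$1" | sort -u
}
```

For instance, `gi_genes yeast_emap.gi | wc -l` counts the genes in the example data, provided the assumed layout holds.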

Finding available species and namespaces for GO annotation

If you'd like to perform GO enrichment on BPMs generated for a species other than Saccharomyces cerevisiae, you will need to override the defaults built into genecentric-go.

First, we have to ask FuncAssociate which species it supports:

genecentric-fainfo species

This should give output like the following:

Agrobacterium tumefaciens Anaplasma phagocytophilum ... Homo sapiens ... Vibrio cholerae

Let's use Homo sapiens as our example. While we now know that FuncAssociate supports the species Homo sapiens, we still need to tell FuncAssociate which namespace to use (this depends on the kind of gene identifiers in your genetic interaction data). We can query FuncAssociate for the available namespaces for Homo sapiens like so:

genecentric-fainfo namespaces 'Homo sapiens'

This should give output like the following:

affy_hg_u133_plus_2 affy_hg_u133a ... entrezgene ... uniprot_swissprot uniprot_swissprot_accession

Let's say we'd like to use the entrezgene namespace. We can now perform GO enrichment on our BPMs like so:

genecentric-go --fa-species 'Homo sapiens' --fa-namespace 'entrezgene' homo-sapiens.gi homo-sapiens.bpm homo-sapiens.gobpm

Convert an E-MAP file to a genetic interaction file

Genecentric only accepts genetic interaction (.gi) files as input. If your data is in an E-MAP file, you can convert it with genecentric-from-csv, a program included in the Genecentric package. It takes an E-MAP file as input and writes a .gi file that Genecentric can read.

genecentric-from-csv chrombio.csv yeast_emap.gi

You can view more options using genecentric-from-csv --help. These options let you describe the format of the E-MAP file; in particular, which columns contain the gene identifiers and which column contains the genetic interaction score.
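To make the idea of picking gene and score columns concrete, here is a rough shell sketch of such a conversion. It is not genecentric-from-csv and should not replace it. The function name `csv_to_gi`, its column-number arguments, the one-row header, the absence of quoted commas, and the tab-separated "gene1, gene2, score" output layout are all hypothetical assumptions.

```shell
#!/bin/sh
# Rough illustration of a CSV -> .gi conversion. This is NOT genecentric-from-csv.
# Hypothetical assumptions:
#   - fields are separated by commas and never contain quoted commas,
#   - the first row is a header,
#   - the output .gi format is "gene1<TAB>gene2<TAB>score".
# Arguments: $1 = input CSV, $2 and $3 = gene columns, $4 = score column (1-based).
csv_to_gi() {
    awk -F ',' -v g1="$2" -v g2="$3" -v s="$4" '
        { sub(/\r$/, "") }                            # tolerate CRLF line endings
        NR == 1 { next }                              # skip the header row
        $g1 == "" || $g2 == "" || $s == "" { next }   # skip rows with missing values
        { printf "%s\t%s\t%s\n", $g1, $g2, $s }       # gene1, gene2, score
    ' "$1"
}
```

A call such as `csv_to_gi chrombio.csv 1 2 3 > chrombio.gi` would only be correct if the file really had that column layout; for real data, the options listed by genecentric-from-csv --help are the way to describe the actual layout.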

The documentation for genecentric-from-csv has more information about the program, along with advice on what to do if you have other kinds of data.