Running TEMPO on the Example

Chris Pietras

2018-07-11

Example Data

Example data is included with the package. “dflatExample” is a subset of the full DFLAT biological process gene sets, and “gse32472Example” is a subset of a bronchopulmonary dysplasia (BPD) data set from GEO.

library("tempoR")
data("dflatExample")
data("gse32472Example")

Each of the example data sets contains data as it would be read in from .gmt, .gct, and .cls files using the “loadGMT”, “loadGCT”, and “loadCLS” functions. If the corresponding source files existed, the example data sets could be created with the following commands:

dflatExample = loadGMT("dflatExample.gmt")
gse32472Example = list()
gse32472Example$data = loadGCT("gse32472Example.gct")
gse32472Example$age = loadCLS("gse32472Example.age.cls",sampleNames=rownames(gse32472Example$data))
gse32472Example$bpd = loadCLS("gse32472Example.phen.cls",sampleNames=rownames(gse32472Example$data))
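For orientation, a .gmt file stores one gene set per line as tab-separated fields: the set name, a description, and then the member genes. The base R sketch below parses two made-up lines into a named list of gene vectors. It only illustrates the file format; it is not the implementation of “loadGMT”, and the structure that “loadGMT” actually returns may differ. The gene set names and gene identifiers are invented for illustration.

```r
# Two hypothetical .gmt lines: name <TAB> description <TAB> gene1 <TAB> gene2 ...
gmt_lines <- c(
  "set A\tfirst illustrative set\tGENE1\tGENE2\tGENE3",
  "set B\tsecond illustrative set\tGENE2\tGENE4"
)

# Split each line on tab characters
fields <- strsplit(gmt_lines, "\t", fixed = TRUE)

# Drop the name and description fields, keeping only the gene identifiers
genesets_sketch <- lapply(fields, function(f) f[-(1:2)])

# Name each list element by its gene set name (the first field)
names(genesets_sketch) <- vapply(fields, function(f) f[1], character(1))

print(genesets_sketch)
```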

Running TEMPO

“tempo.run” is the core function for running TEMPO. If the optional “output” argument is included (in this example, set to “/path/to/output”, which should be replaced as appropriate), TEMPO produces the output files “/path/to/output.table” and “/path/to/output.pdf”, which contain tables (as raw text) and plots, respectively, for all reported results. For this example, we run only 4 permutations instead of the default 500 and use only 4 CPU cores instead of the default 24. Because the default reporting cutoffs are not meaningful with only 4 permutations, we report all results by setting the p-value, FDR, and score cutoffs to 1, 1, and -1, respectively.

results = tempo.run(phen=gse32472Example$bpd,
                    genesets=dflatExample,
                    X=gse32472Example$data,
                    Y=gse32472Example$age,
                    numPerms=4,
                    nCores=4,
                    output="/path/to/output",
                    pCutoff=1,
                    fdrCutoff=1,
                    scoreCutoff=-1)
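The effect of a small permutation count on the p-values can be sketched in base R. The sketch assumes the common permutation estimate p = (r + 1) / (n + 1), where r is the number of permuted scores at least as extreme as the observed score and n is the number of permutations. This estimator is an assumption and is not confirmed here as TEMPO's exact formula, and the helper name is hypothetical. Under this assumption, 4 permutations can never yield a p-value below 0.2.

```r
# Hypothetical helper: smallest p-value attainable with n_perms permutations,
# assuming the estimator p = (r + 1) / (n + 1) evaluated at r = 0
min_perm_p <- function(n_perms) 1 / (n_perms + 1)

min_perm_p(4)    # 0.2
min_perm_p(500)  # about 0.002

# All p-values attainable with 4 permutations under the same assumption
attainable <- (0:4 + 1) / (4 + 1)
print(attainable)
```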

Above, TEMPO assumes that the first entry in the list passed to “phen” is a control sample and that all other types are test samples. The control and test samples can instead be specified explicitly. In this case, the call below produces exactly the same results as the one above:

results = tempo.run(ctrl=gse32472Example$ctrl,
                    test=gse32472Example$test,
                    genesets=dflatExample,
                    X=gse32472Example$data,
                    Y=gse32472Example$age,
                    numPerms=4,
                    nCores=4,
                    output="/path/to/output",
                    pCutoff=1,
                    fdrCutoff=1,
                    scoreCutoff=-1)
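The convention described above can be illustrated in base R. The sketch assumes a phenotype vector with one label per sample, named by sample. That layout, the sample names, and the labels are hypothetical and may not match what “loadCLS” returns, and whether “ctrl” and “test” hold sample names or indices in tempoR is not specified here. The sketch derives control and test samples by treating the label of the first entry as the control class.

```r
# Hypothetical phenotype labels, one per sample, named by sample
phen_sketch <- c(s1 = "noBPD", s2 = "BPD", s3 = "noBPD", s4 = "BPD", s5 = "BPD")

# The label of the first entry defines the control class
ctrl_label <- phen_sketch[[1]]

# Samples sharing that label are controls; all others are test samples
ctrl_sketch <- names(phen_sketch)[phen_sketch == ctrl_label]
test_sketch <- names(phen_sketch)[phen_sketch != ctrl_label]

print(ctrl_sketch)
print(test_sketch)
```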

Viewing Output

Either of the previous commands will generate a .table and a .pdf output file, which will look similar to the ones presented here.

The .table file is a tab delimited text file containing full information for all reported gene sets - which, for the purposes of this example, is all gene sets - ordered by decreasing TEMPO score.

gene set                                                                                 ctrlMSE       score  linFactor  slopeFactor    p   BH
positive regulation of immunoglobulin production                                        1.094505   7.2129283  0.9999990    0.9311023  0.2  0.4
cochlea development                                                                     1.190724   6.6772864  0.9999916    0.8676115  0.2  0.4
regulation of immunoglobulin production                                                 1.339489   6.0805513  0.9999874    0.9114747  0.2  0.4
response to cAMP                                                                        1.263533   5.6809881  0.9999927    0.9106257  0.2  0.4
cerebral cortex radial glia guided migration                                            1.618526   5.0567749  0.9996566    0.8134894  0.2  0.4
prostanoid biosynthetic process                                                         1.420159   2.2424621  0.9996572    0.6793648  0.2  0.4
neural crest cell differentiation                                                       1.747706   2.1243285  0.9950944    0.5980244  0.2  0.4
hormone metabolic process                                                               1.748454   2.0900312  0.9980034    0.7078783  0.2  0.4
regulation of endoplasmic reticulum unfolded protein response                           2.091081   1.6630500  0.9749536    0.5383708  0.2  0.4
neural precursor cell proliferation                                                     1.929966   1.3614694  0.9808389    0.5085986  0.2  0.4
detection of bacterium                                                                  1.980796   0.5833796  0.8814510    0.2358503  0.8  1.0
regulation of bone remodeling                                                           2.315838   0.4740784  0.7766118    0.2554804  1.0  1.0
positive regulation of branching involved in ureteric bud morphogenesis                 2.226511   0.1708431  0.5560895    0.1180688  1.0  1.0
histone H3-K9 methylation                                                               2.164172   0.1246942  0.5227711    0.0930816  1.0  1.0
negative regulation of multi-organism process                                           2.324547   0.0872643  0.4004752    0.0840047  1.0  1.0
pattern recognition receptor signaling pathway                                          2.482028   0.0422482  0.2664545    0.0610940  1.0  1.0
cholesterol biosynthetic process                                                        2.373395  -0.0001913  0.0192604   -0.0032336  1.0  1.0
positive regulation of execution phase of apoptosis                                     2.270780  -0.0007428  0.0399790   -0.0053916  1.0  1.0
protein targeting to plasma membrane                                                    2.470136  -0.0579610  0.3497852   -0.0586395  1.0  1.0
negative regulation of intrinsic apoptotic signaling pathway in response to DNA damage  2.602118  -0.2627197  0.7024475   -0.1295797  1.0  1.0
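The “BH” column is consistent with a Benjamini-Hochberg adjustment of the “p” column, which can be checked with base R's “p.adjust”. The sketch below copies the p-values from the table above. That TEMPO computes the column in exactly this way is an inference from the matching numbers, not something stated in this document.

```r
# p-values copied from the example table, in row order:
# ten gene sets at 0.2, one at 0.8, nine at 1.0
p <- c(rep(0.2, 10), 0.8, rep(1.0, 9))

# Benjamini-Hochberg adjustment from the stats package (attached by default)
bh <- p.adjust(p, method = "BH")

print(bh)
```

The adjusted values are 0.4 for the first ten gene sets and 1.0 for the remaining ten, matching the table.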

The .pdf file contains, for each reported gene set, a plot of the actual age of each sample vs. the age predicted by the PLS models inside TEMPO for that gene set. “tempo.mkplot” can be used to generate the plot for a specified gene set. “cochlea development” is a gene set that is significant in the full analysis, while “histone H3-K9 methylation” is not significant.

tempo.mkplot(results,"cochlea development")

tempo.mkplot(results,"histone H3-K9 methylation")

Additional TEMPO features

Instead of training and evaluating PLSR models on the control samples in leave-one-out cross-validation, models can be trained on a held-out set of training samples, and scores can then be evaluated on a separate set of control samples. The example below trains TEMPO models on the first 10 control samples and calculates TEMPO scores using the second 10 control samples and all test samples:

results2 = tempo.run(train=gse32472Example$ctrl[1:10],
                     ctrl=gse32472Example$ctrl[11:20],
                     test=gse32472Example$test,
                     genesets=dflatExample,
                     X=gse32472Example$data,
                     Y=gse32472Example$age,
                     numPerms=4,
                     nCores=4)
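The difference between the two evaluation schemes can be sketched in base R on simulated data. Ordinary least squares (“lm”) stands in for the PLSR models purely to keep the example free of dependencies. The data, the variable names, and the use of mean squared error as the summary are illustrative assumptions, not TEMPO's implementation.

```r
set.seed(1)

# Simulated control samples: 20 samples, one predictor, a noisy linear "age"
n <- 20
x <- rnorm(n)
age <- 30 + 2 * x + rnorm(n, sd = 0.5)
ctrl <- data.frame(x = x, age = age)

# Scheme 1: leave-one-out cross-validation over all control samples
loo_pred <- vapply(seq_len(n), function(i) {
  fit <- lm(age ~ x, data = ctrl[-i, ])                     # train without sample i
  unname(predict(fit, newdata = ctrl[i, , drop = FALSE]))   # predict sample i
}, numeric(1))
loo_mse <- mean((ctrl$age - loo_pred)^2)

# Scheme 2: train on the first 10 samples, evaluate on the second 10
fit_train <- lm(age ~ x, data = ctrl[1:10, ])
holdout_pred <- predict(fit_train, newdata = ctrl[11:20, ])
holdout_mse <- mean((ctrl$age[11:20] - holdout_pred)^2)

print(c(loo = loo_mse, holdout = holdout_mse))
```

In the first scheme every control sample is predicted by a model that never saw it; in the second, the evaluation samples are disjoint from the training samples by construction, at the cost of training on fewer samples.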