Supplementary Data and Utility Scripts for Gianoulis & Raes et al., PNAS 2009

To whom correspondence should be addressed: Mark Gerstein and Peer Bork

Gerstein Group Home Page

Bork Group Home Page


Mapping peptides to sites and running BLAST against the various pathway databases were computationally expensive steps. To aid reanalysis, we provide a complete dump of the unprocessed mappings, the utility scripts, and the processed output in easily parseable, machine-readable formats. A description of all files, their formats, and associated READMEs can be found here.

The contents of all of these directories can be downloaded by clicking DOWNLOAD ALL; for particular files of interest, see the Table of Contents.
Please note: the compressed data is approximately 110 MB and the uncompressed data is approximately 182 MB (182230797 bytes).

DOWNLOAD ALL


Table of Contents

Paper Citation

Mappings

Pathway Scores

Processed Data

Utility Scripts


Paper Citation

Gianoulis & Raes et al. Quantifying environmental adaptation of microbial metabolic pathways. PNAS 2009

Interactive Metabolic Map

Materials and Methods

Highlighted in Science, Editor's Choice


Mappings

GOS Peptide-Site Mapping. Maps GOS peptides to site IDs through scaffolds.

GOS Peptide-KEGG ID Mapping. Maps GOS peptides to KEGG IDs (BLAST results).


Pathway Scores

COG Frequency Score per Site. Tab-delimited matrix of the frequency score for each COG at each site.

KEGG Frequency Score per Site. Tab-delimited matrix of the frequency score for each KEGG at each site.

MODULE Frequency Score per Site. Tab-delimited matrix of the frequency score for each MODULE at each site.

OPERON Frequency Score per Site. Tab-delimited matrix of the frequency score for each OPERON at each site.
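
Each of the four files above is described as a tab-delimited matrix. A minimal sketch of loading such a matrix in Python follows. The layout assumed here (a header row of site IDs, then one row per feature with its ID in the first column), the function name read_score_matrix, and the sample IDs and values are illustrative assumptions, not the documented format; the READMEs give the actual layout.

```python
import csv
import io


def read_score_matrix(handle):
    """Read a tab-delimited matrix into {feature_id: {site_id: score}}.

    Assumes (hypothetically) a header row of site IDs and the
    feature ID in the first column of every following row.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    site_ids = header[1:]  # skip the label above the feature-ID column
    matrix = {}
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        feature_id, values = row[0], row[1:]
        matrix[feature_id] = {
            site: float(value) for site, value in zip(site_ids, values)
        }
    return matrix


# Made-up data standing in for one of the score files.
sample = "feature\tsiteA\tsiteB\nCOG0001\t0.25\t0.5\nCOG0002\t0.0\t1.5\n"
scores = read_score_matrix(io.StringIO(sample))
print(scores["COG0001"]["siteB"])
```

To read a real file, pass an open file handle instead of the in-memory string, after checking the layout against the README.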


Processed Data

Pairwise Correlations. Correlation computed between each environmental feature and each COG, KEGG, module, and operon, respectively.

Linear Models for Environmental Features. Linear models constructed with an environmental feature as the response variable.

Linear Models for Metabolism Features. Linear models constructed with a metabolic feature as the response variable.

Results from DPM. Results of DPM for each COG, KEGG, module, and operon, respectively.

Results from Regularized CCA. Results of regularized CCA (2 dimensions) for each COG, KEGG, module, and operon, respectively.
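
As an illustration of the pairwise correlations listed above, the sketch below correlates one environmental feature with one pathway frequency score across sites. It assumes Pearson correlation, which is one common choice and not necessarily the measure used for the provided files; the function name and the values are made up for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: one environmental feature and one frequency
# score, each measured at the same four sites.
temperature = [10.0, 15.0, 20.0, 25.0]
frequency = [0.1, 0.2, 0.3, 0.4]
r = pearson(temperature, frequency)
print(round(r, 6))
```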


Utility Scripts

Detailed README here:

Create Color Gradient for Metabolic Map. MATLAB script to create a color gradient for iPath maps on the basis of some feature (e.g., structural correlation coefficients).

Statistics to evaluate results from regularized CCA. R script to calculate a number of metrics to evaluate the results of regCCA.
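
The idea behind the color gradient script can be sketched in a few lines of Python. This is not a port of the MATLAB script: the blue-to-red color choice, the value range of -1 to 1, and the function names are assumptions made for illustration, and the output is a plain hex color string rather than any particular map input format.

```python
def lerp_color(low, high, t):
    """Linearly interpolate between two RGB tuples for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))


def value_to_hex(value, vmin=-1.0, vmax=1.0,
                 low=(0, 0, 255), high=(255, 0, 0)):
    """Map a value in [vmin, vmax] to a hex color on a low-to-high gradient.

    Values outside the range are clamped to the nearest end color.
    """
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))
    r, g, b = lerp_color(low, high, t)
    return f"#{r:02x}{g:02x}{b:02x}"


# Hypothetical correlation coefficients for three pathway features.
for coefficient in (-1.0, 0.0, 1.0):
    print(coefficient, value_to_hex(coefficient))
```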

The next three scripts are used to evaluate similarity between two sets of clusterings. As an example, we may obtain two clusters after clustering the geographic sites on the basis of the environment, and two "similar" clusters after clustering the same sites on the basis of metabolism. Rand index and normalized mutual information can be used to quantify how "similar" the two sets of clusterings (based on environment and based on metabolism) are.

Parser. Parses the clustering input file.

Rand Index. Computes the Rand index between two sets of clusterings, where each clustering can consist of an arbitrary number of clusters.

Normalized Mutual Information. Same as the Rand index script, but computes normalized mutual information (NMI).
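
For readers who want to see the two measures concretely, here is a minimal Python sketch, independent of the provided scripts. It assumes each clustering is given as a list of cluster labels, one per site, in the same site order. NMI has several normalizations in the literature; this sketch divides by the geometric mean of the two entropies, which may differ from the provided script. The function names and labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which the two clusterings agree."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two label lists of equal length >= 2")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Mutual information divided by sqrt(H(a) * H(b))."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0 or h_b == 0:
        # Convention chosen here: 1.0 if both are single-cluster, else 0.0.
        return 1.0 if h_a == h_b else 0.0
    return mi / math.sqrt(h_a * h_b)


# Hypothetical labels for four sites under the two clusterings.
by_environment = ["A", "A", "B", "B"]
by_metabolism = ["x", "x", "y", "y"]
print(rand_index(by_environment, by_metabolism))
print(round(nmi(by_environment, by_metabolism), 6))
```

Both measures depend only on which sites are grouped together, so the label names themselves ("A" versus "x") do not matter.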


