Supplementary Data and Utility Scripts for Gianoulis & Raes et al., PNAS 2009
Gerstein Group Home Page
Bork Group Home Page
Mapping peptides to sites and running BLAST against the various pathway databases were computationally expensive steps. To aid reanalysis, we provide a complete data dump of the unprocessed mappings, the utility scripts, and the processed output in easily parseable, machine-readable formats. A description of all files, their formats, and associated READMEs can be found here.
The contents of all of these directories can be downloaded by clicking DOWNLOAD ALL; for particular files of interest, see the Table of Contents.
Please Note: The compressed data is approximately 110 MB and the uncompressed data is approximately 182 MB (182230797 bytes).
Table of Contents
Gianoulis & Raes et al. Quantifying environmental adaptation of microbial metabolic pathways. PNAS 2009
Interactive Metabolic Map
Materials and Methods
Highlighted in Science, Editor's Choice
GOS Peptide-Site Mapping. Mapping of GOS peptides to site IDs through scaffolds.
GOS Peptide-KEGG ID Mapping. Mapping between GOS peptides and KEGG IDs (BLAST results).
COG Frequency Score per Site. Tab-delimited matrix of the frequency score for each COG at each site.
KEGG Frequency Score per Site. Tab-delimited matrix of the frequency score for each KEGG at each site.
MODULE Frequency Score per Site. Tab-delimited matrix of the frequency score for each MODULE at each site.
OPERON Frequency Score per Site. Tab-delimited matrix of the frequency score for each OPERON at each site.
Pairwise Correlations. Correlation computed between each environmental feature and each COG, KEGG, module, and operon, respectively.
Linear Models for Environmental Features. Linear models constructed with each environmental feature as the response variable.
Linear Models for Metabolism Features. Linear models constructed with each metabolic feature as the response variable.
Results from DPM. Results of DPM for each COG, KEGG, module, and operon, respectively.
Results from Regularized CCA. Results of regularized CCA (2 dimensions) for each COG, KEGG, module, and operon, respectively.
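As an illustration of how the tab-delimited frequency score matrices and the pairwise correlations listed above relate, the following is a minimal Python sketch, not the code used for the paper. It assumes a layout with one header row of site names and one row per feature (COG, KEGG, module, or operon) with the feature ID in the first column; the actual layout is described in the READMEs and may differ. It also assumes Pearson correlation, which this page does not specify. The inline example data, the function names read_matrix and pearson, and the temperature values are all hypothetical.

```python
import csv
import io
import math


def read_matrix(handle):
    """Read a tab-delimited matrix (assumed layout: header of site names,
    then one row per feature with the feature ID in the first column)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sites = header[1:]
    rows = {}
    for record in reader:
        if not record:
            continue  # skip blank lines
        rows[record[0]] = [float(value) for value in record[1:]]
    return sites, rows


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data standing in for a real frequency score file.
EXAMPLE = (
    "id\tsiteA\tsiteB\tsiteC\n"
    "COG0001\t0.1\t0.2\t0.3\n"
    "COG0002\t0.3\t0.2\t0.1\n"
)
sites, rows = read_matrix(io.StringIO(EXAMPLE))

# Hypothetical environmental feature, one value per site in the same order.
temperature = [10.0, 20.0, 30.0]

correlations = {fid: pearson(scores, temperature) for fid, scores in rows.items()}
for fid, r in correlations.items():
    print(f"{fid}\t{r:.3f}")
```

To use this with a downloaded matrix, replace io.StringIO(EXAMPLE) with an open file handle, after checking the file's orientation against its README.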
Utility Scripts
Detailed README here:
Create Color Gradient for Metabolic Map. MATLAB script to create a color gradient for iPath maps on the basis of a chosen feature (e.g. structural correlation coefficients).
Statistics to evaluate results from regularized CCA. R script to calculate a number of metrics to evaluate the results of regCCA.
The next three scripts are used to evaluate the similarity between two sets of clusterings. For example, clustering the geographic sites on the basis of environment may yield two clusters, and clustering the same sites on the basis of metabolism may yield two "similar" clusters. The Rand index and normalized mutual information can be used to quantify how "similar" the two sets of clusterings (based on environment and based on metabolism) are.
Parser. Parses the clustering input file.
Rand Index. Computes the Rand index between sets of clusterings, where each clustering can consist of an arbitrary number of clusters.
Normalized Mutual Information. Same as above, but computes the normalized mutual information (NMI).
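To make the two similarity measures concrete, here is a minimal Python sketch of the Rand index and NMI for two clusterings of the same sites. It is an independent illustration, not a port of the scripts listed above. Each clustering is assumed to be given as a list of cluster labels, one per site and in the same site order; the input format of the provided parser may differ. NMI is normalized here by the geometric mean of the two entropies, which is one of several conventions in use and is an assumption. The function names and the example labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of site pairs on which the two clusterings agree
    (both place the pair together, or both place it apart)."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two label lists of equal length >= 2")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def entropy(labels):
    """Shannon entropy (natural log) of the cluster-size distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(labels_a, labels_b):
    """Mutual information divided by sqrt(H(A) * H(B)) (assumed convention)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0 or h_b == 0:
        # Degenerate case: at least one clustering has a single cluster.
        return 1.0 if h_a == h_b else 0.0
    return mutual_info / math.sqrt(h_a * h_b)


# Hypothetical example: six sites clustered by environment and by metabolism.
by_environment = ["e1", "e1", "e1", "e2", "e2", "e2"]
by_metabolism = ["m1", "m1", "m2", "m2", "m2", "m2"]

ri = rand_index(by_environment, by_metabolism)
nmi = normalized_mutual_information(by_environment, by_metabolism)
print(f"Rand index: {ri:.3f}")
print(f"NMI: {nmi:.3f}")
```

Both measures depend only on how the sites are grouped, not on the label names, so two clusterings that group the sites identically score 1 even if their cluster labels differ.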