Differential Expression Data and Tools

Data Types Supported

In the PATRIC context, Differential Expression Data includes quantitative gene expression data generated by high-throughput technologies such as microarrays or RNA-Seq, and can also include protein expression data. PATRIC has integrated a large number of published gene expression datasets related to bacterial pathogens from NCBI’s GEO database.

Differential Expression Data on the PATRIC Website

You can access gene expression data at PATRIC in the following ways:

  • Go to any Taxonomy or Genome level landing page and click on the Transcriptomics Tab to see a list of all experimental datasets available for that taxonomy level or genome.
  • Go to any Gene or CDS level landing page and click on the Transcriptomics Tab to see all transcriptomics datasets in which the gene is reported as being expressed. You can click on the Correlated Genes Tab to see a list of all the genes that are positively or negatively correlated with your selected gene.
  • Use PATRIC’s Global Search to find transcriptomics data.

Using Differential Expression Data at PATRIC

PATRIC provides a suite of integrated tools that allow you to explore, visualize, analyze, and compare published gene expression datasets available at PATRIC. You may also privately and securely upload and analyze your own unpublished gene expression data and compare it with other published datasets.

Functionality includes:

  • View lists of gene expression datasets available at PATRIC. Filter the list of Experiments or Comparisons based on manually curated metadata (such as Organism, Strain, Gene Modification, and Experimental Condition) to quickly find experiments of interest.
  • Select one or more experiments and generate a dynamic Gene List and associated Heatmap viewer. You can filter the Gene List based on Log Ratio or Z-score cut-off, up/down regulation, or gene functions. Use the Heatmap Viewer and clustering to quickly find genes that are similarly expressed across one or more comparisons. Your selections may also be saved as a group in your Workspace for future use.
  • Select a subset of genes and view corresponding metabolic pathways.
  • View a landing page for an individual Transcriptomics experiment, which includes available metadata and curated comparisons to similar experiments.
  • View all available transcriptomics data for a particular gene of interest including:
    • Search by keyword to see the expression of the gene under selected experimental conditions.
    • Apply a Log Ratio or Z-score cut-off to find the top experimental conditions or genetic modifications that cause differential expression of the gene, which may help to assign a potential function to a hypothetical gene.
    • See all the genes that are positively or negatively correlated based on gene expression ratio. This may help to identify genes that are co-regulated or have similar functions.
    • To browse all transcriptomics data related to a specific gene of interest, see Gene Page Transcriptomics FAQs.
  • Upload your own transcriptomics data, generated by microarray or RNA-Seq technologies, to your private Workspace and analyze it using the annotations and analysis tools available at PATRIC. You can also compare your data with other transcriptomics datasets within PATRIC.
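
The cut-off filtering and correlated-gene lookups described above can be illustrated with a small sketch. This is not PATRIC's implementation: the function names, the gene identifiers, and the default thresholds are hypothetical, and the Z-score is assumed to be computed across all genes within a single comparison, which may differ from how PATRIC derives it.

```python
from math import sqrt
from statistics import mean, pstdev


def z_scores(log_ratios):
    """Z-score of each gene's log ratio within one comparison (assumed definition)."""
    values = list(log_ratios.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {gene: 0.0 for gene in log_ratios}
    return {gene: (value - mu) / sigma for gene, value in log_ratios.items()}


def filter_genes(log_ratios, min_abs_log_ratio=1.0, min_abs_z=0.0, direction="both"):
    """Return sorted gene IDs that pass both cut-offs and the up/down filter."""
    scores = z_scores(log_ratios)
    selected = []
    for gene, value in log_ratios.items():
        if abs(value) < min_abs_log_ratio or abs(scores[gene]) < min_abs_z:
            continue
        if direction == "up" and value <= 0:
            continue
        if direction == "down" and value >= 0:
            continue
        selected.append(gene)
    return sorted(selected)


def pearson(xs, ys):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile has no defined correlation; treat as none
    return cov / (sx * sy)


def correlated_genes(target, profiles, min_abs_r=0.8):
    """Map every other gene to its correlation with target when |r| >= min_abs_r."""
    result = {}
    for gene, profile in profiles.items():
        if gene == target:
            continue
        r = pearson(profiles[target], profile)
        if abs(r) >= min_abs_r:
            result[gene] = r
    return result
```

Here `log_ratios` maps gene identifiers to log ratios for one comparison, and `profiles` maps gene identifiers to lists of log ratios across the same ordered set of comparisons. Positive values of `r` indicate positively correlated genes and negative values indicate negatively correlated genes.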

Gene Expression Data Sources

Most of the gene expression datasets available at PATRIC are published gene expression datasets related to bacterial pathogens from NCBI’s GEO database.

The following are the determining criteria for which datasets are currently incorporated into PATRIC:

  • Include only gene expression datasets. Exclude any datasets generated by genomic hybridization assays, ChIP-chip/ChIP-seq, or protein arrays.
  • Exclude any datasets generated using multi-organism platforms, because they present additional challenges for data integration, presentation, and interpretation.
  • Exclude any host response datasets, i.e., datasets measuring expression of the host genes.
  • Exclude any datasets that are not published. Exceptions are made in some cases if the experimental design is clear from the data or its description.
  • Exclude any datasets (sub-series) that are part of a larger dataset (super-series) to avoid redundancy. In such cases, only super-series are incorporated into PATRIC.
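
As an illustration only, the criteria above can be written as a single screening function. The `Dataset` record and all of its field names are hypothetical and do not correspond to GEO or PATRIC schemas. In practice these decisions are made through curation rather than by an automated check.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical summary of a candidate dataset; field names are assumptions."""
    accession: str
    assay_type: str            # e.g. "gene expression", "ChIP-seq", "protein array"
    organism_count: int        # number of organisms targeted by the platform
    measures_host_genes: bool  # True for host response datasets
    published: bool
    clear_design: bool         # design is clear from the data or its description
    is_sub_series: bool        # part of a larger super-series


def exclusion_reasons(ds):
    """Return the list of criteria that the dataset fails (empty if it qualifies)."""
    reasons = []
    if ds.assay_type != "gene expression":
        reasons.append("not a gene expression dataset")
    if ds.organism_count > 1:
        reasons.append("multi-organism platform")
    if ds.measures_host_genes:
        reasons.append("host response dataset")
    if not ds.published and not ds.clear_design:
        reasons.append("unpublished without a clear experimental design")
    if ds.is_sub_series:
        reasons.append("sub-series of a super-series")
    return reasons


def is_eligible(ds):
    """A dataset qualifies only when no exclusion criterion applies."""
    return not exclusion_reasons(ds)
```

Returning the list of failed criteria, rather than a bare yes or no, keeps the reason for each exclusion visible to a curator reviewing the result.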

Gene Expression Data Curation and Processing Protocol

Once a dataset is retrieved from GEO, PATRIC performs the following steps to process it:

  • Map organism and gene identifiers described in the platform to corresponding genomes and genes in PATRIC.
  • Manually review the experimental description and associated publication to understand the experimental design and goals of the study.
  • Decide which comparisons should be saved in the database. Carefully check for replicates and dye swaps.
  • Use a manual curation process to ensure quality: create a comparison, combine results from replicates (discarding any inconsistent replicates), and normalize and log-transform the data if required.
  • Manually curate experiment and comparison metadata to accurately and consistently capture information such as sample organism, strain, genetic modification, experimental condition, and time-point.
  • Lastly, integrate transcriptomics data with the genomic data in PATRIC to provide integrated data analysis capabilities.
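
The replicate-handling step can be sketched for a single gene as follows. This is a minimal illustration under stated assumptions, not PATRIC's curation procedure: the function name is hypothetical, the input is assumed to be raw (untransformed) expression ratios, a dye-swapped replicate is assumed to report the inverse ratio, and "inconsistent" is assumed to mean a log ratio that deviates from the median of the replicates by more than `max_deviation`.

```python
from math import log2
from statistics import mean, median


def combine_replicates(ratios, dye_swapped, max_deviation=1.0):
    """Combine replicate ratios for one gene into a single log2 ratio.

    ratios        raw expression ratios, one per replicate (must be > 0)
    dye_swapped   booleans, True where the replicate is a dye swap
    max_deviation assumed consistency threshold, in log2 units
    Returns None when no replicate is within the threshold of the median.
    """
    # Log-transform, flipping the sign of dye-swapped replicates so that
    # all values describe the same direction of comparison.
    logs = [-log2(r) if swapped else log2(r)
            for r, swapped in zip(ratios, dye_swapped)]

    # Discard replicates that disagree with the median of the group.
    center = median(logs)
    kept = [value for value in logs if abs(value - center) <= max_deviation]

    return mean(kept) if kept else None
```

For example, `combine_replicates([4.0, 0.25, 4.0], [False, True, False])` treats the second replicate as a dye swap, so all three replicates agree on a log2 ratio of 2.0.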