PATRIC Website Introduces New Look, Enhanced Tools and Additional Data

Published on 2010-08-10

Genome and Annotations

In this release, 93 genomes have been annotated by RAST and added to PATRIC. These include genomes newly released at RefSeq and genomes that have been updated. The following table summarizes the data available in the PATRIC database as of the August 2010 PATRIC website release.

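+--------------------------------+---------+------------+----------------+
|                                | PATRIC  | Legacy BRC | RefSeq/GenBank |
+================================+=========+============+================+
| Number of genomes              | 2228    | 410        | 2638           |
+--------------------------------+---------+------------+----------------+
| Number of complete genomes     | 2226    | 405        | 2253           |
+--------------------------------+---------+------------+----------------+
| Number of protein coding genes | 7875143 | 1407677    | 7879248        |
+--------------------------------+---------+------------+----------------+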

Website Enhancements

New PATRIC Home Page

We have redesigned the PATRIC home page to provide additional summary information about, and easy access into, our bacterial genome resource. Specifically, the home page now contains the following regions of interest:

  • Watchlist Genera: Lists 22 genera alongside a detail pane that displays information about a genus of interest, including a link to the genus overview page; taxon lineage; number of CDSs; number of genomes; links to literature, taxonomy, phylogeny, and post-genomics pages; and relevant images retrieved in real time from Google.
  • Searches and Tools: Provides lightweight interfaces to four of PATRIC’s tools: the Genome Finder, Feature Finder, Comparative Pathway Tool, and Protein Family Sorter. Users can also access the full-featured front end to each of these searches and tools by selecting the “Advanced Search” option.
  • Most Viewed Bacteria Tag Cloud: Provides an “at a glance” view of the bacteria pages most visited by PATRIC users, as tracked by Google Analytics. Users can visit the overview page for a specific bacterium by selecting one of the terms in the Tag Cloud.
  • News: Displays a designated “Feature” article, as well as the most recent news related to PATRIC. A more complete description of News is provided below.

We have also upgraded the look and feel of the entire PATRIC website, including a new RSS feed and social networking “share” capabilities. Site-wide, the main navigation remains consistent with the previous PATRIC website, with slight modifications to the “About” tab to accommodate the new FAQ and News web features.

PATRIC News

This release marks the introduction of PATRIC News, an interactive, outreach-centered resource containing up-to-date information including PATRIC announcements, presentations, publications, website and data releases, and news stories that reference PATRIC.

PATRIC FAQs

PATRIC has new and expanded Frequently Asked Questions (FAQs) covering many of PATRIC’s key features, searches, and tools. These FAQs help users make full use of PATRIC’s capabilities. This release includes FAQs on Annotation, Feature Cart, Feature Table, Comparative Pathway Tool, and Protein Family Sorter.

Comparative Pathway Analysis

PATRIC now supports comparative pathway analysis across multiple genomes. Data can be accessed in one of two ways:

1) “Pathway” Tab: Now available at any taxonomy level (for an example, see the Mycobacterium Pathway tab). At a given level, the Pathway tab lists all pathways, as well as the EC numbers and genes, that have been annotated at that level. Clicking on any of the available tabs (Pathways, EC Numbers, or Genes) takes the user directly to the specific annotation information. The opening page also includes the following features:

  • Genome Count: The number of genomes at the chosen taxonomic level that have genes present in this pathway.
  • Unique Gene Count: The number of genes, across all the genomes, that belong to this pathway. Clicking on any number in this column provides a list of the annotated genes in each genome that belong to this pathway.
  • Unique EC Count: The Enzyme Commission number (EC number) is a numerical classification scheme for enzymes, based on the chemical reactions they catalyze. In a given KEGG metabolic pathway, each step has an EC number assigned to it. In a given genome, several genes may have been assigned the same EC number, meaning that several different genes can potentially do the same job. The unique EC count tells how many steps within the pathway have at least one gene behind them.
  • EC Conservation %: The percentage of unique EC numbers that are present in all genomes. 100% describes a situation in which all the unique EC numbers are present in all the genomes being examined. Smaller numbers indicate that one or more genomes are missing some EC numbers.
  • Gene Conservation: A genome can have several genes that have been assigned the same EC number. Gene conservation provides an estimate of pathways where there might be redundancies, or where EC numbers are missing. Numbers greater than one mean that in at least one genome, more than one gene has been assigned a particular EC number. Numbers less than one indicate that in at least one genome, a particular EC number is missing. This provides a quick way to distinguish pathways that have perfect conservation across all genomes (Gene Conservation = 1) from pathways where there are differences among the genomes. Users can then explore these differences by drilling down on either the Unique Gene Count or the Unique EC Count.
  • Pathway Name: Users can also visualize data on a pathway map by clicking on a “Pathway Name”. This takes the viewer to a KEGG pathway on which the annotation for all selected genomes has been mapped. The data can be explored either by looking across the pathway or through the table on the left, which summarizes the information.

2) Comparative Pathway Tool: Allows users to select multiple genomes from different genera or taxonomic groups and compare their metabolic pathways. Data is presented in a way similar to that described above.

For more information, visit Comparative Pathway Tool FAQs.
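
The column definitions given for the Pathway tab can be made concrete with a small calculation. The following is a minimal Python sketch, not PATRIC's implementation: the function name `pathway_summary`, the input layout, and the toy genome, EC, and gene identifiers are hypothetical, and the two conservation formulas are assumptions chosen to be consistent with the descriptions above (EC Conservation % as the share of unique EC numbers found in every genome, Gene Conservation as the average number of genes per genome and EC number).

```python
from collections import defaultdict


def pathway_summary(annotations):
    """Summarize one pathway from (genome, ec_number, gene_id) tuples.

    The formulas are assumptions for illustration, not PATRIC's code.
    """
    genes_per_slot = defaultdict(set)  # (genome, EC number) -> gene ids
    for genome, ec_number, gene_id in annotations:
        genes_per_slot[(genome, ec_number)].add(gene_id)
    if not genes_per_slot:
        raise ValueError("no annotations supplied")

    genomes = {genome for genome, _ in genes_per_slot}
    ec_numbers = {ec for _, ec in genes_per_slot}
    unique_genes = {
        (genome, gene)
        for (genome, _), genes in genes_per_slot.items()
        for gene in genes
    }

    # EC numbers that have at least one gene in every genome.
    conserved_ecs = [
        ec for ec in ec_numbers
        if all((genome, ec) in genes_per_slot for genome in genomes)
    ]
    # Total gene-to-EC assignments, summed over every genome and EC number.
    assignments = sum(len(genes) for genes in genes_per_slot.values())

    return {
        "genome_count": len(genomes),
        "unique_gene_count": len(unique_genes),
        "unique_ec_count": len(ec_numbers),
        # Assumed: share of unique EC numbers present in all genomes.
        "ec_conservation_pct": 100.0 * len(conserved_ecs) / len(ec_numbers),
        # Assumed: average genes per (genome, EC number) combination.
        "gene_conservation": assignments / (len(genomes) * len(ec_numbers)),
    }


# Hypothetical toy data: genome_B has no gene for EC 4.2.1.11.
example = [
    ("genome_A", "1.1.1.1", "a1"),
    ("genome_A", "2.7.1.1", "a2"),
    ("genome_A", "4.2.1.11", "a3"),
    ("genome_B", "1.1.1.1", "b1"),
    ("genome_B", "2.7.1.1", "b2"),
]
summary = pathway_summary(example)
print(summary)
```

In this toy input, genome_B lacks a gene for EC 4.2.1.11, so under the assumed formulas EC Conservation % falls below 100 and Gene Conservation falls below 1. Adding a second gene for an EC number in one genome would push Gene Conservation upward, which is the redundancy case described above.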

Post-genomics Data

PATRIC now provides a summary of available post-genomic data across multiple sources, with consolidated access to specific experimental datasets, details, and results. At this time, PATRIC retrieves this data in real time from prominent post-genomic databases. Data is displayed by clicking the ‘Post-genomic Data’ tab at any taxonomic level. The actual experimental data and results can be accessed via links out to the respective databases. The post-genomic data is divided into categories that include Transcriptomics/Expression, Proteomics/Mass Spec data, Structure, and Protein-Protein Interaction (for example, see Escherichia and Mycobacterium).

Improved Search Tools

  • `Genome Finder <http://www.patricbrc.org/portal/portal/patric/GenomeFinder>`__: The usability of this tool has been improved in both speed and functionality. The results are organized as a list of genomes and a list of genomic sequences that match the specified search criteria.
  • `Pathway Finder now Comparative Pathway Tool <http://www.patricbrc.org/portal/portal/patric/PathwayFinder>`__: The usability of this tool has been improved in both speed and functionality. The results are presented as a list of pathways, a list of EC numbers, and a list of genes that match the search parameters specified by the user. The Comparative Pathway Tool also now supports comparative pathway analysis when multiple genomes are selected.

Data Downloads

PATRIC now offers downloadable data files in several formats, including FASTA, tab-delimited, and GenBank file formats. Download files are organized by genome. Each genome directory includes FASTA sequence files for genomic sequences (*.fna), protein coding genes (*.ffn), RNA coding genes (*.frn), and proteins (*.faa). Each directory also includes annotations in GenBank file format (*.gbf). Genome directories include tab-delimited files for all genomic features (*.features.tab), protein coding genes (*.cds.tab), RNAs (*.rna.tab), GO assignments (*.go), EC assignments (*.ec), and pathway assignments (*.path). Data from three different annotation sources, PATRIC, RefSeq, and Legacy BRCs, are provided in separate files (*.PATRIC.*, *.RefSeq.*, *.BRC.*). Each genome directory also includes a tab-delimited file containing mappings between PATRIC and RefSeq identifiers (*.PATRIC2RefSeq). Files are available in the genomes directory under Downloads. A “Download genome data” link is provided in the top right corner of all genome-level pages.
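
The file naming scheme above lends itself to simple scripted sorting of a downloaded genome directory. The following is a minimal Python sketch under stated assumptions: the function name `describe_download_file` and the example file names are hypothetical, and the patterns simply restate the extensions listed in this section rather than an official PATRIC specification.

```python
from fnmatch import fnmatchcase

# Patterns taken from the extensions listed in this section.
# More specific patterns come first so they win over shorter ones.
DOWNLOAD_FILE_TYPES = [
    ("*.PATRIC2RefSeq", "PATRIC to RefSeq identifier mappings (tab delimited)"),
    ("*.features.tab", "all genomic features (tab delimited)"),
    ("*.cds.tab", "protein coding genes (tab delimited)"),
    ("*.rna.tab", "RNAs (tab delimited)"),
    ("*.fna", "genomic sequences (FASTA)"),
    ("*.ffn", "protein coding genes (FASTA)"),
    ("*.frn", "RNA coding genes (FASTA)"),
    ("*.faa", "proteins (FASTA)"),
    ("*.gbf", "annotations (GenBank)"),
    ("*.go", "GO assignments"),
    ("*.ec", "EC assignments"),
    ("*.path", "pathway assignments"),
]

# Annotation source tags as they appear in file names (*.PATRIC.*, etc.).
ANNOTATION_SOURCES = ("PATRIC", "RefSeq", "BRC")


def describe_download_file(filename):
    """Return (content description, annotation source) for a file name.

    Either element is None when the name does not match a known pattern.
    """
    content = next(
        (description for pattern, description in DOWNLOAD_FILE_TYPES
         if fnmatchcase(filename, pattern)),
        None,
    )
    source = next(
        (tag for tag in ANNOTATION_SOURCES if f".{tag}." in filename),
        None,
    )
    return content, source


# Hypothetical file names, for illustration only.
for name in ("Example_genome.PATRIC.faa",
             "Example_genome.RefSeq.cds.tab",
             "Example_genome.PATRIC2RefSeq"):
    print(name, "->", describe_download_file(name))
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every operating system, which matters for mixed-case suffixes such as `PATRIC2RefSeq`.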