PATRIC July 2011 Release Features Integration of Genome Metadata

Published on 2011-08-01

New Searches and Tools

Integration of Genome Metadata

PATRIC has collected metadata associated with bacterial genomes from multiple sources, such as NCBI’s BioProject database, GenBank records, and the Human Microbiome Project.  Following the automated collection, the metadata was manually curated for consistency and accuracy, organized into a relational database, and integrated with the genomes available in PATRIC.  This metadata provides useful information about a genome, including genome status, isolation source, isolation country, collection date, host organism, associated disease, and more. The metadata is organized into the following broad categories:  organism info, project info, sequence info, isolate info, host info, and phenotype info.

Genome metadata can be accessed in multiple ways on the PATRIC website:

  • The improved Genome Finder allows users to locate genomes available in PATRIC by their associated metadata. A keyword search covers all available metadata fields, including the free-text comments. Search results are summarized by key metadata attributes, including genome status, isolation country, host name, disease, collection date, and completion date. Progressive filtering of the results is also available. See: http://patricbrc.org/portal/portal/patric/GenomeFinder?cType=taxon&cId=&dm=
  • A summary of key metadata attributes is provided on every Taxon Overview page. For an example, see: http://patricbrc.org/portal/portal/patric/Taxon?cType=taxon&cId=561

  • Metadata is also available on the Genome List tab, present at all taxonomy levels. Progressive filtering by key metadata attributes is provided. The Genome List table allows users to choose any of the 61 currently available metadata attributes and add them to the table as columns. Users can also download all available genomes and associated metadata as a tab-delimited or Excel file. For an example, see:

http://patricbrc.org/portal/portal/patric/GenomeList?cType=taxon&cId=561&dataSource=&displayMode=&pk=9379#key=9379&pS=20&aP=1&aT=0&cwG=false&cF=&gId=&gName=&gdir=ASC&gsort=genome_name&sdir=&ssort=

  • All available metadata for an individual genome is located on the Genome Overview page under Genome Summary. For an example, see:

http://patricbrc.org/portal/portal/patric/Genome?cType=genome&cId=200960
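Once genomes and their metadata have been downloaded as a tab-delimited file, the same kind of progressive filtering can be reproduced offline. The Python sketch below is only an illustration: the column headers (genome_name, genome_status, isolation_country, host_name) and the sample rows are assumptions made up for the example, and the headers in an actual PATRIC download may differ.

```python
import csv
import io

# Hypothetical sample data; column names and rows are invented for illustration
# and are not taken from a real PATRIC download.
SAMPLE_TSV = (
    "genome_name\tgenome_status\tisolation_country\thost_name\n"
    "Example genome A\tComplete\tGermany\tHuman\n"
    "Example genome B\tWGS\tUnited States\tHuman\n"
    "Example genome C\tWGS\tGermany\t\n"
)


def filter_genomes(tsv_text, **criteria):
    """Return the rows whose metadata fields match every given criterion."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        if all(row.get(field) == value for field, value in criteria.items())
    ]


# Filter by one attribute, then narrow the result with a second attribute.
german = filter_genomes(SAMPLE_TSV, isolation_country="Germany")
german_wgs = filter_genomes(
    SAMPLE_TSV, isolation_country="Germany", genome_status="WGS"
)

print([row["genome_name"] for row in german])
print([row["genome_name"] for row in german_wgs])
```

To use this with a real download, read the file contents into a string and pass the actual column headers as keyword arguments.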

To learn more about the genome metadata available at PATRIC, or how to search for a genome by its available metadata, visit Genome Metadata FAQs or Genome Finder FAQs, respectively.

The collection and refinement of genome metadata is an ongoing task. Future plans include integrating the genome metadata with various analysis tools available on the PATRIC website.

Data Download in GFF Format

PATRIC annotations are now available for download as GFF files, which can be accessed via the Downloads tab at the top of any PATRIC page. Alternatively, GFF files can be accessed by clicking “Download genome data” in the upper-right corner of any PATRIC Genome Overview page. For an example, see:

http://brcdownloads.vbi.vt.edu/patric2/genomes/Escherichia_coli_O104-H4_str_LB226692/Escherichia_coli_O104-H4_str_LB226692.PATRIC.gff
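As a rough illustration of working with a downloaded annotation file, the Python sketch below parses GFF text and counts features by type. It assumes the standard nine-column, tab-separated GFF layout; the sample lines, contig name, and attribute keys are invented for the example and are not taken from an actual PATRIC file.

```python
from collections import Counter

# Hypothetical sample in the standard nine-column GFF layout;
# these lines are invented and not copied from a PATRIC file.
SAMPLE_GFF = (
    "##gff-version 3\n"
    "contig1\tPATRIC\tCDS\t100\t400\t.\t+\t0\tID=cds1;product=hypothetical protein\n"
    "contig1\tPATRIC\tCDS\t500\t900\t.\t-\t0\tID=cds2;product=DNA polymerase\n"
    "contig1\tPATRIC\trRNA\t1000\t2500\t.\t+\t.\tID=rna1;product=16S ribosomal RNA\n"
)


def parse_gff(text):
    """Yield one dict per feature line, skipping comments and blank lines."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # ignore lines that do not have the nine expected columns
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        # The attributes column holds semicolon-separated key=value pairs.
        attributes = dict(
            pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
        )
        yield {
            "seqid": seqid,
            "source": source,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            "strand": strand,
            "attributes": attributes,
        }


features = list(parse_gff(SAMPLE_GFF))
type_counts = Counter(feature["type"] for feature in features)
print(type_counts)
```

For a real file, the text passed to parse_gff would come from reading the downloaded .gff file instead of the inline sample.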

Genomes and Annotations

Since the June release, 756 new genomes have been added to PATRIC.  Many of them are draft assemblies available in GenBank but not in RefSeq. In addition, 62 genomes have been updated or replaced with their newest versions. In total, 749 new genomes have been annotated using RAST.

A summary of the genomes available on the PATRIC website through July 2011 is provided in the table below:

                                 PATRIC   Legacy BRC   RefSeq
Number of genomes                  3535          337     3685
Number of Complete genomes         1491          237     1488
Number of WGS genomes              2044           96     1800
Number of Plasmid only genomes        -            4      397

To view this Sequence Summary along with Genomic and Protein Feature Summaries, please visit: http://patricbrc.org/portal/portal/patric/Taxon?cType=taxon&cId=2