Capabilities And Functionality
This section describes the features of the website from the end-user’s perspective. These data and features are enabled, delivered, and updated through the system infrastructure and processes, described in the other sections of this documentation.
Data Type Pages/Tabs
Primary data types in PATRIC have dedicated data pages that contain dynamically generated content retrieved from the PATRIC databases and queries to other external resources.
Page/Tab components: https://github.com/PATRIC3/p3_web/tree/master/public/js/p3/widget
Git submodules/external dependencies for JS: https://github.com/PATRIC3/p3_web/tree/master/public/js
From Overview tab: Provides summary information for all genomes within the selected taxon level, including links to reference and representative genomes and interactive graphical summaries of genome counts by antimicrobial resistance RIS value (Resistant, Intermediate, or Susceptible) and by metadata category (Host Name, Disease, Isolation Country, and Genome Status). Provides links to recent PubMed articles relevant to the selected taxon.
From Phylogeny tab: Displays the interactive Phylogeny Viewer (described below), linked to the associated order-level tree for the selected taxonomy level.
From Taxonomy tab: Provides an interactive expandable/collapsible taxonomy tree browser based on the NCBI taxonomy, filtered to the user-selected taxonomy level. The page provides links back to the appropriate Taxonomy Overview Page for the selected taxonomy level.
From Genomes tab: Provides a list of genomes and sequences in the PATRIC database, filtered to the selected taxonomy level. Lists are filterable through an interactive metadata summary menu and by keyword. From this page, genomes and sequences can be selected and added to the user’s workspace and downloaded in Excel or text format. Selected genomes can be viewed in the Genome Browser. The table containing the list is sortable and columns can be shown/hidden—this feature is true of all interactive tables in the PATRIC Website.
From AMR Phenotypes tab: Provides a list of all genomes, by antibiotic AMR value, within the selected taxonomy level which have AMR values, either RIS or MIC (minimum inhibitory concentration), with associated metadata. In addition to the standard table features for genomes, links are provided to the associated Antibiotic Page in PATRIC for the selected genome/antibiotic item.
From Sequences tab: Provides a list of all sequences (chromosome, contig, plasmid) associated with the genomes within the selected taxonomy level. Additional metadata such as length, GC content, sequence type, and topology are provided. In addition to the standard table features for genomes, links are provided to the associated Feature List Page for features associated with the selected sequences.
From Features tab: Provides a list of features (CDS, source, repeat region, tRNA, pseudogene, rRNA, gene, and others) associated with genomes in the selected taxonomy level. From this page, the following functionality is available for selected features:
download in Excel or text format
list of corresponding genomes
display FASTA DNA and Protein sequence
interactive multiple sequence alignment
ID mapping (see description below)
interactive KEGG pathways (see Comparative Pathway Tool below)
addition to the user’s workspace as a group
From Specialty Genes tab: Provides Specialty Genes, i.e., genes that are of particular interest to infectious disease researchers, such as virulence factors, transporters, drug targets, antibiotic resistance genes, human homologs, and essential genes (see Specialty Genes section below). For selected genes, the same functionality is available as for features in the Feature Page.
From Protein Families tab: Displays the interactive Protein Family Sorter (described below), filtered to the selected taxonomy level.
From Pathways tab: Displays the interactive Comparative Pathway Tool (described below), filtered to the selected taxonomy level.
From Transcriptomics tab: Provides access to and comparative tools for transcriptomics data curated (primarily) from NCBI’s GEO database. Users can also privately and securely upload and analyze their own unpublished gene expression data—generated by microarray or RNA-Seq technologies—and compare it with the other curated transcriptomics datasets in PATRIC. Functionality is summarized below.
From Interactions tab: Provides experimentally and computationally derived host-pathogen and protein-protein interactions (HPI/PPI) associated with the selected taxon level. The HPI/PPI data are collected from over 15 public repositories, including STRINGDB. Displays tabular and interaction network graph views (described below).
Tools and Visualizations
PATRIC tools and visualizations are interactive components within the website that enable the user to search, retrieve, filter, compare, analyze, graphically portray, and otherwise reformat the presentation of data. Some tools (e.g., Global Search) have all of the required code in the https://github.com/PATRIC3/p3_web/blob/master/.gitmodules repo, and thus do not have a separate repo.
From top-right portion of website: Performs full-text searches within the PATRIC Solr database for the specified search terms and returns lists of pages with relevant information. Pre-filter data types are selectable, including Genomes, Genome Features, Specialty Genes, Taxa, Transcriptomics Experiments, and Antibiotics. Boolean operators and exact term match options are available.
From the Graph option: Provides interactive, Cytoscape-based network visualization of experimentally confirmed and computationally derived protein-protein interactions that occur between host and bacterial proteins and between proteins within the bacterium. Interaction data are collected from over 15 public repositories, including STRINGDB. Interactions can be selected at the taxon, genome, and feature levels.
Supports comparison of consistently annotated metabolic pathways across closely related or diverse groups of genomes and visualizes them using interactive KEGG maps and heatmaps. The heatmap view is an interactive visualization tool that provides an overview of the distribution of genomes across the set of EC numbers within a selected pathway.
Compares protein families across closely related or diverse groups of genomes, visualizes them using interactive heatmaps, and generates multiple sequence alignments and phylogenetic trees for individual families. The heatmap view is an interactive visualization tool that provides an overview of the distribution of proteins across a selected set of genomes.
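As a rough illustration of the data behind such a heatmap, the sketch below tallies how many proteins each genome contributes to each family. The function name `family_matrix`, the input layout, and the identifiers are hypothetical and are not taken from the PATRIC code base.

```python
from collections import Counter


def family_matrix(assignments):
    """Count proteins per (family, genome) cell.

    `assignments` is an iterable of (genome_id, family_id) pairs, one per
    protein. This input shape is an assumption made for illustration.
    """
    counts = Counter(assignments)
    genomes = sorted({genome for genome, _ in counts})
    families = sorted({family for _, family in counts})
    # One row per family, one column per genome; absent combinations count as 0.
    rows = [[counts[(genome, family)] for genome in genomes] for family in families]
    return genomes, families, rows


# Hypothetical example data: two genomes, two families.
example = [("g1", "famA"), ("g1", "famA"), ("g2", "famA"), ("g2", "famB")]
genomes, families, rows = family_matrix(example)
for family, row in zip(families, rows):
    print(family, dict(zip(genomes, row)))
```

Each row of the resulting matrix corresponds to one heatmap row; a cell value of 0 indicates that the genome has no member of that family.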
From the Filter Tool in Genome Lists: Facilitates locating, sorting, and filtering genomes of interest based on various combinations of over 70 different metadata fields. For instance, users can find all genomes that have been isolated from humans, genomes related by phylogeny, or genomes related by lifestyle.
From Transcriptomics tab: Provides tools for comparative analysis of transcriptomics data including metadata filters; filtering gene lists based on Log Ratio or Z-score cut-off, up/down regulation, or gene functions; using the Heatmap Viewer and clustering; viewing corresponding metabolic pathways; and finding positively or negatively correlated genes based on gene expression ratio.
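The cut-off filtering described above can be sketched as follows. This is a minimal illustration only: the record type `GeneExpression`, the function `filter_genes`, and the default thresholds are assumptions, not values or names used by PATRIC.

```python
from dataclasses import dataclass


@dataclass
class GeneExpression:
    gene_id: str
    log_ratio: float
    z_score: float


def filter_genes(records, min_abs_log_ratio=1.0, min_abs_z_score=2.0, direction="both"):
    """Keep genes passing both cut-offs and the requested regulation direction.

    `direction` is "up", "down", or "both". Thresholds are illustrative defaults.
    """
    selected = []
    for rec in records:
        if abs(rec.log_ratio) < min_abs_log_ratio:
            continue
        if abs(rec.z_score) < min_abs_z_score:
            continue
        if direction == "up" and rec.log_ratio <= 0:
            continue
        if direction == "down" and rec.log_ratio >= 0:
            continue
        selected.append(rec)
    return selected


# Hypothetical example data.
records = [
    GeneExpression("geneA", 2.1, 2.5),
    GeneExpression("geneB", -1.8, -2.2),
    GeneExpression("geneC", 0.3, 0.4),
]
print([rec.gene_id for rec in filter_genes(records, direction="up")])
```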
Allows exploration of phylogenetic relationships using species- and genus-level coloring schemes. PATRIC’s phylogeny viewer also supports custom creation of genome groups to be used as a basis for analysis in other PATRIC tools.
Allows comparison of genomic regions around a gene of interest across closely related genomes. Shows differences in translation start sites, potential frame shifts, or missing genes and facilitates visual identification of proteins with similar functions.
Provides graphical portrayal of the alignment of genes and other genomic data (i.e., genome features) depicted along a central horizontal axis of genome coordinates. PATRIC’s genome browser supports comparison of genome annotations from multiple sources (e.g., PATRIC and RefSeq). Users can upload their own custom tracks.
Portrays the genome in a circular map, showing genome annotations and sequence properties. Provides tracks for chromosomes / plasmids / contigs, CDS (forward & reverse), RNAs, GC content, GC skew, and miscellaneous features. GC content and GC skew can be displayed as a line plot, histogram, or heatmap. Users can upload their own custom tracks.
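For readers unfamiliar with the two sequence-property tracks, the sketch below computes GC content, (G + C) / window length, and GC skew, (G - C) / (G + C), over non-overlapping windows. The function name `gc_stats` and the window size are illustrative assumptions; the window size the viewer actually uses is not stated here.

```python
def gc_stats(sequence, window=4):
    """Return (start, gc_content, gc_skew) for each full non-overlapping window."""
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        # Skew is undefined when the window has no G or C; report 0.0 in that case.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results


# Hypothetical toy sequence; real genomes would use far larger windows.
for start, content, skew in gc_stats("GGGCATATGCCC", window=4):
    print(start, content, skew)
```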
PATRIC services provide simple, integrated access to computational software for processing and analysis of raw data and common data types. Access is provided via the Services top menu which displays a simple submission form for each service. In order to use most of the services, the user must be logged in (denoted by “Login required” at the end of the descriptions below). This is required in order to accommodate user upload of their data and longer, more computationally intensive analyses on HPC machines. The results of the service are deposited in the user’s workspace. A few of the services, such as BLAST, do not require login and instead render the results appropriately in the website.
Submission forms: https://github.com/PATRIC3/p3_web/tree/master/public/js/p3/widget/app
Application Execution Service: https://github.com/PATRIC3/app_service
Note: All of the standalone computational services are dependent on app_service. For services below for which no other repos are listed (e.g., Variation Analysis, Metagenomic Read Mapping, Taxonomic Classification, Protein Family Sorter, Comparative Pathway, and ID Mapping) app_service includes all PATRIC code required for the service, plus possibly other third party command line tools.
Also required for all services:
The Genome Assembly Service can be used to perform an automated genome assembly using the latest computational tools. Single or multiple assemblers can be invoked to compare results. The assembly service attempts to select the best assembly—i.e., assembly with the smallest number of contigs and the longest average contig length. Several assembly workflows or “recipes” are available. These workflows have been tuned and tested to fit certain data types or desired analysis criteria such as throughput or rigor. The assembly service’s flexible nature also enables the rapid design and emulation of other popular protocols. Login required.
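The selection criterion mentioned above can be illustrated with a small sketch. How the service actually weighs contig count against average contig length is not specified in this section, so the lexicographic ordering below (fewest contigs first, then longest average length) is an assumption, as are the names `assembly_summary` and `pick_best`.

```python
def assembly_summary(contig_lengths):
    """Return (number of contigs, average contig length) for one assembly."""
    n = len(contig_lengths)
    return n, sum(contig_lengths) / n


def pick_best(assemblies):
    """Pick the assembly name with the fewest contigs, breaking ties by
    longest average contig length. Empty assemblies are ignored.

    `assemblies` maps an assembler name to a list of contig lengths.
    """
    candidates = {name: lengths for name, lengths in assemblies.items() if lengths}
    if not candidates:
        raise ValueError("no non-empty assemblies to compare")

    def sort_key(name):
        n, avg = assembly_summary(candidates[name])
        return (n, -avg)

    return min(candidates, key=sort_key)


# Hypothetical results from three assemblers.
results = {
    "assembler_a": [100, 200, 300],
    "assembler_b": [500, 400],
    "assembler_c": [600, 400],
}
print(pick_best(results))
```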
The Genome Annotation Service is based on the RAST Toolkit (RASTtk). RASTtk is a modular extensible genome annotation system that provides mechanisms for identifying genomic features and annotating their functions. The RASTtk annotation engine uses a signature k-mer method to propagate annotations taken from the CoreSEED, a genome annotation system that has been central to the quality of the RAST annotations over the years. The CoreSEED curation process takes advantage of subsystems-based annotation to ensure high-quality, consistent annotations. RASTtk is fully defined at http://www.ncbi.nlm.nih.gov/pubmed/25666585. Links and instructions for downloading and installing RASTtk client code are included. The subsystems annotation method is described at http://nar.oxfordjournals.org/content/33/17/5691.full. Login required.
The Comprehensive Genome Analysis Service provides a streamlined analysis “meta-service” that accepts raw reads and performs a comprehensive analysis including assembly, annotation, identification of nearest neighbors, a basic comparative analysis that includes a subsystem summary, phylogenetic tree, and the features that distinguish the genome from its nearest neighbors. Login required.
The PATRIC BLAST service integrates the BLAST (Basic Local Alignment Search Tool) algorithms to perform searches against public or private genomes in PATRIC or other reference databases using a DNA or protein sequence and find matching genomes, genes, RNAs, or proteins.
The Similar Genome Finder Service will, for a user-selected genome or for an uploaded FASTA file, find the closest related public genomes (by sequence) in PATRIC using the MinHash algorithm to perform comparisons. Login required.
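The general idea behind MinHash can be sketched in a few lines: each sequence is reduced to a short signature of minimum k-mer hashes, and the fraction of matching signature positions estimates the Jaccard similarity of the two k-mer sets. This is a generic textbook sketch, not the service's implementation; the k-mer size, number of hash functions, and function names are assumptions.

```python
import hashlib


def kmers(seq, k):
    """Return the set of k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, k=4, num_hashes=64):
    """Build a MinHash signature: one minimum hash value per seeded hash function."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    signature = []
    for seed in range(num_hashes):
        prefix = f"{seed}:".encode()
        # Seeded SHA-1 stands in for a family of independent hash functions.
        signature.append(min(
            int.from_bytes(hashlib.sha1(prefix + kmer.encode()).digest()[:8], "big")
            for kmer in kmer_set
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical toy sequences.
query = minhash_signature("ACGTACGTTGCAACGT")
other = minhash_signature("ACGTACGTTGCATTTT")
print(estimate_jaccard(query, other))
```

Because signatures are small and fixed in size, a query genome can be compared against many stored signatures without realigning the underlying sequences.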
The Variation Service can be used to identify and annotate sequence variations using a variety of aligner and SNP calling programs. The service enables users to upload one or multiple short read samples and compare them to a closely related reference genome. For each sample, the service computes the variations against the reference and presents a detailed list of SNPs, MNPs, insertions and deletions with confidence scores and effects such as “synonymous mutation” and “frameshift”. High confidence variations are downloadable in the standard VCF format augmented by SNP annotation. A summary table illustrating how the variations are shared across the samples is also available. Login required.
The Tn-Seq Analysis Service allows users to align reads and measure essentiality of their Tn-Seq data using the TRANSIT software. The results can be downloaded or viewed as alignments to the reference genome in the Genome Browser. The alignments are presented as a separate track in the Genome Browser along with annotated genes. Login required.
The Phylogenetic Tree Service enables construction of custom phylogenetic trees for up to 50 user-selected genomes. The service builds trees using conserved protein sequences, which is the same methodology used to build the public genus-level phylogenetic trees in the PATRIC website. The service also provides an option for building a codon tree. Users can view or download a Newick file, or access the new tree in the interactive Phylogenetic Tree Viewer in PATRIC. Login required.
The Whole Genome Alignment Service aligns genomes using progressiveMauve to create whole genome alignments of up to 20 genomes. Login required.
The Metagenomic Read Mapping Service uses KMA to align reads against antibiotic resistance genes from CARD and virulence factors from VFDB. Login required.
The Taxonomic Classification Service accepts reads or contigs from sequencing of a metagenomic sample and uses Kraken 2 to assign the reads to taxonomic bins, providing an initial profile of the possible constituent organisms present in the sample. Login required.
The Metagenomic Binning Service accepts either reads or contigs, and attempts to “bin” the data into a set of genomes. This service can be used to reconstruct bacterial and archaeal genomes from environmental samples. Login required.
The Expression Import Service allows users to upload differential expression data into their private workspace and compare it with other expression data available in PATRIC. The service supports gene expression, protein expression, and phenotype array data in the form of log ratios, generated by comparing samples, conditions, or time points. Login required.
The RNA-Seq Analysis Service provides tools for aligning, assembling, and testing differential expression on RNA-Seq data. Three recipes for processing RNA-Seq data are included: 1) Rockhopper, based on the popular Rockhopper tool for processing prokaryotic RNA-Seq data; 2) Tuxedo, based on the Tuxedo suite of tools (i.e., Bowtie, Cufflinks, Cuffdiff); and 3) Host HISAT2 for analyzing RNA-Seq datasets from host (human, mouse, etc.) in support of dual RNA-Seq. The service provides SAM/BAM output for alignment, tab delimited files profiling expression levels, and differential expression test results between conditions. Login required.
Included RNA-Seq tools:
The Protein Family Sorter Service tool enables researchers to examine the distribution of protein families across a set of user-selected genomes. Results are displayed in a page showing all the families associated with the selected genomes, plus filter controls and an interactive heatmap.
The Proteome Comparison Service performs protein sequence-based genome comparison using bidirectional BLASTP. This service allows users to select up to 8 genomes (either public or private) and compare them to a user selected reference genome. The service also allows users to upload an external genome file in FASTA format for an additional comparison. The genome comparison result is displayed as an interactive circular genome view on the webpage. Both the SVG image and the bidirectional BLASTP comparison results can be downloaded. Login required.
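The bidirectional (reciprocal) best-hit idea can be sketched as follows: a reference protein and a comparison protein are paired only if each is the other's top-scoring hit. The input layout of (query, subject, score) tuples and the function names are assumptions for illustration; the service's actual scoring and thresholds are not described here.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query_id, subject_id, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(ref_vs_cmp, cmp_vs_ref):
    """Return (reference, comparison) pairs that are each other's best hit."""
    forward = best_hits(ref_vs_cmp)
    reverse = best_hits(cmp_vs_ref)
    return sorted(
        (ref, cmp_) for ref, cmp_ in forward.items() if reverse.get(cmp_) == ref
    )


# Hypothetical BLASTP results in both directions.
ref_vs_cmp = [("r1", "c1", 200), ("r1", "c2", 150), ("r2", "c2", 90)]
cmp_vs_ref = [("c1", "r1", 198), ("c2", "r1", 160), ("c2", "r2", 80)]
print(bidirectional_best_hits(ref_vs_cmp, cmp_vs_ref))
```

In this example `r2` has `c2` as its best hit, but `c2` prefers `r1`, so only the `r1`/`c1` pair is reported.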
The Comparative Pathway Service allows users to identify a set of pathways based on taxonomy, EC number, pathway ID, pathway name and/or specific annotation type.
The Model Reconstruction Service allows users to construct their own metabolic model for any genome in PATRIC. The service includes support for model gap-filling, flux balance analysis, essential gene prediction, and export of models in SBML format. The service leverages capabilities of the ModelSEED (PMID: 20802497). Login required.
The ID Mapper Service allows users to map individual or sets of PATRIC identifiers to those from other prominent external databases, such as GenBank, RefSeq, EMBL, UniProt, KEGG, etc. Alternatively, users can start with a list of external database identifiers and map them to the corresponding PATRIC features. Login required.
Project Information Pages
The information in these pages in the PATRIC website is maintained in a GitHub repository and delivered through the PATRIC Static Content management process, described below. These pages are available through the Help menu and other areas of the site.
PATRIC Quickstart Video
From Help Menu: Short video that provides an overview of the PATRIC website and how to navigate through the site.
Contains complete listing of all user documentation. User Guides are available for all major PATRIC features.
Provides print-friendly Use Case / Tutorials that explain step-by-step how to use key PATRIC features and tools using realistic biological research examples.
Provides an overview with links to User Guides and Tutorials, organized by common tasks in PATRIC.
Provides installation instructions and links to reference information and tutorials for the PATRIC Command Line Interface.
Provides information on upcoming webinars and links to previously recorded webinars. Videos of recorded webinars are hosted on PATRIC’s YouTube Channel.
Provides links to short videos that demonstrate how to perform common tasks in PATRIC. The videos are hosted on PATRIC’s YouTube Channel.
Provides listing of all past PATRIC workshops and links to registration information for upcoming workshops.
Provides information on how to get in contact with the PATRIC team.
From the Help Menu: Provides a feedback form that generates a ticket in the PATRIC Jira user issue tracking system.
Source Code: https://github.com/PATRIC3/p3_web
Listing of all recent and past PATRIC news items.
Provides listing of all publications developed in whole or in part through the PATRIC project, with links to the publications themselves.
Provides listing of all past PATRIC workshops and links to registration information for upcoming workshops.
Provides listing of all past PATRIC external presentations.
Provides summary metrics for website traffic, analysis service usage, citations to the PATRIC resource, and other similar information.
Provides general information about the PATRIC project, its scope, funding, and project team.
Provides reference information for citing PATRIC and a link to the full article at PubMed Central.
Provides listing of the PATRIC SWG members and their institutional affiliations.
Provides listing of PATRIC team personnel.
Provides listing of PATRIC key collaborators, data sources, and software tools used, with appropriate links.
Provides links to resources of relevance to PATRIC.
Link to the PATRIC GitHub code repository, including architectural and system descriptions.
Link to PATRIC source code repository.