Protein Family Sorter

Accessing the Protein Family Sorter on the PATRIC Website

An advanced search, described below, may be conducted from the “Searches and Tools” tab at the top of the PATRIC site and from the “Search Tools” box on any Organism Landing page. To learn about Organism Landing pages, see the PATRIC Data Organization Overview tutorial.

Note: Searches conducted using the Protein Family Sorter from the “Search Tools” box on an Organism Landing page will be pre-scoped to the taxonomic level of the page.

Protein Family Sorter Overview

The Protein Family Sorter tool enables researchers to examine the distribution of specific protein families, known as FIGfams, across different genomes.

FIGfams

FIGfams are protein families generated by the Fellowship for Interpretation of Genomes (FIG). They are based on a collection of functional subsystems, as well as on correspondences between genes in closely related strains. Each FIGfam is a set of protein sequences that are similar along their full length, and all of the proteins within a single FIGfam are believed to implement the same function. In short, a FIGfam is a set of isofunctional homologs. Proteins are thought of as implementing one or more abstract functional roles, and all of the members of a single FIGfam are believed to implement precisely the same set of functional roles.


Using the Protein Family Sorter

To begin, select your phylum, class, order, family, genus, species, or genomes of interest via the My Groups tab, the Taxonomy Tree tab, or the Alphabetized Organism tab, or by typing in the “Jump to” field. Then click on the View Protein Families button. Note: PATRIC cannot filter searches yielding more than 400 genomes.

You will be taken to an initial search results page, where results may be viewed via the default Protein Families (FIGfam) Table tab or the Heatmap Tab. For details on how to use the Heatmap Tab, please see Protein Family Heatmap FAQs. Note: Larger searches may take PATRIC several minutes.

Table Filter Panel

Both the Table and Heatmap tabs may be filtered using this panel. The Genome Filter at the top of the panel allows for the following three filter selections to be made for each genome:

  • Selecting “Present in all Families” will show only protein families that include members from the selected genomes.
  • Selecting “Absent from all Families” will show only protein families that do not include members from the selected genomes.
  • Selecting “Either/Mixed” specifies that proteins from the selected genomes may or may not be included in the resulting protein family list. This option is set by default, and allows users to focus on only those genomes they want to include and/or exclude without having to explicitly set one of the other two options for every genome.

Five Genome Metadata categories are also available to filter on (e.g., search for protein families that are present in genomes isolated from host A but absent in genomes from host B). The categories are country of isolation, host, disease, collection date, and completion date. They are accessible by dragging the right edge of the Filtering Panel to expand it.
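
The three Genome Filter states described above amount to a simple per-family test. The following is a minimal sketch in Python, not PATRIC's actual implementation. The family IDs, genome names, state constants, and the function name filter_families are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set

# Hypothetical names for the three Genome Filter choices.
PRESENT, ABSENT, MIXED = "present", "absent", "mixed"


def filter_families(
    families: Dict[str, Set[str]],
    genome_states: Dict[str, str],
) -> Dict[str, Set[str]]:
    """Keep the families that satisfy every genome's filter state.

    families maps a family ID to the set of genomes with at least one member.
    genome_states maps a genome to PRESENT, ABSENT, or MIXED. Genomes that
    are not listed behave like MIXED (the default): they impose no constraint.
    """
    kept = {}
    for family_id, genomes in families.items():
        ok = True
        for genome, state in genome_states.items():
            if state == PRESENT and genome not in genomes:
                ok = False  # a required genome has no member in this family
            elif state == ABSENT and genome in genomes:
                ok = False  # an excluded genome has a member in this family
        if ok:
            kept[family_id] = genomes
    return kept


# Made-up example data: three families across three genomes.
example_families = {
    "FAM_A": {"genome1", "genome2", "genome3"},
    "FAM_B": {"genome1", "genome3"},
    "FAM_C": {"genome2"},
}
result = filter_families(
    example_families, {"genome1": PRESENT, "genome2": ABSENT}
)
print(sorted(result))  # ['FAM_B']
```

In this sketch genome3 is left unset, so it behaves as “Either/Mixed”: it neither requires nor forbids membership, which is why only genome1 and genome2 decide the outcome.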

The Filters at the bottom of the panel enable narrowing of the results based on Keywords, Perfect Families, and/or the number of proteins or genomes per protein family.
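
As a rough illustration of count-based filtering, the sketch below narrows a family list by the number of proteins and genomes per family. It is not PATRIC's code. The data layout and the function name filter_by_counts are hypothetical, and the definition of a “perfect” family used here (exactly one protein in every compared genome) is an assumption, since this page does not define the term.

```python
from typing import Dict, List, Optional


def filter_by_counts(
    families: Dict[str, Dict[str, int]],
    min_proteins: int = 0,
    max_proteins: Optional[int] = None,
    min_genomes: int = 0,
    max_genomes: Optional[int] = None,
    perfect_only: bool = False,
) -> List[str]:
    """Return IDs of families that pass the count filters.

    families maps a family ID to {genome: number of member proteins},
    with an entry (possibly 0) for every genome in the comparison.
    """
    kept = []
    for family_id, per_genome in families.items():
        n_proteins = sum(per_genome.values())
        n_genomes = sum(1 for n in per_genome.values() if n > 0)
        if n_proteins < min_proteins or n_genomes < min_genomes:
            continue
        if max_proteins is not None and n_proteins > max_proteins:
            continue
        if max_genomes is not None and n_genomes > max_genomes:
            continue
        # ASSUMPTION: "perfect" means exactly one protein in every genome.
        if perfect_only and any(n != 1 for n in per_genome.values()):
            continue
        kept.append(family_id)
    return kept


# Made-up example data for two genomes.
example = {
    "FAM_A": {"genome1": 1, "genome2": 1},
    "FAM_B": {"genome1": 3, "genome2": 0},
    "FAM_C": {"genome1": 2, "genome2": 1},
}
print(filter_by_counts(example, min_genomes=2))
print(filter_by_counts(example, perfect_only=True))
```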

Protein Families (FIGfam) Table Tab Features and Functionality

  • All tables accessed via the Protein Family Sorter may be sorted as described in Feature Table FAQs.
  • Click on any Product Description within the table to see a detailed list of protein members in that specific FIGfam. From this FIGfam details page you may click on links within rows to navigate to specific PATRIC genome pages (Genome name), the NCBI nucleotide page for that genome (Accession), and specific PATRIC locus pages (Locus Tag).
    • Note: The Accession column is not shown in the default view.
  • Utilize the Protein Families Table Toolbar, located in the light blue row to:
    • Save selected items to groups within your Workspace by clicking the Add Feature(s) button. To learn more about how to save data for future visits and utilize your personal Workspace please see Workspace FAQs.
    • View or download selected DNA and/or protein sequence data in FASTA format. The table itself, or selected data within it, is also downloadable in both Excel and txt file formats.
    • View pathways and associated pathway information for your selected genes by clicking the Pathway Summary button.
    • Conduct Multiple Sequence Alignment analysis on selected features by clicking the Multiple Sequence Alignment button. For more details on this tool please see Multiple Sequence Alignment FAQs.
    • Utilize the ID Mapping Tool on selected features by clicking the Map IDs button. For more information on this tool please see ID Mapping FAQs.
    • Customize columns shown/hidden in your results table with the Columns button.
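
Sequence data downloaded in FASTA format from the toolbar can be read with a few lines of code. The following Python sketch parses generic FASTA text into a dictionary. The function name read_fasta and the sample records are made up for illustration, and the header layout of an actual PATRIC download is not described on this page, so the headers below should not be taken as representative.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {header: sequence}."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


# Made-up sample; a real download would be read from a file instead, e.g.
# with open("download.fasta") as handle: records = read_fasta(handle)
sample = """>hypothetical_id_1 example protein
MKTAYIAK
QRQISFVK
>hypothetical_id_2 another example
MSLLT
""".splitlines()

records = read_fasta(sample)
for header, sequence in records.items():
    print(header, len(sequence))
```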

Protein Alignment

The protein sequences from each family were aligned using MUSCLE (v3.6) and ambiguous portions of the alignment were removed using Gblocks (v0.91b).
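
For readers who want to run a comparable alignment locally, the sketch below builds the two command lines in Python. It assumes MUSCLE 3.x and Gblocks are installed under the executable names muscle and Gblocks, and it uses only basic options (-in and -out for MUSCLE, -t=p for protein data in Gblocks). The exact parameters PATRIC used are not stated on this page, and the file names and function names here are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_commands(fasta_in: str, aligned_out: str) -> List[List[str]]:
    """Build MUSCLE and Gblocks command lines for one protein family."""
    # MUSCLE 3.x: read unaligned sequences, write the alignment as FASTA.
    muscle_cmd = ["muscle", "-in", fasta_in, "-out", aligned_out]
    # Gblocks: -t=p marks the input as protein; by default the trimmed
    # alignment is written next to the input with a "-gb" suffix.
    gblocks_cmd = ["Gblocks", aligned_out, "-t=p"]
    return [muscle_cmd, gblocks_cmd]


def run_pipeline(fasta_in: str, aligned_out: str) -> bool:
    """Run both steps if the tools are on PATH; return False otherwise."""
    for cmd in build_commands(fasta_in, aligned_out):
        if shutil.which(cmd[0]) is None:
            print(f"{cmd[0]} not found on PATH; skipping")
            return False
        # Exit status is not checked in this sketch.
        subprocess.run(cmd)
    return True


# Hypothetical file names, shown without executing anything.
commands = build_commands("family.faa", "family.aln")
for cmd in commands:
    print(" ".join(cmd))
```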