Unique and Divergent Proteins in E. coli Outbreak Strains

Published on 2011-06-13

Note:  Data was updated June 17th.

The proteins from Escherichia coli strains TY-2482 and LB226692 were searched with BLAST against an E. coli-specific database compiled from 184 E. coli genomes at PATRIC.  For each protein, we calculated Smith-Waterman alignment scores against its ten best-scoring homologs and normalized these scores to the protein's self-alignment score, creating a conformity score.  A conformity score of 1 means that all proteins in the alignment are identical.  Proteins with conformity scores of 0.8 or less were considered to be “divergent” from their top ten homologs in E. coli.  The figure below plots the conformity scores for over 5000 proteins of strains TY-2482 and LB226692. The X-axis is in log scale.
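The normalization described above can be illustrated with a minimal sketch. It rests on several assumptions that the text does not confirm: a simple match/mismatch scheme with a linear gap penalty stands in for whatever substitution matrix was actually used, the normalized scores of the homologs are combined by taking their mean, and the function names and toy sequences are hypothetical.

```python
from statistics import mean


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of sequences a and b.

    Assumption: simple match/mismatch scoring with a linear gap penalty,
    not the (unspecified) scoring scheme used for the published data.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def conformity_score(query, homologs):
    """Mean homolog score normalized to the query's self-alignment score.

    Assumption: averaging over the homologs is one plausible way to
    combine the per-homolog scores; the text does not state the method.
    """
    self_score = smith_waterman_score(query, query)
    if self_score == 0 or not homologs:
        raise ValueError("need a non-empty query and at least one homolog")
    return mean(smith_waterman_score(query, h) / self_score for h in homologs)


# Toy sequences (hypothetical): identical homologs give a score of 1.
identical = conformity_score("MKTAYIAK", ["MKTAYIAK"] * 3)
# A truncated homolog pulls the score below 1.
mixed = conformity_score("MKTAYIAK", ["MKTAYIAK", "MKT"])
```

Under this scheme a protein whose ten homologs are all identical to it scores exactly 1, and the score falls as the homologs become shorter or more dissimilar.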

A list of all proteins and their conformity scores is provided in an Excel file (TY2482-LB226692_Vir_ABR_Conformity_All). Some of the proteins found in TY-2482 and LB226692 have no homologs in E. coli.  In the conformity score column, these “unique” proteins are identified by a “-”.  The data for the unique proteins of both strains are also integrated into the PATRIC website here, with the additional functionality the site provides.
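As a rough sketch of how the conformity score column could be interpreted, the function below sorts a cell value into one of three groups. It applies the “-” marker for unique proteins and the 0.8 cutoff for divergent proteins given in the text. The function name and the “conserved” label for the remaining proteins are assumptions made for illustration.

```python
def classify_protein(conformity_cell, threshold=0.8):
    """Classify one value from the conformity score column.

    "-" marks a protein with no E. coli homolog ("unique"); scores at or
    below the threshold are "divergent". The label "conserved" for all
    other proteins is a hypothetical name, not taken from the source.
    """
    cell = str(conformity_cell).strip()
    if cell == "-":
        return "unique"
    return "divergent" if float(cell) <= threshold else "conserved"


# Hypothetical example values, not taken from the published file.
labels = [classify_protein(v) for v in ["-", "0.42", "0.8", "0.97", "1"]]
```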

The divergent proteins (see above) were compared to virulence and antibiotic-resistance proteins collected from different sources, with some interesting discoveries.  For example, beta-lactamase, a protein responsible for resistance to beta-lactam antibiotics such as penicillins, cephamycins, and carbapenems, was found to be identical in TY-2482 and LB226692 (CTX-M type) but very different from its closest homologs prevalent in other E. coli strains (TEM type).  An alignment of these proteins is provided here. Other divergent proteins include ABC transporters, phage-related proteins, and outer membrane proteins.