Genome Annotation Service¶
The Genome Annotation Service uses the RAST tool kit, RASTtk, to provide annotation of genomic features. Once the annotation process has started by clicking the Annotate button, the genome is queued as a “job” for the Annotation Service to process, and will increment the count in the Jobs information box on the bottom right of the page. Once the annotation job has successfully completed, the output file will appear in the workspace, available for use in the PATRIC comparative tools and downloaded if desired.
Using the Genome Annotation Service¶
The Annotation submenu option under the Services main menu (Genomics category) opens the Genome Annotation input form (shown below). Note: You must be logged into PATRIC to use this service.
Note: RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions, available via the PATRIC Command Line Interface (CLI).
Annotation Input Form
Contigs: The target FASTA file containing the genome sequence to annotate.
Domain The taxonomic domain of the target organism: bacteria or archaea.
Taxonomy Name: The user-entered or selected taxonomic name for the organism. If the target species or strain is not listed, select the most specific, accurate taxonomic level available.
My Label: The user-provided name to identify the annotation result.
Output Name: The taxonomy name concatenated with the chosen label. This name will appear in the workspace when the annotation job is complete.
Genetic Code: The codon translation used in calling genes.
Output Folder: The workspace folder where results will be placed.
Taxon must be specified at the genus level or below to get the latest protein family predictions.
Clicking on the Jobs indicator at the bottom of the PATRIC page open the Jobs Status page that displays all current and previous service jobs and their status.
Jobs Indicator Bar
Once the job has completed, selecting the job by clicking on it and clicking the “View” button on the green vertical Action Bar on the right-hand side of the page displays the results files.
Job Status Page
The results page will consist of a header describing the job and a list of output files, as shown below.
Annotation Service Results Page
The Genome Annotation Service generates several files that are deposited in the Private Workspace in the designated Output Folder. These include
- contigs.fasta - contains the assembled contigs of the genome in DNA FASTA format.
- embl - contains an EMBL dump of the annotated genome.
- feature_dna.fasta - contains all the feature sequences of the genome in DNA FASTA format.
- feature_protein.fasta - contains all the protein sequences of the genome in protein FASTA format.
- features.txt - is a tab-delimited text file listing all the features of the genome. For each feature, it contains the PATRIC ID, the location string, the feature type, the functional assignment, any alternated IDs found, and (for protein-coding genes) the protein MD5 checksum.
- .gb - contains the annotated genome in GENBANK format.
- .genome - contains a special “Genome Typed Object (GTO)” JSON-format file that encapsulates all the data from the annotated genome. See Extracting and Mining Genome Typed Objects for more information.
- .gff - lists all the features of the genome in General Feature Format.
- GenomeReport.html - contains summary report of the genome qualtiy, described in Analyzing Genome Quality. Also contains a link to the genome in the PATRIC Genome Browser.
- .tar.gz - a compressed tarfile of the SEED form of the exported genome
- .txt - plain text form of the features in the genome
- .xls - Excel tabular form of the features in the genome
- load_files - folder containing the data files as imported into the backend Solr database
- quality.json - result of the genome quality analysis in JSON format.
- Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, Olson R, Overbeek R, Parrello B, Pusch GD, Shukla M, Thomason JA 3rd, Stevens R, Vonstein V, Wattam AR, Xia F. (2015). RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Scientific reports 5: 8365.