The PATRIC FTP site provides bulk access to all public genomes and related data in various standard file formats.
PATRIC FTP Site: ftp://ftp.patricbrc.org/
Below is a description of the primary data directories on the FTP site and their contents.
Provides a list of all public genomes, related metadata, and AMR phenotype data in tab-delimited format.
- genome_summary: Basic summary of genomes and their annotation in tab-delimited format
- genome_metadata: All genome metadata in tab-delimited format
- genome_lineage: Taxonomy lineage (presented as taxon ids and taxon names) for all public genomes
- PATRIC_genome_AMR.txt: AMR phenotype data generated by laboratory methods in tab-delimited format
The genomes directory provides access to data for all public genomes in various standard file formats. The data is organized by genome: there is a separate directory for each genome, with the genome_id as the directory name.
For example, below is the directory for the Escherichia coli MG1655 genome.
Each genome directory provides the following data files for PATRIC and RefSeq annotations (when available).
- .fna: FASTA contig sequences
- .faa: FASTA protein sequence file
- .features.tab: All genomic features and related information in tab-delimited format
- .ffn: FASTA nucleotide sequences for genomic features, i.e. genes, RNAs, and other misc features
- .frn: FASTA nucleotide sequences for RNAs
- .gff: Genome annotations in GFF file format
- .pathway.tab: Metabolic pathway assignments in tab-delimited format
- .spgene.tab: Specialty gene assignments (e.g. AMR genes, virulence factors, essential genes) in tab-delimited format
- .subsystem.tab: Subsystem assignments in tab-delimited format
Downloading data for a large number of genomes
Because of the large number of genomes available on the FTP site, downloading them with FTP clients such as FileZilla or lftp, or through the website, is not very efficient. A simple shell script does the job easily, as follows.
You can download a list of the genomes you are interested in from the website and copy the genome ids into a text file called “genome_list”. Alternatively, you can get a list of all public genomes from the FTP site (link below) and filter it for the genomes or species you are interested in.
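As a rough sketch of the filtering step described above, the snippet below pulls genome ids out of a tab-delimited genome list for a species of interest. The function name filter_genome_ids and the column names genome_id and genome_name are assumptions made for illustration, not something this page specifies; check the header row of the file you actually downloaded and adjust them if needed.

```shell
# filter_genome_ids FILE PATTERN
# Print the genome id of every row whose name column contains PATTERN.
# Column positions are looked up from the header row, so the order of
# the columns in the file does not matter.
# ASSUMPTION: the header names the columns "genome_id" and "genome_name".
filter_genome_ids() {
    awk -F '\t' -v pattern="$2" '
        NR == 1 {
            for (c = 1; c <= NF; c++) {
                if ($c == "genome_id")   id_col = c
                if ($c == "genome_name") name_col = c
            }
            if (!id_col || !name_col) {
                print "expected columns not found in header" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Plain substring match, so PATTERN is not treated as a regex.
        index($name_col, pattern) > 0 { print $id_col }
    ' "$1"
}
```

For example, assuming the list was saved locally as genome_metadata, running `filter_genome_ids genome_metadata "Escherichia coli" > genome_list` would write the matching genome ids, one per line, to “genome_list”.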
Once you have copied the genome ids you are interested in into a separate file, “genome_list”, you can use the following one-line shell script to read the genome ids from that file and download the corresponding .fna files from the PATRIC FTP site. If you are interested in another file type, say .PATRIC.faa or .PATRIC.features.tab, simply replace .fna with that extension.
for i in $(cat genome_list); do wget -qN "ftp://ftp.patricbrc.org/genomes/$i/$i.fna"; done
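The one-liner above can be extended to fetch several file types per genome in one pass. The following is a minimal sketch rather than an official PATRIC script: the function names genome_urls and download_genomes are hypothetical, and the suffixes you pass in are simply the extensions listed earlier on this page. It reads “genome_list” line by line, skips blank lines, and keeps the same wget -qN options as the one-liner.

```shell
# genome_urls LIST_FILE SUFFIX...
# Print one FTP URL per genome id and suffix, using the same
# genomes/<genome_id>/<genome_id><suffix> pattern as the one-liner.
genome_urls() {
    list_file="$1"
    shift
    # The "|| [ -n ... ]" part also handles a last line with no newline.
    while IFS= read -r genome_id || [ -n "$genome_id" ]; do
        # Skip blank lines in the list.
        [ -n "$genome_id" ] || continue
        for suffix in "$@"; do
            printf 'ftp://ftp.patricbrc.org/genomes/%s/%s%s\n' \
                "$genome_id" "$genome_id" "$suffix"
        done
    done < "$list_file"
}

# download_genomes LIST_FILE SUFFIX...
# Fetch every generated URL with wget: -q is quiet, -N skips files that
# are already present and up to date, and "-i -" reads URLs from stdin.
download_genomes() {
    genome_urls "$@" | wget -qN -i -
}
```

A call such as `download_genomes genome_list .fna .PATRIC.faa` would then download both file types for every listed genome. Running `genome_urls genome_list .fna` on its own first is a cheap way to preview the URLs before starting a long download.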