Welcome to the GenomeArk. This site provides access to genome sequencing and assembly data generated by the Earth BioGenome Project, the Vertebrate Genomes Project, the Telomere-to-Telomere Consortium, and other related projects. The GenomeArk aims to be both a working space and a database repository for high-quality reference genomes of all species. The final deposited assemblies are expertly curated before submission to the public archives (NCBI GenBank and the European Nucleotide Archive) for annotation and archiving. We ask that users respect the access terms and give credit to the sources of unpublished data in the database.

The links below provide summary assembly statistics and instructions for downloading the raw data and assemblies. Expert AWS users can also access the data directly at s3://genomeark; more technical information is available at https://github.com/vgp.
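For direct S3 access, a minimal Python sketch might look like the following. It assumes the `genomeark` bucket permits anonymous (unsigned) reads and that `boto3` is installed. The helper names `list_prefixes` and `make_anonymous_client` are hypothetical and are not part of any GenomeArk tooling.

```python
"""Sketch: list the "directories" (common prefixes) in the s3://genomeark bucket.

Assumption: the bucket allows anonymous (unsigned) read requests.
"""
from typing import Any, Dict, List


def list_prefixes(client: Any, bucket: str = "genomeark", prefix: str = "") -> List[str]:
    """Return the common prefixes directly under `prefix`.

    `client` is any object with a boto3-style `list_objects_v2` method,
    so the function can be exercised without network access.
    """
    prefixes: List[str] = []
    kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix, "Delimiter": "/"}
    while True:
        resp = client.list_objects_v2(**kwargs)
        prefixes.extend(p["Prefix"] for p in resp.get("CommonPrefixes", []))
        # S3 returns at most 1,000 entries per response; follow the continuation token.
        if not resp.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = resp["NextContinuationToken"]
    return prefixes


def make_anonymous_client() -> Any:
    """Create a boto3 S3 client that sends unsigned (anonymous) requests."""
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client("s3", config=Config(signature_version=UNSIGNED))
```

With network access, `list_prefixes(make_anonymous_client())` should return the bucket's top-level prefixes, and passing a `prefix` such as one of those results descends one level. From the command line, `aws s3 ls --no-sign-request s3://genomeark/` performs a comparable anonymous listing.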
- General
  - GenomeArk S3 Bucket Structure, describing the layout of files.
  - Notes on HiFi Reads BAM tags.
- Accessing
  - AWS CLI Primer for downloading and uploading data.
- Uploading
  - Easy, Step-by-Step Data Upload Guide.
  - Full Data Ingress Instructions.
  - AWS Credentials necessary for uploading data.
Project | All Species | Curated Assemblies | Draft Assemblies | Raw Data Only
---|---|---|---|---
All | 684 | 408 | 141 | 135
VGP | 500 | 331 | 101 | 68
T2T | 13 | 7 | 3 | 3
Bat1K | 38 | 15 | 19 | 4

Number of species at each level of completion. In each row, Curated Assemblies, Draft Assemblies, and Raw Data Only sum to All Species.