Jan 21, 2022

Overview of NCBI's SARS-CoV-2 submission process and the metadata required V.5

Version 1 is forked from Populating the NCBI pathogen metadata template
  • 1 US Food and Drug Administration;
  • 2 University of British Columbia;
  • 3 CDC;
  • 4 Centers for Disease Control and Prevention
  • Coronavirus Method Development Community
  • TOAST_public
Protocol Citation: Ruth Timme, Emma Griffiths, Lee Katz, Michael Weigand, Technical Outreach and Assistance for States Team 2022. Overview of NCBI's SARS-CoV-2 submission process and the metadata required. protocols.io https://dx.doi.org/10.17504/protocols.io.b35iqq4e
Version created by Technical Outreach and Assistance for States Team
Manuscript citation:
Griffiths, E. J. et al. The PHA4GE SARS-CoV-2 Contextual Data Specification for Open Genomic Epidemiology. (2020) doi:10.20944/preprints202008.0220.v1. https://www.preprints.org/manuscript/202008.0220/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it’s working
Created: January 21, 2022
Last Modified: January 21, 2022
Protocol Integer ID: 57226
Keywords: GenomeTrakr, metadata, Pathogen package, NCBI Pathogen Detection, INSDC
Disclaimer
Please note that this protocol is public domain, which supersedes the CC-BY license default used by protocols.io.
Abstract
PURPOSE:
This protocol explains the metadata requirements for the following two protocols:

  • Step-by-step instructions for establishing a new NCBI laboratory submission account and for creating and linking a new BioProject to an existing umbrella effort.
  • SARS-CoV-2 raw data submission to SRA (Sequence Read Archive) and metadata to BioSample. Users can modify this protocol to just create a BioSample with no linked raw data.

Required: established BioProject and BioSamples
  • Submit SARS-CoV-2 assemblies to NCBI GenBank, linking to existing BioProject, BioSamples, and raw data.

Version history:
V5: Updated the BioSample metadata template with a minor edit
Three templates needed for NCBI SARS-CoV-2 submission
START HERE FIRST: Read the PHA4GE contextual data specification BEFORE populating your submission templates!

Training video:

For visual learners, a 10-minute video summarizes the entire NCBI submission process.

Assembling the three NCBI metadata templates for SARS-CoV-2 submission:

Steps 2-4 provide templates to populate for your submission. However, follow the primary PHA4GE guidance first to ensure that the correct controlled vocabularies and ontology terms are used to populate these fields.

Guidance included in this protocol:
  • Step 2) PHA4GE BioSample metadata template

  • Step 3) PHA4GE SRA metadata template

  • Step 4) PHA4GE GenBank source modifier template
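
Because the templates rely on controlled vocabularies, it can help to check a populated table against the pick-lists before submitting. The following is a minimal sketch, not part of the PHA4GE tooling. The function name find_vocabulary_errors and the PICK_LISTS dictionary are hypothetical, and the field names and allowed values shown are placeholders only; copy the real pick-lists from the PHA4GE template and specification.

```python
# Hypothetical pick-lists for illustration only.
# Replace these with the allowed values from the PHA4GE template.
PICK_LISTS = {
    "host": {"Homo sapiens"},
    "host_disease": {"COVID-19"},
}


def find_vocabulary_errors(rows, pick_lists=PICK_LISTS):
    """Return (row_number, field, value) for every value not in its pick-list.

    rows is a list of dictionaries, one per sample, keyed by column name.
    Row numbers start at 1 for the first data row.
    """
    errors = []
    for number, row in enumerate(rows, start=1):
        for field, allowed in pick_lists.items():
            value = row.get(field, "").strip()
            if value not in allowed:
                errors.append((number, field, value))
    return errors


if __name__ == "__main__":
    example_rows = [
        {"host": "Homo sapiens", "host_disease": "COVID-19"},
        {"host": "human", "host_disease": "COVID-19"},
    ]
    for number, field, value in find_vocabulary_errors(example_rows):
        print(f"Row {number}: '{value}' is not an allowed value for {field}")
```

An empty result means every checked field uses a term from its pick-list. A missing field is reported with an empty value.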


BioSample metadata
SARS-CoV-2 BioSample submission package:

Download custom version containing the PHA4GE pick-lists and controlled vocabulary:

Download SARS-CoV-2.cl.1.0_PHA4GE-V3.1.xlsx

Follow the PHA4GE contextual metadata SOP and source GitHub repository for guidance in populating the template.

SRA metadata
Populate SRA’s batch metadata table:

Download custom version containing the PHA4GE pick-lists and controlled vocabulary:

Download SRA_template_PHA4GE-V3.xlsx

Follow the PHA4GE contextual metadata SOP and source GitHub repository for guidance in populating the template.

PRO TIPS:
  1. If you have sequences to submit that belong to more than one BioProject, create a separate submission + metadata table for each of your BioProjects.
  2. Entering fastq filenames in the spreadsheet: On a Mac, you can copy the file names directly from the folder into a spreadsheet. On a PC, copy and paste does not work this way, but a command-line operation can produce the same list (for example, dir /b in the Command Prompt).
  3. Finally, it is important to develop a QA/QC step to make sure each file is associated with the correct sample name. For example, use the LEFT function in Excel to strip off the text appended to the file name, then use an exact-match comparison to confirm that the result matches the sample name.
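
Tips 2 and 3 can also be scripted, which works the same way on a Mac or a PC. The following is a minimal sketch under stated assumptions: file names are assumed to look like SampleName_R1.fastq.gz, with the sample name before the first underscore, and sample names are assumed to contain no underscores. The function names are hypothetical and are not part of any NCBI tool.

```python
from pathlib import Path

# File endings treated as FASTQ files (an assumption; adjust as needed).
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def list_fastq_names(folder):
    """Return the sorted FASTQ file names found directly inside folder."""
    return sorted(
        path.name
        for path in Path(folder).iterdir()
        if path.is_file() and path.name.endswith(FASTQ_SUFFIXES)
    )


def sample_from_filename(filename, separator="_"):
    """Keep the text before the first separator, like LEFT in Excel."""
    return filename.split(separator, 1)[0]


def find_mismatches(rows):
    """Return the (sample_name, filename) pairs that do not match exactly.

    rows is a list of (sample_name, filename) pairs taken from the
    populated SRA metadata table.
    """
    return [
        (sample, filename)
        for sample, filename in rows
        if sample_from_filename(filename) != sample
    ]


if __name__ == "__main__":
    # Print one file name per line, ready to paste into the spreadsheet.
    for name in list_fastq_names("."):
        print(name)
```

Calling find_mismatches on the sample name and file name columns returns an empty list when every file is paired with the correct sample. If your file names use a different naming pattern, change the separator or the splitting rule accordingly.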
GenBank metadata
Populate two GenBank templates.

1. GenBank structured comment (metadata describing the mapping or assembly methods)

Download GenBank-structuredComment_PHA4GE-V3.xlsx

2. GenBank source modifier template. This is a custom version containing PHA4GE guidance and direct linkage to the respective BioSample records. Follow guidance presented in this file for populating the template.

Download GenBank-source_modifiers_PHA4GE-V3.xlsx