NCBI ViroTrakr submission protocol for foodborne virus surveillance_version 2

Zhihui Yang; Ruth Timme; Maria Balkey

Dec 16, 2025

Version 2

DOI

https://dx.doi.org/10.17504/protocols.io.j8nlkkdbxl5r/v2

Zhihui Yang¹,
Ruth Timme²,
Maria Balkey²

¹FDA/HFP/OLOAS/OAMT;
²FDA/HFP/OLOAS/OSCCS

GenomeTrakr
ViroTrakr

External link: ViroTrakr: foodborne viruses (ID 396739) - BioProject - NCBI (nih.gov)

Protocol Citation: Zhihui Yang, Ruth Timme, Maria Balkey 2025. NCBI ViroTrakr submission protocol for foodborne virus surveillance_version 2. protocols.io https://dx.doi.org/10.17504/protocols.io.j8nlkkdbxl5r/v2
Version created by Zhihui Yang

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: December 15, 2025

Last Modified: December 16, 2025

Protocol Integer ID: 235061

Keywords: ViroTrakr, GenomeTrakr, foodborne viruses, foodborne virus surveillance, foodborne viral pathogen surveillance, microbial pathogen surveillance, bacterial pathogen surveillance, NCBI submission protocol, NCBI submission protocol for microbial pathogen surveillance, NCBI, new BioProject at NCBI, BioSample number, raw sequence data to the SRA database, data to GenBank, genomic database, international genomic reference database, phylogenetic analysis, foodborne illness, pathogen, norovirus, sapovirus

Abstract

INTRODUCTION:
This protocol outlines the steps which ViroTrakr contributors need to follow in order to submit their data to NCBI. It includes how to: 
-  Establish your new BioProject at NCBI;
-  Link it to ViroTrakr;
-  Create BioSample numbers for your submission and submit raw sequence data to the SRA database;
-  Submit assembled sequences to GenBank and link them to ViroTrakr.

ViroTrakr, a genomic database initiated by FDA and housed in NCBI, aims to (1) cover sequences of a wide range of foodborne viruses (e.g., norovirus, hepatitis A virus, sapovirus) from clinical, food and/or environmental specimens and (2) provide real-time reference sequences for phylogenetic analysis and epidemiologic studies linked to foodborne illnesses. Data analysis pipelines have been established for norovirus and hepatitis A virus and are under development for other species. Similar to GenomeTrakr for bacterial pathogen surveillance, ViroTrakr is expected to be employed for foodborne viral pathogen surveillance.

ViroTrakr:
foodborne viruses (ID 396739) - BioProject - NCBI (nih.gov)

GenomeTrakr:
Multispecies (ID 593772) - BioProject - NCBI (nih.gov)



Reference: NCBI submission protocol for microbial pathogen surveillance V.10:
NCBI submission protocol for microbial pathogen surveillance (protocols.io)

ViroTrakr data structure

ViroTrakr database structure: the ViroTrakr database was established as an umbrella BioProject at NCBI, with the structure shown below.
Note: the steps involved in your ViroTrakr submission are highlighted in green. Create one data level BioProject per lab or per collaboration project.

Database structure (cont.), for each data level BioProject:

NCBI sign in

Getting started 
Note: if you are an existing NCBI user, you may skip this step.

Please refer to Section 1 of “NCBI submission protocol for microbial pathogen surveillance” (protocols.io) for details.

For new users, directly create an account using one of the 3rd party sign-in options:
Sign up / NCBI (nih.gov)

For existing users, sign in to your NCBI account on the 3rd party’s sign in page: Log in / NCBI (nih.gov)

For existing users (cont.), click "more login options" to find the 3rd party that you will sign in with:

You may group, organize and manage your NCBI submission environment for your lab; please refer to Section 1 of “NCBI submission protocol for microbial pathogen surveillance” (protocols.io) for details.

Log into your NCBI account and you are ready for your NCBI submission.

Creating BioProjects at NCBI

Establish your new data level BioProject under the umbrella BioProject ViroTrakr:
Note: this step will link your data submission to ViroTrakr.

Please refer to step 3 of “NCBI submission protocol for microbial pathogen surveillance” (protocols.io) for details.

Log into your NCBI account at Submissions | BioProject | Submission Portal (nih.gov):

Establish a new BioProject by clicking “New submission”:

There are seven tabs under each BioProject submission.

Populate “Submitter” tab: (a submission group is highly recommended* for your laboratory)

*Note: establishing a user group and using it for all your submissions related to microbial genome surveillance is highly recommended. The reasons, as stated in NCBI submission protocol for microbial pathogen surveillance (protocols.io), are:

“- it will link your laboratory's NCBI data ownership to the user group and not to individuals, allowing anyone in the current group to perform updates or retractions and answer inquiries from the NCBI staff, even if there's been a complete turnover of staff since the original data submission. 

- it also ensures consistent data ownership across BioProjects, BioSamples, and sequence data. If your laboratory has non-overlapping research groups submitting and managing data at NCBI, multiple user groups can be established to track these efforts separately.” 

You may use a submission group that your laboratory has already established. Check the “Group” tab in the submission portal, https://submit.ncbi.nlm.nih.gov/groups/, for this information, and ask your colleagues to do the same to make sure your laboratory doesn't already have one in place.

If your laboratory doesn’t yet have a suitable submission group, please refer to sections 1.2 and 1.3 of NCBI submission protocol for microbial pathogen surveillance (protocols.io) for details on:
- how to request and create a new user group by emailing NCBI help staff at [email protected]
- how to manage your NCBI submission user group in the “Group” tab of the submission portal, https://submit.ncbi.nlm.nih.gov/groups/

You may contact NCBI by emailing [email protected] if you have further questions about submission groups or need additional help.

 
Populate “Project type” tab (e.g. Raw sequence reads):
Required fields are marked with an asterisk (*).

Populate “Target” tab: move the cursor over the question marks for a description of each item.
Required fields are marked with an asterisk (*); fields without an asterisk may be left blank.


Note: choose the most descriptive and valid organism name for your study. For example, use “Norwalk virus” instead of “norovirus” and “Homo sapiens” instead of “human”. See Organism information - BioSample - NCBI (nih.gov) and Home - Taxonomy - NCBI (nih.gov) for more information about providing a valid organism name.

Populate “General Info” tab:
•Choose “release immediately following processing” or a specified date to release your submission;
•Provide a description (e.g., Norwalk virus sequencing) of the study goals and relevance (e.g., NGS of clinical samples as part of norovirus surveillance) under “Public description”;
•Choose a “Relevance” from the provided options;
•Click “Yes” to the question “Is your project part of a larger initiative which is already registered with NCBI?” and enter the umbrella BioProject accession number that matches your data:
- PRJNA433975 for norovirus sequence data;
- PRJNA433976 for hepatitis A virus sequence data;
- PRJNA433977 for sapovirus sequence data;
- PRJNA817226 for rotavirus sequence data;
- PRJNA817227 for astrovirus sequence data;
- PRJNA817228 for hepatitis E virus sequence data.

•You may leave other fields blank. 
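If you prepare submissions for several viruses, a small lookup can help avoid entering the wrong umbrella accession. The sketch below simply encodes the list above; the function name `umbrella_accession` is a hypothetical helper, not part of any NCBI tool.

```python
# Umbrella BioProject accessions copied from the "General Info" list in this protocol.
VIROTRAKR_UMBRELLAS = {
    "norovirus": "PRJNA433975",
    "hepatitis a virus": "PRJNA433976",
    "sapovirus": "PRJNA433977",
    "rotavirus": "PRJNA817226",
    "astrovirus": "PRJNA817227",
    "hepatitis e virus": "PRJNA817228",
}


def umbrella_accession(virus: str) -> str:
    """Return the umbrella BioProject accession for a virus name (hypothetical helper)."""
    # Normalize case and whitespace so "Hepatitis  A virus" still matches.
    key = " ".join(virus.lower().split())
    try:
        return VIROTRAKR_UMBRELLAS[key]
    except KeyError:
        raise ValueError(f"No ViroTrakr umbrella BioProject listed for {virus!r}") from None
```

A virus that is not in the list raises an error rather than guessing, since this protocol does not say which accession such data should be linked to.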
 



 

Leave the "BioSample" tab blank; BioSamples will be created through a different submission portal (SRA submission).

Leave "Publications" tab blank or add relevant publications from your group.

Check your input in the “Review and Submit” tab; edit it if needed, then click “Submit” to complete your submission.

The BioProject accession number “PRJNAxxxxxx” will be available within a few minutes on “my submission” page. Meanwhile, you will receive an NCBI email containing these accession numbers, usually within 12 hours. 
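Accession numbers are copied between several forms in the following steps, so a quick format check can catch copy-and-paste slips. This is a minimal sketch that only tests the prefix-plus-digits shape of the identifiers named in this protocol (PRJNA, SRR, SUB); it assumes nothing about the number of digits and does not confirm that a record exists at NCBI. The name `classify_accession` is hypothetical.

```python
import re
from typing import Optional

# Shapes taken from the identifiers named in this protocol; digit counts are not assumed.
ACCESSION_PATTERNS = {
    "BioProject": re.compile(r"PRJNA\d+"),
    "SRA run": re.compile(r"SRR\d+"),
    "submission ID": re.compile(r"SUB\d+"),
}


def classify_accession(text: str) -> Optional[str]:
    """Return the kind of identifier, or None if the text matches no known shape."""
    value = text.strip()
    for kind, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(value):
            return kind
    return None
```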

Creating BioSamples at NCBI and SRA submission

Note: to protect subject privacy when submitting data from clinical samples, it is important to remove any human genomic reads from the raw sequencing data. Our pipeline includes two steps to confirm that human reads are absent from the submitted data:

First, the ViroTrakr BioProject has been flagged for automated human-read scrubbing by NCBI. With this setting, any data submission to this BioProject will automatically get scrubbed for human reads with the tool: https://github.com/ncbi/sra-human-scrubber. 

Second, our data analysis pipelines include an additional step that automatically removes human reads from the samples by mapping the reads to the human hg38 reference and collecting the unmapped (dehosted) reads for downstream analysis.
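The idea behind the second step can be illustrated with a short sketch. It is not the ViroTrakr pipeline itself: it assumes that an aligner has already mapped the reads to hg38 and written SAM text, and it keeps only read names for which no record aligned to the human reference. The function name `dehosted_read_names` is hypothetical; the flag value 0x4 ("segment unmapped") comes from the SAM format.

```python
from typing import Iterable, Set

UNMAPPED = 0x4  # SAM FLAG bit: this read did not align to the reference


def dehosted_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Return names of reads (or read pairs) with no alignment to the host reference."""
    seen: Set[str] = set()
    host: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & UNMAPPED:
            host.add(name)  # any aligned record marks the whole pair as host-derived
    return seen - host
```

Dropping a pair when either mate aligns is the conservative choice for privacy; a pipeline may apply different criteria.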

Please refer to step 2 of “NCBI submission protocol for microbial pathogen surveillance” (protocols.io) for details.

Log into your NCBI account at Submissions | Sequence Read Archive (SRA) | Submission Portal (nih.gov);

Establish an SRA submission by clicking “New Submission”:

There are initially five tabs under each BioSample submission.

Populate “Submitter” tab: (a submission group is highly recommended for your laboratory)

Populate “General info” tab:

•Click “Yes” under BioProject and enter the BioProject accession number established in step 3.

•Click “No” under BioSample to indicate that you do not have an existing BioSample to associate with this sequence data; one of the next steps will create the BioSample.

•* Click “Release immediately following processing” or specify a date to release if preferred.

Note: this is important for your first submission, especially for data from clinical samples. To protect subject privacy, human genomic reads can be removed from your raw sequencing data with the automated human-read scrubbing tool available at NCBI, https://github.com/ncbi/sra-human-scrubber. To enable this, a flag can be set on the BioProject along with the first data submission, so that the first and all subsequent data submissions to that BioProject are scrubbed automatically. Specifically, choose “Release on specified date” for your first submission and enter a date one week in the future (or later; you may change the date afterwards). Then send the following email to [email protected] as soon as possible:

“Hi, SRA help desk,
Please add the human read scrubbing analysis flag to my BioProject PRJNAXXXXXX, then release my HUPed (delayed release) SRA submissions.
Thanks,
Your name”

Once the flag is set for that BioProject, you may click “Release immediately following processing” for subsequent data submissions. 

•Click “Continue” to next page.
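For the delayed first release described in the note above, the release date and the help desk request can be drafted together. This is a minimal sketch: the function name `delayed_release_request` and its parameters are hypothetical, the seven-day default reflects the "one week in the future" suggestion, and the message must still be sent by email to the SRA help desk.

```python
from datetime import date, timedelta
from typing import Tuple


def delayed_release_request(
    bioproject: str, submitter: str, today: date, hold_days: int = 7
) -> Tuple[date, str]:
    """Return the release date to enter in the portal and the email text to send."""
    release_on = today + timedelta(days=hold_days)
    body = (
        "Hi, SRA help desk,\n"
        f"Please add the human read scrubbing analysis flag to my BioProject {bioproject}, "
        "then release my HUPed (delayed release) SRA submissions.\n"
        "Thanks,\n"
        f"{submitter}\n"
    )
    return release_on, body
```

Calling it with `date.today()` gives the date to enter under “Release on specified date”.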

Populate “BioSample Type” tab:

Preview BioSample Types and Attributes on the template page, and select the package that best describes your samples (e.g., select the “One Health Enteric” package):

* Note: in addition to the initial five tabs, two tabs, "BioSample type" and "BioSample attributes", are added to collect information for BioSample creation.

Click "Continue" at the bottom of the "BioSample type" tab:

Populate “BioSample attributes” tab:

•Click “Upload a file using Excel or text format (tab-delimited) that includes the attributes for each of your BioSamples”.

•Click “Download Excel” button under “Attributes file”.

•Fill out the downloaded BioSample attributes sheet in Excel and save it in a local folder.

•Click “Choose file” button under “Attributes file”, then upload the populated and saved attributes sheet.

•At bottom of the “Attributes” tab, click “Continue”:
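Before uploading, it can help to check a tab-delimited copy of the attributes sheet for empty required cells and repeated sample names. The sketch below makes several assumptions: the sheet has been saved as tab-delimited text, comment lines start with "#", required column headers may carry a leading "*", and `sample_name` and `organism` are among the required columns. The package template you downloaded defines the real column set, so adjust `required` accordingly.

```python
import csv
from collections import Counter
from typing import Dict, List, Sequence


def read_attribute_rows(tsv_text: str) -> List[Dict[str, str]]:
    """Parse a tab-delimited attributes sheet into one dict per sample."""
    # Assumption: comment lines start with "#"; required headers may start with "*".
    lines = [ln for ln in tsv_text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = [h.strip().lstrip("*") for h in next(reader, [])]
    return [dict(zip(header, row)) for row in reader]


def check_attributes(
    tsv_text: str, required: Sequence[str] = ("sample_name", "organism")
) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    rows = read_attribute_rows(tsv_text)
    problems: List[str] = []
    for col in required:
        for i, row in enumerate(rows, start=1):
            if not (row.get(col) or "").strip():
                problems.append(f"data row {i}: {col} is empty")
    counts = Counter((row.get("sample_name") or "").strip() for row in rows)
    for name, n in counts.items():
        if name and n > 1:
            problems.append(f"sample_name {name!r} is used {n} times")
    return problems
```

The submission portal runs its own validation; this check only shortens the round trip for the most common slips.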

Populate “SRA metadata” tab:

•Click “Upload a file using Excel or text format (tab-delimited)”.

•Click “Download Excel spreadsheet” under “Metadata file”.

•Read the instructions in the first tab (Contact Info and Instructions) on how to fill out the spreadsheet, fill out the second tab (SRA_data), and save that tab as a TSV (tab-delimited) file in a local folder.

•Click “Choose file” button under “Metadata file”, then upload the populated spreadsheet that you saved in your local folder.

•At the bottom of the “SRA metadata” tab, click “Continue”:
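If the metadata is assembled in a script rather than in Excel, the tab-delimited file can be written directly. This is a minimal sketch using Python's standard `csv` module; the function name `rows_to_tsv` is hypothetical, and the column list must be copied from the SRA_data tab of the downloaded template, since only the two filename columns are named in this protocol.

```python
import csv
import io
from typing import Dict, Iterable, List


def rows_to_tsv(rows: Iterable[Dict[str, str]], columns: List[str]) -> str:
    """Serialize metadata rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        for key, value in row.items():
            # Tabs or line breaks inside a cell would corrupt the table layout.
            if "\t" in value or "\n" in value:
                raise ValueError(f"{key}: value contains a tab or line break")
        writer.writerow(row)  # columns missing from a row are written as empty cells
    return buf.getvalue()
```

Write the returned text to a `.tsv` file and upload it under “Metadata file”.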

4.7.1: The downloaded metadata Excel spreadsheet:
Note: you must save the second tab (SRA_data) of the spreadsheet as a TSV (tab-delimited) file and upload that TSV file in the “SRA metadata” tab.
4.7.2: Example of a filled-out metadata Excel spreadsheet:

Get the file name of each FASTQ file and fill out the columns (filename and filename2):
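Collecting the file names for paired-end data can be scripted. The sketch below assumes Illumina-style names in which the two mates differ only by `_R1` and `_R2` (for example `NoV01_S1_L001_R1_001.fastq.gz`); both the naming pattern and the function name `pair_fastqs` are assumptions, so adapt the pattern to your own files.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Tuple

# Assumed naming: <sample>_R1<rest> and <sample>_R2<rest>, where <rest> starts with "_" or ".".
MATE = re.compile(r"^(?P<sample>.+?)_R(?P<mate>[12])(?P<rest>[_.].*)$")


def pair_fastqs(names: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each sample prefix to its (forward file, reverse file) names."""
    mates: Dict[str, Dict[str, str]] = {}
    for name in names:
        base = Path(name).name  # the metadata sheet takes file names, not full paths
        match = MATE.match(base)
        if match is None:
            raise ValueError(f"{base}: does not follow the assumed _R1/_R2 naming")
        mates.setdefault(match["sample"], {})[match["mate"]] = base
    pairs: Dict[str, Tuple[str, str]] = {}
    for sample, found in sorted(mates.items()):
        if set(found) != {"1", "2"}:
            raise ValueError(f"{sample}: expected both an R1 and an R2 file")
        pairs[sample] = (found["1"], found["2"])
    return pairs
```

For each sample, the first name goes in the first filename column and the second name in the second.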

Populate “Files” tab:

•Click “FTP or Aspera Command Line file preload”.

•Click “FTP upload Instructions”.

4.8.1: Read and follow “FTP upload Instructions”. Select a suitable FTP tool (e.g., FileZilla) to upload your data:

4.8.2: Open FileZilla

• Copy and paste “Host, Username, Password” to establish FTP connection.

• Port: default for FTP is 21; default for SFTP is 22. Click “Quick connect”.

• Copy and paste your directory name “uploads/. …”.

• Create a subfolder (REQUIRED!) with a meaningful name. 

• Start uploading your sequence data from your local folder to the subfolder you created.
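The same upload can be scripted instead of using FileZilla. The sketch below uses Python's standard `ftplib` and mirrors the steps above: change to your account directory, create a subfolder, and copy the files into it. The function name `upload_reads` is hypothetical, and the host, username, password and account directory are placeholders that you must copy from the “FTP upload Instructions” page.

```python
import ftplib
from pathlib import Path
from typing import Iterable


def upload_reads(ftp, account_dir: str, subfolder: str, files: Iterable[Path]) -> None:
    """Upload files into a new subfolder of the account directory on a connected FTP session."""
    ftp.cwd(account_dir)
    try:
        ftp.mkd(subfolder)
    except ftplib.error_perm:
        pass  # the server refused; commonly this means the folder already exists
    ftp.cwd(subfolder)
    for path in files:
        with open(path, "rb") as handle:
            ftp.storbinary(f"STOR {Path(path).name}", handle)


# Example call (not run here); all upper-case names are placeholders:
# with ftplib.FTP(HOST) as ftp:
#     ftp.login(USERNAME, PASSWORD)
#     upload_reads(ftp, ACCOUNT_DIR, "virotrakr_batch_01", Path("reads").glob("*.fastq.gz"))
```

Passing the connected session in as an argument keeps the credentials out of the function and makes it easy to try with a stand-in object first.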

4.8.3: When the upload is complete, return to the SRA submission page and click “Select preload folder”.

4.8.4: Click “Continue” (note: it takes at least 10 minutes for uploaded files to become available):

Please review your submission, make necessary changes on any tab, then click the “Submit” button:

The SRA accession number “SRRxxxxxxxx” will be available within a few minutes on “my submission” page. You may download the “metadata file with SRA accessions” for your record. Meanwhile, you will receive an NCBI email containing these accession numbers, usually within 12 hours. 

Submission of assembled sequences to NCBI GenBank.

Submission of the assembled data to GenBank. 


Raw sequencing data is required as input for deposit in the ViroTrakr database and for subsequent de novo assembly and data analysis. Assemblies or consensus sequences are produced as part of the data analysis in our workflow, and their submission to GenBank is highly encouraged. The GenBank submission of your assembled sequences will:

•“Make your sequence data available in the International Nucleotide Sequence Database Collaboration (INSDC) for global use;
•Ensure your data contribution is included in NCBI Virus, BLAST, RefSeq and other resources;
•Follow FAIR data-sharing principles.”

Reference: 

Notes: 
This GenBank submission assumes that you already have established a BioProject and BioSample(s) from step 3 and step 4 of this protocol. Otherwise, please refer to step 5.7 on how to link your sequence submission to ViroTrakr. 

Assembled sequences of norovirus can be submitted directly through the GenBank Submission Portal, which offers annotation pipelines for certain species, including norovirus: Submissions | GenBank | Submission Portal (nih.gov).

Assembled sequences of other foodborne viruses, with annotation provided by the submitter, should be submitted with one of the alternate submission tools (e.g., BankIt, tbl2asn):
https://www.ncbi.nlm.nih.gov/WebSub/?tool=genbank
https://www.ncbi.nlm.nih.gov/genbank/table2asn

Log into your NCBI account at Submissions | GenBank | Submission Portal (nih.gov);
Establish a GenBank submission by clicking “New Submission”:
 

Note: norovirus assemblies can be submitted directly at the submission portal; all other submission types can use one of the alternate submission tools (such as BankIt, tbl2asn), which follow similar submission steps.

There are nine tabs under each GenBank submission. 

Populate “Submission type” tab:

Populate “Submitter” tab: (a submission group is highly recommended for your laboratory)

Populate “Sequencing technology” tab:
•Choose the method used to obtain these sequences;
•Click “Assembled sequences”;
•Fill in the Assembly information (Assembly program and version/date);
•Click “Continue” button at the bottom to next page.

Populate “Sequences” tab: 

•Click “Release immediately following processing” or specify a date to release if preferred.

•Upload a prepared nucleotide FASTA file by clicking “Choose file”.

•Click “Continue” to next page.

Notes:

•Organize your sequence files by type or locus and make one submission for each type.

•Plain text (.txt) nucleotide FASTA files are accepted.

•Use a text editor (for example, Notepad or WordPad) to prepare a file containing the set of nucleotide sequences in FASTA format, and save the file as plain text.

•You may use the strain, isolate, specimen-voucher, or clone IDs as the sequence_ID in your FASTA file. If you do this, do not include extra information in the sequence_ID such as the organism name, etc.

For more information on how to format and organize FASTA files, please see  FASTA file help.
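As an alternative to a text editor, the FASTA file can be written by a short script that also enforces the sequence_ID advice above. This is a sketch under stated assumptions: the set of characters allowed in an ID here is deliberately conservative and is not GenBank's official rule (see FASTA file help for that), and the function name `to_fasta` is hypothetical.

```python
import re
from typing import Dict

# Conservative assumption: letters, digits, period, underscore and hyphen only, no spaces.
VALID_ID = re.compile(r"^[A-Za-z0-9._-]+$")


def to_fasta(records: Dict[str, str], width: int = 70) -> str:
    """Format {sequence_ID: nucleotide sequence} as plain-text FASTA."""
    lines = []
    for seq_id, seq in records.items():
        if not VALID_ID.match(seq_id):
            raise ValueError(f"{seq_id!r}: use a bare ID without spaces or organism names")
        cleaned = "".join(seq.split()).upper()  # drop spaces and line breaks inside the sequence
        if not cleaned:
            raise ValueError(f"{seq_id!r}: empty sequence")
        lines.append(f">{seq_id}")
        lines.extend(cleaned[i:i + width] for i in range(0, len(cleaned), width))
    return "\n".join(lines) + "\n"
```

Because the records are keyed by ID, each sequence_ID can appear only once in the output.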

Populate “Source Info” tab:

•You may find more information on the question “Do your sequences IDs represent one of these?” by clicking “description of these fields”. 

•Click “None of these” if your sequence IDs don’t contain information as described.

•Click “Continue” to next page.

Populate “Source modifiers” tab:

•Click the “Upload a tab-delimited table (template file provided)” button. 

* Notes: to link your submitted sequences to ViroTrakr, you may include the BioSample and BioProject numbers of your sequences in your submission. To do so, choose one of the options below: 1. download the source modifier template; 2. use a customized source modifier template (see details below).

•Customized GenBank source modifier template:
Below is a customized version of the source modifier table that links directly to the respective BioSample and BioProject records. Populate the template as guided and save it in txt format.
Customized source modifiers_for ViroTrakr.tsv (104 B)

An example of the customized source modifiers table:

GenBank submission modifers_ViroTrakr.xlsx  

•Click “Choose file” to upload your populated and saved source modifier file.

•Click “Continue” to next page.
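A source modifier table that carries the BioProject and BioSample links can also be generated from the FASTA file, which keeps the sequence IDs in both files identical. This is a sketch only: the column headers `Sequence_ID`, `BioProject` and `BioSample` are assumptions and should be replaced with the headers in the customized template attached above, and the function name `source_modifier_table` is hypothetical.

```python
import csv
import io
from typing import Dict


def source_modifier_table(fasta_text: str, biosamples: Dict[str, str], bioproject: str) -> str:
    """Build a tab-delimited table with one row per FASTA record."""
    # The sequence ID is the first word after ">" on each header line.
    ids = [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]
    missing = [seq_id for seq_id in ids if seq_id not in biosamples]
    if missing:
        raise ValueError(f"No BioSample given for: {', '.join(missing)}")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sequence_ID", "BioProject", "BioSample"])  # assumed headers
    for seq_id in ids:
        writer.writerow([seq_id, bioproject, biosamples[seq_id]])
    return buf.getvalue()
```

Save the returned text as a plain-text file and upload it with “Choose file”.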

Alternatively, you may send the relevant information (BioSample number, BioProject number, etc.) to GenBank at any time during or after the submission process, or as an update after you receive GenBank accession numbers. Please send the email to:
[email protected] (for replies/updates to records in GenBank)

Populate "References" tab:
Click "Continue" to next page. 

Populate "Review & Submit" tab:
Review your submission, make any necessary changes using the tabs/steps above, then click on the “Submit” button below.

Notes:
•The DBLINK (BioSample and BioProject) will be included and shown on the GenBank Record Preview if you chose option 1 or 2 in step 5.7 above;

•The GenBank accession number will be available within a couple of hours on “my submission” page. Meanwhile, you will receive an NCBI email containing these accession numbers, usually within 12 hours;

•If a processed-error shows up, you may correct a submission by using the FIX button associated with the error information. 

Technical Assistance

Technical Assistance: 

If you are having trouble finalizing your submission, contact the relevant NCBI database for assistance and include your submission ID in the email subject (SUB#######):

BioProject (for any BioProject issues): [email protected]

BioSample (for issues on source metadata): [email protected]

SRA (for issues on raw sequencing data): [email protected]

GenBank (for issues on assembled sequences): [email protected]

GenomeTrakr: [email protected]

ViroTrakr: [email protected]

NCBI help desk and account issues: [email protected]