Genotyping and phylogenetic analysis_ViroTrakr workflows

Jayanthi Gangiredla; Mark Mammel; Zhihui Yang

Apr 02, 2026


DOI

https://dx.doi.org/10.17504/protocols.io.5jyl848org2w/v1

Jayanthi Gangiredla¹,
Mark Mammel¹,
Zhihui Yang²

¹FDA/HFP/OLOAS/OAMT/DFSG;
²FDA/HFP/OLOAS/OAMT/DFES

Zhihui Yang: Contact information: [email protected];

GenomeTrakr
ViroTrakr



Protocol Citation: Jayanthi Gangiredla, Mark Mammel, Zhihui Yang 2026. Genotyping and phylogenetic analysis_ViroTrakr workflows. protocols.io https://dx.doi.org/10.17504/protocols.io.5jyl848org2w/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Protocol status: Working

We use this protocol and it's working

Created: March 23, 2026

Last Modified: April 02, 2026

Protocol Integer ID: 313776

Keywords: phylogenetic analysis-virotrakr, virotrakr database, raw sequencing data, next generation sequencing, step instructions for foodborne virus, sequencing data analysis, de novo assembly generation for raw sequencing data, generation sequencing platform, data analysis within the galaxytrakr platform, sequencing data, virotrakr, major foodborne viral pathogen, galaxytrakrmicrorunqc workflow, foodborne virus, sapovirus pipeline, quality control assessment for microbial genome, sequencing, new galaxytrakr format, microbial genome, phylogenetic result, target virus in step, galaxytrakr platform, developed sapovirus pipeline, de novo assembly generation, galaxytrakr, phylogenetic tree, reports of the genotyping, sapovirus, galaxytrakr account, raw data, target virus, norovirus, assembled sequence, virus, genotyping, corresponding workflow, workflow in step

Disclaimer

Please note that this protocol is public domain, which supersedes the CC-BY license default used by protocols.io. 

Abstract


This workflow provides step-by-step instructions for analyzing foodborne virus sequencing data within the GalaxyTrakr platform. It covers data quality assessment and de novo assembly of raw sequencing data (from most next-generation sequencing platforms), execution of the workflows from either raw sequencing data or assembled sequences, and reporting of the genotyping and phylogenetic results.

This protocol describes how to: 
-  Set up a GalaxyTrakr account (Item 1);
-  Create a new history/workspace for a submission (Item 2);
-  Upload raw sequencing data (Item 3), either from a local folder (Item 3.1) or from the NCBI SRA database (Item 3.2);
-  Upload assembled sequences from a local folder (Item 4);
-  Execute the ViroTrakr workflow with either raw data or assembled sequences (Items 3.1.8 to 3.1.12);
-  Interpret the results (Items 3.1.13 to 3.1.14).

•Note: 
      * This workflow was designed for the major foodborne viral pathogens included in the ViroTrakr database (currently hepatitis A virus (HAV), norovirus (NoV), and sapovirus; additional species are under development). Norovirus is used as the example in this SOP. Users should select the appropriate data library for their target virus in step 3.1.8 and the corresponding workflow in step 3.1.10. Genotyping results and phylogenetic trees are generated in a similar manner, as described in step 3.1.12.

     *  In this updated version, the previous HAV and NoV pipelines have been integrated with the newly developed sapovirus pipeline and adapted to the new GalaxyTrakr format. This SOP supersedes the earlier versions published for HAV and NoV.


ViroTrakr: foodborne viruses (ID 396739) - BioProject - NCBI (nih.gov)

   
* References:
- Quality control assessment for microbial genomes: GalaxyTrakr MicroRunQC workflow V.5 (protocols.io)
- GalaxyTrakr: https://doi.org/10.1186/s12864-021-07405-8

Set up a GalaxyTrakr account.

Log into your GalaxyTrakr account.

Create a GalaxyTrakr account if you are a first-time user:
Galaxy | GT Prod 🧬🔬

Log into your GalaxyTrakr account if you already have one:
Galaxy | GT Prod 🧬🔬

Familiarize yourself with the main GalaxyTrakr components: Tools, Information, and History.

Create a new history/workspace for a submission.

Create a new history:

Upload raw sequencing data and execute the data analysis workflow.


Raw sequencing data in FASTQ files can be imported into GalaxyTrakr directly from your local folder (instructions in 3.1), or downloaded from SRA (instructions in 3.2) if the files have already been submitted to ViroTrakr in NCBI following the submission protocol:
NCBI submission protocol for foodborne virus surveillance (protocols.io).
After upload, the files remain in your GalaxyTrakr account and stay private until they are deleted.

Upload raw data from your local folder.

3.1.1. Click the “Upload” button.

3.1.1.-cont. Click “Choose local files”.

3.1.2. Select fastq files from your local folder, then click “Open”.  

3.1.3. Click “Start” to upload. 

3.1.4. Check the status of the data upload.

3.1.5. Build a list of datasets.

3.1.6. Build a collection of datasets.

3.1.7. The list of datasets will be available in the History.

3.1.8. Import the reference data files from the Shared Data libraries following the steps below. 


3.1.9. Reference libraries are available in the history. 

3.1.10. Click the “WorkFlows” button in the left-hand menu -> select the "Public" workflows -> then type and search for “ViroTrakr”.  

3.1.10.-cont. Select the appropriate workflow (“Norovirus with reads as input” in this case), then click the arrowhead button to run the workflow.

3.1.11. Make sure to select the appropriate file from each drop-down menu (indicated by arrows), then click “Run Workflow”.

3.1.12.  The workflow status will be displayed step-by-step in real time. As each step completes, it will turn green in both the Steps and History panels. The run may take a few minutes or longer, depending on the data and system load. 

3.1.12.-cont. Once the run is complete, the results will be available as individual reports in the History. Users can view and download each output file, including the assembly, viral contigs, genotyping report, and phylogenetic tree.

3.1.13.  An example of a genotyping report is shown below:

3.1.14.  An example of a phylogenetic analysis report is shown below:

The result files include: 

• Assembly with MEGAHIT: Metagenomic assembly generated using MEGAHIT.
• Report: Kraken2: Kraken2 classification reports.
• Report_blasthits_Genotype: Report of the best BLAST hits against reference sequences.
• Genotyping_report: Final report containing QC statistics and genotyping results.
• Virus contigs: Target-virus-specific contigs extracted from the metagenomic assembly.
• Reference_query_phylogenetic_tree: Phylogenetic tree showing the input genomes alongside reference genomes from all groups (available in .png and .txt formats).
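After download, the Kraken2 report can also be summarized locally, for example to see which species account for most classified reads. The sketch below is a minimal, unofficial example and is not part of GalaxyTrakr or the ViroTrakr workflow. It assumes the report is a tab-separated text file in Kraken2's standard six-column format (clade percentage, clade reads, directly assigned reads, rank code, NCBI taxonomy ID, indented name); lines with a different number of columns, such as reports generated with minimizer data, are skipped. The names `parse_kraken2_report`, `top_taxa`, and `KrakenRow` are hypothetical helpers.

```python
from typing import NamedTuple


class KrakenRow(NamedTuple):
    """One line of a standard (six-column) Kraken2 report."""
    percent: float     # % of fragments in the clade rooted at this taxon
    clade_reads: int   # fragments covered by the clade
    direct_reads: int  # fragments assigned directly to this taxon
    rank: str          # rank code, e.g. "U", "R", "F", "G", "S", "S1"
    taxid: int         # NCBI taxonomy ID
    name: str          # scientific name with indentation removed


def parse_kraken2_report(text: str) -> list[KrakenRow]:
    """Parse report text; lines without exactly six tab-separated fields are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 6:
            continue  # blank, malformed, or non-standard line
        pct, clade, direct, rank, taxid, name = fields
        rows.append(
            KrakenRow(float(pct), int(clade), int(direct),
                      rank.strip(), int(taxid), name.strip())
        )
    return rows


def top_taxa(rows: list[KrakenRow], rank: str = "S", n: int = 5) -> list[KrakenRow]:
    """Return up to n taxa with the given rank code, sorted by clade read count."""
    at_rank = [r for r in rows if r.rank == rank]
    return sorted(at_rank, key=lambda r: r.clade_reads, reverse=True)[:n]
```

For example, `top_taxa(parse_kraken2_report(open("kraken2_report.txt").read()), rank="S")` would list up to five species-level taxa with the most reads; the file name is a placeholder for whatever name the downloaded report is given.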

Download raw data from SRA database.

Users can also download raw data from SRA, whether the data were submitted to ViroTrakr or to other BioProjects
in NCBI. Have the SRA accession numbers ready before starting the download.

3.2.1. Create a list of SRA accession numbers. 

* Note: List the accession numbers (each starting with SRR, DRR, or ERR) in a plain text file, one per line, and save it in a local folder.
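If the accession list is assembled by hand or copied from a spreadsheet, a quick local check can catch typos before upload. The following minimal Python sketch is not part of GalaxyTrakr; it assumes only the rule in the note above, plus the standard convention that run accessions are the prefix SRR (NCBI SRA), ERR (EMBL-EBI ENA), or DRR (DDBJ) followed by digits. It strips whitespace, drops blank entries and duplicates, rejects anything else, and writes one accession per line. The function name `write_accession_list` is hypothetical.

```python
import re
from pathlib import Path

# Run accessions: SRR (NCBI SRA), ERR (EMBL-EBI ENA) or DRR (DDBJ), then digits.
RUN_ACCESSION = re.compile(r"(SRR|DRR|ERR)\d+")


def write_accession_list(accessions, path):
    """Validate run accessions and write them to `path`, one per line.

    Whitespace is stripped; blank entries and duplicates are dropped
    (first occurrence kept). Raises ValueError if any entry is not an
    SRR/DRR/ERR run accession or if nothing valid remains.
    """
    cleaned, rejected = [], []
    for raw in accessions:
        acc = raw.strip()
        if not acc:
            continue  # ignore blank lines
        if RUN_ACCESSION.fullmatch(acc) is None:
            rejected.append(acc)
        elif acc not in cleaned:
            cleaned.append(acc)
    if rejected:
        raise ValueError(f"not SRR/DRR/ERR run accessions: {rejected}")
    if not cleaned:
        raise ValueError("no accessions to write")
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned
```

For example, `write_accession_list(accessions, "sra_accessions.txt")`, where `accessions` is your own list of run IDs and the file name is a placeholder, produces the text file to select in step 3.2.2. Experiment (SRX) or sample (SRS) accessions are rejected, because the note asks for run accessions only.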

3.2.2. (1) Click the “Upload” button in the left-hand menu, (2) select “Paste/Fetch data”, (3) enter the name of the file containing the SRA accession number list, (4) click “Choose local file” to select the file from your local folder, then (5) click the “Start” button to begin the upload. The file will appear in the current History, as indicated by the arrow below.

3.2.3. (1) Click the “Tool” button in the left-hand menu, (2) click “Get Data”, (3) choose "Faster Download and Extract Reads in FASTQ", (4) select "List of SRA accession" from the drop-down menu as the input type, (5) select the list file from the History panel, then (6) click the “Run Tool” button to start the process.

3.2.4. The data files will be downloaded from the NCBI SRA database. Download time may vary depending on the number of files and the current NCBI server status.

3.2.5. Follow the steps outlined in 3.1.8 to 3.1.14 to run the workflow and collect the results.

Upload assembled sequences and execute the data analysis workflow.

Upload assembled sequences.
This ViroTrakr workflow supports both raw sequencing data and pre-assembled sequences. You can
upload raw data directly (Item 3), or use assembled sequences generated on other platforms and stored in your local folder.

Click the “Upload” button in the left-hand menu.

4.1.-cont. Click “Choose local files”.

Click “Start” to upload. 

Check the status of data upload. 

Build a list of datasets.

Build a collection of datasets.

The list of datasets will be available in the History.

Follow the steps outlined in 3.1.8 to 3.1.14 to run the workflow and collect the results.