Apr 02, 2026

Public workspace
Genotyping and phylogenetic analysis_ViroTrakr workflows

  • Jayanthi Gangiredla¹,
  • Mark Mammel¹,
  • Zhihui Yang²
  • ¹FDA/HFP/OLOAS/OAMT/DFSG;
  • ²FDA/HFP/OLOAS/OAMT/DFES
  • GenomeTrakr
  • ViroTrakr
Protocol Citation: Jayanthi Gangiredla, Mark Mammel, Zhihui Yang 2026. Genotyping and phylogenetic analysis_ViroTrakr workflows. protocols.io https://dx.doi.org/10.17504/protocols.io.5jyl848org2w/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: March 23, 2026
Last Modified: April 02, 2026
Protocol Integer ID: 313776
Disclaimer
Please note that this protocol is public domain, which supersedes the CC-BY license default used by protocols.io.
Abstract

This workflow provides step-by-step instructions for analyzing foodborne virus sequencing data within the GalaxyTrakr platform. It covers data quality assessment and de novo assembly of raw sequencing data (from most Next-Generation Sequencing platforms), execution of the workflows from either raw sequencing data or assembled sequences, and reporting of the genotyping and phylogenetic results.

This protocol describes how to:
- Set up a GalaxyTrakr account (Item 1);
- Create a new history/workspace for a submission (Item 2);
- Upload raw sequencing data (Item 3), either from a local folder (Item 3.1) or by downloading it from NCBI (Item 3.2);
- Upload assembled sequences from a local folder (Item 4);
- Execute the ViroTrakr workflow with either raw data or assembled sequences (Item 3.1.8 - 3.1.12);
- Interpret the results (Item 3.1.13 - 3.1.14).

• Note:
      * This workflow was designed for the major foodborne viral pathogens included in the ViroTrakr database (currently HAV, norovirus and sapovirus; additional species are under development). Norovirus is used as the example in this SOP. Users should select the appropriate data library for their target virus in step 3.1.8 and the corresponding workflow in step 3.1.10. Genotyping results and phylogenetic trees are generated in a manner similar to that described in step 3.1.12.

      * In this updated version, the previous HAV and norovirus (NoV) pipelines have been integrated with the newly developed sapovirus pipeline and adapted to the new GalaxyTrakr format. This SOP supersedes the earlier versions published for HAV and NoV.



   
* Reference:
- Quality control assessment for microbial genomes: GalaxyTrakr MicroRunQC workflow V.5
Set up a GalaxyTrakr account.
Log into your GalaxyTrakr account.
Create a GalaxyTrakr account if you are a first-time user:



Log into your GalaxyTrakr account if you already have one:



Get familiar with GalaxyTrakr components: Tools, Information and History:



Create a new history/workspace for a submission.
Create a new history:



Upload raw sequencing data and execute the data analysis workflow.

The raw sequencing data in fastq files can be imported into GalaxyTrakr directly from your local folder (instructions shown in 3.1), or downloaded from SRA (instructions shown in 3.2) if the files have already been submitted to ViroTrakr in NCBI (following the submission protocol: NCBI submission protocol for foodborne virus surveillance (protocols.io)). After being uploaded to GalaxyTrakr, the files will remain in your account and stay private until they are deleted.
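Before uploading, it can be useful to confirm locally that each fastq file is well formed. The following is a minimal sketch and not part of the ViroTrakr workflow itself. It assumes standard four-line FASTQ records and treats a `.gz` suffix as gzip compression; the function name `count_fastq_records` is hypothetical.

```python
import gzip
from pathlib import Path


def count_fastq_records(path):
    """Count the reads in a FASTQ file, raising ValueError on a malformed record."""
    path = Path(path)
    # Assumption: a ".gz" suffix means gzip compression; anything else is plain text.
    opener = gzip.open if path.suffix == ".gz" else open
    records = 0
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between or after records
            sequence = handle.readline().rstrip("\r\n")
            separator = handle.readline()
            quality = handle.readline().rstrip("\r\n")
            number = records + 1
            if not header.startswith("@"):
                raise ValueError(f"record {number}: header does not start with '@'")
            if not separator.startswith("+"):
                raise ValueError(f"record {number}: separator line does not start with '+'")
            if len(sequence) != len(quality):
                raise ValueError(f"record {number}: sequence and quality lengths differ")
            records += 1
    return records
```

A file that raises `ValueError` here is likely truncated or corrupted and may be worth re-exporting before it is uploaded.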
Upload raw data from your local folder.

3.1.1. Click on the button “Upload”.



3.1.1.-cont. Click “Choose local files”.




3.1.2. Select fastq files from your local folder, then click “Open”.



3.1.3. Click “Start” to upload.



3.1.4. Check the status of the data upload.



3.1.5. Build a list of datasets.



3.1.6. Build a collection of datasets.



3.1.7. The list of datasets will be available in the History.



3.1.8. Import the reference data files from the Shared Data libraries following the steps below.



3.1.8.-cont.  Import the reference data files from the Shared Data libraries following the steps below.



3.1.9. Reference libraries are available in the history.



3.1.10. Click the “WorkFlows” button in the left-hand menu -> select the "Public" workflows -> then type and search for “ViroTrakr”. 



3.1.10.-cont. Select the appropriate workflow (“Norovirus with reads as input” in this case), then click the arrowhead button to run the workflow.



3.1.11. Make sure to select the appropriate files (arrows) from each dropdown menu, then click Run Workflow.



3.1.12.  The workflow status will be displayed step-by-step in real time. As each step completes, it will turn green in both the Steps and History panels. The run may take a few minutes or longer, depending on the data and system load.



3.1.12.-cont. Once the run is complete, the results will be available as individual reports in the History. Users can view and download each output file, including the assembly, viral contigs, genotyping report and phylogenetic tree.



3.1.13. An example of a genotyping report is shown below:



3.1.14. An example of a phylogenetic analysis report is shown below:



The result files include:

• Assembly with MEGAHIT: Metagenomic assembly generated using MEGAHIT.
• Report: Kraken2: Kraken2 classification reports.
• Report_blasthits_Genotype: Report of the best BLAST hits against the reference sequences.
• Genotyping_report: Final report containing QC statistics and genotyping results.
• Virus contigs: Target virus-specific contigs extracted from the metagenomic assembly.
• Reference_query_phylogenetic_tree: Phylogenetic tree showing the input genomes alongside reference genomes from all groups (available in .png and .txt formats).
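
After downloading the Kraken2 report, it can be summarized outside GalaxyTrakr. The sketch below assumes the report uses the standard six-column, tab-delimited Kraken2 layout (percentage, clade reads, directly assigned reads, rank code, taxid, indented name); whether the workflow emits exactly this layout should be checked against an actual output file. The function names are hypothetical, and the example rows are made up for illustration only.

```python
def parse_kraken2_report(text):
    """Parse a standard six-column Kraken2 report into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 tab-separated columns, got {len(fields)}")
        percent, clade_reads, direct_reads, rank, taxid, name = fields
        rows.append({
            "percent": float(percent),
            "clade_reads": int(clade_reads),
            "direct_reads": int(direct_reads),
            "rank": rank.strip(),
            "taxid": int(taxid),
            "name": name.strip(),  # names are indented to show tree depth
        })
    return rows


def top_taxa(rows, rank="S", n=3):
    """Return the n taxa at the given rank code with the most clade-level reads."""
    hits = [row for row in rows if row["rank"] == rank]
    return sorted(hits, key=lambda row: row["clade_reads"], reverse=True)[:n]


# Made-up rows for illustration only; these are not results from a real run.
EXAMPLE_REPORT = (
    " 40.00\t400\t400\tU\t0\tunclassified\n"
    " 60.00\t600\t10\tR\t1\troot\n"
    " 50.00\t500\t500\tS\t11983\t        Norwalk virus\n"
    "  9.00\t90\t90\tS\t9606\t        Homo sapiens\n"
)

for row in top_taxa(parse_kraken2_report(EXAMPLE_REPORT)):
    print(f"{row['name']}: {row['clade_reads']} reads ({row['percent']}%)")
```

Ranking by clade-level reads rather than directly assigned reads counts reads placed on any descendant of a taxon, which is usually what a quick contamination or target-abundance check needs.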

Download raw data from SRA database.

Users can also download raw data from SRA if it has already been submitted to ViroTrakr, or from other BioProjects in NCBI. Have the SRA accession numbers ready before starting the download.

3.2.1. Create a list of SRA accession numbers.

* Note: list the accession numbers in a text file, one per line; each number must start with SRR, DRR or ERR. Save the file in a local folder.
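
The accession list can also be prepared and checked with a short script. This is a minimal sketch: the SRR/DRR/ERR prefix rule comes from the note above, while the requirement that only digits follow the prefix is an added assumption. The function names are hypothetical, and any accession values passed in would be the user's own.

```python
import re

# Prefix rule from the protocol; the digits-only remainder is an assumption.
RUN_ACCESSION = re.compile(r"^(SRR|DRR|ERR)\d+$")


def build_accession_list(accessions):
    """Return cleaned, de-duplicated run accessions, raising on malformed entries."""
    cleaned = []
    seen = set()
    for raw in accessions:
        accession = raw.strip().upper()
        if not accession:
            continue  # skip empty entries
        if not RUN_ACCESSION.match(accession):
            raise ValueError(f"not an SRR/DRR/ERR run accession: {raw!r}")
        if accession not in seen:
            seen.add(accession)
            cleaned.append(accession)
    return cleaned


def write_accession_list(accessions, path):
    """Write the validated accessions to a plain-text file, one per line."""
    cleaned = build_accession_list(accessions)
    with open(path, "w", newline="\n") as handle:
        handle.write("\n".join(cleaned) + "\n")
    return cleaned
```

Rejecting entries such as experiment (SRX) or sample (SRS) accessions up front avoids a failed download later, since the note above calls for run accessions only.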



3.2.2. (1) Click the “Upload” button in the left-hand menu, (2) select “Paste/Fetch data”, (3) enter the file name containing the SRA accession number list, (4) click “Choose local file” to select the file from your local folder, then (5) click the “Start” button to begin the upload. The file will be available and appear in the current History, indicated by the arrow below.



3.2.3. (1) Click the “Tool” button in the left-hand menu, (2) click “Get Data”, (3) choose "Faster Download and Extract Reads in FASTQ", (4) select "List of SRA accession" from the drop-down menu as the input type, (5) select the list file from the History panel, then (6) click the “Run Tool” button to start the process.



3.2.4. The data files will be downloaded from the NCBI SRA database. Download time may vary depending on the number of files and the current NCBI server status.




3.2.5. Follow the steps outlined in 3.1.8. to 3.1.14 to run the workflow and collect the results.


Upload assembled sequences and execute the data analysis workflow.
Upload assembled sequences.
This ViroTrakr workflow supports both raw sequencing data and pre-assembled sequences. You can upload raw data directly (Item 3), or use assembled sequences generated on other platforms and stored in your local folder.
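
Assemblies produced on other platforms can be checked locally before upload. The sketch below assumes the assembled sequences are stored in FASTA format, which this protocol does not state explicitly; the function name `fasta_lengths` is hypothetical.

```python
def fasta_lengths(text):
    """Map each FASTA record ID to its sequence length, raising on malformed input."""
    lengths = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("FASTA header with no identifier")
            current = parts[0]  # the ID is the first word after '>'
            if current in lengths:
                raise ValueError(f"duplicate sequence ID: {current}")
            lengths[current] = 0
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            lengths[current] += len(line)
    return lengths
```

For a file on disk, the function can be called as `fasta_lengths(open(path).read())`; an empty result or a record of length zero suggests the assembly file is incomplete.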

Click on the button “Upload” in the left-hand menu.



4.1.-cont. Click “Choose local files”.



Click “Start” to upload.



Check the status of the data upload.



Build a list of datasets.



Build a collection of datasets.



The list of datasets will be available in the History.



Follow the steps outlined in 3.1.8. to 3.1.14 to run the workflow and collect the results.