Salmonella serotype prediction using the GalaxyTrakr SeqSero2 workflow

Paul Morin; Ruth Timme; Michelle Moore; Shauna Madson; Evelyn Ladines; Julia Manetas; Karen Jinneman

May 14, 2026

Version 3

DOI

https://dx.doi.org/10.17504/protocols.io.4r3l24kypg1y/v3

Paul Morin¹,
Ruth Timme¹,
Michelle Moore¹,
Shauna Madson¹,
Evelyn Ladines¹,
Julia Manetas¹,
Karen Jinneman¹

¹US Food and Drug Administration

Shauna Madson: Retired

GenomeTrakr
Tech. support email: [email protected]

Paul Morin

FDA

Protocol Citation: Paul Morin, Ruth Timme, Michelle Moore, Shauna Madson, Evelyn Ladines, Julia Manetas, Karen Jinneman 2026. Salmonella serotype prediction using the GalaxyTrakr SeqSero2 workflow. protocols.io https://dx.doi.org/10.17504/protocols.io.4r3l24kypg1y/v3

Version created by Paul Morin

Manuscript citation:

Zhang S, Yin Y, Jones MB, Zhang Z, Deatherage Kaiser BL, Dinsmore BA, Fitzgerald C, Fields PI, Deng X. (2015). Salmonella serotype determination utilizing high-throughput genome sequencing data. ASM Journals Vol. 53, No. 5

Zhang S, denBakker HC, Li S, Chen J, Dinsmore BA, Lane C, Lauer AC, Fields PI, Deng X. 2019. SeqSero2: Rapid and Improved Salmonella Serotype Determination Using Whole-Genome Sequencing Data. Appl Environ Microbiol 85:e01746-19. https://doi.org/10.1128/AEM.01746-19

Gangiredla, J., Rand, H., Benisatto, D. et al. GalaxyTrakr: a distributed analysis tool for public health whole genome sequence data accessible to non-bioinformaticians. BMC Genomics 22, 114 (2021).

Deng X, Li S, Xu T, Zhou Z, Moore MM, Timme R, Zhao S, Lane C, Dinsmore BA, Weill F, Fields PI. Salmonella serotypes in the genomic era: simplified Salmonella serotype interpretation from DNA sequence data. Appl Environ Microbiol. 2025 Mar 19;91(3):e0260024. https://journals.asm.org/doi/10.1128/aem.02600-24

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: March 16, 2026

Last Modified: May 14, 2026

Protocol Integer ID: 313363

Keywords: salmonella, genomic serotyping, seqsero2, salmonella serotype prediction, salmonella serotype, scheme for phenotypic salmonella, phenotypic salmonella, genes responsible for serotype antigen, serotype antigen, salmonella serological somatic, molecular methods for serotype determination, serotype determination, serotype prediction, traditional serotype determination, rfb gene cluster, antigenic formula, antigen, whole genome sequence, gene, isolates in the galaxytrakr environment, culture isolate, bioinformatic pipeline, GalaxyTrakr

Disclaimer

Please note that this protocol is public domain, which supersedes the CC-BY license default used by protocols.io.

Abstract

Salmonella serotypes are defined by two types of surface structures: the somatic O antigen and two flagellar H antigens. Traditional serotype determination is performed with the Salmonella serological somatic (O) and flagellar (H) tests, paired with biochemical confirmation. More than 2,600 Salmonella serotypes have been described in the White-Kauffmann-Le Minor scheme. Molecular methods for serotype determination have been developed based on the genes responsible for serotype antigens: the rfb gene cluster, fliC, and fljB. SeqSero2 is a bioinformatic pipeline that uses whole genome sequence (WGS) data from pure-culture isolates to determine the antigenic formula in silico, including the somatic (O) antigens and both flagellar (H) antigens. This provides continuity with the well-established scheme for phenotypic Salmonella serotypes.

PURPOSE:
This document outlines the steps required to run SeqSero2 on a collection of isolates in the GalaxyTrakr environment. This is performed by utilizing a custom workflow called “SeqSero2 (prd25)” and downloading the resulting data.

SCOPE: This protocol covers the following tasks:
1. Log in to or set up an account in GalaxyTrakr
2. Create a new history/workspace
3. Upload data
4. Execute the SeqSero2 workflow
5. Download the results

Materials

Salmonella WGS fastq files or SRA accessions

Before start

Google Chrome is recommended for the best experience with GalaxyTrakr, although Microsoft Edge and Safari are also compatible. Internet Explorer and Mozilla Firefox are NOT compatible with GalaxyTrakr.

Log into GalaxyTrakr (https://galaxytrakr.org/)

Link to create a new GalaxyTrakr account: https://galaxytrakr.org/login/start

Locating the "SeqSero2 (prd25)" workflow in "Public workflows"

Click on Workflows in the Activity Panel and then Public workflows in the Main Panel.

Enter “Seqsero2” in the search bar. The workflow should show up at the top as “SeqSero2 (prd25)” with estrain as the author.

Click on the workflow title “SeqSero2 (prd25)” to confirm SeqSero2 (prd25) – Version 1.

Optional: Click “Import” to keep a copy in “My workflows”. This only needs to be done once. After the workflow is imported, it will always be available for use in "My workflows".

Optional: Change the workflow name. Hover over the workflow title and click the pen icon to rename the workflow (note: the icon does not show up in the screenshot; the red arrow shows where it appears). Adding a date to the name helps keep track of newer versions. Workflows are updated periodically, and you want to ensure you are working with the most recent version.

Optional: If preferred, click the star for bookmarking your workflows. You can filter your view by showing only bookmarked workflows.

Import data for analysis

If your data is already in GalaxyTrakr, open the history containing the data to be analyzed, or move the data to a new history for analysis, and proceed to Step #4 (Build your dataset of paired-reads). This option may be preferred if the data was already uploaded for other purposes such as MicroRunQC. It’s ok if there are non-Salmonella isolates in your dataset, as they will not interfere with the workflow output (their results will have no antigenic formula or serovar name).

For uploading new data, proceed to the next step to create a new history and upload your data to be analyzed.

Create new History:

Click on the “+” button in the upper right corner.

Type in a custom name (e.g., “SeqSero Prediction”)

Importing Data: Next steps will show how to Upload data (fastq files) or import data from NCBI (SRR files).

A. For Fastq files, use "Upload" (see next Step 3.3)
OR
B. For SRR files from NCBI, use "Download and Extract Reads in FASTQ format from NCBI SRA" (see Step 3.4)

To upload fastq.gz files stored locally:

Click on "Upload"
Click on “Choose local files”
Find your WGS fastq.gz files and select them (2 data files per isolate: Read 1 and Read 2).
Click “Start”. The upload time depends on how many files have been selected and the size of those files. The status bar will fill as the upload progresses and turn green when completed.

To import data from NCBI, use “Download and Extract Reads in FASTQ format from NCBI SRA”:

Click on "Tools"
Click "Get Data"
Click on "Download and Extract Reads in FASTQ format from NCBI SRA"
Enter the NCBI SRR accession for the isolate sequence to be retrieved.
Click “Run Tool”
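Before pasting accessions into the tool, it can help to check that each one looks like a run accession. The sketch below is only a local pre-check under the assumption that run accessions have the form SRR, ERR, or DRR followed by digits; it is not part of GalaxyTrakr, and the accessions shown are made-up placeholders.

```python
import re

# Assumption: run accessions are SRR (NCBI), ERR (ENA) or DRR (DDBJ) plus digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def split_accessions(text):
    """Split comma- or whitespace-separated text into (valid, invalid) accession lists."""
    tokens = [t for t in re.split(r"[\s,]+", text) if t]
    valid = [t for t in tokens if RUN_ACCESSION.match(t)]
    invalid = [t for t in tokens if not RUN_ACCESSION.match(t)]
    return valid, invalid


# Placeholder accessions for illustration only (not real isolates).
valid, invalid = split_accessions("SRR0000001, SRX0000002\nsrr0000003 ERR0000004")
print("Looks like a run accession:", valid)
print("Check before submitting:", invalid)
```

Entries reported in the second list (for example an experiment accession such as SRX, or a lowercase prefix) are worth correcting before clicking “Run Tool”.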

When the data has finished importing, you should see the successfully uploaded files listed in green in the right panel. 

Files will be highlighted in RED if they were NOT successfully uploaded. 

Example of fastq.gz files uploaded:

Example of SRR data downloaded from NCBI:

Build your dataset of paired-reads

For uploaded local .gz files, build list of data set pairs by following Steps 4.1 through 4.5. 

*Note: For SRR data downloaded from NCBI, the two reads will already be paired and you will not need to manually pair them. If analyzing more than one isolate, you will want to merge your data set collections. See Step 4.6.

Click on the white check mark (blue box) in the history panel and select all files you want to include in the data set for SeqSero analysis.

After files are selected, click on “x of x selected” and choose “Auto Build List” or “Advanced Build List”. Fastq files named with _R1 and _R2 should pair automatically. Continue with Step 4.4.

If your data has different extensions, you can choose “Advanced Build List” and use specific filters for pairing data files.

Click "List of Paired Datasets" and Next.

Click Filters and choose correct file extension.

Click Next. Files should automatically pair.

The Read 1 and Read 2 fastq.gz files should automatically pair together.

Type in a custom Name for the paired dataset (e.g., “Paired Slm Files”)

Uncheck “Hide original elements” if you want to see your original files.

Click “Build”

Your paired file should appear green in the history panel.

Note: You may also use the same dataset list from other workflows such as MicroRunQC. It’s ok if there are non-Salmonella isolates in your dataset. They will not return an antigenic formula or serovar name.
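To anticipate which files will pair automatically, the file names can be checked locally before upload. This is a minimal sketch that assumes names follow a `<sample>_R1` / `<sample>_R2` pattern ending in `.fastq.gz` or `.fq.gz`; the pattern and the `pair_reads` helper are illustrative assumptions and do not reproduce GalaxyTrakr's own pairing logic.

```python
import re
from collections import defaultdict

# Assumed naming pattern: <sample>_R1[_NNN].fastq.gz and <sample>_R2[_NNN].fastq.gz
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.(?:fastq|fq)\.gz$"
)


def pair_reads(filenames):
    """Group file names by sample and return (pairs, unpaired)."""
    by_sample = defaultdict(dict)
    unpaired = []
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match is None:
            unpaired.append(name)  # name does not follow the assumed pattern
        else:
            by_sample[match.group("sample")][match.group("read")] = name

    pairs = {}
    for sample, reads in sorted(by_sample.items()):
        if set(reads) == {"1", "2"}:
            pairs[sample] = (reads["1"], reads["2"])
        else:
            unpaired.extend(reads.values())  # only one read of the pair was found
    return pairs, sorted(unpaired)


# Hypothetical file names for illustration.
files = [
    "isolateA_R1.fastq.gz",
    "isolateA_R2.fastq.gz",
    "isolateB_R1_001.fastq.gz",
    "isolateB_R2_001.fastq.gz",
    "isolateC_R1.fastq.gz",
    "notes.txt",
]
pairs, unpaired = pair_reads(files)
print(pairs)
print(unpaired)
```

Files that end up in the unpaired list are the ones likely to need the “Advanced Build List” filters or a rename before pairing.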

If SRR data was downloaded from NCBI for more than one isolate, merge the paired datasets into a list of dataset pairs.

Click on “Tools” and navigate to "Collection Operations" and click on “Merge collections”

Select input collections (Paired-end data (fastq-dump)) to be merged. Additional collections can be specified with the “+ Insert Input Collections” button. When finished, click “Run Tool”.

The resulting merged file will appear in your history panel and can be used in the SeqSero2 (prd25) workflow.

Analyze list of data set pairs by SeqSero2 (prd25) workflow

If you have imported the SeqSero2 workflow into your workflows, click “My workflows” and click the play icon (white arrow) to run the “SeqSero2 (prd25)” workflow.

Note: If you did not import it into “My workflows”, click on “Public workflows” and search for "SeqSero2 (prd25)".

In the center panel, your newly created list of paired files should automatically show up in the “Input dataset collection” window.  If it doesn’t, click and drag the file from your history panel into the “Input dataset collection” window or use the drop-down arrow and select the paired data file.

Click “Run Workflow”

Your center panel will show the entire workflow as your data proceeds through each tool. Green boxes indicate the tool has successfully run, yellow/orange boxes indicate the tool is processing the data, and a red box indicates a failure.

After the workflow has completed, all the boxes should be green and the history panel should also have green boxes with a white check mark in a blue box on the upper lefthand corner.

Viewing and Exporting SeqSero Results

After the SeqSero analysis is complete, the results will appear green in the history panel as “Cut on data XX”. Click on the “view” icon to review the predicted serotyping results. “Collapse Collection on data XX” also provides serotyping results with information on the output directory and input files.

Export SeqSero results (copy/paste method)

a. Click and drag to highlight data and copy/paste into an Excel file.

b. Alternatively, click on the table, press “Ctrl-A” to select the entire table, press “Ctrl-C” to copy the data, and then paste the copied data into an Excel file by pressing "Ctrl-V".

Export SeqSero2 results (download tab-delimited text file method)

Click the dataset name.

The panel will expand, enabling more options.

Click the "Save" icon to download a tab-delimited file of results.

Example file name: "Galaxy23-[Cut on data 22].tabular"
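The downloaded tab-delimited file can also be read with a short script instead of Excel. The sketch below assumes the file has a header row; the column names (`Sample name`, `Predicted antigenic profile`, `Predicted serotype`) and the example rows are hypothetical and should be checked against the header of your own download.

```python
import csv
import io

# Hypothetical column names; adjust to match the header of your own file.
SAMPLE_COL = "Sample name"
PROFILE_COL = "Predicted antigenic profile"
SEROTYPE_COL = "Predicted serotype"


def read_results(handle):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def split_profile(profile):
    """Split an 'O:H1:H2' antigenic formula into three parts, or return None."""
    parts = profile.split(":")
    return tuple(parts) if len(parts) == 3 else None


def without_serotype(rows):
    """Return sample names with an empty serotype field (e.g. non-Salmonella isolates)."""
    return [row[SAMPLE_COL] for row in rows if not row[SEROTYPE_COL].strip()]


# Made-up rows for illustration; this is not real workflow output.
example = (
    "Sample name\tPredicted antigenic profile\tPredicted serotype\n"
    "isolateA\t4:i:1,2\tTyphimurium\n"
    "isolateB\t\t\n"
)

rows = read_results(io.StringIO(example))
for row in rows:
    print(row[SAMPLE_COL], split_profile(row[PROFILE_COL]), row[SEROTYPE_COL])
print("No serotype call:", without_serotype(rows))
```

To use a real download, replace the in-memory example with `open("Galaxy23-[Cut on data 22].tabular", newline="")` and pass that file handle to `read_results`.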

Optional:
The small "Info" icon provides further details of the dataset. Several tabs are available in which to view your data including Preview, Visualize, Details and Edit. The Details tab provides granular information on your dataset, job metrics, tool parameters and job information which can be helpful for troubleshooting. The "Edit" tab allows users to edit dataset attributes.

Protocol references

GalaxyTrakr Training Resources
https://galaxytrakr.org/about

FDA Salmonella Serotyping Verification Strain Set
https://galaxytrakr.org/libraries/folders/F509df0ee950a48c8/page/1