Aug 14, 2025

Quality control and preliminary data analysis of Sequencing Data: GalaxyTrakr ONT workflow V.3

  • Narjol Gonzalez-Escalona1,
  • Maria Hoffmann1,
  • Jayanthi Gangiredla1
  • 1US FDA
Protocol Citation: Narjol Gonzalez-Escalona, Maria Hoffmann, Jayanthi Gangiredla 2025. Quality control and preliminary data analysis of Sequencing Data: GalaxyTrakr ONT workflow. protocols.io https://dx.doi.org/10.17504/protocols.io.6qpvrq54plmk/v3
Version created by Narjol Gonzalez-Escalona
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: August 09, 2025
Last Modified: August 14, 2025
Protocol Integer ID: 224356
Keywords: ONT, sequencing, long-read, Oxford Nanopore , R10.4.1, microbial genome assembly, AMR prediction, preliminary data analysis of sequencing data, draft de novo assembly, sequencing data, ont assembly workflow, galaxytrakr ont, preliminary sequence analysis for foodborne pathogen, ont workflow, galaxy trakr, galaxy trakr account, wgs sequence quality, account in galaxytrakr, galaxytrakr, custom galaxy instance, sequence type for each isolate, preliminary sequence analysis, workflow, most microbial pathogen, checking oxford nanopore technology, laboratory
Funders Acknowledgements:
FDA Foods Programme
Abstract
Step-by-step instructions for checking Oxford Nanopore Technologies (ONT) WGS sequence quality and performing preliminary sequence analysis for foodborne pathogens. The ONT workflow, implemented in a custom Galaxy instance, produces draft de novo assemblies and reports several characteristics for each isolate, such as sequence type, AMR profile, and, in the case of Salmonella, serotype prediction. This workflow works on most microbial pathogens, so we advise laboratories to run their entire ONT run through it.

SCOPE: This protocol covers the following tasks:

1. Set up an account in GalaxyTrakr (if you do not already have one)
2. Create a new history/workspace
3. Upload data
4. Execute the ONT assembly workflow
5. Interpret the results

The first section, "Account set up", applies only to users who do not have a GalaxyTrakr account. If you already have one, please proceed to section 2, "Create a new history".
Account set up
Create a GalaxyTrakr account here: https://account.galaxytrakr.org/Account/Register




Log into your GalaxyTrakr account: https://galaxytrakr.org





Create a new history
We recommend creating a new history for each new ONT Run and including the flow cell ID and date in the history name.
Click on the + icon in the upper right History panel




Name your new history by clicking on “Unnamed history”, typing the desired name, and pressing Enter. We recommend including the flow cell ID and the date the run was started.
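If you name many histories, a small helper can keep the names consistent. The following is a minimal sketch only: the `history_name` function, the underscore separator, the ISO date format, and the flow cell ID `FAX00000` are illustrative assumptions, not part of GalaxyTrakr.

```python
from datetime import date


def history_name(flow_cell_id: str, run_start: date) -> str:
    """Build a history name from the flow cell ID and the run start date."""
    # ISO dates (YYYY-MM-DD) are unambiguous across regional date conventions.
    return f"{flow_cell_id.strip()}_{run_start.isoformat()}"


# Hypothetical flow cell ID, used only for illustration.
print(history_name("FAX00000", date(2025, 8, 9)))  # prints FAX00000_2025-08-09
```

The resulting string can be pasted into the history name field.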




Upload data
This section will describe the process for uploading raw fastq files into your active History panel. After the files have been uploaded they will stay in your account until they are deleted.
Click on the upload icon at the top left of the web page to start the upload process.






Alternatively, you can use FileZilla to upload your data to the GalaxyTrakr FTP server and retrieve it from there, as described in the linked protocol created by Narjol Gonzalez-Escalona.

Select "Type (set all): fastqsanger.gz". Click the "Choose local file" button and navigate to the desired fastq files, then click "Start" to upload them. There should be one single fastq file per sample/isolate.
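ONT runs often write several fastq.gz chunks per barcode, while this step expects one file per sample/isolate. The sketch below shows one way to merge them before upload. It assumes a directory layout with one sub-folder per barcode (for example `fastq_pass/barcode01/`), which may differ on your system; the function name `merge_barcode_fastqs` is hypothetical. It relies on the fact that concatenated gzip files form a valid gzip file.

```python
import shutil
from pathlib import Path


def merge_barcode_fastqs(run_dir: Path, out_dir: Path) -> list[Path]:
    """Concatenate the *.fastq.gz chunks of each barcode folder into one file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = []
    # Assumption: every sub-folder of run_dir holds the reads of one barcode.
    for barcode_dir in sorted(p for p in run_dir.iterdir() if p.is_dir()):
        chunks = sorted(barcode_dir.glob("*.fastq.gz"))
        if not chunks:
            continue  # skip folders without reads
        target = out_dir / f"{barcode_dir.name}.fastq.gz"
        with target.open("wb") as dst:
            for chunk in chunks:
                with chunk.open("rb") as src:
                    # Byte-wise append; no decompression is needed.
                    shutil.copyfileobj(src, dst)
        merged.append(target)
    return merged


# Example call (paths are placeholders):
# merge_barcode_fastqs(Path("fastq_pass"), Path("merged_fastq"))
```

Each output file is named after its barcode folder and can be uploaded as a single fastqsanger.gz dataset.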




As the file uploads complete, each row will turn green. Samples in yellow are still in process.








Execute the ONT assembly workflow
Add the ONT (Nanopore Datasets-QC_assembly_genome_profiling_v1) workflow to your own "workflows" panel. You only have to do this step once for each new workflow you need.

Navigate to the “Shared Data" drop-down menu and choose "Workflows". From the Nanopore Datasets-QC_assembly_genome_profiling_v1 drop-down menu, select "Import".


Click on "Shared Data" and select "Workflows".

Then search for "Nanopore".







Then select "Import" and the workflow will be added to your workflows.


To see the new workflow in your “WORKFLOWS” tools panel on the left, open the Workflow tab and check “show in tools panel” for the workflow of interest.







From the workflow menu, select Nanopore Datasets-QC_assembly_genome_profiling_v1. You can add it to your favorites, which makes it easier to find the next time you need it.


Create a new history and add your fastq files. Then select your fastq files and create a dataset list collection (Build Dataset List) with a name of your choosing.








Run the Nanopore Datasets-QC_assembly_genome_profiling_v1 workflow using that dataset collection as input. This can take some time depending on the number of samples you are analyzing. If you choose to, you can log out of GalaxyTrakr and log back in later to see whether the job has completed.





Upon completion of the pipeline all tiles in the history bar will be green.
Results and interpretation
Results: When the workflow finishes, you will have 6 additional files. The combined profile results are in the Profiles-table (download it). This table contains the in silico analysis results for all your samples. "Flye on collection - consensus" contains all of the assemblies (download it). "Flye on collection: assembly info" contains all your assembly information (download it as well).




To download a file, click on the title of the file (e.g. Results_Table) and then click the floppy disc icon.





The Results_table output file includes the following metrics:

| Column | Description |
|---|---|
| sample | Change this to your sample number or ID |
| Reads | Number of reads |
| GC% | GC% content |
| Avg_Len | Average length of the reads or reads N50 |
| Mean_Q | Mean quality of the reads |
| Q10% | Percentage of reads > Q10 |
| Taxa | Taxa identified from the reads, closest strain match |
| ANI | ANI value compared to the closest match |
| AMR | AMR gene content |
| STRESS | Stress genes identified |
| ST | Sequence Type |
| Predicted identification | Salmonella enterica subspecies ID |
| Predicted antigenic profile | In the case of Salmonella, the antigenic profile |
| Predicted serotype | Salmonella serotype |
| cov. | Mean coverage for contigs in the Flye assembly |
| circ. | Whether the contigs are circular or not |
| count(Contigs) | Number of contigs in the Flye assembly |
| sum(length) | Genome size (bp) |
The ONT pipeline analysis workflow lists summary metrics for sequence quality, number of contigs, and estimated genome size, along with other common read metrics (mean length, quality, and taxa ANI ID). Additionally, if the Multi-Locus Sequence Type (MLST) for the isolate is available from PubMLST, the workflow also reports the Sequence Type (ST), AMR and stress genes, and, in the case of Salmonella, the predicted ID, predicted antigenic profile, and predicted serotype. If the isolate is not Salmonella, those fields are left empty. Later iterations of this workflow will address serotypes or IDs specific to other foodborne pathogens as they become available.
This output should be saved either to your LIMS or to a spreadsheet linked to the sequencing run and samples.

Passing Metrics for the assemblies:

minimum depth/coverage: >= 30X
ANI score: >= 95%
average quality score: Mean Q >= 30
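
The three thresholds above can be checked programmatically once the results table has been downloaded. This is a minimal sketch: the function name `failed_metrics` is hypothetical, and the example values are taken from the CFSAN000318 row of the example output in this protocol (cov. 32, ANI 99.998, Mean_Q 35.99).

```python
MIN_COVERAGE = 30.0  # minimum depth/coverage (X)
MIN_ANI = 95.0       # minimum ANI score (%)
MIN_MEAN_Q = 30.0    # minimum mean read quality (Mean Q)


def failed_metrics(coverage: float, ani: float, mean_q: float) -> list[str]:
    """Return the names of the passing metrics that a sample does not meet."""
    failures = []
    if coverage < MIN_COVERAGE:
        failures.append("coverage")
    if ani < MIN_ANI:
        failures.append("ANI")
    if mean_q < MIN_MEAN_Q:
        failures.append("Mean_Q")
    return failures


# Values from the CFSAN000318 example row: all three thresholds are met.
print(failed_metrics(coverage=32, ani=99.998, mean_q=35.99))  # prints []
```

An empty list means the sample meets all three thresholds; otherwise the list names the metrics to review.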

Note: The coverage in the table is based on the longest contig. If you want to know the coverage and the closed/circular status of each contig, please check the assembly_info file.
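
Per-contig values can be read from the assembly_info file with a few lines of code. The sketch below assumes the standard tab-separated Flye layout, whose header starts with `#seq_name`, `length`, `cov.`, `circ.`; the function name `read_assembly_info` and the two-contig example are invented for illustration and are not real workflow output.

```python
import csv
import io


def read_assembly_info(handle):
    """Return one dict per contig from a Flye assembly_info table."""
    contigs = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank lines and the header
            continue
        contigs.append({
            "name": row[0],
            "length": int(row[1]),
            "coverage": float(row[2]),
            "circular": row[3] == "Y",
        })
    return contigs


# Invented example data, not real workflow output.
EXAMPLE = (
    "#seq_name\tlength\tcov.\tcirc.\trepeat\tmult.\talt_group\tgraph_path\n"
    "contig_1\t4800000\t35\tY\tN\t1\t*\t1\n"
    "contig_2\t90000\t22\tN\tN\t1\t*\t2\n"
)

for contig in read_assembly_info(io.StringIO(EXAMPLE)):
    flags = []
    if contig["coverage"] < 30:
        flags.append("coverage below 30X")
    if not contig["circular"]:
        flags.append("not circular")
    print(contig["name"], contig["length"], ", ".join(flags) or "ok")
```

To use it on a downloaded file, pass an open file handle instead of the `io.StringIO` example.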
Due to the nature of GalaxyTrakr, the individual Results_Tables for each sample should be collated into a single table for easy interpretation. Example output for 8 samples run through the ONT assembly workflow:

| CFSAN# | Barcode | Reads | GC% | Avg_Len | Mean_Q | Q10% | Taxa | ANI | AMR | STRESS | VIRULENCE | ST | Predicted identification | Predicted antigenic profile | Predicted serotype | cov. | circ. | count(Contigs) | sum(length) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CFSAN000318 | 48 | 39770 | 51 | 4233.6 | 35.99 | 100 | Salmonella enterica subsp. enterica serovar Heidelberg str. 41578 | 99.998 | aac(3)-IId,aadA1,blaTEM-1,fosA7,mdsA,mdsB,tet(A) | arsR,golS,golT | iroB,iroC,sinH | 115 | Salmonella enterica subspecies enterica (subspecies I) | 4:r:1,2 | Heidelberg | 32 | Y | 4 | 4957870 |
| CFSAN008585 | 47 | 61645 | 51 | 4727.2 | 35.71 | 100 | Salmonella enterica subsp. enterica serovar Derby str. 626 | 99.837 | aac(3)-VIa,aadA1,fosA7.3,mdsA,mdsB,sul1,tet(A) | golS,golT,pcoA,pcoB,pcoC,pcoD,pcoE,pcoR,pcoS,qacEdelta1,silA,silB,silC,silE,silF,silP,silR,silS | iroB,iroC,sinH | 40 | Salmonella enterica subspecies enterica (subspecies I) | 4:f,g:- | Derby | 47 | Y | 4 | 5086410 |
| CFSAN051458 | 46 | 73568 | 50 | 5885.7 | 36.17 | 100 | Escherichia coli O121 str. RM8352 | 99.97 | acrF,blaEC,emrD,mdtM | arsC,arsR,asr,terD,terW,terZ,ymgB | eae,efa1,ehxA,espA,espB,espF,espI,espJ,espK,espP,espX1,fdeC,lpfA,nleA,nleB,nleC,stxA2c,stxB2a,tccP,tir,toxB | 655 | - | -:-:- | - -:-:- | 79 | N | 7 | 5550579 |
| CFSAN068773 | 45 | 97883 | 56 | 4909.9 | 36.48 | 100 | Cronobacter sakazakii ATCC BAA-894 | 99.81 | blaCSA | fieF,pcoA,pcoB,pcoC,pcoD,pcoE,pcoR,pcoS,silA,silB,silC,silE,silF,silP,silR,silS,terD,terW,terZ |  | 1 | - | -:-:- | - -:-:- | 99 | Y | 4 | 4581775 |
| CFSAN086180 | 44 | 67352 | 57 | 5751.5 | 36.68 | 100 | Klebsiella sp. X1-16S-Nf21 | 99.811 | blaLEN-16,emrD,fosA,kdeA,oqxA,oqxB | fieF |  | - | - | -:-:- | - -:-:- | 69 | Y | 1 | 5536414 |
| CFSAN086182 | 43 | 49856 | 51 | 5250 | 36.36 | 100 | Citrobacter sp. KTE32 | 98.699 | blaCMY | arsA,arsB,arsC,arsD,arsR |  | - | - | -:-:- | - -:-:- | 52 | Y | 1 | 4917486 |
| CFSAN086183 | 42 | 49755 | 55 | 5904.4 | 37.05 | 100 | Enterobacter cancerogenus ATCC 35316 | 98.773 | ampC,fosA | fieF | iroB,iroC,iroN | - | - | -:-:- | - -:-:- | 100 | Y | 2 | 4992359 |
| CFSAN122995 | 41 | 88297 | 50 | 5743.4 | 36.25 | 100 | Salmonella enterica subsp. diarizonae serovar 50:k:z str. MZ0080 | 99.285 |  |  | cdtB,iroB,iroC,iucA,iutA,ybtP,ybtQ | 645 | Salmonella enterica subspecies diarizonae (subspecies IIIb) | 6,14:z10:z | IIIb (6),14:z10:z | 92 | Y | 1 | 5320423 |
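
Collating per-sample Results_Tables into a single table, as recommended above, can be scripted. The following is a minimal sketch that assumes each downloaded table is a tab-separated file with a header row; the function name `collate_tables` and the file names in the usage comment are hypothetical.

```python
import csv


def collate_tables(table_paths, out_path):
    """Merge tab-separated result tables into one file and return the row count."""
    rows, columns = [], []
    for path in table_paths:
        with open(path, newline="") as fh:
            reader = csv.DictReader(fh, delimiter="\t")
            # Keep the union of all column names in first-seen order.
            for name in reader.fieldnames or []:
                if name not in columns:
                    columns.append(name)
            rows.extend(reader)
    with open(out_path, "w", newline="") as fh:
        # restval="" leaves a cell empty when a table lacks that column.
        writer = csv.DictWriter(fh, fieldnames=columns, delimiter="\t", restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Example call (file names are placeholders):
# collate_tables(["barcode41.tsv", "barcode42.tsv"], "run_summary.tsv")
```

The combined file can then be opened in a spreadsheet or imported into your LIMS.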
PT exercise circular closed reference genomes characteristics:

| Sample ID | CFSAN No. | contig No. expected | Chromosome (bp) | Plasmids (bp) | Taxa |
|---|---|---|---|---|---|
| GTVSS-021 | CFSAN068773 | 4 | 4,408,038 | 131,196; 31,208; 3,983 | Cronobacter sakazakii ATCC BAA-894 |
| GTVSS-016 | CFSAN030807 | 6 | 4,813,409 | 8,401; 7,098; 4,238; 5,396; 5,111 | Shigella sonnei 08-7761 |
| GTVSS-025 | CFSAN084952 | 1 | 6,326,461 |  | Pseudomonas sp. HMWF034 |
| GTVSS-020 | CFSAN051458 | 2 | 5,421,422 | 81,969 | Escherichia coli O121 str. RM8352 |
| GTVSS-019 | CFSAN044836 | 2 | 2,864,614 | 57,565 | Listeria innocua ATCC 33091 |
| GTVSS-014 | CFSAN023469 | 1 | 2,939,749 |  | Listeria monocytogenes CFSAN002349 |
| GTVSS-011 | CFSAN023464 | 1 | 2,939,778 |  | Listeria monocytogenes CFSAN002349 |
| GTVSS-001 | CFSAN000189 | 2 | 4,724,863 | 81,814 | Salmonella enterica subsp. enterica serovar Bareilly str. CFSAN000189 |
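
One way to use these reference characteristics is to compare the contig count and total length of a draft assembly with the expected closed genome. The sketch below is illustrative only: the function name `compare_to_reference` is hypothetical and no acceptance tolerance is implied. The example uses the CFSAN068773 values from this protocol (4 contigs and 4,581,775 bp in the example output; chromosome plus three plasmids in the reference table).

```python
def compare_to_reference(contig_count, total_length, expected_lengths):
    """Compare a draft assembly with the expected replicon lengths (bp)."""
    expected_total = sum(expected_lengths)
    difference = total_length - expected_total
    return {
        "contig_count_matches": contig_count == len(expected_lengths),
        "expected_total_bp": expected_total,
        "size_difference_bp": difference,
        "size_difference_pct": 100.0 * difference / expected_total,
    }


# CFSAN068773 (Cronobacter sakazakii ATCC BAA-894): chromosome and three plasmids.
reference = [4_408_038, 131_196, 31_208, 3_983]
result = compare_to_reference(contig_count=4, total_length=4_581_775,
                              expected_lengths=reference)
print(result["contig_count_matches"], result["size_difference_bp"])  # prints True 7350
```

Here the draft assembly has the expected number of contigs and is 7,350 bp (about 0.16%) longer than the closed reference.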