Quality control and preliminary data analysis of Sequencing Data: GalaxyTrakr ONT workflow

Narjol Gonzalez-Escalona; Maria Hoffmann; Jayanthi Gangiredla

Aug 14, 2025

Version 3

DOI

https://dx.doi.org/10.17504/protocols.io.6qpvrq54plmk/v3

¹US FDA

GenomeTrakr
Tech. support email: [email protected]

Narjol Gonzalez-Escalona

FDA

Protocol Citation: Narjol Gonzalez-Escalona, Maria Hoffmann, Jayanthi Gangiredla 2025. Quality control and preliminary data analysis of Sequencing Data: GalaxyTrakr ONT workflow. protocols.io https://dx.doi.org/10.17504/protocols.io.6qpvrq54plmk/v3

Version created by Narjol Gonzalez-Escalona

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: August 08, 2025

Last Modified: August 14, 2025

Protocol Integer ID: 224356

Keywords: ONT, sequencing, long-read, Oxford Nanopore, R10.4.1, microbial genome assembly, AMR prediction, preliminary data analysis of sequencing data, draft de novo assembly, sequencing data, ont assembly workflow, galaxytrakr ont, preliminary sequence analysis for foodborne pathogen, ont workflow, galaxy trakr, galaxy trakr account, wgs sequence quality, account in galaxytrakr, galaxytrakr, custom galaxy instance, sequence type for each isolate, preliminary sequence analysis, workflow, most microbial pathogen, checking oxford nanopore technology, laboratory

Funders Acknowledgements:

FDA Foods Programme

Abstract

Step-by-step instructions for checking Oxford Nanopore Technologies (ONT) WGS sequence quality and performing preliminary sequence analysis for foodborne pathogens. The ONT workflow, implemented in a custom Galaxy instance, produces draft de novo assemblies and reports several characteristics for each isolate, such as sequence type, AMR profile, and, in the case of Salmonella, serotype prediction. The workflow works on most microbial pathogens, so we advise laboratories to run their entire ONT run through it.

SCOPE: This protocol covers the following tasks:

1. Set up a GalaxyTrakr account (if you do not already have one)
2. Create a new history/workspace
3. Upload data
4. Execute the ONT assembly workflow
5. Interpret the results

The first section, "Account set up", applies only to members who do not have a GalaxyTrakr account. If you already have one, please proceed to section 2.

Account set up

Create a GalaxyTrakr account here: https://account.galaxytrakr.org/Account/Register

Log into your GalaxyTrakr account: https://galaxytrakr.org

Create a new history

We recommend creating a new history for each new ONT Run and including the flow cell ID and date in the history name. 

Click on the + icon in the upper right History panel

Name your new history by clicking on “Unnamed history”, typing the desired name, and pressing Enter.

Upload data

This section describes the process for uploading raw fastq files into your active History panel. After the files have been uploaded, they will stay in your account until they are deleted.

Click on the upload icon at the top left of the page to start the upload process.

Alternatively, you can use FileZilla to upload your data to the GalaxyTrakr FTP server and retrieve it from there, as described in the following protocol:

Protocol: Uploading files to your Galaxy Trakr account using FileZilla (created by Narjol Gonzalez-Escalona)

Select "Type (set all): fastqsanger.gz". Click the "Choose local file" button, navigate to the desired fastq files, then click "Start" to upload them. Provide one single fastq file per sample/isolate.
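Because the upload expects one fastq file per sample/isolate, runs that were written as many small chunks per barcode need to be merged first. The following is a minimal sketch of that merge, not part of the GalaxyTrakr workflow itself. It assumes a hypothetical layout in which each barcode has its own folder of `*.fastq.gz` chunks (for example `fastq_pass/barcode01/`); the function names and paths are illustrative only. It relies on the fact that gzip files can be concatenated byte for byte into a valid multi-member gzip file.

```python
import gzip
import shutil
from pathlib import Path


def merge_barcode_fastqs(run_dir: Path, out_dir: Path) -> list[Path]:
    """Concatenate the *.fastq.gz chunks of each barcode folder into one file.

    Assumed (hypothetical) layout: run_dir/<barcode>/<chunk>.fastq.gz
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = []
    for barcode_dir in sorted(p for p in run_dir.iterdir() if p.is_dir()):
        chunks = sorted(barcode_dir.glob("*.fastq.gz"))
        if not chunks:
            continue  # skip barcode folders without reads
        target = out_dir / f"{barcode_dir.name}.fastq.gz"
        with open(target, "wb") as out:
            for chunk in chunks:
                # Byte-wise concatenation of gzip members is itself valid gzip.
                with open(chunk, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged.append(target)
    return merged


def count_reads(fastq_gz: Path) -> int:
    """Count reads in a gzipped fastq file (4 lines per record)."""
    with gzip.open(fastq_gz, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

Comparing `count_reads` on the merged file with the sum over the original chunks is a quick sanity check before uploading.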

As the file uploads complete, each row will turn green. Samples in yellow are still in process.

Execute the ONT assembly workflow

Add the ONT (Nanopore Datasets-QC_assembly_genome_profiling_v1) workflow to your own "workflows" panel.  You only have to do this step once for each new workflow you need.

Navigate to the “Shared Data” drop-down menu and choose “Workflows”. Then, from the Nanopore Datasets-QC_assembly_genome_profiling_v1 drop-down menu, select “Import”:

Click on “Shared Data” and select “Workflows”.

Then search for "Nanopore".

Then select “Import”, and the workflow will be added to your workflows.

To see the new workflow in your “WORKFLOWS” tools panel on the left, open the Workflow tab and check “show in tools panel” for the workflow of interest. 

From the workflow menu, select Nanopore Datasets-QC_assembly_genome_profiling_v1. You can add it to your favorites, which makes it easier to find the next time you need it.

Create a new history and add your fastq files. Then select your fastq files and build a dataset list collection (Build Dataset List) with a name of your choosing.

Run the Nanopore Datasets-QC_assembly_genome_profiling_v1 workflow using that dataset collection as input. This can take some time, depending on the number of samples you are analyzing. If you wish, you can log out of GalaxyTrakr and log back in later to check whether the job has completed.

Upon completion of the pipeline all tiles in the history bar will be green.

Results and interpretation

Results: When the workflow finishes, you will have 6 additional files. The combined profile results are in the Profiles-table (download it). This table contains the in silico analysis results for all of your samples. "Flye on collection - consensus" contains all of the assemblies (download it). "Flye on collection: assembly info" contains the assembly information for all samples (download it as well).

To download a file, click on its title (e.g. Results_Table) and then click the floppy disk icon.

The Results_table output file includes the following metrics:

 
Column	Description
sample	Change this to your sample number or ID
Reads	Number of reads
GC%	GC content (%)
Avg_Len	Average length of the reads or reads N50
Mean_Q	Mean quality of the reads
Q10%	Percentage of reads > Q10
Taxa	Taxa identified from the reads, closest strain match
ANI	ANI value compared to the closest match
AMR	AMR gene content
STRESS	Stress genes identified
ST	Sequence Type
Predicted identification	Salmonella enterica subspecies ID
Predicted antigenic profile	In the case of Salmonella, the antigenic profile
Predicted serotype	Salmonella serotype
cov.	Mean coverage for contigs in the Flye assembly
circ.	Whether the contigs are circular or not
count(Contigs)	Number of contigs in the Flye assembly
sum(length)	Genome size (bp)

The ONT pipeline analysis workflow lists summary metrics for sequence quality, number of contigs, and estimated genome size, along with other common read metrics (mean length, quality, and taxa ANI ID). Additionally, if a Multi-Locus Sequence Typing (MLST) scheme for the isolate is available from PubMLST, the workflow reports the Sequence Type (ST). It also reports AMR and stress genes and, in the case of Salmonella, the predicted ID, predicted antigenic profile, and predicted serotype. If the isolate is not Salmonella, those fields will be left empty. Later iterations of this workflow will address serotypes or IDs specific to other foodborne pathogens as they become available.
 
**This output should be saved either to your LIMS or to a spreadsheet linked to the sequencing run and samples.

Passing Metrics for the assemblies: 

minimum depth/coverage: >= 30X
ANI score: >= 95%
average quality score: Mean Q >= 30

Note: The coverage in the table is based on the longest contig. If you want to know the coverage and the circularity (closure) of each contig, please check the assembly_info file.

Due to the nature of GalaxyTrakr, all individual Results_Tables for each sample should be collated into a single table for easy interpretation. Example output for 8 samples run through the ONT assembly workflow:
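The passing metrics above can be applied to each row of the results table programmatically. The sketch below is one possible way to do that, assuming each row has been read into a dictionary keyed by the column names listed earlier (`cov.`, `ANI`, `Mean_Q`). The function name and the message format are hypothetical and not part of the workflow.

```python
# Thresholds taken from the "Passing Metrics" list in this protocol.
MIN_COVERAGE = 30.0  # minimum depth/coverage (X)
MIN_ANI = 95.0       # minimum ANI score (%)
MIN_MEAN_Q = 30.0    # minimum mean read quality


def failed_metrics(row: dict[str, str]) -> list[str]:
    """Return a list of failed checks for one results-table row (empty if it passes)."""
    checks = {"cov.": MIN_COVERAGE, "ANI": MIN_ANI, "Mean_Q": MIN_MEAN_Q}
    failures = []
    for column, minimum in checks.items():
        try:
            value = float(row[column])
        except (KeyError, ValueError):
            # Column absent, or cell holds something like "-" or is empty.
            failures.append(f"{column}: missing or non-numeric")
            continue
        if value < minimum:
            failures.append(f"{column}: {value} < {minimum}")
    return failures


# Values for CFSAN000318 from the example output in this protocol.
example = {"cov.": "32", "ANI": "99.998", "Mean_Q": "35.99"}
print(failed_metrics(example) or "all passing metrics met")
```

Returning the list of failures, rather than a single pass/fail flag, makes it easier to record in a LIMS or spreadsheet why a sample was flagged.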

 
CFSAN#	Barcode	Reads	GC%	Avg_Len	Mean_Q	Q10%	Taxa	ANI	AMR	STRESS	VIRULENCE	ST	Predicted identification	Predicted antigenic profile	Predicted serotype	cov.	circ.	count(Contigs)	sum(length)
CFSAN000318	48	39770	51	4233.6	35.99	100	Salmonella enterica subsp. enterica serovar Heidelberg str. 41578	99.998	aac(3)-IId,aadA1,blaTEM-1,fosA7,mdsA,mdsB,tet(A)	arsR,golS,golT	iroB,iroC,sinH,sodC1	15	Salmonella enterica subspecies enterica (subspecies I)	4:r:1,2	Heidelberg	32	Y	4	4957870
CFSAN008585	47	61645	51	4727.2	35.71	100	Salmonella enterica subsp. enterica serovar Derby str. 626	99.837	aac(3)-VIa,aadA1,fosA7.3,mdsA,mdsB,sul1,tet(A)	golS,golT,pcoA,pcoB,pcoC,pcoD,pcoE,pcoR,pcoS,qacEdelta1,silA,silB,silC,silE,silF,silP,silR,silS	iroB,iroC,sinH	40	Salmonella enterica subspecies enterica (subspecies I)	4:f,g:-	Derby	47	Y	4	5086410
CFSAN051458	46	73568	50	5885.7	36.17	100	Escherichia coli O121 str. RM8352	99.97	acrF,blaEC,emrD,mdtM	arsC,arsR,asr,terD,terW,terZ,ymgB	eae,efa1,ehxA,espA,espB,espF,espI,espJ,espK,espP,espX1,fdeC,lpfA,nleA,nleB,nleC,stxA2c,stxB2a,tccP,tir,toxB	655	-	-:-:-	- -:-:-	79	N	7	5550579
CFSAN068773	45	97883	56	4909.9	36.48	100	Cronobacter sakazakii ATCC BAA-894	99.81	blaCSA	fieF,pcoA,pcoB,pcoC,pcoD,pcoE,pcoR,pcoS,silA,silB,silC,silE,silF,silP,silR,silS,terD,terW,terZ		1	-	-:-:-	- -:-:-	99	Y	4	4581775
CFSAN086180	44	67352	57	5751.5	36.68	100	Klebsiella sp. X1-16S-Nf21	99.811	blaLEN-16,emrD,fosA,kdeA,oqxA,oqxB	fieF		-	-	-:-:-	- -:-:-	69	Y	1	5536414
CFSAN086182	43	49856	51	5250	36.36	100	Citrobacter sp. KTE32	98.699	blaCMY	arsA,arsB,arsC,arsD,arsR		-	-	-:-:-	- -:-:-	52	Y	1	4917486
CFSAN086183	42	49755	55	9904.4	37.05	100	Enterobacter cancerogenus ATCC 35316	98.773	ampC,fosA	fieF	iroB,iroC,iroN	-	-	-:-:-	- -:-:-	100	Y	2	4992359
CFSAN122995	41	88297	50	5743.4	36.25	100	Salmonella enterica subsp. diarizonae serovar 50:k:z str. MZ0080	99.285			cdtB,iroB,iroC,iucA,iutA,ybtP,ybtQ	645	Salmonella enterica subspecies diarizonae (subspecies IIIb)	6,14:z10:z	IIIb (6),14:z10:z	92	Y	1	5320423
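Collating the per-sample tables into a single table, as recommended above, can be scripted. The following is a minimal sketch that assumes each downloaded table is a tab-separated file with a header row; that file format, the function name, and the file names are assumptions for illustration, so check the actual download format before relying on it.

```python
import csv
from pathlib import Path


def collate_tables(table_paths: list[Path], out_path: Path) -> int:
    """Merge tab-separated per-sample tables into one table.

    Columns are the union of all headers, in first-seen order.
    Returns the number of data rows written.
    """
    header: list[str] = []
    rows: list[dict[str, str]] = []
    for path in table_paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            for name in reader.fieldnames or []:
                if name not in header:
                    header.append(name)
            rows.extend(reader)
    with open(out_path, "w", newline="") as handle:
        # restval fills cells for columns a given sample's table did not have.
        writer = csv.DictWriter(handle, fieldnames=header, delimiter="\t", restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Taking the union of headers keeps columns such as VIRULENCE even if only some of the per-sample tables contain them.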
   

PT exercise circular closed reference genomes characteristics:

 
 
Sample ID	CFSAN No.	Expected contigs	Chromosome (bp)	Plasmid 1 (bp)	Plasmid 2 (bp)	Plasmid 3 (bp)	Plasmid 4 (bp)	Plasmid 5 (bp)	Taxa
GTVSS-021	CFSAN068773	4	4,408,038	131,196	31,208	3,983			Cronobacter sakazakii ATCC BAA-894
GTVSS-016	CFSAN030807	6	4,813,409	8,401	7,098	4,238	5,396	5,111	Shigella sonnei 08-7761
GTVSS-025	CFSAN084952	1	6,326,461						Pseudomonas sp. HMWF034
GTVSS-020	CFSAN051458	2	5,421,422	81,969					Escherichia coli O121 str. RM8352
GTVSS-019	CFSAN044836	2	2,864,614	57,565					Listeria innocua ATCC 33091
GTVSS-014	CFSAN023469	1	2,939,749						Listeria monocytogenes CFSAN002349
GTVSS-011	CFSAN023464	1	2,939,778						Listeria monocytogenes CFSAN002349
GTVSS-001	CFSAN000189	2	4,724,863	81,814					Salmonella enterica subsp. enterica serovar Bareilly str. CFSAN000189
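One plausible use of these reference characteristics is to compare an assembly's contig count and total length (the count(Contigs) and sum(length) columns) with the closed reference. The sketch below illustrates such a comparison. The function name and the 1% length tolerance are hypothetical choices made for this example, not thresholds defined by the protocol.

```python
def compare_to_reference(
    observed_contigs: int,
    observed_length: int,
    expected_replicons: list[int],
    tolerance: float = 0.01,  # hypothetical 1% tolerance on total length
) -> dict:
    """Compare an assembly with the expected chromosome and plasmid sizes (bp)."""
    expected_length = sum(expected_replicons)
    relative_diff = abs(observed_length - expected_length) / expected_length
    return {
        "expected_contigs": len(expected_replicons),
        "expected_length": expected_length,
        "contigs_match": observed_contigs == len(expected_replicons),
        "relative_length_diff": relative_diff,
        "length_within_tolerance": relative_diff <= tolerance,
    }


# CFSAN068773: reference sizes from the table above, observed values from
# the example output (4 contigs, 4581775 bp).
reference = [4_408_038, 131_196, 31_208, 3_983]
print(compare_to_reference(4, 4_581_775, reference))
```

A mismatch in contig count with a similar total length, as for CFSAN051458 in the example output (7 contigs against 2 expected), suggests a fragmented rather than an incomplete assembly.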
 
 




