Apr 20, 2026

A MODIFIED PROTOCOL FOR DETECTING SPATE-ENCODING GENES FROM SHORT READS AND A SPATEs DATABASE USING ARIBA

  • 1 University of Ibadan, Ibadan, Nigeria;
  • 2 Ahmadu Bello University, Zaria, Nigeria
  • Refined Protocol for Detecting SPATEs from Short Reads
Protocol Citation: Rotimi A. Dada, Iruka N. Okeke 2026. A MODIFIED PROTOCOL FOR DETECTING SPATE-ENCODING GENES FROM SHORT READS AND A SPATEs DATABASE USING ARIBA. protocols.io https://dx.doi.org/10.17504/protocols.io.14egn5oyqg5d/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: April 13, 2026
Last Modified: April 20, 2026
Protocol Integer ID: 314875
Keywords: accuracy of spate gene detection, spate gene detection, containing multiple spate gene, ariba serine protease autotransporters of enterobacteriaceae, multiple spate gene, encoding genes from short read, spate reference protein, genes from short read, enterobacteriaceae, homology with spate reference protein, spates database, genome, querying spate, pathogenic bacteria, detecting spate, encoding gene, using ariba serine protease autotransporter, refined spates protocol, refined command line protocol for detection, spate, true in genome, tools such as ariba, rich spate
Funders Acknowledgements:
MRC/DfID
Grant ID: #MR/L00464X
Gates Foundation
Grant ID: INV-036234
Abstract
Serine Protease Autotransporters of Enterobacteriaceae (SPATEs) are secreted effector proteins commonly used by pathogenic bacteria. SPATEs have three major domains and are highly homologous in structure, but differ in function. This structural similarity, both across SPATEs as a whole and within each domain, can lead to spurious results when tools such as ARIBA or BLAST are used alone and indiscriminately, particularly in genomes containing multiple SPATE genes. The accuracy of SPATE gene detection also depends on the quality and comprehensiveness of the database used for querying. Here we provide a comprehensive SPATE-only database and a refined command-line protocol for detecting SPATE-encoding genes from short reads. We validated the results of this refined protocol against hybrid (Illumina and ONT) assemblies and by homology with SPATE reference proteins.
Download SPATEs database: SPATEs_db.fa (222.8 KB)

Use the ariba prepareref command to prepare the SPATEs database (SPATEs_db.fa) for use with ARIBA:
ariba prepareref --all_coding yes -f SPATEs_db.fa outdir
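Before running prepareref, a quick check that the downloaded file is non-empty FASTA can save a confusing failure later. The helper below is a minimal sketch (`count_fasta_records` is a hypothetical name, not part of ARIBA) that simply counts header lines. Because prepareref with `--all_coding yes` may discard sequences that fail its coding checks, this count can also be compared with what prepareref reports in its logs.

```shell
# Hypothetical helper: count reference records (">" header lines) in a FASTA file.
# Fails with a message if the file is missing or empty (e.g. an interrupted download).
count_fasta_records() {
  if [ ! -s "$1" ]; then
    echo "error: $1 is missing or empty" >&2
    return 1
  fi
  grep -c '^>' "$1"
}
```

Usage: `count_fasta_records SPATEs_db.fa` prints the number of reference sequences in the download.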
Use the ariba run command (https://github.com/sanger-pathogens/ariba/wiki/Task:-run) to carry out “local assembly” of SPATE-encoding genes from your reads.
If you named the prepareref output directory SPATE_prepareref (i.e. used SPATE_prepareref in place of outdir above), use:
ariba run SPATE_prepareref reads_1.fastq.gz reads_2.fastq.gz reads_outdir
(Uncompressed FASTQ files can also be used as input.)
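When many isolates are processed, the ariba run step is repeated once per sample. The sketch below assumes, hypothetically, that paired reads sit in one directory named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`; the function name `print_ariba_run_cmds` and the `<sample>_ariba` output directory names are illustrative choices, not ARIBA conventions. It only prints the commands, so they can be reviewed before being executed.

```shell
# Hypothetical helper: print one "ariba run" command per paired-end sample.
# Assumed naming scheme: <sample>_1.fastq.gz / <sample>_2.fastq.gz (adjust as needed).
print_ariba_run_cmds() {
  db_dir=$1
  reads_dir=$2
  for r1 in "$reads_dir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue              # no matching files: glob stays literal
    sample=$(basename "$r1" _1.fastq.gz)
    r2="$reads_dir/${sample}_2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "skipping $sample: missing $r2" >&2
      continue
    fi
    # Single-quote each argument so the printed line can be piped to sh.
    printf "ariba run '%s' '%s' '%s' '%s'\n" "$db_dir" "$r1" "$r2" "${sample}_ariba"
  done
}
```

Usage: `print_ariba_run_cmds SPATE_prepareref reads` lists the commands; appending `| sh` runs them one after another. Samples with a missing mate file are reported on stderr rather than silently skipped.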
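Use the ariba summary command to combine your report files into a single summary table (out.csv). Each ariba run output directory contains its report as report.tsv: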
ariba summary out in.report.1.tsv in.report.2.tsv in.report.n.tsv
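A failed or interrupted ariba run can leave a sample without a report, and that sample would then simply be absent from the summary. As a minimal sketch, assuming one ariba run output directory per sample (`collect_reports` is a hypothetical helper name), the function below lists the report.tsv files that exist and warns about directories that have none.

```shell
# Hypothetical helper: print the path of report.tsv for each ariba run output
# directory given as an argument; warn (on stderr) when a report is missing or empty.
collect_reports() {
  for dir in "$@"; do
    if [ -s "$dir/report.tsv" ]; then
      printf '%s\n' "$dir/report.tsv"
    else
      echo "warning: no report.tsv in $dir (run failed or incomplete?)" >&2
    fi
  done
}
```

In bash, `mapfile -t reports < <(collect_reports *_ariba)` followed by `ariba summary out "${reports[@]}"` passes only the existing reports to ariba summary, assuming the output directories end in `_ariba`.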
Exclude SPATE-encoding gene hits reported as interrupted, partial or fragmented from downstream analyses (https://github.com/sanger-pathogens/ariba/wiki/The-assembled-column-from-ariba-summary).
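The exclusion rule can be applied to the summary table programmatically. This sketch assumes the default ariba summary layout: a `name` column followed by per-cluster columns, including one `<cluster>.assembled` column per cluster with values such as yes, no, partial, interrupted or fragmented. Check the header of your own out.csv before relying on it. `filter_complete_calls` is a hypothetical helper that reshapes the table into one row per retained hit, dropping the three excluded statuses as well as absent (`no`) calls.

```shell
# Hypothetical helper: read an ariba summary CSV (assumed unquoted, comma-separated)
# and print "name,cluster,assembled" rows for hits that are NOT partial,
# interrupted, fragmented or absent ("no").
filter_complete_calls() {
  awk -F',' -v OFS=',' '
    NR == 1 {
      # Remember which columns hold the "<cluster>.assembled" status.
      for (i = 2; i <= NF; i++)
        if ($i ~ /\.assembled$/) { cl = $i; sub(/\.assembled$/, "", cl); cluster[i] = cl }
      print "name", "cluster", "assembled"
      next
    }
    {
      for (i = 2; i <= NF; i++) {
        if (!(i in cluster)) continue
        status = $i
        if (status == "no" || status == "partial" || status == "interrupted" || status == "fragmented")
          continue
        print $1, cluster[i], status
      }
    }
  ' "$1"
}
```

Usage: `filter_complete_calls out.csv > retained_hits.csv` (the output file name is illustrative).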
Where the calls made for the same gene by two or three source databases are discordant (any combination of complete, partial, interrupted and fragmented), adjudge the gene to be correctly called as complete if any one database yields a complete assembly of it.
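The adjudication rule reduces to: a gene is called complete if at least one source database gives a complete call for it. A minimal sketch, assuming the per-database calls for one gene have already been gathered; `adjudicate_gene_call` is a hypothetical helper, and treating both `complete` and ARIBA's `yes` as a complete call is an assumption.

```shell
# Hypothetical helper: adjudicate discordant calls for ONE gene made via two or
# three source databases. Arguments are the per-database calls for that gene.
# Assumption: a complete call is spelled "complete" or ARIBA's "yes".
adjudicate_gene_call() {
  for call in "$@"; do
    case "$call" in
      complete|yes) echo "complete"; return 0 ;;
    esac
  done
  # No database produced a complete assembly; such hits are excluded downstream.
  echo "not_complete"
}
```

For example, calls of partial (from one database) and complete (from another) adjudicate to complete, while fragmented, interrupted and partial together remain not_complete.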
Codes for the source database of each SPATE-encoding gene in SPATEs_db.fa:
vf – VirulenceFinder database
vfdb – Virulence Factor Database (VFDB)
ncbi – NCBI
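To group calls by source database, the source code has to be recovered from each reference name. The exact header format of SPATEs_db.fa is not described here, so this sketch assumes, hypothetically, that the code is the text after the last underscore (e.g. an illustrative name such as `sat_vfdb`). Inspect the real headers with `grep '^>' SPATEs_db.fa | head` and adapt the parsing if they differ. `source_db_of` is a hypothetical helper name.

```shell
# Hypothetical helper: map a reference name to its source database, ASSUMING the
# source code (vf, vfdb or ncbi) is the text after the last underscore.
source_db_of() {
  code=${1##*_}          # strip everything up to and including the last "_"
  case "$code" in
    vf)   echo "VirulenceFinder" ;;
    vfdb) echo "VFDB" ;;
    ncbi) echo "NCBI" ;;
    *)    echo "unknown" ;;
  esac
}
```

Splitting on the last underscore, rather than matching a suffix, keeps `vf` and `vfdb` from being confused with each other.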