Nanopore sequencing of poliovirus Isolates analysis and quality control checks

Ben Bellekom; Erika Bujaki; Joyce Akello; Catherine Pratt; Aine OToole; Andrew Rambaut; Javier Martin; Nick Grassly; Alex Shaw

Apr 23, 2026

Version 3

Nanopore sequencing of poliovirus Isolates analysis and quality control checks V.3

DOI

https://dx.doi.org/10.17504/protocols.io.261gekoq7g47/v3

¹Imperial College;
²MHRA;
³Imperial College London;
⁴Biosurv International;
⁵University of Edinburgh

Poliovirus Sequencing Consortium

Ben Bellekom

Imperial College


External link: https://www.protocols.io/workspaces/poliovirus-sequencing-consortium

Protocol Citation: Ben Bellekom, Erika Bujaki, Joyce Akello, Catherine Pratt, Aine OToole, Andrew Rambaut, Javier Martin, Nick Grassly, Alex Shaw 2026. Nanopore sequencing of poliovirus Isolates analysis and quality control checks. protocols.io https://dx.doi.org/10.17504/protocols.io.261gekoq7g47/v3
Version created by Jasmaine Lee

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: April 23, 2026

Last Modified: April 23, 2026

Protocol Integer ID: 315601

Keywords: poliovirus isolate, comparison of the poliovirus, vp1 poliovirus sequence, poliovirus, poliovirus detection, based poliovirus detection, sequencing data, sequencing result, sequencing run data, laboratory sequence qc database folder, responsibility of the lab senior scientist, nanopore, lab senior scientist before qc analysis review, isolate sample, sequencing run, lab senior scientist, identification of contamination, overview of the procedure, lab, rna from stool surveillance, reportable sequence, standard operating procedure, contamination, same samples by cell

Funders Acknowledgements:

Gates Foundation

Abstract

This standard operating procedure indicates how to perform data analysis, quality control checks, and data reporting after nanopore sequencing of poliovirus isolates and provides guidance on best practice for ensuring relevant sequencing run data are recorded. 

Procedure
For isolate sequencing results to be valid and suitable for reporting, sufficient sample metadata must be recorded and data integrity maintained throughout the planning stage of the experiment, during the experiment and after the experiment. Quality control checks have been included to ensure that the protocol has been performed correctly and that results are valid. Later comparison of the poliovirus isolates sequencing results with culture-based poliovirus detections can be facilitated by adding further metadata describing the timelines and results for processing of the same samples by cell-culture. 

An overview of the procedure is shown in Figure 1.

It is the responsibility of the lab senior scientist to designate staff to conduct the QC analysis report, to ensure that the personnel conducting the data analysis have provided complete and accurate data, and to confirm that the report generated is correct prior to approval. Data must be reviewed and approved by the lab senior scientist before QC analysis review with the technical team (PSC member). The designated technical team lead then submits the QC'd data to the GSL for review/approval. Once approved, the GSL shares the QC'd data with the program.

A log should be kept of the sequencing runs that are performed, indicating whether all quality control checks have been completed or whether some remain pending, and confirming that all reportable sequences have been reported. All VP1 poliovirus sequences generated (even in cases of sample or run QC fail) should be collected into a laboratory sequence QC database folder to aid in the identification of contamination. This addition can be performed by annotating the vp1_sequences.fasta file from a run with the run name (e.g. "vp1_sequences_Run21.fasta") and copying it into the database folder.
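As an illustration of the copy-and-rename step described above, here is a minimal Python sketch. The function name `archive_run_fasta` and its parameters are hypothetical and not part of PIRANHA; the sketch only assumes that the run's FASTA file and the database folder are reachable from the analysis computer.

```python
import shutil
from pathlib import Path


def archive_run_fasta(fasta_path, run_name, database_dir):
    """Copy a run's VP1 FASTA into the database folder with the run name in the filename."""
    fasta_path = Path(fasta_path)
    database_dir = Path(database_dir)
    database_dir.mkdir(parents=True, exist_ok=True)
    # e.g. vp1_sequences.fasta + "Run21" -> vp1_sequences_Run21.fasta
    target = database_dir / f"{fasta_path.stem}_{run_name}{fasta_path.suffix}"
    if target.exists():
        # Never silently overwrite an earlier run's sequences.
        raise FileExistsError(f"{target} is already in the database folder")
    shutil.copy2(fasta_path, target)
    return target
```

Refusing to overwrite an existing file is a design choice of this sketch: it protects earlier runs in the database folder if a run name is reused by mistake.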

We recommend that you do not add RNA from stool surveillance to the same run. The RNA from cell-culture will be much more concentrated, resulting in the distribution of sequencing data skewing towards the isolate samples rather than the stool samples. 

Figure 1. Overview of the poliovirus isolate analysis and quality control checks

Guidelines

All procedures to be performed by suitably trained members of staff.

Post Sequencing Run Checks

Perform the post sequencing run checks by confirming the following points manually:

 a. Did the sequencing run complete its full run duration? (Check the MinKNOW run report.)
 b. Was there no sudden reduction in pore numbers, i.e. pore numbers did not fall beneath 400 (or 25% of total pores) in the first hour of the sequencing run? (Check the MinKNOW run report, see Figure 2.)

Figure 2. The MinKNOW run report showing a failed sequencing run where the percentage of available pores (sum of bright green “Sequencing” and darker green “Pore” bars) falls below 25 % of total pores within the first hour of the sequencing run.

If all answers are yes, then continue to running PiranhaGUI for sequence analysis (Step 3).  

If any answers are no, then complete Step 2.

Follow step 2.1 or 2.2 depending on the reason for failure.

Run has not reached its full duration

Check the system messages in MinKNOW to see if it gave an alert that explains the run stopping prematurely.
Restart the sequencing run. Check that there are still >500 pores available for sequencing (this will be reported when the run is restarted).
If there are insufficient pores available, repeat the library pooling and sequencing library preparation and load into a different flow cell.

Sudden reduction in pore number during first hour

If the run does not pass QC checks after analysis with PIRANHA, repeat the sample pooling and library preparation and load into a different flow cell.
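The post-run decision in Steps 1 and 2 can be summarised in a short Python sketch. The function name and parameters are hypothetical, the values are read manually from the MinKNOW run report, and the sketch uses only the 400-pore floor (a laboratory could pass 25% of its total pores as `pore_floor` instead). The mapping of the two failure reasons to sub-steps 2.1 and 2.2 is assumed from the order in which they are listed.

```python
def post_run_follow_up(run_completed, lowest_pores_first_hour, pore_floor=400):
    """Return the Step 2 sub-steps to follow; an empty list means continue to Step 3."""
    follow_up = []
    if not run_completed:
        follow_up.append("2.1 Run has not reached its full duration")
    # "Did not fall beneath 400" is read here as: 400 itself is still acceptable.
    if lowest_pores_first_hour < pore_floor:
        follow_up.append("2.2 Sudden reduction in pore number during first hour")
    return follow_up
```

For example, `post_run_follow_up(True, 1200)` returns an empty list, meaning the run can proceed to PiranhaGUI analysis.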

Running PiranhaGUI for Isolate Sequence Analysis

Run the PIRANHA analytical software once the sequencing run is complete. For full details on PiranhaGUI installation and usage visit https://github.com/polio-nanopore/piranhaGUI. 

Open Docker (click “Update to latest” in the top left if Docker Desktop needs to be updated).

Open PiranhaGUI (click “Install update to PIRANHA software” if PIRANHA needs to be updated).

The PiranhaGUI window will pop up as shown below. Click "Continue" to begin setting up your run analysis.

The window to set up and input data for your run will pop up as shown below.

Fill out the three fields circled red by clicking "Select" next to:
a. "Samples" to supply the location of your barcode.csv file, which tells PIRANHA which barcodes to analyse.

b. "MinKnow Run" to provide the location of your sequencing data (the demultiplexed fastq files). Usually this will be the fastq_pass folder.

Note
PiranhaGUI detects FASTQ read files within the specified directory and in the subdirectories that correspond to each barcode in the sequencing run.

c. "Output Folder" to specify where PIRANHA will place its report files.
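Before starting the analysis it can be useful to confirm that every barcode listed in barcode.csv has FASTQ files in the selected sequencing data folder. The Python sketch below is one hypothetical way to do this and is not part of PiranhaGUI. It assumes that barcode.csv has a column named `barcode` and that the reads sit in one subfolder per barcode (e.g. fastq_pass/barcode01); adjust both if your files differ.

```python
import csv
from pathlib import Path


def missing_barcode_folders(barcode_csv, fastq_dir):
    """List barcodes named in the CSV that have no FASTQ files under fastq_dir."""
    fastq_dir = Path(fastq_dir)
    missing = []
    with open(barcode_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            barcode = row["barcode"].strip()  # assumed column name
            folder = fastq_dir / barcode
            # Matches .fastq and .fastq.gz files directly inside the barcode folder.
            has_reads = folder.is_dir() and any(folder.glob("*.fastq*"))
            if not has_reads:
                missing.append(barcode)
    return missing
```

An empty list means every listed barcode has at least one FASTQ file. A non-empty list is worth checking against the barcode.csv file and the chosen folder before clicking "Start Analysis".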

Click on "Persistent Run Options" to enable the "Piranha Phylogenetic module", provide supplementary sequences for the phylogenetic analysis, and to set your own default options.

a. Under "Supplementary directory for phylogenetic module"  select the location of your laboratory sequence database. This contains all the vp1_isolate_sequences.fasta files from previous isolate sequencing runs.

b. Under “Orientation”, select the orientation of the barcodes in your barcode primer plate. Horizontal has well A1 barcode01, A2 barcode02, A3 barcode03. Vertical has well A1 barcode01, B1 barcode02, C1 barcode03 and so on.

c. Under "Protocol", select the sample type "stool". Options include "environmental", "stool", or "isolate"

d. Enter or change the names of the positive and negative controls if different from the default. If you have multiple positive and negative controls, separate their names with a comma, e.g. "positiveEx1, positiveEX2" or "negativeEX, negativePCR1, negativePCR2".

e. Click "Continue" to confirm any changes.

Note
All the options selected under persistent run options are permanently set and will remain every time you launch Piranha.

Alternatively, the button "Set options for this run"  on the main run window can also be used to set options for the analysis but these will not apply to future runs.

Before starting the analysis, click the "Piranha Options" button to set the "Analysis options". When running PIRANHA for isolates, set the following options:
  a. Minimum read length: 1000
  b. Maximum read length: 1300
  c. Minimum depth: 50
  d. Minimum read percentage: 0
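To make the effect of these four options concrete, the Python sketch below shows what thresholds of this kind mean in practice. It is an illustration only and is not PIRANHA's implementation: the dictionary keys and function names are hypothetical, the length bounds are assumed to be inclusive, and the reading of "minimum read percentage" as a share of the sample's reads is an assumption.

```python
# Isolate analysis options from the step above (key names are hypothetical).
ISOLATE_OPTIONS = {
    "min_read_length": 1000,
    "max_read_length": 1300,
    "min_depth": 50,
    "min_read_percentage": 0,
}


def read_length_ok(read_length, options=ISOLATE_OPTIONS):
    """True when a read falls inside the configured length window (bounds assumed inclusive)."""
    return options["min_read_length"] <= read_length <= options["max_read_length"]


def reference_passes(mapped_reads, total_reads, options=ISOLATE_OPTIONS):
    """True when a reference has enough reads, in absolute and percentage terms."""
    if total_reads == 0:
        return False
    percentage = 100 * mapped_reads / total_reads
    return (mapped_reads >= options["min_depth"]
            and percentage >= options["min_read_percentage"])
```

With a minimum read percentage of 0, only the minimum depth of 50 limits which references go forward, so a minor virus in a mixed isolate is not dropped for being a small share of the reads.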

Click  "Continue" to return to the run window. For quicker analysis, select the maximum number of threads for the Piranha pipeline.
Note
All settings are saved even after closing the GUI, so there is no need to reapply the changes every time.

Click "Start Analysis". A progress bar will show how much has been completed.

Verify that the Piranha run completed successfully. The last line before “###PIRANHA SOFTWARE FINISHED###” should say “Generating: /data/run_data/output/piranha_output/report.html”. If that is not the case, then Piranha may have encountered an error. Check and resolve the error before continuing.
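If the log text has been copied out of the PiranhaGUI window, the check described above can be sketched in Python as follows. The function name is hypothetical; the two strings it looks for are the ones quoted in this step, and the output path is matched only by its ending because the leading folders may differ between installations.

```python
FINISHED_MARKER = "###PIRANHA SOFTWARE FINISHED###"


def piranha_run_completed(log_text):
    """Check that the last log line before the finish marker reports report.html."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    marker_at = next((i for i, line in enumerate(lines) if FINISHED_MARKER in line), None)
    if marker_at is None or marker_at == 0:
        return False
    previous = lines[marker_at - 1]
    return "Generating:" in previous and previous.endswith("piranha_output/report.html")
```

A result of `False` does not say what went wrong; it only signals that the log should be read for an error message before continuing.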

After the PIRANHA analysis is complete, you can view the output by clicking "Open Output". The "piranha_output" folder will contain the following: 

barcode_reports: This contains the HTML reports for each sample barcode with information on the mutations when compared to the respective reference sequence for each poliovirus serotype. The snipit plot shows the percentage co-occurrence of SNPs called against the reference and can indicate whether mixed populations are present within the sample.

published_data folder: This contains a folder for each barcode and a file called vp1_sequences.fasta containing all consensus sequences for each sample's classified haplotypes.

detailed_run_report: This contains all sequencing results appended to your barcode.csv file. Any additional metadata can be added to the “detailed_run_report.csv” file. This final report is the definitive document containing all the data for the sequencing run and can be uploaded for storage and data rows shared when reporting detections.

report: This contains the sample summary information specific to the sequences generated for each sample.

PIRANHA will also create an interactive HTML report which summarises all sample results and confirms whether:
 a. The positive control(s) yielded at least 50 sequences that mapped to Coxsackievirus A20.
 b. The negative control(s) yielded fewer than 50 reads mapped to poliovirus or NPEVs.

Perform Run Quality Control Checks

Perform the RunQC by checking the following points:
 a. Were there more than 50 reads for the positive control mapped to Coxsackievirus A20?
 b. Were there fewer than 50 reads mapped for the negative control?
 c. Were there fewer than 50 reads present in the positive control that were not Coxsackievirus A20?

If all answers are Yes, then enter Pass in the column “RunQC” of the detailed_run_report. 
If any answers are No, enter Fail in the "RunQC" column and add an explanation of the failure in the “comments” field, then complete the following troubleshooting steps:

Too few positive control reads: 
  a. Confirm that your earlier positive control QC check has passed. Repeat the library pooling and confirm the presence of your library after the cleanup steps using a Tapestation or a Qubit fluorometer.  
  b. Check that you are ligating the correct adaptor (LA) and are using the short fragment buffer (SFB) during library preparation, and that none of your end-prep or ligation enzymes are expired.

Too many negative control reads: 
    a. Confirm that your earlier negative control QC check has passed.  
    b. Rewash the flow cell with a DNAse wash and repeat the library pooling and sequencing run.  

Extra detection in the positive control: more than 50 reads mapping to anything other than the appropriate Coxsackievirus A20 strain is an indication of contamination.
a. Perform a deep clean of workstations, pipettes and equipment with nucleic acid degradation solution (e.g. DNA Away, DNAZap).
b. Retest the samples.
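The three RunQC questions and their troubleshooting headings can be expressed as a small Python sketch. The function name and parameters are hypothetical, and the read counts are taken manually from the PIRANHA report. The comparisons follow the wording of the three questions (more than 50, fewer than 50, fewer than 50); how a count of exactly 50 is handled is therefore an assumption of this sketch. For runs with several controls, the function would be called once per control pair.

```python
def run_qc(positive_cva20_reads, negative_mapped_reads, positive_other_reads, threshold=50):
    """Return ("Pass" or "Fail", list of reasons) for the RunQC column and comments field."""
    reasons = []
    if not positive_cva20_reads > threshold:
        reasons.append("Too few positive control reads")
    if not negative_mapped_reads < threshold:
        reasons.append("Too many negative control reads")
    if not positive_other_reads < threshold:
        reasons.append("Extra detection in the positive control")
    return ("Pass" if not reasons else "Fail"), reasons
```

The returned reasons match the troubleshooting headings above, so they can be copied into the “comments” field when the run fails.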

Sample Quality Control Checks

If EPIDs and metadata are only available after the sequencing run, add them to the detailed_run_report.
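Metadata can be added to the detailed run report in a spreadsheet program, or with a short script such as the Python sketch below. The function name is hypothetical, and the sketch assumes that the report has a column named `sample` that identifies each row; check the column names in your own detailed_run_report.csv before using it. The result is written to a new file so that the original report is left unchanged.

```python
import csv


def add_metadata(report_csv, output_csv, metadata, key="sample"):
    """Write a copy of the report with extra columns; metadata maps sample -> {column: value}."""
    with open(report_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        fields = list(reader.fieldnames or [])
    # Add any new column names after the existing ones.
    for extra in metadata.values():
        for column in extra:
            if column not in fields:
                fields.append(column)
    for row in rows:
        row.update(metadata.get(row[key], {}))
    with open(output_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

Rows without an entry in `metadata` keep an empty value in the new columns, so missing EPIDs remain visible rather than being filled with a guess.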

Generate a phylogenetic tree comparing your run to all previous data by using one of the methods below:

a. Using the piranha phylogenetic tree module:  Open the piranha html report in the output folder and confirm that the piranha report contains a phylogenetic tree and the table that shows the IDs of identical sequences.
b. Alternatively, use Geneious or Nextstrain to generate an alignment and  phylogenetic tree of the new sequences generated by PIRANHA and your laboratory sequence QC database with the isolate sequences from prior runs.

Perform individual sample QC checks for all samples and enter Pass/Fail in the column “SampleQC” of the detailed_run_report. Samples should be marked as “Fail” if:
 
  a. A pair of samples with the same EPID (i.e. from the same case) are 3 or more nucleotides different from each other over VP1.

  b. A sample is identical to another sample in the run that does not have the same EPID (i.e. they are from different cases), unless the sequences are both the same Sabin serotype with no more than 1 mutation from the original vaccine.
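The two failure criteria above can be sketched in Python as follows. This is a simplified illustration rather than a replacement for reviewing the phylogenetic tree: the function names are hypothetical, the VP1 sequences are assumed to be already aligned to the same length, and every differing character (including N or gap characters) is counted as a difference. The Sabin exception is not calculated here; samples that the analyst has judged to meet it are passed in as `sabin_exempt`.

```python
from itertools import combinations


def vp1_differences(seq_a, seq_b):
    """Count positions that differ between two aligned VP1 sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("VP1 sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def within_run_failures(samples, sabin_exempt=frozenset()):
    """samples maps sample name -> (EPID, aligned VP1 sequence); returns names to mark Fail."""
    failed = set()
    for (name_a, (epid_a, seq_a)), (name_b, (epid_b, seq_b)) in combinations(samples.items(), 2):
        differences = vp1_differences(seq_a, seq_b)
        if epid_a == epid_b and differences >= 3:
            # Criterion a: same case, but 3 or more nucleotides apart.
            failed.update((name_a, name_b))
        elif epid_a != epid_b and differences == 0:
            # Criterion b: identical sequences from different cases, unless Sabin-exempt.
            if not (name_a in sabin_exempt and name_b in sabin_exempt):
                failed.update((name_a, name_b))
    return failed
```

Samples returned by `within_run_failures` are candidates for “Fail” in the “SampleQC” column; the final decision stays with the analyst.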

Check for bleed-through between runs if the flow cell has been reused and for amplicon contamination; compare your current run to previous isolate sequencing runs to see if there is a sample with the same barcode that yielded a highly similar sequence (less than 3 nucleotides different over VP1). If a sample matches a sequence from a previous run in this manner, mark it as a “Fail”.  Refer to your phylogenetic tree to identify similar sequences with matching barcodes. 

Note
Identical sequences appearing at low read numbers over multiple samples could possibly indicate cross-contamination. Such samples should be analysed with care. If contamination is suspected, mark the sample as "Fail".
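The comparison with previous runs for bleed-through can be sketched in a similar way. The Python below is a hypothetical helper, not part of PIRANHA. It assumes aligned VP1 sequences of equal length and a list of earlier sequences labelled with their run name and barcode, for example assembled from the laboratory sequence QC database folder.

```python
def count_differences(seq_a, seq_b):
    """Count positions that differ between two aligned VP1 sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("VP1 sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def bleed_through_matches(current_run, previous_runs, max_differences=2):
    """Find current samples that closely match an earlier sequence on the same barcode.

    current_run maps sample name -> (barcode, sequence);
    previous_runs is a list of (run name, barcode, sequence) tuples.
    """
    matches = {}
    for sample, (barcode, sequence) in current_run.items():
        for run_name, old_barcode, old_sequence in previous_runs:
            if old_barcode != barcode:
                continue
            # "Less than 3 nucleotides different" is expressed as at most 2 differences.
            if count_differences(sequence, old_sequence) <= max_differences:
                matches.setdefault(sample, []).append(run_name)
    return matches
```

Each sample in the returned dictionary is listed with the earlier runs it matches, which gives a starting point for locating the sequences in the phylogenetic tree.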

Due to the increased risk of sample contamination, for isolate sequencing we include a further sample QC step. To reduce sample load, we will pursue only isolates indicated as programmatically relevant by ITD (See note).

Carry out the following step only after a sample has passed the initial sample QC. 

Isolate classification will be assessed at the case level (EPID). For an EPID to pass QC, all expected ITD classifications must be detected and meet read number criteria across the associated samples. This requirement applies even if an individual sample within the EPID independently matches its expected ITD classification (see note).

Note
Example of passed SampleQC:

EpID	ITD classification	SampleQC	IsolateClassification	Comment
XXXXXX001	Sl1, nOPV2, Sl3	pass	Sl1, Sl3
XXXXXX001	Sl1, nOPV2	pass	Sl1, nOPV2

Expecting Sl1, nOPV2, Sl3 across the isolate sequencing results.
Sl1, nOPV2 and Sl3 were all detected, so both samples pass. Note that although the first sample is missing its nOPV2 ITD classification, nOPV2 is found in the second sample, so all ITD classifications for the EPID are present across the samples.
 

Note
Example of failed SampleQC:

EpID	ITD classification	SampleQC	IsolateClassification	Comment
XXXXXX002	Sl1, PV3	retest	Sl1	Sl3 missing
XXXXXX002	Sl1, nOPV2	retest	Sl1, nOPV2

Expecting Sl1, nOPV2, PV3 across the isolate sequencing results.
PV3 is missing in both samples, so both samples fail. Note that although one sample of the pair has the correct reads for its individual ITD classification, it fails QC because PV3 is not found in either sample.
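The case-level logic in the two examples can be written as a set comparison. The Python sketch below is hypothetical and covers only the presence or absence of each expected classification across the samples of one EPID; the read number criteria are not included. Classification labels are compared as exact strings.

```python
def epid_classification_check(samples):
    """samples is a list of {"expected": [...], "detected": [...]} dicts for one EPID."""
    expected = set().union(*(sample["expected"] for sample in samples))
    detected = set().union(*(sample["detected"] for sample in samples))
    missing = expected - detected
    return ("Pass" if not missing else "Retest"), sorted(missing)
```

Applied to the two examples above:

```python
epid_001 = [
    {"expected": ["Sl1", "nOPV2", "Sl3"], "detected": ["Sl1", "Sl3"]},
    {"expected": ["Sl1", "nOPV2"], "detected": ["Sl1", "nOPV2"]},
]
epid_002 = [
    {"expected": ["Sl1", "PV3"], "detected": ["Sl1"]},
    {"expected": ["Sl1", "nOPV2"], "detected": ["Sl1", "nOPV2"]},
]
print(epid_classification_check(epid_001))  # ('Pass', [])
print(epid_classification_check(epid_002))  # ('Retest', ['PV3'])
```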


  

For each sample within an EPID, enter Pass/Retest in the “SampleQC” column.

SampleQC should be marked as “Pass” if: 
  a. A sample has over 500 reads mapping to the detected virus.

SampleQC should be marked as "Retest" if:
  a. A sample has fewer than 500 reads and produces a consensus sequence classified as WPV or VDPV.
  b. A sample has fewer than 500 reads mapping to a virus expected according to ITD.

If a sample has fewer than 500 reads, is classified as NPEV or Sabin-like, and is not expected according to ITD:
  a. Do not report the result; no further testing should be done for it.
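The read number rules above can be sketched as a single Python function applied to each viral result. The function name and the classification labels ("WPV", "VDPV", "NPEV", "Sabin-like") are hypothetical stand-ins for the values in the analyst's own report. The rules do not say how to treat a result with exactly 500 reads; this sketch treats it as not passing, which is an assumption.

```python
def read_number_outcome(reads, classification, expected_by_itd, threshold=500):
    """Apply the read number criteria to one viral result from one sample."""
    if reads > threshold:
        return "Pass"
    # From here on the result has too few reads to pass on its own.
    if classification in ("WPV", "VDPV") or expected_by_itd:
        return "Retest"
    if classification in ("NPEV", "Sabin-like"):
        return "Do not report"
    return "Not covered by the stated rules"
```

The last return value flags combinations that the rules above do not mention, so that they are referred for a decision rather than silently passed or dropped.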

Samples that pass the SampleQC step should have:
   a. All viral results that pass QC listed in the “IsolateClassification” column.

Samples that pass SampleQC should be reported (see note), and should have:
  a. “SampleQCCheckComplete” marked as “Yes”.
  b. “ToReport” marked as “Yes” (see note).


Note
Programmatically important results that pass QC may be reported immediately even if there is an outstanding retest for that sample. In this instance "ToReport" should be marked as "Yes&Retest" and an explanation of which result is being reported should be added in "QCComments".

Samples marked as Retest in the “SampleQC” column should have:
  a. "SampleQC" marked as "Retest".
  b. “ToReport” marked as “No” (see note).
  c. An explanation added in “QCComments”, indicating which viral results were missed.

Note
Programmatically important results that pass QC may be reported immediately even if there is an outstanding retest for that sample. In this instance "ToReport" should be marked as "Yes&Retest" and an explanation of which result is being reported should be added in "QCComments".

When the report has been submitted, complete the “DateReported” field for the samples that were included.
Note
The following files are required when reporting data following QC:
a.   MinKNOW Run Report
b.   Piranha output report
c.   barcode_reports
d.   QC detailed run report (contains a summary of the QC sample results). Note: be sure to include any QC anomalies and corrective actions.
e.   FASTA files of positive samples

Each retested sample should be repeated in a new run, initially using the Y7/Q8 primer set (with column “IsclassificationQCRetest” flagged “Yes” in that run).

These samples should not be grouped together upon retesting and should use different barcodes to the original run. 

Repeat individual sample QC checks for all samples that have undergone retesting using the same criteria indicated above (step 10).

If on retesting:
  a. Samples yield the same sequence as in the initial run, mark them on each run as “Pass”. If a pair of samples (same EPID) yield the same sequences as in the first run, but these differ by 3 or more nucleotides, they can be marked as “Pass”.
  b. Samples yield the virus expected by ITD after initially testing negative, mark them as "Pass".
  c. Samples no longer yield sequences, mark them on each run as negative under “IsolateClassification” and add a note of “Likely contamination” in “QCComments” in the original run. These sequences can be removed from further analyses of the sequencing run and should not be submitted.
  d. Samples remain negative despite there being an expected ITD result, refer to GSL guidance for further testing.

After retesting is complete, mark “SampleQCChecksComplete” in the original run as “Yes”

All samples that are marked "Pass" may now be reported. 

Any data that is not available before or during the run can be added to the detailed_run_report.csv when available.

The filename of vp1_isolate_sequences.fasta should be amended to include the run name (e.g. "vp1_isolate_sequences.fasta" becomes "vp1_isolate_sequences_run21.fasta"). Within the file, sequences can be appended with "QC_Pass" if the sequence has passed the Run and Sample QC checks. The file should then be transferred to the laboratory sequence QC database folder.
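Tagging sequences with "QC_Pass" can be done in a text editor or with a short script. The Python sketch below is one hypothetical way to do it and makes two assumptions that should be checked against your own FASTA file: that the sample name is the first field of each header line (before any "|" character), and that the tag is added to the end of the header after a "|" separator.

```python
def tag_qc_pass(fasta_text, passed_names, tag="QC_Pass", separator="|"):
    """Append the tag to the header of every sequence whose sample name passed QC."""
    tagged = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Assumed header layout: >sample_name|other fields
            name = line[1:].split("|", 1)[0].strip()
            if name in passed_names:
                line = f"{line}{separator}{tag}"
        tagged.append(line)
    return "\n".join(tagged) + "\n"
```

The function works on the text of the file and returns the tagged text, so the original file is only changed when the result is deliberately written back.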

Protocol references

https://github.com/polio-nanopore/piranhaGUI
https://github.com/polio-nanopore/piranha
Áine O’Toole, Rachel Colquhoun, Corey Ansley, Catherine Troman, Daniel Maloney, Zoe Vance, Joyce Akello, Erika Bujaki, Manasi Majumdar, Adnan Khurshid, Yasir Arshad, Muhammad Masroor Alam, Javier Martin, Alexander G Shaw, Nicholas C Grassly, Andrew Rambaut, Automated detection and classification of polioviruses from nanopore sequencing reads using piranha, Virus Evolution, Volume 10, Issue 1, 2024, veae023, https://doi.org/10.1093/ve/veae023
