May 21, 2026

Transmission-acceleration dengue outbreak-detection pipeline: a step-by-step computational protocol

  • Keanu John Pelitro1,
  • Julia Fye Manzano1,
  • Troy Owen Matavia1,
  • Kylone Soriano1,
  • Klara Bilbao1,
  • Gereka Marie Garcia1,
  • Aira Joy Delos Angeles1,
  • Alfredo Mahar Lagmay1,
  • DJ Darwin Bandoy2,3
  • 1University of the Philippines Resilience Institute, Quezon City, Philippines;
  • 2College of Veterinary Medicine, University of the Philippines, Los Baños, Laguna, Philippines;
  • 3National Institute of Geological Sciences, University of the Philippines Diliman, Quezon City, Philippines
Icon indicating open access to content
QR code linking to this content
Protocol CitationKeanu John Pelitro, Julia Fye Manzano, Troy Owen Matavia, Kylone Soriano, Klara Bilbao, Gereka Marie Garcia, Aira Joy Delos Angeles, Alfredo Mahar Lagmay, DJ Darwin Bandoy 2026. Transmission-acceleration dengue outbreak-detection pipeline: a step-by-step computational protocol. protocols.io https://dx.doi.org/10.17504/protocols.io.j8nlkz76dl5r/v1
Manuscript citation:
Pelitro, K. J. et al. Transmission acceleration outperforms the endemic-channel threshold for dengue outbreak detection (in preparation).
License: This is an open access  protocol  distributed under the terms of the  Creative Commons Attribution License,  which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Protocol status: Working
We use this protocol and it's working
Created: May 21, 2026
Last Modified: May 21, 2026
Protocol  Integer ID: 317697
Keywords: dengue, outbreak detection, transmission acceleration, STA/LTA, endemic channel, early warning, anticipatory action, year-cluster bootstrap, surveillance time series, acceleration dengue outbreak, channel outbreak threshold for seasonal dengue, seasonal dengue, weekly dengue case series, deactivation thresholds on the quezon city series, channel outbreak threshold, quezon city series, detection pipeline, computational pipeline, endemic country, csv output of the companion manuscript, end computational pipeline, philippine administrative region, cluster bootstrap consensus analysis, threshold drift
Funders Acknowledgements:
National Institute of Environmental Health Sciences of the National Institutes of Health
Grant ID: P20ES036118
Disclaimer
NOTE: This protocol describes the procedure. The exact, version-pinned analytical code is the authoritative implementation and is deposited at the Zenodo record above. Where the protocol and the code differ in incidental detail, the deposited code governs.
Abstract
This protocol describes the end-to-end computational pipeline used to evaluate transmission-acceleration detectors against the rolling-baseline endemic-channel outbreak threshold for seasonal dengue. The pipeline is implemented as five sequential R scripts that derive the constant transmission-acceleration activation and deactivation thresholds on the Quezon City series, characterise outbreak-threshold drift, run the city-scale analysis, and apply the framework without parameter modification to seventeen Philippine administrative regions and to eight dengue-endemic countries. It evaluates eleven detectors across three paradigms against an anticipatory anchor framework, computes eight performance metrics, and runs a year-cluster bootstrap consensus analysis. Following the steps reproduces every figure, table, and CSV output of the companion manuscript from the deposited dataset. The pipeline consumes only the weekly dengue case series and requires no climate covariates, line-list data, or model-fitting infrastructure beyond a single workstation running R.
Guidelines
Read these guidelines before running the pipeline. They define the conditions under which the results are valid and the points at which local calibration is required.

Data envelope: The pipeline consumes weekly counts of suspected dengue cases indexed to the ISO epidemiological week. It does not require laboratory confirmation, line-list records, symptom-onset dates, or climate covariates. Any surveillance system that produces a stable weekly case count can run it.

Seasonality precondition: The transmission-acceleration paradigm requires a defined annual rising limb. In settings with year-round transmission or weak or multi-modal seasonality, the rising-limb assumption fails and the detector must be re-parameterised or replaced; the country-scale step describes the diagnostic for this case.

In-sample threshold tuning: The activation and deactivation thresholds are derived from the Quezon City series and then evaluated against that same series.

Reporting-practice stability: Verify reporting stability across the evaluable window before interpreting country-scale rankings.
Materials
Software
  • R: 4.4.1+ (Base analytical environment)
  • readxl: 1.4+ (Reads the Dengue-Rainfall_Dataset.xlsx workbook)
  • dplyr, tidyr, purrr: tidyverse (Series reshaping, grouping, and map operations)
  • zoo: 1.8+ (Right-aligned rolling means)
  • pROC: 1.18+ (ROC operating points)
  • MASS: 7.3+ (Negative-binomial fit)
  • ISOweek: 0.6+ (Mapping ISO year and week)
  • ggplot2, cowplot, patchwork, scales: current (Figures and layouts)

Input data
  • QC Data: Stages 1, 2, 3 (YR, WN, DC_QC, optional RF_NASA)
  • Regional Data: Stage 4 (Weekly case counts for the 17 Philippine administrative regions, 2016 to 2024)
  • Country Data: Stage 5 (Weekly case counts for the 8 dengue-endemic countries, 2016 to 2024)

Location: the single workbook Dengue-Rainfall_Dataset.xlsx is the separate Zenodo deposit 10.5281/zenodo.19448854.

NOTE. The original sources are the Quezon City Epidemiology and Surveillance Division (PIDSR) for the city series, the Humanitarian Data Exchange (data.humdata.org) for the regional panel, and OpenDengue (opendengue.org) for the country panel; all are consolidated and deidentified in this one workbook. Region IV-B is folded into MIMAROPA per Philippine Statistics Authority convention. The dataset deposit carries the Open Data Commons Open Database License (ODC-ODbL) v1.0; attribute the UP Resilience Institute, NOAH (UPRI-NOAH) when reusing it.

Hardware
A recent laptop is sufficient. The complete five-script pipeline, including the B = 2,000 headline bootstrap and the six-method threshold-derivation panel, completes in roughly 25 to 45 minutes. No GPU or cluster is required.
Before start
Step 0.1. Obtain the code

Download the code release from Zenodo (or clone the matching GitHub tag) and unpack it. The archive contains the five sequential Stage scripts and README.md, which documents the complete file inventory, the common dictionary, and the per-stage output directories.

Step 0.2. Obtain the dataset and point each script at it

The weekly case series is a separate Zenodo deposit. Download the workbook, then set the workbook path in the USER SETTINGS block near the top of each script: the variable is PATH in Stages 1 to 3 and INPUT_PATH in Stages 4 and 5. The scripts ship with an absolute path placeholder; replace it with your own path, or uncomment the file.path(getwd(), ...) or file.choose() alternative provided in the same block.

Step 0.3. Confirm the environment and the fixed seed

Use R 4.4.1 or newer on Linux, macOS, or Windows. The CRAN dependency set is provisioned automatically on first run. Every script sets a fixed random seed at entry, so every randomised step (the year-cluster bootstraps, the ROC bootstraps, and the parametric bootstraps inside the threshold derivation) reproduces exactly.

Step 0.4. Source the five scripts in order

Run the scripts in numerical order. Stage 1 writes the derived hysteresis thresholds to a sourceable file that the downstream stages consume, so it must run first.

NOTE: Stage 1 writes the adopted thresholds to a sourceable file, `Stage1_eta_thresholds_QC/eta_thresholds_for_figure2.R`, that the downstream stages read. The code names the thresholds eta_ON and eta_OFF (the activation and deactivation thresholds in the manuscript), adopted as the median across the six derivation methods at eta_ON = 1.33 and eta_OFF = 0.73. The detector itself is implemented as a Vaezi-style STA/LTA function, run_vaezi(); it is the Constant Transmission Acceleration detector of the manuscript. This protocol uses the manuscript terms in prose and the code names in the snippets.

CRITICAL: The parameters below are fixed a priori (windows 4 and 26 weeks, guard band 2 weeks, eta_ON = 1.33, eta_OFF = 0.73, activation window 4 to 8 weeks, epidemic-burden fraction 0.70, exclusions 2020, 2021, 2022, 2.0 at the head and line 1,000 elsewhere). Do not change them in any analysis step; that is the domain of a priori specification auditable and the sensitivity analyses clean.
Stage 1. Stage1_eta_threshold_derivation_QC.R
Load the Quezon City series and build the STA/LTA ratio Read sheet “QC Data” (columns YR, WN, DC_QC) from the workbook, index each row to the Monday of its ISO week, and build the right-aligned four-week short-term average and the twenty-six-week long-term average separated by a two-week guard band. The ratio of the two is the STA/LTA series the thresholds operate on. Reported cases are used as supplied, without back-calculation to symptom-onset date.
Define the Vaezi-style detector (Constant Transmission Acceleration)
The detector is implemented as run_vaezi(). It activates when the ratio reaches eta_ON, holds the long-term mean frozen at its activation-time value while on, deactivates when the ratio falls below eta_OFF, and releases the frozen baseline only after MIN_OFF_RESET (eight) consecutive off weeks. This is the Constant Transmission Acceleration detector of the manuscript.
Run the six derivation methods (M1 to M6) and take the cross-method median
Baseline weeks are weeks with DC_QC below 30% of the annual peak. The six methods are M1 baseline-ratio asymmetric quantiles (90th and 50th percentiles); M2 high-sensitivity ROC operating points (95% and 99% sensitivity, via pROC); M3 ARL calibration to an average run length to false alarm of 12 weeks (secondary targets 6 and 26); M4 leave-one-year-out cross-validation under an asymmetric utility favouring early triggering; M5 negative-binomial theoretical bounds at plus or minus one standard deviation; M6 parametric bootstrap of the null STA/LTA ratio (90th and 25th percentiles, 2,000 replicates). Five methods return a reportable eta_ON and four a reportable eta_OFF (M3 returns no eta_OFF). The adopted pair is the raw median across methods, eta_ON = 1.33 and eta_OFF = 0.73, written to a sourceable file for the downstream stages.
Verify the effective-reproduction-number mapping
Confirm that the adopted activation threshold corresponds to an effective reproduction number just above the epidemic threshold, which is what licenses interpreting the seismological STA/LTA structure in dengue-specific units. On the rising limb the STA/LTA ratio reduces to a closed form in the weekly growth rate r; at small r it approximates exp(r times 13 weeks), giving r near ln(1.33)/13, about 0.022 per week. With a dengue serial interval of about two weeks, R_t is about exp(0.044), close to 1.05 at the activation point.

NOTE. The rankings are structurally insensitive to the activation threshold within the inter-method envelope (0.925 to 1.850), because the bootstrap dominance computation (Stage 3) depends on inter-detector ordering, not on absolute metric levels. Re-running the envelope endpoints leaves the cross-paradigm ranking unchanged.
Stage 2. Stage2_outbreak_threshold_drift_QC.R
Compute the rolling week-specific outbreak threshold
At each calendar week, the threshold is the mean plus two standard deviations of cases in the same ISO week across the rolling baseline. Reads sheet "QC Data".
Render the two panels and write the outputs
Panel (a) includes the pandemic years in the rolling baseline; panel (b) excludes 2020 and 2021, to show that excluding the disrupted seasons shifts the threshold numerically but does not resolve its structural lag. Outputs: fig1_dengue_thresholds_vertical.pdf and .png (600 dpi), with the underlying values in fig1_thresholds_with_pandemic.csv and fig1_thresholds_without_pandemic.csv, under Stage2_outbreak_threshold_drift_QC/.
Stage 3. Stage3_QC_analysis.R
Define the evaluable years
Drop the COVID-19 surveillance-disruption years (2020 and 2021) and the truncated 2025 year. This yields n = 10 evaluable Quezon City seasons (2013 to 2019, 2022 to 2024). Exclusions are applied uniformly across every detector.
Build the anticipatory metrics framework
Anchor A1 is the pre-peak Actionable Window, the weeks from peak minus 8 to peak minus 4. Anchor A2 is the Epidemic Burden block, the smallest contiguous block containing the peak whose cumulative cases reach at least 70% of the annual total. A trigger at week t is a true alarm if and only if t lies in A1 or in A2; a trigger with lead nine weeks or more (outside both anchors) is a false alarm. For evaluation consistency the aggregate tables use a single peak (the global annual maximum); the bimodal year 2024 is the documented exception and is rendered in Figure 3 with two peaks (week 34 and week 47), each with its own A1 and A2 anchors, but does not enter the aggregate tables under two peaks.
Run the eleven-detector bank
Transmission-acceleration paradigm (six): Constant Transmission Acceleration (the run_vaezi detector), Continuous Transmission Acceleration, Composite Outbreak Signal, Critical Transition Indicator, Incidence Gradient, and Hydrological Inflection Measure. Retrospective-threshold paradigm (three): Outbreak Threshold (mean + 2 SD), Alarm Threshold (mean + 1 SD), and walk-forward CUSUM. Surveillance-guideline percentile paradigm (two): WHO 90th Percentile Threshold and WHO 75th Percentile Threshold. All percentile- and standard-deviation-based detectors use a walk-forward construction so every metric is prospective. The disjunctive composite is not evaluated, because it is strictly dominated by Constant Transmission Acceleration alone.
Score the eight metrics in three categories
Burden and accuracy: TAM (True-Alarm Magnitude), N_True_Alarms, PPV, and Sensitivity, the last being A1-restricted (the proportion of years with at least one true alarm inside A1). Timeliness: Mean_Lead_Time, WP (Warning Persistence), and ALY (Actionable Lead-Time Yield). False alarms: N_False_Alarms, lower is better. Five of the eight (TAM, N_True_Alarms, Sensitivity, Mean_Lead_Time, WP) drive the head-to-head consensus; use the count of evaluable years as the common denominator and substitute zero, not a missing value, for non-firing years.
Run the head-to-head paired comparison and the sensitivity tables
Compare the three target detectors (Constant TA, Continuous TA, Outbreak Threshold) with paired Wilcoxon signed-rank tests at alpha = 0.05, reporting the year-cluster bootstrap confidence interval (B = 2,000 at the headline) as the primary effect-size statement and the Wilcoxon P value as a sensitivity diagnostic. Write the sensitivity tables S1 to S3 and S5 (Actionable Window width, Epidemic Burden cumulative-case fraction, and year-inclusion rules), confirming the cross-paradigm ranking is stable across the 60% to 80% burden envelope and the alternative window widths.
Stage 4. Stage4_Regional_analysis.R
Load and screen the regional panel
Read sheet "Regional Data", fold Region IV-B into MIMAROPA, apply the year exclusions, and retain a region for a given year if its annual peak reaches at least twelve cases. This yields 6 to 7 evaluable years per region.
Compute the bootstrap dominance probability per region
Within each year-cluster bootstrap replicate (B = 1,000, under the nested seed 20260101), rank the three target detectors on each of the five primary metrics, average the ranks to a composite mean rank, and record the replicate leader. Pr is the proportion of replicates in which a detector leads. Rank-then-average is used rather than standardise-then-rank because the metrics differ in scale.
Classify each region into a consensus tier
From the all-pairs head-to-head check, classify each region as strong consensus, partial consensus, lead only, or contested, applying strict cross-stratum Bonferroni (k = 51) as the primary correction. Treat the decisive zone Pr at least 0.75 as an interpretive convention. Write the six Figure 4 multipanels, Figure 5, and the regional CSVs.
Stage 5. Stage5_Country_analysis.R
Load the country panel and screen evaluable years
Read sheet "Country Data" and apply the same evaluable-year criteria, yielding 5 to 7 evaluable years per country. The country panel mixes native surveillance signals; no harmonisation is applied across case definitions, so each ranking reflects detector performance against that country's own signal.
Run dominance and consensus with the country-scale correction
Compute the bootstrap dominance probability and the consensus tiers exactly as at the regional scale (B = 1,000, nested seed 20260101), applying strict cross-stratum Bonferroni at k = 24 for the country panel.
Inspect the seasonality diagnostic
Confirm the seasonality precondition for each country. Colombia is the expected exception: year-round transmission weakens the annual rising limb, the outbreak threshold leads there at a non-decisive bootstrap probability of 0.66, and it is the diagnostic case for the precondition rather than a counterexample to the framework. Write the six Figure 6 multipanels, Figure 7, and the country CSVs.

NOTE. A clean end-to-end run reproduces every figure (Figure 1 through the Figure 6 multipanels and Figure 7), table, and CSV referenced in the manuscript and its supplement. Compare the emitted files against the inventory in README.md; the headline benchmark values in the next section are the quickest check.
Expected results
A clean run reproduces the headline benchmark values below. Use them as a checkpoint: a deviation signals a configuration or data-placement error, not a different finding.

CheckpointConstant transmission accelerationOutbreak threshold
City True-Alarm Magnitude (mean cases/season)4,2571,119
City sensitivity100%30%
City mean lead time (weeks, within 4 to 8)6.91.6
City warning persistence (weeks)5.51.6
City true alarms per year21.93.5
Regions led (of 17, consensus tier)10 strong/partial0
Countries led (of 8)41 (Colombia, Pr 0.66)

At the regional scale, constant and continuous transmission acceleration lead in fifteen of seventeen regions; nine of the fifteen consensus-winning regions reach the decisive zone Pr ≥ 0.75 (twelve at Pr ≥ 0.70, seven at Pr ≥ 0.80). At the country scale, the ordering holds in seven of eight countries; Colombia is the single setting where the outbreak threshold leads, at a non-decisive Pr of 0.66, and is the diagnostic case for the seasonality precondition.
Troubleshooting
Symptom: Constant TA never activates in a stratum
Likely cause and resolution: The long-term window plus guard band exceeds the available pre-season history. Confirm at least w_long + guard (28) weeks precede the first evaluable peak; otherwise the detector is correctly silent until enough history accumulates.
Symptom: A detector saturates sensitivity at the city scale
Likely cause and resolution: Expected. The four-week actionable window plus high trigger volume can saturate the sensitivity metric, so sensitivity in isolation is a coarse summary at the city scale. Read it alongside burden, timeliness, and the false-alarm rate.
Symptom: Country-scale ranking unstable across runs
Likely cause and resolution: Either the bootstrap replicate count is too low (confirm B_other = 1,000) or the native surveillance signal is reporting-unstable across the evaluable window. Check for documented surveillance-system changes in that country before interpreting.
Symptom: Outbreak threshold ranks first in a stratum
Likely cause and resolution: Check the seasonality precondition. Year-round or multi-modal transmission weakens the rising limb the acceleration detectors depend on. Treat the stratum as an aseasonal boundary case and re-tune or replace the detector against the local series.
Symptom: Bimodal season flagged as a false alarm in the secondary peak
Likely cause and resolution: Expected under the single-peak rule, which anchors on the global annual maximum. Per-peak actionable-window anchoring is a documented methodological extension, not part of the primary specification.
Symptom: Headline city CI differs from the manuscript
Likely cause and resolution: Confirm B_head = 2,000 with bias-corrected accelerated intervals for the headline summary only, and B_other = 1,000 with percentile intervals everywhere else. The asymmetric replicate count is intentional.
Protocol references
1. Allen, R. V. Automatic earthquake recognition and timing from single traces. Bull. Seismol. Soc. Am. 68, 1521-1532 (1978).

2. Vaezi, Y. & Van der Baan, M. Comparison of the STA/LTA and power spectral density methods for microseismic event detection. Geophys. J. Int. 203, 1896-1908 (2015).

3. Cori, A., Ferguson, N. M., Fraser, C. & Cauchemez, S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am. J. Epidemiol. 178, 1505-1512 (2013).

4. Aldstadt, J. et al. Space-time analysis of hospitalised dengue patients in rural Thailand reveals fine-scale transmission patterns. Trop. Med. Int. Health 17, 1076-1085 (2012).

5. World Health Organization. Handbook for integrated vector management (World Health Organization, 2012).

6. Hii, Y. L. et al. Optimal lead time for dengue forecast. PLoS Negl. Trop. Dis. 6, e1848 (2012).

7. Buckeridge, D. L. Outbreak detection through automated surveillance: a review of the determinants of detection. J. Biomed. Inform. 40, 370-379 (2007).

8. Unkel, S., Farrington, C. P., Garthwaite, P. H., Robertson, C. & Andrews, N. Statistical methods for the prospective detection of infectious disease outbreaks: a review. J. R. Stat. Soc. A 175, 49-82 (2012).

9. Altman, D. G. & Bland, J. M. Diagnostic tests 1: sensitivity and specificity. BMJ 308, 1552 (1994).

10. Bastos, L. S. et al. A modelling approach for correcting reporting delays in disease surveillance data. Stat. Med. 38, 4363-4377 (2019).

11. Hemming, K., Haines, T. P., Chilton, P. J., Girling, A. J. & Lilford, R. J. The stepped wedge cluster randomised trial: rationale, design, analysis, and reporting. BMJ 350, h391 (2015).
Acknowledgements
We thank the Quezon City Epidemiology and Surveillance Division for the city-scale dengue surveillance data, and the Humanitarian Data Exchange for the regional Philippine dengue dataset.