License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working. We use this protocol and it's working.
Created: August 22, 2025
Last Modified: August 22, 2025
Protocol Integer ID: 225246
Keywords: dynamont this protocol, dynamont, dynamont manuscript, zenodo, dataset
Funders Acknowledgements:
TMWWDG, Grant ID: FKZ5575/10-9
DFG EXC 2051, Grant ID: Project‐ID 390713860
BMBF, Grant ID: 01GR2305B.TP7
European Research Council, Grant ID: CoG‐101088027
Benchmark Preparation Commands
1 # Extract 10,000 random reads
pod5 view <dataset.pod5> --ids --no-header -o all_ids.txt
sort --random-sort all_ids.txt | head --lines 10000 > 10k_ids.txt
pod5 filter <dataset.pod5> -o <dataset_r10k.pod5> --ids 10k_ids.txt
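Because `sort --random-sort` draws a fresh random order on every run, the 10,000-read subset from step 1 cannot be regenerated exactly. A minimal sketch, assuming bash, GNU coreutils and OpenSSL, of a seeded alternative: `shuf --random-source` reads a deterministic byte stream produced with the recipe from the GNU coreutils manual. The helper `sample_ids` and the seed value are hypothetical and not part of the original benchmark.

```shell
#!/usr/bin/env bash
# Deterministic byte stream derived from a seed (recipe from the GNU coreutils
# manual, "Sources of random data"); the same seed yields the same stream.
get_seeded_random() {
  local seed="$1"
  openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null
}

# Hypothetical helper: draw N lines (read IDs) from a file, reproducibly.
sample_ids() {
  local ids_file="$1" n="$2" seed="$3"
  shuf -n "$n" --random-source=<(get_seeded_random "$seed") "$ids_file"
}

# Usage in step 1 (42 is an arbitrary example seed):
# sample_ids all_ids.txt 10000 42 > 10k_ids.txt
```

Running `sample_ids` twice with the same ID list and seed returns the same subset, which makes the benchmark input reproducible across machines with the same tool versions.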
2 # Convert To Other Data Formats
blue-crab p2s <dataset_r10k.pod5> -o <dataset_r10k.blow5>
pod5 convert to_fast5 <dataset_r10k.pod5> -o fast5/
multi_to_single_fast5 -i fast5/ -s single_fast5/
3 # Basecalling
## for RNA002 data, rna002_70bps_hac@v3 is given explicitly instead of sup
dorado basecaller sup -x cuda:0 <dataset_r10k.pod5> > <dataset_r10k.bam>
samtools bam2fq <dataset_r10k.bam> > <dataset_r10k.fastq>
dorado summary <dataset_r10k.bam> > sequencing_summary.txt
4 # Convert the sequencing summary to Tombo format (single-read FAST5 files named <read_id>.fast5)
awk -F'\t' 'NR == 1 {print; next} {$1 = $2 ".fast5"; print}' OFS='\t' sequencing_summary.txt > tombo_sequencing_summary.txt
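The awk call in step 4 keeps the header line and rewrites the first column (filename) of every record to `<read_id>.fast5`, on the assumption that the single-read FAST5 files are named after their read IDs. The toy run below applies the same program to a two-line summary; column names other than `filename` and `read_id` are made up for illustration.

```shell
#!/usr/bin/env bash
# Toy summary: header plus one record (the duration column is hypothetical).
printf 'filename\tread_id\tduration\nbatch0.pod5\tread-abc\t1.5\n' > toy_summary.txt

# Same awk program as step 4: print the header unchanged, set column 1 to "<read_id>.fast5".
awk -F'\t' 'NR == 1 {print; next} {$1 = $2 ".fast5"; print}' OFS='\t' toy_summary.txt > toy_tombo_summary.txt

cat toy_tombo_summary.txt
# filename	read_id	duration
# read-abc.fast5	read-abc	1.5
```

Assigning to `$1` makes awk rebuild the record with `OFS`, which is why `OFS='\t'` is needed to keep the output tab-separated.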
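5 # Map reads to the reference
## preset = splice if RNA and h_sapiens, s_cerevisiae, e_coli, sarscov2
## preset = lr:hq for DNA R10.4.1
## preset = map-ont otherwise
minimap2 -ax <preset> <ref.fa> <dataset_r10k.fastq> | samtools view -hbF4 | samtools sort > <dataset_r10k_mapping.bam>
samtools index <dataset_r10k_mapping.bam>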
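The minimap2 preset depends on molecule type, organism and chemistry. A minimal sketch, assuming bash, of a hypothetical helper `choose_preset` that encodes these rules: splice for RNA from the listed organisms, lr:hq for DNA R10.4.1, and map-ont otherwise (the fallback rule listed in step 14).

```shell
#!/usr/bin/env bash
# Hypothetical helper encoding the preset rules of this protocol.
# $1: molecule (RNA or DNA); $2: organism; $3: chemistry (e.g. R10.4.1).
choose_preset() {
  local molecule="$1" organism="$2" chemistry="$3"
  if [ "$molecule" = "RNA" ]; then
    case "$organism" in
      h_sapiens|s_cerevisiae|e_coli|sarscov2) echo "splice"; return ;;
    esac
  fi
  if [ "$molecule" = "DNA" ] && [ "$chemistry" = "R10.4.1" ]; then
    echo "lr:hq"
    return
  fi
  echo "map-ont"
}

# Usage (placeholders as in step 5):
# preset=$(choose_preset RNA h_sapiens RNA002)
# minimap2 -ax "$preset" <ref.fa> <dataset_r10k.fastq> | ...
```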
Dynamont Segmentation Commands
6 # a pore model can be specified explicitly; otherwise the default pore model is used
python segment.py --raw <path/to/pod5/dataset_r10k/> --basecalls <dataset_r10k.bam> --mode basic -o <dynamont.csv> --pore <pore>
Dorado Segmentation Commands
7 # Basecalling with emit moves
## for RNA002 data, rna002_70bps_hac@v3 is given explicitly instead of sup
dorado basecaller sup -x cuda:0 --emit-moves <dataset_r10k.pod5> > <dataset_r10k_moves.bam>
8 # Extracting moves as segmentation borders
python extractDoradoMoves.py <dataset_r10k_moves.bam> -o <dataset_r10k_moves.tsv>
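Dorado stores the move table in the `mv:B:c` tag, whose first value is the stride (signal samples per move entry), and the number of samples trimmed from the start of the signal in the `ts:i` tag. A minimal sketch, assuming that tag layout and SAM text from `samtools view`, of how base start positions in the raw signal can be recovered: every move entry equal to 1 at index i marks a base starting at ts + i × stride. The function `move_borders` is hypothetical and is not the implementation of extractDoradoMoves.py; RNA-specific signal orientation is ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each SAM record on stdin, print the read ID and the
# comma-separated signal indices at which a new base starts.
move_borders() {
  awk -F'\t' '{
    trim = 0; moves = ""; out = ""
    # Optional SAM tags start at column 12.
    for (i = 12; i <= NF; i++) {
      if ($i ~ /^mv:B:c,/) moves = substr($i, 8)        # drop "mv:B:c,"
      else if ($i ~ /^ts:i:/) trim = substr($i, 6) + 0  # drop "ts:i:"
    }
    n = split(moves, mv, ",")
    stride = mv[1] + 0                                  # first value is the stride
    for (j = 2; j <= n; j++) {
      if (mv[j] == 1) {
        pos = trim + (j - 2) * stride                   # move index j-2 -> signal index
        out = (out == "" ? pos : out "," pos)
      }
    }
    print $1 "\t" out
  }'
}

# Usage: samtools view <dataset_r10k_moves.bam> | move_borders
```

For a record with `mv:B:c,5,1,0,1,1,0` and `ts:i:10`, the moves at indices 0, 2 and 3 yield base starts at samples 10, 20 and 25.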
f5c Segmentation Commands
9 f5c index --slow5 <dataset_r10k.blow5> <dataset_r10k.fastq>
10 # Eventalign
## add --rna for RNA data
f5c eventalign -b <dataset_r10k_mapping.bam> -g <ref.fa> -r <dataset_r10k.fastq> --slow5 <dataset_r10k.blow5> --signal-index --collapse-events --pore <pore> --min-mapq 0 --summary <dataset_r10k_event.sum> > <dataset_r10k_event.tsv>
11 # Resquiggle
## add --rna for RNA data
f5c resquiggle --pore <pore> <dataset_r10k.fastq> <dataset_r10k.blow5> > <dataset_r10k_resqu.tsv>
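The f5c commands for DNA and RNA differ only by the --rna flag. A minimal sketch, assuming bash, of a hypothetical wrapper `run_f5c` that adds the flag right after the subcommand for RNA data; `F5C` is a hypothetical variable naming the binary (default `f5c`), which also allows a dry run with `echo`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run an f5c subcommand, adding --rna for RNA data.
# $1: molecule (RNA or DNA); $2: f5c subcommand; remaining args are passed through.
run_f5c() {
  local molecule="$1" subcommand="$2"
  shift 2
  local -a extra=()
  if [ "$molecule" = "RNA" ]; then
    extra=(--rna)
  fi
  # F5C (hypothetical) selects the binary; defaults to f5c on PATH.
  "${F5C:-f5c}" "$subcommand" "${extra[@]}" "$@"
}

# Dry run, printing the arguments instead of calling f5c:
# F5C=echo run_f5c RNA resquiggle --pore <pore> <dataset_r10k.fastq> <dataset_r10k.blow5>
```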
Tombo Segmentation Commands
12 tombo preprocess annotate_raw_with_fastqs --fast5-basedir single_fast5/ --fastq-filenames <dataset_r10k.fastq> --sequencing-summary-filenames tombo_sequencing_summary.txt
13 # only executed on RNA002
tombo resquiggle --q-score 0 --rna single_fast5/ <ref.fa>
Uncalled4 Segmentation Commands
14 # preset = splice if RNA and h_sapiens, s_cerevisiae, e_coli, sarscov2
# preset = lr:hq for DNA R10.4.1
# preset = map-ont else
dorado basecaller sup -x cuda:0 --reference <ref.fa> --mm2-opts "-x <preset> --secondary=no" --emit-moves <dataset_r10k.pod5> > <mapped_basecalls.bam>
samtools view -hbF 2304 <mapped_basecalls.bam> > <primary_mapped_basecalls.bam>
uncalled4 align --ref <ref.fa> --reads <dataset_r10k.pod5> --bam-in <primary_mapped_basecalls.bam> --tsv-out <uncalled4_segmentation.tsv> --tsv-cols aln.read_id,dtw --min-aln-length 1
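The -F options filter on SAM FLAG bits: -F4 in step 5 drops unmapped records (0x4), while -F 2304 in step 14 drops secondary (0x100 = 256) and supplementary (0x800 = 2048) alignments, so primary alignments and any unmapped records remain. The bash arithmetic below reproduces the mask; `kept_by_mask` is a hypothetical helper, and `samtools flags 2304` decodes the same value.

```shell
#!/usr/bin/env bash
# SAM FLAG bits used by the filters in this protocol (SAM specification).
UNMAPPED=4            # 0x4, removed by -F4 in step 5
SECONDARY=256         # 0x100
SUPPLEMENTARY=2048    # 0x800

# Exclusion mask for step 14: drop secondary and supplementary records.
PRIMARY_ONLY_MASK=$(( SECONDARY | SUPPLEMENTARY ))
echo "$PRIMARY_ONLY_MASK"   # 2304

# Succeeds if a record with FLAG $1 is kept by `samtools view -F $2`.
kept_by_mask() {
  (( ($1 & $2) == 0 ))
}

kept_by_mask 16 "$PRIMARY_ONLY_MASK" && echo "flag 16 (primary, reverse strand): kept"
kept_by_mask 272 "$PRIMARY_ONLY_MASK" || echo "flag 272 (secondary, reverse strand): dropped"
```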