BCRseq Analysis using Dandelion

Oliver Skinner; Saba Asad

Apr 17, 2026

DOI: https://dx.doi.org/10.17504/protocols.io.eq2lyorwwgx9/v1

Oliver Skinner¹,
Saba Asad¹

¹Peter Doherty Institute for Infection and Immunity, Department of Microbiology and Immunology, University of Melbourne, Parkville, Victoria, Australia

Protocol Citation: Oliver Skinner, Saba Asad 2026. BCRseq Analysis using Dandelion. protocols.io https://dx.doi.org/10.17504/protocols.io.eq2lyorwwgx9/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: April 17, 2026

Last Modified: April 17, 2026

Protocol Integer ID: 315247

Keywords: specific antibodies influences immunity to malaria, diversity in the antigenic target, plasmodium infection, malaria, bcrseq data, specific antibodies influences immunity, analysing bcrseq data, bcrseq analysis, immunoglobulin isotype switching, phenotypic variation, cognate antigen, antigenic target, clonal expansion, using dandelion

Abstract

Naive B cells amplify and diversify their responses when activated by cognate antigen, via clonal expansion, immunoglobulin isotype switching, phenotypic variation, and somatic hypermutation (SHM). Since diversity in the antigenic targets, functional classes, and production kinetics of parasite-specific antibodies influences immunity to malaria, we test here whether individual B cell clones diversify over time in vivo during Plasmodium infection and treatment. Here we share our step-by-step protocol for analysing BCRseq data using Dandelion.

Guidelines

**Postprocessing**

In addition to the pre-processing steps at the contig level, postprocessing (integrating cell-level quality control) is performed using Dandelion’s ‘check_contigs’ function. The function checks whether a rearrangement is annotated with consistent V, D, J and C gene calls and performs special operations when a cell has multiple contigs. All contigs in a cell are sorted by unique molecular identifier (UMI) count in descending order, with productive contigs ranked above non-productive contigs. For cells with anything other than one pair of productive contigs (one VDJ and one VJ), the function assesses whether the cell should be flagged as orphan (no paired VDJ or VJ chain), extra pair(s) or ambiguous (biologically irreconcilable, for example, both TCR and BCR contigs in the same cell), with the exception that IgM and IgD are allowed to coexist in the same B cell if no other isotypes are detected. The function also enforces a library type restriction, on the rationale that the primers used for a given library type should amplify only sequences from the relevant locus. Contigs annotated to unexpected loci therefore likely represent artifacts and are filtered out.

Here we use a more stringent alternative to ‘check_contigs’, the separate function ‘filter_contigs’, which only considers productive VDJ contigs, asserts that a single cell should have only one VDJ and one VJ pair, or only an orphan VDJ chain, and explicitly removes contigs that fail these checks (with the same exception for IgM/IgD as above).

**Clonotype definition and diversity**

BCRs were grouped into clones/clonotypes using a sequential, rule-based procedure applied to the productive rearrangements in the Dandelion/Change-O outputs. Briefly, after cell-level QC (described above), each cell contributed at most one dominant productive VDJ (heavy) and one productive VJ (light; IGK/IGL) contig (selected by UMI count), and clonotypes were assigned as follows:

1. Chain-aware grouping (heavy and light): Heavy- and light-chain repertoires were clonotyped separately using the criteria below, and paired clonotypes were defined using the combined heavy+light signature when both chains were available.

2. Shared V/J gene usage: Sequences were first required to have identical V-gene and J-gene calls (gene-level calls, not allele-level unless otherwise stated) for the relevant chain.

3. Matched junction length: Within each V/J bin, sequences were required to have identical CDR3 (junction) amino-acid length, ensuring a like-for-like comparison across aligned CDR3s.

4. CDR3 similarity threshold: Sequences passing (1–3) were then clustered by CDR3 amino-acid sequence similarity using a Hamming-distance criterion (equal-length CDR3s), with clonotype membership requiring ≥85% amino-acid identity.

5. Handling incomplete receptors: Cells lacking an unambiguous productive chain (e.g., missing light chain or orphan heavy chain) were clonotyped using the available productive chain only, and were excluded from analyses requiring paired heavy+light definition.
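
The grouping rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not Dandelion's implementation: the record fields (`v_gene`, `j_gene`, `cdr3_aa`), the example gene names and sequences, and the choice of single-linkage merging within each bin are all assumptions made for the example.

```python
from collections import defaultdict


def identity(a, b):
    """Fraction of matching positions between two equal-length CDR3s."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a) if a else 0.0


def assign_clones(seqs, threshold=0.85):
    """Return one clone id per record in `seqs` (same order as the input)."""
    # Rules 2 and 3: bin by V gene, J gene and CDR3 amino-acid length.
    bins = defaultdict(list)
    for idx, s in enumerate(seqs):
        bins[(s["v_gene"], s["j_gene"], len(s["cdr3_aa"]))].append(idx)

    clone_of = {}
    next_id = 0
    for members in bins.values():
        # Rule 4: link any two sequences in the bin with identity >= threshold
        # (single linkage via a small union-find; an assumption of this sketch).
        parent = {i: i for i in members}

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for n, i in enumerate(members):
            for j in members[n + 1:]:
                if identity(seqs[i]["cdr3_aa"], seqs[j]["cdr3_aa"]) >= threshold:
                    parent[find(i)] = find(j)

        roots = {}
        for i in members:
            root = find(i)
            if root not in roots:
                roots[root] = next_id
                next_id += 1
            clone_of[i] = roots[root]
    return [clone_of[i] for i in range(len(seqs))]


# Hypothetical heavy-chain records (gene names and CDR3s are made up).
example = [
    {"v_gene": "IGHV1-2", "j_gene": "IGHJ2", "cdr3_aa": "CARDYYGSSYWYFDVWGAGT"},
    {"v_gene": "IGHV1-2", "j_gene": "IGHJ2", "cdr3_aa": "CAREFYGSSYWYFDVWGAGT"},  # 2/20 mismatches
    {"v_gene": "IGHV1-2", "j_gene": "IGHJ2", "cdr3_aa": "CARDYYGSSYAAAAVWGAGT"},  # 4/20 mismatches
    {"v_gene": "IGHV3-1", "j_gene": "IGHJ2", "cdr3_aa": "CARDYYGSSYWYFDVWGAGT"},  # different V gene
]
clones = assign_clones(example)
```

In this toy input the first two records share a clone id, while the third (80% identity) and the fourth (different V gene) are assigned to separate clones. Applying the same function to light-chain records, and combining the two ids per cell, would give a paired heavy+light signature as in rule 1.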

Diversity was quantified from the resulting clonotype frequency tables (counts of cells per clonotype within each sample/timepoint/condition). Clonal expansion was summarised by the distribution of clone sizes (e.g., unexpanded singletons versus expanded clones), and repertoire diversity was computed using standard clonotype-based diversity metrics (e.g., Shannon entropy and/or Simpson diversity) on depth-normalised clonotype tables to enable fair comparisons across samples.
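
As a rough illustration of the metrics named above, the sketch below computes Shannon entropy and Simpson diversity from clone sizes. The function names are hypothetical, the Simpson index is taken here as the Gini-Simpson form (1 minus the sum of squared frequencies), and random subsampling of cells to a common depth is only one possible way to depth-normalise; the protocol does not specify which was used.

```python
import math
import random
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a list of clone sizes."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_diversity(counts):
    """Gini-Simpson index: 1 - sum(p_i^2) over clone frequencies."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def downsample(clone_labels, depth, seed=0):
    """Draw `depth` cells without replacement and recount clone sizes.

    `clone_labels` holds one clone id per cell; `depth` must not exceed
    the number of cells. Subsampling is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return Counter(rng.sample(list(clone_labels), depth))


# Hypothetical sample: one expanded clone of 4 cells plus 3 singletons.
labels = ["c1"] * 4 + ["c2", "c3", "c4"]
sizes = list(downsample(labels, depth=5).values())
h, d = shannon_entropy(sizes), simpson_diversity(sizes)
```

Downsampling every sample to the same `depth` before computing the metrics keeps samples with more recovered cells from appearing more diverse simply because more clones were observed.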

Pre-processing

Adjust cell and contig barcodes by adding user-supplied suffixes and/or prefixes to ensure that there are no overlapping barcodes between samples.
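
A minimal sketch of the barcode adjustment, assuming 10x-style contig identifiers of the form `<cell barcode>_contig_<n>`. The helper names and the `_` separator are illustrative and are not Dandelion's API.

```python
def tag_barcode(barcode, prefix="", suffix="", sep="_"):
    """Add an optional sample prefix and/or suffix to a cell barcode."""
    parts = [p for p in (prefix, barcode, suffix) if p]
    return sep.join(parts)


def tag_contig_id(contig_id, prefix="", suffix="", sep="_"):
    """Apply the same tag to a contig id, keeping the '_contig_<n>' tail intact."""
    cell, marker, rest = contig_id.partition("_contig")
    return tag_barcode(cell, prefix, suffix, sep) + marker + rest


# Hypothetical sample label "S1" applied to one cell and one of its contigs.
cell = tag_barcode("AAACCTGAGAAACCAT-1", prefix="S1")
contig = tag_contig_id("AAACCTGAGAAACCAT-1_contig_1", prefix="S1")
```

Tagging the cell barcode and the contig identifier with the same label keeps the two tables joinable after samples are merged.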

Re-annotation of contigs with igblastn (v1.19.0) against IMGT (international ImMunoGeneTics) reference sequences.

Re-annotation of D and J genes separately using blastn with parameters similar to those of igblastn (dust = ‘no’; word size: J = 7, D = 9) but with an additional e-value cutoff (J = 10⁻⁴, in contrast to igblastn’s default cutoff of 10; D = 10⁻³). This enables the annotation of contigs without the V gene present.
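
For orientation, the parameters above map onto standard blastn command-line options roughly as follows. This only assembles an argument list; the file and database names are placeholders, the tabular output format is a choice made for this sketch, and the exact command that Dandelion runs internally may differ.

```python
# Word size and e-value cutoff per gene segment, as stated in the step above.
BLAST_SETTINGS = {
    "J": {"word_size": 7, "evalue": 1e-4},
    "D": {"word_size": 9, "evalue": 1e-3},
}


def blastn_command(query_fasta, database, segment):
    """Build a blastn argument list for re-annotating D or J genes."""
    settings = BLAST_SETTINGS[segment]
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-dust", "no",
        "-word_size", str(settings["word_size"]),
        "-evalue", str(settings["evalue"]),
        "-outfmt", "6",  # tabular output; an assumption of this sketch
    ]


# Placeholder paths; pass the list to subprocess.run(...) to execute it.
j_cmd = blastn_command("all_contig.fasta", "imgt_j_db", "J")
```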

Identification and recovery of nonoverlapping individual J gene segments (under associated ‘j_chain_multimapper’ columns). In the list of all mapped J genes (all_contig_j_blast.tsv) from blastn, the J gene with the highest score (j_support) is chosen. Dandelion then looks for the J gene with the next-highest ‘j_support’ value whose start (j_sequence_start) and end (j_sequence_end) positions do not overlap with the selected J gene, and does so iteratively until the list of all mapped J genes is exhausted. In contigs without V gene annotations, it then selects the leftmost (most 5’) J gene and updates the ‘j_call’ column in the final AIRR table. For contigs with V gene annotations but multiple J gene calls, it uses the annotations provided by igblastn.
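
The iterative selection described in this step amounts to a greedy pass over the hits for one contig. The sketch below is an illustration under stated assumptions, not Dandelion's code: each hit is a plain dictionary keyed by the column names from the text, a candidate is compared against every hit already selected, and a larger `j_support` is treated as better, as the step states (if `j_support` holds e-values in your table, the sort order would need to be reversed).

```python
def select_nonoverlapping_j(hits):
    """Greedily keep the best-supported J hits that do not overlap each other."""
    selected = []
    for hit in sorted(hits, key=lambda h: h["j_support"], reverse=True):
        overlaps = any(
            hit["j_sequence_start"] <= kept["j_sequence_end"]
            and hit["j_sequence_end"] >= kept["j_sequence_start"]
            for kept in selected
        )
        if not overlaps:
            selected.append(hit)
    return selected


def leftmost_j_call(selected):
    """For contigs without a V gene: take the most 5' of the selected J hits."""
    return min(selected, key=lambda h: h["j_sequence_start"])["j_call"]


# Hypothetical blastn hits for a single contig (values are made up).
hits = [
    {"j_call": "IGHJ4", "j_support": 50, "j_sequence_start": 100, "j_sequence_end": 150},
    {"j_call": "IGHJ3", "j_support": 40, "j_sequence_start": 120, "j_sequence_end": 160},
    {"j_call": "IGHJ1", "j_support": 30, "j_sequence_start": 10, "j_sequence_end": 60},
]
kept = select_nonoverlapping_j(hits)
j_call = leftmost_j_call(kept)
```

In this toy input IGHJ3 is dropped because it overlaps the better-supported IGHJ4 hit, and IGHJ1 becomes the `j_call` because it lies furthest to the 5’ end.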

Additional re-annotation of heavy-chain constant (C) region calls using blastn (v2.13.0) against curated sequences from the CH1 regions of the respective isotype classes.

Heavy chain V gene allele correction using TIgGER (v1.0.0). The final outputs are then parsed into AIRR format with Change-O scripts.

The rearrangement sequences that pass standard quality control checks are saved in a file ending with the suffix ‘_contig_dandelion.tsv’.

Postprocessing

Perform cell-level quality control using Dandelion’s ‘check_contigs’ function to ensure consistent V, D, J, and C gene calls. Sort contigs by UMI count and prioritize productive contigs.
Flag cells with orphan, extra pair(s), or ambiguous status, allowing IgM and IgD to coexist if no other isotypes are detected.
Apply library type restriction to filter out contigs annotated to unexpected loci.
Use ‘filter_contigs’ to ensure a single cell has only one VDJ and one VJ pair, or only an orphan VDJ chain, removing contigs that fail these checks.
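
The ordering and flagging logic in these steps can be pictured with a simplified sketch. It is not a reimplementation of ‘check_contigs’ or ‘filter_contigs’: the field names (`locus`, `productive`, `umi_count`) and status labels are illustrative, only BCR loci are considered, and the IgM/IgD exception and the ambiguous status are deliberately left out.

```python
VDJ_LOCI = {"IGH"}
VJ_LOCI = {"IGK", "IGL"}


def rank_contigs(contigs):
    """Productive contigs first, then by UMI count in descending order."""
    return sorted(contigs, key=lambda c: (not c["productive"], -c["umi_count"]))


def chain_status(contigs):
    """Simplified status of one cell based on its productive contigs only."""
    productive = [c for c in contigs if c["productive"]]
    n_vdj = sum(c["locus"] in VDJ_LOCI for c in productive)
    n_vj = sum(c["locus"] in VJ_LOCI for c in productive)
    if n_vdj == 1 and n_vj == 1:
        return "single pair"
    if n_vdj > 1 or n_vj > 1:
        return "extra pair"
    if n_vdj == 1:
        return "orphan VDJ"
    if n_vj == 1:
        return "orphan VJ"
    return "no productive contig"


# Hypothetical contigs from one cell (values are made up).
cell_contigs = [
    {"locus": "IGK", "productive": True, "umi_count": 5},
    {"locus": "IGH", "productive": False, "umi_count": 50},
    {"locus": "IGH", "productive": True, "umi_count": 12},
]
ranked = rank_contigs(cell_contigs)
status = chain_status(cell_contigs)
```

Here the non-productive heavy chain is ranked last despite its higher UMI count, and the cell is reported as a single productive heavy/light pair.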

Clonotype definition and diversity

Group BCRs into clones/clonotypes using a rule-based procedure applied to productive rearrangements.
After cell-level QC, each cell contributes at most one dominant productive VDJ and one productive VJ contig, selected by UMI count.
Chain-aware grouping: Clonotype heavy- and light-chain repertoires separately, defining paired clonotypes using combined heavy+light signature.
Shared V/J gene usage: Require identical V-gene and J-gene calls for the relevant chain.
Matched junction length: Require identical CDR3 amino-acid length within each V/J bin.
CDR3 similarity threshold: Cluster sequences by CDR3 amino-acid sequence similarity using a Hamming-distance criterion, requiring ≥85% amino-acid identity.
Handling incomplete receptors: Clonotype cells lacking an unambiguous productive chain using the available productive chain only, excluding them from analyses requiring paired heavy+light definition.
Quantify diversity from the resulting clonotype frequency tables (counts of cells per clonotype within each sample/timepoint/condition). Summarise clonal expansion by the distribution of clone sizes (e.g., unexpanded singletons versus expanded clones), and compute repertoire diversity using standard clonotype-based diversity metrics (e.g., Shannon entropy and/or Simpson diversity) on depth-normalised clonotype tables to enable fair comparisons across samples.

Somatic hypermutation analysis

The basic mutation load is quantified using ‘pp.quantify_mutations’ in Dandelion, which is a wrapper of SHazaM’s basic mutational analysis. It sums all mutation scores (heavy and light chains, silent and replacement mutations) for the same cell.
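
The per-cell summation can be illustrated as below. This sketch does not call Dandelion or SHazaM; it assumes per-contig replacement and silent mutation counts are already available, and the field names (`cell_id`, `replacement`, `silent`) are hypothetical.

```python
from collections import defaultdict


def mutation_load_per_cell(contigs):
    """Sum replacement and silent mutation counts over all contigs of each cell."""
    totals = defaultdict(int)
    for contig in contigs:
        totals[contig["cell_id"]] += contig["replacement"] + contig["silent"]
    return dict(totals)


# Hypothetical per-contig counts for two cells (values are made up).
contigs = [
    {"cell_id": "cellA", "locus": "IGH", "replacement": 6, "silent": 2},
    {"cell_id": "cellA", "locus": "IGK", "replacement": 1, "silent": 1},
    {"cell_id": "cellB", "locus": "IGH", "replacement": 0, "silent": 0},
]
load = mutation_load_per_cell(contigs)
```

With this input, `cellA` receives a load of 10 (heavy plus light chain) and `cellB` a load of 0, matching the description of one combined score per cell.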