In addition to the contig-level pre-processing steps, post-processing that integrates cell-level quality control is performed using Dandelion’s ‘check_contigs’ function. The function checks whether a rearrangement is annotated with consistent V, D, J and C gene calls and performs special operations when a cell has multiple contigs. All contigs in a cell are sorted by unique molecular identifier (UMI) count in descending order, and productive contigs are ranked above non-productive contigs. For cells with anything other than one pair of productive contigs (one VDJ and one VJ), the function assesses whether the cell should be flagged as orphan (no paired VDJ or VJ chain), extra pair(s) or ambiguous (biologically irreconcilable, for example, both BCRs in the same cell), with the exception that IgM and IgD are allowed to coexist in the same B cell if no other isotypes are detected. The function also enforces a library-type restriction, on the rationale that the primers used for a given library type should amplify only sequences from the relevant locus. Contigs annotated to unexpected loci therefore likely represent artifacts and are filtered out. A more stringent version of ‘check_contigs’, implemented in the separate function ‘filter_contigs’, is used here: it considers only productive VDJ contigs, requires that a single cell has either one VDJ and one VJ pair or only an orphan VDJ chain, and explicitly removes contigs that fail these checks (with the same IgM/IgD exception as above).
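The ranking and flagging logic described above can be illustrated with a minimal sketch. This is not Dandelion's implementation: the function names (`restrict_to_library`, `rank_contigs`, `chain_status`), the `EXPECTED_LOCI` map, the status labels and the loosely AIRR-style field names (`locus`, `productive`, `umi_count`, `c_call`) are all hypothetical. The sketch also assumes that productivity takes precedence over UMI count when ranking, and it does not model the ambiguous status.

```python
# Hypothetical map from library type to the loci its primers should amplify.
EXPECTED_LOCI = {"ig": {"IGH", "IGK", "IGL"}}
VDJ_LOCI = {"IGH"}         # heavy chain
VJ_LOCI = {"IGK", "IGL"}   # light chains


def restrict_to_library(contigs, library_type="ig"):
    """Drop contigs annotated to loci not expected for the library type."""
    allowed = EXPECTED_LOCI[library_type]
    return [c for c in contigs if c["locus"] in allowed]


def rank_contigs(contigs):
    """Order contigs: productive first, then by descending UMI count.

    Assumption: productivity outranks UMI count.
    """
    return sorted(contigs, key=lambda c: (not c["productive"], -c["umi_count"]))


def chain_status(ranked_contigs):
    """Return a coarse status label for one cell (simplified, illustrative)."""
    productive = [c for c in ranked_contigs if c["productive"]]
    vdj = [c for c in productive if c["locus"] in VDJ_LOCI]
    vj = [c for c in productive if c["locus"] in VJ_LOCI]

    # IgM and IgD may coexist if no other isotype is detected:
    # treat the pair as a single VDJ chain for counting purposes.
    if len(vdj) == 2 and {c["c_call"] for c in vdj} == {"IGHM", "IGHD"}:
        vdj = vdj[:1]

    if len(vdj) == 1 and len(vj) == 1:
        return "single pair"
    if len(vdj) > 1 or len(vj) > 1:
        return "extra"
    if len(vdj) == 1 or len(vj) == 1:
        return "orphan"
    return "no productive contig"
```

A cell's contigs would be passed through `restrict_to_library`, then `rank_contigs`, and finally `chain_status`; a stricter filter in the spirit of ‘filter_contigs’ would then discard contigs from cells whose status is neither a single pair nor an orphan VDJ chain.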
**Clonotype definition and diversity**
BCRs were grouped into clones/clonotypes using a sequential, rule-based procedure applied to the productive rearrangements in the Dandelion/Change-O outputs. Briefly, after the cell-level QC described above, each cell contributed at most one dominant productive VDJ (heavy) contig and one dominant productive VJ (light; IGK/IGL) contig, each selected by UMI count, and clonotypes were assigned as follows:
1. Chain-aware grouping (heavy and light): Heavy- and light-chain repertoires were clonotyped separately using the criteria below, and paired clonotypes were defined using the combined heavy+light signature when both chains were available.
2. Shared V/J gene usage: Sequences were first required to have identical V-gene and J-gene calls (gene-level calls, not allele-level unless otherwise stated) for the relevant chain.
3. Matched junction length: Within each V/J bin, sequences were required to have identical CDR3 (junction) amino-acid length, ensuring a like-for-like comparison across aligned CDR3s.
4. CDR3 similarity threshold: Sequences passing steps 1–3 were then clustered by CDR3 amino-acid sequence similarity using a Hamming-distance criterion (applicable because the CDR3s are of equal length), with clonotype membership requiring ≥85% amino-acid identity (equivalently, a length-normalised Hamming distance of at most 0.15).
5. Handling incomplete receptors: Cells lacking an unambiguous productive chain (e.g., missing light chain or orphan heavy chain) were clonotyped using the available productive chain only, and were excluded from analyses requiring paired heavy+light definition.
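For a single chain, steps 2–4 can be sketched as follows. This is an illustrative sketch rather than the Dandelion or Change-O implementation: the function names and the `v_gene`, `j_gene` and `cdr3_aa` fields are hypothetical, and the use of single-linkage clustering (any two sequences at or above the threshold are merged, transitively) is an assumption, since the linkage rule is not stated in the text.

```python
from collections import defaultdict
from itertools import combinations


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("identity requires non-empty, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_clonotypes(seqs, threshold=0.85):
    """Map each sequence id to a clonotype representative id."""
    # Steps 2-3: bin by V gene, J gene and CDR3 amino-acid length.
    bins = defaultdict(list)
    for s in seqs:
        bins[(s["v_gene"], s["j_gene"], len(s["cdr3_aa"]))].append(s)

    # Union-find structure for single-linkage merging (assumed linkage rule).
    parent = {s["id"]: s["id"] for s in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Step 4: merge sequences within a bin that meet the identity threshold.
    for members in bins.values():
        for a, b in combinations(members, 2):
            if identity(a["cdr3_aa"], b["cdr3_aa"]) >= threshold:
                parent[find(a["id"])] = find(b["id"])

    return {s["id"]: find(s["id"]) for s in seqs}
```

Under these assumptions, a paired clonotype for a cell with both chains could be formed by combining its heavy- and light-chain labels into a tuple, while cells with one chain would carry only that chain's label.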
Diversity was quantified from the resulting clonotype frequency tables (counts of cells per clonotype within each sample/timepoint/condition). Clonal expansion was summarised by the distribution of clone sizes (e.g., unexpanded singletons versus expanded clones), and repertoire diversity was computed using standard clonotype-based diversity metrics (e.g., Shannon entropy and/or Simpson diversity) on depth-normalised clonotype tables to enable fair comparisons across samples.
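The diversity metrics named above can be computed from per-clonotype cell counts as in the sketch below. The function names are hypothetical; Simpson diversity is taken here to mean the Gini-Simpson form (one minus the sum of squared clonotype frequencies), and depth normalisation is assumed to be random subsampling of cells without replacement to a common depth. Neither choice is specified in the text.

```python
import math
import random
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a list of clonotype cell counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def gini_simpson(counts):
    """Gini-Simpson diversity: 1 - sum of squared clonotype frequencies."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def subsample_counts(clone_labels, depth, seed=0):
    """Draw `depth` cells without replacement and count cells per clonotype.

    Assumption: depth normalisation by subsampling; `depth` must not exceed
    the number of cells in the sample.
    """
    rng = random.Random(seed)
    return list(Counter(rng.sample(list(clone_labels), depth)).values())
```

For a fair comparison under this scheme, each sample's per-cell clonotype labels would be subsampled to the size of the smallest sample before either metric is computed, ideally averaging over several seeds.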