Parse the filtered mpileup to extract forward and reverse strand read counts for both the reference and alternate alleles at each variant position. This step requires a custom parsing script.
mpileup Base String Encoding (Reference)
Forward strand reference reads: '.' (dot)
Reverse strand reference reads: ',' (comma)
Forward strand alternate reads: uppercase base letter (A, T, C, G)
Reverse strand alternate reads: lowercase base letter (a, t, c, g)
Insertions: '+' followed by the insertion length and the inserted sequence (e.g., '+2AG')
Deletions: '-' followed by the deletion length and the deleted sequence (e.g., '-1c')
For each variant position from variants.tsv, extract from filtered.mpileup:
Forward reference reads (count of '.' characters)
Reverse reference reads (count of ',' characters)
Forward alternate reads (count of uppercase alt base)
Reverse alternate reads (count of lowercase alt base)
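NOTE: Custom Python or R scripts are required for this step. Standard tools (e.g., pysam or the R package Rsamtools) can help decode base strings. Ensure your script correctly handles read-start ('^') and read-end ('$') markers, as well as insertion/deletion encoding, all of which must be stripped before counting. Each '^' is followed by one character encoding the read's mapping quality; strip that character too, because it can itself be '.', ',' or a letter and would otherwise be miscounted. For insertions and deletions, strip exactly as many sequence characters as the stated length, because the inserted or deleted bases are written as ordinary base letters.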
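The counting rules above can be sketched in Python as follows. This is a minimal illustration, not the protocol's own script. The function names count_strand_alleles and count_variants are hypothetical. It assumes single-sample mpileup lines with the base string in the fifth tab-separated column, single-nucleotide alternate alleles only, and a dictionary of alternate alleles keyed by (chromosome, position) that you build yourself from variants.tsv, whose column layout is not specified here.

```python
from typing import Dict, Iterable, Tuple


def count_strand_alleles(bases: str, alt: str) -> Dict[str, int]:
    """Count strand-specific reference and alternate reads in one base string."""
    alt_fwd, alt_rev = alt.upper(), alt.lower()
    counts = {"ref_fwd": 0, "ref_rev": 0, "alt_fwd": 0, "alt_rev": 0}
    i, n = 0, len(bases)
    while i < n:
        ch = bases[i]
        if ch == "^":
            # Read start: skip the marker and the mapping-quality character.
            i += 2
            continue
        if ch in "+-":
            # Indel: skip the sign, the length digits, and that many bases.
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            if j > i + 1:
                i = j + int(bases[i + 1:j])
            else:
                i += 1  # malformed annotation; skip the sign only
            continue
        if ch == ".":
            counts["ref_fwd"] += 1
        elif ch == ",":
            counts["ref_rev"] += 1
        elif ch == alt_fwd:
            counts["alt_fwd"] += 1
        elif ch == alt_rev:
            counts["alt_rev"] += 1
        # Anything else ('$', '*', other bases) is ignored.
        i += 1
    return counts


def count_variants(
    mpileup_lines: Iterable[str],
    alts: Dict[Tuple[str, int], str],
) -> Dict[Tuple[str, int], Dict[str, int]]:
    """Apply count_strand_alleles to every mpileup line listed in alts.

    alts maps (chromosome, position) to the alternate base (assumed layout).
    """
    results = {}
    for line in mpileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete mpileup record
        key = (fields[0], int(fields[1]))
        if key in alts:
            results[key] = count_strand_alleles(fields[4], alts[key])
    return results
```

To run it on the real files, open filtered.mpileup and pass the file object as mpileup_lines. The scan is done in a single pass rather than with successive search-and-replace steps, so that a mapping-quality character after '^' is never mistaken for a base or for the start of an indel annotation.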