Mar 27, 2026

Ghannam et al. extended methods V.2

  • Jack Ghannam1
  • 1Harvard University
Icon indicating open access to content
QR code linking to this content
Protocol CitationJack Ghannam 2026. Ghannam et al. extended methods. protocols.io https://dx.doi.org/10.17504/protocols.io.rm7vz49b5lx1/v2Version created by Jack Y. Ghannam
License: This is an open access  protocol  distributed under the terms of the  Creative Commons Attribution License,  which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Protocol status: Working
We use this protocol and it's working
Created: March 27, 2026
Last Modified: March 27, 2026
Protocol  Integer ID: 314000
Keywords: extended methodological detail, computational methodological detail, methodological detail, overarching study design, experimental protocol, extended method, extended methodological details for the overarching study design, extended methodological detail, extended method, overarching study design, experimental protocol, computational method, methods this protocol, study, protocol
Abstract
This protocol provides extended methodological details for the overarching study design, experimental protocols, and computational methods used in our study.
DNA and RNA extraction from tumor and matched-normal tissue
For fresh frozen (FF) tumor samples, DNA and RNA were simultaneously extracted from a single tissue sample using the Qiagen AllPrep DNA/RNA Kit. For formalin-fixed paraffin-embedded (FFPE) tumor and non-neoplastic germline control tissue, DNA and RNA were co-extracted using the AllPrep DNA/RNA FFPE Kit from Qiagen. A Picogreen assay was used to quantify all DNA samples. For peripheral blood mononuclear cells (PBMCs; germline controls), DNA was isolated using magnetic bead-based extraction with the QIAsymphony DSP DNA Midi Kit on the QIAsymphony SP system (Qiagen). After lysis of red blood cells, electromagnetic rods were used to separate the DNA-bound magnetic beads from the solution. The beads were then washed multiple times before subsequent elution of DNA in Tris-EDTA (TE).
Whole-exome sequencing (WES) using Twist Somatic Exome v6
Genomic DNA (100-250 ng in 50µL) was used as the input for DNA fragmentation by shearing (Covaris focused-ultrasonicator) with a target of 150 bp-sized fragments. Library preparation was conducted using the KAPA HyperPrep Kit with Library Amplification (product KK8504) and duplex unique molecular identifier (UMI) adapters from Integrated DNA Technologies (IDT). During PCR, unique 8-base dual index sequences present within the p5 and p7 primers (IDT) were incorporated. Beckman Coulter AMPure XP beads were used for enzymatic clean-ups, with the elution volume reduced to 30 µL to increase library concentration.
Following library construction, the libraries were quantified using the Invitrogen Quant-iT™ broad range dsDNA quantification assay kit (Thermo Scientific, Catalog: Q33130) with a 1:200 PicoGreen dilution. Each library was then diluted to a concentration of 35 ng/µL with 10 mM Tris-HCl (pH 8.0). The entire protocol for constructing and quantifying the libraries was automated using the Agilent Bravo liquid handling platform. IDT’s XGen hybridization and wash kit was used to perform hybridization and capture. A set of 12-plex pre-hybridization pools were prepared by pooling equal volumes of the normalized libraries, Human Cot-1, and IDT XGen blocking oligos. The pre-hybridization pools were then lyophilized using the Biotage SPE-DRY system. A custom exome bait set (TWIST Biosciences) and hybridization mastermix were added to the lyophilized pool, followed by resuspension and an overnight incubation. The Hamilton Starlet liquid handling platform was used for library normalization and hybridization, whereas target capture was automated using the Agilent Bravo platform. Following capture, PCR amplification was performed on the captured material.
Following post-capture enrichment, quantification of library pools was performed via qPCR on an Agilent Bravo automated system, employing a kit from KAPA Biosystems with probes targeting the adapter ends. Based on the qPCR results, the pools were normalized to the required loading concentration using the Hamilton Starlet platform. Up to 24 samples per lane were sequenced using Illumina’s NovaSeq S4 sequencing platform. Cluster amplification of the library pools was conducted following the manufacturer’s instructions (Illumina) utilizing Exclusion Amplification cluster chemistry with NovaSeq S4 flow cells. Sequencing was performed using Sequencing-by-Synthesis chemistry on NovaSeq S4 flow cells with paired-end 151 bp reads.
WES using ICE exome
For libraries sequenced using the ICE exome bait set, library construction followed the protocol outlined by Fisher et al.1 with some modifications. First, the starting genomic DNA amount for shearing was decreased to 10-100 ng in 50 µL of solution. Additionally, palindromic forked adapters (IDT) containing unique dual-indexed molecular barcode sequences were used in place of Illumina paired-end adapters to enable downstream pooling. For end repair/A-tailing, adapter ligation, and library enrichment PCR, Kapa HyperPrep reagents in a 96-reaction kit format were utilized. During the Solid Phase Reversible Immobilization (SPRI) cleanup following enrichment, the elution volume was reduced to 30 µL to concentrate the library, and a vortexing step was introduced to ensure maximum template recovery.
Following library construction, hybridization and capture were carried out using Illumina’s TruSeq Rapid Exome Kit, following the manufacturer’s recommended protocol with a few modifications. First, all libraries from one library construction plate were pooled before hybridization. Second, the Midi plate provided in Illumina’s kit was swapped with a skirted PCR plate to allow for automation. The Agilent Bravo liquid handling system was used to automate all steps for hybridization and capture. Following post-capture enrichment, library pools were quantified using qPCR with KAPA Biosystems probes specific to the adapter ends (automated on the Agilent Bravo). Libraries were then normalized to 2 nM based on the qPCR quantification and denatured with 0.2 N NaOH on the Hamilton Starlet. After denaturation, the libraries were diluted to 20 pM using Illumina hybridization buffer.
Cluster amplification of the denatured templates was carried out following the manufacturer’s protocol (Illumina), using exclusion amplification cluster chemistry and HiSeq X flow cells. Sequencing was performed using v2.5 Sequencing-by-Synthesis chemistry for HiSeq X flow cells, with the flow cells analyzed using Real-Time Analysis (RTA) version 2.7.0 or later. Whole exome library pools were sequenced in paired 76 bp runs, with dual-indexed sequences read to identify molecular indices. To ensure sufficient coverage, the pools were run across multiple lanes.
Bulk RNA sequencing using Illumina TruSeq™ Stranded mRNA Sample Preparation Kit
Using the Quant-iT™ RiboGreen® RNA Assay Kit, total RNA was quantified and adjusted to a concentration of 5 ng/µL. Following normalization, 2 µL of External RNA Controls Consortium (ERCC) controls (1:1000 dilution) were added to each sample. 200 ng of each sample was then allocated for library preparation, which was performed using an automated version of the Illumina TruSeq™ Stranded mRNA Sample Preparation Kit, an approach that preserves RNA transcript strand orientation. This protocol entailed mRNA selection with oligo dT beads, followed by heat-induced fragmentation and cDNA synthesis. The resulting 400 bp cDNA fragments were prepared into dual-indexed libraries, involving the addition of an ‘A’ base, adapter ligation with P7 adapters, and PCR amplification using P5 adapters. After enrichment, libraries were quantified using the Quant-iT™ PicoGreen™ assay (1:200 dilution). The libraries were then normalized to 5 ng/µL, pooled, and re-quantified using the KAPA Library Quantification Kit for Illumina platforms. All steps were performed in a 96-well format, with liquid handling performed using Agilent Bravo and Hamilton Starlet automated systems.
Cluster amplification and sequencing on the flow cell were performed according to the manufacturer’s protocols on a NovaSeq 6000. Each sequencing run generated 101 bp paired-end reads, along with an eight-base index barcode read. Data processing, including de-multiplexing and data aggregation, was performed using the Broad Picard Pipeline.
Bulk RNA sequencing using Illumina TruSeq RNA Access Library Prep Kit
Total RNA quality was evaluated using the Caliper LabChip GX2, and the percentage of RNA fragments longer than 200 nt (DV200) was calculated. For each sample, 200 ng of RNA was used for first strand cDNA synthesis, followed by second strand synthesis and ligation of indexed adapters. PCR amplification was then performed to enrich for adapter-ligated fragments. The amplified libraries were quantified via an automated PicoGreen assay. 200 ng of each cDNA library was then pooled into 4-plex groups, excluding controls. Capture probes targeting the exome, along with IDT’s xGen® Blocking Oligos, were added to minimize off-target binding. Samples were hybridized for the recommended duration. After hybridization, streptavidin magnetic beads were employed to capture the library-bound probes, followed by two wash steps to remove nonspecific products. To increase specificity, the hybridization, capture, and washing steps were repeated. A second round of PCR amplification enriched the captured libraries. After enrichment, the libraries were quantified using qPCR with the KAPA Library Quantification Kit for Illumina platforms and pooled equimolarly. The entire workflow was performed in a 96-well format, with liquid handling executed by either the Hamilton Starlet or Agilent Bravo automated systems.
Cluster amplification and sequencing were carried out according to the manufacturer’s protocols on the HiSeq 2000 or 2500 platforms, generating 76 bp paired-end reads along with an eight-base index barcode read. Data processing, including de-multiplexing and data aggregation, was performed using the Broad Picard Pipeline.
Consensus clustering of genetic alterations
Non-negative matrix factorization (NMF) consensus clustering was run with varying numbers of clusters (k). We used the following parameters: k.init = 2, k.final = 8, num.clusterings = 20, maxniter = 2000, error.function = “divergence”, stopconv = 40, and stopfreq = 10. We found cophenetic correlation values were consistently high and minimally different across most values of k (k=2: 0.9553; k=3: 0.9473; k=4: 0.9147; k=5: 0.9401; k=6: 0.9359; k=7: 0.9304; k=8: 0.9053), apart from slightly lower stability at k=4 (0.9147) and k=8 (0.9053).
The classifier used single nucleotide variants (SNVs), indels, and copy number variants (CNVs) from GLASS cohort samples, as in the NMF clustering. We additionally used only variants that were significantly associated with cluster membership in GLASS, determined via Fisher’s exact test with false discovery rate (FDR)-correction. A multinomial classifier was chosen from a diverse set of models via nested 5-fold cross-validation, using the scikit-learn python package2. Candidate models included k-nearest neighbors, centroid-based classifiers, random forests and multinomial regression, with multinomial regression attaining the highest prediction accuracy in cross-validation (76%).
Analysis of somatic variants and copy number alterations
The variant calling and copy number analysis pipeline included the following additional filtering criteria. SNVs and indels from WES were filtered for functional effect using annotations from OncoKB3 and Combined Annotation Dependent Depletion (CADD) (version GRCh37-v1.6)4; their cancer cell fraction (CCF) estimates from ABSOLUTE were required to exceed a 10% threshold. Gene-level copy number alterations (CNAs) from WES were restricted to focal events. SNVs, indels, and CNAs from OncoPanel were filtered by similar criteria. However, variant allele fractions (VAFs) were used in lieu of CCFs since the OncoPanel outputs were insufficient to run ABSOLUTE.
We conducted a separate Cox regression for each gene in the targeted list, for each of SNVs/indels and CNAs, such that each variant class for each gene was separately tested for association with overall survival (OS). Regression was only performed if the variant occurred at least twice in the dataset. For each regression, the independent variable was presence/absence of the variant and the response variable was survival from start of immune checkpoint blockade (ICB) treatment. MGMT promoter methylation status was included as a covariate in each case, with partially methylated cases grouped with methylated cases and cases with missing annotations (n=10) grouped with unmethylated cases. The P-values from these tests were adjusted for multiple hypotheses via Benjamini-Hochberg5, yielding FDR-corrected q-values.
In all cases, we used tumor samples collected before the start of ICB treatment. For the ICB cohort, we selected these pre-ICB samples as follows: when one or more pre-ICB OncoPanel samples existed for a case, the latest OncoPanel sample was used; if no pre-ICB OncoPanel samples existed, the latest WES sample was used. Variants from the GLASS standard-of-care cohort were filtered in a similar fashion to those in the ICB cohort. A bootstrap resampling procedure was used to estimate the power of detecting survival associations in the GLASS cohort, given effect sizes seen in the ICB cohort. The resulting power estimates were 99% for CDKN2A losses, 94% for PDGFRA gain-of-function mutations, 75% for TP53 losses, and 72% for MDM4 amplifications.
Tumor phylogeny analysis
For participants possessing multiple WES samples, we used PhylogicNDT6, a Bayesian method for reconstructing phylogenetic trees from somatic mutations, to infer tumor clonal phylogenies. Briefly, PhylogicNDT uses the variant CCF estimates from ABSOLUTE to cluster mutations into putative tumor clones, and then identifies probable parent-child relationships between those clones. Finally, we used PhylogicNDT’s CopyNumber2Tree method7 to assign CNAs to tumor clones, providing a richer view of tumor evolution. For paired CCF analyses, latest pre- and earliest post-ICB time points were selected for comparison. If a given time point had multiple specimens sequenced by WES, CCFs were averaged for each clone across samples. Clones were considered “evolving” if there was a >25% change relative to their pre-ICB CCF at their earliest post-ICB time point.
Calculation of tumor mutational burden (TMB)
TMB was calculated by dividing the number of non-synonymous mutations in a sample by the size in megabases (Mb) of the corresponding bait set. For WES samples, we used 35 Mb for the bait set size. For OncoPanel samples, the bait set size differs by assay version (e.g., 1.315 Mb for version 3). For analyses that combined TMB measurements from WES and OncoPanel, when a participant possessed samples from both WES and OncoPanel, the OncoPanel TMB was used. The latest pre-ICB sample was used for participants with pre-ICB samples at multiple timepoints. For analyses involving TMB measurements from WES, when a participant possessed multiple samples collected on the same day, TMB values were averaged across all WES samples from that day, which resulted in one TMB value per participant. One patient, having only samples sequenced by OncoPanel, was excluded from TMB analyses because the corresponding version of OncoPanel did not calculate TMB.
Mutational signature analysis
To identify the underlying mutational processes, we performed a two-step mutational signature analysis. First, we inferred mutational signatures de novo from SNVs identified from patient WES data, filtering to include only variants with a CCF (95% lower confidence interval) greater than 0.001. To better resolve distinct processes that may co-exist within a single tumor, we treated individual tumor sub-clones as distinct samples for this discovery phase. These sub-clones were defined using membership probabilities inferred by PhylogicNDT6, and clones with five or fewer mutations were excluded from the analysis. The signature extraction was performed using SignatureAnalyzer's implementation of Automatic Relevance Determination Non-negative Matrix Factorization (ARD-NMF)8. The algorithm was run with 10 random restarts and employed L1 regularization on both the signature profiles and their activities. This unsupervised discovery process yielded a total of six mutational signatures. In the second step, we characterized these six de novo signatures by comparing their mutational profiles to the Catalogue of Somatic Mutations in Cancer (COSMIC) v3 reference9 catalogue using cosine similarity. Five of the signatures closely matched known processes: a clock-like signature (single base substitution [SBS]-like), a signature associated with temozolomide treatment (SBS11-like), a haloalkane-related signature (SBS42-like), and two signatures associated with deficient DNA mismatch repair (dMMR, SBS15-like and SBS26-like). The remaining signature was not a close match to any known COSMIC signatures. Of note, the signature designated as SBS26-like showed strong similarity to both COSMIC SBS12 and SBS26. However, consistent with literature linking signatures resembling both to MMR deficiency10, and for the sake of clarity and brevity, we refer to this signature as SBS26-like throughout the manuscript. Finally, to quantify the activity of these processes in each patient, we projected the set of six de novo signatures back onto the full SNV catalogue of each bulk tumor sample. This was accomplished by running SignatureAnalyzer in a supervised mode, which generated an exposure score (i.e., the number of mutations attributed to each signature) for every tumor. To investigate the interaction between TMB and signatures of dMMR, defined as the sum of SBS15-like and SBS26-like mutational signatures, a nested survival analysis was performed. For this, patients were first stratified into TMB subgroups based on either the median or quartiles of TMB. Within each TMB stratum, the prognostic impact of dMMR signature burden was then assessed using a median dMMR split calculated within a given TMB stratum.
Neoantigen prediction
Personal neoantigens were predicted from coding mutations identified through WES of matched tumor and normal samples. Patient-specific human leukocyte antigen (HLA) class I alleles (HLA-A, -B, -C) were inferred from normal WES data using OptiType11 and Polysolver12. For each nonsynonymous somatic mutation, all possible 8- to 11-mer peptide sequences containing the mutated amino acid were generated as candidate neoantigens. The binding affinity of each neoantigen to the patient’s HLA alleles was predicted using HLAthena13. Peptides with a percentile rank score < 2.0 were considered potential binders, encompassing both strong (%Rank < 0.5) and weak (%Rank < 2.0) binders.
To be included in the final counts, predicted binders were required to have evidence of expression, defined with a corresponding gene expression level >1 transcript per million (TPM). Expressed neoantigens were then stratified by their clonal status. The clonality of each source mutation was determined using CCF estimates from ABSOLUTE, which were subsequently clustered into tumor clones using PhylogicNDT (see section Tumor phylogeny analysis). Neoantigens arising from mutations in the founding clone were classified as clonal. This process yielded the final overall and clonal neoantigen load for each tumor.
Evaluation of DNA repair variants
To identify putative pathogenic germline variants in DNA mismatch repair (MMR) and other key DNA repair genes, germline variant calls from normal whole exome sequencing data for each patient were aggregated from their respective Oncotator-annotated MAF files14. We evaluated for variants within a predefined panel of genes associated with MMR deficiency and polymerase proofreading, including MSH2, MSH6, MLH1, PMS2, POLE, POLD1, and EPCAM. Variants classified as 'Silent' or 'Intron' were removed to enrich for those with potential functional consequences.
The remaining variants were annotated using the Ensembl Variant Effect Predictor (VEP)15. To ensure unambiguous annotation, the VEP output was collapsed by variant, retaining only annotations consistent across all transcripts and removing those with conflicting information. Filtering was then applied to identify likely pathogenic variants: to be retained, a variant had to be predicted as deleterious (REVEL16 score > 0.5 or CADD_PHRED4 score > 20), rare in the general population (Genome Aggregation Database [gnomAD]17 exome allele frequency < 0.01 or not present), and not classified as 'Benign' in ClinVar. Variants classified as 'Likely_benign' were also removed if the classification was reviewed by an expert panel. Any filtered, high-confidence pathogenic variants were then mapped back to the study cohort to identify carriers. This analysis identified seven rare germline variants of interest across six patients. However, a careful review of these cases revealed no classical phenotype of constitutional DNA repair deficiency. The identified variants were largely classified by ClinVar as "Benign/Likely Benign," "Uncertain Significance," or having "Conflicting classifications of pathogenicity." For the five patients in this group with available pre-treatment data, the tumor mutation burden was low (range 1.9–5.7 mut/Mb), which is inconsistent with the expected baseline hypermutation from pathogenic germline POLE/POLD1 or MMR variants. The sixth patient (GBM-188) had a high TMB post-treatment, but their MSH2 variant was classified as a “Variant of Uncertain Significance” by an expert panel, and they exhibited no notable family history of MMR-related cancers, making a germline cause for their TMB unlikely.
Bulk transcriptomic analyses
Quality control filtering
We restricted our analysis of bulk RNA sequencing (RNA-seq) data from the ICB cohort to tumor samples with sufficient quality control metrics: >6500 genes detected and mapping rate >80%, along with other thresholds on exonic rate, rRNA rate, and variation in exon coverage. Samples with ABSOLUTE tumor purity estimates below 20% were rejected. For each participant, the latest pre-ICB treatment sample was used. Samples were included for analysis only if they were collected within 180 days of treatment start. These filters yielded a collection of 128 tumor samples, comprising 79 FF and 49 FFPE samples.
Tumor transcriptional subtype and HLA signature scoring
We used single-sample Gene Set Enrichment Analysis (ssGSEA)18 as implemented in GSEApy19 to score each bulk RNA-seq sample for enrichment in gene sets corresponding to the glioblastoma (GBM) expression subtypes of Verhaak et al.20: Classical, Mesenchymal (MES), Neural, and Proneural (PN). These gene sets were obtained from Molecular Signatures Database (MSigDB)21. We likewise used ssGSEA to generate signature scores for HLA molecule expression, using gene sets from MSigDB (i.e., GOCC_MHC_CLASS_I_PROTEIN_COMPLEX for HLA class I; GOCC_MHC_CLASS_II_PROTEIN_COMPLEX for HLA class II; and GOCC_MHC_PROTEIN_COMPLEX for combined HLA class I and II expression). In some analyses the enrichment scores were used directly as continuous-valued variables. In other analyses, we classified samples into categorical subtype by simply choosing the subtype with maximal score. We applied this scoring and classification to bulk RNA-seq samples from the ICB cohort, as well as the GLASS standard-of-care cohort.
Subtype transition visualization
We used the tumor samples’ ssGSEA-based expression subtype classifications to visualize transitions between subtypes, from pre- to post-treatment. Alluvial plots were generated using the ggplot2 (v3.5.1) extension ggalluvial in R (v.4.3.1) to compare transcriptional subtype transitions pre- versus post-treatment in newly diagnosed and recurrent cases between the ICB cohort and the GLASS standard-of-care cohort.
Differential expression and gene set enrichment analysis
We performed a differential expression analysis in conjunction with gene set enrichment analysis (GSEA)22 to identify gene sets associated with survival. We first used Surrogate Variable Analysis (SVA)23 to account for batch effects in the bulk RNA-seq samples while preserving the OS signal; this was especially important for jointly analyzing FFPE and FF samples, which have varying artifact profiles. We next used Limma voom’s empirical Bayes procedure24 to estimate genes’ association with survival, controlling for the surrogate variables provided by SVA; this yielded a ranked list of survival-associated genes. We used pre-ranked GSEA to test the enrichment of gene sets with respect to the ranked genes. We tested a collection of 56 gene sets from MSigDB, comprising 50 cancer hallmark gene sets25, 4 GBM expression subtype gene sets from Verhaak et al.20, and 2 Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets representing HLA class I and II molecules26,27. We attained FDR-adjusted q-values for these tests via Benjamini-Hochberg correction. Finally, we used a leave-one-out validation procedure to examine the robustness of these q-values; the entire analysis was rerun with each sample held out in turn. For each putative association the largest q-value resulting from the leave-one-out procedure is reported, ensuring no gene set’s significance depends on a single sample. We applied this methodology to bulk RNA-seq data from newly diagnosed and recurrent cases in the ICB and GLASS standard-of-care cohorts.
Immune cell deconvolution with CIBERSORTx
We used CIBERSORTx28 to estimate immune cell prevalence in each bulk RNA-seq sample from the ICB cohort and the GLASS standard-of-care cohort. The “Recurrent GBM: full aggregate” single-cell dataset from the Brain Immune Atlas (https://brainimmuneatlas.org/) was used as the reference29. Genes in this reference with fewer than 1000 reads across all cells were excluded. We ran CIBERSORTx in absolute mode and used S-mode batch correction.
Cell state-signature scoring and refinement
Single cell-derived malignant cell state signature scores were calculated for bulk RNA-seq data from our cohort and the external GLASS cohort. The analysis was performed on log2(TPM+1) expression values. Gene signatures for GBM cellular states were compiled from published single-cell studies (Neftel et al.30, Garofano et al.31, Couturier et al.32, Nomura et al.33). Additional signatures were defined using canonical marker genes for macrophages, T-cells, and oligodendrocytes. All gene symbols in the expression matrices and signature lists were harmonized to current Human Genome Organisation Gene Nomenclature Committee (HGNC) nomenclature prior to scoring.
All signature scoring was performed using the R package scalop in an iterative workflow34. First, an initial score was calculated for each signature using the sigScores() function with default parameters. These initial signatures were then refined using the filter_signatures() function to remove genes having a low correlation with their respective signature score. Second, the refined signatures were expanded by calculating the Pearson correlation between every gene in the expression matrix and each of the refined signature scores across all samples; genes with a correlation coefficient r > 0.8 were added to the corresponding signature list. The sigScores() function was again applied to the resulting signature lists to generate the final scores for analysis. For each of the four primary classifiers (Neftel, Garofano, Nomura, Couturier), a dominant cellular state was assigned to each sample based on the signature with the highest final score.
Proportional representation analysis
To estimate the proportional representation of transcriptional subtypes within each bulk tumor sample, we utilized the calculated ssGSEA scores for the four GBM subtypes defined by Verhaak et al. (classical, mesenchymal, neural, and proneural). Since raw ssGSEA enrichment scores can be negative, a transformation was applied across the entire cohort to convert them into a non-negative scale suitable for proportional analysis. First, the minimum score across all samples and all four subtypes was identified. The absolute value of this single minimum score was then added to every subtype score for every sample in the dataset. This procedure shifts all scores into a positive range while preserving the relative differences between them, both within and across samples. Following this cohort-level adjustment, the four resulting scores for each sample were normalized by dividing each score by their sum, yielding the proportional contribution of each Verhaak subtype. These proportions were then used to calculate a MES-to-PN ratio, allowing for the assessment of subtype dynamics between pre- and post-treatment timepoints. To ensure the pre-treatment samples accurately reflected the tumor state at the time of immunotherapy initiation while preserving sufficient sample size, we included only patients whose pre-ICB tumor sample was collected no more than 45 days prior to the start of treatment for analysis.
External cohort validation
Two external cohorts were evaluated to validate the association between tumor transcriptional subtype and survival35,36. We scored transcriptional signatures and assigned dominant transcriptional subtypes to tumors in these cohorts using ssGSEA, as described above (see Bulk transcriptomic analyses). In recurrent GBM (McFaline-Figueroa et al., 2024), the MES subtype predicted improved OS in the adjuvant setting (P=0.033), trended toward improved OS in the neoadjuvant setting (P=0.077), and was predictive in a combined analysis (P=0.006). Likewise, in newly diagnosed GBM (Weathers et al., 2025), the MES subtype was associated with improved progression-free survival (PFS) (P=0.003), trended toward better OS (P=0.053), and the continuous mesenchymal signature also predicted improved OS (P=0.038).
Immunohistochemistry
First, formalin-fixed FFPE blocks were sectioned at 5 µm. FFPE tissue sections were then baked for 30 minutes at 60°C and deparaffinized (Leica AR9222) prior to staining. The CD3 primary antibody (Dako, catalog number A0452, polyclonal) was applied at a 1:250 dilution following a 20-minute ethylenediaminetetraacetic acid (EDTA) antigen retrieval step (Leica ER2 AR9640). The antibody was visualized using 3,3’-diaminobenzidine (DAB), and sections were counterstained with hematoxylin (Leica DS9800). Finally, the slides were rehydrated in graded alcohol and coverslipped using the HistoCore Spectra CV mounting medium (Leica 3801733).
Nuclear segmentation was performed with a non-AI assisted, 'Traditional' HALO algorithm. Segmentation aggressiveness and nuclear contrast thresholds were adjusted on a per-sample basis to ensure optimal nuclear identification. A standard cytoplasm radius was set at 3.96 µm for all samples. For each sample, a cytoplasmic positivity threshold was set based on visual confirmation of accurate CD3+ cell detection across the tissue.
Multiplex immunofluorescence (mIF)
HLA class I and SOX2 dual staining
For dual staining of HLA class I and SOX2, slides were warmed for 20-30 minutes at 37°C, deparaffinized in two successive changes of xylene for 10–12 minutes each, and subsequently rehydrated through a graded ethanol series (100%, 96%, 70%, and 50%) for 10 minutes each, followed by three 10-minute washes in phosphate-buffered saline (PBS). Heat-induced epitope retrieval was performed by incubating the slides in a lab-prepared, Tris-based antigen unmasking solution at 95°C for 30 minutes, after which they were allowed to cool to room temperature for 45–60 minutes in distilled water. After three 5-minute washes in PBS, a hydrophobic barrier was drawn around the tissue sections, and slides were incubated for 60 minutes in a blocking buffer composed of 20% horse serum, 0.2% Triton X-100, and 0.1% bovine serum albumin (BSA) in PBS. Tissues were then incubated overnight at 4°C with primary antibodies against major histocompatibility complex (MHC) Class I (mouse, Abcam, ab70328, clone EMR8-5, 1:100) and SOX2 (rabbit, Abcam, ab215970, clone EPR3131, 1:50). The following day, slides were washed and incubated with secondary antibodies, diluted 1:200 in 1% horse serum in PBS, for 1 hour at room temperature while protected from light. After final washes in PBS and PBST (0.1% Tween-20 in PBS), nuclei were counterstained with 4’,6-diamidino-2-phenylindole (DAPI) (1 µg/ml) for 10 minutes. Samples were mounted using ProLong Gold Antifade Mountant (Life Technologies, P36930).
HLA class II and SOX2 dual staining
For dual staining of HLA class II and SOX2, staining was performed on the Leica Bond RX automated staining platform using the Leica Biosystems Refine Detection Kit (DS9800). FFPE tissue sections were baked for 30 minutes at 60°C and deparaffinized on-instrument (Leica AR9222). The protocol included a 30-minute antigen retrieval step in Leica ER2 solution (AR9640) before incubation with a primary antibody against HLA-DR (Abcam, ab92511, clone EPR3692, 1:500), detected with an Alexa 647 fluorophore. A stripping step and a 10-minute antigen retrieval in Leica ER2 solution were then performed before incubation with a primary antibody for SOX2 (Abcam, ab215970, clone EPR3131, 1:50), detected with an Alexa 555 fluorophore. Following staining, slides were counterstained with DAPI (Nucblue; Invitrogen R37606) and coverslipped with Prolong Diamond (Invitrogen P36961).
Segmentation parameters
A cell expansion of 5 µm was used to define the cytoplasm. The algorithm was configured with a background radius of 8 µm, a Gaussian sigma of 1.5 µm, and a cell area between 10 µm2 and 400 µm2.
Intensity thresholding and statistical analysis
Statistical analysis was performed in R (v4.3.1). Individual cells were classified into four expression categories ('Negative', 'Low', 'Medium', 'High') using a data-driven thresholding strategy designed to mirror traditional semi-quantitative pathology scores (0, 1+, 2+, 3+). This process was conducted independently for the HLA class I and HLA class II datasets. For each dataset, global intensity thresholds were calculated using all cells from all samples. The "Negative" threshold was defined as two standard deviations (SD) above the mean of the lowest intensity quartile of all cells. Thresholds for "Medium" and "High" were set at 1.5 SD and 3.0 SD above the global mean intensity of all cells, respectively. A cell was classified as "Low" if its intensity was above the 'Negative' threshold but below the 'Medium' threshold. For each sample, the percentage of cells in each tier was then calculated for the total cell population and stratified by malignant (SOX2High) and non-malignant (SOX2Low) compartments. These values were compared across transcriptional subtypes, derived from bulk RNA-seq of the same tumor sample from each respective mIF case, using the Wilcoxon rank-sum test. Spearman's rank correlation was used to assess associations with ssGSEA scores and CD3+ T-cell density.
Nuclei isolation from frozen tissue
Frozen tissue samples were processed using the Chromium Nuclei Isolation Kit (10x Genomics, PN-1000494). The tissue was mechanically dissociated in a chilled Lysis Buffer (prepared from Lysis Reagent, PN-2000558) using a pestle (PN-2000561) until homogeneous. The suspension was then incubated on ice for 10 minutes. The lysate was loaded onto a Nuclei Isolation Column (PN-2000562) and centrifuged at 16,000 x g for 20 seconds at 4°C to clear large debris. The collected flow-through containing the nuclei was centrifuged at 500 x g for 3 minutes at 4°C. The pellet was then resuspended in Debris Removal Buffer (prepared from Debris Removal Reagent, PN-2000560), centrifuged at 700 x g for 10 minutes at 4°C, and the supernatant was discarded. The nuclei pellet was washed twice by resuspending in 1 ml of Wash and Resuspension Buffer (containing BSA and RNase Inhibitor, PN-2000565) and centrifuging at 500 x g for 5 minutes at 4°C. For the final resuspension, the pellet was resuspended in the Wash and Resuspension Buffer. Nuclei were then stained using the ViaStain™ Acridine Orange/Propidium Iodide (AO/PI) solution (Nexcelom Bioscience), and the final concentration and viability were determined using a Countess II Automated Cell Counter (Thermo Fisher Scientific) before proceeding to the 10x Genomics workflow.
10x Genomics for single nucleus RNA sequencing (snRNA-seq)
Briefly, a suspension targeting 20,000 nuclei recovery was loaded onto a Chromium GEM-X single cell 5' Chip (PN-1000698) and processed on a Chromium X Controller. In this step, single nuclei were partitioned into Gel Bead-in-emulsions (GEMs), where their RNA was reverse transcribed using primers containing a 10x Barcode, a UMI, and a template switch oligo to generate barcoded cDNA. After breaking the emulsion, the pooled cDNA was purified using Dynabeads MyOne SILANE magnetic beads (PN-2000048) and subsequently amplified via PCR. 5' gene expression libraries were then constructed from the amplified cDNA using the Library Construction Kit C (PN-1000694). This process involved enzymatic fragmentation, end repair, A-tailing; and ligation of Illumina sequencing adaptors; with SPRIselect bead-based cleanups performed after each step. Finally, sample indexing PCR was performed using the Dual Index Kit TT Set A (PN-1000215) to barcode the libraries.
Single-nucleus data processing of human glioma samples
Initial quality control filtering was applied to remove low-quality cells based on a low number of detected genes (<300) combined with either a high fraction of mitochondrially-encoded genes (>20%) or low expression of housekeeping genes (< 5 of 96 genes, per Tirosh et al.37). An exception was made for cells expressing at least two T-cell markers (i.e., CD2 or CD3D/E/G), which were manually retained for downstream analysis. The average expression of each gene i was then calculated across all cells, N, as , and genes with an average expression below a threshold of 4 were excluded. Technical genes (e.g., antisense, LINC) were removed, while a curated list of biologically relevant genes—including canonical T-cell markers, proliferation-associated genes, and cell-type signature genes—was manually retained. For the cells and genes that passed these quality control filters, we defined relative expression levels by centering the expression levels for each gene across all cells in the dataset. Lastly, samples with fewer than 100 high-quality cells were removed from downstream analysis.

Cell clustering, annotation, and doublet detection
Doublet detection
An integrative approach was used to identify and remove heterotypic doublets:
  1. scDblFinder (v1.18.0): This package was run per sample with default parameters to identify potential doublets.
  2. scds (v1.13.1): The cxds_bcds_hybrid function was used with default parameters, and cells with the highest 5% of scores were flagged as doublets.
  3. Signature Scoring: Cell-type-specific signatures were created from the top 50 differentially expressed genes for each cell type. Using the sigScore function from the scalop package (v1.1.0), each cell was scored against these signatures. A permutation-based approach established a significance threshold (0.99 quantile of scores from a shuffled matrix) for each signature. Cells with scores exceeding the threshold for two or more cell types were flagged as potential doublets. Due to their expected transcriptional similarity, cells co-classified as malignant and glial were not considered doublets.
A cell was definitively classified as a doublet if it was identified by the signature scoring method and at least one of the other two tools (scDblFinder or scds). These consensus doublets, along with cells from an undetermined cluster that was deemed low-quality, were removed from all downstream analyses.
Malignant cell annotation
The CNA-based malignant cell classification was further refined using density-based clustering (dbscan v1.2.2; eps=0.3, minPts=50). All cells within clusters containing >25% CNA-defined malignant cells and >75% of cells from a single patient were also classified as malignant. Conversely, cells with a malignant CNA signal that did not fall into these malignant clusters were reclassified to a non-malignant type based on their maximal cell-type signature score. Final filtering steps removed cells with very high (>10,000 genes) or very low (<500 genes for malignant cells) complexity.
NMF for deriving malignant meta-programs.
To capture the heterogeneity between malignant cells, we used an NMF-based approach. NMF was performed independently on the relative expression values of each sample after setting any negative values to zero. The NMF algorithm requires defining a priori the ‘k’ parameter, reflecting the expected number of latent features in the data. Since the optimal k was unknown and likely varied between samples, we ran the NMF algorithm on each sample across a range of k values (from 4 to 9), which generated 39 distinct programs per sample. Each of these programs was summarized by its top 90 genes (representing the 0.99 quantile of NMF coefficients). To identify robust gene signatures, we then clustered all programs from all samples based on Jaccard similarity, deriving consolidated meta-programs (MPs) that were consistently found across different samples. This process initially yielded 10 MPs, which were then annotated using functional enrichment analysis against several databases: Gene Ontology (GO) terms, the MSigDB Hallmark gene sets, and external signatures of previously described GBM cell states. We refined this final list by excluding four MPs that appeared to be artifacts: one was removed because it contained genes from multiple cell types, and three others were excluded because they were enriched for genes specific to a single non-malignant cell-type, likely representing contamination or doublets. Cell state assignment and validation procedures are described in the section Malignant cell state assignment.
Malignant cell state assignment
Malignant cells were scored for the NMF MPs using the scalop package function sigScore. To facilitate cell classification, we generated a shuffled expression matrix by shuffling the expression values for each gene for each of the 345,094 malignant cells. We then scored the shuffled matrix for the NMF MPs, thereby yielding 345,094 normally distributed scores for each expression program. These served as null distributions for cell-state classification, and we set the 0.95 quantile as the malignant-state-specific threshold by annotating the malignant cell with this state. Each malignant cell was classified into a state if the score computed for that state was larger than the state-specific threshold. Cells that achieved significant scores for multiple states were assigned to the state for which they scored maximally. Cells that did not achieve a score higher than the threshold for any of the states were assigned a ‘not determined’ (ND) state. Cells were assigned a ‘cycling’ state on top of their cellular state if they achieved a score higher than the cell cycle MP threshold, and ‘non-cycling’ otherwise.
To validate our data-derived malignant cell annotations, we scored the malignant cells using external GBM state signatures30–33. Each malignant cell was then assigned to the state corresponding to its highest-scoring signature. Signatures associated with technical artifacts or low-quality cells were excluded. Cycling state was annotated separately on top of the other cellular states.
Generation of pseudobulk expression profiles
Estimate the contribution of different cell types to the mesenchymal signature
To estimate the relative contributions of various cell types—both malignant and non-malignant—to the mesenchymal bulk signal, pseudobulk expression profiles were generated for each sample. Raw counts for each gene i were first summed across all cells j within a sample to produce a sample-level pseudobulk matrix. A conversion factor was then calculated to normalize each pseudobulk profile to a total of one million counts, yielding expression values in counts per million ( ).

The same procedure was subsequently applied at the cell-type level. For each sample, raw counts were summed separately for each annotated cell type, and the sample-specific conversion factor calculated on a sample-level was applied to normalize the data. This ensured that the total pseudobulk counts across all cell types within a sample summed to one million:


These cell-type-resolved pseudobulk profiles were used to quantify the contribution of each cell-type to specific transcriptional signatures. First, the total signature signal was per-sample was calculated from the sample-level pseudobulk profiles by: where i is a gene in the gene-set of interest. Then, the fraction of the signal derived from a specific cell-type was determined by dividing the cell-type signal by the total sample-level signal of a specific signature (containing K genes) . Finally, the mean contribution of a specific cell-type across all samples of a given tumor was used as the final estimate.

Re-annotate Verhaak state to a sample
Verhaak subtype signature scores were calculated from log2-transformed pseudobulk profiles, and each sample was re-annotated based on the highest scoring signature. This approach enabled the classification derived from snRNA-seq data to be directly compared with bulk RNA-seq-based classification from the same sample, accounting for known intra-tumoral heterogeneity.
Estimate the contribution of different cell types to the HLA signature
To estimate the relative contributions of various cell types—both malignant and non-malignant—to the HLA class I and II bulk signal, an identical approach as described above was used (see Estimate the contribution of different cell types to the mesenchymal signature), but with the following gene sets: (HLA class I: 'B2M', 'HLA-A', 'HLA-B', 'HLA-C', 'HLA-E', 'HLA-F', 'HLA-G', 'HLA-H'; HLA class II: 'HLA-DMA', 'HLA-DMB', 'HLA-DOA', 'HLA-DOB', 'HLA-DPA1', 'HLA-DPB1', 'HLA-DQA1', 'HLA-DQA2', 'HLA-DQB1', 'HLA-DQB2', 'HLA-DRA', 'HLA-DRB1', 'HLA-DRB3', 'HLA-DRB4', 'HLA-DRB5').
Simulated bulk expression of glioblastoma cell states
To evaluate the contribution of different malignant cell states to the HLA class I and II signal, a simulated bulk expression approach was employed as described previously34. The analysis was restricted to pre-ICB samples and to malignant-cell states containing at least five cells within a given sample.
The bulk expression generation involved a two-step averaging process. First, for each individual tumor sample, the counts per million (CPM) values were averaged across all cells belonging to a specific malignant state. These sample-level average expression values were then log-transformed according to the formula: where is the expression level for gene j in malignant cell i of a certain cell-state.
Second, to obtain a single expression profile per patient for each state, these values were averaged across all samples belonging to the same patient. The resulting expression matrix, with genes as rows and patient-state combinations as columns, was then row-centered using the scalop R package after excluding the malignant “undetermined" state.
An HLA class I score and an HLA class II score were calculated for each patient-state combination. These scores represent the mean of the centered expression values for canonical gene sets (HLA class I: 'B2M', 'HLA-A', 'HLA-B', 'HLA-C', 'HLA-E', 'HLA-F', 'HLA-G', 'HLA-H'; HLA class II: 'HLA-DMA', 'HLA-DMB', 'HLA-DOA', 'HLA-DOB', 'HLA-DPA1', 'HLA-DPB1', 'HLA-DQA1', 'HLA-DQA2', 'HLA-DQB1', 'HLA-DQB2', 'HLA-DRA', 'HLA-DRB1', 'HLA-DRB3', 'HLA-DRB4', 'HLA-DRB5'). The distributions of these scores were compared between the MES-like state and the other four malignant states using a two-sided, unpaired Welch's t-test.
Survival analyses involving single nucleus RNA sequencing
MES-High vs. MES-Low stratification
For survival analyses based on snRNA-seq classifiers, proportions of malignant cell states were calculated for each pre-ICB sample with over 100 malignant cells. Patient-level proportions were derived by averaging across samples if replicates were present. For the MES-High vs. MES-Low analysis, patients were stratified into two groups based on a median split of the 'MES-like' malignant cell proportion across our NMF-based classification and previously published classifiers.
Cell type Cox models
To evaluate the association between cell type abundance and survival, Cox proportional hazards models were fit for each malignant and tumor microenvironment (TME) cell type. The patient-level proportion of each cell type was calculated as described above and standardized (Z-score transformed). Each standardized proportion was used as a continuous variable in a univariable Cox model adjusted for disease setting (newly diagnosed vs. recurrent) and MGMT promoter methylation status. The hazard ratio was interpreted as the change in hazard for each one standard deviation increase in cell proportion.
Longitudinal analysis of cellular composition
For patients with paired pre- and post-ICB samples, cellular compositions were calculated at each timepoint to assess longitudinal changes. For both the malignant and TME compartments, the proportion of each cell type was determined relative to the total number of cells in its respective compartment. If multiple samples existed for a single timepoint, their cell type proportions were averaged to yield a single representative value. These longitudinal changes were visualized as fish plots, where the x-axis represents the time from initial diagnosis in days. For this visualization, patients were stratified into three groups based on the absolute change in the MES-like cell fraction within the malignant compartment between their pre- and post-ICB timepoints. Patients were defined as MES-like Increase if this fraction increased by more than 25 percent, MES-like Decrease if it decreased by more than 25 percent, and MES-like Stable if the change was ≤25 percent in either direction. Substantial shifts for any given cell type were defined as an absolute change in cell fraction greater than 25 percent between the pre- and post-ICB samples.
Stepwise multivariable model
To determine which independent variables from the prespecified pool (MES versus PN subtype, MGMT promoter methylation status, absolute lymphocyte count [ALC], disease setting, age, specific checkpoint inhibitor, sex, dexamethasone exposure, genomic cluster (overall, and in dummy variable C1-C5), and HLA class I and II ssGSEA scores) to include, we performed a stepwise selection procedure to build a multivariable survival model. The stepwise selection process implemented forward-backward selection and determined model selection by Schwarz Bayesian information criterion (SBC). The entry type I error rate to be permitted into the model was 0.20, the exiting type I error rate was 0.05 or greater to be excluded from model estimation.
Protocol references
1. Fisher, S. et al. A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries. Genome Biol 12, R1 (2011).
2. Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011).
3. Chakravarty, D. et al. OncoKB: A Precision Oncology Knowledge Base. JCO Precis Oncol 2017, PO.17.00011 (2017).
4. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46, 310–315 (2014).
5. Benjamini, Y. & Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Methodological) 57, 289–300 (1995).
6. Leshchiner, I. et al. Comprehensive analysis of tumour initiation, spatial and temporal progression under multiple lines of treatment. 508127 Preprint at https://doi.org/10.1101/508127 (2019).
7. Parry, E. M. et al. Evolutionary history of transformation from chronic lymphocytic leukemia to Richter syndrome. Nat Med 29, 158–169 (2023).
8. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
9. Sondka, Z. et al. COSMIC: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Res 52, D1210–D1217 (2024).
10. Degasperi, A. et al. A practical framework and online tool for mutational signature analyses show inter-tissue variation and driver dependencies. Nat Cancer 1, 249–263 (2020).
11. Szolek, A. et al. OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics 30, 3310–3316 (2014).
12. Shukla, S. A. et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat Biotechnol 33, 1152–1158 (2015).
13. Sarkizova, S. et al. A large peptidome dataset improves HLA class I epitope prediction across most of the human population. Nat Biotechnol 38, 199–209 (2020).
14. Ramos, A. H. et al. Oncotator: cancer variant annotation tool. Hum Mutat 36, E2423-2429 (2015).
15. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol 17, 122 (2016).
16. Ioannidis, N. M. et al. REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. Am J Hum Genet 99, 877–885 (2016).
17. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
18. Barbie, D. A. et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature 462, 108–112 (2009).
19. Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
20. Verhaak, R. G. W. et al. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17, 98–110 (2010).
21. Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
22. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545–15550 (2005).
23. Leek, J. T. & Storey, J. D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet 3, 1724–1735 (2007).
24. Law, C. W., Chen, Y., Shi, W. & Smyth, G. K. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 15, R29 (2014).
25. Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 1, 417–425 (2015).
26. Loch, S. & Tampé, R. Viral evasion of the MHC class I antigen-processing machinery. Pflugers Arch 451, 409–417 (2005).
27. Roche, P. A. & Furuta, K. The ins and outs of MHC class II-mediated antigen processing and presentation. Nat Rev Immunol 15, 203–216 (2015).
28. Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 37, 773–782 (2019).
29. Pombo Antunes, A. R. et al. Single-cell profiling of myeloid cells in glioblastoma across species and disease stage reveals macrophage competition and specialization. Nat Neurosci 24, 595–610 (2021).
30. Neftel, C. et al. An Integrative Model of Cellular States, Plasticity, and Genetics for Glioblastoma. Cell 178, 835-849.e21 (2019).
31. Garofano, L. et al. Pathway-based classification of glioblastoma uncovers a mitochondrial subtype with therapeutic vulnerabilities. Nat Cancer 2, 141–156 (2021).
32. Couturier, C. P. et al. Single-cell RNA-seq reveals that glioblastoma recapitulates a normal neurodevelopmental hierarchy. Nat Commun 11, 3406 (2020).
33. Nomura, M. et al. The multilayered transcriptional architecture of glioblastoma ecosystems. Nat Genet 57, 1155–1167 (2025).
34. Hara, T. et al. Interactions between cancer cells and immune cells drive transitions to mesenchymal-like states in glioblastoma. Cancer Cell 39, 779-792.e11 (2021).
35. McFaline-Figueroa, J. R. et al. Neoadjuvant anti-PD1 immunotherapy for surgically accessible recurrent glioblastoma: clinical and molecular outcomes of a stage 2 single-arm expansion cohort. Nat Commun 15, 10757 (2024).
36. Weathers, S.-P. et al. Improved overall survival in an anti-PD-L1 treated cohort of newly diagnosed glioblastoma patients is associated with distinct immune, mutation, and gut microbiome features: a single arm prospective phase I/II trial. Nat Commun 16, 3950 (2025).
37. Tirosh, I. et al. Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma. Nature 539, 309–313 (2016).