MPRA library design and cloning of functional evolution, modification, and derivatization of mammalian developmental enhancers

Tony Li

Apr 17, 2026

DOI

https://dx.doi.org/10.17504/protocols.io.rm7vz4dx5lx1/v1

Tony Li¹

¹University of Washington

Protocol Citation: Tony Li 2026. MPRA library design and cloning of functional evolution, modification, and derivatization of mammalian developmental enhancers. protocols.io https://dx.doi.org/10.17504/protocols.io.rm7vz4dx5lx1/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: April 17, 2026

Last Modified: April 17, 2026

Protocol Integer ID: 315226

Keywords: mammalian developmental enhancer, derivatization of mammalian developmental enhancer, evolutionary dynamics of mammalian ci, parietal endoderm enhancer, mammalian ci, regulatory element, cloning of functional evolution, parallel reporter assay, nucleotide resolution

Abstract

This protocol details two complementary massively parallel reporter assay (MPRA) studies that together dissect the functional architecture and evolutionary dynamics of mammalian cis-regulatory elements (CREs) at nucleotide resolution, using five parietal endoderm enhancers as a model system.

Materials

- 500 bp oligos (Twist Biosciences)
- KAPA2G Robust HotStart ReadyMix (Roche)
- F_Retrieval_Primer
- R_Retrieval_Primer
- SYBR Green (100×)
- oJBL684
- oJBL685
- oJBL686
- oJBL056 (carries 15 random Ns as the molecular barcode, BC)
- qPCR machine
- DNA Clean & Concentrator (Zymo Research)
- NanoDrop spectrophotometer
- AgeI-HF and SbfI-HF (NEB)
- Electrocompetent E. coli, cat. no. C3020K (NEB)
- Plasmid miniprep kit (Zymo Research)
- NextSeq 2000 (Illumina)
- oJBL681/oJBL684
- oJBL061
- oJBL683/oJBL686
- oJBL064
- PEAR
- Bowtie2 (v2.5.3)
- Custom Python and R scripts
- PYS-2 cells, CRL-2745 (ATCC)
- Dulbecco’s modified Eagle medium (DMEM; Thermo Fisher)
- FBS (Fisher Scientific; Cytiva HyClone fetal bovine serum)
- Penicillin/streptomycin (Thermo Fisher)
- Lipofectamine 2000 (Thermo Fisher)
- Opti-MEM (Gibco)
- PBS
- 0.05% trypsin
- AllPrep DNA/RNA Mini Kit (Qiagen)
- oJBL790
- oJBL789
- oJBL076
- oJBL077
- TURBO DNase (Thermo Fisher)
- 5× First-Strand (FS) buffer
- Dithiothreitol (DTT)
- dNTP mix
- SuperScript III reverse transcriptase (SSIII; Thermo Fisher)
- TapeStation D1000 HS (Agilent)
- AMPure XP beads (Beckman Coulter)
- bcl2fastq
- BCalm R package

MPRA library design and cloning

To construct the libraries, the sequences were synthesized as 500 bp oligos (Twist Biosciences) flanked by dial-out primer sites for PCR amplification and isolation. Oligos were converted to double-stranded DNA and processed through three rounds of PCR with KAPA2G Robust HotStart ReadyMix (Roche), using 5 ng of DNA as input. For dial-out PCR1, DNA was mixed with 12.5 µl KAPA2G Robust master mix, 1.25 µl 10 µM F_Retrieval_Primer, 1.25 µl 10 µM R_Retrieval_Primer, 0.25 µl 100× SYBR Green, and 8.75 µl water. Homology arms for Gibson assembly were appended in PCR2, using ~5 ng of the PCR1 eluate as input with 12.5 µl KAPA2G Robust master mix, 1.25 µl 10 µM oJBL684, 1.25 µl 10 µM oJBL685, 0.25 µl 100× SYBR Green, and 8.75 µl water. Barcodes (BCs) were appended to each element in PCR3, using ~5 ng of the PCR2 eluate as input with 12.5 µl 2× KAPA2G Robust master mix, 1.25 µl 10 µM oJBL684, 1.25 µl 10 µM oJBL056, 0.25 µl 100× SYBR Green, and 8.75 µl water. Primer oJBL056 contains 15 random Ns that serve as a molecular barcode (BC) for CRE-BC association. Each PCR was monitored by qPCR: 1 min at 95 °C, followed by cycles of 15 s at 95 °C, 15 s at 65 °C and 1 min at 72 °C, stopped at the qPCR inflection point. After each PCR, products were cleaned up with the DNA Clean & Concentrator kit (Zymo Research), eluted in 12 µl water and quantified on a NanoDrop spectrophotometer. The final product pool served as the insert for the pooled Gibson assembly described below.
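Each PCR is stopped at the qPCR inflection point rather than after a fixed number of cycles. As a minimal sketch, assuming per-cycle SYBR Green fluorescence values exported from the qPCR instrument (for example from a pilot run), the inflection point can be estimated as the cycle at which the numerical first derivative of the amplification curve peaks. The function name `inflection_cycle` is hypothetical and not part of any instrument software.

```python
import numpy as np


def inflection_cycle(fluorescence):
    """Estimate the qPCR inflection point as a 1-based cycle number.

    fluorescence: per-cycle readings, where fluorescence[0] is the
    reading after cycle 1. The inflection point of a sigmoidal
    amplification curve is where its slope is steepest, so we take the
    argmax of the central-difference first derivative.
    """
    f = np.asarray(fluorescence, dtype=float)
    if f.size < 2:
        raise ValueError("need readings from at least two cycles")
    slope = np.gradient(f)  # central differences; one-sided at the ends
    return int(np.argmax(slope)) + 1  # convert 0-based index to cycle number


# Example: a simulated logistic amplification curve centred on cycle 20.
cycles = np.arange(1, 41)
curve = 1.0 / (1.0 + np.exp(-(cycles - 20) / 1.5))
print(inflection_cycle(curve))  # 20
```

Real traces carry baseline offset and noise, so smoothing before differentiation may be needed; during a live run the same criterion is applied by eye to the rising trace.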

The plasmid backbone was linearized with AgeI-HF and SbfI-HF (NEB) and joined to the pooled PCR-amplified CREs by Gibson assembly. The assembly mixture was electroporated into competent E. coli (C3020K; NEB). The transformation was bottlenecked to ~150,000 colonies to control library complexity, cultures were grown overnight at 37 °C, and plasmid DNA was purified the next day with a miniprep kit (Zymo Research). The resulting plasmid library was then used for the final subassembly step, which associates each BC with its CRE.
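The bottleneck also sets how often two colonies carry the same barcode by chance. A back-of-the-envelope sketch, assuming the 15 random Ns added in PCR3 are uniformly distributed and treating each of the ~150,000 colonies as one independently drawn barcode: each of the C(n, 2) colony pairs collides with probability 1/4^15, giving roughly 10 colliding pairs among 150,000 barcodes. The function name is illustrative.

```python
from math import comb


def expected_barcode_collisions(n_barcodes: int, n_random_bases: int) -> float:
    """Expected number of barcode pairs that share an identical sequence.

    Assumes each barcode is drawn uniformly and independently from the
    4**n_random_bases possible sequences (no synthesis or PCR bias).
    Each of the C(n, 2) pairs collides with probability 1 / 4**n.
    """
    sequence_space = 4 ** n_random_bases
    return comb(n_barcodes, 2) / sequence_space


# ~150,000 colonies, each treated as one independently drawn 15-nt barcode.
print(round(expected_barcode_collisions(150_000, 15), 2))  # 10.48
```

Real oligo synthesis is not perfectly uniform, so this is a lower bound on collisions rather than an exact prediction.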

CRE-BC subassembly

To associate BCs with CREs, PCR amplification was performed on the plasmid libraries to generate CRE-BC libraries. Each reaction contained 12.5 µl KAPA2G Robust master mix, 1.25 µl 10 µM oJBL060 (appends the Nextera P7 adapter), 1.25 µl 10 µM custom-indexed P5 primer (NextP5_index1-9), 0.25 µl 100× SYBR Green, and 8.75 µl water. The PCR was monitored by qPCR: 1 min at 95 °C, followed by cycles of 15 s at 95 °C, 15 s at 65 °C and 1 min at 72 °C, stopped at the qPCR inflection point. The amplified libraries were purified with 1.0× AMPure XP beads (Beckman Coulter).

Sequencing

Sequencing libraries were pooled and paired-end sequenced on a NextSeq 2000 with the following primers and cycle numbers:

- read1 (CRE forward): 159 cycles, primer oJBL681/oJBL684
- index1 (BC read): 15 cycles, primer oJBL061
- read2 (CRE reverse): 158 cycles, primer oJBL683/oJBL686
- index2 (sample index): 6 cycles, primer oJBL064

For libraries with larger amplicons, cycle numbers were adjusted to read1: 401 cycles; index1: 15 cycles; read2: 212 cycles; index2: 10 cycles.

CRE reads were preprocessed with trim_galore (default parameters for paired-end reads) to remove adapter sequences and low-quality base calls. Trimmed forward and reverse CRE reads were then joined and error-corrected with PEAR (small amplicons: -v 4; larger amplicons: -v 20). CRE reads were mapped to their respective reference libraries with bowtie2 (v2.5.3), and BC reads were merged and tallied for each mapped CRE using custom Python and R scripts. BCs were then filtered by read count to remove low-abundance barcodes from the CRE-BC association. The resulting CRE-BC association dictionaries were used in downstream analyses, including the processing of valid BC counts for bulk MPRA.
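The filtering and dictionary-building step can be sketched as below. This is not the authors' custom script: it assumes the mapped data have already been reduced to one `(cre_id, barcode)` tuple per read pair, uses a hypothetical read-count cutoff `min_reads`, and additionally drops BCs that remain associated with more than one CRE after filtering, a common safeguard that the protocol does not explicitly describe.

```python
from collections import Counter, defaultdict


def build_cre_bc_dict(pairs, min_reads=2):
    """Return a {barcode: cre_id} dictionary from mapped read pairs.

    pairs: iterable of (cre_id, barcode) tuples, one per mapped read pair.
    min_reads: hypothetical cutoff; CRE-BC combinations supported by fewer
        reads are treated as low-abundance and discarded.
    """
    read_counts = Counter(pairs)  # reads supporting each (cre_id, barcode)
    cres_per_bc = defaultdict(set)
    for (cre_id, barcode), n_reads in read_counts.items():
        if n_reads >= min_reads:
            cres_per_bc[barcode].add(cre_id)
    # Assumption: keep only BCs that map unambiguously to a single CRE.
    return {bc: next(iter(cres)) for bc, cres in cres_per_bc.items() if len(cres) == 1}


pairs = [("E1", "AAAA")] * 5 + [("E2", "CCCC")] * 3 + [("E1", "GGGG")]
print(build_cre_bc_dict(pairs))  # {'AAAA': 'E1', 'CCCC': 'E2'}
```

Applying the read-count filter before the ambiguity check means a single erroneous read pairing a BC with a second CRE does not discard an otherwise well-supported BC.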

Cell culture, transfection, and collection

PYS-2 cells (CRL-2745, ATCC) were grown in Dulbecco’s modified Eagle medium (DMEM; Thermo Fisher, cat. no. 11995065), supplemented with 10% FBS (Fisher Scientific, Cytiva HyClone fetal bovine serum, cat. no. SH3039603) and 1× penicillin/streptomycin (Thermo Fisher, cat. no. 15140122). Cells were kept at 37 °C and 5% CO2, and passaged every 2 days.

Three biological replicates of 2–6 million PYS-2 cells each were transfected with 4 µg of reporter plasmid per replicate (episomal delivery) using Lipofectamine 2000 (Thermo Fisher, cat. no. 11668030) in Opti-MEM (Gibco, cat. no. 31985). Positive (plasmid 002) and negative (plasmid 003) control plasmids were transfected at matched concentrations to monitor transfection efficiency and the duration of episomal expression after transfection. The next day, cells were washed with phosphate-buffered saline (PBS) and the medium was changed; cells were passaged as usual thereafter. After 2 days, PYS-2 cells were detached with 0.05% trypsin, washed once with PBS, resuspended in ice-cold 80% methanol, and stored at −80 °C until further processing.

Bulk MPRA library preparation

Genomic DNA and RNA were extracted from methanol-fixed cells using the AllPrep DNA/RNA Mini Kit (Qiagen) according to the manufacturer’s instructions. MPRA amplicon libraries were generated from DNA in two PCR steps with KAPA2G Robust HotStart ReadyMix (Roche), using 0.5–1 µg of genomic DNA as input. For low-cycle PCR1, gDNA was mixed with 12.5 µl KAPA2G Robust master mix, 1.25 µl 10 µM oJBL790, and 1.25 µl 10 µM oJBL789. Cycling parameters: 1 min at 95 °C, then four cycles of 15 s at 95 °C, 15 s at 65 °C and 30 s at 72 °C, followed by a 4 °C hold. Primer oJBL789 contains ten random Ns that serve as a pseudo-UMI (hereafter UMI, for brevity) to correct for PCR jackpotting. Reactions were cleaned up with 1× AMPure XP beads and eluted in 12 µl water. Illumina adapters and sequencing indices were appended in PCR2, using 5 µl of the PCR1 eluate as input with 7.5 µl KAPA2G Robust master mix, 0.75 µl 100× SYBR Green, 1.25 µl 10 µM oJBL076, and 1.25 µl 10 µM Nextera P7 indexed primer. Libraries were amplified with qPCR monitoring: 1 min at 95 °C, followed by cycles of 15 s at 95 °C, 15 s at 65 °C and 1 min at 72 °C, stopped at the qPCR inflection point. Libraries were then cleaned up with 1× AMPure XP beads.
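Ten random Ns give 4^10 = 1,048,576 possible pseudo-UMI sequences. As a rough check, assuming UMIs are drawn uniformly and independently, the expected number of distinct UMIs observed when k template molecules of one BC are tagged is M(1 − (1 − 1/M)^k) with M = 4^10, so UMI collisions only start to undercount molecules at very high per-BC depth. The function name is illustrative.

```python
def expected_distinct_umis(n_molecules: int, n_random_bases: int = 10) -> float:
    """Expected number of distinct UMIs among n_molecules tagged molecules.

    Assumes uniform, independent sampling from M = 4**n_random_bases UMIs:
    a given UMI is missed by all k molecules with probability (1 - 1/M)**k.
    """
    m = 4 ** n_random_bases
    return m * (1.0 - (1.0 - 1.0 / m) ** n_molecules)


for k in (100, 1_000, 10_000):
    print(k, round(expected_distinct_umis(k), 1))
```

At 1,000 molecules per BC, fewer than one UMI is expected to be lost to collisions.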

Amplicon libraries for RNA were prepared by first DNase-treating the RNA: 5 µg RNA was incubated with 2.25 µl TURBO DNase (Thermo Fisher) and 1.875 µl 10× buffer at 37 °C for 60 min, after which 3.8 µl TURBO DNase inactivation buffer was added. One microgram of DNase-treated RNA was then reverse transcribed. Briefly, 11 µl RNA (500 ng/µl) was mixed with 2 µl µM oJBL789, incubated at 65 °C for 5 min and placed on ice. Seven microliters of reverse transcription master mix (4 µl 5× FS buffer, 1 µl 0.1 M dithiothreitol, 1 µl 10 mM dNTP mix, and 1 µl SSIII (Thermo Fisher)) was then added, and the reaction was incubated at 55 °C for 60 min, followed by 80 °C for 10 min. Half of the reverse transcription reaction was amplified directly in PCR1 (12.5 µl KAPA2G Robust master mix, 1.25 µl 10 µM oJBL790, and 1.25 µl 10 µM Nextera P7 indexed primer) with 1 min at 95 °C, then four cycles of 15 s at 95 °C, 15 s at 65 °C and 30 s at 72 °C, followed by a 4 °C hold. Reactions were cleaned up with 1.5× AMPure XP beads and eluted in 12 µl water. PCR2 proceeded as for the genomic DNA libraries, with oJBL077 and oJBL076, and reactions were stopped at the inflection point of the qPCR trace. Libraries were then cleaned up with 1× AMPure XP beads.
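When several RNA samples are reverse transcribed in parallel, the master mix is usually prepared in bulk. A minimal sketch that scales the per-reaction volumes listed for the reverse transcription master mix to n reactions; the 10% overage to cover pipetting loss is an assumption, not part of the protocol, and the names are illustrative.

```python
# Per-reaction volumes (µl) of the reverse transcription master mix.
RT_MIX_PER_REACTION = {
    "5x FS buffer": 4.0,
    "0.1 M dithiothreitol": 1.0,
    "10 mM dNTP mix": 1.0,
    "SSIII": 1.0,
}


def scale_master_mix(per_reaction, n_reactions, overage=0.10):
    """Scale per-reaction volumes (µl) to n_reactions plus a fractional overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction.items()}


# e.g. three biological replicates, one RT reaction each
print(scale_master_mix(RT_MIX_PER_REACTION, 3))
```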

Bulk MPRA data processing and quantification

Final libraries were quality-checked and quantified on a TapeStation (D1000 HS; Agilent) and diluted to 2 nM based on this quantification. Libraries were pooled and paired-end sequenced on a NextSeq 2000 with the following primers and cycle numbers:

- read1 (BC forward): 15 cycles, primer oJBL072
- index1 (sample index): 10 cycles, primer oJBL760
- read2 (UMI): 10 cycles, primer oJBL761
- index2 (BC reverse): 15 cycles, primer oJBL074

Sequencing data were demultiplexed with bcl2fastq. Forward and reverse BC reads were joined and error-corrected with PEAR v0.9.11 (options: -v 15 -m 15 -n 15 -t 15). Using custom Python and R scripts, successfully assembled BC reads were combined with their UMI reads, BC–UMI pairs were counted, and read and UMI counts per BC were determined. Read and UMI counts were retained for the BCs present in the reporter pools (determined a priori; see CRE-BC subassembly above) and used for the bulk measurements.
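The counting step can be sketched as follows, assuming each PEAR-assembled BC read has already been paired with its UMI read into a `(barcode, umi)` tuple and that `known_bcs` holds the BCs from the CRE-BC association dictionary. Reads are tallied per BC, while UMIs are counted as distinct BC–UMI pairs so that PCR duplicates (jackpotting) collapse to one molecule. Names are illustrative, not the authors' scripts.

```python
from collections import defaultdict


def count_reads_and_umis(bc_umi_pairs, known_bcs):
    """Return {barcode: (read_count, umi_count)} for BCs in known_bcs.

    bc_umi_pairs: iterable of (barcode, umi) tuples, one per sequencing read.
    known_bcs: set of BCs defined a priori by the CRE-BC subassembly.
    """
    reads = defaultdict(int)
    umis = defaultdict(set)
    for barcode, umi in bc_umi_pairs:
        if barcode not in known_bcs:
            continue  # BC absent from the reporter pool
        reads[barcode] += 1
        umis[barcode].add(umi)  # duplicate BC-UMI pairs collapse here
    return {bc: (reads[bc], len(umis[bc])) for bc in reads}


reads = [("AAA", "u1"), ("AAA", "u1"), ("AAA", "u2"), ("CCC", "u3"), ("GGG", "u4")]
print(count_reads_and_umis(reads, {"AAA", "CCC"}))  # {'AAA': (3, 2), 'CCC': (1, 1)}
```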

Expression for each BC was computed from the UMI count table as follows. First, for each sample (cell line, replicate, and batch), the total UMI count across the BCs in the reference list was determined for both the RNA- and DNA-derived libraries. Each BC's UMI count was then normalized by the summed counts of its respective library type (DNA or RNA). Finally, the normalized RNA UMI count was divided by the normalized DNA UMI count (1% winsorized summed RNA over DNA UMI count) to give the bulk MPRA-derived expression estimate for each BC.
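For one sample, the normalization above amounts to expression_i = (RNA_i / ΣRNA) / (DNA_i / ΣDNA) for barcode i. The sketch below implements this with NumPy. How the 1% winsorization is applied is not fully specified; here it is assumed to mean clipping the per-BC ratios at their 1st and 99th percentiles, and BCs with zero DNA UMIs are assumed to have been removed beforehand. The function name is hypothetical.

```python
import numpy as np


def bc_expression(rna_umis, dna_umis, winsor=0.01):
    """Per-BC expression: RNA UMI share divided by DNA UMI share.

    rna_umis, dna_umis: UMI counts for the same BCs, in the same order,
        from one sample (cell line, replicate, batch).
    winsor: fraction clipped from each tail of the ratios (assumed
        interpretation of "1% winsorized"); None disables clipping.
    """
    rna = np.asarray(rna_umis, dtype=float)
    dna = np.asarray(dna_umis, dtype=float)
    if np.any(dna <= 0):
        raise ValueError("remove BCs with zero DNA UMIs before normalizing")
    ratio = (rna / rna.sum()) / (dna / dna.sum())
    if winsor:
        low, high = np.quantile(ratio, [winsor, 1.0 - winsor])
        ratio = np.clip(ratio, low, high)
    return ratio


print(bc_expression([10, 30, 60], [20, 30, 50], winsor=None))
```

Normalizing each library to its own total before taking the ratio removes differences in sequencing depth between the RNA and DNA libraries.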

Statistical analyses of individual elements were performed with the BCalm R package, comparing their activity estimates with those of standard negative control sequences.