X-PIE modeling

Zhou Gong; Beirong Zhang; Xiaofang He; Bowen Zhong; Yueling Zhu; Lichun He; Kangning Tan; Zhu Liu; Jing Chen; Zhen Liang; Xu Zhang; Yukui Zhang; Lihua Zhang; Maili Liu; Qun Zhao

Jun 15, 2026


DOI

https://dx.doi.org/10.17504/protocols.io.14egnpbdqv5d/v1

Zhou Gong¹,
Beirong Zhang²,
Xiaofang He¹,
Bowen Zhong²,
Yueling Zhu¹,
Lichun He¹,
Kangning Tan³,
Zhu Liu³,
Jing Chen²,
Zhen Liang²,
Xu Zhang¹,
Yukui Zhang²,
Lihua Zhang²,
Maili Liu¹,
Qun Zhao²

¹Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences;
²Dalian Institute of Chemical Physics, Chinese Academy of Sciences;
³Huazhong Agricultural University


Protocol Citation: Zhou Gong, Beirong Zhang, Xiaofang He, Bowen Zhong, Yueling Zhu, Lichun He, Kangning Tan, Zhu Liu, Jing Chen, Zhen Liang, Xu Zhang, Yukui Zhang, Lihua Zhang, Maili Liu, Qun Zhao 2026. X-PIE modeling. protocols.io https://dx.doi.org/10.17504/protocols.io.14egnpbdqv5d/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: June 03, 2026

Last Modified: June 15, 2026

Protocol Integer ID: 318475

Keywords: crosslinking mass spectrometry (XL-MS), protein-protein interaction, interaction interface, distance restraint, conformational sampling, three-dimensional structural model, protein complex, binding modes between protein pairs

Abstract

X-PIE modeling is a computational pipeline for building three-dimensional structural models of protein-protein complexes from crosslinking mass spectrometry (XL-MS) data. The method clusters crosslink sites into spatially distinct interaction interfaces prior to conformational sampling, enabling modeling of multiple binding modes between protein pairs. This protocol describes the complete workflow from input data preparation to final model generation.

Guidelines

Overview & Scope
X-PIE modeling is a computational pipeline that builds three-dimensional structural models of protein-protein complexes from crosslinking mass spectrometry (XL-MS) data. The method first clusters crosslink sites into spatially distinct interaction interfaces, then performs simulated annealing under distance restraints to generate structural ensembles. This protocol is designed for users who have already obtained a filtered PPI list (e.g., from X-PIE curation) and wish to proceed to atomic-level structural modeling. The pipeline is implemented as a single Python script that orchestrates domain partitioning, interface clustering, restraint generation, and conformational sampling via XPLOR-NIH.

Prerequisites & Skills
Operating system: Linux/Unix environment with Python 3.13 installed.
Programming skills: Intermediate command-line proficiency; ability to edit configuration files and monitor batch jobs.
Domain knowledge: Familiarity with PDB file formats, protein domain architecture, and basic computational structure calculation concepts (XPLOR-NIH syntax is not required but helpful for troubleshooting).

Materials

CPU: Multi-core workstation or HPC node recommended. Step 4 (XPLOR simulated annealing) is the bottleneck and scales sub-linearly with core count.
Memory: at least 4 GB RAM 
Disk space: ~100 MB-1 GB per PPI depending on ensemble size and protein size.
Runtime: Steps 1-3 (analysis and preparation) take minutes. Step 4 (sampling) takes hours (e.g., calculating 24 structures on 8 cores takes roughly 2-6 hours for a 300-residue complex). Plan compute time accordingly.

Troubleshooting

Problem

Missing PDB files

Solution

Ensure every protein name in link.dat has a matching <name>.pdb.

Problem

DSSP (mkdssp) not found

Solution

Install mkdssp and verify it is in your PATH.

Problem

XPLOR not found

Solution

Set correct paths in x-pie.cfg or ensure xplor is in your PATH.

Problem

PSF generation fails

Solution

Check that PDB files use standard atom and residue names. AlphaFold models usually work without modification.

Problem

Empty or single-state clusters

Solution

With very few crosslinks, only one interface state may be found. This is expected behavior. Also verify that NZ atoms exist at the specified residue numbers.

Safety warnings

Input file preparation can be time-consuming for large protein systems. Please check the output regularly to ensure no issues have occurred.

Before start

The following software or components must be installed and confirmed to be functioning correctly prior to running X-PIE modeling.
The following component versions have been tested and confirmed to operate normally. Other versions may work but have not been validated.

Python         3.13.2
Biopython      1.85
NumPy          1.26.4
Matplotlib     3.10.3
DSSP (mkdssp)  4.6.1
XPLOR-NIH      3.10

Prepare Input Files

Place the following in your working directory:

    link.dat          # Crosslink data file
    protein_A.pdb     # PDB file for protein A
    protein_B.pdb     # PDB file for protein B

Crosslink data file (link.dat)

Format: whitespace-separated text, one crosslink per line, four columns:

Protein_A  Residue_Number_A  Protein_B  Residue_Number_B

Example:
    actin   328   cofilin   144
    actin   326   cofilin   144
    actin    50   cofilin    22

Note
- Protein names must match PDB filenames (case-sensitive)
- Residue numbers must correspond to crosslinked residues
- Each protein in link.dat must have a corresponding PDB file
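
The format rules above can be checked before launching the pipeline. The following is a minimal sketch, not part of X-PIE itself: the function names `read_links` and `missing_pdbs` are hypothetical, and skipping lines that start with `#` is an assumption about comment handling that the protocol does not state.

```python
from pathlib import Path


def read_links(link_path):
    """Parse link.dat into (protein_a, residue_a, protein_b, residue_b) tuples."""
    links = []
    lines = Path(link_path).read_text().splitlines()
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # Skip blank lines; skipping '#' lines is an assumption, not documented.
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) != 4:
            raise ValueError(f"line {number}: expected 4 columns, got {len(fields)}")
        prot_a, res_a, prot_b, res_b = fields
        links.append((prot_a, int(res_a), prot_b, int(res_b)))
    return links


def missing_pdbs(links, pdb_dir="."):
    """Return protein names without an exactly matching <name>.pdb in pdb_dir."""
    names = {name for a, _, b, _ in links for name in (a, b)}
    # Compare against directory entries so the match stays case-sensitive
    # even on case-insensitive file systems.
    present = {entry.name for entry in Path(pdb_dir).iterdir()}
    return sorted(name for name in names if f"{name}.pdb" not in present)
```

Calling `missing_pdbs(read_links("link.dat"), ".")` returns an empty list when every protein named in link.dat has its PDB file in the working directory.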

PDB Files

- One PDB file per protein named exactly as in link.dat
- Crosslinked residues must contain complete side chains
- Missing residues at non-crosslinked sites are acceptable
- If a crosslinked residue is not a lysine and therefore lacks an NZ atom (for example, when the crosslink is at the protein N-terminus), the program automatically uses the backbone N atom of that residue as the target for the analysis.
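
To see in advance which atom a given crosslinked residue would contribute, the PDB file can be inspected directly. The sketch below mirrors the NZ-then-N rule described above using fixed-column PDB parsing; it is an illustration rather than the program's own code, and `restraint_atom` is a hypothetical helper name.

```python
def restraint_atom(pdb_path, resnum, chain=None):
    """Return "NZ" if the residue has an NZ atom, "N" if it only has a
    backbone N, or None if the residue is absent from the file."""
    atom_names = set()
    with open(pdb_path) as handle:
        for line in handle:
            if not line.startswith(("ATOM", "HETATM")):
                continue
            # PDB fixed columns: atom name 13-16, chain ID 22, residue number 23-26.
            if chain is not None and line[21] != chain:
                continue
            if int(line[22:26]) == resnum:
                atom_names.add(line[12:16].strip())
    if "NZ" in atom_names:
        return "NZ"
    if "N" in atom_names:
        return "N"
    return None
```

A return value of `None` for a residue listed in link.dat indicates that the residue numbering in the PDB file does not match the crosslink data.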

Generate Configuration File

Execute the script once to generate the default configuration file:

python x-pie-modeling.py
Expected Behavior: If x-pie.cfg does not exist, the program creates it and exits immediately with a message requesting you to edit XPLOR paths.

Open x-pie.cfg and set the following required paths:

[paths]
xplor_home = /path/to/xplor/toppar/
xplor_bin  = /path/to/xplor/bin/xplor

Verify these files exist in xplor_home:
/toppar/topallhdg_new.pro
/toppar/parallhdg_new.pro
/toppar/toph11.pep
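
The presence of these three files can be confirmed with a short script. This sketch assumes the files sit directly inside the directory given as xplor_home (the toppar directory); `missing_toppar_files` is a hypothetical helper name, not an X-PIE function.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_TOPPAR = ("topallhdg_new.pro", "parallhdg_new.pro", "toph11.pep")


def missing_toppar_files(xplor_home):
    """Return the required toppar files that are not found in xplor_home."""
    home = Path(xplor_home)
    return [name for name in REQUIRED_TOPPAR if not (home / name).is_file()]
```

An empty list from `missing_toppar_files("/path/to/xplor/toppar/")` means all three files were found.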

Run Crosslink Analysis

Execute Command

python x-pie-modeling.py
This command runs the program in interactive mode. At each step, the program prompts for the relevant input and displays the default value in square brackets.

The program will first prompt you to enter the XPLOR toppar directory and the XPLOR executable path, as follows:

Enter XPLOR toppar directory [/opt/xplor-nih-3.10/toppar]: 
Enter XPLOR executable path [/opt/xplor-nih-3.10/bin/xplor]:

If these have been defined in the preceding steps, they will be displayed as default paths. If they were not defined earlier, you can specify them at this step.

The program will then proceed to Step 0: Checking dependencies.
It automatically verifies that the required software listed above is properly installed. If any dependency is missing, a warning message is displayed and the program exits. If all dependencies are correctly installed, the program prompts for the cross-link data file path and the PDB file directory:

Enter cross-link data file path [./link.dat]:
Enter PDB file directory [.]:
The program will then automatically check whether the PDB files match the information specified in the cross-link data. If no issues are detected, the user will be prompted to enter the cross-linker arm length.

Enter cross-linker arm length (Angstroms) [15]:
The program will then ask whether to consider the flexibility of loop regions containing cross-linked sites. For sampling efficiency, the default parameter is set to not consider loop flexibility.

Enable flexible loop handling for intra-domain cross-link sites? (y/n) [n]:

The program will then execute Step 1: Running xl_analysis.py, which generates the definitions of the protein-protein interaction interfaces and the associated structural flexibility information. It then pauses so that you can review these outputs and, if necessary, modify them to suit your system.

The following describes each output file produced in this step (located in the interface-define/ directory):

Interface clustering:
A text file named actin_cofilin_clusters.txt (the filename contains the names of both cross-linked proteins), with the following contents:

# Protein_A	Site_A	State_A	Protein_B	Site_B	State_B
actin	328	1	cofilin	144	2
actin	326	1	cofilin	144	2
actin	50	1	cofilin	22	1
actin	50	1	cofilin	127	1
Here, actin serves as the anchor protein. The X-PIE analysis indicates that cofilin forms two distinct interaction interfaces with actin, corresponding to the State_B labels 1 and 2, respectively. These two interaction interfaces are associated with their corresponding cross-linked sites. For example, 50-22 and 50-127 correspond to one interaction interface, while 328-144 and 326-144 correspond to another interaction interface.

Note
If modification of the interface definitions is needed, for example, merging multiple interfaces into a single interface or splitting them into additional interfaces, simply edit the corresponding state numbers in the last column.
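
When reviewing or editing the cluster file, it can help to list which crosslinks fall into each interface. The following sketch groups the rows by their state labels; it assumes the six-column, whitespace-separated layout shown above, and `interfaces_from_clusters` is a hypothetical helper name.

```python
from collections import defaultdict


def interfaces_from_clusters(text):
    """Group crosslinks by their (State_A, State_B) labels.

    Returns a dict mapping (state_a, state_b) to a list of (site_a, site_b).
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, comments, blank lines, and malformed rows.
        if len(fields) != 6 or fields[0].startswith("#"):
            continue
        _, site_a, state_a, _, site_b, state_b = fields
        groups[(int(state_a), int(state_b))].append((int(site_a), int(site_b)))
    return dict(groups)
```

Applied to the actin-cofilin example, this yields two groups: sites 328-144 and 326-144 under states (1, 2), and sites 50-22 and 50-127 under states (1, 1).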

Domain boundaries:
The output files are text documents named with the protein name (identical to the PDB filename) followed by the "_domains" suffix. Using the example system, two files will be generated: actin_domains.txt and cofilin_domains.txt. The contents are as follows:

actin:1-375
cofilin:1-166
These lines define the structural domain ranges for each protein. In this example, both proteins are treated as single intact domains, with no flexible linker regions within either protein.

Note
If modification of the domain range is needed, simply edit the corresponding numbers in the file. Multiple distinct domains should be separated by spaces.

Linker/loop regions:
The output files are text documents named with the protein name followed by the "_linker" suffix, defining the amino acid ranges of linker regions. In the actin-cofilin example, since there are no linker regions and each protein is treated as a single domain, the linker files contain only the protein name with an empty amino acid range, as shown below:

actin:
cofilin:

Note
If modification or definition of linker amino acid ranges is needed, simply add the linker range after the colon, for example, 25-35. 
It should be noted that the amino acid ranges defined for linkers must not overlap with those defined in the domain files; each amino acid can belong exclusively to either a linker or a domain. Consequently, when modifying linker ranges, the corresponding domain ranges must be adjusted simultaneously, and vice versa. 
Furthermore, the amino acid ranges defined for linkers, combined with those defined in the domain files, should cover all amino acids present in the protein PDB file. Any undefined amino acids will exhibit incorrect positional distributions during subsequent sampling steps.
There will also be text files in the current directory containing the "loop" suffix. If "Y" was selected in the preceding step "Enable flexible loop handling for intra-domain cross-link sites?", then for cross-linked sites located within unstructured loop regions inside a domain, the amino acid ranges of the loops containing these cross-linked sites (5 residues) will be output. If loop flexibility was not enabled, or if the cross-linked sites are not located in loop regions, the amino acid ranges in these loop files will be empty.
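
The two rules for domain and linker ranges (no overlap, full coverage of the residues in the PDB file) are easy to violate when editing by hand. The sketch below checks both for one protein. It assumes that multiple linker ranges are space-separated in the same way as domain ranges, which the protocol does not state explicitly; `parse_ranges` and `check_partition` are hypothetical helper names.

```python
def parse_ranges(line):
    """Parse 'name:1-100 120-300' into (name, [(1, 100), (120, 300)])."""
    name, _, spec = line.strip().partition(":")
    ranges = []
    for token in spec.split():
        start, end = token.split("-")
        ranges.append((int(start), int(end)))
    return name, ranges


def check_partition(domain_line, linker_line, residues):
    """Return (overlapping, undefined) residue numbers for one protein.

    residues is the collection of residue numbers present in the PDB file.
    """
    _, domains = parse_ranges(domain_line)
    _, linkers = parse_ranges(linker_line)
    seen, overlapping = set(), set()
    for start, end in domains + linkers:
        block = set(range(start, end + 1))
        overlapping |= seen & block  # residues assigned more than once
        seen |= block
    undefined = set(residues) - seen  # residues assigned to nothing
    return sorted(overlapping), sorted(undefined)
```

For the unmodified example, `check_partition("actin:1-375", "actin:", range(1, 376))` returns two empty lists, meaning the definitions are consistent.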

Meanwhile, a PNG image will also be generated in the interface-define/ directory, depicting the domain distribution of the proteins, the connections between cross-linked sites, and the different interaction interfaces.

After completing the modification and confirmation of the above file information, press Enter to proceed to the next step.

Generate Input Files for Sampling

The program will first ask for the number of structures to calculate, which corresponds to the number of repeated Monte Carlo simulated annealing sampling runs.

Enter number of structures to calculate [24]:
The program will then execute Step 2: Running generate-script.py, which generates the restraint file (xlms.tbl) in the input/ folder for use during sampling and creates the sampling script refine.py in the current root directory.
The xlms.tbl file follows the format shown below:

# Cross-linker length: 15.0 A
# Format: assign (segid X and resi N and name nz) (segid Y and resid M and name nz) 10.0 6.0 11.0

assign (segid ALT1 and resi 328  and name nz) (segid BLT2 and resid 144  and name nz) 10.0 6.0 11.0
assign (segid ALT1 and resi 326  and name nz) (segid BLT2 and resid 144  and name nz) 10.0 6.0 11.0
assign (segid ALT1 and resi 50  and name nz) (segid BLT1 and resid 22  and name nz) 10.0 6.0 11.0
assign (segid ALT1 and resi 50  and name nz) (segid BLT1 and resid 127  and name nz) 10.0 6.0 11.0
assign (segid ALT1 and resi 61  and name nz) (segid BLT1 and resid 127  and name nz) 10.0 6.0 11.0
Here, segid ALT1 and BLT1/BLT2 are automatically assigned segment IDs for the two proteins in the complex. Based on the preceding definition that cofilin forms two distinct interaction interfaces with actin, cofilin is therefore assigned two different segment IDs: BLT1 and BLT2.

name nz indicates that the distance restraints are based on NZ atom distances; this can also be manually modified to other atoms if desired.

For the final three numerical columns, the first column minus the second column (i.e., 10 − 6 = 4) corresponds to the lower bound of the distance restraint (4 Å), while the first column plus the third column (i.e., 10 + 11 = 21) corresponds to the upper bound (21 Å).
It should be noted that for any system, the lower bound reflects the sum of van der Waals radii and is therefore fixed. The upper bound, however, is automatically calculated by the program based on the previously defined cross-linker arm length: upper bound = cross-linker arm length + 6 Å.
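
The bound arithmetic described above can be reproduced for any line of xlms.tbl, which is useful after editing restraints by hand. This is a minimal sketch based only on the format shown here; the regular expression and the helper names `restraint_bounds` and `expected_upper_bound` are illustrative and not part of X-PIE.

```python
import re

# Two parenthesised atom selections followed by three numbers: d, d_minus, d_plus.
ASSIGN = re.compile(
    r"^assign\s*\((.+?)\)\s*\((.+?)\)\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)"
)


def restraint_bounds(line):
    """Return (lower, upper) in Angstroms for one assign line, or None."""
    match = ASSIGN.match(line.strip())
    if match is None:
        return None  # comment, blank line, or unrecognised format
    d, d_minus, d_plus = (float(match.group(i)) for i in (3, 4, 5))
    return d - d_minus, d + d_plus


def expected_upper_bound(arm_length, margin=6.0):
    """Upper bound as stated in the text: arm length + 6 Angstroms."""
    return arm_length + margin
```

For the first example restraint, `restraint_bounds` gives a lower bound of 4.0 Å and an upper bound of 21.0 Å, which matches `expected_upper_bound(15.0)`.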

The user may also review and modify the sampling script refine.py to adjust parameters such as the number of structures, file paths, and restraint settings. 

After confirmation, the program will automatically proceed to Step 3: Running pdbpsf-prepare.py, which generates the PDB and PSF files required for sampling. This step typically takes several minutes, depending on system size; larger systems require longer processing times.

Upon completion of this step, PDB and PSF files with "prepare" in their filenames will be generated in the input/ directory. These PDB files can be opened with PyMOL or other visualization software for inspection.

Run Conformational Sampling

After completing and confirming the above steps, the program will ask whether to start sampling. Because sampling can be lengthy, the default is not to start it. If the default is accepted, X-PIE exits and prints the sampling command, as shown below, which can be adjusted and run separately:

/opt/xplor-nih-3.10/bin/xplor -py refine.py -smp 8
The number 8 in the command line represents the number of CPU threads to be used for parallel computation.
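
If the sampling run is launched from a wrapper script rather than typed by hand, the same command can be assembled programmatically. The sketch below only builds the argument list shown above; `sampling_command` is a hypothetical helper name, and the executable path must be replaced with your own installation path.

```python
def sampling_command(xplor_bin, threads=8, script="refine.py"):
    """Build the XPLOR sampling command as an argument list."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [xplor_bin, "-py", script, "-smp", str(threads)]
```

The resulting list can be passed to `subprocess.run(...)` from the directory that contains refine.py.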

If you choose to initiate sampling immediately, the program will also prompt for the number of CPU threads to use. After confirmation, X-PIE will exit and automatically launch XPLOR in the background to perform structural sampling.

Model Selection and Analysis

After sampling is complete, `Calc_*.pdb` files and corresponding `Calc_*.viols` files will be generated in the output/ directory. In the REMARK section at the beginning of each PDB file, various energy values of the structure and the number of crosslink violations are listed. 
In the corresponding viols files, detailed violation information can be inspected; for crosslinks, if violations occur, the calculated distance between the crosslinked sites in the structure and the deviation from the defined distance restraint are displayed.

In the scripts/ folder, we provide the `analyze_structures.py` script for structural analysis. After inputting the path to the structures, the number of structures to be analyzed, and reference PDB structure information (if available), the script automatically calculates the total energy of these structures and ranks them. It selects structures with the lowest energy and no significant atomic clashes, and computes RMSD values reflecting structural convergence. If a reference structure is provided, the script also calculates the RMSD values of these structures relative to the reference structure.

Batch Modeling Using Configuration Files

In addition to the default interactive mode, X-PIE modeling supports batch processing of multiple complexes through configuration file settings. The format of the configuration file is as follows:

[data]
    link_file = path to crosslink file                   (default: link.dat)
    pdb_dir   = directory containing PDBs                (default: ./)

[paths]
    xplor_home = path to XPLOR toppar/ directory         [REQUIRED]
    xplor_bin  = path to XPLOR executable                [REQUIRED]

[parameters]
    linker_length        = crosslinker arm length (Å)    (default: 15)
    num_structures       = number of models to generate  (default: 24)
    num_cores            = CPU cores for XPLOR sampling  (default: 8)
    use_loop_flexibility = enable flexible loop handling (default: false)

[workflow]
    interactive          = false                         (default: true)
    pause_after_analysis = false                         (default: true)
    pause_after_generate = false                         (default: true)
    auto_run_sampling    = true                          (default: false)
    edit_timeout_seconds = auto-continue timeout (s)     (default: 0)

After completing the configuration, execute:

python x-pie-modeling.py --config x-pie.cfg
The program will then complete all steps automatically in the background.
By configuring different link.dat files and different initial PDB files in the config file, continuous batch processing of multiple protein complex modeling jobs can be achieved. 
However, it should be noted that the config-based execution mode does not allow manual intervention for script modification. Therefore, for complex systems, it is recommended to first use the interactive mode to confirm interaction interfaces, protein flexibility, and other relevant information.
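
For many complexes, one configuration file per complex can be generated with a script instead of being written by hand. The sketch below uses Python's standard configparser module with the section and key names listed above. It assumes that x-pie.cfg is a standard INI file that X-PIE reads in this form; `build_config` is a hypothetical helper name, and the paths in the usage note are placeholders.

```python
import configparser


def build_config(link_file, pdb_dir, xplor_home, xplor_bin, **parameters):
    """Return a ConfigParser holding non-interactive settings for one complex.

    Extra keyword arguments override entries in the [parameters] section.
    """
    cfg = configparser.ConfigParser()
    cfg["data"] = {"link_file": link_file, "pdb_dir": pdb_dir}
    cfg["paths"] = {"xplor_home": xplor_home, "xplor_bin": xplor_bin}
    # Defaults as documented in the configuration format above.
    cfg["parameters"] = {
        "linker_length": "15",
        "num_structures": "24",
        "num_cores": "8",
        "use_loop_flexibility": "false",
    }
    cfg["parameters"].update({key: str(value) for key, value in parameters.items()})
    # Workflow values for unattended batch runs.
    cfg["workflow"] = {
        "interactive": "false",
        "pause_after_analysis": "false",
        "pause_after_generate": "false",
        "auto_run_sampling": "true",
        "edit_timeout_seconds": "0",
    }
    return cfg
```

A configuration built this way can be saved with `cfg.write(open_file)` and then passed to `python x-pie-modeling.py --config <file>`, for example once per complex directory in a loop.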


Note
All associated source code, README documentation, and representative input/output files are publicly available on Zenodo (10.5281/zenodo.20523994).