May 26, 2026

A Cost-Optimized 5-Protein Panel Revolutionizes Systemic Lupus Erythematosus Diagnosis

  • Wenhua Lv¹,
  • Zhenwei Shang¹,
  • Chen Sun¹,
  • Yuping Zou¹,
  • Siyu Wei¹,
  • Haiyan Chen¹,
  • Junxian Tao¹,
  • Hongsheng Tian¹,
  • Yu Dong¹,
  • Chen Zhang¹,
  • Mingming Zhang¹,
  • Hongchao Lv¹,
  • Yongshuai Jiang¹
  • ¹College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
Protocol Citation: Wenhua Lv, Zhenwei Shang, Chen Sun, Yuping Zou, Siyu Wei, Haiyan Chen, Junxian Tao, Hongsheng Tian, Yu Dong, Chen Zhang, Mingming Zhang, Hongchao Lv, Yongshuai Jiang 2026. A Cost-Optimized 5-Protein Panel Revolutionizes Systemic Lupus Erythematosus Diagnosis. protocols.io https://dx.doi.org/10.17504/protocols.io.e6nvwwxwzvmk/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: May 25, 2026
Last Modified: May 26, 2026
Protocol Integer ID: 317914
Keywords: SLE, protein risk score, prediction models, cost-optimized, PAF, protein panel revolutionizes systemic lupus erythematosus diagnosis, protein biomarkers for systemic lupus erythematosus, protein biomarker, plasma protein data from the UK Biobank, protein panel, systemic lupus erythematosus, plasma protein data, associated protein, polygenic risk score, diagnostic performance, clinical risk factor, UK Biobank, multiple imputation sensitivity check, LASSO regression
Funders Acknowledgements:
National Natural Science Foundation of China
Grant ID: 31970651
National Natural Science Foundation of China
Grant ID: 92046018
Excellent Youth Support plan of Education Department of Heilongjiang Province
Grant ID: YQJH2023036
Marshal Initiative Funding
Grant ID: HMUMIF-22010
Mathematical Tianyuan Fund of the National Natural Science Foundation of China
Grant ID: 12026414
Fundamental Research Funds for the Provincial Universities in Heilongjiang Province, China
Grant ID: 2024, Wenhua Lv
Abstract
This protocol uses plasma protein data from the UK Biobank (~48,000 individuals, 544 SLE cases) to systematically identify and validate protein biomarkers for systemic lupus erythematosus (SLE). A balanced case-control sampling (BCCS) strategy with 1,000 iterations, combined with LASSO regression, is used to select stable SLE-associated proteins. A protein risk score (ProtRS) is then calculated, and its diagnostic performance is compared with that of a polygenic risk score (PRS) and clinical risk factors. A cost-optimized 5-protein panel (TRIM21, SOD2, KLK3, IL15, ADIPOQ) is identified that achieves an AUC of 0.82 with an ~87% reduction in assay cost relative to the full 35-protein panel. The protocol also includes population attributable fraction (PAF) analysis and multiple-imputation sensitivity checks.
Materials
1. Data
UK Biobank application – plasma protein levels (2,923 proteins), genetic PRS, clinical phenotypes (smoking, sleep duration, insomnia, PM2.5).

2. Software
R (≥4.2) with packages: `data.table`, `dplyr`, `glmnet`, `doParallel`, `foreach`, `pROC`, `caret`, `mice`, `VIM`, `graphPAF`.

3. Hardware
Multi-core computing recommended (parallel LASSO iterations).

4. Code and data availability
(1) All analysis R scripts are publicly available at GitHub: [https://github.com/lvwenhua1989/ProtRS_SLE](https://github.com/lvwenhua1989/ProtRS_SLE) and archived on Zenodo: [https://zenodo.org/records/20132707](https://zenodo.org/records/20132707).
(2) UK Biobank individual-level data cannot be shared; access must be requested via [https://www.ukbiobank.ac.uk/register-apply/](https://www.ukbiobank.ac.uk/register-apply/).
1. Data preparation and preprocessing
1.1 Obtain approved UK Biobank data. Identify SLE cases from ICD-9/ICD-10 codes and self-report, and exclude individuals with other autoimmune diseases.
1.2 Extract control individuals (n = 48,036) from the non-cancer self-reported population, excluding anyone with SLE or related autoimmune conditions.
1.3 For each of the 2,923 plasma proteins, impute missing values with the mean (replace NA with that protein's mean across all samples).
1.4 Center and scale each protein to zero mean and unit variance.
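Steps 1.3 and 1.4 can be sketched in base R as below. This is a minimal illustration, assuming `prot` is a numeric matrix with participants in rows, proteins in columns and `NA` for missing values; the helper name `impute_mean_and_scale` and the toy values are hypothetical.

```r
# Steps 1.3-1.4 sketch. Assumes `prot` is a numeric matrix
# (rows = participants, columns = proteins) with NA for missing values.
impute_mean_and_scale <- function(prot) {
  # Step 1.3: replace each NA with that protein's mean over observed samples
  col_means <- colMeans(prot, na.rm = TRUE)
  na_idx <- which(is.na(prot), arr.ind = TRUE)
  prot[na_idx] <- col_means[na_idx[, "col"]]
  # Step 1.4: center each column to mean 0 and scale to unit SD
  scale(prot, center = TRUE, scale = TRUE)
}

# Hypothetical toy matrix, not UK Biobank data
prot <- matrix(c(1, NA, 3, 4,
                 2, 2, NA, 6), ncol = 2,
               dimnames = list(NULL, c("prot1", "prot2")))
prot_std <- impute_mean_and_scale(prot)
colMeans(prot_std)      # effectively 0 for each protein
apply(prot_std, 2, sd)  # 1 for each protein
```

Because the imputed value equals the column mean, mean imputation leaves each protein's mean unchanged but slightly shrinks its variance before scaling.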
2. Identification of SLE‑associated proteins using LASSO with BCCS (1,000 iterations)
2.1 Set a random seed for reproducibility (e.g., `set.seed(123)`).
2.2 For **iteration = 1 to 1,000**:
(1) Randomly sample from the control pool as many controls as there are SLE cases (n = 544), giving a balanced 1:1 case-control set.
(2) Run LASSO logistic regression (`glmnet`, family="binomial", alpha=1) with disease status as outcome and all 2,923 proteins as predictors.
(3) Select the penalty parameter λ by 10‑fold cross‑validation using `lambda.min` (the λ giving minimum mean cross‑validated error).
(4) Record the non‑zero coefficients and corresponding protein names.
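A minimal sketch of one iteration of Step 2.2 with `glmnet::cv.glmnet`, assuming `X` is the centered and scaled protein matrix (protein names as column names) and `y` is a 0/1 SLE indicator. The helper `bccs_lasso_iteration` and the simulated data are hypothetical; the full analysis would repeat the call 1,000 times, for example in parallel with `foreach`/`doParallel`.

```r
library(glmnet)

# One BCCS + LASSO iteration (sketch). Returns the named non-zero coefficients.
bccs_lasso_iteration <- function(X, y) {
  cases <- which(y == 1)
  # (1) draw as many controls as cases -> balanced 1:1 set
  controls <- sample(which(y == 0), length(cases))
  idx <- c(cases, controls)
  # (2)-(3) LASSO logistic regression, lambda chosen by 10-fold CV
  cv_fit <- cv.glmnet(X[idx, ], y[idx], family = "binomial",
                      alpha = 1, nfolds = 10)
  # (4) coefficients at lambda.min, intercept dropped, zeros removed
  b <- as.matrix(coef(cv_fit, s = "lambda.min"))[-1, 1]
  b[b != 0]
}

# Simulated stand-in data: only prot1 and prot2 carry signal
set.seed(123)
n <- 600
p <- 20
X <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("prot", 1:p)))
y <- rbinom(n, 1, plogis(-2 + 1.5 * X[, 1] - 1.0 * X[, 2]))
coef_list <- lapply(seq_len(3), function(i) bccs_lasso_iteration(X, y))
coef_list[[1]]   # named vector of non-zero coefficients
```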
2.3 After 1,000 iterations, compute each protein's **selection frequency**: the number of iterations in which it had a non-zero coefficient.
2.4 Retain proteins selected in **≥500 iterations (≥50% frequency)** as stable SLE-associated biomarkers; this yields the 35 stable proteins used in Steps 3 and 5.
2.5 For each retained protein, compute the **final coefficient** as the median of its non-zero coefficients across the iterations in which it was selected.
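Steps 2.3 to 2.5 reduce the per-iteration results to a stable protein set. The sketch below assumes `coef_list` holds one named vector of non-zero coefficients per iteration (item (4) of Step 2.2); the function name `summarise_selection` and the four-iteration toy input are hypothetical.

```r
# Steps 2.3-2.5 sketch. Assumes `coef_list` holds one named vector of
# non-zero LASSO coefficients per BCCS iteration.
summarise_selection <- function(coef_list, min_freq = 0.5) {
  n_iter <- length(coef_list)
  all_coefs <- unlist(unname(coef_list))       # names are protein IDs
  # Step 2.3: number of iterations in which each protein was non-zero
  freq <- table(names(all_coefs))
  # Step 2.4: keep proteins selected in >= min_freq of all iterations
  stable <- names(freq)[freq >= min_freq * n_iter]
  # Step 2.5: median of each stable protein's non-zero coefficients
  beta <- vapply(stable,
                 function(p) median(all_coefs[names(all_coefs) == p]),
                 numeric(1), USE.NAMES = FALSE)
  out <- data.frame(protein = stable, freq = as.integer(freq[stable]),
                    beta = beta)
  out <- out[order(-out$freq), ]               # ranking reused in Step 5.1
  rownames(out) <- NULL
  out
}

# Hypothetical output of four iterations
coef_list <- list(c(prot1 = 0.8, prot2 = 0.3),
                  c(prot1 = 0.7, prot3 = -0.2),
                  c(prot1 = 0.9, prot2 = 0.5, prot4 = 0.1),
                  c(prot2 = 0.4))
summarise_selection(coef_list)   # prot1 (beta 0.8) and prot2 (beta 0.4)
```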
3. Calculate Protein Risk Score (ProtRS) and evaluate full 35‑protein model
3.1 For each individual j, compute ProtRS_j = Σ_i (β_i × Prot_ij), summing over the retained proteins i, where β_i is the final coefficient of protein i from Step 2.5 and Prot_ij is the centered and scaled level of protein i in individual j.
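The ProtRS in Step 3.1 is a matrix-vector product. In this sketch, `prot_std` and `beta` are assumed inputs (the standardized protein matrix and the named final coefficients); matching columns by name guards against the two being stored in different orders. The helper `compute_protrs` and all values are illustrative.

```r
# Step 3.1 sketch: ProtRS_j = sum_i beta_i * Prot_ij
compute_protrs <- function(prot_std, beta) {
  # select and order protein columns by coefficient name, then weight and sum
  as.vector(prot_std[, names(beta), drop = FALSE] %*% beta)
}

# Hypothetical values for illustration only
prot_std <- matrix(c( 1.0, -0.5,
                      0.0,  2.0,
                     -1.0,  0.5), ncol = 2, byrow = TRUE,
                   dimnames = list(NULL, c("prot1", "prot2")))
beta <- c(prot2 = 0.4, prot1 = 0.8)   # order differs from the columns
compute_protrs(prot_std, beta)        # 0.6, 0.8, -0.6
```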
3.2 For each of 1,000 BCCS iterations (drawing a new random control subset each time):
(1) Split the balanced set into training (70%) and testing (30%).
(2) Fit a logistic regression model: `SLE ~ ProtRS + age + gender` on the training set.
(3) Predict probabilities on the test set. 
(4) Calculate the AUC, accuracy, precision, sensitivity, specificity and F1 score using `pROC` and `caret`.
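One iteration of Step 3.2 is sketched below with base R in place of `pROC` and `caret`: the AUC uses its Mann-Whitney (rank) form, and the threshold-based metrics use a 0.5 probability cut-off, which is an assumption because the protocol does not state one. The helpers `rank_auc` and `evaluate_split` and the simulated data frame `dat` (columns `SLE`, `ProtRS`, `age`, `gender`) are hypothetical.

```r
# Mann-Whitney form of the AUC; ties receive average ranks
rank_auc <- function(y, p) {
  r <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

evaluate_split <- function(dat, train_frac = 0.7, cutoff = 0.5) {
  # (1) random 70/30 split of the balanced set
  train_idx <- sample(nrow(dat), round(train_frac * nrow(dat)))
  train <- dat[train_idx, ]
  test <- dat[-train_idx, ]
  # (2) logistic model on the training set
  fit <- glm(SLE ~ ProtRS + age + gender, data = train, family = binomial)
  # (3) predicted probabilities on the test set
  p <- predict(fit, newdata = test, type = "response")
  # (4) metrics; the cut-off is an assumption
  y <- test$SLE
  pred <- as.integer(p >= cutoff)
  tp <- sum(pred == 1 & y == 1); tn <- sum(pred == 0 & y == 0)
  fp <- sum(pred == 1 & y == 0); fn <- sum(pred == 0 & y == 1)
  precision <- tp / (tp + fp)
  sensitivity <- tp / (tp + fn)
  c(AUC = rank_auc(y, p),
    accuracy = (tp + tn) / length(y),
    precision = precision,
    sensitivity = sensitivity,
    specificity = tn / (tn + fp),
    F1 = 2 * precision * sensitivity / (precision + sensitivity))
}

# Simulated balanced set: cases' ProtRS shifted up by 1 SD
set.seed(1)
n <- 400
dat <- data.frame(SLE = rep(0:1, each = n / 2),
                  age = round(runif(n, 40, 70)),
                  gender = factor(sample(c("F", "M"), n, replace = TRUE)))
dat$ProtRS <- rnorm(n, mean = dat$SLE)
evaluate_split(dat)
```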
3.3 Aggregate each metric across the 1,000 iterations: report the median and the 95% uncertainty interval (UI; 2.5th and 97.5th percentiles).
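The aggregation in Step 3.3 amounts to a median and two empirical percentiles per metric. This sketch assumes `metrics` is a matrix with one row per iteration and one named column per metric; the helper `summarise_metrics` is hypothetical and the uniform toy values stand in for real iteration results.

```r
# Step 3.3 sketch: median and 95% UI for each metric column
summarise_metrics <- function(metrics) {
  t(apply(metrics, 2, function(v) c(median = median(v),
                                    lower = unname(quantile(v, 0.025)),
                                    upper = unname(quantile(v, 0.975)))))
}

set.seed(3)
metrics <- cbind(AUC = runif(1000, 0.70, 0.90),
                 accuracy = runif(1000, 0.60, 0.80))
summarise_metrics(metrics)   # one row per metric: median, lower, upper
```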
4. Compare ProtRS with PRS and clinical risk factors
4.1 Extract the standard PRS for SLE.
4.2 Using the same 1,000 BCCS iterations as in Step 3, build the following logistic regression models:
(1) Model PRS: `SLE ~ PRS + age + gender`.
(2) Model clinical: `SLE ~ smoking + sleep_duration + insomnia + PM2.5 + age + gender`. 
(3) Categorise the continuous variables as described in Methods: PM2.5, PRS and ProtRS by quartiles; sleep duration into <7 h, 7-8 h and >9 h.
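The quartile coding can be done with `quantile()` and `cut()`. This sketch assumes the cut points are taken from the sample being analysed (the protocol does not say whether they come from the full cohort or from each BCCS subset) and that the variable has no heavy ties, since duplicated quantiles make `cut()` fail. The helper `to_quartiles` and the simulated data are hypothetical; the model mirrors Model PRS with a categorised PRS.

```r
# Quartile coding sketch; Q1 is the lowest quartile
to_quartiles <- function(x) {
  q <- quantile(x, probs = seq(0, 1, 0.25), na.rm = TRUE)
  cut(x, breaks = q, include.lowest = TRUE,
      labels = c("Q1", "Q2", "Q3", "Q4"))
}

set.seed(7)
dat <- data.frame(SLE = rbinom(200, 1, 0.5), PRS = rnorm(200),
                  age = round(runif(200, 40, 70)),
                  gender = factor(sample(c("F", "M"), 200, replace = TRUE)))
dat$PRS_q <- to_quartiles(dat$PRS)
table(dat$PRS_q)   # 50 per quartile for 200 distinct values
# Q1 is the first factor level, so glm() uses it as the reference category
fit <- glm(SLE ~ PRS_q + age + gender, data = dat, family = binomial)
```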
4.3 Evaluate the performance metrics for each model as in Step 3, and compare the models using bar plots.
5. Cost-optimized incremental protein selection
5.1 Rank the 35 stable proteins by **descending selection frequency** (from Step 2.3).
5.2 For k = 1 to 35:
(1) Take the top k proteins.
(2) Compute ProtRS_k as the weighted sum of their levels using their median coefficients (from Step 2.5).
(3) Repeat the 1,000-iteration BCCS evaluation (Step 3.2) for this k-protein ProtRS.
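The incremental scores of Step 5.2 can be generated in one pass once the stable proteins are ranked. The sketch assumes `stable` is a data frame with columns `protein`, `freq` and `beta` (selection frequency and median coefficient from Steps 2.3 to 2.5) and that `prot_std` is the standardized protein matrix; each returned column is ProtRS_k, which would then be passed to the Step 3.2 evaluation. The helper `incremental_scores` and all values are hypothetical.

```r
# Step 5.2 sketch: column k holds ProtRS_k built from the top-k proteins
incremental_scores <- function(prot_std, stable) {
  ranked <- stable[order(-stable$freq), ]              # Step 5.1 ranking
  scores <- vapply(seq_len(nrow(ranked)), function(k) {
    top <- ranked[seq_len(k), ]                        # top-k proteins
    as.vector(prot_std[, top$protein, drop = FALSE] %*% top$beta)
  }, numeric(nrow(prot_std)))
  colnames(scores) <- paste0("k", seq_len(nrow(ranked)))
  scores
}

# Hypothetical inputs (two participants, three stable proteins)
stable <- data.frame(protein = c("prot2", "prot1", "prot3"),
                     freq = c(700, 950, 600),
                     beta = c(0.4, 0.8, -0.3))
prot_std <- matrix(c(1, 1, 1,
                     0, 2, -1), nrow = 2, byrow = TRUE,
                   dimnames = list(NULL, c("prot1", "prot2", "prot3")))
incremental_scores(prot_std, stable)
```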
5.3 Plot the performance metrics (AUC, accuracy, etc.) against the number of proteins.
5.4 Identify the “cost-effective optimum”, the point beyond which adding more proteins yields only minimal gains. The top 5 proteins (TRIM21, SOD2, KLK3, IL15, ADIPOQ) achieve an AUC of 0.82 while reducing assay costs by ~87% compared with the full 35-protein panel.
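The protocol does not give a formal rule for the cost-effective optimum. One possible rule, shown purely as an assumption, takes the smallest k whose median AUC lies within a tolerance `tol` of the best median AUC; the helper `pick_optimum`, the rule itself and `tol = 0.01` are hypothetical, and the AUC values are made up.

```r
# Hypothetical stopping rule: smallest k within `tol` of the best median AUC
pick_optimum <- function(median_auc, tol = 0.01) {
  min(which(median_auc >= max(median_auc) - tol))
}

auc_by_k <- c(0.60, 0.68, 0.72, 0.74, 0.745, 0.748)   # made-up values
pick_optimum(auc_by_k)   # 4
```

In practice the choice would also weigh the per-protein assay cost, which this rule ignores.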
6. Population attributable fraction (PAF) calculation
6.1 For each risk factor (ProtRS, PRS, insomnia, sleep duration, PM2.5, smoking), discretise the variable as described in Step 4 (e.g., ProtRS by quartiles, with the lowest quartile as the reference).
6.2 Using the function `PAF_calc_discrete()` from the `graphPAF` R package, compute the PAF as:
PAF = 1 − Σ_i (p_i / RR_i)
where p_i is the proportion of cases in exposure level i and RR_i is the relative risk of level i compared with the reference (optimal) level, estimated from logistic regression (RR = 1 for the reference level).
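To make the arithmetic of the formula concrete, the sketch below computes it by hand for one discretised exposure, approximating RR_i by the odds ratio from an unadjusted logistic model. It does not reproduce `graphPAF`'s implementation and ignores covariate adjustment; the helper `paf_bruzzi` and the toy counts are hypothetical.

```r
# Hand-rolled PAF = 1 - sum_i p_i / RR_i, with p_i = share of cases in level i
# and RR_i approximated by the odds ratio versus the first (reference) level
paf_bruzzi <- function(y, exposure) {
  exposure <- factor(exposure)                   # first level = reference
  fit <- glm(y ~ exposure, family = binomial)
  rr <- c(1, exp(coef(fit)[-1]))                 # RR_ref = 1, ORs for others
  p_cases <- as.vector(table(exposure[y == 1])) / sum(y == 1)
  1 - sum(p_cases / rr)
}

# Toy data: "low" (reference) has 20 cases and 80 controls,
# "high" has 80 cases and 120 controls, so OR = (80/120) / (20/80) = 8/3
y <- c(rep(1, 20), rep(0, 80), rep(1, 80), rep(0, 120))
expo <- factor(c(rep("low", 100), rep("high", 200)),
               levels = c("low", "high"))
paf_bruzzi(y, expo)   # approximately 0.5
```

Here 20% of cases sit at the reference level, so PAF = 1 − (0.2/1 + 0.8/(8/3)) = 0.5.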
6.3 Repeat for each of the 1,000 BCCS iterations to obtain the median PAF and its 95% UI.
7. Sensitivity analyses
7.1 Alternative imputation methods:
(1) Replace the mean imputation of Step 1.3 with multiple imputation by chained equations (`mice`), median imputation, or k-nearest-neighbour imputation (k = 5, `VIM::kNN`).
(2) Recompute the ProtRS and model performance (AUC).
(3) Verify that each recomputed ProtRS correlates highly with the original ProtRS (Pearson r > 0.96).
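The correlation check in Step 7.1 is sketched below for median imputation, one of the listed alternatives; `mice` and `VIM::kNN` would replace the imputation step but are not shown. The helper `impute_and_score`, the simulated skewed data with 10% missingness and the coefficients are all hypothetical.

```r
# Step 7.1 sketch: rescore under an alternative imputation and compare
impute_and_score <- function(prot, beta, fill = c("mean", "median")) {
  fill <- match.arg(fill)
  f <- if (fill == "mean") mean else median
  for (j in seq_len(ncol(prot))) {
    miss <- is.na(prot[, j])
    prot[miss, j] <- f(prot[!miss, j])   # column-wise single imputation
  }
  z <- scale(prot)                       # Step 1.4 standardization
  as.vector(z[, names(beta), drop = FALSE] %*% beta)   # ProtRS
}

set.seed(2026)
prot <- matrix(rexp(500 * 3), ncol = 3,
               dimnames = list(NULL, paste0("prot", 1:3)))
prot[sample(length(prot), 150)] <- NA    # 10% of entries missing
beta <- c(prot1 = 0.8, prot2 = 0.4, prot3 = -0.3)
r <- cor(impute_and_score(prot, beta, "mean"),
         impute_and_score(prot, beta, "median"), method = "pearson")
r > 0.96   # the protocol's agreement criterion
```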
7.2 Stability with more iterations:
(1) Repeat Step 2 with **10,000 BCCS iterations** for LASSO, retaining proteins selected in ≥50% of iterations.
(2) Compare the overlap with the 1,000-iteration protein set using a Venn diagram (Figure S5); concordance is expected to exceed 97%.
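The protocol does not define how concordance is computed. The sketch below assumes it is the share of 1,000-iteration proteins that are also retained with 10,000 iterations, and reports the Jaccard index alongside; the helper `overlap_summary` and the protein sets are hypothetical.

```r
# Step 7.2 (2) sketch; "concordance" is an assumed definition
overlap_summary <- function(set_1k, set_10k) {
  shared <- intersect(set_1k, set_10k)
  c(shared = length(shared),
    only_1k = length(setdiff(set_1k, set_10k)),
    only_10k = length(setdiff(set_10k, set_1k)),
    concordance = length(shared) / length(set_1k),
    jaccard = length(shared) / length(union(set_1k, set_10k)))
}

overlap_summary(paste0("prot", 1:4), paste0("prot", 1:5))
```

The three count entries are exactly the regions a two-set Venn diagram displays.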

Acknowledgements
This research has been conducted using the UK Biobank Resource under Application Number 89695. The work was supported by grants listed in the manuscript.