Peer Review Auditing and Evaluation Protocol

Paola Di Maio

Apr 24, 2026

DOI

https://dx.doi.org/10.17504/protocols.io.36wgqxzwylk5/v1

Paola Di Maio¹,²,³

¹CSKRNS;
²Ronin;
³Center for Systems, Knowledge Representation and Neuroscience

Non Invasive Neuromodulation

Protocol Citation: Paola Di Maio 2026. Peer Review Auditing and Evaluation Protocol. protocols.io https://dx.doi.org/10.17504/protocols.io.36wgqxzwylk5/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Protocol status: Working

We use this protocol and it's working

Created: April 23, 2026

Last Modified: April 24, 2026

Protocol Integer ID: 315590

Keywords: peer review, evaluation, research integrity, open audit methodology for academic process, peer review auditing, open audit methodology, structured audit methodology for the evaluation, structured audit methodology, auditing, documents the auditing, audit layer, review management platform, existing review management platform, evaluation protocol, preserving reviewer anonymity, protocol ensures process integrity, integrity, reviewer anonymity, academic process, journal, academic decision process, fairness of academic decision process, review, structured metadata, editorial manager, protocol, document, oamap

Disclaimer

Use your discretion at all times

Abstract

This protocol specifies a structured audit methodology for evaluating the integrity and fairness of academic decision processes -- peer review (journal and conference), grant funding allocation, and career progression decisions. The protocol ensures process integrity, fairness, and transparency while preserving reviewer anonymity. It is designed to be integrated with existing review management platforms (OpenReview, OJS, EasyChair, Editorial Manager) with minimal modification, as well as with a standalone model card, which standardizes and documents the auditing and evaluation process (https://rpmc-seven.vercel.app/).

The protocol operationalizes the OAMAP (Open Audit Methodology for Academic Processes) framework through five audit layers, each generating structured metadata that can be independently verified.

Guidelines

Beta version: this protocol requires testing and evaluation. Evaluators are invited to give feedback to the author.
Purpose
This protocol enables venues (journals, conferences, funding bodies) to produce auditable records of their assessment processes. The goal is to make unfairness detectable without exposing reviewer identities -- distinguishing anonymity (which protects reviewers) from opacity (which shields processes from scrutiny).
Scope
The protocol applies to:
Journal peer review (submission to decision)
Conference peer review (submission to decision)
Grant funding review (application to decision)
Promotion and hiring decisions (application to decision)
Prerequisites
Access to the venue's submission management system
Authority to configure audit logging (typically: editor-in-chief, programme chair, or grants officer)
The OAMAP Review Process Model Card Generator (available at: https://rpmc-seven.vercel.app/)
Digital Forensics Model Card Generator for AI tool documentation (available at: https://huggingface.co/spaces/STARBORN/forensics_mc_generator)
Key Principle: Anonymity vs Opacity
This protocol preserves ANONYMITY: reviewer identities are never exposed in audit records. All reviewer-level data uses pseudonymous identifiers (Reviewer-A, Reviewer-B).
This protocol eliminates OPACITY: every procedural step is timestamped and recorded. Process irregularities become detectable by authorised parties.

Materials

Submission data
Tools for documentation: https://rpmc-seven.vercel.app/

Safety warnings

Be ready to defend the right to transparent decision making in the peer review process.

Step 1: Publish Evaluation Criteria
Action: Publish the complete evaluation criteria that will be used to assess submissions. Make these publicly accessible before the submission deadline.
Record: URL of published criteria document; date of publication; version number.
Rationale: Reviews must demonstrably map to published criteria. A review that rejects on grounds not in the published criteria creates an audit flag. This does not mean the decision is wrong, but the deviation is recorded.
Expected Output: A publicly accessible document listing all evaluation dimensions, scoring scales, and acceptance thresholds.

Step 2: Register AI Tools (Layer 0 – Tool Documentation)
Action: For every AI tool that will be used in the assessment pipeline (LLM detectors, plagiarism checkers, statistical screeners, automated formatters, scope classifiers), complete a Digital Forensics Model Card (DF-MC) using the generator at https://huggingface.co/spaces/STARBORN/forensics_mc_generator.
Record for each tool:
Tool name and version
Forensic domain classification
Reasoning type (deductive, inductive, abductive, hybrid)
Known bias types and causes
Known error types and causes (especially false positive rates)
Which forensic process stages the tool participates in
Whether human review is required before acting on outputs
Known limitations (language coverage, text length sensitivity, adversarial vulnerabilities)
Store: Save the JSON output in the venue’s DF-MC Registry (a designated folder or database accessible to authorised auditors).
Rationale: Every AI tool used in assessment must be documented before it processes any submission. This is analogous to the tool validation requirements in ISO/IEC 17025 for forensic laboratories.
Expected Output: One DF-MC JSON file per AI tool, stored in the venue’s registry.
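As a rough illustration of what a registry entry could look like, the Python sketch below serialises one tool record to JSON. The class `ToolCard`, its field names, and the example tool are hypothetical; the actual DF-MC schema is whatever the generator emits and is not reproduced here.

```python
import json
from dataclasses import asdict, dataclass, field

# Reasoning types listed in Step 2.
REASONING_TYPES = {"deductive", "inductive", "abductive", "hybrid"}


@dataclass
class ToolCard:
    """Hypothetical subset of a DF-MC record (field names are assumptions)."""
    tool_name: str
    version: str
    forensic_domain: str
    reasoning_type: str
    known_biases: list = field(default_factory=list)
    known_errors: list = field(default_factory=list)
    process_stages: list = field(default_factory=list)
    human_review_required: bool = True
    limitations: list = field(default_factory=list)


def to_registry_json(card: ToolCard) -> str:
    """Validate the reasoning type and return the card as a JSON string."""
    if card.reasoning_type not in REASONING_TYPES:
        raise ValueError(f"unknown reasoning type: {card.reasoning_type}")
    return json.dumps(asdict(card), indent=2, sort_keys=True)


# Illustrative values only; this is not a real tool.
card_json = to_registry_json(ToolCard(
    tool_name="example-llm-detector",
    version="0.0.1",
    forensic_domain="authorship analysis",
    reasoning_type="inductive",
    known_errors=["false positives on short texts"],
))
```

Sorting the keys keeps successive versions of a card easy to compare in the registry.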

Step 3: Configure Conflict of Interest Policy
Action: Document the conflict of interest (COI) criteria that will be applied during reviewer assignment. Specify:
Co-authorship lookback period (e.g., 3 years, 5 years)
Institutional affiliation conflicts
Funding source conflicts
Personal relationship declarations
Whether COI checks are automated, manual, or both
Record: COI policy document; date of adoption; method of enforcement.
Expected Output: A COI policy document linked to the venue’s submission system.
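Where COI checks are automated, the criteria above could be applied along the lines of the sketch below. It is only an illustration: the record layout, the reason codes, the 3-year default, and the rule that a shared funder counts as a funding conflict are all assumptions that a venue would replace with its own policy.

```python
from datetime import date

# Assumed policy value; each venue states its own lookback period.
LOOKBACK_YEARS = 3


def years_before(day: date, years: int) -> date:
    """Return the same calendar day `years` earlier (29 Feb falls back to 28 Feb)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def coi_reasons(reviewer: dict, author: dict, submitted: date,
                lookback_years: int = LOOKBACK_YEARS) -> list:
    """Return coded COI reasons for one reviewer-author pair (empty list: none found)."""
    reasons = []
    cutoff = years_before(submitted, lookback_years)
    # Date of the most recent joint paper with this author, if any.
    last_joint = reviewer.get("last_joint_paper", {}).get(author["id"])
    if last_joint is not None and last_joint >= cutoff:
        reasons.append("co-authorship")
    if reviewer["institution"] == author["institution"]:
        reasons.append("institutional-affiliation")
    if set(reviewer.get("funders", [])) & set(author.get("funders", [])):
        reasons.append("funding-source")
    return reasons
```

Personal relationship declarations are left out because they depend on self-reporting rather than on data the system can compare.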

Step 4: Configure Appeal Pathway
Action: Document the appeal process, including:
Eligibility criteria for appeals
Timeline for filing an appeal
How appeal reviewers are assigned (must not include original decision-makers)
How appeal outcomes are communicated
Whether aggregate appeal statistics will be published
Record: Appeal policy document; date of adoption.
Expected Output: A published appeal policy accessible to submitters.

Step 5: Receive and Timestamp Submission
Action: Upon receipt of each submission, the system generates:
A unique submission identifier
A cryptographic timestamp (ISO 8601 format)
A completeness check confirmation
An anonymisation verification (for double-blind venues)
Record: All Layer 1 metadata is accessible to the submitting author in real time.
Audit Rule: No submission may be silently discarded, delayed, or rerouted without a timestamped record.
Expected Output: A Layer 1 audit record per submission.
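A minimal sketch of the intake record follows. The field names are hypothetical. Note also that a clock reading in ISO 8601 format is not by itself a cryptographic timestamp; a venue that needs tamper evidence would have to add a signed timestamp from a trusted service, which this sketch does not attempt.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def layer1_record(complete: bool, anonymised: Optional[bool] = None) -> dict:
    """Build a Layer 1 record; `anonymised` stays None for venues that are not double-blind."""
    return {
        "submission_id": str(uuid.uuid4()),  # random unique identifier
        "received_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "completeness_check": complete,
        "anonymisation_verified": anonymised,
    }


record = layer1_record(complete=True, anonymised=True)
```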

Step 6: Run AI Tools and Log Invocations
Action: For each AI tool invoked on the submission, log:
Tool identifier and version (linking to the DF-MC in the registry)
Timestamp of invocation
Input hash (NOT the submission content – a cryptographic hash preserving confidentiality)
Output classification (e.g., “human-authored”, “AI-flagged”, “clear”, “suspicious”)
Confidence score
Decision threshold applied
Whether the output was reviewed by a human before any action was taken
Record: AI tool invocation log, linked to the submission identifier.
Audit Rule: Every AI tool invocation must be logged. If a tool is invoked without logging (detected via reconciliation between tool invocation counts and audit event counts), the gap is flagged.
Expected Output: One audit event per AI tool invocation per submission.
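The logging and reconciliation rule can be sketched as below. This is one possible reading, not a prescribed implementation: the event fields are hypothetical, and HMAC-SHA256 keyed with a venue-held secret is assumed here as the salted hash.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def input_hash(content: bytes, venue_salt: bytes) -> str:
    """Keyed SHA-256 digest of the submission bytes; the content itself is never logged."""
    return hmac.new(venue_salt, content, hashlib.sha256).hexdigest()


def log_invocation(log: list, submission_id: str, tool_id: str, tool_version: str,
                   content: bytes, venue_salt: bytes, output: str,
                   confidence: float, threshold: float, human_reviewed: bool) -> dict:
    """Append one audit event for one tool invocation and return it."""
    event = {
        "submission_id": submission_id,
        "tool_id": tool_id,  # links to the DF-MC in the registry
        "tool_version": tool_version,
        "invoked_at": datetime.now(timezone.utc).isoformat(),
        "input_hash": input_hash(content, venue_salt),
        "output": output,
        "confidence": confidence,
        "threshold": threshold,
        "human_reviewed": human_reviewed,
    }
    log.append(event)
    return event


def unlogged_invocations(tool_reported_count: int, log: list, tool_id: str) -> int:
    """Reconciliation: invocations the tool reports minus those present in the audit log."""
    logged = sum(1 for event in log if event["tool_id"] == tool_id)
    return max(tool_reported_count - logged, 0)
```

A non-zero result from `unlogged_invocations` is the gap that the audit rule asks to be flagged.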

Step 7: Record Desk Rejection (If Applicable)
Action: If the submission is desk-rejected, record:
Timestamp of desk rejection decision
Coded reason from a published taxonomy (e.g., “out-of-scope”, “formatting-non-compliant”, “duplicate-submission”, “integrity-flag”, “capacity-constraint”)
Whether any AI tool output contributed to the desk rejection decision
Identity of the human decision-maker (pseudonymised in the audit trail)
Record: Desk rejection record linked to the submission identifier.
Audit Rule: Desk rejections without a coded reason create an audit flag.
Expected Output: A Layer 1 desk rejection record (if applicable).
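The audit rule for desk rejections could be checked as in the sketch below. The taxonomy reuses the example codes from this step; the record key `reason_code` and the flag strings are hypothetical.

```python
# Example codes from Step 7; a venue would publish its own taxonomy.
DESK_REJECTION_CODES = {
    "out-of-scope",
    "formatting-non-compliant",
    "duplicate-submission",
    "integrity-flag",
    "capacity-constraint",
}


def desk_rejection_flags(record: dict) -> list:
    """Return audit flags for one desk rejection record (empty list: no flags)."""
    reason = record.get("reason_code")
    if reason is None:
        return ["missing-coded-reason"]
    if reason not in DESK_REJECTION_CODES:
        return ["reason-not-in-taxonomy"]
    return []
```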

Step 8: Assign Reviewers and Log Assignment Metadata
Action: Assign reviewers and record:
Number of reviewers assigned
Timestamp of assignment
Confirmation that COI checks were performed per the venue’s stated policy
Method of COI check (automated, manual, or both)
Any reassignments with coded reasons
Record: Layer 2 metadata. Authors see process metadata (number of reviewers, timeline) but NOT reviewer identities.
Audit Rule: The number of reviewers assigned must meet the venue’s stated minimum. If fewer reviewers are assigned, a coded reason is required.
Expected Output: A Layer 2 audit record per submission.
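A possible check of the assignment metadata against the audit rule is sketched below. The record keys and flag strings are hypothetical, and the minimum number of reviewers is passed in because it is set by each venue.

```python
def assignment_flags(record: dict, minimum_reviewers: int) -> list:
    """Return audit flags for one Layer 2 assignment record (empty list: no flags)."""
    flags = []
    too_few = record["reviewers_assigned"] < minimum_reviewers
    if too_few and not record.get("shortfall_reason_code"):
        flags.append("below-minimum-without-reason")
    if not record.get("coi_check_performed", False):
        flags.append("coi-check-not-confirmed")
    # Every reassignment needs its own coded reason.
    for reassignment in record.get("reassignments", []):
        if not reassignment.get("reason_code"):
            flags.append("reassignment-without-reason")
    return flags
```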

Step 9: Monitor Review Completion
Action: For each assigned reviewer, record:
Timestamp of assignment acceptance
Timestamp of review submission
Completeness check: does the review address all required evaluation criteria? (This is a structural check, not a content assessment)
Word count (as a rough proxy for engagement)
Whether actionable feedback is provided
Whether the review engages with the methodology
Timeline compliance: was the review submitted within the venue’s stated deadline?
Record: Layer 3 metadata, pseudonymised (Reviewer-A, Reviewer-B, etc.).
Audit Rule: A review that does not address all published evaluation criteria creates an audit flag. A review submitted after the deadline also creates an audit flag (with a coded reason if an extension was granted).
Expected Output: A Layer 3 audit record per review, pseudonymised.
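The two structural checks in the audit rule, plus the word count, might be computed as follows. The record keys, criteria names, and flag strings are hypothetical; whether feedback is actionable or engages with the methodology is a human judgement and is not automated here.

```python
from datetime import datetime


def word_count(text: str) -> int:
    """Whitespace-separated word count, used only as a rough proxy for engagement."""
    return len(text.split())


def review_flags(review: dict, published_criteria: set, deadline: datetime) -> list:
    """Return audit flags for one pseudonymised review (empty list: no flags)."""
    flags = []
    missing = published_criteria - set(review["criteria_addressed"])
    if missing:
        flags.append("criteria-not-addressed:" + ",".join(sorted(missing)))
    if review["submitted_at"] > deadline:
        # A late review is flagged either way; the coded reason travels with the flag.
        reason = review.get("extension_reason_code") or "no-coded-reason"
        flags.append("late:" + reason)
    return flags
```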

Step 10: Record the Decision
Action: Record the editorial, panel, or committee decision:
Timestamp of decision
Decision outcome (accept, minor-revision, major-revision, reject, desk-reject, withdrawn)
How individual scores mapped to the final decision
Whether the decision was consistent with the aggregate reviewer recommendation
If the decision overrode the aggregate recommendation: a coded justification category (editorial-judgment, scope-mismatch, integrity-concern, capacity-constraint, other)
Any use of editorial override
Whether minority reviewer opinions were noted
Record: Layer 4 metadata.
Audit Rule: A decision that contradicts the aggregate reviewer recommendation without a coded justification creates an audit flag.
Expected Output: A Layer 4 decision record per submission.
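The consistency check can be sketched as below. The protocol does not say how the aggregate recommendation is computed, so a strict majority of reviewer recommendations is assumed here, with ties treated as no aggregate. The justification codes are those listed in this step; the flag string is hypothetical.

```python
from collections import Counter

JUSTIFICATION_CODES = {
    "editorial-judgment",
    "scope-mismatch",
    "integrity-concern",
    "capacity-constraint",
    "other",
}


def aggregate_recommendation(recommendations: list):
    """Most common recommendation, or None if there is no single leader (assumed rule)."""
    counts = Counter(recommendations).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def decision_flags(decision: str, recommendations: list, justification_code=None) -> list:
    """Flag a decision that overrides the aggregate without a coded justification."""
    aggregate = aggregate_recommendation(recommendations)
    overrode = aggregate is not None and decision != aggregate
    if overrode and justification_code not in JUSTIFICATION_CODES:
        return ["override-without-coded-justification"]
    return []
```

Venues that aggregate numeric scores instead of categorical recommendations would replace `aggregate_recommendation` with their own rule.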

Step 11: Collect Aggregated Demographics
Action: At the decision stage, compile aggregated demographic data for fairness analysis:
Acceptance rate by author region
Acceptance rate by career stage
Acceptance rate by institution type
Acceptance rate by language background
Desk rejection rate by the same categories
AI tool flag rate by the same categories
Record: Aggregated statistics only – never individual-level data. Apply differential privacy techniques where sample sizes are small.
Rationale: This enables detection of systematic bias (e.g., are non-native English speakers being desk-rejected at higher rates after AI tool flagging?).
Expected Output: Aggregated fairness statistics, published as part of the venue’s annual report.
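A simple way to compute one of these rates is sketched below. It uses small-cell suppression (groups below a minimum size are not reported), which is a simpler stand-in and not a differential privacy mechanism. The threshold of 10 and the row layout are assumptions.

```python
from collections import defaultdict

# Assumed threshold; groups smaller than this are suppressed.
MIN_CELL = 10


def acceptance_rates(rows: list, category: str, min_cell: int = MIN_CELL) -> dict:
    """Acceptance rate per group for one category; None marks a suppressed group."""
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for row in rows:
        group = row[category]
        totals[group] += 1
        if row["accepted"]:
            accepted[group] += 1
    return {
        group: (accepted[group] / totals[group] if totals[group] >= min_cell else None)
        for group in totals
    }
```

The same function can be reused for desk rejection and AI tool flag rates by swapping the boolean field it counts.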

PHASE 6: POST-DECISION (Layer 5 – Accountability)

Step 12: Process Appeals
Action: If a decision is appealed:
Record the appeal filing timestamp
Confirm that the appeal reviewer was not involved in the original decision
Record the appeal outcome with a coded reason
Record the timeline from filing to resolution
Record: Layer 5 metadata.
Audit Rule: An appeal assigned to someone involved in the original decision creates a critical audit flag.
Expected Output: A Layer 5 appeal record (if applicable).
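The independence check and the resolution timeline might look like the sketch below. It compares pseudonymous identifiers, so it assumes the same person carries the same pseudonym in the decision and appeal records; the flag string is hypothetical.

```python
from datetime import date


def appeal_flags(appeal_reviewers: set, original_decision_makers: set) -> list:
    """Raise a critical flag if anyone handled both the original decision and the appeal."""
    if appeal_reviewers & original_decision_makers:
        return ["critical:appeal-reviewer-involved-in-original-decision"]
    return []


def resolution_days(filed_on: date, resolved_on: date) -> int:
    """Number of days from appeal filing to resolution."""
    return (resolved_on - filed_on).days
```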

Step 13: Generate the Review Process Model Card
Action: For each submission that has completed the review cycle, generate a Review Process Model Card using the OAMAP generator. This card summarises Layers 0-5 in a single, structured, machine-readable document.
Record: The generated JSON model card, stored in the venue’s audit archive.
Tool: Use the Review Process Model Card Generator at https://rpmc-seven.vercel.app/
Expected Output: One Review Process Model Card (JSON) per submission.

Step 14: Produce Venue-Level Compliance Report
Action: At the end of each review cycle (issue, conference, funding round), produce a venue-level OAMAP compliance report:
Total submissions received
Desk rejection rate with breakdown by coded reason
Average number of reviewers per submission
Timeline adherence rate (percentage of reviews completed within deadline)
Criteria coverage rate (percentage of reviews addressing all published criteria)
Decision-score consistency rate
Override rate with breakdown by coded reason
Appeal rate and outcomes
Aggregated fairness statistics (by region, career stage, institution type, language background)
AI tool invocation summary (tools used, flag rates, false positive estimates)
Record: Compliance report, published or made available to authorised auditors.
Expected Output: A venue-level OAMAP compliance report per review cycle.
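Several of the report figures can be derived directly from the per-submission records, as in the sketch below. The record layout is hypothetical, and the average number of reviewers is taken over submissions that actually went to review, which is an assumption.

```python
from collections import Counter


def rate(numerator: int, denominator: int):
    """Return a proportion, or None when there is nothing to divide by."""
    return numerator / denominator if denominator else None


def compliance_report(submissions: list) -> dict:
    """Summarise one review cycle from per-submission audit records."""
    desk = [s for s in submissions if s.get("desk_rejection_code")]
    reviewed = [s for s in submissions if s.get("reviews")]
    reviews = [r for s in reviewed for r in s["reviews"]]
    return {
        "total_submissions": len(submissions),
        "desk_rejection_rate": rate(len(desk), len(submissions)),
        "desk_rejections_by_code": dict(Counter(s["desk_rejection_code"] for s in desk)),
        "avg_reviewers_per_submission": rate(len(reviews), len(reviewed)),
        "timeline_adherence_rate": rate(sum(r["on_time"] for r in reviews), len(reviews)),
        "criteria_coverage_rate": rate(
            sum(r["covers_all_criteria"] for r in reviews), len(reviews)
        ),
    }
```

Override, appeal, fairness, and AI tool figures would be added in the same way from the Layer 4, Layer 5, and invocation records.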

Retention
Layers 1-3 audit records: retain for a minimum of 5 years
Layer 4 decision provenance: retain for 7 years
Layer 5 appeal records: retain for duration of active challenge plus 2 years
Aggregated fairness statistics: retain indefinitely (no individual data)

Access Control
Authors: Layer 1 data for their own submissions only
Editors: Layers 1-4 for submissions within their editorial scope
Independent auditors: Layers 1-5 for challenged decisions under formal audit mandate
Public: Aggregated venue-level compliance reports only
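The access rules above can be expressed as a small lookup plus a condition per role, as sketched below. The role names and keyword arguments are hypothetical, and the sketch follows the list literally: the public role has no layer-level access because it sees only the aggregated reports.

```python
# Layers each role may read, taken from the Access Control list.
LAYERS_BY_ROLE = {
    "author": {1},
    "editor": {1, 2, 3, 4},
    "auditor": {1, 2, 3, 4, 5},
    "public": set(),
}


def can_read(role: str, layer: int, *, own_submission: bool = False,
             in_editorial_scope: bool = False,
             under_audit_mandate: bool = False) -> bool:
    """Return True if `role` may read audit data of `layer` for a given submission."""
    if layer not in LAYERS_BY_ROLE.get(role, set()):
        return False
    if role == "author":
        return own_submission
    if role == "editor":
        return in_editorial_scope
    if role == "auditor":
        return under_audit_mandate
    return False
```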

Privacy
All reviewer-level data uses pseudonymous identifiers
Input hashes use venue-specific salted algorithms
Reviewer pseudonyms are resolvable only by the venue’s editorial system, not by the audit infrastructure
GDPR/CCPA compliance: audit metadata is designed to be non-identifying with respect to reviewers
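One way to keep pseudonyms resolvable only on the editorial side is sketched below: the editorial system holds the mapping and hands the audit infrastructure a view with the real identifier removed. The function names and record keys are hypothetical, and pseudonyms are assigned in list order for simplicity.

```python
import string


def assign_pseudonyms(reviewer_ids: list) -> dict:
    """Map real reviewer identifiers to Reviewer-A, Reviewer-B, ... for one submission."""
    if len(reviewer_ids) > len(string.ascii_uppercase):
        raise ValueError("more reviewers than single-letter pseudonyms")
    return {
        reviewer_id: f"Reviewer-{string.ascii_uppercase[index]}"
        for index, reviewer_id in enumerate(reviewer_ids)
    }


def audit_view(reviews: list, mapping: dict) -> list:
    """Copy review records for the audit trail, replacing `reviewer_id` with the pseudonym."""
    view = []
    for review in reviews:
        record = {key: value for key, value in review.items() if key != "reviewer_id"}
        record["reviewer"] = mapping[review["reviewer_id"]]
        view.append(record)
    return view
```

Only the output of `audit_view` would leave the editorial system; the mapping itself stays behind.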

Protocol references

Di Maio, P. (2024). Towards open standards for systemic complexity in digital forensics. CRC Press. DOI: 10.1201/9781003512820-3
Di Maio, P. (2025). Digital and web forensics model cards, V1. arXiv:2512.17722
Di Maio, P. (2026). Open Audit Methodology for Academic Processes (OAMAP). Position paper submitted to CoARA-ERIP Working Group.
Hargreaves, C., Nelson, A., Casey, E. (2024). An abstract model for digital forensic analysis tools. Forensic Sci. Int. Digit. Investig. 48, 301679.
Mitchell, M. et al. (2019). Model cards for model reporting. ACM FAccT.
Beygelzimer, A. et al. (2023). The NeurIPS 2021 consistency experiment. arXiv:2306.03262.
Smith, R. (2006). Peer review: a flawed process at the heart of science and journals. J. R. Soc. Med. 99(4), 178-182.
Liang, W. et al. (2023). GPT detectors are biased against non-native English writers. Patterns 4(7), 100779.
CoARA (2022). Agreement on Reforming Research Assessment.
W3C AI KR CG (2026). Digital Forensics Model Card Specification (CG-DRAFT). https://w3c-cg.github.io/aikr/DFMC/
ISO/IEC 17025:2017. General requirements for the competence of testing and calibration laboratories.
SWGDE (2023). Best practices for digital forensic tool validation.

Acknowledgements

This protocol was developed as part of the OAMAP framework by the Epistemic Systems Lab at Ronin Institute, in connection with the W3C AI Knowledge Representation Community Group, the CoARA-ERIP Working Group, and the SE4RA Project.