Apr 24, 2026

Peer Review Auditing and Evaluation Protocol

  • 1 CSKRNS;
  • 2 Ronin;
  • 3 Center for Systems, Knowledge Representation and Neuroscience
  • Non Invasive Neuromodulation
Protocol Citation: Paola Di Maio 2026. Peer Review Auditing and Evaluation Protocol. protocols.io https://dx.doi.org/10.17504/protocols.io.36wgqxzwylk5/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: April 23, 2026
Last Modified: April 24, 2026
Protocol Integer ID: 315590
Keywords: peer review, evaluation, research integrity, open audit methodology for academic process, peer review auditing, open audit methodology, structured audit methodology for the evaluation, structured audit methodology, peer review, auditing, documents the auditing, audit layer, review management platform, existing review management platform, evaluation protocol, preserving reviewer anonymity, protocol ensures process integrity, integrity, reviewer anonymity, academic process, journal, academic decision process, fairness of academic decision process, review, evaluation, structured metadata, editorial manager, protocol, document, oamap
Disclaimer
Use your discretion at all times
Abstract


This protocol specifies a structured audit methodology for evaluating the integrity and fairness of academic decision processes: peer review (journal and conference), grant funding allocation, and career progression decisions. The protocol ensures process integrity, fairness, and transparency while preserving reviewer anonymity. It is designed to be integrated with existing review management platforms (OpenReview, OJS, EasyChair, Editorial Manager) with minimal modification, and to work with a standalone model card that standardizes and documents the auditing and evaluation process: https://rpmc-seven.vercel.app/

The protocol operationalizes the OAMAP (Open Audit Methodology for Academic Processes) framework through five audit layers, each generating structured metadata that can be independently verified.


Guidelines

Beta version. This protocol enables venues (journals, conferences, funding bodies) to produce auditable records of their assessment processes. The goal is to make unfairness detectable without exposing reviewer identities -- distinguishing anonymity (which protects reviewers) from opacity (which shields processes from scrutiny).

Scope

The protocol applies to:
  • Journal peer review (submission to decision)
  • Conference peer review (submission to decision)
  • Grant funding review (application to decision)
  • Promotion and hiring decisions (application to decision)

Prerequisites

  • Access to the venue's submission management system
  • Authority to configure audit logging (typically: editor-in-chief, programme chair, or grants officer)
  • The OAMAP Review Process Model Card Generator (available at: https://rpmc-seven.vercel.app/)
  • Digital Forensics Model Card Generator for AI tool documentation (available at: https://huggingface.co/spaces/STARBORN/forensics_mc_generator)

Key Principle: Anonymity vs Opacity

This protocol preserves ANONYMITY: reviewer identities are never exposed in audit records. All reviewer-level data uses pseudonymous identifiers (Reviewer-A, Reviewer-B).
This protocol eliminates OPACITY: every procedural step is timestamped and recorded. Process irregularities become detectable by authorised parties.
This beta version requires testing and evaluation; evaluators are invited to give feedback to the author.
Materials
Submission data
Tools for documentation https://rpmc-seven.vercel.app/
Safety warnings
Be ready to defend the right to transparent decision-making in the peer review process.

Step 1: Publish Evaluation Criteria
Action: Publish the complete evaluation criteria that will be used to assess submissions. Make these publicly accessible before the submission deadline.
Record: URL of published criteria document; date of publication; version number.
Rationale: Reviews must demonstrably map to published criteria. A review that rejects on grounds not in the published criteria creates an audit flag. This does not mean the decision is wrong, but the deviation is recorded.
Expected Output: A publicly accessible document listing all evaluation dimensions, scoring scales, and acceptance thresholds.
Step 2: Register AI Tools (Layer 0 – Tool Documentation)
Action: For every AI tool that will be used in the assessment pipeline (LLM detectors, plagiarism checkers, statistical screeners, automated formatters, scope classifiers), complete a Digital Forensics Model Card (DF-MC) using the generator at https://huggingface.co/spaces/STARBORN/forensics_mc_generator.
Record for each tool:
  • Tool name and version
  • Forensic domain classification
  • Reasoning type (deductive, inductive, abductive, hybrid)
  • Known bias types and causes
  • Known error types and causes (especially false positive rates)
  • Which forensic process stages the tool participates in
  • Whether human review is required before acting on outputs
  • Known limitations (language coverage, text length sensitivity, adversarial vulnerabilities)
Store: Save the JSON output in the venue’s DF-MC Registry (a designated folder or database accessible to authorised auditors).
Rationale: Every AI tool used in assessment must be documented before it processes any submission. This is analogous to the tool validation requirements in ISO/IEC 17025 for forensic laboratories.
Expected Output: One DF-MC JSON file per AI tool, stored in the venue’s registry.
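As an illustration of what a registry entry might hold, the sketch below serialises the fields listed in this step to JSON. The class name `ToolCard`, the field names, and the example tool are hypothetical; they are not the schema produced by the DF-MC generator, which should be treated as authoritative.

```python
import json
from dataclasses import asdict, dataclass, field

# Reasoning types listed in Step 2.
REASONING_TYPES = {"deductive", "inductive", "abductive", "hybrid"}


@dataclass
class ToolCard:
    """Hypothetical, simplified registry entry; not the DF-MC generator's schema."""
    tool_name: str
    version: str
    forensic_domain: str
    reasoning_type: str
    known_biases: list = field(default_factory=list)
    known_errors: list = field(default_factory=list)
    process_stages: list = field(default_factory=list)
    human_review_required: bool = True
    limitations: list = field(default_factory=list)


def to_registry_json(card: ToolCard) -> str:
    """Validate the reasoning type, then serialise the card for the registry."""
    if card.reasoning_type not in REASONING_TYPES:
        raise ValueError(f"unknown reasoning type: {card.reasoning_type!r}")
    return json.dumps(asdict(card), indent=2, sort_keys=True)


card = ToolCard(
    tool_name="ExampleDetector",  # made-up tool, for illustration only
    version="1.0",
    forensic_domain="text-authorship",
    reasoning_type="inductive",
    known_errors=["false positives on short texts"],
)
print(to_registry_json(card))
```

Defaulting `human_review_required` to `True` is a design choice of this sketch: a tool would have to be explicitly documented as safe to act on without human review.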
Step 3: Configure Conflict of Interest Policy
Action: Document the conflict of interest (COI) criteria that will be applied during reviewer assignment. Specify:
  • Co-authorship lookback period (e.g., 3 years, 5 years)
  • Institutional affiliation conflicts
  • Funding source conflicts
  • Personal relationship declarations
  • Whether COI checks are automated, manual, or both
Record: COI policy document; date of adoption; method of enforcement.
Expected Output: A COI policy document linked to the venue’s submission system.
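The co-authorship lookback period can be made concrete with a small date check. This is a minimal sketch, assuming the venue records the date of the most recent joint publication between a reviewer and an author; the function names are illustrative and the protocol does not prescribe how the check is implemented.

```python
from datetime import date


def lookback_cutoff(submission_date: date, lookback_years: int) -> date:
    """Earliest co-authorship date that still counts as a conflict."""
    target_year = submission_date.year - lookback_years
    try:
        return submission_date.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return submission_date.replace(year=target_year, day=28)


def has_coauthorship_conflict(last_joint_paper: date,
                              submission_date: date,
                              lookback_years: int) -> bool:
    """True when the last joint paper falls inside the lookback window."""
    return last_joint_paper >= lookback_cutoff(submission_date, lookback_years)


submitted = date(2026, 4, 24)
print(has_coauthorship_conflict(date(2022, 6, 1), submitted, 3))  # False
print(has_coauthorship_conflict(date(2022, 6, 1), submitted, 5))  # True
```

The same joint paper is a conflict under a 5-year policy but not under a 3-year one, which is why the chosen period belongs in the recorded COI policy.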
Step 4: Configure Appeal Pathway
Action: Document the appeal process, including:
  • Eligibility criteria for appeals
  • Timeline for filing an appeal
  • How appeal reviewers are assigned (must not include original decision-makers)
  • How appeal outcomes are communicated
  • Whether aggregate appeal statistics will be published
Record: Appeal policy document; date of adoption.
Expected Output: A published appeal policy accessible to submitters.
Step 5: Receive and Timestamp Submission
Action: Upon receipt of each submission, the system generates:
  • A unique submission identifier
  • A cryptographic timestamp (ISO 8601 format)
  • A completeness check confirmation
  • An anonymisation verification (for double-blind venues)
Record: All Layer 1 metadata is accessible to the submitting author in real time.
Audit Rule: No submission may be silently discarded, delayed, or rerouted without a timestamped record.
Expected Output: A Layer 1 audit record per submission.
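A Layer 1 record could look roughly like the following. The field names are hypothetical. Note that a UTC clock reading plus a content digest is not by itself a cryptographic timestamp: a venue that needs that property would presumably obtain a token from a trusted timestamping service (for example one following RFC 3161), which this sketch does not implement.

```python
import hashlib
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_receipt(manuscript: bytes, complete: bool,
                 anonymised: Optional[bool]) -> dict:
    """Build a hypothetical Layer 1 record.

    anonymised is None for venues that are not double-blind.
    """
    return {
        "submission_id": str(uuid.uuid4()),
        "received_at": datetime.now(timezone.utc).isoformat(),  # ISO 8601, UTC
        "content_sha256": hashlib.sha256(manuscript).hexdigest(),
        "completeness_check": complete,
        "anonymisation_verified": anonymised,
    }


receipt = make_receipt(b"%PDF-1.7 example bytes", complete=True, anonymised=True)
print(receipt["submission_id"], receipt["received_at"])
```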
Step 6: Run AI Tools and Log Invocations
Action: For each AI tool invoked on the submission, log:
  • Tool identifier and version (linking to the DF-MC in the registry)
  • Timestamp of invocation
  • Input hash (NOT the submission content – a cryptographic hash preserving confidentiality)
  • Output classification (e.g., “human-authored”, “AI-flagged”, “clear”, “suspicious”)
  • Confidence score
  • Decision threshold applied
  • Whether the output was reviewed by a human before any action was taken
Record: AI tool invocation log, linked to the submission identifier.
Audit Rule: Every AI tool invocation must be logged. If a tool is invoked without logging (detected via reconciliation between tool invocation counts and audit event counts), the gap is flagged.
Expected Output: One audit event per AI tool invocation per submission.
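The logging and reconciliation rule can be sketched as follows. HMAC-SHA256 keyed with a venue secret is one way to obtain a salted input hash; it is an assumption of this example, as are the event field names and the tool identifier.

```python
import hashlib
import hmac
from collections import Counter


def input_hash(content: bytes, venue_salt: bytes) -> str:
    """Keyed SHA-256 digest, so the log never holds the submission text."""
    return hmac.new(venue_salt, content, hashlib.sha256).hexdigest()


def reconciliation_gaps(tool_side_counts: dict, audit_events: list) -> dict:
    """Return tools whose own invocation count differs from the audit log."""
    logged = Counter(event["tool_id"] for event in audit_events)
    return {
        tool: count - logged[tool]
        for tool, count in tool_side_counts.items()
        if count != logged[tool]
    }


event = {
    "tool_id": "example-detector@1.0",  # hypothetical id linking to the DF-MC
    "invoked_at": "2026-04-24T09:30:00+00:00",
    "input_hash": input_hash(b"manuscript text", b"venue-secret"),
    "output": "clear",
    "confidence": 0.91,
    "threshold": 0.80,
    "human_reviewed": True,
}

# The tool reports two invocations but only one was logged: one is missing.
gaps = reconciliation_gaps({"example-detector@1.0": 2}, [event])
print(gaps)  # {'example-detector@1.0': 1}
```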
Step 7: Record Desk Rejection (If Applicable)
Action: If the submission is desk-rejected, record:
  • Timestamp of desk rejection decision
  • Coded reason from a published taxonomy (e.g., “out-of-scope”, “formatting-non-compliant”, “duplicate-submission”, “integrity-flag”, “capacity-constraint”)
  • Whether any AI tool output contributed to the desk rejection decision
  • Identity of the human decision-maker (pseudonymised in the audit trail)
Record: Desk rejection record linked to the submission identifier.
Audit Rule: Desk rejections without a coded reason create an audit flag.
Expected Output: A Layer 1 desk rejection record (if applicable).
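The desk rejection audit rule reduces to a check against the published taxonomy. In this sketch the record keys and flag strings are hypothetical, and flagging a code that is outside the taxonomy is an assumption that goes slightly beyond the rule as stated.

```python
# Example codes taken from Step 7; a venue would publish its own taxonomy.
DESK_REJECTION_CODES = {
    "out-of-scope",
    "formatting-non-compliant",
    "duplicate-submission",
    "integrity-flag",
    "capacity-constraint",
}


def desk_rejection_flags(record: dict) -> list:
    """Return audit flags for one desk rejection record."""
    reason = record.get("reason_code")
    if not reason:
        return ["missing-coded-reason"]
    if reason not in DESK_REJECTION_CODES:
        return ["reason-not-in-published-taxonomy"]
    return []


print(desk_rejection_flags({"reason_code": "out-of-scope"}))  # []
print(desk_rejection_flags({"reason_code": None}))  # ['missing-coded-reason']
```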
Step 8: Assign Reviewers and Log Assignment Metadata
Action: Assign reviewers and record:
  • Number of reviewers assigned
  • Timestamp of assignment
  • Confirmation that COI checks were performed per the venue’s stated policy
  • Method of COI check (automated, manual, or both)
  • Any reassignments with coded reasons
Record: Layer 2 metadata. Authors see process metadata (number of reviewers, timeline) but NOT reviewer identities.
Audit Rule: The number of reviewers assigned must meet the venue’s stated minimum. If fewer reviewers are assigned, a coded reason is required.
Expected Output: A Layer 2 audit record per submission.
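A minimal sketch of the assignment audit rule, assuming the venue's stated minimum is available as a number; the function name, parameters, and flag strings are illustrative only.

```python
from typing import Optional


def assignment_flags(n_assigned: int, stated_minimum: int,
                     shortfall_reason: Optional[str] = None,
                     coi_check_confirmed: bool = True) -> list:
    """Return audit flags for one reviewer assignment record."""
    flags = []
    if n_assigned < stated_minimum and not shortfall_reason:
        flags.append("reviewer-shortfall-without-coded-reason")
    if not coi_check_confirmed:
        flags.append("coi-check-not-confirmed")
    return flags


print(assignment_flags(2, 3))  # ['reviewer-shortfall-without-coded-reason']
print(assignment_flags(2, 3, shortfall_reason="reviewer-withdrew"))  # []
```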
Step 9: Monitor Review Completion
Action: For each assigned reviewer, record:
  • Timestamp of assignment acceptance
  • Timestamp of review submission
  • Completeness check: does the review address all required evaluation criteria? (This is a structural check, not a content assessment)
  • Word count (as a rough proxy for engagement)
  • Whether actionable feedback is provided
  • Whether the review engages with the methodology
  • Timeline compliance: was the review submitted within the venue’s stated deadline?
Record: Layer 3 metadata, pseudonymised (Reviewer-A, Reviewer-B, etc.).
Audit Rule: A review that does not address all published evaluation criteria creates an audit flag. A review submitted after deadline creates an audit flag (with coded reason if extension was granted).
Expected Output: A Layer 3 audit record per review, pseudonymised.
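The two Layer 3 audit rules (criteria coverage and deadline compliance) are structural checks and can be expressed without reading the review's content. The sketch below assumes each review carries a list of the criteria it addresses; the criterion names and flag format are hypothetical.

```python
from datetime import datetime
from typing import Optional


def review_flags(published_criteria: list, addressed_criteria: list,
                 submitted_at: datetime, deadline: datetime,
                 extension_reason: Optional[str] = None) -> list:
    """Return audit flags for one pseudonymised review."""
    flags = []
    missing = sorted(set(published_criteria) - set(addressed_criteria))
    if missing:
        flags.append("criteria-not-addressed:" + ",".join(missing))
    if submitted_at > deadline:
        # A late review is always flagged; the coded reason travels with it.
        flags.append("late:" + (extension_reason or "no-coded-reason"))
    return flags


criteria = ["novelty", "methodology", "clarity"]  # hypothetical criteria
flags = review_flags(
    criteria,
    ["novelty", "clarity"],
    submitted_at=datetime(2026, 5, 3),
    deadline=datetime(2026, 5, 1),
)
print(flags)  # ['criteria-not-addressed:methodology', 'late:no-coded-reason']
```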
Step 10: Record the Decision
Action: Record the editorial, panel, or committee decision:
  • Timestamp of decision
  • Decision outcome (accept, minor-revision, major-revision, reject, desk-reject, withdrawn)
  • How individual scores mapped to the final decision
  • Whether the decision was consistent with the aggregate reviewer recommendation
  • If the decision overrode the aggregate recommendation: a coded justification category (editorial-judgment, scope-mismatch, integrity-concern, capacity-constraint, other)
  • Any use of editorial override
  • Whether minority reviewer opinions were noted
Record: Layer 4 metadata.
Audit Rule: A decision that contradicts the aggregate reviewer recommendation without a coded justification creates an audit flag.
Expected Output: A Layer 4 decision record per submission.
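The protocol does not define how the aggregate reviewer recommendation is computed. The sketch below assumes a simple majority vote and treats a tie as having no aggregate; a venue using score averages or weighted votes would substitute its own rule. Function names and the flag string are hypothetical.

```python
from collections import Counter
from typing import Optional

# Justification categories listed in Step 10.
JUSTIFICATION_CODES = {
    "editorial-judgment", "scope-mismatch", "integrity-concern",
    "capacity-constraint", "other",
}


def aggregate_recommendation(recommendations: list) -> Optional[str]:
    """Majority recommendation, or None when empty or tied (an assumption)."""
    ranked = Counter(recommendations).most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def decision_flags(recommendations: list, decision: str,
                   justification: Optional[str] = None) -> list:
    """Flag a decision that overrides the aggregate without a coded reason."""
    aggregate = aggregate_recommendation(recommendations)
    if aggregate is None or decision == aggregate:
        return []
    if justification in JUSTIFICATION_CODES:
        return []
    return ["override-without-coded-justification"]


print(decision_flags(["accept", "accept", "reject"], "reject"))
# ['override-without-coded-justification']
```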
Step 11: Collect Aggregated Demographics
Action: At the decision stage, compile aggregated demographic data for fairness analysis:
  • Acceptance rate by author region
  • Acceptance rate by career stage
  • Acceptance rate by institution type
  • Acceptance rate by language background
  • Desk rejection rate by the same categories
  • AI tool flag rate by the same categories
Record: Aggregated statistics only – never individual-level data. Apply differential privacy techniques where sample sizes are small.
Rationale: This enables detection of systematic bias (e.g., are non-native English speakers being desk-rejected at higher rates after AI tool flagging?).
Expected Output: Aggregated fairness statistics, published as part of the venue’s annual report.
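One common differential privacy technique is the Laplace mechanism applied to counts. The sketch below is an assumption about how a venue might apply it, not a method specified by the protocol: it adds Laplace noise with scale 1/epsilon to the accepted and total counts of each group before computing a rate. Privacy budget accounting across repeated releases is left out, and the group labels are made up.

```python
import random
from collections import defaultdict


def group_counts(decisions: list, key: str) -> dict:
    """Exact (accepted, total) counts per group; for internal use only."""
    counts = defaultdict(lambda: [0, 0])
    for d in decisions:
        counts[d[key]][0] += 1 if d["accepted"] else 0
        counts[d[key]][1] += 1
    return {group: (acc, tot) for group, (acc, tot) in counts.items()}


def laplace_noise(epsilon: float, rng: random.Random) -> float:
    """Laplace(0, 1/epsilon) sample, as the difference of two exponentials."""
    return rng.expovariate(epsilon) - rng.expovariate(epsilon)


def released_rates(decisions: list, key: str, epsilon: float,
                   rng: random.Random) -> dict:
    """Noisy acceptance rate per group, clamped to the range [0, 1]."""
    rates = {}
    for group, (acc, tot) in group_counts(decisions, key).items():
        noisy_acc = max(0.0, acc + laplace_noise(epsilon, rng))
        noisy_tot = max(1.0, tot + laplace_noise(epsilon, rng))
        rates[group] = min(1.0, noisy_acc / noisy_tot)
    return rates


decisions = [
    {"region": "region-1", "accepted": True},
    {"region": "region-1", "accepted": False},
    {"region": "region-2", "accepted": False},
]
print(group_counts(decisions, "region"))
print(released_rates(decisions, "region", epsilon=1.0, rng=random.Random(0)))
```

With groups this small the noise dominates the signal, so a venue may prefer to merge or suppress small categories rather than publish a noisy rate for them.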
PHASE 6: POST-DECISION (Layer 5 – Accountability)
Step 12: Process Appeals
Action: If a decision is appealed:
  • Record the appeal filing timestamp
  • Confirm that the appeal reviewer was not involved in the original decision
  • Record the appeal outcome with a coded reason
  • Record the timeline from filing to resolution
Record: Layer 5 metadata.
Audit Rule: An appeal assigned to someone involved in the original decision creates a critical audit flag.
Expected Output: A Layer 5 appeal record (if applicable).
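The independence rule for appeals amounts to checking that two sets of pseudonymous identifiers do not overlap. A minimal sketch, with hypothetical identifiers and flag format:

```python
from datetime import datetime


def appeal_flags(original_decision_makers: set, appeal_reviewers: set) -> list:
    """Critical flag for every appeal reviewer who took the original decision."""
    overlap = sorted(original_decision_makers & appeal_reviewers)
    return [f"CRITICAL:appeal-reviewer-in-original-decision:{p}" for p in overlap]


def days_to_resolution(filed_at: datetime, resolved_at: datetime) -> int:
    """Whole days between filing and resolution, for the Layer 5 timeline."""
    return (resolved_at - filed_at).days


print(appeal_flags({"Editor-A", "Reviewer-B"}, {"Editor-A", "Editor-C"}))
# ['CRITICAL:appeal-reviewer-in-original-decision:Editor-A']
print(days_to_resolution(datetime(2026, 5, 1), datetime(2026, 5, 29)))  # 28
```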
Step 13: Generate the Review Process Model Card
Action: For each submission that has completed the review cycle, generate a Review Process Model Card using the OAMAP generator. This card summarises Layers 0-5 in a single, structured, machine-readable document.
Record: The generated JSON model card, stored in the venue’s audit archive.
Tool: Use the Review Process Model Card Generator at https://rpmc-seven.vercel.app/
Expected Output: One Review Process Model Card (JSON) per submission.
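For venues that assemble the card programmatically, the sketch below shows one possible shape: a JSON object with one key per layer. The key names `layer_0` to `layer_5` are hypothetical and are not the schema of the OAMAP generator. A layer that does not apply (for example Layer 5 when no appeal was filed) is recorded as `None` rather than omitted, so that absence is explicit.

```python
import json

REQUIRED_LAYERS = [f"layer_{n}" for n in range(6)]  # Layers 0-5


def build_process_card(submission_id: str, layers: dict) -> str:
    """Combine per-layer audit records into one JSON document."""
    missing = [name for name in REQUIRED_LAYERS if name not in layers]
    if missing:
        raise ValueError(f"missing layer records: {missing}")
    card = {"submission_id": submission_id}
    card.update({name: layers[name] for name in REQUIRED_LAYERS})
    return json.dumps(card, indent=2)


layers = {name: {} for name in REQUIRED_LAYERS}
layers["layer_5"] = None  # no appeal was filed
print(build_process_card("SUB-001", layers))
```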
Step 14: Produce Venue-Level Compliance Report
Action: At the end of each review cycle (issue, conference, funding round), produce a venue-level OAMAP compliance report:
  • Total submissions received
  • Desk rejection rate with breakdown by coded reason
  • Average number of reviewers per submission
  • Timeline adherence rate (percentage of reviews completed within deadline)
  • Criteria coverage rate (percentage of reviews addressing all published criteria)
  • Decision-score consistency rate
  • Override rate with breakdown by coded reason
  • Appeal rate and outcomes
  • Aggregated fairness statistics (by region, career stage, institution type, language background)
  • AI tool invocation summary (tools used, flag rates, false positive estimates)
Record: Compliance report, published or made available to authorised auditors.
Expected Output: A venue-level OAMAP compliance report per review cycle.
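Several of the report's figures are simple ratios over the per-submission and per-review records. The sketch below computes a few of them, assuming hypothetical record keys (`outcome`, `reason_code`, `on_time`, `all_criteria_addressed`); the remaining metrics would follow the same pattern.

```python
from collections import Counter


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when there is no data."""
    return round(100.0 * numerator / denominator, 1) if denominator else 0.0


def compliance_summary(submissions: list, reviews: list) -> dict:
    """Compute a subset of the venue-level compliance metrics."""
    desk = [s for s in submissions if s["outcome"] == "desk-reject"]
    return {
        "total_submissions": len(submissions),
        "desk_rejection_rate_pct": pct(len(desk), len(submissions)),
        "desk_rejections_by_reason": dict(Counter(s["reason_code"] for s in desk)),
        "timeline_adherence_pct": pct(sum(r["on_time"] for r in reviews), len(reviews)),
        "criteria_coverage_pct": pct(
            sum(r["all_criteria_addressed"] for r in reviews), len(reviews)
        ),
    }


submissions = [
    {"outcome": "accept"},
    {"outcome": "reject"},
    {"outcome": "major-revision"},
    {"outcome": "desk-reject", "reason_code": "out-of-scope"},
]
reviews = [
    {"on_time": True, "all_criteria_addressed": True},
    {"on_time": True, "all_criteria_addressed": True},
    {"on_time": False, "all_criteria_addressed": True},
]
print(compliance_summary(submissions, reviews))
```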
Retention
  • Layers 1-3 audit records: retain for a minimum of 5 years
  • Layer 4 decision provenance: retain for 7 years
  • Layer 5 appeal records: retain for the duration of any active challenge plus 2 years
  • Aggregated fairness statistics: retain indefinitely (no individual data)
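The retention periods can be turned into an earliest permissible deletion date. This is a sketch under the assumption that retention runs from the record's creation date (Layers 1-4) or from the date a challenge is closed (Layer 5); the protocol does not state the starting point, and the function names are illustrative.

```python
from datetime import date
from typing import Optional

RETENTION_YEARS = {1: 5, 2: 5, 3: 5, 4: 7}  # layer number -> years


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_deletion(layer: int, record_date: date,
                      challenge_closed: Optional[date] = None) -> Optional[date]:
    """Earliest deletion date, or None while a Layer 5 challenge is still open."""
    if layer == 5:
        return add_years(challenge_closed, 2) if challenge_closed else None
    return add_years(record_date, RETENTION_YEARS[layer])


print(earliest_deletion(1, date(2026, 4, 24)))  # 2031-04-24
print(earliest_deletion(4, date(2026, 4, 24)))  # 2033-04-24
print(earliest_deletion(5, date(2026, 4, 24)))  # None
```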
Access Control
  • Authors: Layer 1 data for their own submissions only
  • Editors: Layers 1-4 for submissions within their editorial scope
  • Independent auditors: Layers 1-5 for challenged decisions under formal audit mandate
  • Public: Aggregated venue-level compliance reports only
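The access rules above can be written as a role-to-layer table plus a scope condition. In this sketch the role names and the single `scope_ok` parameter are simplifying assumptions: `scope_ok` stands for "own submission" (authors), "within editorial scope" (editors), or "challenged decision under formal audit mandate" (auditors).

```python
# Layers each role may read, following the Access Control list.
READABLE_LAYERS = {
    "author": {1},
    "editor": {1, 2, 3, 4},
    "auditor": {1, 2, 3, 4, 5},
    "public": set(),  # aggregated compliance reports only, no layer records
}


def can_read(role: str, layer: int, scope_ok: bool) -> bool:
    """True when the role may read this layer for a record within its scope."""
    return scope_ok and layer in READABLE_LAYERS.get(role, set())


print(can_read("author", 1, scope_ok=True))   # True
print(can_read("author", 3, scope_ok=True))   # False
print(can_read("auditor", 5, scope_ok=False)) # False
```

Unknown roles fall through to an empty set, so access is denied by default.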
Privacy
  • All reviewer-level data uses pseudonymous identifiers
  • Input hashes use venue-specific salted algorithms
  • Reviewer pseudonyms are resolvable only by the venue’s editorial system, not by the audit infrastructure
  • GDPR/CCPA compliance: audit metadata is designed to be non-identifying with respect to reviewers
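Pseudonym assignment might be sketched as follows. The mapping from real identifier to pseudonym stays inside the editorial system, and only the pseudonyms are written to audit records. Shuffling before assigning letters is an assumption of this example, intended to keep the letter order from revealing the order of assignment; a production system would likely use `random.SystemRandom` rather than a seeded generator. The email addresses are made up.

```python
import random
import string


def assign_pseudonyms(reviewer_ids: list, rng: random.Random) -> dict:
    """Map real reviewer ids to Reviewer-A, Reviewer-B, ... in shuffled order."""
    if len(reviewer_ids) > len(string.ascii_uppercase):
        raise ValueError("this sketch supports at most 26 reviewers")
    shuffled = list(reviewer_ids)
    rng.shuffle(shuffled)
    return {
        rid: f"Reviewer-{letter}"
        for rid, letter in zip(shuffled, string.ascii_uppercase)
    }


mapping = assign_pseudonyms(
    ["r1@example.org", "r2@example.org", "r3@example.org"],
    random.Random(42),  # seeded only to make the example repeatable
)
audit_labels = sorted(mapping.values())  # the only part shared with auditors
print(audit_labels)  # ['Reviewer-A', 'Reviewer-B', 'Reviewer-C']
```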
Protocol references


  • Di Maio, P. (2024). Towards open standards for systemic complexity in digital forensics. CRC Press. DOI: 10.1201/9781003512820-3
  • Di Maio, P. (2025). Digital and web forensics model cards, V1. arXiv:2512.17722
  • Di Maio, P. (2026). Open Audit Methodology for Academic Processes (OAMAP). Position paper submitted to CoARA-ERIP Working Group.
  • Hargreaves, C., Nelson, A., Casey, E. (2024). An abstract model for digital forensic analysis tools. Forensic Sci. Int. Digit. Investig. 48, 301679.
  • Mitchell, M. et al. (2019). Model cards for model reporting. ACM FAccT.
  • Beygelzimer, A. et al. (2023). The NeurIPS 2021 consistency experiment. arXiv:2306.03262.
  • Smith, R. (2006). Peer review: a flawed process at the heart of science and journals. J. R. Soc. Med. 99(4), 178-182.
  • Liang, W. et al. (2023). GPT detectors are biased against non-native English writers. Patterns 4(7), 100779.
  • CoARA (2022). Agreement on Reforming Research Assessment.
  • W3C AI KR CG (2026). Digital Forensics Model Card Specification (CG-DRAFT). https://w3c-cg.github.io/aikr/DFMC/
  • ISO/IEC 17025:2017. General requirements for the competence of testing and calibration laboratories.
  • SWGDE (2023). Best practices for digital forensic tool validation.


Acknowledgements

This protocol was developed as part of the OAMAP framework by the Epistemic Systems Lab at Ronin Institute, in connection with the W3C AI Knowledge Representation Community Group, the CoARA-ERIP Working Group, and the SE4RA Project.