
Governing Against Domestication: A Protocol for Distributed Scientific Reproducibility Validation

  • Ceri John¹
  • ¹ValiChord
Protocol Citation: Ceri John 2026. Governing Against Domestication: A Protocol for Distributed Scientific Reproducibility Validation. protocols.io https://dx.doi.org/10.17504/protocols.io.rm7vzp5b2lx1/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: In development
We are still developing and optimizing this protocol
Created: March 07, 2026
Last Modified: March 09, 2026
Protocol Integer ID: 282374
Keywords: reproducibility crisis in science, distributed scientific reproducibility validation, epistemic integrity, governance framework, governance architecture, published computational research, peer review, peer infrastructure, institutional pressure, validator, validation
Abstract
The reproducibility crisis in science is widely documented, but most reform efforts share a structural flaw: they rely on institutional goodwill that institutions are not reliably able to sustain. Data mandates go unenforced. Repository requirements produce unusable deposits. Peer review rarely attempts reproduction. The policies exist. They are domesticated — hollowed out through accumulated accommodation until nothing remains worth defending.
This paper presents a governance framework for distributed scientific reproducibility validation systems: systems that coordinate independent validators to assess whether published computational research can actually be reproduced, and that issue structured, tamper-evident records of the outcome. The framework's central argument is that technical architecture alone cannot ensure epistemic integrity — the institutional forces that corrupted previous reform efforts will operate on any new system unless governance is designed from the outset to resist them.
We describe six non-negotiable epistemic commitments, a tiered governance architecture that scales with operational maturity, nine mechanical anti-domestication defences against the two most likely capture scenarios, and a set of red lines that cannot be conceded regardless of institutional pressure. The framework is implemented in ValiChord, an open-source peer-to-peer infrastructure for reproducibility validation built on Holochain (https://github.com/topeuph-ai/ValiChord), and is offered as a general template for any reproducibility validation system that intends to remain honest under pressure.
Materials
This protocol relies on the following core documents, which define the governance architecture and operational framework for distributed scientific reproducibility validation:
  • Governing Against Domestication (Preprint) — Full governance framework, including epistemic commitments, tiered governance, and anti‑domestication mechanics.
  • ValiChord Vision & Architecture Document — High‑level system design, lifecycle, and architectural rationale.
  • ValiChord Rust Scaffold (Architecture Specification) — Type‑level and trait‑level specification of the eight‑layer system architecture.
  • ValiChord Governance Framework (Extended Version) — Detailed governance mechanics, red lines, safe concessions, and structural defences.
  • ValiChord Technical Reference — Architecture sketches and engineering notes.

Overview & Purpose
1. Introduction: Why Governance Fails
Every significant reproducibility reform initiative of the past two decades has
followed the same arc. It begins with a genuine crisis diagnosis, produces a set of
policy commitments from journals, funders, or institutions, and then slowly declines
into compliance theatre. The Open Data Badge programme. Pre-registration
mandates. ARRIVE guidelines. Data availability statements. Each was introduced
with evidence of need and genuine institutional support. Each was domesticated.
Domestication is not sabotage. No single actor makes a decision to abandon the
commitment. Instead, a hundred small accommodations accumulate: the exception
that becomes the norm, the softened language, the unenforced deadline, the score
that replaces the record, the badge that no one checks behind. The process is
procedurally correct throughout. The result is a system that performs reform while
preserving the status quo.

The threat model for any reproducibility validation system is therefore not primarily
technical. It is institutional. The adversaries are coordinated legitimacy, career
incentives, volume dominance by prestigious institutions, and the natural human
tendency toward comfortable ambiguity over uncomfortable precision. Silence and
selective participation are more dangerous than outright fraud, because they are
harder to identify and easier to justify.

This paper presents a governance framework built around this threat model. It is
designed for systems that coordinate independent validators to assess whether
published computational research can be reproduced — systems that must operate
honestly precisely when powerful actors have the most to lose from honest operation.
The framework has been implemented in ValiChord
(https://github.com/topeuph-ai/ValiChord), an open-source distributed
reproducibility validation infrastructure, and is described here as a general template.

The framework is organised into three components: a set of epistemic commitments
that function as non-negotiable constraints on system behaviour; a tiered governance
architecture that scales with operational maturity; and a suite of mechanical anti-
domestication defences that operate through code, licence terms, and automatic
tripwires rather than requiring ongoing institutional courage.
2. The Core Insight
A reproducibility validation system’s true adversaries are: coordinated
legitimacy, career incentives, institutional dominance, and human risk
aversion. Silence, ambiguity, and selective participation are more
dangerous than outright fraud.
This threat model shapes every governance choice that follows. The framework's
philosophy can be summarised in five principles:
• Detection over prevention. Gaming cannot be fully prevented. It can be
made detectable and costly.
• Transparency over perfection. Admitting uncertainty is always preferable
to manufacturing false certainty.
• Process over outcomes. Robust process yields trustworthy outcomes over
time; optimising for outcomes produces gaming.
• Community over control. Governance serves epistemic integrity, not any
individual, institution, or funder.
• Discomfort over comfort. Some discomfort is the necessary cost of honest
operation, especially when powerful actors find it inconvenient.
3. Six Non-Negotiable Epistemic Commitments
The following six commitments are the minimum conditions under which a
reproducibility validation system can claim epistemic integrity. They are not
aspirational goals. They are structural requirements, and they apply from the
system's first operational day at any scale.

Commitment 1: Forced Disagreement Visibility
When independent validators reach different conclusions about a study's
reproducibility, that disagreement must be documented prominently. It cannot be
hidden, averaged away, or relegated to supplementary material. Divergent
assessments must appear alongside convergent ones, without editorial smoothing.
Disagreement is not a system failure — it is frequently the most important finding.
The pressure to suppress disagreement will be framed as protecting non-expert
audiences from confusion. This framing must be rejected. The reproducibility crisis
was created partly by exactly this kind of helpful simplification. Disagreement
between independent, competent validators about whether a study can be
reproduced is information that the scientific community needs, not noise to be
managed.

Commitment 2: Institutional Attribution
Validators must be identified by institution in internal governance records, even
when individual identity is protected in publications. Institutional-level attribution
creates the accountability that prevents rubber-stamping: if validators affiliated with
a particular institution systematically produce more favourable assessments than the
field average, that pattern becomes visible and can be investigated.
Without institutional attribution, the system cannot distinguish genuine validation
from coordinated softness. Individual anonymity is a reasonable protection for
validators; institutional anonymity is a mechanism for evading accountability.

Commitment 3: No Guaranteed Closure
Some validation attempts will produce genuinely ambiguous results. Two
independent validators may reach different conclusions. A study may be partially
reproducible, reproducible under specific conditions that were undisclosed, or
reproducible by one method but not another. The system must not force clean
verdicts where evidence does not support them.
"Unable to determine" is a legitimate and valuable outcome. It is not a system failure;
it is an honest description of the evidence. Systems that require binary outputs will
be gamed — researchers will modify submissions, validators will resolve doubt by
choosing the more convenient option, and funders will treat ambiguity as a problem
to be managed rather than a finding to be reported.

Commitment 4: Rapid Consequences
When validator behaviour clearly falls below acceptable standards — cursory reports
submitted in a fraction of the expected time, generic assessments clearly not tailored
to the study, patterns suggesting the validation was not performed in good faith —
the consequence must be rapid. The validation is flagged, the validator is contacted,
and if the pattern continues, they are removed. Months-long committee processes
allow bad actors to continue operating during review and signal to all participants
that the system is reluctant to enforce its own standards.

Commitment 5: Pattern Visibility
Aggregate patterns in validation outcomes must be tracked and reported, even when
this creates uncomfortable findings. If validators from a particular disciplinary
background, institutional type, or region consistently produce systematically
different results than others, this must be documented as a finding rather than
suppressed as an embarrassment. Pattern visibility is the mechanism through which
systemic bias becomes detectable and addressable.

Commitment 6: Legible Governance
Every governance decision — validator assignment, protocol selection, difficulty
classification, any deviation from standard procedure — must be logged with its
reasoning and available to participants on request. Governance that is not legible is
governance that cannot be held accountable. The system must be auditable by
design, not by exception.
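
To make the auditability requirement concrete, the entry below sketches what a legible decision log record could look like in Rust (the language of the ValiChord scaffold). The field names, categories, and constructor are illustrative assumptions, not the ValiChord schema; the point is that a missing rationale is rejected at the type boundary rather than caught in later review.

```rust
// Sketch of a legible governance log entry. Names and categories are
// assumptions, not the ValiChord schema; the constructor enforces that
// no decision can be logged without its reasoning.

#[derive(Debug, Clone)]
pub enum DecisionCategory {
    ValidatorAssignment,
    ProtocolSelection,
    DifficultyClassification,
    ProcedureDeviation,
    Other(String),
}

#[derive(Debug, Clone)]
pub struct GovernanceDecision {
    pub category: DecisionCategory,
    pub summary: String,   // what was decided
    pub rationale: String, // why it was decided; never optional
    pub decided_by: Vec<String>,
    pub timestamp_unix: u64,
}

impl GovernanceDecision {
    pub fn log(
        category: DecisionCategory,
        summary: String,
        rationale: String,
        decided_by: Vec<String>,
        timestamp_unix: u64,
    ) -> Result<Self, &'static str> {
        if rationale.trim().is_empty() {
            return Err("governance decisions must be logged with rationale");
        }
        Ok(Self { category, summary, rationale, decided_by, timestamp_unix })
    }
}
```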
4. Tiered Governance Architecture

Governance overhead must match operational scale. A system validating its first
twenty studies under a pilot framework does not need the same governance
apparatus as one that has issued thousands of validation records across multiple
disciplines. The framework is therefore structured in three tiers, with escalating
governance mechanisms activated in response to demonstrated need rather than on
a calendar schedule.

The trigger for tier escalation is evidence of need: when the number of validators,
validations, or institutional relationships creates governance demands that the
current tier cannot adequately address, the next tier activates.

Tier 1: Foundational Governance
Appropriate from system inception through early operation. Deliberately minimal —
governance overhead in this tier must not consume resources better spent proving
the core validation model works.

Decision Authority
The project lead and academic principal investigator make operational decisions
jointly. All decisions are logged with rationale. No decisions are made behind closed
doors. A simple appeals process allows any participant to challenge a decision in
writing; the PI reviews within ten business days and publishes a response with
rationale. Unresolved appeals are referred to a designated external reviewer from the
advisory network for binding decision within fifteen business days.
Conflict of Interest Screening
Before any validation assignment, validators declare institutional affiliations. These
are cross-checked against study authors and institutions. Co-authorship within five
years disqualifies. Shared department disqualifies. Same institution with no direct
collaboration history is permitted, subject to governance review.
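
The screening rules above are simple enough to express as a pure function. The sketch below uses illustrative type names; only the five-year co-authorship rule, the shared-department rule, and the same-institution review path come from the protocol itself.

```rust
// Tier 1 conflict-of-interest screening as a pure function.

pub struct Affiliation {
    pub institution: String,
    pub department: String,
}

pub enum CoiDecision {
    Disqualified(&'static str),
    PermittedSubjectToReview, // same institution, no direct collaboration
    Permitted,
}

pub fn screen(
    validator: &Affiliation,
    author: &Affiliation,
    years_since_coauthorship: Option<u32>, // None = never co-authored
) -> CoiDecision {
    // Co-authorship within five years disqualifies outright.
    if matches!(years_since_coauthorship, Some(y) if y <= 5) {
        return CoiDecision::Disqualified("co-authorship within five years");
    }
    if validator.institution == author.institution {
        if validator.department == author.department {
            return CoiDecision::Disqualified("shared department");
        }
        // Same institution, no direct collaboration: governance review.
        return CoiDecision::PermittedSubjectToReview;
    }
    CoiDecision::Permitted
}
```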

Participant Protection
Informed consent with clear study design explanation is required. Participants have
the right to withdraw without penalty and to anonymity in publications. Time caps
prevent exploitation; a fairness adjustment mechanism compensates for
disproportionately difficult assignments. Data storage must comply with applicable
privacy regulations (GDPR in the European context).
Basic Transparency
All governance decisions published with rationale. Aggregate results shared with
participants. No selective reporting. Disagreements between validators documented,
not smoothed.

Tier 2: Enhanced Governance
Activates when operational scale creates governance demands — in terms of
validator count, validation volume, or institutional relationships — that Tier 1
mechanisms cannot adequately handle.

Advisory Board
Composition of five to seven members: at least two practising computational
researchers who function as potential validators, at least one research integrity
specialist, and at least one representative from a partner institution. No single
institution holds a majority. Two-year terms with staggered rotation. Authority is
advisory, not executive; meeting minutes are published and dissenting views
recorded.

Research Integrity Function
A designated governance function responsible for investigating validator behaviour
concerns, reviewing flagged validations, maintaining the conflict of interest register,
monitoring gaming patterns, and handling formal complaints. Staffed part-time at
this scale, drawing on advisory board members and the external network.

Validator Cartel Detection
As the validator pool grows, systematic monitoring targets five patterns:
• Collusion: cross-institutional agreement rates above 90% over twenty or more validations trigger investigation.
• Institutional volume dominance: no institution may supply more than 40% of active validators.
• Rubber-stamping: time-tracking is compared to task complexity, with automatic flagging of validators completing tasks in under 10% of expected time.
• Selective participation: validators accepting only easy assignments have their assignment priority reduced.
• Homophily: agreement rates with specific institutions are tracked, with persistent rates above 90% triggering reputation review.
All flagging is automated; consequences require human review with published rationale.
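
Because all flagging is automated, the numeric thresholds above translate directly into code. The following sketch uses assumed function and type names; the 90%, 40%, 10%, and twenty-validation figures come from the text, and each function emits a flag for human review rather than applying any consequence.

```rust
// Automated flagging thresholds for cartel detection. Flags only
// trigger human review; consequences are never applied automatically.

pub struct Flag(pub &'static str);

/// Collusion: agreement above 90% across twenty or more shared validations.
pub fn collusion_flag(agreed: u32, shared: u32) -> Option<Flag> {
    (shared >= 20 && f64::from(agreed) / f64::from(shared) > 0.90)
        .then(|| Flag("collusion"))
}

/// Volume dominance: one institution supplying more than 40% of active validators.
pub fn dominance_flag(from_institution: u32, active_total: u32) -> Option<Flag> {
    (active_total > 0 && f64::from(from_institution) / f64::from(active_total) > 0.40)
        .then(|| Flag("institutional volume dominance"))
}

/// Rubber-stamping: completion in under 10% of the expected time.
pub fn rubber_stamp_flag(actual_hours: f64, expected_hours: f64) -> Option<Flag> {
    (actual_hours < 0.10 * expected_hours).then(|| Flag("rubber-stamping"))
}
```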

Reputation System Governance
When a formal reputation system is operational, the algorithm must be fully
published as open source, with all factor weights disclosed. Changes require
community notice at least three months in advance. Historical changes are logged
publicly. The following components are explicitly forbidden from the algorithm:
volume bonuses (rewarding quantity over quality), speed bonuses (creating racing
incentives), institutional prestige weighting (introducing halo effects), and personal
network advantages. Algorithm changes require a supermajority of the designated
algorithm committee.

Certification Standards
When formal validation records become system outputs, every record must include:
protocol identification; validation summary (validator count and outcomes);
epistemic confidence level with rationale; disagreement disclosure prominently
displayed where applicable; a limitations section; a link to full provenance; and valid
dates with a minimum currency of twenty-four months before expiry review. The
following are explicitly forbidden from validation records: single numerical
reproducibility scores; phrases guaranteeing reproducibility; omission of
disagreements; and selective outcome presentation.

A tiered badge structure, if implemented, should require increasing validator counts
and consensus levels for higher tiers, with all tiers linking to the full validation
record. Binary pass/fail badges are incompatible with epistemic integrity and should
not be issued.

Badge Level | Minimum Validators | Consensus Threshold | Additional Requirements
Bronze | 3 | ≥60% | No substantial unresolved disagreements
Silver | 5 | ≥70% | Pre-registered; deviations disclosed
Gold | 7 | ≥80% | Multi-institutional; open data and code
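
A minimal eligibility check over this table might look as follows. It assumes, as an interpretation the table does not state, that higher tiers inherit the requirements of lower ones; all names are illustrative.

```rust
// Badge eligibility per the table above. Every issued badge must link
// to the full validation record; the tier is only a summary of it.

pub fn badge_tier(
    validators: u32,
    consensus: f64, // fraction of validators in agreement, 0.0..=1.0
    unresolved_disagreement: bool,
    preregistered_deviations_disclosed: bool,
    multi_institutional_open_data_code: bool,
) -> Option<&'static str> {
    if unresolved_disagreement {
        return None; // no badge over substantial unresolved disagreement
    }
    if validators >= 7 && consensus >= 0.80
        && preregistered_deviations_disclosed
        && multi_institutional_open_data_code
    {
        Some("Gold")
    } else if validators >= 5 && consensus >= 0.70 && preregistered_deviations_disclosed {
        Some("Silver")
    } else if validators >= 3 && consensus >= 0.60 {
        Some("Bronze")
    } else {
        None
    }
}
```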

Compensation Fairness
Validator compensation must be tied to quality indicators, not speed or volume. No
bonuses for rapid completion; no penalties for thoroughness. Rates should be
reviewed annually against market rates for comparable expert peer review work and
published openly. The principle is that the compensation structure must not create
incentives that conflict with validation quality.

Institutional Capture Prevention
As institutional partnerships form, specific structural defences become necessary. No
single institution should simultaneously dominate governance, validation supply,
and funding. All governance positions should rotate on two-year terms, staggered so
that no more than one-third of any body changes simultaneously. Governance bodies
must include representation from small institutions, early-career researchers, and
diverse geographic contexts — this is an architectural requirement for epistemic
credibility, not a diversity add-on. A validation pool dominated by a small number of
well-resourced institutions is insufficiently independent to produce trustworthy
results. Graph-based verification of co-authorship and funding relationships should
be used to confirm that governance bodies are genuinely independent rather than
structurally connected through shared networks.

Tier 3: Mature System Governance
The complete governance architecture for a system operating at scale. Designed in
advance so that the mature system inherits the right principles — governance
designed after powerful actors are embedded always accommodates those actors.

Full Governance Structure
A mature system requires distinct governance layers: a Research Integrity Office with
final decision authority, appeals adjudication, and transparency oversight; pre-
commitment standards governance including a deviation review board and
disciplinary standards committees; validation rules covering validator selection,
gaming detection, and collusion pattern analysis; meta-governance specifying rule
evolution processes and separation of powers; certification standards governance;
incentive integrity oversight; ecosystem rules governing API access and third-party
usage; and transparency standards covering public disclosure requirements and
privacy protections.

Anti-Calcification Mechanisms
Preventing early adopters from locking in permanent advantage is a structural
requirement. Reputation decay (exponential decay with approximately a six-month
half-life) ensures that established validators must continue contributing to maintain
standing. Domain-bounded reputation prevents authority in one field from
conferring authority in others. New validator boosts for initial validations allow
entrants to establish credibility. Blind validator selection, where validators do not
know who else is assessing the same study, prevents coordination.
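
The decay rule is a standard exponential half-life. As a sketch, with the half-life constant assumed at 182.5 days to approximate six months:

```rust
// Reputation decay: R(t) = R0 * 2^(-t / half_life).

const HALF_LIFE_DAYS: f64 = 182.5; // roughly six months

pub fn decayed_reputation(initial: f64, days_since_last_contribution: f64) -> f64 {
    initial * 2_f64.powf(-days_since_last_contribution / HALF_LIFE_DAYS)
}

// Example: decayed_reputation(100.0, 365.0) ≈ 25.0; two half-lives of
// inactivity quarter a validator's standing.
```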

Campbell's Law Resistance
Any metric that becomes a target ceases to be a good measure. Validation records
must be presented with context. Anti-threshold guidance should make explicit that
reducing records to simple pass/fail signals is contrary to the system's purpose.
Variance and uncertainty must always accompany summary metrics. Exploratory
and confirmatory analyses must be clearly distinguished and assessed by appropriate
criteria; mislabelling exploratory work as confirmatory is a governance escalation
trigger.

Cross-Disciplinary Standards
A standardised attestation taxonomy (Success/Partial/Failed/Inconclusive) provides
baseline consistency. Disciplinary standards committees define what these categories
mean within their fields and set pre-registration requirements appropriate to their
methodology. Annual cross-disciplinary review prevents semantic drift. The
taxonomy must acknowledge that "success" means different things in different
research contexts, and validation records must make these distinctions explicit.
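
The taxonomy is small enough to state as a closed type. In the sketch below, the variant names follow the taxonomy exactly; the comments paraphrase the definitions above and everything else is illustrative.

```rust
// The standardised attestation taxonomy. Variants are fixed system-wide;
// disciplinary committees define what each one means within their field.

pub enum Attestation {
    Success,      // reproduced under the field's definition of success
    Partial,      // reproduced in part, or only under specific conditions
    Failed,       // could not be reproduced
    Inconclusive, // "unable to determine": a legitimate outcome, not an error
}
```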
International Governance

Multi-jurisdictional operation creates legal and cultural complexity. The minimum
standards model (a global baseline with regional adaptations where necessary) and
mutual recognition between regional frameworks are preferable to attempting
uniform global governance. Baseline transparency and integrity commitments are
non-negotiable across jurisdictions; their implementation may legitimately vary.
5. Anti-Domestication Mechanics

Why Mechanical Defences Are Necessary
Governance commitments, however clearly stated, do not enforce themselves. The
preceding section describes what the system commits to. This section specifies how
those commitments are enforced through mechanisms that operate without
requiring anyone to be brave in the moment — through code-level constraints,
licence terms, and automatic tripwires.

Two capture scenarios were stress-tested against the governance framework and
found to reveal the gap between philosophical commitment and operational reality.
Both scenarios involve actors who are not hostile, who behave procedurally correctly
throughout, and who nevertheless gradually hollow out the system's epistemic
function.

Scenario A: Institutional Soft-Capture via Flagship Partnership
A large, respected institution becomes the system's flagship partner. They provide
funding, validators, and journal introductions. They are not adversarial. But over
time, small accommodations accumulate: a summary view that buries disagreement
behind a click, cautious language from their validators, polite requests to delay
publication of inconvenient results. No rules are broken. The system is slowly
hollowed out through structural dependency.
This is the most dangerous capture scenario because every individual request is
reasonable, no one acts in bad faith, and the system functions correctly throughout.
The capture is structural, not conspiratorial.

Mechanic 1: Default Salience Rule (Code-Level)
If material disagreement exists in a validation record, it must be visible in the first
screen, first scroll, and first API response. Not behind tabs, summaries, confidence
labels, or badges. This is a UI governance constraint enforced in code. Any interface
— internal or partner-facing — that renders validation records must pass a salience
check: is disagreement visible without user action?
Rationale: default views define reality. Almost no one clicks through. A "summary-
first" display that technically preserves disagreement behind a link is functionally
indistinguishable from hiding it.
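
One way to express the salience check as a code-level gate is sketched below. The payload representation is an assumption; the rule it encodes is the one above: if material disagreement exists, the first payload must contain the disagreement itself, not a pointer to it.

```rust
// Salience check: disagreement must be in the first payload a client
// receives, with no further user action or request.

pub struct FirstPayload {
    pub fields: Vec<String>, // everything visible without user action
}

pub fn passes_salience_check(has_material_disagreement: bool, view: &FirstPayload) -> bool {
    !has_material_disagreement || view.fields.iter().any(|f| f == "disagreement")
}
```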

Mechanic 2: Anti-Delay Constraint
Validation records must be published within a defined maximum latency after final
attestation. Extensions require a public rationale logged in the governance record, a
visible "Delayed by Governance Review" banner on the pending record, and
automatic publication if the extension period expires without resolution. Delay
becomes louder, not quieter. Any actor requesting delay must accept that the delay
itself is visible and documented.
Rationale: In fast-moving fields, delaying a validation record for months while
"awaiting clarification" is functionally equivalent to suppression. The anti-delay
constraint ensures that the cost of delay is reputational transparency, not quiet
burial.
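
A sketch of the anti-delay logic follows. The timestamp scheme and names are assumptions; the behaviour is the one specified above: extensions carry a public rationale and a hard deadline, pending records under extension are bannered, and expiry resolves automatically to publication.

```rust
// Anti-delay constraint: delay is visible, and expiry publishes.

pub struct Extension {
    pub new_deadline_unix: u64,
    pub public_rationale: String, // logged in the governance record
}

pub enum PublicationState {
    Published,                 // visible to everyone
    Pending,                   // within normal latency
    DelayedByGovernanceReview, // shown with a visible banner
}

pub fn publication_state(
    now_unix: u64,
    attested_unix: u64,
    max_latency_secs: u64,
    extension: Option<&Extension>,
) -> PublicationState {
    let deadline = extension
        .map(|e| e.new_deadline_unix)
        .unwrap_or(attested_unix + max_latency_secs);
    if now_unix >= deadline {
        PublicationState::Published // automatic: expiry resolves to publication
    } else if extension.is_some() {
        PublicationState::DelayedByGovernanceReview
    } else {
        PublicationState::Pending
    }
}
```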

Mechanic 3: Funding Concentration Tripwire
No single institution may simultaneously exceed two of the following three
thresholds: more than 25% of operational funding; more than 25% of active
validators; more than 25% of governance seats. Exceeding two of three triggers
automatic public disclosure of the concentration, temporary intake throttling for new
validations from that institution, and a mandatory governance review within thirty
days.
Rationale: financial dependency is governance capture by another name. An
institution that provides funding, validators, and governance oversight
simultaneously doesn't need to make demands — everyone already knows what
would happen if they withdrew. The tripwire makes concentration visible before it
becomes structural.
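
The tripwire condition is mechanical and fits in a few lines. A sketch, with an assumed representation of the three institutional shares:

```rust
// Two-of-three concentration tripwire: exceeding 25% on any two of the
// three axes fires disclosure, throttling, and a 30-day review.

pub struct InstitutionShares {
    pub funding: f64,           // fraction of operational funding
    pub active_validators: f64, // fraction of active validators
    pub governance_seats: f64,  // fraction of governance seats
}

pub fn tripwire_fired(s: &InstitutionShares) -> bool {
    [s.funding, s.active_validators, s.governance_seats]
        .iter()
        .filter(|&&share| share > 0.25)
        .count()
        >= 2
}
```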

Mechanic 4: Exit-with-Integrity Clause
The system should pre-commit publicly to the following statement, embedded in all
partnership agreements and published on the project website:
If maintaining this system’s epistemic commitments becomes incompatible
with continued funding or partnership, the system will prefer contraction or
suspension over compromise.
This is not a dramatic gesture. It is a filter. Good partners are not threatened by it.
Partners who intend to exert influence through dependency will self-select out early,
which is the point.

Scenario B: Journal-Led Capture via Workflow Integration
Journals integrate the validation system's API into their editorial workflows and use
validation records during peer review. This appears to be adoption success. But
journals control timing, framing, editorial thresholds, and reviewer instructions — all
of which sit outside the validation system's governance boundary. Without touching
governance or code, a journal can use the system only at pre-submission triage
(filtering out studies likely to show disagreement before they reach validation);
instruct reviewers to defer to validation records as shields rather than signals;
display summarised badges to readers while editors see full records; treat only pre-
submission records as authoritative; and eventually define "validated" in ways that
exclude disagreement and ambiguity.

The capture happens entirely outside the system's boundary. Nobody touches
governance, code, or data. They control the context in which outputs are interpreted.

Mechanic 5: API Display Requirements (Licence-Level)
The system's API licence must include mandatory display terms. Any partner
querying validation records must: display disagreement with the same visual
prominence as agreement; include a mandatory, non-collapsible link to the full validation
record; show the "last updated" timestamp (preventing stale snapshots being
presented as current); and include a standard attribution line pointing to the full
record. Summarisation is permitted, but summarisation that omits material
disagreement is a licence violation. Violation results in written notice, a thirty-day
remediation period, and public API access revocation if unresolved.

Mechanic 6: Anti-Binary-Badge Clause
The system should not issue binary pass/fail badges. The output is always the full
validation record or a structured summary that preserves disagreement. If a third
party creates their own binary badge based on the system's data, the API terms must
require that badge to include a clear statement that it is not issued by the validation
system and that the full record is available at the link provided.
Rationale: Binary badges are the mechanism through which nuanced evidence gets
reduced to gatekeeping thresholds. The validation system's entire design philosophy
resists this reduction. If journals want a binary signal, they may create one — but the
validation system's name is not on it, and the full record is always one click away.

Mechanic 7: Temporal Integrity
Validation records are living documents. The API must always return the current
state of a record, including any post-publication validations, updates, or new
disagreements. Records must carry creation timestamp, last-updated timestamp,
version number (incremented on every material change), and a count and summary
of post-publication validations. A journal may cache a snapshot, but the API must
make visible when a cached version is stale. Any display of a validation record that
omits the last-updated field violates the API licence.
Rationale: Temporal freezing — treating a pre-submission validation record as
definitive and ignoring post-acceptance validations — is how journals shift a
validation system from corrective to confirmatory function. Living documents with
visible timestamps ensure that new evidence cannot be quietly ignored.
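
A sketch of the record metadata this rule requires, with assumed field names; the staleness helper shows how a cached snapshot can be compared against the current record:

```rust
// Living-record metadata for temporal integrity. Any display omitting
// last_updated_unix violates the API licence.

pub struct RecordMeta {
    pub created_unix: u64,
    pub last_updated_unix: u64,
    pub version: u32, // incremented on every material change
    pub post_publication_validations: u32,
}

pub fn snapshot_is_stale(cached: &RecordMeta, current: &RecordMeta) -> bool {
    cached.version < current.version
}
```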

Mechanic 8: Triage Misuse Detection
Aggregate query patterns across API partners should be monitored statistically.
Detectable patterns include: pre-screening bias (a partner querying studies only
before editorial decision and never after acceptance); selective querying (querying
studies from certain institutions but not others); and outcome filtering (studies
queried pre-decision and showing disagreement being disproportionately rejected).
Findings should be published in the annual transparency report (anonymised by
default), shared privately with the partner for remediation, and made public with
partner identified if the pattern persists after notification.
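
As one example of such monitoring, the pre-screening-bias signal could be approximated as below. The minimum query volume and the near-zero post-acceptance ratio are illustrative thresholds, not values from the protocol.

```rust
// Pre-screening-bias signal: a partner that queries records before
// editorial decisions but essentially never after acceptance.

pub fn pre_screening_bias(pre_decision: u64, post_acceptance: u64) -> bool {
    let total = pre_decision + post_acceptance;
    total >= 50 && (post_acceptance as f64) / (total as f64) < 0.02
}
```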

Mechanic 9: Public Delisting
If a partner is demonstrably using the validation system in ways that systematically
undermine its epistemic function — and remediation has failed — the system publicly
withdraws API access and publishes a detailed explanation. This is the nuclear
option. It is reputational, not legal. In science publishing, reputation is currency. The
credible threat of public delisting is itself a governance mechanism. Delisting
requires documented evidence of systematic misuse, failed remediation, and a
governance supermajority vote. It cannot be triggered unilaterally.

How These Mechanics Interact
The two scenarios target different attack surfaces — internal structural dependency
and external interpretation — but the defences share a common logic:
• Make capture visible. Funding concentration is disclosed. Delay is
bannered. Query patterns are monitored. Display violations are public.
• Make resistance automatic. Code-level salience rules, API licence terms,
and automatic tripwires operate without anyone needing to make a brave
phone call.
• Make the cost of capture exceed the cost of compliance. Public
delisting, visible delay banners, and funding tripwires impose reputational
costs on capture attempts early, when correction is still cheap.
6. Red Lines and Safe Concessions
The following cannot be conceded regardless of who asks — funders, partner institutions, journal editors, government officials:
Disagreement Visibility
The pressure will be framed as "Can we just show the majority result?" or "Disagreement confuses non-experts." The answer is no. Disagreement in validation records must be preserved and prominently displayed for a minimum of twenty-four months. This is non-negotiable because hiding disagreement is exactly how the reproducibility crisis was created in the first place.
Institutional Attribution
The pressure will come as "Our validators are concerned about retaliation." Individual validator anonymity is negotiable. Institutional-level attribution is not, because institutional patterns are how systematic rubber-stamping becomes detectable. Without it, the system cannot distinguish genuine validation from coordinated softness.
No Single Numerical Score
The pressure will come as "We need a number for our database" or "Reviewers don't have time to read full records." The answer is: provide a summary, a confidence level, a status — but never a single number that can be used as a threshold. Thresholds get gamed; p-hacking is exactly this failure mode applied to p-values. The validation system must not create the next version of the same problem.
No Forced Closure on Ambiguous Results
The pressure will come as "We need a decision for this grant review" or "The paper is being held up." The answer is: the system provides the best evidence available, honestly described. If that evidence is ambiguous, the honest description is "ambiguous." Forcing certainty where none exists is the fundamental failure mode of the system being replaced.
Safe Concessions
The following can legitimately be negotiated without compromising epistemic integrity:
• Individual validator anonymity (as long as institutional attribution is preserved)
• Timeline adjustments for record publication (within the anti-delay constraint — extensions must be public, bannered, and time-limited)
• Discipline-specific adaptations of standards (within the baseline set by the non-negotiable commitments)
• Presentation format changes (as long as content requirements are met)
• Partnership terms and pricing
• Governance committee composition details (as long as structural independence requirements are met)
• Technology implementation choices

Scripts for Difficult Conversations

When a funder says: "We need a simple score for our database."
We can provide a structured summary — reproducibility status, confidence level, and validator count — in a machine-readable format that your systems can ingest. What we cannot provide is a single number, because reducing complex validation evidence to a single score creates exactly the kind of metric that gets gamed. We have seen this with p-values, journal impact factors, and h-indices. We would be creating the next version of the same problem.
When an institution says: "Our validators want full anonymity."
We protect individual validator identity by default. What we cannot hide is institutional affiliation, because institutional patterns are how we detect systematic problems. If validators from an institution consistently produce soft reviews, that needs to be visible — not to shame the institution, but to maintain the system's integrity. Without this, we cannot distinguish genuine validation from rubber-stamping.
When a journal says: "We just want to know pass or fail."
We can give you a clear reproducibility status — and for most studies, that will be straightforward. But for studies where validators genuinely disagree, we will say so, because that disagreement might be the most important finding. A study where five validators succeed and one fails might be telling you something about hidden assumptions, software versions, or genuine fragility. That is information your editors need, not noise to be averaged away.