Deterministic Scientific Claim Gate

Does this claim’s evidence hold up?

CAPAS compiles supplied evidence into a claim admissibility decision. It does not determine truth — it decides, deterministically, whether the evidence licenses the claim: ACCEPT / REWRITE / REJECT / HOLD, with a re-derivable audit trail and no language model in the verdict. Your model proposes; CAPAS disposes. Built for regulatory reviewers, data & training-data engineers, and journal editors.


CAPAS checks whether your supplied evidence is internally consistent and re-derivable. It does not certify scientific truth, compliance, authorship, or the authenticity of the data — a consistent fabrication still passes (the GIGO ceiling).

Recent Gate Decisions SCHEMA V3

ACCEPT statistical_confidence: p=0.03 <= alpha=0.05, direction confirmed ff23bfcc

REWRITE threshold passes; effect direction not licensed 2a2c5234

REJECT artifact unavailable for reproducibility 1cc473f8

HOLD incomplete evidence · fail-closed 9033f4c1

Real engine output, not a mockup — re-derive every verdict & hash: python3 benchmarks/home_gate_decisions.py

26

Deterministic gates

10

Domains, one engine

0

LLM in the verdict · fail-closed (test-proven)

100%

Re-derivable, replayable

Every number is CLOSED (proven by a test), BACKED (regenerates with a hash), or SCOPED (empirical-pending, with its corpus & upgrade path) — see the proof ledger.

The Core

Built on four principles

CAPAS was designed from the ground up to make scientific claim evaluation auditable, repeatable, and integration-ready.

Deterministic verdicts

The same claim + evidence bundle always produces the same verdict. No probabilistic outputs, no model drift, no hallucinations. Schema v3 rules are the single source of truth.

No randomness

Schema-validated structure

Every payload is validated against CAPAS Schema v3 before gating. Field types, required keys, and claim-type-specific constraints are enforced at the gate boundary.

CAPAS Schema v3
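
To make the gate-boundary validation concrete, here is a minimal sketch of a required-key, field-type, and range check. The field list, the range rule, and the function name validate_payload are assumptions for illustration; the actual CAPAS Schema v3 contract is not reproduced here.

```python
# Hypothetical contract for one claim type; not the real CAPAS Schema v3.
REQUIRED_FIELDS = {
    "claim_id": str,
    "claim_type": str,
    "p_value": (int, float),
    "alpha": (int, float),
    "effect_direction_confirmed": bool,
}


def validate_payload(payload):
    """Return a list of schema errors; an empty list means the payload may be gated."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing required field: {field}")
        elif isinstance(payload[field], bool) and expected is not bool:
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"wrong type for {field}: bool")
        elif not isinstance(payload[field], expected):
            errors.append(f"wrong type for {field}: {type(payload[field]).__name__}")
    # Assumed claim-type-specific constraint: probabilities must lie in [0, 1].
    for field in ("p_value", "alpha"):
        value = payload.get(field)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if not 0.0 <= value <= 1.0:
                errors.append(f"{field} out of range [0, 1]: {value}")
    return errors
```

A payload that returns a non-empty error list would never reach the verdict rules in this sketch, which is one way to keep the boundary fail-closed.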

Full provenance trail

Every REJECT and HOLD verdict includes machine-readable provenance blockers — licensing flags, reproducibility gaps, attestation failures — exportable as structured JSON.

Audit-ready

Zero LLM dependency

CAPAS does not use language models for gate decisions. Verdicts are computed by deterministic rule functions — 26 cross-domain invariant gates plus per-claim-type evidence contracts — against a structured contract, fully offline-capable. Every decision carries a re-derivable audit_hash.

Rule-based only

Where it Lives

Four-step workflow

From claim draft to admissibility decision in one deterministic pass.

Claim drift

A paper says association. A dataset row turns it into causation. CAPAS catches the boundary.

Select mode

Guided builder, raw JSON, batch evaluation, or claim-candidate extraction (deterministic preview, human confirmation). Pick your workflow.

Run gate

Schema v3 and claim-type rules return ACCEPT, REWRITE, REJECT, or HOLD with full provenance.

Inspect decision

Review verdict, schema errors, provenance blockers, and fine-tune evidence readiness.

Verdict Reference

Four possible outcomes

Every gate run returns exactly one of these. Each carries a machine-readable reason trail.

ACCEPT

Claim is admissible

All schema constraints pass. Evidence fully supports the claim as stated. No licensing or reproducibility blockers found.

p_value=0.03, alpha=0.05
effect_direction_confirmed: true
→ ACCEPT · audit_hash ff23bfcc

REWRITE

Claim needs adjustment

Evidence is present but the claim overreaches. Direction, scope, or causal language must be corrected before resubmission.

p_value=0.03, alpha=0.05
effect_direction_confirmed: false
→ REWRITE · audit_hash 2a2c5234

REJECT

Claim is inadmissible

Critical evidence is missing, contradicted, or irreproducible. The claim cannot be licensed in its current form.

artifact_available: false
independent_reproduction_pass: false
→ REJECT · audit_hash 1cc473f8

HOLD

Pending verification

Evidence is incomplete or no external oracle is available; the gate fail-closes to HOLD instead of guessing.

p_value=0.03, alpha=0.05
effect_direction_confirmed: (missing)
→ HOLD · audit_hash 9033f4c1
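
The three statistical examples above can be restated as a small decision function. This is a toy sketch that mirrors those examples only; the function name is hypothetical, and the branch for a failed threshold (p_value > alpha) is an assumption because no such example is shown here.

```python
from typing import Optional


def gate_statistical_confidence(
    p_value: Optional[float],
    alpha: Optional[float],
    effect_direction_confirmed: Optional[bool],
) -> str:
    """Toy restatement of the examples above; not the CAPAS engine."""
    # Fail closed: a missing field yields HOLD rather than a guess.
    if p_value is None or alpha is None or effect_direction_confirmed is None:
        return "HOLD"
    if p_value > alpha:
        # Assumption: the examples above do not cover a failed threshold.
        return "REJECT"
    # Threshold passes; direction decides whether the claim is licensed as stated.
    return "ACCEPT" if effect_direction_confirmed else "REWRITE"
```

The function has no randomness and no external calls, so the same three inputs always map to the same verdict.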

Why CAPAS

Not an LLM checker. Not a plagiarism tool.

CAPAS is structured, deterministic admissibility gating for scientific claims — a verdict you re-derive to the same hash, not a score from a model.

What CAPAS is NOT

Not a fact-checker — CAPAS doesn't verify truth, it validates evidence licensing and schema compliance

Not a plagiarism detector — citation matching is not the concern; admissibility is

Not an LLM wrapper — zero language model calls in the gate decision path

Not a peer review replacement — CAPAS is a pre-gate boundary tool, not editorial judgment

What CAPAS IS

A deterministic claim admissibility gate — same input always produces the same output

A schema enforcement layer — CAPAS Schema v3 defines exactly what a valid evidence package looks like

A provenance audit tool — every blocker is machine-readable and exportable

An integration-ready API — designed to sit inside research pipelines, publishing workflows, and audit systems

The other half of the stack

Everyone builds “what can I say?” — CAPAS builds “should this have been said?”

Frontier models and agents are built to generate claims. The systems next to them each gate only a slice — LLM-judges assess truth (stochastically, not replayably), fact-checkers chase truth not boundary, domain validators like Pinnacle 21 cover one field. Across the systems we surveyed we found none that gate, deterministically and replayably, whether an arbitrary claim is licensed by its evidence before reuse. CAPAS starts there.

System	Generates claims	Gates claims
GPT · Claude · Gemini	✓	✗
Deep-research agents	✓	✗
LLM-as-judge	✗	stochastic · no guarantee, not replayable
Fact-checkers	✗	truth, not boundary
CAPAS	✗	✓ deterministic · replayable · fail-closed (test-proven)

How drift happens

The contamination cascade

Without a claim-level gate, drift survives source review, metadata review, and provenance review — because none of them evaluate the claim boundary itself.

01 · Paper

Benefit in one subgroup

One randomized trial, one patient subgroup, bounded result.

→

02 · Drift

All-patient benefit claimed

The reusable sentence widens a subgroup result to every patient.

→

CAPAS · Gate

REWRITE

Stops drift before it becomes governed evidence.

→

Blocked

Dataset / Model

Population-wide claim reproduced downstream. Prevented.
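
The subgroup-to-all-patients drift above can be pictured as a set comparison between the population a claim covers and the population the evidence studied. The sketch below is illustrative only: the function name, the set representation of populations, and the returned keys are assumptions, and it covers the scope check alone, not a full verdict.

```python
def gate_scope(claim_population: frozenset, evidence_population: frozenset) -> dict:
    """REWRITE when the claim covers groups the evidence never studied."""
    if not evidence_population:
        # Fail closed: no declared study population means nothing can be licensed.
        return {"decision": "HOLD", "reason": "no studied population declared"}
    unlicensed = claim_population - evidence_population
    if unlicensed:
        return {
            "decision": "REWRITE",
            "original_scope": sorted(claim_population),
            "licensed_scope": sorted(claim_population & evidence_population),
            "unlicensed": sorted(unlicensed),
        }
    return {"decision": "ACCEPT"}
```

For example, a claim covering {"subgroup_a", "subgroup_b"} against evidence for {"subgroup_a"} comes back as REWRITE with the licensed scope narrowed to subgroup_a.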

Cross-domain engine

One mechanism. 10 domains. 26 gates.

A fabricated claim still has to satisfy the conservation laws of its field, and that check is re-derivable with no oracle. The same engine catches a balance sheet that doesn’t close, a survey mean that’s arithmetically impossible (GRIM), a 99%-sensitive test claimed to imply a 99% PPV for a rare disease (the base-rate fallacy), an unbalanced reaction, a dimensionally inconsistent equation, and a qubit with T2 > 2·T1. Any declared number that breaks a domain law forces REJECT. The rule is downgrade-only, so it can only make a verdict stricter. Fail-closed is a proven invariant (18/18 structurally-deficient claims rejected, locked by a test), not an estimated rate.

Finance: A=L+E
Statistics: GRIM
Epidemiology: Bayes PPV, RR/OR, vaccine efficacy
Chemistry: stoichiometry, charge, oxidation states
Physics: dimensions, η≤1, v≤c
Quantum: T2≤2T1, Γφ, gate floor
Engineering: Ohm’s law
Biology: Hardy-Weinberg, mark-recapture
Mathematics: root & linear-system checks
Universal: probability & conservation
+ a new domain = one deterministic law
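
Four of the laws named above are short enough to sketch directly. The functions below are textbook checks written for illustration; their names, tolerances, and the downgrade_only helper are assumptions, not the CAPAS gate implementations. The GRIM check is only informative for small samples, where integer totals are sparse relative to the reported precision.

```python
import math


def grim_consistent(mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM: can a mean of n integer responses round to the reported value?"""
    target = round(mean, decimals)
    approx_total = mean * n
    # Only the integer totals adjacent to mean*n can reproduce the reported mean.
    for total in (math.floor(approx_total), math.ceil(approx_total)):
        if round(total / n, decimals) == target:
            return True
    return False


def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: probability of disease given a positive test."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def balance_sheet_closes(assets: float, liabilities: float, equity: float,
                         tol: float = 0.01) -> bool:
    """Accounting identity A = L + E, within a rounding tolerance."""
    return abs(assets - (liabilities + equity)) <= tol


def coherence_ok(t1: float, t2: float) -> bool:
    """Qubit coherence bound T2 <= 2*T1 (both in the same time unit)."""
    return t2 <= 2.0 * t1


def downgrade_only(verdict: str, invariant_results: list) -> str:
    """A violated invariant forces REJECT; a passing one never upgrades a verdict."""
    return verdict if all(invariant_results) else "REJECT"
```

As a worked case, a test with 99% sensitivity and 99% specificity applied at an assumed prevalence of 1 in 1,000 gives ppv(0.99, 0.99, 0.001) of roughly 0.09, far from the 99% a base-rate-blind claim would assert. Similarly, a reported mean of 5.19 from 28 integer responses fails grim_consistent, because neither 145/28 nor 146/28 rounds to 5.19.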

Proof-carrying

It re-derives the evidence. It doesn’t trust it.

Above the schema gate, CAPAS re-computes the claimed result from its raw inputs and only GATEs — marks re-derived — what it can reproduce. What it cannot re-derive it ATTESTs: signed and bound, never marketed as verified. The boundary is explicit on every receipt. No language model is ever in the decision path.

Statistical: re-run the test from raw data
Calibration & chromatography peak area
Clinical datasets (SDTM→ADaM)
Financial ratios from XBRL: US-GAAP & IFRS
Accounting identities (debits=credits, A=L+E)
Dimensional consistency (SI)
Stoichiometry / mass balance
Physical laws (Antoine, c, absolute zero, Holevo)
Quantum re-simulation (below the frontier)
Zero-knowledge proof over hidden data

GATE = re-derived and reproducible. ATTEST = signed, not verified. CAPAS certifies computational consistency of the re-derivable slice — not scientific truth, and not the authenticity of the raw data (the irreducible GIGO residual). It re-derives more than it trusts, and says exactly which is which.
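
The GATE / ATTEST boundary can be sketched for the simplest possible case, a claimed mean. This is an illustrative sketch under stated assumptions: the function name, the receipt keys, and the MISMATCH status for a failed re-derivation are hypothetical and not taken from the CAPAS engine.

```python
from statistics import mean
from typing import Optional, Sequence


def receipt_for_mean(claimed_mean: float,
                     raw_values: Optional[Sequence[float]],
                     tol: float = 1e-9) -> dict:
    """Label one claimed number as re-derived (GATE) or merely signed (ATTEST)."""
    if not raw_values:
        # Nothing to recompute from: the value can be bound to a receipt, not verified.
        return {"status": "ATTEST", "re_derived": None}
    recomputed = mean(raw_values)
    if abs(recomputed - claimed_mean) <= tol:
        return {"status": "GATE", "re_derived": recomputed}
    # Assumption: a failed re-derivation is surfaced as a mismatch for the gate to act on.
    return {"status": "MISMATCH", "re_derived": recomputed}
```

The point of the design is that the receipt never reports GATE for a number it did not recompute, so a reader can tell the re-derived slice from the attested one.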

The enterprise asset

Every decision is an audit artifact

Frontier models produce text; CAPAS produces an auditable evidence trail, and that is what a regulated buyer actually purchases. Every verdict is operational, not just a label: ACCEPT licenses reuse · REWRITE returns the corrected claim with an Original→Licensed diff · REJECT names the missing evidence fields · HOLD lists the obligations to resolve before reuse.

{
  "claim_id": "claim_drift_001",
  "claim_type": "causal_mechanism_claim",
  "decision": "REJECT",
  "evidence_contract": "intervention_or_natural_experiment + temporal_order + confounders + mechanism",
  "reason": "causal claim lacks intervention/natural-experiment evidence or temporal order",
  "audit_hash": "sha256:392234b5fb245c166f0e2dd46b9bc1ffff1f2c6d52b20cd3b8f0b9ea6667034d",
  "re_derive": "python3 benchmarks/home_gate_decisions.py"
}

What validates us

Run it yourself — every claim carries its command

CAPAS is a fail-closed gate: it disposes a verdict on the structure of the evidence you submit, with no language model in the decision path. Every claim below carries the command that regenerates it — clone the repo and run them. A claim you can’t re-derive isn’t a claim, it’s a slogan.

PROVEN · locked invariants — re-running the test is the proof

Never ACCEPTs a structurally-deficient claim (18/18 deficient cases held). python3 benchmarks/verify_fail_closed.py
Never crashes on hostile or garbage input — fails closed, returns one of four legal verdicts every time. python3 benchmarks/test_dynamic_fuzz.py
Survives a 20-payload hostile battery (malformed, injection, oversize, SSRF) with no false-accept. python3 benchmarks/verify_robustness.py
Every HOLD ships an actionable resolution — no dead ends (7/7 paths). python3 benchmarks/verify_hold_has_resolution.py
The 8 load-bearing invariants all hold, with a reproducible result hash. python3 benchmarks/conformance.py
Anytime-valid Type-I guarantee: empirical false-reject ≤ α=0.05 under the null. python3 benchmarks/demo_sequential_typeI.py

BACKED · regenerates from a command, and the output hash matches

Every verdict carries an audit hash a third party re-derives from the published recipe; tamper any load-bearing field and the hash diverges. python3 benchmarks/verify_audit_hash_reproduces.py
A signed conformance attestation binds the commit, the conformance result hash, and a live verdict into one packet hash. python3 benchmarks/attest_conformance.py
All 26 cross-domain invariant gates produce a live verdict. python3 benchmarks/generate_capability_matrix.py
The browser Gate App consumes a single source of truth — no drift between the served contract and capas.py. python3 benchmarks/verify_gate_contracts_match.py
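
The re-derivable hash idea can be illustrated with a generic recipe: serialize the load-bearing fields canonically, then hash the bytes. The sketch below assumes SHA-256 over sorted-key JSON; the published CAPAS recipe is not reproduced here, so these hashes will not match the audit hashes shown on this page.

```python
import hashlib
import json


def audit_hash(decision_record: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(decision_record, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=True)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative record; the field names follow the examples on this page.
record = {
    "claim_type": "statistical_confidence",
    "p_value": 0.03,
    "alpha": 0.05,
    "effect_direction_confirmed": True,
    "decision": "ACCEPT",
}
original = audit_hash(record)
# Same content in a different key order hashes identically.
reordered = audit_hash(dict(reversed(list(record.items()))))
# Changing one load-bearing field makes the hash diverge.
tampered = audit_hash({**record, "p_value": 0.04})
```

Canonical encoding is what makes the hash reproducible by a third party: without a fixed key order and separators, two honest parties could serialize the same record differently and disagree.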

SCOPED · the mechanism runs; the headline rate is over a synthetic / agent-coded corpus

We name the corpus and the upgrade path — these are coverage, not measured real-world rates.

N=1238 decisions across 12 claim families exercise the full ACCEPT/REWRITE/REJECT/HOLD space — synthetic-grid coverage, not a real-world reject rate. python3 benchmarks/family_decision_mix.py
The n=28 retrospective separated retracted from replicated published claims by structure — agent-coded from public retraction records, not blind adjudication. python3 benchmarks/pilot_real.py
The pharma admissibility contract is fail-closed over a synthetic n=3024 corpus (not a measured market rate). python3 benchmarks/generate_pharma_corpus.py

WHAT IS NOT VALIDATED — the honesty footer is the point

The GIGO ceiling is real and we do not hide it. CAPAS gates the structure of evidence, not ground truth — a self-consistent, well-formed, fabricated payload can pass the gate. Our own fuzz and pedagogy-governance tests measure and report this residual (a disclosed false-admit on the GIGO-ceiling class). The gate raises the cost of lying and makes every verdict re-derivable; it does not detect a careful liar.

No blind, oracle-adjudicated fraud-detection rate exists. The retrospective and benchmark numbers are agent-coded or synthetic; the “0 false-accepts” figure is on that corpus only.
The cross-domain blind-adjudication study (Gap-1) has NOT been run. Its registry, receipts, and gating are built and pass (python3 benchmarks/study_assembly.py), but the n≥500 human-adjudicated comparison is pre-registered, not yet executed.
No SOC 2 / ISO 27001 certification. Roadmap, not held.
Head-to-head baselines (UMA, BTS, EigenTrust) are our models of those mechanisms, not the live systems.

If any command above does not reproduce its stated result on your machine, that is a bug — tell us.

Retrospective validation

Tested against real retracted science

28 famous claims, every one of which passed peer review and was published in a leading journal such as Nature, Science, or The Lancet. 14 were later retracted (Wakefield, Surgisphere, Schön, STAP…); 14 were independently replicated (LIGO, Higgs, RECOVERY dexamethasone…). Plausibility could not tell them apart: all 28 looked publishable. The gate separated them by structure.

28/28

Separated by structure on an illustrative, agent-coded retrospective — not an adjudicated benchmark

0/28

Plausibility / peer-review — all 28 passed, retracted and valid alike

Each fraud was gated for its actual structural deficiency — no controls, no independent reproduction, unauditable data — the same gaps it was retracted for. Honest scope: an illustrative retrospective whose corpus is coded from public retraction records (Retraction Watch, journal notices); it validates the gate’s structural logic, not fraud-detection from raw paper text. Partner pilots pending.

Use Cases

Where teams deploy CAPAS

From academic publishing to enterprise AI governance — any workflow that touches scientific claims benefits from a deterministic admissibility gate.

Academic publishing

Gate submitted manuscripts at the desk-review stage. Catch inadmissible claims before peer review consumes reviewer time.

AI training pipelines

Validate claim-evidence pairs before they enter training datasets. Prevent inadmissible or unlicensed scientific claims from corrupting model training.

Regulatory compliance

Generate structured audit trails for claims in regulated industries. Every verdict is machine-readable, timestamped, and exportable.

Research pilot programs

Run batch evaluations across a corpus of claims. Identify systemic evidence gaps before a full production deployment.

Claim fact-checking desks

Triage incoming claims automatically. Surface only those with sufficient evidence structure for human fact-checker review.

API integration

Embed CAPAS gate calls directly into your existing research pipeline, CMS, or editorial system. JSON in, structured verdict out.

For developers

Wrap your LLM. Stop trusting plausible.

An installable verification layer: your model proposes, CAPAS disposes. It never lets an unsupported claim through as ACCEPT — it re-derives what is re-derivable, grades the rest, and emits a verifiable reward your model can’t game by sounding right. No language model in the verdict.

$ pip install capas-claim-gate
# live on PyPI · or from source: pip install git+https://github.com/fomv9354lve/capas-inteligentes
from capas_sdk import verified
# your model proposes the evidence; CAPAS gates it
verdict = verified(my_llm, "reproducibility_check")["verdict"]
# -> ACCEPT / REWRITE / REJECT / HOLD  (never the LLM’s call)

0/5 false-accepts — CAPAS gate

5/5 accept-if-undisputed default

5 checkable-FALSE claims · a strong LLM-judge ties CAPAS here (0/5) — the gate’s edge is determinism, not accuracy · re-derive: python3 benchmarks/head_to_head_sota.py

The boost is reliability, not capability — the model doesn’t get smarter, its output becomes admissible-or-deferred. Honest scope: it grounds record↔text, not text↔reality — a source that lies about its methods and withholds its data passes (the GIGO ceiling), so CAPAS says exactly which slice it re-derived and which it could only attest.

One core, three surfaces

However you build, CAPAS is the gate

The same deterministic engine, exposed three ways. CAPAS is never the language model — it is the fail-closed layer the model proposes into.

Library · pip

Wrap your LLM in code. gate · reward · certificate · invariants · gate_quantum.

pip install capas-claim-gate
live on PyPI

Skill · MCP server

A tool any agent calls — Claude Code, Desktop, Cursor. Zero dependencies. The agent proposes, CAPAS disposes.

python3 capas_mcp.py  # see docs/mcp.md

API · signed certificate

Hosted, auth-gated issuance of a signed, persisted, tamper-evident admissibility certificate — the audit artifact a regulated buyer purchases.

POST /api/certificate → id + signature

The same pattern (invariant checks + threshold gates + a fail-closed verdict + a disclosed boundary) is the one IBM’s production calibration system follows. The architecture isn’t speculative: a hardware vendor runs it at scale.

Proof on real hardware

We beat a vendor benchmark with the vendor’s own numbers.

IBM’s headline gate-error figure is an optimistic lower bound — for real circuits it under-states by 3–10× (Proctor, Nat. Phys. 2022). From the same published calibration fields, CAPAS re-derives the complete error budget — fully auditable, no hardware required.

IBM headline (RB)

1.6×10⁻³

→

Re-derivable complete floor

1.9×10⁻²

Error the headline hid

2–11×

Honest scope: the 1.9×10⁻² worst case shown is ibm_fez q9–q10, re-derived term-by-term from the vendor’s published fields (the 3–10× structured-circuit band is a cited Proctor range, not our finding). The live leg is a separate device: on ibm_kingston, CAPAS re-found the chip’s one anomalous qubit (Q121) and its bad-coupler cluster from calibration alone, and admitted a real Bell measurement — 0.045 vs a re-derived floor of 0.020, ≈2.2×, under two independent oracles. The arithmetic is textbook and copyable; the fail-closed discipline that refuses the optimistic headline as admissible is the part a third party can re-run. Full method →

Independent validation

We didn’t invent the architecture — a frontier quantum-hardware vendor’s production stack already runs it.

IBM’s quantum stack will not run your circuit until it clears a calibration gate: every job is checked against frozen, re-derived device invariants, fail-closed, with no model-of-the-day in the decision. That is exactly the CAPAS architecture: re-derive from declared evidence, refuse on violation, keep the verdict deterministic. It already runs in production, at the frontier of physics. CAPAS generalizes the same mechanism across ten domains, including finance, statistics, epidemiology, chemistry, physics, and quantum. Two independent systems converging on the same admissibility mechanism is consilience: evidence that the design is structural, not a pitch.

Honest scope: the identities are textbook and the convergence is architectural, not a partnership claim. And the mechanism itself is the open Apache-2.0 engine — re-derivable, therefore copyable — so the consilience validates the design, it is not the moat. What is defensible is the cross-domain composition plus the self-run conformance mark and signed certificate; on an unchanging-then-reported-operational calibration value, CAPAS applies the more conservative, fail-closed disposition. (Independent of IBM — not affiliated or endorsed; observations from public open-plan metadata.)

0 false flags

Run live over the real 156-qubit ibm_kingston calibration, CAPAS independently re-found only the genuine anomalies — the single T2 > 2·T1 qubit (Q121) and the 12-edge TLS cluster — with no false positives on honest vendor data.

24× caught

An estimate put one coupler’s residual ZZ at ~86 kHz; IBM’s published value is 3.56 kHz. CAPAS’s exact-only discipline refused the estimate — gating on it would have false-flagged a healthy edge.

Not a slide — re-derive it yourself: examples/kingston_live_audit.py audits the live device; benchmarks/kingston_real_bell_verdict.json gates a real Bell measurement against IBM’s own calibrated noise model (two independent oracles agree). Open engine, Apache-2.0: the method is fully inspectable and the verdict re-derivable by anyone. The defensible asset isn’t the copyable engine — it’s the self-run conformance mark and the signed certificate, under a governance charter that pre-commits the mark to neutral governance — binding in direction, drafted for irrevocability, not yet legally executed (the trustee is an open item).

Why it compounds

The moat isn’t any single gate. It’s a conformance mark and signed certificate you re-run, not trust.

Any one gate is copyable. What is hard to copy is trust you can audit — and CAPAS is built so trust is earnable, not asserted. Every verdict re-derives to a hash an independent party can reproduce, and tampering diverges; every headline claim is CLOSED / BACKED / SCOPED in a public ledger that discloses which numbers are synthetic; the mark attests only that an artifact passed a suite you run yourself for the same verdict and the same hash, issued as a signed, content-addressed certificate. A competitor can copy a gate in a week; it cannot copy a self-run conformance mark, a signed-certificate audit trail, and a relicensing posture renounced in writing. That is the moat — the auditable standard around the tool. Whether it becomes the reference standard depends on third parties actually challenging it: that adjudication is open, not yet run.

Open standard · governed

Open engine. Reserved mark. Pre-committed to neutral governance.

CAPAS ships open-core (Apache-2.0): the schema, calculus, reference gate, CLI, tests, and benchmark corpus are yours to run and fork. The defensible asset is not the code — it’s the certification mark, and a mark is only worth trusting if it can’t be pulled. So the mark is reserved and pre-committed to neutral governance before adoption, not after — the one move that let Open Policy Agent survive its sponsor’s acquisition while MongoDB, Elastic, HashiCorp, and Redis each triggered a fork by relicensing their core to capture value. We renounce that move in writing. Governance charter →

Conformance is self-runnable and deterministic — python3 benchmarks/conformance.py runs the exact suite the certifier runs and returns the same verdict and the same hash. No private process to trust. The mark attests an artifact passed that. Certification & how to certify →

Credentials & trust

Open credentials. Nothing you can’t re-check.

Each badge links to a verifiable source — a passing workflow, a published score, a signed release, or a re-derivable hash. We don’t display a certification we haven’t earned.

Apache-2.0
CI · passing
CAPAS-CONFORMANT · 8/8 self-run + re-derivable hash
Sigstore-signed conformance attestation (Rekor)
PyPI · capas-claim-gate 0.3.0 (0.4.0 pending)
OpenSSF Scorecard
Governance charter · binding direction, not yet executed
Responsible-disclosure policy
Third-party notices

Quantum-calibration credential: CAPAS gates reported quantum-device claims against textbook invariants — run live over IBM’s 156-qubit ibm_kingston calibration it re-found only the genuine anomalies (Q121, the bad-coupler cluster) with 0 false flags. Architectural consilience with IBM’s own admissibility engine — not a partnership or endorsement. The IBM consilience →

OpenSSF Best Practices Badge: passing (100%) — self-cert + static/dynamic-analysis criteria met. The badge hotlinks the live bestpractices.dev project, so it verifies independently and updates itself.

Beachhead

Statistical-claim admissibility for regulated submissions.

Pinnacle 21 already checks whether a trial dataset is structurally well-formed (CDISC conformance). It does not check whether the reported statistic is licensed by its evidence. CAPAS does: significance versus alpha, multiplicity, confidence-interval-excludes-null, effect direction, endpoint pre-specification — re-derivably, beside the submission, not as a replacement. Validated on a 3,024-case synthetic admissibility corpus, 0 deficient claims accepted (fail-closed) — contract coverage of the space P21 skips, not a production false-accept rate on real submissions. Market validation →
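
The checks listed above can be sketched as a function that returns the unmet obligations for one reported statistic. This is a minimal sketch: the function name and obligation labels are hypothetical, and Bonferroni is used only as a stand-in multiplicity correction, since the correction CAPAS applies is not stated here.

```python
def statistical_obligations(p_value: float, alpha: float, n_comparisons: int,
                            ci_low: float, ci_high: float, null_value: float,
                            effect_direction_confirmed: bool,
                            endpoint_prespecified: bool) -> list:
    """Return the unmet obligations; an empty list means every check passed."""
    unmet = []
    # Assumption: Bonferroni as the multiplicity correction (alpha / comparisons).
    adjusted_alpha = alpha / max(n_comparisons, 1)
    if p_value > adjusted_alpha:
        unmet.append("significance_after_multiplicity")
    # The interval must not contain the null value (e.g. 0 for a difference).
    if ci_low <= null_value <= ci_high:
        unmet.append("confidence_interval_excludes_null")
    if not effect_direction_confirmed:
        unmet.append("effect_direction")
    if not endpoint_prespecified:
        unmet.append("endpoint_prespecification")
    return unmet
```

With p_value=0.03 and alpha=0.05, a single comparison passes the significance check, while three comparisons lower the adjusted threshold to about 0.0167 and the same p-value no longer clears it.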

Get started

Ready to gate your first claim?

Load a sample payload or build your own evidence contract in under two minutes. No account required for the pilot.


CAPAS gates structured evidence supplied by users. It does not certify scientific truth or replace external review.
The engine ships 26 deterministic gates across 10 domains; unsupported domains HOLD until a team defines a new evidence contract, admissibility policy, and audit artifact.
CAPAS does not use language models at decision time — every gate decision is deterministic and fully traceable.
