Live leaderboard

quaid evals

The benchmark for agent memory systems

Competitive benchmark coverage across reliability, conversational memory, and scale. Quaid’s release gate lives here beside published and estimated peer results.

Latest release

v0.23.0

2026-06-22

DAB v1

99.1%

Release gate score

LoCoMo

20%

Measured today

Published runs

Across Quaid evals

Competitive Leaderboard

Grouped bar charts compare Quaid against peer memory systems across the benchmarks that decide whether a memory layer is real or just context stuffing.

Infrastructure & Reliability

DAB v1

Release-gate reliability, scored from 0 to 100%.

Conversational Memory

LoCoMo + LongMemEval

Published and measured dialogue-memory scores.

Scale

BEAM

Extreme-scale memory performance from 100K to 10M tokens.

Legend: measured, published, estimated, pending / n/a

Estimated values are marked with an asterisk in tooltips. Pending, not-benchmarked, and n/a states render as ghost bars at 0.

GBrain is included on the basis of the AI Heroes benchmark (May 2026), which reports an 8.3x win over qmd on 150 real questions. The reported setup uses cloud-based enrichment and is not airgapped. Its numeric DAB, LoCoMo, LongMemEval, and BEAM scores remain pending until reproducible runs are published.
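The display rules above can be sketched in a few lines. This is only an illustration of the stated behavior, not the site's actual chart code: the `Bar` type, the `render_bar` function, the state strings, and the "ExamplePeer" entry with its 80.0 value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# States that the notes above say are drawn as ghost bars at 0.
GHOST_STATES = {"pending", "not-benchmarked", "n/a"}


@dataclass
class Bar:
    height: float   # bar height in percentage points
    ghost: bool     # True when drawn as a placeholder bar
    tooltip: str    # text shown on hover


def render_bar(system: str, state: str, score: Optional[float]) -> Bar:
    """Map a leaderboard entry to a bar (hypothetical helper)."""
    # Assumption: an entry with no score is treated like a ghost state.
    if state in GHOST_STATES or score is None:
        return Bar(height=0.0, ghost=True, tooltip=f"{system}: {state}")
    # Estimated values carry an asterisk in the tooltip.
    marker = "*" if state == "estimated" else ""
    return Bar(height=score, ghost=False, tooltip=f"{system}: {score:.1f}%{marker}")


quaid = render_bar("Quaid", "measured", 99.1)       # DAB v1 score from this page
gbrain = render_bar("GBrain", "pending", None)       # no numeric score published yet
peer = render_bar("ExamplePeer", "estimated", 80.0)  # made-up value for illustration
print(quaid.tooltip, "|", gbrain.tooltip, "|", peer.tooltip)
```

Keeping ghost bars at height 0 rather than omitting them means every system keeps a slot in each chart group, so a missing result is visible instead of silently absent.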

Methodology

DAB v1

Release-gate scoring and thresholds.

LoCoMo

Conversational memory benchmark details.

LME

LongMemEval setup and per-type breakdown.

BEAM

Extreme-scale memory benchmark methodology.

History

Version charts and the last 10 published runs.

Version History

Latest Quaid release-gate runs. Full trend charts live on the history page.

Version   Date        Release gate  P@5      R@5
v0.23.0   2026-06-22  99.1%         17.4%    38.9%
v0.23.0   2026-06-16  75.8%         17.4%    38.9%
v0.22.4   2026-05-18  94.4%         pending  pending
v0.22.3   2026-05-15  94.4%         pending  pending
v0.22.2   2026-05-14  94.4%         pending  pending

Latest MSMARCO snapshot: P@5 17.4% / R@5 38.9%.
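For readers unfamiliar with the P@5 and R@5 figures, the sketch below shows the standard definitions of precision and recall at a cutoff k for a single query. It assumes the usual formulation; the page does not say how Quaid's evaluation harness computes or averages these, and the function name and document IDs here are hypothetical.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for one query.

    ranked_ids: retrieved document IDs, best first (assumed unique).
    relevant_ids: the set of IDs judged relevant for the query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k  # fraction of the k slots that hold a relevant document
    recall = hits / len(relevant_ids) if relevant_ids else 0.0  # fraction of relevant documents found
    return precision, recall


# Toy query: one of two relevant documents appears in the top 5.
p, r = precision_recall_at_k(["d3", "d7", "d1", "d9", "d4"], {"d1", "d2"}, k=5)
print(f"P@5 {p:.1%} / R@5 {r:.1%}")  # prints "P@5 20.0% / R@5 50.0%"
```

Under these definitions, when a query has fewer than five relevant documents, P@5 cannot reach 100% even for a perfect ranking, which is one reason P@5 can sit well below R@5.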