Live leaderboard

quaid evals

The benchmark for agent memory systems

Competitive benchmark coverage across reliability, conversational memory, and scale. Quaid’s release gate lives here beside published and estimated peer results.

Latest release: v0.23.0 (2026-06-22)
DAB v1: 99.1% (release gate score)
LoCoMo: 20% (measured today)
Published runs: 28 (across Quaid evals)

Competitive Leaderboard

Grouped bar charts compare Quaid against peer memory systems across the benchmarks that decide whether a memory layer is real or just context stuffing.

Infrastructure & Reliability

DAB v1

Release-gate reliability score, on a 0 to 100% scale.

Conversational Memory

LoCoMo + LongMemEval

Published and measured dialogue-memory scores.

Scale

BEAM

Extreme-scale memory performance from 100K to 10M tokens.

Legend: measured, published, estimated, pending / n/a

Estimated values show an asterisk in tooltips. Pending, not-benchmarked, and n/a states render as ghost bars at 0. GBrain is included from the AI Heroes benchmark, May 2026, which reports an 8.3x win over qmd on 150 real questions. The reported setup uses cloud-based enrichment and is not airgapped; numeric DAB, LoCoMo, LongMemEval, and BEAM scores remain pending until reproducible runs are published.
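The rendering rules above (an asterisk for estimated values, ghost bars at 0 for pending, not-benchmarked, and n/a states) can be sketched as a small data model. This is only an illustration: the names `Result`, `bar_height`, and `tooltip_label` are hypothetical and are not taken from the leaderboard's actual code.

```python
from dataclasses import dataclass
from typing import Optional

# States the page describes as rendering as ghost bars at 0.
GHOST_STATES = {"pending", "not-benchmarked", "n/a"}
SCORED_STATES = {"measured", "published", "estimated"}


@dataclass
class Result:
    """One (system, benchmark) cell of the chart. Hypothetical structure."""
    system: str
    benchmark: str
    status: str
    score: Optional[float] = None  # percentage; None for ghost states

    def __post_init__(self) -> None:
        if self.status not in GHOST_STATES | SCORED_STATES:
            raise ValueError(f"unknown status: {self.status!r}")
        if self.status in SCORED_STATES and self.score is None:
            raise ValueError(f"status {self.status!r} requires a score")


def is_ghost(result: Result) -> bool:
    """True when the cell has no numeric score to plot."""
    return result.status in GHOST_STATES


def bar_height(result: Result) -> float:
    """Ghost states are drawn at 0; everything else at its score."""
    return 0.0 if is_ghost(result) else float(result.score)


def tooltip_label(result: Result) -> str:
    """Estimated values carry an asterisk; ghost states show their state."""
    if is_ghost(result):
        return result.status
    label = f"{result.score:.1f}%"
    return label + "*" if result.status == "estimated" else label


if __name__ == "__main__":
    rows = [
        Result("Quaid", "DAB v1", "measured", 99.1),
        Result("GBrain", "DAB v1", "pending"),
        # Invented value, used only to show the asterisk behaviour.
        Result("ExampleSystem", "DAB v1", "estimated", 50.0),
    ]
    for row in rows:
        print(row.system, bar_height(row), tooltip_label(row))
```

Keeping the status separate from the score means a pending system such as GBrain is never confused with a system that actually scored 0.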

Version History

Latest Quaid release-gate runs. Full trend charts live on the history page.

v0.23.0 (2026-06-22): 99.1%, P@5 17.4% / R@5 38.9%
v0.23.0 (2026-06-16): 75.8%, P@5 17.4% / R@5 38.9%
v0.22.4 (2026-05-18): 94.4%, P@5 pending / R@5 pending
v0.22.3 (2026-05-15): 94.4%, P@5 pending / R@5 pending
v0.22.2 (2026-05-14): 94.4%, P@5 pending / R@5 pending

Latest MSMARCO snapshot: P@5 17.4% / R@5 38.9%.
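
For readers unfamiliar with the P@5 and R@5 figures, the following is a minimal sketch of the standard precision-at-k and recall-at-k definitions for a single query. It assumes binary relevance labels and is not necessarily how Quaid's harness computes or averages its reported numbers. The function names and document IDs are illustrative.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = len(set(ranked[:k]) & relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # convention chosen here: no relevant items means recall 0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


if __name__ == "__main__":
    ranked = ["d3", "d7", "d1", "d9", "d4"]   # retrieval order for one query
    relevant = {"d1", "d2", "d3"}             # ground-truth relevant documents
    print(f"P@5 = {precision_at_k(ranked, relevant):.3f}")
    print(f"R@5 = {recall_at_k(ranked, relevant):.3f}")
```

In this toy query, two of the five retrieved items are relevant, and two of the three relevant items were retrieved, so recall exceeds precision. That pattern matches the snapshot above and is common when each query has fewer than five relevant documents.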