DAB v2.1
420-point competitive benchmark. No system scores above 50%.
Why v1 Is Not Enough
DAB v1 is a release gate. It catches regressions in install, ingestion, search, and MCP behavior, but it does not separate basic functionality from strong retrieval, conversation memory, graph reasoning, or latency under competitive workloads.
Retrieval
Exact lookup, paraphrase recall, ranking quality, long-tail coverage, latency-aware search.
Conversation
Multi-turn memory, fact carryover, contradiction handling, temporal recall, state preservation.
Graph
Entity extraction, edge creation, multi-hop traversal, graph updates, relationship repair.
Sections
Five scored sections. Three of them (Retrieval, Conversation, and Graph) form the benchmark core.
Infrastructure
- Install and version check
- Database bootstrap and migrations
- Corpus ingest and persistence
- MCP startup and tool discovery
Retrieval
- Exact FTS recall on known documents
- Semantic paraphrase recall on natural queries
- Ranking quality under noisy distractors
- Hybrid search consistency across repeated runs
- Latency-sensitive search at larger corpus sizes
Conversation
- Multi-turn memory retention
- Persona and preference carryover
- Conflict detection across turns
- Temporal grounding of prior facts
- Long-context extraction without tool drift
Graph
- Entity extraction and normalization
- Relationship creation from free text
- Multi-hop traversal and retrieval
- Contradiction handling in graph updates
- Graph repair after partial writes
Intelligence
- Memory tool selection
- Gap detection when evidence is missing
- Context pruning under token pressure
- Evidence citation and answer discipline
Latency penalty: a response slower than 2 s earns 50% of the points; a response slower than 5 s earns 0 points.
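The latency rule above can be read as a step function over response time. The following is a minimal sketch, assuming the penalty is applied to the points of a single timed task and that both thresholds are strict (a response at exactly 2 s keeps full points). The function name `apply_latency_penalty` is hypothetical and is not taken from the DAB harness.

```python
def apply_latency_penalty(points: float, latency_s: float) -> float:
    """Scale a task's points by the latency rule: >2s = 50%, >5s = 0.

    Assumption: thresholds are strict, so exactly 2.0 s keeps full points
    and exactly 5.0 s still earns half.
    """
    if latency_s > 5.0:
        return 0.0            # too slow: no points
    if latency_s > 2.0:
        return points * 0.5   # slow: half points
    return points             # within budget: full points
```

Under these assumptions, a 10-point task answered in 3 s would earn 5 points, and the same task answered in 6 s would earn none.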
Competitive Snapshot
| System | Infra | Retrieval | Conversation | Graph | Intelligence | Total | Status |
|---|---|---|---|---|---|---|---|
| Quaid | 35 | 80 | 0 | 0 | 15 | 130/420 (31%) | measured |
| GBrain | estimated | estimated | not-run | estimated | pending | estimated, no numeric score | estimated |
| Mem0 v3 | ~30 | ~30 | ~82 | 0 | ~10 | ~152/420 (36%) | estimated |
| qmd | 21 | 45 | 0 | 0 | 5 | 71/420 (17%) | measured |
Rows marked estimated are derived from published benchmarks; rows marked measured, including Quaid, were measured directly. GBrain is included on the basis of the AI Heroes benchmark post, which reports an 8.3x win over qmd on 150 real questions. Its reported setup uses cloud-based enrichment and is not airgapped, and no numeric DAB score has been published for it.
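Each Total in the table is the sum of the five section scores out of 420 points. The sketch below recomputes that column for the two measured rows. The helper name `summarize` is hypothetical, and rounding to the nearest whole percent is an assumption that happens to match the figures shown in the table.

```python
MAX_POINTS = 420  # total points in the benchmark, per the header

# Section scores in table order: Infra, Retrieval, Conversation, Graph, Intelligence.
measured = {
    "Quaid": [35, 80, 0, 0, 15],
    "qmd": [21, 45, 0, 0, 5],
}

def summarize(section_scores):
    """Format a row's total as 'points/420 (percent%)'."""
    total = sum(section_scores)
    pct = round(100 * total / MAX_POINTS)  # assumed rounding rule
    return f"{total}/{MAX_POINTS} ({pct}%)"

for system, scores in measured.items():
    print(system, summarize(scores))
```

The same helper applied to the approximate Mem0 v3 figures (30, 30, 82, 0, 10) gives 152/420 (36%), which agrees with the estimated row.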
How To Run
MEMORY_CMD=quaid bash benchmarks/dab-v2/run.sh
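To script the command above, for example to run the suite from a larger harness, the invocation can be wrapped in Python. This is a sketch only: the helper names `build_invocation` and `run_benchmark` are hypothetical, and the only things taken from the source are the `MEMORY_CMD` variable and the script path. The meaning of the script's exit code is not documented here, so the wrapper just returns it.

```python
import os
import subprocess

RUN_SCRIPT = "benchmarks/dab-v2/run.sh"  # path taken from the command above

def build_invocation(memory_cmd, base_env=None):
    """Return (argv, env) equivalent to `MEMORY_CMD=<memory_cmd> bash run.sh`."""
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["MEMORY_CMD"] = memory_cmd
    return ["bash", RUN_SCRIPT], env

def run_benchmark(memory_cmd):
    """Run the benchmark script and return its exit code (semantics assumed unknown)."""
    argv, env = build_invocation(memory_cmd)
    return subprocess.run(argv, env=env, check=False).returncode
```

Calling `run_benchmark("quaid")` from the repository root would then mirror the shell command shown above.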