Methodology

DAB v2.1

420-point competitive benchmark. No system scores above 50%.

Why v1 Is Not Enough

DAB v1 is a release gate. It catches regressions in install, ingestion, search, and MCP behavior, but it does not separate basic functionality from strong retrieval, conversation memory, graph reasoning, or latency under competitive workloads.

100pts

Retrieval

Exact lookup, paraphrase recall, ranking quality, long-tail coverage, latency-aware search.

100pts

Conversation

Multi-turn memory, fact carryover, contradiction handling, temporal recall, state preservation.

100pts

Graph

Entity extraction, edge creation, multi-hop traversal, graph updates, relationship repair.

Sections

Five scored sections. Three of them form the benchmark core.

40pts

Infrastructure

Install and version check
Database bootstrap and migrations
Corpus ingest and persistence
MCP startup and tool discovery

100pts

Retrieval

Exact FTS recall on known documents
Semantic paraphrase recall on natural queries
Ranking quality under noisy distractors
Hybrid search consistency across repeated runs
Latency-sensitive search at larger corpus sizes

100pts

Conversation

Multi-turn memory retention
Persona and preference carryover
Conflict detection across turns
Temporal grounding of prior facts
Long-context extraction without tool drift

100pts

Graph

Entity extraction and normalization
Relationship creation from free text
Multi-hop traversal and retrieval
Contradiction handling in graph updates
Graph repair after partial writes

80pts

Intelligence

Memory tool selection
Gap detection when evidence is missing
Context pruning under token pressure
Evidence citation and answer discipline

Latency Penalty

A result slower than 2s earns 50% of its points; a result slower than 5s earns 0 points.
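A minimal sketch of the penalty rule follows. It assumes the penalty is applied per scored item, that latency is recorded in whole milliseconds, and that the thresholds are strict (exactly 2s still earns full credit and exactly 5s still earns half), matching the "greater than" wording. The function name `latency_adjusted_points` is hypothetical and not part of `run.sh`.

```shell
# Hypothetical helper (bash): apply the DAB v2 latency penalty to one item.
# Assumptions: per-item penalty, latency in milliseconds, strict thresholds.
latency_adjusted_points() {
  local max_points="$1" latency_ms="$2"
  awk -v p="$max_points" -v ms="$latency_ms" 'BEGIN {
    if (ms > 5000)      print 0        # slower than 5s: no credit
    else if (ms > 2000) print p / 2    # slower than 2s: half credit
    else                print p        # within budget: full credit
  }'
}
```

For example, `latency_adjusted_points 10 3200` prints `5`, and `latency_adjusted_points 10 6000` prints `0`. The arithmetic is delegated to awk so that odd point values halve to a fraction instead of being truncated by bash integer math.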

Competitive Snapshot

System	Infra (40)	Retrieval (100)	Conversation (100)	Graph (100)	Intelligence (80)	Total (420)	Status
Quaid	35	80	0	0	15	130/420 (31%)	measured
GBrain	estimated	estimated	not-run	estimated	pending	estimated, no numeric score	estimated
Mem0 v3	~30	~30	~82	0	~10	~152/420 (36%)	estimated
qmd	21	45	0	0	5	71/420 (17%)	measured

Quaid and qmd were measured; Mem0 v3 scores (marked ~) are estimated from published benchmarks. GBrain is included based on the AI Heroes benchmark post, which reports an 8.3x win over qmd on 150 real questions. That setup uses cloud-based enrichment and is not airgapped, and no numeric DAB score has been published for it.
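The Total column is the sum of the five section scores over the 420-point maximum (40 + 100 + 100 + 100 + 80). The sketch below reproduces that arithmetic for the measured rows. The function name `dab_total`, the integer-only inputs, and rounding to the nearest whole percent are assumptions chosen to match the table, not part of the benchmark harness.

```shell
# Hypothetical helper (bash): format a Total cell from five section scores.
# Section maxima from the methodology: Infra 40, Retrieval 100,
# Conversation 100, Graph 100, Intelligence 80 (420 total).
DAB_MAXIMA=(40 100 100 100 80)

dab_total() {
  if [ "$#" -ne 5 ]; then
    echo "usage: dab_total INFRA RETRIEVAL CONVERSATION GRAPH INTELLIGENCE" >&2
    return 2
  fi
  local -a scores=("$@")
  local sum=0 max=0 i
  for i in "${!DAB_MAXIMA[@]}"; do
    # Reject integer scores outside 0..section maximum.
    if [ "${scores[i]}" -lt 0 ] || [ "${scores[i]}" -gt "${DAB_MAXIMA[i]}" ]; then
      echo "score ${scores[i]} out of range for section $i" >&2
      return 1
    fi
    sum=$((sum + scores[i]))
    max=$((max + DAB_MAXIMA[i]))
  done
  # Percentage rounded to the nearest whole number, as in the table.
  awk -v s="$sum" -v m="$max" 'BEGIN { printf "%d/%d (%.0f%%)\n", s, m, 100 * s / m }'
}
```

For example, `dab_total 35 80 0 0 15` prints `130/420 (31%)`, the Quaid row above. Estimated values such as `~30` are not integers and would need to be entered without the `~` marker.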

How To Run

MEMORY_CMD=quaid bash benchmarks/dab-v2/run.sh