Methodology

DAB v2.1

420-point competitive benchmark. No system scores above 50%.

Why v1 Is Not Enough

DAB v1 is a release gate. It catches regressions in install, ingestion, search, and MCP behavior, but it does not separate basic functionality from strong retrieval, conversation memory, graph reasoning, or latency under competitive workloads.

100pts

Retrieval

Exact lookup, paraphrase recall, ranking quality, long-tail coverage, latency-aware search.

100pts

Conversation

Multi-turn memory, fact carryover, contradiction handling, temporal recall, state preservation.

100pts

Graph

Entity extraction, edge creation, multi-hop traversal, graph updates, relationship repair.

Sections

Five scored sections. Three of them form the benchmark core.

40pts

Infrastructure

  • Install and version check
  • Database bootstrap and migrations
  • Corpus ingest and persistence
  • MCP startup and tool discovery

100pts

Retrieval

  • Exact FTS recall on known documents
  • Semantic paraphrase recall on natural queries
  • Ranking quality under noisy distractors
  • Hybrid search consistency across repeated runs
  • Latency-sensitive search at larger corpus sizes

100pts

Conversation

  • Multi-turn memory retention
  • Persona and preference carryover
  • Conflict detection across turns
  • Temporal grounding of prior facts
  • Long-context extraction without tool drift

100pts

Graph

  • Entity extraction and normalization
  • Relationship creation from free text
  • Multi-hop traversal and retrieval
  • Contradiction handling in graph updates
  • Graph repair after partial writes

80pts

Intelligence

  • Memory tool selection
  • Gap detection when evidence is missing
  • Context pruning under token pressure
  • Evidence citation and answer discipline

Latency Penalty

A result that takes longer than 2 s earns 50% of its points; one that takes longer than 5 s earns 0 points.
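The penalty rule can be sketched as a small function. This is a minimal illustration, not the benchmark's actual code: the function name `apply_latency_penalty` is hypothetical, and it assumes the penalty is applied per scored item, that the thresholds are strict (exactly 2 s is not penalized), and that latency is measured in seconds.

```python
def apply_latency_penalty(points: float, latency_s: float) -> float:
    """Scale earned points by the latency rule.

    Assumptions (not confirmed by the benchmark source):
    - thresholds are strict, so exactly 2.0 s keeps full points
    - the rule is applied to each scored item independently
    """
    if latency_s > 5.0:
        return 0.0                  # slower than 5 s: no points
    if latency_s > 2.0:
        return float(points) * 0.5  # slower than 2 s: half points
    return float(points)            # fast enough: full points


if __name__ == "__main__":
    for latency in (1.2, 3.0, 6.0):
        print(latency, apply_latency_penalty(10, latency))
```

Under these assumptions a 10-point item answered in 3 s keeps 5 points, and the same item answered in 6 s keeps none.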

Competitive Snapshot

System   Infra      Retrieval  Conversation  Graph      Intelligence  Total                        Status
Quaid    35         80         0             0          15            130/420 (31%)                measured
GBrain   estimated  estimated  not-run       estimated  pending       estimated, no numeric score  estimated
Mem0 v3  ~30        ~30        ~82           0          ~10           ~152/420 (36%)               estimated
qmd      21         45         0             0          5             71/420 (17%)                 measured

Rows marked estimated are derived from published benchmarks; rows marked measured (Quaid and qmd) were measured. GBrain is included on the basis of the AI Heroes benchmark post, which reports an 8.3x win over qmd on 150 real questions. Its reported setup uses cloud-based enrichment and is not airgapped, and no numeric DAB score has been published for it.
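The totals and percentages in the snapshot follow from the section maxima (40 + 100 + 100 + 100 + 80 = 420). The sketch below recomputes them for the two measured rows. The names `SECTION_MAX` and `total_and_percent` are hypothetical and used only for illustration, and rounding to the nearest whole percent is an assumption that happens to match the published figures.

```python
# Maximum points per section, as listed in the Sections overview.
SECTION_MAX = {
    "Infrastructure": 40,
    "Retrieval": 100,
    "Conversation": 100,
    "Graph": 100,
    "Intelligence": 80,
}


def total_and_percent(scores: dict) -> tuple:
    """Return (total points, whole-number percent of the maximum)."""
    for section, value in scores.items():
        # Guard against typos and scores above the section cap.
        if value > SECTION_MAX[section]:
            raise ValueError(f"{section} score {value} exceeds its maximum")
    total = sum(scores.values())
    maximum = sum(SECTION_MAX.values())
    return total, round(100 * total / maximum)


# Measured rows copied from the snapshot table.
quaid = {"Infrastructure": 35, "Retrieval": 80, "Conversation": 0,
         "Graph": 0, "Intelligence": 15}
qmd = {"Infrastructure": 21, "Retrieval": 45, "Conversation": 0,
       "Graph": 0, "Intelligence": 5}

if __name__ == "__main__":
    print("Quaid", total_and_percent(quaid))
    print("qmd", total_and_percent(qmd))
```

Only the measured rows are recomputed here; the estimated rows carry approximate values and are left as published.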

How To Run

MEMORY_CMD=quaid bash benchmarks/dab-v2/run.sh