Ratel Benchmark

Ratel, measured

How does context engineering affect your agent's performance? How many tokens can you save, and can you keep the agent accurate while making it faster?

Evaluated against industry gold standards and measured with BFCL v3.

How Ratel affects agent accuracy and token cost

Pick a model. The bars compare mean total tokens per task; the radar overlays all five metrics for With Ratel (the search_tools gateway) vs Without Ratel (the full tool pool). On Token efficiency the axis shows tokens spent from the center out, so Without Ratel spikes there. Hover any plot for exact values.

Mean total tokens per task

Claude Sonnet 4.6 (Anthropic): −88% tokens with Ratel

Arm            Context               Mean total tokens per task
Without Ratel  full tool pool        29,404
With Ratel     search_tools gateway  3,667

Metric           With Ratel  Without Ratel
Task completion  92.0%       92.0%
Tool selection   99.0%       100.0%
Recall           96.3%       97.2%
Token cost       3,667       29,404
p50 latency      8,270 ms    9,207 ms
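
The −88% figure follows from the two token counts reported for Claude Sonnet 4.6. As a minimal sketch of that arithmetic (the function name `token_reduction` is ours, not part of Ratel):

```python
def token_reduction(without_tokens: float, with_tokens: float) -> float:
    """Fractional reduction in mean total tokens per task."""
    if without_tokens <= 0:
        raise ValueError("baseline token count must be positive")
    return (without_tokens - with_tokens) / without_tokens


# Mean total tokens per task reported for Claude Sonnet 4.6.
sonnet = token_reduction(29_404, 3_667)
print(f"{sonnet:.1%}")  # 87.5%, which the page rounds to 88%
```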

Task completion

Pooled over the simple + multiple splits. Without Ratel = full tool pool in context; With Ratel = search_tools gateway; Oracle = only the gold tools in context (the ceiling).
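
To make the three arms concrete, here is a hedged sketch of what each one might place in the model's context up front. Everything in it is hypothetical: the tool pool, the toy word-overlap retriever standing in for `search_tools`, and the assumption that the With Ratel arm exposes a single gateway tool and fetches the rest on demand. It is an illustration of the setup described above, not Ratel's implementation.

```python
# Hypothetical tool pool: name -> description.
POOL = {
    "get_weather": "Get the current weather for a city",
    "send_email": "Send an email message to a recipient",
    "convert_currency": "Convert an amount between two currencies",
}


def search_tools(query: str, k: int = 2) -> list[str]:
    """Toy stand-in retriever: rank tools by words shared with the query."""
    words = set(query.lower().split())

    def overlap(name: str) -> int:
        return len(words & set(POOL[name].lower().split()))

    return sorted(POOL, key=overlap, reverse=True)[:k]


def tools_in_context(arm: str, gold: set[str]) -> list[str]:
    """Tool definitions placed in the prompt up front, per arm."""
    if arm == "without_ratel":
        return list(POOL)        # the full tool pool
    if arm == "oracle":
        return sorted(gold)      # only the gold tools (the ceiling)
    if arm == "with_ratel":
        return ["search_tools"]  # assumed: one gateway tool, rest on demand
    raise ValueError(f"unknown arm: {arm}")
```

Under these assumptions the token gap between arms comes from how many tool definitions sit in the prompt: the whole pool, only the gold tools, or a single gateway plus whatever a search returns.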

Model              Arm            Task completion  Tool selection  Recall  Mean total tokens
Claude Haiku 4.5   Without Ratel  93.0%            98.0%           95.8%   29,324
Claude Haiku 4.5   Oracle         93.0%            100.0%          97.3%   1,732
Claude Haiku 4.5   With Ratel     92.0%            97.0%           95.0%   3,639
Claude Sonnet 4.6  Without Ratel  92.0%            100.0%          97.2%   29,404
Claude Sonnet 4.6  Oracle         92.0%            99.0%           96.2%   1,851
Claude Sonnet 4.6  With Ratel     92.0%            99.0%           96.3%   3,667

Retrieval evaluation

Evaluation of the retrieval engine. Pick a pool size; the bars and tables show the mean ranking metrics at top-K for K = 1, 3, 5. Accuracy = at least one gold tool in the top-K; complete = every gold tool in the top-K; gold = share of queries whose gold tool was retrievable (the gold sim. column).
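
As a rough illustration of how such per-query ranking metrics are commonly computed, the sketch below uses the standard binary-relevance definitions of recall, MRR and nDCG. The page does not state Ratel's exact formulas, so these definitions, the function names, and the toy tool names in the tests are assumptions.

```python
import math


def rank_metrics(ranked: list[str], gold: set[str], k: int) -> dict[str, float]:
    """Ranking metrics for one query, with binary relevance.

    ranked: unique tool names returned by the retriever, best first.
    gold:   non-empty set of tool names the task actually needs.
    """
    top = ranked[:k]
    hits = [name in gold for name in top]
    found = len(gold & set(top))
    # Reciprocal rank of the first gold tool in the top-k, 0 if none.
    rr = next((1.0 / (i + 1) for i, hit in enumerate(hits) if hit), 0.0)
    # Discounted gain of the hits, normalised by the best possible ordering.
    dcg = sum(1.0 / math.log2(i + 2) for i, hit in enumerate(hits) if hit)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(gold), k)))
    return {
        "accuracy": float(found > 0),           # a gold tool in the top-k
        "complete": float(found == len(gold)),  # every gold tool in the top-k
        "recall": found / len(gold),
        "mrr": rr,
        "ndcg": dcg / ideal if ideal else 0.0,
    }


def mean_metrics(queries: list[tuple[list[str], set[str]]], k: int) -> dict[str, float]:
    """Average each metric over (ranked, gold) pairs."""
    rows = [rank_metrics(ranked, gold, k) for ranked, gold in queries]
    return {key: sum(row[key] for row in rows) / len(rows) for key in rows[0]}
```

With a single gold tool per query, accuracy, complete and recall coincide under these definitions, which is consistent with the identical values reported in those three columns.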

Simple (single gold tool)

[Bar chart: Accuracy, Complete, Recall, MRR and nDCG at K = 1, 3, 5; axis from 0% to 100%]

K  Accuracy  Complete  Recall  MRR    nDCG   Gold sim.
1  96.5%     96.5%     0.965   0.965  0.965  98.7%
3  98.7%     98.7%     0.987   0.976  0.979  98.7%
5  98.7%     98.7%     0.987   0.976  0.979  98.7%

Multiple (several gold tools)

[Bar chart: Accuracy, Complete, Recall, MRR and nDCG at K = 1, 3, 5; axis from 0% to 100%]

K  Accuracy  Complete  Recall  MRR    nDCG   Gold sim.
1  95.5%     95.5%     0.955   0.955  0.955  99.0%
3  98.5%     98.5%     0.985   0.968  0.973  99.0%
5  99.0%     99.0%     0.990   0.969  0.975  99.0%