Ratel, measured

How does context engineering affect your agent's performance? How many tokens can you save, can you keep it accurate, and can you make it faster?

Evaluated against gold-standard labels and measured with BFCL v3 (Berkeley Function Calling Leaderboard), an industry-standard function-calling benchmark.


How Ratel affects agent accuracy and token cost

The bars compare mean total tokens per task. The radar overlays all five metrics for With Ratel (the search_tools gateway) vs Without Ratel (the full tool pool). On the token-efficiency axis, tokens spent grow outward from the center, so Without Ratel spikes there.

Mean total tokens per task, Claude Sonnet 4.6 (Anthropic): Without Ratel (full tool pool) vs With Ratel (search_tools gateway), −88% tokens.

Task completion: With Ratel 92.0%; Without Ratel 92.0%
Tool selection: With Ratel 99.0%; Without Ratel 100.0%
Recall: With Ratel 96.3%; Without Ratel 97.2%
Token cost (mean total tokens per task): With Ratel 3,667; Without Ratel 29,404 (−88%)
p50 latency: With Ratel 8,270 ms; Without Ratel 9,207 ms
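The headline −88% is the relative change in mean total tokens per task. The same formula applied to p50 latency gives roughly a 10% reduction. Here is the arithmetic for Claude Sonnet 4.6, using only the numbers above:

```latex
\Delta_{\text{tokens}} = \frac{3{,}667 - 29{,}404}{29{,}404} \approx -0.875 \;\Rightarrow\; -88\%
\qquad
\Delta_{\text{p50}} = \frac{8{,}270 - 9{,}207}{9{,}207} \approx -0.102 \;\Rightarrow\; \text{about } 10\% \text{ lower latency}
```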

Task completion

Results are pooled over the BFCL simple and multiple splits. Without Ratel: the full tool pool is in context. With Ratel: tools are reached through the search_tools gateway. Oracle: only the gold tools are in context (the ceiling).
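To make the two arms concrete: Without Ratel puts every tool schema into the prompt, while With Ratel exposes a single search_tools gateway that pulls matching schemas in on demand. The sketch below is a toy illustration of that idea only. The `Tool` and `SearchToolsGateway` names, the `k` parameter, and the word-overlap scorer are hypothetical and are not Ratel's actual API or retriever.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    # Hypothetical tool record; the schema is only sent once the tool is retrieved.
    name: str
    description: str
    schema: dict


def _tokens(text: str) -> set[str]:
    # Lowercased words with trailing punctuation stripped.
    return {w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")}


class SearchToolsGateway:
    """Hypothetical stand-in for a search_tools gateway: the model sees one
    search tool instead of the whole pool and requests schemas as needed."""

    def __init__(self, pool: list[Tool]):
        self.pool = pool

    def search_tools(self, query: str, k: int = 3) -> list[Tool]:
        q = _tokens(query)
        # Toy lexical scorer: count query words shared with name + description.
        scored = [
            (len(q & _tokens(t.name.replace("_", " ") + " " + t.description)), t)
            for t in self.pool
        ]
        # Stable sort by score only, so ties keep pool order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Return at most k tools, dropping ones with no overlap at all.
        return [t for score, t in scored[:k] if score > 0]


pool = [
    Tool("get_weather", "Current weather for a city", {}),
    Tool("search_flights", "Find flights between two airports", {}),
    Tool("convert_currency", "Convert an amount between currencies", {}),
]
gateway = SearchToolsGateway(pool)
print([t.name for t in gateway.search_tools("find flights to Paris", k=1)])
```

Only the retrieved schemas would enter the context, which is consistent with the much lower token counts for the With Ratel arm. A production retriever would more plausibly use embeddings or BM25 than raw word overlap. The retrieval evaluation on this page measures Ratel's actual retriever, not this toy.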

Model	Arm	Task completion	Tool selection	Recall	Mean total tokens
Claude Haiku 4.5	Without Ratel	93.0%	98.0%	95.8%	29,324
	Oracle	93.0%	100.0%	97.3%	1,732
	With Ratel	92.0%	97.0%	95.0%	3,639
Claude Sonnet 4.6	Without Ratel	92.0%	100.0%	97.2%	29,404
	Oracle	92.0%	99.0%	96.2%	1,851
	With Ratel	92.0%	99.0%	96.3%	3,667
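The table supports two derived readings. The first is the relative token change versus the full tool pool. The second is the share of the full-pool-to-Oracle token gap that the gateway closes. The gap-closed ratio is an illustrative framing, not a metric reported on this page. The dictionary simply copies the table above.

```python
# Mean total tokens per task, copied from the table above.
TOKENS = {
    "Claude Haiku 4.5": {"without": 29_324, "oracle": 1_732, "with": 3_639},
    "Claude Sonnet 4.6": {"without": 29_404, "oracle": 1_851, "with": 3_667},
}


def token_summary(arms: dict[str, int]) -> dict[str, float]:
    """Compare the With Ratel arm against both reference arms."""
    without, oracle, ratel = arms["without"], arms["oracle"], arms["with"]
    return {
        # Relative change vs. the full tool pool (negative = fewer tokens).
        "change_vs_full_pool": (ratel - without) / without,
        # Share of the full-pool -> Oracle token gap that the gateway closes.
        "gap_closed": (without - ratel) / (without - oracle),
        # Extra tokens per task spent above the Oracle floor.
        "overhead_vs_oracle": ratel - oracle,
    }


for model, arms in TOKENS.items():
    s = token_summary(arms)
    print(f"{model}: {s['change_vs_full_pool']:+.1%} tokens, "
          f"{s['gap_closed']:.1%} of the Oracle gap closed, "
          f"+{s['overhead_vs_oracle']:,} tokens over Oracle")
```

For both models, this works out to roughly 87.5% fewer tokens and about 93% of the Oracle gap closed. The cost is under 2,000 tokens per task above the Oracle floor.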

Retrieval evaluation

Evaluation of the retriever engine. The bars and tables show mean ranking metrics at top K = 1, 3 and 5 for the selected tool-pool size. Accuracy = at least one gold tool in the top K; complete = every gold tool in the top K; gold sim. = share of queries whose gold tool was retrievable at all.
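The metric definitions above match standard top-K ranking metrics. Below is a minimal per-query sketch. It assumes binary relevance (a tool is either gold or not) and that each table entry is the mean of these values over queries. Ratel's own evaluation code may differ, the tool names in the example are hypothetical, and the gold sim. column is not computed here.

```python
import math


def retrieval_metrics(ranked: list[str], gold: set[str], k: int) -> dict[str, float]:
    """Per-query ranking metrics at cutoff k, with binary relevance."""
    if not gold:
        raise ValueError("each query needs at least one gold tool")
    top_k = ranked[:k]
    hits = [tool in gold for tool in top_k]
    found = sum(hits)
    # Reciprocal rank of the first gold tool inside the top k (0 if none).
    rr = next((1.0 / (i + 1) for i, h in enumerate(hits) if h), 0.0)
    # DCG with gain 1 per gold tool, discounted by log2(rank + 1).
    dcg = sum(1.0 / math.log2(i + 2) for i, h in enumerate(hits) if h)
    # Ideal DCG: all gold tools packed into the top ranks.
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(gold), k)))
    return {
        "accuracy": float(found > 0),           # at least one gold tool in top k
        "complete": float(found == len(gold)),  # every gold tool in top k
        "recall": found / len(gold),
        "mrr": rr,
        "ndcg": dcg / ideal,
    }


ranked = ["get_weather", "search_flights", "book_hotel"]
print(retrieval_metrics(ranked, {"search_flights"}, k=1))
print(retrieval_metrics(ranked, {"search_flights"}, k=3))
```

With a single gold tool, accuracy, complete and recall coincide, and the ideal DCG is 1. That is why the three columns agree within each row of the simple table, and why MRR and nDCG equal accuracy at K=1.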

Simple (single gold tool)

Chart: accuracy, complete, recall, MRR and nDCG at K = 1, 3 and 5 (y-axis 0 to 100%).

K	accuracy	complete	recall	MRR	nDCG	gold sim.
1	96.5%	96.5%	0.965	0.965	0.965	98.7%
3	98.7%	98.7%	0.987	0.976	0.979	98.7%
5	98.7%	98.7%	0.987	0.976	0.979	98.7%
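For single-gold queries, MRR and nDCG differ from accuracy only in where the late hits land. One rank distribution consistent with this table (an inference, not a reported number) is as follows. 96.5% of queries rank the gold tool first, and the extra 2.2% picked up by K=3 rank it second. None land at ranks 4 or 5, since K=5 adds nothing. With the ideal DCG equal to 1, this gives:

```latex
\mathrm{MRR}@3 = 0.965 \cdot 1 + 0.022 \cdot \tfrac{1}{2} = 0.976,
\qquad
\mathrm{nDCG}@3 = 0.965 \cdot 1 + \frac{0.022}{\log_2 3} \approx 0.979
```

Both values match the K=3 row. If those late hits had landed at rank 3 instead, MRR@3 would have been only about 0.972.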

Multiple (several candidate tools per query)

Chart: accuracy, complete, recall, MRR and nDCG at K = 1, 3 and 5 (y-axis 0 to 100%).

K	accuracy	complete	recall	MRR	nDCG	gold sim.
1	95.5%	95.5%	0.955	0.955	0.955	99.0%
3	98.5%	98.5%	0.985	0.968	0.973	99.0%
5	99.0%	99.0%	0.990	0.969	0.975	99.0%