Claude Sonnet 4.6
| Metric | With Ratel | Without Ratel |
|---|---|---|
| Task completion | 92.0% | 92.0% |
| Tool selection | 99.0% | 100.0% |
| Recall | 96.3% | 97.2% |
| Token cost | 3,667 (−88%) | 29,404 |
| p50 latency | 8,270 ms | 9,207 ms |
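The −88% figure for token cost can be checked from the two numbers on the card. The snippet below is a minimal sketch of that arithmetic; the helper name `relative_change` is an illustrative assumption and is not part of Ratel.

```python
def relative_change(with_ratel: float, without_ratel: float) -> float:
    """Signed change of the With Ratel value relative to the Without Ratel baseline."""
    return (with_ratel - without_ratel) / without_ratel


# Values copied from the Claude Sonnet 4.6 card.
token_change = relative_change(3_667, 29_404)    # token cost
latency_change = relative_change(8_270, 9_207)   # p50 latency in ms

print(f"token cost:  {token_change:+.1%}")
print(f"p50 latency: {latency_change:+.1%}")
```

Run as written, this prints −87.5% for token cost (shown on the card rounded to −88%) and −10.2% for p50 latency.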
How does context engineering affect the performance of your agent? How many tokens can you save, and can you keep the agent accurate while making it faster?
Evaluated against an industry-standard benchmark: all results are measured with BFCL v3.
Pick a model. The bars compare mean total tokens per task; the radar overlays all five metrics for With Ratel (the search_tools gateway) vs Without Ratel (the full tool pool). The Token efficiency axis plots tokens spent, increasing from the center outward, so Without Ratel spikes there. Hover over any plot for exact values.
Pooled over the simple + multiple splits. Without Ratel = full tool pool in context; With Ratel = search_tools gateway; Oracle = only the gold tools in context (the ceiling).
| Model | Arm | Task completion | Tool selection | Recall | Mean total tokens |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | Without Ratel | 93.0% | 98.0% | 95.8% | 29,324 |
| Claude Haiku 4.5 | Oracle | 93.0% | 100.0% | 97.3% | 1,732 |
| Claude Haiku 4.5 | With Ratel | 92.0% | 97.0% | 95.0% | 3,639 |
| Claude Sonnet 4.6 | Without Ratel | 92.0% | 100.0% | 97.2% | 29,404 |
| Claude Sonnet 4.6 | Oracle | 92.0% | 99.0% | 96.2% | 1,851 |
| Claude Sonnet 4.6 | With Ratel | 92.0% | 99.0% | 96.3% | 3,667 |
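The three arms differ only in which tool definitions the model gets to work with. The sketch below illustrates that difference on a toy pool. Everything in it is an assumption made for illustration: the `Tool` type, the arm labels, and `toy_search_tools`, which ranks by plain word overlap. How Ratel's actual search_tools gateway retrieves tools is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def _words(text: str) -> set[str]:
    """Lowercase word set; underscores in tool names count as separators."""
    return set(text.lower().replace("_", " ").split())


def toy_search_tools(query: str, pool: list[Tool], k: int = 3) -> list[Tool]:
    """Hypothetical stand-in for a retrieval gateway: rank tools by word overlap."""
    q = _words(query)
    ranked = sorted(
        pool,
        key=lambda t: len(q & _words(f"{t.name} {t.description}")),
        reverse=True,  # sorted() is stable, so ties keep pool order
    )
    return ranked[:k]


def tools_for_arm(arm: str, query: str, pool: list[Tool],
                  gold: set[str], k: int = 3) -> list[Tool]:
    """Return the tool definitions made available under each arm."""
    if arm == "without_ratel":   # full tool pool in context
        return list(pool)
    if arm == "oracle":          # only the gold tools (the ceiling)
        return [t for t in pool if t.name in gold]
    if arm == "with_ratel":      # only what the search step surfaces
        return toy_search_tools(query, pool, k)
    raise ValueError(f"unknown arm: {arm}")


pool = [
    Tool("send_email", "Send an email to a recipient"),
    Tool("get_weather", "Return the current weather for a city"),
    Tool("convert_currency", "Convert an amount between two currencies"),
    Tool("book_flight", "Book a flight between two airports"),
]
query = "what is the weather in Paris"
for arm in ("without_ratel", "oracle", "with_ratel"):
    selected = tools_for_arm(arm, query, pool, gold={"get_weather"}, k=2)
    print(arm, [t.name for t in selected])
```

The token gap in the table follows from this shape: the full-pool arm pays for every definition on every task, while the other two arms pay only for a handful.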
Evaluation of the retriever engine. Pick a pool size; the bars and tables show the mean ranking metrics at top-K for K = 1, 3, 5. Accuracy = at least one gold tool in the top-K; complete = every gold tool in the top-K; gold = share of queries whose gold tool was retrievable.
| K | accuracy | complete | recall | MRR | nDCG | gold sim. |
|---|---|---|---|---|---|---|
| 1 | 96.5% | 96.5% | 0.965 | 0.965 | 0.965 | 98.7% |
| 3 | 98.7% | 98.7% | 0.987 | 0.976 | 0.979 | 98.7% |
| 5 | 98.7% | 98.7% | 0.987 | 0.976 | 0.979 | 98.7% |
| K | accuracy | complete | recall | MRR | nDCG | gold sim. |
|---|---|---|---|---|---|---|
| 1 | 95.5% | 95.5% | 0.955 | 0.955 | 0.955 | 99.0% |
| 3 | 98.5% | 98.5% | 0.985 | 0.968 | 0.973 | 99.0% |
| 5 | 99.0% | 99.0% | 0.990 | 0.969 | 0.975 | 99.0% |
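The ranking metrics in the tables above can be sketched per query as follows. Accuracy and complete follow the definitions given on this page; recall, MRR, and nDCG use their standard textbook forms with binary relevance, which is an assumption about how they were computed here. The function name, tool names, and example ranking are hypothetical.

```python
import math


def ranking_metrics(ranked: list[str], gold: set[str], k: int) -> dict[str, float]:
    """Top-K metrics for one query; assumes `ranked` has no duplicates and `gold` is non-empty."""
    top_k = ranked[:k]
    hits = [tool in gold for tool in top_k]

    accuracy = 1.0 if any(hits) else 0.0                    # a gold tool in the top-K
    complete = 1.0 if all(g in top_k for g in gold) else 0.0  # every gold tool in the top-K
    recall = sum(hits) / len(gold)                          # fraction of gold tools retrieved

    # Reciprocal rank of the first gold tool inside the top-K (0 if none).
    mrr = next((1.0 / rank for rank, hit in enumerate(hits, start=1) if hit), 0.0)

    # Binary-relevance nDCG: discounted gain over the best achievable ordering.
    dcg = sum(1.0 / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1) if hit)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(gold), k) + 1))
    ndcg = dcg / ideal

    return {"accuracy": accuracy, "complete": complete,
            "recall": recall, "mrr": mrr, "ndcg": ndcg}


# Hypothetical query whose single gold tool is ranked second.
ranked = ["get_weather", "get_forecast", "send_email"]
gold = {"get_forecast"}
at_1 = ranking_metrics(ranked, gold, k=1)
at_3 = ranking_metrics(ranked, gold, k=3)
print(at_1)
print(at_3)
```

In this example every metric is 0 at K = 1; at K = 3, accuracy, complete, and recall are 1.0 while MRR is 0.5 and nDCG is about 0.63. Averaging these per-query values over all queries gives table rows of the kind shown above, and with a single gold tool per query the accuracy, complete, and recall columns coincide.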