ECC Deep Dive: The All-in-One Token Optimization Harness
182K+ GitHub stars, winner of the Anthropic Hackathon. One install replaces five manual configurations (model routing, thinking token caps, strategic compaction, subagent routing, and cost auditing), all working together to deliver a community-reported 30–50% reduction in token spend.
What Is ECC
ECC stands for Everything Claude Code, created by Affaan Mustafa. It is an agent harness optimization system that wraps your AI coding tool in a layer of intelligent defaults — defaults that would otherwise take hours to configure by hand. With 182,000+ GitHub stars and an official Anthropic Hackathon win, ECC has become the de facto standard for developers who want maximum AI capability at minimum token cost.
At its core, ECC is not a separate tool you have to learn. It is a configuration harness that sits between your coding agent and the model provider, making real-time decisions about which model should handle each request, how many thinking tokens the model is allowed to consume, when context should be compacted, and which subagent should be dispatched for background work. All of this happens automatically — you keep coding, and ECC keeps optimizing.
ECC was born from a simple observation: most developers were running Opus for everything, hitting $200+ monthly API bills, and manually tweaking a dozen environment variables that kept drifting out of sync. Affaan's insight was that these optimizations should be centralized, tested, and shipped as one unit. The result is a system estimated to have saved the community millions of dollars in token costs since its release.
The hackathon judges at Anthropic recognized ECC for solving a real, measurable problem. Unlike many hackathon projects that demonstrate clever ideas, ECC demonstrated measurable cost reduction with zero feature degradation. Users reported the same code quality, same task completion rates, and 30–50% lower bills. That is why it won — and why it continues to be the most recommended optimization tool for AI-powered development environments.
ECC supports a broad ecosystem of AI coding tools: Claude Code (primary and deepest integration), Cursor, Codex, OpenCode, Gemini CLI, and GitHub Copilot. The integration depth varies by platform, but the core optimizations — model routing, thinking token caps, and compaction triggers — work everywhere. The broader tool support means you can standardize your team's token optimization strategy regardless of which editor each developer prefers.
What ECC Automates
ECC bundles five optimization layers that would otherwise require manual configuration across multiple files. Each layer targets a different source of token waste.
1. Intelligent Model Routing
ECC inspects every request before it reaches the API and routes it to the appropriate model based on task complexity. Simple file lookups, grep searches, and variable renames go to Haiku (the fastest and cheapest model). Everyday coding tasks — feature implementation, refactoring, test writing — go to Sonnet. Complex architecture design, security audits, and multi-file debugging go to Opus. This single layer accounts for roughly half of ECC's total savings. You never have to remember to switch models mid-session; ECC handles it based on what you are actually doing.
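The routing idea described above can be sketched as a lookup from task category to model tier. This is an illustrative sketch only, not ECC's actual classifier: the keyword hints and the names `classify_task` and `route` are hypothetical, and a real router would likely use richer signals than keyword matching.

```python
# Hypothetical sketch of task-aware routing; not ECC's real implementation.
ROUTING_TABLE = {"lookup": "haiku", "coding": "sonnet", "architecture": "opus"}

# Assumed keyword hints per category (illustrative only).
# Checked in order, so the most capable tier wins when hints overlap.
CATEGORY_HINTS = [
    ("architecture", ("architecture", "security audit", "multi-file")),
    ("lookup", ("grep", "find file", "rename", "look up")),
]


def classify_task(prompt: str) -> str:
    """Return a task category, defaulting to everyday coding."""
    text = prompt.lower()
    for category, hints in CATEGORY_HINTS:
        if any(hint in text for hint in hints):
            return category
    return "coding"


def route(prompt: str) -> str:
    """Map a prompt to the model tier for its category."""
    return ROUTING_TABLE[classify_task(prompt)]


print(route("grep for TODO comments"))                   # haiku
print(route("Implement the signup feature"))             # sonnet
print(route("Run a security audit of the auth module"))  # opus
```

Checking the expensive category first is a deliberate choice in this sketch: a misrouted architecture task costs quality, while a misrouted lookup only costs a few cents.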
2. Thinking Token Caps
Claude's extended thinking feature is powerful but expensive. The default maximum thinking token budget is 31,999 tokens — enough to burn significant cost on overthinking simple problems. ECC sets MAX_THINKING_TOKENS=10000, which is more than enough for complex reasoning while capping runaway thinking costs on straightforward tasks. Empirically, 10K thinking tokens covers even the hardest debugging sessions; the extra 22K in the default budget is rarely useful and frequently expensive. This layer alone saves 10–15% on sessions that use extended thinking.
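The arithmetic behind the cap is simple enough to check by hand. The sketch below uses only the two figures quoted above; the function name `capped_thinking_tokens` is a hypothetical helper, not part of ECC or the Anthropic API.

```python
DEFAULT_THINKING_BUDGET = 31_999  # default maximum cited above
ECC_THINKING_CAP = 10_000         # MAX_THINKING_TOKENS value cited above


def capped_thinking_tokens(requested: int, cap: int = ECC_THINKING_CAP) -> int:
    """Clamp a requested thinking budget to the cap (never below zero)."""
    return max(0, min(requested, cap))


# Thinking tokens that a worst-case request can no longer spend.
headroom_removed = DEFAULT_THINKING_BUDGET - capped_thinking_tokens(DEFAULT_THINKING_BUDGET)
print(headroom_removed)  # 21999
```

Requests that need fewer than 10,000 thinking tokens are unaffected by the clamp, which is why the cap only matters on sessions that would otherwise overthink.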
3. Strategic Compaction
The native Claude Code compaction trigger fires at 95% context — far too late. By the time your context reaches 95%, the model has already lost track of early conversation details and is producing lower-quality output. ECC lowers the compaction threshold to 50% (AUTOCOMPACT_PCT=50), triggering summarization at safer intervals. This keeps the most important context in the active window, produces better compaction summaries, and prevents the costly scenario where you have to re-explain your architecture because the model forgot it three turns ago.
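The threshold comparison described above amounts to a single percentage check. The following is a minimal sketch under stated assumptions: `should_compact` is a hypothetical helper, and the 200,000-token window in the example is an assumed figure chosen for illustration.

```python
def should_compact(used_tokens: int, window_tokens: int, threshold_pct: float = 50) -> bool:
    """Return True once context usage reaches the compaction threshold."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return used_tokens / window_tokens * 100 >= threshold_pct


# Assumed example: 110,000 tokens used in a 200,000-token window (55% full).
print(should_compact(110_000, 200_000, threshold_pct=50))  # True
print(should_compact(110_000, 200_000, threshold_pct=95))  # False
```

At 55% usage the 50% threshold has already triggered a summary, while the 95% threshold would let the session run for a long time before compacting.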
4. Haiku Subagents
When Claude Code spawns subagents for background tasks (file exploration, test running, search across the codebase), ECC routes those subagents to Haiku by default. Subagent work is typically mechanical — find files, grep patterns, list directories — and does not need the reasoning power of Sonnet or Opus. Routing subagents to Haiku can cut subagent token costs by 60–80% without affecting the quality of the subagent's output.
5. Cost Auditing
ECC ships with a built-in cost auditing tool (ecc-tools-cost-audit) that breaks down your token spending by task type, model, and session. It shows exactly how much you saved versus a baseline of running everything through Opus with default settings. The audit tool also tracks cumulative savings over time, so you can see your ROI grow day by day. This visibility is critical: what you cannot measure, you cannot improve.
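A breakdown of this kind can be approximated with a small aggregation. The sketch below is not the audit tool's code: the record fields (`session`, `task`, `model`, `cost_usd`, `baseline_usd`) and the dollar amounts are made-up assumptions used only to show the shape of the calculation.

```python
from collections import defaultdict

# Made-up sample records; field names are assumptions, not ECC's schema.
records = [
    {"session": "s1", "task": "lookup", "model": "haiku", "cost_usd": 0.25, "baseline_usd": 1.0},
    {"session": "s1", "task": "coding", "model": "sonnet", "cost_usd": 0.5, "baseline_usd": 1.5},
    {"session": "s2", "task": "architecture", "model": "opus", "cost_usd": 2.0, "baseline_usd": 2.0},
]


def breakdown(rows, key):
    """Sum actual cost per value of `key` (for example "model" or "task")."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["cost_usd"]
    return dict(totals)


def savings_vs_baseline(rows):
    """Savings relative to a simulated all-Opus baseline."""
    actual = sum(row["cost_usd"] for row in rows)
    baseline = sum(row["baseline_usd"] for row in rows)
    return baseline - actual


print(breakdown(records, "model"))   # {'haiku': 0.25, 'sonnet': 0.5, 'opus': 2.0}
print(savings_vs_baseline(records))  # 1.75
```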
Installation
ECC installs in about a minute. Choose the approach that matches your workflow. Both produce an identical result: ECC running as a plugin inside Claude Code.
Approach 1: Plugin Marketplace (Recommended)
If you are running Claude Code with plugin marketplace access, this is the simplest path. The marketplace handles discovery, installation, and updates.
# Step 1: Add the ECC marketplace
/plugin marketplace add affaan-m/everything-claude-code
# Step 2: Install the plugin (this activates all optimizations)
/plugin install everything-claude-code@everything-claude-code
Approach 2: Direct Clone
If you prefer managing plugins manually or do not have marketplace access, clone the repository directly and source the configuration.
git clone https://github.com/affaan-m/everything-claude-code.git \
~/.claude/plugins/everything-claude-code
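# Source the ECC configuration from your shell profile (~/.bashrc is assumed here; use your own shell's profile).
# Do not append this line to ~/.claude/settings.json: that file is JSON, and a shell command would make it unparseable.
echo 'source ~/.claude/plugins/everything-claude-code/init.sh' >> ~/.bashrc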
Verify Installation
After installation, confirm ECC is active and all five optimization layers are engaged.
/plugin status everything-claude-code
# Expected output (abbreviated):
everything-claude-code v2.4.1
model-routing ACTIVE → Haiku/Sonnet/Opus
thinking-cap ACTIVE → MAX_THINKING_TOKENS=10000
compaction ACTIVE → AUTOCOMPACT_PCT=50
subagent-routing ACTIVE → Haiku subagents
cost-audit ACTIVE → ecc-tools-cost-audit
Configuration Deep Dive
ECC works out of the box with zero configuration, but every setting is tunable. Below are the recommended settings for each optimization layer, along with when and why you might adjust them.
Model Routing Configuration
ECC's routing table maps task patterns to models. The defaults are battle-tested, but you can override them if your workload differs from the norm. For example, if you work in a massive monorepo where even "simple lookups" span dozens of files, you might promote lookups from Haiku to Sonnet.
{
  "ecc": {
    "routing": {
      "lookup": "haiku",
      "coding": "sonnet",
      "architecture": "opus",
      "subagent": "haiku"
    }
  }
}
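An override such as the monorepo case can be pictured as a merge of user settings over the defaults. This is a sketch under assumptions, not ECC's loader: `load_routing` and `VALID_MODELS` are hypothetical names, and the merge-then-validate behavior is one plausible design.

```python
import json

DEFAULT_ROUTING = {
    "lookup": "haiku",
    "coding": "sonnet",
    "architecture": "opus",
    "subagent": "haiku",
}
VALID_MODELS = {"haiku", "sonnet", "opus"}


def load_routing(config_text: str) -> dict:
    """Merge routing overrides from a JSON config over the defaults."""
    config = json.loads(config_text)
    overrides = config.get("ecc", {}).get("routing", {})
    routing = {**DEFAULT_ROUTING, **overrides}
    for task, model in routing.items():
        if model not in VALID_MODELS:
            raise ValueError(f"unknown model {model!r} for task {task!r}")
    return routing


# Monorepo example from the text: promote lookups from Haiku to Sonnet.
monorepo_config = '{"ecc": {"routing": {"lookup": "sonnet"}}}'
print(load_routing(monorepo_config)["lookup"])  # sonnet
```

Validating after the merge means a typo in one override fails loudly instead of silently falling back to a default.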
Thinking Token Cap Configuration
The 10K default is conservative. If you frequently tackle deeply nested debugging sessions (tracing bugs across 5+ abstraction layers), you can raise it to 16K. If you primarily do CRUD work and rarely use extended thinking, drop it to 6K for even more savings.
{
  "ecc": {
    "thinking": {
      "maxTokens": 10000,
      "enabled": true
    }
  }
}
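The three caps mentioned above can be captured as named workload profiles that emit the matching config fragment. The profile names and the `thinking_config` helper are hypothetical; only the token values and the key names come from the text.

```python
import json

# Token caps quoted in the text; the profile names are assumptions.
THINKING_CAPS = {"crud": 6_000, "default": 10_000, "deep-debugging": 16_000}


def thinking_config(profile: str = "default") -> dict:
    """Build the thinking section of the config for a workload profile."""
    if profile not in THINKING_CAPS:
        raise ValueError(f"unknown profile {profile!r}")
    return {"ecc": {"thinking": {"maxTokens": THINKING_CAPS[profile], "enabled": True}}}


print(json.dumps(thinking_config("crud"), indent=2))
```

Generating the fragment with `json.dumps` has the side benefit of always producing strict JSON with no comments.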
Compaction Configuration
The 50% threshold is the recommended sweet spot. At 50%, compaction triggers early enough to preserve context quality but not so early that you lose useful history. If your sessions are ultra-long (200+ turns), consider 40%. If your sessions are typically short (under 20 turns), 60% is fine.
The threshold accepts values from 40 (aggressive) to 70 (conservative).
{
  "ecc": {
    "compaction": {
      "threshold": 50,
      "strategy": "preserve-recent"
    }
  }
}
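The guidance on session length can be written down as a small rule of thumb. The helper below is a hypothetical sketch that encodes only the three cut-offs stated in the text; it is not part of ECC.

```python
def recommended_threshold(expected_turns: int) -> int:
    """Pick a compaction threshold (percent) from expected session length."""
    if expected_turns < 0:
        raise ValueError("expected_turns must be non-negative")
    if expected_turns >= 200:  # ultra-long sessions
        return 40
    if expected_turns < 20:    # short sessions
        return 60
    return 50                  # recommended default


print(recommended_threshold(250))  # 40
print(recommended_threshold(80))   # 50
print(recommended_threshold(10))   # 60
```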
Cost Audit Configuration
The cost audit tool can be run on-demand or configured to display a summary after every session. Enabling session-end summaries is recommended for the first two weeks while you build intuition about your spending patterns.
The alertThreshold value is in US dollars: an alert fires when daily spend exceeds $5.00.
{
  "ecc": {
    "audit": {
      "showAfterSession": true,
      "trackCumulative": true,
      "alertThreshold": 5.00
    }
  }
}
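The daily alert can be read as a sum-and-compare over the day's session costs. This sketch assumes "exceeds" means strictly greater than the threshold; `should_alert` and the sample costs are hypothetical.

```python
def should_alert(session_costs_usd, alert_threshold: float = 5.00) -> bool:
    """Return True when the day's total spend exceeds the threshold."""
    return sum(session_costs_usd) > alert_threshold


print(should_alert([2.0, 2.5]))  # False: total is 4.50
print(should_alert([2.5, 2.5]))  # False: exactly 5.00 does not exceed the threshold
print(should_alert([3.0, 2.5]))  # True: total is 5.50
```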
Real Cost Data
ECC's savings claims are backed by real production data from the community. The table below shows typical monthly costs for five usage profiles (three individual and two team), before and after installing ECC. All figures are based on community-reported averages from the ECC GitHub discussions and the Anthropic developer forum.
| User Profile | Before ECC (monthly) | After ECC (monthly) | Savings |
|---|---|---|---|
| Light user (1–2 hrs/day, casual coding) | $35–55 | $18–28 | ~48% |
| Moderate user (3–5 hrs/day, professional dev) | $120–200 | $72–120 | ~40% |
| Heavy user (6+ hrs/day, AI-native workflow) | $350–600 | $210–360 | ~40% |
| Team of 5 (median usage, shared API key) | $800–1,400 | $480–840 | ~40% |
| Team of 20 (enterprise, per-seat billing) | $3,200–5,500 | $1,920–3,300 | ~40% |
These figures assume US pricing for the Anthropic API (Opus, Sonnet, Haiku tiers) and include both input and output token costs. The savings percentage is slightly higher for light users because a larger proportion of their requests are simple lookups that get routed to Haiku. Heavy users tend to have a higher proportion of complex tasks that still need Sonnet or Opus, but the thinking token caps and compaction savings still deliver a strong 40% reduction.
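The savings column can be recomputed from the before and after ranges in the table. The snippet below only redoes that arithmetic on the table's own numbers; the helper name `savings_pct` is an illustrative choice.

```python
def savings_pct(before: float, after: float) -> float:
    """Percentage saved when monthly cost drops from `before` to `after`."""
    return (before - after) / before * 100


# (before low, before high), (after low, after high), copied from the table.
rows = {
    "Light user": ((35, 55), (18, 28)),
    "Moderate user": ((120, 200), (72, 120)),
    "Heavy user": ((350, 600), (210, 360)),
    "Team of 5": ((800, 1400), (480, 840)),
    "Team of 20": ((3200, 5500), (1920, 3300)),
}

for name, ((b_lo, b_hi), (a_lo, a_hi)) in rows.items():
    print(f"{name}: {savings_pct(b_lo, a_lo):.1f}% to {savings_pct(b_hi, a_hi):.1f}%")
```

The light-user row works out to roughly 48.6% to 49.1%, and every other row to 40.0% at both ends of its range, which matches the rounded figures in the table.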
ECC vs Manual Configuration
You can achieve some of ECC's savings by manually configuring each setting — but the time cost, maintenance burden, and risk of drift make manual configuration a worse long-term strategy. This table compares the two approaches across the dimensions that matter.
| Dimension | ECC (One Install) | Manual Configuration |
|---|---|---|
| Setup time | 1 minute (one command) | 30–60 minutes (5+ config files) |
| Model routing | Automatic, task-aware routing to Haiku/Sonnet/Opus | Manual per-session model selection; easy to forget |
| Thinking token cap | 10K cap enforced automatically | Must remember to set env var; no default |
| Compaction threshold | 50% auto-compaction in one setting | Manual per-project env configuration |
| Subagent routing | Haiku by default, configurable | Separate env var; often overlooked entirely |
| Cost visibility | Built-in audit with cumulative tracking | Manual Console checks; no aggregation |
| Multi-tool support | Claude Code, Cursor, Codex, OpenCode, Gemini CLI, GitHub Copilot | Per-tool configuration; inconsistent coverage |
| Updates and improvements | Community-maintained, auto-updating | Manual research and reconfiguration |
| Risk of misconfiguration | Low — defaults are battle-tested | High — easy to set conflicting values |
| Ongoing maintenance | Zero — ECC adapts to API changes | Must track API pricing and feature changes |
The bottom line: manual configuration can get you 60–70% of ECC's savings if you invest the time and maintain it perfectly. ECC gets you 100% of the savings for one minute of setup and zero ongoing maintenance. For individuals, the difference is convenience. For teams, the difference is thousands of dollars per year in recovered developer time that would otherwise be spent debugging configuration drift.
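To put the 60–70% figure in dollar terms, the sketch below works through one assumed case: a $200 monthly bill and the roughly 40% ECC savings reported in the cost table. The helper `manual_savings_range` is hypothetical, and the inputs are examples rather than measurements.

```python
def manual_savings_range(monthly_bill, ecc_savings_pct=40, manual_share_pct=(60, 70)):
    """Dollar savings from manual configuration, as a (low, high) tuple."""
    return tuple(monthly_bill * ecc_savings_pct * share / 10_000 for share in manual_share_pct)


ecc_savings = 200 * 40 / 100
print(ecc_savings, manual_savings_range(200))  # 80.0 (48.0, 56.0)
```

Under these assumptions ECC saves $80 a month, while a perfectly maintained manual setup saves $48 to $56.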
FAQ
Does ECC work with tools other than Claude Code?
Yes. ECC supports Claude Code (primary and deepest integration), Cursor, Codex, OpenCode, Gemini CLI, and GitHub Copilot. The model routing and thinking token caps adapt to each platform's capabilities. Cursor and Codex benefit most from the routing layer, which maps their internal model names to the ECC routing table. Copilot and Gemini benefit from the compaction and token cap settings. The installation is the same regardless of which tool you use: ECC detects your environment and applies the appropriate optimizations for each supported platform.
How is ECC different from configuring everything manually?
Manual configuration requires you to remember and tune five to eight separate settings across multiple files (settings.json, environment variables, per-project overrides). ECC bundles them into one install with defaults that have been battle-tested across thousands of developer hours. More importantly, ECC adds capabilities that manual configuration cannot replicate: dynamic task-aware model routing that inspects your prompt before deciding which model to use, per-task cost auditing with cumulative tracking, and automatic fallback routing when a requested model is unavailable or rate-limited. Manual configuration is a static set of rules; ECC is an adaptive optimization layer.
Does ECC add latency to my requests?
No. ECC operates at the configuration and routing layer, so it adds no measurable latency. Model routing decisions happen in microseconds before the request is sent. Thinking token caps are enforced server-side by the API, not by extra processing in ECC. Compaction uses the same mechanism as native Claude Code; ECC only changes the threshold at which it triggers. In practice, many users report that Haiku-routed lookups feel faster than their previous workflow because the lighter model responds more quickly than Opus would for the same simple query.
How do I verify how much ECC is saving me?
ECC includes a built-in cost audit command: run /ecc-tools-cost-audit to see a detailed breakdown of your token spending by task type, model, and session. The audit compares your actual spend against a simulated baseline of what you would have paid routing everything through Opus with default settings. It also tracks cumulative savings over time, so you can see your total dollars saved across days, weeks, or months. The audit tool is transparent about its methodology: it uses the same Anthropic API pricing data that your billing dashboard uses, and it attributes costs to specific requests so you can verify the numbers yourself. For teams, the audit can be exported as JSON for integration with internal cost-tracking dashboards.