Token Counter — Count Tokens & Cut AI Costs (Free, No Login)
Paste your text below for an instant token count. Then learn 5 proven methods to cut AI token usage by 30–50% — each with a real tool you can use today.
Why Token Usage Is Eating Your Budget
Every prompt you send, every file the model reads, every thinking step it takes: it all costs tokens. Most developers don't realize they're bleeding money through three invisible leaks: using the most expensive model for trivial tasks, letting context balloon past 80%, and re-reading the same files over and over. There's another dimension too: context headroom, meaning how much room is left before you hit the context limit. When headroom runs out, the conversation gets compacted or cut off, and details the model was relying on are lost. The good news? All of these are fixable. Below are five battle-tested methods, each with a concrete tool you can start using in the next 10 minutes.
The 5 Methods
Each method targets a different layer of token consumption. Pick one or stack them all; every method works on its own.
Route Cheap Models to Simple Tasks
Most of your day-to-day AI coding work doesn't need Opus. File lookups, simple edits, and variable renames are Haiku or Sonnet territory. Routing every "find where this function is defined" to Opus is like taking a taxi to cross the street. Claude Code lets you set the default model and the subagent model independently in its settings.json, so exploration stays cheap while real coding stays capable.
{
  "model": "sonnet",
  "env": {
    "CLAUDE_CODE_SUBAGENT_MODEL": "haiku"
  }
}
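Once the settings above are in place, Claude Code handles this split itself. To make the underlying idea concrete, here is a minimal Python sketch of task-based routing. The ROUTES table, its keywords, and the route_model function are hypothetical illustrations, not Claude Code's internal logic. The model names are just the aliases used in the config above.

```python
import re

# Hypothetical keyword-based router: illustrates the idea, not Claude Code's logic.
# Categories are checked from most to least demanding, so a task that mentions
# a hard problem goes to the capable model even if it also mentions a lookup.
ROUTES: list[tuple[str, tuple[str, ...]]] = [
    ("opus", ("architecture", "design", "security audit", "race condition")),
    ("sonnet", ("implement", "refactor", "fix", "write test")),
    ("haiku", ("find", "where is", "list", "rename", "search")),
]

DEFAULT_MODEL = "sonnet"  # assumption: everyday coding is the safe fallback


def _mentions(text: str, keyword: str) -> bool:
    """True if keyword appears as whole word(s), so 'prefix' doesn't match 'fix'."""
    return re.search(rf"\b{re.escape(keyword)}\b", text) is not None


def route_model(task: str) -> str:
    """Return the model alias of the first (most demanding) matching category."""
    text = task.lower()
    for model, keywords in ROUTES:
        if any(_mentions(text, kw) for kw in keywords):
            return model
    return DEFAULT_MODEL


print(route_model("Where is parse_config defined?"))  # haiku
print(route_model("Fix the race condition in the worker pool"))  # opus
```

Matching whole words stops "prefix" from triggering the "fix" rule. Checking the most demanding category first means an ambiguous task errs toward the more capable model.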
Compact Before Context Explodes
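By default, compaction kicks in at around 95% of the context window. That means the model is already drowning before it summarizes, and early details are the first to be squeezed out. Lowering the threshold to 50% triggers compaction while there is still headroom, keeping the most important information in play. Think of it as saving your game before the boss fight, not after you've already lost. The override below uses the CLAUDE_AUTOCOMPACT_PCT_OVERRIDE environment variable. It isn't part of the documented settings, so confirm it works in your Claude Code version.
{
  "env": {
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "50"
  }
}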
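The threshold is a percentage of the context window, so it helps to translate it into tokens. This is a minimal sketch. It assumes a 200K-token window (check your model's actual limit) and that compaction fires as soon as usage reaches the threshold. compaction_trigger and headroom are hypothetical helper names.

```python
def compaction_trigger(window_tokens: int, threshold_pct: float) -> int:
    """Token count at which auto-compaction would fire for a given threshold."""
    return int(window_tokens * threshold_pct / 100)


def headroom(window_tokens: int, used_tokens: int) -> int:
    """Tokens left before the context window is full (never negative)."""
    return max(window_tokens - used_tokens, 0)


WINDOW = 200_000  # assumption: adjust to your model's context window

for pct in (95, 50):
    trigger = compaction_trigger(WINDOW, pct)
    print(f"{pct}% threshold: compacts at {trigger:,} tokens, "
          f"leaving {headroom(WINDOW, trigger):,} tokens of headroom")
```

With a 200K window, a 95% threshold leaves about 10K tokens of headroom when the summary is written. A 50% threshold still leaves half the window free.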
ECC — The All-in-One Harness Optimizer
⭐ 182K+ GitHub stars. Instead of configuring every optimization by hand, ECC (Everything Claude Code) bundles them into one system. ECC won the Anthropic Hackathon. It automates model routing (Haiku for lookups, Sonnet for coding, Opus for architecture), caps thinking tokens at 10K (vs the default 32K), triggers compaction at safer intervals, and routes subagents to cheaper models. It works across Claude Code, Cursor, Codex, OpenCode, Gemini, and Copilot. The project reports that its 30–50% savings figure comes from production use rather than benchmarks.
/plugin marketplace add affaan-m/everything-claude-code
/plugin install everything-claude-code@everything-claude-code
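You may want to see what the thinking cap and subagent routing look like as plain configuration. The sketch below merges a default model and environment overrides into a Claude Code settings.json while keeping any keys already there. It is a hand-rolled illustration, not ECC's code. The model and env keys and the CLAUDE_CODE_SUBAGENT_MODEL and MAX_THINKING_TOKENS variables are Claude Code settings. The merge_settings helper and the project-level file path are assumptions.

```python
import json
from pathlib import Path


def merge_settings(path: Path, model: str, env: dict[str, str]) -> dict:
    """Set the default model and merge env overrides, keeping every other key."""
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    settings["model"] = model
    # Keep env vars that are already set; the new values win on conflict.
    settings["env"] = {**settings.get("env", {}), **env}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Example call (the project-level path is an assumption; adjust as needed):
# merge_settings(
#     Path(".claude/settings.json"),
#     "sonnet",
#     {
#         "CLAUDE_CODE_SUBAGENT_MODEL": "haiku",  # cheap subagents
#         "MAX_THINKING_TOKENS": "10000",  # 10K thinking cap
#     },
# )
```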
Trim Your CLAUDE.md and Rules
Every line in your CLAUDE.md and rules files is injected into every conversation, before you even say hello. A 500-line CLAUDE.md burns tokens on hundreds of lines you don't need for the task at hand. Cut it to 5–10 core rules. Move language-specific rules to separate files that only load when you're working in that language's project. The savings repeat in every single session, because the file is paid for before any work starts.
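Because the file is loaded before every conversation, its size is a fixed cost per session. This is a rough way to estimate that cost. It assumes the common rule of thumb of about 4 characters per English token. Claude's real tokenizer counts differently, so treat the numbers as ballpark figures. estimate_tokens, baseline_cost, and audit are hypothetical helpers.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb for English; Claude's tokenizer differs


def estimate_tokens(text: str) -> int:
    """Ballpark token count: characters divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def baseline_cost(rules_text: str, sessions: int) -> int:
    """Estimated tokens spent loading the same rules text in every session."""
    return estimate_tokens(rules_text) * sessions


def audit(path: Path, sessions: int = 100) -> None:
    """Print a rough per-session and cumulative cost for one rules file."""
    text = path.read_text(encoding="utf-8")
    print(
        f"{path}: {len(text.splitlines())} lines, "
        f"~{estimate_tokens(text):,} tokens per session, "
        f"~{baseline_cost(text, sessions):,} tokens over {sessions} sessions"
    )


# Example: audit(Path("CLAUDE.md"))
```

Run it before and after trimming. The per-session number is what every conversation pays up front. The cumulative number is that cost multiplied by how often you start a session.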
CLAUDE.md / .claude/rules/: keep it lean
Search First, Read Later
Reading an entire 800-line file to find one function is like buying the whole grocery store for milk. Use grep and glob to locate what you need, then Read only the relevant section. Claude Code's built-in search tools (Grep, Glob) cost a fraction of a full file read. This habit alone can cut per-task token usage by 20–40%, especially in large codebases where you'd otherwise be loading files you never actually modify.
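In Claude Code, the built-in Grep, Glob, and Read tools do this for you. The Python sketch below only illustrates why the habit saves tokens. It locates a definition with a regex, then loads a small window of lines instead of the whole file. The helper names, the simulated 800-line file, the 20-line window, and the 4-characters-per-token estimate are all assumptions for illustration.

```python
import re
from typing import Optional

CHARS_PER_TOKEN = 4  # rough heuristic for English text, not a real tokenizer


def approx_tokens(text: str) -> int:
    """Characters / 4, rounded up: good enough to compare sizes, not to bill."""
    return -(-len(text) // CHARS_PER_TOKEN)


def find_definition(source: str, name: str) -> Optional[int]:
    """Grep-style search: 0-based index of the line defining `name`, or None."""
    pattern = re.compile(rf"^\s*def {re.escape(name)}\s*\(")
    for i, line in enumerate(source.splitlines()):
        if pattern.match(line):
            return i
    return None


def read_window(source: str, start: int, count: int = 20) -> str:
    """Targeted read: only `count` lines starting at `start`, not the whole file."""
    return "\n".join(source.splitlines()[start:start + count])


# Simulated 800-line file with the function we want near the end.
body = [f"x_{i} = {i}" for i in range(780)]
body += ["def target(a, b):", "    return a + b"]
body += [f"y_{i} = {i}" for i in range(18)]
source = "\n".join(body)

idx = find_definition(source, "target")
if idx is not None:
    whole = approx_tokens(source)
    part = approx_tokens(read_window(source, idx))
    print(f"whole file ~{whole} tokens, targeted read ~{part} tokens")
```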
Grep + Glob → then Read (Claude Code built-in)
Method Comparison
Each method has a different effort-to-savings profile. Pick your starting point.
| Method | Effort | Token Savings | Works With |
|---|---|---|---|
| 01 Model Routing | 5 min config | 20–50% | Claude Code, Cursor, Codex |
| 02 Strategic Compaction | 2 min config | 10–20% | Claude Code |
| 03 ECC (Recommended) | 1 min install | 30–50% | Claude Code, Cursor, Codex, Gemini, Copilot |
| 04 Trim Rules | 30 min audit | 10–30% | Any AI coding tool |
| 05 Search First | Behavior change | 20–40% | Claude Code, Cursor, Codex |
Who Needs This
If any of these sound familiar, you're leaving money on the table.
💻 Heavy Daily User
You spend 4+ hours a day in Claude Code or Cursor. Your monthly API bill is in triple digits. Cutting 30% saves real money.
🏢 Team Lead
Everyone on your team of 5+ developers uses AI tools. Per-person token waste multiplied by headcount can reach thousands per month. Centralized optimization pays off fast.
🚀 Indie Hacker
Bootstrapping on a tight budget. Every dollar on tokens is one less for marketing. These methods keep your AI assistant affordable without downgrading quality.
📚 AI Coding Beginner
Just started with AI coding tools and already hit a $50 monthly bill. You didn't know token costs added up. Now you know — and now you can fix it.
FAQ
Why am I using so many tokens?
Common causes include:
- using the most expensive model for every task
- bloated CLAUDE.md or system prompts
- reading entire files instead of searching first
- letting context fill past 80% before compacting
- not capping thinking tokens

Routing simple tasks to cheaper models is often the biggest single win, worth 20–50% on its own.
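Which model should I use for which task?
Use Haiku for exploration and file search, since it is the cheapest tier. Use Sonnet for everyday coding: it costs far less than Opus per token and is capable enough for most routine code. Reserve Opus for architecture design, complex debugging, and security audits. The key is routing: match the model to the task complexity. The exact price gap depends on model version, so check Anthropic's current pricing.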
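The per-request gap between tiers is simple arithmetic once you know token counts and prices. This is a minimal sketch. The prices below are illustrative placeholders in USD per million tokens, not quoted rates, so substitute the current figures from Anthropic's pricing page. ModelPrice and task_cost are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per million tokens. The values used below are placeholders."""
    input_per_mtok: float
    output_per_mtok: float


def task_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request from its input and output token counts."""
    return (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000


# PLACEHOLDER prices for illustration only; replace with current published rates.
PRICES = {
    "haiku": ModelPrice(1.0, 5.0),
    "sonnet": ModelPrice(3.0, 15.0),
    "opus": ModelPrice(15.0, 75.0),
}

# A typical lookup: lots of context in, a short answer out (hypothetical sizes).
for name, price in PRICES.items():
    print(f"{name}: ${task_cost(price, 20_000, 500):.4f}")
```

Lookups read a lot of context and write little, so the input price dominates. That is exactly where routing them to a cheaper tier pays off.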
Does ECC really save 30–50%?
Yes, according to production data reported by the project. ECC combines model routing, thinking-token caps (10K vs the default 32K), strategic compaction triggers, and Haiku subagents, with each layer adding to the savings. Heavy daily users have the most waste to remove, so they are the most likely to land at the top of the 30–50% range.
Will cutting tokens hurt output quality?
Not if done right. The goal isn't to use fewer tokens per task; it's to use the right number. Routing lookups to Haiku and keeping Opus for architecture doesn't reduce quality; it improves ROI. Clean prompt files often produce better output than bloated ones.
How do I track my token usage?
Anthropic Console shows per-request token counts. In Claude Code, the /cost command shows the current session's token usage and cost. ECC includes built-in cost auditing that breaks down spending by task type. For teams, most API gateways offer per-key usage dashboards.
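The Anthropic API reports input and output token counts in each response's usage field, so a per-task breakdown is mostly a group-by over those numbers. This is a minimal sketch. It assumes you log one record per request and tag it with a task type yourself. UsageRecord, tokens_by_task, the task labels, and the sample figures are hypothetical, and this is not ECC's auditing feature.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    task_type: str  # hypothetical label you attach when logging, e.g. "lookup"
    input_tokens: int
    output_tokens: int


def tokens_by_task(records: list[UsageRecord]) -> dict[str, int]:
    """Total tokens (input + output) per task type, largest first."""
    totals: dict[str, int] = defaultdict(int)
    for r in records:
        totals[r.task_type] += r.input_tokens + r.output_tokens
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical log entries for illustration.
log = [
    UsageRecord("lookup", 18_000, 400),
    UsageRecord("edit", 25_000, 3_000),
    UsageRecord("lookup", 22_000, 600),
]
for task, total in tokens_by_task(log).items():
    print(f"{task}: {total:,} tokens")
```

If lookups top the list, as they do in this made-up log, they are the first candidates for a cheaper model.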
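Can I combine these methods?
Yes, and that's where the real savings are. Each method targets a different layer:
- model routing lowers the cost of each request
- compaction trims context overhead
- ECC automates both
- rule trimming shrinks the baseline
- search-first prevents waste

Stacked together, each method trims a share of what the others leave. The combined savings beat any single method, though they come to less than the simple sum.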
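Stacked savings multiply rather than add: each method trims a share of what the previous ones left. The sketch below shows that arithmetic. It assumes the methods are independent, and the percentages are illustrative, not measurements.

```python
from math import prod


def stacked_savings(savings: list[float]) -> float:
    """Combined fraction saved when each method cuts a share of what is left.

    Assumes the methods act independently; overlapping methods (e.g. ECC plus
    the routing and compaction settings it already bundles) should be counted once.
    """
    remaining = prod(1 - s for s in savings)
    return 1 - remaining


# Illustrative numbers, not measurements.
print(f"{stacked_savings([0.20, 0.30]):.0%}")  # 44%
```

A 20% saving followed by a 30% saving removes 44% of the original total, not 50%.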