Method 02

Strategic Compaction: Stop Burning Tokens on Dead Context

The default compaction threshold waits until your context is 95% full. By then, the model has already lost the plot. Lower the trigger, compact on your terms, and keep the conversation coherent from start to finish.

Token Savings: 10-20%

What Is Compaction

Every conversation with an AI coding tool builds up a history: your prompts, the model's responses, tool calls, file reads, and code edits. This history lives inside the model's context window -- a fixed-size memory that holds everything the model can "see" at any given moment. When the context window fills up, something has to give.

Compaction is the mechanism that solves this. Instead of simply truncating the oldest messages and losing them forever, compaction asks the model to summarize the conversation so far into a condensed, structured form. The summary preserves critical information -- what files were modified, what decisions were made, what bugs were identified -- while discarding the verbose back-and-forth that led to those outcomes. The condensed summary then replaces the original messages in the context window, freeing up space for new turns without losing the thread.
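The difference between truncation and compaction can be sketched in a few lines. This is only an illustration of the idea described above, not how Claude Code implements it: the `Message` type, the `keep_last` parameter, and the `toy_summarize` stand-in (which replaces the real model call) are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str
    text: str


def truncate(history: list[Message], keep_last: int) -> list[Message]:
    """Blind deletion: everything older than the last `keep_last` messages is lost."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    return history[-keep_last:]


def compact(
    history: list[Message],
    keep_last: int,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Replace older messages with one summary message; keep recent ones verbatim."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(history) <= keep_last:
        return list(history)
    older, recent = history[:-keep_last], history[-keep_last:]
    summary = Message(role="summary", text=summarize(older))
    return [summary, *recent]


def toy_summarize(messages: list[Message]) -> str:
    # Hypothetical stand-in for the model call: keeps only lines tagged as decisions.
    decisions = [m.text for m in messages if m.text.startswith("DECISION:")]
    return " | ".join(decisions)
```

With a four-message history whose first message is a tagged decision, `truncate(history, 2)` returns only the last two messages, while `compact(history, 2, toy_summarize)` returns a summary message carrying that decision followed by the same two recent messages.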

Think of it like a progress report in a long project. You do not reread every Slack message and email from the past three weeks before your next task. You read a concise summary of what was done, what is in progress, and what is blocked. Compaction does the same thing for your AI coding session. The model writes itself a structured memo of everything that has happened, then continues working from that memo instead of the raw transcript.

The key insight is that compaction is not just about saving tokens -- it is about preserving coherence. A well-timed compaction keeps the model oriented. A poorly timed one (or none at all) leaves the model groping through a fragment of its own history, asking questions it already answered three turns ago.

Why 95% Is Too Late

Claude Code ships with a default AUTOCOMPACT_PCT of 95%. This means automatic compaction only fires when your context window is 95% full -- just 5% away from complete saturation. By the time that threshold is reached, the damage is already done.
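To make the threshold arithmetic concrete, here is a minimal sketch of a percentage-based trigger. The function names are hypothetical and the exact check Claude Code performs is an assumption. The 200,000-token window is the figure used in the comparison table later in this article.

```python
def should_autocompact(used_tokens: int, window_tokens: int, threshold_pct: int) -> bool:
    """Return True once context usage reaches the configured percentage (assumed rule)."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    if not 1 <= threshold_pct <= 100:
        raise ValueError("threshold_pct must be between 1 and 100")
    # Integer comparison avoids floating-point rounding at the boundary.
    return used_tokens * 100 >= window_tokens * threshold_pct


def headroom_at_trigger(window_tokens: int, threshold_pct: int) -> int:
    """Tokens still free at the moment the trigger fires."""
    return window_tokens - window_tokens * threshold_pct // 100
```

Under these assumptions, a 200,000-token window leaves 10,000 free tokens when the trigger fires at 95%, and 100,000 free tokens when it fires at 50%.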

Context windows do not fill evenly. The earliest messages in a session -- your initial instructions, the architectural decisions, the first file reads -- sit at the very bottom of the context stack. As the conversation grows, those early messages are the first to get squeezed. At 95% fullness, the model's working memory has already been silently degraded. It may have lost the original design constraints you described at the start. It may have forgotten about that edge case you flagged 30 minutes ago. The model will not tell you this has happened -- it will simply start producing answers that drift from your intent, repeat questions, or introduce bugs that contradict earlier decisions.

Worse, compaction triggered at 95% fullness produces lower-quality summaries. The model is operating under pressure, with very little headroom to process the full history and distill it accurately. It is like asking someone to write a book summary when they have five minutes before the exam versus when they have the whole weekend. The summary written under pressure will miss nuance, omit important caveats, and flatten the reasoning that led to key decisions.

Real-world symptoms of late compaction include: the model suddenly asking "which file are we working on?" after 45 minutes of editing, code changes that undo a fix from earlier in the session, and wild goose chases where the model re-discovers a bug it already diagnosed and solved. Each of these wastes not just tokens but your time and attention -- the scarcest resources in any development workflow.

How to Configure Strategic Compaction

Changing the compaction threshold takes one line of configuration. Add AUTOCOMPACT_PCT to your Claude Code settings file to trigger compaction at 50% context fullness instead of 95%. This gives the model a comfortable buffer to produce high-quality summaries while plenty of original context is still intact.

      # ~/.claude/settings.json
      {
        "env": {
          "AUTOCOMPACT_PCT": "50"
        }
      }
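If the settings file already contains other keys, editing it by hand risks clobbering them. The following is a minimal sketch of a merge helper, assuming the file is plain JSON with an optional top-level "env" object as shown above. The helper name `set_env_setting` is hypothetical, and the setting name is taken from this article as-is.

```python
import json
from pathlib import Path


def set_env_setting(settings_path: Path, key: str, value: str) -> dict:
    """Merge one env entry into a JSON settings file, preserving other keys."""
    settings: dict = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)
    # Create the "env" object if it is missing, then set the single key.
    env = settings.setdefault("env", {})
    env[key] = value
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Example call (not executed here):
# set_env_setting(Path.home() / ".claude" / "settings.json", "AUTOCOMPACT_PCT", "50")
```

The helper rewrites the whole file with two-space indentation, so any existing formatting is normalized. Back up the file first if that matters to you.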

In addition to automatic compaction, Claude Code provides the /compact slash command for manual triggering. You can run it at any time to force a compaction at a semantically meaningful boundary -- for example, after finishing a major feature, before switching to a new task, or when you notice the model starting to forget details. The command is immediate: type /compact, hit enter, and the model summarizes the conversation so far into a compact form.

Best practice: Use AUTOCOMPACT_PCT=50 as your safety net and /compact as your precision tool. The automatic threshold catches you before context degrades; manual compaction lets you choose the optimal moment for summarization, right after completing a logical unit of work. Compaction is fundamentally about managing context headroom: every compaction reclaims tokens you can use for reasoning instead of re-reading. Pair it with model routing: sending simple lookups to cheaper models means less context consumed between compactions.

Manual vs Automatic Compaction

Both automatic and manual compaction have their place. The table below breaks down when each approach works best.

Scenario	Best Approach	Why
Long, uninterrupted coding session	Automatic (50%)	You will not remember to compact manually; the safety net catches you at the right time
Just completed a major feature	Manual (/compact)	Semantic boundary produces the highest-quality summary, preserving architectural intent
About to switch tasks or projects	Manual (/compact)	Lock in context before the mental model shifts; the next session starts from a clean summary
Short, focused bug fixes (under 30 turns)	Neither needed	Context window will not fill up; compaction overhead is unnecessary for brief sessions
Debugging session with many dead ends	Manual (/compact)	Compact after finding the root cause, discarding the failed hypotheses that led there
Pair-programming with frequent pauses	Automatic (50%)	Irregular pacing means you may forget to compact; automatic keeps things tidy

The two modes are not mutually exclusive. The most effective strategy layers them: automatic compaction at 50% ensures you never drift into degraded-context territory, while manual compaction at natural breakpoints maximizes summary quality at the moments that matter most.

Before vs After: A Real Session Comparison

To make this concrete, here is a side-by-side comparison of the same debugging session with default compaction (95%) versus strategic compaction (50%). The scenario: tracking down a race condition in a React state management layer across approximately 80 conversation turns.

Metric	Default (95%)	Strategic (50%)
Context at compaction trigger	~190,000 / 200,000 tokens	~100,000 / 200,000 tokens
Compaction summary quality	Adequate -- missed 2 edge cases	High -- all key findings preserved
Times model re-asked a question	4 times	0 times
Undone earlier fix	Yes -- reverted a state update	No regressions
Total tokens consumed	~340,000 tokens	~285,000 tokens
Session duration	62 minutes	48 minutes
Token savings	--	16% fewer tokens

The 16% token savings came from eliminating redundancy: the model did not re-read files it had already analyzed, did not re-investigate hypotheses it had already tested, and did not re-explain concepts it had already understood. Each of those repetitions was real money. More importantly, the session finished 14 minutes faster -- and developer time is worth far more than token costs.
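The headline figures follow directly from the table above. As a quick check of the arithmetic, using only the token counts reported in the comparison (the function name `token_savings` is just a label for this example):

```python
def token_savings(default_tokens: int, strategic_tokens: int) -> tuple[int, float]:
    """Return (tokens saved, percentage saved relative to the default run)."""
    saved = default_tokens - strategic_tokens
    return saved, saved / default_tokens * 100


saved, pct = token_savings(340_000, 285_000)
minutes_saved = 62 - 48
print(f"{saved:,} tokens saved ({pct:.1f}%), {minutes_saved} minutes faster")
# prints: 55,000 tokens saved (16.2%), 14 minutes faster
```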

This pattern is consistent across session types. Long debugging sessions benefit the most, since they accumulate the largest volume of dead-end exploration that should be summarized away. Architecture discussions benefit from preserving design rationale. Even routine feature work sees savings, because the model spends fewer tokens re-establishing context after each compaction cycle.

Common Pitfalls

Strategic compaction is straightforward to configure, but there are three mistakes that can undermine its effectiveness.

Pitfall 1: Compacting Too Often

Setting AUTOCOMPACT_PCT to an extremely low value -- say, 20% -- triggers compaction so frequently that the model spends more time summarizing than working. Each compaction pass consumes tokens (the summary itself costs something to generate), and over-compacting can actually increase total token usage rather than decrease it. Worse, overly frequent compaction can lose important nuance by summarizing before enough context has accumulated to identify what is truly important. The 50% threshold hits the sweet spot: enough context to produce a meaningful summary, enough headroom to do it well.
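How often compaction fires at a given threshold can be estimated with a toy simulation. This is a rough sketch under loud assumptions: every turn adds the same number of tokens, every compaction shrinks the context to a fixed summary size, and the per-turn and summary sizes below are hypothetical numbers chosen for illustration, not measurements. It counts compaction passes only; it does not model summary quality or the token cost of each pass.

```python
def count_compactions(
    window_tokens: int,
    threshold_pct: int,
    tokens_per_turn: int,
    summary_tokens: int,
    turns: int,
) -> int:
    """Count how many times a fixed-percentage trigger fires over a session."""
    trigger = window_tokens * threshold_pct // 100
    if summary_tokens >= trigger:
        raise ValueError("summary must be smaller than the trigger point")
    used = 0
    compactions = 0
    for _ in range(turns):
        used += tokens_per_turn
        if used >= trigger:
            # Assumed behavior: the whole history collapses to one summary.
            used = summary_tokens
            compactions += 1
    return compactions


# Hypothetical session: 200,000-token window, 80 turns, 5,000 tokens per turn,
# 10,000-token summaries.
for pct in (20, 50, 95):
    print(pct, count_compactions(200_000, pct, 5_000, 10_000, 80))
```

With these made-up inputs the trigger fires 13 times at 20%, 4 times at 50%, and 2 times at 95%, which illustrates why a very low threshold multiplies the number of summarization passes you pay for.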

Pitfall 2: Waiting Too Long

The inverse mistake is sticking with the 95% default out of inertia or fear of change. Every session that runs at 95% is a session where the model's reasoning quality can degrade silently in the later part of the conversation. The cost is not just wasted tokens -- it is wrong answers, repeated work, and eroded trust in the tool. If you change only one setting in your Claude Code configuration, make it AUTOCOMPACT_PCT=50. The downside is a small summarization cost per compaction; the upside is immediate.

Pitfall 3: Never Using Manual Compaction

Automatic compaction is a safety net, not a strategy. Relying exclusively on AUTOCOMPACT_PCT means you are always compacting at arbitrary context-fill percentages rather than at semantically meaningful boundaries. The best summaries are produced right after completing a logical unit of work -- a finished feature, a resolved bug, a completed refactor. At those moments, the conversation has a natural "chapter break" and the summary can cleanly capture what was accomplished. Train yourself to type /compact at these inflection points, and the quality of your session continuity will improve noticeably.

FAQ

Q: What is the difference between compaction and context truncation?

Context truncation simply drops the oldest messages when the context window is full -- permanently losing everything in them. Compaction actively summarizes conversation history into a condensed form, preserving key decisions, code changes, and reasoning while discarding redundant back-and-forth. Compaction is intelligent compression; truncation is blind deletion. With AUTOCOMPACT_PCT set lower, compaction runs while there is still enough context headroom to produce a high-quality summary, rather than scrambling at the last moment.

Q: Will setting AUTOCOMPACT_PCT to 50% slow down my workflow?

No -- compaction runs asynchronously and typically takes only a few seconds. The minor overhead of compacting at 50% is negligible compared to the cost of losing context. In practice, a compaction at 50% produces higher-quality summaries because the model still has ample headroom to process the full history before condensing it. Users report sessions that stay coherent longer and require less repetition of earlier instructions. The net effect is faster sessions, not slower ones.

Q: Can I use automatic and manual compaction together?

Yes, and combining both is the recommended strategy. Set AUTOCOMPACT_PCT to 50% as a safety net, then manually trigger /compact at natural breakpoints: after completing a major feature, before switching tasks, or when you notice the model starting to forget earlier details. Manual compaction at these inflection points produces the best summaries because you are compacting at semantically meaningful boundaries rather than arbitrary context-fill percentages. The two approaches complement each other -- automatic keeps you safe, manual keeps you precise.
