Most people using Claude Code on a Max subscription assume "unlimited" means "no need to think about usage." That assumption is expensive in a different way: not financially, but in terms of efficiency. The model can quietly burn through thousands of turns generating verbose outputs, serializing parallel work, and narrating its own actions before executing them. You never see a bill, so you never notice the waste.
I spent time building a system to surface these patterns, analyze them, and feed the insights back into my Claude Code configuration. This post covers the full architecture: what I built, why each piece exists, and how you can replicate it.
The Problem With Unlimited
When you pay per token, every inefficiency shows up on your invoice. On a flat-rate plan, that feedback loop disappears. The model can dump 700 lines of nmap output into context on every turn, rerun the same structure-exploration Bash commands sequentially instead of in parallel, and write status narration before every tool call, and you will never see a direct signal that any of it is happening.
The inefficiency compounds in a way cost does not: longer sessions accumulate more context, which makes the model slower to reason about what matters, which causes more turns, which makes the session longer still. A session that should have taken 40 turns ends up taking 150.
I needed visibility.
Step One: ccusage
ccusage is an open-source CLI tool that reads the JSONL files Claude Code writes locally to ~/.config/claude/projects/. It does not send anything anywhere. All analysis is local, which matters for anyone doing client work.
Installation is zero-friction:
npx ccusage
The daily report gives you token counts and estimated USD costs per day, broken down by model. The session report is more useful:
npx ccusage session --breakdown
This shows you every conversation, what it cost in estimated pay-per-use terms, and which models handled what proportion of the work. When you see one session estimated at several times what you would expect, that is your signal to investigate.
The session report also reveals model distribution. If you see Opus handling routine file reads, grep searches, and boilerplate edits that Sonnet could manage equally well, that is a concrete inefficiency you can fix with a single CLAUDE.md line.
Step Two: Reading the JSONL Files Directly
ccusage gives you aggregate metrics. To understand why a session was expensive, you need to read the actual conversation. Claude Code stores every session as a JSONL file where each line is a JSON object representing one turn, including role, content, tool calls, and cost.
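Before building anything, it is worth poking at one of these files directly. Here is a minimal sketch that tallies which tools a session called most often; the field names (type, message, content, tool_use) reflect the transcript schema as I found it and may differ between Claude Code versions:

import json
from collections import Counter
from pathlib import Path

# Path to one session transcript; <project> and <session> are placeholders.
session_file = Path("~/.config/claude/projects/<project>/<session>.jsonl").expanduser()

tool_calls = Counter()
for line in session_file.read_text().splitlines():
    turn = json.loads(line)
    content = (turn.get("message") or {}).get("content")  # field names assumed
    if not isinstance(content, list):
        continue
    for block in content:
        if isinstance(block, dict) and block.get("type") == "tool_use":
            tool_calls[block.get("name", "unknown")] += 1

print(tool_calls.most_common(10))

A skew in that counter (hundreds of sequential Bash calls, say) is exactly the kind of signal the rest of this post automates.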
I wrote a Python script called ccreflect that does three things:
- Calls npx ccusage session --json to find the most expensive sessions.
- Locates the corresponding JSONL files and parses the conversation content, extracting user messages, assistant responses, and tool call sequences.
- Builds a structured prompt from that content and sends it to Claude via claude -p for reflective analysis.
The analysis prompt asks Claude to do something specific: identify patterns in the conversation that caused high token usage, separate what the user did from what the model did, and produce concrete recommendations with target files (CLAUDE.md, skill files, or both).
The output is two files: a LEARNINGS_*.md report and a CLAUDE_md_snippets.md file with ready-to-paste additions for your configuration files.
This is the part that makes the system self-improving. The model analyzes its own sessions and produces instructions for future sessions.
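For concreteness, here is a compressed sketch of the shape of ccreflect. The ccusage JSON structure (a sessions list with sessionId and totalCost fields), the transcript location, and the file naming are assumptions from my setup, not a stable API:

"""Compressed sketch of ccreflect's core loop. Schema and paths are assumed."""
import json
import subprocess
from pathlib import Path

PROJECTS_DIR = Path.home() / ".config" / "claude" / "projects"

ANALYSIS_PROMPT = (
    "Identify the patterns in this Claude Code session that drove high token "
    "usage, separate what the user did from what the model did, and propose "
    "concrete changes for CLAUDE.md or skill files."
)

def most_expensive_sessions(n=5):
    """Ask ccusage for per-session costs; return the n priciest."""
    raw = subprocess.run(
        ["npx", "ccusage", "session", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    sessions = json.loads(raw).get("sessions", [])  # top-level key assumed
    return sorted(sessions, key=lambda s: s.get("totalCost", 0), reverse=True)[:n]

def load_transcript(session_id):
    """Locate the session's JSONL transcript and keep user/assistant turns.
    One JSONL file per session, named by session id, is assumed."""
    matches = list(PROJECTS_DIR.rglob(f"{session_id}*.jsonl"))
    if not matches:
        return []
    turns = []
    for line in matches[0].read_text().splitlines():
        turn = json.loads(line)
        if turn.get("type") in ("user", "assistant"):
            turns.append(turn)
    return turns

def reflect(turns):
    """Pipe the transcript into claude -p for a one-shot reflective analysis."""
    excerpt = json.dumps(turns)[:200_000]  # crude truncation to fit context
    return subprocess.run(
        ["claude", "-p", ANALYSIS_PROMPT],
        input=excerpt, capture_output=True, text=True, check=True,
    ).stdout

if __name__ == "__main__":
    for session in most_expensive_sessions():
        turns = load_transcript(session.get("sessionId", ""))  # key name assumed
        if turns:
            print(reflect(turns))

The real script adds the two output files and some prompt structure, but the loop is exactly this: rank, read, reflect.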
What the Analysis Actually Found
Running ccreflect against my five most expensive sessions surfaced four consistent patterns.
Sequential tool calls that could be parallel
Build checks, linters, and test runners often have no dependency on each other. The model was firing them one by one, waiting for each to complete before starting the next. On a session with 200+ turns, this adds up significantly.
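The same principle holds outside Claude Code: independent checks can run concurrently. A sketch using only the standard library; the specific tools (ruff, mypy, pytest) are stand-ins for whatever your project actually runs:

import subprocess
from concurrent.futures import ThreadPoolExecutor

# These checks share no dependency, so there is no reason to serialize them.
checks = {
    "lint": ["ruff", "check", "."],
    "types": ["mypy", "."],
    "tests": ["pytest", "-q"],
}

def run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True)

with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(run, cmd) for name, cmd in checks.items()}
    for name, fut in futures.items():
        print(name, "ok" if fut.result().returncode == 0 else "failed")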
Raw tool output living in context
A full test suite run or a detailed dependency audit can produce thousands of lines of output. Keeping that in the active context means every subsequent turn processes it again as part of the cache read. The fix is straightforward: spool output to ./output/<tool>-<timestamp>.txt and read slices as needed.
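Mechanically, the spooling pattern looks like this; the >100-line threshold and the ./output/ layout come straight from the rule above:

import subprocess
import time
from pathlib import Path

def run_spooled(cmd, threshold=100, tail=20):
    """Run a command; if its output exceeds `threshold` lines, write the full
    dump to ./output/<tool>-<timestamp>.txt and keep only a short tail."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    lines = result.stdout.splitlines()
    if len(lines) <= threshold:
        return result.stdout
    out_dir = Path("output")
    out_dir.mkdir(exist_ok=True)
    path = out_dir / f"{Path(cmd[0]).name}-{int(time.time())}.txt"
    path.write_text(result.stdout)
    # Only the pointer and the tail go back into context; grep the file for more.
    return f"[{len(lines)} lines spooled to {path}]\n" + "\n".join(lines[-tail:])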
Narration before action
The model has a habit of writing a sentence describing what it is about to do before doing it. "The test suite appears to be failing on the auth module. Running the full test pass now." This is harmless in a single turn. Across 300 turns it contributes meaningfully to output token counts.
Opus on work that does not need Opus
Running searches across a codebase, reading config files, writing boilerplate, reformatting outputs: these do not require Opus-level reasoning. Sonnet handles them just as well. The fix is a model selection rule in CLAUDE.md.
The Configuration Changes
The analysis produced three targeted changes.
code-analysis skill
Added an Execution Rules section at the top of the skill file:
## Execution Rules
- Parallel by default. Fire all independent checks in one message block.
Never serialize linting + type checking + tests; they share no dependency.
- Spool output. Save any tool output >100 lines to ./output/<tool>-<timestamp>.txt.
Read slices with grep/Read. Never let raw output dumps live in context.
- No narration before tool calls. One sentence after a finding, not before an action.
Global dev CLAUDE.md
Added model selection defaults:
## Model & session defaults
- Default model: Sonnet 4.6 for file edits, searches, builds, and routine code work.
Opus only for complex architecture decisions, algorithmic design, and hard reasoning tasks.
State the reason for escalating in one line.
- Codebase exploration ("where is X", "understand this module") → spawn Explore subagent.
Do not Grep/Bash sequentially in the main thread.
- Sessions expected >50 turns: checkpoint to SESSION_STATE.md at turn 50.
Autonomous agent CLAUDE.md
Added cost discipline rules that apply to any session running without continuous supervision:
## Cost discipline
- Budget: 80 turns. At 60% of the budget, write SESSION_STATE.md
  and stop unless explicitly continued.
- Independent checks run in parallel. Fire all in one message block.
- Spool >100 lines of tool output to ./output/. Do not let raw dumps live in context.
None of these changes reduce quality. The model still does everything it did before. The work just gets done with fewer tokens and in less wall-clock time.
The Automation Layer
One analysis session is useful. A continuous feedback loop is better.
I set up two cron jobs. The first runs ccreflect every morning at 9 AM, analyzing the past seven days of sessions and saving the report to ~/claude-tools/reflections/<date>/. If the analysis finds expensive sessions or problematic patterns, macOS sends a local notification. No external services, no data leaving the machine.
The second cron job fires every Monday at 10 AM with a single notification: open a new Claude chat and ask it to read the reflections folder and propose CLAUDE.md changes.
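The two crontab entries look roughly like this; the ccreflect flags (--days, --out, --notify) and the notification text are from my setup and purely illustrative, and note that % must be escaped in crontab:

# Daily 9 AM: analyze the past 7 days; --notify (hypothetical flag) fires the
# macOS notification via osascript only when expensive sessions are found
0 9 * * * cd ~/claude-tools && python3 ccreflect.py --days 7 --out reflections/$(date +\%F) --notify
# Monday 10 AM: prompt the weekly review
0 10 * * 1 osascript -e 'display notification "Weekly review: read ~/claude-tools/reflections and propose CLAUDE.md changes" with title "Claude Code"'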
The key design decision here is that the AI proposes, I decide. Fully automated CLAUDE.md changes would create a feedback loop with no human checkpoint. If the model makes a bad recommendation one week and it gets auto-applied, every session that week runs with that bad instruction. Human review adds maybe five minutes and eliminates that risk entirely.
The Weekly Review Workflow
Monday morning, you get the notification. You open a fresh Claude chat (one with no accumulated context, not a long-running session) and say:
"Read ~/claude-tools/reflections/ and propose changes to my CLAUDE.md files."
The model reads the weekly reports, identifies which patterns persisted, and surfaces two or three specific changes. You review them, apply what makes sense, discard what does not. Five minutes.
Fresh chat matters for this. A long-running chat carries context overhead. The review session should be lightweight by design.
The Broader Principle
What this system actually is: a lightweight LLMOps loop for a single-person setup. The same concepts that production teams apply to model monitoring and prompt iteration apply at the individual level, just with simpler tooling.
The JSONL files that Claude Code writes locally are underused by most people. They contain a complete record of every decision the model made, every tool it called, and every turn it took. That data is the input to a feedback loop if you choose to use it.
The ccreflect script is about 300 lines of Python. The cron setup is two lines. The CLAUDE.md changes are three small blocks of text. The infrastructure is minimal by design because the insight does the work, not the complexity.
If you work with Claude Code regularly and have not looked at your session data, start with npx ccusage session. The patterns you find will likely be specific to your workflow, not the generic ones I found in mine. That specificity is the point: the system tells you what to fix in your configuration, not someone else's.
The full ccreflect script and installer are available on request. The ccusage tool is open source at github.com/ryoppippi/ccusage.
Follow me on LinkedIn to stay updated.
Happy hacking and prompting!