GitHub All-Stars vol.17: CodeBurn

A slightly unsettling realization hit us at VirtusLab not long ago - our collective list of starred GitHub repositories had quietly ballooned into something that could generously be called "a problem." Instead of pretending it wasn't happening, we decided to lean into it: every two weeks, we pick a trending open-source project, pull the hood off, and tell you what we find underneath. We focus on fresh, relatively unknown repos - not the usual suspects that everybody and their tech newsletter has already covered (because let's be honest, you don't need us for that).

Today's specimen: CodeBurn by AgentSeal.

The Moment of Reckoning

There's a particular kind of dread that hits you at 11 PM on a Thursday when you check your Anthropic billing dashboard and see a number that looks more like a car payment than a developer tool expense. You've been vibe-coding all week. The code ships. The tests pass. Everything feels productive. And then: $847 this week.

The natural next question - "but what did I actually spend it on?" - has, until now, had no good answer. Your billing page gives you a total. Claude Code's built-in /cost command shows you the current session. Neither tells you that 56% of your weekly spend was the AI... having a conversation with itself.

Someone on X/Twitter crystallized this beautifully. The tweet from @heygurisingh that launched a thousand existential crises went something like: most of my $200/day spend wasn't coding - it was Claude just... talking. The tool that produced that revelation? A 3,000-line TypeScript CLI called CodeBurn.

1,201 stars in two days. Let's see if they're deserved.

What CodeBurn Actually Is

CodeBurn reads the session transcripts that AI coding agents already store on your disk - Claude Code's JSONL files at ~/.claude/projects/, Codex's at ~/.codex/sessions/ - and tells you exactly where your tokens went. By task type, by tool, by model, by project. No API keys. No proxy. No wrapper. No data leaving your machine.

That last point matters more than it sounds. In a world where every other dev tool wants to phone home, CodeBurn's entire architecture is built around the radical premise that the data already exists on your machine - someone just needed to make it legible.

The timing is not accidental. April 2026 is the month AI coding costs became a line item that managers actually noticed. Claude Code Max plans run $100-200/month. API users are reporting averages of $13/developer/day, with power users blowing past $30 without breaking a sweat. And then OpenAI shipped Codex with its own local session format. CodeBurn is the first tool that reads both, side by side, in one dashboard.

The Architecture: 3,000 Lines of Disciplined Constraint

This is a GitHub All-Stars article, so we go deep. CodeBurn is 3,078 lines of TypeScript across 16 source files. The tech stack: Commander.js for CLI, Ink (React for terminals - yes, really) for the TUI, and LiteLLM's pricing database for cost calculations. The architecture is a clean four-stage pipeline: discover sessions → parse JSONL → classify turns → render.

Let's walk through the interesting decisions.

Provider Plugin System: The Abstraction That Didn't Over-Abstract

The single most architecturally elegant decision in the codebase. Instead of hardcoding Claude's JSONL format and bolting Codex support on top as an afterthought, CodeBurn defines a Provider interface:
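A minimal sketch of what such an interface could look like, covering session discovery, line parsing, tool-name normalization, and model display names. Every name here (ParsedCall, discoverSessions, parseLine, the JSON field names) is our assumption for illustration, not CodeBurn's actual source:

```typescript
// Illustrative only: type, method, and JSON field names are assumptions.

/** One normalized API call extracted from a transcript line. */
interface ParsedCall {
  messageId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  tools: string[];
}

/** The concerns a provider owns. */
interface Provider {
  name: string;
  /** Where do this agent's session files live? */
  discoverSessions(): Promise<string[]>;
  /** What does one JSONL line mean? Null for lines that carry no usage. */
  parseLine(line: string): ParsedCall | null;
  /** Map agent-specific tool names onto a shared vocabulary. */
  normalizeTool(rawName: string): string;
  /** Turn a raw model id into something a human wants to read. */
  displayModel(rawModel: string): string;
}

/** Hypothetical shape of a raw transcript record. */
interface RawRecord {
  id?: string;
  model?: string;
  usage?: { input_tokens?: number; output_tokens?: number };
  tools?: string[];
}

const normalizeCodexTool = (raw: string): string =>
  raw === "exec_command" ? "Bash" : raw;

// A toy Codex-style provider built on the interface above.
const codexLike: Provider = {
  name: "codex",
  // A real provider would walk ~/.codex/sessions/ here; stubbed for the sketch.
  discoverSessions: async () => [],
  parseLine(line) {
    let raw: RawRecord | null;
    try {
      raw = JSON.parse(line) as RawRecord | null;
    } catch {
      return null; // skip malformed lines instead of failing the whole session
    }
    if (!raw || !raw.id || !raw.usage) return null;
    return {
      messageId: raw.id,
      model: raw.model ?? "unknown",
      inputTokens: raw.usage.input_tokens ?? 0,
      outputTokens: raw.usage.output_tokens ?? 0,
      tools: (raw.tools ?? []).map(normalizeCodexTool),
    };
  },
  normalizeTool: normalizeCodexTool,
  // Strip a trailing -YYYY-MM-DD date suffix, if present.
  displayModel: (raw) => raw.replace(/-\d{4}-\d{2}-\d{2}$/, ""),
};
```

The point of the shape is that everything downstream (classifier, cost calculation, dashboard) only ever sees ParsedCall, never a provider-specific record.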

Each provider handles four concerns: where are the session files, what does each JSONL line mean, how to normalize tool names (Codex's exec_command → Bash), and how to display model names for humans. The registry? A flat array:
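To show how little machinery a flat-array registry needs, here is a sketch under our own assumptions. The trimmed-down Provider shape and the getProvider helper are hypothetical; only the two session paths come from the article:

```typescript
// Trimmed-down provider shape for this sketch; field names are illustrative.
interface Provider {
  name: string;
  sessionRoot: string;
}

const claude: Provider = { name: "claude", sessionRoot: "~/.claude/projects/" };
const codex: Provider = { name: "codex", sessionRoot: "~/.codex/sessions/" };

// The whole "plugin system": supporting a new agent means appending here.
const providers: Provider[] = [claude, codex];

/** Hypothetical lookup helper; returns undefined for unknown agents. */
function getProvider(name: string): Provider | undefined {
  return providers.find((p) => p.name === name);
}
```

No loader, no config file, no registration call: the array is the registry.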

Adding support for Pi, OpenCode, or Amp is literally a single file. No config system, no registration boilerplate, no plugin loader. This is the sweet spot of abstraction - enough structure to normalize heterogeneous formats, not so much that the abstraction itself becomes the problem.

If you've ever worked on a project where someone introduced a ServiceLocatorFactoryRegistryProvider to solve a two-implementation problem, you'll appreciate the restraint here.

Deduplication: The Unglamorous Hero

Here's where things get tricky and where most clones would get it wrong. Claude Code stores the same API call in multiple session files - the main session and subagent sessions share message IDs. If you naively iterate through all JSONL files and sum up tokens, every number on your dashboard is a lie.

CodeBurn solves this for Claude via message ID tracking:
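The idea can be sketched with a single Set of seen message IDs shared across all files. The record shape and function names below are our assumptions, not the project's code:

```typescript
// Illustrative record shape; field names are assumptions.
interface UsageRecord {
  messageId: string;
  inputTokens: number;
  outputTokens: number;
}

/** Keep the first occurrence of each message ID across all session files. */
function dedupeByMessageId(files: UsageRecord[][]): UsageRecord[] {
  const seen = new Set<string>();
  const unique: UsageRecord[] = [];
  for (const records of files) {
    for (const record of records) {
      if (seen.has(record.messageId)) continue; // already counted elsewhere
      seen.add(record.messageId);
      unique.push(record);
    }
  }
  return unique;
}

function totalTokens(records: UsageRecord[]): number {
  return records.reduce((sum, r) => sum + r.inputTokens + r.outputTokens, 0);
}

// The same call (m2) appears in both the main and the subagent transcript.
const mainSession: UsageRecord[] = [
  { messageId: "m1", inputTokens: 100, outputTokens: 20 },
  { messageId: "m2", inputTokens: 50, outputTokens: 10 },
];
const subagentSession: UsageRecord[] = [
  { messageId: "m2", inputTokens: 50, outputTokens: 10 },
  { messageId: "m3", inputTokens: 30, outputTokens: 5 },
];

const naiveTotal = totalTokens([...mainSession, ...subagentSession]); // 275
const dedupedTotal = totalTokens(dedupeByMessageId([mainSession, subagentSession])); // 215
```

The Set has to span every file in the scan, not just one session, otherwise the main/subagent overlap slips through.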

For Codex, the problem is different. Codex uses cumulative token counters instead of per-call deltas, so a naive parser double-counts every call. CodeBurn cross-checks cumulative totals and computes per-call deltas:
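A rough sketch of the delta computation, with the cached-token normalization folded in. The snapshot shape and field names are assumptions on our part, and the sketch assumes the counters only ever grow:

```typescript
/** Running totals as a Codex-style transcript might report them (assumed shape). */
interface CumulativeSnapshot {
  inputTokens: number;       // OpenAI convention: includes cached tokens
  cachedInputTokens: number;
  outputTokens: number;
}

/** Per-call usage in Anthropic's convention: input excludes cached tokens. */
interface CallUsage {
  inputTokens: number;
  cacheReadTokens: number;
  outputTokens: number;
}

function toPerCallUsage(snapshots: CumulativeSnapshot[]): CallUsage[] {
  const calls: CallUsage[] = [];
  let prev: CumulativeSnapshot = { inputTokens: 0, cachedInputTokens: 0, outputTokens: 0 };
  for (const snap of snapshots) {
    const input = snap.inputTokens - prev.inputTokens;
    const cached = snap.cachedInputTokens - prev.cachedInputTokens;
    const output = snap.outputTokens - prev.outputTokens;
    prev = snap;
    // A repeated snapshot carries no new usage; counting it would double-bill.
    if (input === 0 && output === 0) continue;
    // OpenAI counts cached tokens inside the input count; Anthropic reports
    // them separately. Subtract so both providers use the same convention.
    calls.push({
      inputTokens: input - cached,
      cacheReadTokens: cached,
      outputTokens: output,
    });
  }
  return calls;
}

const perCall = toPerCallUsage([
  { inputTokens: 1000, cachedInputTokens: 0, outputTokens: 200 },
  { inputTokens: 2500, cachedInputTokens: 800, outputTokens: 350 },
  { inputTokens: 2500, cachedInputTokens: 800, outputTokens: 350 }, // repeat
]);
```

Summing the raw inputTokens column here would report 6,000 input tokens; the deltas add up to the true 2,500, of which 800 were cache reads.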

That comment about OpenAI vs Anthropic token counting conventions is chef's kiss. This is the kind of detail that separates tools built by people who've actually debugged billing discrepancies from tools built by people who read the README and assumed it was complete. OpenAI counts cached tokens inside the input token count; Anthropic reports them separately. Getting this wrong means your cost calculations are off by a factor that depends on your cache hit rate. CodeBurn normalizes to Anthropic's convention everywhere.

The 13-Category Classifier: Regex Over LLM (A Radical Act in 2026)

This is where CodeBurn's design philosophy really shines, and frankly, where it earns my respect as an engineer.

In April 2026, the instinct for most developers building an analytics tool would be: "Let's call an LLM to classify each turn!" It's the obvious solution. It would also be expensive (ironic, for a cost-tracking tool), slow, and non-deterministic, and it would require an API key, violating the tool's zero-config promise.

CodeBurn does the boring thing instead. A two-pass classifier: tool patterns first, keyword refinement second.

The priority chain is deliberate and correct. Agent spawns and plan mode override everything - these are structural signals, not content signals. Then Bash-without-edits gets matched against domain patterns. Only then does "has edits" get its category. A turn that uses the Edit tool and the user said "fix the broken login" → Debugging, not Coding.
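The priority chain above can be sketched in a few lines. This is a reduced version with seven illustrative categories rather than the real thirteen; the tool names ("Task", "ExitPlanMode"), category labels other than Debugging, Coding, and Conversation, and all regex patterns are our assumptions:

```typescript
// Reduced, illustrative category set (the real classifier has 13).
type Category =
  | "Delegation" | "Planning" | "Testing" | "Git"
  | "Debugging" | "Coding" | "Conversation";

interface Turn {
  userMessage: string;
  tools: string[];        // normalized tool names used in the turn
  bashCommands: string[]; // raw shell commands, if any
}

const SPAWN_TOOL = "Task";        // assumed name of the agent-spawn tool
const PLAN_TOOL = "ExitPlanMode"; // assumed name of the plan-mode tool
const EDIT_TOOLS = new Set(["Edit", "Write"]);

// Pass 1 patterns: what kind of shell work was this?
const BASH_PATTERNS: Array<[RegExp, Category]> = [
  [/\b(jest|vitest|pytest|npm test)\b/, "Testing"],
  [/\bgit\s+(commit|push|rebase|merge)\b/, "Git"],
];

// Pass 2 keywords: refine "has edits" by what the user asked for.
const DEBUG_KEYWORDS = /\b(fix|bug|broken|error|crash|failing)\b/i;

function classifyTurn(turn: Turn): Category {
  // 1. Structural signals override everything else.
  if (turn.tools.includes(SPAWN_TOOL)) return "Delegation";
  if (turn.tools.includes(PLAN_TOOL)) return "Planning";

  const hasEdits = turn.tools.some((t) => EDIT_TOOLS.has(t));

  // 2. Bash without edits is matched against domain patterns.
  if (!hasEdits && turn.tools.includes("Bash")) {
    for (const [pattern, category] of BASH_PATTERNS) {
      if (turn.bashCommands.some((cmd) => pattern.test(cmd))) return category;
    }
  }

  // 3. Only then do edits get a category, refined by keywords.
  if (hasEdits) {
    return DEBUG_KEYWORDS.test(turn.userMessage) ? "Debugging" : "Coding";
  }

  // 4. Nothing matched: treat the turn as plain conversation.
  return "Conversation";
}
```

Because the order of the checks is the whole policy, the same transcript always produces the same category, and you can see why by reading the function top to bottom.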

Stable. Inspectable. Instant. Zero API calls. Zero non-determinism. Regex may not be sexy in 2026, but it's fast, free, and reproducible. Sometimes the boring thing is the right thing.

One-Shot Rate: The Metric Nobody Else Tracks

This is CodeBurn's genuinely novel contribution. For each turn involving code edits, the classifier detects retry cycles: Edit → Bash (to test) → Edit (to fix what broke).

A 90% one-shot rate on Coding means the AI got it right first try 9 out of 10 times. A 40% one-shot rate on Debugging means 60% of debugging turns needed at least one retry cycle, and those retries are where the tokens burn. This is the number that actually tells you whether your AI investment is returning value - not the total spend, but the efficiency per turn.
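One way to compute such a rate, sketched under our own assumptions: each turn is reduced to its ordered list of tool names, and a turn counts as a retry if an edit is followed by a Bash call and then another edit. The function names are hypothetical:

```typescript
const EDIT_TOOLS = new Set(["Edit", "Write"]);

/** True when the tool sequence contains Edit -> Bash -> Edit, i.e. a retry cycle. */
function hasRetryCycle(tools: string[]): boolean {
  // 0: waiting for an edit, 1: edit seen, 2: edit then Bash seen
  let stage = 0;
  for (const tool of tools) {
    const isEdit = EDIT_TOOLS.has(tool);
    if (stage === 0 && isEdit) stage = 1;
    else if (stage === 1 && tool === "Bash") stage = 2;
    else if (stage === 2 && isEdit) return true;
  }
  return false;
}

/** Share of edit turns that needed no retry; null when there are no edit turns. */
function oneShotRate(turns: string[][]): number | null {
  const editTurns = turns.filter((tools) => tools.some((t) => EDIT_TOOLS.has(t)));
  if (editTurns.length === 0) return null;
  const oneShot = editTurns.filter((tools) => !hasRetryCycle(tools)).length;
  return oneShot / editTurns.length;
}

const rate = oneShotRate([
  ["Read", "Edit", "Bash"],                 // edit, test, done: one-shot
  ["Edit", "Bash", "Edit", "Bash"],         // edit, test, fix: retry
  ["Edit"],                                 // one-shot
  ["Bash"],                                 // no edits: not counted
  ["Edit", "Edit", "Bash", "Read", "Edit"], // retry
]); // 0.5
```

Turns without edits are excluded from the denominator, so a project that is mostly conversation does not get a flattering rate for free.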

And it has real actionable implications. If your one-shot rate on a specific project is consistently low, maybe the context is insufficient, maybe the codebase needs better documentation, maybe you need to break tasks into smaller chunks. This maps directly to what Tomek Lelek and I write about in our book on Vibe Engineering - measuring AI effectiveness requires going beyond "did the code ship?" to "how efficiently did the AI get there?"

Cost Calculation: Trust, But Verify

Pricing data comes from LiteLLM's open model pricing database, cached for 24 hours. But here's the pragmatic bit - CodeBurn doesn't trust LiteLLM's fuzzy model name matching for models that matter most. For every Claude and GPT model, there's a hardcoded fallback:

That fastMultiplier: 6 on Opus 4.6 accounts for Claude Code's fast mode pricing. The getCanonicalName function strips date suffixes and version tags before matching. More specific names always win, preventing gpt-5.4-mini from being priced as gpt-5. Belt and suspenders - the way financial calculations should be done.
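A sketch of how a fallback table with canonical-name matching might fit together. The prices below are placeholders, not real rates; the model id strings, the "-latest" tag handling, and every function except the getCanonicalName and calculateCost names mentioned in the article are our assumptions:

```typescript
interface ModelPricing {
  inputPerMTok: number;   // USD per million input tokens
  outputPerMTok: number;  // USD per million output tokens
  fastMultiplier?: number;
}

// PLACEHOLDER prices for illustration only; do not treat as real rates.
const FALLBACK_PRICING: Record<string, ModelPricing> = {
  "claude-opus-4-6": { inputPerMTok: 10, outputPerMTok: 50, fastMultiplier: 6 },
  "gpt-5": { inputPerMTok: 2, outputPerMTok: 8 },
  "gpt-5.4-mini": { inputPerMTok: 1, outputPerMTok: 4 },
};

/** Strip a trailing date suffix (-20260115 or -2026-01-15) and a "-latest" tag. */
function getCanonicalName(model: string): string {
  return model.replace(/-\d{4}-?\d{2}-?\d{2}$/, "").replace(/-latest$/, "");
}

/** Exact match first, then the longest table key that prefixes the name. */
function findPricing(model: string): ModelPricing | undefined {
  const name = getCanonicalName(model);
  const exact = FALLBACK_PRICING[name];
  if (exact) return exact;
  const best = Object.keys(FALLBACK_PRICING)
    .filter((key) => name.startsWith(key))
    .sort((a, b) => b.length - a.length)[0]; // more specific names win
  return best !== undefined ? FALLBACK_PRICING[best] : undefined;
}

function calculateCost(
  model: string,
  inputTokens: number,
  outputTokens: number,
  fastMode = false,
): number {
  const pricing = findPricing(model);
  if (!pricing) return 0; // unknown model: priced at zero, as the article notes
  const base =
    (inputTokens * pricing.inputPerMTok + outputTokens * pricing.outputPerMTok) /
    1_000_000;
  return fastMode ? base * (pricing.fastMultiplier ?? 1) : base;
}
```

Sorting candidate keys by length is what keeps "gpt-5.4-mini-preview" from falling through to the shorter "gpt-5" entry.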

The TUI: React in Your Terminal (And It Looks Good)

The dashboard is built on Ink - actual React components with hooks, state, and effects, rendered in the terminal. Not curses. Not blessed. React 19.

The layout is responsive - panels go side-by-side at 90+ columns, stack vertically below. Gradient bar charts use per-character coloring calculated with linear interpolation across three color stops (blue → amber → orange). When stdout isn't a TTY, the dashboard renders once statically and unmounts. Interactive mode only activates for actual terminal sessions.
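The per-character gradient and the width breakpoint are easy to picture as pure functions. In this sketch the RGB values for blue, amber, and orange are our guesses, and the function names are hypothetical; only the three-stop interpolation and the 90-column threshold come from the description above:

```typescript
type RGB = [number, number, number];

// Assumed color stops: blue -> amber -> orange.
const STOPS: RGB[] = [
  [59, 130, 246],
  [245, 158, 11],
  [249, 115, 22],
];

function lerp(a: number, b: number, t: number): number {
  return Math.round(a + (b - a) * t);
}

/** Color at position t in [0, 1] along the three-stop gradient. */
function gradientColor(t: number): RGB {
  const clamped = Math.min(1, Math.max(0, t));
  const scaled = clamped * (STOPS.length - 1); // 0..2 for three stops
  const index = Math.min(Math.floor(scaled), STOPS.length - 2);
  const local = scaled - index; // position within the current segment
  const from = STOPS[index];
  const to = STOPS[index + 1];
  return [
    lerp(from[0], to[0], local),
    lerp(from[1], to[1], local),
    lerp(from[2], to[2], local),
  ];
}

/** One color per character cell of a bar of the given width. */
function barColors(width: number): RGB[] {
  return Array.from({ length: width }, (_, i) =>
    gradientColor(width > 1 ? i / (width - 1) : 0),
  );
}

/** Panels sit side by side at 90+ columns and stack below that. */
function layoutFor(columns: number): "side-by-side" | "stacked" {
  return columns >= 90 ? "side-by-side" : "stacked";
}
```

Each character in the bar then gets its own foreground color, which is what makes the gradient look smooth rather than banded.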

The attention to visual detail is unusual for a CLI tool that's two days old. It looks like someone cared.

Developer Experience

The macOS menu bar widget (via SwiftBar) is a thoughtful bonus - a flame icon showing today's spend, dropdown for activity breakdown, currency picker. Refreshes every 5 minutes. codeburn install-menubar handles setup.

Multi-currency support covers 162 ISO 4217 codes with exchange rates from the Frankfurter API (European Central Bank data, free, no API key). For those of us billing in PLN or CHF, this is an appreciated detail.

Competitive Landscape

ccusage - The direct inspiration (credited in README, which: class). CLI analyzer for Claude Code JSONL with daily/monthly tables. Mature and reliable, but no TUI, no Codex support, no task classification, no one-shot tracking.
claude-usage - Local dashboard with progress bars for Pro/Max subscribers. Focused on plan limit tracking rather than cost analysis. Web UI, not TUI.
Claude-Code-Usage-Monitor - Real-time monitoring with burn rate predictions. Complementary to CodeBurn (real-time vs. historical).
ccost - Lightweight token and cost analyzer. Simpler scope, fewer breakdowns.
Claude Code's built-in /cost - Current session only. No history, no project breakdown, no classification. The equivalent of checking your bank balance but never looking at your transactions.

CodeBurn's unique position: it's the only tool combining multi-provider support (Claude + Codex in one view), task classification (13 categories), one-shot success tracking, and an interactive responsive TUI. The provider plugin system positions it to absorb every new coding agent that ships a local session format.

The Honest Take

Because this is GitHub All-Stars, not a product review - we say what we actually think.

What's genuinely impressive:

The provider abstraction hits the exact right level. Not over-engineered (no plugin loader, no config registry), not under-engineered (clean interface, consistent normalization). Adding Codex support cost 305 lines in a single file. The classifier choosing regex over LLM is a masterclass in resisting the temptation of your own tools. And the deduplication logic - handling both Claude's message-ID approach and Codex's cumulative-token approach correctly - is the kind of invisible work that separates tools you can trust from tools that produce nice-looking wrong numbers.

Things I'd like to see:

CI pipeline. No GitHub Actions. The test suite exists (4 files, 383 SLOC) but there's no evidence it runs on push. For a tool that tracks every dollar, the testing infrastructure feels underdressed for the party.
Test coverage on the critical path. Test ratio is 0.02 (383 test SLOC / ~15,700 total). The provider registry and export tests exist, but the parser and classifier - the two most critical components - have no dedicated test files. The classifier's 13-category priority chain and the parser's deduplication logic are exactly the kind of code that needs 50+ test cases.
Historical tracking. CodeBurn gives you snapshots but doesn't persist them. Want to compare this week's one-shot rate to last week's? Export CSVs manually. A lightweight SQLite store for daily aggregates would unlock trend analysis.
Silent cost failures. When a model isn't found in LiteLLM or the fallbacks, calculateCost returns 0. No warning. A "3 API calls couldn't be priced" notice would help users trust the numbers they can see.
Session drill-down. The dashboard shows project and model breakdowns, but you can't drill into a specific session to see what happened turn by turn. For debugging that one $47 Tuesday afternoon session, this would be invaluable.

The Broader Picture: Why This Matters Beyond One CLI Tool

Let me put on my Head of Application Development hat for a moment.

CodeBurn's existence is a signal. It tells us that AI coding costs have crossed the threshold from "rounding error on the cloud bill" to "thing we need observability for." We've had APM tools for our production services for two decades. We've had cost dashboards for our cloud infrastructure for a decade. Now we need the same for our AI developer tooling.

The fact that the data was already there - sitting in JSONL files on disk, generated by every Claude Code and Codex session, completely unread - is both embarrassing and encouraging. Embarrassing because we've been flying blind. Encouraging because the fix was 3,000 lines of TypeScript and a good set of regex patterns.

If you're running a team where AI coding tools are a significant expense (and in April 2026, that's most teams), the one-shot rate metric alone is worth the install. It's the closest thing we have to a quantitative answer to "is the AI actually helping, or is it just churning?"

Conclusion

CodeBurn is 3,000 lines that solve one problem well: telling you where your AI coding tokens went, with enough granularity to change your behavior. The architecture reflects disciplined constraint - the right abstraction for providers, the right approach for classification (boring regex, not fancy LLM), and the right level of visual polish for a CLI dashboard.

The real test isn't the star count. It's that first moment when you run npx codeburn and discover that Conversation outranks Coding in your spend. When you see your one-shot rate on Debugging hovering at 30%, meaning 70% of debugging turns are falling into retry loops. Those are the numbers that change how you prompt, how you structure tasks, and whether the $200/day is actually buying you velocity.

Two days old. 1,201 stars. Already the de facto answer to "where did my AI budget go?" The provider system means it'll keep absorbing new coding agents as they ship local session formats. If you're spending real money on AI coding, you should be running this.

A well-deserved GitHub star from me 😉