
AI Coding Agent Analytics: What 1,573 Sessions Taught Us

Tooling · March 17, 2026 · Alvaro Garcia

We ran Claude Code across 6 people and 37 projects. After 1,573 sessions, we found patterns nobody was measuring. Here's why we built rudel.ai.


Developers believe AI makes them 20% faster. A rigorous study by METR found that experienced open-source developers were actually 19% slower with AI assistance. The kicker: after the study, those same developers still believed they'd been 20% faster.

Only 29% of organizations can confidently measure whether AI coding tools are paying off. The rest are guessing.

We were guessing too.

Spending on AI coding every day, measuring none of it

Our team runs Claude Code every day. Six people, 37 projects, real production work. We weren't experimenting with AI coding. We were betting on it.

But we couldn't answer basic questions. Are we actually getting better at using AI agents? Which tasks work well? Which ones burn tokens and produce nothing? When someone abandons a session after 30 seconds, why?

The industry gives you token counts, latency numbers, and cost dashboards. Fine. But nobody was measuring the things that actually matter: did the agent follow its instructions? Did the session produce anything useful? Is the team improving over time, or just spending more?

We kept asking these questions internally. In every call where we decided which tasks to give to agents and which to do ourselves. Nobody had answers. We were making those decisions based on gut feeling, and we had no way to know if our gut was any good.

The tooling didn't exist.

This isn't just us. Faros AI studied 10,000 developers across 1,255 teams and found something uncomfortable. Teams with high AI adoption shipped 21% more tasks and 98% more pull requests. Sounds great. But those same teams saw 91% longer PR review times, 9% more bugs per developer, and 154% larger PRs. At the organization level, there was no significant correlation between AI adoption and actual improvement.

More output. Same or worse outcomes. And almost nobody measures the difference.

We decided to stop talking about it and start doing it.

The numbers we found inside our agent sessions

We built a tool to capture and analyze our Claude Code sessions. Every session, every transcript, every outcome. After 1,573 sessions, some unexpected patterns showed up.

4% skill utilization

We had spent weeks building custom skills for Claude Code. Careful instructions, domain-specific workflows, and the whole setup. Rudel showed us that only 4% of our sessions were actually loading them.

The agent wasn't broken. The discovery mechanism was. Skills in .claude/skills/ use RAG matching for discovery, and if the agent doesn't look, it doesn't find them. We added a single mandatory discovery step to our global CLAUDE.md configuration. Utilization went from 4% to 61%.
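The exact wording we used isn't the point, but for readers wondering what a "mandatory discovery step" looks like, here's an illustrative sketch of the kind of instruction you can put in CLAUDE.md (this is not our literal config):

```markdown
<!-- Illustrative only: a mandatory discovery step in CLAUDE.md -->
## Before every task

1. List the skills available under `.claude/skills/`.
2. Load any skill whose description matches the current request
   before writing code or making a plan.
```

The key property is that the instruction is unconditional: the agent checks for skills on every session rather than relying on the matcher to fire.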

One config change. Measurable across the entire team. We would never have found this without the data.

26% abandonment in the first 60 seconds

Not failures. Abandonments. A quarter of all sessions ended with the user giving up before the agent got started. That's a lot of wasted context, wasted time, wasted intent. Understanding why people bail early turned out to be more useful than optimizing the sessions that run long.

Two patterns kept showing up. The agent starts doing something clearly wrong in the first few seconds, and the user decides that killing the session and retrying with a different prompt or a different model is faster than correcting it. Or the agent doesn't load the right tools, skills, or MCPs, and the user can tell immediately that the session won't produce what they need.

Early errors predict session failure

If the agent stumbles in the first two minutes and doesn't self-correct, the session almost never recovers. This pattern was consistent enough that we started treating the first two minutes as a reliability signal for the entire session. If something goes wrong early and the agent course-corrects, the session tends to land fine. If it doesn't, you're better off starting over.
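As a sketch of how that signal can be operationalized, here's a minimal TypeScript version. The event shape, field names, and the 120-second threshold are assumptions for illustration; rudel's actual schema and heuristics may differ:

```typescript
// Hypothetical session event shape -- rudel's real schema may differ.
interface SessionEvent {
  t: number; // seconds since session start
  kind: "error" | "correction" | "tool_call" | "message";
}

// Treat the first two minutes as a reliability signal: an early error
// that is never followed by a self-correction within the same window
// marks a session that, in our data, almost never recovers.
function likelyToRecover(events: SessionEvent[], windowSec = 120): boolean {
  const early = events.filter((e) => e.t <= windowSec);
  const firstError = early.find((e) => e.kind === "error");
  if (!firstError) return true; // clean start, no early stumble
  // Did the agent course-correct after the early error?
  return early.some((e) => e.kind === "correction" && e.t > firstError.t);
}
```

For example, an error at 30s followed by a correction at 45s classifies as recoverable, while an error at 30s with no correction inside the window does not. The usefulness of a signal like this is that it tells you when restarting is cheaper than waiting.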

This one and the 26% abandonment finding are two sides of the same coin. Both point to the same root cause: bad configuration. The agent wasn't set up to succeed from the start, so users were forced into a loop of killing sessions and retrying until something clicked. Fix the config, and both numbers improve.

10% increase in successfully completed tasks

After applying what we learned from these patterns, the individual metrics improved, and so did our team's overall task completion rate: up 10%. Rudel measured this by analyzing conversation prompts and user behavior across sessions. The improvement came from a combination of the skills fix, better onboarding flows, and bailing out early on session patterns we now knew were doomed.

The part that doesn't show up in a dashboard: people on the team feel more confident now. They waste less time second-guessing whether the agent is doing the right thing. They have a better relationship with AI coding in general. That's hard to measure, but it matters just as much as the 10%.

These weren't theoretical findings. They came from our own daily work, measured by a tool we built because nothing else on the market was tracking this.

That tool is rudel.ai.

How rudel.ai works

CLI-first. Two commands to get started: rudel login and rudel enable. That's it.

Rudel installs as a Claude Code hook. When a session ends, the hook fires and uploads the transcript automatically. No manual exports, no copy-paste, no workflow changes. You code the way you already code. The data shows up in your dashboard.
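Mechanically, this is ordinary Claude Code hook configuration. A rough sketch of what an enable step might write into settings.json, assuming Claude Code's SessionEnd hook event; the command shown is a hypothetical placeholder, not rudel's actual CLI surface:

```json
{
  "hooks": {
    "SessionEnd": [
      {
        "hooks": [
          { "type": "command", "command": "rudel upload-transcript" }
        ]
      }
    ]
  }
}
```

Because the hook fires on session end, capture is passive: nothing in the editing loop changes, and a failed upload can't block your work.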

The stack:

  1. CLI (/apps/cli): the hook that captures transcripts on session end.
  2. Backend: receives and processes transcripts into structured analytics. Bun + TypeScript.
  3. ClickHouse: stores and queries all session data. Fast aggregations across thousands of sessions, the kind of analytical workload we've been running at scale for years.
  4. Dashboard: where you explore session analytics, team patterns, model usage, and success metrics.

Everything is TypeScript. A monorepo with Turbo that runs on Bun. The GitHub repo has the full architecture diagram.

You can use the free hosted version at rudel.ai or self-host the entire thing. Both options are fully available. Self-hosting was the number one concern during our launch, and we get it. You should have complete control over where your session data lives.

Source available and building in public

We made rudel source available for a specific reason. We're asking developers to trust us with their session transcripts. You should be able to read every line of code that touches your data. See exactly what gets sent, how it gets stored, and what gets analyzed. No black box.

Self-host the whole thing if you prefer. We'd rather you run Rudel on your own infrastructure than not run it at all.

But source available isn't only about trust. The product is early. It works, the data already pays for itself, but there's a long list of things we want to build. We'd rather do that with the community than alone.

We're also not limiting this to Claude Code. Codex support is already integrated and being tested. There's an open PR for Pi support, and we're going all in on cross-platform. If your team uses more than one AI coding agent, you should be able to compare them side by side with the same analytics.

Collaboration is welcome. File an issue, open a PR, fork it, tell us the metrics you wish you had.

Launch week: 190 stars and VPs in the inbox

We launched the week of March 10. Put it on Hacker News, posted on LinkedIn and Twitter. The GitHub repo went live.

We expected a few people to try it. We did not expect what actually happened.

144 Hacker News upvotes. 86 comments. 190 GitHub stars in the first few days. Signups ranged from solo developers to engineering teams at companies who'd never heard of us. VPs of engineering and CTOs are reaching out because they want visibility into how their teams use AI coding agents.

We didn't pay for any distribution. We solved a problem that turned out to be bigger than our team, built it in the open, and people recognized it.

The thing that stood out from the HN comments: people weren't asking "why would I need this?" They were asking, "Can I self-host it?" and "When will you support other agents?" The 26% abandonment stat and the low skill usage resonated with many people. Multiple commenters shared that they'd seen similar patterns on their own teams but had no way to quantify them.

The category is real. Both Datadog and Anthropic shipped their own Claude Code analytics features in 2026. Rudel predates both. When the incumbents enter a space you've been building in, you know the problem wasn't made up.

Team-level intelligence for AI agents

The launch validated that this problem exists far beyond our six-person team. Now we have to earn the trust of every team that signed up by shipping fast.

The dashboard is getting a full redesign. We design to delight users with insights they can act on, and in many cases that means figuring out what people need before they know they need it. The current dashboard works. The next version will feel like something you want to open every morning.

Beyond that: context window analysis, multi-model tracking, and the bigger vision. Team-level intelligence. Not "how many tokens did I use today" but "is my engineering team actually getting better at working with AI agents over time?" That's the question every VP of engineering is asking right now, and nobody has a good answer yet.

We operate on weekly shipping cycles. Every reported bug gets fixed within a week. Features are shipped based on what users actually need.

If any of this sounds useful:

  • Try it. rudel.ai, free hosted version. Takes two minutes to set up.
  • Self-host it. github.com/obsessiondb/rudel, full source, self-hosting docs included.
  • Build with us. Open an issue, open a PR, tell us what you need.