We’ve been putting AI to the test. Not in theory, but in controlled experiments. You’ll learn:
- when to use it to get things done (not to plan them),
- how to guide it with the right structure,
- and where the “AI as teammate” metaphor breaks down today.
Dive in!
Where did an AI coding assistant or agent save you the most time or increase efficiency?
Last month, I worked on demo applications for an article on a local-second, event-driven architecture. For that, I needed a CRUD backend connecting to a PostgreSQL database, and two React frontends. Due to the chosen architecture, the design was a bit unorthodox: a transactional event-sourcing approach. To make things more exciting, I decided to write the backend in Rust.
These are not the mainstream technologies on which LLMs were predominantly trained, but given the architectural article to follow, the assistant implemented each backend feature I asked for. It was way faster than writing by hand, but it did require careful reading of the code. I used Claude Code, which, for example, was very happy to generate duplicate code. When I pointed out that two methods were almost or exactly the same, CC did fix the issue. But you need to read the code rather carefully to spot such problems.
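Roughly, the shape of that transactional event-sourcing design is small: each command appends an event and updates the read model inside the same database transaction, so the two can never diverge. Here’s a simplified sketch of the idea, not the code Claude generated (table names are made up, and it assumes sqlx with its json feature):

```rust
use sqlx::PgPool;

// Illustrative only: append an event and update the read model atomically.
async fn add_item(pool: &PgPool, item_id: i64, name: &str) -> Result<(), sqlx::Error> {
    let mut tx = pool.begin().await?;

    // 1. Append the event to the event log.
    sqlx::query("INSERT INTO events (item_id, event_type, payload) VALUES ($1, 'item_added', $2)")
        .bind(item_id)
        .bind(serde_json::json!({ "name": name }))
        .execute(&mut *tx)
        .await?;

    // 2. Update the read model in the same transaction.
    sqlx::query("INSERT INTO items (id, name) VALUES ($1, $2)")
        .bind(item_id)
        .bind(name)
        .execute(&mut *tx)
        .await?;

    tx.commit().await
}
```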
One thing that might help is having a review agent and a coding agent working in tandem. I’ll try that in the next demo app, and report in the next issue of TMWAIed!
What did you achieve that was not possible before without AI?
I think the biggest speedup is generating simple frontend applications from scratch. I needed two for my demo, and I was able to implement two brand-new React + TypeScript CRUD-like frontend apps in a couple of hours.
Also, I was just able to say, “implement a greedy hotel room allocation algorithm”, and a couple of minutes later, I got one, along with a test suite. Doing this by hand would have taken way longer.
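For the curious, one common greedy reading of that problem (sort bookings by start time and reuse the room that frees up earliest) fits in a few dozen lines. This is my own illustrative sketch, not the generated code:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A booking as a half-open interval [start, end).
#[derive(Debug, Clone, Copy)]
struct Booking { start: u32, end: u32 }

/// Greedy allocation: sort by start time, reuse the room that frees up earliest,
/// otherwise open a new one. Returns each booking together with its room index.
fn allocate_rooms(mut bookings: Vec<Booking>) -> Vec<(Booking, usize)> {
    bookings.sort_by_key(|b| b.start);
    // Min-heap of (end_time, room_id) for rooms currently in use.
    let mut busy_until: BinaryHeap<Reverse<(u32, usize)>> = BinaryHeap::new();
    let mut next_room = 0;
    let mut out = Vec::with_capacity(bookings.len());
    for b in bookings {
        let room = match busy_until.peek() {
            // The earliest-freeing room is available before this booking starts.
            Some(&Reverse((end, room))) if end <= b.start => {
                busy_until.pop();
                room
            }
            // Otherwise we need a fresh room.
            _ => {
                let r = next_room;
                next_room += 1;
                r
            }
        };
        busy_until.push(Reverse((b.end, room)));
        out.push((b, room));
    }
    out
}

fn main() {
    let bookings = vec![
        Booking { start: 1, end: 4 },
        Booking { start: 2, end: 5 },
        Booking { start: 4, end: 6 },
    ];
    for (b, room) in allocate_rooms(bookings) {
        println!("{:?} -> room {}", b, room);
    }
}
```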
Where did AI fail miserably, despite your attempts to solve a task using it?
I think the most spectacular failure was when the LLM decided to use floats to represent money. In 2025! Despite being a textbook example of what not to do! I mean, this must be in the training material multiple times.
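The boring, well-known fix is to keep money in integer minor units (or a proper decimal type), never floats. A tiny sketch of the idea:

```rust
/// Store amounts as integer cents behind a newtype, so arithmetic stays exact.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Money { cents: i64 }

impl Money {
    fn from_major_minor(major: i64, minor: i64) -> Self {
        Money { cents: major * 100 + minor }
    }
}

impl std::ops::Add for Money {
    type Output = Money;
    fn add(self, rhs: Money) -> Money {
        Money { cents: self.cents + rhs.cents }
    }
}

fn main() {
    // 0.1 + 0.2 != 0.3 with f64, but 10 + 20 == 30 cents, exactly.
    assert_ne!(0.1_f64 + 0.2, 0.3);
    assert_eq!(
        Money::from_major_minor(0, 10) + Money::from_major_minor(0, 20),
        Money::from_major_minor(0, 30)
    );
    println!("exact money arithmetic: ok");
}
```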
Another problem is number conversions, especially aligning database, Rust, and JSON types. Claude did mess up here, requiring a careful code review and some cleanup. I mean, `i32`, `i64`, `SERIAL`, `BIGINTEGER`, and JavaScript’s integer-floats are all numbers, but you can’t just assign one to another blindly.
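The safe pattern is to make every narrowing or cross-boundary conversion explicit. Something along these lines (an illustrative sketch, not the actual demo code):

```rust
fn main() {
    // A Postgres SERIAL column maps naturally to i32, BIGINT/BIGSERIAL to i64.
    let big_id: i64 = 9_000_000_000;

    // Converting i64 -> i32 must be checked, not assumed.
    match i32::try_from(big_id) {
        Ok(small) => println!("fits in i32: {small}"),
        Err(_) => println!("doesn't fit in i32, keep it as i64"),
    }

    // JavaScript numbers are f64, so integers above 2^53 lose precision in JSON.
    const JS_MAX_SAFE_INTEGER: i64 = (1_i64 << 53) - 1;
    if big_id > JS_MAX_SAFE_INTEGER {
        println!("send this id to the frontend as a string, not a number");
    } else {
        println!("safe to expose as a JSON number");
    }
}
```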
Finally, Claude Code has shown its human side: it could not center a div. Despite “thinking harder” or starting from scratch multiple times. And it wasn’t a particularly tricky div - adding `width: 100vw;` to the parent (by hand) solved the problem.
Where did an AI coding assistant or agent save you the most time or increase efficiency?
Over the last two months, from time to time, I have been working on a side project: building my very first mobile app for iOS and Android. Since I had zero experience with mobile development, even starting such a project without AI support would have had a fairly high barrier to entry. I decided to build a simple productivity app for personal habit tracking that is offline-first and private: data is stored only on the user’s devices.
I knew I wanted to use Flutter to have a single code base for both operating systems, and that was where my knowledge about mobile development ended back then.
Before I started my very first chat with AI, using Cursor with Agent mode, I created a Cursor rules file for my project. Well, AI actually did that for me. I asked it to provide a set of rules that an experienced Flutter developer should follow, and refined it a little to fit my needs.
What did you achieve that was not possible before without AI?
AI helped me set up a mobile development environment without reading a single documentation file, which was quite rewarding. I was able to run a simple Hello World app on my iPhone the first evening. Without it, I would have needed to spend a decent amount of time just learning the basics, and it would probably have taken a while before I could code fluently in a new language and framework.
After setting up my repository on GitHub, I started building on feature branches. I used GitHub Copilot’s PR reviews to review the code that the AI had written in the first place. I must admit that, due to my poor knowledge of the new framework, I did not carefully review what the AI had written initially. Many times the Copilot reviews caught pitfalls and helped me understand the code better by summarizing the changes.
Where did AI fail miserably, despite your attempts to solve a task using it?
Tempted by the “Vibe Coding” hype, I thought this was something to build over a weekend. It certainly isn’t, unless you really prepare for it: an in-depth design of the domain model, UI/UX mocks, dependencies, and a detailed code structure that will be solid and maintainable, which is not something that “Vibe” coders care about.
Therefore, my big mistake was rushing things. I wanted AI to write the code and tests simultaneously, but then I changed my domain and UI design idea and had to spend more time on iterations, fixing and adjusting the existing tests. That’s where I saw AI struggle for the first time (the claude-sonnet-4.0 model). I was constantly getting into AI “drift” states where it really couldn’t fix the problem, and very often I decided to completely change some other part of the code that had appeared fine. It also often made very poor decisions about design or code structure. I learned that those must be reviewed carefully; otherwise, I would have ended up with completely unmaintainable code. After I pointed it towards a different approach, it left lots of “garbage” behind, like unused methods or localizations. Most of the time, I had to instruct it to clean those up or do it myself.
I learned that a single chat iteration should focus on smaller features/adjustments, be precise, and then jump into another chat. I also learned that once something is working, it’s good to commit fast, to have a safe place to roll back to and start over with some feature or fix after AI “drifts”, for example.
At some point, when I was happy with the main code, I deleted the entire tests directory, leaving only a utility file to run tests on different screen sizes in the future. That was much simpler than forcing AI to fix over 200 broken tests; writing them from scratch (of course, also with AI help) ended up being far easier.
My honest opinion is that “Vibe coding” is a lie. Without knowledge of design principles and some technical aspects, the code is guaranteed to be a total mess. Today, I am close to finishing the work and publishing the app. I am now working mostly alone, using AI as a companion rather than the “architect” of the entire solution. Despite all the troubles and a lot of frustration, I still liked the process and find it quite satisfying and fun. AI capabilities cannot be ignored today, but, in my opinion, they still need to be used carefully.
Hello there, your friendly neighborhood self-avowed AI LLM skeptic here. You'll probably get tired of me soon enough — iff my malcontent rants continue being featured here — so I'll give you a head start with something positive for a change. This one's about that time I rewrote a Bash boot script in Rust while being four-fifths of the way into the arms of Morpheus — and it somehow ended up working.
1. Where did an AI coding assistant or agent save you the most time or increase efficiency?
Quick lore recap: I run NixOS, and at some point, they switched the boot process to be fully systemd-based, dropping the old scripting hooks I'd used to do `systemd-ask-password | cryptsetup open` on my poor man's second-factor keyfile during early boot.
It should have been simple in theory: define a socket template unit, reference it in `/etc/crypttab`, and let `cryptsetup` fetch the decrypted keyfile. In practice, it was a pain to massage into a semblance of working, with nobody on the Interwebs seeming to have attempted that particular contortion. Shout-out to the person at NixCon 2024 who rubber-ducked me towards the solution (pro tip: systemd's default unit dependencies can ruin your day in initrd — try disabling them).
2. What did you achieve that was not possible before without AI?
I'd had the thing limping along, but I kept having that nagging feeling you get when seeing some speed tape plastered on the wing of a plane you're about to fly on — it is designed for that, and yet it seems so flimsy and jury-rigged. So I always wanted to "productionise" my script; rewrite it in Rust™, pack it into a statically linked binary that runs in initrd, something testable and extensible. But that felt like a Project — that one capital letter too many triggering the executive dysfunction snooze button, leaving the idea on a back burner indefinitely.
So here's where LLM agents come in. One day, my biological trappings couldn't decide whether they were too tired to be awake or too restless to sleep. I'd had some decent experiences with agents already by then, and had converged on a workflow that seemed to function, so I wondered: how would they handle such an off-kilter task as my boot script, especially while I wasn't together enough to truly help it?
And lo and behold, I fired up Roo Code — picked mainly for its orchestrator mode that coordinates sub-agents. I added two of my own: one for Nix specifics, one for LUKS. They had access to LSPs and tree-sitter MCPs for both Nix and Rust, allowing them to navigate and query code effectively. I handed them my half-baked idea — "here's a thing that works, Rustify it" — and let them chew on it.
Between a few Markdown notes standing in for "memory" and the agent's task breakdown, it started making real progress: a statically linked Rust binary using `libcryptsetup-rs` instead of shelling out, `nix` + `libc` for socket activation (had some issues getting `rust-systemd` to link properly). All with the amount of oversight one can provide while half-conscious.
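(For the record, the socket-activation part doesn't strictly need a systemd crate at all: systemd hands the listening socket over as fd 3 and announces it via LISTEN_PID/LISTEN_FDS. A stripped-down illustration of that protocol, my own sketch rather than the generated code:)

```rust
use std::io::Write;
use std::os::fd::{FromRawFd, RawFd};
use std::os::unix::net::UnixListener;

/// systemd passes activated sockets starting at fd 3.
const SD_LISTEN_FDS_START: RawFd = 3;

/// Adopt the socket systemd handed us, if we were actually socket-activated.
fn listener_from_systemd() -> Option<UnixListener> {
    let listen_pid: u32 = std::env::var("LISTEN_PID").ok()?.parse().ok()?;
    let listen_fds: i32 = std::env::var("LISTEN_FDS").ok()?.parse().ok()?;
    if listen_pid != std::process::id() || listen_fds < 1 {
        return None;
    }
    // Safety: fd 3 belongs to this process once systemd hands it over.
    Some(unsafe { UnixListener::from_raw_fd(SD_LISTEN_FDS_START) })
}

fn main() -> std::io::Result<()> {
    let listener = listener_from_systemd().expect("not socket-activated");
    for stream in listener.incoming() {
        let mut stream = stream?;
        // Write the decrypted key material to the connection; cryptsetup can
        // read it from here when the crypttab entry points at this socket.
        stream.write_all(b"<key material>")?;
    }
    Ok(())
}
```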
Hours later, or minutes, time was getting rather wibbly-wobbly, timey-wimey by then — it compiled. The agent insisted it was tested enough, so I dropped it into my NixOS VM boot test, expecting it to crash and burn as usual. Instead: green. The thrice-damned thing just worked. Even after requesting a modular refactor, the unit tests remained green, and the VM test still passed. I was too tired to celebrate, but the feeling was... unsettlingly positive.
So, to sum up, what I couldn't have achieved without the agent was, honestly, writing that code at all. Not for lack of capability — I could've done it, had I ever started — but because it lowered the activation energy enough for my sleep- and ADHD-addled brain to get on with it. It nudged me forward when my thoughts turned to mush, and "remembered" just enough context through notes and structure to stay coherent. It didn't understand what it was doing, but it made me believe I still did.
Such a hands-off approach also works quite well when you're perfectly lucid, but focused elsewhere — say, set the agent on a side task while you're busy with the job that keeps you fed and housed. Then, when taking a break, take a peek at what the agent has accomplished and push it in a new direction. At the end of the day, you'll have your 8 hours dutifully clocked in, but also possibly some progress on your pet project — and if the agent produced some nonsense, then no big deal, you couldn't have worked on that yourself during your day job anyway, could you?
3. Where did AI fail miserably, despite your attempts to solve a task using it?
Of course, a few weeks later, roughly the same setup — this time building a Pratt parser from scratch — worked fine until I tasked the agent with integrating it with the existing codebase. All of the many attempts were an incomprehensible mess. So yeah: still dumb, still fast.
But sometimes, being dumb fast enough really is enough.
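(If you've never met a Pratt parser, the whole trick is that every operator gets a binding power and parsing recurses while the next operator binds tighter. A toy sketch of just that idea, mine and not the agent's, assuming integer tokens and only + and *:)

```rust
// Minimal Pratt-style expression evaluator: numbers, +, and *.
fn eval(tokens: &[&str]) -> i64 {
    // Binding power: * binds tighter than +.
    fn bp(op: &str) -> u8 {
        match op { "+" => 1, "*" => 2, _ => 0 }
    }
    fn expr(tokens: &[&str], pos: &mut usize, min_bp: u8) -> i64 {
        let mut lhs: i64 = tokens[*pos].parse().expect("number");
        *pos += 1;
        while *pos < tokens.len() {
            let op = tokens[*pos];
            let power = bp(op);
            // Stop when the next operator binds no tighter than our context.
            if power <= min_bp { break; }
            *pos += 1;
            let rhs = expr(tokens, pos, power);
            lhs = match op { "+" => lhs + rhs, "*" => lhs * rhs, _ => unreachable!() };
        }
        lhs
    }
    let mut pos = 0;
    expr(tokens, &mut pos, 0)
}

fn main() {
    // 1 + 2 * 3 parses as 1 + (2 * 3) because * binds tighter.
    assert_eq!(eval(&["1", "+", "2", "*", "3"]), 7);
    println!("1 + 2 * 3 = {}", eval(&["1", "+", "2", "*", "3"]));
}
```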
So that's about it for this time. If you liked it, I might come back soon with a more in-depth dive into how I've changed my tack a bit from swearing off using LLM agents forever to actually giving them a fighting chance to worm themselves into my toolbox, and the workflow that works for me.
Here’s what our recent experiments taught us:
- AI speeds up execution, not thinking. Give it a plan, don’t expect it to make one.
- Structure over vibes. Whether you use Flutter, Rust, or Scala, the more context and constraints you provide, the better the output.
- Sometimes LLMs nail niche tasks you’d never have thought they could pull off, even when you’re not awake enough to keep tabs on them.
Tried agentic AI or prompt engineering on a real task? Tell us, top stories get featured!
Contact page
Socials: X, Mastodon, Bluesky