Published: Aug 21, 2025 | 20 min read
As we go all-in on AI, we want to inspire fellow developers with the cool projects and experiments our devs are running with AI tools and agents.
At VirtusLab Group, SoftwareMill, and VirtusLab, we’ve always loved sharing our know-how, whether that’s on a conference stage, tech blog, or in Slack threads that never sleep. Now we’re putting that same energy into a monthly AI roundup.
Welcome to our new series, “This Month We AIed”, initiated by Scala expert and SoftwareMill co-founder Adam Warski. Once a month, a handful of our developers answer up to three questions:
Where did an AI coding assistant or agent save you the most time?
What did you achieve that was not possible before without AI?
Where did AI fail miserably, despite your attempts to solve a task using it?
We’ll kick things off with stories from Piotr Kukiełka, Adam Warski, Mateusz Gołąbek, Artur Skowroński, and Adam Rybicki.
So watch this space! Have something cool you’re building with AI? Write to us and jump into the conversation!
Recently, I dove headfirst into one of a programmer's least favorite tasks: hunting down a nasty memory leak. Our app's chat component, which streams syntax-highlighted code, consumed nearly a gigabyte of memory after a single interaction. I tackled this beast using an AI coding assistant as my pair programmer, and the experience was a fantastic case study in both the power and the limitations of today's AI.
Here’s a breakdown of where it shone, what it enabled, and where it fell completely flat.
Where AI Saved Me the Most Time
The biggest time-saver was using AI to rapidly generate boilerplate and test different architectural approaches. Instead of spending hours setting up each experiment, I could describe my goal and get a working scaffold in minutes.
My investigation involved testing multiple hypotheses:
Is highlight.js the culprit?
Would a simpler, regex-based highlighter work?
What if we swapped react-markdown for a custom markdown-it component?
Could we offload the work to a Web Worker?
Could we isolate a memory leak using React’s portals?
I used Amp (powered by Claude Sonnet 4) to do the heavy lifting for each of these ideas. It wrote the A/B testing components, scaffolded the Web Worker communication, and provided the code to swap out entire Markdown rendering libraries. This wasn't about the AI having the answer; it was about it accelerating my ability to ask the questions. It handled the "how," freeing me to focus on the "why." This easily saved me days of tedious setup and refactoring.
What I Achieved That Wasn't Possible Before
Without AI, an investigation this broad would have been impractical. Testing five fundamentally different architectural solutions for a memory leak would typically take a week or more. Each path requires significant research, setup, and implementation before you can even begin to measure the results.
With the AI assistant, I tested all of these complex approaches in just a day or two. This velocity was the real game-changer. By quickly ruling out multiple dead ends (like blaming the highlighting library or moving work to a worker), I was able to zero in on the true root cause: React's struggle to garbage collect the highly-nested DOM structures created by syntax highlighting.
The achievement wasn't just solving the problem, but the sheer speed at which I could navigate the problem space and arrive at a definitive conclusion.
Where AI Failed Miserably
For all its help in building solutions, the AI failed completely at the most critical task: diagnosing the nuanced, underlying problem.
When I posed the problem to the AI, it offered generic, textbook solutions for React memory leaks:
"Are you using React.memo?" (We were, it didn't help.)
"Make sure you have stable keys." (We did.)
"Optimize your dependency arrays." (Already done.)
It couldn't make the creative leap that a human developer can. It didn't "look" at the deeply nested <span> jungle created by highlight.js and connect it to React's reconciliation overhead. It didn't analyze the memory profiler's strange batch allocations and hypothesize about heap management. The final "aha!" moment came from human observation, experience, and intuition: combining knowledge of the DOM, React's internals, and the visual data from browser devtools.
The AI is a phenomenal tool for executing a strategy, but it can't yet form one from scratch when faced with a complex, non-obvious bug. My takeaway? Use AI to build your experiments, but use your brain to interpret the results.
Lately, I ran a half-day experiment in which I tried to vibe-code a new feature into a CRUD application, using our Bootzooka template. My idea was to simulate, at least a bit, how an AI agent behaves not in a green-field startup, but in a corporate / scale-up environment. Hence, we are looking at extending a JVM microservice with a React frontend, with some existing constraints and a specific architectural style that should be followed.
Our template already has some basic features implemented, and the code is structured in a specific way, encapsulating the functionalities in modules. My goal was to check whether the LLM (in this case, Claude Sonnet 4 through Cursor) would be able to follow these patterns, and what kind of effort it takes to implement such a feature.
What I Achieved That Wasn't Possible Before
You can watch the whole process on YouTube, if you’ve got a spare hour. TL;DR: yes, LLMs follow the patterns without much trouble; and yes, adding a CRUD feature to the app was considerably faster than if I had done it all by hand.
Where AI Failed Miserably
But there are caveats as well. First off, the LLM does need some guidance from time to time. That’s where knowing the frameworks and libraries in use is very useful, as the guidance often concerns fine details that Claude misses. Having this knowledge lets you stop the coding agent from reinventing the wheel.
One area where AI failed to read my intentions correctly was around committing. Since I was trying to do most of the work by prompting, I also asked the AI to commit after a certain development stage was complete. However, the result was that from this point on, the coding agent was trigger-happy and later committed stuff that I hadn’t yet reviewed. So, be careful when asking for git commits - starting a new chat after that helps to resolve the issue!
Where AI Saved Me the Most Time
The feedback from the compiler (I was using type-safe languages: Scala and TypeScript), along with a codebase-aware MCP server, turned out to be a huge time-saver - but for the LLM, not directly for me. It turns out that tight and precise feedback loops are important for AIs as well.
The time-saving also poses a problem: once I give Claude a prompt to implement some code, completion takes a small but non-trivial amount of time (say, 5 minutes). During that time, the LLM iterates on the design, fixing compiler errors, test errors, etc. What should the developer do, then? It’s too short to context-switch to another task. It’s too long to just watch the prompts unfold. We need a way to adapt our workflow while still supervising the agent.
In a project in the logistics industry, I needed to analyze over 20,000 logs from our vessel trajectory monitoring system to investigate shipping inconsistencies and bugs observed over the past months. The goal was to extract actionable insights, like the top 20 most problematic port pairs, frequent collision routes, etc., and propose improvements.
Where AI Saved Me the Most Time
Instead of spending time manually developing complex data extraction logic and parsing patterns, the AI assistant understood our vessel/shipment domain context and helped rapidly develop precise log analysis scripts.
These scripts were intended for manual, one-time use to generate quick insights. Since they weren’t meant for production, we focused purely on getting actionable results fast, without worrying about code elegance or best practices. The whole operation took a dozen or so minutes.
What I did:
1. Downloaded a generated CSV with 20k+ logs from Kibana
2. Used a simple meta-prompt with GPT-4o to generate the target prompt:
You are an expert in prompt engineering and Python scripting. Improve the following prompt to generate a more detailed summary. Propose in separate paragraphs extra metrics which could be possibly helpful based on given examples. Think before the final answer. Ensure that your generated prompt is clear and unambiguous, and include all necessary information.

<prompt>

[ ... I wrote in simple terms what I want to achieve, with a few examples of the logs, without focusing on the form and technical details of the implementation ... ]

</prompt>

Return the final prompt in markdown format
3. Slightly adjusted the generated prompt, and used it to generate Python scripts using claude-4-sonnet in Cursor.
4. Executed a single prompt, “make output look nice (but no emojis!)”, in Cursor.
All data shown below is synthetic for demonstration purposes.
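For illustration, here is a minimal sketch of the kind of throwaway script this workflow produces. It is not the actual generated code: the column names and the "ROUTE_INCONSISTENCY" marker are assumptions made up for this example.

```python
# Hypothetical one-off analysis: top N most problematic port pairs from a Kibana CSV export.
# Column names and the error marker are illustrative assumptions, not the real schema.
import csv
from collections import Counter

def top_problematic_port_pairs(path: str, n: int = 20) -> list[tuple[tuple[str, str], int]]:
    """Count rows flagged as problematic, grouped by (origin, destination) port pair."""
    pairs: Counter[tuple[str, str]] = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("event_type") == "ROUTE_INCONSISTENCY":  # assumed marker
                pairs[(row["origin_port"], row["destination_port"])] += 1
    return pairs.most_common(n)

if __name__ == "__main__":
    for (origin, dest), count in top_problematic_port_pairs("kibana_export.csv"):
        print(f"{origin} -> {dest}: {count} incidents")
```

Since scripts like this are throwaway, a quick spot-check of the output against a handful of raw log lines matters more than code structure.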
We recently gave a fixed-price estimate for a project involving LLMs. My key lesson in AI capacity planning is that usage-based cost estimation requires more than multiplying token prices by expected volume.
The foundation is solid observability: we trace the entire request pipeline, including STT, LLM, and embedding steps, using OpenTelemetry-compatible tools like Arize Phoenix or Langfuse. We then gather token usage data across representative samples to calculate averages, medians, and variances. Using a centralized proxy like OpenRouter simplifies multi-model routing and cost tracking.
We then project monthly costs using pricing calculators, factoring in expected request volume, retries, and infrastructure overhead (like vector DBs or gateways). It’s crucial to compare pricing tiers across providers, account for threshold-based pricing changes, and monitor quota usage.
Finally, we provide min/max cost projections, plan for price drops over time, and set up safeguards like usage alerts and manual approval thresholds to prevent billing surprises.
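As a rough illustration of the projection step, here is a back-of-the-envelope sketch. It is not our actual pricing model, and every number in it (token stats, prices, retry rate, overhead) is hypothetical.

```python
# Hypothetical monthly cost projection from averaged token usage and list prices.
# All figures below are made up for illustration.
from dataclasses import dataclass

@dataclass
class UsageProfile:
    avg_input_tokens: float   # measured from traces (e.g. Langfuse / Arize Phoenix)
    avg_output_tokens: float
    requests_per_month: int
    retry_rate: float         # fraction of requests that get retried

def monthly_cost(profile: UsageProfile,
                 input_price_per_1m: float,
                 output_price_per_1m: float,
                 infra_overhead: float = 0.0) -> float:
    """Project a monthly bill from token averages, list prices, and fixed overhead."""
    effective_requests = profile.requests_per_month * (1 + profile.retry_rate)
    input_cost = effective_requests * profile.avg_input_tokens / 1_000_000 * input_price_per_1m
    output_cost = effective_requests * profile.avg_output_tokens / 1_000_000 * output_price_per_1m
    return input_cost + output_cost + infra_overhead

# Example run with invented numbers; min/max projections come from repeating this
# with the observed variance and with other providers' price tiers.
profile = UsageProfile(avg_input_tokens=1_800, avg_output_tokens=450,
                       requests_per_month=120_000, retry_rate=0.05)
print(f"Projected monthly cost: ${monthly_cost(profile, 3.0, 15.0, infra_overhead=200.0):,.2f}")
```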
Coding: Small Tidbits
I’ve been playing with Agentic Mode in Cursor. It’s great, but I keep running into rate limits with Anthropic. I’ve switched to using cursor-small as a coding agent, and for simple scripts, it works surprisingly well. If you have tips for working around those rate limits, I’d love to hear them.
Context7 continues to impress. I needed to integrate the ElevenLabs API, and while Cursor managed to install and locate the package just fine, it struggled to pick the right methods (I’m writing in TypeScript), scanning objects and digging into node_modules. Once I gave it access to Context7, it just read the docs and nailed the integration in one shot, including writing tests.
Pain: Managing MCPs
Anyone using a good Model Context Protocol tool that isn’t Claude? Many tools pull their MCP configuration from Claude’s settings, but I’d love a “Postman for MCP”. This repository lets me selectively inject parts into a tool’s settings and generate proper manifests. Context for macOS is solid, but adding MCP servers there still feels clunky, and I haven’t found an easy way to export the configuration back (or maybe I missed something). Curious what your workflow looks like.
Where AI Saved Me the Most Time
I love ChatGPT's Record Mode, available in the native app. It's fantastic for meetings, both in English and Polish. It doesn't clutter the meeting like all the other recording bots do, and it works with virtually any audio source and every single meeting, so it's tool-independent, which is great. Overall, it's a really great piece of work, and the notes it writes are very cohesive and highlight the most important things. The one drawback: at least on macOS it degrades the audio quality because it switches to mic mode, which is immediately noticeable. Aside from that, it's a very good improvement to my workflow.
It seems ChatGPT has killed yet another application, Granola, because this solution is "good enough" for most people (myself included).
Also, I re-started using Gemini 2.5 Pro/Flash and both are surprisingly good. They have a bit of a different flavour than ChatGPT, so I have not fully switched yet, but it is really promising.
NotebookLM is an excellent tool for conveniently creating all kinds of software documentation. It seems especially useful for smaller projects that involve a lot of early-stage documentation, notes, lessons learned, or meeting minutes. It works as a sort of assistant project manager or Tech Lead, enabling quick creation of proper documentation, asking questions, sharing materials, and drafting preliminary versions of documents required in more complex processes.
Additionally, there’s now a Pro version available with a confidentiality level suitable for paying Gemini users, and several free alternatives have also emerged. A very handy tool.
I hadn’t updated my local environment setup for quite some time, so when a colleague of mine decided to update the Java version across the entire project, I was caught off guard: code that had compiled fine just a minute earlier suddenly started throwing errors in several places.
Rather than spending time googling how to properly update my setup, I decided to use ChatGPT for help. What would probably have taken me around 30 minutes of reading documentation and figuring things out manually, I managed to solve in under a minute. I had all the answers prompted instantly, and after following the steps in the generated instructions, everything was working fine.
What I updated
SDKMAN,
Local Java version (from 17 to 21),
Java version in sbt,
Java version in the entire IntelliJ project.
Prompts I’ve Used
In my local environment I’m using macOS, Java version 17, and SDKman. I needed to update the Java version to version 21.
In response, I received comprehensive instructions that covered everything I needed.
Then I asked GPT:
I have to make updates in IntelliJ, so I use the updated Java version inside the IDE. I’m running into a “Class org.junit.rules.TestRule not found - continuing with a stub” error.
Once again, I received well-written instructions that guided me step by step through which option to change in order to make the update work in the IDE.
This issue of This Month We AIed was sponsored by:
velocity & experimentation,
human insights,
and new workflows 😀
We’ll return next month with more wins, fails, and eyebrow-raising dev hacks. If you have an AI tale of your own, share it with us! Write a comment or ping us on X, Mastodon, or Bluesky!