What It’s Like to Work as a Dev Tooling Expert
Interview with Jerzy Müller, Scala Evangelist & Dev Tooling Expert at VirtusLab
Sometimes Google returns zero results. That’s usually where tooling engineers begin their work. In large-scale engineering organizations, their job is to make impossible systems work together and keep thousands of developers unblocked when everything starts breaking at scale. We sat down with Jerzy Müller, Scala Evangelist and Dev Tooling Expert at VirtusLab, to talk about what this work actually looks like behind the scenes.
So first - what actually makes a Tooling Expert different from a regular developer?
The biggest difference is probably that there’s almost no template work. As a normal developer, you can become “the Spring person,” or “the React person,” and after a while a lot of tasks start looking similar. Different business logic, same patterns. In tooling, that basically never happens. Each new problem tends to be radically different from the previous one. You constantly operate in environments that are messy, unstructured, and full of systems that were never really designed to work together.
A huge part of the job is taking tools that “absolutely shouldn’t work together” and somehow making them cooperate anyway. Sometimes literally with duct tape and prayers. It requires a lot of creativity and improvisation.
And then there’s the scale. The kinds of systems we build support massive engineering organizations, so scaling problems and integration issues are just everyday reality.
What kinds of problems are we talking about exactly?
One of the defining characteristics of this role is that very often there’s no existing solution. You hit problems where Google gives you zero results. Not “bad results.” Literally nothing. So you can’t just copy a Stack Overflow answer. LLMs? Hallucinations or garbage. You have to actually understand what’s happening under the hood and invent a solution yourself.
Can you give some practical examples?
Yeah, sure. One example was optimizing large-scale testing systems by distributing tests across an internal cloud platform that was never originally designed for this kind of workload. We ended up generating hundreds of descriptors dynamically and aggregating results manually just to make the whole thing scale properly.
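The pattern described here, splitting a large test suite into many generated job descriptors and then merging the per-shard results, can be sketched roughly as follows. This is only an illustrative sketch: the names `ShardDescriptor`, `ShardResult`, `make_descriptors`, and `aggregate` are hypothetical and do not come from the interview, and the real system's descriptor format and platform are not described in the source.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ShardDescriptor:
    """Hypothetical job descriptor: one unit of work sent to the platform."""
    shard_id: int
    tests: tuple[str, ...]


@dataclass(frozen=True)
class ShardResult:
    """Hypothetical per-shard outcome reported back by a worker."""
    shard_id: int
    passed: int
    failed: tuple[str, ...]


def make_descriptors(tests: list[str], shard_count: int) -> list[ShardDescriptor]:
    """Split tests round-robin into at most shard_count descriptors."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    ordered = sorted(tests)  # a fixed order keeps shard contents reproducible
    shards = [ordered[i::shard_count] for i in range(shard_count)]
    # Empty shards are dropped so that no descriptor is generated for them.
    return [ShardDescriptor(i, tuple(s)) for i, s in enumerate(shards) if s]


def aggregate(results: list[ShardResult]) -> dict:
    """Merge per-shard results into a single summary."""
    failed = sorted(name for r in results for name in r.failed)
    return {
        "shards": len(results),
        "passed": sum(r.passed for r in results),
        "failed": failed,
        "ok": not failed,
    }
```

Round-robin splitting is the simplest possible choice here; a production setup would more likely balance shards by historical test duration, which this sketch does not attempt.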
Another interesting case was discovering bugs deep inside open-source ecosystems like Java or Scala, as well as in commercial tools such as Bitbucket or Jira. There were issues that nobody had identified before, simply because very few companies operate at this scale.
Sometimes critical tooling suddenly stops working and you need to build patches or workarounds almost immediately. We’ve also had moments where entire systems had to be rewritten from scratch because they stopped scaling altogether, especially after the acceleration caused by AI adoption. At one point, we even ended up building our own internal build system because none of the existing solutions on the market could handle our scale and requirements.
That sounds like a huge amount of responsibility.
It is. If you break a tooling system, you’re not affecting one feature or one customer. You can block hundreds, sometimes even thousands of developers from being able to work. So the impact is massive. Which is why, interestingly, there usually isn’t much “ship-it-fast-at-all-costs” pressure in this role. The priority is quality.
The mindset is basically: “Better to deliver tomorrow and have it stable than deliver today and prevent people from working.” The only moments where real time pressure appears are exceptional situations - legal requirements, sudden external changes, timezone-related issues, critical security vulnerabilities, things like that.
So what kind of mindset does someone need to succeed in this role?
Internally, we actually have a term for this: radical ownership. You need curiosity. Not just knowing how to use a tool, but understanding why it works the way it works. A lot of people can copy-paste solutions. But tooling work requires people who can independently investigate problems, adapt to completely new technologies, and figure things out without a guidebook.
There’s also a huge “MacGyver” aspect to the job. You constantly build new things out of imperfect, incompatible components. Honestly, it’s a little like that scene from Apollo 13 where they had to make a square connector fit into a round socket using random parts lying around.
That’s surprisingly close to real tooling work.
How important is communication in this kind of role?
Extremely important. And that surprises a lot of people. Many engineers choose highly technical careers because they want less interaction with others... and then it turns out communication is absolutely critical.
Tooling experts spend a lot of time talking to teams, understanding undocumented internal systems, asking questions, gathering context. And users rarely describe their actual problem correctly. Usually they complain about symptoms and propose a solution they think they need, but often that solution is wrong. So part of the job is listening carefully, extracting the real issue underneath, and solving that instead.
What about ownership after deployment?
Ownership doesn’t end when you merge the PR. If you build something used by hundreds of developers, you effectively become part engineer, part helpdesk. That’s why we started building more and more “self-healing” systems - applications capable of detecting and automatically repairing common failures on their own. Otherwise the operational overhead becomes impossible.
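As a rough illustration of the "self-healing" idea, a tool can pair each known failure mode with a check and a repair action, retry the repair a bounded number of times, and only escalate to a human when that fails. The sketch below is an assumption about how such a loop might look, not the team's actual implementation; `Remedy` and `heal` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Remedy:
    """Hypothetical pairing of a failure check with its repair action."""
    name: str
    is_broken: Callable[[], bool]
    repair: Callable[[], None]


def heal(remedies: list[Remedy], max_attempts: int = 2) -> dict[str, str]:
    """Run each check, attempt bounded repairs, and report the outcome."""
    report: dict[str, str] = {}
    for remedy in remedies:
        if not remedy.is_broken():
            report[remedy.name] = "healthy"
            continue
        for _ in range(max_attempts):
            remedy.repair()
            if not remedy.is_broken():
                report[remedy.name] = "repaired"
                break
        else:
            # Repairs did not help: hand over to a human instead of looping.
            report[remedy.name] = "escalate"
    return report
```

Bounding the number of attempts matters: a repair that keeps failing should surface as a support case rather than silently retry forever.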
How does the team usually approach solving problems?
A lot of brainstorming. We often sit together and analyze what the actual problem is, what tools are available, what constraints exist, and how we can combine everything into something workable.
AI is actually very useful in this process, especially for brainstorming and idea generation. But it still requires experienced oversight; otherwise the output quality drops quickly. Usually we start with MVPs, spikes, and proof-of-concept implementations, just to validate whether the idea actually works before investing heavily in a full solution.
Do you work with long-term roadmaps?
Not really. The environment changes too fast. Plans are highly dynamic and can shift completely quarter to quarter depending on current priorities. AI adoption is a perfect example - suddenly entire scaling assumptions changed almost overnight.
What makes this role attractive to people?
Honestly: the lack of boredom. There’s constant change, constant learning, constant new challenges. You also get a huge amount of autonomy and influence. People with initiative can genuinely shape how tooling evolves across the organization. You’re not just implementing tickets. You’re identifying problems nobody noticed yet and proposing entirely new systems.
And what makes hiring difficult for these roles?
The profile is extremely rare. You need people who are exceptionally strong technically, comfortable operating in ambiguity, proactive, creative, and also good communicators.
That combination is surprisingly hard to find. You can easily find brilliant compiler engineers who don’t want to talk to anyone. Or excellent communicators who struggle deeply with low-level technical complexity. Finding both in one person is difficult. And for new hires, the environment can initially feel overwhelming because there’s so much complexity and so little structure. Which is why we’ve been continuously improving the onboarding process.
What still motivates you after all these years?
Honestly, it’s probably the fact that the job never becomes predictable. Every few months there’s a completely new kind of problem, a new scaling challenge, or some system that suddenly behaves in a way nobody expected.
And there’s something genuinely satisfying about taking a messy situation, a bunch of incompatible tools, and somehow making everything work together. That “Apollo 13” kind of engineering never really gets boring.




