Generating Direct-Style Scala 3 Applications

While being the best language out there, Scala isn't (yet?) the most ubiquitous language for developing business applications. But we're on a mission to change that! Starting with what everybody's doing right now, of course: generating applications using Claude Code or other LLMs.

AIs have a strong understanding of the most popular application stacks, such as TypeScript, Java, and Python. But how does an AI agent fare when tasked with writing a Scala 3 application? And, to make this even harder, what if we request direct style, which isn't the most widespread approach within Scala?

What kind of guidance (if any) does an LLM need to write a direct-style Scala 3 application? Let's find out!

The baseline: Writing an application hand-in-hand with the LLM

To see how Claude Code (my current LLM of choice) handles the task of writing a direct-style Scala 3 application, I've generated (a number of times) an application that accepts SOAP messages via an HTTP API, stores the incoming data in Kafka, and runs a background process that transfers the data from Kafka to S3, grouped in per-hour files.

The tech stack is given as:

Scala 3
direct-style approach (so no Futures or IOs)
Java 21 with virtual threads
Tapir for HTTP
Ox for concurrency / utilities
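To make "direct style" a bit more concrete, here's a minimal sketch that uses only the JDK 21 virtual-thread executor (not Ox, which offers a higher-level API for this). The names fetchLength and totalLength are hypothetical and serve only as an illustration: blocking calls return plain values, and concurrency comes from running each call on its own virtual thread.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical stand-in for a blocking call (an HTTP request, a Kafka send, ...).
def fetchLength(payload: String): Int =
  Thread.sleep(10) // blocking is cheap when running on a virtual thread
  payload.length

// Runs one blocking task per input, each on its own virtual thread.
// The signature exposes plain values: no scala.concurrent.Future, no IO.
def totalLength(payloads: List[String]): Int =
  val executor = Executors.newVirtualThreadPerTaskExecutor()
  try
    // The java.util.concurrent.Future returned by submit is only a join handle.
    val handles = payloads.map: p =>
      val task: Callable[Int] = () => fetchLength(p)
      executor.submit(task)
    handles.map(_.get()).sum
  finally executor.close()
```

The explicit Callable type annotation sidesteps the overload ambiguity between submit(Runnable) and submit(Callable) when passing a Scala lambda.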

The problem is quite diverse when it comes to the tasks to be performed, which I think makes it a good benchmark for LLMs:

bootstrapping a Scala 3 direct-style application
exposing an HTTP API with OpenTelemetry metrics & tracing
accepting non-standard, that is, non-JSON, input (SOAP, XML parsing)
integrating with a 3rd party system: Kafka
customising the Tapir errors generated by interceptors to be SOAP-compatible
running a background process
interfacing with a Java library (AWS SDK)
non-trivial business logic (bucketing data into hourly files, taking into account latecomers and optimizing file writes / uploads)

As a first shot, I wrote the application interactively, talking with Claude through each feature, reviewing the code automatically and manually, and pointing out what could be improved to get "good direct-style Scala" (or at least what I consider good).

During this session, the CLAUDE.md was empty, and I was making notes on where Claude stumbled and needed the most corrections. Overall, Claude didn't have many problems either using Scala 3 or writing direct-style code. Most of the time, it "just worked". So if you know some Scala, and your workflow is more interactive than autonomous, you might as well stop here.

If you're curious, here's the code that Claude wrote with my help. I've been using OpenSpec to research and specify the features before implementing them.

The result of this experiment was quite satisfactory, but can Claude generate a well-designed Scala 3 direct-style application autonomously? For that, we'll need to work on some guidance.

CLAUDE.md: Using Scala tooling

The first thing I noticed is that Claude is very insistent on using bash and sbt for everything: compiling, running tests, and even inspecting dependencies (by unpacking jars and decompiling classes ...).

sbt isn't exactly a speed champion, so the first piece of advice was to ALWAYS use sbt --client. This starts a background sbt server and then just connects to it, making subsequent sbt commands much faster. Well, unless the LLM decides to run the application, then you have to use sbt directly, as otherwise you can't easily interrupt or kill the application if it misbehaves. This results in the first part of the general-Scala CLAUDE.md:

Secondly, I'm working in VS Code with Metals, so all of this happened despite Metals MCP being installed and available. Clearly, we can do much better!

Metals MCP exposes tools that cover 90% of the use-cases for which Claude was using sbt via bash. But for some reason known only to the LLM gods, even though the MCP is enabled, it's usually not used. However, adding additional instructions to CLAUDE.md makes a lot of difference:

With these simple instructions, Claude almost always uses Metals MCP, which is a pleasure to watch; the answers come back almost instantly, symbols are inspected in a sane way, and a lot of the thought process is just interactions with the test, compile, import-build, inspect, etc., tools.

CLAUDE.md: Scala 3

In general, Claude knows Scala 3 quite well. It uses all of the new types, new syntax, etc., without problems. The only piece of advice I needed to give here is to use braceless syntax, something which is more of a per-project preference than mandated by Scala 3:
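For readers who haven't seen the braceless style, here's a small illustration of the syntax (this is a sketch of the style, not the CLAUDE.md instruction itself). The Severity and LogLine types are hypothetical, made up for this example only.

```scala
// Hypothetical domain types, used only to illustrate the syntax.
enum Severity:
  case Info, Warning, Error

final case class LogLine(severity: Severity, message: String)

object LogLine:
  // Indentation, not braces, delimits the method body and the match cases.
  def parse(raw: String): Option[LogLine] =
    raw.split(":", 2) match
      case Array(level, message) =>
        Severity.values
          .find(_.toString.equalsIgnoreCase(level.trim))
          .map(severity => LogLine(severity, message.trim))
      case _ => None

// Counts the lines that parse successfully and have Error severity.
def countErrors(lines: List[String]): Int =
  lines.flatMap(LogLine.parse).count(_.severity == Severity.Error)
```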

CLAUDE.md: Direct-style

When asked to write a Scala project, Claude's default will be to use cats-effect. However, my goal was to use a direct style, which was clearly marked in the tech stack. Once this requirement was in place, Claude mostly generated correct code, though there were some recurring problems.

These mostly stem from the fact that, while Claude does know Tapir and Ox, this knowledge isn't very thorough and is often based on older versions of them. So we have to let Claude know about the newer direct-style-tailored APIs:

Autonomously generating an application

To autonomously generate the entire application, we need more than lessons learned from the baseline experiment.

Since I had all of the specifications (quite elaborate) written and saved in the repository by OpenSpec, I asked Claude to generate a prompt that could be used to re-generate the same application from scratch.

And after stripping out all the technical details that Claude initially included (which I requested so that future agents would have more of a free hand during development), here's what I got.

Equipped with the CLAUDE.md file described above and the prompt, I fired up a Sandcat container so I could run Claude fully autonomously & dangerously, without it asking any questions. I also added a brief description of the development process (more on that later). And the results were... ok.

The application worked; all the pieces were there, but the biggest weakness was the pervasive use of mutable data structures, returns, vars, etc. In general, the code was quite imperative.
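To show the kind of difference I mean, here's a made-up example (not taken from the generated application). The first function is written in the imperative style described above, with a mutable collection, a var, and an early return; the second computes the same result with immutable data and tail recursion.

```scala
import scala.annotation.tailrec
import scala.collection.mutable

// Imperative variant: mutable set, var index, early return.
def firstDuplicateImperative(ids: List[String]): Option[String] =
  val seen = mutable.Set.empty[String]
  var i = 0
  while i < ids.length do
    val id = ids(i)
    if seen.contains(id) then return Some(id)
    seen += id
    i += 1
  None

// Functional variant: the set of seen ids is an immutable value,
// threaded through a tail-recursive helper.
def firstDuplicate(ids: List[String]): Option[String] =
  @tailrec
  def loop(remaining: List[String], seen: Set[String]): Option[String] =
    remaining match
      case Nil                          => None
      case id :: _ if seen.contains(id) => Some(id)
      case id :: rest                   => loop(rest, seen + id)
  loop(ids, Set.empty)
```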

CLAUDE.md: Functional programming and other style guidelines

I've run a couple of experiments trying to make Claude a more "functional" programmer. I can't say I've achieved 100% success, but with the additions below, things are much better than they were initially. Note that this is still while generating the whole app autonomously:

Apart from avoiding mutability, this also includes guidelines on visibility and naming, aspects that often could use improvement.

That said, the baseline version that Claude developed iteratively with my guidance was still much cleaner on the "functional" front. This was mainly evident in the Kafka -> S3 view generator, which buckets data into hourly files.

When running autonomously, Claude consistently creates a mutable Builder-like class, which uses either a mutable collection or a class-level var field. In the cooperative baseline, on the other hand, we ended up with an immutable State-like class, which was passed to and returned by the main business logic, and stored in a method-local mutable variable. Such a design is harder to misuse: the mutability is more local, and there's simply no way to run into concurrency problems. But it's also hard to get Claude to replicate.
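Here's a rough sketch of what such an immutable State-like design might look like. This is not the code from the repository: Record, BucketState, and process are hypothetical names, and the latecomer handling (closing an hourly bucket once a record arrives whose timestamp, minus a grace period, is past the end of that hour) is an assumption made for the sake of the example.

```scala
import java.time.{Duration, Instant}
import java.time.temporal.ChronoUnit

final case class Record(timestamp: Instant, payload: String)

// Immutable snapshot of the hourly buckets that are still open,
// keyed by the start of the hour.
final case class BucketState(open: Map[Instant, Vector[Record]]):
  // Returns a new state with the record appended to its hour bucket.
  def add(record: Record): BucketState =
    val hour = record.timestamp.truncatedTo(ChronoUnit.HOURS)
    copy(open = open.updated(hour, open.getOrElse(hour, Vector.empty) :+ record))

  // Splits off the buckets whose hour ended at or before `cutoff`.
  // Returns the closed buckets and the state that remains.
  def flushBefore(cutoff: Instant): (Map[Instant, Vector[Record]], BucketState) =
    val (closed, stillOpen) = open.partition: (hour, bucket) =>
      !hour.plus(1, ChronoUnit.HOURS).isAfter(cutoff)
    (closed, BucketState(stillOpen))

object BucketState:
  val empty: BucketState = BucketState(Map.empty)

// The only mutable piece is the method-local `state` variable.
// `upload` is a hypothetical callback standing in for the S3 write.
def process(records: List[Record], grace: Duration)(
    upload: (Instant, Vector[Record]) => Unit
): BucketState =
  var state = BucketState.empty
  for record <- records do
    val cutoff = record.timestamp.minus(grace)
    val (closed, next) = state.add(record).flushBefore(cutoff)
    closed.foreach((hour, bucket) => upload(hour, bucket))
    state = next
  state
```

Since add and flushBefore return new values instead of modifying anything in place, they can be tested in isolation, and the state can't be changed by any code other than the loop that owns the variable.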

I tried various wordings in both the prompt and CLAUDE.md, but haven't achieved complete autonomous success yet. That's one place for improvement & future research.

The development process

The prompt wouldn't be complete without a description of the development process: how to approach writing a small but non-trivial application from scratch, given only a specification. Here, the instructions include first generating a plan.md file and splitting each requirement into tasks, then working iteratively on each task. Nothing revolutionary or extraordinary.

However, the one thing I found most beneficial for improving code quality (and often correctness) was code reviews. Not done by me, but by Claude itself. I have a number of review agents which I often use (covering target functionality coverage & correctness, performance, code readability, test minimalism, and abstraction/DRY). Asking Claude to review the code using those agents after each task did burn some additional tokens, but it uncovered and fixed many problems with the code.

It might be a topic for another post, but I found it hard to make Claude review its own work after each task. I tried using all-caps ALWAYS and MUST, being polite, or not so polite, repeating the review requirement, etc., but the agent conveniently "forgot" to run the reviews after the first or second task. Or performed the review only once, skipping the re-review requirement. Hence, this is another area for improvement.

What finally worked better was asking Claude to review task groups (corresponding to entire features) rather than single tasks. All in all, here's the part of the prompt describing the development process (I'm including it for completeness; this part is not strictly related to writing direct-style Scala apps):

Teaching Claude more about direct style

The autonomously generated direct-style Scala app was good, but what's in the Bootzooka template was better. There are some quirks around observability and virtual threads, as well as direct-style-specific Scala patterns or newer Tapir/Ox APIs that Claude simply didn't know about. How to fix that?

Well, one possibility would be to just take the Bootzooka template and customize it. But I took a different route.

First, I compressed the Tapir docs and Ox docs table of contents into a single file, with link -> chapter description pairs. However, adding this to the prompt didn't make a large difference.

My second attempt was different. I asked Claude to inspect the Bootzooka codebase and generate a use-case-driven "direct-style book". The use cases were hand-picked by me, but the text is entirely AI-generated. Some refinements were necessary, as initially, Claude was much too talkative and hallucinated a bit in a couple of places.

The general idea is that, as part of the prompt, the LLM gets the following:

The index.md contains a list of references to chapters, alongside descriptions of what each use-case covered by that chapter actually is. That way, the LLM can select the relevant use cases for the problem at hand and load them into the context selectively.

And here the results are much better! The generated application properly integrates OpenTelemetry observability with JVM metrics, virtual thread context propagation, and logging integration. Claude no longer runs extensive research on testing endpoints with TapirSyncStubInterpreter, but has a template solution that additionally leverages the sttp endpoint-as-request interpreter.

Of course, maintaining direct-style-guide is additional work, as it needs syncing with changes in Bootzooka, Tapir, Ox, and sttp, but it should be automatable using an agent. Given, of course, that others find this idea useful as well (so if you haven't yet, please leave a ⭐ on our OSS repositories! :) ).

TL;DR: How to generate direct-style Scala 3 applications

Summing up: Claude generates correct direct-style Scala 3 applications based solely on the tech stack and chosen paradigm. However, there are a couple of areas where guidance is useful.

As part of the CLAUDE.md, include instructions to:

use Metals MCP for compiling, testing, importing build, inspecting symbols, finding dependencies
when using sbt, always use it in client mode, except when running applications
use braceless syntax, if that's your preference
use functional programming, and avoid using mutable state or mutable collections
use some direct-style-specific Tapir APIs for more concise code
properly use Ox nested concurrency scoping

Then, Claude generates an elegant direct-style app, given a prompt which includes:

a general description of the problem
the development process, stressing the necessity of code reviews
the tech stack: here Scala 3 + direct-style + Tapir + Ox + sttp
the specification of the features to be developed

Finally, to streamline development (saving Claude some research) and to properly implement some less-obvious aspects of direct-style applications, a very useful approach is to ask Claude to follow what's already implemented in the Bootzooka template. Not directly, but by including in the prompt a link to a book generated by AI, for AI: direct-style-guide.

So, what are you generating in Scala?