Companies, especially mid-sized and larger ones, face the challenge of structuring projects at scale. Almost every technology and paradigm can be used by a single developer to write a small piece of code, but that alone is not enough for an organization: engineering solutions must allow people from multiple disciplines to develop and maintain an application efficiently throughout its whole lifecycle.
Unfortunately, many seemingly good ideas that work perfectly in small- and mid-sized projects do not scale. Even worse, decisions made at the early stages are usually difficult to change later.
Instead of rolling the dice, it's beneficial to look at what the biggest companies do and consider if those solutions are useful for regular companies. At VirtusLab, we strongly believe that Scala and Akka are such technologies, as they are widely adopted by major industry players yet equally beneficial for regular companies.
To achieve success, it’s essential that not only your deployed solution scales and remains robust but that your code management and development processes also embody these qualities—especially as the codebase grows significantly.
In this regard, we've been employing Bazel for our clients and discovered that it's a great piece of engineering that provides tangible benefits for many organizations.
The case study presented here describes an ongoing project for one of our clients, a global freight forwarder. The collaboration started over 3 years ago, and its success has led to the addition of more features and components covering more use cases, which usually means growing code size and complexity.
The project uses Scala, Akka & Kafka to manage complexity while still handling massive amounts of information. The scale we're talking about is around one million messages per day, where each message is a 20 KiB XML document whose complex structure reflects real-world business complexity. We used 120 case classes just to create “the model”, the in-memory representation of that XML structure. That structure is also needed to derive formats for Akka and Kafka serialization.
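To give a sense of what such a model looks like, here is a heavily simplified, hypothetical fragment; the real model spans roughly 120 case classes mirroring the XML schema, and the names below are made up for illustration only:

```scala
// Hypothetical, heavily simplified fragment of the in-memory model.
// The real model consists of ~120 case classes mirroring the XML structure.
final case class Shipment(
  id: String,
  origin: Location,
  destination: Location,
  cargo: List[CargoItem]
)

final case class Location(unlocode: String, name: String)

final case class CargoItem(description: String, weightKg: BigDecimal)
```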
Among the many design choices is the one between a monorepo and a multi-repo approach. Monorepos have traditionally been the domain of the biggest companies, but with the development of good tooling support, there are strong arguments for using them in smaller projects too.
And so our team decided to embark on the monorepo path.
Aside from the many advantages of the monorepo, there's one challenge: setting up tooling that supports the monorepo and lets you reap its benefits without introducing hindrances.
The monorepo started out with sbt (Simple Build Tool), a tool well-known among Scala developers that was, back then, the de facto standard for Akka-based projects. This way, the first generation of the monorepo was born. While it was not perfect (notably, not being a universal tool for both the backend and the frontend), it seemed to strike the right balance.
The main benefits of this setup were simplicity, familiarity, and ease of maintenance. For example, to share code you didn't need to publish a JAR: you could simply depend on the library across the multi-project setup, with the build system ensuring that the changes stayed consistent. In general, making changes across the entire codebase is much easier.
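As a rough sketch of what this looks like in practice (the project names below are hypothetical, not the client's actual modules):

```scala
// build.sbt (sketch; project names are hypothetical)
lazy val commonSettings = Seq(
  scalaVersion := "2.13.12"
)

lazy val messageModel = (project in file("message-model"))
  .settings(commonSettings)

// Services depend on the model as a source dependency:
// no JAR has to be published to share the code.
lazy val ingestService = (project in file("ingest-service"))
  .dependsOn(messageModel)
  .settings(commonSettings)

lazy val forwardingService = (project in file("forwarding-service"))
  .dependsOn(messageModel)
  .settings(commonSettings)
```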
However, one challenge started to appear as the project scaled and more components were added: due to the size of the monorepo, sbt started to take 30-60 seconds just to load the configuration.
But that was only the tip of the iceberg: the compilation times were far too long, and tests (especially integration ones) were causing long and costly CI runs.
As mentioned in the intro, the system had to be performant enough to avoid being overwhelmed by incoming data. A big part of this performance was the time needed to read & preprocess the XML data.
Initially, for the proof of concept, we used a serialization library that generated deserialization code at runtime. This soon became a major bottleneck, to the point that tests started to time out.
Thanks to Scala's built-in support for code generation (macros), we could move the bookkeeping work to the compilation stage, so that no extra work needs to be performed at runtime during message processing.
Moreover, thanks to Scala's type system, the generated code had to pass the type checker, meaning any bugs would immediately be exposed to the engineer. We prefer to catch errors at compilation time rather than discover them in production, as this makes fixes many times cheaper. We followed the same approach in other parts of our system as well (e.g., Kafka Avro message serialization with the avro4s library).
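To illustrate the compile-time derivation idea, here is a minimal sketch of deriving an Avro format with avro4s; the `ShipmentEvent` type is hypothetical and not our production model:

```scala
import com.sksamuel.avro4s.{AvroSchema, RecordFormat}
import org.apache.avro.Schema

// Hypothetical event type; the real model is far larger.
final case class ShipmentEvent(shipmentId: String, status: String, weightKg: Double)

object ShipmentEventAvro {
  // Both the Avro schema and the record format are derived by macros at
  // compile time, so no reflection happens while messages are processed.
  val schema: Schema = AvroSchema[ShipmentEvent]
  val format: RecordFormat[ShipmentEvent] = RecordFormat[ShipmentEvent]
}

// Usage: format.to(event) produces an Avro record, format.from(record) reads it back.
```

If the derivation cannot be completed (for example, a field type has no serializer), the build fails instead of the message pipeline failing in production.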
This change was successful, as it improved runtime performance. However, there was a trade-off: the additional processing and code generation performed during compilation resulted in longer compilation times.
To summarize this step: the monorepo with the sbt multi-project setup, containing substantial amounts of code generation, took 40-60 minutes to compile. This was not sustainable and had to be dealt with.
The biggest compilation bottleneck was evident: the majority of compilation time was taken by the code generation in the model deserialization & pre-processing component.
The most obvious first step was to recognize that this part of the code didn't need to change with every update. We extracted these files into a new shared library, expecting that engineers would not need to update them very often.
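The mechanics of the extraction aren't essential here, but the general idea is that the rest of the build consumes the rarely changing, codegen-heavy code as an already compiled dependency. A minimal sketch, assuming hypothetical artifact coordinates and version:

```scala
// build.sbt of the main monorepo (sketch; coordinates are hypothetical)
// The codegen-heavy model is consumed as a prebuilt artifact, so routine
// changes in the services no longer trigger its recompilation.
libraryDependencies += "com.example.freight" %% "message-model" % "1.4.0"
```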
The compilation time for some changes was reduced to 20 minutes. Unfortunately, as the system underwent active development with significant work being done on the model, full recompilations were required, which took it back to 40-60 minutes.
In other words: the problem became slightly less severe but it was still there.
We then looked at sbt, our build system. It still needed 30-60 seconds to read the project configuration, and it seemed that our tooling had reached its limits in this regard. We knew we could either spend time trying to apply low-level performance improvements to sbt or look for alternatives.
We considered using Gradle. However, it had some deficiencies; for example, at the time we were evaluating it, support for build caches was only available in the enterprise edition.
After some consideration, it was decided that migration to Bazel would be the best fit. The main reasons for using Bazel were:
- ability to massively parallelize the build,
- ability to use remote build caches to speed up all builds,
- ability to compile only what needs to be compiled,
- ability to run only affected tests (with a 100% guarantee of accurately detecting affected tests),
- ability to unify a build tool across a polyglot system.
The last point meant that we could use the same build tool for our frontend part too, an improvement that we couldn’t easily achieve with sbt.
It was assumed that the migration would take around 3 months of work for a single developer, from understanding the codebase to finishing the job.
Let’s see if those expectations were met.
There are three main ways to approach migration to a new build tool: immediate migration, incremental migration with an integration layer, or maintaining parallel builds.
As the development of the core of the project couldn’t be stopped and the project was too small to invest in the integration layer, we went with a parallel builds approach.
This meant that during the migration, some backend projects were built using sbt while others by Bazel. This not only enabled us to compare the two directly but, more importantly, allowed us to carefully evaluate the correctness of new artifacts. It provided the flexibility to revert to sbt if we encountered any difficulties, and we could have even decided to abandon Bazel if it didn't prove to be a good fit. Having a safety net for such endeavors is always beneficial.
During the migration, we hit some obstacles caused by the relative youth of the Bazel-related ecosystem: editor/IDE support was lacking in features, some libraries caused issues, and there were deficiencies in rules_scala. Fortunately, those problems are no longer the case nowadays: the IntelliJ Bazel plugin has improved considerably, the libraries have been fixed, and rules_scala has matured.
We followed best practices when implementing Bazel to allow it to shine. Bazel was efficient out of the box thanks to its heavy use of parallelization and its support for build caches.
Build times dropped to 10 minutes. Moreover, during the code migration, we identified compilation flags that helped reduce the compilation time by a further 5 minutes. Admittedly, this optimization is not unique to Bazel and could be applied in any build system. We don't have any data to gauge how this 5-minute benefit would have translated to the initial monorepo.
To summarize, the compilation times on CI/CD, which were in the 40-60 minute range when we started, have been reduced to 5 minutes or less. In practice, builds were often even faster: for example, builds of the mainline code were able to reuse cache artifacts created while the code was maturing and undergoing scrutiny in PRs (pull requests). It was quite often the case that the CI/CD build after merging a PR would take only 1 minute.
The migration ended in 5 calendar months with only 1-2 people actively working on it. Given that a lot of that time was spent on improving rules_scala and working out how best to approach the migration process, we estimate that migrating similar projects to Bazel would be a matter of weeks rather than months.
We noticed something interesting: there was a period, just after Bazel's introduction, when it would be blamed for bugs unrelated to the build system. To give one example, we had a case where a test case would fail for exactly one person.
While Bazel was the first to be blamed, in reality it was a misconfiguration in the timezone-related unit tests. On top of the fixes, we persisted the timezone settings in the build system to avoid more issues like that. Bazel makes it very hard to shoot yourself in the foot and strives for reproducibility, but it's not a panacea for everything, and bugs find a way.
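The failing test itself isn't reproduced here, but the class of bug is easy to sketch: an assertion that implicitly relies on the JVM's default time zone passes or fails depending on the machine it runs on, which is exactly the kind of environment dependence we later pinned down in the build configuration.

```scala
import java.time.{Instant, LocalDate, ZoneId}

object TimezoneBugSketch {
  val ts: Instant = Instant.parse("2021-06-01T23:30:00Z")

  // Depends on the machine's default time zone: on a UTC machine this is
  // 2021-06-01, but for a developer in UTC+2 it is already 2021-06-02.
  val fragile: LocalDate = ts.atZone(ZoneId.systemDefault()).toLocalDate

  // Pinning the zone explicitly (or fixing it in the test environment)
  // makes the result the same everywhere.
  val stable: LocalDate = ts.atZone(ZoneId.of("UTC")).toLocalDate
}
```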
After a 2-month transition period during which both the old and the new systems were in place, we were so happy with Bazel that we completely ditched sbt.
There were also positive surprises. Test execution was migrated to Bazel as well, since test results can be cached too. By utilizing annotations, flaky tests can be retried, reducing the number of false-positive test results and, in turn, reducing warning fatigue.
Code generation and even multi-stage builds are well supported in Bazel. Previously, we maintained some increasingly complex Jenkins scripts to perform project chores. Those scripts were not developed together with the mainline code and occasionally caused issues. To add insult to injury, Jenkins' Groovy runtime was problematic on its own.
Presented with the ability to plug those scripts into the monorepo as first-class citizens, we were able to use the same tools & libraries in them. More and more jobs and scripts were migrated into the monorepo as we gained confidence in their maintainability.
The last change that all developers welcomed was how fast Bazel was to start and finish the task at hand compared to sbt. The only practical way to use sbt was to keep the server running all the time, consuming resources and complicating various workflows. With Bazel, local builds are lightning-fast, giving developers much faster code-run loops and helping them to stay in the zone.
The name “Scala” is derived from the words “scalable” and “language”, reflecting its design goal of managing increasing project complexity and user demands in a world of ever-evolving business and technical requirements. Technologies like Akka empower engineers to create solutions that scale both vertically and horizontally while, at the same time, optimizing resource utilization.
We believe that adopting a monorepo approach with Bazel as the build tool offers a scalable way to enhance development efficiency, filling in a missing piece of the overall project-development puzzle. This approach, regardless of project size or the number of collaborators, maintains an optimal project structure, fosters effective code collaboration, and minimizes build and test times.
Our case study not only supports this claim but also demonstrates that migrating existing systems to this solution is feasible and can yield additional unexpected benefits.