Different approaches to code organization
Code organization is easy when a small team works on a codebase that is not yet extensive. However, as the project's scope grows and new sub-teams appear, the question of how to subdivide and manage the growing codebase becomes increasingly important.
For many product owners and developers alike, the most intuitive approach is to split the codebase into repositories that roughly correspond to modules. Each sub-team owns a single repository, managing it and driving its development. Separating modules makes it easy to onboard new developers, as they initially deal with only a small part of the project. Testing individual modules is also easier and faster in this setting.
However, separate repositories need to be interconnected. Depending on the technology, there are multiple approaches to this challenge, ranging from regularly pushing build artifacts to an organization-wide artifact repository to using git submodules or their equivalents in other version control systems. Whatever the approach, the problem of synchronizing module versions is hard to overcome, and releases become complex processes. Overall, though, the biggest drawback of separate repositories is how hard it is to make any change that touches multiple modules.
Several successful companies have developed an alternative to separate repositories: the so-called monorepo, a single repository containing both the entirety of the project’s code responsible for the business logic and the code and configuration of the tools that manage the development, build, and deployment processes.
Benefits of monorepos
Let’s briefly discuss the potential benefits of a single monorepo for bigger projects. Along the way, this will also highlight some shortcomings of the multi-repository approach.
Integration tests
In the monorepo setting, integration tests are easy to set up and maintain. Setting them up in a large monorepo is no more complex than in a small project containing a single module. In contrast, integration tests in the multi-repo setting typically need to reside in a separate repository and, every time they run on CI, must fetch the correct versions of all the tested repositories and their dependencies. This creates room for errors caused by version mismanagement and adds friction, lengthening the feedback loop between making code changes and verifying their correctness.
Monorepos improve business logic correctness and system security by making integration tests easy to maintain and keeping the feedback loop of code changes short.
Dependency management
In a microservice environment, each microservice is expected to have its own external dependencies, and the dependency sets of different modules often overlap considerably. When each microservice manages its own dependency versions, these separate version definitions can quickly lead to version mismatches and dependency conflicts.
Microservice architecture is inherently somewhat resilient to this problem, but as versions drift apart, nothing signals the trouble that may be building up underneath, including at the API level. Misaligned versions can unexpectedly break the protocol between microservices and, as a result, cause errors that are hard to pinpoint and fix. Such errors can be avoided by establishing a firm dependency version policy. However, such a policy is always hard to enforce and maintain across disconnected repositories.
In a monorepo, the dependency policy is usually hardwired and enforced automatically. This is achieved with a central configuration file that stores all dependency versions, which every microservice is required to use. Such a constraint cannot be replicated in the multi-repository setting, because the version configuration would have to live in a separate repository, and keeping every downstream project in sync with its latest state would be prohibitively complicated.
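To make the contrast concrete, here is a minimal Python sketch, assuming a simplified model in which each module’s dependencies are plain name-to-version maps. The module names, library names, versions, and the helpers `find_conflicts` and `resolve_all` are all hypothetical and do not reproduce any particular build tool’s format. In the multi-repository style, each module pins its own versions; in the monorepo style, modules list only library names and take versions from one central table, so conflicts are ruled out by construction.

```python
from collections import defaultdict

# Hypothetical multi-repo setup: each repository pins its own versions.
PER_REPO = {
    "billing-service": {"protobuf": "3.21.12", "grpc": "1.60.0"},
    "orders-service": {"protobuf": "4.25.1", "jackson": "2.16.0"},
}

# Hypothetical monorepo setup: one central table holds every version,
# and modules only list the libraries they use.
CENTRAL_VERSIONS = {"protobuf": "4.25.1", "grpc": "1.60.0", "jackson": "2.16.0"}
MODULE_DEPS = {
    "billing-service": ["protobuf", "grpc"],
    "orders-service": ["protobuf", "jackson"],
}


def find_conflicts(declared: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return each library used in more than one version, with its sorted versions."""
    versions: defaultdict[str, set[str]] = defaultdict(set)
    for deps in declared.values():
        for lib, version in deps.items():
            versions[lib].add(version)
    return {lib: sorted(vs) for lib, vs in versions.items() if len(vs) > 1}


def resolve_all(
    module_deps: dict[str, list[str]], central: dict[str, str]
) -> dict[str, dict[str, str]]:
    """Give every module the centrally pinned version of each library it uses.

    A KeyError here means a module uses a library that was never pinned centrally,
    which is exactly the kind of policy violation the central table should catch.
    """
    return {
        module: {lib: central[lib] for lib in libs}
        for module, libs in module_deps.items()
    }


print(find_conflicts(PER_REPO))  # {'protobuf': ['3.21.12', '4.25.1']}
print(find_conflicts(resolve_all(MODULE_DEPS, CENTRAL_VERSIONS)))  # {}
```

In a real monorepo, the central table is the build tool’s own dependency declaration rather than a Python dictionary, but the principle is the same: a version is changed in exactly one place, and every module picks up the change at the same commit.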
Simultaneous release
Whenever a cross-cutting change touches more than one microservice, some form of simultaneous release of several modules is needed. Such a process requires not only synchronizing versions between the changed microservices but usually also updating the versions that other microservices depend on. Multiple, typically complex, strategies exist for doing this without risking system breakage.
The monorepo setting can eliminate much of this release coordination complexity. It ensures that each service uses the up-to-date versions of other services, and integration tests verify that all those versions are compatible. This approach considerably reduces the cost of system maintenance, particularly when it comes to coordinating simultaneous releases.
Synchronized versions also significantly improve system security: a fix to a shared component, such as a patched dependency, reaches every service at the same commit instead of waiting for each repository to adopt it. Aligned service versions do not mean that you can ignore backward compatibility of your services in a monorepo; they simply give you better options for managing and testing the version compatibility of your deployments, while eliminating some classes of errors by design.
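One way monorepo tooling handles cross-cutting changes is by computing, from the dependency graph, every target a change can affect, so that all of them are rebuilt and tested together at the same commit. The Python sketch below shows that reverse-dependency computation on a small graph; the target labels and the `reverse_deps` and `affected_by` helpers are hypothetical. Bazel’s query language exposes the same idea through its `rdeps` function, but this code is a simplified model, not Bazel’s implementation.

```python
from collections import deque

# Hypothetical dependency graph: each target lists the targets it depends on.
DEPS: dict[str, list[str]] = {
    "//libs/auth": [],
    "//libs/proto": [],
    "//services/billing": ["//libs/auth", "//libs/proto"],
    "//services/orders": ["//libs/proto"],
    "//services/gateway": ["//services/orders", "//libs/auth"],
    "//tests/integration": ["//services/billing", "//services/gateway"],
}


def reverse_deps(deps: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the graph: map each target to the targets that depend on it directly."""
    rdeps: dict[str, list[str]] = {target: [] for target in deps}
    for target, direct in deps.items():
        for dep in direct:
            rdeps[dep].append(target)
    return rdeps


def affected_by(changed: set[str], deps: dict[str, list[str]]) -> set[str]:
    """Return the changed targets plus everything that transitively depends on them."""
    rdeps = reverse_deps(deps)
    affected = set(changed)
    queue = deque(changed)
    while queue:  # breadth-first walk over reverse edges
        for dependent in rdeps[queue.popleft()]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


print(sorted(affected_by({"//libs/proto"}, DEPS)))
# ['//libs/proto', '//services/billing', '//services/gateway', '//services/orders', '//tests/integration']
```

Changing `//libs/proto` pulls in both services that use it directly, the gateway that depends on one of them, and the integration tests, while `//libs/auth` is correctly left out.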
Code review
Each code change must be reviewed before being merged into the main codebase. This is particularly important for cross-cutting changes, as they are more likely to cause breakages. In a multi-repository setting, the reviewer must assess the changes in each repository separately. This creates a mental burden, causes friction, and is a potential source of human error.
A monorepo, by its nature, provides a unified view of all changes, making multi-module modifications as easy to review as changes within a single module. By reducing friction, it improves developer efficiency and system robustness.
Code reuse
Several of the previous points showed how monorepos facilitate wider changes in the codebase, often reaching deep into the dependency graph. This gives developers more confidence and makes it easier to spread knowledge about the system's deeper workings across the broader team. It also makes it much easier to reuse and generalize code.
Much has been written about the benefits of code reusability. The most notable one is increased developer efficiency: multiple sub-teams no longer need to repeat the same task or copy-paste code (which they often do to avoid introducing new repository dependencies and to reduce the burden of coordinated releases). Another, no less important, benefit is the ability to apply the same security standards across the whole ecosystem.
Potential challenges in monorepos
While there is a wide array of benefits to monorepo as a means for code organization, there are also some potential challenges.
- The size of the codebase usually correlates with build configuration times. As monorepos contain all microservices of the system and usually all tools required to build and manage the project, they can contain thousands of modules. No matter what build tool the project uses, the first step of each build is the so-called configuration phase: collecting metadata about all the modules and resolving dependencies. For some build tools (including sbt), the memory and time complexity of these tasks can grow quadratically with the size of the project, which is why many build tools are unsuitable for the monorepo setting.
- Monorepos are commonly heterogeneous, i.e., they contain code written in multiple languages, with different execution modes, managed by various tools. This can lead to very complex relationships between the tools used in the project and to incoherent dependency management.
- The “contact surface” of a monorepo project can overwhelm developers joining the team. This can prolong onboarding, during which the new developer needs constant assistance from more experienced colleagues and is less productive.
- If a project has been developed using multiple interconnected repositories, changing course and migrating it to a monorepo can be a real challenge. The migration process may be lengthy and require specialized knowledge. Moreover, the benefits of the monorepo approach become visible only after the migration is complete.
While these challenges may seem discouraging, choosing the right tools can overcome them.
Capabilities of tools for monorepos
After long experience with migrating and maintaining monorepo projects, we at VirtusLab believe that the most crucial factor determining productivity in a monorepo setting is the choice of tool to manage the project. The differences in the capabilities of these tools can be staggering.
- Full reproducibility underpins many of the benefits of monorepos. It means that the same state of the codebase always produces exactly the same artifacts: identical, with the same checksum, no matter the environment. While many build tools can achieve this with some configuration, the setup is fragile and easy to break, and only a small minority of tools guarantee full reproducibility.
- Tools vary widely in how they use caching. In some, cache hits are not guaranteed, so the time savings are a matter of luck. Worse, in some tools, especially those without reproducibility guarantees, caching can introduce flakiness into test suites.
- It is convenient to offload work from the developer’s computer to remote machines on demand while sharing parts of the build state. In some build tools, this requires a huge effort and often a less-than-clean workaround, while others provide this feature out of the box.
- As with all software, tools for monorepo management differ in terms of their performance. A faster tool means less frustration, more productivity for the developers, and a tighter feedback loop during development.
- Different tools support different ranges of technologies, such as programming languages or test frameworks. Lack of support for even a single technology used in the project can mean a considerable loss of effectiveness and increased friction.
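Reproducibility and caching are closely linked. The minimal Python sketch below assumes a simplified model in which a build action is keyed by a hash of its command line and the contents of its inputs; the `action_key` function, the `ActionCache` class, and `concat_build` are hypothetical and only illustrate the principle behind content-addressed build caches such as Bazel’s. Because the key depends only on content, the same inputs always map to the same cache entry, regardless of the machine or the order in which files are listed.

```python
import hashlib
import json
from collections.abc import Callable


def digest(data: bytes) -> str:
    """Content hash used for both individual inputs and whole actions."""
    return hashlib.sha256(data).hexdigest()


def action_key(command: list[str], inputs: dict[str, bytes]) -> str:
    """Derive a cache key from the command and the *contents* of all inputs.

    Inputs are sorted by path, so the key does not depend on the order in which
    files were listed, only on what they contain.
    """
    manifest = {
        "command": command,
        "inputs": [[path, digest(content)] for path, content in sorted(inputs.items())],
    }
    return digest(json.dumps(manifest).encode())


class ActionCache:
    """A toy in-memory cache mapping action keys to output bytes."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}
        self.hits = 0

    def run(
        self,
        command: list[str],
        inputs: dict[str, bytes],
        build: Callable[[dict[str, bytes]], bytes],
    ) -> bytes:
        key = action_key(command, inputs)
        if key in self._store:
            self.hits += 1  # reuse the stored output instead of rebuilding
            return self._store[key]
        output = build(inputs)
        self._store[key] = output
        return output


def concat_build(inputs: dict[str, bytes]) -> bytes:
    """A deterministic 'build' step: output depends only on input contents."""
    return b"".join(content for _, content in sorted(inputs.items()))


cache = ActionCache()
srcs = {"src/a.txt": b"hello ", "src/b.txt": b"world"}
first = cache.run(["concat"], srcs, concat_build)
# Same contents listed in a different order: same key, so a cache hit.
second = cache.run(["concat"], {"src/b.txt": b"world", "src/a.txt": b"hello "}, concat_build)
# One input changed: new key, so the action runs again.
third = cache.run(["concat"], {**srcs, "src/b.txt": b"there"}, concat_build)
print(first, second, third, cache.hits)  # b'hello world' b'hello world' b'hello there' 1
```

Such a cache is only trustworthy if the build step is deterministic. If a step embedded, say, a timestamp in its output, a cached artifact and a freshly built one would differ for the same key, which is exactly how caching without reproducibility guarantees turns into flaky builds and tests.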
Bazel, as the recommended tool
As our tool of choice for maintaining monorepos, we can confidently recommend Bazel. It excels in all the capabilities listed in the previous section. With a proper configuration, it guarantees fully reproducible builds, and it actively makes it hard to configure a build in a way that breaks reproducibility. Bazel has state-of-the-art caching infrastructure, supports remote execution out of the box, and is exceptionally performant. Last but not least, it supports a wide range of programming languages and technologies, both through built-in rules and third-party extensions.
Bazel also helps overcome the potential monorepo challenges mentioned earlier.
- Thanks to its excellent caching and well-designed configuration architecture, it keeps the configuration phase exceptionally fast, even as the codebase grows.
- It solves the problem of tool heterogeneity with a well-designed, flexible system of rules and targets that allows for unified dependency management.
- Thanks to so-called project views, available in Bazel’s IDE integrations, developers can be introduced to the project gradually, exposed only to the parts of the system relevant to their current work. This keeps them from being overwhelmed and increases their productivity early on.
- While migrating from other tools to Bazel is undeniably a significant undertaking, there are solutions that make it manageable. For example, for Rust, Bazel supports gradual migration by initially keeping the dependency configuration in ordinary Cargo files. This lets developers keep their familiar habits during the migration period while still enjoying the benefits of the monorepo.
To sum up, our experience at VirtusLab lets us say with confidence that many larger microservice-based projects can benefit significantly from migrating to a monorepo, and that Bazel is the single best tool for monorepo management.





