Code organization is easy when work is done in a small team, and the codebase is not yet extensive. However, as the project's scope grows and new sub-teams appear, the question of how to subdivide and manage the growing codebase becomes increasingly important.
For many product owners and developers alike, the most intuitive approach would be to split the codebase into repositories roughly representing modules. Each sub-team takes the role of an owner of a single repository, managing it and driving its development. Separation of modules ensures that it is easy to onboard new developers as they only have initial contact with the small surface of the project space. Testing single modules is also easier and faster in this setting.
However, separate repositories need to be interconnected. Depending on the technology, multiple approaches to that challenge exist. They range from regularly pushing build artifacts to some form of organizational artifact repository to using git submodules or their equivalent in other version control systems. No matter what approach is used, the problem of synchronizing sub-module versions is hard to overcome, and releases become complex processes. Overall, though, the biggest drawback of separate repositories is how hard it is to make any change that touches multiple modules.
Multiple successful companies developed an alternative approach to separate repositories. They are using a so-called monorepo, which is a single repository containing both the entirety of the project’s code responsible for the business logic and the code and configuration for tools managing the development, build, and deployment processes.
Let’s briefly discuss the potential benefits of a single unified monorepo for bigger projects. This will also demonstrate some shortcomings of the multiple repositories approach.
Integration tests
In the monorepo setting, integration tests are easy to set up and maintain. Setting them up in a large monorepo is no more complex than in a small project containing only a single module. In comparison, integration tests in the multi-repo setting probably need to reside in a separate repository, and each time they run on CI, they need to load the correct versions of all the tested repositories and their dependencies. This creates the potential for errors resulting from version mismanagement and increases friction, resulting in a longer feedback loop between code changes and testing their correctness.
Monorepos improve business logic correctness and system security by making integration tests easy to maintain and keeping the feedback loop of code changes short.
Dependency management
In a microservices environment, it is expected that microservices have their own external dependencies. It is also not unusual that there is considerable overlap between sets of dependencies of different modules. Each microservice manages its dependencies’ versions. Separate version definitions for each microservice can quickly lead to version mismatches and dependency conflicts.
Microservice architecture is inherently somewhat resilient to that problem, but as the versions drift apart, nothing signals the trouble that may be building up at the API level. Misaligned versions can unexpectedly break the protocol between microservices and, as a result, cause errors that are hard to pinpoint and fix. Version errors can be avoided by setting up a firm dependency version policy. However, such a policy is always hard to enforce and maintain across disconnected repositories.
In a monorepo, the dependency policy is usually hardwired, and its enforcement is automatic. This is achieved by having a central configuration file where all the dependency versions are stored, so each microservice is forced to use the same versions. Such a constraint cannot practically be replicated in the multi-repository setting, as the version configuration would have to live in a separate repository. Keeping every downstream project on the up-to-date version of that repository would be prohibitively complicated.
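To make the idea of a central version file concrete, here is a minimal sketch of the kind of check such a file makes possible. It is an illustration only, not how Bazel or any particular tool implements it. The library names, version numbers, service names, and the `find_violations` helper are all hypothetical.

```python
# Hypothetical central policy: exactly one pinned version per external dependency.
CENTRAL_VERSIONS = {
    "lib_rpc": "1.3.0",
    "lib_proto": "4.2.1",
    "lib_http": "2.4.0",
}

# Hypothetical per-service declarations, as they might look when every
# service manages its own dependency versions.
SERVICE_DEPENDENCIES = {
    "billing": {"lib_rpc": "1.3.0", "lib_proto": "4.2.1"},
    "orders": {"lib_rpc": "1.1.0", "lib_http": "2.4.0"},
    "search": {"lib_proto": "4.2.1", "lib_xml": "5.1.0"},
}


def find_violations(central, services):
    """Return (service, dependency, declared, expected) for every mismatch.

    `expected` is None when the dependency is missing from the central policy.
    """
    violations = []
    for service, deps in sorted(services.items()):
        for name, declared in sorted(deps.items()):
            expected = central.get(name)
            if declared != expected:
                violations.append((service, name, declared, expected))
    return violations


if __name__ == "__main__":
    for service, name, declared, expected in find_violations(
        CENTRAL_VERSIONS, SERVICE_DEPENDENCIES
    ):
        print(f"{service}: {name} is {declared}, policy says {expected}")
```

In a monorepo, a check like this becomes unnecessary in the common case, because services refer to the central entry instead of declaring their own version. The sketch mainly shows what has to be policed by hand when the declarations live in separate repositories.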
Simultaneous release
Whenever a crisscrossing change in the codebase touches more than one microservice, some form of simultaneous release of more than one module is needed. Such a process requires not only the synchronization of versions between changed microservices but usually also updating versions that other microservices are using. Multiple, typically complex, strategies exist for performing that task without the risk of system breakage.
The monorepo setting can eliminate release coordination complexity to a great extent. It guarantees that each service uses the up-to-date versions of other services. Moreover, integration tests ensure that all those versions are compatible. Such an approach considerably reduces the costs of system maintenance particularly when it comes to coordinating simultaneous releases.
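One reason a single repository simplifies coordination is that the whole dependency graph is visible in one place, so a tool can work out which services a change reaches and rebuild and retest exactly those. The sketch below illustrates that computation under simplifying assumptions. The module names and the `affected_by` helper are hypothetical, and real build tools do this at a much finer granularity.

```python
from collections import deque

# Hypothetical dependency graph: each module maps to the modules it depends on.
DEPENDS_ON = {
    "common": [],
    "auth": ["common"],
    "billing": ["auth", "common"],
    "orders": ["billing"],
    "search": ["common"],
    "reports": [],
}


def affected_by(changed, depends_on):
    """Return the changed modules plus everything that transitively depends on them."""
    # Invert the graph so we can walk from a module to its dependents.
    dependents = {module: [] for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(module)

    # Breadth-first walk over the inverted graph.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return sorted(affected)


if __name__ == "__main__":
    # A change to "auth" reaches "billing" and, through it, "orders".
    print(affected_by(["auth"], DEPENDS_ON))
```

In a multi-repository setup, the same information is spread across version declarations in several repositories, which is part of why coordinated releases are harder there.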
Synchronized versions also significantly improve system security. Using aligned service versions doesn’t mean that you can ignore backward compatibility for your services in a monorepo; it simply provides better options for managing and testing the version compatibility of your deployments while at the same time eliminating some classes of errors by design.
Code review
Each code change must be reviewed before being merged into the main codebase. This is particularly important for the crisscrossing changes, as they have a higher potential for creating breakages. In a multi-repository setting, the reviewer must assess changes in multiple repositories separately. This creates a mental burden, causes friction, and is a potential source of human error.
A monorepo, by its nature, provides a unified view of all the changes, making multi-module modifications as easy to review as ones within a single module. By reducing friction, it improves developer efficiency and system robustness.
Code reuse
Several previous points mentioned how monorepos facilitate wider changes in the codebase, often reaching deep into the dependency graph. This gives developers more confidence and makes it easier to spread knowledge about the system's deeper workings to the broader team. It is also much easier to reuse and generalize the code.
Much has been written about the benefits of code reusability. The one most worth mentioning is increased developer efficiency: multiple sub-teams no longer need to repeat the same task or copy-paste code (a workaround used to avoid introducing new repository dependencies and to reduce the burden of coordinated releases). Another, no less important, is the possibility of applying the same security standards across the whole ecosystem.
As our tool of choice for maintaining monorepos, we can confidently recommend Bazel. It excels in all the capabilities mentioned in the previous section. For any proper configuration, it guarantees full reproducibility of builds. What's more, it actively makes it hard to configure a build in a way that breaks reproducibility. Bazel has a state-of-the-art caching infrastructure. It supports remote execution out of the box. It is exceptionally performant. Last but not least, it supports various programming languages and technologies, through both built-in rules and third-party extensions.
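Reproducibility and caching are closely linked: if a build step's output depends only on its declared inputs, the output can be stored under a key derived from those inputs and reused whenever the same key appears again. The sketch below illustrates that general idea in a deliberately simplified form. It is not Bazel's implementation, and `action_key`, `BuildCache`, and the fake compile step are hypothetical names introduced for illustration.

```python
import hashlib


def action_key(command, inputs):
    """Derive a cache key from the command and the content of every input file."""
    digest = hashlib.sha256()
    digest.update(command.encode("utf-8"))
    for name in sorted(inputs):  # sort so that input ordering does not affect the key
        digest.update(b"\0" + name.encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(inputs[name]).digest())
    return digest.hexdigest()


class BuildCache:
    """In-memory stand-in for a build cache keyed by action_key."""

    def __init__(self):
        self._store = {}
        self.executions = 0  # how many times an action actually had to run

    def run(self, command, inputs, action):
        key = action_key(command, inputs)
        if key not in self._store:
            self.executions += 1
            self._store[key] = action(inputs)
        return self._store[key]


if __name__ == "__main__":
    # A fake "compile" step that just concatenates its inputs.
    def fake_compile(inputs):
        return b"".join(inputs[name] for name in sorted(inputs))

    cache = BuildCache()
    sources = {"a.c": b"int a;", "b.c": b"int b;"}
    cache.run("cc", sources, fake_compile)
    cache.run("cc", sources, fake_compile)  # same inputs: served from the cache
    print(cache.executions)
```

The scheme is only sound if the action really is a function of its declared inputs, which is why a tool that makes undeclared inputs hard to introduce can cache aggressively.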
Bazel helps to overcome the aforementioned potential challenges of a monorepo:
- Thanks to its excellent caching and great configuration architecture, it is exceptionally performant in terms of build configuration.
- It solves the problem of tool heterogeneity by having a well-designed and flexible system of rules and targets that allows for unified dependency management.
- Thanks to a system of so-called project views, developers are gradually introduced to the project, exposing them to only the parts of the system relevant to their current work. This combats overwhelm and increases productivity early on.
- While it cannot be overstated that migrating from other tools to Bazel is a significant undertaking, there are solutions to make it bearable. For example, for Rust modules, Bazel offers gradual migration by initially having its dependency configuration stored in ordinary Cargo files. This allows developers not to feel alienated by the new tool and to keep their old habits for the migration period while still enjoying the benefits of the monorepo.
To sum it up, our experience at VirtusLab allows us to confidently say that many bigger microservice-based projects can benefit significantly from migrating to the monorepo, and Bazel is the single best tool for monorepo management.