What is the cost of shared code?

Jason Conway-Williams
7 min read · Feb 22, 2021

Today was the first time I took part in a conversation about the differences, benefits and pitfalls of monorepos and multirepos. To be honest, I didn’t contribute much, since I’ve only used a monorepo a couple of times over the past few months. The conversation came about during a squad planning/retro meeting where we were discussing ways to remove bottlenecks in the development, test and release process. It started when I suggested that a bottleneck exists in the test and release stages because we have eight developers working in two repositories (one being a monorepo), and the QA team of two people can only merge, test and release one change at a time from each repository. My suggestion was to spend some time splitting the monorepo into single-project repositories (multirepos) to remove the bottleneck: developers could then work in a repo on their own (most of the time), meaning that QAs could merge, test and release more than one change at a time. Two of my colleagues sang the praises of monorepos and declared that we will be investing more in monorepos in the future, not moving away from them.

Full disclosure: although we have a monorepo and use Lerna to manage the packages within the repo, we currently do not have a fully functioning monorepo CI/CD mechanism. Although the monorepo Jenkins job does build and deploy npm packages to our npm repository, the build and deploy job for this monorepo deploys a single “Dockerised” service. In a true monorepo world, the monorepo would be configured with a Jenkins pipeline that could individually build and deploy each project in the monorepo based on which files have changed, so that only the changed sub-projects are built, versioned and deployed.
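
To make the “based on which files have changed” idea concrete, here is a minimal sketch of the first step such a pipeline would need: turning a list of changed file paths into the set of sub-projects to build. The function name `changedPackages`, the `packages/<name>/` layout and the example paths are all assumptions for illustration, not our actual setup. It also does not follow dependencies between sub-projects, and Lerna’s own `lerna changed` command covers similar ground.

```typescript
// Hypothetical helper, not part of Lerna or Jenkins.
// Assumes every sub-project lives at <packagesRoot>/<name>/...
function changedPackages(changedFiles: string[], packagesRoot: string = "packages"): string[] {
  const prefix = packagesRoot + "/";
  const seen: Record<string, true> = {};
  for (const file of changedFiles) {
    // Ignore files outside the packages directory (root README, CI config, ...).
    if (file.indexOf(prefix) !== 0) continue;
    const rest = file.slice(prefix.length);
    const slash = rest.indexOf("/");
    // A file sitting directly in the packages directory belongs to no sub-project.
    if (slash > 0) seen[rest.slice(0, slash)] = true;
  }
  return Object.keys(seen).sort();
}

// Example input (made-up paths), e.g. what `git diff --name-only` might report for a PR.
const toBuild = changedPackages([
  "packages/customer-api/src/routes.ts",
  "packages/customer-api/package.json",
  "packages/web-ui/src/App.vue",
  "README.md",
]);
console.log(toBuild); // ["customer-api", "web-ui"]
```

A pipeline could then loop over the returned names and run the build, version and deploy steps for each one, skipping everything else.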

My colleagues’ views on monorepos piqued my curiosity and had me wondering why they thought so highly of monorepos. At face value, without doing any research and relying solely on my experience (mainly with multirepos, plus a few months using monorepos), I couldn’t see any significant benefit that would make me want to move to monorepos. The idea of checking out and frequently pulling changes in a large Git project would infuriate me. I started looking for articles explaining the pros and cons of monorepos, and it seems that there are not only many articles on the subject but also many conflicting opinions, including my own.

The articles I’ve come across discuss the problems of scaling large Git repositories and large code bases, but what would also worry me is the possibility of merge conflicts and confusion around Git commits and the Git management of the repository.

What problem do monorepos solve?

Once I’d got through everything I consider to be cons, I thought to myself, “What problem do monorepos solve?” Again, the articles I have come across suggest the pros offered are:

  • Developers only need to check out one repository: But doesn’t this mean that they have to wait longer for the checkout to complete? Don’t they also have to wait longer for updates to complete? They could perform a sparse checkout to check out only what they need to work on, but this is more effort for minimal benefit. Not really a problem to solve.
  • Atomic commits and overall project status: Looking through the commit history of a monorepo for all changes within one sub-project could become extremely difficult and time-consuming. Not really a problem solved.
  • Multiple pull requests: The argument here is that a feature request could span multiple repos, creating multiple pull requests, whereas a feature change in a monorepo would create a single pull request, which is easier to track and manage. I don’t see having multiple pull requests as a problem. The QAs that I’m friends with prefer to know what is being merged in and where, which is also an indication of which services need to be tested and deployed. That’s not really a counter-argument if you have a fully automated CI/CD monorepo build and deployment pipeline in place but, again, not really a problem to solve.
  • Monorepos reduce configuration effort: I do agree with this to some extent. We have all been through the pain of repeatedly setting up projects from scratch but, because we’ve all been through this pain multiple times, we have all probably created mechanisms to scaffold projects based on project types and have utilised external tools such as Vue CLI to ease the process. I don’t think monorepos fully solve this problem either, though. The project setup within a monorepo can become extremely complex if not managed and maintained properly. Also, how often do you set up new monorepos? Do you have to scaffold the creation and structure of monorepos? Again, not really a problem solved.

My response to the above benefits might seem negative, but that’s not my intention. I was trying to understand THE problem being solved by monorepos, but as much as monorepos solve minor issues with multirepos, they introduce other problems, such as versioning, build and deployment issues.

I was also trying to understand how we got to multirepos and then to monorepos. An effort to move from monolith services to micro-services also meant that we were moving from one repository to multiple repositories. Monorepos seem to be the solution to this: moving the monolith from our code in the deployed estate to our code at rest, in the repo. It makes sense that you might have initially started off with a monolith REST service containing 20 endpoints providing CRUD functionality for customer data. Converting this to micro-services would introduce many repositories, in other words multirepos. It therefore makes sense that you might use a monorepo to house all of the micro-services created from converting the monolith.

But what about the cost of shared code?

So I think I got to the origin of the problem and was able to see how we got to where we are, moving to a monorepo paradigm, but there was a question that I still hadn’t addressed. During the initial conversation, a colleague mentioned that one of the main benefits of our monorepo was the shared code base. We have a number of sub-projects that are used by many of the other sub-projects. This shared code consists of build configuration as well as helper and utility classes/methods. Housing the shared code base in the monorepo allows all of the other sub-projects to naturally inherit changes, rather than having to set new version numbers as they would when a shared code base is implemented as an external dependency.

This isn’t as clean and simple as it sounds, though. Let’s say you have sub-projects A, B, C, D and E, which are built and deployed as Docker containers. Projects A, B, C and D use a shared code base stored in sub-project F. A QA finds a bug in sub-project F, so a developer fixes the bug and creates a PR. You need to configure the PR builder to build projects A, B, C and D because the shared code base has been modified. The QA also needs to know that services A, B, C and D need to be tested because the shared code base has been modified. Is it the QA’s responsibility to know this, or is it up to the developer to inform the QA? The build and deploy job also needs to be configured to build and deploy services A, B, C and D because the shared code has changed. Another scenario is that a change request has come in for project A and, in the process of implementing it, a change has to be made to the shared code base. This now affects the other dependent services because the shared code base has changed. The previously mentioned configuration needs to be in place for the PR builder, master builder and deployment pipeline, and the QA needs to know that this change affects all four services. This isn’t a simple configuration or process, and it can become quite convoluted and difficult to manage.
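
The knowledge that the PR builder, the deploy job and the QA all need in this scenario is essentially “which projects are affected by this change?”. As a rough sketch, assuming the dependencies between sub-projects are written down somewhere as a simple map, that question can be answered by walking the map backwards from whatever changed. The `dependsOn` map and the `affectedProjects` function below are hypothetical and only mirror the A to F example above.

```typescript
// Maps each project to the projects it depends on.
type DependencyMap = Record<string, string[]>;

// Hypothetical map mirroring the example: A, B, C and D use shared project F; E does not.
const dependsOn: DependencyMap = {
  A: ["F"],
  B: ["F"],
  C: ["F"],
  D: ["F"],
  E: [],
  F: [],
};

// Returns the changed projects plus everything that depends on them,
// directly or transitively, sorted alphabetically.
function affectedProjects(deps: DependencyMap, changed: string[]): string[] {
  const affected: Record<string, true> = {};
  const queue = changed.slice();
  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (affected[current]) continue; // already visited
    affected[current] = true;
    // Queue every project that lists `current` as a dependency.
    for (const project of Object.keys(deps)) {
      const uses = deps[project] || [];
      if (uses.indexOf(current) !== -1 && !affected[project]) queue.push(project);
    }
  }
  return Object.keys(affected).sort();
}

console.log(affectedProjects(dependsOn, ["F"])); // ["A", "B", "C", "D", "F"]
console.log(affectedProjects(dependsOn, ["E"])); // ["E"]
```

Even in this toy form, somebody has to keep the map accurate and wire the result into the PR builder, the master builder and the deployment pipeline, which is exactly the overhead described above.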

I’ve had conversations about shared code bases and how they should be managed before, with another colleague, when we were discussing splitting a large repository into smaller ones. Specifically, what would we do with the shared code? How would we manage the shared code base as a module? When we make a change to the shared code, how do we roll that out to all of the dependent projects? How often do we modify the shared code? This is where the question comes in: “What is the cost of shared code?” and, more importantly, is it worth it? Is the process involved in providing that shared code base to multiple projects as an external dependency worth it, based on two factors: the size of the shared code and how often it is modified?

If the shared code has been modified once over the past two years, then you might assume that it will only be modified once over the next two years, and therefore it makes sense for this shared code base to be provided as an external dependency. If the shared code is extremely small, say one JavaScript file consisting of ten lines, then it might make sense just to copy the shared code into each dependent project, duplicating it. This might sound terrible and make your OCD scream out, but let’s ask the questions again: “What’s the cost of shared code?” and “Is it worth it?” Is it worth placing that single JavaScript file or Java class in a repo on its own and providing it as an external dependency? For something that small, it’s more efficient to duplicate it in each dependent project. I use these questions whenever I find myself splitting up repositories and creating shared code dependency projects. Based on this, I don’t think that a shared code base should be used as an argument to implement monorepos.

Conclusion

I’m neither for nor against monorepos. I do believe that they shouldn’t be used across the board but instead on a case-by-case basis. As soon as a monorepo becomes complicated and becomes a blocker, rethink its structure and process, and maybe convert it to simpler multirepos. In my experience, stakeholders care about releasing quality, frequently. I’ve found that the best way to do this is to do two things: simplify and automate. As long as monorepos can be implemented as simply as possible, and the process to develop, build, test and deploy each service within the monorepo is simple and automatic, I don’t see why they can’t be used.
