At Unity, we’ve recently been spending a lot of effort looking at our build times. One of the topics that came up is lumped builds. I realized during all of this (and while doing my own research, of course) that many people lack a clear understanding of what lumped builds gain you, and what they cost in return.
What are lumped builds?
Lumped builds are known by a few different names. I’m sure I don’t know all of them, but they’re sometimes called bulk builds or quite commonly called unity builds (no, nothing to do with us).
Essentially, lumped builds mean including multiple source files in one SCU (single compilation unit; what the C++ standard calls a translation unit). In other words: “Please compile ten (or fifteen, or however many) files at the same time instead of one file at a time”.
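Concretely, a lump is usually just a generated file that textually #includes the real source files, and the build system compiles the lump instead of the individual files. A sketch of what such a generated file might look like (the file names here are hypothetical):

```cpp
// Lump_03.cpp -- generated by the build system.
// Compiling this one file replaces compiling the three files below
// individually; the preprocessor pastes their contents together into
// a single compilation unit.
#include "Renderer.cpp"
#include "Camera.cpp"
#include "Mesh.cpp"
```

Which source files end up in which lump, and how many files go into each lump, is typically decided by the build system rather than by the developer.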
You can enable lumped builds in build systems that support them. To give some examples, Jam has support for them (as lumped builds) and FastBuild has support for them as unity builds.
Why might I want lumped builds?
The main goal of lumped builds is to speed up build times: with multiple source files in one SCU, shared headers are parsed once per lump instead of once per file, and there are fewer compiler invocations overall. Essentially, they’re something you start looking at when your build times get long…or at least long enough to bother you.
Why might I not want lumped builds?
Lumped builds aren’t a silver bullet. They do come with some obvious downsides.
First of all, you have to remember that the compiler sees all files in the lump as if they were one file. Therefore, a using-namespace directive in one file applies to all files in the lump, not only the file it’s written in. Similarly, developers usually run into issues with file-scope statics quickly: C++ normally allows you to have two static functions (or variables) of the same name as long as they’re in separate files, since each one has internal linkage within its own compilation unit. With lumped builds, this won’t work (if the files are in the same lump).
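To make the file-scope static problem concrete, here is a sketch (file and function names are hypothetical) of two files that would compile fine separately but collide once they land in the same lump; the contents of both files are shown here as they would appear to the compiler after lumping:

```cpp
// --- contents of FileA.cpp (hypothetical) ---
static int Helper() { return 1; }   // internal linkage: private to this file
int FuncA() { return Helper(); }

// --- contents of FileB.cpp (hypothetical) ---
// Compiled on its own, FileB.cpp could also define its own
//     static int Helper() { return 2; }
// In a lump, both definitions land in one compilation unit and the
// compiler rejects the redefinition. Renaming is the usual workaround:
static int HelperB() { return 2; }
int FuncB() { return HelperB(); }
```

The same reasoning applies to file-scope static variables and to anonymous namespaces, since both exist to give names internal linkage per compilation unit, and lumping merges those units.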
These issues are simple and straightforward; there isn’t much you can do about them if you want to use lumped builds. But as long as you’re aware of them, they don’t have to be a significant problem.
But there’s a dark side.
If you are only running lumped builds, everything might seem fine and rosy at first. However, since you generally trust the build system to be in charge of lumps, the files allocated to each lump will change over time (as developers add or remove files, change per-file compiler flags, and so on). This means a developer can add a file (which inadvertently changes the lumping allocation) and suddenly (and frustratingly!) get a compiler error she didn’t create: a previous developer forgot an include, or forgot to extern something, but the build accidentally worked anyway because the relevant files happened to be in the same lump.

Additionally, this same kind of codebase degradation causes issues with IDE integration. Your IDE doesn’t know your build system sees these N files as one file, so it will be unable to reliably provide code completion and navigation as your codebase degrades further. In theory, this phenomenon would also affect static analysis tools (though I haven’t personally tried running static analysis tools on such a codebase).
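The “accidentally works” failure mode looks something like this (file names are hypothetical; the two files are shown here in lump order, as the compiler would see them):

```cpp
// --- contents of StringUtils.cpp (hypothetical) ---
#include <string>   // this file properly includes what it needs

std::string Greet() { return "hi"; }

// --- contents of Logger.cpp (hypothetical) ---
// Logger.cpp forgot to #include <string>, and it never declares Greet()
// either. Compiled on its own it would fail; lumped after StringUtils.cpp
// it builds anyway, because the include and the definition of Greet()
// happen to sit textually above it in the lump. The moment the build
// system shuffles these two files into different lumps, the errors surface
// in Logger.cpp, even though nobody touched it.
std::string Shout() { return Greet() + "!"; }
```

This is why the error appears to come out of nowhere: the code was broken all along, and the lump allocation was merely hiding it.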
Is there anything I can do to prevent all that?
I think the best thing you can do is to not go down the path of having only lumped builds. Even if you have lumped builds turned on for developers (for the sake of their iteration time), make sure that at least your CI server or nightly builds are building with lumping turned off. This preserves the non-lumped build path and provides protection against the codebase degradation that can otherwise happen.