How Unity’s Build Team Developed a Custom Continuous Integration Solution

Most people reading this know that I lead the Build & Release Engineering team at Unity Technologies. On Unity’s company blog, I’ve posted about our in-house build automation / CI solution, Katana. I wanted to write more about this project and go over a few key reasons why I believe it has been a success for Unity.

If you haven’t read the post I wrote on Unity’s blog about the project’s development, go do that first. I’ll wait.

Katana has matured and changed quite a lot even since that blog post was written.

Obligatory screenshot of Katana’s current UI:

It’s hard for me to believe, but it’s been just over two years since we started the Katana project at Unity, and we’ve been using Katana in production for almost one year. Our most recent work on Katana has focused on making it support multiple Unity mainlines in a scalable way. We are now in the process of moving our older mainlines, which are still built and tested on our old TeamCity server, over to Katana.

At this point, I think we have enough evidence to say that Katana has been a success for Unity. I’ll reflect on a few reasons why this is the case.

We did requirements gathering.

Before the Katana project was even defined, I spent a lot of time researching. The process went roughly like this:

I made a list of all the ways in which the current solution was failing us. This included both the things the current solution wasn’t doing correctly (e.g., poor performance, broken features) and the things it wasn’t doing at all (missing features).
I researched existing, off-the-shelf solutions and compiled a list of the ones that seemed viable.
I cross-checked the list of requirements against the features of the possible solutions.

At this point, I realized that there was nothing off-the-shelf that came close to really fitting our requirements. I had also learned that it was becoming more and more common for large engineering organizations to invest heavily in custom infrastructure.

I settled on pursuing a custom solution built on top of Buildbot. I knew Buildbot had been used successfully in many large organizations, including open-source projects (e.g., Mozilla and the Chromium project). Development at Unity actually works like a large open-source project in many ways, and we often run into the same sort of scalability issues you would expect from a very large open-source project.

Since Buildbot is more of an open-source CI framework than a finished product, and it was missing many of the features we needed, I knew we would have to do a lot of development to end up with a workable solution.

Conclusion: Taking the time to do a real set of requirements gathering was key to getting us off to a good start.

We did prototyping.

Around this time, we hired another developer, Maria, onto our team. She already had some experience with Buildbot from her previous job, which was lucky for us. I knew the project was going to take a long time (I guessed anywhere from 6 months to a year, but I didn’t have enough information to estimate accurately), so before committing to that much work, I wanted to do some prototyping and get a better understanding of Buildbot’s current capabilities. Maria and I made a plan to check our requirements against those capabilities and to build a proof of concept that would test scalability.

(This was also the phase when several people in the company, including our CTO, tried to gently talk me out of pursuing this plan. I didn’t realize it at the time, but I now think I may have been the only person at Unity who didn’t think going down this road was a mistake, save possibly my husband, who also happens to work there, and Maria, who was working on my team. Luckily, they let me try anyway, probably thinking I would fail.)

This phase lasted around 2 months. By the end of it, I was convinced that we were going down the right road, and, with a better understanding of our requirements and Buildbot’s capabilities, I laid out a project plan that would take approximately 8 months.

During this phase, we also decided to forgo some nice user-facing features in order to focus on performance.

Conclusion: Taking the time to do the prototyping (even knowing it was prototyping and that much of the code would be abandoned) was key to getting us off to the right start and helping us set our expectations.

We used an open-source solution instead of starting completely from scratch.

Using any existing solution is likely to constrain you in some way, because the software wasn’t designed around your exact needs, and that does have a cost. Still, the development time we saved by building on an open-source framework is more than I can quantify.

Throughout development, and even now, we also benefit from bug fixes and improvements from upstream.

Finally, participating in open-source projects is an excellent way to meet potential job candidates. I have now hired six full-time developers at Unity, and I met two of them through open-source projects we were participating in (one of them being Buildbot).

Conclusion: Leverage the power of open-source. Use open-source tools, contribute to them, and benefit from them.

When plans had to change, we adapted. But carefully.

A couple of months into the development of the project, we realized that implementing a key feature we needed was going to take significantly longer than anticipated. When this sort of thing happens, you can be proactive, or you can be reactive. We were proactive: we immediately re-evaluated our timelines (expanding the project schedule by a few months), updated our project plan, and communicated the new timeline and milestones to the relevant people.

I really believe that doing this was key to keeping us focused and not letting us get lax about deadlines.

Conclusion: Plans sometimes have to change, but don’t let timelines creep; actively re-evaluate them.

We did project management.

The project was carefully managed from start to finish. This entailed:

Requirements gathering
Resource planning with relevant parties (we had to work a lot with our IT department, in particular, to plan the procurement of hardware, and we started these conversations during the prototyping phase)
Design (complete with architecture diagrams)
A project plan with milestones, deliverables, and estimated timelines
Day-to-day task management, Kanban-style, on a Trello board

For a project that took this long and involved this many resources (and thus this much money), I think project management was key. It was important to manage the project in a way that didn’t burden the development process, but I think it would have taken significantly longer without that management.

(Aside: This is an example of why I think this manifesto is one of the worst things ever.)

Conclusion: There’s a difference between writing code and developing software.

We had a direct feedback cycle with our customers.

I’ve decided that one of the best things about being an internal tools team is that your customers work with you and you have a direct line to each other at all times. (Incidentally, one of the worst things about being an internal tools team is that customers work with you and you have a direct line to each other at all times ;-))

We see and talk to our customers in real time every day. This made it easy for us to follow a highly iterative development process, which in turn made prioritizing our tasks straightforward.

Conclusion: Having a quick and direct feedback cycle helps you prioritize day-to-day tasks. This is of course harder if your customers are external, but it’s worth striving for.

We had support, even if that just meant the freedom to fail.

I mentioned before that everyone was skeptical of the project. But they let us push forward, perhaps thinking that we might fail. We also had support from our IT department, which helped us get quotes on hardware and make rollout plans (because we couldn’t decommission our old solution, we had to roll out a new build farm as part of the project), and support from our development directors when it came to transferring teams over. We also had many eager members of R&D who were willing to use Katana, even when it came with a lot of bumps and quirks, all the while giving us feedback to help us make it better.

Conclusion: Anything is possible with enough work, but teams really are more likely to succeed if they have outside support where it’s needed.

So, there you have it. I’m convinced that these seven reasons helped make the difference between a successful project and a potentially expensive failure.

The Build Team at Unity continues to work on Katana, in addition to many other projects (which are also usually named after Japanese weapons), and we now have 4 people involved in the project in some way (whether it’s working on the backend, working on the front end, or implementing the steady stream of configuration requests from members of R&D). I fully believe that Katana will be a core part of development at Unity for quite some time.