CI Pipeline Optimization Service

With a great number of features in production comes a great CI Pipeline to ensure they keep working

Your problem

Your application is doing great. New customers sign up all the time. Your developers keep improving it with new features. You properly maintain your application and write good tests. There are static analysis tools to help improve your security and code quality. But after a few years of hard work this XKCD comic now describes how long your developers have to wait for feedback from the Continuous Integration (CI) Pipeline on whether their code is good enough to deploy to production.

Let’s set some expectations: 95% of your CI runs should give you actionable feedback within 3 minutes. It’s not the end of the world if the full CI pipeline slowly creeps up to 5 minutes, but when it takes longer than this it will start to hurt in a more noticeable way: developers will feel obligated to take increasingly long coffee breaks while they wait for feedback. When it gets really bad, they’ll need to completely switch tasks, which requires rebuilding their mental model of the problem with every switch. The time and energy costs of this are substantial, and the context switching will lead to more shipped bugs or more code review feedback cycles to catch them. The bigger your team is and the more often your developers need to wait for CI feedback, the more time they are collectively wasting by waiting.

If you assume every developer takes a short break after pushing code to CI, you can optimistically discount the first 3-5 minutes of a CI run. Afterwards, every single minute counts. Assuming a developer pushes code to CI 6 times per day, a team of 5 developers will end up collectively waiting 10 hours per month for each additional minute of CI runtime (5 developers × 6 pushes × about 20 working days = 600 minutes). This means a CI runtime of 15 minutes wastes 100 hours or more of developer time. Every. Single. Month. That’s well over half of a full-time developer’s salary wasted by waiting.
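The arithmetic behind these numbers can be reproduced with a few lines of Python. This is only a rough sketch: the function name and parameters are made up for illustration, and the figure of 20 working days per month is an assumption that the text implies but does not state.

```python
# Assumption (implied by the text, not stated): about 20 working days per month.
WORKING_DAYS_PER_MONTH = 20


def monthly_wait_hours(ci_minutes, developers=5, pushes_per_day=6,
                       free_minutes=5, working_days=WORKING_DAYS_PER_MONTH):
    """Hours a team collectively waits per month beyond the discounted break."""
    # Only the minutes after the "free" coffee break count as waste.
    extra_minutes = max(0, ci_minutes - free_minutes)
    return extra_minutes * pushes_per_day * developers * working_days / 60


print(monthly_wait_hours(6))                   # one extra minute: 10.0
print(monthly_wait_hours(15))                  # discounting 5 minutes: 100.0
print(monthly_wait_hours(15, free_minutes=3))  # discounting 3 minutes: 120.0
```

Plugging in your own team size and push frequency gives a first estimate of what each extra minute of CI runtime costs you.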

What would you invest if you could reclaim the time wasted by your best developers? One month’s salary? Two? Six?

The real problem is that in many organizations this waste is invisible to higher management because nobody actively pays attention to it. Sure, the developers might become grumpier and their velocity drops a bit, but isn’t that part of working on a larger application? No! It does not have to be. Waiting too much hurts the morale of great developers. Companies often try to fix lower productivity by (trying to) hire more people, but if you know the new hires would be wasting a good part of their time as well, then you’re only addressing part of the problem without solving it.

The solution

In order to solve a problem, it’s useful to know how big it is. You will need data. For a given month, you need to know how many CI runs were started, how long they took and what their results were. Which tools are taking the most time? Often the slowest factor is automated tests, so track the number of tests that are executed. Track success/failure and retries due to brittleness. You can pre-define thresholds for when to look at improving things:

  • Under 3 minutes is the sweet spot, which gives you some breathing room to have it go up without hurting your productivity.
  • At 5 minutes is when you create a low priority ticket to look at it when able: you’re wasting a few hours per month, so the return on investment for addressing this is lower than most other things your developers could be doing.
  • Once you hit 10 minutes the priority increases to high: your team of 5 is wasting over a developer-week every month now! If you can spend a week to reclaim this, the investment of time will pay itself back very quickly.
  • If you manage to pass 15 minutes without addressing the issue, then the task to solve this should be critical, especially when you have a larger team. The time you’ll regain by solving the problem will be more valuable than most other tickets in the backlog.
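Once you have the durations of a month of CI runs, the thresholds above can be checked automatically. The following is a minimal sketch, not a finished tool: the function names, the sample durations and the exact handling of the boundaries are assumptions made for illustration.

```python
def percentile_95(durations_minutes):
    """Nearest-rank 95th percentile: the runtime that 95% of runs stay within."""
    if not durations_minutes:
        raise ValueError("need at least one CI run")
    ordered = sorted(durations_minutes)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def priority(ci_minutes):
    """Map a CI runtime to the ticket priority from the thresholds above."""
    if ci_minutes > 15:
        return "critical"
    if ci_minutes >= 10:
        return "high"
    if ci_minutes >= 5:
        return "low"
    return "none"  # below 5 minutes no ticket is needed yet


# Hypothetical durations (in minutes) of ten CI runs.
runs = [2.1, 2.4, 2.8, 3.0, 2.2, 2.6, 11.5, 2.9, 2.3, 2.7]
p95 = percentile_95(runs)
print(p95, priority(p95))  # 11.5 high
```

Looking at the 95th percentile instead of the average keeps a handful of very slow runs from hiding behind many fast ones.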

On a relatively new project you can apply some simple tricks such as adding parallel_tests and auto-retrying failed brittle tests. You can run your CI Pipeline on a more powerful dedicated server: more cores means faster results. That can take care of some projects for multiple years. But once you have done all this and your CI is slow again, you’ll need to invest progressively more time to scale things up based on your particular bottleneck(s). Maybe split tests across multiple larger machines, and collate results afterwards. You could invest in identifying tests that are most relevant for your changes and simply run those in a dedicated check, or you could (partially) re-architect from factories to fixtures to reduce the number of (very often repeated) database interactions, or set up a series of Docker containers that allow for quick setup times and massively parallel testing. If brittle tests are your bane, setting up a machine that just runs the test suite 24/7 to identify the brittle ones might also be an option.
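To give an idea of what splitting tests across machines involves, here is a small sketch that balances test files over a number of workers based on how long each file took in earlier runs. It uses a simple greedy longest-first strategy, which is one common choice and not the only one. The file names and timings are hypothetical, and collating the results afterwards is left out.

```python
import heapq


def split_tests(test_seconds, workers):
    """Greedy longest-first split: give each file to the least-loaded worker."""
    heap = [(0.0, i) for i in range(workers)]  # (total seconds, worker index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(workers)]
    # Placing the slowest files first tends to give a more even split.
    by_duration = sorted(test_seconds.items(), key=lambda kv: kv[1], reverse=True)
    for name, seconds in by_duration:
        load, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (load + seconds, idx))
    loads = [load for load, _ in sorted(heap, key=lambda item: item[1])]
    return buckets, loads


# Hypothetical timings (in seconds) taken from a previous CI run.
timings = {
    "spec/features/checkout_spec.rb": 300,
    "spec/features/signup_spec.rb": 240,
    "spec/models/user_spec.rb": 120,
    "spec/models/order_spec.rb": 90,
    "spec/lib/pricing_spec.rb": 30,
}
buckets, loads = split_tests(timings, workers=2)
print(loads)  # [390.0, 390.0] instead of 780 seconds on one machine
```

The slowest worker determines the total runtime, so a single very slow test file limits how much splitting can help.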

Whatever the solution you need, it will cost increasingly more developer time to make your tests fast. Time that you’re already wasting by waiting on long CI runs, so the opportunity cost of having your developers solve the problem is really high.

An upside of CI Pipeline optimization work is that it is an optimization problem that is relatively far removed from your business domain. It can be done by someone who needs only about 10% of the domain knowledge that your experienced developers have. It also benefits from experience. In other words: hiring an outside expert will cost less time and have a lower opportunity cost than having your own developers solve the problem.

Our offer

We specialize in maintenance tasks that help teams be more productive. We’ve set up CI Pipelines and we’ve optimized them when they got slow. Allow us to help your developers regain a fast CI Pipeline.

Because the value for you and the effort for us will vary wildly between projects, we know it is hard to put a single price on this. So we don’t do this. What we do is collaborate with you to find out what your slow CI Pipeline is costing you right now and we’ll agree on a budget based on this. The bigger your team is, the more the wasted time hurts you. The more optimized the CI Pipeline already is, the more effort it will take to improve it further.

The upside for you is that we only charge you for the time savings we’ve already delivered. For example, if at the end of our first month we manage to cut CI time down by 50% then we’ll charge you 50% of the budget at the end of that month. With a 30-day payment term, that means you’ve already benefitted from about 1.5 months of savings by the time you actually pay us. It also means No Cure, No Pay, which reduces the risk for you to zero. On our end it means that the larger the budget, the more time we can spend helping you.
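As a rough illustration of this pricing model, the calculation could look like the sketch below. It is not a statement of the actual contract terms: the function name, the way the reduction is measured against the baseline runtime, and the handling of earlier invoices are all assumptions.

```python
def invoice_amount(baseline_minutes, current_minutes, budget, already_invoiced=0.0):
    """Part of the budget due now, proportional to the CI time reduction so far."""
    if baseline_minutes <= 0:
        raise ValueError("baseline must be positive")
    # Fraction of the original CI runtime that has been removed (never negative).
    reduction = max(0.0, (baseline_minutes - current_minutes) / baseline_minutes)
    earned = budget * min(reduction, 1.0)
    # Assumption: later invoices only charge what has not been invoiced before.
    return max(0.0, earned - already_invoiced)


# Hypothetical engagement: 20-minute baseline, budget of 10000 (any currency).
print(invoice_amount(20, 10, 10_000))                          # 50% faster: 5000.0
print(invoice_amount(20, 20, 10_000))                          # no cure, no pay: 0.0
print(invoice_amount(20, 5, 10_000, already_invoiced=5_000))   # 75% in total: 2500.0
```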

Because we offer multiple services, we try to keep our mornings available for routine maintenance and communication and use our afternoons for uninterrupted development time. This means that we’ll usually spend up to 20 hours per week working on optimizing your CI Pipeline.

Our process

What can you expect from our CI Pipeline Optimization Service?

  • We’ll start with a brief onboarding call with you and your tech lead, to get to know you and your application. This is similar to the Introduction Meeting for our Maintenance Service.
  • We sign a basic contract to get the liability side covered and establish terms for the rest of the process.
  • You give us access to your code and CI system, so we can determine how much time your team is currently wasting by waiting on CI. We’ll try to get a representative sample CI run to mark the starting point by creating a new branch and adding some newlines to a file, then pushing it to CI. Then we add a few more lines and push to CI again. We repeat this until we establish a reliable time range within which your CI finishes.
  • Our target is 3 minutes or less for actionable feedback, and (when other things also trigger) under 5 minutes for the entire CI pipeline. This means you want quick feedback on tests related to areas that were changed, but it’s okay to have the rest of the test suite take a little longer.
  • We’ll set a budget together with you based on how much this is already costing you (size of the team) and how much effort we expect is needed to improve it. A bigger budget means we can spend more time on it and thus go to greater lengths to help your developers reclaim their wasted time.
  • We send you an offer based on the numbers we have agreed upon and the terms in our contract. You sign it. We get to work.
  • We start to improve your CI pipeline, delivering one or more Pull Requests to be reviewed by your developers for acceptance. In case of more drastic measures we might try switching CI platforms or underlying servers to higher performance/capacity ones. It is possible that your monthly costs will go up as a result, but this should only be a fraction of the value of the time you’re saving.
  • Once you’re benefitting from your developers being more productive, we’ll invoice you based on the percentage of savings we’ve achieved for you so far. We invoice at the start or end of every month, so depending on how long we need, you will get one or more invoices from us.
  • Once we hit the target, or have found a sweet spot that’s close enough to the target, the engagement ends and we’ll send our final invoice.
  • Depending on how fast your CI Pipeline load grows, it will be useful to schedule a check-up session 6 or 12 months later.
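The baseline measurement from the third step, repeating sample runs until the time range is reliable, can be made concrete with a simple stopping rule. The sketch below is only one possible interpretation: the window of 5 runs, the 10% tolerance and the sample durations are assumptions, not part of the process described above.

```python
import statistics


def baseline_range(durations_minutes, window=5, tolerance=0.10):
    """Return (low, high) once the last `window` runs agree within `tolerance`."""
    if len(durations_minutes) < window:
        return None  # not enough sample runs yet
    recent = durations_minutes[-window:]
    median = statistics.median(recent)
    low, high = min(recent), max(recent)
    # The range counts as reliable when its spread is small relative to the median.
    if median > 0 and (high - low) / median <= tolerance:
        return low, high
    return None  # still too noisy: push another sample run


# Hypothetical durations (in minutes) of consecutive sample runs.
samples = [14.0, 18.5, 12.2, 12.4, 12.0, 12.6, 12.3]
print(baseline_range(samples[:5]))  # None
print(baseline_range(samples))      # (12.0, 12.6)
```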

If you’re one of our Maintenance Service clients, we’ll check every month how things look and if you could benefit from further CI Pipeline optimization. This way the problem will never be invisible, so we can help you before you’ve wasted hundreds of developer hours. From this perspective, it’s a service that practically pays for itself.

You’re currently losing valuable developer hours because your developers are waiting on long CI runs. Having us come in to solve it on a No Cure, No Pay basis is the smartest way to solve this problem, because you’ll only pay once you’re actually seeing the time savings.

There is no risk, so why don’t you email us to see how we can help you?