Scaling CI

Continuous Integration Across Projects


Continuous Integration is a core feature of the Software Development Life Cycle in Agile teams. The term comprises both an idea and a service.

The idea behind Continuous Integration is that all changes to a software system under development are continually regression tested. This provides a system of continuous feedback as the source code changes. Feedback is a core principle of Agile. We automate these feedback processes by using a CI service, such as TeamCity or Jenkins, to define Pipelines that implement the various tasks that are necessary.

So CI is also a service that is available on the network and monitors the source code repository for changes. The source code repository is a versioned file archive for storing source code. When a change is detected, as a new revision, the CI service feeds the new source code into a CI Pipeline that checks out the code, builds it and then proves that it works as expected. After the pipeline has finished executing, it will either be Green, meaning success, or Red, meaning failure. If the pipeline was successful, artefacts are created from the compiled source code and uploaded to an artefact repository.
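To make that concrete, here is a minimal sketch of those stages as a small Java driver. The repository URL and the choice of git and Maven are illustrative assumptions; in a real setup the stages would be defined in the CI service itself.

```java
/**
 * A minimal sketch of the stages a CI pipeline runs for each new revision.
 * The repository URL and the use of git and Maven are assumptions for
 * illustration; a real pipeline would be defined in the CI service itself.
 */
public class PipelineSketch {

    public static void main(String[] args) throws Exception {
        // Checkout: fetch the revision that triggered the build.
        run("git", "clone", "https://example.org/account-service.git", "workspace");

        // Build and test: compile the code and run the regression tests.
        run("mvn", "-f", "workspace/pom.xml", "verify");

        // Publish: on success, upload the artefacts to the artefact repository.
        run("mvn", "-f", "workspace/pom.xml", "deploy");

        System.out.println("GREEN");
    }

    private static void run(String... command) throws Exception {
        Process process = new ProcessBuilder(command).inheritIO().start();
        if (process.waitFor() != 0) {
            // Any failing stage turns the pipeline Red and stops the run.
            System.out.println("RED");
            System.exit(1);
        }
    }
}
```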

[Figure: The CI Pipeline]

There are some rules that the developers follow when using a CI service.

As you can see from the rules, CI is intended to function as something of a conductor, governing the behaviour of developers as they share code, preventing them from sharing each other's mistakes and from wasting time looking at broken code they don't understand because they don't know the context in which the changes were developed.

There are some obligations that must be met in order for CI to function.

But there are significant benefits to running a CI system.

So what happens when you scale up the number of developers in this environment?

Scaling Pains

So far I have described two states that the CI pipeline can occupy: Green or Red. But there is a third state: Currently Building. As you can see from the rules stated in the introduction and the benefits of sharing code, we would hope that CI would build once per change, and that developers would only share code when the build is Green. This is far from reality.

The problem is that the pipeline takes time to finish. On any typical project, this can take from a few minutes to an hour. It’s not uncommon for a CI pipeline to take 45 minutes. It really depends on what kind of tests are being executed and what else takes place within the pipeline.

So in order to illustrate the problem, let’s suppose we have a development team of about fifty developers and that the pipeline takes 45 minutes to pass. And let's suppose that they all make one change. CI would then take 2250 minutes to clear all those changes, or 37.5 hours. That is a working week. Each developer would only get one commit per week, and that’s assuming all of those changes pass the pipeline and that the pipeline stays green.

Clearly this system doesn’t scale.

There are a number of common practices that are adopted to address this. For the most part, they are less than optimal.

Short build times become a crucial aspect of scaling teams. In fact, addressing this issue is the subject of another project, Cascade, on which I have spent a great deal of time.

The cost of clearing the pipeline in CI is the primary reason we have the Test Pyramid, where we have a great many Unit Tests that execute very quickly, and far fewer Acceptance Tests that are very costly in terms of execution time. The tests we value the most, Acceptance Tests, are the ones we execute the least because they take so long to run.

Divide and …

But there is very clearly another way of addressing the contention difficulties within CI. We can divide the larger software system up into smaller parts and maintain the components in their own source code repositories along with their own CI pipelines. This keeps the number of active developers on each pipeline to a low number and keeps contention manageable. Just as in concurrent programming, if you have a more granular exclusion lock, you tend to get more concurrency, so more work gets done.

[Figure: Multiple Pipelines]

But what happens when one software component is dependent on another? Can the software system still be divided when the components need to communicate in order to complete their use cases? Let's limit ourselves to the scenario of only two components that communicate. If the two components both initiate calls on each other and know each other's interfaces, then you might argue that they cannot be divided, since the dependency works both ways. But in the case where one component consumes the other and the dependency only runs one way, you might envision a system where the least dependent component is built first in its own pipeline and is then included in the later builds of dependent components, until the software system as a whole is fully composed and tested.

Well, the former scenario is separable through the use of Stubs, which I will get to in a moment. But I particularly want to argue against the latter notion of composing pipelines in that way, because it isn't reliable.

Every software component has a certain probability of failure. I can describe this in terms of a statistical distribution. For a single component with a failure rate of $\lambda$ within a time period, the number of failures follows a Poisson distribution, $\mathrm{Poisson}(\lambda)$. Let's say $\lambda$ is one failure and the time period is one day. Now Poisson distributions of independent components are additive, so that $\mathrm{Poisson}(\lambda) + \mathrm{Poisson}(\gamma) = \mathrm{Poisson}(\lambda+\gamma)$, where $\lambda$ and $\gamma$ are the mean values of the two original Poisson distributions.

The thing to note is that the mean number of failures for the sum is the sum of the two means. So let's say the failure rate is the same for every component, and our component is dependent on ten other components. The number of problems among those components within a one-day period is distributed as $\mathrm{Poisson}(10\lambda)$. In other words, the likelihood of failure is high, and the expected number of failures increases linearly with the number of dependencies.
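To make that concrete, here is a worked example, assuming $\lambda = 1$ failure per component per day and that the components fail independently:

$$
N \sim \mathrm{Poisson}(10\lambda), \qquad P(N = 0) = e^{-10\lambda} = e^{-10} \approx 4.5\times10^{-5}, \qquad P(N \ge 1) \approx 0.99995
$$

Compare this with a single component, where $P(N \ge 1) = 1 - e^{-1} \approx 0.63$. A pipeline that composes ten dependencies will almost never see a completely clean day.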

The obvious counter-argument is that we need all ten components to execute when we get into production, so we can't avoid this integration. And yes, that's true, but it's also the responsibility of all the pipelines to achieve this goal. By composing pipelines in this way, you are placing the burden of integration on any component that consumes another, and this introduces drag into that pipeline. The drag can be so severe that consuming components never achieve a stable platform on which to develop, and developers sit waiting on issues that have nothing to do with them. I should also say that I'm not limiting myself to faults in the software components that are being composed. There are many reasons these components might not work correctly: data setup, limited resources, deployment and shared interfaces. And this situation is not limited to components that the subject component consumes directly, but includes any transitive dependencies. The figure of ten dependencies taken above is therefore not so unlikely.

So how do we test components in isolation? Most software systems have boundaries that extend beyond the bounds of the organisation. When this happens, we are limited in our testing anyway, as we can rarely deploy those other systems in our pipelines. What we do is stub out the other system by reproducing the dependency as a Stub. The stub acts as the real system would in respect of the interface we are consuming. We do the same when we divide our system up into smaller pieces. When we want to test just a single component, all of its dependencies are stubbed out. Our tests set up the stubs prior to actually working the subject system by telling the stubs what calls they should expect and what to return when the calls are made. The tests then work the subject component, which causes it to make calls on the stubs. These effects are recorded within the stubs along with all the associated data that the subject supplied. Once the test has finished with the subject component, it verifies that the stubs were called in the appropriate manner.
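Here is a minimal, hand-rolled sketch of that set-up, act and verify cycle. In practice you would normally use an existing stubbing tool; the AccountStore interface and every name in this example are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Objects;

/**
 * A hand-rolled stub illustrating the expect / act / verify cycle described
 * above. The AccountStore dependency and all names here are hypothetical.
 */
public class StubExample {

    /** The dependency the subject component would normally call over the network. */
    interface AccountStore {
        String balanceFor(String accountId);
    }

    /** The stub is told what calls to expect and what to return before the test runs. */
    static class AccountStoreStub implements AccountStore {
        private final Deque<String> expectedIds = new ArrayDeque<>();
        private final Deque<String> cannedBalances = new ArrayDeque<>();

        void expect(String accountId, String balanceToReturn) {
            expectedIds.add(accountId);
            cannedBalances.add(balanceToReturn);
        }

        @Override
        public String balanceFor(String accountId) {
            String expected = expectedIds.poll();
            if (!Objects.equals(expected, accountId)) {
                throw new AssertionError("Unexpected call: " + accountId);
            }
            return cannedBalances.poll();
        }

        void verifyAllCallsWereMade() {
            if (!expectedIds.isEmpty()) {
                throw new AssertionError("Expected calls were never made: " + expectedIds);
            }
        }
    }

    public static void main(String[] args) {
        AccountStoreStub stub = new AccountStoreStub();

        // 1. Set up: tell the stub what call to expect and what to return.
        stub.expect("ACC-123", "100.00 GBP");

        // 2. Act: work the subject component, which calls the stub as it would the real system.
        String summary = "Balance: " + stub.balanceFor("ACC-123"); // the subject component stands in here

        // 3. Verify: check that the stub was called in the appropriate manner.
        stub.verifyAllCallsWereMade();
        System.out.println(summary);
    }
}
```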

[Figure: The Stub Context]

But now we have multiple Build Pipelines, each building a software component with all of its dependencies stubbed out. We need to move on to testing that these components work well together, so we introduce Integration Pipelines. Integration Pipelines are pipelines that are triggered whenever a Build Pipeline successfully completes. The output of the Build Pipeline is passed to the Integration Pipeline so that change is gradually introduced and tested.

[Figure: Integration Pipelines]

So let's focus a bit more on how Build Pipelines feed their changes to Integration Pipelines. As I illustrated in the pipeline diagram in the introduction, each successful build of a build pipeline saves the tested code as artefacts in an artefact repository, such as Artifactory or Nexus. A build system, such as Maven or Gradle, would download all of a project's software dependencies from these repositories, build the software and then publish the resulting artefacts back into the repository for other projects to consume. Each artefact that is built has a unique name and version number.

[Figure: Integration Pipelines]

When an integration pipeline is triggered, it is passed the name and version of the new component so that it can obtain the correct artefact from the artefact repository. The name and version are used to introduce the new component into a baseline set of working components; the new component replaces the previous working version of itself. Notice that there is actually another repository at work here: the baseline versions repository. Very often this repository isn't really a persistent store at all, but merely a log message at the end of the integration pipeline log, or a file that is checked back into the source code repository. This part of the system is not handled in a very standardised way.
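To illustrate, here is a sketch of what the baseline versions repository might look like if it were made explicit: a properties file updated at the end of a green integration run. The file name, format and component names are assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/**
 * A sketch of the "baseline versions repository" as an explicit store: a
 * properties file mapping each component to the version that last passed the
 * integration pipeline. The new component replaces the previous version of itself.
 */
public class Baseline {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("baseline-versions.properties");
        Properties baseline = new Properties();
        if (Files.exists(file)) {
            try (Reader in = Files.newBufferedReader(file)) {
                baseline.load(in);
            }
        }

        // Name and version passed to the integration pipeline by the build pipeline.
        String component = args.length > 0 ? args[0] : "account-service";
        String version   = args.length > 1 ? args[1] : "1.4.2";

        // Replace the previous working version of this component.
        baseline.setProperty(component, version);

        try (Writer out = Files.newBufferedWriter(file)) {
            baseline.store(out, "Components that last passed the integration pipeline");
        }
    }
}
```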

But this setup has a major issue. Note that the integration pipeline is triggered reactively, on demand, by any upstream pipeline. What happens when a change occurs to an interface that two or more components share? The first component that passes its build pipeline will trigger an integration pipeline that contains mismatched components. The integration pipeline will fail until all components that use that interface have been updated. Only then will the integration pipeline pass. This violates our rule about only building in CI with the expectation of passing.

[Figure: Building Mismatched Components]

I will now refer to a set of components that implement a single change or feature as a Change Set. And as I’ve previously stated, the integration pipelines will only pass complete change sets. Now let’s suppose that there are multiple change sets on the go at the same time.

The source code repository of each component serialises all changes as a set of revisions. So if part of one change set is committed, and then another entire change set is committed across the build pipelines, it doesn't matter that the following change set is good: the integration pipeline will still fail. Eventually the remaining components for the first change set are committed and the system as a whole is consistent again. But now the two change sets, which are completely independent, have become dependent on one another. You cannot choose to take a version of the system that includes the earlier change set but not the later one, as they are now interlinked. Independent change sets that contain no direct dependencies on one another can become dependent through this kind of 'mechanical lock'. The only time this isn't a risk is with disjoint change sets, where the change sets don't affect any of the same components.

[Figure: Interweaving Of Change Sets Across Components]

Added to all of this are the efforts of the poor QA developer who needs to develop the acceptance tests. His efforts are often reactive. How does he develop and run his tests against the system that implements a feature when he doesn't yet have the system to test against? His efforts introduce another dependency on a successful integration pipeline: there are the artefacts that are named and versioned by the build pipelines, and there is the source code of the integration tests that the QA has developed to prove the change set works. All of these need to be used at the same time in order to have a green integration pipeline.

… Conquer

Well, the obvious issue here is that the entire system of on-boarding change sets is unmanaged and reactive. This needs to change. Instead of blindly accepting that all changes are valid candidates to be introduced into the integration pipeline, we introduce valid change sets as a conscious action.

This gives us the opportunity to pre-plan the order in which we commit change sets. In this way we remove the 'mechanical lock', or interweaving, problem. Thus we will pass the integration pipeline much more frequently.

So how does this actually work? Well, let’s suppose that we have two features currently being developed by two teams. They both cover the same components and each team has a QA who is developing the integration tests for the integration pipeline. They have all developed their changes on branches of their respective source code repositories.

Change Set Controller

Now the first way we could arrange this is to introduce the role of a Change Set Controller into the organisation. This is an individual who co-ordinates the ordering of change sets into the pipeline. Simple and disjoint change sets can be applied at any point. The interesting change sets, from his perspective, are change sets that are not disjoint and need to be serialised. Each group would approach the Change Set Controller and ask for permission to proceed. This applies across source code repositories. His role then is to identify change sets that are not disjoint and to ensure that each change set does not interweave with any other change set. In order to achieve this, he merely needs to line up the change sets and indicate to the developers that their change must follow some other change set. Then they can go ahead and commit. They don't need to commit at the same time; they just need to ensure that they are 'in order'. The QA would commit his test code into the integration pipeline in order as well.

[Figure: The Change Set Controller]
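Here is a sketch of the bookkeeping the controller performs: given the components each change set touches, disjoint change sets may go in at any point, while overlapping change sets are serialised so they cannot interweave. The change set and component names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/**
 * A sketch of the Change Set Controller's bookkeeping: change sets that share
 * components must be serialised; disjoint ones may be committed at any point.
 */
public class ChangeSetController {

    record ChangeSet(String name, Set<String> components) {}

    public static void main(String[] args) {
        List<ChangeSet> requested = List.of(
                new ChangeSet("faster-payments", Set.of("payment-service", "account-service")),
                new ChangeSet("statement-export", Set.of("statement-service")),
                new ChangeSet("new-account-types", Set.of("account-service", "statement-service")));

        // Admit change sets one by one; an overlapping change set must follow
        // the most recently admitted change set it shares components with.
        List<ChangeSet> admitted = new ArrayList<>();
        for (ChangeSet candidate : requested) {
            Optional<ChangeSet> mustFollow = admitted.stream()
                    .filter(earlier -> !Collections.disjoint(earlier.components(), candidate.components()))
                    .reduce((first, second) -> second);
            System.out.println(candidate.name() + mustFollow
                    .map(cs -> " must follow " + cs.name())
                    .orElse(" is disjoint and may be committed at any point"));
            admitted.add(candidate);
        }
    }
}
```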

I’ve implied that change sets can be introduced before previous change sets have passed the pipelines. This is a point that could be up for debate. On the one hand, we have the Continuous Integration rules that I outlined in the introduction. On the other hand, the reason we separated the builds into different pipelines was to introduce more concurrency. If we locked out the entire system of pipelines while a change was going through, we would lose much of this concurrency, and therefore the reason for having many pipelines in the first place. So I think, on balance, that we should allow the committing of change sets during the execution of integration pipelines.

So what would happen is that the build pipelines would execute for each change set that is introduced. Each build pipeline must output a distinct artefact. Once all the components in the change set have been published to the artefact repository, the QA would take the baseline set of components, the set that last passed the integration pipeline, and he would introduce the new components to define the entire set under test. He would then manually trigger the integration pipeline with that set.
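As a sketch, defining the set under test amounts to overlaying the change set's freshly published versions onto the baseline; the component names and versions here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * A sketch of defining the set under test: start from the baseline (the
 * components that last passed the integration pipeline) and overlay the
 * newly published versions from the change set.
 */
public class SetUnderTest {
    public static void main(String[] args) {
        Map<String, String> baseline = Map.of(
                "account-service", "1.4.1",
                "payment-service", "2.0.3",
                "statement-service", "0.9.0");

        // The change set's freshly built artefacts replace their baseline versions.
        Map<String, String> changeSet = Map.of(
                "account-service", "1.5.0",
                "payment-service", "2.1.0");

        Map<String, String> setUnderTest = new HashMap<>(baseline);
        setUnderTest.putAll(changeSet);

        // The QA triggers the integration pipeline manually with this exact set.
        setUnderTest.forEach((component, version) -> System.out.println(component + "=" + version));
    }
}
```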

So what happens when the integration tests fail? We have already decided that there will be following revisions coming through, so we are now in the situation that if we want to introduce a code fix, we must append it after the other revisions. This means that any specific builds that focus on those intermediate revisions would fail, as the fix hasn't been applied yet. So we would probably skip the individual builds for those revisions in favour of getting the build back to green as soon as possible. This leaves us in the position that we don't individually validate those intermediate change sets. There are two ways to apply the fix: if the fix is trivial, we can fix forward. If, however, the change is very significant, then we can apply a negative change set, a Revert, that removes the offending change set from the pipeline. The choice between the two approaches is decided by how long it takes to identify the issue at hand and by its magnitude.

There is a third option. Once we hit a red build, we throw away any subsequent commits to the individual branches. The teams that developed their change sets would have to re-schedule their changes with the Change Set Controller and repeat the work of merging these feature branches into the respective master branches of the components.

This issue with failure highlights the weaknesses in the Change Set Controller arrangement. The reason I’m still mentioning it is that it can be implemented without significant changes to the CI system. But there is another possible arrangement that is much more technically demanding and could only be implemented with very significant changes to the CI system: Change Set Pipelines.

Change Set Pipelines

The Change Set Pipeline arrangement supposes that all the components in the change set are developed on a Feature Branch across all source code repositories. When the change set is ready to be tested, CI runs an entire set of pipelines for every component with a corresponding feature branch. When all the pipelines pass, the feature branches are merged into the master branch and the master pipelines execute. Since all the feature branches are separate, the Change Set Pipelines for each feature can be run in parallel.

[Figure: Change Set Pipelines]
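Here is a sketch of the co-ordination such a change set pipeline performs, assuming the feature branch shares a name across repositories. The repository names and the hasBranch, runPipeline and mergeToMaster steps are placeholders for calls into the SCM and CI service.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * A sketch of a Change Set Pipeline: run a pipeline for every repository that
 * carries the feature branch, and only merge to master once all of them pass.
 */
public class ChangeSetPipeline {

    public static void main(String[] args) {
        String featureBranch = "feature/faster-payments";
        List<String> repositories = List.of("account-service", "payment-service", "integration-tests");

        // 1. Run a pipeline per repository that has the feature branch.
        Map<String, Boolean> results = new HashMap<>();
        for (String repo : repositories) {
            if (hasBranch(repo, featureBranch)) {
                results.put(repo, runPipeline(repo, featureBranch));
            }
        }

        // 2. Merge the feature branches only when every pipeline in the change set is green.
        if (results.values().stream().allMatch(Boolean::booleanValue)) {
            results.keySet().forEach(repo -> mergeToMaster(repo, featureBranch));
        } else {
            System.out.println("Change set stays on its branches; nothing reaches master.");
        }
    }

    // Placeholder steps; a real implementation would call the SCM and CI service APIs.
    static boolean hasBranch(String repo, String branch) { return true; }
    static boolean runPipeline(String repo, String branch) { System.out.println("Building " + repo + "@" + branch); return true; }
    static void mergeToMaster(String repo, String branch) { System.out.println("Merging " + branch + " into master of " + repo); }
}
```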

The advantage of this system is that the change sets are automatically co-ordinated, so there is less dependency on humans. Also, the full set of integration tests is performed for each change set in isolation. A great deal more testing is done earlier, and there is much less potential for the pipeline to go red and for incoming changes to add to the complexity of getting the pipelines green again.

What happens when a merge conflict occurs during the merge into master depends on the circumstances. However, I think that since we are running the integration pipelines at a much earlier stage, most of the issues that break things will have been identified by this point. As long as the change sets are reasonably small in scope and do not contain particularly aggressive refactorings, they can be merged straight in. If, however, change sets go in on a very infrequent basis and the difference between master and the change set is great, then it's appropriate to pull the master changes into the change set pipeline so that any merge issues are resolved there before moving all the changes into the master pipelines.

The change set still needs to be identified so that the integration pipelines can apply the entire set to the baseline set of working components. There is also another issue, in that we don’t want different change set pipelines to mix up their artefacts. Artefacts that are stored in an artefact repository have a name and a version. What we need here is a component name, a change set name and a version.
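One way this could look, as a hypothetical naming scheme rather than any established standard, is to qualify the version with the change set name so that parallel change set pipelines can never overwrite each other's artefacts.

```java
/**
 * A sketch of a hypothetical naming scheme for change set artefacts:
 * component name, base version, change set (feature branch) name and a
 * build number, so parallel change set pipelines never clash.
 */
public class ArtefactCoordinates {

    record Coordinates(String component, String baseVersion, String changeSet, int buildNumber) {
        String version() {
            // e.g. 1.5.0-faster-payments.7
            return baseVersion + "-" + changeSet + "." + buildNumber;
        }

        @Override
        public String toString() {
            return component + ":" + version();
        }
    }

    public static void main(String[] args) {
        // Prints: account-service:1.5.0-faster-payments.7
        System.out.println(new Coordinates("account-service", "1.5.0", "faster-payments", 7));
    }
}
```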


The introduction of change sets into our CI environment only solves some of the problems. Let’s move on to the subject of stubbing.

QAs should develop the integration tests against stubs. This is going to surprise some, but let me argue:

In fact this is completely in line with the principles of TDD. The test is defined first and, as such, it proves the process that must be implemented. The software artefacts are then made to meet this process. This is an intangible, but no less valuable for that. So not only does this policy take the pressure off the QA developer, since he isn't developing reactively while CI is red, but the practice is also much more rigorous.

Let me give you an example. It’s not uncommon to develop many software components as micro services. These are built in build pipelines. Typically an integration pipeline composes all these micro services together and proves them. Only then does the developer introduce a User Interface application. The reason for this is typically that the UI is less separable than the micro services and has a great many dependencies. UIs are typically a monolith and depend on all of the micro services, so it is useful to prove the underlying services work before taking on the burden of a UI and all its dependencies. Now the developer on the UI faces a rather daunting task. He needs a stable platform, and as I’ve already outlined, the more components there are, the less likely the platform is to be stable. But if a QA has developed his integration tests against stubs, the UI developer can use those tests to set up stubs in an appropriate starting state, and can develop against those stubs even before the developers who are working on the underlying platform have finished development. The key here is that the environment for the UI developer is a part of the CI system, so it stays up to date. It is maintained.

But there is also a technical challenge in having our integration tests execute against stubs. Stubs are comparatively dumb compared with our production services. We essentially tell them that when a particular call is made at a particular stage, they must return some data. This only works over one state change. For journey testing, where we take the system over multiple state changes, we need to set up the stubs' behaviour at every state change. When we finally introduce the real components, priming at every state change won’t be necessary for any stubs that have been replaced. The integration tests need to be designed in such a way that they can selectively operate stubs depending on which underlying components have actually been deployed.

[Figure: The Online Banking Journey]
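Here is a sketch of how a journey test might prime stubs at each state change while skipping anything that is actually deployed in a given environment. The journey, the component names and the primeIfStubbed helper are all hypothetical.

```java
import java.util.Set;

/**
 * A sketch of a journey test that primes stubs at every state change, but only
 * for the components that are actually stubbed in this environment.
 */
public class JourneyTestSketch {

    // In a real run this would come from the environment definition for the pipeline.
    static final Set<String> STUBBED = Set.of("payment-service"); // account-service is real here

    public static void main(String[] args) {
        // State change 1: log in and view the account balance.
        primeIfStubbed("account-service", "balance for ACC-123 is 100.00 GBP");
        // ... drive the UI / API for step 1 and assert on the response ...

        // State change 2: make a payment.
        primeIfStubbed("payment-service", "accept payment of 25.00 GBP from ACC-123");
        // ... drive step 2 and assert ...

        // State change 3: view the updated balance.
        primeIfStubbed("account-service", "balance for ACC-123 is 75.00 GBP");
        // ... drive step 3 and assert ...
    }

    static void primeIfStubbed(String component, String expectation) {
        if (STUBBED.contains(component)) {
            // Here the test would tell the stub what call to expect and what to return.
            System.out.println("Priming stub " + component + ": " + expectation);
        } else {
            System.out.println(component + " is real; no priming needed");
        }
    }
}
```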

The integration tests running against stubs could be introduced as their own Build Pipeline. This way the integration tests are proven to work against stubs before they run against the actual artefacts, and there is a system for validating that the stubs can be used as a substitute for the actual components.

So I’ve outlined that stubs are a necessary component of testing any component that depends on an external service, since we can’t deploy those services in our pipelines. I’ve taken the idea further and suggested that the application that we are developing should be divided up into smaller components, each with their own pipeline, and when they interface with systems outside of their bounded context, we use stubs again in those pipelines. And I’ve suggested that the QAs use stubs to develop their integration tests and that the stubs be made available to UI developers, and anyone really, who consumes the same components that the integration tests exercise. I’m pretty much putting stubs everywhere.

And I’ve outlined that the contract changes that are necessary to implement a change set should be identified up front so that stubs can be developed. So the obvious next step is to communicate the contract by publishing stubs. A stub would reside in its own repository. It would be built and published to the artefact repository, and it could then be consumed by any interested party. Because it goes into the artefact repository, it would have to be named and versioned. We now have a versioning system for our contracts, since the stub is a reference implementation of the contract.

But again, there is a problem here with naming. Each feature will introduce changes to the interfaces and therefore to the version numbers of the stubs. But we don’t know what order the change sets will be introduced in, so we can’t simply order the interface changes up front. Instead they all need to be qualified by the feature branch. In other words, they are branched, just as the components are. This also means that they need to be promoted in the same way once a feature pipeline is merged into master, so that a baseline set of stubs can be used for further changes and can form part of the build pipelines on the master branch.


Keeping builds green in CI is not just about proving that artefacts are good for production. Green builds are a powerful productivity magnifier, as they allow developers to communicate code and to isolate and solve issues. To this end, I argue for some changes to the way software is developed in large teams.

Where today developers on individual source code repositories communicate a change in contract via some arbitrary method, I am advocating that they formalise the method of communicating contracts via stubs.

I’m advocating that related software changes be composed into Change Sets that are driven through CI in a controlled manner. Now when a build fails, the change set that causes the issue can be identified. The change set can be reverted and rescheduled. This keeps the pipelines green.

Since the pipelines are green, the artefacts they build, including the stubs, and the integration tests that ‘program’ them can be used for other purposes. Specifically, they can be used to support development of downstream components such as User Interfaces.

Also I’m advocating that QAs do their development up front instead of reactively. They no longer wait for the development of the individual components to take place before completing their work. They therefore prove how a consumer of the system will interact with the system while the components are still under development. This is something intangible, but no less valuable than the gathering of requirements or the management of a software project.

Finally, I'm advocating that dependent components do not pull their dependencies into their build pipelines, as this binds development to a specific order in time and introduces instability into the pipeline as the dependencies change their setup and their own dependencies over time. This is integration work and should be reserved for its proper time and place.