Taming the Hydra (part 1)

by Marc Günther
in How We Do It


How to take care of a large Jenkins installation and still keep your sanity

Sometimes maintaining our Jenkins infrastructure reminds me of fighting the fabled Hydra. Every time you slash a problem in one of the jobs, ten more jobs have sprouted in the meantime, each of which will pose its own problems in the future.

Hercules, by John Singer Sargent (1921)

Our hydra by the end of 2014 had grown almost 3000 heads. Something had to be done.

Where do all these Jenkins jobs come from?

How do people usually create a Jenkins job? Well, they use a technique that has been frowned upon for decades in software development:

  • find an existing job that does roughly what you want
  • copy it
  • adjust the copy till it more closely does what you need
  • forget about it
  • repeat

This, of course, is called Copy Paste Programming. As we all know, it leads to unmaintainable source code, and likewise to an unmaintainable mess of Jenkins jobs.

Also, for us in the Engineering Support team, it led to a serious series of déjà vus. Didn’t I fix this problem in that job last week? Oh no, it was this job, which looks almost the same as that job, but uses a different Node.js version, for no reason whatsoever, except that it’s wrong. And this job is embarrassingly slow, because it doesn’t run Maven in parallel, although it could. And yet another one doesn’t contain the fix in the Groovy post-build step, which we made two months ago…

The Hydra grows more heads

And then something else happened. The Product Development team was fighting their own beast, commonly known as The Monolith:

HAL 2001 monolith
Monolith as displayed at the “Hackers at Large” conference in Enschede, The Netherlands, 10-11-12 August 2001.

The Monolith proved to be unmaintainable as well. Change one tiny thing over here, and a whole wall collapses over there. Build up the wall again, and a whole building collapses (an entirely different building, not the one you have been working on).

The solution is of course to break everything up into smaller pieces. These are as independent from one another as possible, only exposing their API, can be implemented in any language, and are much easier to understand and maintain, as they solve only one specific problem instead of trying to ensure world peace.

Enter Micro Services

But Micro Services pose their own challenges. Developers have to juggle a lot more Git repositories. The Site Operations team has to provide the infrastructure to deploy and monitor all these small services. And we in the Engineering Support team wanted an easy way to maintain all the new jobs that would be needed.

Hercules’ solution fails us

The Jenkins Script Console told us that, even without any of these new jobs, we already had quite a beast to fight.

Result: 2936
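
If you want to do a similar head count on your own installation without opening the Script Console, the Remote Access API can give you a comparable number. The following is only a minimal Python sketch and not the script that produced the result above. The base URL is a placeholder, authentication is left out, and only top-level items are counted (a folder, if you use them, counts as one item).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def job_list_url(base_url: str) -> str:
    """Build the Remote Access API URL that lists top-level job names only."""
    query = urlencode({"tree": "jobs[name]"})  # tree= trims the response
    return f"{base_url.rstrip('/')}/api/json?{query}"


def count_top_level_jobs(payload: dict) -> int:
    """Count the entries of the "jobs" array in a parsed /api/json response."""
    return len(payload.get("jobs", []))


def fetch_job_count(base_url: str) -> int:
    """Fetch the job list from a Jenkins instance (no authentication here)."""
    with urlopen(job_list_url(base_url)) as response:
        return count_top_level_jobs(json.load(response))
```

Calling `fetch_job_count("https://jenkins.example.com")` against a reachable instance would return the number of heads; the host name here is made up.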

That was at the end of 2014. We had several discussions back then, but one thing was clear from the start: Hercules’ solution wouldn’t work for us. We can’t kill the hydra. A lot of these jobs are actually needed, and with the advent of Micro Services, their number will only grow.

So instead of killing, we have to tame the beast. Orchestrate the heads, so to speak.

The vision

Seeing that a lot of these heads, erm jobs, are quite similar to each other, we envisioned a mechanism that would allow us to:

  • administer the similar parts of the jobs in one central place (making them easy for us to maintain), while
  • keeping the job-specific parts of the configuration separate. Ideally, these would be stored in the Git repository itself, alongside the source code (where they would be easy for developers to maintain).
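
To make the split between the two bullet points a bit more tangible, here is a toy Python sketch of the idea. It is purely an illustration and not the mechanism we ended up building: the template keys, job names and Git URLs are all invented, and a real job definition would carry far more than a handful of settings.

```python
from copy import deepcopy

# Central part, maintained in one place (all keys are hypothetical).
MAVEN_BUILD_TEMPLATE = {
    "maven_goals": ["clean", "verify"],
    "maven_options": ["-T", "1C"],  # run Maven in parallel, one thread per core
    "post_build_steps": ["publish-test-results"],
}


def render_job(template: dict, repo_config: dict) -> dict:
    """Combine the shared template with one repository's own settings.

    Values from the repository win, but the template itself is never modified.
    """
    job = deepcopy(template)
    job.update(deepcopy(repo_config))
    return job


# Specific part, as it might be read from a file in each Git repository.
checkout_service = {
    "name": "checkout-service",
    "git_url": "git@example.com:shop/checkout.git",
}
search_service = {
    "name": "search-service",
    "git_url": "git@example.com:shop/search.git",
    "maven_goals": ["clean", "install"],  # this repository overrides the default
}

jobs = [render_job(MAVEN_BUILD_TEMPLATE, cfg)
        for cfg in (checkout_service, search_service)]
```

The point of such a split is that a fix made once in the template reaches every job the next time the jobs are rendered, while each repository only states what is different about it.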

Of course, not all of the jobs are similar to each other. In reality, there are several very different job types, each forming a cluster of similar jobs. Some of the clusters are big, some only contain a single job (one-of-a-kind jobs). We were most interested in the big clusters. If all the jobs were one-of-a-kind jobs, none of the following would have made any sense.

So we were looking for some kind of flexible templating mechanism. Jenkins doesn’t provide one out of the box, but there are some plugins to be found. Jenkins also supports Groovy system scripting and provides a Remote Access API.

Tune in next week for a look at the alternatives, the failures, and the (quite cool, if I might say so) solution that we eventually came up with and have been using successfully for quite a while now (it currently hosts 1228 jobs, which mostly take care of themselves)…

The other parts of the series:

- Part 2: The alternatives
- Part 3: The implementation