Buildbot in 5 minutes - a user-contributed tutorial

(Ok, maybe 10.)

Buildbot is really an excellent piece of software, however it can be a bit confusing for a newcomer (like me when I first started looking at it). Typically, at first sight it looks like a bunch of complicated concepts that make no sense and whose relationships with each other are unclear. After some time and some reread, it all slowly starts to be more and more meaningful, until you finally say "oh!" and things start to make sense. Once you get there, you realize that the documentation is great, but only if you already know what it's about.

This is what happened to me, at least. Here I'm going to (try to) explain things in a way that would have helped me more as a newcomer. The approach I'm taking is more or less the reverse of that used by the documentation, that is, I'm going to start from the components that do the actual work (the builders) and go up the chain from there up to change sources. I hope purists will forgive this unorthodoxy. Here I'm trying to clarify the concepts only, and will not go into the details of each object or property; the documentation explains those quite well.

Installation

I won't cover the installation; both buildbot master and slave are available as packages for the major distributions, and in any case the instructions in the official documentation are fine. This document will refer to buildbot 0.8.5 which was current at the time of writing, but hopefully the concepts are not too different in other versions. All the code shown is of course python code, and has to be included in the master.cfg master configuration file.

We won't cover the basic things such as how to define the slaves, project names, or other administrative information that is contained in that file; for that, again the official documentation is fine.

Builders: the workhorses

Since buildbot is a tool whose goal is the automation of software builds, it makes sense to me to start from where we tell buildbot how to build our software: the builder (or builders, since there can be more than one).

Simply put, a builder is an element that is in charge of performing some action or sequence of actions, normally something related to building software (for example, checking out the source, or make all), but it can also run arbitrary commands.

A builder is configured with a list of slaves that it can use to carry out its task. The other fundamental piece of information that a builder needs is, of course, the list of things it has to do (which will normally run on the chosen slave). In Buildbot, this list of things is represented as a BuildFactory object, which is essentially a sequence of steps, each one defining a certain operation or command.

Enough talk, let's see an example. For this example, we are going to assume that our super software project can be built using a simple make all, and there is another target make packages that creates rpm, deb and tgz packages of the binaries. In the real world things are usually more complex (for example there may be a configure step, or multiple targets), but the concepts are the same; it will just be a matter of adding more steps to a builder, or creating multiple builders, although sometimes the resulting builders can be quite complex.

So to perform a manual build of our project we would type this from the command line (assuming we are at the root of the local copy of the repository):

$ make clean    # clean remnants of previous builds
...
$ svn update
...
$ make all
...
$ make packages
...
# optional but included in the example: copy packages to some central machine
$ scp packages/*.rpm packages/*.deb packages/*.tgz someuser@somehost:/repository
...

Here we're assuming the repository is SVN, but again the concepts are the same with git, mercurial or any other VCS.

Now, to automate this, we create a builder where each step is one of the commands we typed above. A step can be a shell command object, or a dedicated object that checks out the source code (there are various types for different repositories, see the docs for more info), or yet something else:

from buildbot.process.factory import BuildFactory
from buildbot.steps.source import SVN
from buildbot.steps.shell import ShellCommand

# first, let's create the individual step objects

# step 1: make clean; this fails if the slave has no local copy, but
# is harmless and will only happen the first time
makeclean = ShellCommand(name = "make clean",
                         command = ["make", "clean"],
                         description = "make clean")

# step 2: svn update (here updates trunk, see the docs for more
# on how to update a branch, or make it more generic).
checkout = SVN(baseURL = 'svn://myrepo/projects/coolproject/trunk',
               mode = "update",
               username = "foo",
               password = "bar",
               haltOnFailure = True )

# step 3: make all
makeall = ShellCommand(name = "make all",
                       command = ["make", "all"],
                       haltOnFailure = True,
                       description = "make all")

# step 4: make packages
makepackages = ShellCommand(name = "make packages",
                            command = ["make", "packages"],
                            haltOnFailure = True,
                            description = "make packages")

# step 5: upload packages to central server. This needs passwordless ssh
# from the slave to the server (set it up in advance as part of slave setup)
uploadpackages = ShellCommand(name = "upload packages",
                              description = "upload packages",
                              command = "scp packages/*.rpm packages/*.deb packages/*.tgz someuser@somehost:/repository",
                              haltOnFailure = True)

# create the build factory and add the steps to it
f_simplebuild = BuildFactory()
f_simplebuild.addStep(makeclean)
f_simplebuild.addStep(checkout)
f_simplebuild.addStep(makeall)
f_simplebuild.addStep(makepackages)
f_simplebuild.addStep(uploadpackages)

# finally, declare the list of builders. In this case, we only have one builder
c['builders'] = [
    BuilderConfig(name = "simplebuild", slavenames = ['slave1', 'slave2', 'slave3'], factory = f_simplebuild)
]

So our builder is called simplebuild and can run on either of slave1, slave2 and slave3. If our repository has other branches besides trunk, we could create another one or more builders to build them; in the example, only the checkout step would be different, in that it would need to check out the specific branch. Depending on how exactly those branches have to be built, the shell commands may be recycled, or new ones would have to be created if they are different in the branch. You get the idea. The important thing is that all the builders be named differently and all be added to the c['builders'] value (as can be seen above, it is a list of BuilderConfig objects).

Of course the type and number of steps will vary depending on the goal; for example, to just check that a commit doesn't break the build, we could include just up to the make all step. Or we could have a builder that performs a more thorough test by also doing make test or other targets. You get the idea. Note that at each step except the very first we use haltOnFailure = True because it would not make sense to execute a step if the previous one failed (ok, it wouldn't be needed for the last step, but it's harmless and protects us if one day we add another step after it).

Schedulers

Now this is all nice and dandy, but who tells the builder (or builders) to run, and when? This is the job of the scheduler, which is a fancy name for an element that waits for some event to happen, and when it does, based on that information decides whether and when to run a builder (and which one or ones). There can be more than one scheduler. I'm being purposely vague here because the possibilities are almost endless and highly dependent on the actual setup, build purposes, source repository layout and other elements.

So a scheduler needs to be configured with two main pieces of information: on one hand, which events to react to, and on the other hand, which builder or builders to trigger when those events are detected. (It's more complex than that, but if you understand this, you can get the rest of the details from the docs).

A simple type of scheduler may be a periodic scheduler: when a configurable amount of time has passed, run a certain builder (or builders). In our example, that's how we would trigger a build every hour:

from buildbot.schedulers.timed import Periodic

# define the periodic scheduler
hourlyscheduler = Periodic(name = "hourly",
                           builderNames = ["simplebuild"],
                           periodicBuildTimer = 3600)

# define the available schedulers
c['schedulers'] = [ hourlyscheduler ]

That's it. Every hour this hourly scheduler will run the simplebuild builder. If we have more than one builder that we want to run every hour, we can just add them to the builderNames list when defining the scheduler and they will all be run. Or since multiple scheduler are allowed, other schedulers can be defined and added to c['schedulers'] in the same way.

Other types of schedulers exist; in particular, there are schedulers that can be more dynamic than the periodic one. The typical dynamic scheduler is one that learns about changes in a source repository (generally because some developer checks in some change), and triggers one or more builders in response to those changes. Let's assume for now that the scheduler "magically" learns about changes in the repository (more about this later); here's how we would define it:

from buildbot.schedulers.basic import SingleBranchScheduler
from buildbot.changes import filter

# define the dynamic scheduler
trunkchanged = SingleBranchScheduler(name = "trunkchanged",
                                     change_filter = filter.ChangeFilter(branch = None),
                                     treeStableTimer = 300,
                                     builderNames = ["simplebuild"])

# define the available schedulers
c['schedulers'] = [ trunkchanged ]

This scheduler receives changes happening to the repository, and among all of them, pays attention to those happening in "trunk" (that's what branch = None means). In other words, it filters the changes to react only to those it's interested in. When such changes are detected, and the tree has been quiet for 5 minutes (300 seconds), it runs the simplebuild builder. The treeStableTimer helps in those situations where commits tend to happen in bursts, which would otherwise result in multiple build requests queuing up.

What if we want to act on two branches (say, trunk and 7.2)? First we create two builders, one for each branch (see the builders paragraph above), then we create two dynamic schedulers:

from buildbot.schedulers.basic import SingleBranchScheduler
from buildbot.changes import filter

# define the dynamic scheduler for trunk
trunkchanged = SingleBranchScheduler(name = "trunkchanged",
                                     change_filter = filter.ChangeFilter(branch = None),
                                     treeStableTimer = 300,
                                     builderNames = ["simplebuild-trunk"])

# define the dynamic scheduler for the 7.2 branch
branch72changed = SingleBranchScheduler(name = "branch72changed",
                                        change_filter = filter.ChangeFilter(branch = 'branches/7.2'),
                                        treeStableTimer = 300,
                                        builderNames = ["simplebuild-72"])

# define the available schedulers
c['schedulers'] = [ trunkchanged, branch72changed ]

The syntax of the change filter is VCS-dependent (above is for SVN), but again once the idea is clear, the documentation has all the details. Another feature of the scheduler is that is can be told which changes, within those it's paying attention to, are important and which are not. For example, there may be a documentation directory in the branch the scheduler is watching, but changes under that directory should not trigger a build of the binary. This finer filtering is implemented by means of the fileIsImportant argument to the scheduler (full details in the docs and - alas - in the sources).

Change sources

Earlier we said that a dynamic scheduler "magically" learns about changes; the final piece of the puzzle are change sources, which are precisely the elements in buildbot whose task is to detect changes in the repository and communicate them to the schedulers. Note that periodic schedulers don't need a change source, since they only depend on elapsed time; dynamic schedulers, on the other hand, do need a change source.

A change source is generally configured with information about a source repository (which is where changes happen); a change source can watch changes at different levels in the hierarchy of the repository, so for example it is possible to watch the whole repository or a subset of it, or just a single branch. This determines the extent of the information that is passed down to the schedulers.

There are many ways a change source can learn about changes; it can periodically poll the repository for changes, or the VCS can be configured (for example through hook scripts triggered by commits) to push changes into the change source. While these two methods are probably the most common, they are not the only possibilities; it is possible for example to have a change source detect changes by parsing some email sent to a mailing list when a commit happen, and yet other methods exist. The manual again has the details.

To complete our example, here's a change source that polls a SVN repository every 2 minutes:

from buildbot.changes.svnpoller import SVNPoller, split_file_branches

svnpoller = SVNPoller(svnurl = "svn://myrepo/projects/coolproject",
                      svnuser = "foo",
                      svnpasswd = "bar",
                      pollinterval = 120,
                      split_file = split_file_branches)

c['change_source'] = svnpoller

This poller watches the whole "coolproject" section of the repository, so it will detect changes in all the branches. We could have said:

svnurl = "svn://myrepo/projects/coolproject/trunk"

or:

svnurl = "svn://myrepo/projects/coolproject/branches/7.2"

to watch only a specific branch.

To watch another project, you need to create another change source -- and you need to filter changes by project. For instance, when you add a change source watching project 'superproject' to the above example, you need to change:

trunkchanged = SingleBranchScheduler(name = "trunkchanged",
                                     change_filter = filter.ChangeFilter(branch = None),
                                     # ...
                                     )

to e.g.:

trunkchanged = SingleBranchScheduler(name = "trunkchanged",
                                     change_filter = filter.ChangeFilter(project = "coolproject", branch = None),
                                     # ...
                                     )

else coolproject will be built when there's a change in superproject.

Since we're watching more than one branch, we need a method to tell in which branch the change occurred when we detect one. This is what the split_file argument does, it takes a callable that buildbot will call to do the job. The split_file_branches function, which comes with buildbot, is designed for exactly this purpose so that's what the example above uses.

And of course this is all SVN-specific, but there are pollers for all the popular VCSs.

But note: if you have many projects, branches, and builders it probably pays to not hardcode all the schedulers and builders in the configuration, but generate them dynamically starting from list of all projects, branches, targets etc. And using loops to generate all possible combinations (or only the needed ones, depending on the specific setup), as explained in the documentation chapter about Customization.

Status targets

Now that the basics are in place, let's go back to the builders, which is where the real work happens. Status targets are simply the means buildbot uses to inform the world about what's happening, that is, how builders are doing. There are many status target: a web interface, a mail notifier, an IRC notifier, and others. They are described fairly well in the manual.

One thing I've found useful is the ability to pass a domain name as the lookup argument to a mailNotifier, which allows to take an unqualified username as it appears in the SVN change and create a valid email address by appending the given domain name to it:

from buildbot.status import mail

# if jsmith commits a change, mail for the build is sent to jsmith@example.org
notifier = mail.MailNotifier(fromaddr = "buildbot@example.org",
                             sendToInterestedUsers = True,
                             lookup = "example.org")
c['status'].append(notifier)

The mail notifier can be customized at will by means of the messageFormatter argument, which is a function that buildbot calls to format the body of the email, and to which it makes available lots of information about the build. Here all the details.

Conclusion

Please note that this article has just scratched the surface; given the complexity of the task of build automation, the possibilities are almost endless. So there's much, much more to say about buildbot. However, hopefully this is a preparation step before reading the official manual. Had I found an explanation as the one above when I was approaching buildbot, I'd have had to read the manual just once, rather than multiple times. Hope this can help someone else.

(Thanks to Davide Brini for permission to include this tutorial, derived from one he originally posted at http://backreference.org.)