2.5.6. Workers

The workers configuration key specifies a list of known workers. In the common case, each worker is defined by an instance of the buildbot.worker.Worker class. It represents a standard, manually started machine that will try to connect to the Buildbot master as a worker. Buildbot also supports “on-demand”, or latent, workers, which allow Buildbot to dynamically start and stop worker instances.

2.5.6.1. Defining Workers

A Worker instance is created with a workername and a workerpassword. These are the same two values that need to be provided to the worker administrator when they create the worker.

The workername must be unique. The password exists to prevent evildoers from interfering with Buildbot by inserting their own (broken) workers into the system and thus displacing the real ones. The password may be a Secret.

Workers with an unrecognized workername or a non-matching password will be rejected when they attempt to connect, and a message describing the problem will be written to the log file (see Logfiles).
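The rejection rule above can be illustrated with a small stand-alone sketch. It is not Buildbot's actual login code: the function name check_worker_login, the dictionary of known workers, and the logger name are all hypothetical, chosen only to show the two rejection cases and the log message.

```python
import hmac
import logging

log = logging.getLogger("worker-auth-sketch")  # hypothetical logger name


def check_worker_login(known_workers, workername, password):
    """Return True if the worker may connect; otherwise log why and return False."""
    expected = known_workers.get(workername)
    if expected is None:
        log.warning("rejected worker %r: unrecognized workername", workername)
        return False
    # compare_digest avoids leaking, through timing, where the strings differ.
    if not hmac.compare_digest(expected.encode(), password.encode()):
        log.warning("rejected worker %r: password does not match", workername)
        return False
    return True


# Same names and passwords as the configuration example in this section.
known = {"bot-solaris": "solarispasswd", "bot-bsd": "bsdpasswd"}
```

With this table, check_worker_login(known, "bot-bsd", "bsdpasswd") is accepted, while a wrong password or an unknown name is rejected and logged.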

A configuration for two workers would look like:

from buildbot.plugins import worker
c['workers'] = [
    worker.Worker('bot-solaris', 'solarispasswd'),
    worker.Worker('bot-bsd', 'bsdpasswd'),
]

2.5.6.2. Worker Options

Properties

Worker objects can also be created with an optional properties argument, a dictionary specifying properties that will be available to any builds performed on this worker. For example:

c['workers'] = [
    worker.Worker('bot-solaris', 'solarispasswd',
                  properties={'os': 'solaris'}),
]

Worker properties take priority over other sources (Builder, Scheduler, etc.). To supply values that are added to the Build Properties only when no other source has already set them, use the defaultProperties parameter instead:

c['workers'] = [
    worker.Worker('fast-bot', 'fast-passwd',
                  defaultProperties={'parallel_make': 10}),
]
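The difference between properties and defaultProperties can be sketched as a simple merge order. This is only an illustration of the priority rule stated above, not Buildbot's implementation; the function effective_properties and its arguments are hypothetical.

```python
def effective_properties(other_sources, worker_properties, worker_defaults):
    """Merge property sources so that later updates win over earlier ones."""
    result = dict(worker_defaults)    # lowest priority: used only if nothing else sets the key
    result.update(other_sources)      # Builder, Scheduler, etc.
    result.update(worker_properties)  # highest priority: worker properties override
    return result


merged = effective_properties(
    other_sources={"os": "linux", "parallel_make": 4},
    worker_properties={"os": "solaris"},
    worker_defaults={"parallel_make": 10, "arch": "x86"},
)
```

Here os comes from the worker's properties, parallel_make keeps the value set by the other source, and arch falls back to the worker default.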

The worker collects the fields of /etc/os-release and exposes them for interpolation. These can be used to determine details about the running operating system, such as distribution and version. See https://www.linux.org/docs/man5/os-release.html for details on possible fields. Each field is imported in lower case with an os_ prefix. os_id, os_id_like, os_version_id and os_version_codename are always set, but may be null.
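As a rough illustration of the naming rule, the following sketch turns the text of an os-release file into os_-prefixed, lower-case keys. It is an assumption about how such a mapping could be written, not the worker's actual code; the function name os_release_properties and the sample file content are made up for the example.

```python
import shlex

# The four fields the text above says are always set (possibly to null/None).
ALWAYS_SET = ("os_id", "os_id_like", "os_version_id", "os_version_codename")


def os_release_properties(text):
    """Map os-release KEY=value lines to os_<key> entries in lower case."""
    props = {name: None for name in ALWAYS_SET}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes the optional shell-style quotes around the value.
        parts = shlex.split(raw)
        props["os_" + key.strip().lower()] = parts[0] if parts else ""
    return props


SAMPLE = '''NAME="Ubuntu"
ID=ubuntu
ID_LIKE=debian
VERSION_ID="22.04"
'''
props = os_release_properties(SAMPLE)
```

In this sample, VERSION_CODENAME is absent from the file, so os_version_codename is present in the result but set to None.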

Limiting Concurrency

The Worker constructor can also take an optional max_builds parameter to limit the number of builds that it will execute simultaneously:

c['workers'] = [
    worker.Worker('bot-linux', 'linuxpassword',
                  max_builds=2),
]

Note

Under the Worker For Builders concept, only one build from the same builder runs on the worker at a time.

Master-Worker TCP Keepalive

By default, the buildmaster sends a simple, non-blocking message to each worker every hour. These keepalives ensure that traffic is flowing over the underlying TCP connection, allowing the system’s network stack to detect any problems before a build is started.

To change the interval, pass the keepalive_interval parameter of Worker, in seconds (the default is 3600):

c['workers'] = [
    worker.Worker('bot-linux', 'linuxpasswd',
                  keepalive_interval=3600)
]

The interval can be set to None to disable this functionality altogether.

When Workers Go Missing

Sometimes, the workers go away. One very common reason for this is when the worker process is started once (manually) and left running, but then later the machine reboots and the process is not automatically restarted.

If you’d like to have the administrator of the worker (or other people) be notified by email when the worker has been missing for too long, just add the notify_on_missing= argument to the Worker definition. This value can be a single email address, or a list of addresses:

c['workers'] = [
    worker.Worker('bot-solaris', 'solarispasswd',
                  notify_on_missing='bob@example.com')
]

By default, this will send an email when the worker has been disconnected for more than one hour. Only one email per connection-loss event will be sent. To change the timeout, use missing_timeout= and give it a number of seconds (the default is 3600).

The following example notifies two recipients and shortens the timeout to five minutes:

c['workers'] = [
    worker.Worker('bot-solaris', 'solarispasswd',
                  notify_on_missing=['bob@example.com', 'alice@example.org'],
                  missing_timeout=300)  # notify after 5 minutes
]
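The timing rule described above (one email per connection-loss event, sent once the worker has been disconnected for longer than missing_timeout) can be sketched as follows. The class MissingWorkerNotifier and its methods are hypothetical and exist only to illustrate the rule; time is passed in explicitly as seconds so the behaviour is easy to follow.

```python
class MissingWorkerNotifier:
    """Decides when a worker-missing email is due (illustrative only)."""

    def __init__(self, recipients, missing_timeout=3600):
        # Accept a single address or a list, like notify_on_missing.
        if isinstance(recipients, str):
            recipients = [recipients]
        self.recipients = list(recipients)
        self.missing_timeout = missing_timeout
        self.disconnected_at = None
        self.notified = False

    def worker_disconnected(self, now):
        self.disconnected_at = now
        self.notified = False

    def worker_connected(self):
        self.disconnected_at = None
        self.notified = False

    def recipients_due(self, now):
        """Return the addresses to mail now, or [] if nothing is due."""
        if self.disconnected_at is None or self.notified:
            return []
        if now - self.disconnected_at <= self.missing_timeout:
            return []
        self.notified = True  # only one email per connection-loss event
        return list(self.recipients)


notifier = MissingWorkerNotifier("bob@example.com", missing_timeout=300)
notifier.worker_disconnected(now=0)
```

Calling recipients_due at 300 seconds returns nothing, the first call after that returns the recipient list, and later calls for the same outage return nothing again.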

The email sent this way will use a MailNotifier (see MailNotifier) status target, if one is configured. This provides a way for you to control the from address of the email, as well as the relayhost (aka smarthost) to use as an SMTP server. If no MailNotifier is configured on this buildmaster, the worker-missing emails will be sent using a default configuration.

Note that if you want to have a MailNotifier for worker-missing emails but not for regular build emails, just create one with builders=[], as follows:

from buildbot.plugins import reporters, worker
# builders=[] means this notifier sends no regular build emails
m = reporters.MailNotifier(fromaddr='buildbot@localhost', builders=[],
                           relayhost='smtp.example.org')
c['reporters'].append(m)

c['workers'] = [
    worker.Worker('bot-solaris', 'solarispasswd',
                  notify_on_missing='bob@example.com')
]

Worker States

Sometimes a worker misbehaves because of issues with its configuration. In those cases, you may want to pause the worker, or maybe completely shut it down.

There are three actions that you may take (in the worker’s web page Actions dialog):

  • Pause: If a worker is paused, it won’t accept new builds. The action of pausing a worker will not affect any ongoing build.

  • Graceful Shutdown: If a worker is in graceful shutdown mode, it won’t accept new builds, but will finish the current builds. When all of its builds are finished, the buildbot-worker process will terminate.

  • Force Shutdown: If a worker is in force shutdown mode, it will terminate immediately, and the build it was running will be put into the retry state.

Those actions will put the worker in one of two states:

  • paused: the worker is paused if it is connected but doesn’t accept new builds.

  • graceful: the worker is graceful if it doesn’t accept new builds, and will shut down when its builds are finished.

A worker might not be able to accept a job for a period of time if Buildbot detects misbehavior. This is called the quarantine timer.

The quarantine timer is an exponential back-off mechanism for workers. It prevents a misbehaving worker from eating the build queue by quickly finishing builds in the EXCEPTION state. When misbehavior is detected, the timer pauses the worker for 10 seconds, and the time then doubles with each detected misbehavior until the worker finishes a build.

The first case of misbehavior is for a latent worker to not start properly. The second case of misbehavior is for a build to end with an EXCEPTION status.

Pausing and unpausing a worker will force it to leave quarantine immediately. The quarantine timeout will not be reset until the worker finishes a build.
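The back-off behaviour described above can be sketched in a few lines. This is a simplified model, not Buildbot's implementation: the class QuarantineTimer and its method names are hypothetical, time is passed in as plain seconds, and no upper bound on the timeout is modelled because the text does not mention one.

```python
class QuarantineTimer:
    """Exponential back-off for a misbehaving worker (illustrative only)."""

    INITIAL_TIMEOUT = 10  # seconds, as stated in the text

    def __init__(self):
        self.timeout = self.INITIAL_TIMEOUT
        self.quarantined_until = None

    def misbehaved(self, now):
        """Quarantine the worker and double the timeout for the next offence."""
        delay = self.timeout
        self.quarantined_until = now + delay
        self.timeout *= 2
        return delay

    def can_accept_build(self, now):
        return self.quarantined_until is None or now >= self.quarantined_until

    def pause_and_unpause(self):
        # Leaves quarantine at once, but the grown timeout is kept.
        self.quarantined_until = None

    def build_finished(self):
        # Finishing a build resets the timeout to its initial value.
        self.quarantined_until = None
        self.timeout = self.INITIAL_TIMEOUT


timer = QuarantineTimer()
delays = [timer.misbehaved(now=0), timer.misbehaved(now=10)]
```

Note that pausing and unpausing only clears the current quarantine; the next misbehavior still uses the doubled timeout until a build finishes.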

Worker states are stored in the database, can be queried via REST API, and are visible in the UI’s workers page.
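As a hedged example of consuming these states from the REST API, the sketch below summarises an already decoded JSON response. It assumes that the workers endpoint (believed to be /api/v2/workers) returns a "workers" list whose entries carry name, paused and graceful fields; check the REST API reference for your Buildbot version before relying on these names. The sample payload is invented for illustration, and no network request is made.

```python
def describe_worker_states(payload):
    """Map each worker name to the list of state flags that are set."""
    states = {}
    for entry in payload.get("workers", []):
        flags = [flag for flag in ("paused", "graceful") if entry.get(flag)]
        states[entry["name"]] = flags  # empty list: neither paused nor graceful
    return states


# Invented sample shaped like the assumed response of the workers endpoint.
SAMPLE_RESPONSE = {
    "workers": [
        {"name": "bot-solaris", "paused": True, "graceful": False},
        {"name": "bot-bsd", "paused": False, "graceful": True},
        {"name": "bot-linux", "paused": False, "graceful": False},
    ]
}
summary = describe_worker_states(SAMPLE_RESPONSE)
```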

2.5.6.3. Local Workers

For smaller setups, you may want to just run the workers on the same machine as the master. To simplify the maintenance, you may even want to run them in the same process.

This is what LocalWorker is for. Instead of configuring a worker.Worker, you configure a worker.LocalWorker. As the worker runs in the same process as the master, no password is necessary. You can run as many local workers as your machine’s CPU and memory allow.

A configuration for two local workers would look like:

from buildbot.plugins import worker
c['workers'] = [
    worker.LocalWorker('bot1'),
    worker.LocalWorker('bot2'),
]

In order to use local workers, you need to have the buildbot-worker package installed.

2.5.6.4. Latent Workers

The standard Buildbot model has workers started manually. The previous section described how to configure the master for this approach.

Another approach is to let the Buildbot master start workers when builds are ready, on-demand. Thanks to services such as Amazon Web Services’ Elastic Compute Cloud (“AWS EC2”), this is relatively easy to set up, and can be very useful for some situations.

The workers that are started on-demand are called “latent” workers. You can find the list of Supported Latent Workers below.

Common Options

The following options are available for all latent workers.

build_wait_timeout

This option specifies how long a latent worker waits for another build after finishing one before it shuts down. It defaults to 10 minutes. If it is set to 0, the worker is shut down immediately. If it is less than 0, the worker is shut down only when the master shuts down.
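The three cases of build_wait_timeout can be summarised in a small decision function. This is only a restatement of the rules above in code; the function shutdown_policy is hypothetical, and it assumes the value is expressed in seconds, so the 10-minute default appears as 600.

```python
def shutdown_policy(build_wait_timeout=600):
    """Describe what happens to a latent worker after a build finishes."""
    if build_wait_timeout < 0:
        return "keep running until the master shuts down"
    if build_wait_timeout == 0:
        return "shut down immediately"
    return f"shut down after {build_wait_timeout} seconds without a new build"


policies = [shutdown_policy(value) for value in (-1, 0, 600)]
```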

check_instance_interval

This option controls the interval at which health checks run during worker startup. The health checks speed up the detection of irrecoverably crashed workers (e.g. due to an issue with the Docker image in the case of Docker workers). Without such checks, a build would continue waiting for the worker to connect until the missing_timeout time elapses. The option defaults to 10 seconds.
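To show why periodic health checks shorten the wait, here is a simulated startup loop. It is a sketch under stated assumptions rather than Buildbot's code: wait_for_worker, is_connected and instance_is_healthy are hypothetical names, and time is simulated by counting elapsed seconds instead of sleeping.

```python
def wait_for_worker(is_connected, instance_is_healthy,
                    check_instance_interval=10, missing_timeout=3600):
    """Simulate waiting for a latent worker; return (outcome, seconds_elapsed)."""
    elapsed = 0
    while elapsed < missing_timeout:
        if is_connected(elapsed):
            return "connected", elapsed
        if not instance_is_healthy(elapsed):
            # The health check notices the crash long before missing_timeout.
            return "failed", elapsed
        elapsed += check_instance_interval
    return "timed out", elapsed


# An instance that crashes 30 seconds into startup and never connects.
crashed = wait_for_worker(lambda t: False, lambda t: t < 30)
# The same situation when no check ever reports a problem.
unchecked = wait_for_worker(lambda t: False, lambda t: True)
```

In this simulation the crash is detected after 30 seconds with health checks, whereas without a failing check the build waits for the full missing_timeout.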

Supported Latent Workers

At the time of writing, Buildbot supports the following latent workers:

Dangers with Latent Workers

Any latent worker that interacts with a for-fee service, such as the EC2LatentWorker, brings significant risks. As already identified, the configuration will need access to account information that, if obtained by a criminal, can be used to charge services to your account. Also, bugs in the Buildbot software may lead to unnecessary charges. In particular, if the master neglects to shut down an instance for some reason, a virtual machine may be running unnecessarily, charging against your account. Manual and/or automatic (e.g. Nagios with a plugin using a library like boto) double-checking may be appropriate.
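One possible shape for such an automatic double-check is sketched below. It does not call any cloud API: the instance records, their id and started_at fields, and the function find_stale_instances are all hypothetical. In practice the list would come from your provider's API (for example through a library like boto), and the age limit would depend on how long your builds normally run.

```python
from datetime import datetime, timedelta, timezone


def find_stale_instances(instances, now, max_age):
    """Return the ids of instances that have been running longer than max_age."""
    return [inst["id"] for inst in instances if now - inst["started_at"] > max_age]


# Invented sample data standing in for a provider's instance listing.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
instances = [
    {"id": "i-recent", "started_at": now - timedelta(minutes=20)},
    {"id": "i-forgotten", "started_at": now - timedelta(hours=9)},
]
stale = find_stale_instances(instances, now, max_age=timedelta(hours=4))
```

A monitoring job could run a check like this periodically and alert an administrator when the returned list is not empty.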

A comparatively trivial note is that currently, if two instances try to attach to the same latent worker, it is likely that the system will become confused. This should not occur unless, for instance, you configure a normal worker to connect with the authentication of a latent worker. If this situation does occur, stop all attached instances and restart the master.