2.5.6. Workers
The workers
configuration key specifies a list of known workers.
In the common case, each worker is defined by an instance of the buildbot.worker.Worker
class.
It represents a standard, manually started machine that will try to connect to the Buildbot master as a worker.
Buildbot also supports “on-demand”, or latent, workers, which allow Buildbot to dynamically start and stop worker instances.
2.5.6.1. Defining Workers
A Worker
instance is created with a workername
and a workerpassword
.
These are the same two values that need to be provided to the worker administrator when they create the worker.
The workername
must be unique, of course.
The password exists to prevent evildoers from interfering with Buildbot by inserting their own (broken) workers into the system and thus displacing the real ones.
The password may be a Secret.
Workers with an unrecognized workername
or a non-matching password will be rejected when they attempt to connect, and a message describing the problem will be written to the log file (see Logfiles).
A configuration for two workers would look like:
from buildbot.plugins import worker
c['workers'] = [
    worker.Worker('bot-solaris', 'solarispasswd'),
    worker.Worker('bot-bsd', 'bsdpasswd'),
]
2.5.6.2. Worker Options
Properties
Worker
objects can also be created with an optional properties
argument, a dictionary specifying properties that will be available to any builds performed on this worker.
For example:
c['workers'] = [
    worker.Worker('bot-solaris', 'solarispasswd',
                  properties={'os': 'solaris'}),
]
Worker
properties have priority over other sources (Builder
, Scheduler
, etc.).
Alternatively, you may use the defaultProperties parameter, whose values are added to the build properties only if they are not already set by another source:
c['workers'] = [
    worker.Worker('fast-bot', 'fast-passwd',
                  defaultProperties={'parallel_make': 10}),
]
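The precedence rules above can be sketched in plain Python. This is only a model of the described behavior, not Buildbot's implementation: the function name resolve_build_properties and its arguments are hypothetical, and other_sources stands in for whatever the Builder, Scheduler, etc. have already set.

```python
def resolve_build_properties(other_sources, worker_properties=None,
                             worker_defaults=None):
    """Model the precedence described above (hypothetical helper)."""
    # Start from properties set by other sources (Builder, Scheduler, ...).
    resolved = dict(other_sources)
    # defaultProperties only fill in names that nobody else has set.
    for name, value in (worker_defaults or {}).items():
        resolved.setdefault(name, value)
    # Worker properties have priority over the other sources.
    resolved.update(worker_properties or {})
    return resolved


props = resolve_build_properties(
    other_sources={'os': 'linux', 'parallel_make': 4},
    worker_properties={'os': 'solaris'},
    worker_defaults={'parallel_make': 10, 'cache_dir': '/var/cache/build'},
)
print(props)
```

In this sketch, os comes from the worker properties, parallel_make keeps the value set by the other source, and cache_dir is filled in from the defaults.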
Worker
collects and exposes /etc/os-release
fields for interpolation.
These can be used to determine details about the running operating system, such as distribution and version.
See https://www.linux.org/docs/man5/os-release.html for details on possible fields.
Each field is imported with the os_ prefix and in lower case.
The os_id, os_id_like, os_version_id and os_version_codename properties are always set, but can be null.
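To illustrate the naming scheme, the following sketch converts the text of an os-release file into os_-prefixed, lower-case property names. It is a simplified stand-in, not the parser Buildbot uses: the function name os_release_properties is hypothetical, and quoting is handled only by stripping surrounding quote characters.

```python
# The four properties described above as always set (possibly to None).
ALWAYS_SET = ('os_id', 'os_id_like', 'os_version_id', 'os_version_codename')


def os_release_properties(text):
    """Map os-release KEY=value lines to os_<key> properties (sketch)."""
    props = {name: None for name in ALWAYS_SET}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        props['os_' + key.strip().lower()] = value.strip().strip('"\'')
    return props


sample = 'ID=debian\nVERSION_ID="12"\nPRETTY_NAME="Debian GNU/Linux 12 (bookworm)"\n'
release_props = os_release_properties(sample)
print(release_props['os_id'], release_props['os_version_id'])
```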
Limiting Concurrency
The Worker
constructor can also take an optional max_builds
parameter to limit the number of builds that it will execute simultaneously:
c['workers'] = [
    worker.Worker('bot-linux', 'linuxpassword',
                  max_builds=2),
]
Note
Under the Worker For Builders concept, a worker runs only one build from a given builder at a time.
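The two limits can be combined into a single admission check, sketched below. The helper can_accept_build is hypothetical and only models the rules stated here; it also assumes that an unset max_builds (None) means no limit on the total number of builds.

```python
def can_accept_build(running_builders, builder_name, max_builds=None):
    """Decide whether a worker may start another build (illustrative model).

    running_builders: names of builders that currently have a build
    running on this worker.
    """
    # Only one build from the same builder runs on the worker at a time.
    if builder_name in running_builders:
        return False
    # max_builds caps the total number of simultaneous builds.
    if max_builds is not None and len(running_builders) >= max_builds:
        return False
    return True


print(can_accept_build({'docs'}, 'tests', max_builds=2))
print(can_accept_build({'docs', 'tests'}, 'lint', max_builds=2))
```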
Master-Worker TCP Keepalive
By default, the buildmaster sends a simple, non-blocking message to each worker every hour. These keepalives ensure that traffic is flowing over the underlying TCP connection, allowing the system’s network stack to detect any problems before a build is started.
The interval can be modified by specifying the interval in seconds using the keepalive_interval
parameter of Worker
(defaults to 3600):
c['workers'] = [
    worker.Worker('bot-linux', 'linuxpasswd',
                  keepalive_interval=3600)
]
The interval can be set to None
to disable this functionality altogether.
When Workers Go Missing
Sometimes, the workers go away. One very common reason for this is when the worker process is started once (manually) and left running, but then later the machine reboots and the process is not automatically restarted.
If you’d like to have the administrator of the worker (or other people) be notified by email when the worker has been missing for too long, just add the notify_on_missing=
argument to the Worker
definition.
This value can be a single email address, or a list of addresses:
c['workers'] = [
    worker.Worker('bot-solaris', 'solarispasswd',
                  notify_on_missing='bob@example.com')
]
By default, this will send an email when the worker has been disconnected for more than one hour.
Only one email per connection-loss event will be sent.
To change the timeout, use missing_timeout=
and give it a number of seconds (the default is 3600).
The following example notifies two recipients and shortens the timeout to five minutes:
c['workers'] = [
    worker.Worker('bot-solaris', 'solarispasswd',
                  notify_on_missing=['bob@example.com', 'alice@example.org'],
                  missing_timeout=300)  # notify after 5 minutes
]
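The notification rule (one email per connection-loss event, sent once the worker has been gone for longer than missing_timeout) can be modeled as follows. The class MissingWorkerTracker and its methods are hypothetical names for this sketch and do not correspond to Buildbot internals; times are plain numbers of seconds.

```python
class MissingWorkerTracker:
    """Track one worker and decide when a 'missing' email is due (sketch)."""

    def __init__(self, missing_timeout=3600):
        self.missing_timeout = missing_timeout
        self.disconnected_at = None  # None while the worker is connected
        self.notified = False

    def on_disconnect(self, now):
        # A new connection-loss event starts; allow one notification for it.
        self.disconnected_at = now
        self.notified = False

    def on_connect(self):
        self.disconnected_at = None
        self.notified = False

    def should_notify(self, now):
        if self.disconnected_at is None or self.notified:
            return False
        if now - self.disconnected_at > self.missing_timeout:
            self.notified = True  # only one email per event
            return True
        return False


tracker = MissingWorkerTracker(missing_timeout=300)
tracker.on_disconnect(now=1000)
print(tracker.should_notify(now=1200), tracker.should_notify(now=1301))
```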
The email sent this way will use a MailNotifier reporter (see MailNotifier), if one is configured.
This provides a way for you to control the from address of the email, as well as the relayhost (aka smarthost) to use as an SMTP server.
If no MailNotifier
is configured on this buildmaster, the worker-missing emails will be sent using a default configuration.
Note that if you want to have a MailNotifier
for worker-missing emails but not for regular build emails, just create one with builders=[]
, as follows:
from buildbot.plugins import reporters, worker
# MailNotifier is provided by the reporters plugin namespace.
m = reporters.MailNotifier(fromaddr='buildbot@localhost', builders=[],
                           relayhost='smtp.example.org')
c['reporters'].append(m)
c['workers'] = [
    worker.Worker('bot-solaris', 'solarispasswd',
                  notify_on_missing='bob@example.com')
]
Workers States
Sometimes a worker misbehaves because of issues with its configuration. In those cases, you may want to pause the worker, or even shut it down completely.
There are three actions that you may take (in the worker’s web page Actions dialog):
Pause: If a worker is paused, it won’t accept new builds. The action of pausing a worker will not affect any ongoing build.
Graceful Shutdown: If a worker is in graceful shutdown mode, it won’t accept new builds, but will finish the current builds. When all of its builds are finished, the buildbot-worker process will terminate.
Force Shutdown: If a worker is in force shutdown mode, it will terminate immediately, and the build it was currently doing will be put to retry state.
Those actions will put the worker in either of two states:
paused: the worker is paused if it is connected but doesn’t accept new builds.
graceful: the worker is graceful if it doesn’t accept new builds, and will shutdown when builds are finished.
A worker might not be able to accept a job for a period of time if Buildbot detects misbehavior. This is called the quarantine timer.
The quarantine timer is an exponential back-off mechanism for workers.
This prevents a misbehaving worker from eating the build queue by quickly finishing builds in EXCEPTION
state.
When misbehavior is detected, the timer will pause the worker for 10 seconds, and then the time will double with each misbehavior detection until the worker finishes a build.
The first case of misbehavior is for a latent worker to not start properly.
The second case of misbehavior is for a build to end with an EXCEPTION
status.
Pausing and unpausing a worker will force it to leave quarantine immediately. The quarantine timeout will not be reset until the worker finishes a build.
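The back-off described above can be sketched as a small class. This is a model only: QuarantineTimer and its method names are hypothetical, "finishes a build" is read here as finishing one without misbehaving, and any upper bound Buildbot may apply to the timeout is not modeled.

```python
class QuarantineTimer:
    """Exponential back-off: 10 seconds, doubling on each misbehavior."""

    INITIAL_TIMEOUT = 10  # seconds, as stated in the text above

    def __init__(self):
        self.next_timeout = self.INITIAL_TIMEOUT

    def on_misbehavior(self):
        """Return how long the worker is quarantined for this incident."""
        timeout = self.next_timeout
        self.next_timeout *= 2  # the next incident waits twice as long
        return timeout

    def on_build_finished(self):
        """A finished build resets the back-off to its initial value."""
        self.next_timeout = self.INITIAL_TIMEOUT


timer = QuarantineTimer()
delays = [timer.on_misbehavior() for _ in range(4)]
print(delays)
```

Three consecutive incidents in this model quarantine the worker for 10, 20 and 40 seconds; after a finished build the next incident starts again at 10 seconds.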
Worker states are stored in the database, can be queried via REST API, and are visible in the UI’s workers page.
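As an illustration of consuming that state, the sketch below filters a workers listing for workers that are not accepting new builds. The payload shape (a top-level workers list whose entries carry name, paused and graceful fields) is an assumption made for this example and should be checked against the REST API documentation; the sample data is invented.

```python
import json

# Invented sample; the field names are assumptions for this sketch.
SAMPLE_RESPONSE = json.loads("""
{
  "workers": [
    {"name": "bot-solaris", "paused": true,  "graceful": false},
    {"name": "bot-bsd",     "paused": false, "graceful": false},
    {"name": "bot-linux",   "paused": false, "graceful": true}
  ]
}
""")


def workers_not_accepting_builds(payload):
    """Return the names of workers that are paused or shutting down gracefully."""
    return sorted(
        w["name"]
        for w in payload.get("workers", [])
        if w.get("paused") or w.get("graceful")
    )


unavailable = workers_not_accepting_builds(SAMPLE_RESPONSE)
print(unavailable)
```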
2.5.6.3. Local Workers
For smaller setups, you may want to just run the workers on the same machine as the master. To simplify the maintenance, you may even want to run them in the same process.
This is what LocalWorker is for.
Instead of configuring a worker.Worker
, you have to configure a worker.LocalWorker
.
As the worker is running in the same process, a password is not necessary.
You can run as many local workers as your machine’s CPU and memory allow.
A configuration for two local workers would look like:
from buildbot.plugins import worker
c['workers'] = [
    worker.LocalWorker('bot1'),
    worker.LocalWorker('bot2'),
]
In order to use local workers, you need to have the buildbot-worker package installed.
2.5.6.4. Latent Workers
The standard Buildbot model has workers started manually. The previous section described how to configure the master for this approach.
Another approach is to let the Buildbot master start workers when builds are ready, on-demand. Thanks to services such as Amazon Web Services’ Elastic Compute Cloud (“AWS EC2”), this is relatively easy to set up, and can be very useful for some situations.
The workers that are started on-demand are called “latent” workers. You can find the list of Supported Latent Workers below.
Common Options
The following options are available for all latent workers.
build_wait_timeout
This option specifies how long a latent worker waits for another build after finishing one before it shuts down. It defaults to 10 minutes. If it is set to 0, the worker is shut down immediately. If it is less than 0, the worker is shut down only when the master shuts down.
check_instance_interval
This option controls the interval at which the health checks run during worker startup. The health checks speed up the detection of irrecoverably crashed workers (e.g. due to an issue with the Docker image in the case of Docker workers). Without such checks, the build would continue waiting for the worker to connect until the missing_timeout time elapses. The value of the option defaults to 10 seconds.
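The three cases of build_wait_timeout can be summarized in a small sketch. The helper idle_shutdown_policy is hypothetical and returns a description rather than performing any shutdown; it assumes the value is expressed in seconds, so that the 10-minute default corresponds to 600.

```python
def idle_shutdown_policy(build_wait_timeout=600):
    """Describe when an idle latent worker is shut down (illustrative only)."""
    if build_wait_timeout < 0:
        # Negative: the worker stays up until the master itself shuts down.
        return "keep running until the master shuts down"
    if build_wait_timeout == 0:
        # Zero: no waiting for a follow-up build.
        return "shut down immediately after the build"
    # Positive: wait this long for another build before shutting down.
    return f"shut down after {build_wait_timeout} idle seconds"


for value in (600, 0, -1):
    print(value, "->", idle_shutdown_policy(value))
```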
Supported Latent Workers
At the time of writing, Buildbot supports the following latent workers:
Dangers with Latent Workers
Any latent worker that interacts with a for-fee service, such as the EC2LatentWorker
, brings significant risks.
As already identified, the configuration will need access to account information that, if obtained by a criminal, can be used to charge services to your account.
Also, bugs in the Buildbot software may lead to unnecessary charges.
In particular, if the master neglects to shut down an instance for some reason, a virtual machine may be running unnecessarily, charging against your account.
Manual and/or automatic (e.g. Nagios with a plugin using a library like boto) double-checking may be appropriate.
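An automatic double-check could be as simple as flagging instances that have been running for longer than any build should take. The sketch below works on plain records rather than on a cloud API: the record fields (id, state, launched_at), the helper name find_suspect_instances and the six-hour threshold are all assumptions for illustration, and fetching the records (for example with boto) is left out.

```python
from datetime import datetime, timedelta, timezone


def find_suspect_instances(instances, now, max_age=timedelta(hours=6)):
    """Return ids of running instances older than max_age (sketch)."""
    return [
        inst["id"]
        for inst in instances
        if inst["state"] == "running" and now - inst["launched_at"] > max_age
    ]


# Invented sample records.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
instances = [
    {"id": "i-old", "state": "running", "launched_at": now - timedelta(hours=20)},
    {"id": "i-new", "state": "running", "launched_at": now - timedelta(minutes=30)},
    {"id": "i-gone", "state": "terminated", "launched_at": now - timedelta(days=3)},
]
print(find_suspect_instances(instances, now))
```

A monitoring job could run such a check periodically and alert an administrator instead of terminating anything itself.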
A comparatively minor note is that, currently, if two instances try to attach to the same latent worker, it is likely that the system will become confused. This should not occur unless, for instance, you configure a normal worker to connect with the authentication of a latent worker. If this situation does occur, stop all attached instances and restart the master.