Setup guide
The Scheduled Command Wrapper can be used in many different ways. This guide starts with some simple examples of replacing existing cron jobs, and then describes how to configure and monitor a system on which applications can install their own scheduled items.
Most of these examples assume that scw has been installed to look for its global settings in /etc/ and place its logs and metrics under /var/, like this:
sh ./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc
make
sudo make install
If different paths were used, scw can always be passed a specific configuration file with the “-c” option.
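For example, if scw had been installed under /opt (an illustrative path only, not a real default), each invocation would name the configuration file explicitly, where ITEM stands for the name of the item to run, as described in the following sections:
scw -c /opt/scw/etc/scw.conf run ITEM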
Where an example starts with “$”, this indicates that you would type what comes after the “$” at the shell prompt, and the following lines will be the resultant output.
Running a simple command through scw on the terminal
To get started, try running any shell command through scw interactively. When connected to a terminal, all command output will be displayed with a timestamp, colour coded according to the stream and status.
$ scw -c /dev/null -s Command="echo hello; echo error >&2" run dummy
scw: /var/spool/scw/you: Permission denied
scw: dummy: MetricsDir: metrics directory unavailable: proceeding without metrics or checks
2024-11-25T21:38:31 [s] (begin)
2024-11-25T21:38:31 [-] hello
2024-11-25T21:38:31 [E] error
2024-11-25T21:38:33 [s] (end) exit status 0, elapsed time 2s
In this example, the “-c /dev/null” tells scw not to load any configuration at all, so it proceeds with the defaults. This leads to the two error messages about permissions and missing directories. Once scw has been configured, these won’t appear any more.
The next option, “-s Command=…”, tells scw to ignore the item’s command setting and run this command instead.
Finally, “run dummy” tells scw to run an item named dummy. Since that item has no settings (as it doesn’t exist), scw proceeds with default settings plus the Command setting provided by “-s”.
Each output line is prefixed with the timestamp and the stream identifier (“s” for status, “-” for standard output, or “E” for standard error).
Adding logging to a user cron job
To use scw to add logging to cron jobs under your own user ID without any system-wide config, you will need a basic configuration file:
ItemsDir = /home/you/.scw/items
MetricsDir = /home/you/.scw/metrics/{ITEM}
CheckLockFile = /home/you/.scw/metrics/.lock
OutputMap = OES stamped /home/you/.scw/logs/{ITEM}.log
Change the paths as appropriate for your account, and make sure the directories exist:
mkdir -p /home/you/.scw/items
mkdir -p /home/you/.scw/metrics
mkdir -p /home/you/.scw/logs
Now look at the crontab entry you are going to update. For example, this entry might run a script which outputs its progress as it goes, causing cron to email you each night:
0 22 * * Mon-Fri backup-my-files.sh
Choose a name for this scheduled item, for example “backup”, and create a file for it. For example, /home/you/.scw/items/backup.cf would contain:
Description = Back up my files throughout the week
Command = backup-my-files.sh
You can test it by manually running the item:
scw -c /home/you/.scw/config run backup
This will run it as normal, but its output will be written to the terminal, as well as to the log file described in the OutputMap line of the configuration file.
Then you can change your crontab line to look like this:
0 22 * * Mon-Fri scw -c /home/you/.scw/config run backup
Now the job will run but its output will go to the log file instead, and every line will be timestamped.
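To confirm that the log is being written, inspect it after a run; for example (the precise format of the “stamped” output lines may differ from what was shown earlier on the terminal):
tail /home/you/.scw/logs/backup.log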
If you still wanted an email, but only if the script ended with an error status, you could add another OutputMap using the “!” stream instruction, so your backup.cf item file would look like this:
Description = Back up my files each weeknight
Command = backup-my-files.sh
OutputMap = !OES stamped you@your.com
The OutputMap directives are cumulative, so the log file mentioned in your main configuration file will still be populated.
For the email to only include the errors rather than all of the output, change “!OES” to just “!E”. Refer to the “Output mapping” part of the manual for full details.
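For example, to be emailed only the error output, that extra line in backup.cf would become:
OutputMap = !E stamped you@your.com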
Note that the logs written by scw will keep growing unless you rotate them with a tool such as logrotate.
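As an illustration only (the paths and rotation policy here are assumptions to adapt), the logs could be rotated using a private logrotate configuration and state file, run from your own crontab:
# Example only: /home/you/.scw/logrotate.conf
/home/you/.scw/logs/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
Saving that as /home/you/.scw/logrotate.conf, a crontab entry such as the following would apply it:
30 3 * * * logrotate --state /home/you/.scw/logrotate.state /home/you/.scw/logrotate.conf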
Adding output logging and monitoring to root cron jobs
The default settings for scw are tailored for a system-wide installation. This means that if you are happy with the default output locations, then there is no need for a custom configuration file when replacing root cron jobs.
Here is a cron job which runs every 5 minutes to collect some information:
$ sudo cat /etc/cron.d/get-hdd-temperature
*/5 * * * * root /usr/local/sbin/get-hdd-temperature.sh
If something goes wrong, it will email root every 5 minutes, which can cause problems in itself. However, if we simply discard the errors, nobody will ever know about them. Instead, we will write its output to a log, and use a monitoring system to raise an alert when it fails.
As with the non-root example earlier, the logs are not rotated by scw itself. A sample logrotate configuration for this setup is included later in this guide.
Create a file /etc/scw/items/root/get-hdd-temperature.cf containing this:
Description = Update the cache file holding the hard disk temperatures
Schedule = */5 * * * *
Command = /usr/local/sbin/get-hdd-temperature.sh
MaxRunTime = 10 minutes
SuccessInterval = 15 minutes
The MaxRunTime setting tells scw to terminate the command after 10 minutes, since it shouldn’t take that long.
The SuccessInterval setting tells scw to notify the monitoring system that there should never be more than 15 minutes between successful completions of this item.
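As with the user example earlier, the item can be tested by running it manually before relying on cron; since the system-wide defaults are being used, no “-c” option is needed:
sudo scw run get-hdd-temperature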
When there are many scheduled items, let scw write the crontab entries, so that everything about an item is described in one place - its item file. This means that each item file should have its own Schedule lines.
To get scw to manage the crontab, remove the original crontab line and then run “scw update” to generate a new one covering all items.
$ sudo rm /etc/cron.d/get-hdd-temperature
$ sudo scw update
$ sudo cat /etc/cron.d/scw
# (root:get-hdd-temperature)
*/5 * * * * root scw run get-hdd-temperature
Note how each item’s crontab lines are preceded by a comment showing which user and item they refer to.
Once an item is running through scw, it can be monitored with a tool such as Zabbix to detect when the item has been disabled, when it is overrunning, and when it has been too long between successful runs.
This is achieved by directing the monitoring system to read the item list file, which was generated by the “scw update” command above. It looks like this:
$ jq . < /var/spool/scw/items.json
[
  {
    "username": "root",
    "item": "get-hdd-temperature",
    "description": "Update the cache file holding the hard disk temperatures",
    "metricsDir": "/var/spool/scw/root/get-hdd-temperature",
    "successInterval": 900
  }
]
In Zabbix, import the “Scheduled items managed by SCW” template, and apply that template to the host that scw is running on. The template is included with this documentation in both JSON and XML formats: zabbix_scw_template.json, zabbix_scw_template.xml.
When applying the template, you can alter the location of the item list file if necessary, by changing the value of the “{$SCW_ITEMLISTFILE}” macro.
Once the template is applied, within 10 minutes the scheduled items will be detected by Zabbix, which will create the appropriate monitoring entries and alert triggers.
If the scheduled item is disabled, Zabbix should alert you within 1 minute:
$ sudo scw disable get-hdd-temperature
Similarly, if the scheduled item hasn’t succeeded for 15 minutes (the item’s SuccessInterval), Zabbix will raise an alert.
Note that Zabbix is not the only option. In the scw manual, the “Metrics” subsection under “FILES” provides details of the metrics files generated for each item. Any monitoring system that can detect the presence or absence of a file, and the timestamp of a file, can make use of these in the same way Zabbix does.
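As a sketch of that idea, assuming purely for illustration that a file in the item's metrics directory is updated on each successful run (the file name “success” here is invented; consult the manual's “Metrics” subsection for the real names), a minimal shell check might look like this:
# Hypothetical check: complain if nothing named "success" in the item's
# metrics directory has been updated within the last 15 minutes.
metricsDir=/var/spool/scw/root/get-hdd-temperature
find "$metricsDir" -maxdepth 1 -name success -mmin -15 | grep -q . \
    || echo "get-hdd-temperature: no recorded success in the last 15 minutes"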
After adding more items, or changing the SuccessInterval or Schedule of existing items, remember to re-run the update to regenerate both the crontab and the item list file:
$ sudo scw update
It is advisable to run this update command automatically, so that it is not forgotten after making changes; a suitable cron entry for this is shown at the end of this guide.
Using scripts directly as scheduled items
The ItemsDir for a user may contain shell or Perl scripts as well as item files. Settings for scw can be placed directly inside them, in the comment block at the start.
For example, check-remote-mounts.sh will run every minute if it is saved as /etc/scw/items/root/check-remote-mounts.sh, made executable, and “scw update” is then run:
#!/bin/sh
#
# scw Description = Check and remount remote mount points
# scw Schedule = * * * * *
# scw MaxRunTime = 120
# scw SuccessInterval = 180
# scw SilentConcurrency = true
#
# For any NFS or CIFS filesystems that are either already mounted or are
# supposed to be automatically mounted on boot, check that they are
# still mounted (in case of stale NFS handle and so on). Any which are
# not are unmounted and remounted. Exits with failure if any remain
# unmounted.
#
# This script won't cope with mount points with whitespace in their
# paths.
#
# If monitored through the item list file, this will raise an alert if a
# mount point remains broken after 3 minutes. An operator can then
# review this item's logs to see what failed and when.
#
timeoutCommand="timeout 10"
exitStatus=0
for mountPoint in $(awk '$3~/^(nfs|cifs)/ && !/^[[:space:]]*#/ {print $2}' /etc/fstab); do
    # Skip if it's not mounted on boot and isn't currently mounted.
    if awk -v m="${mountPoint}" '$2==m {print $4}' /etc/fstab | grep -Fq 'noauto'; then
        awk -v m="${mountPoint}" '$2==m {print}' /proc/mounts | grep -Eq . || continue
    fi
    printf "%s %s\n" "notice" "${mountPoint}: checking" >&3
    isMounted=false
    ${timeoutCommand} mountpoint -q "${mountPoint}" && isMounted=true
    if ! ${isMounted}; then
        # If not mounted - warn, unmount (lazily if necessary), and remount.
        printf "%s %s\n" "warning" "${mountPoint}: not mounted" >&3
        printf "%s %s\n" "notice" "${mountPoint}: attempting remount" >&3
        ${timeoutCommand} umount "${mountPoint}" \
            || ${timeoutCommand} umount -l "${mountPoint}"
        ${timeoutCommand} mount "${mountPoint}"
        ${timeoutCommand} mountpoint -q "${mountPoint}" && isMounted=true
    fi
    if ! ${isMounted}; then
        # If still not mounted - report the error.
        printf "%s %s\n" "error" "${mountPoint}: not mounted, remediation failed" >&3
        exitStatus=1
    else
        # Report that the mount is OK.
        printf "%s %s\n" "ok" "${mountPoint}: is mounted" >&3
    fi
done
exit ${exitStatus}
This script reports its status via file descriptor 3 (“>&3”), and the first word of each status message is either “notice”, “warning”, “error”, or “ok”.
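Incidentally, file descriptor 3 will not normally be open when such a script is run by hand outside scw, so these printf calls would fail. If you also want to be able to run the script manually, one possible convenience (not something scw itself requires) is to fall back to standard error near the top of the script:
# If descriptor 3 is not open, for example when the script is run by
# hand rather than through scw, send status messages to standard error.
if ! { true >&3; } 2>/dev/null; then
    exec 3>&2
fi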
Concurrency failures are silenced (SilentConcurrency = true) so that an alert is only raised if mount points are still failing after 3 minutes, or the script has been stuck waiting on commands for 3 minutes. Without this, an alert would be raised as soon as the script had been stuck for 1 minute; remote mounts can be slow to deal with, so that would risk false positives.
Managing scheduled jobs for multiple applications
The default configuration of scw allows applications, running under their own user IDs, to install their items under /etc/scw/items/USER/. It is up to each application to ensure that the log and metrics directories are writable by the application user.
When an application is deployed, it should run commands like this as root, such as via an RPM or Debian package post-installation script:
mkdir -p /var/spool/scw/USER
chown USER /var/spool/scw/USER
mkdir -p /var/log/scw/USER
chown USER /var/log/scw/USER
scw update
Invoking “scw update” ensures that the item crontabs, and the item list file, are updated. Updating the item list file means that the items can be picked up by the monitoring system automatically, as described earlier.
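For instance, for an application whose items run as “appuser”, the post-installation script of a Debian package might contain something like this sketch (the user name and paths are examples to adapt):
#!/bin/sh
# Example postinst steps: create the application's metrics and log
# directories, then let scw regenerate its crontab and item list.
set -e
if [ "$1" = "configure" ]; then
    mkdir -p /var/spool/scw/appuser /var/log/scw/appuser
    chown appuser /var/spool/scw/appuser /var/log/scw/appuser
    scw update
fi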
Custom configuration for each application’s user can be placed in /etc/scw/settings/USER.cf, such as additional OutputMap lines:
OutputMap = !E stamped application-owner@company.com
Since the item directory can contain shell or Perl scripts as well as configuration files, scheduled items can be deployed directly into the appropriate item directory, and include the settings within the comment block at the top of the file.
$ sudo cat /etc/scw/items/appuser/backup.sh
#!/bin/sh
# scw Description = Back up the application's files
# scw Schedule = 0 22 * * * # 10pm every day
# scw SuccessInterval = 36 hours # alert after a day and a half
printf "%s %s" "notice" "starting backup" >&3
/opt/app/sbin/trigger-app-backup
exitStatus=$?
if test $exitStatus -eq 0; then
printf "%s %s" "ok" "backup successful" >&3
else
printf "%s %s" "error" "backup failed - exit status $exitStatus" >&3
fi
exit $exitStatus
Refer to the manual for more details, including what happens if there is both a script and a configuration file for an item.
Note that scripts within an item directory must be executable.
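For example:
sudo chmod +x /etc/scw/items/appuser/backup.sh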
An example logrotate configuration file for this setup is as follows:
$ sudo cat /etc/logrotate.d/scw
/var/log/scw/*/*.log {
    weekly
    compress
    delaycompress
    missingok
    notifempty
    nocreate
}
When scw is managing crontabs, or the monitoring system is being driven by the item list file, it is essential that “scw update” is run after any changes. Running it every 15 minutes is the simplest way to ensure that.
$ sudo cat /etc/cron.d/scw-update
*/15 * * * * root scw update
Running the update via a crontab managed by scw itself is not recommended, since a mistake that removes its item file will stop all further automatic updates without notification.