On-Call Playbook for role::oit_linux::foreman hosts

This role manages the Foreman server, also known as build.oit.ncsu.edu.

Ecosystem

The Foreman server controls several other servers through agents known as “smart proxies”. These servers include the TFTP servers used for network booting, as well as the Puppet master and the Puppet CA.
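When a smart proxy misbehaves, it can help to confirm which features it actually reports. The sketch below assumes the smart-proxy defaults: HTTPS on port 8443, with `GET /features` returning a JSON list of enabled feature names such as `puppet`, `puppetca` or `tftp`. The port, the endpoint shape and the helper names are assumptions to verify against the installed smart-proxy version. In practice the proxy's certificate is usually issued by the Puppet CA, and the proxy may demand a client certificate, so a real check may need an `ssl.SSLContext`. This sketch leaves that out.

```python
import json
import urllib.request
from typing import Callable, Iterable

# Assumption: smart-proxy default HTTPS port; adjust for local configuration.
DEFAULT_PORT = 8443


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode its JSON body (no client certificate is sent)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def missing_features(proxy_host: str, expected: Iterable[str],
                     fetch: Callable[[str], object] = fetch_json) -> set:
    """Return the expected features that the smart proxy does not report.

    Assumes GET /features returns a JSON list of feature names.
    """
    url = f"https://{proxy_host}:{DEFAULT_PORT}/features"
    reported = {str(name).lower() for name in fetch(url)}
    return {name.lower() for name in expected} - reported
```

For example, `missing_features("puppet.example.edu", ["puppet", "puppetca"])` would return an empty set for a healthy Puppet master proxy. The hostname here is hypothetical. The `fetch` parameter is injectable so that the logic can be exercised without network access.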

Foreman is critical because it serves as the “node classifier” that tells the Puppet master which manifests to apply to each client. While Foreman is down, Puppet does not work properly for any client.
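Because every Puppet run depends on Foreman answering as the node classifier, a quick first check during an incident is whether Foreman responds at all. The sketch below assumes the Foreman API v2 status endpoint at `/api/status`, which on recent versions returns JSON containing `"result": "ok"`. Treat the endpoint, the response shape and the helper names as assumptions to confirm against the installed Foreman version. Depending on Foreman's settings the endpoint may also require API credentials, and this sketch sends none.

```python
import json
import urllib.request
from typing import Callable

# Host taken from this playbook; the /api/status path is an assumption.
FOREMAN_STATUS_URL = "https://build.oit.ncsu.edu/api/status"


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def foreman_is_up(fetch: Callable[[str], object] = fetch_json,
                  url: str = FOREMAN_STATUS_URL) -> bool:
    """Return True if Foreman answers its status endpoint with result 'ok'."""
    try:
        body = fetch(url)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, TLS and HTTP errors
        # (urllib's URLError/HTTPError); ValueError covers a non-JSON body.
        return False
    return isinstance(body, dict) and body.get("result") == "ok"
```

A `False` result means clients are likely failing their Puppet runs as well, since they cannot be classified while Foreman is unreachable.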

Service Windows

This role should be available 24x7x365. As mentioned above, even a short unexpected outage has consequences that are visible to customers.

You should post an outage notification:

  • if the total downtime was more than 15 minutes during the production day, or
  • if the total downtime was more than 30 minutes during off-hours.
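The notification rule above can be expressed as a small check. This is a sketch under two assumptions that the playbook does not spell out: the caller has already split the outage into production-day and off-hours minutes, and each total is compared against its own threshold, as the rule is written. "More than" is read as strictly greater, so exactly 15 or exactly 30 minutes does not trigger a notice. The function and constant names are hypothetical.

```python
from datetime import timedelta

# Thresholds from the outage-notification rule in this playbook.
PRODUCTION_DAY_LIMIT = timedelta(minutes=15)
OFF_HOURS_LIMIT = timedelta(minutes=30)


def needs_outage_notice(production_downtime: timedelta,
                        off_hours_downtime: timedelta) -> bool:
    """Return True if either downtime total exceeds its threshold.

    Assumption: the caller decides which minutes fall in the production day.
    """
    return (production_downtime > PRODUCTION_DAY_LIMIT
            or off_hours_downtime > OFF_HOURS_LIMIT)
```

For instance, 20 minutes of downtime during the production day calls for a notice, while 25 minutes off-hours does not.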

Firewall restrictions

Tests

First Actions

If the problem is unresolved

Posting boilerplate

Tags: oncall