On-Call Playbook for role::oit_linux::tftpserver hosts
In a network boot, the DHCP servers (operated by Comtech) identify which server(s) client computers should download their boot files from.
The tftp servers (this role) deliver a kernel and RAMdisk used to install the operating system.
When the tftp server is unavailable, clients attempting to boot off the network will be unable to download their installers, and will eventually time out.
This role should be available 24x7x365, but the consequences of short unexpected downtime are trivial (delays in installing new machines).
You should post an outage notification if:
- the total downtime was more than 5 minutes during the production day, or
- the total downtime was more than 15 minutes during off-hours.
These hosts are on VLAN 30, and should be accessible via ssh on port 22 from anywhere on or off campus, without requiring VPN.
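A quick way to confirm the ssh reachability described above is a plain TCP connect to port 22. The following is a minimal sketch, not the check sysnews actually runs; the function name `port_is_open` and the default timeout are assumptions made for illustration.

```python
import socket


def port_is_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Call it with the host's real name, for example `port_is_open("<tftpserver host>")`. A True result only shows that something is listening on port 22, not that logins work.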
Operating System checks
OS checks test general operating system health.
foreman smart proxy process
|20 minutes||Never||No proxy process|
This checks whether the Foreman smart proxy is currently running on the host. It’s a pretty dumb check right now.
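For illustration, a process check of this kind can be as simple as scanning process command lines for a known string. This is a sketch under assumptions, not the actual sysnews check: the search string `smart-proxy` is a guess at how the proxy appears in the process table, and `process_running` is a hypothetical helper.

```python
import os


def process_running(needle: str, proc_root: str = "/proc") -> bool:
    """Return True if any process command line under proc_root contains needle."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                # Arguments are NUL-separated; join them with spaces.
                cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        if needle in cmdline:
            return True
    return False
```

On the host, `process_running("smart-proxy")` would return False when no matching process exists (assuming that string appears in the proxy's command line), which corresponds to the "No proxy process" alert. Like the real check, it says nothing about whether the proxy is answering requests.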
|20 minutes||Never||tftp connection failed.|
This checks whether the sysnews server can contact the tftp server at all, by requesting a nonexistent file and checking for the negative answer from the server.
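The probe described above can be reproduced by hand when the alert fires. The sketch below follows the TFTP packet layout from RFC 1350: it sends a read request (opcode 1) for a bogus file and treats any ERROR reply (opcode 5) as proof that the server is answering. It is not the sysnews implementation; the function names, the bogus filename, and the timeout are assumptions.

```python
import socket
import struct

OP_RRQ = 1    # read request
OP_ERROR = 5  # error reply


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def parse_error(packet: bytes):
    """Return (code, message) if packet is a TFTP ERROR, else None."""
    if len(packet) < 4:
        return None
    opcode, code = struct.unpack("!HH", packet[:4])
    if opcode != OP_ERROR:
        return None
    message = packet[4:].split(b"\0", 1)[0].decode("ascii", errors="replace")
    return code, message


def tftp_answers(host: str, port: int = 69,
                 filename: str = "no-such-file.bogus",
                 timeout: float = 5.0) -> bool:
    """Return True if the server sends a TFTP ERROR reply for a bogus file."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_rrq(filename), (host, port))
            # The reply comes from an ephemeral port chosen by the server.
            reply, _addr = sock.recvfrom(516)
        except OSError:
            return False  # timeout, unreachable host, or resolution failure
    return parse_error(reply) is not None
```

A False result from `tftp_answers("<tftpserver host>")` matches the "connection failed" condition. Because TFTP runs over UDP, a False result can also mean that a firewall between the caller and the server is dropping the traffic, so run the probe from a host that normally has access.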
Always deal with underlying OS issues first.
First responders should ssh to the host, and reboot with
When the host comes back up, [“Force an immediate status check”] on sysnews, then [log in to the Foreman server (build.oit.ncsu.edu)] and confirm that the “TFTP on …” entry under the Infrastructure -> Smart Proxies main menu is not showing errors.
If the problem is unresolved
Troubleshoot and fix the problem. If you need to,
puppet agent --disable and hand fix the bugger. No whining.
Please fill out as many technical details as possible and appropriate in this Sysnews Boilerplate post for tftpservers