Watch the BOSH Resurrector resurrect BOSH jobs
What?
The Resurrector is a plugin to the BOSH Health Monitor that is responsible for automatically recreating VMs that become inaccessible. It continuously cross-references the VMs expected to be running against the VMs that are sending heartbeats. When the Resurrector receives no heartbeats from a VM for a certain period of time, it kicks off a task on the Director to try to “resurrect” that VM. The Director may do one of two things:
- create a new VM if the old VM is missing
- replace a VM if the Agent on that VM is not responding to commands
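To make the cross-referencing idea concrete, here is a minimal shell sketch. It is not how the Health Monitor is implemented; it only illustrates "expected VMs minus heartbeating VMs". The function name missing_vms and the two input files are hypothetical, made up for this illustration.

```shell
# Hypothetical helper (not part of BOSH): print the VMs that are expected
# to be running but have not sent a heartbeat.
#   $1: file listing expected VM names, one per line
#   $2: file listing VM names with a recent heartbeat, one per line
missing_vms() {
    expected_sorted=$(mktemp)
    heartbeats_sorted=$(mktemp)
    # comm needs sorted input, so sort (and de-duplicate) both lists first.
    sort -u "$1" > "$expected_sorted"
    sort -u "$2" > "$heartbeats_sorted"
    # comm -23 prints only the lines unique to the first file.
    comm -23 "$expected_sorted" "$heartbeats_sorted"
    rm -f "$expected_sorted" "$heartbeats_sorted"
}
```

If the expected list held cell/0, cell/1 and router/0, and only cell/0 and router/0 had sent heartbeats, missing_vms would print cell/1, which is the VM a resurrector would then try to bring back.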
How?
- Run watch bosh vms so you can keep an eye on the effect you’re having on VM state.
- Open a second terminal buffer and bosh ssh into one of the Diego cells.
Killing off a BOSH agent is a little harder than it looks. This is a great thing for CF operators, but less of a good thing when creating exercises to learn about the system. For instance, try killing off an agent process:
- Run ps aux | grep bosh-agent to find the BOSH agent.
- Kill it mercilessly: kill -9 <process id>
- Run ps aux again and grep for the agent again. Discover that, phoenix-like, there is already a new agent process with a new process id. The VM’s listed state won’t have even flickered. Don’t quote me here, but I’m pretty sure upstart is responsible for this sorcery.
Looks like we’ll have to get creative if we’re ever going to see this resurrector at work.
- While still SSHed into a VM, notice the path to agent.json in the output of ps aux and throw some un-parseable junk in there.
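One cautious way to do the junk-throwing is sketched below. The helper names corrupt_config and restore_config are hypothetical, and the file path is deliberately left as an argument: pass whatever agent.json path you saw in the ps aux output. On the VM you will most likely need root to write to that file, and, assuming the agent only reads its config at startup, you may need to kill the agent once more so the restarted process picks up the broken file.

```shell
# Hypothetical helper: keep a backup of a config file, then append junk
# so it no longer parses as JSON.
#   $1: path to the file (for this exercise, the agent.json path from ps aux)
corrupt_config() {
    cp -p "$1" "$1.bak" || return 1
    printf '%s\n' '}}} not json {{{' >> "$1"
}

# Hypothetical helper: put the backup back if you change your mind.
#   $1: the same path that was passed to corrupt_config
restore_config() {
    mv "$1.bak" "$1"
}
```

Keeping the .bak copy is only a convenience for backing out early; once the Resurrector recreates the VM, the backup disappears along with the rest of the old VM.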
Expected Result
Watch the process choke, the VM fail, and the resurrector bring it back! If it doesn’t come back within a few minutes, check to see if resurrection is turned on.