Situation: A glitch is causing unexpected system reboots. After much testing, you identify the problem. A firmware patch should prevent it from recurring. Luckily, you've already got the tools that will let you remotely "flash", or update, your firmware.
Complication: If your system glitches while you're remotely updating firmware, you won't be able to connect to it remotely anymore. Oh...and your system is on another planet.
That's the firmware problem facing a NASA team right now. The Mars Reconnaissance Orbiter has seen unexpected reboots, and engineers believe they've got a patch that could fix it. However, they're worried that a mistake or unexpected reboot during the patch process might leave the satellite so confused it will stop transmitting its data.
ProLiant engineers have actually grappled with this very same problem, though a little closer to home.
Before I explain that, an aside: there's a cool connection between HP engineering and Mars spacecraft. Lossless compression technology developed by HP Labs and used in HP's RGS software for workstations was used by NASA for transferring images from the Spirit Rover on Mars.
Here are two ways that ProLiant blades -- including the RGS-using ProLiant WS460c G6 workstation blade -- protect you from this "botched update" scenario:
1. Redundant ROMs - There are two ROM images stored on each blade. One is a "primary" image, used to boot. The other is a "backup" image. Here's a screenshot from RBSU showing the version numbers (dates, actually) of the primary and backup images on one blade.
When you flash a ROM, the new image actually overwrites the backup image, and then becomes the new primary. The original primary becomes the new backup. This hedges against two things: the new image being bad, and the flash process failing to complete or corrupting the image. (One reason a flash might fail: total loss of power during a flash.)
By the way, if both ROM images are valid, you can select which one you want to use at boot time from RBSU. Here's a short video showing that:
There's also a manual way, described in the Maintenance and Service Guide, to force a boot from the redundant image by setting some physical DIP switches inside the blade itself.
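To make the swap concrete, here's a rough Python model of the redundant-ROM flash described above. It's a sketch, not HP's firmware: the `RomImage` and `RedundantRom` names, the SHA-256 validity check, and the version dates are all assumptions I made up for illustration. The only behavior taken from the description is that a flash writes to the backup slot and the two slots swap roles once the write succeeds.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RomImage:
    version: str
    data: bytes
    checksum: str  # assumption: validity is judged by a stored digest

    def is_valid(self) -> bool:
        return hashlib.sha256(self.data).hexdigest() == self.checksum


def make_image(version: str, data: bytes) -> RomImage:
    """Build an image whose stored checksum matches its data."""
    return RomImage(version, data, hashlib.sha256(data).hexdigest())


class RedundantRom:
    """Hypothetical two-slot ROM: one primary image, one backup image."""

    def __init__(self, primary: RomImage, backup: RomImage):
        self.primary = primary
        self.backup = backup

    def flash(self, new_image: RomImage, fail_midway: bool = False) -> bool:
        if fail_midway:
            # Simulate power loss: only half the data lands in the backup slot,
            # so the stored checksum no longer matches.
            half = new_image.data[: len(new_image.data) // 2]
            self.backup = RomImage(new_image.version, half, new_image.checksum)
            return False
        # The write always targets the backup slot, never the live primary.
        self.backup = new_image
        if not self.backup.is_valid():
            return False
        # Only after a good write do the two slots swap roles.
        self.primary, self.backup = self.backup, self.primary
        return True

    def boot_image(self) -> RomImage:
        if self.primary.is_valid():
            return self.primary
        if self.backup.is_valid():
            return self.backup
        raise RuntimeError("both ROM images are corrupt")


rom = RedundantRom(
    make_image("2009.07.10", b"old primary"),   # hypothetical versions
    make_image("2009.05.01", b"old backup"),
)
interrupted = rom.flash(make_image("2009.09.28", b"new image"), fail_midway=True)
version_after_failure = rom.boot_image().version
completed = rom.flash(make_image("2009.09.28", b"new image"))
```

In this model the interrupted flash only damages the backup slot, so the blade still boots the old primary. After the second, successful flash, the new image is primary and the old primary has become the backup.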
2. Boot block - There's actually a third, non-flashable section of a ProLiant ROM. This "boot block" section includes a disaster-recovery feature that lets the server flash a new ROM image, even if both of the existing ROM images are corrupted.
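Putting the two protections together, the decision at power-on might look something like the sketch below. This is my own simplification: the `BootPath` enum, the `choose_boot_path` function, and the "try primary, then backup" ordering are assumptions, not documented ProLiant behavior. The one thing it takes from the description above is that disaster recovery remains available even when both flashable images are corrupt.

```python
from enum import Enum


class BootPath(Enum):
    PRIMARY = "primary"
    BACKUP = "backup"
    DISASTER_RECOVERY = "disaster recovery"


def choose_boot_path(primary_valid: bool, backup_valid: bool) -> BootPath:
    """Pick a boot path; the ordering here is an assumed, simplified policy."""
    if primary_valid:
        return BootPath.PRIMARY
    if backup_valid:
        return BootPath.BACKUP
    # Neither flashable image is usable, so fall back to the
    # non-flashable boot block, which can flash a fresh image.
    return BootPath.DISASTER_RECOVERY


worst_case = choose_boot_path(primary_valid=False, backup_valid=False)
```

Because the boot block can't be flashed, a botched update can't damage it, which is why it works as the last line of defense.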
BIOS & firmware updates are often used to fix glitches, but HP (and presumably NASA) also uses them to add new features or enhancements. We post release notes that describe all the fixes and enhancements added to each version. Here's a recent one added to the BL460c G6.
For example, one enhancement in this latest version is a "boot override menu" (see screenshot below), displayed by hitting F11 during boot. It lets you specify a "one time" override of the RBSU boot order, so you can boot to some other device. After booting that one time, the system will fall back to its original boot order settings.
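The "one time" semantics are easy to picture in code. Here's a small Python sketch of the idea; the `BootConfig` class and the device names are hypothetical and have nothing to do with how RBSU actually stores its settings.

```python
class BootConfig:
    """Hypothetical model of a persistent boot order plus a one-time override."""

    def __init__(self, boot_order):
        self.boot_order = list(boot_order)
        self._one_time = None

    def set_one_time_override(self, device: str) -> None:
        # Analogous to picking a device from the F11 menu.
        self._one_time = device

    def next_boot_device(self) -> str:
        if self._one_time is not None:
            # Use the override once, then clear it.
            device, self._one_time = self._one_time, None
            return device
        return self.boot_order[0]


config = BootConfig(["hard drive", "cd-rom", "network"])
config.set_one_time_override("network")
first_boot = config.next_boot_device()
second_boot = config.next_boot_device()
```

Here the first boot goes to the override device, and the second boot falls back to the head of the saved boot order, which is never modified.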