As a pilot, I recently passed the 35th anniversary of my first solo flight. With my dad as an instructor, I was flying at a very young age, even before I started my career in mission-critical facility infrastructure. That anniversary got me thinking about the similarities between being a pilot and running a data center.
TechSite uses a 1978 Cessna 310 to travel to our more distant customers’ job sites. When it comes to redundancy, the 310’s design is a lot like that of a high-availability data center. It has redundant engines, redundant instruments, and redundant vacuum systems. The 310 even has redundant electrical systems: both alternators normally share the load, and the load shifts to the surviving unit should one fail. Sounds a lot like a dual bus, doesn’t it?
Every pilot is required to maintain minimum proficiency standards. Pilots must make a minimum of three landings every 90 days, three landings at night every 90 days, and six instrument approaches every six months. My insurance company requires me to have an instrument proficiency check every year, during which I go up with an instructor, put a hood over my eyes, and fly without looking out the windows.
A pilot must also go up with an instructor once every two years for a Biennial Flight Review (BFR). During the BFR, the instructor makes you practice “what if” scenarios. Examples include: What would you do if one of your engines quit on takeoff? What would you do if your radios failed? What would you do if your electrical system failed? What would you do in the event of a fire? The airplane manufacturers create emergency checklists with immediate action steps that you need to memorize, more in-depth checklists that you keep beside you so that you can refer to them quickly, and complete operating handbooks to read before any emergency occurs.
When was the last time that you practiced your data center proficiency? What would you do if someone accidentally pushed the EPO? What would you do if you had a fire in the data center? What would you do if your chiller failed? Could you run your data center if your monitoring system failed? Do you know where your checklists, as-builts, schematics, and operating manuals are located?
Like flying a plane, running a data center requires vigilance and practice to remain proficient. Occasionally you may also need to bring in an “instructor” and run through the drills. The middle of a data center emergency is not the time to figure out that your plane is stalling and you don’t know what to do next.