TechSite is often involved in high-profile data center enhancement projects. These projects may include creating a dual bus in a data center that has only a single power path, changing out switchgear, push-pulling UPS systems, or other major upgrades. They are often done with the mission-critical equipment online. We often feel these projects are similar to changing the engines on a 747 at 30,000 ft.
My experience is that high-profile projects, which have the attention of everyone from the CIO on down, are usually well planned. Schedules, Methods of Procedure, responsibilities, and back-out plans are all discussed and practiced in advance. The actual implementation is normally smooth and has a positive outcome.
But what about the day-to-day activities that make up data center operations, as opposed to these high-profile projects? Are we doing the fundamentals right? Here are some examples of what I mean by fundamentals:
Documentation: Know your facility. Do we have panel board schedules for all panels in the data center? How else do we ensure that we have maintained redundancy or avoided mistakes when reconfiguring? Do we have a schematic drawing of the Emergency Power Off system? How else do we safeguard the EPO or recover if it is ever used? Do we have current as-built drawings for everything? As-built drawings are easy to produce during construction and very difficult to produce later, after everything is covered up. It is best to have them updated during the construction process and to make sure we get copies from the installing contractors or engineers.
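To make the panel schedule point concrete, here is a minimal sketch of the kind of check an up-to-date schedule allows. It assumes the schedule has been exported as rows of (panel, breaker, bus, device); that row layout, the bus labels "A" and "B", and the function name are all hypothetical, not a standard format.

```python
from collections import defaultdict

def single_bus_devices(schedule):
    """Return devices that are not fed from at least two different buses.

    `schedule` is a list of (panel, breaker, bus, device) rows.
    This row layout is an assumption made for illustration.
    """
    buses_by_device = defaultdict(set)
    for panel, breaker, bus, device in schedule:
        buses_by_device[device].add(bus)
    # A device seen on only one bus has lost (or never had) redundancy.
    return sorted(d for d, buses in buses_by_device.items() if len(buses) < 2)

# Hypothetical schedule rows
schedule = [
    ("PDU-1A", 1, "A", "rack-01"),
    ("PDU-1B", 1, "B", "rack-01"),
    ("PDU-1A", 3, "A", "rack-02"),
    ("PDU-1A", 5, "A", "rack-02"),  # both cords landed on the A side
]
print(single_bus_devices(schedule))
```

A check like this is only as good as the schedule behind it, which is the point: if the schedule is stale, the mistake stays hidden until one side goes down.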
Maintenance: Keep your facility in good repair. Do we have complete records of every maintenance procedure performed on our critical infrastructure? Without them, how do we spot trends and know when the next time-driven recurring procedure is due? Do we have concurrent maintainability built into our design? If routine maintenance is being avoided because something might go wrong, think of it as never changing the oil in your car. Maintenance avoidance is a serious problem.
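As a rough illustration of what complete records buy us, the sketch below computes the next due date for each time-driven procedure and lists the ones that are overdue. The record layout (name, last completed date, interval in days), the sample procedures, and their intervals are assumptions for the example, not recommended schedules.

```python
from datetime import date, timedelta

def next_due(last_done, interval_days):
    """Date on which a recurring procedure is next due."""
    return last_done + timedelta(days=interval_days)

def overdue(records, today):
    """Names of procedures whose next due date is before `today`.

    `records` is a list of (name, last_done, interval_days) tuples.
    This layout is hypothetical.
    """
    return [name for name, last_done, interval_days in records
            if next_due(last_done, interval_days) < today]

# Hypothetical maintenance log entries and intervals
records = [
    ("UPS battery inspection", date(2024, 1, 15), 90),
    ("Generator load bank test", date(2024, 3, 1), 365),
]
print(overdue(records, today=date(2024, 6, 1)))
```

The same records, kept over several years, are also what make trend spotting possible, for example battery readings that drift a little further at each inspection.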
Testing: Exercise your facility’s integrated systems during regular, planned testing events. How often do we shut down one side of our dual bus? For many people the answer is never. How do we know that everything is plugged in correctly and that power supplies are working redundantly as advertised? Do we know whether our rack PDUs can handle the full load of the devices, and whether we have reserved fail-over capacity on the breakers, PDUs, and UPSs? Do we transfer the actual data center load onto generator power via a simulated outage four times a year? How do we know all the relays and generator batteries are ready? Without testing these fundamentals, we do not know what can or will go wrong in a real emergency.
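The fail-over capacity question can be checked on paper before it is tested for real. Here is a minimal sketch: if one side of a dual-fed circuit is lost, the surviving side has to carry both loads. The 80% continuous load factor is an assumption (a common derating practice for breakers; use your own site standard), and the class, field names, and sample readings are hypothetical.

```python
from dataclasses import dataclass

# Assumed derating: keep continuous load at or below 80% of the breaker rating.
CONTINUOUS_LOAD_FACTOR = 0.8

@dataclass
class DualFedCircuit:
    name: str
    breaker_amps: float  # rating of each of the two breakers (A and B side)
    load_a_amps: float   # measured load on the A side
    load_b_amps: float   # measured load on the B side

def failover_headroom(circuit):
    """Amps to spare on one side if it had to carry both sides' load."""
    usable = circuit.breaker_amps * CONTINUOUS_LOAD_FACTOR
    return usable - (circuit.load_a_amps + circuit.load_b_amps)

def circuits_at_risk(circuits):
    """Names of circuits that would exceed usable capacity after a fail-over."""
    return [c.name for c in circuits if failover_headroom(c) < 0]

# Hypothetical readings
circuits = [
    DualFedCircuit("rack-01", breaker_amps=30, load_a_amps=10, load_b_amps=9),
    DualFedCircuit("rack-02", breaker_amps=30, load_a_amps=14, load_b_amps=13),
]
print(circuits_at_risk(circuits))
```

In this example each side of rack-02 looks comfortable on its own at under half the breaker rating, yet the combined load would overrun the usable capacity of the surviving side. A calculation like this does not replace actually shutting down one side of the bus; it only tells us what to expect when we do.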
These are just a few examples of the fundamentals of running a mission-critical data center. As a community, we tend to do a good job on the high-profile, high-visibility events. Let’s make sure we do the fundamentals right as well, because they are just as important, even if the CIO is not looking.