No disaster planning at all - This is the situation at many companies: disaster has not even been considered and no planning has been done. When a disaster occurs, people become frantic and recovery is difficult if not impossible. Companies without a good disaster recovery plan are living on borrowed time. Sooner or later a disaster will strike: it could be a flood, a collapsed ceiling, an infestation of insects in the wiring, a major earthquake or a terrorist attack.
IT departments without any kind of disaster recovery need to get new management. Disasters can occur at any time and rarely give notice, and the lack of a plan can mean the complete destruction of a company. If you work for a company whose IT department does not plan for disasters, it may be wise to start looking around for one that does.
No disaster plan, but good backup procedures - If you cannot get anyone in your company to buy into disaster planning, then the bare minimum is to back up the data on your computers regularly (at least once a day) and store the backups offsite (use an archival company - never store them at employees' homes). If your IT department is not making good backups of at least the critical systems every single day, then it is simply not doing its job. One important thing to remember about backups: they must be tested regularly. Nothing is more frustrating than needing a backup and finding that the data is corrupt or non-existent.
IT departments that do not perform this simple step should be fired en masse, as they are placing their entire companies at risk.
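One way to test a backup is to restore it to a scratch location and compare it, file by file, against the live data. The following is only a minimal sketch of that idea, not a complete verification tool: the function names `sha256_of` and `verify_restore` and the directory layout are assumptions made for illustration, and a real test would also cover databases, permissions and files that changed after the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Compare every file under source_dir with its restored copy.

    Returns a list of problems (missing or differing files); an empty
    list means every source file was restored with identical contents.
    """
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(restored_file):
            problems.append(f"corrupt: {relative.as_posix()}")
    return problems
```

Calling `verify_restore(Path("/data"), Path("/restore-test"))` (both paths hypothetical) after a trial restore gives a list that should be empty; anything else is a backup you would not want to discover during a real disaster.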
Fault tolerance - The next step up in disaster recovery is to build fault tolerance into all of your critical systems. This means installing RAID arrays (groups of disk drives that store data redundantly, through mirroring or parity, so that a single drive failure does not lose data), clustered systems and other types of local recovery measures.
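To see why a redundant drive array can survive the loss of one drive, it helps to look at parity, the technique used by RAID levels such as RAID 5 (mirroring, as in RAID 1, simply keeps a full second copy). The sketch below is a toy illustration only: the block contents are made up, and a real array does this in the controller or operating system across striped blocks.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "drives" holding equal-sized blocks (made-up contents).
drives = [b"ORDERS01", b"STOCK002", b"PAYROLL3"]

# The parity block is the XOR of all data blocks, stored on a fourth drive.
parity = xor_blocks(drives)

# Simulate losing the second drive, then rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [drives[0], drives[2]]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == drives[1])  # prints True
```

Because XOR-ing a value with itself cancels out, combining the parity block with the surviving blocks leaves exactly the missing block. This protects against a failed drive, not against a destroyed building, which is why fault tolerance is only one step on the list.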
A plan for disaster without any resources in place - Once you have a good backup and archival procedure and your critical systems are fault tolerant, the next step is to put together procedures for remote disaster recovery. This simply means you ask and answer the question, "what do we do if the computer center is utterly destroyed?" You might, for example, make arrangements with another division or company to share equipment and space if either is struck by disaster. Agreements need to be made with critical computer vendors to quickly ship new systems in the event of an emergency. This kind of planning is a good first step, although recovery would be slow in the event of disaster.
Cold Site - This is a site (often managed by a third party and shared among multiple clients) which is stocked with equipment and ready to go. However, the machines are not operational, data is not copied on a live basis and time (generally more than 24 hours) is required to bring the site up live. This is a popular disaster recovery method because it tends to be less expensive than other options, yet still gives a company the ability to survive a true disaster.
If you outsource your disaster recovery to a third party, then odds are they will establish this form of disaster recovery. This will work as long as your planning is good, your backups are sound and your documentation is excellent. Of course, extended downtime in the event of a disaster must be acceptable for a cold site to be a valid option. Plan on twenty-four hours for critical systems and as long as a week for less important functions.
A split site - Some companies are large enough that the IT department could be staffed at more than one location. In the event of a disaster to one site, operations would simply shift to the other. Any needed equipment could be purchased as necessary in the event of a disaster. The advantage to this method is it eliminates the need for the major up-front costs of building a disaster center.
A Warm Site - If your company has the resources and good sense to understand that IT is vital to its survival, then you should be able to at least create a warm site. This is a site which is pre-positioned with equipment, software and other necessities, all ready to go in the event of a disaster. The equipment is idle, often turned off, but can be quickly restored and brought online if needed. Data is quickly available and can be restored without much difficulty.
Companies that go to this level of disaster preparedness are rare; a high level of competence and forward thinking is required to plan, build and maintain it.
A Hot Site - In this scenario, a duplicate computer center is set up in a remote location (at least a few miles from the primary computer facility), with communications lines set up and actively copying data at all times. The site has a duplicate of every critical server (at least), with data that is up-to-date to within hours, minutes or even seconds. It also (in the best case) has desks, phones and whatever else is necessary for operations to continue if the worst happens.
This is the ultimate in disaster preparation, reserved for companies with excellent management and highly skilled IT staff. Hot sites are expensive, difficult to set up and require constant maintenance, but in the event of a disaster operations can continue with a minimum of downtime.
The ultimate - At my company, I am in charge of disaster recovery. Senior management is very intelligent and understands the criticality of IT and the necessity for the systems to be up quickly in the event of a disaster. With this in mind, we were able to put together a very well thought out disaster plan. We have a hot site which is connected to our main computer facility by a T3 line. All of our data is copied to the hot site computers over the T3 in real time or at staggered intervals (generally every 15 minutes). In addition, all of the communications to our stores have backup capability to the hot site as well.
Our premise was simple: we planned that the disaster center would be able to take over immediately for critical functions if the main computer facility were totally destroyed. Non-critical functions such as accounting would be restored within 24 to 48 hours.
And we go one step further. To be absolutely certain that our plans work, we actually run our operations out of the disaster site at least once per year.
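Copying data at staggered intervals only helps if the copies actually keep arriving, so it is worth checking how old the newest copy of each system is at the hot site. The sketch below shows one way such a check might look; it is not a description of our actual tooling. The function name `stale_systems`, the system names and the timestamps are all hypothetical, and the 15-minute limit simply mirrors the copy interval mentioned above.

```python
from datetime import datetime, timedelta


def stale_systems(last_copied: dict[str, datetime],
                  now: datetime,
                  max_lag: timedelta = timedelta(minutes=15)) -> list[str]:
    """Return the systems whose newest hot-site copy is older than max_lag."""
    return sorted(name for name, copied_at in last_copied.items()
                  if now - copied_at > max_lag)


# Hypothetical timestamps of the last successful copy to the hot site.
now = datetime(2024, 1, 1, 12, 0)
last_copied = {
    "point-of-sale": datetime(2024, 1, 1, 11, 58),  # 2 minutes behind
    "inventory": datetime(2024, 1, 1, 11, 50),      # 10 minutes behind
    "accounting": datetime(2024, 1, 1, 11, 20),     # 40 minutes behind
}

print(stale_systems(last_copied, now))  # prints ['accounting']
```

The lag reported here is also the amount of data that would be lost if the main facility were destroyed at that moment, so the acceptable limit can be set per system: tight for critical functions, looser for ones that can wait 24 to 48 hours.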
Conclusions - Our operating basis is simple: we still want to have a job and get paid if our computer center is destroyed. It's very simple, really. Our management is top-notch, so there was no argument about what was needed. That's the way it should be, as one thing you can count on in life is: there will be a disaster at one time or another. Thus, you had better plan on it.
Unless otherwise noted, all photos and text are Copyright © Richard G Lowe, Jr.