Posted By: David Faust
It’s fall and the weather is again changing—are you prepared to weather the storm? In this issue, we consider preparedness when disaster strikes. You will learn how c‑treeACE database technology provides features and tools enhancing your survival strategy.
Face it… stuff happens. Consider recent history: 9/11, Hurricane Sandy, Typhoon Haiyan, Napa Valley earthquakes, and now attention on Ebola. We’re already busy enough with the continual barrage of cyber attacks networks face every second of every day. Here is a short list of disasters waiting to strike your organization at the least opportune time:
- Natural Disasters – earthquakes, hurricanes, storms, blizzards, wildfires
- Utility outages and disruptions
- Cyber attacks
- Random component failure
Even general economic calamities and corporate reorganizations can quickly and seriously disrupt any business. With sudden and unexpected staff changes, is your “human knowledgebase” retained?
Business must survive and many organizations require zero downtime environments. Hospitals, financial institutions, and educational facilities must continue operations. Many organizations depend on remote facilities for daily operations. What if your remote support center is disrupted?
Will your organization survive? Be prepared!
Business Continuity Plans
A Business Continuity Plan should be a component of larger risk management assessments for your entire organization and its associated vendors and partners. Alignment with the ISO 31000:2009 risk management standard may be your ultimate goal; note that ISO 31000 provides guidelines and is not intended for certification. However, even businesses not pursuing a formal standard must think about survival “the day after.”
A complete business continuity plan concerns much more than data backups. Who will access those backups? Who does your team turn to when communications are down? Are procedures in place addressing responsibilities during a crisis? More than simply data backups, successful continuity plans address both people and technology.
Good communication is required of any successful organization. A continuity plan must assume all communications are disrupted. Documented procedures must be available to every party, outlining clear responsibilities and fallback procedures should the chain of command be broken. Also consider legal entities. How will customer contracts continue to be accessed, reviewed, and updated? Do you have contractual SLAs that need to be addressed during a service outage, or (depending on the nature of the outage) do customers need to be notified that an “Act of God” provision is being invoked? How will your vendors interact with your systems?
A good plan ensures everyone understands their role and duties while in crisis mode. Successful business recovery requires quick and accurate determination of organizational roles and leadership, and each category must be clearly defined and documented.
Will your people know where to find this documentation in a crisis?
As your business grows and people and roles change within your organization, your continuity plan must also evolve. Consistent assessment of your plans is required to achieve business recovery goals, and this requires advance planning and execution. Even the best-laid plans frequently fail when put to the test. Crisis drills play an important role in identifying weaknesses in even the most detailed plans. When was the last time you practiced a simple fire drill? Now consider that your physical location is completely “dark.” Can you service your customers?
Many data backup options are available depending on latent storage needs. You must also consider internal and external data retention policies as part of an overall backup strategy. Do you need all data immediately available? Or can you continue with a minimal subset? How long does it take to recover data to a usable format? What other risks must be addressed with your particular data needs? Don’t forget an aging strategy to archive and then delete from active systems any data that no longer needs immediate accessibility.
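An aging strategy like the one described above can be sketched as a simple classification by age. The thresholds and the name `classify_by_age` are illustrative assumptions, not part of any c‑treeACE tool; real values come from your own retention policies.

```python
from datetime import date, timedelta

# Hypothetical policy thresholds; substitute your own retention rules.
ARCHIVE_AFTER = timedelta(days=365)      # move out of active systems
DELETE_AFTER = timedelta(days=365 * 7)   # remove once retention has expired


def classify_by_age(last_needed: date, today: date) -> str:
    """Return 'active', 'archive', or 'delete' for a record of the given age."""
    age = today - last_needed
    if age >= DELETE_AFTER:
        return "delete"
    if age >= ARCHIVE_AFTER:
        return "archive"
    return "active"
```

Running a rule like this on a schedule keeps active systems small, which in turn shortens both backup and restore times.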
c‑treeACE Dynamic Dump backups provide easily enabled backup solutions for your database data. A simple text-based script provides one-time set up and walk away convenience. Ensure backed up data is then retained off-site and you’ll have a safe, easily restorable solution. Additionally, by retaining transaction logs between backups, you’ll be able to roll forward to a point in time beyond your original backup.
Third-party applications can also be used to back up c‑treeACE data. Microsoft Windows provides a Volume Shadow Copy Service (VSS) component that integrates with third-party tools. Many SAN devices include snapshot capabilities that can take nearly instantaneous disk-level backups. When coordinated with c‑treeACE, this is a very fast way to grab a point-in-time snapshot of the database. Even a basic Unix rsync can be considered with c‑treeACE. For any of these strategies, c‑treeACE provides a full database quiesce feature, in either crash-consistent or database-consistent mode. Crash-consistent mode means cached data is not flushed to disk first, so it is only appropriate for files under full transaction control. Data restored from such a copy requires the server to undergo automatic recovery from the transaction logs, which must therefore also be part of the database copy. Database-consistent mode forces a flush of all dirty cache pages to disk as part of the quiesce process. In this mode, the database is ready to go should the snapshot be brought back online at a later time. c‑treeACE makes the quiesce very easy with utilities and API calls that can be integrated into any backup strategy.
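The coordination described above follows a quiesce, snapshot, resume pattern. Below is a minimal sketch of that ordering. The three callables are hypothetical placeholders: in practice they would wrap whatever quiesce utility or API call and snapshot tool you actually use, and nothing here is a c‑treeACE API.

```python
from typing import Callable


def snapshot_with_quiesce(
    quiesce: Callable[[], None],
    take_snapshot: Callable[[], str],
    resume: Callable[[], None],
) -> str:
    """Quiesce the database, take a snapshot, and always resume afterwards."""
    quiesce()
    try:
        # The snapshot should be near-instantaneous (SAN or VSS style),
        # so the database stays quiesced only briefly.
        return take_snapshot()
    finally:
        # Resume even if the snapshot fails, so the server is never left blocked.
        resume()
```

The `try`/`finally` is the important design choice: a failed snapshot is an inconvenience, but a database left quiesced is an outage.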
Any data backup strategy must include off-site storage.
Have you considered reliable and secure transportation of your backups to the off-site facility? Are your driver and vehicle bonded and insured against data loss? Your biggest off-site risks are the physical media and the security of the location. Tapes, DVDs, and external disk drives are all easy and well-understood backup media. However, you must consider how long each maintains complete data integrity. Each type of media ages differently, under different conditions. Environmental factors of the storage facility must be closely monitored and recorded. Always assume this off-site facility can be hit with calamity as well. What is your alternative strategy?
Cloud storage has become an extremely reliable option. It is highly redundant and available, and modern data backup strategies must consider this very economical medium. With cloud storage your larger risk is security, as data is now beyond your physical control. This option may not be appropriate for protected and highly confidential data. An off-site storage provider based in your home country may have a redundant system in another country. If the data you are storing is protected under local laws (e.g., healthcare data), make sure those laws permit storage outside of the country. That said, a combination of off-site physical storage and cloud storage meets the majority of even the most demanding data retention needs.
Distributing data through replication is another model for off-site data storage. In this model, data is replicated either in batches or in real time. With the c‑treeAMS Replication Agent, you can have a high-availability secondary c‑treeACE server always running in a remote location with up-to-date information. Distance is the tradeoff here, as performance can diminish in proportion to distance. With a modest investment in additional resources and setup, a replicated data model can reap huge payoffs when a second location is immediately required.
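To put a rough number on the distance tradeoff, the sketch below estimates the lowest possible network round-trip time between two sites. It assumes signals travel through optical fiber at roughly two thirds the speed of light, about 200 km per millisecond. Real routes are longer than the straight-line distance and add equipment delays, so treat the result as a floor, not a prediction.

```python
# Approximate signal speed in optical fiber (about 2/3 the speed of light).
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on network round-trip time between two sites, in ms."""
    return 2 * distance_km / FIBER_KM_PER_MS


# A hypothetical secondary site 1,000 km away: at least 10 ms per round trip.
floor_ms = min_round_trip_ms(1000)
```

Every acknowledgment that must cross this link pays at least that cost, which is why a more distant secondary site tends to lag further behind the primary.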
Retaining data is only a fraction of your backup plan. Restoring data is your most important goal. Can you access your backups when you need them most? Have you tested your backups and ensured they’re complete and reliable? What is your failover strategy and who will manage this process? If using a replicated solution, what procedures are in place ensuring applications connect to alternate servers in a timely manner?
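One simple way for applications to reach an alternate server is to try an ordered list of endpoints and use the first that answers. The sketch below uses plain TCP sockets from the Python standard library purely for illustration. It is not the c‑treeACE client API, and the function name is a placeholder.

```python
import socket
from typing import Optional, Sequence, Tuple


def connect_first_available(
    servers: Sequence[Tuple[str, int]], timeout: float = 2.0
) -> Optional[socket.socket]:
    """Return a connection to the first reachable server, or None if all fail."""
    for host, port in servers:
        try:
            # Primary first, then each alternate in order.
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            # Refused, timed out, or unreachable: try the next server.
            continue
    return None
```

A short timeout matters here: the time spent waiting on a dead primary counts against your recovery time.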
There are two recovery objectives to address in your continuity plans:
- Recovery Point Objective (RPO) – the maximum acceptable amount of recent data, measured in time, that will not be recovered
- Recovery Time Objective (RTO) – the acceptable amount of time to restore the function
RPO concerns data that cannot be recovered from your system. Perhaps a paper trail remains available; however, this shouldn’t be counted on.
RTO concerns how long it will take to be up and running, at least as minimally defined in your plan.
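Both objectives can be checked with simple arithmetic. In the sketch below, worst-case data loss is taken to be the interval between recoverable captures (backups or retained transaction logs), and restore time is the sum of every step needed to resume service. All targets and durations are hypothetical values for illustration.

```python
from datetime import timedelta
from typing import Iterable


def meets_rpo(capture_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss equals the time between recoverable captures."""
    return capture_interval <= rpo


def meets_rto(restore_steps: Iterable[timedelta], rto: timedelta) -> bool:
    """Total restore time is the sum of every step needed to resume service."""
    return sum(restore_steps, timedelta()) <= rto


# Hypothetical targets and measurements, for illustration only.
rpo = timedelta(hours=4)
rto = timedelta(hours=4)

nightly_only = meets_rpo(timedelta(hours=24), rpo)     # False: up to 24h lost
with_hourly_logs = meets_rpo(timedelta(hours=1), rpo)  # True: at most 1h lost

# Fetch media (2h), restore files (1h), automatic recovery (30 min).
steps = [timedelta(hours=2), timedelta(hours=1), timedelta(minutes=30)]
restore_in_time = meets_rto(steps, rto)                # True: 3.5h total
```

The step durations should come from timed restore drills rather than estimates, since untested numbers tend to be optimistic.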
Be sure to secure devices that can read data from each type of media, as you shouldn’t expect your original equipment to be available. For example, if you use a proprietary tape format, retain a second compatible tape drive. It is disastrous to have data in hand and no way to restore from the media!
Backups are vacuous without recovery. VALIDATE your data restoration process on a continual basis and assess changing data requirements over time. What was appropriate yesterday is likely not adequate tomorrow.
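One small piece of validation that is easy to automate is confirming that a backup file has not been corrupted since it was written. The sketch below records nothing itself; it assumes you stored a SHA-256 checksum when the backup was created and compares against it later. The function names are illustrative. A matching checksum only shows the file is intact, so it complements a full test restore and does not replace one.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 of a file, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path: Path, expected_sha256: str) -> bool:
    """Compare a backup file against the checksum recorded at backup time."""
    return sha256_of(path) == expected_sha256.lower()
```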
While many disasters are natural events, others are often more pernicious. Systems must be monitored. Close attention to end-to-end components within your environment provides early problem detection. Networks, phone systems, services, and databases must all be carefully monitored. c‑treeACE provides over 400 available metrics covering everything from cache performance, memory usage, network traffic, and disk I/O to locks and users. By using the c‑treeACE SnapShot API you can easily add specific statistic monitoring to your application. Available utilities integrate easily into third-party monitoring tools such as Nagios. A future article is planned demonstrating c‑treeACE utilities for this purpose. Stay tuned.
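For readers unfamiliar with Nagios, a plugin reports status through its exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The sketch below shows only that threshold logic. How the metric value is obtained is left out, and the function name and any metric names passed to it are hypothetical; this is not one of the c‑treeACE utilities.

```python
from typing import Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_metric(
    name: str, value: Optional[float], warn: float, crit: float
) -> Tuple[int, str]:
    """Map a metric value to a Nagios status code and one-line message."""
    if value is None:
        # The metric could not be read, so the state is unknown.
        return UNKNOWN, f"UNKNOWN - {name} unavailable"
    if value >= crit:
        return CRITICAL, f"CRITICAL - {name}={value}"
    if value >= warn:
        return WARNING, f"WARNING - {name}={value}"
    return OK, f"OK - {name}={value}"
```

A real plugin would print the message and exit with the returned code, for example via `sys.exit(code)`.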
A key component of your risk management assessment will be a substantial security policy. All employees should be aware of this policy and adhere to it. Safeguarding and monitoring systems from both external AND internal actions ensures steady and uninterrupted processes.
Practice – Refine – REPEAT
Every business continuity plan with any chance of success requires judicious planning, practice, and continual refinement, repeating the process over and over. This should be an investment in your business, considered both an insurance policy and an asset. A concrete, detailed plan reaps incalculable benefits within your organizational structure both during a crisis and while preparing for one.