The buzz around cloud disaster recovery has expanded demand for DRaaS (disaster recovery as a service) solutions. Managed disaster recovery is now a mainstream offering supported by more than 250 disaster recovery providers. DRaaS emerged to serve IT organizations that lacked the staffing and/or a secondary data center facility needed to meet increasingly aggressive recovery-time targets. Providers' monthly service pricing and cloud usage policies have become increasingly fragmented, a direct result of IaaS pricing competition and the more flexible cloud usage policies of hyper-scale cloud providers. This evolution puts the responsibility for recovering from disaster in the cloud providers' hands. Choosing DRaaS can be a smart decision, depending on a few elements. If you choose DRaaS or a disaster recovery provider, then on top of the checklist I posted in the previous post, please consider three main factors:
Get the service from a DR expert team
Look for a team with deep experience in DR methodology: one that writes high-level and low-level designs, has completed more than 50 DR implementations, provides 24/7 personal support backed by a service level agreement, takes responsibility for the networking, and learns, trains, and lives DR and business continuity planning.
Demand weekly or monthly DR tests
This is mandatory. Bear in mind that a working DR site ≠ replication, so although cloud providers look like a good opportunity, they don't cover all testing aspects. Your task when choosing a DRaaS recovery site is to get a highly available, resilient, on-demand infrastructure that forms the ideal foundation for your DR environment. The test objectives are: to take known-good snapshots, to test the application inside the VM, to surface DR problems before a disaster happens, and, bottom line, to produce a detailed RTO and RPO report for every server. Taking known-good snapshots is very important. We had two customers who were hit by crypto-ransomware, and the virus was replicated to the DR site. Because we created a snapshot at every test, we were able to help both customers: one was back up in four days, the other within one day.
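To make the "detailed RTO and RPO report for every server" concrete, here is a minimal sketch of how such a report can be derived from one test run. The server names, timestamps, and figures are hypothetical examples, not data from the article:

```python
from datetime import datetime

# Hypothetical per-server measurements from one DR test run:
# (server, last_good_snapshot, disaster_declared, service_restored)
test_results = [
    ("web-01", datetime(2023, 5, 1, 2, 0),  datetime(2023, 5, 1, 2, 30), datetime(2023, 5, 1, 3, 15)),
    ("db-01",  datetime(2023, 5, 1, 1, 45), datetime(2023, 5, 1, 2, 30), datetime(2023, 5, 1, 4, 0)),
]

def dr_report(results):
    """Build the per-server RPO/RTO summary a DR test should produce."""
    report = {}
    for server, snapshot, disaster, restored in results:
        report[server] = {
            # RPO: how much data you would lose (time since last good snapshot)
            "rpo_minutes": (disaster - snapshot).total_seconds() / 60,
            # RTO: how long recovery took (time from disaster to restored service)
            "rto_minutes": (restored - disaster).total_seconds() / 60,
        }
    return report

for server, metrics in dr_report(test_results).items():
    print(f"{server}: RPO={metrics['rpo_minutes']:.0f} min, RTO={metrics['rto_minutes']:.0f} min")
```

Running a report like this after every scheduled test makes it obvious which servers are drifting away from their recovery targets.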
DR product and not “backup and replication”
The common mistake when choosing the primary software or hardware to manage your DR lies in the misconception that replication = DR. It does not: replication ≠ DR ≠ backup. When you choose the product, bear in mind the following requirements:
You should remain free to change your production storage, the target site's country, and your VM applications. Hardware-based synchronous replication is often proprietary to each storage vendor and provides no cross-vendor replication, so your product should have no affinity to any hardware vendor, storage provider, or storage disaster recovery architecture (SCSI, Fibre Channel, iSCSI). This independence lets the product run on existing heterogeneous storage infrastructures and does not lock you in to a vendor for future purchases. A hardware-independent solution enables uses that are not possible with proprietary array-based solutions.
Online disaster recovery
Demand layer 2 replication so that IP addresses are forwarded automatically during a real disaster event. You don't want to forward them manually at the moment of disaster.
Ensure that the cloud provider's service notifies you that a server is down before you are aware of it yourself.
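The kind of availability probe behind such a notification can be sketched in a few lines. This is an illustrative example, not any provider's actual monitoring stack; the host, port, and alert wording are hypothetical:

```python
import socket

def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_and_alert(host, port, notify):
    """Call notify() as soon as the probe fails, before users report the outage."""
    if not is_reachable(host, port):
        notify(f"ALERT: {host}:{port} is down - initiating DR runbook")
```

In practice the provider runs probes like this on a short interval from several locations, so a failure triggers the DR runbook within minutes.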
Consistency in database systems refers to the requirement that any given database transaction must change affected data only in allowed ways. Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This does not guarantee that the transaction is correct in every way the application programmer might have intended (that is the responsibility of application-level code), merely that programming errors cannot violate any defined rules. Consistency is one of the four guarantees that define ACID transactions; however, significant ambiguity exists about the nature of this guarantee. It is defined variously as:
* The guarantee that any transactions started in the future necessarily see the effects of other transactions committed in the past.
* The guarantee that database constraints are not violated, particularly once a transaction commits.
* The guarantee that operations in transactions are performed accurately, correctly, and with validity, with respect to application semantics.
As these definitions are not mutually exclusive, it is possible to design a system that guarantees "consistency" in every sense of the word, as most relational database management systems in common use today arguably do. Relational database management systems such as Microsoft SQL Server and Oracle are scalable, reliable, flexible, and high-performance platforms for server-based systems. Your product needs to provide real-time, enterprise-grade data protection and replication for SQL Server disaster recovery, and the same for Exchange Server disaster recovery.
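The constraint-based sense of consistency described above can be demonstrated in a few lines with SQLite: the database refuses to commit a transaction that would violate a defined rule, so no programming error can leave the rule broken. The table and the no-overdraft rule are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Defined rule: an account balance may never go negative.
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
conn.commit()

try:
    with conn:  # one transaction: it commits fully or not at all
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE id = 1")
except sqlite3.IntegrityError:
    pass  # the CHECK constraint rejected the overdraft; the transaction rolled back

balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)  # still 100: the defined rule was never violated
```

A DR replication product must preserve exactly this property at the recovery site: replicated data should land in a transactionally consistent state, not mid-transaction.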
The replication method should be as lean as possible
For example, choosing snapshot-based replication will, before long, slow down the production environment.
Disaster recovery RPO and RTO – the closer to zero, the better
Even with the benefit of knowing when a test will occur, many recovery planners do not properly prepare for it. The primary objective of the test is to identify weaknesses that can be fixed before an actual disaster happens. That purpose can't be achieved if the test doesn't get the focus it deserves.
Business continuity planning is an investment
As some data indicate, most businesses that lose their information to a disaster eventually lose the business itself. It does not matter how you do it — whether you hire someone to do it for your company or take a short course in disaster recovery so you can do it yourself; what matters is that you plan for disaster.
Disaster recovery is an investment you need to make in your business. You cannot risk your whole business over some lost data or downtime after an event.