Migrating to the cloud is a logical move for most companies. But change is always hard, especially when legacy applications have been running on-premise for a long time.
One of our retail customers wanted to move one of their important legacy warehouse applications to Google Cloud.
The following factors needed to be considered:
- New environment needed to be tested with real world data.
- Minimal downtime.
- Easy fallback.
- Tight deadline.
- User confidence in the system.
The last factor was very important because the application had been running for a long time, and the environment was sort of a black box.
We decided that the ideal migration strategy would be Lift, Tinker and Shift. For this, we replaced the low-hanging fruit with managed services or solutions like Kubernetes. This helped us move to the cloud incrementally and meet the deadline.
After considering several options, we identified that an incremental approach resulting in two parallel endpoints would be the best cut-over strategy.
Incremental changes in infrastructure allow easy setup and easy fallback.
Two parallel endpoints allow both the on-premise and cloud environments to be live simultaneously.
End users will be oblivious to a changed endpoint, as they use pre-configured devices. This would help with the traffic splitting.
The database was migrated to GCP first. GCP's Database Migration Service is useful for migrating on-premise databases to GCP.
In the initial state of the migration, the environment looked like the architecture above.
Users hit the endpoints of the application, which connects to a Cloud SQL database instance. Some important key-value data and job requests are sent to a Redis server. A few Celery instances pick up the jobs from Redis and execute them. Some of the key-value data in Redis is also accessed from outside the environment through API endpoints.
In the next stage of the migration, the following changes were made:
- Application is deployed in GCP (GCE/GKE).
- A Cloud Memorystore instance is deployed in GCP.
- New Celery instances are deployed in GCP.
- HAProxy is set up on a GCP instance.
Traffic from on-premise is routed via HAProxy to Cloud Memorystore. Now there are two working endpoints from two environments. Cloud Memorystore was chosen as the anchor for this setup because the application uses it as a real-time source of truth. For traffic splitting, we had the luxury of changing to the new endpoints on the end devices, but traffic can also be split at other layers, such as the load balancer.
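To make the shared anchor concrete, here is a minimal Python sketch of how each environment could resolve the Redis URL it connects to. It assumes the application and the Celery workers read their Redis address from configuration. The function name `redis_url`, the environment variable names and both default addresses are hypothetical and do not come from the actual project.

```python
import os

# Hypothetical addresses, used only for illustration.
DEFAULT_MEMORYSTORE_ADDR = "10.0.0.3:6379"   # private address of the Memorystore instance
DEFAULT_HAPROXY_ADDR = "203.0.113.10:6379"   # HAProxy instance forwarding TCP to Memorystore


def redis_url(environment: str, env=os.environ) -> str:
    """Return the Redis URL that the application and workers should use.

    Cloud workloads talk to Memorystore directly. On-premise workloads
    go through HAProxy, so both environments share one data store.
    """
    if environment == "cloud":
        addr = env.get("MEMORYSTORE_ADDR", DEFAULT_MEMORYSTORE_ADDR)
    elif environment == "on-premise":
        addr = env.get("HAPROXY_ADDR", DEFAULT_HAPROXY_ADDR)
    else:
        raise ValueError(f"unknown environment: {environment!r}")
    return f"redis://{addr}/0"
```

Because both environments end up on the same store, a job queued from either endpoint is visible to workers in both, which is what keeps the two endpoints consistent during the transition.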
Once the environment was established, users were incrementally migrated to the cloud endpoint. After extensive testing and confidence building, all users were moved to the cloud and the on-premise environment was shut down.
This cut-over approach enables:
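One way to decide which devices to move at each step is a stable, percentage-based assignment. The Python sketch below is an illustration under assumptions, not the mechanism used in this project: it assumes each device has a unique ID, and the endpoint URLs and function names are hypothetical.

```python
import hashlib

# Hypothetical endpoint URLs, used only for illustration.
ON_PREMISE_ENDPOINT = "https://warehouse.onprem.example.com"
CLOUD_ENDPOINT = "https://warehouse.cloud.example.com"


def bucket(device_id: str) -> int:
    """Map a device ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def endpoint_for(device_id: str, cloud_percent: int) -> str:
    """Pick the endpoint a device should be configured with.

    Devices whose bucket is below cloud_percent go to the cloud.
    Raising cloud_percent only ever moves more devices to the cloud,
    and lowering it moves them back, which gives a simple fallback.
    """
    if not 0 <= cloud_percent <= 100:
        raise ValueError("cloud_percent must be between 0 and 100")
    if bucket(device_id) < cloud_percent:
        return CLOUD_ENDPOINT
    return ON_PREMISE_ENDPOINT
```

Hashing the device ID rather than picking devices at random keeps the assignment repeatable, so a device does not flip between environments when the list is regenerated.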
- Parallel testing to compare and verify both environment outputs.
- Incremental cut-over of users.
- Easy fallback at any stage of the migration.
- Increased user confidence.
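The parallel testing mentioned in the list can be sketched as a small comparison harness. This is a minimal Python illustration, assuming the same request can be replayed safely against both environments. The function `compare_environments` and the two callables it takes are hypothetical stand-ins for real client calls.

```python
from typing import Any, Callable, Iterable, List, Tuple


def compare_environments(
    requests: Iterable[Any],
    call_on_premise: Callable[[Any], Any],
    call_cloud: Callable[[Any], Any],
) -> List[Tuple[Any, Any, Any]]:
    """Replay each request against both environments and collect mismatches.

    Returns a list of (request, on_premise_result, cloud_result) tuples
    for every request where the two environments disagree.
    """
    mismatches = []
    for request in requests:
        expected = call_on_premise(request)  # existing behaviour is the reference
        actual = call_cloud(request)
        if expected != actual:
            mismatches.append((request, expected, actual))
    return mismatches
```

An empty result for a representative set of real-world requests is one concrete signal that can feed into user confidence before more devices are moved.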
Even though this approach allows an almost stress-free, seamless cloud migration, it is unnecessary for many environments, because running two environments in parallel increases cost. The cost, of course, depends on how long the testing phase lasts. Another drawback is the need to maintain parallel CI code, and sometimes parallel application code, until the testing phase is over.
Along with the design of the infrastructure, the way we move to a new infrastructure also influences downtime, cost and, most importantly, user perception of the system. As the cloud gives us the ability to be flexible, it would be wise to make full use of it by customising the migration for optimal results.