Tuesday, 10 October 2017

Successfully handle disruptive changes with no downtime: the TFS 2015 example

It is something I mentioned a few years ago, but the question came up again during my presentation at x-celerate.de:

How should I handle a breaking change without service interruptions?

This is a brilliant question, and the best example I can give is the TFS 2015 upgrade, which introduced support for renaming Team Projects.

If you don’t want downtime for your users, the only mitigation is to introduce an intermediate migration layer that pulls data from the production stack and transforms it into the shape you want.

The upside is that you perform a very expensive and time-consuming operation out-of-band, so you can apply all the usual patterns for highly available application deployments.

The downside is that it is a costly operation: the cost could be compute, storage or something else, but there will always be one.
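To make the idea of an intermediate migration layer more concrete, here is a minimal sketch in Python with SQLite. It is not how the TFS tooling works internally; the tables (`old_items`, `new_items`, `projects`) and the function `migrate_in_batches` are hypothetical names I made up for illustration. The assumption is an old schema that identifies a project by its name and a new one that uses a stable ID, so that a rename touches a single row. The copy runs in small batches next to the live table and remembers how far it got, so it can be stopped and resumed.

```python
import sqlite3


def migrate_in_batches(conn, batch_size=2):
    """Copy rows from the hypothetical old_items table into new_items.

    Works in small batches and resumes from the highest id already copied,
    so it can run out-of-band while old_items keeps receiving new rows.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS projects ("
        "id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS new_items ("
        "id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL, title TEXT NOT NULL)"
    )
    # Watermark: everything up to this id has already been migrated.
    last_id = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM new_items"
    ).fetchone()[0]
    copied = 0
    while True:
        rows = conn.execute(
            "SELECT id, project_name, title FROM old_items "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        for row_id, project_name, title in rows:
            # Transform step: replace the project name with a stable id.
            conn.execute(
                "INSERT OR IGNORE INTO projects (name) VALUES (?)",
                (project_name,),
            )
            project_id = conn.execute(
                "SELECT id FROM projects WHERE name = ?", (project_name,)
            ).fetchone()[0]
            conn.execute(
                "INSERT INTO new_items (id, project_id, title) VALUES (?, ?, ?)",
                (row_id, project_id, title),
            )
        conn.commit()  # one short transaction per batch
        last_id = rows[-1][0]
        copied += len(rows)
    return copied


def demo():
    """Run the migration twice to show that it picks up where it left off."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE old_items ("
        "id INTEGER PRIMARY KEY, project_name TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO old_items (project_name, title) VALUES (?, ?)",
        [("Alpha", "a"), ("Alpha", "b"), ("Beta", "c")],
    )
    conn.commit()
    first = migrate_in_batches(conn)
    # Production keeps writing while the migration layer is running.
    conn.execute(
        "INSERT INTO old_items (project_name, title) VALUES (?, ?)",
        ("Beta", "d"),
    )
    conn.commit()
    second = migrate_in_batches(conn)
    return conn, first, second
```

Note that this sketch only catches rows inserted after the previous pass. Updates or deletes on rows that were already copied would need something extra, for example a change log or a last-modified column, and that is exactly where the additional storage and compute cost comes from.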

In my specific case I was able to fit the scheduled upgrade into the mandatory weekend window, instead of having days of downtime while the data migrated from one schema to another. The price was lots of storage space for the temporary tables and a dedicated server to run the tool. Yes, there was still a bit of downtime, but it was due to the nature of the product and it was expected. You can overcome that hurdle if you are building your own product, though.
