Wednesday, 30 May 2018

A story of high availability with SQL Server AlwaysOn and TFS

A few weeks ago something happened on our TFS instance - we discovered that DBCC CHECKDB under certain conditions can mark a database as corrupted.

Long story short, this was due to a peculiar condition related to a high volume of transactions during that operation, not something you see every day. Microsoft Support did a great job helping us get back to normal.

In retrospect, what really struck me was how resilient TFS was thanks to SQL Server AlwaysOn. As you know, I am a huge fan of AlwaysOn because of how transparent it makes High Availability.

For us, maintaining availability meant a simple failover to the other node. Given that we run the Availability Group in Synchronous-Commit Mode (my default choice when it comes to TFS), the replica that became Primary was already up to date with the latest transaction, so there was no data loss.
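To make the "no data loss" reasoning concrete, here is a toy model in Python. It is only a sketch of the idea, not how SQL Server implements it: the `Replica` class and the `commit` and `lost_on_failover` functions are hypothetical names made up for illustration. The assumption is simply that a synchronous commit is acknowledged only once the secondary has the transaction too, while an asynchronous commit is acknowledged straight away.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica (hypothetical): its log is a list of hardened transactions."""
    name: str
    log: list = field(default_factory=list)


def commit(primary, secondary, txn, synchronous):
    """Commit txn on the primary and return the transactions still in flight.

    Synchronous: the secondary hardens txn before the commit is acknowledged.
    Asynchronous: the primary acknowledges immediately; txn is shipped later.
    """
    primary.log.append(txn)
    if synchronous:
        secondary.log.append(txn)
        return []
    return [txn]


def lost_on_failover(primary, secondary):
    """Transactions acknowledged on the old primary but absent on the new one."""
    return [t for t in primary.log if t not in secondary.log]


if __name__ == "__main__":
    for mode in (True, False):
        node1, node2 = Replica("node1"), Replica("node2")
        for txn in ("t1", "t2", "t3"):
            commit(node1, node2, txn, synchronous=mode)
        # Fail over to node2 before any in-flight transaction is shipped.
        label = "synchronous" if mode else "asynchronous"
        print(label, "lost:", lost_on_failover(node1, node2))
```

In this simplified model the synchronous run loses nothing on failover, which is the trade-off I accept for TFS: a commit waits for the other node, and in exchange the other node is always ready to take over.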

Team Foundation Server did not miss a single heartbeat. When things go south like this, if you are doing something while the issue is occurring or during the failover, you will get a JobInitializationError, which is self-explanatory. As this is a transactional system by design, nothing is left hanging in the balance like in good ol' SourceSafe :)

Of course we were in limited availability while we were troubleshooting and fixing this problem (always change the Failover Mode to Manual when you are doing so), but there was no downtime.

As for recovery, at the end of the day we had to restore backups on the Secondary Replica to get back to a proper synchronisation. Again, it was a bit tedious and time-consuming given the sizes involved, but it went flawlessly.

Tuesday, 22 May 2018

Small details carrying a huge value

I was reading this post on the Microsoft Premier Developer blog, and it was a nice throwback to the times when I had to deal with this type of request because of the existing process in place.

I also thought about how easy it has become to customise a process with VSTS compared to TFS, and the first thing that sprang to mind was to pair this up with the Board Styling options:

This will cause cards that are assigned to a group, rather than to a single individual, to be highlighted on the board:

There can be so many reasons why a team might choose to do this – and it does not just apply to product development. Think about situations where telemetry operators escalate events or tickets are integrated in the backlog.

Why am I focusing on such small details? Well, this is the kind of personalisation (I cannot really call them customisations 😊) that enables cross-role consumption of the stack.
It does not have to be anything extremely complicated, but whenever you can bring an existing process inside the tool in a frictionless manner you are already paving the way for a better reception and adoption of the tool itself.

Friday, 11 May 2018

Elevate your telemetry from silo to valuable data source

I am going to speak at DevOpsDays Kiel next week about telemetry, and I was thinking about how much Application Insights has evolved in the last few years.

Without mentioning the awesome Application Insights Analytics, I was really pleased with how easy it is to bring valuable data to the forefront.

For example, this has been there pretty much since the beginning:

It’s great, but it is kind of buried in the detailed information provided. What I really enjoyed, on the other hand, was this:

This is an organic and straightforward way to escalate a single piece of information. Why, you ask?
Well, because the previous screen is a summary, with a single button named Operations in a pane called Take Action.

So, from a UX point of view, it comes naturally to dig into the details of a single request raising an exception and promote that information to an actionable backlog item.

A development team does not (usually) need quantity; it needs quality in order to fix problems raised by telemetry. It is the natural evolution of telemetry systems to be able to integrate with DevOps stacks in an effortless way – the real challenge is doing so without being excessively verbose, while still providing the much-needed value to close the loop.