Thursday 1 March 2018

Not all is lost if your cube gets corrupted…

I ran into this very odd situation yesterday with the Reporting capability of my production TFS instance – I realised the Incremental Analysis Database Sync job and the Optimize Databases job were running for hours!

image_thumb2

I stopped the Incremental Analysis, and the Optimize Databases job completed successfully. Fine.
But – for whatever reason – my SSAS cube got corrupted! I couldn’t even connect to the Analysis Engine with SSMS. I also found errors in the Event Viewer pointing at a corrupted cube:

image_thumb1

Errors in the metadata manager. An error occurred when loading the 'Team System' cube, from the file, '\\?\<path>\Tfs_Analysis.0.db\Team System.3330.cub.xml'.
Errors in the metadata manager. An error occurred when loading the 'Test Configuration' dimension, from the file, '\\?\<path>\Tfs_Analysis.0.db\Configuration.254.dim.xml'.

Now, what to do? It looked like a full-blown rebuild was in order, and it is a costly operation, given that what the rebuild does is dropping both the data warehouse and the SSAS cube, rebuilds the warehouse with data from the TFS databases and then rebuilds the cube.

It is not like being without source code or Work Items, but still… it is an outage, and it is painful to swallow.

Now, in this case the data warehouse was perfectly healthy – the report shown an update age just a few minutes old. So all the raw data in this case is fine, and all you need to do is to rebuild how you look at this data.

The SSAS cube is just a way of looking at the data warehouse. If your warehouse is fine, just wait for the next scheduled Incremental Analysis Database Sync job to run, it will recreate the cube (thus making the Analysis Database Sync job a Full one rather than an Incremental one) without going through the full rebuild.

clip_image0025_thumb1

Why didn’t I process this myself by using the WarehouseControlService? Simply because the less you mess with the scheduled jobs the better it is Smilehiccups happen, but the system is robust enough to withstand such problems and pretty much self-heal itself once the stumbling block is removed.

No comments:

Post a Comment