Change Tracking Tables And Statistics


Note: This is one of those blog posts that came about while I was troubleshooting an issue at my day job, but I didn’t think enough of it at the time to take screenshots of the various execution plans. I’ll do my best to do it justice, though.

The business intelligence people were starting to get very worried.

One of their overnight processes that integrated data from a vendor was failing. Well, that’s not entirely true: one of the overnight processes that integrated changed data from a vendor was failing. And not just failing in the normal, easy-to-fix way. I mean failing in that it was never finishing. Processes would get lost mid-thread and do nothing but spin CPU cycles all night. Thankfully, using a third-party monitoring tool I was able to go back in time and see that a certain stored procedure was the culprit. Digging into the procedure and looking at the execution plan, nothing looked amiss. A stored procedure that used to work is suddenly running long or never finishing? Sounds like classic parameter sniffing to me. Short on time, I took the easy way out. Stop me if you’ve seen code like this (or written it yourself):

ALTER PROCEDURE dbo.BoatMurder
@integerval int
WITH RECOMPILE
AS
...

Ah yes, WITH RECOMPILE: the poor man’s (or rather, the homeless man’s) performance tuning. Not that it doesn’t have its place, but I didn’t have time to really dig into it, and the data needed to get there. We ran the procedure manually from my terminal and, lo and behold, success. The data appeared and all was right in the world. At least until the following night, when it had to run again and we hit the same problem: the procedure never finished.

The business intelligence folks, again, were concerned. It was time to roll up my sleeves and get to the bottom of this.

Change tracking tables are still tables

The tl;dr version of what the stored procedure was doing: it looked at a particular, very large table that had change tracking [1] enabled. For those who haven’t used it before, there are a lot of great technical resources on MSDN that I’d recommend looking into. In a nutshell, when a table is change tracked, SQL Server creates some under-the-covers views and tables to track what, specifically, is changing in that table. This is different from change data capture [2], which is a lot more expensive. Rather than telling you what something changed from and what it changed to (versioning), change tracking just lets you know that something changed, and if you care about doing incremental data loads, that’s a pretty handy thing to use.
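For context, setting up change tracking and consuming changes looks roughly like this. The dbo.Orders table, its OrderID key, and the retention settings below are illustrative stand-ins, not objects from this post:

-- Illustrative only: dbo.Orders, OrderID, and the retention values are made up.
-- Change tracking is enabled at the database level first, then per table.
ALTER DATABASE CURRENT
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Orders
    ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF);

-- An incremental load asks "what changed since the version I last synced to?"
-- and gets back the keys plus the operation (I/U/D), not before/after values.
DECLARE @last_sync_version bigint = 0;

SELECT  ct.SYS_CHANGE_OPERATION,
        ct.OrderID
FROM CHANGETABLE(CHANGES dbo.Orders, @last_sync_version) AS ct;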

The “eureka” moment came when I dug a little deeper into the plan and saw this:

[Execution plan screenshot: change tracking table seek]

Except it wasn’t a happy index seek. It was a scan, and a big one at that: instead of a paltry 429 rows, the engine estimated that there would be hundreds of thousands of rows to return. Since I knew there weren’t that many changes going on, I was puzzled. After some searching, I stumbled upon Kendra Little’s excellent article on performance tuning change tracking. In it, she mentions out-of-date statistics on change tracking tables. The object in question sure looked like a table, and it was estimating the wrong number of rows. Time for a statistics update! Could it really be as simple as:

UPDATE STATISTICS sys.change_tracking_315968302

Sure enough, that did the trick. The plan instantly got happier. And if you look at the code above, it’s not too hard to decipher what SQL Server is doing: it creates table objects in the sys schema that contain a bunch of change data, and simply tacks the tracked table’s object_id onto the end of the table name. The real question is, why isn’t this happening automatically when enough data changes?
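As an aside, if you want to see which internal side tables exist in your own database and what their UPDATE STATISTICS targets would be, a query like this (it mirrors what the stored procedure further down builds) will list them:

-- Lists each change-tracked table alongside the UPDATE STATISTICS command
-- for its internal sys.change_tracking_<object_id> side table.
SELECT  ss.name AS SchemaName,
        st.name AS TableName,
        'UPDATE STATISTICS sys.[change_tracking_'
            + CAST(st.object_id AS nvarchar(20)) + ']' AS StatisticsUpdateCommand
FROM sys.change_tracking_tables AS ctt
INNER JOIN sys.tables  AS st ON st.object_id = ctt.object_id
INNER JOIN sys.schemas AS ss ON ss.schema_id = st.schema_id;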

Should I add a maintenance plan for change tracking?

Of course, it’s the old tried-and-true DBA response: “It depends.” Normally you could grab a quick look-see at out-of-date statistics using something like:

DBCC SHOW_STATISTICS ('sys.change_tracking_315968302', <statistics or index name>)

Except that doing that (heck, even selecting from these tables) appears to be protected and taboo as far as the server is concerned. And it doesn’t appear that Ola Hallengren’s excellent index optimization script (which, let’s face it, you should already be using anyhow) will pick these tables up, either. With that in mind, I offer you the following handy stored procedure, which you can run should jobs or code that rely on change tracking start to act up.

RefreshChangeTrackingTableStatistics

CREATE PROCEDURE RefreshChangeTrackingTableStatistics
    @DatabaseName nvarchar(255) = NULL, -- NULL means every change-tracking-enabled database
    @OutputResults bit = 0              -- 1 returns the commands that were run
AS
BEGIN
    SET NOCOUNT ON;

    -- Databases with change tracking enabled that we will touch
    CREATE TABLE #ChangeTrackingDatabases
    (
        RowNumber INT IDENTITY(1,1) PRIMARY KEY,
        DatabaseID INT NOT NULL,
        DatabaseName NVARCHAR(255) NOT NULL
    );

    -- One row per change-tracked table, plus the UPDATE STATISTICS command built for it
    CREATE TABLE #ChangeTrackingTables
    (
        RowID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
        DatabaseID INT,
        DatabaseName NVARCHAR(255),
        TableObjectID INT,
        TableName NVARCHAR(255),
        SchemaName NVARCHAR(255),
        CodeToRun NVARCHAR(MAX)
    );

    -- No database name supplied: grab every database with change tracking enabled
    IF @DatabaseName IS NULL
    BEGIN
        INSERT INTO #ChangeTrackingDatabases (DatabaseID, DatabaseName)
        SELECT ctd.database_id, d.name FROM sys.change_tracking_databases ctd
        INNER JOIN sys.databases d ON d.database_id = ctd.database_id;
    END

    -- Database name supplied: limit the list to that database
    IF @DatabaseName IS NOT NULL
    BEGIN
        INSERT INTO #ChangeTrackingDatabases (DatabaseID, DatabaseName)
        SELECT ctd.database_id, d.name FROM sys.change_tracking_databases ctd
        INNER JOIN sys.databases d ON d.database_id = ctd.database_id
        WHERE d.name = @DatabaseName;
    END

    DECLARE @DatabaseID int;
    DECLARE @CurrentDatabaseName nvarchar(255);
    DECLARE @dynSQL nvarchar(max);
    DECLARE @Counter INT = 1;

    -- For each database, collect its change-tracked tables (schema, name, object_id)
    WHILE @Counter <= (SELECT COUNT(RowNumber) FROM #ChangeTrackingDatabases)
    BEGIN
        SELECT @CurrentDatabaseName = DatabaseName, @DatabaseID = DatabaseID FROM #ChangeTrackingDatabases WHERE RowNumber = @Counter;
        SELECT @dynSQL = 'USE [' +  @CurrentDatabaseName + ']; INSERT INTO #ChangeTrackingTables (DatabaseID, DatabaseName, TableObjectID, TableName, SchemaName) SELECT ' + CAST(@DatabaseID AS nvarchar(255)) + ', ''' + @CurrentDatabaseName + ''', st.object_id, st.name, ss.name FROM sys.change_tracking_tables ctt INNER JOIN sys.tables st ON st.object_id = ctt.object_id INNER JOIN sys.schemas ss ON ss.schema_id = st.schema_id';
        EXECUTE sp_executesql @dynSQL;
        SET @Counter = @Counter + 1;
    END

    SET @Counter = 1;

    -- For each change-tracked table, update statistics on its internal
    -- sys.change_tracking_<object_id> side table and record the command we ran
    WHILE @Counter <= (SELECT COUNT(RowID) FROM #ChangeTrackingTables)
    BEGIN
        SELECT @dynSQL = 'USE [' +  DatabaseName + ']; UPDATE STATISTICS sys.[change_tracking_' +  CAST(TableObjectID AS nvarchar(255)) + ']' FROM #ChangeTrackingTables WHERE RowID = @Counter;
        UPDATE #ChangeTrackingTables SET CodeToRun = @dynSQL WHERE RowID = @Counter;
        EXECUTE sp_executesql @dynSQL;
        SET @Counter = @Counter + 1;
    END

    IF @OutputResults = 1 SELECT * FROM #ChangeTrackingTables;

    DROP TABLE #ChangeTrackingDatabases;
    DROP TABLE #ChangeTrackingTables;
END
GO
GO

 

The script itself is pretty simple: you can supply it with the name of a database (or leave @DatabaseName NULL to hit every change-tracking-enabled database on the instance) and, if there are change-tracked tables, it will generate some dynamic T-SQL and update their statistics. The only other parameter is @OutputResults, which will show you the commands it ran, if you’re into that.
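For example (the database name here is hypothetical):

-- Refresh change tracking statistics in one specific database and
-- show the commands that were generated and run.
EXECUTE RefreshChangeTrackingTableStatistics
    @DatabaseName = N'VendorStaging',
    @OutputResults = 1;

-- Or leave @DatabaseName off to hit every change-tracking-enabled database.
EXECUTE RefreshChangeTrackingTableStatistics;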

So, why bother?

The real question is, why do we have to do this at all? Normally, table statistics get updated automatically under certain circumstances [3]… but just like everything else, sometimes you need to take matters into your own hands. Hopefully, the above code will help you if you find yourself in a situation similar to mine.
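One last hedged option: if you want to check how stale the internal tables’ statistics actually are before scheduling anything, sys.dm_db_stats_properties (SQL Server 2008 R2 SP2 / 2012 SP1 and later) can report last update time and modification counts. I haven’t verified that it surfaces the internal change tracking tables on every version, so treat this as a sketch:

-- Statistics age and modification counts for the internal change tracking
-- side tables; may return nothing depending on version and permissions.
SELECT  OBJECT_NAME(s.object_id) AS InternalTableName,
        s.name                   AS StatisticsName,
        sp.last_updated,
        sp.rows,
        sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECT_NAME(s.object_id) LIKE 'change_tracking[_]%';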

Footnotes

  1. https://msdn.microsoft.com/en-us/library/bb933875.aspx
  2. https://msdn.microsoft.com/en-us/library/cc645937.aspx
  3. https://msdn.microsoft.com/en-us/library/ms190397.aspx