Production Data Monitoring with Stored Procedures and PowerShell

“We don’t know what we don’t know.”

I hate that phrase, but in this case it sums things up best: when we think about production monitoring, what do we normally think about? Is a server online? Can we connect? Is the SQL Agent available? How’s that drive space looking? What about CPU usage? There are lots of great tools out there to monitor SQL Server and the underlying operating systems we run it on. Typically, we tend to think of “infrastructure monitoring” as “production monitoring.”1

Honestly? That’s a pretty narrow focus, and a potentially dangerous one at that. As a DBA you might not be completely in tune with what’s going on over in developer land: they may be releasing code that isn’t working as they intended. Take a look at the following table as an example:

CREATE TABLE Customer.Usages
(
    UsageID bigint identity (1,1) not null primary key clustered,
    BusinessEntityID int not null,
    InvoiceID int not null,
    UsageUnitID int not null,
    AccountID varchar(15) not null,
    AmountBilled numeric(8,2) not null,
    UsageCreateDate datetime not null,
    RemittanceID int not null
)

We have a table that is storing some customer usages of… whatever our fictional business uses. There is probably some code that interacts with this table: maybe we do bulk processing or an ETL to load data into this table, or we have people manually interacting with an application that updates these particular rows. Since this table is (probably) integral to billing our customers, it’s a big deal to keep it accurate and up to date.

As long as the code the developers are writing isn’t throwing .NET exceptions or SSIS errors or <insert whatever errors you’d see in an application here>, everyone is pretty happy. But how would we identify a problem with the data that goes into this table? Say, for instance, we have a bug somewhere that isn’t putting the correct account number in the table: how would we know?

“Bad news early is good news.”

Tired of these manager-isms yet? Sooner or later someone is going to find this bug. Developers are going to point fingers at the QA team for not testing enough, QA is going to blame the developers and/or business partners for not telling them about a particular use case, and you can be sure you’ll be called in to help fix the problem. Depending on how your systems are used, you might also be looking at some downtime to correct all the bad data and fix the underlying code issue. Ouch.

Which is why when I say “production monitoring” I’m not just talking about hardware and software. We need to treat our data as part of that equation too. One way to do that is to write an audit stored procedure (or two) that looks at critical tables and makes sure “mistakes” aren’t being inserted; if they are, it starts throwing alerts saying so.

Using the example above, let’s say the root cause of the data issue is that our account numbers are not passing validation because of a bad check digit on the end of each one. So, we write a stored procedure called “Monitoring.ValidateUsageAccountIDs” that looks at all the usages in our table and makes sure each one can be successfully joined to the Accounts table. For simplicity’s sake, let’s say the stored procedure looks like this:

CREATE PROCEDURE Monitoring.ValidateUsageAccountIDs
AS
BEGIN
    SELECT AccountID FROM Customer.Usages
    EXCEPT
    SELECT AccountID FROM Customer.Accounts
END

This is overly simplistic, but you get the idea: we’re looking for all the AccountIDs from the Usages table that don’t exist in the Accounts table2. In practice, it’s probably more complex. The question is what we do with it: how do we run it, and how do we have it alert us if there is a problem? If only there was some technology that would let us do that…

I don’t know what you expected.

Of course, I am going to use PowerShell to get this done. The tricky part is: how can we call a stored procedure from PowerShell? It’s not as hard as you think. We’re going to load up the sqlps module and run a simple little one-liner. Please note, you’ll need to have imported the module first: just run Import-Module sqlps.

((Get-ChildItem -Path SQLSERVER:\SQL\LOCALHOST\DEFAULT\Databases | Where-Object {$_.Name -eq "DatabaseName"}).ExecuteWithResults("EXEC Monitoring.ValidateUsageAccountIDs").Tables.Rows | Measure-Object).Count

Okay, so maybe it’s not a simple little one liner. Let’s break it down:

  1. First, we’re going to use the provider to go out and get a database object from a server. Since the Get-ChildItem cmdlet can take a path as a parameter, we’re passing it a path built from the host name and instance name (‘default’ in this case, since it’s not a named instance). I’m then filtering for a particular database name, the one where my stored procedure and data reside. This returns a database object.
  2. Database objects have a couple of neat methods on them. In particular, there are two I’m going to look at here: .ExecuteNonQuery and .ExecuteWithResults. These methods let us execute SQL against the database object in question. And just like the names say, one of them (.ExecuteWithResults) returns results. Even from stored procedures! That’s what we’re calling on the database object, and we’re telling it to execute our procedure.
  3. That method returns an interesting object: a DataSet. A DataSet object contains a collection of tables that each contain a collection of rows. With it, we can figure out the number of rows returned by the procedure. You might wonder why I am using Measure-Object here instead of just .Count. Depending on your version of PowerShell and the object(s) being returned, a simple .Count on a DataSet that only returns one table with one resulting row might return 0, or 1, or sometimes $NULL. Measure-Object is more thorough and will always return an accurate count regardless of what we get back.
  4. The other nice thing about the DataSet object is that each column in your result set becomes a property. That means, if the procedure returns results, you can also filter those results in your script without having to modify the procedure: just tack a Where-Object onto the .Rows property via a pipe and select only certain rows (there’s a sketch of this after the list).
  5. Oh, and one other thing: your results are in memory on the host where you ran the code. So if you have a lot of results, as long as you have the memory to hold them, you can do all sorts of “expensive” things (like sorting) pretty quickly, without making SQL Server do the work. The flip side is that you need to be mindful of how many results you’re returning so you don’t run the host you call this from out of memory.
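To make the moving parts easier to see, here’s the same call broken out into steps, plus a filter and a sort on the returned rows. This is just a sketch: it assumes you’ve already run Import-Module sqlps, and the instance name, database name, and the AccountID filter are placeholders you’d swap for your own environment.

# The one-liner from above, broken into steps. Assumes Import-Module sqlps has
# already been run; server, database, and filter values are placeholders.
$database = Get-ChildItem -Path SQLSERVER:\SQL\LOCALHOST\DEFAULT\Databases |
    Where-Object { $_.Name -eq "DatabaseName" }

# ExecuteWithResults returns a DataSet: a collection of tables, each of which
# holds a collection of rows.
$dataSet = $database.ExecuteWithResults("EXEC Monitoring.ValidateUsageAccountIDs")
$rows    = $dataSet.Tables[0].Rows

# Measure-Object gives a reliable count whether we get back zero, one, or many rows.
$badRowCount = ($rows | Measure-Object).Count

# Each column in the result set becomes a property on the row objects, so we can
# filter and sort in memory without touching the procedure. The "9*" filter is a
# made-up example.
$suspicious = $rows |
    Where-Object { $_.AccountID -like "9*" } |
    Sort-Object AccountID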

Putting it into practice

Let’s say that in the audit above, if ANY rows are returned we need to start sounding alarm bells. You could do a lot of things: you could use built-in PowerShell cmdlets like Send-MailMessage and craft a subject and body to alert a distribution list. Or you could integrate the script with something like System Center and create a custom monitor script that leverages the rule-based alerting already set up for other things in your environment. Since we’re calling this from PowerShell, all we really need is access to the module, and we can run and/or schedule these monitoring scripts wherever we want!
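As a rough sketch of the Send-MailMessage route (the SMTP server, addresses, database, and instance below are all placeholder values, not anything from a real environment):

# A minimal monitoring sketch: run the audit procedure and send an alert if it
# returns any rows. All names and addresses here are placeholders.
Import-Module sqlps -DisableNameChecking

$database = Get-ChildItem -Path SQLSERVER:\SQL\LOCALHOST\DEFAULT\Databases |
    Where-Object { $_.Name -eq "DatabaseName" }

$badRows = ($database.ExecuteWithResults("EXEC Monitoring.ValidateUsageAccountIDs").Tables.Rows |
    Measure-Object).Count

if ($badRows -gt 0) {
    Send-MailMessage -SmtpServer "smtp.example.com" `
        -From "sqlmonitor@example.com" `
        -To "dba-team@example.com" `
        -Subject "ALERT: $badRows usage rows with unmatched AccountIDs" `
        -Body "Monitoring.ValidateUsageAccountIDs returned $badRows rows; the Customer.Usages data may need attention."
}

Schedule something like this from SQL Agent, Task Scheduler, or whatever scheduler you already use, and you have data monitoring running right alongside your infrastructure monitoring.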

The possibilities are endless. Just remember that, like any query you put into production, you’ll want to make sure you’re not overdoing it: if your monitors are resource intensive or could be blocked by other processes, you’ll need to be really careful before you jump in headfirst. You should work with your business partners and developers to find out what needs monitoring. It’s one thing to be a hero in a crisis, but it’s another altogether to prevent one!

Footnotes

  1. You guys got the reference from my image, right? That’s a monitor. It’s a huge lizard. Right? Guys?
  2. Yes, I know: we could use foreign key constraints in this case to make sure values exist in both places and reject a bad insert. I just wanted a quick example. You could apply this auditing logic to a lot of things.