Hey, have you heard of this term everyone’s using these days: DevOps? It means a lot of different things to different people, but one of its central ideas is that new or changed features of a software product get deployed to production through automation. If you ask around, you’ll find a lot of different interpretations of how much of a given IT shop’s work is deployed through automation. Some people are content to do strictly software deployments, some also include automated database deployments, and some go whole-hog and even push infrastructure changes through automation. And some, well, don’t do any automation at all.
Where I work, we fall into the “more automation” camp, and it’s great. In the time I’ve been there, we’ve embraced and enhanced our deployment processes, and as a DBA on the developer operations team I get to play a pretty large part in continuing to improve them. It’s super amazing stuff.
However, there’s a tougher side to automated deployments, and that’s when someone checks in a change that won’t break a non-production environment but can potentially wreak havoc in production. I have that problem in the worst possible way: where I work, several of our production SQL Servers run Standard Edition, but our non-production environments run Developer Edition. That means scripts that enable or use features unavailable in Standard Edition sometimes get checked in, and they run fine when the changes are deployed everywhere but production. And what about DML scripts? Since we rely pretty heavily on replication, big data changes we don’t know are being checked in can cause a lot of heartburn for our subscriptions, too.
These were causing us a lot of problems and making for some unexpected issues during deployments, which is the exact time you don’t want issues to come up. I could crack open a build and take a look at all the scripts that are checked in, but when you have hundreds (or thousands) of scripts going out as part of a deployment, that’s pretty dang tedious. Time for some automation. I wrote a new function that parses your deployment scripts and returns information about them, so you can get a summary of exactly what each script is doing.
Go ahead and meander over to my repository to get the code and try it for yourself. I’ll go into some detail about how it works below.
So the goal, then, is to proactively review scripts that get checked into a branch (or that are part of a build that will be deployed to production). You could write a quick and dirty parser to read a file, split it up, and look for keywords. That’s actually a lot harder than it sounds: you have to account for things like bracketed vs. unbracketed names, regular vs. quoted identifiers, and, of course, parsing errors. It’s messy, cumbersome, and not much fun at all. So is there a better way?
“Good artists copy, but great artists steal” applies here: whether you know it or not, you already have an amazingly accurate T-SQL parser at your disposal. In fact, it’s so amazing, it’s the same one SQL Server uses to parse your queries before it algebrizes them. Seems to me that if it’s good enough for your SQL Servers, it should be good enough for you. And since we’re using PowerShell for this project, loading the library we need is really easy: we just use Add-Type:
Add-Type -AssemblyName "Microsoft.SqlServer.TransactSql.ScriptDom,Version=13.0.0.0,Culture=neutral,PublicKeyToken=89845dcd8080cc91"
A quick note about Add-Type: when you load a library this way, you have a few different options at your disposal for actually loading it. In my example, I’m just using Add-Type and giving it the library name. This works if the library is part of your assembly cache; since I have SQL Server installed locally, it’s there because the installer put it there. However, if you want to run this example on a machine that doesn’t have SQL Server installed, you need to copy the DLL to your machine somewhere and give a physical path to the library with the -Path parameter.
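If you do need to load the DLL by path, a minimal sketch would look like this (the location below is just an assumed example; yours depends on where you copied the file):

# Assumed example path -- point this at wherever you copied the DLL
Add-Type -Path "C:\Temp\Microsoft.SqlServer.TransactSql.ScriptDom.dll"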
When you use Add-Type to load a library, all of that library’s .NET classes and namespaces are loaded into your session. That means you can instantiate them with New-Object. In this case, we’re going to create a new object of type TSql130Parser:
Add-Type -AssemblyName "Microsoft.SqlServer.TransactSql.ScriptDom,Version=13.0.0.0,Culture=neutral,PublicKeyToken=89845dcd8080cc91"
$Parser = New-Object "Microsoft.SqlServer.TransactSql.ScriptDom.TSql130Parser"($true)
One of the difficulties of automating this is that the version of the DLL you load determines which version of the parser object you can create. Each version of SQL Server ships its own version of the assembly, along with a matching parser class (TSql120Parser, TSql130Parser, and so on), and you need to specify the version both when you load the library and when you create your objects. In my final code, I basically loop over the supported versions until I find one that loads, or throw an error if I can’t.
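I won’t reproduce the full function here, but a minimal sketch of that version-probing idea might look something like this (the version list and the class-name construction are illustrative assumptions, not the exact code from my repository):

# Try newer assembly versions first and fall back to older ones (illustrative sketch)
$ParserVersions = @("14.0.0.0", "13.0.0.0", "12.0.0.0", "11.0.0.0")
$Parser = $null
foreach ($Version in $ParserVersions) {
    try {
        Add-Type -AssemblyName "Microsoft.SqlServer.TransactSql.ScriptDom,Version=$Version,Culture=neutral,PublicKeyToken=89845dcd8080cc91" -ErrorAction Stop
        # Class names follow the pattern TSql110Parser, TSql120Parser, TSql130Parser, ...
        $ParserClass = "TSql" + $Version.Split(".")[0] + "0Parser"
        $Parser = New-Object -TypeName "Microsoft.SqlServer.TransactSql.ScriptDom.$ParserClass" -ArgumentList $true
        break
    } catch {
        continue
    }
}
if (-not $Parser) {
    throw "Could not load any supported version of Microsoft.SqlServer.TransactSql.ScriptDom."
}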
Those two lines take care of loading the library we need and making a new object, $Parser, that is going to, uh, parse our queries. With the object created, we can load up a saved T-SQL script file with a System.IO.StreamReader object, and then .Parse() the code:
$Reader = New-Object System.IO.StreamReader("<path to a file name>")
$Errors = $null
$Fragment = $Parser.Parse($Reader, [ref]$Errors)
When we call .Parse(), we have to give it two parameters: where the source T-SQL is coming from (our StreamReader), and an object to hold any parse errors (the $Errors variable, passed by reference). This second part is super important: if you do have a parse error, you won’t see it in your resulting object. You only get back object(s) that were successfully parsed; if you want to see errors, you have to deal with them separately in their own object.
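As a rough sketch of what that could look like (using the $Errors variable from the code above), you might check the error collection before doing anything else:

# If anything failed to parse, report it and stop before inspecting the fragment
if ($Errors -and $Errors.Count -gt 0) {
    foreach ($ParseError in $Errors) {
        Write-Warning ("Line {0}, column {1}: {2}" -f $ParseError.Line, $ParseError.Column, $ParseError.Message)
    }
    throw "One or more scripts contained T-SQL parse errors."
}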
The object returned from .Parse() is a TSqlFragment, which contains information about the code you just parsed. It has an interesting hierarchy:
The base return is the TSqlFragment object, which in turn has a Batches collection, which in turn holds… well, it can hold a lot of different things. When the text is parsed, the parser determines what type of object to return based on what kind of statement it finds. For example, an insert statement comes back as one type of object with a given set of properties and methods, while something like a create index statement gives you different properties, such as which table or view is getting the index along with the indexed columns and included columns. It really is interesting.
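To make that concrete, here’s a minimal sketch (not the code from my repository) that walks the hierarchy and tallies up the statement types found in a parsed file, using the $Fragment variable from above:

# Walk the fragment -> batches -> statements hierarchy and count each statement type
$StatementTypes = @{}
foreach ($Batch in $Fragment.Batches) {
    foreach ($Statement in $Batch.Statements) {
        $TypeName = $Statement.GetType().Name
        $StatementTypes[$TypeName] = [int]$StatementTypes[$TypeName] + 1
    }
}
$StatementTypes.GetEnumerator() | Sort-Object Name | Format-Table Name, Value -AutoSize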
But interesting can be a double-edged sword: since the statement object that gets returned can be different for each parsed piece of code, building any type of intelligence around the stuff we’re dealing with means checking for very specific objects. In my code, I create a custom class to hold details about each type of object I might encounter, and how to deal with it when I encounter it:
Class ParserKey {
    [string] $ObjectType
    [string] $SchemaSpecification
    [string] $TallyVariable

    ParserKey ([string] $ObjectType, [string] $SchemaSpecification, [string] $TallyVariable) {
        $this.ObjectType = $ObjectType
        $this.SchemaSpecification = $SchemaSpecification
        $this.TallyVariable = $TallyVariable
    }
}

$ParserKeys = @()   # initialize the catalog so += has something to append to
$ParserKeys += New-Object ParserKey ("InsertStatement","InsertSpecification.Target.SchemaObject","InsertStatements")
$ParserKeys += New-Object ParserKey ("AlterTableAddTableElementStatement","SchemaObjectName",$null)
In the above example, I declare a custom class and then create two instances of it, which get added to my $ParserKeys catalog. The first describes an insert statement and where, inside its object hierarchy, the target object name lives; the second does the same for an alter table statement. Since the statement objects for these two are shaped differently, I need to know where to look to find out which object is being queried or altered. This is how I set up my catalog of statements to look for. So later on, when I look at each statement, I can do this:
# $s here is the current statement object inside my processing loop
$SplitDefinition = (($ParserKeys | Where-Object {$_.ObjectType -eq $s.GetType().Name}).SchemaSpecification).Split(".")
This lets me find the ParserKey entry that describes the statement I’m looking at, so I know which properties to walk through (and loop over) to get to my target object info.
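Putting those pieces together, a simplified sketch of the drill-down (again, an illustration rather than the exact code from the repository) could look something like this:

# For each statement, look up its ParserKey and walk the property path to the schema object
foreach ($Batch in $Fragment.Batches) {
    foreach ($s in $Batch.Statements) {
        $Key = $ParserKeys | Where-Object {$_.ObjectType -eq $s.GetType().Name}
        if (-not $Key) { continue }   # a statement type we aren't cataloging

        $Target = $s
        foreach ($Property in $Key.SchemaSpecification.Split(".")) {
            $Target = $Target.$Property
        }

        # $Target should now be a SchemaObjectName; BaseIdentifier holds the object name itself
        [PSCustomObject]@{
            StatementType = $s.GetType().Name
            TargetObject  = $Target.BaseIdentifier.Value
        }
    }
}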
With a script like this, you have an incredibly powerful and fast way to police what’s getting deployed to your servers. Ideally, as part of any non-production deployment, you could include code like this and have it save off the details of its findings to a file, or even email them. As for me, if I see a bunch of new indexes being added, I might want to ask why. Or, if you’re a DBA who deals a lot with replication and you see a bunch of statements that drop tables, you can know about it ahead of time, before it breaks the deployment (since you can’t drop replicated objects).
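Here’s a rough sketch of how that might look wired into a deployment step. The Get-DeploymentScriptInfo function name is just a placeholder for whatever you build on top of the parser (it’s not the name of anything in my repository), and the paths and addresses are made up:

# Placeholder names, paths, and addresses -- adjust for your environment
$Findings = Get-ChildItem -Path ".\DeploymentScripts" -Filter *.sql |
    ForEach-Object { Get-DeploymentScriptInfo -Path $_.FullName }

$Findings | Export-Csv -Path ".\DeploymentScriptReport.csv" -NoTypeInformation

if ($Findings | Where-Object { $_.StatementType -like "Drop*" }) {
    Send-MailMessage -To "dba-team@example.com" -From "deployments@example.com" `
        -Subject "Deployment contains DROP statements" -SmtpServer "smtp.example.com" `
        -Body "Review DeploymentScriptReport.csv before this build goes to production."
}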
Code like this is why I love PowerShell for automation, especially as a DBA. Imagine the time and effort you’d need to manually review these files. Or imagine the time and effort it would take to write your own parser to do the same. That’s one of the reasons PowerShell is so powerful and flexible: we can lean on tools developed by people who have already solved this problem, and in this case, use the same tools the actual product uses.
I’d love for you to give the code a whirl yourself, and see what you think. Thanks to the custom class I included, once you’re familiar with the operation you can easily extend it to look at other statement types. Feel free to comment on the repository or below with your feedback, as I’d love to hear if it helps you!