Finding hay in a stack of needles
Say what you want about linked servers, but chances are you’ve probably got some lurking in your environment(s). They’re not great, they’re certainly not loved, and everyone seems to think we can all live in a world without them.
In reality, it’s more than just hand waving. As much as I’d love to tell people to “just write services” (or, gasp, use Service Broker) that doesn’t always win out. Linked servers aren’t a problem, until they are: maybe a server goes offline, or maybe a server moves, or if you’re like me on a recent project, you’re attempting to consolidate a database on one server to another.
Linked servers can throw a big monkey wrench into something like that. Here’s an example: let’s say instance “A” has one database that are referenced by linked servers from instances “B” and “C.” Here’s a simple diagram to help illustrate what I mean:
Instances “B” and “C” both have linked servers back to instance “A.” With me so far? Now, what happens if we wanted to consolidate the database into instance “B?” Well, check out this updated diagram:
Databases A and B are are on the same instance, while C is still all alone down there. Without doing anything and just moving the databases, our two linked server logical connections are now broken. Any objects that used the old linked server reference to database A when it was on the old instance will break (the database isn’t even there anymore, let alone if we leave the instance running) and the linked server reference from database C to database A falls into the same boat.
Here’s what we’d have to do:
- The first fix is pretty easy: we could just re-point any linked server references on the instance where database C lives to point to the server that now contains database A and B (so long as the database name stays the same). Likewise, we’d also need to define any other linked server references that database A uses that might not exist on the new home instance, too.
- More sinister than that, though, is what do you do about cases where you had a linked server reference in database B that pointed to a database A on instance A? In order for everything to keep working without any modifications, you’d have create a “loopback” linked server on the new home instance for database B with whatever your queries are using.
The whole reason we had a linked server in the first place is because the database we wanted was on a different server. If we move the database off of instance A and onto instance B that would negate the whole reason for the linked server in the first place, right? Say what you want about cross-database joins, but at least they can be a lot better than dealing with linked server (and probably perform better, too).
There’s other considerations, too, like logins and permissions and other objects, but linked servers can be the biggest pain in the rear because: even though the linked servers are defined, do you really know what objects in your database use them?
I know what you’re thinking: “Drew, can’t I just do a find and replace for my linked server names and cut them out, or replace them with a different one?” Sure you can. We already have some pretty good tooling to do this, like the free (and excellent) SQL Search from Redgate. You can drop a linked server name into it and it’ll find everything that contains a text phrase. Or, you could write some T-SQL to output the definition of an object and do pattern matching that way.
And it works. Kind of.
Let’s look at a couple examples. First, here’s SQL Search looking for objects in a copy of AdventureWorks2014 that contain the name of my linked server, “Castor:”
As you can see, this finds that I have one view that hits it. So far, so good: I can easily go to this object and alter it (or check in a change to source control and deploy my changes. Life isn’t always so easy though. Let’s say I have another linked server named “HumanResources.” See where this is going?
SQL Search is doing exactly what it’s designed to do here: it’s finding every object that matches that string, HumanResources. It’s unfortunate that it also happens to be the name of my linked server and a schema in my database, but such is life, right? Its returning every object it hits a match on, which includes a bunch of views, plus one of those views actually contains my linked server reference.
I love SQL Search
Let me be clear: I’m not here to gang up on the fine folks at Redgate. This tool is beautiful and I love it. Otherwise, how else could we quickly search for objects in our databases? The alterative would be either querying system views for object definitions or using cursors to call sp_helptext over and over, and then trying to do pattern matching. Same as what SQL Search does.
There’s no easy way to sort this wheat from chaff, is there? This might be a starting point; let’s narrow down the search the objects we might need to look at. Then, we’ll manually script each one out, one at a time. That sounds an awful lot like a manual process. “If only there was a way to automate this checking”, he asked, sarcastically.
What defines an object as having a linked server reference, really? It’s pretty simple because they use four part identifiers:
It goes beyond a little bit more than pattern matching, doesn’t it? We just can’t match on a linked server name; we have to match on a series of patterns. Four in a row, to be exact. To the human eye, that’s really easy to notice. For pattern matching it gets a little trickier: how can you write a matching algorithm to look for four identifiers in a row? Oh and by the way: what about dealing with square brackets vs non-square brackets? If you’re feeling extra spicy, what about a mix of square brackets for the server name, but nothing else? Or just the database name? Spaces before and after them? This can get complex pretty quickly, and unless you’re really good at writing a regular expression (and maybe even if you are), this quickly becomes a very tricky technical exercise.
Instead, we can be a little more creative. In a previous blog post, I wrote about how you can parse T-SQL script files with PowerShell to get a feel for what is in them. Using that as an example, we can do something similar to look at object DDL code to ascertain what uses a linked server, and what doesn’t. Let’s start by actually getting some object definitions. As a for-instance, let’s say I wanted to look at all the views on my server. I could get them pretty easily using the sqlserver PowerShell module:
#Import the SQLServer module
#Get the views from my local database
$views = Get-ChildItem -Path SQLSERVER:\SQL\SQLSERVERC\DEFAULT\Databases\SomeDatabase\Views
$views contains and array of objects that corresponds to all my views. Of all the methods and properties on these objects, there’s a really handy one called
.TextBody . It’s exactly what it sounds like: it’s the DDL code that makes up the object. So let’s say I called it for the first object in my array list, here’s what I’d be looking at:
Now that we have the object code, let’s do what we did in my other parsing example: let’s send it over to a Parser object. To do that though, we need to be… creative. Check out this code block:
#Load the parser DOM assembly
Add-Type -AssemblyName "Microsoft.SqlServer.TransactSql.ScriptDom,Version=184.108.40.206,Culture=neutral,PublicKeyToken=89845dcd8080cc91"
#Create a parser object
$Parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql130Parser($true)
#Load the object DDL code of the first view to a stream object
$memoryStream = New-Object System.IO.MemoryStream
$streamWriter = New-Object System.IO.StreamWriter($memoryStream)
$memoryStream.Position = 0
$streamReader = New-Object System.IO.StreamReader($memoryStream)
#Parse the code of the view
$Errors = $null
$Fragments = $Parser.Parse($streamReader, [ref] $Errors)
#Get all the tokens
$Tokens = $Fragments.ScriptTokenStream
Whew, there’s a lot going on up there. Let’s break it down real fast:
- The first thing we’re doing is manually loading an assembly with Add-Type. If that looks daunting and a little bit gross, it’s because it is. That’s why I advocate for the sqlserver module so much. It saves you having to do this all the time. However, for the functionality we need we’re going to have to get our hands dirty. First, we call
Add-Type to load the ScriptDom assembly. Notice that I have to give it a version. I’m using version 13 (SQL Server 2016) since that’s what I have installed on this machine. This library is available on any machine that’s got SQL Server installed. If not, you can always copy this library manually to your local machine and load it via a direct path. Just remember to change your “13” to whatever your version of SQL Server is!
- Next, we’re creating an object of type TSql130Parser. Notice that 130 in there? Yup, you have to create a parser object that matches the version level of the library you loaded in the previous step. Yuck.
- So now I’ve got my
$Parser object that can take in object code, but I can’t just send my object’s
.TextBody into it. That’s because the
$Parser object we created only accepts streams as input. And our
.TextBody isn’t, it’s just a simple string object. To get it into a stream so the parser can read it, we’re going to create a new MemoryStream object.
- Once our memory stream object is created, we need to write our string to it. To do that, we need another object, a StreamWriter. We create this object, and tell it that the stream we want to write to, in this case the MemoryStream object we created one line above. Next, with out freshly minted StreamWriter, we’ll call the
.Write() method to write our string to the MemoryStream. To make sure we write it all (because we’re using buffers), we need to call
.Flush() to make sure everything makes it out in the
- The next step is crucial: we need to “reset” our MemoryStream’s pointer. When we wrote our object code to the stream, the position of the cursor is at the end of the string of DDL code. Before we can read the entire string from the MemoryStream, we need to reset it’s pointer position to the start, or position 0. Just like an array, position 0 is the start of the entire stream.
- It’d be so easy if we could just use our MemoryStream as the object for the parser wouldn’t it? Which means we can’t. Instead, we need another object, this of type StreamReader, to read the stream back to us. We declare it with another New-Object, and pass it the stream we want to read (again, our MemoryStream).
- Once we have our StreamReader object (which is in the variable
$streamReader ) then we can use our parser. We’ll the
.Parse() method, and pass it too parameters: the first is our StreamReader, and the second is an array to hold any errors the parser spits back at us. We’re going to store the results of the parsing in our variable
- $Fragments now contains a lot of interesting things. If we called Get-Member on this object, here’s what we’d see:A lot of these properties look interesting. For instance, the Batches property breaks down each specific batch of statements. However, I’m more interested in the ScriptTokenStream property. What does that have? Well, let me show you. I’m going to pass that property to Group-Object and then Sort-Object:If you select the ScriptTokenStream’s TokenType property, you can see for each part of your DDL what type of text exists in the DDL! For instance, we can see that there are a total of 32 quoted identifiers, 30 regular identifiers, 17 left and right parenthesis (makes sense), 99 spaces… all pretty interesting stuff! So how do we put all this together? Well, if we just call the TokenStream property on it’s own…… and we just have to look for four identifiers (quoted or not) in a row. In my sample code, I did a mix of them, and here we can see them in the stream.
There you have it: our core process. We’ll call the DOM parser for each object’s text body, check for four identifiers in a row, and keep track of what we find. Throw in some more object types like views, stored procedures, triggers… and you have a really nice way to quickly check all your objects for linked servers. Pretty neat, right? So how can we “productionize” it? We’ll write a function.
I went ahead and wrote a function that does just that, using the core code above. You can find it over on my github. The function will return objects that represent each thing if your database that contains a four part identifier. Here’s a quick demo:
In the object being returned, you can see each database object that contains a linked server reference. It will tell you the name of the linked server as well as which database and schema the referenced object is in remotely.
This can be a really good way to not only discover these objects, but there’s a bonus in that you can quickly get their object definition as well. I’m using a default property set to hide it, but if you do a Select-Object * on the returned objects, the definition is there too, which you can quite easily do .Replace() methods on and/or write them out to a file should you want to work on changes. Like this:
Currently, the function searches for linked servers in the following database objects:
- Stored procedures
- Table triggers
- Database triggers
I encourage you to go take a look at the code and try it for yourself (in non-production, of course, till you’re confident). I wrote this because I’m currently going through the pain of trying to consolidate servers and without taking special care here on both the server I’m migrating from and the server I’m migrating to, I may break applications or business processes if care isn’t taken to do this kind of discovery.
So there you have it: using PowerShell to first gather up a bunch of different types of database objects, then using existing libraries to do the hard part, which is parsing every single character in a given T-SQL statement and the just turning it into a counting exercise. Why stop there, though: this is a great first step at discovery but what about extending this functionality to help you generate automatic change scripts to drop the linked server identifier from a query, or even replace it with a new one (to handle if you were moving to a completely different instance with an already-defined linked server name, and you just want to rename which one an object is referencing).
That’s one of the many reasons I love PowerShell for automating stuff like this, and hopefully you do too. I’m working on another function that can take objects and either replace or remove a linked server reference, but I’d love to see what you can come up with. Hopefully you’ll find a use for this, even if it’s in a quest to help you find and eradicate (not just rename or replace) linked server references. Thanks for reading, and I’d love to hear your feedback. Don’t be shy about reviewing my code and sharing pull requests, either!