Building a Better Get-SQLErrorLog

Back in July of 2016, Microsoft made an exciting announcement: they were finally (finally!) adding new cmdlets to the SQL Server PowerShell provider. Along with some new management cmdlets around SQL Server Agent and Always Encrypted, there was one that caught my eye immediately: Get-SQLErrorlog. I’ve always wanted an easier way to get error logs from SQL Server, and this new cmdlet seemed perfect. I downloaded the update and I gave it a whirl.

At first, everything looked pretty promising: I give the cmdlet an instance (either via name or path) and it returns error log entries. There’s also support for time ranges via the -Before , -To , and -Timespan  parameters. It sounded too good to be true.

It turns out that it kind of was.

The current cmdlet

If my QA servers had a Facebook page, their relationship status would be “it’s complicated.” They always have plenty of issues, so they have lots of error logs to parse. I fired up the new module ( Import-Module sqlserver ) and gave it a whirl:

And I was greeted by a deluge of text in my console:

get-sqlerrorlogresultswithtime

That’s the end of the output chain, and that number underneath the last line is the result of a Stopwatch class; I wrapped the cmdlet in it and used it to output the run time in milliseconds (which comes out to just a hair under 42 minutes to return 158MB of log files). It looks like it’s putting out an object of some type so I changed my command to put the results in a variable and then I wanted to see what the object looked like:

getsqlerrorlogobjectinfo

That’s quite an object we have there. The good news was that it was a lot faster: it didn’t have to pipe the entire contents to my screen. But the resulting object was pretty big. And it had one pretty big drawback: I couldn’t filter on error types, severity or message. I mean, I could… but I have to parse the Text property.

But that’s not all. I was curious about how the cmdlet was figuring out what error logs to use, or where they were located. So I fired up an extended event session to look for it. Here’s what I found:

getsqlerrorlogobjectextendedeventdata

You can’t see all the text data from the statement fields, but it’s collecting a bunch of information from some dynamic management views and returning the info from the client. If you’re looking at that and wondering why I think it’s a big deal, here’s why: can you think of a good reason why you’d want to look at your error logs? I can: when my server’s down! I can’t use this cmdlet if the server is unreachable. That’s a pretty big swing-and-a-miss, at least to me.

Dreaming a little bigger

It doesn’t have to be so bad, though, because we can make it better. In my mind, my perfect error log cmdlet should:

  1. Be usable when a SQL Server instance is down, and
  2. Be relatively quick, and
  3. Parse error messages by number, severity, and state, and
  4. Incorporate date/time range filtering, and
  5. Return an object

What’s that, you say? You want to solve each of these and write our own, better cmdlet? I got you, fam. At the bottom of this blog post you’ll find my code, and you can skip ahead if you don’t care about the “hows” or “whys” of what you’ll be looking at.

Where my log files at (y’all)?

First things first: we need a better way to determine where the logs files for a SQL instance are located without the need to query SQL server. Fortunately, there’s a way to do that: each time SQL Server starts, it writes information to the Windows Event Log with event ID 17111, as shown below.

sqlservererrorlogeventid

There’s a handy cmdlet included with Windows named, you guessed it, Get-EventLog  that allows us to filter on Event IDs. We can use it to find the event IDs in question, and parse out the path where the logs are being stored. In this case, my QA server drops them on a disk named K: and in the SQLData\Logs with a filename of ERRORLOG. We can do that with one handy line of PowerShell:

That line of code does quite a bit: it connects to the event log of our local PC and finds all events with an ID of 17111 and a source of “SQLSERVER.” It then gets the very first one in the list (the most recent startup) and strips out the first part of the message and the trailing single quote and period, leaving us with the path and filename of the latest error log.

This is good, but it’s not good enough: we still need to be able to run the Get-EventLog  cmdlet on a target server, and we need to make sure we can filter on the correct instance of SQL Server (if we’re running a named instance, or if we’re running multiple instances). To do that, our script will make use of the Invoke-Command  which will let us execute a command on remote computer. It will also take our server instance parameter and use that to filter the event log event further. You can see this in action in the code at the bottom of this post.

Gotta go fast

Once we have our error logs, the next thing we have to do is load and parse the text. Unfortunately, this isn’t something that regular old PowerShell is great at. That’s because a lot of PowerShell cmdlets that read files have pipeline support. That means every line of text it reads, it passes through the pipeline. This is INCREDIBLY slow. But we can turn that behavior off, and you can see the difference for yourself. The following two lines of code do the same exact thing, but the gulf in performance is breathtaking.

Why the difference? The second line uses a parameter of Get-Content called -Readcount . It controls how many lines of text are passed through the pipeline at one time. A value of zero says “pass everything at once,” which is a lot faster.

Matchmaker, matchmaker

Okay, so we’ve got the location of our error logs and we’ve got a method to quickly retrieve them. Now for the tricky part: parsing them. Text parsing in PowerShell isn’t great; there aren’t a lot of native libraries that do anything special with large collections of text. Once our error logs are read from disk, we need to find specific errors in the error log and parse them out. We could go one line at a time, but that would be painfully slow. There’s one way that does provide a pretty nifty way of finding particular lines of text: a regular expression.

The regular expression I wrote will find all lines of text in your error logs that look like this:

So to match those lines, I’m using the Select-String  cmdlet and using the -Pattern  parameter to pass along a regular expression:

(Note: If you don’t use regular expressions often, that line probably doesn’t make a lot of sense. I don’t use them very often, but I’m familiar with them. That line right there? That was the hardest part of writing this script. There are a ton of great regular expression editors and tutorials out there. They’ll do a better job of teaching you about them than I can.)

Our matching pattern will find all the lines that match this text, and return the line numbers that contain them. And it’s relatively fast; not instant, but not terribly long either. And once we have the lines that match this text, the SQL Server error logs always put the message of the error on the following line. We just have to use the line numbers of the error numbers, add one, and boom, we get our errors with messages that we can write to an object.

Divide and conquer

Once we know the line numbers, it just becomes yet another text parsing exercise. If you’ve done any text parsing in PowerShell, you might know it’s slow. So then the question becomes: how can we speed it up without loading custom libraries or modules? The same way SQL Server breaks up expensive queries: we’ll multi-thread it.

Multi-threading in PowerShell is not something I have done much of, but there’s a couple different ways to do it. To do it “properly,” we need to use runspaces. The simplest way to describe a runspace is that you write a script of code, you give it to a runspace, pass it some parameters, and it goes and runs it. The hard part comes after you tell it to run. You need to monitor the runspace to see if it completes, you need to retrieve the results of the processing, and you need to dismiss it when you’re done. None of this is done automatically (short of closing your PowerShell session, that is).

Complicating things is our need to not just make one runspace, but to make multiple. Here’s a snippet of code that creates the runspaces:

We create a runspace, then we add some properties: the script we want to run (a previously declared scriptblock variable), and any arguments that need passed to it. In this section, I’m also collecting all my runspaces into one object so I can start, stop, collect, and dispose of them easier. Then we tell the runspaces to well, run. We kick them off like this:

Now we wait and monitor. The loop below will iterate indefinitely until they all the runspaces say they are done:

And once they’re done, we need to collect the returned results and dispose of them:

Putting it all together

I’ve written enough; you probably want to just see the code. I have plenty of -Verbose  support in the script and some stopwatch support to report on how long it takes it process each step.

As for use, here’s a concrete example: I’m going to collect the error logs from my QA server, and filter for all failed logins (error number 18456). Simple, effective, and it’s a great way to find all failed logins for an instance:

 Final notes and thanks

Is this code perfect? Goodness, no. I could continue to improve upon it; for example, I’m passing the entire contents of the file to each runspace to parse. That’s expensive. The script also doesn’t return informational messages and instead only parses out actual errors. I wrote it to demonstrate what I feel is a more useful implementation of an error log parser. I am also omitting the ability to filter for error numbers/severity/state numbers in the cmdlet itself and instead am relying on the user to filter it.

Ultimately though, I’m hoping that this kind of functionality (and the need for it, in my opinion) is something that we’ll see incorporated into future versions of the Get-SQLErrorLog cmdlet. I’m hoping that this gets enough exposure for that to happen.

There are probably a multitude of ways this script can be improved, and you can even help by contributing. I put this code out on my Github for you to download and play with it. It comes with the standard warning, as always: <Deep breath> “Don’t just run code you found on the internet in production. Test it first, understand it, and then see it can meet your needs before going to production. I can’t assume responsibility if you mess up your servers.”

This script was a long time coming; I’ve been writing it on and off for a couple months now. I couldn’t have finalized it, though, without the help of some very special people:

  • Paul Bahler (@paulbahler), who I work with and is amazing programmer, a good friend, and also currently smells like blueberries. He also was my go-to resource for my regular expression work.
  • Aaron Nelson (@sqlvariant) for a great extra set of eyes on a code review and suggesting we get this up to Microsoft so they can improve the existing functionality of the code. I’m hoping that happens!
  • Chrissy LaMaire (@cl), who helped give me great feedback on my code (and style) before publishing the final version of the code. She also makes excellent steak frites.

And a big “thank you” for reading! What do you think? Is this functionality useful to you too? Head down to the comments and let me know.

One thought on “Building a Better Get-SQLErrorLog

  1. Pingback: A Better Get-SQLErrorLog – Curated SQL

Comments are closed.