What’s in a service pack?
You might have heard people talking about all the cool stuff that Microsoft introduced with SQL Server 2016 Service Pack 1. There’s lots to get excited about: finally, those of us stuck on Standard Edition get a lot (but not all) the bells and whistles of Enterprise Edition. A lot of the developers and BI people I work with are super excited to leverage some of the cool features of 2016 now in instances that aren’t Enterprise licensed.
I can get excited by change data capture, columnstore indexes, and dynamic data masking as much as the next person, but buried in the release from Microsoft was something that jumped out at me:
DROP TABLE support for replication – DROP TABLE DDL support for replication to allow replication articles to be dropped. For more information, refer KB 3170123.
… and then I woke up; apparently I passed out from a combination of excitement and surprise that replication got an update. To be fair, though, this change wasn’t just something introduced with SP1 for 2016; Microsoft initially tucked this change into SQL Server 2014 SP2.
The change does what now?
A major complaint I hear about SQL Server replication is around how well it doesn’t play with things like automated deployments or continuous integration. It can be hard to maintain a replication set up that mirrors your production environment; space requirements can be high, and refreshing a non-production environment means automating your replication settings so that when refreshes complete, you have all the publications and subscriptions set up. A lot of DBAs (myself included) forgo replication in DEV/INT/QA for that reason.
Two of the biggest offenders are when you’re trying to rename a table (or move it to a new schema) or outright drop a table. So that means your deployments will fail, which is never a good thing in agile environments that need to get code out the door fast. To correct the issue, you’d either need to make sure a table is taken out of replication ahead of the deployment (which means you’d have to know about the change ahead of time) or you’d need to react to the failed deployment and yank the table out and then redeploy.
Both aren’t great solutions, but with this recent change, you can take some of that sting out. 2016 SP1 and 2014 SP2 now include a new publication option called “allow_drop” that allows you to drop a table that is marked for replication. So now, if you or any developers you work with check in code to drop deprecated tables, it won’t throw an error anymore. There are some requirements (more on that in a minute), but to enable this change on a published database hosted on a 2016 SP1 or 2014 SP2 server, you’d run the following command (this isn’t something you can set in the GUI, at least not yet):
exec sp_changepublication @publication = '<Publication Name>', @property = 'allow_drop', @value = 'true'
And it works! Neat! Everything is great, let’s just end this article here, shall we? Too bad, because this is replication we’re talking about. This isn’t ‘Nam, there are rules.
To enable this change though, there’s more to it than just “you need to be running this service pack level on your publisher.” It’s far, far more than that:
- Your publisher needs to be running SQL Server 2016 SP1 or 2014 SP2. We’ve already seen that is a requirement based on the content of the KB article. Big whoop.
- In addition, if you have the same article in more than one publication, you have to enable this feature on all of them or you’ll still get the error that a table is in replication and can’t be dropped.
- Your subscribers don’t care about the SQL Server version (except when you have a data type that exists in the publisher that doesn’t at the subscriber. Then it cares. But for this change, it doesn’t).
- Your distributor also needs to be running a version that supports this functionality.
I can’t make that last point any bolder; that’s not documented anywhere that I could find. I’ll explain why later, but for now, roll with it. In fact, let’s see what happens if we don’t upgrade our distributor as well. Time to go to the lab!
In my lab, I have the following configuration:
- SQLSERVERA: Running SQL Server 2016 SP1. I have an AdventureWorks database mounted here, and I’ve got a publication called “AdventureWorks2014_TR” set up.
- SQLSERVERB: Running SQL Server 2016. There is a database called “AWSubscriber” that is set up as a subscriber for AdventureWorks2014_TR.
- SQLSERVERC: Running SQL Server 2016. This is my distributor.
For my test, I have one publication with multiple articles. For simplicity sake, I am going to just use the table dbo.AWBuildVersion since it’s a nice, easy one row table and it’s not foreign keyed to the hilt, so it’s a nice test case since the drop will be easy. Next, I’m going to issue a DROP TABLE command for the table. It fails, because we haven’t turned anything on yet:
To prepare, let’s first run the T-SQL statement that will enable the new feature. Remember, this command will fail if you’re not running the appropriate service pack.
exec sp_changepublication @publication = 'AdventureWorks_TR', @property = 'allow_drop', @value = 'true'
Next, I’m going to issue a DROP TABLE command, and bingo: the table is dropped even though it’s marked for replication. Everything is awesome! Or is it?
Houston, we have a problem
If we open up replication monitor on our distributor, we might see a red x. Which, if you’ve been using replication as long as I have, that elicits a Pavlovian response to reach for the liquor.
We’ve got an error in our subscription saying that the there’s an invalid distribution command type in the queue. Strange; does that mean this transaction (the DROP TABLE DDL statement) is trying to be replicated down to the subscribers? And how do we deal with this error? Well, that’s where we go back to the bolded requirement I talked about earlier: in order for this functionality to work, not only do you have to make sure your publisher is updated to the correct service pack, your distributor needs the correct patch level, too.
Why? Well, probably because of what’s included here from the KB:
The log reader agent will post a cleanup command for the distribution database of the dropped table and do the cleanup of the publisher metadata. If the log reader hasn’t processed all the log records that refer to the dropped table, then it will ignore new commands that are associated with the dropped table. Already processed records will be delivered to distribution database. They may be applied on Subscriber database if the Distribution Agent processes them before Log Reader cleans up the obsolete (dropped) article(s).
So whatever this functionality entails, your distributor needs to be able to handle the command as well. Fortunately, if you have your distributor on the same server as your publication(s), this isn’t a big deal, and you’ll probably not encounter this scenario. However, if you have a dedicated distributor, you need to plan accordingly to make sure it’s updated as well. And if you’re stuck with a SQL Server 2012 or earlier instance for it, you won’t be able to use this feature, no matter what version your publisher is using. Ouch.
So how do we fix this error? There’s three ways:
- The “easy, but requires downtime” way: update your distributor to a compatible service pack level. This might be an option for you if you’re using SQL Server 2014 or SQL Server 2016. Not so much if you’re using any other version.
- The “sort of easy but without any downtime if you can stomach some replication latency” way: delete the publication and subscriptions and recreate them again. While your distributor won’t eat any downtime, you’re at the mercy of the time it takes your publication to generate and deliver a snapshot, plus whatever blocking is generated while the snapshot is being taken. Yuck.
- The “trickier, no good, last-resort-don’t-blame-me-if-you-mess-up” way: Manually remove the offending transaction from the distribution database. I hate this solution because you’re descending into the bowels of replication to manually remove something. This hardly, if ever, goes well. But if you get stuck, you can get rid of the offending transaction. In your replication monitor, find the offending transaction sequence number and command id. In my example above, you can see the transaction sequence number and command ID number. Savor this moment, because it’s a rare opportunity for replication monitor to be helpful.Once you have that info, we need to run a query against the distribution database to remove that exact transaction. I’m including both a select statement and a delete statement here, so you can see the validation method:
WHERE xact_seqno = 0x0000007E00000480000200000000 and command_id = 1
DELETE FROM [distribution].[dbo].[MSrepl_commands]
WHERE xact_seqno = 0x0000007E00000480000200000000 and command_id = 1
That should only affect one row and after a few seconds, your replication monitor should go green for the given again and start delivering any other transactions queued up. You can test it by putting a tracer token in and making sure it delivers.
Should you take advantage of this “enhancement?”
If you’re suffering from failed deployments or constantly running into situations where a DROP TABLE statement is causing your heartburn, then you could consider it. Just be mindful of the requirements.
While other DDL statements can filter down in transnational replication, this statement won’t. So that still means you’ll be dealing with tables that were dropped in production living on in your replicas. I’d also argue that with a change like this in place, you’re probably less likely to remember to get rid of tables on your subscribers since it won’t break replication. Think about it: if you’re not sure what’s going to production already, how likely is this change likely to let you know something isn’t in your replicated tables anymore. No, you’d need some additional monitoring to look for changes in your table objects and compare them to what’s in replication vs. what is still on your replicas.
While it’s nice to see some replication changes in service packs, I’m hoping that Microsoft continues to consider those of us that continue to rely on this technology. This was a good start. I probably won’t be turning this on in my environments. My advice to you is pretty much the advice of all IT changes: test the hell out of it. Even if it means building a non-production distributor to test with and setting up some replication in at least your QA environment.
The other consideration is that this command, unlike other DDL statements has no bearing on your subscribers. As usual, dropping a table on your publisher will not drop it on your subscriber(s). Personally, I’m okay with that: it’s more work for me track down deprecated tables that are still on replicas than it is to just have tables disappear for reporting users. And if you’re replicating data to third parties, they probably wouldn’t appreciate it either.
What’s the alternative?
If you’re constantly being bitten by code that conflicts with replication breaking your deployments, instead of relying on this enhancement (or if you’re unable to take advantage of it) you can educate your developers on how to test for replication. You could show them a query like this…
select * from sys.tables where is_published = 1
… and how they can use it to know if a table is replicated or not, and test for it. Will something like that solve all your code deployment problems, or easy any pains when you’re trying to remove objects that are published for replication? Probably not, but it at least can help move your release process or testing scenarios to help reduce, or hopefully eliminate these difficulties for you.