If you want to (or more likely need to) encrypt data in SQL Server, you have options. Most often, two technologies bubble up to the top of the heap:
In terms of setting each solution up, TDE is a lot easier. It puts the onus on the DBAs to create certificates and store them inside SQL Server. Always Encrypted is more work for the developers, since they need to work with the required libraries to make it work and handle the problem of keeping the certificates up to date and installed on the clients. The data is completely unreadable to anyone else; not even the DBAs can read it.
So guess which one gets used more? In terms of set up and ease of use, TDE is a lot easier to maintain. But there’s a dark side to TDE, and that’s when you combine it with replication: it essentially undoes all your hard work. And unless you like failing data security audits, you need to get a handle on it quickly.
There’s a couple different ways reapplication will “break” TDE. The first happens with the replication snapshot. Let’s set up a quick demo. I have a few virtual machines running: SQLSERVERA is my publisher where I’m replicating from, SQLSERVERB will be my subscriber, and SQLSERVERC will be my distributor. I already have set up replication, now let’s set up TDE.
USE master; GO CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'; go CREATE CERTIFICATE MyServerCert WITH SUBJECT = 'My DEK Certificate'; go USE AdventureWorks2014_TDE; GO CREATE DATABASE ENCRYPTION KEY WITH ALGORITHM = AES_128 ENCRYPTION BY SERVER CERTIFICATE MyServerCert; GO -- Turn on Encryption ALTER DATABASE AdventureWorks2014_TDE SET ENCRYPTION ON; GO
I’ve taken a copy of AdventureWorks and turned on TDE. So if I were to try to attach the database anywhere where I don’t have the keys and certificates, it’s not going to be readable. Encryption at rest!
We can check the ecryption status with this query. We’re looking for encryption_state = 3 (Encrypted).
select db.name, dek.* from sys.dm_database_encryption_keys dek inner join sys.databases db ON db.database_id = dek.database_id
But what happens if we set up a transactional replication publication on this database and do a snapshot? Remember that when you create a publication, your distributor and subscriber(s) need to know which network share (or FTP server) to drop all the data and schema definitions to so they can be read in by the distribution agent and recreated. In my example, I’m dropping them to a network share. Once the snapshot completes, let’s go check out our subscriber database…
Uh oh. The same query returned zero results at the subscriber. Which means no encryption! Replication won’t replicate encryption, at all. So if you have a requirement to encrypt your data at the source, you’ll need to do it on your subscribers too.
But that’s not all. Let’s go back to that network share real quick:
All my BCP files are still there. And you can open them all up. For example, what does my Password table schema structure look like? Easy, just open the table name’s .sch file in notepad:
It gets even better. Let’s try a fun experiment: we’ll create a new database, called “UhOh” (foreshadow much?), and recreate the schema of the password table:
CREATE DATABASE [UhOh] GO USE [UhOh] GO CREATE SCHEMA [Person] GO CREATE TABLE [Person].[Password]( [BusinessEntityID] [int] NOT NULL, [PasswordHash] [varchar](128) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, [PasswordSalt] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, [rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL, [ModifiedDate] [datetime] NOT NULL )
And guess what happens I fire up a command line and use bcp on them? I can open a command prompt just use a quick command:
bcp Uhoh.Person.Password IN Password_2#1.bcp -T -C
I think you know where this going. If we query that table…
Yikes. All your data, wide open, for everyone to see. Fortunately, the folks at AdventureWorks are using password salting.
So what can you do about this?
The other way replication can be leaky is if users have access to the distribution database. Remember that transactions get logged to the database before they get delivered. Specifically, they get logged int the table dbo.MSrepl_commands . If you query this table, you’ll see something like this (click for a bigger picture):
At first blush, this looks okay: I don’t see anything in here that looks like plain text SQL with values visible. The problem, though, is that the data in this table is hashed, not encrypted. There’s a huge difference. That means that the data in the command column can be read. While the process to convert this column to a readable value isn’t readily apparent, it’s possible with a little time and effort. Or, better yet, we can run this stored procedure:
execute sp_browsereplcmds
And suddenly, your commands are readable:
Is this a problem? Well, it’s data in the clear. It’s values, so you don’t really know which data matches up to where. But there’s more than enough info in those commands to know we’re updating an email address, and here’s the new values. This means your distribution database is a potential risk: just like your publisher and subscriber databases, if someone were to secure it’s data, they could potentially compromise some of your data.
So what can you do about this?
In short: not great news for the distribution database. How can we lock down our distribution database to minimize exposure?
Just because your published database is encrypted doesn’t let you off the hook with replication. Knowing you have some potential exposure is important so you can mitigate your risk. The examples I gave you above all have to do with either access to the physical location of your snapshot files or access to the databases that store and read the replicated data. What I didn’t even mention was monitoring traffic on the network. if you’re using SSL connections at your publishers, you should most definitely set that up for your distributors and subscribers, too. You may also want to look at storage arrays that do at-rest encryption as well.
In some coming blog posts, we’ll look at more options for setting up replication with technologies like Always Encrypted, as well as using SSL for your connections. Check back soon, and thanks for reading.
Pingback: Replication And TDE – Curated SQL