Is your encrypted data really encrypted?
If you want to (or more likely need to) encrypt data in SQL Server, you have options. Most often, two technologies bubble up to the top of the heap:
- Transparent Data Encryption (TDE): TDE is encryption at rest. For anyone who has access, the data looks exactly “normal” when you query it. However, on your storage, the data is encrypted. The theory goes, if the disks or underlying hardware were stolen out from under you then your data is secure so long as you keep your certificates in a safe place. It doesn’t require any configuration on the client side, all the translation and decryption happens on SQL Server.
- Always Encrypted. This is the real-deal encryption. Essentially, you don’t trust anyone with access to the server with your data, including the DBAs. If you want to view the data, you need an application with appropriate encryption libraries loaded AND certificates. So not only is the data encrypted on disk, it’s gibberish to anyone who doesn’t have the right certificates.
In terms of setting each solution up, TDE is a lot easier. It puts the onus on the DBAs to create certificates and store them inside SQL Server. Always Encrypted is more work for the developers, since they need to work with the required libraries to make it work and handle the problem of keeping the certificates up to date and installed on the clients. The data is completely unreadable to anyone else; not even the DBAs can read it.
So guess which one gets used more? In terms of set up and ease of use, TDE is a lot easier to maintain. But there’s a dark side to TDE, and that’s when you combine it with replication: it essentially undoes all your hard work. And unless you like failing data security audits, you need to get a handle on it quickly.
There’s a couple different ways reapplication will “break” TDE. The first happens with the replication snapshot. Let’s set up a quick demo. I have a few virtual machines running: SQLSERVERA is my publisher where I’m replicating from, SQLSERVERB will be my subscriber, and SQLSERVERC will be my distributor. I already have set up replication, now let’s set up TDE.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>';
CREATE CERTIFICATE MyServerCert WITH SUBJECT = 'My DEK Certificate';
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128
ENCRYPTION BY SERVER CERTIFICATE MyServerCert;
-- Turn on Encryption
ALTER DATABASE AdventureWorks2014_TDE
SET ENCRYPTION ON;
I’ve taken a copy of AdventureWorks and turned on TDE. So if I were to try to attach the database anywhere where I don’t have the keys and certificates, it’s not going to be readable. Encryption at rest!
We can check the ecryption status with this query. We’re looking for encryption_state = 3 (Encrypted).
select db.name, dek.* from sys.dm_database_encryption_keys dek
inner join sys.databases db ON db.database_id = dek.database_id
But what happens if we set up a transactional replication publication on this database and do a snapshot? Remember that when you create a publication, your distributor and subscriber(s) need to know which network share (or FTP server) to drop all the data and schema definitions to so they can be read in by the distribution agent and recreated. In my example, I’m dropping them to a network share. Once the snapshot completes, let’s go check out our subscriber database…
Uh oh. The same query returned zero results at the subscriber. Which means no encryption! Replication won’t replicate encryption, at all. So if you have a requirement to encrypt your data at the source, you’ll need to do it on your subscribers too.
But that’s not all. Let’s go back to that network share real quick:
All my BCP files are still there. And you can open them all up. For example, what does my Password table schema structure look like? Easy, just open the table name’s .sch file in notepad:
It gets even better. Let’s try a fun experiment: we’ll create a new database, called “UhOh” (foreshadow much?), and recreate the schema of the password table:
CREATE DATABASE [UhOh]
CREATE SCHEMA [Person]
CREATE TABLE [Person].[Password](
[BusinessEntityID] [int] NOT NULL,
[PasswordHash] [varchar](128) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[PasswordSalt] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[ModifiedDate] [datetime] NOT NULL
And guess what happens I fire up a command line and use bcp on them? I can open a command prompt just use a quick command:
bcp Uhoh.Person.Password IN Password_2#1.bcp -T -C
I think you know where this going. If we query that table…
Yikes. All your data, wide open, for everyone to see. Fortunately, the folks at AdventureWorks are using password salting.
So what can you do about this?
- Grant minimum access privileges to your replication share. Your distribution agent needs read access, and your snapshot agent needs read/write. That’s it. No users. Period.
- Keep your replication share tidy. With default settings, these .bcp files will stick around for 72 hours. If that’s too long, there’s a couple different ways to handle this one:
- Set up something like a scheduled PowerShell task to clean this folder out at whatever interval you want.
- To force a faster cleanup, you can turn off the immediate_sync option of the publication. When immediate sync is enabled, replication will hold onto the snapshots for a default 72 hour period. Without it, as soon as a snapshot is initialized the snapshots will be cleaned up by the distribution cleanup job. This option can have other ramifications, so make sure you don’t want to do option #1 before you proceed.
Reading the other guy’s mail
The other way replication can be leaky is if users have access to the distribution database. Remember that transactions get logged to the database before they get delivered. Specifically, they get logged int the table dbo.MSrepl_commands . If you query this table, you’ll see something like this (click for a bigger picture):
At first blush, this looks okay: I don’t see anything in here that looks like plain text SQL with values visible. The problem, though, is that the data in this table is hashed, not encrypted. There’s a huge difference. That means that the data in the command column can be read. While the process to convert this column to a readable value isn’t readily apparent, it’s possible with a little time and effort. Or, better yet, we can run this stored procedure:
And suddenly, your commands are readable:
Is this a problem? Well, it’s data in the clear. It’s values, so you don’t really know which data matches up to where. But there’s more than enough info in those commands to know we’re updating an email address, and here’s the new values. This means your distribution database is a potential risk: just like your publisher and subscriber databases, if someone were to secure it’s data, they could potentially compromise some of your data.
So what can you do about this?
In short: not great news for the distribution database. How can we lock down our distribution database to minimize exposure?
- Keep your users out of the distribution database. Like your network share above, treat your distribution database as any other system database and keep it locked down. If you have your own replication monitoring set up, restrict the service accounts that connect to run the queries. Get as explicit as you can with table permissions, and don’t share the passwords with anyone.
- Consider changing your retention period of your transactions. The default setting is to keep transactions for 72 hours. If you want to minimize exposure, you can change this by modifying your distribution retention period in your distributor properties. However, if you set it too low you could potentially lose transactions before they get delivered to your subscribers.
- You can enable encryption on your distribution database. It doesn’t support Always Encrypted though. I’d test real carefully to see how encryption messes with your replication performance. I haven’t tested this as scale. Be careful.
Conclusion: Treat your distributors and subscribers like your publishers
Just because your published database is encrypted doesn’t let you off the hook with replication. Knowing you have some potential exposure is important so you can mitigate your risk. The examples I gave you above all have to do with either access to the physical location of your snapshot files or access to the databases that store and read the replicated data. What I didn’t even mention was monitoring traffic on the network. if you’re using SSL connections at your publishers, you should most definitely set that up for your distributors and subscribers, too. You may also want to look at storage arrays that do at-rest encryption as well.
In some coming blog posts, we’ll look at more options for setting up replication with technologies like Always Encrypted, as well as using SSL for your connections. Check back soon, and thanks for reading.