For example, there is a repo for doing this in Django: https://sourcegraph.com/github.com/dcwatson/django-pgcrypto.
There is some discussion in the SQLAlchemy manual, but I am using non-byte columns: http://docs.sqlalchemy.org/en/rel_0_9/core/types.html
I am running Flask on Heroku using SQLAlchemy.
A code example and/or some discussion would be most appreciated.
There are a bunch of stages to this kind of decision making, it's not just "shove a plugin into the stack and that encryption thing is taken care of"
First, you really need to classify each column for its attractiveness to attackers & what searches/queries need to use it, whether it's a join column / index candidate, etc. Some data needs much stronger protection than other data.
Consider who you're trying to protect against:
Of course, there's also the app server, upstream compromise of trusted sources for programming languages and toolkits, etc. Eventually you reach a point where you have to say "I can't realistically defend against this". You can't protect against somebody coming in, saying "I'm from the Government and I'll do x/y/z to you unless you allow me to install a rootkit on this customer's server". The point is that you've got to decide what you do have to protect against, and make your security decisions based on that.
A good compromise can be to do as much of the crypto as possible in the app, so PostgreSQL never sees the encryption/decryption keys. Use one-way hashing whenever possible, rather than using reversible encryption, and when you hash, properly salt your hashes.
That means
pgcrypto
doesn't actually do you much good, because you're never sending plaintext to the server, and you're not sending key material to the server either.It also means that two people with the same plaintext for column SecretValue have totally different values for
SecretValueSalt, SecretValueHashedBytes
in the database. So you can't join on it, use it in aWHERE
clause usefully, index it usefully, etc.For that reason, you'll often compromise with security. You might do an unsalted hash of part of the datum, so you get a partial match, then fetch all the results to your application and filter them on the application side where you have the full information required. So your storage for SecretValue now looks like
SecretValueFirst10DigitsUnsaltedHash, SecretValueHashSalt, SecretValueHashBytes
. But with better column names.If in doubt, just don't send plaintext of anything sensitive to the database. That means
pgcrypto
isn't much use to you, and you'll be doing mostly application-side crypto. The #1 reason for that is that if you send plaintext (or worse, key material) to the DB, it might get exposed in log files,pg_stat_activity
, etc.You'll pretty much always want to store encrypted data in
bytea
columns. If you really insist you can hex- or base64 encode it and shove it in atext
column, but developers and DBAs who have to use your system later will cry.