Encrypting Streams in Go

At Blend, we deal with highly sensitive consumer financial data. We use several data stores — Postgres, MongoDB, CockroachDB, and Etcd — all of which need to be backed up. While MongoDB and Postgres give us prebuilt tools for encrypting backups, Etcd and CockroachDB do not. Our standard practice is to encrypt these backups before storing them. This became more challenging as our backups grew.

Encrypting backups in memory

At first, the backups were small enough to encrypt with Vault’s transit features. As usage increased, we stored more data, and the backups grew with it. Small files could still be encrypted in memory using the Go standard library, but larger ones became a problem. Although these files don’t get huge (~2 GB at most), we hadn’t allocated enough memory to the backup job containers to hold them, so the containers ran out of memory and were killed. We needed a way to encrypt our backups without reading entire files into memory.

Encrypting large backups using streaming encryption

Instead of encrypting large backups in memory, we can do it over a stream. Thankfully, Go’s crypto/cipher package provides a Stream interface along with several implementations of it. We decided to use AES in counter (CTR) mode. With this, we no longer needed to read all the data into memory. We could just read data from an io.Reader and write it to an io.Writer chunk by chunk, encrypting each chunk with our cipher’s XORKeyStream method.
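As a rough illustration of that chunk-by-chunk loop, here is a minimal sketch. It is not the code from our backup job: the names encryptStream and encryptBytes, the 32 KB buffer size, and the way the key and IV are passed in are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"io"
)

// encryptStream (hypothetical name) reads src in fixed-size chunks, encrypts
// each chunk with AES-CTR, and writes the ciphertext to dst. Only one chunk
// is held in memory at a time.
func encryptStream(dst io.Writer, src io.Reader, key, iv []byte) error {
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return err
	}
	stream := cipher.NewCTR(block, iv) // iv must be aes.BlockSize bytes

	buf := make([]byte, 32*1024) // chunk size is an arbitrary choice
	for {
		n, readErr := src.Read(buf)
		if n > 0 {
			stream.XORKeyStream(buf[:n], buf[:n]) // encrypt in place
			if _, err := dst.Write(buf[:n]); err != nil {
				return err
			}
		}
		if readErr == io.EOF {
			return nil
		}
		if readErr != nil {
			return readErr
		}
	}
}

// encryptBytes is a small in-memory wrapper used only to demonstrate
// encryptStream.
func encryptBytes(plaintext, key, iv []byte) ([]byte, error) {
	var out bytes.Buffer
	if err := encryptStream(&out, bytes.NewReader(plaintext), key, iv); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

func main() {
	key := make([]byte, 32)
	iv := make([]byte, aes.BlockSize)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	if _, err := rand.Read(iv); err != nil {
		panic(err)
	}

	plaintext := bytes.Repeat([]byte("backup data "), 10000)
	ciphertext, err := encryptBytes(plaintext, key, iv)
	if err != nil {
		panic(err)
	}
	// CTR mode is symmetric: applying the same key and IV again decrypts.
	roundTrip, err := encryptBytes(ciphertext, key, iv)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip matches:", bytes.Equal(roundTrip, plaintext))
}
```

Note that a key and IV pair should never be reused to encrypt two different plaintexts in CTR mode. The sketch reuses the pair only to decrypt what it just encrypted.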

But why do the work of looping and copying data when we can take advantage of built-in functions like io.Copy? We thought it would be nicer to wrap the plaintext stream in an encrypter that itself implements io.Reader, so that we could just “read” our ciphertext directly.

Skipping the initialization steps, our code ended up looking like this:

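type StreamEncrypter struct {
    Source io.Reader
    Block  cipher.Block
    Stream cipher.Stream
    Mac    hash.Hash
    IV     []byte
}

// Read fills p from the plaintext source, encrypts it in place, and feeds
// the resulting ciphertext to the running MAC.
func (s *StreamEncrypter) Read(p []byte) (int, error) {
    n, readErr := s.Source.Read(p)
    if n > 0 {
        s.Stream.XORKeyStream(p[:n], p[:n])
        if err := writeHash(s.Mac, p[:n]); err != nil {
            return n, ex.New(err)
        }
    }
    // Pass the source's error through (io.EOF included) rather than
    // reporting every empty read as io.EOF.
    return n, readErr
}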

What benefits does the StreamEncrypter type give us? For one, encrypting a file becomes:

encrypter, _ := NewStreamEncrypter(key, reader)
io.Copy(file, encrypter)

Not only is this simple, but it accomplishes our original goal of not having to read the entire file into memory to encrypt it. Looking at the line XORKeyStream(p[:n], p[:n]), we’re even reusing the space in our buffer. We never need to allocate more space!

Since we’re operating on interfaces instead of concrete types like os.File, our data can come from anywhere (e.g. an HTTP request) and be written anywhere (e.g. a file). This is particularly helpful since we send Etcd backups over the network to our backup job.
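To make the pieces concrete, here is a self-contained sketch of an encrypting reader together with the kind of initialization we skipped earlier. It is an illustration under stated assumptions, not the code from our repository: the names encryptingReader, newEncryptingReader, encryptBytes, and decryptBytes are hypothetical, as are the choices of a random IV, HMAC-SHA256, a separate MAC key, and hashing the IV ahead of the ciphertext.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
	"hash"
	"io"
)

// encryptingReader is a hypothetical stand-in for StreamEncrypter.
type encryptingReader struct {
	src    io.Reader
	stream cipher.Stream
	mac    hash.Hash
	iv     []byte
}

// newEncryptingReader wraps src so that reads return AES-CTR ciphertext.
// Assumptions: a fresh random IV, HMAC-SHA256, and a separate MAC key.
func newEncryptingReader(encKey, macKey []byte, src io.Reader) (*encryptingReader, error) {
	block, err := aes.NewCipher(encKey) // encKey must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	iv := make([]byte, block.BlockSize())
	if _, err := rand.Read(iv); err != nil {
		return nil, err
	}
	mac := hmac.New(sha256.New, macKey)
	mac.Write(iv) // authenticate the IV along with the ciphertext
	return &encryptingReader{src: src, stream: cipher.NewCTR(block, iv), mac: mac, iv: iv}, nil
}

// Read encrypts in place whatever the source produced and updates the MAC.
func (r *encryptingReader) Read(p []byte) (int, error) {
	n, err := r.src.Read(p)
	if n > 0 {
		r.stream.XORKeyStream(p[:n], p[:n])
		r.mac.Write(p[:n]) // hash.Hash documents that Write never returns an error
	}
	return n, err
}

// encryptBytes drains an encryptingReader with io.Copy. The source and
// destination here are in memory, but any io.Reader and io.Writer would do.
func encryptBytes(encKey, macKey, plaintext []byte) (ciphertext, iv, tag []byte, err error) {
	r, err := newEncryptingReader(encKey, macKey, bytes.NewReader(plaintext))
	if err != nil {
		return nil, nil, nil, err
	}
	var out bytes.Buffer
	if _, err := io.Copy(&out, r); err != nil {
		return nil, nil, nil, err
	}
	return out.Bytes(), r.iv, r.mac.Sum(nil), nil
}

// decryptBytes reverses the encryption; CTR mode uses the same XOR both ways.
func decryptBytes(encKey, iv, ciphertext []byte) ([]byte, error) {
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	out := make([]byte, len(ciphertext))
	cipher.NewCTR(block, iv).XORKeyStream(out, ciphertext)
	return out, nil
}

func main() {
	encKey := make([]byte, 32)
	macKey := make([]byte, 32)
	if _, err := rand.Read(encKey); err != nil {
		panic(err)
	}
	if _, err := rand.Read(macKey); err != nil {
		panic(err)
	}
	ciphertext, iv, tag, err := encryptBytes(encKey, macKey, []byte("etcd snapshot contents"))
	if err != nil {
		panic(err)
	}
	plaintext, err := decryptBytes(encKey, iv, ciphertext)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d ciphertext bytes, %d-byte tag, round trip: %q\n", len(ciphertext), len(tag), plaintext)
}
```

The IV is not secret, but the decrypting side needs it, so in practice it would be stored alongside the ciphertext together with the tag.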

In addition to encrypting the data, we also need to verify that the ciphertext hasn’t been tampered with or corrupted. We do this using an HMAC, a keyed hash that we compute as a running hash over the ciphertext as it is produced.
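On the reading side, a check along these lines could recompute the tag over the stored ciphertext and compare it with the stored tag before decrypting. This is a sketch, not our decryptor: the helper names computeTag, verifyTag, and demo are hypothetical, and HMAC-SHA256 is an assumed choice of hash.

```go
package main

import (
	"bytes"
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
	"io"
)

// computeTag streams the ciphertext through HMAC-SHA256 without holding it
// all in memory. hash.Hash is an io.Writer, so io.Copy can feed it directly.
func computeTag(macKey []byte, ciphertext io.Reader) ([]byte, error) {
	mac := hmac.New(sha256.New, macKey)
	if _, err := io.Copy(mac, ciphertext); err != nil {
		return nil, err
	}
	return mac.Sum(nil), nil
}

// verifyTag recomputes the tag and compares it with the expected one.
// hmac.Equal compares without leaking timing information.
func verifyTag(macKey, expected []byte, ciphertext io.Reader) (bool, error) {
	tag, err := computeTag(macKey, ciphertext)
	if err != nil {
		return false, err
	}
	return hmac.Equal(tag, expected), nil
}

// demo tags some stand-in ciphertext, then verifies it before and after
// flipping a single bit.
func demo() (untouched, tampered bool, err error) {
	macKey := []byte("example MAC key, not for real use")
	ciphertext := []byte("pretend these bytes are ciphertext")

	tag, err := computeTag(macKey, bytes.NewReader(ciphertext))
	if err != nil {
		return false, false, err
	}
	untouched, err = verifyTag(macKey, tag, bytes.NewReader(ciphertext))
	if err != nil {
		return false, false, err
	}

	ciphertext[0] ^= 0x01 // simulate corruption or tampering
	tampered, err = verifyTag(macKey, tag, bytes.NewReader(ciphertext))
	return untouched, tampered, err
}

func main() {
	untouched, tampered, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("untouched verifies:", untouched) // true
	fmt.Println("tampered verifies:", tampered)   // false
}
```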

Takeaways

Encrypting data in memory worked for small backups, but it didn’t scale with our dataset. Streaming encryption gets around this by processing the data chunk by chunk, so memory use stays constant no matter how large the backup is. Go provides useful library functions that should be taken advantage of, but they aren’t always easy to use. With a small amount of code, we can take advantage of these library functions and simplify the process. You can find our whole solution, as well as the decryptor, here.