blu: Encrypted, Deduplicated File Archival You Actually Own

The Cloud Will Burst

A few years ago, Balaji said something on a podcast with Tim Ferriss that stuck with me. I’m paraphrasing, but the gist was: someday, the cloud will burst. That is, state actors will eventually get at data stored in traditional cloud services, whether through subpoenas, breaches, or coercion. S3, Google Drive, iCloud, Dropbox – all of it. True privacy requires encrypting your data with keys that you, and only you, control.

I started building blu right after listening to that episode. That was years ago, while I was living in Brazil, and the project has been in development ever since, with a few stalls along the way.

The problem is that most backup and archival tools don’t encrypt client-side at all, or they manage the keys for you, which means you’re trusting the provider. “Trust me, bro” security. And in a world where AI can analyze anything at scale and governments are building surveillance infrastructure as fast as they can, that trust is a liability.

So I designed and built this system, and called it blu.

What It Does

blu is an encrypted, content-addressed, deduplicated file archival system written in Rust. It reads files from a directory, chunks and deduplicates them by content hash, encrypts everything client-side, and stores the encrypted blobs to a configurable backend (local filesystem or S3).
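To make the chunk-and-dedup step concrete, here is a minimal sketch of content-addressed storage. It is not blu’s actual code: `ChunkStore`, `add_file`, `restore`, and the tiny fixed `CHUNK_SIZE` are hypothetical names picked for illustration, and `DefaultHasher` stands in for a real cryptographic content hash. Encryption is left out entirely.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical chunk size for this sketch; a real chunker would use far
/// larger (and possibly content-defined) chunks.
const CHUNK_SIZE: usize = 4;

/// Stand-in content hash. DefaultHasher is NOT cryptographic; a real
/// content-addressed store would use a cryptographic hash instead.
fn content_hash(chunk: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    chunk.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory store: one copy of each unique chunk, keyed by hash.
#[derive(Default)]
struct ChunkStore {
    chunks: HashMap<u64, Vec<u8>>,
}

impl ChunkStore {
    /// Split `data` into fixed-size chunks, store only chunks not seen
    /// before, and return the ordered list of hashes (the file's recipe).
    fn add_file(&mut self, data: &[u8]) -> Vec<u64> {
        data.chunks(CHUNK_SIZE)
            .map(|chunk| {
                let id = content_hash(chunk);
                self.chunks.entry(id).or_insert_with(|| chunk.to_vec());
                id
            })
            .collect()
    }

    /// Rebuild a file by concatenating its chunks in recipe order.
    /// Returns None if any chunk is missing from the store.
    fn restore(&self, recipe: &[u64]) -> Option<Vec<u8>> {
        let mut out = Vec::new();
        for id in recipe {
            out.extend_from_slice(self.chunks.get(id)?);
        }
        Some(out)
    }

    /// Number of distinct chunks actually stored.
    fn unique_chunks(&self) -> usize {
        self.chunks.len()
    }
}
```

Adding the bytes `AAAABBBBAAAA` yields a three-entry recipe but stores only two chunks, because the first and last chunk hash to the same key. A second file that shares `BBBB` adds only its new chunks.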

You hold the keys. blu never stores or transmits your private key. There is no key escrow, no server, no third party. If you lose your key, your data is gone. That’s the point.

Basic workflow:

blu init /path/to/your/data     # Create a vault
blu sync                        # Chunk, dedup, encrypt, store
blu ls                          # List indexed files
blu restore-files --all --to /tmp/restored  # Restore

NOTE: As of this writing, blu is in active development and very much pre-alpha.

Encryption Architecture

This is where blu is more than a wrapper around gpg or age.

blu uses envelope encryption with a three-tier key hierarchy. At the top is your User Key, derived deterministically from a BIP39 mnemonic (the same 12- or 24-word seed phrase model used in crypto wallets). Your mnemonic is your identity. Recover the mnemonic, recover everything.

Below that is a Key Encryption Key (KEK), one per vault. The KEK is what actually protects your data encryption keys (DEKs), and it’s wrapped using age encryption with your User Key as the recipient. KEKs are versioned and rotatable – when you rotate, blu re-wraps the DEKs (tiny, fast) without re-encrypting the actual data (huge, slow). A vault with a terabyte of data might have 125,000 blob files. Re-wrapping 125k DEKs takes seconds. Re-encrypting a terabyte does not.

At the bottom, every blob file and every index file gets its own randomly-generated DEK, encrypted with ChaCha20-Poly1305.
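The rotation argument is easy to check with back-of-the-envelope numbers. The sketch below uses the figures from the example above (1 TB, 125,000 blob files) and assumes each DEK is a raw 256-bit ChaCha20-Poly1305 key, so 32 bytes. It ignores index files and any wrapping overhead; the function names are hypothetical.

```rust
/// Figures from the example in the text: a 1 TB vault in 125,000 blob files.
const VAULT_BYTES: u64 = 1_000_000_000_000; // 1 TB, decimal
const BLOB_COUNT: u64 = 125_000;

/// Assumption of this sketch: each DEK is a raw 256-bit key (32 bytes),
/// with per-key wrapping overhead ignored.
const DEK_BYTES: u64 = 32;

/// Average blob size implied by the two figures above.
fn average_blob_bytes() -> u64 {
    VAULT_BYTES / BLOB_COUNT
}

/// Bytes of key material a KEK rotation has to re-wrap (one DEK per blob).
fn rewrap_bytes() -> u64 {
    BLOB_COUNT * DEK_BYTES
}

/// How many times more data a full re-encryption would have to touch.
fn reencrypt_to_rewrap_ratio() -> u64 {
    VAULT_BYTES / rewrap_bytes()
}
```

Under those assumptions the average blob is 8 MB, a rotation touches about 4 MB of key material, and a full re-encryption would touch 250,000 times more data than the re-wrap.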

Post-Quantum Hybrid Encryption

This is a requirement, not a feature. “Collect now, decrypt later” is a real threat model. Nation states are already stockpiling encrypted traffic for future quantum decryption. If your data is encrypted today with classical-only algorithms, it has an expiration date.

blu uses ML-KEM-768 (the post-quantum KEM that NIST standardized as FIPS 203) combined with X25519 in a hybrid KEM for wrapping KEKs. If either algorithm holds, the encryption holds. This is the same defense-in-depth approach that Signal and others are adopting, and it’s non-negotiable for anything claiming to protect data long-term.

To be honest, I’m not a cryptographer, and I don’t know much about encryption, except not to roll my own. So I just use whatever Filippo Valsorda has set up for age.

Agent Daemon

Nobody wants to type a passphrase for every command. blu includes an agent daemon (similar in concept to ssh-agent) that keeps your decrypted keys in memory. The agent uses mlock() to prevent secrets from being swapped to disk and zeroize-on-drop to clear them from memory when the session ends. It supports configurable timeouts, auto-start, and explicit lock/unlock.
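For readers unfamiliar with the zeroize-on-drop pattern, here is a minimal std-only sketch of the idea. `SecretBytes` is a hypothetical type, not blu’s; it does not call mlock(), and in real code a dedicated crate such as zeroize is a better choice than hand-rolled wiping.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical wrapper for key bytes held in memory by an agent.
struct SecretBytes {
    bytes: Vec<u8>,
}

impl SecretBytes {
    fn new(bytes: Vec<u8>) -> Self {
        SecretBytes { bytes }
    }

    /// Borrow the secret for use; callers should avoid copying it.
    fn expose(&self) -> &[u8] {
        &self.bytes
    }

    /// Overwrite every byte with zero. Volatile writes discourage the
    /// compiler from eliding the stores as dead just before deallocation.
    fn wipe(&mut self) {
        for byte in self.bytes.iter_mut() {
            // SAFETY: `byte` is a valid, aligned, exclusive reference.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
        // Keep the compiler from reordering later code before the wipe.
        compiler_fence(Ordering::SeqCst);
    }
}

impl Drop for SecretBytes {
    /// Runs automatically when the value goes out of scope.
    fn drop(&mut self) {
        self.wipe();
    }
}
```

The sketch only clears the buffer the `Vec` currently owns. Copies made elsewhere, or left behind by an earlier reallocation, are not covered, which is one reason to keep secrets in a fixed-size buffer and to pair wiping with mlock().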

Why Open Source

I initially explored commercial applications for this, but the more I thought about it, the more obvious it became: privacy tools that aren’t auditable aren’t trustworthy. If you can’t verify the encryption, you’re trusting someone else with your secrets. That defeats the entire purpose.

The core crypto pipeline is solid and well-tested (200+ tests passing). The project is in active development – I’m building out CLI test coverage, config validation, and a diagnostics command. The encryption architecture, storage model, and dedup pipeline are working and stable.

I believe this can serve as a foundation for others to build on, whether for personal use, enterprise applications, or further work on decentralized encrypted storage.
