Last summer, I wrote a blog post called “ATProtocol Record Hydration: Building Privacy-Aware Views”. In it, I explored how ATProtocol’s current features can help create privacy-aware data views. With so much discussion about “private data” and what needs to be built to support it, I want to showcase the existing protocol features that make it possible today.
If you don’t want to read the original post, here’s the main idea: add an optional service field to com.atproto.repo.strongRef and blob types. This field points to a service that uses inter-service authentication to provide the data. Apps that don’t support this feature won’t find the content in the user’s repository. Apps that do support it will control access with permissions.
That post focused on the mechanics: remote record references, inter-service authentication, and the hydration pattern. This post aims to refine that model. I want to cover three main points: the trust relationships needed for controlled data, why data portability must not be lost when adding access control, and how a permission-aware PDS can join the network’s event stream without exposing record contents.
The Love Triangle
When discussing permissioned data on ATProtocol, people usually focus on two groups: the data owner and those granted access to it. But that view leaves out a third key player: the applications that store, relay, and display the data.
Permissioned data is a love triangle between:
The user who creates and owns the data
The permitted identities that are authorized to access it
The applications that store or relay the data
This is not a flaw. It’s a core part of how the system works, and we should recognize it. By “applications,” I mean more than just the PDS. It includes the PDS that stores the data, the AppView that displays it, and any relay or indexer involved. Every application in this chain is part of the trust relationship.
Export Parity: Permissioned Data Is Still Your Data
Here’s a principle I want to state clearly: a permission-enabled PDS should have the same export controls and functionality as any other PDS.
It can be tempting to treat permissioned records as different when it comes to data portability, but they are not. They are still user-owned records. The repository owner always receives the full CAR file, including the commit, MST, and all record data blocks. Export, import, migration, and backup all work the same as with a standard PDS. This is not up for debate.
Data portability is one of ATProtocol’s core promises. The moment we say, “Well, these records are special, so they can’t be exported the normal way,” we’ve broken that promise. Permissioned data adds a runtime behavior (access control at read time), not a storage constraint.
Permission metadata that describes who can access what and under which conditions should also be portable. Whether it’s kept as separate permission records, inside the record’s source field, or managed with a dedicated lexicon, it must remain intact during export and import.
The Repository Already Supports This
This is where it gets interesting. ATProtocol repositories already separate “what exists” from “what it contains.” We don’t need a new primitive for metadata-only network participation; we just need to use the current structure on purpose.
How the MST Separates Structure from Content
An ATProtocol repository has three layers, each referencing the next by CID:
Signed commit object: contains the account DID, a revision TID, and a data CID pointing to the MST root. The commit is signed with the account’s signing key.
Merkle Search Tree nodes: each entry maps a collection/rkey path (like app.bsky.feed.post/3lsopfrzoww25) to a record’s CID. The tree is deterministic: given any set of path and CID pairs, exactly one valid MST exists.
Record data blocks: the actual DAG-CBOR encoded record content. Each block’s CID is computed from its bytes, and that CID appears in the MST.
The critical property: the MST stores paths and CIDs, not record content. The tree nodes reference records by their content hash, but they never contain record data. The commit signature covers the MST root CID, which is computed from the MST nodes, which in turn reference record CIDs. As a result, the signature transitively authenticates every path→CID mapping in the repository without requiring any record data.
This means the whole Merkle tree, including the commit signature, tree structure, and every path-to-CID mapping, stays complete and verifiable even if all record data blocks are removed.
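To make the layering concrete, here is a minimal toy model of the three layers. It is a sketch under loud assumptions, not the real encoding: `toy_cid` hashes canonical JSON instead of producing a real DAG-CBOR CID, the MST is flattened into a single node, and an HMAC with a hypothetical key stands in for the account's real signature. The point is only that structure verification never touches a record block.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"toy-signing-key"  # hypothetical; real repos sign with the account's key pair

def toy_cid(obj) -> str:
    """Stand-in for a real CID: SHA-256 over canonical JSON (assumption)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()

def sign(commit_body: dict) -> str:
    """Toy signature over the commit body (HMAC instead of a real signature)."""
    return hmac.new(SIGNING_KEY, toy_cid(commit_body).encode(), hashlib.sha256).hexdigest()

def build_repo(did: str, rev: str, records: dict) -> dict:
    """Build a block store holding record blocks, one flat MST node, and a commit."""
    blocks, entries = {}, {}
    for path, record in records.items():
        cid = toy_cid(record)
        blocks[cid] = record          # layer 3: record data block
        entries[path] = cid
    mst_node = {"entries": entries}   # layer 2: paths and CIDs only
    mst_cid = toy_cid(mst_node)
    blocks[mst_cid] = mst_node
    body = {"did": did, "rev": rev, "data": mst_cid}
    commit = {**body, "sig": sign(body)}  # layer 1: signed commit
    head = toy_cid(commit)
    blocks[head] = commit
    return {"blocks": blocks, "head": head}

def verify_structure(blocks: dict, head: str) -> dict:
    """Check signature and MST hash, then return path -> CID. Reads no record block."""
    commit = blocks[head]
    body = {k: v for k, v in commit.items() if k != "sig"}
    if not hmac.compare_digest(commit["sig"], sign(body)):
        raise ValueError("bad commit signature")
    mst_node = blocks[commit["data"]]
    if toy_cid(mst_node) != commit["data"]:
        raise ValueError("MST node does not match the commit's data CID")
    return dict(mst_node["entries"])

repo = build_repo("did:example:alice", "rev-1", {
    "app.bsky.feed.post/a": {"text": "hello"},
    "community.lexicon.calendar.event/b": {"name": "Private party"},
})
paths = verify_structure(repo["blocks"], repo["head"])

# Remove every record block; the commit and MST still verify.
stripped = {cid: blk for cid, blk in repo["blocks"].items() if cid not in paths.values()}
paths_without_records = verify_structure(stripped, repo["head"])
```

After stripping, only the commit and the MST node remain, yet `verify_structure` returns the same path-to-CID mapping.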
Metadata-Only CAR: The Network View
Strip the record data blocks from a repository CAR file, and what you have left is a signed, verifiable table of contents:
The signed commit block (identical to the full export)
All MST node blocks (identical to the full export)
No record data blocks
This metadata-only CAR shows which account owns the repository, its revision, which records exist at which paths, and each record’s content hash. The full Merkle tree stays intact and can be checked against the commit signature. What’s missing is the actual content of any record; the CIDs in the MST leaf entries are just dangling references.
This is structurally valid. The CAR spec doesn’t prohibit dangling CID references, and ATProtocol already tolerates them. A metadata-only CAR is the natural extension of patterns the protocol already uses.
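As a rough sketch of how a metadata-only view could be derived, the function below walks from the commit through the MST nodes and keeps only those blocks. The block store is a plain dict keyed by made-up string labels rather than real CIDs, and the node shape (`l` for the left subtree, `e` for entries with key `k`, value `v`, and right subtree `t`) is a simplified imitation of the MST node layout, so treat every name here as an assumption.

```python
def walk_mst(blocks: dict, root: str):
    """Yield (cid, node) for every MST node reachable from root."""
    stack, seen = [root], set()
    while stack:
        cid = stack.pop()
        if cid is None or cid in seen:
            continue
        seen.add(cid)
        node = blocks[cid]
        yield cid, node
        stack.append(node.get("l"))                           # left subtree link
        stack.extend(entry.get("t") for entry in node["e"])   # right subtree links
        # entry["v"] is a record CID and is deliberately never followed

def metadata_only(blocks: dict, head: str) -> dict:
    """Keep the commit and MST nodes; leave record blocks behind."""
    slim = {head: blocks[head]}
    for cid, node in walk_mst(blocks, blocks[head]["data"]):
        slim[cid] = node
    return slim

def list_records(blocks: dict, head: str) -> dict:
    """Read path -> record CID from MST nodes alone."""
    return {
        entry["k"]: entry["v"]
        for _, node in walk_mst(blocks, blocks[head]["data"])
        for entry in node["e"]
    }

# Hypothetical block store: labels stand in for CIDs.
blocks = {
    "commit-1": {"did": "did:example:alice", "rev": "rev-1", "data": "mst-root"},
    "mst-root": {"l": "mst-left", "e": [{"k": "app.bsky.feed.post/b", "v": "rec-b", "t": None}]},
    "mst-left": {"l": None, "e": [{"k": "app.bsky.feed.post/a", "v": "rec-a", "t": None}]},
    "rec-a": {"text": "first"},
    "rec-b": {"text": "second"},
}

slim = metadata_only(blocks, "commit-1")
table_of_contents = list_records(slim, "commit-1")
```

`table_of_contents` can be built from `slim` alone, which is the sense in which the record CIDs are dangling but still useful references.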
Two CARs, Two Audiences
This gives us a clean model:
Network participants, like relays and indexers, get the metadata-only CAR. They can see what exists, check the tree, and index paths and CIDs, but they can’t view the record content.
Repository owners always have access to the full CAR, including all record data for export, import, migration, and backup. The permission layer never limits the owner’s access to their own data.
Authorized consumers (AppViews and clients) fetch individual record blocks as needed through authenticated requests. They check each block’s CID against the MST and provide content to users who have permission.
This is export parity in practice: the owner’s experience is identical to a standard PDS, while the network sees only what it needs to maintain structure and consistency.
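The CID check that an authorized consumer performs can be sketched with the standard library alone. This assumes the fetched block is already DAG-CBOR encoded bytes and that the CID is the usual CIDv1 with the dag-cbor codec and a SHA-256 multihash, rendered in lowercase base32. The function names are mine, not part of any SDK.

```python
import base64
import hashlib

def cid_for_block(data: bytes) -> str:
    """Compute a CIDv1 (dag-cbor, sha2-256) string for already-encoded block bytes."""
    digest = hashlib.sha256(data).digest()
    # 0x01 = CID version 1, 0x71 = dag-cbor codec,
    # 0x12 = sha2-256 multihash code, 0x20 = digest length (32 bytes)
    raw = bytes([0x01, 0x71, 0x12, 0x20]) + digest
    # "b" is the multibase prefix for lowercase base32 without padding
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")

def verify_block(expected_cid: str, data: bytes) -> bool:
    """True only if the fetched bytes hash to the CID recorded in the MST."""
    return cid_for_block(data) == expected_cid

# Hand-encoded CBOR for the map {"text": "hi"}
block = b"\xa1dtextbhi"
indexed_cid = cid_for_block(block)           # what the MST leaf would hold
ok = verify_block(indexed_cid, block)
tampered = verify_block(indexed_cid, block + b"!")
```

Because the check depends only on the bytes and the CID, the consumer does not need to trust the service that delivered the block, only the signed tree that named the CID.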
A Permissioned Firehose: Events Without Values
In standard ATProtocol, a PDS publishes its commit log to the network through an event stream (the firehose). Relays and downstream consumers subscribe to this stream to stay synchronized. Each commit event carries the commit, the changed MST nodes, and the full record blocks for that commit.
For a permission-enabled PDS, this creates a problem. You can’t send the full content of a confidential record to every relay and indexer on the network. Doing so would defeat the purpose of access control.
Metadata-Only Commits
Instead of publishing entire records, a permission-enabled PDS shares the commit and changed MST nodes without the record data blocks. This follows the metadata-only CAR pattern for diffs.
This acts as a commit announcement: “a record at this path was created or updated, here’s its content hash, but you’ll need to ask me for the actual content.”
This changes the default from push to pull. Instead of sending everything, the system now tells you something exists, and you can request it if you have permission.
The precedent already exists. Firehose diffs are already partial CARs with dangling CID references. The deprecated tooBig mechanism emitted commit-only events without record blocks. Jetstream omits MST nodes entirely. Sync v1.1 introduces collection-filtered partial exports and inductive verification without MST retention. A metadata-only firehose event is the natural next step in this progression.
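A commit announcement of this kind could be derived from a regular commit event roughly as follows. The event here is a simplified dict loosely modeled on a firehose commit event: in the real stream the blocks travel as CAR bytes, whereas this sketch assumes they have already been decoded into a `{cid: bytes}` mapping, and all CIDs are placeholder labels.

```python
def to_metadata_only(event: dict) -> dict:
    """Return a copy of the event with record data blocks removed.

    The ops list (action, path, cid) is kept, so consumers still learn
    what changed and which content hash to ask for.
    """
    record_cids = {op["cid"] for op in event["ops"] if op.get("cid")}
    slim = dict(event)
    slim["blocks"] = {
        cid: data for cid, data in event["blocks"].items() if cid not in record_cids
    }
    return slim

# Hypothetical event; labels stand in for real CIDs and block bytes.
full_event = {
    "seq": 42,
    "repo": "did:example:alice",
    "rev": "rev-2",
    "commit": "commit-2",
    "ops": [
        {"action": "create", "path": "community.lexicon.calendar.event/x", "cid": "rec-x"},
        {"action": "delete", "path": "app.bsky.feed.post/old", "cid": None},
    ],
    "blocks": {
        "commit-2": b"<commit bytes>",
        "mst-node-7": b"<mst node bytes>",
        "rec-x": b"<confidential record bytes>",
    },
}

announcement = to_metadata_only(full_event)
```

The announcement still names `rec-x` in its ops, so a consumer with permission knows exactly which block to request and can verify it on arrival.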
Backwards-Compatible StrongRef Extension
The cleanest way to express this is as a backwards-compatible extension to the existing strongRef structure. Today, a strong reference looks like:
{
"cid": "bafyreic4uafzyy5o7g4o7yjnnmmkootivwvybyrq2xcu63x3z3tmuj5tgq",
"uri": "at://did:plc:w4xbfzo7kqfes5zb7r6qv3rw/app.bsky.feed.post/3me5kw5txns2c"
}
For a permissioned record, the reference gains a service field:
{
"cid": "bafyreic4uafzyy5o7g4o7yjnnmmkootivwvybyrq2xcu63x3z3tmuj5tgq",
"uri": "at://did:plc:w4xbfzo7kqfes5zb7r6qv3rw/app.bsky.feed.post/3me5kw5txns2c",
"service": "did:plc:w4xbfzo7kqfes5zb7r6qv3rw#atgated_pds"
}
The service field is a service identifier that points to the endpoint serving the full record content. Existing clients that don’t recognize this field simply ignore it. They still see a valid reference and can use the URI and CID for indexing, deduplication, and graph building, but they can’t access the value.
Clients that understand the service field know to resolve the DID, find the #atgated_pds service endpoint, and make an authenticated request to get the record content, following the existing authorization rules for inter-service authentication.
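The lookup half of that flow might look like the sketch below. It works on an already-fetched DID document, so no network access is involved; the endpoint URL and the service type string for `#atgated_pds` are invented for illustration, and the authenticated request itself is left out.

```python
def resolve_service_endpoint(did_doc: dict, service_ref: str) -> str:
    """Find the endpoint for a reference like 'did:plc:abc#atgated_pds'."""
    did, sep, fragment = service_ref.partition("#")
    if not sep or not fragment:
        raise ValueError("service reference needs a #fragment")
    if did_doc.get("id") != did:
        raise ValueError("DID document does not match the service reference")
    for service in did_doc.get("service", []):
        # Service ids may be written relative ("#frag") or absolute ("did#frag").
        if service.get("id") in (f"#{fragment}", f"{did}#{fragment}"):
            return service["serviceEndpoint"]
    raise LookupError(f"no service with fragment #{fragment}")

# Hypothetical DID document for the account in the examples above.
did_doc = {
    "id": "did:plc:w4xbfzo7kqfes5zb7r6qv3rw",
    "service": [
        {
            "id": "#atproto_pds",
            "type": "AtprotoPersonalDataServer",
            "serviceEndpoint": "https://pds.example.com",
        },
        {
            "id": "#atgated_pds",
            "type": "ExampleGatedDataServer",  # hypothetical type name
            "serviceEndpoint": "https://gated.example.com",
        },
    ],
}

ref = {
    "cid": "bafyreic4uafzyy5o7g4o7yjnnmmkootivwvybyrq2xcu63x3z3tmuj5tgq",
    "uri": "at://did:plc:w4xbfzo7kqfes5zb7r6qv3rw/app.bsky.feed.post/3me5kw5txns2c",
    "service": "did:plc:w4xbfzo7kqfes5zb7r6qv3rw#atgated_pds",
}

endpoint = resolve_service_endpoint(did_doc, ref["service"])
```

A client that does not understand the field never calls this function and simply treats `ref` as an ordinary strong reference.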
What This Enables
This design preserves several important properties:
Network-level consistency: Relays still see every commit and can keep a full view of the repository structure, including which collections exist, how many records are in each, and the CID of every commit. They just can’t view the contents of permissioned records. The Merkle tree stays intact and verifiable.
Content-addressed integrity: The CID in the announcement is the hash of the actual record content. When an authorized consumer later gets the full record, they can check that the CID matches. The firehose announcement acts as a commitment to the content, even if the content isn’t included.
Selective hydration at the edges: AppViews and clients are the right place for hydration. They already make authenticated requests on behalf of users. The firehose tells them what exists, and the service endpoint tells them what it contains if they have permission.
Backwards compatibility: Nothing breaks. Applications that don’t recognize service fields still work. The permissioned layer only adds new features.
Putting It Together
Here’s how these refinements combine:
1. A user creates a confidential Lexicon Community calendar event on their PDS for a private party.
2. The PDS publishes a metadata-only firehose event, which includes the commit and MST diff but no record data block. The service field in the metadata names, by DID and fragment, the service that can serve the data.
3. A relay consumer sees the event, indexes the path and CID, and verifies the MST integrity against the commit signature. It knows a record exists but can’t see what it contains.
4. An AppView, acting on behalf of an authorized user, resolves the service DID, discovers the endpoint, and makes an authenticated request for the record block. It verifies the returned data’s CID against the MST.
5. The AppView checks the requesting user’s authorization and returns the record.
6. If the user ever wants to migrate, they export their entire repository as a full CAR file (including all record data blocks) and import it into a new PDS.
Nothing in this flow needs breaking protocol changes. Everything is additive: metadata-only firehose events, an optional field on strong references, and clear expectations about trust boundaries. The MST already separates structure from content; we’re just making that separation intentional.
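Steps 4 and 5 of the flow can be sketched as a small hydration gate inside an AppView. Everything here is hypothetical: the allow-list, the `is_permitted` and `fetch_record` callables, and the choice to check permission before fetching are illustrative assumptions, since the post does not prescribe how an AppView stores or evaluates permissions.

```python
from typing import Callable, Optional

def hydrate_for_viewer(
    ref: dict,
    viewer_did: str,
    is_permitted: Callable[[str, str], bool],
    fetch_record: Callable[[dict], dict],
) -> Optional[dict]:
    """Return the record for this viewer, or None if it must stay hidden."""
    if "service" not in ref:
        # Ordinary strong reference: no gating applies.
        return fetch_record(ref)
    if not is_permitted(viewer_did, ref["uri"]):
        # The AppView still knows the record exists, but shows no content.
        return None
    return fetch_record(ref)

# Hypothetical permission table and fetcher for illustration only.
EVENT_URI = "at://did:example:alice/community.lexicon.calendar.event/party"
ALLOWED = {EVENT_URI: {"did:example:bob"}}

def is_permitted(viewer_did: str, uri: str) -> bool:
    return viewer_did in ALLOWED.get(uri, set())

def fetch_record(ref: dict) -> dict:
    # Stand-in for an authenticated request followed by a CID check.
    return {"name": "Private party"}

gated_ref = {"uri": EVENT_URI, "cid": "placeholder-cid", "service": "did:example:alice#atgated_pds"}

for_bob = hydrate_for_viewer(gated_ref, "did:example:bob", is_permitted, fetch_record)
for_carol = hydrate_for_viewer(gated_ref, "did:example:carol", is_permitted, fetch_record)
```

Checking permission first avoids fetching content the viewer cannot see; an AppView that caches records might reasonably fetch first and gate at render time instead.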
What’s Next
Some questions remain. How should permission metadata be standardized across different implementations? What happens when a record is cached by an authorized consumer and needs to be revoked?
It’s important to note that collection NSIDs and TID-format rkeys in the MST reveal record types and creation times, even in metadata-only CARs. A relay consumer processing these events can see, for example, that you created three community.lexicon.calendar.event records last Tuesday, but not what those events contain. Incorporating metadata privacy would require changes to how the MST is built, which is beyond the scope of this post but worth considering.
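To see how much a TID-format rkey gives away, here is a short decoder. It relies on the TID layout as I understand it: 13 characters of base32-sortable text encoding a 64-bit integer whose low 10 bits are a clock identifier and whose remaining bits are microseconds since the UNIX epoch. The function name is mine.

```python
from datetime import datetime, timezone

B32_SORTABLE = "234567abcdefghijklmnopqrstuvwxyz"

def decode_tid(tid: str):
    """Return (creation time in UTC, clock id) for a TID record key."""
    if len(tid) != 13:
        raise ValueError("a TID is exactly 13 characters")
    value = 0
    for ch in tid:
        value = value * 32 + B32_SORTABLE.index(ch)  # raises ValueError on bad chars
    micros = value >> 10       # upper bits: microseconds since the UNIX epoch
    clock_id = value & 0x3FF   # lower 10 bits: clock identifier
    created = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
    return created, clock_id

# The rkey from the example path app.bsky.feed.post/3lsopfrzoww25
created, clock_id = decode_tid("3lsopfrzoww25")
```

Anyone holding a metadata-only CAR can run this over every path in the tree, which is why record timing is visible even when record content is not.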
The repository structure already gives us the separation we need: metadata-only CARs for the network and full CARs for the owner. A permissioned firehose based on this separation is a straightforward, backwards-compatible way to add access control to the network layer without disrupting what already works.
Join the discussion on the ATProtocol Community Discourse or in the atmosphere.