Design decisions

For sync-engine developers: the decisions that define datadata, what they buy, and what they cost. Disagreement is the point of publishing this — argue with us.

Decision. One server per folder orders every change. No distributed merge for structured data.

Why. Authority makes the hard things simple: validation has a place to stand (a write is checked against the current state, not a possible one), sequence numbers make history linear, and “what does the document say” has one answer. The apps datadata is honed against — collaborative tools with agents proposing changes — need reviewable conflicts more than they need serverless merging.

Cost. No offline writes, no P2P, no E2E encryption. These aren’t deferred features; they’re the price of the model, paid knowingly.
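
To make the authority model concrete, here is a minimal in-memory sketch of one orderer per folder. The names (FolderAuthority, Change, Validator) and the whole-document write shape are assumptions made for illustration, not datadata's API; the point is only that validation sees the current state and that accepted writes get consecutive sequence numbers.

```typescript
// Hypothetical names: FolderAuthority, Change, and Validator are illustrative,
// not datadata's API.
type Doc = Record<string, unknown>;
type Change = { docId: string; next: Doc };
// Returns an error message, or null when the write is acceptable.
type Validator = (current: Doc | undefined, next: Doc) => string | null;
type SubmitResult = { ok: true; seq: number } | { ok: false; reason: string };

class FolderAuthority {
  private seq = 0;
  private docs = new Map<string, Doc>();
  private validate: Validator;
  readonly log: Array<{ seq: number; change: Change }> = [];

  constructor(validate: Validator) {
    this.validate = validate;
  }

  // Writes are handled one at a time, so each is checked against the state
  // that actually exists at that moment, not a speculative one.
  submit(change: Change): SubmitResult {
    const current = this.docs.get(change.docId);
    const reason = this.validate(current, change.next);
    if (reason !== null) return { ok: false, reason };
    this.seq += 1;
    this.docs.set(change.docId, change.next);
    this.log.push({ seq: this.seq, change });
    return { ok: true, seq: this.seq };
  }

  read(docId: string): Doc | undefined {
    return this.docs.get(docId);
  }
}

// Example rule: a task that is already "done" cannot be reopened.
const authority = new FolderAuthority((current, next) =>
  current?.status === "done" && next.status !== "done" ? "task is already done" : null
);

const first = authority.submit({ docId: "task:1", next: { status: "open" } });
const second = authority.submit({ docId: "task:1", next: { status: "done" } });
const third = authority.submit({ docId: "task:1", next: { status: "open" } });
```

The third write is rejected because the validator sees the state left by the second one. Rejected writes consume no sequence number, so the log stays gapless and linear.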

JSON Patch for structure, CRDT only for text

Decision. Structured changes are RFC 6902 patches with opt-in guards; only text fields use CRDTs (Yjs).

Why. CRDTs guarantee convergence, not intent: two structurally valid merges can still be semantically wrong, and nobody gets asked. Patches are legible — to humans reviewing, to agents reasoning, to the event log as audit — and guards turn “someone else changed this” into an explicit, handleable signal. Text is the exception because per-keystroke intent really is captured by CRDT merge semantics.

Cost. Concurrent structured edits can be rejected, and then need a retry or an explicit resolution. RFC 6902 itself has sharp edges we’ve had to design around; they are written up in JSON Patch RFC issues.
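
A sketch of what a guarded patch can look like, assuming the guard is expressed with RFC 6902's own `test` operation (how datadata actually spells its opt-in guards is not shown here). The applier below is a deliberately small subset written for this example: object members only, no arrays, no `move` or `copy`.

```typescript
// A subset of RFC 6902 for illustration: object members only, no arrays,
// no "move"/"copy", no whole-document (root) operations.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Op =
  | { op: "test"; path: string; value: Json }
  | { op: "add"; path: string; value: Json }
  | { op: "replace"; path: string; value: Json }
  | { op: "remove"; path: string };
type PatchResult =
  | { ok: true; doc: Json }
  | { ok: false; failedAt: number; reason: string };

function isObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function deepEqual(a: Json, b: Json): boolean {
  if (a === b) return true;
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((x, i) => deepEqual(x, b[i]));
  }
  if (isObject(a) && isObject(b)) {
    const keys = Object.keys(a);
    return keys.length === Object.keys(b).length &&
      keys.every((k) => k in b && deepEqual(a[k], b[k]));
  }
  return false;
}

// RFC 6901 pointer tokens: "~1" decodes to "/", then "~0" decodes to "~".
function parsePointer(path: string): string[] {
  return path.slice(1).split("/").map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"));
}

function applyPatch(doc: Json, patch: Op[]): PatchResult {
  // All-or-nothing: operate on a copy and return it only if every op succeeds.
  const work: Json = JSON.parse(JSON.stringify(doc));
  for (let i = 0; i < patch.length; i++) {
    const op = patch[i];
    const fail = (reason: string): PatchResult => ({ ok: false, failedAt: i, reason });
    if (!op.path.startsWith("/")) return fail("unsupported path");
    const tokens = parsePointer(op.path);
    const key = tokens[tokens.length - 1];
    let parent: Json = work;
    for (const t of tokens.slice(0, -1)) {
      if (!isObject(parent) || !(t in parent)) return fail(`missing parent of ${op.path}`);
      parent = parent[t];
    }
    if (!isObject(parent)) return fail(`parent of ${op.path} is not an object`);
    // Everything except "add" requires the target member to exist.
    if (op.op !== "add" && !(key in parent)) return fail(`nothing at ${op.path}`);
    if (op.op === "test") {
      if (!deepEqual(parent[key], op.value)) return fail(`guard failed at ${op.path}`);
    } else if (op.op === "remove") {
      delete parent[key];
    } else {
      parent[key] = op.value; // "add" or "replace"
    }
  }
  return { ok: true, doc: work };
}

// The guard says: only mark the task done if it is still open.
const guarded: Op[] = [
  { op: "test", path: "/status", value: "open" },
  { op: "replace", path: "/status", value: "done" },
];
const live: Json = { status: "open", assignee: "ana" };
const firstTry = applyPatch(live, guarded);

// Someone else changed the status first: the same patch now fails at the guard.
const afterConcurrentEdit: Json = { status: "blocked", assignee: "ana" };
const secondTry = applyPatch(afterConcurrentEdit, guarded);
```

The failed result names the operation that tripped (`failedAt: 0`), which is what makes the rejection a handleable signal: the caller can re-read, re-propose, or surface the conflict for review.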

Decision. One format — the JSON Patch — is both the wire format for changes and the stored event format. There is no application-defined event vocabulary, and no reducer code that interprets events into state.

Why. The one-timeline principle, applied to changes. Classic event sourcing splits a change’s meaning between the event data and the source code that applies it — so the repository’s history becomes a second timeline again: replaying last year’s events through this year’s reducers is a synchronization problem, and every context that consumes the log must carry a compatible implementation. A patch needs no interpreter: the change is the data, and applying it is defined by a public RFC, not by the application. That makes history bug-compatible — if a mutation wrote something the wrong way, the log records exactly what happened and replays it identically forever, rather than being silently reinterpreted by newer code. And because one standard covers all synchronization, any context — a replica, a pipeline, a debugger, an agent — can read and apply updates without knowing the schema or the application that produced them.

Cost. Patches record what changed, not why. A domain event named taskCompleted carries reasoning; a replace at /status doesn’t, and intent has to be carried alongside — which is part of why staged changes carry provenance. Reducer-style derived state and richer replay semantics (the strengths of event-sourcing engines) are off the table.
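
The "no interpreter" claim can be sketched as a fold: state at any point is the stored patches applied in sequence order, with no application-specific reducer. The StoredEvent shape, the provenance field, and the stateAt helper are assumptions made for this example; the applier is passed in as a parameter, since any RFC 6902 implementation would do, and the stand-in used here handles only top-level members.

```typescript
// Hypothetical shapes: StoredEvent, its provenance field, and stateAt are
// illustrative, not datadata's stored format.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Op =
  | { op: "add" | "replace"; path: string; value: Json }
  | { op: "remove"; path: string };
type StoredEvent = {
  seq: number;
  patch: Op[];
  // Intent travels alongside the patch, not inside it.
  provenance?: { author: string; note: string };
};
type Apply = (doc: Json, patch: Op[]) => Json;

// Replays every event with seq <= upToSeq, in sequence order.
function stateAt(initial: Json, log: StoredEvent[], apply: Apply, upToSeq = Infinity): Json {
  return log
    .filter((e) => e.seq <= upToSeq)
    .sort((a, b) => a.seq - b.seq)
    .reduce((doc, e) => apply(doc, e.patch), initial);
}

// Stand-in applier for the example: top-level members only, no pointer
// escapes, no existence checks. A real RFC 6902 implementation replaces this.
const applyTopLevel: Apply = (doc, patch) => {
  if (typeof doc !== "object" || doc === null || Array.isArray(doc)) {
    throw new Error("object document expected");
  }
  const next: { [key: string]: Json } = { ...doc };
  for (const op of patch) {
    const key = op.path.slice(1); // "/status" -> "status"
    if (op.op === "remove") delete next[key];
    else next[key] = op.value;
  }
  return next;
};

const eventLog: StoredEvent[] = [
  {
    seq: 1,
    patch: [
      { op: "add", path: "/title", value: "Write docs" },
      { op: "add", path: "/status", value: "open" },
    ],
  },
  {
    seq: 2,
    patch: [{ op: "replace", path: "/status", value: "done" }],
    provenance: { author: "agent-7", note: "task completed" },
  },
];

const atSeq1 = stateAt({}, eventLog, applyTopLevel, 1);
const latest = stateAt({}, eventLog, applyTopLevel);
```

Nothing in `stateAt` knows what a task is. The second event also shows the cost side: the patch alone says only that `/status` changed, and the reason lives in the provenance carried next to it.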

Decision. A document type’s schema lives at sys:schema:<type>, edited through the same API as data.

Why. It collapses an entire category of machinery — migration tools, admin APIs, deploy-coupled schema changes — into the machinery that already exists. Schemas sync live, version like data, and are writable by agents. The meta-schema keeps it from being anarchy. And it doesn’t cost TypeScript: schemas authored in code are upserted into the schema documents at startup, with compile-time types inferred from the schema itself — no codegen.

The deeper reason is history in one timeline. Code-only schemas split a document’s meaning across two histories — the repository’s and the data’s — so “what did this document mean six months ago?” requires correlating git commits with deploy times with document versions, and a field removed from a code schema is recorded nowhere except a commit. With schemas as documents, the schema’s history lives in the same event log as the data it governs: point-in-time introspection has one place to look, and a folder’s exported history carries its own interpretation with it instead of depending on a repository somewhere else.

Cost. The engine must handle data written under old schema versions indefinitely (documents migrate on their next read, so cold documents keep old shapes), and runtime schema changes are a correctness surface that static registries never have.
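
A rough sketch of the idea, under heavy assumptions: the schema format (`{ fields: { name: type } }`), the meta-schema rule, and the Folder class below are all invented for illustration and are not datadata's schema language or API. What the sketch tries to show is only the shape of the decision: schema writes and data writes go through the same `update` call and land in the same log.

```typescript
// Hypothetical shapes: the schema format, the meta-schema rule, and the Folder
// class are illustrative, not datadata's real schema language or API.
const FIELD_TYPES: readonly string[] = ["string", "number", "boolean"];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Meta-schema stand-in: a schema document must be { fields: { name: type } }.
function checkMetaSchema(doc: unknown): string | null {
  if (!isRecord(doc) || !isRecord(doc.fields)) return "schema needs a fields object";
  for (const [name, type] of Object.entries(doc.fields)) {
    if (typeof type !== "string" || !FIELD_TYPES.includes(type)) {
      return `field ${name} has an unknown type`;
    }
  }
  return null;
}

class Folder {
  private docs = new Map<string, unknown>();
  // One log for everything: schema edits and data edits share sequence numbers.
  readonly log: Array<{ seq: number; id: string }> = [];

  update(id: string, doc: unknown): void {
    const error = id.startsWith("sys:schema:") ? checkMetaSchema(doc) : this.checkData(doc);
    if (error !== null) throw new Error(`${id}: ${error}`);
    this.docs.set(id, doc);
    this.log.push({ seq: this.log.length + 1, id });
  }

  // Data is validated against whatever the schema document says right now.
  private checkData(doc: unknown): string | null {
    if (!isRecord(doc) || typeof doc.type !== "string") return "document needs a type";
    const schema = this.docs.get(`sys:schema:${doc.type}`);
    if (!isRecord(schema) || !isRecord(schema.fields)) return `no schema for ${doc.type}`;
    for (const [name, type] of Object.entries(schema.fields)) {
      if (typeof doc[name] !== type) return `field ${name} must be a ${String(type)}`;
    }
    return null;
  }
}

const folder = new Folder();
folder.update("sys:schema:task", { fields: { title: "string" } });
folder.update("task:1", { type: "task", title: "Write docs" });
// A schema change is just another update, recorded in the same log.
folder.update("sys:schema:task", { fields: { title: "string", done: "boolean" } });

let rejected = "";
try {
  folder.update("task:2", { type: "task", title: "Review" }); // missing `done`
} catch (e) {
  rejected = String(e);
}
```

After this runs, `task:1` still has the shape it was written with under the first schema version. The sketch does no migration at all, which is exactly the cold-document situation the cost paragraph describes.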

Decision. Staged work is a changeset in a host document — flat lanes of changes with base snapshots — not a fork of the folder’s history.

Why. Branching a document store invites the full weight of merge semantics for a use case — “propose, review, commit” — that needs much less. A changeset is itself document data: it syncs, survives reload, supports multiple authors, and is observable through sys:session / sys:stage with zero new machinery. Conflicts are derived three-way from base snapshots — no replay, no rebase graph.

Cost. One level of staging, not arbitrary history surgery: no branches of branches, no cherry-picking across folders. So far the apps haven’t missed it.
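
Deriving conflicts three-way from base snapshots can be sketched in a few lines. The Changeset and StagedChange shapes and the lane names are assumptions for this example, and paths are reduced to top-level field names; the comparison of base, live, and proposed values is the part that matters.

```typescript
// Hypothetical shapes: Changeset, StagedChange, and the lane names are
// illustrative. Paths are top-level field names to keep the sketch short.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type StagedChange = { path: string; base: Json | undefined; proposed: Json };
type Changeset = { lanes: Record<string, StagedChange[]> };
type Status = "clean" | "already-applied" | "conflict";

// Structural comparison via JSON text; adequate here because the example
// values are primitives (object key order would matter otherwise).
const same = (a: Json | undefined, b: Json | undefined): boolean =>
  JSON.stringify(a) === JSON.stringify(b);

// Three-way check: base snapshot vs. live value vs. proposed value.
function classify(live: Record<string, Json>, change: StagedChange): Status {
  const current = live[change.path];
  if (same(current, change.base)) return "clean"; // nothing moved underneath
  if (same(current, change.proposed)) return "already-applied";
  return "conflict"; // live value differs from both base and proposal
}

function review(live: Record<string, Json>, changeset: Changeset): Record<string, Status[]> {
  const out: Record<string, Status[]> = {};
  for (const [lane, changes] of Object.entries(changeset.lanes)) {
    out[lane] = changes.map((c) => classify(live, c));
  }
  return out;
}

// The live document moved from "open" to "review" after both lanes were staged.
const liveDoc: Record<string, Json> = { status: "review", title: "Docs" };
const changeset: Changeset = {
  lanes: {
    agent: [
      { path: "status", base: "open", proposed: "done" },
      { path: "title", base: "Docs", proposed: "Design docs" },
    ],
    ana: [{ path: "status", base: "open", proposed: "review" }],
  },
};

const statuses = review(liveDoc, changeset);
```

Each change is classified from three values it already carries or can read, so no history needs to be replayed and no lane depends on another lane's result.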

Decision. Subscribe / read / create / update, over documents — including system state. No admin API, no separate metadata service.

Why. Every concept added to a sync engine’s surface gets paid for by every application — and every agent prompt — forever. Synthesizing engine state as documents means introspection, devtools, and review UIs are just subscribers.

Cost. Read-models must be synthesized carefully (and documented as read-only), and “everything is a document” is a discipline that has to be defended in design review against every convenient exception.
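
As an illustration of "introspection is just a subscriber", here is a small in-memory sketch of the four verbs with one synthesized, read-only system document. The Engine class and the `sys:stats` document are hypothetical names chosen for this example; they are not part of datadata.

```typescript
// Hypothetical names: Engine and the "sys:stats" document are illustrative.
type Doc = Record<string, unknown>;
type Listener = (doc: Doc | undefined) => void;

class Engine {
  private docs = new Map<string, Doc>();
  private listeners = new Map<string, Set<Listener>>();
  // Engine state exposed as read-only documents, computed on demand.
  private synthesized = new Map<string, () => Doc>([
    ["sys:stats", () => ({ documentCount: this.docs.size })],
  ]);

  read(id: string): Doc | undefined {
    const synth = this.synthesized.get(id);
    return synth ? synth() : this.docs.get(id);
  }

  create(id: string, doc: Doc): void {
    if (this.docs.has(id)) throw new Error(`${id} already exists`);
    this.write(id, doc);
  }

  update(id: string, doc: Doc): void {
    if (!this.docs.has(id) && !this.synthesized.has(id)) {
      throw new Error(`${id} does not exist`);
    }
    this.write(id, doc);
  }

  // Delivers the current value immediately, then every later change.
  // Returns a function that cancels the subscription.
  subscribe(id: string, listener: Listener): () => void {
    const set = this.listeners.get(id) ?? new Set<Listener>();
    this.listeners.set(id, set);
    set.add(listener);
    listener(this.read(id));
    return () => {
      set.delete(listener);
    };
  }

  private write(id: string, doc: Doc): void {
    if (this.synthesized.has(id)) throw new Error(`${id} is read-only`);
    this.docs.set(id, doc);
    this.notify(id);
    // Synthesized documents may have changed too; re-deliver them.
    for (const sysId of this.synthesized.keys()) this.notify(sysId);
  }

  private notify(id: string): void {
    const snapshot = this.read(id);
    this.listeners.get(id)?.forEach((l) => l(snapshot));
  }
}

// A "devtool" is an ordinary subscriber to a system document.
const engine = new Engine();
const counts: unknown[] = [];
const stop = engine.subscribe("sys:stats", (doc) => counts.push(doc?.documentCount));
engine.create("task:1", { title: "Write docs" });
engine.create("task:2", { title: "Review" });
stop();
engine.update("task:1", { title: "Write design docs" });
```

The sketch also shows where the discipline bites: the synthesized document needs an explicit read-only rule, and the engine has to decide when to re-deliver it (here, naively, after every write).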