Where datadata comes from

datadata is a second system. From October 2020 I built Dossier, an open-source headless CMS, as a solo project — bring your own auth, database, and backend, with a schema-driven admin UI on top. It never gained traction. A proper postmortem series is in the works; what belongs here is the part that transferred, because most of datadata’s design decisions are answers to things Dossier taught me.

The complete event stream. Dossier captured every mutation — including schema changes — as one ordered stream of sync events; replaying it on an empty database reproduced everything, schema and content together, across backends. That idea survived intact and deepened: it’s datadata’s one timeline principle and the basis of portable event streams. Some postmortem lessons are about what to keep.
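To make the replay idea concrete, here is a minimal sketch under stated assumptions. The event kinds (`defineType`, `putDocument`, `deleteDocument`), the `seq` field, and the in-memory `Database` shape are all hypothetical illustrations, not Dossier's or datadata's actual event format. What the sketch shows is the property described above: schema changes and content changes travel in one ordered stream, so folding that stream over an empty database reproduces both.

```typescript
// Hypothetical event shapes: schema and content changes share one ordered stream.
type SyncEvent =
  | { seq: number; kind: "defineType"; typeName: string; fields: string[] }
  | { seq: number; kind: "putDocument"; id: string; typeName: string; data: Record<string, unknown> }
  | { seq: number; kind: "deleteDocument"; id: string };

// Hypothetical in-memory stand-in for a real backend.
interface Database {
  types: Map<string, string[]>;
  documents: Map<string, { typeName: string; data: Record<string, unknown> }>;
}

function emptyDatabase(): Database {
  return { types: new Map(), documents: new Map() };
}

function applyEvent(db: Database, event: SyncEvent): void {
  switch (event.kind) {
    case "defineType":
      // Schema changes are just events; they land in the same store as content.
      db.types.set(event.typeName, event.fields);
      break;
    case "putDocument":
      // Content can only reference types defined earlier in the stream.
      if (!db.types.has(event.typeName)) {
        throw new Error(`event ${event.seq}: unknown type ${event.typeName}`);
      }
      db.documents.set(event.id, { typeName: event.typeName, data: event.data });
      break;
    case "deleteDocument":
      db.documents.delete(event.id);
      break;
  }
}

// Replays the stream onto an empty database, rejecting out-of-order events.
function replay(events: SyncEvent[]): Database {
  const db = emptyDatabase();
  let lastSeq = -Infinity;
  for (const event of events) {
    if (event.seq <= lastSeq) throw new Error(`event ${event.seq} is out of order`);
    lastSeq = event.seq;
    applyEvent(db, event);
  }
  return db;
}

const log: SyncEvent[] = [
  { seq: 1, kind: "defineType", typeName: "Article", fields: ["title"] },
  { seq: 2, kind: "putDocument", id: "a1", typeName: "Article", data: { title: "Hello" } },
  { seq: 3, kind: "putDocument", id: "a2", typeName: "Article", data: { title: "Draft" } },
  { seq: 4, kind: "deleteDocument", id: "a2" },
];
const db = replay(log);
console.log(db.types.size, db.documents.size);
```

Because nothing lives outside the stream, the same log replayed onto a different backend's empty store should yield the same schema and documents; ordering is the one invariant the replay has to enforce.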

Owning a rich text format. Dossier’s rich text format changed twice (Editor.js blocks, then Lexical’s editor state), and each switch meant migrating stored content and rebuilding editor UI. datadata doesn’t own a text format: it embeds Yjs and inherits its editor ecosystem, so the format question belongs to a project whose whole job it is.

Abstracting at the primitive level. Dossier’s database adapter abstracted over low-level primitives like transactions, across Postgres, SQLite, and D1 — and D1, which has no interactive transactions, proved the boundary wrong rather than just awkward. datadata’s two seams — storage adapter and event bus — own higher-level operations end to end, and there is deliberately one production backend until the seams have earned a second.

A migration rule matrix. Dossier’s schema changes were governed by a matrix of allowed and unsupported operations, some triggering re-validation or re-indexing duties. datadata’s answer is deliberately smaller: three explicit migration operations, append-only, with documents that no longer fit flagged rather than blocked or dropped.
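The "flagged rather than blocked or dropped" behavior can be sketched as follows. This is an illustration under assumptions, not datadata's migration API: the source does not name its three operations, so the single `addField` operation, the `Schema` shape, and the helpers `applyMigration` and `flagNonConforming` are all hypothetical. The sketch shows two properties: migrations append a new schema version instead of rewriting history, and documents that fail the new version are reported while the stored data stays untouched.

```typescript
// Hypothetical schema shape: field name -> whether it is required.
type Schema = { version: number; fields: Record<string, { required: boolean }> };

interface StoredDocument { id: string; data: Record<string, unknown> }

// Hypothetical operation for illustration only; not datadata's actual operation set.
type Migration = { op: "addField"; name: string; required: boolean };

// Append-only: returns a new history with one more version; earlier versions are kept as-is.
function applyMigration(history: Schema[], migration: Migration): Schema[] {
  const current = history[history.length - 1];
  if (!current) throw new Error("schema history is empty");
  const next: Schema = {
    version: current.version + 1,
    fields: { ...current.fields, [migration.name]: { required: migration.required } },
  };
  return [...history, next];
}

function validate(schema: Schema, doc: StoredDocument): string[] {
  const problems: string[] = [];
  for (const [name, spec] of Object.entries(schema.fields)) {
    if (spec.required && !(name in doc.data)) problems.push(`missing required field "${name}"`);
  }
  return problems;
}

// Flags non-conforming documents instead of rejecting the migration or deleting data.
function flagNonConforming(schema: Schema, docs: StoredDocument[]): Map<string, string[]> {
  const flagged = new Map<string, string[]>();
  for (const doc of docs) {
    const problems = validate(schema, doc);
    if (problems.length > 0) flagged.set(doc.id, problems);
  }
  return flagged;
}

const initial: Schema[] = [{ version: 1, fields: { title: { required: true } } }];
const docs: StoredDocument[] = [
  { id: "a1", data: { title: "Hello" } },
  { id: "a2", data: { title: "Draft", summary: "Short" } },
];
const history = applyMigration(initial, { op: "addField", name: "summary", required: true });
const flagged = flagNonConforming(history[history.length - 1], docs);
console.log([...flagged.keys()]);
```

The migration itself always succeeds; deciding what to do with flagged documents is left to a later, explicit step rather than being forced by a rule matrix at migration time.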

Betting on the generic admin UI. A schema-driven admin interface was a pillar of the CMS value proposition — and a large share of Dossier’s code. AI agents eroded that pillar mid-project: bespoke, task-specific UIs became cheap to build, often cheaper than adapting a generic one. datadata inverts the bet — no bundled UI, an API small enough for an agent’s tool definitions, and sessions as the review surface where humans meet agent-made changes.

Dossier was content infrastructure with sync underneath; datadata is the sync engine promoted to the product, carrying forward the one idea that aged best — the self-contained, schema-aware event log — and shedding the layers that aged worst. The fuller postmortem will be linked from here when it’s published.