Schema evolution

Schemas are documents, so schema evolution is document editing — but evolution has rules of its own. datadata’s stance is evolution-first: schemas change freely while the app runs, migrations are explicit, and documents that no longer fit are flagged, never dropped.

A schema’s version is its document’s sequence number; there is no separate version field. Every accepted edit to sys:schema:<type> advances it, and every document tracks which schema sequence its data conforms to. Every event in a document’s history also records the schema sequence in effect when it was written, which is what keeps historical versions interpretable anywhere.

Data-shape changes are declared as migrations in the schema document. Three operations exist, deliberately minimal:

  • rename — move a field to a new name.
  • remove — delete a field. Removal is never inferred: dropping a field from the schema without a remove migration makes documents still holding it invalid, rather than silently discarding data.
  • remap — rewrite scalar values (old → new pairs). Collapsing several old values into one is allowed; one-to-many is not. Remaps are pure — the new value depends only on the old one.
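The three operations can be sketched as pure functions over a document's data. This is a minimal illustration, not datadata's implementation: the migration shape (`op`, `from`, `to`, `field`, `values`) and the function name `apply_migration` are assumptions made for the example.

```python
from __future__ import annotations

from typing import Any

# Remap targets scalar values only; anything else is left untouched.
SCALARS = (str, int, float, bool, type(None))


def apply_migration(data: dict[str, Any], migration: dict[str, Any]) -> dict[str, Any]:
    """Apply one migration and return a new dict; the input is not mutated."""
    out = dict(data)
    op = migration["op"]
    if op == "rename":
        # Move the value to the new name if the old field is present.
        if migration["from"] in out:
            out[migration["to"]] = out.pop(migration["from"])
    elif op == "remove":
        # Explicit deletion; a missing field is not an error.
        out.pop(migration["field"], None)
    elif op == "remap":
        # Pure lookup: the new value depends only on the old one.
        # Several old values may map to the same new value (many-to-one).
        field, values = migration["field"], migration["values"]
        current = out.get(field)
        if field in out and isinstance(current, SCALARS) and current in values:
            out[field] = values[current]
    else:
        raise ValueError(f"unknown migration op: {op!r}")
    return out
```

Because `values` is a plain mapping, one-to-many is impossible by construction: each old value has exactly one entry.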

The migration list is append-only, enforced by the server: a schema write may extend it but never modify or drop a committed migration, and the server stamps each appended migration with the schema sequence it took effect at — authors don’t control the stamps.

What authors do control is each migration’s key, and the keys’ lexicographic order is the replay order. A new migration’s key must sort after every committed one — the server rejects a key that would land inside the committed range, since that would silently reorder replay for older documents. Keys can be written by hand or generated by tooling; a convention like 002-rename-title sorts correctly and stays readable.
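The append-only rule and the key-ordering rule can be pictured as a single check at schema-write time. The sketch below is hypothetical: `accept_migrations`, the `key` and `stamp` field names, and the use of Python's code-point string comparison as the lexicographic order are all assumptions, as is requiring several new migrations in one write to arrive in ascending key order.

```python
from __future__ import annotations

from typing import Any

Migration = dict[str, Any]


def _unstamped(m: Migration) -> Migration:
    """Author-controlled part of a migration (everything except the stamp)."""
    return {k: v for k, v in m.items() if k != "stamp"}


def accept_migrations(
    committed: list[Migration], proposed: list[Migration], schema_seq: int
) -> list[Migration]:
    """Validate a proposed migration list and stamp the newly appended entries."""
    n = len(committed)
    # Append-only: the committed prefix must come back unchanged.
    if [_unstamped(m) for m in proposed[:n]] != [_unstamped(m) for m in committed]:
        raise ValueError("committed migrations cannot be modified or dropped")

    last_key = max((m["key"] for m in committed), default="")
    accepted = [dict(m) for m in committed]
    for m in proposed[n:]:
        # A new key must sort after every key already accepted.
        if m["key"] <= last_key:
            raise ValueError(f"key {m['key']!r} does not sort after {last_key!r}")
        last_key = m["key"]
        # The server, not the author, assigns the stamp.
        accepted.append({**_unstamped(m), "stamp": schema_seq})
    return accepted
```

With a committed key of `001-rename-title`, appending `002-remove-legacy` passes, while `000-early` is rejected because it would sort before a committed migration.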

Migration runs on read — and writes back

When a document’s conformed sequence is behind the schema at read time, the server brings it forward. It replays the migrations stamped after that sequence in key order, validates the result (backfilling declared defaults for added fields), and, if anything changed, persists the migrated data back as a normal change with a new sequence number. Each document therefore pays the migration cost once, on its next read, not on every read. There is no proactive bulk sweep: a document nobody reads keeps its old shape, and its pending migrations accumulate until it is next loaded.
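The read path can be sketched as follows. Everything here is illustrative: `Doc`, `read_doc`, and the `apply` and `validate` callbacks are hypothetical names. Two behaviours are assumptions the text does not spell out: a document that fails validation is returned untouched together with its issues, and a document that needed no data change has its conformed sequence advanced without a new sequence number.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Doc:
    data: dict[str, Any]
    seq: int        # the document's own sequence number
    conformed: int  # schema sequence the data conforms to


def read_doc(
    doc: Doc,
    schema_seq: int,
    migrations: list[dict[str, Any]],
    apply: Callable[[dict[str, Any], dict[str, Any]], dict[str, Any]],
    validate: Callable[[dict[str, Any]], tuple[dict[str, Any], list[str]]],
) -> tuple[Doc, list[str]]:
    """Bring a document forward on read; returns (document, validation issues)."""
    if doc.conformed >= schema_seq:
        return doc, []  # already current, nothing to replay

    # Only migrations stamped after the conformed sequence, in key order.
    pending = sorted(
        (m for m in migrations if m["stamp"] > doc.conformed),
        key=lambda m: m["key"],
    )
    data = doc.data
    for m in pending:
        data = apply(data, m)

    # validate may also backfill defaults, so it returns the final data.
    data, issues = validate(data)
    if issues:
        # Assumption: deliver the stored document as is, conformed left behind.
        return doc, issues
    if data != doc.data:
        # Write back as a normal change with a new sequence number.
        return Doc(data, doc.seq + 1, schema_seq), []
    # Assumption: nothing changed, so only the conformed marker moves.
    return Doc(doc.data, doc.seq, schema_seq), []
```

A second read of the returned document takes the early return, which is the "pays the cost once" property.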

Invalid documents are flagged, not dropped

Schema edits are not checked against existing documents — you can tighten a type or add a required field freely, and documents that no longer fit become invalid. What happens then is asymmetric on purpose:

  • Writes are strict. A change that would leave a document invalid under the current schema is rejected (a schemaValidation error on the wire).
  • Reads are relaxed. An invalid document is still delivered — flagged with the violation, its data intact, its conformed sequence deliberately left behind as the signal. The app (or an agent) decides how to repair it.

Invalid documents are enumerable, so “what broke when we tightened the schema?” is a query, not an audit.
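The asymmetry can be shown with a toy in-memory store. This is a sketch under simplifying assumptions: the "schema" is just a set of required field names, and `Store`, `tighten`, `invalid_ids`, and the `SchemaValidationError` class are hypothetical stand-ins (only the `schemaValidation` error name comes from the text above).

```python
from __future__ import annotations

from typing import Any


class SchemaValidationError(Exception):
    """Stand-in for the schemaValidation error returned on the wire."""

    def __init__(self, issues: list[str]) -> None:
        super().__init__("; ".join(issues))
        self.issues = issues


class Store:
    def __init__(self, required: set[str]) -> None:
        self.required = set(required)
        self.docs: dict[str, dict[str, Any]] = {}

    def _issues(self, data: dict[str, Any]) -> list[str]:
        return [
            f"missing required field: {name}"
            for name in sorted(self.required)
            if name not in data
        ]

    def tighten(self, field: str) -> None:
        # Schema edits are not checked against existing documents.
        self.required.add(field)

    def write(self, doc_id: str, data: dict[str, Any]) -> None:
        # Strict: a change that leaves the document invalid is rejected.
        issues = self._issues(data)
        if issues:
            raise SchemaValidationError(issues)
        self.docs[doc_id] = dict(data)

    def read(self, doc_id: str) -> tuple[dict[str, Any], list[str]]:
        # Relaxed: deliver the data intact, flagged with any violations.
        data = self.docs[doc_id]
        return dict(data), self._issues(data)

    def invalid_ids(self) -> list[str]:
        # "What broke when we tightened the schema?" as a query.
        return sorted(i for i, d in self.docs.items() if self._issues(d))
```

After `tighten("owner")`, a document written earlier without `owner` still reads back in full, now flagged, and appears in `invalid_ids()`, while a new write without `owner` raises.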

Object types choose how to treat keys the schema doesn’t declare: reject (the default — an unknown field is a validation issue) or strip (drop them on validation, for open-by-design shapes).
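The two modes differ only in what validation does with undeclared keys. A minimal sketch, assuming a hypothetical `validate_object` function and an `unknown` parameter that takes the mode names used above:

```python
from __future__ import annotations

from typing import Any


def validate_object(
    data: dict[str, Any], declared: set[str], unknown: str = "reject"
) -> tuple[dict[str, Any], list[str]]:
    """Return (resulting data, validation issues) for one object."""
    if unknown not in ("reject", "strip"):
        raise ValueError(f"unknown mode: {unknown!r}")
    if unknown == "strip":
        # Drop undeclared keys during validation; no issue is raised.
        return {k: v for k, v in data.items() if k in declared}, []
    # reject (the default): keep the data, report each undeclared key.
    extra = sorted(k for k in data if k not in declared)
    return dict(data), [f"unknown field: {k}" for k in extra]
```

Under `reject`, a field dropped from the schema without a remove migration surfaces as an issue on every document that still holds it, which matches the "removal is never inferred" rule.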