Schema evolution
Schemas are documents, so schema evolution is document editing — but evolution has rules of its own. datadata’s stance is evolution-first: schemas change freely while the app runs, migrations are explicit, and documents that no longer fit are flagged, never dropped.
Versioning
A schema’s version is its document’s sequence number; there is no
separate version field. Every accepted edit to sys:schema:<type> advances
it, and every document tracks which schema sequence its data conforms to.
Every event in a document’s history also records the schema sequence in
effect when it was written — which is what keeps historic versions
interpretable anywhere.
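As a rough illustration of the bookkeeping described above, the sketch below models a schema document whose sequence number doubles as its version, plus a document that records the schema sequence on every event. All names here (`SchemaDoc`, `Document`, `Event`, `write`) are hypothetical and are not datadata's API. Advancing the conformed sequence on each write is an assumption made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    data: dict
    schema_seq: int  # schema sequence in effect when this event was written


@dataclass
class Document:
    conformed_seq: int  # schema sequence the current data conforms to
    history: list = field(default_factory=list)


@dataclass
class SchemaDoc:
    seq: int = 0  # the document's own sequence number is the version

    def accept_edit(self) -> int:
        # Every accepted edit advances the sequence; no separate version field.
        self.seq += 1
        return self.seq


def write(doc: Document, schema: SchemaDoc, data: dict) -> None:
    # Assumption: a write accepted under the current schema leaves the
    # document conforming to that schema sequence.
    doc.history.append(Event(data, schema_seq=schema.seq))
    doc.conformed_seq = schema.seq


schema = SchemaDoc()
schema.accept_edit()  # schema is now at sequence 1
doc = Document(conformed_seq=schema.seq)
write(doc, schema, {"title": "Hello"})

schema.accept_edit()  # schema moves to sequence 2
write(doc, schema, {"title": "Hello again"})

# Each historic event names the schema sequence needed to interpret it.
recorded = [event.schema_seq for event in doc.history]
```

Because each event carries its own schema sequence, an older event can be read against the schema as it stood at that point, without consulting anything outside the history.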
Migrations are explicit and append-only
Data-shape changes are declared as migrations in the schema document. Three operations exist, deliberately minimal:
- rename: move a field to a new name.
- remove: delete a field. Removal is never inferred: dropping a field from the schema without a remove migration makes documents still holding it invalid, rather than silently discarding data.
- remap: rewrite scalar values (old → new pairs). Collapsing several old values into one is allowed; one-to-many is not. Remaps are pure: the new value depends only on the old one.
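The three operations can be sketched as pure functions over a plain dictionary. The migration shape used here (`op`, `from`, `to`, `field`, `values`) is invented for illustration and is not datadata's actual schema syntax. Representing a remap as a mapping from old value to new value allows many-to-one collapsing and makes one-to-many impossible to express.

```python
SCALARS = (str, int, float, bool)


def apply_migration(data: dict, migration: dict) -> dict:
    """Return a migrated copy of `data`; the input is left untouched."""
    out = dict(data)
    op = migration["op"]
    if op == "rename":
        # Move the value to the new name, if the old field is present.
        if migration["from"] in out:
            out[migration["to"]] = out.pop(migration["from"])
    elif op == "remove":
        # Explicitly delete the field; absent fields are ignored.
        out.pop(migration["field"], None)
    elif op == "remap":
        name = migration["field"]
        value = out.get(name)
        # Pure rewrite: the new value depends only on the old one.
        if isinstance(value, SCALARS) and value in migration["values"]:
            out[name] = migration["values"][value]
    else:
        raise ValueError(f"unknown migration op: {op}")
    return out


original = {"name": "Draft", "status": "wip", "legacy": 1}
steps = [
    {"op": "rename", "from": "name", "to": "title"},
    {"op": "remove", "field": "legacy"},
    # Two old values collapse into one new value.
    {"op": "remap", "field": "status", "values": {"wip": "draft", "pending": "draft"}},
]

result = original
for step in steps:
    result = apply_migration(result, step)
```

After the loop, `result` holds `title` and the remapped `status`, and `original` is unchanged because each step works on a copy.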
The migration list is append-only, enforced by the server: a schema write may extend it but never modify or drop a committed migration, and the server stamps each appended migration with the schema sequence it took effect at — authors don’t control the stamps.
What authors do control is each migration’s key, and the keys’
lexicographic order is the replay order. A new migration’s key must sort
after every committed one — the server rejects a key that would land inside
the committed range, since that would silently reorder replay for older
documents. Keys can be written by hand or generated by tooling; a convention
like 002-rename-title sorts correctly and stays readable.
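A minimal sketch of how a server might enforce these rules on a schema write, assuming the committed list is stored as a mapping from key to spec and stamp. The function name, the storage shape, and the use of Python's default string comparison for lexicographic order are all assumptions, not datadata's implementation.

```python
def append_migrations(committed: dict, proposed: dict, schema_seq: int) -> dict:
    """Validate a schema write and return the new committed migration list.

    committed: key -> {"spec": ..., "stamp": int}
    proposed:  key -> spec, as written by the author (no stamps)
    """
    # A schema write may extend the list but never modify or drop an entry.
    for key, entry in committed.items():
        if key not in proposed:
            raise ValueError(f"committed migration {key!r} was dropped")
        if proposed[key] != entry["spec"]:
            raise ValueError(f"committed migration {key!r} was modified")

    last_committed = max(committed, default="")
    result = dict(committed)
    for key in sorted(proposed.keys() - committed.keys()):
        # New keys must sort after every committed key, or replay order
        # would change for older documents.
        if key <= last_committed:
            raise ValueError(f"key {key!r} falls inside the committed range")
        # The server, not the author, stamps the migration.
        result[key] = {"spec": proposed[key], "stamp": schema_seq}
    return result


remove_legacy = {"op": "remove", "field": "legacy"}
rename_title = {"op": "rename", "from": "name", "to": "title"}

first = append_migrations({}, {"001-remove-legacy": remove_legacy}, schema_seq=4)
second = append_migrations(
    first,
    {"001-remove-legacy": remove_legacy, "002-rename-title": rename_title},
    schema_seq=5,
)

try:
    # "001a-hotfix" sorts between the two committed keys, so it is rejected.
    append_migrations(
        second,
        {
            "001-remove-legacy": remove_legacy,
            "001a-hotfix": {"op": "remove", "field": "tmp"},
            "002-rename-title": rename_title,
        },
        schema_seq=6,
    )
    rejected = False
except ValueError:
    rejected = True
```

Zero-padded numeric prefixes such as `002-` keep this comparison aligned with the intended order; an unpadded `10-` would sort before `2-`.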
Migration runs on read — and writes back
When a document is read whose conformed sequence is behind the schema, the server brings it forward. It replays the migrations stamped after that sequence in key order, validates the result (backfilling declared defaults for added fields), and, if anything changed, persists the migrated data back as a normal change with a new sequence number. Each document pays the migration cost once, on its next read, not on every read. There is no proactive bulk sweep: a document nobody reads keeps its old shape, and its pending migrations simply accumulate until it is next loaded.
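The read path described above can be sketched roughly as follows. The dictionary shapes, the `read` and `apply` helpers, and the decision to advance the conformed sequence even when the data did not change are assumptions for illustration. Validation is reduced to default backfilling, and the result is assumed to be valid.

```python
def apply(data: dict, migration: dict) -> dict:
    """Apply one migration to a copy of `data` (rename and remove only, for brevity)."""
    out = dict(data)
    if migration["op"] == "rename" and migration["from"] in out:
        out[migration["to"]] = out.pop(migration["from"])
    elif migration["op"] == "remove":
        out.pop(migration["field"], None)
    return out


def read(doc: dict, schema: dict) -> dict:
    """doc: {"data", "conformed_seq", "seq"}; schema: {"seq", "migrations", "defaults"}."""
    if doc["conformed_seq"] >= schema["seq"]:
        return doc  # already current: nothing to replay

    data = dict(doc["data"])
    # Replay only migrations stamped after the conformed sequence, in key order.
    for key in sorted(schema["migrations"]):
        migration = schema["migrations"][key]
        if migration["stamp"] > doc["conformed_seq"]:
            data = apply(data, migration)

    # Backfill declared defaults for fields the document does not hold yet.
    for name, default in schema["defaults"].items():
        data.setdefault(name, default)

    if data != doc["data"]:
        # Write back as a normal change with a new sequence number.
        doc["data"] = data
        doc["seq"] += 1
    doc["conformed_seq"] = schema["seq"]  # assumption: the result validated
    return doc


schema = {
    "seq": 5,
    "migrations": {
        "001-remove-legacy": {"op": "remove", "field": "legacy", "stamp": 2},
        "002-rename-title": {"op": "rename", "from": "name", "to": "title", "stamp": 5},
    },
    "defaults": {"status": "draft"},
}
doc = {"data": {"name": "Hello"}, "conformed_seq": 3, "seq": 7}

read(doc, schema)
seq_after_first_read = doc["seq"]
read(doc, schema)  # second read finds the document current and changes nothing
seq_after_second_read = doc["seq"]
```

Only `002-rename-title` is replayed here, because `001-remove-legacy` was stamped at sequence 2, before the document's conformed sequence of 3.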
Invalid documents are flagged, not dropped
Schema edits are not checked against existing documents: you can tighten a type or add a required field freely, and documents that no longer fit become invalid. What happens then is asymmetric on purpose:
- Writes are strict. A change that would leave a document invalid under the current schema is rejected (a schemaValidation error on the wire).
- Reads are relaxed. An invalid document is still delivered: flagged with the violation, its data intact, its conformed sequence deliberately left behind as the signal. The app (or an agent) decides how to repair it.
Invalid documents are enumerable, so “what broke when we tightened the schema?” is a query, not an audit.
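The asymmetry between writes and reads can be illustrated with a small in-memory store. Everything here is hypothetical: the `Store` class, the `SchemaValidationError` exception standing in for the schemaValidation wire error, and a schema reduced to a list of required field names.

```python
class SchemaValidationError(Exception):
    """Stand-in for the schemaValidation error mentioned above."""


def violations(data: dict, required: list) -> list:
    return [f"missing required field: {name}" for name in required if name not in data]


class Store:
    def __init__(self, required):
        self.required = list(required)  # the "schema": required field names
        self.docs = {}

    def write(self, doc_id: str, data: dict) -> None:
        # Writes are strict: reject anything invalid under the current schema.
        issues = violations(data, self.required)
        if issues:
            raise SchemaValidationError(issues)
        self.docs[doc_id] = dict(data)

    def read(self, doc_id: str) -> dict:
        # Reads are relaxed: deliver the data intact, flagged with any violations.
        data = self.docs[doc_id]
        return {"data": data, "violations": violations(data, self.required)}

    def invalid_ids(self) -> list:
        # Invalid documents can be enumerated with a simple query.
        return [i for i, d in self.docs.items() if violations(d, self.required)]


store = Store(required=["title"])
store.write("a", {"title": "A"})

# Tighten the schema; existing documents are not checked at this point.
store.required.append("author")

flagged = store.read("a")
try:
    store.write("b", {"title": "B"})
    rejected = False
except SchemaValidationError:
    rejected = True
broken = store.invalid_ids()
```

Document `a` is still readable after the schema tightens, while the new write for `b` fails until it includes an `author`.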
Unknown fields
Object types choose how to treat keys the schema doesn’t declare: reject (the default, where an unknown field is a validation issue) or strip (drop them on validation, for open-by-design shapes).
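The two modes might behave roughly like this. The function name, its `unknown` parameter, and the tuple return shape are assumptions made for the sketch, not datadata's validation API.

```python
def validate_object(data: dict, declared: set, unknown: str = "reject"):
    """Return (data_after_validation, issues) for one object value."""
    extra = sorted(set(data) - set(declared))
    if unknown == "reject":
        # Default: keep the data as-is and report each undeclared key.
        return dict(data), [f"unknown field: {key}" for key in extra]
    if unknown == "strip":
        # Open-by-design shapes: silently drop undeclared keys.
        return {k: v for k, v in data.items() if k in declared}, []
    raise ValueError(f"unknown mode: {unknown}")


declared = {"title", "status"}
payload = {"title": "Hello", "status": "draft", "debug": True}

rejected_data, rejected_issues = validate_object(payload, declared)
stripped_data, stripped_issues = validate_object(payload, declared, unknown="strip")
```

In reject mode the undeclared `debug` key surfaces as a validation issue; in strip mode it is removed and no issue is reported.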