# ADR-0004: Project file format

## Status

Accepted. Amended by [ADR-0015](0015-project-storage-runtime.md); see the "Amendments" section at the end of this file for the specifics. The rest of this ADR remains the canonical reference for the project file format.

## Context

Projects must be:

- Shareable: students and instructors should be able to send projects to each other and reconstruct the full database state.
- Diffable: version control should produce meaningful diffs as a schema or data set evolves.
- Versioned: the format will change as the app evolves, and old projects must continue to load.
- Efficient enough for moderate amounts of practice data, without forcing users into pathological YAML files of tens of thousands of rows.

The on-disk SQLite file (`.db`) is convenient but binary and not suited to sharing or diffing.

## Decision

A project is a directory containing:

```
/
  project.yaml     # schema, relationships, metadata, version
  data/
    .csv           # one CSV file per table, with header row
  playground.db    # derived; rebuildable from project.yaml + data/
  history.log      # append-only command/replay log (see ADR-0006)
```

- `project.yaml` carries a top-level `version: 1` field from the outset, plus all schema, relationship, and project metadata.
- Table data lives in `data/.csv` (UTF-8, header row, RFC 4180 quoting). One file per table keeps diffs scoped and avoids monolithic YAML.
- `playground.db` is a **derived artifact**. The authoritative state is `project.yaml` + `data/`. The `.db` file is kept when present (we never silently drop it) but can be rebuilt from the text sources at any time.
  - Rebuilding when no `.db` exists: silent, automatic.
  - Rebuilding when a `.db` exists: requires user confirmation with a summary diff (e.g. "3 tables, 47 rows will be recreated; existing `.db` will be replaced").
- A `.gitignore` template is created in each project; by default the `.db` file is ignored so version control captures only the authoritative sources.

## Consequences

- Projects round-trip cleanly through git, email, and zip.
- Large practice data sets remain efficient (CSV is appropriate).
- Schema review remains pleasant (YAML is appropriate).
- The app must be able to (re)build a database from the text sources at any time; this is a first-class code path, not an edge case.
- The `version` field opens the door to format migrations as the app evolves; old projects load by running registered migrators in sequence.

## Amendments

### Amendment 1: runtime data flow ([ADR-0015](0015-project-storage-runtime.md))

The phrase "`playground.db` is a derived artifact" describes a *recovery* property: the database can always be reconstructed from `project.yaml` + `data/`. It does not describe runtime data flow.

At write time, all persistence targets (the SQLite database, `project.yaml`, the relevant `data/.csv` files, and `history.log`) share a single source (the user's command) and are written alongside one another in a defined order (see ADR-0015 §6). None of the text files is "downstream" of the database at write time.

### Amendment 2: `.db` rebuild trigger ([ADR-0015](0015-project-storage-runtime.md))

The "rebuild with confirmation when `.db` exists" semantics in the original Decision section are replaced by a simpler model:

- On load, if `playground.db` exists, it is opened as-is.
- On load, if `playground.db` is missing, it is rebuilt silently from `project.yaml` + `data/`.
- A new app-level command, `rebuild`, explicitly discards the current `playground.db` and reconstructs it from the text sources, with a confirmation prompt and a summary of what will be reconstructed.

The application does not attempt to detect drift between the database and the text sources automatically. `rebuild` is the explicit user-driven path for cases where drift exists (git pull over an existing `.db`, hand edits to YAML/CSV, recovery after a rare failure described in ADR-0015 §6).
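The load and `rebuild` behaviour in Amendment 2 can be illustrated with a minimal Python sketch. It is not the app's implementation. The function names (`open_project`, `rebuild_command`, `rebuild_from_text`) and the `confirm` callback are hypothetical. The rebuild is also deliberately simplified: it assumes each CSV file under `data/` is one table named after the file stem, it creates untyped SQLite columns from the header row, and it does not read `project.yaml`. A real rebuild would take column types and relationships from `project.yaml`.

```python
import csv
import sqlite3
from pathlib import Path
from typing import Callable, List, Optional, Tuple

DB_NAME = "playground.db"


def _quote(ident: str) -> str:
    # Quote an SQL identifier, doubling any embedded double quotes.
    return '"' + ident.replace('"', '""') + '"'


def _read_table(csv_path: Path) -> Optional[Tuple[List[str], List[List[str]]]]:
    # Read one UTF-8, RFC 4180 CSV file; return None if it has no header row.
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        return None if header is None else (header, list(reader))


def _tables(project_dir: Path):
    # Assumption (hypothetical layout): every data/*.csv file is one table,
    # named after the file stem.
    for csv_path in sorted((project_dir / "data").glob("*.csv")):
        parsed = _read_table(csv_path)
        if parsed is not None:
            yield csv_path.stem, parsed[0], parsed[1]


def rebuild_from_text(project_dir: Path) -> None:
    # Simplification: columns are untyped and project.yaml is not consulted.
    db_path = project_dir / DB_NAME
    tmp_path = project_dir / (DB_NAME + ".tmp")
    tmp_path.unlink(missing_ok=True)
    conn = sqlite3.connect(tmp_path)
    try:
        for table, header, rows in _tables(project_dir):
            cols = ", ".join(_quote(c) for c in header)
            marks = ", ".join("?" for _ in header)
            conn.execute(f"CREATE TABLE {_quote(table)} ({cols})")
            conn.executemany(f"INSERT INTO {_quote(table)} VALUES ({marks})", rows)
        conn.commit()
    finally:
        conn.close()
    # Swap in the new file only after a complete build; this replaces any old .db.
    tmp_path.replace(db_path)


def open_project(project_dir: Path) -> sqlite3.Connection:
    # Load path: open an existing playground.db as-is (no drift detection);
    # rebuild it silently only when it is missing.
    db_path = project_dir / DB_NAME
    if not db_path.exists():
        rebuild_from_text(project_dir)
    return sqlite3.connect(db_path)


def rebuild_command(project_dir: Path, confirm: Callable[[str], bool]) -> bool:
    # Explicit `rebuild`: summarize, ask for confirmation, then discard and rebuild.
    tables = list(_tables(project_dir))
    rows = sum(len(r) for _, _, r in tables)
    summary = (f"{len(tables)} tables, {rows} rows will be recreated; "
               f"existing {DB_NAME} will be replaced")
    if not confirm(summary):
        return False  # declined: playground.db is left untouched
    rebuild_from_text(project_dir)
    return True
```

In this sketch, `open_project` never compares the database with the text sources. After a hand edit to a CSV file, it keeps serving the old data until the user runs `rebuild`, which matches the "no automatic drift detection" rule. Building into a temporary file and then swapping it in is one possible design choice, not something this ADR specifies. Its advantage is that a rebuild that fails partway leaves the previous `playground.db` intact.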