# ADR-0004: Project file format

## Status

Accepted. Amended by [ADR-0015](0015-project-storage-runtime.md); the
specifics are in the "Amendments" section at the end of this file.
The rest of this ADR remains the canonical reference for the
project file format.

## Context

Projects must be:

- Shareable — students and instructors should be able to send
  projects to each other and reconstruct the full database state.
- Diffable — version control should produce meaningful diffs as a
  schema or data set evolves.
- Versioned — the format will change as the app evolves, and old
  projects must continue to load.
- Efficient enough for moderate amounts of practice data without
  forcing users into pathological YAML files of tens of thousands
  of rows.

The on-disk SQLite file (`.db`) is convenient at runtime, but it is
binary. Version control cannot produce meaningful diffs of it, and
changes to it cannot be reviewed or merged.

## Decision

A project is a directory containing:

```
<project-name>/
  project.yaml         # schema, relationships, metadata, version
  data/
    <table>.csv        # one CSV file per table, with header row
  playground.db        # derived; rebuildable from project.yaml + data/
  history.log          # append-only command/replay log (see ADR-0006)
```

- `project.yaml` carries a top-level `version: 1` field from the
  outset, plus all schema, relationship, and project metadata.
- Table data lives in `data/<table>.csv` (UTF-8, header row, RFC
  4180 quoting). One file per table keeps diffs scoped and avoids
  monolithic YAML.
- `playground.db` is a **derived artifact**. The authoritative
  state is `project.yaml` + `data/`. The `.db` file is kept when
  present (we never silently drop it) but can be rebuilt from the
  text sources at any time.
  - Rebuilding when no `.db` exists: silent, automatic.
  - Rebuilding when a `.db` exists: requires user confirmation
    with a summary diff (e.g. "3 tables, 47 rows will be
    recreated; existing `.db` will be replaced").
- A `.gitignore` template is created in each project; by default
  the `.db` file is ignored so version control captures only the
  authoritative sources.
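The rebuild from text sources described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the app's implementation. It assumes `project.yaml` has already been parsed into a hypothetical `schema` mapping of table name to ordered `(column, SQLite type)` pairs; YAML parsing is left out to keep the sketch standard-library only. `rebuild_db` is likewise a hypothetical name. Each `data/<table>.csv` is read with Python's `csv` module, which handles RFC 4180-style quoting. The new database is built in a temporary file and moved into place only once it is complete, so a failed rebuild never replaces an existing `playground.db`. The confirmation prompt for an existing `.db` is left to the caller.

```python
import csv
import sqlite3
from pathlib import Path

# Hypothetical, already-parsed schema from project.yaml:
# table name -> ordered list of (column name, SQLite type).
Schema = dict[str, list[tuple[str, str]]]


def rebuild_db(project_dir: Path, schema: Schema) -> Path:
    """Recreate playground.db from the CSV files under project_dir/data/."""
    db_path = project_dir / "playground.db"
    tmp_path = db_path.with_suffix(".db.tmp")  # playground.db.tmp
    tmp_path.unlink(missing_ok=True)  # discard leftovers from an interrupted rebuild

    conn = sqlite3.connect(tmp_path)
    try:
        for table, columns in schema.items():
            # Assumes plain identifiers; embedded double quotes are not escaped.
            col_defs = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns)
            conn.execute(f'CREATE TABLE "{table}" ({col_defs})')

            csv_path = project_dir / "data" / f"{table}.csv"
            if not csv_path.exists():
                continue  # table with no data file: leave it empty
            # newline="" lets csv handle quoted fields that contain line breaks.
            with csv_path.open(newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                expected = [name for name, _ in columns]
                if header != expected:
                    raise ValueError(f"{csv_path}: header {header} does not match schema {expected}")
                placeholders = ", ".join("?" for _ in columns)
                # Fields arrive as strings; SQLite column affinity converts numeric text.
                conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        conn.commit()
    finally:
        conn.close()

    # Only a fully built database replaces the old one.
    tmp_path.replace(db_path)
    return db_path
```

This ADR does not specify how NULL is encoded in CSV, so the sketch stores empty fields as empty strings; a real implementation would need a documented convention.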

## Consequences

- Projects round-trip cleanly through git, email, and zip.
- Moderately large practice data sets stay manageable: CSV is
  compact and diffs row by row.
- Schema changes stay easy to review, because YAML is
  human-readable and diffs cleanly.
- The app must be able to (re)build a database from the text
  sources at any time — this is a first-class code path, not an
  edge case.
- The `version` field opens the door to format migrations as the
  app evolves; old projects load by running registered migrators
  in sequence.
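Running registered migrators in sequence can be sketched as a small registry keyed by source version. This is an illustrative sketch only: `upgrade`, `migrator`, `MIGRATORS`, and `CURRENT_VERSION = 3` are hypothetical (this ADR fixes only `version: 1`), and the two example migrators rename or add made-up keys purely to show the mechanism. The project is assumed to be `project.yaml` already parsed into a dict.

```python
from typing import Any, Callable

Project = dict[str, Any]  # hypothetical: parsed project.yaml contents

CURRENT_VERSION = 3  # hypothetical; this ADR only fixes version 1

# Hypothetical registry: MIGRATORS[n] upgrades a version-n project to version n + 1.
MIGRATORS: dict[int, Callable[[Project], Project]] = {}


def migrator(from_version: int):
    """Register a function that upgrades from_version to from_version + 1."""
    def register(fn: Callable[[Project], Project]) -> Callable[[Project], Project]:
        MIGRATORS[from_version] = fn
        return fn
    return register


@migrator(1)
def _v1_to_v2(project: Project) -> Project:
    project = dict(project)  # copy so the caller's dict is untouched
    project["tables"] = project.pop("schema", {})  # made-up key rename
    return project


@migrator(2)
def _v2_to_v3(project: Project) -> Project:
    project = dict(project)
    project.setdefault("relationships", [])  # made-up new field
    return project


def upgrade(project: Project) -> Project:
    """Apply each registered migrator in order until the project is current."""
    version = project.get("version")
    if not isinstance(version, int) or version < 1:
        raise ValueError(f"missing or invalid version: {version!r}")
    if version > CURRENT_VERSION:
        raise ValueError(f"project version {version} is newer than this app ({CURRENT_VERSION})")
    while version < CURRENT_VERSION:
        project = MIGRATORS[version](project)
        version += 1
        project["version"] = version
    return project
```

Keeping each migrator to a single version step means a project at any older version is upgraded by the same chain, and refusing newer versions avoids silently mangling a file written by a later app release.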

## Amendments

### Amendment 1 — runtime data flow ([ADR-0015](0015-project-storage-runtime.md))

The phrase "`playground.db` is a derived artifact" describes a
*recovery* property: the database can always be reconstructed
from `project.yaml` + `data/`. It does not describe runtime
data flow.

At write time, all persistence targets (the SQLite database,
`project.yaml`, the relevant `data/<table>.csv` files, and
`history.log`) share a single source — the user's command — and
are written alongside one another in a defined order (see
ADR-0015 §6). None of the text files is "downstream" of the
database at write time.
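The write-time data flow in this amendment can be sketched as a fan-out: one command is handed to every persistence target, and no target derives its contents from another. This is a minimal illustration, not the ADR-0015 design. `Command`, `Target`, `RecordingTarget`, and `persist` are hypothetical names, and because the actual write order is defined in ADR-0015 §6, the sketch takes the order from the caller instead of guessing it. Failure handling is also omitted.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Command:
    """Hypothetical user command, e.g. inserting a row into a table."""
    table: str
    row: tuple


class Target(Protocol):
    def apply(self, cmd: Command) -> None: ...


class RecordingTarget:
    """Stand-in for a real writer (SQLite, project.yaml, data/<table>.csv, history.log)."""

    def __init__(self, name: str, log: list[str]) -> None:
        self.name = name
        self.log = log

    def apply(self, cmd: Command) -> None:
        self.log.append(f"{self.name}:{cmd.table}")


def persist(cmd: Command, targets_in_order: list[Target]) -> None:
    # Every target receives the same command; none reads another target's output.
    # The real order comes from ADR-0015 §6 and is supplied by the caller here.
    for target in targets_in_order:
        target.apply(cmd)
```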

### Amendment 2 — `.db` rebuild trigger ([ADR-0015](0015-project-storage-runtime.md))

The "rebuild with confirmation when `.db` exists" semantics in
the original Decision section are replaced by a simpler model:

- On load, if `playground.db` exists, it is opened as-is.
- On load, if `playground.db` is missing, it is rebuilt
  silently from `project.yaml` + `data/`.
- A new app-level command, `rebuild`, explicitly discards the
  current `playground.db` and reconstructs it from the text
  sources, with a confirmation prompt and a summary of what
  will be reconstructed.

The application does not attempt to detect drift between the
database and the text sources automatically. `rebuild` is the
explicit user-driven path for cases where drift exists (git
pull over an existing `.db`, hand edits to YAML/CSV, recovery
after a rare failure described in ADR-0015 §6).
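The amended load and `rebuild` behavior can be sketched in a few lines. This is a sketch under assumptions: `open_project` and `rebuild_command` are hypothetical names, and the actual reconstruction (`rebuild`), the user prompt (`confirm`), and the summary text are passed in as callables rather than implemented here. As the amendment specifies, loading never compares the database with the text sources; only a missing file triggers a silent rebuild.

```python
from pathlib import Path
from typing import Callable


def open_project(project_dir: Path, rebuild: Callable[[Path], None]) -> Path:
    """On load: open an existing playground.db as-is; rebuild silently if missing."""
    db_path = project_dir / "playground.db"
    if not db_path.exists():
        rebuild(project_dir)  # silent: no prompt and no drift check
    return db_path


def rebuild_command(
    project_dir: Path,
    rebuild: Callable[[Path], None],
    summary: str,
    confirm: Callable[[str], bool],
) -> bool:
    """Explicit `rebuild`: discard playground.db and reconstruct it after confirmation."""
    if not confirm(summary):
        return False  # user declined; the existing database is untouched
    (project_dir / "playground.db").unlink(missing_ok=True)
    rebuild(project_dir)
    return True
```

In use, `summary` would carry a message along the lines of "3 tables, 47 rows will be recreated", and `confirm` would show it to the user before anything is discarded.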