# ADR-0015: Project storage runtime

## Status

Accepted. Amends ADR-0004 (project file format) and ADR-0007
(sharing and export); see the "Relationship to earlier ADRs"
section at the end for the exact deltas.

## Context

ADR-0004 defined the on-disk shape of a project — `project.yaml`
+ `data/<table>.csv` + `history.log`, with `playground.db` as a
derived artifact. It deliberately did not specify runtime
semantics: when a project comes into existence, where it lives,
how the on-disk files are kept consistent with the running
SQLite database, what happens on load, on failure, on
concurrent open, and how the canonical app-level commands
(`save`, `load`, `new`, `export`, `import`) are scoped.

Track 1 of the application built everything against an
in-memory SQLite database. Every quit lost all work. This is
the largest single UX gap left in the project, and the next
useful feature (replay/undo, ADR-0006) depends on the
`history.log` written here.

This ADR fills the runtime gap. It commits to a single
persistence model — *every successful command writes through
to all targets immediately, and validation gates everything* —
and works the resulting design through to file naming, the
load picker, the failure model, and concurrent-open behaviour.

## Decision

### 1. Lifecycle and locations

There is no in-memory database in normal operation. Every
session is backed by a project on disk.

- **Startup with no CLI argument:** the application creates a
  new temporary project under the OS data directory (see
  below), opens it, and runs against it.
- **Startup with a CLI argument** (`rdbms-playground <path>`,
  requirement L1): the application opens the project at that
  path. If the path does not exist or does not look like a
  project (no `project.yaml`), it refuses with a friendly
  error.
- **`save` / `save as`** elevate or copy a project to a chosen
  location.
- **`load`** opens a different project (see section 7).
- **`new`** creates a fresh temp project from inside the
  running application, after closing the current one.

The OS data root is platform-standard:

- Linux: `$XDG_DATA_HOME/rdbms-playground` (defaulting to
  `~/.local/share/rdbms-playground` when `XDG_DATA_HOME` is
  unset).
- macOS: `~/Library/Application Support/rdbms-playground`.
- Windows: `%APPDATA%\rdbms-playground`.

Inside the data root: `projects/` holds projects — both
auto-generated temp ones and ones the user has saved with a
name of their choosing. There is no requirement that named
projects move *out* of the data root, and no encouragement to
do so: keeping a saved project right alongside the temp ones
is the easiest workflow and is fully supported. Users who
prefer a different home (a course directory, a shared drive,
a git working tree) save there instead. The application
prescribes nothing.

The data root also carries a small state file
`last_project` (a single line containing the absolute path of
the most recently opened project). It exists to support
`--resume` (section 7).

A `--data-dir` CLI flag fully replaces the OS-standard data
root for the duration of that run; both project creation and
the load picker's listing use the supplied directory and only
that directory. The `last_project` state file is read and
written under the active data root, so a user with multiple
data roots gets independent resume histories per root, which
is the intuitive behaviour.
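
The resolution rules above can be sketched as a pure function. This is a minimal illustration, not the application's actual code: the `Platform` enum, the `data_root` name, and the injected `env` lookup are all hypothetical, and the treatment of an empty `XDG_DATA_HOME` as unset is an assumption.

```rust
use std::path::PathBuf;

/// The three platforms this ADR names explicitly.
#[derive(Clone, Copy)]
pub enum Platform {
    Linux,
    MacOs,
    Windows,
}

/// Hypothetical helper: resolves the active data root.
/// `env` is an injected lookup so the rules can be exercised
/// without touching the real process environment.
pub fn data_root(
    platform: Platform,
    data_dir_flag: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
) -> Option<PathBuf> {
    // `--data-dir` fully replaces the OS-standard root for this run.
    if let Some(dir) = data_dir_flag {
        return Some(PathBuf::from(dir));
    }
    let app = "rdbms-playground";
    match platform {
        Platform::Linux => {
            // Assumption: an empty XDG_DATA_HOME counts as unset.
            let base = env("XDG_DATA_HOME")
                .filter(|v| !v.is_empty())
                .map(PathBuf::from)
                .or_else(|| env("HOME").map(|h| PathBuf::from(h).join(".local/share")))?;
            Some(base.join(app))
        }
        Platform::MacOs => Some(
            PathBuf::from(env("HOME")?)
                .join("Library/Application Support")
                .join(app),
        ),
        Platform::Windows => Some(PathBuf::from(env("APPDATA")?).join(app)),
    }
}
```

Because `last_project` and `projects/` are both resolved relative to the returned path, independent resume histories per root fall out without extra logic.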

### 2. Project naming and display name

Temp project directory names follow the pattern
`<YYYYMMDD>-<word>-<word>-<word>`, where the words are drawn
from a small built-in wordlist compiled into the binary (no
external file or network call). Example:
`20260507-water-buffalo-skating`. The leading date keeps the
file listing chronologically sortable; the words give learners
something nameable to refer to.

Named projects use whatever directory name the user chose at
`save` time.

**Collision handling.**

- For auto-generated temp names: before creating the
  directory, the application checks for an existing entry of
  the same name under `projects/` in the data root and
  regenerates the three-word slug if one is found. The
  wordlist is large enough (multiple categories, dozens of
  words each) that collisions are essentially never observed
  in practice; the check is cheap and removes the failure
  mode entirely.
- For user-supplied names at `save` / `save as` / `import`:
  if the target directory already exists (whether it
  contains a project or anything else), the operation is
  refused with a friendly error. The user picks a different
  name or moves/removes the existing directory first. We
  deliberately do not auto-suffix or merge — silently
  changing the name the user typed, or writing into someone
  else's directory, is worse than asking them to pick again.
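
A rough sketch of the temp-name rule with its collision check follows. The six-word `WORDS` list is a stand-in for the real compiled-in wordlist, and both the preformatted `date` argument and the injected `pick` index source are hypothetical simplifications (a real implementation would presumably use a clock and an RNG).

```rust
use std::path::Path;

/// Tiny stand-in wordlist; the real one is larger and compiled into the binary.
const WORDS: &[&str] = &["water", "buffalo", "skating", "amber", "otter", "sailing"];

/// Builds `<YYYYMMDD>-<word>-<word>-<word>` and regenerates the slug
/// for as long as an entry of that name already exists in `projects_dir`.
pub fn temp_project_name(
    projects_dir: &Path,
    date: &str,
    mut pick: impl FnMut() -> usize,
) -> String {
    loop {
        // Draw three words; `pick` supplies arbitrary indices.
        let slug: Vec<&str> = (0..3).map(|_| WORDS[pick() % WORDS.len()]).collect();
        let name = format!("{}-{}", date, slug.join("-"));
        // Any existing entry (directory or file) counts as a collision.
        if !projects_dir.join(&name).exists() {
            return name;
        }
    }
}
```

The loop has no retry bound; with a wordlist of the size described above that seems acceptable, but a bounded retry count would be a reasonable alternative.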

The application carries a *display name* derived from the
project directory name by a small prettifier:

- Strip a leading `YYYYMMDD-` if present (temp projects).
- Split on `-` (kebab-case), `_` (snake_case), or case
  boundaries (camelCase / PascalCase).
- Title-case each word.

So `20260507-water-buffalo-skating` displays as
"Water Buffalo Skating"; `MyOrders` displays as "My Orders";
`customer_demo` displays as "Customer Demo".
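
The three prettifier rules can be sketched as below. The function name is hypothetical, and two details are assumptions the ADR does not settle: a case boundary is taken to mean "lowercase followed by uppercase", and title-casing only uppercases the first character of each word, leaving the rest untouched.

```rust
/// Sketch of the display-name prettifier described above.
pub fn display_name(dir_name: &str) -> String {
    // Rule 1: strip a leading `YYYYMMDD-` (eight ASCII digits and a dash).
    let bytes = dir_name.as_bytes();
    let has_date_prefix = bytes.len() > 9
        && bytes[..8].iter().all(|b| b.is_ascii_digit())
        && bytes[8] == b'-';
    let rest = if has_date_prefix { &dir_name[9..] } else { dir_name };

    // Rule 2: split on `-`, `_`, and lower-to-upper case boundaries.
    let mut words: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut prev_lower = false;
    for ch in rest.chars() {
        if ch == '-' || ch == '_' {
            if !current.is_empty() {
                words.push(std::mem::take(&mut current));
            }
            prev_lower = false;
            continue;
        }
        if ch.is_uppercase() && prev_lower && !current.is_empty() {
            words.push(std::mem::take(&mut current));
        }
        prev_lower = ch.is_lowercase();
        current.push(ch);
    }
    if !current.is_empty() {
        words.push(current);
    }

    // Rule 3: title-case each word (first character only; see assumptions).
    words
        .iter()
        .map(|w| {
            let mut chars = w.chars();
            match chars.next() {
                Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```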

The display name is shown in the bottom status bar at all
times, prefixed with `Project:` so it's unambiguous. This is
how the user knows which project they are editing.

### 3. `project.yaml` shape

Flat ordered lists. Tables and columns preserve declaration
order; relationships preserve creation order.

```yaml
version: 1
project:
  created_at: 2026-05-07T14:30:12Z
tables:
  - name: Customers
    primary_key: [id]
    columns:
      - { name: id, type: serial }
      - { name: Name, type: text }
relationships:
  - name: Customers_id_to_Orders_CustId
    parent: { table: Customers, column: id }
    child:  { table: Orders,    column: CustId }
    on_delete: cascade
    on_update: no_action
```

The `version: 1` field is required. Migrators (section 9)
upgrade older versions on load. The project's name is
**not** stored in `project.yaml`; the directory name on disk
is the canonical name. Recording it twice would create an
opportunity for the two to drift if the user renamed the
directory by hand; with one source of truth, that question
doesn't arise.

### 4. CSV encoding

One file per table, `data/<table>.csv`, UTF-8, RFC 4180
quoting, header row carrying column names in declaration
order.

Per-type encoding:

| Type       | CSV form                                    |
|------------|---------------------------------------------|
| `text`     | RFC 4180 string                             |
| `int`      | decimal integer                             |
| `real`     | shortest-round-trip decimal                 |
| `decimal`  | string form already validated by `value.rs` |
| `bool`     | `true` / `false`                            |
| `date`     | `YYYY-MM-DD`                                |
| `datetime` | ISO 8601 with `T` and a `Z` or offset       |
| `blob`     | base64 (standard alphabet, padded)          |
| `serial`   | integer                                     |
| `shortid`  | base58 string                               |

NULL is the empty unquoted field; the empty quoted field
(`""`) is an empty string. The distinction is preserved
because SQL preserves it and the playground is meant to teach
SQL.
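
To make the NULL versus empty-string rule concrete, here is a small sketch of a `text` cell encoder and a one-record decoder. Both function names are hypothetical, and the decoder is deliberately minimal: it takes one already-isolated record and does not report malformed quoting.

```rust
/// Encodes one `text` cell. `None` stands for SQL NULL.
pub fn encode_text_field(value: Option<&str>) -> String {
    match value {
        // NULL: empty and unquoted.
        None => String::new(),
        // Empty string: empty but quoted.
        Some("") => "\"\"".to_string(),
        // RFC 4180: quote when the value holds a comma, quote, or line break.
        Some(s) if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) => {
            format!("\"{}\"", s.replace('"', "\"\""))
        }
        Some(s) => s.to_string(),
    }
}

/// Decodes one record, mapping empty unquoted fields back to `None`.
pub fn decode_record(record: &str) -> Vec<Option<String>> {
    let mut fields = Vec::new();
    let mut chars = record.chars().peekable();
    loop {
        let mut buf = String::new();
        let mut quoted = false;
        if chars.peek() == Some(&'"') {
            quoted = true;
            chars.next();
            while let Some(c) = chars.next() {
                if c == '"' {
                    // A doubled quote is an escaped quote; a single one ends the field.
                    if chars.peek() == Some(&'"') {
                        buf.push('"');
                        chars.next();
                    } else {
                        break;
                    }
                } else {
                    buf.push(c);
                }
            }
        } else {
            while let Some(&c) = chars.peek() {
                if c == ',' {
                    break;
                }
                buf.push(c);
                chars.next();
            }
        }
        fields.push(if !quoted && buf.is_empty() { None } else { Some(buf) });
        // Continue after a comma; stop at the end of the record.
        match chars.next() {
            Some(',') => continue,
            _ => break,
        }
    }
    fields
}
```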

### 5. `history.log` format

Append-only, one record per line, three pipe-separated fields:

```
2026-05-07T14:30:12Z|ok|create table Customers with pk id:serial
2026-05-07T14:30:30Z|ok|insert into Customers ('Alice')
```

- **Timestamp** in ISO 8601 with `Z`.
- **Status** is always `ok` in v1, because failed commands
  are not recorded — this matches ADR-0006's "successfully
  executed command" wording and keeps the log directly
  replayable. The status field is kept in the line format
  anyway so future use cases (audit logs that record
  attempts, validation diagnostics, distinguishing
  user-issued from imported commands) can carry additional
  values without a format break.
- **Command** is the user's input as typed. Newlines (when
  multi-line input arrives, requirement I1) are escaped as
  literal `\n`.
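
A sketch of writing and reading one record under these rules follows. The type and function names are hypothetical. Two points are assumptions beyond the ADR's text: the parser splits on the first two pipes only, so a `|` inside the command survives, and no escaping of backslashes is attempted, so a command that itself contains a literal `\n` sequence would not round-trip.

```rust
#[derive(Debug, PartialEq)]
pub struct HistoryRecord {
    pub timestamp: String,
    pub status: String,
    pub command: String,
}

/// Formats one line. In v1 only successes are recorded, so status is `ok`.
pub fn format_record(timestamp: &str, command: &str) -> String {
    // Newlines in multi-line input become the two characters `\` and `n`.
    format!("{}|ok|{}", timestamp, command.replace('\n', "\\n"))
}

/// Parses one line back into its three fields.
pub fn parse_record(line: &str) -> Option<HistoryRecord> {
    // Split on the first two pipes only; the rest is the command.
    let mut parts = line.splitn(3, '|');
    let timestamp = parts.next()?.to_string();
    let status = parts.next()?.to_string();
    let command = parts.next()?.replace("\\n", "\n");
    Some(HistoryRecord { timestamp, status, command })
}
```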

`history.log` is **not** included in `export` (see section 11
and the ADR-0007 amendment). It is private to the user's
working copy.

### 6. Persistence ordering

A successful user command produces effects in four targets:
the SQLite database, `project.yaml`, the relevant
`data/<table>.csv` file(s), and `history.log`. INV-2 from the
Phase-1 record requires that the **combined db persistence
logic** — validation, metadata-table handling, the SQLite
mutations — gate everything else.

The implementation order inside a command is:

1. **Validate and stage in the database.** Open a SQLite
   transaction. Perform validation, schema/metadata
   mutations, data mutations. Do not commit yet.
2. **Stage text targets.** Write `project.yaml` (if schema or
   relationships changed) and affected `data/<table>.csv`
   files (if rows changed) to temp files inside the project
   directory. Append the new line for `history.log` to a
   temp copy. `fsync` each.
3. **Rename text targets.** Atomically rename each temp file
   to its final path (POSIX `rename(2)`; on Windows
   `MoveFileExW` with `MOVEFILE_REPLACE_EXISTING`).
4. **Commit the SQLite transaction.**

Failure handling:

- Failure in step 1 or 2 → roll back the SQLite transaction;
  no rename happens; on-disk state is unchanged. Surface the
  failure (see section 8) and quit.
- Failure in step 3 (rename fails after `fsync`) → roll back
  the SQLite transaction; orphan temp files remain in the
  project directory and are cleaned up on next open. On-disk
  semantic state is unchanged. Surface and quit.
- Failure in step 4 (commit fails after rename succeeded) →
  rare; on next launch the on-disk text is ahead of the
  `playground.db`. The user sees stale data and runs
  `rebuild` (section 7) to recover. Documented edge case;
  acceptable for v1.

In short: commit the database last, so that a fatal failure
leaves disk state recoverable via `rebuild`.
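
Steps 2 to 4 and their rollback points can be sketched with the standard library alone. This is an illustration under stated assumptions, not the real persistence code: `DbTxn` is a hypothetical stand-in for an open SQLite transaction in which step 1 has already succeeded, the `<file>.tmp` naming is invented, and the caller is assumed to supply the complete new contents of each text target (including `history.log` with its appended line).

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for an open, uncommitted SQLite transaction.
pub trait DbTxn {
    fn commit(self) -> io::Result<()>;
    fn rollback(self);
}

/// Step 2 for one target: write a sibling temp file and fsync it.
fn stage(dest: &Path, bytes: &[u8]) -> io::Result<PathBuf> {
    let mut name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "target has no file name"))?
        .to_os_string();
    name.push(".tmp"); // assumed temp-file naming
    let tmp = dest.with_file_name(name);
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // fsync
    Ok(tmp)
}

/// Steps 2-4: stage every text target, rename each into place, commit last.
pub fn write_through<T: DbTxn>(txn: T, targets: &[(PathBuf, Vec<u8>)]) -> io::Result<()> {
    let mut staged: Vec<(PathBuf, &Path)> = Vec::new();
    for (dest, bytes) in targets {
        match stage(dest, bytes) {
            Ok(tmp) => staged.push((tmp, dest.as_path())),
            Err(e) => {
                // Step-2 failure: nothing has been renamed yet.
                txn.rollback();
                return Err(e);
            }
        }
    }
    for (tmp, dest) in &staged {
        if let Err(e) = fs::rename(tmp, dest) {
            // Step-3 failure: remaining temp files are left as orphans.
            txn.rollback();
            return Err(e);
        }
    }
    // Step 4: the database commit comes last.
    txn.commit()
}
```

The caller would surface any returned error per section 8 and quit.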

### 7. Load and rebuild

**Load on startup or via the `load` command.** If
`playground.db` exists in the project directory, it is opened
as-is. If it does not exist, it is rebuilt silently from
`project.yaml` + `data/<table>.csv`. There is no automatic
detection of drift between the database and the text sources
on load; that's what `rebuild` is for.

**`--resume` CLI option.** Equivalent to passing the path
recorded in the `<data-root>/last_project` state file as the
positional CLI argument. If `last_project` is missing or
points at a path that no longer exists, `--resume` exits
with an error naming the missing state file or project
path; it does **not** silently fall back to creating a new temp
project, because the user's intent ("resume what I had") is
clear and silent fallback would mask the problem. `--resume`
and an explicit positional path are mutually exclusive; the
combination errors out.

The `last_project` file is rewritten on every successful
project open (startup, `load`, `new`, `save as`, `import`).
A clean exit doesn't clear it — that's the whole point of
`--resume` after a quit.
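
The startup rules for `--resume` and the positional path could be resolved roughly as follows. The function and error names are hypothetical, and trimming trailing whitespace from the single-line state file is an assumption.

```rust
use std::fs;
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
pub enum StartupError {
    /// `--resume` combined with an explicit positional path.
    ResumeWithPath,
    /// The `last_project` state file could not be read.
    NoLastProject(PathBuf),
    /// The state file names a project path that no longer exists.
    LastProjectMissing(PathBuf),
}

/// Returns `Some(path)` for the project to open,
/// or `None` when a fresh temp project should be created.
pub fn resolve_startup(
    data_root: &Path,
    resume: bool,
    positional: Option<PathBuf>,
) -> Result<Option<PathBuf>, StartupError> {
    match (resume, positional) {
        (true, Some(_)) => Err(StartupError::ResumeWithPath),
        (false, path) => Ok(path),
        (true, None) => {
            let state_file = data_root.join("last_project");
            let recorded = fs::read_to_string(&state_file)
                .map_err(|_| StartupError::NoLastProject(state_file.clone()))?;
            // Assumption: the single line may carry a trailing newline.
            let project = PathBuf::from(recorded.trim_end());
            if project.exists() {
                Ok(Some(project))
            } else {
                // No silent fallback to a new temp project.
                Err(StartupError::LastProjectMissing(project))
            }
        }
    }
}
```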

**CSV row-load failure during rebuild.** When rebuilding
`playground.db` from `project.yaml` + `data/<table>.csv`,
each row insert can fail (malformed CSV, type-validation
failure, FK violation, NOT NULL violation, etc.). The
behaviour mirrors the persistence failure model (section 8):
the rebuild stops at the first failing row and surfaces a
fatal error of the form

> Unable to load row *N* from `data/<table>.csv` into table
> `<table>`: *&lt;diagnosis from the value/FK/constraint
> validator&gt;*

The application then quits. There is no realistic case
where a CSV produced by a previous well-behaved session
contains an unloadable row; if one does, something has gone
wrong (hand edit, partial git merge, file corruption) and
the user should fix the file or restore an earlier copy.
Continuing past the bad row would either lose data
silently (skip it) or load partial state (stop but keep
what loaded), both of which leave the user in a worse
position than a clear error message.
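
The stop-at-first-failure behaviour might look like the sketch below. `load_rows` and the injected `insert` callback are hypothetical, and numbering rows from 1 while excluding the header is an assumption; the ADR does not say how *N* is counted.

```rust
/// Feeds rows to `insert` in order and stops at the first failure,
/// wrapping the validator's diagnosis in the fatal message format above.
pub fn load_rows(
    table: &str,
    rows: &[&str],
    mut insert: impl FnMut(&str) -> Result<(), String>,
) -> Result<usize, String> {
    for (index, &row) in rows.iter().enumerate() {
        insert(row).map_err(|diagnosis| {
            format!(
                "Unable to load row {} from `data/{}.csv` into table `{}`: {}",
                index + 1, // assumed: 1-based data rows, header not counted
                table,
                table,
                diagnosis
            )
        })?;
    }
    Ok(rows.len())
}
```

On `Err`, the caller would show the message and quit rather than keep whatever loaded so far.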

**`rebuild` app-level command.** Discards the current
`playground.db` and reconstructs it from `project.yaml` +
`data/`. Always shows a confirmation prompt with a summary
("12 tables, 47 rows will be reconstructed; existing
`playground.db` will be replaced") before doing the work.
Useful when:

- The user pulled new YAML/CSV from git over an old `.db`.
- A prior persistence failure left the `.db` behind the text
  (section 6, step-4 failure mode).
- The user hand-edited the YAML or CSV outside the app.

**Load picker UX.** The `load` command opens an in-TUI modal
listing temp projects from the data dir, sorted newest
first, with the prettified display name and creation
timestamp. Arrow keys select; Enter loads; Esc cancels;
pressing `b` (for "browse") switches the modal to a
path-entry prompt for projects outside the data dir. This
covers both common (pick a recent temp) and uncommon (open a
named project at a custom path) cases without forcing the
user into a fully manual path entry up front.

### 8. Failure model

Persistence failures are fatal. The application surfaces a
banner with the operation, the path, and the OS error
message, then quits cleanly so the banner remains visible
above the shell prompt. The user investigates (disk full,
permission denied, network filesystem hiccup) and restarts.

This is the right model because the realistic failure modes
for a local data directory do not heal transiently. Showing a
warning and continuing risks silent loss when the user later
quits the app while the failure window is still open.

The persistence ordering in section 6 ensures that "fatal
failure → quit" never leaves the disk in a state that cannot
be recovered: it is either unchanged (the common case) or
recoverable via `rebuild` (the rare step-4 failure).

The "quit on failure" mode is also unlikely to be
disruptive in practice. Even if a transient
issue (a network drive timing out, an antivirus scanner
holding a file briefly) does cause a fatal failure, the
user's path back into the session is just
`rdbms-playground --resume`. With section 6's ordering
guaranteeing recoverable disk state and `--resume`
guaranteeing one-command return, the cost of erring on the
side of "stop and let the user investigate" is small enough
that the safety benefit dominates.

### 9. Migration framework (F3)

`project.yaml` carries `version: 1` from the outset. Future
format changes bump the version and add a registered
migrator function:

```rust
fn migrate_v1_to_v2(raw: &mut RawProject) -> Result<(), MigrateError> { ... }
```

Migrators are stored in an ordered list keyed by source
version. On load, the application:

1. Reads the file's `version`.
2. If `version < latest_known`, copies the original file to
   `project.yaml.v<N>.bak` (where `<N>` is the original
   version).
3. Runs each migrator in sequence, starting with the one
   keyed by `version` and ending with the one that produces
   `latest_known`.
4. Writes the upgraded YAML back at the new version.
5. If any migrator fails, restores the `.bak` and surfaces
   the failure as a fatal load error.

The framework is built in v1 even though no migrator exists
yet. The first real migrator (when v2 lands) exercises it.
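
A minimal sketch of the migrator chain follows. `RawProject` and `MigrateError` here are simplified stand-ins for the real types, and working on a clone takes the place of the `.bak` copy and restore, which this sketch does not perform on disk. Files newer than `latest` are not handled, since the ADR does not address them.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub struct MigrateError(pub String);

/// Simplified stand-in for the parsed but unvalidated project file.
#[derive(Debug, Clone, PartialEq)]
pub struct RawProject {
    pub version: u32,
    pub body: String,
}

pub type Migrator = fn(&mut RawProject) -> Result<(), MigrateError>;

/// Runs migrators keyed by source version until `latest` is reached.
/// The input is never modified, so a failed step leaves it intact.
pub fn migrate(
    original: &RawProject,
    migrators: &BTreeMap<u32, Migrator>,
    latest: u32,
) -> Result<RawProject, MigrateError> {
    let mut project = original.clone();
    while project.version < latest {
        let from = project.version;
        let step = migrators
            .get(&from)
            .ok_or_else(|| MigrateError(format!("no migrator registered for v{}", from)))?;
        step(&mut project)?;
        // Each migrator moves the project forward by exactly one version.
        project.version = from + 1;
    }
    Ok(project)
}
```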

### 10. Concurrency

A lock file `<project>/.rdbms-playground.lock` is written
when a project is opened, containing the PID and hostname of
the owning process. On open:

- If no lock file exists: take the lock and proceed.
- If a lock file exists with a live PID on this host: refuse
  with a friendly error pointing the user at the running
  instance.
- If a lock file exists but the PID is dead (or it lists a
  different hostname): take the lock (clean handover from a
  crashed prior instance).

The lock is removed on clean exit. Crashes leave it behind;
the next open reclaims it.
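
The three-way decision on open can be written as a small pure function. The names are hypothetical, and the liveness probe is injected because checking whether a PID is alive is platform-specific and outside what this ADR specifies.

```rust
#[derive(Debug, PartialEq)]
pub struct LockInfo {
    pub pid: u32,
    pub hostname: String,
}

#[derive(Debug, PartialEq)]
pub enum LockDecision {
    /// Write our own PID and hostname, then proceed.
    Take,
    /// Another live instance on this host owns the project.
    Refuse,
}

/// Decides what to do with an existing lock file, if any.
pub fn decide(
    existing: Option<&LockInfo>,
    this_host: &str,
    pid_is_alive: impl Fn(u32) -> bool,
) -> LockDecision {
    match existing {
        None => LockDecision::Take,
        // Only a live PID on this same host blocks the open.
        Some(lock) if lock.hostname == this_host && pid_is_alive(lock.pid) => {
            LockDecision::Refuse
        }
        // Dead PID or a different hostname: treated as a stale lock.
        Some(_) => LockDecision::Take,
    }
}
```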

The lock blocks only other rdbms-playground TUI instances.
External read-only tooling (`sqlite3 playground.db -readonly`,
text editors looking at `project.yaml`, etc.) is not
prevented. The user is on their own if they fiddle with the
project files concurrently with the running app — that's a
power-user workflow we don't get in the way of.

### 11. App-level commands

The track 2 command set, all available in both modes per
ADR-0003:

- **`save`** — for a temp project, prompts for a target
  directory and elevates to a named project (effectively
  identical to `save as`). For a named project, reports
  "auto-saved; use `save as` to copy to a new location."
- **`save as`** — prompts for a target directory; copies
  the entire project there and switches to operating on the
  copy.
- **`load`** — opens the load picker (section 7).
- **`new`** — creates a fresh temp project; closes the
  current one cleanly first (auto-save guarantees the
  current state is on disk).
- **`rebuild`** — section 7.
- **`export`** — produces a zip per ADR-0007, *excluding*
  both `playground.db` and `history.log` (see ADR-0007
  amendment below). Default filename pattern unchanged.
- **`import`** — accepts an exported zip, unpacks it into a
  named project at a chosen location, runs `rebuild` on
  open. The exported zip has no `playground.db` and no
  `history.log`, so a fresh `playground.db` is created from
  YAML+CSV, and `history.log` starts empty. The chosen
  target directory must not already exist (per the section 2
  collision rule); the user picks a different name or
  removes the existing directory first.
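
Read literally, the amended export contents are `project.yaml` plus everything under `data/`. A sketch of that allowlist over project-relative paths is shown below; the function name is hypothetical, and excluding every other file (such as `.gitignore`) is an assumption drawn from that literal reading rather than something ADR-0007 is quoted as saying.

```rust
use std::ffi::OsStr;
use std::path::{Component, Path};

/// Returns true when a project-relative path belongs in the export zip.
pub fn is_exported(relative: &Path) -> bool {
    let mut parts = relative.components();
    match (parts.next(), parts.next()) {
        // A top-level file: only `project.yaml` is shared.
        (Some(Component::Normal(first)), None) => first == OsStr::new("project.yaml"),
        // Anything nested: only the `data/` tree is shared.
        (Some(Component::Normal(first)), Some(_)) => first == OsStr::new("data"),
        _ => false,
    }
}
```

An allowlist keeps `playground.db`, `history.log`, the lock file, and migration backups out of the archive without naming each one.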

The `.gitignore` template (F2) is created in every new
project directory and excludes:

```
/playground.db
/.rdbms-playground.lock
/project.yaml.v*.bak
```

`playground.db` is rebuildable; the lock file is
per-process; migration backup files are local recovery aids
that don't belong in shared history. The `data/` directory
and `project.yaml` itself are *not* ignored — they are the
shared source of truth.

`history.log` is **not** ignored by default. Whether to
commit one's working log is a per-user, per-project taste
question — some learners will treat the log as part of the
audit trail and want it in git; others will prefer to keep
it private. The export zip handles the "share with
strangers" case (the ADR-0007 amendment); committing to git
is a different decision and we leave it to the user.

### 12. Persistent input history (I2-persist)

The in-memory navigable input history (Up/Down arrows,
draft preservation, consecutive-duplicate dedup) gains a
loader: on project open, the history navigation seed is
populated from the project's `history.log` (latest N entries,
where N is the same in-memory cap as today). New successful
commands append to `history.log` and are pushed onto the
in-memory stack as they are now.
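
Seeding the navigation history from `history.log` might look like this. The function name is hypothetical; applying consecutive-duplicate dedup while loading and silently skipping lines that lack three fields are both assumptions. Entries are returned oldest first.

```rust
/// Builds the history seed: the latest `cap` commands from the log text.
pub fn seed_history(log: &str, cap: usize) -> Vec<String> {
    let mut seed: Vec<String> = Vec::new();
    for line in log.lines() {
        // Fields are timestamp|status|command; only the command is kept.
        let mut parts = line.splitn(3, '|');
        if let (Some(_), Some(_), Some(cmd)) = (parts.next(), parts.next(), parts.next()) {
            // Undo the newline escaping used by the log format.
            let cmd = cmd.replace("\\n", "\n");
            // Assumed: consecutive duplicates collapse, as in the in-memory history.
            if seed.last() != Some(&cmd) {
                seed.push(cmd);
            }
        }
    }
    // Keep only the newest `cap` entries.
    let start = seed.len().saturating_sub(cap);
    seed.split_off(start)
}
```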

Project-scoped only. A separate global rolling history is
deferred to a future ADR (OOS-6).

### 13. Out of scope

The following are tracked but not part of this ADR:

- **OOS-1.** Snapshot ring buffer and `undo` (U1, U2,
  ADR-0006).
- **OOS-2.** `replay` command (U4). The `history.log`
  format is replay-compatible; the command itself ships
  later.
- **OOS-3.** Multi-tab output / V4 session log work.
- **OOS-4.** Tab completion or syntax highlighting for the
  new commands' arguments.
- **OOS-5.** L2 (submitting a command alongside project
  load).
- **OOS-6.** Global rolling input history.

## Relationship to earlier ADRs

This ADR amends two earlier ADRs in place rather than
superseding them outright; the earlier ADRs remain the
canonical reference for everything outside the amended
clauses.

- **ADR-0004 — Project file format.** The "playground.db is a
  derived artifact" framing remains correct for *recovery*
  (the database can be reconstructed from text sources at any
  time). It does not describe runtime data flow: at write
  time, all four targets (db, yaml, csv, history.log) share a
  single source — the user's command — and are written
  alongside one another per section 6 here. The "rebuild
  with confirmation when `.db` exists" semantics are
  reframed: there is no automatic drift detection on load;
  the rebuild path is the explicit `rebuild` command, which
  prompts for confirmation when invoked.
- **ADR-0007 — Sharing and export.** The export contents are
  now `project.yaml` + `data/`, *excluding* both
  `playground.db` (as before) and `history.log` (new).
  Rationale: the history is the user's working log and may
  contain commands they don't want to share. Export remains
  zip-based; default filename pattern is unchanged.

The amendments are made in place in those ADR files, with a
note pointing to this ADR.

## Consequences

- The biggest UX gap closes: quitting no longer loses work.
- A failed command leaves the disk unchanged. A succeeded
  command is durable on disk before the application
  acknowledges it, with one documented edge case that the
  `rebuild` command exists to fix.
- The persistence path touches up to four targets per
  command (the database, `project.yaml`, the affected CSV
  files, and `history.log`). At teaching scale this is
  invisible; at bulk-insert scale (thousands of rows in tight
  loops) it could matter, and a future "batch" command will
  be the remedy. Premature debouncing is rejected (it would
  create a real inconsistency window for negligible gain at
  this scale).
- The "commit db last" ordering is the load-bearing
  invariant for failure recovery. Future contributors
  changing the persistence flow must preserve it.
- The display-name prettifier is small and lives close to
  the project loader; future filename conventions
  (instructor-supplied lesson kits, perhaps) plug into it.
- The lock file is a small piece of state that survives
  crashes; the "live PID on this host" check is the
  load-bearing piece of its correctness. On network
  filesystems shared across hosts, a lock held by a live
  instance on another host looks stale and is taken over; we
  accept that and document it if real users hit it.
- `history.log` becomes the persistent history surface.
  Once `replay` (OOS-2) and `undo` (OOS-1) land, they read
  from the same file with no schema changes.