claude@clouddev1 4fca862c6c Project storage runtime: ADR-0015 + ADR-0004/0007 amendments

Designs track-2 lifecycle and persistence end-to-end: per-command
write-through to db+yaml+csv+history.log gated by the combined db
persistence logic with commit-db-last ordering; existence-only load
with explicit rebuild command; --resume CLI flag backed by
<data-root>/last_project; in-TUI list-with-browse picker; lock file
for single-instance enforcement; fatal-banner-then-quit failure
model (with --resume making restart cheap); fatal CSV row-load
errors with full diagnosis; YYYYMMDD-word-word-word temp naming
with display-name prettifier; collision-checked names for both
temp and user-supplied projects. Project name lives only on the
filesystem (not duplicated in YAML). ADR-0004 and ADR-0007 amended
in place. requirements.md and CLAUDE.md updated; OOS-6 (global
rolling history) tracked as deferred.

2026-05-07 19:53:47 +00:00

ADR-0015: Project storage runtime

Status

Accepted. Amends ADR-0004 (project file format) and ADR-0007 (sharing and export); see the "Relationship to earlier ADRs" section at the end for the exact deltas.

Context

ADR-0004 defined the on-disk shape of a project: project.yaml + data/<table>.csv + history.log, with playground.db as a derived artifact. It deliberately did not specify runtime semantics: when a project comes into existence, where it lives, how the on-disk files are kept consistent with the running SQLite database, what happens on load, on failure, on concurrent open, and how the canonical app-level commands (save, load, new, export, import) are scoped.

Track 1 of the application built everything against an in-memory SQLite database. Every quit lost all work. This is the largest single UX gap left in the project, and the next useful feature (replay/undo, ADR-0006) depends on the history.log written here.

This ADR fills the runtime gap. It commits to a single persistence model — every successful command writes through to all targets immediately, and validation gates everything — and works the resulting design through to file naming, the load picker, the failure model, and concurrent-open behaviour.

Decision

1. Lifecycle and locations

There is no in-memory database in normal operation. Every session is backed by a project on disk.

Startup with no CLI argument: the application creates a new temporary project under the OS data directory (see below), opens it, and runs against it.
Startup with a CLI argument (rdbms-playground <path>, requirement L1): the application opens the project at that path. If the path does not exist or does not look like a project (no project.yaml), it refuses with a friendly error.
save / save as elevate or copy a project to a chosen location.
load opens a different project (see section 7).
new creates a fresh temp project from inside the running application, after closing the current one.

The OS data root is platform-standard:

Linux: $XDG_DATA_HOME/rdbms-playground (defaulting to ~/.local/share/rdbms-playground when XDG_DATA_HOME is unset).
macOS: ~/Library/Application Support/rdbms-playground.
Windows: %APPDATA%\rdbms-playground.

Inside the data root: projects/ holds projects — both auto-generated temp ones and ones the user has saved with a name of their choosing. There is no requirement that named projects move out of the data root, and no encouragement to do so: keeping a saved project right alongside the temp ones is the easiest workflow and is fully supported. Users who prefer a different home (a course directory, a shared drive, a git working tree) save there instead. The application prescribes nothing.

The data root also carries a small state file last_project (a single line containing the absolute path of the most recently opened project). It exists to support --resume (section 7).

A --data-dir CLI flag fully replaces the OS-standard data root for the duration of that run; both project creation and the load picker's listing use the supplied directory and only that directory. The last_project state file is read and written under the active data root, so a user with multiple data roots gets independent resume histories per root, which is the intuitive behaviour.
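The resolution rules above can be sketched as a pure function. This is a minimal illustration, not the application's actual code: the `Platform` enum, the `Env` struct, and the `data_root` function are hypothetical names, and treating an empty XDG_DATA_HOME like an unset one is an assumption borrowed from the XDG Base Directory convention.

```rust
use std::path::PathBuf;

/// Platforms named in section 1 (hypothetical enum for this sketch).
#[derive(Clone, Copy)]
pub enum Platform {
    Linux,
    MacOs,
    Windows,
}

/// Environment values the resolver needs, passed in explicitly so the
/// function stays pure and testable (hypothetical struct).
pub struct Env<'a> {
    pub xdg_data_home: Option<&'a str>,
    pub home: &'a str,
    pub appdata: Option<&'a str>,
}

/// Returns the active data root, or None when the platform variable
/// needed to build it (%APPDATA% on Windows) is not available.
pub fn data_root(platform: Platform, env: &Env, data_dir_flag: Option<&str>) -> Option<PathBuf> {
    // --data-dir fully replaces the OS-standard root for this run.
    if let Some(dir) = data_dir_flag {
        return Some(PathBuf::from(dir));
    }
    const APP: &str = "rdbms-playground";
    match platform {
        Platform::Linux => Some(match env.xdg_data_home {
            // Assumption: an empty value is treated like an unset one.
            Some(x) if !x.is_empty() => PathBuf::from(x).join(APP),
            _ => PathBuf::from(env.home).join(".local/share").join(APP),
        }),
        Platform::MacOs => {
            Some(PathBuf::from(env.home).join("Library/Application Support").join(APP))
        }
        Platform::Windows => env.appdata.map(|a| PathBuf::from(a).join(APP)),
    }
}
```

Because last_project and projects/ are both resolved relative to the returned path, a run with --data-dir automatically gets its own resume history.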

2. Project naming and display name

Temp project directory names follow the pattern <YYYYMMDD>-<word>-<word>-<word>, where the words are drawn from a small built-in wordlist compiled into the binary (no external file or network call). Example: 20260507-water-buffalo-skating. The leading date keeps the file listing chronologically sortable; the words give learners something nameable to refer to.

Named projects use whatever directory name the user chose at save time.

Collision handling.

For auto-generated temp names: before creating the directory, the application checks for an existing entry of the same name under the data root's projects/ directory and regenerates the three-word slug if one is found. The wordlist is large enough (multiple categories, dozens of words each) that collisions are essentially never observed in practice; the check is cheap and removes the failure mode entirely.
For user-supplied names at save / save as / import: if the target directory already exists (whether it contains a project or anything else), the operation is refused with a friendly error. The user picks a different name or moves/removes the existing directory first. We deliberately do not auto-suffix or merge — silently changing the name the user typed, or writing into someone else's directory, is worse than asking them to pick again.
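A minimal sketch of the temp-name generation and its collision check follows. The six-word `WORDS` list is a hypothetical stand-in for the built-in wordlist, and `temp_name` / `fresh_temp_name` are illustrative names. The source of randomness is left to the caller as a closure, since this ADR does not say how words are picked.

```rust
use std::path::Path;

/// Hypothetical stand-in for the built-in wordlist; the real list is larger.
const WORDS: &[&str] = &["water", "buffalo", "skating", "amber", "falcon", "reading"];

/// Builds `<YYYYMMDD>-<word>-<word>-<word>` from a date stamp and three
/// indices into the wordlist (indices wrap around the list length).
pub fn temp_name(yyyymmdd: &str, picks: [usize; 3]) -> String {
    let w = |i: usize| WORDS[i % WORDS.len()];
    format!("{}-{}-{}-{}", yyyymmdd, w(picks[0]), w(picks[1]), w(picks[2]))
}

/// Regenerates the slug until no entry of that name exists under
/// `projects_dir`. `next_picks` is a caller-supplied source of word
/// indices (for example, backed by a random number generator).
pub fn fresh_temp_name(
    projects_dir: &Path,
    yyyymmdd: &str,
    mut next_picks: impl FnMut() -> [usize; 3],
) -> String {
    loop {
        let name = temp_name(yyyymmdd, next_picks());
        if !projects_dir.join(&name).exists() {
            return name;
        }
    }
}
```

The loop has no retry limit, which matches the expectation that collisions are essentially never observed; a real implementation might still want one.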

The application carries a display name derived from the project directory name by a small prettifier:

Strip a leading YYYYMMDD- if present (temp projects).
Split on - (kebab-case), _ (snake_case), or case boundaries (camelCase / PascalCase).
Title-case each word.

So 20260507-water-buffalo-skating displays as "Water Buffalo Skating"; MyOrders displays as "My Orders"; customer_demo displays as "Customer Demo".
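The three prettifier rules can be sketched as below. `display_name` is a hypothetical function name. Two details are assumptions rather than decisions of this ADR: a case boundary is taken to be a lowercase letter followed by an uppercase one, and title-casing only uppercases the first letter of each word, leaving the rest untouched.

```rust
/// Derives the display name from a project directory name.
pub fn display_name(dir_name: &str) -> String {
    // Rule 1: strip a leading `YYYYMMDD-` (eight ASCII digits, then '-').
    let bytes = dir_name.as_bytes();
    let rest = if bytes.len() > 9
        && bytes[..8].iter().all(u8::is_ascii_digit)
        && bytes[8] == b'-'
    {
        &dir_name[9..]
    } else {
        dir_name
    };

    // Rule 2: split on '-', '_', and lowercase-to-uppercase boundaries.
    let mut words: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut prev_lower = false;
    for ch in rest.chars() {
        if ch == '-' || ch == '_' {
            if !current.is_empty() {
                words.push(std::mem::take(&mut current));
            }
            prev_lower = false;
            continue;
        }
        if ch.is_uppercase() && prev_lower {
            words.push(std::mem::take(&mut current));
        }
        prev_lower = ch.is_lowercase();
        current.push(ch);
    }
    if !current.is_empty() {
        words.push(current);
    }

    // Rule 3: uppercase the first letter of each word, then join with spaces.
    words
        .iter()
        .map(|w| {
            let mut chars = w.chars();
            match chars.next() {
                Some(first) => first.to_uppercase().chain(chars).collect::<String>(),
                None => String::new(),
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

With these rules the three examples above come out as stated: "Water Buffalo Skating", "My Orders", and "Customer Demo".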

The display name is shown in the bottom status bar at all times, prefixed with Project: so it's unambiguous. This is how the user knows which project they are editing.

3. `project.yaml` shape

Flat ordered lists. Tables and columns preserve declaration order; relationships preserve creation order.

version: 1
project:
  created_at: 2026-05-07T14:30:12Z
tables:
  - name: Customers
    primary_key: [id]
    columns:
      - { name: id, type: serial }
      - { name: Name, type: text }
relationships:
  - name: Customers_id_to_Orders_CustId
    parent: { table: Customers, column: id }
    child:  { table: Orders,    column: CustId }
    on_delete: cascade
    on_update: no_action

The version: 1 field is required. Migrators (section 9) upgrade older versions on load. The project's name is not stored in project.yaml; the directory name on disk is the canonical name. Recording it twice would create an opportunity for the two to drift if the user renamed the directory by hand; with one source of truth, that question doesn't arise.

4. CSV encoding

One file per table, data/<TableName>.csv, UTF-8, RFC 4180 quoting, header row carrying column names in declaration order.

Per-type encoding:

Type	CSV form
`text`	RFC 4180 string
`int`	decimal integer
`real`	shortest-round-trip decimal
`decimal`	string form already validated by `value.rs`
`bool`	`true` / `false`
`date`	`YYYY-MM-DD`
`datetime`	ISO 8601 with `T` and a `Z` or offset
`blob`	base64 (standard alphabet, padded)
`serial`	integer
`shortid`	base58 string

NULL is the empty unquoted field; the empty quoted field ("") is an empty string. The distinction is preserved because SQL preserves it and the playground is meant to teach SQL.
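A minimal sketch of the NULL versus empty-string rule for a text cell, with RFC 4180 quoting for everything else. `encode_text_field` and `encode_row` are hypothetical names, not functions known to exist in the codebase.

```rust
/// Encodes one text cell. `None` is SQL NULL; `Some("")` is the empty string.
pub fn encode_text_field(value: Option<&str>) -> String {
    match value {
        // NULL: the empty, unquoted field.
        None => String::new(),
        // Empty string: the empty quoted field.
        Some("") => "\"\"".to_string(),
        // RFC 4180: quote fields containing a comma, quote, or line break,
        // and double any embedded quotes.
        Some(s) if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) => {
            format!("\"{}\"", s.replace('"', "\"\""))
        }
        Some(s) => s.to_string(),
    }
}

/// Joins encoded cells into one CSV record (without the line terminator).
pub fn encode_row(cells: &[Option<&str>]) -> String {
    cells
        .iter()
        .map(|c| encode_text_field(*c))
        .collect::<Vec<_>>()
        .join(",")
}
```

A row holding "1", NULL, and an empty string encodes as `1,,""`. The reading side needs a parser that reports whether a field was quoted; a parser that discards that information cannot tell the last two cells apart.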

5. `history.log` format

Append-only, one record per line, three pipe-separated fields:

2026-05-07T14:30:12Z|ok|create table Customers with pk id:serial
2026-05-07T14:30:30Z|ok|insert into Customers ('Alice')

Timestamp in ISO 8601 with Z.
Status is always ok in v1, because failed commands are not recorded — this matches ADR-0006's "successfully executed command" wording and keeps the log directly replayable. The status field is kept in the line format anyway so future use cases (audit logs that record attempts, validation diagnostics, distinguishing user-issued from imported commands) can carry additional values without a format break.
Command is the user's input as typed. Newlines (when multi-line input arrives, requirement I1) are escaped as literal \n.
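The record format can be sketched as a formatter and a parser. `HistoryRecord`, `format_record`, and `parse_record` are hypothetical names. Splitting on the first two pipes only, so that a command may itself contain a pipe, is an assumption; the ADR does not say how pipes inside commands are handled.

```rust
/// One parsed history.log record. The command is left in its escaped form.
#[derive(Debug, PartialEq)]
pub struct HistoryRecord<'a> {
    pub timestamp: &'a str,
    pub status: &'a str,
    pub command: &'a str,
}

/// Formats one record. v1 only records successful commands, so the
/// status field is always "ok". Newlines become a literal `\n`.
pub fn format_record(timestamp: &str, command: &str) -> String {
    format!("{}|ok|{}", timestamp, command.replace('\n', "\\n"))
}

/// Parses one line. Only the first two pipes are treated as separators
/// (an assumption), so everything after the second pipe is the command.
pub fn parse_record(line: &str) -> Option<HistoryRecord<'_>> {
    let mut parts = line.splitn(3, '|');
    Some(HistoryRecord {
        timestamp: parts.next()?,
        status: parts.next()?,
        command: parts.next()?,
    })
}
```

One open point this sketch does not settle: a command that contains a literal backslash followed by `n` is indistinguishable from an escaped newline unless backslashes are escaped as well.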

history.log is not included in export (see section 11 and the ADR-0007 amendment). It is private to the user's working copy.

6. Persistence ordering

A successful user command produces effects in four targets: the SQLite database, project.yaml, the relevant data/<table>.csv file(s), and history.log. INV-2 from the Phase-1 record requires that the combined db persistence logic — validation, metadata-table handling, the SQLite mutations — gate everything else.

The implementation order inside a command is:

1. Validate and stage in the database. Open a SQLite transaction. Perform validation, schema/metadata mutations, data mutations. Do not commit yet.
2. Stage text targets. Write project.yaml (if schema or relationships changed) and affected data/<table>.csv files (if rows changed) to temp files inside the project directory. Append the new line for history.log to a temp copy. fsync each.
3. Rename text targets. Atomic rename each temp file to its final path (POSIX rename(2); on Windows MoveFileEx with MOVEFILE_REPLACE_EXISTING).
4. Commit the SQLite transaction.

Failure handling:

Failure in step 1 or 2 → roll back the SQLite transaction; no rename happens; on-disk state is unchanged. Surface the failure (see section 8) and quit.
Failure in step 3 (rename fails after fsync) → roll back the SQLite transaction; orphan temp files remain in the project directory and are cleaned up on next open. On-disk semantic state is unchanged. Surface and quit.
Failure in step 4 (commit fails after rename succeeded) → rare; on next launch the on-disk text is ahead of the playground.db. The user sees stale data and runs rebuild (section 7) to recover. Documented edge case; acceptable for v1.

This ordering is "commit db last so a fatal failure leaves disk state recoverable via rebuild."
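The ordering can be sketched with the standard library alone. This is an illustration under stated assumptions, not the implementation: `DbTxn` is a hypothetical trait standing in for an open SQLite transaction, the `.tmp` suffix is an invented temp-file naming scheme, and `write_through` assumes validation and the staged database mutations have already run inside the transaction.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for an open SQLite transaction.
pub trait DbTxn {
    fn commit(self) -> io::Result<()>;
    fn rollback(self);
}

/// Stages one text target: writes the full new contents to a temp file
/// next to the final path and fsyncs it.
fn stage(final_path: &Path, contents: &[u8]) -> io::Result<PathBuf> {
    let mut tmp = final_path.as_os_str().to_owned();
    tmp.push(".tmp"); // hypothetical temp-file suffix
    let tmp = PathBuf::from(tmp);
    let mut f = File::create(&tmp)?;
    f.write_all(contents)?;
    f.sync_all()?;
    Ok(tmp)
}

/// Stages, renames, then commits. Database validation and mutations are
/// assumed to have already happened inside `txn`, uncommitted.
pub fn write_through<T: DbTxn>(txn: T, targets: &[(PathBuf, Vec<u8>)]) -> io::Result<()> {
    // Stage every text target; any failure rolls back before a rename.
    let mut staged = Vec::new();
    for (path, contents) in targets {
        match stage(path, contents) {
            Ok(tmp) => staged.push((tmp, path)),
            Err(e) => {
                txn.rollback();
                return Err(e);
            }
        }
    }
    // Rename each temp file over its final path; a failure rolls back
    // and leaves any remaining temp files behind as orphans.
    for (tmp, path) in &staged {
        if let Err(e) = fs::rename(tmp, path) {
            txn.rollback();
            return Err(e);
        }
    }
    // Commit the database last.
    txn.commit()
}
```

Every error return happens before the commit, so the database never gets ahead of the text files; the only window left is a commit failure after the renames, which is the case rebuild exists for.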

7. Load and rebuild

Load on startup or via the load command. If playground.db exists in the project directory, it is opened as-is. If it does not exist, it is rebuilt silently from project.yaml + data/<table>.csv. There is no automatic detection of drift between the database and the text sources on load; that's what rebuild is for.

--resume CLI option. Equivalent to passing the path recorded in the <data-root>/last_project state file as the positional CLI argument. If last_project is missing or points at a path that no longer exists, --resume exits with an error pointing the user at the absent project; it does not silently fall back to creating a new temp project, because the user's intent ("resume what I had") is clear and silent fallback would mask the problem. --resume and an explicit positional path are mutually exclusive; the combination errors out.

The last_project file is rewritten on every successful project open (startup, load, new, save as, import). A clean exit doesn't clear it — that's the whole point of --resume after a quit.
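The startup rules for the positional path and --resume can be sketched as one decision function. `Startup` and `resolve_startup` are hypothetical names, the error strings are placeholders rather than the application's actual messages, and trimming the trailing newline from the last_project contents is an assumption about how the single-line file is read.

```rust
use std::path::{Path, PathBuf};

/// What the application should do at startup.
#[derive(Debug, PartialEq)]
pub enum Startup {
    /// No argument: create and open a new temp project.
    NewTemp,
    /// Open the project at this path.
    Open(PathBuf),
}

/// `last_project` is the contents of <data-root>/last_project, if the
/// file could be read. `exists` reports whether a path is present.
pub fn resolve_startup(
    resume: bool,
    positional: Option<&str>,
    last_project: Option<&str>,
    exists: impl Fn(&Path) -> bool,
) -> Result<Startup, String> {
    match (resume, positional) {
        (true, Some(_)) => {
            Err("--resume and an explicit project path are mutually exclusive".to_string())
        }
        (false, Some(p)) => Ok(Startup::Open(PathBuf::from(p))),
        (false, None) => Ok(Startup::NewTemp),
        (true, None) => {
            let recorded = last_project
                .map(str::trim)
                .filter(|s| !s.is_empty())
                .ok_or_else(|| "--resume: no last_project recorded".to_string())?;
            let path = PathBuf::from(recorded);
            if exists(&path) {
                Ok(Startup::Open(path))
            } else {
                // No silent fallback to a new temp project.
                Err(format!("--resume: project no longer exists: {}", path.display()))
            }
        }
    }
}
```

The project.yaml check described in section 1 for a positional path would then apply to whichever path `Open` carries.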

CSV row-load failure during rebuild. When rebuilding playground.db from project.yaml + data/<table>.csv, each row insert can fail (malformed CSV, type-validation failure, FK violation, NOT NULL violation, etc.). The behaviour mirrors the persistence failure model (section 8): the rebuild stops at the first failing row and surfaces a fatal error of the form

Unable to load row N from data/<table>.csv into table <table>: <diagnosis from the value/FK/constraint validator>

The application then quits. There is no realistic case where a CSV produced by a previous well-behaved session contains an unloadable row; if one does, something has gone wrong (hand edit, partial git merge, file corruption) and the user should fix the file or restore an earlier copy. Continuing past the bad row would either lose data silently (skip it) or load partial state (stop but keep what loaded), both of which leave the user in a worse position than a clear error message.

rebuild app-level command. Discards the current playground.db and reconstructs it from project.yaml + data/. Always shows a confirmation prompt with a summary ("12 tables, 47 rows will be reconstructed; existing playground.db will be replaced") before doing the work. Useful when:

The user pulled new YAML/CSV from git over an old .db.
A prior persistence failure left the .db behind the text (section 6, step-4 failure mode).
The user hand-edited the YAML or CSV outside the app.

Load picker UX. The load command opens an in-TUI modal listing temp projects from the data dir, sorted newest first, with the prettified display name and creation timestamp. Arrow keys select; Enter loads; Esc cancels; pressing b (for "browse") switches the modal to a path-entry prompt for projects outside the data dir. This covers both common (pick a recent temp) and uncommon (open a named project at a custom path) cases without forcing the user into a fully manual path entry up front.

8. Failure model

Persistence failures are fatal. The application surfaces a banner with the operation, the path, and the OS error message, then quits cleanly so the banner remains visible above the shell prompt. The user investigates (disk full, permission denied, network filesystem hiccup) and restarts.

This is the right model because the realistic failure modes for a local data directory do not heal transiently. Showing a warning and continuing risks silent loss when the user later quits the app while the failure window is still open.

The persistence ordering in section 6 ensures that "fatal failure → quit" never leaves the disk in a state that cannot be recovered: it is either unchanged (the common case) or recoverable via rebuild (the rare step-4 failure).

The "quit on failure" mode is also not anticipated to be particularly disruptive in practice. Even if a transient issue (a network drive timing out, an antivirus scanner holding a file briefly) does cause a fatal failure, the user's path back into the session is just rdbms-playground --resume. With section 6's ordering guaranteeing recoverable disk state and --resume guaranteeing one-command return, the cost of erring on the side of "stop and let the user investigate" is small enough that the safety benefit dominates.

9. Migration framework (F3)

project.yaml carries version: 1 from the outset. Future format changes bump the version and add a registered migrator function:

fn migrate_v1_to_v2(raw: &mut RawProject) -> Result<(), MigrateError> { ... }

Migrators are stored in an ordered list keyed by source version. On load, the application:

Reads the file's version.
If version < latest_known, copies the original file to project.yaml.v<N>.bak (where <N> is the original version).
Runs each migrator in sequence, starting with the one keyed by the file's version and continuing until the project reaches latest_known.
Writes the upgraded YAML back at the new version.
If any migrator fails, restores the .bak and surfaces the failure as a fatal load error.

The framework is built in v1 even though no migrator exists yet. The first real migrator (when v2 lands) exercises it.
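A minimal in-memory sketch of the migrator chain follows. The `RawProject` fields, the `MigrateError` shape, and the `migrate` function are assumptions for illustration; only the migrator signature comes from this ADR. The on-disk .bak copy and the write-back are represented here by an in-memory clone.

```rust
/// Hypothetical shape: the real RawProject is whatever the YAML parses into.
#[derive(Clone, Debug, PartialEq)]
pub struct RawProject {
    pub version: u32,
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub struct MigrateError(pub String);

pub type Migrator = fn(&mut RawProject) -> Result<(), MigrateError>;

/// `migrators[i]` is keyed by source version `i + 1` and upgrades to
/// version `i + 2`, so latest_known is `migrators.len() + 1`.
pub fn migrate(raw: &mut RawProject, migrators: &[Migrator]) -> Result<(), MigrateError> {
    let latest_known = migrators.len() as u32 + 1;
    if raw.version == 0 || raw.version > latest_known {
        return Err(MigrateError(format!("unsupported version {}", raw.version)));
    }
    // Stands in for copying the file to project.yaml.v<N>.bak.
    let backup = raw.clone();
    while raw.version < latest_known {
        let step = migrators[(raw.version - 1) as usize];
        if let Err(e) = step(raw) {
            // Restore the original and report a fatal load error.
            *raw = backup;
            return Err(e);
        }
        raw.version += 1;
    }
    Ok(())
}
```

With an empty migrator list, latest_known is 1 and a version-1 project passes through untouched, which is the v1 situation described above.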

10. Concurrency

A lock file <project>/.rdbms-playground.lock is written when a project is opened, containing the PID and hostname of the owning process. On open:

If no lock file exists: take the lock and proceed.
If a lock file exists with a live PID on this host: refuse with a friendly error pointing the user at the running instance.
If a lock file exists but the PID is dead (or it lists a different hostname): take the lock (clean handover from a crashed prior instance).

The lock is removed on clean exit. Crashes leave it behind; the next open reclaims it.
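The three open-time cases reduce to a small decision function. `LockInfo`, `LockDecision`, and `decide` are hypothetical names. How the lock file is serialised and how PID liveness is probed are not specified here, so the liveness check is passed in as a closure.

```rust
/// Parsed contents of an existing lock file.
#[derive(Debug, PartialEq)]
pub struct LockInfo {
    pub pid: u32,
    pub hostname: String,
}

#[derive(Debug, PartialEq)]
pub enum LockDecision {
    /// Write our own lock file and proceed.
    Take,
    /// Another live instance on this host owns the project.
    Refuse { running_pid: u32 },
}

/// `pid_alive` is a caller-supplied liveness probe (platform-specific).
pub fn decide(
    existing: Option<&LockInfo>,
    this_host: &str,
    pid_alive: impl Fn(u32) -> bool,
) -> LockDecision {
    match existing {
        // No lock file: take the lock.
        None => LockDecision::Take,
        // Same host and the PID is alive: refuse.
        Some(lock) if lock.hostname == this_host && pid_alive(lock.pid) => {
            LockDecision::Refuse { running_pid: lock.pid }
        }
        // Dead PID, or a different hostname: reclaim the lock.
        Some(_) => LockDecision::Take,
    }
}
```

The different-hostname branch is where a shared network filesystem can hand the lock to a second machine while the first is still running, since liveness is only ever checked for PIDs on the local host.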

The lock blocks only other rdbms-playground TUI instances. External read-only tooling (sqlite3 playground.db -readonly, text editors looking at project.yaml, etc.) is not prevented. The user is on their own if they fiddle with the project files concurrently with the running app — that's a power-user workflow we don't get in the way of.

11. App-level commands

The track 2 command set, all available in both modes per ADR-0003:

save — for a temp project, prompts for a target directory and elevates to a named project (effectively identical to save as). For a named project, reports "auto-saved; use save as to copy to a new location."
save as — prompts for a target directory; copies the entire project there and switches to operating on the copy.
load — opens the load picker (section 7).
new — creates a fresh temp project; closes the current one cleanly first (auto-save guarantees the current state is on disk).
rebuild — section 7.
export — produces a zip per ADR-0007, excluding both playground.db and history.log (see ADR-0007 amendment below). Default filename pattern unchanged.
import — accepts an exported zip, unpacks it into a named project at a chosen location, runs rebuild on open. The exported zip has no playground.db and no history.log, so a fresh playground.db is created from YAML+CSV, and history.log starts empty. The chosen target directory must not already exist (per the §2 collision rule); the user picks a different name or removes the existing directory first.

The .gitignore template (F2) is created in every new project directory and excludes:

/playground.db
/.rdbms-playground.lock
/project.yaml.v*.bak

playground.db is rebuildable; the lock file is per-process; migration backup files are local recovery aids that don't belong in shared history. The data/ directory and project.yaml itself are not ignored — they are the shared source of truth.

history.log is not ignored by default. Whether to commit one's working log is a per-user, per-project taste question — some learners will treat the log as part of the audit trail and want it in git; others will prefer to keep it private. The export zip handles the "share with strangers" case (ADR-0007 amendment 1); committing to git is a different decision and we leave it to the user.

12. Persistent input history (I2-persist)

The in-memory navigable input history (Up/Down arrows, draft preservation, consecutive-duplicate dedup) gains a loader: on project open, the history navigation seed is populated from the project's history.log (latest N entries, where N is the same in-memory cap as today). New successful commands append to history.log and are pushed onto the in-memory stack as they are now.
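Seeding the navigation history from history.log can be sketched as below. `seed_input_history` is a hypothetical name. The function assumes the three-field line format from section 5, splits on the first two pipes only, and reverses the newline escaping; whether the existing consecutive-duplicate dedup also applies to the seed is left open here.

```rust
/// Returns the latest `cap` commands from the log text, oldest first,
/// with escaped newlines restored.
pub fn seed_input_history(log: &str, cap: usize) -> Vec<String> {
    let commands: Vec<String> = log
        .lines()
        // Keep everything after the second pipe; skip malformed lines.
        .filter_map(|line| line.splitn(3, '|').nth(2))
        .map(|cmd| cmd.replace("\\n", "\n"))
        .collect();
    // Keep only the tail of the log, up to the in-memory cap.
    let start = commands.len().saturating_sub(cap);
    commands[start..].to_vec()
}
```

Reading the whole log to keep only the tail is fine at teaching scale; a long-lived project might prefer reading the file backwards from the end.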

Project-scoped only. A separate global rolling history is deferred to a future ADR (OOS-6).

13. Out of scope

The following are tracked but not part of this ADR:

OOS-1. Snapshot ring buffer and undo (U1, U2, ADR-0006).
OOS-2. replay command (U4). The history.log format is replay-compatible; the command itself ships later.
OOS-3. Multi-tab output / V4 session log work.
OOS-4. Tab completion or syntax highlighting for the new commands' arguments.
OOS-5. L2 (submitting a command alongside project load).
OOS-6. Global rolling input history.

Relationship to earlier ADRs

This ADR amends two earlier ADRs in place rather than superseding them outright; the earlier ADRs remain the canonical reference for everything outside the amended clauses.

ADR-0004 — Project file format. The "playground.db is a derived artifact" framing remains correct for recovery (the database can be reconstructed from text sources at any time). It does not describe runtime data flow: at write time, all four targets (db, yaml, csv, history.log) share a single source — the user's command — and are written alongside one another per section 6 here. The "rebuild with confirmation when .db exists" semantics are reframed: there is no automatic drift detection on load; the rebuild path is the explicit rebuild command, which prompts for confirmation when invoked.
ADR-0007 — Sharing and export. The export contents are now project.yaml + data/, excluding both playground.db (as before) and history.log (new). Rationale: the history is the user's working log and may contain commands they don't want to share. Export remains zip-based; default filename pattern is unchanged.

The amendments are made in place in those ADR files, with a note pointing to this ADR.

Consequences

The biggest UX gap closes: quitting no longer loses work.
A failed command leaves the disk unchanged. A successful command is durable on disk before the application acknowledges it, with one documented edge case (a failed SQLite commit after the text files were renamed) that the rebuild command exists to fix.
The persistence path runs four file writes per command in the common case. At teaching scale this is invisible; at bulk-insert scale (thousands of rows in tight loops) it could matter, and a future "batch" command will be the remedy. Premature debouncing is rejected (it would create a real inconsistency window for negligible gain at this scale).
The "commit db last" ordering is the load-bearing invariant for failure recovery. Future contributors changing the persistence flow must preserve it.
The display-name prettifier is small and lives close to the project loader; future filename conventions (instructor-supplied lesson kits, perhaps) plug into it.
The lock file is a small piece of state that survives crashes; the "live PID on this host" check is the load-bearing piece of its correctness. Cross-host network filesystems will give us false positives there; we accept that and document it if real users hit it.
history.log becomes the persistent history surface. Once replay (OOS-2) and undo (OOS-1) land, they read from the same file with no schema changes.
