Unexpected maths aggregating with decimals #32

Closed
opened 2026-06-12 14:12:21 +01:00 by oli · 1 comment
Owner

This log:

2026-06-12T09:44:38Z|ok|create table Customers with pk
2026-06-12T09:44:52Z|ok|add column Customers: name (text)
2026-06-12T09:44:58Z|ok|seed Customers
2026-06-12T09:45:22Z|ok|create table Products with pk
2026-06-12T09:45:32Z|ok|add column Products : name(text)
2026-06-12T09:45:46Z|ok|add column Products : price(decimal)
2026-06-12T09:45:52Z|ok|seed Products
2026-06-12T09:46:23Z|ok|show table Products
2026-06-12T09:46:27Z|ok|show data Products
2026-06-12T09:46:35Z|ok|show data Products
2026-06-12T09:47:05Z|ok|create table OrderLines with pk
2026-06-12T09:47:30Z|ok|create table Orders with pk
2026-06-12T09:47:44Z|ok|add column Orders: date(date)
2026-06-12T09:49:32Z|ok|add 1:n relationship from Customers.id to Orders.customer_id --create-fk
2026-06-12T09:52:00Z|ok|add 1:n relationship from Orders.id to OrderLines.order_id --create-fk
2026-06-12T09:54:33Z|ok|add 1:n relationship from  Products.id to OrderLines.product_id --create-fk
2026-06-12T10:01:42Z|ok|seed Orders 10
2026-06-12T10:02:05Z|ok|seed OrderLines 10
2026-06-12T10:03:03Z|ok|add column OrderLines: count(int)
2026-06-12T10:06:41Z|ok|seed OrderLines.count
2026-06-12T10:10:03Z|ok|select c.name as customer_name, o.id as order_id, o.date, sum(ol.count*p.price) as total from Orders o join OrderLines ol on o.id=ol.order_id join Products p on p.id=ol.product_id join Customers c on c.id=o.customer_id group by o.customer_id

Why does sum(ol.count * p.price) show rounding errors? Should I expect that?

[advanced] select c.name as customer_name, o.id as order_id, o.date, sum(ol.count*p.price) as total from Orders o join OrderLines ol on o.id=ol.order_id join Products p on p.id=ol.product_id join Customers c on c.id=o.customer_id group by o.customer_id ✓
[system] ┌───────────────────┬──────────┬────────────┬────────────────────┐
[system] │ customer_name     │ order_id │ date       │ total              │
[system] ├───────────────────┼──────────┼────────────┼────────────────────┤
[system] │ Myra Wiza         │        6 │ 2023-01-24 │ 44868.719999999994 │
[system] │ Dortha Wolff      │        4 │ 2022-07-18 │ 13992.32           │
[system] │ Isidro Gibson     │        9 │ 2023-01-05 │ 2601.56            │
[system] │ Danielle McClure  │        2 │ 2024-04-20 │ 54507.649999999994 │
[system] │ Jarvis Windler    │        7 │ 2022-07-22 │ 23332.2            │
[system] │ Mckayla Rosenbaum │        3 │ 2024-05-06 │ 5371.16            │
[system] └───────────────────┴──────────┴────────────┴────────────────────┘
This log: ``` 2026-06-12T09:44:38Z|ok|create table Customers with pk 2026-06-12T09:44:52Z|ok|add column Customers: name (text) 2026-06-12T09:44:58Z|ok|seed Customers 2026-06-12T09:45:22Z|ok|create table Products with pk 2026-06-12T09:45:32Z|ok|add column Products : name(text) 2026-06-12T09:45:46Z|ok|add column Products : price(decimal) 2026-06-12T09:45:52Z|ok|seed Products 2026-06-12T09:46:23Z|ok|show table Products 2026-06-12T09:46:27Z|ok|show data Products 2026-06-12T09:46:35Z|ok|show data Products 2026-06-12T09:47:05Z|ok|create table OrderLines with pk 2026-06-12T09:47:30Z|ok|create table Orders with pk 2026-06-12T09:47:44Z|ok|add column Orders: date(date) 2026-06-12T09:49:32Z|ok|add 1:n relationship from Customers.id to Orders.customer_id --create-fk 2026-06-12T09:52:00Z|ok|add 1:n relationship from Orders.id to OrderLines.order_id --create-fk 2026-06-12T09:54:33Z|ok|add 1:n relationship from Products.id to OrderLines.product_id --create-fk 2026-06-12T10:01:42Z|ok|seed Orders 10 2026-06-12T10:02:05Z|ok|seed OrderLines 10 2026-06-12T10:03:03Z|ok|add column OrderLines: count(int) 2026-06-12T10:06:41Z|ok|seed OrderLines.count 2026-06-12T10:10:03Z|ok|select c.name as customer_name, o.id as order_id, o.date, sum(ol.count*p.price) as total from Orders o join OrderLines ol on o.id=ol.order_id join Products p on p.id=ol.product_id join Customers c on c.id=o.customer_id group by o.customer_id ``` Why does `sum(ol.count * p.price)` show rounding errors? Should I expect that? ``` [advanced] select c.name as customer_name, o.id as order_id, o.date, sum(ol.count*p.price) as total from Orders o join OrderLines ol on o.id=ol.order_id join Products p on p.id=ol.product_id join Customers c on c.id=o.customer_id group by o.customer_id ✓ [system] ┌───────────────────┬──────────┬────────────┬────────────────────┐ [system] │ customer_name │ order_id │ date │ total │ [system] ├───────────────────┼──────────┼────────────┼────────────────────┤ [system] │ Myra Wiza │ 6 │ 2023-01-24 │ 44868.719999999994 │ [system] │ Dortha Wolff │ 4 │ 2022-07-18 │ 13992.32 │ [system] │ Isidro Gibson │ 9 │ 2023-01-05 │ 2601.56 │ [system] │ Danielle McClure │ 2 │ 2024-04-20 │ 54507.649999999994 │ [system] │ Jarvis Windler │ 7 │ 2022-07-22 │ 23332.2 │ [system] │ Mckayla Rosenbaum │ 3 │ 2024-05-06 │ 5371.16 │ [system] └───────────────────┴──────────┴────────────┴────────────────────┘ ```
Collaborator

Fixed in 3d4a0fd.

Why it happens: SQLite has no native decimal/BCD type — its only storage classes are NULL/INTEGER/REAL/TEXT/BLOB (NUMERIC is an affinity, not a type). We store decimal as exact TEXT, so raw decimal columns round-trip perfectly (100.10 keeps its trailing zero). But SQLite's arithmetic operators and aggregates only work on INTEGER/REAL, so ol.count * p.price implicitly coerces the TEXT to an IEEE-754 double, and the computed result carries no playground type — so sum(...) rendered the double's full noise (298.59999999999997 for 298.60). So: no, you shouldn't expect to see that — it's a display artifact, now trimmed.

Fix: floating-point values are rounded to 15 significant figures for display (a double's reliable precision) and then shown in their shortest round-tripping form — 298.59999999999997298.6, 0.1 + 0.20.3. Wired into format_cell (result-set / show data cells) only, since that's the only surface where arithmetic noise surfaces.

Deliberately left exact (the boundary is semantic, not cosmetic):

  • CSV persistence keeps the exact shortest round-trip, so a stored real survives save/load byte-for-byte.
  • render_value is a canonical identity key for the uniqueness dry-runs (dry_run_unique, check_uniqueness_collisions) — rounding it would report a UNIQUE collision the exact-valued engine would actually accept. (Caught in review; render_value stays full-precision with a regression test.)
  • FK-key matching and EXPLAIN-SQL literals likewise keep full precision.

Raw decimal columns are untouched — TEXT, rendered verbatim including trailing zeros.

Not pursued: exact decimal arithmetic (a SQLite extension exposing decimal_mul/decimal_sum) would mean rewriting your standard-SQL operators into function calls, defeating both the "validated SQL runs verbatim" model and the goal of teaching ordinary SQL. Display-fix is the pragmatic answer.

Decision recorded in ADR-0005 Amendment 1. +7 tests; full suite green (2424 pass, 1 ignored).

Fixed in `3d4a0fd`. **Why it happens:** SQLite has no native decimal/BCD type — its only storage classes are NULL/INTEGER/REAL/TEXT/BLOB (`NUMERIC` is an *affinity*, not a type). We store `decimal` as exact **TEXT**, so raw decimal columns round-trip perfectly (`100.10` keeps its trailing zero). But SQLite's arithmetic operators and aggregates only work on INTEGER/REAL, so `ol.count * p.price` implicitly coerces the TEXT to an IEEE-754 **double**, and the computed result carries no playground type — so `sum(...)` rendered the double's full noise (`298.59999999999997` for `298.60`). So: no, you shouldn't expect to *see* that — it's a display artifact, now trimmed. **Fix:** floating-point values are rounded to **15 significant figures for display** (a double's reliable precision) and then shown in their shortest round-tripping form — `298.59999999999997` → `298.6`, `0.1 + 0.2` → `0.3`. Wired into `format_cell` (result-set / `show data` cells) only, since that's the only surface where arithmetic noise surfaces. **Deliberately left exact** (the boundary is semantic, not cosmetic): - **CSV persistence** keeps the exact shortest round-trip, so a stored `real` survives save/load byte-for-byte. - **`render_value`** is a *canonical identity key* for the uniqueness dry-runs (`dry_run_unique`, `check_uniqueness_collisions`) — rounding it would report a UNIQUE collision the exact-valued engine would actually accept. (Caught in review; `render_value` stays full-precision with a regression test.) - **FK-key matching** and **EXPLAIN-SQL literals** likewise keep full precision. Raw `decimal` columns are untouched — TEXT, rendered verbatim including trailing zeros. **Not pursued:** exact decimal *arithmetic* (a SQLite extension exposing `decimal_mul`/`decimal_sum`) would mean rewriting your standard-SQL operators into function calls, defeating both the "validated SQL runs verbatim" model and the goal of teaching ordinary SQL. Display-fix is the pragmatic answer. Decision recorded in **ADR-0005 Amendment 1**. +7 tests; full suite green (2424 pass, 1 ignored).
Sign in to join this conversation.