BenzAuto — Scheduler / bay-management decision brief (#403)

1 · Evidence from staging (org 0001) — grounds the whole decision

Four facts pulled from the live database. They rule out "just add more capacity" and point straight at engine logic as the failure.

A · Bays are specialised, and the engine ignores it. Bay 1 & 2 = General Service, Bay 3 = Tyre & Alignment (a stray "ZZ APA Bay 9" is test junk, to clean up). The bays table already has a bay_type column, but the engine picks bays.find(b => !busy); bay_type is never read. A tyre job can land in a general bay and vice versa.
B · Durations vary hugely, including multi-day. 30 real appointments ranged from 45 min to ~73 hours (avg ~239 min). Catalog estimates span 30 min to 240 min (8×). So a single fixed estimated_duration_min is wrong; we need a min/max range + buffer, and a multi-day path.
C · It is NOT a capacity problem. Busiest recent day = 4 appointments across 4 bays; most days 1–3. Plenty of slack, yet customers were told "no slots." The failure is engine logic (stop-at-3 + no time-of-day + ignored bay_type), not load.
D · Policy hooks are half-built. service_items.max_per_day and booking_windows exist on every row but sit null. The code already enforces both when present; nothing populates them yet.
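
Fact A can be illustrated with a minimal sketch. The Bay shape, the bay type values, and both function names are assumptions made for illustration; the only confirmed details are that bays carry a bay_type and that the engine takes the first idle bay.

```typescript
// Hypothetical shapes: only the existence of bay_type is confirmed by the brief.
type BayType = "general_service" | "tyre_alignment";

interface Bay {
  name: string;
  bayType: BayType;
  busy: boolean;
}

// Behaviour as described above: first idle bay wins, bay_type is never read.
function pickBayIgnoringType(bays: Bay[]): Bay | undefined {
  return bays.find((b) => !b.busy);
}

// Sketch of the fix: only consider idle bays whose type matches the job.
function pickBayForJob(bays: Bay[], required: BayType): Bay | undefined {
  return bays.find((b) => !b.busy && b.bayType === required);
}

const bays: Bay[] = [
  { name: "Bay 1", bayType: "general_service", busy: false },
  { name: "Bay 2", bayType: "general_service", busy: false },
  { name: "Bay 3", bayType: "tyre_alignment", busy: false },
];

// A tyre job under each rule.
const naive = pickBayIgnoringType(bays);               // Bay 1, a general bay
const matched = pickBayForJob(bays, "tyre_alignment"); // Bay 3
```

With matching on, a tyre job gets no bay at all when Bay 3 is busy, which is the honest answer the current engine cannot give.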

2 · Current engine gaps — what #403 must fix

The whole "do we have a slot?" decision lives in one function (packages/db/src/services/slot-proposal.ts). Here is everything it gets wrong.

G1 · Stops at the first 3 slots from the earliest open time: no spread across the day, no "later", no paging.
G2 · No time-of-day input. The tool takes only a date, so a customer can't ask for "afternoon" or "2pm".
G3 · Ignores bay_type: no equipment matching (tyre job ≠ tyre bay).
G4 · Fixed per-service duration: no min/max range, no cleanup buffer, no multi-day jobs.
G5 · No capacity policy: no reserved fast-track bay / hours for walk-ins, no per-class daily caps.
G6 · Chatbot side (already shipped in #401, listed for completeness): now time-aware and fabrication-guarded. The bot knows the date and can't invent a slot.

3 · The four candidate engines

Same problem (bays × time × job length, constrained by hours and policy), four shapes of answer. Full flow sketches live in the options diagram.

A · Patch the 30-min slot grid

Keep today's forward-walking loop but collect all fitting slots (not the first 3), add a time-of-day filter, apply reserve/caps as post-filters, match bay_type.
Effort: S
Pros: smallest change; closes the live "afternoon" / "next week" failures on today's engine.
Cons: reserve/caps are bolt-ons; still can't model variable duration cleanly.
Ceiling: still a rigid grid.
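
A minimal sketch of what option A amounts to, under assumptions: times are minutes since midnight, one bay is considered, and "afternoon" is taken to mean a start at or after 12:00. All names are illustrative and not taken from slot-proposal.ts.

```typescript
// All times are minutes since midnight; every name here is illustrative.
interface Busy { start: number; end: number }
type DayPart = "morning" | "afternoon" | "any";

const GRID_MIN = 30;

function overlaps(aStart: number, aEnd: number, b: Busy): boolean {
  return aStart < b.end && b.start < aEnd;
}

// Walk the whole day on the 30-min grid and keep every start that fits,
// instead of stopping after the first three.
function allFittingStarts(
  open: number,
  close: number,
  durationMin: number,
  busy: Busy[],
  part: DayPart = "any",
): number[] {
  const starts: number[] = [];
  for (let t = open; t + durationMin <= close; t += GRID_MIN) {
    const free = !busy.some((b) => overlaps(t, t + durationMin, b));
    // Assumed split: morning starts before 12:00, afternoon from 12:00 on.
    const inPart =
      part === "any" || (part === "morning" ? t < 12 * 60 : t >= 12 * 60);
    if (free && inPart) starts.push(t);
  }
  return starts;
}

// One bay, open 09:00-17:00, already booked 10:00-11:00; a 60-min job.
const booked: Busy[] = [{ start: 10 * 60, end: 11 * 60 }];
const afternoon = allFittingStarts(9 * 60, 17 * 60, 60, booked, "afternoon");
const wholeDay = allFittingStarts(9 * 60, 17 * 60, 60, booked);
```

Reserve and cap rules would still be applied as post-filters on the returned list, which is the bolt-on quality noted under Cons.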

B · Interval capacity planner (rule-driven) [recommended]

Model each bay as a timeline of free intervals. Place a job of duration[min,max] + buffer into an equipment-matched bay, then run a per-tenant policy layer (reserve-bays, reserve-hours, caps). Emit slots or windows, staff-confirm.
Effort: M
Pros: real per-bay scheduling; variable duration + buffer + equipment; clean policy layer; explainable; stays in Bun/TypeScript.
Cons: more code than A; we own the interval-fitting logic.
The recommended middle ground.
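
The interval-fitting core of option B might look roughly like the sketch below. It assumes the job blocks a bay for its maximum duration plus buffer, that times are minutes since midnight, and that the policy layer runs afterwards on the returned windows. Every type and function name is hypothetical.

```typescript
// Minutes since midnight; all names are illustrative assumptions.
interface Interval { start: number; end: number }
interface BayTimeline { bay: string; bayType: string; free: Interval[] }
interface Job { bayType: string; minMin: number; maxMin: number; bufferMin: number }
interface StartWindow { bay: string; earliestStart: number; latestStart: number }

// Assumption: a job blocks the bay for its worst-case length plus cleanup buffer.
function blockLength(job: Job): number {
  return job.maxMin + job.bufferMin;
}

// For each equipment-matched bay, turn every free interval that is long
// enough into a window of feasible start times.
function proposeWindows(timelines: BayTimeline[], job: Job): StartWindow[] {
  const need = blockLength(job);
  const out: StartWindow[] = [];
  for (const t of timelines) {
    if (t.bayType !== job.bayType) continue;
    for (const f of t.free) {
      if (f.end - f.start >= need) {
        out.push({ bay: t.bay, earliestStart: f.start, latestStart: f.end - need });
      }
    }
  }
  return out.sort((a, b) => a.earliestStart - b.earliestStart);
}

const timelines: BayTimeline[] = [
  { bay: "Bay 1", bayType: "general", free: [{ start: 540, end: 1020 }] },
  { bay: "Bay 3", bayType: "tyre", free: [{ start: 540, end: 600 }, { start: 720, end: 1020 }] },
];
const tyreJob: Job = { bayType: "tyre", minMin: 45, maxMin: 90, bufferMin: 15 };

// Needs 105 min: the 60-min morning gap in Bay 3 is rejected, the afternoon one fits.
const windows = proposeWindows(timelines, tyreJob);
```

Returning a start window instead of grid points is what lets the same function feed both "slots" and "windows" booking modes, and each rejected interval gives a concrete reason for "why no slot".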

C · Constraint solver (OR-Tools CP-SAT)

Jobs = time intervals, bays = resources, policy = constraints; a solver finds a feasible/optimal assignment. Generalises to technicians and skills. Likely a separate Python service.
Effort: L
Pros: most powerful, future-proof; provably optimal.
Cons: separate runtime; heavy ops; added latency; hard to explain "why no slot" to a customer.
Biggest jump; hold in reserve.

D · Drop-off / day-bucket capacity

No exact start times. Sell capacity per day or per AM/PM by job class (light / standard / heavy), capped, with headroom held back for walk-ins. The Tekmetric / Shopmonkey model.
Effort: S–M
Pros: dead-simple "leave the car" UX; robust to estimate error; easy caps + walk-in headroom.
Cons: no exact start time; wrong for customers who want "2pm sharp".
Best fit for drop-off shops.
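
A bucket check for option D could be as small as the sketch below. The policy shape, the per-class caps, and the way walk-in headroom is subtracted from total places are all assumptions for illustration; the brief fixes only the ideas of job classes, caps, and held-back headroom.

```typescript
type JobClass = "light" | "standard" | "heavy";
type Half = "AM" | "PM";

// Hypothetical policy shape; the numbers used below are illustrative only.
interface BucketPolicy {
  capPerClass: Record<JobClass, number>; // max jobs of each class per half-day
  totalPlaces: number;                   // places per half-day
  walkInHeadroom: number;                // places never sold online
}

interface Booking { half: Half; jobClass: JobClass }

// An online booking is accepted only if the class cap has room AND the
// half-day still has places left after holding back walk-in headroom.
function canBook(
  policy: BucketPolicy,
  existing: Booking[],
  half: Half,
  jobClass: JobClass,
): boolean {
  const sameHalf = existing.filter((b) => b.half === half);
  const sameClass = sameHalf.filter((b) => b.jobClass === jobClass).length;
  const onlineLimit = policy.totalPlaces - policy.walkInHeadroom;
  return sameClass < policy.capPerClass[jobClass] && sameHalf.length < onlineLimit;
}

const policy: BucketPolicy = {
  capPerClass: { light: 3, standard: 2, heavy: 1 },
  totalPlaces: 4,
  walkInHeadroom: 1,
};
const existing: Booking[] = [
  { half: "AM", jobClass: "heavy" },
  { half: "AM", jobClass: "light" },
];

const secondHeavyAm = canBook(policy, existing, "AM", "heavy"); // heavy cap already used
const anotherLightAm = canBook(policy, existing, "AM", "light");
const heavyPm = canBook(policy, existing, "PM", "heavy");
```

No start times appear anywhere in the check, which is why this model tolerates estimate error so well and also why it cannot serve a "2pm sharp" request.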

Option	Model	Booking UX	Effort	Best for
A	Patched fixed 30-min grid	Exact slots, spread + time-of-day filter	S	Shipping a fix now on today's engine
B	Interval capacity planner (rules)	Offer windows / slots, staff-confirm	M	Realistic per-bay scheduling, in our stack
C	CP-SAT constraint solver	Optimal slots / windows	L	Long term: technicians, skills, optimisation
D	Day / AM-PM capacity buckets	"Leave the car", no exact time	S–M	Drop-off shops

4 · Per-tenant policy schema — the real "bay management"

Every option reads the same per-org configuration. A treats reserve/caps as post-filters; B/C make them first-class; D leans on caps + reserve. One shape keeps workshops portable across whichever engine wins.

bayManagement: {
  bays:     [ { name, bayType/equipment[], active } ]          # bay_type EXISTS today
  services: # per service → { class: from service_type, duration:{min,max}, buffer, multiDay? }
  reserve:  { keepBaysOpen: 1, reserveHoursPct: 0, releaseAfter: "15:00" }  # walk-in protection
  caps:     { heavyPerDay, majorPerDay }                       # max_per_day exists, unused
  bookingMode: slots | windows | dropoff
  requireBayEquipment: true                                       # match job → bay_type
}

Grilling note: schema fights the data model. Bays and services are already first-class tables (Bay has bayType + a GiST no-double-book constraint; ServiceItem has serviceType, estimatedDurationMin, maxPerDay, bookingWindows). Storing them again inside a settings blob shadows the source of truth and drops the foreign keys. Proposed split when we lock the schema:
Bay table → equipment/active
ServiceItem table → class, duration min/max, buffer, multi-day, cap
Organization.settings.bayManagement → reserve, bookingMode, requireBayEquipment, default caps only
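
The proposed split can be sketched as TypeScript types. Field names beyond those quoted in the brief (durationMin, durationMax, bufferMin, multiDay, defaultCaps) are assumptions, as is the rule that a null per-service cap falls back to the org default.

```typescript
// Illustrative types only; the real tables may name these columns differently.
interface BayRow {
  name: string;
  bayType: string;
  active: boolean;
}

interface ServiceItemRow {
  serviceType: string;      // source of the job class
  durationMin: number;      // assumed replacement for estimatedDurationMin
  durationMax: number;
  bufferMin: number;
  multiDay: boolean;
  maxPerDay: number | null; // null = no per-service cap set
}

// Only org-wide policy stays in Organization.settings.bayManagement.
interface BayManagementSettings {
  reserve: { keepBaysOpen: number; reserveHoursPct: number; releaseAfter: string };
  bookingMode: "slots" | "windows" | "dropoff";
  requireBayEquipment: boolean;
  defaultCaps: { heavyPerDay: number; majorPerDay: number };
}

// Assumed precedence: a per-service cap wins when set, else the org default.
function effectiveCap(item: ServiceItemRow, orgDefault: number): number {
  return item.maxPerDay ?? orgDefault;
}

const settings: BayManagementSettings = {
  reserve: { keepBaysOpen: 1, reserveHoursPct: 0, releaseAfter: "15:00" },
  bookingMode: "windows",
  requireBayEquipment: true,
  defaultCaps: { heavyPerDay: 2, majorPerDay: 1 }, // illustrative numbers
};

const engineRebuild: ServiceItemRow = {
  serviceType: "heavy", durationMin: 240, durationMax: 480,
  bufferMin: 30, multiDay: true, maxPerDay: null,
};
const capForRebuild = effectiveCap(engineRebuild, settings.defaultCaps.heavyPerDay);
```

Keeping per-bay and per-service facts on their own rows preserves the foreign keys, and the settings blob shrinks to values that have no natural table home.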

5 · The decision, on three axes

The four options aren't a ranking — they're points on three independent axes. Pick a position on each and the option falls out.

Exact start time ↔ Day / AM-PM bucket: A and C anchor on exact times · B offers windows · D drops start times entirely

Rule-based ↔ Constraint solver: A, B, D are rule-driven · C is the only constraint solver

Bays only ↔ Schedule people too: A, B, D schedule bays × time · only C adds technicians & skills

6 · Recommendation going in (to be grilled, not assumed)

Recommendation: B (interval planner) behind the per-tenant policy layer, with bay_type matching and min/max + buffer durations. D's drop-off bucket as a per-tenant bookingMode for "leave the car" shops. A as an optional fast stabilisation if the live failures must be fixed this week. C held in reserve until technician/skill scheduling is a real requirement.

NOW · A unblocks the live failures: cheapest path to "afternoon" and "next week" working, behind the same tool contract. Skip it if we commit straight to B.
NEXT · B is the recommended middle ground: real per-bay scheduling, first-class policy, no new runtime. The natural home for the config block.
LATER · C only if constraints get rich (technicians, skills, optimisation); D for any tenant that genuinely runs drop-off.

7 · Open questions to grill (some answered by the data)

Q1 · Bays interchangeable? → No (Bay 3 = Tyre & Alignment). Confirm the full equipment map with KL Chan + clean up "ZZ APA Bay 9".
Q2 · Durations fixed? → No, 45 min to multi-day. Who sets the estimate: service default or staff at intake? How does a multi-day job hold a bay across days?
Q3 · Light vs heavy known up front? service_type gives a hint; is the real length known at booking, or only after a diagnosis? (→ book Inspection first, schedule the heavy work after.)
Q4 · Walk-ins: do they enter via the bot at all, or is "protected capacity" really about not letting online bookings eat the whole floor?
Q5 · Technician scheduling (skills, shifts)? This is the one thing that would justify C over B.
Q6 · Timed bays vs drop-off: which do the target shops actually run, or both (→ per-tenant bookingMode)?