Digital agency portal: end-to-end delivery with AI in the loop

Problem

Like most digital agencies past a certain size, the studio was running on stitched-together tools: Notion for some things, Slack threads for others, email for proposals, a shared Drive for assets, and manual handoffs between admin and PM. Inquiries got lost in the seam between 'admin received it' and 'PM owns it'. Scopes drifted between the discovery call and the contract because nobody was holding the canonical version. There was no audit trail when something went sideways, so disputes came down to 'whose memory wins'. They needed one place where the client journey lived end-to-end, and they wanted AI threaded in where it actually saved time, not as an afterthought.

Approach

Built one portal around a state machine, then layered AI into the joints where humans were paying the highest tax. The inquiry funnel is 12 explicit states with hard transitions: no implicit status, no admins typing free-form notes that might or might not mean 'approved'. Every mutating route writes to a compliance-grade audit log with a per-table safe-restore allowlist; the cancellation flow keeps the row, deletes tasks, and issues a credit ledger entry rather than hard-deleting. AI is wired in at three points. The intake wizard uses Claude to adaptively pick the next question and generate a brand brief from the client's website, so the PM walks into the discovery call with context, not blind. A daily security-watch cron sweeps the audit log for anomalies (mass-signup spikes, DELETE bursts, stuck inquiries, classifier blackouts) and emails admin if anything trips. Per-task AI briefs auto-generate from the scope so designers don't read the whole project to start a single deliverable. AI never auto-publishes, never auto-approves, and never sends client-facing communication without human review.
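
To make one of the security-watch checks concrete, here is a minimal sketch of a DELETE-burst detector over audit rows. It is an illustration under assumptions, not the production cron: the AuditEvent shape, the findDeleteBursts name, and the window and threshold parameters are all hypothetical.

```typescript
// Hypothetical audit row shape; the real audit log schema is not shown in this write-up.
interface AuditEvent {
  action: "CREATE" | "UPDATE" | "DELETE";
  actorId: string;
  at: Date;
}

// Returns the actors who issued at least `threshold` DELETEs inside any
// window of `windowMs` milliseconds. Both values are assumed tuning knobs.
function findDeleteBursts(
  events: AuditEvent[],
  windowMs: number,
  threshold: number,
): string[] {
  // Group DELETE timestamps by actor.
  const byActor: Record<string, number[]> = {};
  for (const e of events) {
    if (e.action !== "DELETE") continue;
    const times = byActor[e.actorId] ?? (byActor[e.actorId] = []);
    times.push(e.at.getTime());
  }

  const flagged: string[] = [];
  for (const [actorId, times] of Object.entries(byActor)) {
    times.sort((a, b) => a - b);
    let start = 0;
    // Sliding window: shrink from the left until the span fits in windowMs.
    for (let end = 0; end < times.length; end++) {
      while (times[end] - times[start] > windowMs) start++;
      if (end - start + 1 >= threshold) {
        flagged.push(actorId);
        break;
      }
    }
  }
  return flagged;
}
```

A daily job would feed this the last day of audit rows and email admin when the returned list is non-empty.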

Outcome

Live in production. Onboards new clients in under 10 minutes (previously 2-3 days of back-and-forth email). 16 modules, all documented in the repo's MODULES.md so the same shape can be forked for the next agency. The demo subdomain reseeds 5 canonical scenarios weekly so prospects can click around as a real client before signing. The pattern is now the recipe: every digital agency I work with starts here, and we strip or extend modules per their shape.

Notes

What I built

The portal handles four distinct surfaces in one codebase: public marketing + signup wizard, client view of their own work, PM/admin operations console, and internal HR + presence tooling. Each has its own permission model, its own dashboard, and its own audit boundaries.

The 16 modules, from identity-auth at the base to security-watch at the tip, are catalogued in MODULES.md with a dependency graph that says exactly what depends on what. That document is the recipe: when I spin up a portal for the next client, I work down the list and decide module-by-module what to keep, drop, or customize. A pure task-tracker drops twelve modules. An internal HR tool keeps four.

The piece I'm proudest of is the review state machine on tasks. Every deliverable flows through DRAFT → SUBMITTED_FOR_INTERNAL → INTERNAL_APPROVED → SUBMITTED_FOR_CLIENT → CLIENT_APPROVED, with a CLIENT_REVISION_REQUESTED branch back into the loop. Each transition is a single-purpose API call with a side-effect contract: emails, notifications, and an audit row. No state is implicit. No admin can flip a task to "approved" by editing a free-text field.
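
A minimal sketch of that flow as a transition table, assuming names the write-up does not confirm: the TRANSITIONS map, the transition function, the actorId field, and the in-memory auditLog are hypothetical. Where CLIENT_REVISION_REQUESTED re-enters the loop is also an assumption; here it goes back through internal review.

```typescript
// State names come from the write-up; everything else is an illustrative assumption.
type ReviewState =
  | "DRAFT"
  | "SUBMITTED_FOR_INTERNAL"
  | "INTERNAL_APPROVED"
  | "SUBMITTED_FOR_CLIENT"
  | "CLIENT_APPROVED"
  | "CLIENT_REVISION_REQUESTED";

// Every legal move is listed; anything absent is rejected.
const TRANSITIONS: Record<ReviewState, readonly ReviewState[]> = {
  DRAFT: ["SUBMITTED_FOR_INTERNAL"],
  SUBMITTED_FOR_INTERNAL: ["INTERNAL_APPROVED"],
  INTERNAL_APPROVED: ["SUBMITTED_FOR_CLIENT"],
  SUBMITTED_FOR_CLIENT: ["CLIENT_APPROVED", "CLIENT_REVISION_REQUESTED"],
  // Assumed re-entry point for revisions.
  CLIENT_REVISION_REQUESTED: ["SUBMITTED_FOR_INTERNAL"],
  CLIENT_APPROVED: [],
};

interface Task {
  id: string;
  state: ReviewState;
}

interface AuditRow {
  taskId: string;
  from: ReviewState;
  to: ReviewState;
  actorId: string;
  at: Date;
}

// In-memory stand-in for the audit table.
const auditLog: AuditRow[] = [];

// Checks the pre-condition, writes the audit row, and returns the updated task.
// Emails and notifications would hang off the same call in a real system.
function transition(task: Task, to: ReviewState, actorId: string): Task {
  if (!TRANSITIONS[task.state].includes(to)) {
    throw new Error(`Illegal transition ${task.state} -> ${to}`);
  }
  auditLog.push({ taskId: task.id, from: task.state, to, actorId, at: new Date() });
  return { ...task, state: to };
}
```

Because the table is the only source of legal moves, skipping a review step (say DRAFT straight to CLIENT_APPROVED) throws instead of silently succeeding.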

Decisions I'd defend

The state machine is non-negotiable. Free-text status fields are how "almost works" turns into "definitely doesn't." Every transition is a function with explicit pre-conditions and side-effects, and every call site writes an audit row. The discipline pays for itself the first time a client claims a deliverable was never approved.

Audit log retention has two tiers. CONTRACT rows (anything financial or contractual) live 7 years. OPERATIONAL rows (everyday CRUD) live 1 year. Compliance teams stop arguing once the policy is in code; the per-table classification lives in auditLog.ts:defaultRetention().
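
A sketch of what "policy in code" can look like, under assumptions: only the defaultRetention name and the two tiers come from the write-up. The signature, the CONTRACT_TABLES list (including which tables count as contractual), and the expiresAt helper are hypothetical.

```typescript
type RetentionClass = "CONTRACT" | "OPERATIONAL";

// Retention periods from the write-up: 7 years and 1 year.
const RETENTION_YEARS: Record<RetentionClass, number> = {
  CONTRACT: 7,
  OPERATIONAL: 1,
};

// Hypothetical classification: which tables are financial or contractual.
const CONTRACT_TABLES: readonly string[] = ["Contract", "Invoice", "ClientCredit"];

// Assumed signature; anything not explicitly contractual is operational.
function defaultRetention(table: string): RetentionClass {
  return CONTRACT_TABLES.includes(table) ? "CONTRACT" : "OPERATIONAL";
}

// Date after which an audit row for `table` may be purged.
function expiresAt(table: string, createdAt: Date): Date {
  const expiry = new Date(createdAt.getTime());
  const years = RETENTION_YEARS[defaultRetention(table)];
  expiry.setUTCFullYear(expiry.getUTCFullYear() + years);
  return expiry;
}
```

Defaulting unknown tables to OPERATIONAL is a design choice in this sketch only; a stricter variant would default to the longer tier so a new table cannot be purged early by accident.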

AI fails open, never closed. Every Claude call site (brand brief, suspicion classifier, follow-up question generator) has a graceful fallback. If the API key is unset or the call times out, the workflow continues without the AI enrichment. I will not ship a system whose critical path waits on a third-party LLM.
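
The fail-open rule can be sketched as a small wrapper, assuming nothing about the real call sites: withTimeout, failOpen, BrandBrief, and brandBriefFor are hypothetical names, the 5-second budget is an arbitrary placeholder, and fetchBrief stands in for whatever actually calls Claude.

```typescript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Fail open: any error or timeout yields the fallback, so the workflow continues.
async function failOpen<T>(
  call: () => Promise<T>,
  fallback: T,
  timeoutMs: number,
): Promise<T> {
  try {
    return await withTimeout(call(), timeoutMs);
  } catch {
    return fallback;
  }
}

interface BrandBrief {
  summary: string;
  source: "ai" | "fallback";
}

// Hypothetical call site: `fetchBrief` is a stand-in for the real LLM request.
async function brandBriefFor(
  url: string,
  fetchBrief: (url: string) => Promise<string>,
): Promise<BrandBrief> {
  const fallback: BrandBrief = { summary: "", source: "fallback" };
  return failOpen<BrandBrief>(
    async () => ({ summary: await fetchBrief(url), source: "ai" }),
    fallback,
    5000,
  );
}
```

Tagging the result with its source lets the UI show the PM that the brief is missing rather than presenting an empty one as if the AI had written it.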

Restore is allowlist-only. The Activity Log has a "Restore" button on update rows, but only for fields in auditRestoreAllowlist.ts. Status enums, money amounts, and assignment ownership are never auto-restorable; those need a human decision. The allowlist is the security boundary.
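
A minimal sketch of an allowlist check, assuming a shape for the file's contents: the RESTORE_ALLOWLIST map, the table and field names in it, and the restorableFields helper are all hypothetical.

```typescript
// Hypothetical allowlist: table name -> fields that are safe to restore.
// Status, money, and ownership fields are deliberately absent.
const RESTORE_ALLOWLIST: Record<string, readonly string[]> = {
  Task: ["title", "description", "dueDate"],
  Inquiry: ["companyName", "website"],
};

type FieldValues = Record<string, unknown>;

// Splits the "before" snapshot of an update row into fields that may be
// restored automatically and fields that need a human decision.
function restorableFields(
  table: string,
  before: FieldValues,
): { restore: FieldValues; blocked: string[] } {
  // Tables missing from the allowlist get nothing restored.
  const allowed = RESTORE_ALLOWLIST[table] ?? [];
  const restore: FieldValues = {};
  const blocked: string[] = [];
  for (const [field, value] of Object.entries(before)) {
    if (allowed.includes(field)) {
      restore[field] = value;
    } else {
      blocked.push(field);
    }
  }
  return { restore, blocked };
}
```

The default is deny: a new column or a new table is not restorable until someone adds it to the list on purpose.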

What broke (the honest section)

Activation latency on retainers. When a PM activates a heavy retainer scope, the portal generates milestones, tasks, and a per-task AI brief. The auto-assigner runs ~5 queries per task to score-and-pick by skill and workload. On a 30-deliverable retainer that's 150 sequential queries (30 × 5) before the activation finishes. The first version timed out the request. The fix was to hand the activation to a background queue and stream status back to the UI; in the meantime I shipped an "Activating… (spinning up tasks)" disabled state on the button so PMs don't double-click during the 70 seconds.
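
The shape of that fix, sketched with an in-memory job table standing in for a real queue. Everything here is an assumption for illustration: JobStatus, the jobs map, enqueueActivation, and the assignTask callback are hypothetical, and a production version would persist status and run in a worker.

```typescript
interface JobStatus {
  state: "queued" | "running" | "done" | "failed";
  done: number;
  total: number;
}

// In-memory stand-in for a job store; the UI would poll or stream from this.
const jobs: Record<string, JobStatus> = {};

// Registers the job and starts the work. A request handler would call this
// WITHOUT awaiting the returned promise and respond with the job id at once.
function enqueueActivation(
  jobId: string,
  deliverables: string[],
  assignTask: (deliverable: string) => Promise<void>,
): Promise<void> {
  const status: JobStatus = { state: "queued", done: 0, total: deliverables.length };
  jobs[jobId] = status;

  return (async () => {
    status.state = "running";
    try {
      for (const deliverable of deliverables) {
        // The slow part: roughly 5 queries per task in the write-up.
        await assignTask(deliverable);
        status.done += 1; // progress the UI can display
      }
      status.state = "done";
    } catch {
      // Keep the partial count so the PM can see where activation stopped.
      status.state = "failed";
    }
  })();
}
```

The button's disabled state then keys off the job state (queued or running) instead of a guess about how long activation takes.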

The Stripe-bypass coupon was supposed to be temporary. I shipped a BYPASS_COUPONS env var to support the demo period before live Stripe keys landed. It's still there. It will probably stay there as a comp/demo path. Some "temporary" hacks earn their keep.

"Lock scope" is a terminology hill I died on. PMs initially submitted call notes via "Mark as Call Done", which then locked the scope. It felt fine until users complained the verb didn't match the action: they were locking a scope, not just marking a call done. Renamed the button to "Lock Scope" and added an "Unlock & re-scope" button for the assigned PM. Sounds trivial. Took two iterations and a lot of email-template editing, because the lock triggers a notification, a dashboard banner, and an email cascade.

The first cancellation flow was a hard delete. Wrong. Cancellation needs to keep the row, delete tasks, revert the inquiry to a re-scopeable state, and issue a credit. Took a v3.10 rebuild. The credit ledger now lives as ClientCredit rows; the finance module (planned) will treat them as first-class.
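
The four steps can be sketched as one pure function over an in-memory store. Only the ClientCredit name comes from the portal; the Store shape, the field names, the cancelInquiry function, and the "SCOPING" status used for the re-scopeable state are hypothetical. In a real database these writes would belong in a single transaction.

```typescript
interface Inquiry {
  id: string;
  clientId: string;
  status: string;
  cancelledAt: Date | null;
}

interface ProjectTask {
  id: string;
  inquiryId: string;
}

interface ClientCredit {
  clientId: string;
  inquiryId: string;
  amountCents: number;
  reason: string;
}

interface Store {
  inquiries: Inquiry[];
  tasks: ProjectTask[];
  credits: ClientCredit[];
}

// Assumed name for the state an inquiry returns to so it can be re-scoped.
const RESCOPEABLE_STATUS = "SCOPING";

// Returns a new store; the input is left untouched.
function cancelInquiry(
  store: Store,
  inquiryId: string,
  creditCents: number,
  now: Date,
): Store {
  const inquiry = store.inquiries.find((i) => i.id === inquiryId);
  if (!inquiry) throw new Error(`Unknown inquiry ${inquiryId}`);

  return {
    // 1. Keep the row and 3. revert it to a re-scopeable state.
    inquiries: store.inquiries.map((i) =>
      i.id === inquiryId ? { ...i, status: RESCOPEABLE_STATUS, cancelledAt: now } : i,
    ),
    // 2. Delete only this inquiry's tasks.
    tasks: store.tasks.filter((t) => t.inquiryId !== inquiryId),
    // 4. Issue the credit as a ledger row.
    credits: [
      ...store.credits,
      { clientId: inquiry.clientId, inquiryId, amountCents: creditCents, reason: "cancellation" },
    ],
  };
}
```

Storing the credit in integer cents is a choice made for this sketch to avoid floating-point drift; the portal's actual money representation is not stated.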

What's next

A separate platform for healthcare ops: different shape, different domain. Coming soon.

A finance module for the agency portal, mirroring the cancellation pattern: separate API + email template + admin UI + audit hooks. Expense tracking, salaried payroll, monthly reports. Last open loop on the roadmap.