Multi-tenant Storefront: one backend, six domains, one on-call

The Netorigo Storefront framework runs a single NestJS backend (netorigo-app-backend) behind many Next.js 16 frontends, each on its own domain. Six storefronts are live on this architecture today: nortinia.com, mediaorigo.com, plus four partner domains. Below is how tenant resolution actually works, why we did not split into per-tenant forks, and the leak we caught early.

Tenant resolution: the Host header decides

The backend cannot guess which tenant it serves a request for. The very first step of PublicStorefrontController is a storefronts.findByDomain(req.headers.host) call, which looks up a row in the storefront Prisma table by normalised hostname (strip www., lowercase). The resulting tenantId is placed into the AsyncLocalStorage context, and from there every subsequent Prisma query automatically gets a WHERE tenantId = ? filter.
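The resolution step can be sketched roughly as follows. This is a minimal illustration, not the actual controller code: the in-memory map stands in for the storefront Prisma table, the tenant ids are made up, and the helper names (normaliseHost, runForHost, currentTenantId) are hypothetical. Stripping the port is an extra assumption beyond the "strip www., lowercase" rule.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical per-request context; only tenantId is taken from the article.
interface TenantContext {
  tenantId: string;
}

const tenantStorage = new AsyncLocalStorage<TenantContext>();

// Normalise a Host header: lowercase and strip a leading "www.".
// Dropping a trailing port is an assumption added for local/dev setups.
function normaliseHost(host: string): string {
  const lower = host.trim().toLowerCase();
  const withoutPort = lower.replace(/:\d+$/, "");
  return withoutPort.replace(/^www\./, "");
}

// Stand-in for the storefront table (an in-memory map instead of Prisma).
const storefronts = new Map<string, string>([
  ["nortinia.com", "tenant-nortinia"],
  ["mediaorigo.com", "tenant-mediaorigo"],
]);

function findByDomain(host: string): TenantContext | undefined {
  const tenantId = storefronts.get(normaliseHost(host));
  return tenantId ? { tenantId } : undefined;
}

// Run a handler with the resolved tenant bound to the async context.
function runForHost<T>(host: string, handler: () => T): T {
  const ctx = findByDomain(host);
  if (!ctx) {
    throw new Error(`Unknown storefront domain: ${host}`);
  }
  return tenantStorage.run(ctx, handler);
}

// Code called inside the handler reads the tenant without it being passed down.
function currentTenantId(): string {
  const ctx = tenantStorage.getStore();
  if (!ctx) {
    throw new Error("No tenant in async context");
  }
  return ctx.tenantId;
}
```

Because the context is bound with run(), two concurrent requests for different domains each see their own tenantId, even when their async work interleaves.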

The tenant boundary is logical (a column on the row), not physical (separate DB). That sounds risky, but the Prisma middleware layer rejects any read or write that does not carry the ALS tenant. Anyone who forgets tenantId in a where clause fails a unit test before they can open a PR.
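The guard can be pictured as a small function that sits between the caller and the database client. The sketch below deliberately avoids the real Prisma API: QueryArgs is a simplified, hypothetical query shape and scopeToTenant is an illustrative name, so treat it as the idea rather than the production hook.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

const tenantStorage = new AsyncLocalStorage<{ tenantId: string }>();

// Hypothetical, simplified query shape; real Prisma arguments are richer.
interface QueryArgs {
  where?: Record<string, unknown>;
}

// Reject queries issued outside a tenant context or aimed at another tenant,
// and force the tenant filter onto everything else.
function scopeToTenant(args: QueryArgs): QueryArgs {
  const ctx = tenantStorage.getStore();
  if (!ctx) {
    throw new Error("Query rejected: no tenant in async context");
  }
  const requested = args.where?.tenantId;
  if (requested !== undefined && requested !== ctx.tenantId) {
    throw new Error("Query rejected: tenantId does not match the request tenant");
  }
  return { ...args, where: { ...args.where, tenantId: ctx.tenantId } };
}
```

A unit test for this kind of guard only needs two cases: a call outside any context must throw, and a call inside a context must come back with the tenant filter attached.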

Why one backend, not per-tenant

During the design phase we seriously considered per-tenant forks, where every tenant gets its own repo, its own deploy, its own DB. Upside: perfect isolation. Downside: with six tenants that means six times the on-call and six times the migration coordination, and a bug fix has to land via six PRs and six code reviews. The math stopped working at four tenants.

One backend plus many frontends: upgraded together, one on-call, one CI pipeline. The frontends differ per tenant (theme, navigation menu, localised copy, MCP tool set) but the domain logic and data model are shared.

Per-tenant cache namespace

Redis cache keys built through CacheKeyService have the shape tenant:<tenantId>:<resource>:<id>. The constructor requires a tenantId, so no code path that goes through the service can omit it: a call without one fails at compile time.
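A minimal sketch of such a key builder, assuming a single key() method (the method name and the empty-string check are illustrative, not taken from the real service):

```typescript
class CacheKeyService {
  private readonly tenantId: string;

  // tenantId is a required parameter, so `new CacheKeyService()` does not compile.
  constructor(tenantId: string) {
    if (tenantId.length === 0) {
      // Runtime check for values the type system cannot rule out.
      throw new Error("tenantId must not be empty");
    }
    this.tenantId = tenantId;
  }

  // Builds a key of the shape tenant:<tenantId>:<resource>:<id>.
  key(resource: string, id: string | number): string {
    return `tenant:${this.tenantId}:${resource}:${id}`;
  }
}
```

The compile-time guarantee only covers callers that use the service. Code that writes a raw string key straight to Redis is outside its reach.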

The leak we caught early

In January 2025 a new popular_products cache layer landed, keyed naively on the tenant-agnostic popular_products_global slot. The reasoning at the time: "it is a global list, every tenant sees the same thing." Except the popular_products query already scopes the order_line table by tenantId, so the cached value was the list of whichever tenant happened to call getPopular() first. If mediaorigo.com received load first, nortinia.com would see mediaorigo's top list for five minutes.

The bug lived in prod for half a day before a partner reported it. The fix: a popular_products:tenant:<tenantId> key, plus an E2E test with two tenants asserting the two cache snapshots are disjoint. The test has never caught a regression since — which is exactly why it is there.
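The leak and the fix can be reproduced in a few lines. The sketch below uses an in-memory map in place of Redis and made-up product ids. PopularProductsCache, areDisjoint and the fixture data are hypothetical names for illustration. The disjointness check assumes the two fixture tenants share no products.

```typescript
type Loader = (tenantId: string) => string[];
type KeyFor = (tenantId: string) => string;

// In-memory stand-in for the Redis-backed cache layer.
class PopularProductsCache {
  private readonly store = new Map<string, string[]>();
  private readonly keyFor: KeyFor;
  private readonly load: Loader;

  constructor(keyFor: KeyFor, load: Loader) {
    this.keyFor = keyFor;
    this.load = load;
  }

  getPopular(tenantId: string): string[] {
    const key = this.keyFor(tenantId);
    const cached = this.store.get(key);
    if (cached) return cached;
    const fresh = this.load(tenantId); // tenant-scoped query in the real system
    this.store.set(key, fresh);
    return fresh;
  }
}

// Hypothetical fixture data: each tenant has its own top list.
const topLists: Record<string, string[]> = {
  nortinia: ["n-1", "n-2"],
  mediaorigo: ["m-1", "m-2"],
};
const load: Loader = (tenantId) => topLists[tenantId] ?? [];

function areDisjoint(a: string[], b: string[]): boolean {
  const seen = new Set(a);
  return b.every((item) => !seen.has(item));
}

// Buggy keying: one global slot, so the first tenant in wins.
const buggy = new PopularProductsCache(() => "popular_products_global", load);
const mediaorigoFirst = buggy.getPopular("mediaorigo");
const leakedToNortinia = buggy.getPopular("nortinia");

// Fixed keying: the tenant id is part of the key.
const fixed = new PopularProductsCache((t) => `popular_products:tenant:${t}`, load);
const mediaorigoTop = fixed.getPopular("mediaorigo");
const nortiniaTop = fixed.getPopular("nortinia");

console.log(areDisjoint(mediaorigoFirst, leakedToNortinia)); // false: nortinia got mediaorigo's list
console.log(areDisjoint(mediaorigoTop, nortiniaTop)); // true: each tenant sees its own list
```

Warming the cache with one tenant before reading with the other is the important part of the test: a single-tenant test passes against the buggy key just as happily as against the fixed one.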

The tenant boundary fails cheaply. The catastrophe shows up only when you stop testing the boundary itself.

Infrastructure footprint for six storefronts

Six tenants run behind the backend today. A single NestJS pod (4 vCPU, 8 GB RAM) handles them all by default, because most of the traffic resolves at the edge cache (the 94% PDP hit rate is unpacked in a separate article). Backend p99 latency is 38 ms and database p95 is 12 ms, so there is no aggressive autoscale config because it isn't needed. Only on Black Friday weekend do we add two pods and route reads to the Postgres replica.

Tenant onboarding checklist

Adding a tenant is four steps: (1) insert a storefront row in the DB (domain + tenantId + theme JSON), (2) point a DNS CNAME at the Vercel deploy URL, (3) wait for the HTTPS cert (auto-provisioned by Vercel), (4) initialise the admin tenant-user via the /admin/tenants/[id]/init flow. It is typically done in 35 minutes with no code change. Onboarding used to require a config-file commit, but we killed that in March 2025: a new tenant should not need a PR.
