Operator audit log: traceability for every change

Who, what, when, before-after diff, request-id - plus the webhook infection that generated 14k spurious audit rows in 11 minutes.

In a multi-tenant ERP an audit log is not optional. NIS2, GDPR, and money-moving transactions all assume that every state change can be described: who made it, when, and what changed. The Netorigo Admin audit log module reached its standard shape in early 2026 after a year of iteration.

What we record

Every state-mutating action writes one audit_events row. The schema:

  • id - ULID
  • tenant_id - tenant-scoped, indexed
  • actor_id - the user or service that initiated the request (user:12345, system:webhook, ai_assistant:67890)
  • actor_type - enum (user, system, ai_assistant, api_key)
  • entity_type - what we changed (product, order, invoice, user, role, ...)
  • entity_id - which specific record
  • action - enum (create, update, delete, state_change, permission_grant, ...)
  • before - JSON, the state before the change (only affected fields)
  • after - JSON, the state after the change
  • request_id - ULID consistent across the whole HTTP request, so all events caused by one request can be grouped
  • ip_address, user_agent - context
  • created_at - timestamptz
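
As a rough illustration, the row above could be mirrored in TypeScript as follows. This is a sketch, not the module's actual types: the camelCase property names, the nullability of before/after and the context fields, and the groupByRequest helper are all assumptions made for the example.

```typescript
// Hypothetical TypeScript mirror of an audit_events row (names are assumptions).
type ActorType = "user" | "system" | "ai_assistant" | "api_key";

interface AuditEvent {
  id: string;                 // ULID
  tenantId: string;
  actorId: string;            // e.g. "user:12345", "system:webhook"
  actorType: ActorType;
  entityType: string;         // product, order, invoice, user, role, ...
  entityId: string;
  action: string;             // create, update, delete, state_change, permission_grant, ...
  before: Record<string, unknown> | null; // assumed null when there is no prior state
  after: Record<string, unknown> | null;  // assumed null when there is no resulting state
  requestId: string;          // ULID shared by every event of one HTTP request
  ipAddress: string | null;
  userAgent: string | null;
  createdAt: Date;
}

// Groups events by request_id, so everything caused by one request sits together.
function groupByRequest(events: AuditEvent[]): Map<string, AuditEvent[]> {
  const groups = new Map<string, AuditEvent[]>();
  for (const event of events) {
    const bucket = groups.get(event.requestId) ?? [];
    bucket.push(event);
    groups.set(event.requestId, bucket);
  }
  return groups;
}
```

Grouping on request_id is what makes a single PUT that touches an order and its invoice readable as one logical change.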

before/after are diff-scoped: they hold only the fields that changed, not the full record. A NestJS interceptor snapshots the relevant fields before the service method runs, then diffs them against the resulting state afterwards.
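
The diff-scoping step can be sketched as a small pure function. This is a minimal illustration, not the interceptor itself: diffScoped is a hypothetical name, and comparing values via JSON.stringify is a simplification that treats nested objects with different key order as changed.

```typescript
type Json = Record<string, unknown>;

// Returns only the fields whose value differs between the two snapshots.
function diffScoped(prev: Json, next: Json): { before: Json; after: Json } {
  const before: Json = {};
  const after: Json = {};
  // Keys present on either side; duplicates are harmless here.
  const keys = Object.keys(prev).concat(Object.keys(next));
  for (const key of keys) {
    // Simplification: structural comparison via serialized JSON.
    if (JSON.stringify(prev[key]) !== JSON.stringify(next[key])) {
      if (key in prev) before[key] = prev[key];
      if (key in next) after[key] = next[key];
    }
  }
  return { before, after };
}

// Only the price changed, so only the price is recorded.
const example = diffScoped(
  { name: "Widget", price: 100, stock: 5 },
  { name: "Widget", price: 120, stock: 5 },
);
// example.before is { price: 100 }, example.after is { price: 120 }
```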

What we do NOT record

Two categories are deliberately skipped:

  1. Read-only health checks - /health, /api/health, /livez, /readyz run every second and are not state changes. The interceptor early-returns on these.
  2. Pagination cursors - GET /products?cursor=abc123&limit=50 is also not a state change, just a read. If every listing fetch wrote an audit row, the table would grow by 100k+ rows per tenant per day and the useful signal would drown in noise.

The rule: only POST, PUT, PATCH, DELETE HTTP methods plus specific state_change service calls (e.g. Order.markAsPaid()) emit audit events. GETs never do.
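
The HTTP half of that rule fits in a few lines. The sketch below is an assumption about how the check could look, not the interceptor's real code; shouldAudit is a hypothetical name, and the explicit state_change service calls such as Order.markAsPaid() would emit their events separately.

```typescript
const MUTATING_METHODS = new Set(["POST", "PUT", "PATCH", "DELETE"]);
const HEALTH_PATHS = new Set(["/health", "/api/health", "/livez", "/readyz"]);

// Decides whether an HTTP request is a candidate for an audit event.
function shouldAudit(method: string, path: string): boolean {
  // Drop the query string so "/health?verbose=1" still matches.
  const pathname = path.split("?")[0] ?? path;
  if (HEALTH_PATHS.has(pathname)) return false; // early return for health checks
  return MUTATING_METHODS.has(method.toUpperCase());
}
```

With this check, GET /products?cursor=abc123&limit=50 is skipped because of its method, and the health endpoints are skipped regardless of method.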

Read side: filtering UI

/admin/audit is a filter-bar list:

  • Date range (default: last 7 days)
  • Actor (autocomplete from users, or system / ai_assistant)
  • Entity type + optional entity_id (e.g. all changes to product:12345)
  • Action (multi-select)
  • Search - free text across before/after JSON (Postgres tsvector index)

The list paginates 50 rows at a time, and each row expands into a before -> after diff in a side panel. The diff is rendered as syntax-highlighted JSON (red = removed, green = added, yellow = changed).
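
One way the filter bar could translate into a parameterized Postgres query is sketched below. Everything beyond the column names from the schema is an assumption: the AuditFilter shape, the search_vector column, the 'simple' text search configuration, and offset-based paging are illustrative choices, not a description of the real endpoint.

```typescript
interface AuditFilter {
  tenantId: string;
  from: Date;            // date range start (inclusive)
  to: Date;              // date range end (exclusive)
  actorId?: string;
  entityType?: string;
  entityId?: string;
  actions?: string[];    // multi-select
  search?: string;       // free text over before/after
  page?: number;         // zero-based page index (assumption)
}

const PAGE_SIZE = 50;

// Builds SQL text plus positional parameters; values are never interpolated.
function buildAuditQuery(f: AuditFilter): { text: string; values: unknown[] } {
  const values: unknown[] = [];
  const param = (v: unknown): string => {
    values.push(v);
    return `$${values.length}`;
  };

  const where: string[] = [
    `tenant_id = ${param(f.tenantId)}`,
    `created_at >= ${param(f.from)}`,
    `created_at < ${param(f.to)}`,
  ];
  if (f.actorId) where.push(`actor_id = ${param(f.actorId)}`);
  if (f.entityType) where.push(`entity_type = ${param(f.entityType)}`);
  if (f.entityId) where.push(`entity_id = ${param(f.entityId)}`);
  if (f.actions && f.actions.length > 0) where.push(`action = ANY(${param(f.actions)})`);
  // search_vector is a hypothetical tsvector column covering before/after.
  if (f.search) where.push(`search_vector @@ plainto_tsquery('simple', ${param(f.search)})`);

  const limit = param(PAGE_SIZE);
  const offset = param((f.page ?? 0) * PAGE_SIZE);
  const text =
    `SELECT * FROM audit_events WHERE ${where.join(" AND ")} ` +
    `ORDER BY created_at DESC LIMIT ${limit} OFFSET ${offset}`;
  return { text, values };
}
```

Keeping tenant_id as the first, unconditional predicate means no combination of filters can produce a cross-tenant query.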

CSV export is a BullMQ job: user clicks, worker generates, the download link is emailed when done. A 30-day export against a 100k-row log takes 4-5 minutes.

Retention

Default retention: 365 days, tenant-configurable (min 30 days, max 7 years). A weekly cleanup job moves rows older than the retention window into S3 cold storage (gzipped JSONL), where they are deleted after 7 years. If ever needed, the cold storage can be re-imported via an admin tool.
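
The retention bounds and the archive cutoff can be expressed as two small functions. This is a sketch under stated assumptions: the function names are hypothetical, "7 years" is approximated as 7 * 365 days, and the gzip and S3 upload steps are left out so that the example stays dependency-free.

```typescript
const MIN_RETENTION_DAYS = 30;
const MAX_RETENTION_DAYS = 7 * 365; // "7 years", ignoring leap days (assumption)
const DEFAULT_RETENTION_DAYS = 365;
const DAY_MS = 24 * 60 * 60 * 1000;

// Clamps a tenant's configured retention into the allowed range.
function effectiveRetentionDays(configured?: number): number {
  if (configured === undefined || Number.isNaN(configured)) return DEFAULT_RETENTION_DAYS;
  return Math.min(MAX_RETENTION_DAYS, Math.max(MIN_RETENTION_DAYS, Math.floor(configured)));
}

// Rows with created_at before this instant are candidates for cold storage.
function archiveCutoff(now: Date, configured?: number): Date {
  return new Date(now.getTime() - effectiveRetentionDays(configured) * DAY_MS);
}

// Serializes rows as JSONL: one JSON object per line, ready to be gzipped.
function toJsonl(rows: object[]): string {
  return rows.map((row) => JSON.stringify(row) + "\n").join("");
}
```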

The dedup policy: the webhook infection

In February 2026 a customer configured a webhook that responded to every order.updated event with a PUT /orders/:id carrying the same data. Each PUT emitted another order.updated event, which triggered the webhook again, so the loop fed itself: 14,000 spurious audit rows in 11 minutes, all with before == after.

The fix: the interceptor now byte-compares before and after. If they are equal, no audit event is written; the interceptor only increments an audit_dedupe_counter metric, which is visible in the Prometheus dashboard. The client does not see this as an error: the webhook's PUT still returns 200, there is just no false audit row.
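
A minimal sketch of such a no-op check follows. It makes one assumption beyond the text: both sides are serialized to a canonical form (object keys sorted) before comparison, so that key order alone cannot make two equal states look different. The function names and the in-memory counter, which stands in for the Prometheus metric, are hypothetical.

```typescript
// Serializes JSON-like data with object keys sorted, recursively.
function canonicalize(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Stand-in for the audit_dedupe_counter metric.
let auditDedupeCounter = 0;

// Calls write() only when the change is real; returns whether a row was written.
function recordIfChanged(before: unknown, after: unknown, write: () => void): boolean {
  if (canonicalize(before) === canonicalize(after)) {
    auditDedupeCounter += 1; // count the suppressed no-op, do not fail the request
    return false;
  }
  write();
  return true;
}
```

Because the check sits in front of the write, a looping webhook still gets its 200 responses but leaves only a rising counter behind.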

The dedup metric is also useful for us: if it spikes on one tenant, there's probably a config bug in one of their webhooks.