AI Quality Engineering

When AI writes the code, what's left for quality to do?

Fifteen years of test architecture in insurance, finance and government — now applied to the AI layer on top: agent orchestration, review gates for LLM output, and the audit trail your regulator still expects.

Example: a review gate I run at a client. Four checks, one verdict, in CI.
15+ · Years · QA & test automation
14 · Clients · Enterprise & government
4 · Agents · In production, licensed
2 · Open source · Templates on GitHub
Ambassador · Cypress.io
The approach

Four pillars beneath a software development process that keeps quality under control.

Every engagement touches all four, in different mixtures. The approach is pragmatic: the best solution that fits your organisation for the long term.

01

Test architecture that holds up when the team turns over.

Software testing patterns, AC/TS traceability, per-feature coverage matrix, Language-First. The foundation for any quality architecture, with Cypress or Playwright. Built so AI coding agents and the surrounding quality gates keep safeguarding quality, including after the original team has moved on.

02

AI coding agents in your release pipeline.

Claude Code, Cursor, Copilot. In-repo subagents, AGENTS.md, slash commands, review gates. The agents run in your repo and your CI; your source never ends up anywhere else.

03

Audit trail by design.

DORA, GDPR, NIS2, ISO/IEC 25010, TMMi. Every change traces back to an acceptance criterion, every gate is documented. What you do, you can defend — to an inspector or to internal audit.

04

Quality as a financial story.

One report your CFO and your auditor read the same way. What does a production bug cost? What does a green pipeline save? I make the hidden costs in your development process visible and steerable.

Example from a recent engagement¹: regression research per release went from 40h to 6h. First green pipeline on the agentic flow: week 3.

¹ Anonymized; figures are placeholders, to be confirmed.

For a deeper dive into the financial side, see QualityProfit →

Earlier work

Where the work has been applied.

A selection. The same discipline, in different contexts — from a worldwide government rollout to a solo SaaS I build myself.

QualityProfit · Solo SaaS · 2025 — present

Solo SaaS, four agents.

Founder · Full-stack with Claude Code

A dashboard the customer deploys themselves, turning Jira, Azure DevOps, GitHub and GitLab signals into financial ROI for QA. Four in-repo subagents: release-reviewer, deploy-monitor, onboarding-smoke-tester, requirements-guard.

Python · FastAPI · Pydantic · React · Cypress · Docker · Caddy · Stripe · Claude Code
New Orange Digital Agency · 2025 — present

Their AI test stack, productized.

Architecture · Framework · Claude Code skill

Built an AI-augmented Playwright architecture, framework and reusable Claude Code skill for a Dutch digital agency. Designed to plug into any current or future client engagement rather than a single project. Codifies project structure, Page Object pattern, AC/TS traceability and per-feature coverage matrix. The agency now ships AI-augmented test suites to clients through a single Claude Code skill: productized AI-testing assets at agency scale.

Playwright · TypeScript · Next.js 16 · Turborepo · Tailwind v4 · Claude Code · Cursor · Copilot
Evides · National Utility · 2024 — present

Quality Framework rollout.

Quality Assurance Manager

TMMi-aligned Quality Framework on top of ISO/IEC 25010, embedded in delivery pipelines for a national utility. Quality maturity expressed in financial impact, with the traceability a regulator expects.

TMMi · ISO/IEC 25010 · Quality Framework
RvO · NL Government · 2024 — 2025

Language-First in gov.

Cypress + Playwright architecture

Test architecture across multiple government departments where different testing tools, specifications, scenarios and tests share one continuous human-readable layer. Presented at CypressConf 2024 — "Beyond the Battle: Empowering Test Automation with a Language-First Approach." The same Language-First approach I now extend into AI-augmented delivery.

Cypress · Playwright · TypeScript · Lerna · Artillery · Gherkin · Blueriq · GitLab · SonarQube
VGZ · Insurance · 2022 — 2024

Architect for the long run.

Test Automation Architect

Cypress + Lit Elements test architecture with Cucumber traceability, integrated into Azure DevOps. Page Object discipline and spec-to-test traceability that let the team keep the suite maintainable after I left. Every change anchored to a spec, every spec traceable to an acceptance criterion. Built to keep working without me; handed back to the team.

Cypress · Lit Elements · Cucumber · Azure DevOps
Ministry of Foreign Affairs · Government · 2021 / 2022 — 2024

Global rollout, audited.

QA Architect & Test Manager

Test management for a worldwide rollout under ISO/IEC 25010 / TMap discipline. Every change traceable, every gate documented, every decision defensible to an inspector. Earlier engagement covered Cypress, Angular and Docker on Azure DevOps.

ISO/IEC 25010 · TMap · Cypress · Angular · Docker · Azure DevOps
Four agents in production

What turns out to work in CI pipelines.

Four review gates I distilled out of client work over the last two years. They run in my own codebase and with a handful of teams. Anyone who wants to try one knows where to find me.

In production · v0.4.2

release-reviewer

Reviews every push for risk patterns: secrets in the diff, coverage thresholds, destructive migrations, touched auth code. Posts a verdict on the PR with the failing rule IDs. Runs on every commit in my own codebase.
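
To make the idea concrete, here is a minimal sketch of how a rule-based gate of this kind could be structured. It is an illustration, not the shipped agent: the rule IDs (SEC-001, COV-001, MIG-001, AUTH-001), the regular expressions, the 80% threshold and the path conventions are all assumptions.

```python
import re
from dataclasses import dataclass, field

# All rule IDs, patterns and path conventions below are hypothetical.
SECRET = re.compile(r"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----")
DESTRUCTIVE = re.compile(r"\b(DROP\s+TABLE|DROP\s+COLUMN|TRUNCATE)\b", re.IGNORECASE)


@dataclass
class Verdict:
    passed: bool
    failing_rule_ids: list = field(default_factory=list)


def review(added_lines_by_file, coverage_pct, coverage_threshold=80.0):
    """Check the added lines of a diff against four illustrative rules."""
    failing = []
    all_lines = [line for lines in added_lines_by_file.values() for line in lines]
    if any(SECRET.search(line) for line in all_lines):
        failing.append("SEC-001")  # secret-like string in the diff
    if coverage_pct < coverage_threshold:
        failing.append("COV-001")  # coverage below the configured threshold
    migration_lines = [
        line
        for path, lines in added_lines_by_file.items()
        if "migrations/" in path
        for line in lines
    ]
    if any(DESTRUCTIVE.search(line) for line in migration_lines):
        failing.append("MIG-001")  # destructive statement in a migration
    if any("/auth/" in f"/{path}" for path in added_lines_by_file):
        failing.append("AUTH-001")  # auth code was touched
    return Verdict(passed=not failing, failing_rule_ids=failing)
```

The returned failing_rule_ids list is what a PR comment would be rendered from; keeping the verdict as plain data makes it easy to log alongside the commit.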

Email me about it →
In production

deploy-monitor

Verifies the container digest in production matches the released artifact, exactly. Catches the silent drift between "CI was green" and "what's actually running."
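
The core comparison can be sketched in a few lines. This is a simplified illustration that assumes both image references are already available as strings containing a sha256 digest; how the agent actually obtains them from the registry and the runtime is not described here, and the function names are hypothetical.

```python
import re

# A sha256 content digest as it appears in image references such as
# "registry.example.com/app@sha256:<64 hex characters>".
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def extract_digest(image_ref):
    """Return the sha256 digest in an image reference, or raise if there is none."""
    match = DIGEST_RE.search(image_ref)
    if match is None:
        # A tag-only reference (for example "app:latest") proves nothing:
        # tags can be moved, digests cannot.
        raise ValueError(f"no digest in image reference: {image_ref!r}")
    return match.group(0)


def digests_match(released_ref, running_ref):
    """True only when the released and the running image have the same digest."""
    return extract_digest(released_ref) == extract_digest(running_ref)
```

Refusing tag-only references is the design choice that matters: a comparison on tags would pass even when the underlying image has changed.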

Email me about it →
In production

onboarding-smoke-tester

Walks the full onboarding flow end-to-end through the real API on every release. Catches the "registration is broken in prod" class of regression before a customer does. Runs independently; opens an issue on failure.

Email me about it →
In production

requirements-guard

Reconciles the written spec against the live code on every PR. Flags drift between what was promised and what was built before it reaches an auditor or a customer. Plays nicely with spec tools like OpenSpec. The discipline the other three agents lean on.
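
One simple form of this reconciliation is an ID check between the spec and the tests. The sketch below assumes acceptance criteria carry IDs of the form AC-<number> that are repeated in test titles; that convention, and the function name, are assumptions for illustration and cover only a small part of what spec-versus-code reconciliation involves.

```python
import re

# Hypothetical ID convention: acceptance criteria are written as AC-1, AC-2, ...
AC_ID = re.compile(r"\bAC-\d+\b")


def find_drift(spec_text, test_source):
    """Compare the AC IDs named in the spec with those referenced by the tests."""
    specified = set(AC_ID.findall(spec_text))
    covered = set(AC_ID.findall(test_source))
    return {
        # Promised in the spec, but no test refers to it.
        "untested": sorted(specified - covered),
        # Referenced by a test, but absent from the spec.
        "unspecified": sorted(covered - specified),
    }
```

Both directions matter: an untested criterion is a gap in evidence, an unspecified test is behaviour nobody signed off on.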

Email me about it →
An agent isn't plug-and-play. First a short working session to see if it fits your repo — and if it clicks, a focused two-to-four-week integration. Every rule is tunable per repo; a gate never blocks without an audited override path. Wondering if one of these would suit your team? Just drop me a note; we'll look at it together.
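
The override principle can be sketched as follows. This is an illustration under assumptions, not the actual implementation: the Override fields, the in-memory audit log and the function name are hypothetical, and a real setup would write the log somewhere tamper-evident.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Override:
    rule_id: str
    approver: str
    reason: str


def gate_decision(failing_rule_ids, overrides, audit_log):
    """Return True when the release may proceed.

    A failing rule passes only if a complete override (approver and reason)
    exists for it, and every override that is used is written to the audit log.
    """
    valid = {o.rule_id: o for o in overrides if o.approver.strip() and o.reason.strip()}
    blocking = []
    for rule_id in failing_rule_ids:
        override = valid.get(rule_id)
        if override is None:
            blocking.append(rule_id)
            continue
        audit_log.append({
            "rule_id": rule_id,
            "approver": override.approver,
            "reason": override.reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })
    return not blocking
```

An override without a named approver or a reason is ignored, so the gate cannot be bypassed silently.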
The context

Three sectors, one recurring conversation.

The domains I've worked in for fifteen years: insurance, financial services and government. The common question — from auditor, regulator, internal audit — is how AI-augmented delivery stays explainable to someone who doesn't read code.

Insurance & financial services

DORA is here. So is DNB.

Insurers, banks, payment platforms, asset managers. DORA, GDPR, NIS2, internal audit and third-party ICT risk — plus the regulators behind them. For Dutch insurers: DNB and AFM oversight, Wft implications, Solvency II reporting systems, IFRS 17 reconciliation pipelines. The regulator isn't asking whether you use AI any more — they're about to ask how you control it.

Government & public sector

Auditable at delivery, by default.

Ministries, public-service implementers, government IT bodies. Algoritmeregister, GDPR (AVG), BIO, NPR 5326, EU AI Act. AI-assisted delivery that stands up under both an inspector and a change of administration, with privacy and data residency answered by architecture, not paperwork. The discipline I built at RvO and the Ministry of Foreign Affairs.

Engineering & QA leadership

Speed in, traceability out.

CTOs, VPs of Engineering, Heads of QA in regulated organisations. Claude Code, Cursor and Copilot deliver speed; your audit committee wants the evidence behind it. That bridge — from velocity to a defensible release pipeline — is what I build.

Where it doesn't fit

Honest here beats discovered in week six.

A generic AI vendor with no story toward regulation, a one-off Cypress audit with no architecture beneath it, or a pure consumer context where "move fast, break things" still leads — that's a different trade. Better to be honest here than to discover it in week six.

Trust · Continuity · Data residency

The three questions your CISO, DPO and auditor ask first.

Honest answers, named risks. The Trust & Data pack — sub-processor list, DPA, regional data-flow diagram, continuity arrangements, security questionnaire — goes to your procurement, DPO and internal audit before the first POC.

Where does your code go?

Inside your repo. Inside your CI.

The agents run inside your repository and your CI runners — no proprietary cloud holds your source. LLM access goes through your existing Claude Code, Cursor or Copilot enterprise tenant: your region, your DPA, your training opt-out. Sub-processor list, regional flow diagram and DPA highlights ship with the engagement pack.

Continuity

One architect, fully covered.

Runbooks, named backup contractor, source-code escrow: scoped per engagement and signed before kick-off. Not fine print; it is arranged before the conversation starts. The Trust & Data pack carries the specifics for your contract shape.

For procurement, DPO & audit

DORA / GDPR / Wft — on one page.

KvK-registered company, standard DPA, sub-processor list, security questionnaire (CAIQ-lite), and — available separately — a Dutch-language one-pager covering DORA, GDPR and Wft implications. For procurement, your DPO and internal audit.

Request Trust & Data pack →
Request the Dutch-language one-pager (DORA / GDPR / Wft) →
Tech stack

What I bring into your repo.

Pragmatic, opinionated, chosen for AI extension — not novelty.

AI / Agents
Claude Code · Custom subagents · Hooks · Prompt engineering · AGENTS.md / SKILL.md · Cursor · GitHub Copilot
Testing
Cypress.io · Playwright · Jest · Cucumber / Gherkin · Postman · Artillery · axe-core
Frontend
TypeScript · React · Next.js · Vue · Angular · Lit · Tailwind · Turborepo
Backend
Python · FastAPI · Pydantic · Java · Node · REST · GraphQL
DevOps
Docker · GitHub Actions · GitLab CI · Azure DevOps · SonarQube
Quality
TMMi · ISO/IEC 25010 · NPR 5326 · TMap · DTAP (OTAP) · CI/CD · Page Object pattern
Integrations
Jira · GitHub · GitLab · Azure DevOps · Blueriq · Sitecore · Stripe
Also experienced with
Selenium · TestNG · JMeter · Jenkins · TeamCity · Lerna · Hibernate / JPA · Caddy · AWS Cognito
Career timeline

15+ years across enterprise & government.

A selection — earlier roles span ING, SBB, Ministry of Foreign Affairs, ZLM, KPN and lecturing at The Hague University of Applied Sciences.

2024 — 2025
RvO (NL Government)
Quality Assurance Manager
2024 — present
Evides
Quality Assurance Manager
2025 — present
QualityProfit
Founder · Solo SaaS
2025 — present
New Orange Digital Agency
AI test stack productized
2022 — 2024
VGZ
Test Automation Architect
2022 — 2024
Ministry of Foreign Affairs
Test Manager
2022 — 2023
Aon
Quality Automation Architect
2021
Ministry of Foreign Affairs
QA Architect
2021
CZ
Test Automation Specialist
2020
Harlem Next · Nederlandse Transplantatie Stichting
Test Automation Specialist
2019 — 2020
Aon
Quality Assurance Manager
2018 — 2019
ING
Test Automation Specialist
Also built · Solo SaaS

Invisible costs, now visible.

QualityProfit is my solo SaaS that hands the financial language back to QA. Same story as pillar 04, in product form — Jira, Azure DevOps, GitHub and GitLab signals turned into numbers your CFO and your auditor can both work with. The four agents above run inside it today.

  • One report your CFO and your auditor read.
  • Correlation engine: bugs traced to their release fingerprint.
  • Customer-deployed: your data, your infra.
See qualityprofit.io →
Working session

An hour, your release pipeline, and honest questions.

No sales call. Write me a few lines about your team — that's enough — and I'll send a short agenda back. If it fits, we go further. If it doesn't, I'll say so honestly.

Half-day review
One pipeline, one verdict. Built to show quickly whether the work lands.
Fixed track
Two to four weeks. Scope set up front, delivery includes runbooks.
Retainer
Fixed monthly capacity for teams evolving the architecture across quarters.