If you've tried to pick a GDPR-compliant AI API for a production product, you know the drill. You start at the provider's marketing page. It says "enterprise-grade security" and "SOC 2 certified." Neither answers the question you actually need answered: does inference happen inside the EU, and will this provider train on the data I send?
You click through to the DPA. If there is one. If you can find it. If it's not hidden behind a sales call.
Multiply that by the ten providers you're evaluating, then do it again next quarter when the model landscape shifts. That's the problem InferCheck solves: a neutral, structured, GDPR AI provider compliance directory — 347 models across 111 providers, filterable by the compliance properties that matter for EU businesses.
The Actual Problem
The AI inference market is moving fast. OpenAI is on GPT-5.4. Anthropic ships Claude Opus 4.6 and Sonnet 4.6. Google has Gemini 3.1 Pro in preview. Mistral, Qwen, and a dozen open-weight model families are available through multiple inference providers.
For an EU team building a product that touches personal data, the model selection question is never about capability alone. It's about capability plus data residency plus DPA availability plus training policy. And those compliance facts are scattered across legal pages, trust centres, and support tickets that may or may not be current.
There's no shortage of model benchmarks. There's a massive shortage of compliance benchmarks.
Three specific pain points kept coming up in projects I built for DACH startups:
- Developers pick the model first, then discover the compliance situation. By the time legal flags the issue, the integration is half-built.
- Compliance information is provider-authored. Marketing departments write DPA summaries. Neutral, structured comparisons don't exist.
- The landscape changes faster than compliance reviews. A provider launches EU inference regions, signs a new DPA, changes training defaults — and nobody updates the internal spreadsheet.
What InferCheck Does
InferCheck is a directory. Not a SaaS platform, not a compliance tool, not legal advice. A reference database with a search interface.
You come in, search for a model — say `claude-opus-4-6` — and see every provider offering it. For each provider, you get: compliance status, data residency, DPA availability, training policy, pricing, and the date the information was last verified. Every claim links back to a source document.
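Conceptually, each provider row in that per-model view is a small structured record. The sketch below is illustrative only: the field names are my assumptions, not InferCheck's actual schema.

```typescript
// Illustrative shape of one provider row in a model's listing.
// Field names are assumptions, not InferCheck's actual schema.
type ComplianceStatus = "green" | "amber" | "red" | "grey";

interface ProviderListing {
  provider: string;            // e.g. "Anthropic"
  model: string;               // e.g. "claude-opus-4-6"
  status: ComplianceStatus;    // traffic-light compliance signal
  dataResidency: string[];     // region codes where data is stored
  dpaUrl: string | null;       // link to the DPA, if published
  trainingPolicy: "no_training" | "opt_out" | "trains_on_data";
  lastVerified: string;        // ISO date of the last manual check
  sources: string[];           // source URLs backing each claim
}

const example: ProviderListing = {
  provider: "Anthropic",
  model: "claude-opus-4-6",
  status: "amber",
  dataResidency: ["US", "EU"],
  dpaUrl: "https://www.anthropic.com/dpa",
  trainingPolicy: "no_training",
  lastVerified: "2026-04-07",
  sources: ["https://www.anthropic.com/dpa"],
};
```

The point of the shape is that every compliance claim travels with its source URL and a verification date, so staleness is visible rather than hidden.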
Compliance Filter Profiles
Four filter profiles cover the most common EU requirements:
- Strict EU — data stored in EU, inference compute in EU, DPA available. This is the "our legal team said EU-only" filter.
- EU + SCCs — DPA available, Standard Contractual Clauses in place, no training on customer data. Covers providers like OpenAI and Anthropic that process in the US but have signed SCCs.
- No Training Anywhere — hard opt-out on model training, regardless of geography. For teams where data leakage into model weights is the red line.
- Custom — individual toggles. Build your own filter combination.
Every filtered view is URL-persistent. You share the link with your legal team. They see the same table. No accounts, no login, no "schedule a demo."
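The first three profiles are just predicates over a provider's compliance record, and URL persistence is just a query parameter. A minimal sketch of both ideas, assuming a simplified record (field names are hypothetical, not InferCheck's code):

```typescript
// Simplified compliance record; field names are hypothetical.
interface ProviderCompliance {
  dataResidency: string[];
  inferenceLocation: string[];
  dpaAvailable: boolean;
  sccStatus: "signed" | "none";
  trainingPolicy: "no_training" | "opt_out" | "trains_on_data";
}

type Filter = (p: ProviderCompliance) => boolean;

// True only if every listed region is the EU.
const euOnly = (regions: string[]) =>
  regions.length > 0 && regions.every((r) => r === "EU");

// Three of the four profiles as plain predicates.
const profiles: Record<string, Filter> = {
  strictEu: (p) =>
    euOnly(p.dataResidency) && euOnly(p.inferenceLocation) && p.dpaAvailable,
  euPlusSccs: (p) =>
    p.dpaAvailable &&
    p.sccStatus === "signed" &&
    p.trainingPolicy === "no_training",
  noTrainingAnywhere: (p) => p.trainingPolicy === "no_training",
};

// URL persistence: the active profile lives in the query string,
// so any filtered view is shareable as a plain link.
function filterUrl(base: string, profile: string): string {
  const url = new URL(base);
  url.searchParams.set("profile", profile);
  return url.toString();
}
```

For example, `filterUrl("https://infercheck.eu/models", "strictEu")` yields a link that reproduces the same table for whoever opens it, no session state required.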
The Colour System
Compliance status uses a strict traffic-light scheme:
- Green — EU Only. Data stays in EU, inference in EU, DPA available.
- Amber — EU + SCCs, or partial compliance. Usable with contractual safeguards.
- Red — non-compliant for EU purposes. US/CN-only processing, trains on data, or no DPA.
- Grey — unverified. No data available, or provider hasn't published enough information.
These colours are reserved for compliance signal. They never appear as UI decoration.
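One way to express the traffic-light rules above is a single ordered classification: unverified first, then green, then amber, else red. This is a sketch of the logic as described, not InferCheck's actual implementation.

```typescript
type Status = "green" | "amber" | "red" | "grey";

// Minimal set of facts the traffic-light rules depend on;
// field names are assumptions for illustration.
interface ComplianceFacts {
  verified: boolean;        // has anyone checked the sources?
  euOnlyInference: boolean; // data and inference stay in the EU
  dpaAvailable: boolean;
  sccsSigned: boolean;
  trainsOnData: boolean;
}

// Order matters: grey (no data) is checked before any judgement,
// green before amber, and red is the fall-through.
function classify(f: ComplianceFacts): Status {
  if (!f.verified) return "grey";
  if (f.euOnlyInference && f.dpaAvailable) return "green";
  if (f.dpaAvailable && f.sccsSigned && !f.trainsOnData) return "amber";
  return "red";
}
```

Making the order explicit is what keeps the colours honest: a provider can't be green by omission, because missing data short-circuits to grey before any other rule runs.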
The Data Architecture
This is the part I think matters most for trust: the data is flat JSON, committed to a public Git repository.
111 Provider Files
Each provider has one JSON file in data/providers/. The file follows a Zod-validated schema and contains structured fields: data residency regions, inference locations, DPA URL, training policy, opt-out mechanism, SCC status, EU AI Act status, certifications, and source URLs for every claim.
```json
{
  "name": "Anthropic",
  "slug": "anthropic",
  "compliance": {
    "dataResidency": ["US", "EU", "CA", "JP"],
    "inferenceLocation": ["US", "EU"],
    "dpaAvailable": true,
    "dpaUrl": "https://www.anthropic.com/dpa",
    "trainingPolicy": "no_training",
    "sccStatus": "signed",
    "lastVerified": "2026-04-07"
  }
}
```
Why flat JSON? Because Git diffs are readable. Because anyone can submit a correction via pull request. Because a compliance team can audit the entire dataset in one directory listing. No database queries, no admin panels, no opaque data transformations between what's stored and what's displayed.
Model Catalogue via Neon
The model data lives in Postgres (Neon) via Drizzle ORM. Models change too fast for hand-curation — new variants ship weekly, pricing adjusts, older versions get deprecated. A nightly Vercel Cron job at 2am UTC syncs model data from OpenRouter and EU-native provider APIs.
The split is intentional: slow-changing compliance metadata in Git (auditable, community-editable), fast-changing model metadata in a database (automatable, queryable).
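The interesting part of such a sync job is the pure mapping step from an upstream catalogue entry to a database row, because that part is testable without the network. The upstream record shape below is simplified and the field names are assumptions, not the exact OpenRouter response.

```typescript
// Simplified shape of one model entry as an aggregator API might
// return it; field names are assumptions for illustration.
interface UpstreamModel {
  id: string; // e.g. "anthropic/claude-opus-4-6"
  pricing: { prompt: string; completion: string }; // USD per token, as strings
}

// Row shape for the models table; names are illustrative.
interface ModelRow {
  providerSlug: string;
  modelSlug: string;
  pricePerMTokIn: number;  // USD per million input tokens
  pricePerMTokOut: number; // USD per million output tokens
  syncedAt: string;
}

function toModelRow(m: UpstreamModel, now: Date): ModelRow {
  // Upstream ids are "provider/model"; split on the first slash.
  const slash = m.id.indexOf("/");
  return {
    providerSlug: m.id.slice(0, slash),
    modelSlug: m.id.slice(slash + 1),
    pricePerMTokIn: Number(m.pricing.prompt) * 1_000_000,
    pricePerMTokOut: Number(m.pricing.completion) * 1_000_000,
    syncedAt: now.toISOString(),
  };
}
```

The cron handler itself then reduces to fetch, map, upsert: everything with judgement in it lives in the mapper.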
Community Corrections
Every provider page has a "Report a change" button. It opens a pre-filled GitHub issue. No account beyond GitHub is required. The issue gets labelled, reviewed, and merged if valid. All reports are public.
This solves the staleness problem without requiring a full-time compliance analyst. The people who notice that a provider updated their DPA are the people who read the DPA — developers and legal teams working with that provider.
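Pre-filled issues need no backend at all: GitHub accepts `title`, `body`, and `labels` as query parameters on its new-issue URL. A sketch of building such a link (the repository path and label name here are hypothetical):

```typescript
// Build a pre-filled GitHub issue URL for a provider correction.
// The repo path and label name are hypothetical placeholders.
function reportChangeUrl(providerSlug: string, field: string): string {
  const url = new URL("https://github.com/example/infercheck-data/issues/new");
  url.searchParams.set("title", `[${providerSlug}] compliance data change: ${field}`);
  url.searchParams.set(
    "body",
    [
      `Provider: ${providerSlug}`,
      `Field: ${field}`,
      "Source URL: ",
      "What changed: ",
    ].join("\n"),
  );
  url.searchParams.set("labels", "data-correction");
  return url.toString();
}
```

The template nudges reporters toward the one thing a reviewer actually needs: a source URL for the claimed change.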
What the Data Shows
Across 111 providers and 347 models, a few patterns stand out.
EU-Only Inference Is Available — If You Know Where to Look
Providers like Mistral, Scaleway, OVHcloud, STACKIT, Berget AI, and Amazon Bedrock (via EU regional endpoints) offer strict EU-only inference. Google Vertex AI serves Gemini models from EU regions. You can run Claude on Bedrock's Frankfurt endpoint.
The options are there. They're not always the cheapest or the most obvious, but they exist. InferCheck makes them findable in one table instead of across a dozen marketing pages.
The Big US Providers Are Amber, Not Red
OpenAI and Anthropic both offer signed DPAs, SCCs, and no-training-on-API-data policies. That puts them in amber — usable for many EU workloads when the proper contractual framework is in place. Not EU-only, but defensible.
This matters because the compliance conversation is often framed as binary: compliant or not. In practice, it's a spectrum. A product processing pseudonymised analytics data has a different risk profile than one processing medical records. InferCheck surfaces the facts; the risk assessment is yours to make.
The Red Zone Is Informative Too
DeepSeek processes in China and trains on data. Several smaller providers have no published DPA. Some have US-only inference with no SCCs. Knowing what doesn't pass is as useful as knowing what does — it saves evaluation time before you start integrating.
Why Open Data Matters Here
Compliance directories are useless if you can't verify the underlying data. The worst version of this product would be a proprietary database behind a paywall where you trust one vendor's reading of another vendor's DPA.
InferCheck's data is licensed CC BY-NC-SA 4.0. The code is MIT. You can clone the repo, audit every claim, and trace every data point back to its source. If you disagree with a classification, you can open a PR. If a provider updates their terms, the community can flag it before the next nightly sync.
This is a deliberate trade-off. Proprietary data is easier to monetise. Open data is easier to trust. For a compliance reference tool, trust is the product.
How It Fits Into the Stack
I built InferCheck on the same stack I use for client MVPs at CodeAttack: Next.js, TypeScript, Postgres on Neon, deployed to Vercel. The data layer is Drizzle ORM. The provider JSON files are validated at build time with Zod.
The nightly sync job is a Vercel Cron function — nothing exotic. It hits the OpenRouter API, maps models to providers, updates pricing, and writes to the Neon database. EU-native provider adapters (Scaleway, STACKIT, OVHcloud) are on the roadmap for direct API sync.
Total infrastructure cost is minimal. The expensive part was reading 111 DPAs and building the schema to structure what I found.
What InferCheck Doesn't Do
It's not a compliance tool. It doesn't generate GDPR documentation or run risk assessments. It doesn't certify providers. It doesn't replace legal counsel.
It's a reference directory. The same way you'd check a model's benchmark score before picking it, you check its compliance profile. InferCheck puts that profile where it should have been all along: next to the model name, in a structured, filterable, verifiable format.
Current State and What's Next
InferCheck is live at infercheck.eu with 347 models and 111 providers. The model catalogue syncs nightly. Provider compliance data is updated manually and via community reports.
Next on the roadmap:
- Side-by-side provider comparison — pick two providers, see compliance and pricing differences in one view.
- Per-provider GDPR guides — editorial content explaining each provider's data processing setup in plain language.
- EU-native provider sync adapters — direct API integration with Scaleway, STACKIT, Aleph Alpha, OVHcloud, and SAP AI Core.
- Email digest — weekly notifications when a provider's compliance status changes.
- API access — structured compliance data via REST API for teams who want to integrate it into their own tooling.
The goal is straightforward: make GDPR AI provider compliance a solved lookup problem instead of a quarterly research project.