CASE STUDY

firstdemand.io

A 4-stage AI pipeline that turns a landing page URL into a demand strategy — diagnosis, channel scorecard, 14-day playbook, and copy-ready assets — in under two minutes.

Live product — firstdemand.io

PRODUCT

What it does.

firstdemand.io solves a specific problem: technical founders ship a landing page and then stall because they don't know which early-demand channels fit their specific product, ICP, and comfort level. They don't need another SEO tool or generic marketing copilot — they need an opinionated senior strategist who reads their actual product page and hands them a plan they can execute this week.

The product is not a social scheduler, SEO tool, or generic AI marketing assistant. One session. Under two minutes. Four deliverables.

INPUT

What goes in.

Landing page URL — scraped live, including JS-rendered SPAs and Cloudflare-protected pages
Structured intake form (pre-filled by AI from the scrape): product description, ICP, launch stage, goals, constraints, channel preferences
Optional: plain-language correction to refine the AI's interpretation

OUTPUT

Four artifacts, one session.

Demand Readiness Diagnosis

Readiness score 0–100, four signal ratings (positioning / ICP / CTA / proof), main bottlenecks, and a ready/not-ready verdict.

Channel Scorecard

Top 3 channels ranked by fit — with effort, time-to-signal, tradeoffs, and an opinionated "why not the others" for rank 1.

14-Day Playbook

Day-grouped action plan with specific tasks, estimated time per task, and copy anchored to the founder's actual product language.

Asset Pack

Copy-paste ready: directory listings, community posts, outreach messages, CTA variants, founder bio, product one-liners — in any of six languages.

ARCHITECTURE

The pipeline.

Five stages, three models, two providers. Each stage streams results to the client via Server-Sent Events as it completes.

Stage 0 — URL Scrape + Form Prefill

GPT-5.4 with webSearchPreview browses the live URL — handles JS-rendered SPAs, Cloudflare protection, React apps. Returns a single unified schema: page signals AND form prefill in one call. This combined approach cut scrape latency from ~40s (two sequential calls) to ~16-18s.

GPT-5.4 + webSearchPreviewOpenAI

Stage 1 — Demand Readiness Diagnosis

GPT-5.4 analyses positioning quality, ICP clarity, CTA strength, and social proof. Outputs a structured diagnosis with a 0–100 readiness score. Free users receive this stage plus channel theme previews; the pipeline stops at the paywall.

GPT-5.4OpenAI

Stage 2 — Channel Scoring

GPT-5.4 matches ICP behaviour to channel fit across six channel families (directories, communities, outreach, partnerships, monitoring, public posting). The channel family is a Zod enum — the model cannot hallucinate outside it.

GPT-5.4OpenAI

Stage 3 — 14-Day Playbook

Claude Opus 4-6 via OpenRouter. Chosen for instruction-following on complex day-grouped schemas and noticeably stronger copy quality for sequential action plans. Playbook actions reference specific product language from the scrape, not generic AI copy.

Claude Opus 4-6Anthropic via OpenRouter

Stage 4 — Asset Pack

Claude Opus 4-6 via OpenRouter. Copywriting quality is the priority here. All assets are generated directly in the requested output language — no translation layer. Six languages supported; on-demand re-generation for additional languages without re-running the full pipeline.

Claude Opus 4-6Anthropic via OpenRouter

OpenAI handles analytical/reasoning stages. Anthropic handles creative/copywriting stages. Model assignments live in one 14-line providers.ts file — swapping a model is one config change.

ENGINEERING

What made it hard.

The scrape latency problem

The original scraper used two sequential LLM calls: first GPT-5.4 with webSearchPreview to fetch and read the page (~25-30s), then GPT-5-mini to synthesise the content into form prefill values (~10-15s). Total: ~40s before the user saw anything.

The fix: combine both into a single generateText call with a rich output schema (CombinedScrapeSchema) covering both page signals and form prefill in one pass. Result: ~16-18s. The key insight was designing the schema to serve two masters at once rather than making two focused calls.

Checkpoint resumability vs. idempotency

The pipeline has two overlapping guarantees: idempotency (if you re-hit the generate endpoint for a complete project, it replays cached results as SSE without burning tokens) and checkpoint resumability (if the pipeline dies mid-run, the next retry picks up from the last completed stage, not from scratch).

These two guarantees interact in non-obvious ways — particularly around the justPurchased flag that bypasses idempotency when the user just completed checkout. Both patterns are necessary for a paid product with expensive LLM calls and unreliable serverless infrastructure.

Prompt injection in user corrections

Users can submit plain-language corrections to refine the AI's interpretation of their product. Those corrections are injected into every subsequent prompt. A founder can accidentally (or intentionally) write instructions that override model behaviour.

The sanitiser strips structural injection vectors (role-override phrases, section delimiters, XML tags). The injected block is wrapped in a founder_note XML tag with explicit instruction to treat it as data, not instructions. This is a deliberate trade-off: too much sanitisation strips legitimate context; too little opens the model to manipulation.

URL-level shared cache

The diagnosis cache is keyed by (normalizedUrl, stage) with a 14-day TTL. Two different users analysing the same landing page share the same cached diagnosis — zero extra LLM cost. Only the full scorecard, playbook, and assets are personalised (and therefore not cached across users).

When a user submits a correction, the cache is bypassed entirely for that project. This means a paying user who has corrected once always runs fresh LLM calls, even on re-runs with no further changes. Documented deliberate decision.

NUMBERS

By the numbers.

LLM call sites in the pipeline

distinct models orchestrated

AI providers (OpenAI + Anthropic)

distinct Zod output schemas

~16s

scrape latency (AI path, current)

~40s

scrape latency (original, 2 calls)

60–90s

full pipeline latency (fresh, pro)

<1s

response time (cache hit)

526

lines in the generation route

DB schema migrations

PATTERNS

What transfers to client work.

The engineering patterns built for firstdemand.io are directly reusable. These aren't theoretical — they're battle-tested in a live product.

Output.object + Zod schemas

Every AI call uses Vercel AI SDK's Output.object with a ZodSchema pattern. This forces schema design before prompt writing, which consistently produces better-structured output. The SDK handles retries on parse failure automatically.

Context builder pattern

Prompt assembly is separated into pure functions (buildFounderContext, buildPageContext, buildCopyNotes, buildUserContextNote) that take typed objects and return formatted string blocks. Each prompt composes these blocks rather than inline-constructing the full string. Prompt changes stay localised.

Checkpoint + idempotency in multi-stage pipelines

Write a partial result row after each stage. On retry, detect the partial row and resume from the last completed stage. Separate idempotency guard: if a complete result exists, replay as SSE without re-running LLM calls. Applicable to any multi-step AI pipeline where steps are expensive and failures are expected.

Single-call URL scrape + synthesis

Attach a live-browsing tool (webSearchPreview), define a rich output schema covering both extraction and synthesis, let the model fill both in one turn. Cuts latency by ~60% compared to sequential calls. Portable to any product that reads a URL and extracts structured data.

See firstdemand.io →Start a project