Methodology
How Mirr scores AI brand visibility.
Every number in a Mirr report is either computed deterministically from public signals or scored by a named model against an explicit rubric. This page documents the full pipeline, every formula, and the rationale behind each design choice, so anyone can reproduce or challenge a result.
The four-stage pipeline
An audit runs through four stages. Stages 1 and 2 are deterministic: given the same brand name, website URL and competitor list, they return the same numbers every time. Stages 3 and 4 use AI, but operate on the deterministic data from stages 1 and 2 as ground truth, which keeps the final output anchored to facts.
Stage 1: Multi-LLM research
We run 15 standardised customer-discovery queries (for example, “best AI visibility audit tools for SaaS in 2026”) on four AI platforms: ChatGPT (OpenAI gpt-4o-mini), Perplexity (sonar), Claude (Haiku 4.5) and Gemini (1.5 Flash). That is 60 API calls per audit. Every response is stored verbatim, and the report's Raw Responses appendix reproduces the first eight queries in full, with every LLM response.
15 queries × 4 platforms = 60 answers
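As a rough illustration of the arithmetic above, the call matrix can be sketched as the cross product of queries and platforms. The function name `build_calls` and the dictionary layout are assumptions made for this example, and the model labels are the names used on this page, not necessarily the identifiers a provider's API expects.

```python
from itertools import product

# Platform -> model label as written on this page (not verified API identifiers).
PLATFORMS = {
    "ChatGPT": "gpt-4o-mini",
    "Perplexity": "sonar",
    "Claude": "Haiku 4.5",
    "Gemini": "1.5 Flash",
}

QUERIES_PER_AUDIT = 15


def build_calls(queries):
    """Return one call descriptor per (query, platform) pair."""
    if len(queries) != QUERIES_PER_AUDIT:
        raise ValueError(f"expected {QUERIES_PER_AUDIT} queries, got {len(queries)}")
    return [
        {"platform": platform, "model": model, "query": query}
        for query, (platform, model) in product(queries, PLATFORMS.items())
    ]


# Placeholder query texts, purely for counting.
calls = build_calls([f"query {i}" for i in range(1, 16)])
print(len(calls))  # 15 queries x 4 platforms
```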
Stage 2: Web presence audit
We check nine signals AI crawlers rely on: a Wikipedia page (via the official Wikipedia REST API, not inference), a verified LinkedIn company page, Schema.org structured data on your homepage, the meta title, the meta description, Open Graph tags, Twitter card tags, a canonical URL, and presence on eight review platforms (G2, Capterra, Trustpilot, Klantenvertellen, Clutch, DesignRush, Google Business and AlternativeTo). Every check is a yes or no (for reviews, one yes or no per platform); there is no AI inference in this stage.
9 signals, present or not
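The sketch below shows one way the on-page checks and the Web Presence Score from the Formulas section could fit together. It is an illustration under stated assumptions, not Mirr's implementation: the class `OnPageSignals` and the function `web_presence_score` are hypothetical names, Schema.org is detected only as a JSON-LD script tag, the off-page signals (Wikipedia, LinkedIn, review listings) are passed in as already-known values, and reviews are assumed to earn one point per platform up to the stated cap of 8.

```python
from html.parser import HTMLParser

# Weights copied from the Web Presence Score formula on this page.
WEIGHTS = {
    "wikipedia": 25, "schema_org": 15, "linkedin": 15,
    "meta_title": 10, "meta_description": 10, "open_graph": 8,
    "twitter_card": 5, "canonical": 4,
}
MAX_REVIEW_POINTS = 8  # "reviews (up to 8)"


class OnPageSignals(HTMLParser):
    """Records which of six homepage signals appear in raw HTML."""

    def __init__(self):
        super().__init__()
        self.found = dict.fromkeys(
            ("schema_org", "meta_title", "meta_description",
             "open_graph", "twitter_card", "canonical"), False)
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = a.get("name", "").lower()
            if name == "description" and a.get("content", "").strip():
                self.found["meta_description"] = True
            if a.get("property", "").lower().startswith("og:"):
                self.found["open_graph"] = True
            if name.startswith("twitter:"):
                self.found["twitter_card"] = True
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.found["canonical"] = True
        elif tag == "script" and a.get("type", "").lower() == "application/ld+json":
            # Assumption: JSON-LD only; microdata and RDFa are ignored here.
            self.found["schema_org"] = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.found["meta_title"] = True


def web_presence_score(html, wikipedia, linkedin, review_platforms_found):
    """Sum the weights of all present signals; the maximum is 100."""
    parser = OnPageSignals()
    parser.feed(html)
    signals = dict(parser.found, wikipedia=wikipedia, linkedin=linkedin)
    score = sum(WEIGHTS[name] for name, present in signals.items() if present)
    # Assumption: one point per review platform found, capped at 8.
    return score + min(review_platforms_found, MAX_REVIEW_POINTS)


sample = """<html><head><title>Acme</title>
<meta name="description" content="Acme makes things.">
<meta property="og:title" content="Acme">
<link rel="canonical" href="https://example.com/">
</head><body></body></html>"""
print(web_presence_score(sample, wikipedia=False, linkedin=True, review_platforms_found=3))
```

Because every input is a yes or no (plus a capped count), the same inputs always produce the same score, which is the property this stage relies on.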
Stage 3: Perception scoring
Claude Haiku 4.5 reads the aggregated research output from stage 1 and scores the brand on five category dimensions, each against an explicit rubric: Awareness (does AI know you exist), Consideration (are you on the shortlist for relevant queries), Decision (do recommendations end with you as the choice), Sentiment (is the tone positive) and Cultural relevance (does AI connect you to the right cultural signals). Each category is scored 0-100, and the Visibility Score is the mean of the five.
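The averaging step, together with the bands listed in the Formulas section, can be sketched as follows. The published ranges share their endpoints (20, 50 and 75 each appear in two bands), so this example assumes a boundary value belongs to the higher band; that rule and the function names are assumptions made for illustration.

```python
from statistics import mean

CATEGORIES = ("Awareness", "Consideration", "Decision", "Sentiment", "Cultural")

# (lower bound, label), highest band first.
# Assumption: a score exactly on a boundary falls into the higher band.
BANDS = ((75, "Dominant"), (50, "Established"), (20, "Weak-Emerging"), (0, "Invisible"))


def visibility_score(category_scores):
    """Mean of the five category scores, each expected in 0-100."""
    for category in CATEGORIES:
        if category not in category_scores:
            raise ValueError(f"missing category: {category}")
        if not 0 <= category_scores[category] <= 100:
            raise ValueError(f"{category} must be between 0 and 100")
    return mean(category_scores[c] for c in CATEGORIES)


def band(score):
    """Map a 0-100 score to its band label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label


scores = {"Awareness": 60, "Consideration": 50, "Decision": 40,
          "Sentiment": 70, "Cultural": 30}
print(visibility_score(scores), band(visibility_score(scores)))
```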
Stage 4: Strategic synthesis
Claude Opus 4.7 combines the stage 1 research, the stage 2 signals and the stage 3 scores into an identity gap analysis (six dimensions, 0-10 each), a competitor benchmark with per-competitor strengths and weaknesses, and a seven-step action plan. Every action in the plan must reference a specific finding; generic advice is rejected by the prompt contract.
Formulas
Coverage: the share of prompts that name the brand at all.
We quote score bands, not single integers: rerun variance sits within a band.
Every metric in a Mirr report is one of these:
- Coverage % = (prompts mentioning your brand ÷ 15 total prompts) × 100. Regex word-boundary match against normalised text, case-insensitive. Deterministic.
- Share of Voice % = (your brand mentions ÷ total brand mentions across all responses) × 100. Counts every occurrence of every provided brand name. Deterministic.
- Web Presence Score = Wikipedia (25) + Schema.org (15) + LinkedIn (15) + meta title (10) + meta description (10) + Open Graph (8) + reviews (up to 8) + Twitter card (5) + canonical URL (4). Maximum 100. Deterministic.
- Visibility Score = mean of five category scores (Awareness, Consideration, Decision, Sentiment, Cultural). Each category 0-100, scored by Claude Haiku 4.5 against a rubric. Ranges: 0-20 Invisible, 20-50 Weak-Emerging, 50-75 Established, 75-100 Dominant.
- Identity Gap = six 0-10 scores for Tone, Values, Audience, Market Position, Brand Promise, Emotional Association. Scored in stage 4 by Claude Opus 4.7, comparing your intended positioning (optional input) to actual AI perception.
- Per-LLM Coverage = same Coverage formula run independently per platform. Reveals whether your visibility is broad (balanced across ChatGPT, Perplexity, Claude, Gemini) or fragile (concentrated in a single corpus).
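The three mention-based formulas above (Coverage, Share of Voice and Per-LLM Coverage) can be reproduced with a few lines of regex. This is a minimal sketch under assumptions: the page does not specify the normalisation, so Unicode NFKC plus whitespace collapsing is used as a stand-in; the combined Coverage figure assumes a prompt counts as a mention when any platform's answer names the brand; and the function names and data layout are hypothetical.

```python
import re
import unicodedata


def normalise(text):
    # Assumption: NFKC normalisation plus whitespace collapsing.
    return re.sub(r"\s+", " ", unicodedata.normalize("NFKC", text)).strip()


def count_mentions(brand, text):
    """Case-insensitive, word-boundary matches of the brand name in one answer."""
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    return len(pattern.findall(normalise(text)))


def coverage_pct(brand, answers_per_prompt):
    """answers_per_prompt: one list of answer texts per prompt.

    Assumption: a prompt counts if at least one of its answers names the brand.
    """
    hits = sum(
        any(count_mentions(brand, answer) for answer in answers)
        for answers in answers_per_prompt
    )
    return 100 * hits / len(answers_per_prompt)


def per_llm_coverage(brand, responses):
    """responses: {platform: [answer text per prompt]} -> {platform: coverage %}."""
    return {
        platform: coverage_pct(brand, [[answer] for answer in answers])
        for platform, answers in responses.items()
    }


def share_of_voice_pct(brand, all_brands, texts):
    """Brand mentions as a share of all provided brand names' mentions."""
    counts = {b: sum(count_mentions(b, t) for t in texts) for b in all_brands}
    total = sum(counts.values())
    return 100 * counts[brand] / total if total else 0.0


# Tiny two-prompt, two-platform example (a real audit has 15 prompts, 4 platforms).
responses = {
    "ChatGPT": ["Mirr and Acme are good.", "Try Acme."],
    "Perplexity": ["Acme only.", "mirr is fine; Mirr again, unlike Mirror."],
}
by_prompt = [list(answers) for answers in zip(*responses.values())]
all_texts = [text for answers in responses.values() for text in answers]

print(coverage_pct("Mirr", by_prompt))
print(per_llm_coverage("Mirr", responses))
print(share_of_voice_pct("Mirr", ["Mirr", "Acme"], all_texts))
```

The word-boundary match keeps "Mirror" from counting as "Mirr". Brand names that begin or end with a non-word character would need a different boundary rule than `\b`.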
Why these four LLMs
Between them, ChatGPT, Perplexity, Claude and Gemini account for more than 90 percent of AI-assisted brand discovery today. Each has a different training corpus and update cadence, so a brand that appears in one may be invisible in another. Measuring all four is the only honest way to describe your AI visibility.
- ChatGPT (OpenAI gpt-4o-mini): largest consumer base, trained on a curated web corpus with periodic refreshes.
- Perplexity (sonar): research-first, fetches live web, cites sources. Strongest signal for freshness.
- Claude (Haiku 4.5): used widely by developers and enterprises, high reasoning quality.
- Gemini (1.5 Flash): integrated with Google Search and Android. Essential for consumer discovery.
Why 15 queries
The 15 queries come from a fixed template that generates, for each brand category, five discovery-intent questions (for example, “what are the top tools for X”, “best alternatives to Y”), three comparison questions, three decision-stage questions, and four edge-case or long-tail questions. Running fewer queries produces unstable Coverage numbers; running more produces diminishing returns at tripled cost. Fifteen is the smallest number at which rerun variance drops below five points for most brands.
What varies between runs
Deterministic signals (web presence, coverage counts) are stable to within one point across reruns. AI-scored metrics (Visibility Score, identity gap) have run-to-run variance of roughly five points, which is why we quote the Visibility Score as a band (for example “Weak-Emerging”) rather than a single number. The honest number is the band; the specific integer is a directional indicator.
When scores move after changes
Deterministic web presence changes (canonical URLs, Schema.org, meta tags) are reflected the moment crawlers refetch your site, typically within 7 days. Perplexity picks up fresh content in 24 to 48 hours. ChatGPT, Claude and Gemini use training snapshots updated on their own schedules; expect LLM coverage gains to lag deterministic gains by 4 to 8 weeks.
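To make these windows concrete, the sketch below turns a change date into rough expected dates. It assumes the 4 to 8 week LLM lag is counted from the point where the deterministic change is reflected (taken here as the typical 7 days); the page does not state the starting point, so treat that and the function name `expected_windows` as illustrative assumptions.

```python
from datetime import date, timedelta


def expected_windows(change_date):
    """Rough dates by which each kind of signal may reflect a site change."""
    crawled_by = change_date + timedelta(days=7)  # "typically within 7 days"
    return {
        "web_presence": crawled_by,
        # Perplexity: 24 to 48 hours after the change.
        "perplexity": (change_date + timedelta(days=1),
                       change_date + timedelta(days=2)),
        # Assumption: the 4 to 8 week lag starts once crawlers have refetched.
        "llm_coverage": (crawled_by + timedelta(weeks=4),
                         crawled_by + timedelta(weeks=8)),
    }


windows = expected_windows(date(2026, 4, 22))
for signal, when in windows.items():
    print(signal, when)
```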
Verifiability
Every audit PDF includes a Raw Responses appendix containing the first eight queries in full, with every LLM response. For those queries, the mention counts behind Coverage and Share of Voice can be recomputed from the responses with nothing more than a regex match. If a result looks wrong, you can check the source material directly. This is deliberate: we score brands for AI visibility, so the report itself should be accountable to the same standard of verifiability it measures.
The rhythm of Mirr
One audit tells you where your brand stands today. It is a snapshot. AI models retrain every 3 to 6 months, your competitors are optimising for visibility too, and the action plan you receive in the report needs time to land before its impact shows up in coverage numbers. That is why Mirr is built around a cadence, not a one-off check.
Why audits are cyclical
Three forces keep your AI-visibility position in motion. First, model training data refreshes: the version of ChatGPT a customer talks to in November knows different things from the August version, and that shift can move your coverage by ten or twenty percentage points without you doing anything. Second, your competitors are publishing, getting reviewed, and improving their listings. Their relative weight in the model's answer changes every cycle. Third, the action plan you implement after audit one needs four to eight weeks to propagate into LLM training data, so its real impact only shows up in audit two or three.
When to re-audit
For most brands a quarterly cadence is the sweet spot: far enough apart that meaningful changes have time to land, close enough that you can correct course before a competitor pulls ahead. Outside the regular cycle, certain moments are also worth a fresh audit:
- after launching a major content push or PR campaign, to measure whether AI noticed
- after a rebrand or repositioning, to confirm the new identity has propagated
- before a product launch or expansion into a new market, as a baseline you can defend later
- when you notice a competitor pulling ahead in real-world buying conversations
What a second audit shows
The first audit gives you scores and a plan. The second audit gives you accountability: visibility shifted from 47 to 58, ChatGPT now mentions you in three queries it missed before, your German market closed half its gap on the Dutch one. You can see which action items moved the needle and which did not, which lets you concentrate the next cycle on what is actually working. Over three audits you stop optimising in the dark and start operating like a brand that knows how it shows up in AI.
The 3-pack exists for exactly this reason. Three audits at €83 each (versus €99 individually) cover a full year of quarterly check-ins and give you the data continuity that single audits cannot. See pricing →
Last updated
22 April 2026
See the methodology applied to your brand
Run a full Mirr audit in under 15 minutes. 15-page PDF. €99.
Start your audit →