AXscore

Methodology · v2.0 · 27 May 2026

How AXscore measures.

AXscore publishes one methodology, applies it uniformly to every property, and versions it with dates. There is no per-customer customisation.

Emergent measurement, not a curated list

Most measurement frameworks start with a list — "here are the 100 hotels we track" — and ask how those properties perform. AXscore does the opposite. The system classifies the property, builds a prompt basket tailored to that property's actual market, runs it against multiple AI engines, and discovers which properties get named in response.

The resulting leaderboard is "the hotels AI engines named across the prompt basket," not "how our list of 100 performed." This is more honest about how discovery actually works for guests today, and it surfaces properties that traditional tracking would have missed.

The trade-off is computational: every scan classifies the property, generates a basket, and extracts entities from free-form AI responses rather than checking against a fixed list. Classification and entity matching are therefore the two most operationally critical steps.
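To make the entity-matching step concrete, here is a minimal Python sketch of the two-step resolution that Stage 03 describes (exact normalisation first, then model disambiguation accepted only at confidence ≥ 0.85), feeding an emergent tally with no predefined hotel list. The filler-word list, the `disambiguate` callable, the stub standing in for a model call, the placeholder name "Example Lodge", and the policy of registering unmatched mentions as new entities are all assumptions for illustration, not AXscore's implementation.

```python
import re
import unicodedata
from collections import Counter
from typing import Callable, Optional

CONFIDENCE_THRESHOLD = 0.85  # documented threshold for AI disambiguation

# Hypothetical filler words ignored by exact normalisation.
FILLER_WORDS = {"the", "hotel", "resort"}

# Hypothetical interface: (mention, known canonical names) -> (best match or None, confidence).
Disambiguator = Callable[[str, list[str]], tuple[Optional[str], float]]


def normalise(name: str) -> str:
    """Exact normalisation: strip accents, lowercase, drop punctuation and filler words."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = re.findall(r"[a-z0-9]+", ascii_name.lower())
    return " ".join(t for t in tokens if t not in FILLER_WORDS)


def resolve(mention: str, canonical: dict[str, str], disambiguate: Disambiguator) -> str:
    """Map a raw mention to a canonical entity name (mutates `canonical` as a cache)."""
    key = normalise(mention)
    if key in canonical:  # step 1: exact normalised match
        return canonical[key]
    match, confidence = disambiguate(mention, sorted(set(canonical.values())))
    if match is not None and confidence >= CONFIDENCE_THRESHOLD:
        canonical[key] = match  # remember this variant for later runs
        return match  # step 2: accepted model match
    # Assumed policy: an unmatched mention becomes a new entity (the dataset is emergent).
    canonical[key] = mention
    return mention


def tally(runs: list[list[str]], canonical: dict[str, str], disambiguate: Disambiguator) -> Counter:
    """Count, per canonical entity, the runs in which it was named (at most once per run)."""
    counts: Counter = Counter()
    for mentions in runs:
        counts.update({resolve(m, canonical, disambiguate) for m in mentions})
    return counts


def stub_disambiguate(mention: str, candidates: list[str]) -> tuple[Optional[str], float]:
    """Stand-in for a model call; a real system would query an LLM here."""
    if "calile" in mention.lower() and "The Calile" in candidates:
        return "The Calile", 0.93
    return None, 0.0


canonical = {normalise("The Calile"): "The Calile"}
runs = [["The Calile", "Calile Hotel Brisbane"], ["Calile", "Example Lodge"]]
print(tally(runs, canonical, stub_disambiguate).most_common())
# → [('The Calile', 2), ('Example Lodge', 1)]
```

"Calile Hotel Brisbane" survives exact normalisation as `calile brisbane`, so only the model step links it to The Calile; that is exactly the case where a missing resolution layer would split one hotel into two leaderboard entries.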

The five stages

What every scan does, in order.

  01

    Classify the property

    Before any prompts run, AXscore classifies the property into a segment. The segment determines which questions real guests in this market actually ask — and therefore which prompts the scan runs. Classification triangulates three independent evidence sources: what the operator states about positioning, what the property's homepage objectively shows (schema.org markup, published rates, on-page positioning copy), and how AI engines already describe the property (extracted from prior scan responses). Where the three disagree, the disagreement itself is recorded as a finding — never silently resolved. The disagreement is often the most useful thing a report can tell an operator.

  02

    Build a realistic prompt basket

    For each classified property, AXscore generates a 15-prompt basket distributed across three funnel bands that mirror how real guests research: 5 top-of-funnel queries (broad, country-level: “best luxury resorts in Australia”), 7 mid-funnel queries (intent + region: “luxury wilderness escape in Tasmania”), and 3 bottom-of-funnel queries (region- and segment-locked). A segment leader typically scores highly in the bottom band, its specific niche, but does not sweep the top band, where it competes nationally. That funnel shape, not engineered difficulty, is what produces meaningful score separation between properties. Every query traces to evidence of real guest behaviour; no query enters a basket from operator or developer intuition about what “should” be tested.

  03

    Run, extract, resolve

    Each prompt runs three times per engine to control for response variance. Every response is parsed for property names regardless of whether the property appears on any pre-existing list; the dataset is emergent. Naming variations (“The Calile,” “Calile Hotel Brisbane,” “Calile”) resolve to canonical entities via exact normalisation first, then AI disambiguation at a 0.85 confidence threshold. This entity-matching layer is operationally the most important step: without it, one hotel's mentions split across several name variants and leaderboard rankings are fictional.

  04

    Score with structural integrity

    The visibility score is the share of scorable prompt-runs in which the property was named. Two rules govern “scorable,” and both are non-negotiable. A non-empty answer that omits the property is a real miss and counts in the denominator. A structurally empty response (the engine declined to surface an AI answer for the query, which is common for Google AI Overviews on hotel queries) produces no data point and is excluded from both numerator and denominator. “No answer” is not the same as “the engine answered and omitted you,” and conflating the two corrupts the score in either direction: counting non-answers as misses deflates it, and discarding real omissions as non-answers inflates it. A small share of opportunity-flagged queries (adjacent demand the property is unlikely to win) are reported separately as “adjacent demand we tested” and never depress the headline.

  05

    Calibrate the basket against known references

    For each segment AXscore measures, the methodology is periodically checked against a known leader, a known mid-tier property, and a known weak property within that segment. Healthy basket: scores separate cleanly. Unhealthy basket: scores cluster, or rankings invert. When the calibration check flags a problem, AXscore investigates the basket realism, then mention extraction, then entity resolution — in that order. AXscore never hand-tunes queries to force scores to separate. The basket must reflect real guest queries; engineered difficulty would destroy the comparability that makes the dataset valuable.
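The basket shape from Stage 02 and the run count from Stage 03 can be checked mechanically before a scan runs. A minimal Python sketch, assuming a hypothetical `Prompt` record whose `evidence` field points at the guest-behaviour source a query was drawn from; the engine identifiers and the "guest search sample" evidence label are placeholders too. It does not judge whether the evidence is genuine, only that every prompt cites some.

```python
from collections import Counter
from dataclasses import dataclass

BAND_SIZES = {"top": 5, "mid": 7, "bottom": 3}  # Stage 02: 15 prompts in total
RUNS_PER_ENGINE = 3  # Stage 03: each prompt runs three times per engine
ENGINES = ("chatgpt", "claude", "perplexity", "google_ai_overviews")  # placeholder identifiers


@dataclass(frozen=True)
class Prompt:
    text: str
    band: str      # "top", "mid" or "bottom"
    evidence: str  # hypothetical pointer to the guest-behaviour evidence behind the query


def validate_basket(basket: list[Prompt]) -> list[str]:
    """Return a list of problems; an empty list means the basket has the documented shape."""
    problems = []
    counts = Counter(p.band for p in basket)
    for band, expected in BAND_SIZES.items():
        if counts[band] != expected:
            problems.append(f"{band} band has {counts[band]} prompts, expected {expected}")
    for band in sorted(set(counts) - set(BAND_SIZES)):
        problems.append(f"unknown band {band!r}")
    for p in basket:
        if not p.evidence.strip():
            problems.append(f"no guest-behaviour evidence for {p.text!r}")
    return problems


def max_prompt_runs(basket: list[Prompt]) -> int:
    """Upper bound on prompt-runs per scan, before non-answers are excluded."""
    return len(basket) * RUNS_PER_ENGINE * len(ENGINES)


basket = [
    Prompt(f"{band} query {i}", band, "guest search sample")
    for band, size in BAND_SIZES.items()
    for i in range(size)
]
print(validate_basket(basket))  # → []
print(max_prompt_runs(basket))  # → 180
```

The 180 figure is simple arithmetic (15 prompts × 3 runs × 4 engines); the headline score's denominator is usually smaller, because structurally empty responses are excluded under Stage 04.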

The published taxonomy

The vocabulary every classification uses.

Segments are constructed from a published, deliberately coarse taxonomy. A property does not get to invent its own segment; if a property's positioning doesn't map cleanly to a taxonomy segment, it is classified to the nearest evidenced match and the mismatch is noted in the report.

Price / positioning tier (1 per property)

  • luxury: top-tier, premium rates
  • premium: upper-mid, boutique / high-end
  • mid_tier: accessible quality, well-reviewed
  • budget_conscious: price-led, best-value

Experience tags (1–3 per property)

  • nature_wilderness
  • wellness_spa
  • food_wine
  • family
  • design_architectural
  • heritage
  • romantic_adults_only
  • city_urban
  • beach_coastal

Region is the third axis — open-vocabulary, drawn from the property's actual market. Each experience tag must be independently evidenced from the property's surface or from how AI engines already describe it; tags without evidence are dropped.
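The taxonomy rules above, combined with the three-source triangulation from Stage 01, can be sketched as a small validation step. This is a minimal Python sketch under stated assumptions: the source keys (`operator`, `homepage`, `ai_engines`), the majority vote with a homepage-first tie-break, and the ranking used to cap tags at three are illustrative choices, not AXscore's published rules. What it does take from the text: one tier, at most three taxonomy tags, tags without homepage or AI-engine evidence are dropped, and any disagreement is recorded as a finding rather than silently resolved.

```python
from collections import Counter
from dataclasses import dataclass, field

TIERS = {"luxury", "premium", "mid_tier", "budget_conscious"}
TAGS = {
    "nature_wilderness", "wellness_spa", "food_wine", "family", "design_architectural",
    "heritage", "romantic_adults_only", "city_urban", "beach_coastal",
}
MAX_TAGS = 3
# Hypothetical tie-break order: objective homepage evidence first, operator claims last.
SOURCE_PRIORITY = ("homepage", "ai_engines", "operator")
# Only these sources count as independent evidence for an experience tag.
TAG_EVIDENCE_SOURCES = {"homepage", "ai_engines"}


@dataclass
class Classification:
    tier: str
    tags: list[str]
    region: str  # open vocabulary, taken from the property's actual market
    findings: list[str] = field(default_factory=list)


def classify(tier_claims: dict[str, str], tag_evidence: dict[str, set[str]], region: str) -> Classification:
    """tier_claims: source -> tier it implies; tag_evidence: tag -> sources supporting it."""
    findings = []
    votes = Counter()
    for source in SOURCE_PRIORITY:
        if tier_claims.get(source) in TIERS:
            votes[tier_claims[source]] += 1
    if not votes:
        raise ValueError("no evidence maps to a taxonomy tier")
    tier = max(votes, key=votes.__getitem__)  # ties go to the higher-priority source
    if len(votes) > 1:
        detail = ", ".join(f"{s}={tier_claims[s]}" for s in SOURCE_PRIORITY if s in tier_claims)
        findings.append(f"tier disagreement ({detail}); classified as {tier}")

    evidenced = [t for t in tag_evidence if t in TAGS and tag_evidence[t] & TAG_EVIDENCE_SOURCES]
    dropped = sorted(set(tag_evidence) - set(evidenced))
    if dropped:
        findings.append(f"dropped unevidenced or non-taxonomy tags: {', '.join(dropped)}")
    # Assumed ranking when more than three tags survive: most supporting sources first.
    evidenced.sort(key=lambda t: (-len(tag_evidence[t] & TAG_EVIDENCE_SOURCES), t))
    if len(evidenced) > MAX_TAGS:
        findings.append(f"kept {MAX_TAGS} of {len(evidenced)} evidenced tags")
    if not evidenced:
        findings.append("no evidenced experience tag")
    return Classification(tier, evidenced[:MAX_TAGS], region, findings)


result = classify(
    tier_claims={"operator": "luxury", "homepage": "premium", "ai_engines": "premium"},
    tag_evidence={"nature_wilderness": {"homepage", "ai_engines"}, "wellness_spa": {"operator"}},
    region="Tasmania",
)
print(result.tier, result.tags)  # → premium ['nature_wilderness']
print(result.findings)
```

Here the operator's "luxury" claim is outvoted, but the disagreement survives as a finding, and the spa tag is dropped because only the operator asserted it.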

The four engines

Where AXscore looks.

Each engine has different training data, retrieval behaviour, and citation conventions. Measuring all four prevents any one engine's biases from dominating the headline number.

ChatGPT

OpenAI

The most widely used consumer AI assistant. Responses vary from run to run and across model versions; AXscore runs each prompt three times per engine to dampen run-to-run variance.

Claude

Anthropic

Strong on reasoning and source attribution. Used both as a target engine and as the primary model behind AXscore's entity matching and segment classification layers.

Perplexity

Search-grounded

A search-augmented assistant that surfaces citations alongside answers. Useful for distinguishing AI memory from real-time citation behaviour.

Google AI Overviews

via SerpAPI

The AI summary shown above Google search results. Accessed via SerpAPI for reliable programmatic capture of the same content guests see. Google frequently declines to generate an AI Overview for conversational hotel queries; those non-answers produce no data points (per Stage 04), so Google contributes fewer scorable runs than the other engines on hotel queries. This is by design: a non-answer carries no signal about visibility.

Two scores, side by side

The headline number, and the technical why.

Every scan produces both scores. Reports treat them as connected sections — the visibility score is the what, the technical foundation is the why.

Headline

AI Visibility Score

The share of scorable prompt-runs in which the property was named, weighted across the engines that produced answers. Out of 100. The number every customer cares about.
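The scorable-run rules from Stage 04 translate directly into a score function. A minimal Python sketch, assuming a hypothetical `Run` record and assuming that "weighted across the engines that produced answers" means pooling every scorable run, so each engine's weight equals its number of scorable runs; the methodology text does not spell out the weighting. Opportunity-flagged prompts are scored separately and never enter the headline.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Run:
    engine: str
    prompt_id: str
    answer: Optional[str]      # None or blank: the engine declined to answer
    named: bool                # was the property named in the answer?
    opportunity: bool = False  # opportunity-flagged (adjacent-demand) prompt


def is_scorable(run: Run) -> bool:
    """A structurally empty response produces no data point."""
    return run.answer is not None and run.answer.strip() != ""


def share_named(runs: Iterable[Run]) -> Optional[float]:
    """Percentage of scorable runs that name the property; None if nothing is scorable."""
    scorable = [r for r in runs if is_scorable(r)]
    if not scorable:
        return None
    return 100 * sum(r.named for r in scorable) / len(scorable)


def visibility_report(runs: list[Run]) -> dict[str, Optional[float]]:
    return {
        "headline": share_named(r for r in runs if not r.opportunity),
        "adjacent_demand": share_named(r for r in runs if r.opportunity),
    }


runs = [
    Run("chatgpt", "p1", "answer text", named=True),
    Run("chatgpt", "p2", "answer text", named=False),  # answered and omitted: a real miss
    Run("claude", "p1", "answer text", named=True),
    Run("perplexity", "p1", "answer text", named=True),
    Run("google_ai_overviews", "p1", None, named=False),  # no AI Overview: excluded
    Run("google_ai_overviews", "p2", "", named=False),  # empty response: excluded
    Run("claude", "p9", "answer text", named=False, opportunity=True),
]
print(visibility_report(runs))  # → {'headline': 75.0, 'adjacent_demand': 0.0}
```

Counting the two Google non-answers as misses would drop the headline to 50.0 (3 of 6), which is the deflation Stage 04 rules out; the opportunity-flagged miss shows up only under adjacent demand.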

Supporting

Technical Foundation Score

Six categories of technical signals: discoverability, structured data, content readability, booking interface accessibility, external presence, and agent infrastructure. Explains why the visibility score is what it is — and what to fix if it is lower than the property warrants.

One methodology, uniformly applied

What AXscore deliberately does not do.

  • No custom methodology per customer. The methodology is published transparently and applied identically to every property. This preserves the comparability that makes the leaderboards meaningful.
  • No claim that AI's recommendations are correct. AXscore measures AI behaviour; it does not assert that the hotels AI names are the right hotels for any given guest.
  • No property listed without observation. Every name on every leaderboard appears because it was named in the underlying scan data — never because AXscore chose to include it.
  • No hand-tuning of queries to manipulate scores. Every prompt in every basket traces to evidence of real guest behaviour. The calibration check (Stage 05) exists to diagnose when the basket fails to separate known leaders from known also-rans — and when it does, the fix is to the basket's realism or the extraction, never to the scoring.
  • No counting engine non-answers as misses. When an AI engine declines to generate an answer for a query, that run produces no data point — it is excluded from both numerator and denominator. An engine that answered and omitted you remains a real miss. The two cases are structurally different and the score must distinguish them.
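The calibration check referenced above (Stage 05) can be expressed as a small predicate over three reference scores. A minimal Python sketch; the 10-point `min_gap` is a hypothetical separation threshold, since the methodology does not publish one, and the reference scores in the loop are illustrative inputs only. Note what the sketch does not do: it never adjusts scores or queries. It only names the problem and returns the documented order of investigation.

```python
from typing import Optional

MIN_GAP = 10.0  # hypothetical: minimum separation (score points) between adjacent references
# Documented investigation order when calibration flags a problem.
INVESTIGATION_ORDER = ("basket realism", "mention extraction", "entity resolution")


def calibration_problem(leader: float, mid: float, weak: float, min_gap: float = MIN_GAP) -> Optional[str]:
    """Return 'rankings invert', 'scores cluster', or None for a healthy basket."""
    if leader < mid or mid < weak:
        return "rankings invert"
    if leader - mid < min_gap or mid - weak < min_gap:
        return "scores cluster"
    return None


def next_steps(problem: Optional[str]) -> list[str]:
    """What to investigate, in order; the scoring rules themselves are never on this list."""
    return [] if problem is None else list(INVESTIGATION_ORDER)


for scores in [(62.0, 35.0, 8.0), (40.0, 38.0, 5.0), (30.0, 45.0, 5.0)]:
    problem = calibration_problem(*scores)
    print(scores, problem, next_steps(problem))
```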

Versioning

AXscore methodology v2.0, last updated 27 May 2026.

Earlier scans may have run under previous methodology versions (notably the v1 four-category basket prior to the funnel-band model). Where a report references an earlier version, that version's rules apply to that scan. Cross-version score comparisons are surfaced with a methodology note in the report.