Methodology · v2.0 · 27 May 2026
AXscore publishes one methodology, applied uniformly to every property, and versioned with dates. There is no per-customer customisation.
Most measurement frameworks start with a list ("here are the 100 hotels we track") and ask how those properties perform. AXscore does the opposite. The system classifies the property, builds a prompt basket tailored to that property's actual market, runs it against multiple AI engines, and records which properties the engines name in their responses.
The resulting leaderboard is "the hotels AI engines named across the prompt basket," not "how our list of 100 performed." This is more honest about how discovery actually works for guests today, and it surfaces properties that traditional tracking would have missed.
The trade-off is computational: every scan classifies the property, generates a basket, and extracts entities from free-form AI responses rather than checking them against a fixed list. Classification and entity matching are the two operationally most important steps.
The five stages
01
Before any prompts run, AXscore classifies the property into a segment. The segment determines which questions real guests in this market actually ask — and therefore which prompts the scan runs. Classification triangulates three independent evidence sources: what the operator states about positioning, what the property's homepage objectively shows (schema.org markup, published rates, on-page positioning copy), and how AI engines already describe the property (extracted from prior scan responses). Where the three disagree, the disagreement itself is recorded as a finding — never silently resolved. The disagreement is often the most useful thing a report can tell an operator.
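As an illustration only, the triangulation step might look like the Python sketch below. The segment labels, the `triangulate` function and the rule of accepting a segment only when at least two sources agree are assumptions for this sketch, not AXscore's published rule. What it does show is the stated behaviour: disagreement between sources becomes a recorded finding instead of being resolved silently.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Classification:
    segment: Optional[str]                             # None when no two sources agree
    findings: list[str] = field(default_factory=list)  # recorded disagreements


def triangulate(operator_claim: Optional[str],
                homepage_evidence: Optional[str],
                ai_description: Optional[str]) -> Classification:
    """Combine three independent segment readings; record, never hide, disagreement."""
    sources = {
        "operator": operator_claim,      # what the operator states about positioning
        "homepage": homepage_evidence,   # schema.org markup, rates, positioning copy
        "ai_engines": ai_description,    # how engines already describe the property
    }
    present = {name: seg for name, seg in sources.items() if seg is not None}
    if not present:
        return Classification(None, ["no segment evidence available"])

    findings: list[str] = []
    if len(set(present.values())) > 1:
        detail = ", ".join(f"{name}={seg}" for name, seg in sorted(present.items()))
        findings.append(f"evidence sources disagree: {detail}")

    # Hypothetical resolution rule: accept a segment only if at least two sources agree.
    segment, votes = Counter(present.values()).most_common(1)[0]
    return Classification(segment if votes >= 2 else None, findings)
```

For example, `triangulate("luxury_lodge", "luxury_lodge", "boutique_hotel")` settles on `luxury_lodge` but still carries one finding naming the dissenting AI-engine reading, which is the part a report would surface to the operator.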
02
For each classified property, AXscore generates a 15-prompt basket distributed across three funnel bands that mirror how real guests research: 5 top-of-funnel queries (broad, country-level, e.g. "best luxury resorts in Australia"), 7 mid-funnel queries (intent + region, e.g. "luxury wilderness escape in Tasmania"), and 3 bottom-of-funnel queries (region- and segment-locked). A segment leader scores high in the bottom band, where it competes within its specific niche, but does not sweep the top band, where it competes against properties nationwide. That funnel shape, not engineered difficulty, is what produces meaningful score separation between properties. Every query traces to evidence of real guest behaviour; no query enters a basket from operator or developer intuition about what “should” be tested.
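The basket constraints can be expressed as a validation check. This is a minimal sketch assuming each prompt carries a band label and a pointer to its guest-behaviour evidence; the `Prompt` type and its `evidence` field are hypothetical. It also computes how many prompt-runs a scan plans: 15 prompts × 3 runs × 4 engines = 180, before any empty responses are excluded.

```python
from collections import Counter
from dataclasses import dataclass

# Band quotas as stated in the methodology: 5 + 7 + 3 = 15 prompts.
BAND_QUOTAS = {"top": 5, "mid": 7, "bottom": 3}
RUNS_PER_ENGINE = 3   # each prompt runs three times per engine
ENGINE_COUNT = 4      # the four engines in this methodology


@dataclass(frozen=True)
class Prompt:
    text: str
    band: str       # "top", "mid" or "bottom"
    evidence: str   # hypothetical field: reference to the guest-behaviour evidence


def validate_basket(basket: list[Prompt]) -> None:
    """Raise ValueError if the basket breaks the funnel shape or lacks evidence."""
    counts = dict(Counter(p.band for p in basket))
    if counts != BAND_QUOTAS:
        raise ValueError(f"band counts {counts} do not match {BAND_QUOTAS}")
    unevidenced = [p.text for p in basket if not p.evidence.strip()]
    if unevidenced:
        raise ValueError(f"queries without guest-behaviour evidence: {unevidenced}")


def planned_runs(basket: list[Prompt]) -> int:
    """Upper bound on prompt-runs per scan, before empty responses are excluded."""
    return len(basket) * RUNS_PER_ENGINE * ENGINE_COUNT
```

Requiring an evidence pointer on every prompt is one way to make the "no query from intuition" rule checkable rather than aspirational.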
03
Each prompt runs three times per engine to control for response variance. Every response is parsed for property names regardless of whether the property appears on any pre-existing list; the dataset is emergent. Naming variations (“The Calile,” “Calile Hotel Brisbane,” “Calile”) resolve to canonical entities via exact normalisation first, then AI disambiguation, which accepts a match only at a confidence of 0.85 or higher. This entity-matching layer is operationally the most important step; without it, leaderboard rankings are fictional.
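The two-step resolution can be sketched as follows, assuming the AI disambiguation step is exposed as a callable that returns a candidate and a confidence. The `Disambiguator` interface is hypothetical; a real implementation would call a model. The normalisation rules (accents, case, punctuation and a leading "the") are illustrative choices. Under them, "The Calile" and "Calile" collapse to the same key, while "Calile Hotel Brisbane" falls through to the disambiguator and is accepted only at 0.85 confidence or above.

```python
import re
import unicodedata
from typing import Callable, Optional

CONFIDENCE_THRESHOLD = 0.85  # from the methodology

# Hypothetical interface for the AI disambiguation step: given a raw mention and the
# canonical entity list, return (best candidate or None, confidence in [0, 1]).
Disambiguator = Callable[[str, list[str]], tuple[Optional[str], float]]


def normalise(name: str) -> str:
    """Exact-match key: strip accents, case, punctuation and a leading 'the'."""
    decomposed = unicodedata.normalize("NFKD", name)
    s = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()
    s = re.sub(r"[^\w\s]", " ", s)      # punctuation and curly quotes become spaces
    s = re.sub(r"\s+", " ", s).strip()
    return s[4:] if s.startswith("the ") else s


def resolve(mention: str, canonical: list[str],
            disambiguate: Disambiguator) -> Optional[str]:
    """Map a raw mention to a canonical entity, or None if it cannot be resolved."""
    index = {normalise(c): c for c in canonical}
    key = normalise(mention)
    if key in index:                    # step 1: exact normalised match
        return index[key]
    candidate, confidence = disambiguate(mention, canonical)  # step 2: model call
    if candidate in canonical and confidence >= CONFIDENCE_THRESHOLD:
        return candidate
    return None                         # below threshold: leave the mention unresolved
```

Returning `None` below the threshold, rather than the best guess, keeps an uncertain match from crediting the wrong property on the leaderboard.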
04
The visibility score is the share of scorable prompt-runs in which the property was named. Two rules govern “scorable,” and both are non-negotiable. A non-empty answer that omits the property is a real miss and counts in the denominator. A structurally empty response (the engine declined to surface an AI answer for the query, which is common for Google AI Overview on hotel queries) produces no data point and is excluded from both numerator and denominator. “No answer” is not the same as “the engine answered and omitted you,” and conflating the two corrupts the score in either direction: counting non-answers as misses deflates it, while discarding real misses inflates it. A small share of opportunity-flagged queries (adjacent demand the property is unlikely to win) are reported separately as “adjacent demand we tested” and never depress the headline.
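A minimal sketch of the scoring rules, assuming each run records the raw answer, the canonical entities resolved from it, and whether its query is opportunity-flagged (the `Run` type is hypothetical). Empty answers are filtered out before the denominator is formed; answers that omit the property stay in it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Run:
    engine: str
    answer: Optional[str]        # None or blank = the engine returned no AI answer
    named: frozenset[str]        # canonical entities resolved from the answer
    opportunity: bool = False    # opportunity-flagged ("adjacent demand") query


def is_scorable(run: Run) -> bool:
    """A structurally empty response yields no data point."""
    return run.answer is not None and run.answer.strip() != ""


def visibility(runs: list[Run], entity: str, adjacent: bool = False) -> Optional[float]:
    """Share (0-100) of scorable runs naming `entity`.

    adjacent=False gives the headline score; adjacent=True gives the separately
    reported "adjacent demand we tested" figure. Returns None if nothing is scorable.
    """
    scorable = [r for r in runs if is_scorable(r) and r.opportunity == adjacent]
    if not scorable:
        return None
    hits = sum(1 for r in scorable if entity in r.named)  # omissions stay in the denominator
    return 100 * hits / len(scorable)
```

For example, with three answered runs that name the property twice and one empty Google response, the headline is 2/3 (about 66.7). Scoring the empty response as a miss would give 50.0, and discarding the real miss would give 100.0: the two opposite distortions the rules guard against.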
05
For each segment AXscore measures, the methodology is periodically checked against a known leader, a known mid-tier property, and a known weak property within that segment. Healthy basket: scores separate cleanly. Unhealthy basket: scores cluster, or rankings invert. When the calibration check flags a problem, AXscore investigates the basket realism, then mention extraction, then entity resolution — in that order. AXscore never hand-tunes queries to force scores to separate. The basket must reflect real guest queries; engineered difficulty would destroy the comparability that makes the dataset valuable.
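The calibration check itself reduces to an ordering test plus a minimum gap. In this sketch the 10-point `min_gap` is an assumed value (the methodology does not publish one), and `calibration_check` is a hypothetical name. The function only reports a status and the investigation order; it never alters the basket, consistent with the no-hand-tuning rule.

```python
# Investigation order stated in the methodology.
INVESTIGATION_ORDER = ("basket realism", "mention extraction", "entity resolution")


def calibration_check(leader: float, mid_tier: float, weak: float,
                      min_gap: float = 10.0) -> tuple[str, tuple[str, ...]]:
    """Classify a segment's calibration and say what to investigate, in order.

    Scores are visibility scores (0-100) for a known leader, a known mid-tier
    property and a known weak property in the same segment. min_gap is assumed.
    """
    if not leader > mid_tier > weak:
        status = "inverted"     # rankings out of the expected order
    elif min(leader - mid_tier, mid_tier - weak) < min_gap:
        status = "clustered"    # correct order, but too little separation
    else:
        return "healthy", ()
    return status, INVESTIGATION_ORDER
```

For instance, scores of 72, 40 and 8 pass as healthy, while 45, 42 and 39 come back as clustered with basket realism as the first thing to examine.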
The published taxonomy
Segments are constructed from a published, deliberately coarse taxonomy. A property does not get to invent its own segment; if a property's positioning doesn't map cleanly to a taxonomy segment, it is classified to the nearest evidenced match and the mismatch is noted in the report.
Region is the third axis — open-vocabulary, drawn from the property's actual market. Each experience tag must be independently evidenced from the property's surface or from how AI engines already describe it; tags without evidence are dropped.
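Dropping unevidenced experience tags amounts to a filter over the two evidence sources. The sketch treats a whole-word, case-insensitive match as "evidence", a crude stand-in assumed purely for illustration; `filter_tags` and its inputs are hypothetical, and a production check would likely need more than string matching.

```python
import re


def filter_tags(candidates: list[str], property_surface: str,
                ai_descriptions: list[str]) -> tuple[list[str], list[str]]:
    """Keep tags evidenced on the property's surface or in AI descriptions; drop the rest."""
    corpus = [property_surface, *ai_descriptions]
    kept: list[str] = []
    dropped: list[str] = []
    for tag in candidates:
        # Whole-word match, so "spa" is not evidenced by "space".
        pattern = re.compile(rf"\b{re.escape(tag)}\b", re.IGNORECASE)
        if any(pattern.search(text) for text in corpus):
            kept.append(tag)
        else:
            dropped.append(tag)
    return kept, dropped
```

Returning the dropped tags alongside the kept ones lets a report say which claimed experiences lacked evidence, in the same spirit as recording classification mismatches.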
The four engines
Each engine has different training data, retrieval behaviour, and citation conventions. Measuring all four prevents any one engine's biases from dominating the headline number.
ChatGPT (OpenAI). The most widely used consumer AI assistant. Responses vary from run to run and across model versions; AXscore runs each prompt three times to dampen run-to-run variance.
Strong on reasoning and source attribution. Used both as a target engine and as the primary model behind AXscore's entity matching and segment classification layers.
A search-augmented assistant that surfaces citations alongside answers. Useful for distinguishing AI memory from real-time citation behaviour.
The AI summary shown above Google search results. Accessed via SerpAPI for reliable programmatic capture of the same content guests see. Google frequently declines to generate an AI Overview for conversational hotel queries; those non-answers produce no data points (per Stage 04), so Google contributes fewer scorable runs than the other engines on hotel queries. That is correct: a non-answer carries no signal about visibility.
Two scores, side by side
Every scan produces both scores. Reports treat them as connected sections — the visibility score is the what, the technical foundation is the why.
Headline
The share of scorable prompt-runs in which the property was named, weighted across the engines that produced answers. Out of 100. The number every customer cares about.
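The methodology does not spell out how per-engine results are weighted. Two readings are plausible, and this sketch computes both from hypothetical per-engine (hits, scorable runs) counts; the function names are illustrative and neither formula is confirmed as AXscore's.

```python
from typing import Optional

# Per-engine (hits, scorable_runs) counts; all figures used with this are hypothetical.
Counts = dict[str, tuple[int, int]]


def pooled_score(per_engine: Counts) -> Optional[float]:
    """Reading 1: one share over all scorable runs (engines weighted by scorable runs)."""
    hits = sum(h for h, _ in per_engine.values())
    runs = sum(n for _, n in per_engine.values())
    return None if runs == 0 else 100 * hits / runs


def engine_mean_score(per_engine: Counts) -> Optional[float]:
    """Reading 2: equal weight per engine that produced at least one scorable answer."""
    shares = [h / n for h, n in per_engine.values() if n > 0]
    return None if not shares else 100 * sum(shares) / len(shares)
```

With hypothetical counts of 30/45, 27/45 and 36/45 for three engines and 3/9 for Google AI Overview, pooling gives about 66.7 while the per-engine mean gives 60.0. Pooling lets an engine with few scorable runs count for less; the per-engine mean gives its handful of answers a full quarter of the weight.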
Supporting
Six categories of technical signals: discoverability, structured data, content readability, booking interface accessibility, external presence, and agent infrastructure. Explains why the visibility score is what it is — and what to fix if it is lower than the property warrants.
One methodology, uniformly applied
Versioning
AXscore methodology v2.0, last updated 27 May 2026.
Earlier scans may have run under previous methodology versions (notably the v1 four-category basket prior to the funnel-band model). Where a report references an earlier version, that version's rules apply to that scan. Cross-version score comparisons are surfaced with a methodology note in the report.