20 ft · Builds · Visual essay · 11 min read

The Agent Panel.

Three hundred agents were built to choose a Chicago medspa the way a person does while scrolling on her phone: real Google reviews, star ratings, distance from home, treatment prices where published, and the uneasy silence where prices were missing.

Pretend, for a minute

This is the experiment: build a fictional medspa brand with four possible Chicago locations, drop those locations into the real competitive market, then ask 300 independent agents to choose where they would book. Each agent receives an agent context shelf: the eight nearby options and the same facts a human shopper would consider on her phone: Google reviews, star ratings, distance from home, visible prices, and recent review language. The useful part is not that an agent picked a winner. The useful part is seeing why every other option lost.

The mental model
1. 300 agents are built.Each one has its own persona, neighborhood, need, and agent context shelf: eight nearby clinics plus the facts visible to a shopper.
2. Distance decides who you see.If a clinic is too far, it never even shows up on the shortlist.
3. Pricing decides who you trust.A clinic that does not show its price loses a credibility tax before the shopper has even read a review.
4. The losses are the lesson.Each agent had to say why every option it didn't pick lost. Those reasons tell you what to fix.
Part 0 · The thirty-second version

If you have thirty seconds.

A made-up medspa company with four Chicago locations was thrown into the real local market. Then 300 agents were built. Each got a neighborhood, treatment need, budget, decision style, and an agent context shelf: eight nearby clinics with the same information a human would see while searching on a phone. Some clinics were real. Four were made-up. The agents were not told which was which.

Ukrainian Village won 33.7% of the time it appeared. Lincoln Park won 11.6%. South Loop 7.8%. West Loop 3.1%. Across the whole field, options with no visible pricing won 1.9% of their appearances. Options with structured core pricing won 24.0%. The sharpest split in a 52-clinic run was the price tag.

This is a synthetic instrument, not a census. That is the point: before buying a site or rewriting a brand, you can now pressure-test the agent context shelf, the actual set of alternatives, proof, distance, reviews, and prices a shopper compares, in an afternoon.

Visual 0 · The architecture

One orchestrator. Three hundred agents. One results file.

"Three hundred agents" means three hundred separate model calls, each with its own persona, neighborhood, and context shelf of nearby clinics.

Step 1 · Orchestrator
300 Agents are Built.
An Orchestrator Agent on the author's laptop generates 300 personas with attributes drawn from real Chicago neighborhood, treatment, and demographic mixes. Then it looks up real Chicago competitor clinics from Google Places, scrapes their published prices, and assembles each persona's 8-clinic context shelf.
Inputs: 300 personas, 52 facilities, 8-clinic context shelves
Step 2 · 300 parallel agents (each one a separate call to GPT-5.4-mini)
Sample agent · Agent 47 context shelf
NeighborhoodUkrainian Village TreatmentBotox / Dysport BudgetBalanced StyleConvenience-first Context shelf8 nearby clinics Pricing visibleWhere posted
Step 3 · Results
Each agent writes a booking choice.
Every agent returns the same structured response: one clinic booked, two backups, seven rejection reasons (one for each option not chosen), a 5-dimensional tradeoff score, and a 1-sentence verbatim quote. The Orchestrator Agent stitches the 300 responses into one results file.
Outputs: 300 bookings, 2,100 rejections, 1 results file
Orchestrator Agent → sends each persona + context shelf to a fresh API call → GPT-5.4-mini via OpenRouter → agent returns its written booking decision → combined results file
Orchestrator AgentGPT-5.4-mini via OpenRouter → combined results file
The point

Each agent has a private context window. It does not know about the other 299. It does not know which clinics belong to the made-up brand. It picks one place to book and explains why the other seven lost. Three hundred independent decisions, all in an afternoon.

Visual 0a · What each agent did

Build a shopper, build a context shelf, let her pick blind.

The architecture above shows the wiring. The panels below show one agent's context shelf: who she is, what she sees, and what she chooses.

Step 1
Build one Chicago shopper.
Lincoln Park Wants Botox First-time visit Premium-comfortable Reads reviews carefully 1 OF 300  ·  ALL DIFFERENT
Step 2
Show her eight nearby clinics, with phone-scroll proof.
A B C D E F G 0.4 mi · 4.9★ · $13/u 0.6 mi · 4.8★ · no price 0.5 mi · 4.7★ · $14/u 0.7 mi · 4.6★ · $15/u 0.8 mi · 4.7★ · no price 0.9 mi · 4.5★ · $15/u 1.1 mi · 5.0★ · $17/u rows C and F: the test brand, unlabeled 8 OPTIONS · REVIEWS · PRICES
Step 3
She picks one, in writing, with reasons.
BOOK Option C "close, calm, polished" A · "feels rushed" B · "weaker rating" D · "wrong vibe" E · "too far" F · "weaker trust" G · "not for me" H · "shorter reviews" REJECT REJECT REJECT REJECT REJECT REJECT REJECT 1 PICK  ·  7 REJECTIONS
The point

Repeat it 300 times and you get 300 booking choices, 2,100 written rejection reasons, and a 5-dimensional tradeoff score for every booking. That's the study. The rest is reading what came back.

Visual 0b · A single row, blown up

What an agent actually sees inside its context shelf.

The context shelf is the agent's search-results page: eight nearby clinics, each assembled from Google Places data: name, address, distance from home, star rating, review count, real review snippets, and visible prices where the operator publishes them. Below is one row from Agent 47's shelf.

SpaDerma · 1636 W Division St
★ 4.9  ·  1,247 reviews
Distance0.5 mi
Botox / Dysport$13 / unit
Consult feeFree
Five Google review signals visible to Agent 47
Review · 2 weeks ago"Best Botox I've ever had. Natural and subtle, exactly what I asked for. Pricing was clear up front, no surprise consult fee."
Review · 1 month ago"They take their time explaining. Felt safe as a first-timer. Less pushy than other places that tried to upsell me filler."
Review · 2 months ago"I compared a few places nearby and came back here because the reviews felt more specific about injectors, not just the front desk."
Review · 3 months ago"Honestly the most consistent injector I've found in Chicago. Posted prices, no haggling, real results."
Review · 4 months ago"Easy to book, clear about units, and the nurse talked me through what would look natural for my face instead of pushing more."
8 nearby clinics in this context shelf
Real Google data
Live pricing where posted
No ownership labels
Why this matters

The agent reads the same kind of context shelf an online shopper sees: reviews, stars, distance from home, and prices where prices exist. The thesis depends on that density: a 4.9-star clinic with 1,247 reviews, repeated natural-results language, and clear unit pricing is not the same object as a 4.9-star clinic with vague praise and hidden prices. When a clinic hides the price, the gap stays visible. The agents react to the silence.

Visual 1 · Who's in the room

Three hundred personas, a plausible composite of the Chicago medspa market.

The mix below is a working composite: neighborhood, treatment, budget, decision style, and category history. It gives the shelf enough shape to test where trust forms and where it breaks.

What she came in for
Botox or Dysport
30%
Skin rejuvenation
18%
Filler consult
18%
Laser hair removal
14%
Pre-event refresh
12%
Facial or peel
8%
Where she lives
Lincoln Park
19%
West Loop / Fulton Market
18%
River North / Gold Coast
17%
Ukrainian Village
17%
South Loop
15%
Wicker Park / Bucktown
14%
How she decides
Review-driven
32%
Trust-and-safety-first
28%
Convenience-first
20%
Aesthetic-brand-led
12%
Price-comparison
8%
History with the category
Currently uses a competitor
34%
Regular aesthetics client
28%
First-time medspa shopper
22%
Lapsed client
16%
30%Botox + Dysport shoppers
60%Review or trust-driven
22%First-time medspa shoppers
34%Currently use a competitor
30%Premium-comfortable on price
Visual 2 · The result

Same brand, four locations, one clear wedge.

How often each of the four made-up clinics was picked when it appeared on a shopper's eight-clinic shelf.

33.7%
Ukrainian Village · the neighborhood spot
Warm, local, $14/unit Botox, posted prices. The only location of the four with a real conversion story.
11.6%
Lincoln Park · the polished flagship
High consideration, low conversion. Shortlisted 61 out of 69 times it was shown. Booked 8.
3.1%
West Loop · the trendy urban brand
Highest exposure of any location, but rarely picked. 96 agents saw it, 3 booked it.
Part 1 · The instrument

An AI agent panel is a focus group that fits on a laptop.

Twelve real people in a conference room used to be one of the only ways to watch a customer reason through a choice. Three weeks, fifteen-to-thirty thousand dollars, one room. A language model now turns the first pass into software: 300 agents, one afternoon, one structured decision file.

This study used GPT-5.4-mini. Three hundred independent calls, fired through an Orchestrator Agent. Each agent had its own persona and context shelf. None knew which clinics belonged to the made-up brand. Each picked one and explained why.

Traditional market research asks people what they think. An agent panel forces a choice, then reads the story each shopper tells herself about why everyone else lost.

Part 2 · How the test stays honest

Three design choices do most of the work.

A study like this only matters if the shoppers are plausible and the test is fair. Three design choices carry most of the weight.

The made-up brand was blind. No "test brand" tag in the prompt. The four locations appeared with the same kind of name, rating, and review snippets as everyone else. Shoppers picked or rejected them the way they picked or rejected any other clinic.

Every rejection was written down. Eight clinics shown, one chosen, seven rejected. Across 300 agents, that is 2,100 sentences explaining what tipped the choice elsewhere. The losses turn a yes-or-no into an operating punch list.

Prices were real, or visibly missing. Pricing was scraped from competitors' actual websites, only where an official price page existed. Of 48 real Chicago competitors in the run, 30 published some pricing evidence; 15 showed structured core pricing. Missing prices were left missing. No averages were imputed. The shopper saw what a real online shopper would see: a price tag, or a noticeable silence where one should have been.

Visual 2a · The full pipeline

From raw data to a results file, in one picture.

This zooms out from the agent layer and shows where the data came from.

EXTERNAL DATA GOOGLE PLACES API 48 real Chicago medspas ratings, reviews, distances PRICING SCRAPER 30 / 48 had any price 15 had structured core pricing MADE-UP COMPANY 4 locations, real prices Lincoln Park, UV, WL, SL PERSONA BUILDER 300 imaginary shoppers neighborhood, need, style FACILITY CATALOG 52 facilities deduped, distance-tagged ORCHESTRATOR AGENT Builds 300 context shelves. Picks nearest 8 per persona, by distance. Strips test-brand labels from prompts. runs locally · author's laptop 300 calls GPT-5.4-MINI VIA OPENROUTER . . . 285 more . . . 300 separate context windows COMBINED RESULTS FILE 300 bookings · 2,100 rejections 1,500 tradeoff scores · verbatim quotes
External dataGoogle Places, pricing pages, four made-up clinics, and 300 persona seeds.48 real competitors, 30 with some pricing evidence, 15 with structured core pricing.
feeds
Facility catalog52 facilities are deduped, distance-tagged, and prepared for each shopper's shelf.Real options and made-up locations sit side by side with no ownership labels.
then
Orchestrator AgentBuilds 300 context shelves and picks the nearest eight clinics for each persona.It strips test-brand labels before the shopper sees the prompt.
runs
300 GPT-5.4-mini callsEach agent chooses one place to book and writes why the other seven lost.Same prompt template, different persona and context shelf.
returns
Combined results file300 bookings, 2,100 rejections, 1,500 tradeoff scores, and verbatim quotes.Everything in the essay reads off this file.
← swipe to see the full diagram →
InputsGoogle Places, a custom pricing scraper, and a persona generator. All real except the persona generator and the 4 made-up clinics.
Orchestrator AgentThe author's local agent system. Assembles facility catalogs, builds personas, manages the 300 LLM calls, and merges responses.
GPT-5.4-miniEach of the 300 calls runs in its own context window. Same prompt template, different persona and context shelf.
ResultsOne combined file with 300 booking choices, 2,100 written rejections, 5-dimensional tradeoff scores, and verbatim quotes. Everything in this essay reads off it.
Visual 3 · 300 agents, 300 decisions

Every dot is one agent, and one written booking decision.

Brass dots picked one of the four made-up clinics. Bone dots picked a real competitor. None of them ever knew which was which.

All Made-up brand wins Competitor wins
44 made-up brand wins · 256 competitor wins · 300 total agents
Of the four made-up clinics, one became a real neighborhood business in the agents' eyes. The other three got seen, sometimes shortlisted, and almost never booked. One was even the closest option a shopper had.
Part 3 · Four dots, four results

One operator, one city, one location that pulled away.

On a regular Chicago map, the four clinics would look like a sensible portfolio. Same brand, four neighborhoods, decent spread.

In the agents' choices, the map stopped being equal. Ukrainian Village converted roughly a third of the agents who saw it. Lincoln Park got shortlisted constantly and still barely converted. South Loop and West Loop mostly became noise.

Visual 4 · Trust density by location

One neighborhood pulled away. Three got seen, sometimes shortlisted, almost never booked.

Click any neighborhood below to see the breakdown, or press play to walk through all four.

Ukrainian Village Lincoln Park South Loop West Loop
34%Ukrainian Village 12%Lincoln Park 3%West Loop 8%South Loop LAKE
Visual 5 · The booking funnel by location

Lincoln Park got shortlisted almost as often as Ukrainian Village. It just didn't convert.

For each made-up clinic, the bar shows who saw it, who shortlisted it, and who actually booked it. Ukrainian Village converted 33.7% of shoppers who saw it. Lincoln Park converted 11.6%, even though most shoppers considered it.

Ukrainian Village
29 booked
39 shortlisted, didn't book
18 seen-only
34%
Lincoln Park
8
53 shortlisted, didn't book
8
12%
South Loop
4
25 shortlisted
22 seen-only
8%
West Loop
3
33 shortlisted
60 seen but never shortlisted
3%
Booked Shortlisted, didn't book Seen, never shortlisted

Lincoln Park is the failure mode worth sitting with. 61 of 69 agents who saw it shortlisted it. 4.9 stars. 342 reviews. Premium positioning. Polished language. Real posted prices. By every coverage-first underwriting test, this looked like the strongest site in the portfolio.

It converted 8 bookings. The other 53 shortlisted it and walked. The rejection language repeats across dozens of agents: "The premium, polished vibe is appealing, but $16/unit plus a $100 consultation makes it pricier than the best-value trust option."

The concept was not the problem. Premium-priced and trust-equivalent loses to trust-equivalent-and-cheaper. That distinction never shows up on a map.

Part 4 · The pricing layer

Clinics that did not show their prices won 1.9% of the time they appeared.

The sharpest split in the run came from something most operators still treat like a back-office detail: whether the clinic publishes prices on its own website.

Visual 6 · The pricing transparency split

Posted prices won. Hidden prices lost.

Win rate per option exposure, sorted by the kind of pricing evidence the clinic published on its own website. Missing prices were left visibly missing.

Structured core pricing (clear Botox / filler / consult prices)
24.0%
Any pricing or promo evidence published
17.3%
No published pricing found
1.9%
No website at all
0.0%
The point

Of 300 winning bookings, 286 went to clinics with some published pricing evidence. The cleaner split was structured core pricing: 24.0% win rate when the price was legible, 4.2% when it was not. Hidden prices looked like a credibility tax.

Hundreds of rejections used some version of "no published pricing found" or "harder to compare on budget." That language came from premium-comfortable, balanced, and budget-aware agents alike. Posted prices were not just a price signal. They were a trust signal.

This also explains Lincoln Park. Pricing was published, that part was working. The price itself was the problem: $16/unit plus a $100 consultation, on a shelf that included other 4.9-star clinics at $13/$14/unit. Transparent and premium loses to transparent and reasonable.

Part 5 · The controls

A thermometer that reads seventy in every room isn't a thermometer.

A panel of imaginary shoppers is only useful if the answer moves when the inputs move.

Change the treatment. 26% of filler-consult decisions. 20% of Botox decisions. 0% of laser hair removal. The made-up brand was positioned around injectables, and the agents picked up on that.

Change the shopper. "Premium-comfortable" agents picked the brand 23% of the time. "Balanced" and "budget-aware" agents picked it 11%. Same brand, same shelf, different shopper, different answer.

Look at who beat the brand. The winners were not random no-name clinics. They were recognizable local incumbents with stronger same-shelf trust, price, or service fit.

And the confidence check: agents reported 4.05/5 confidence on made-up-brand wins, 4.38 on competitor wins. Competitors won with more conviction, the right direction for an honest panel when one side has stronger same-shelf alternatives.

Three checks. Three ways the method could have looked broken. None did.

Visual 7 · Segment signal

Injectables pulled the brand up. Laser pulled it down.

Each card is a slice of the panel where the made-up brand had a different conversion rate. Click any card, or press play to walk through all six.

Pick a segment. The market shape starts to show.
Part 6 · Reading the losses

The most useful part of this study is not why people booked. It is why they didn't.

When a real customer books somewhere else, the operator learns nothing. She doesn't knock on the door to explain. The persona run produces the one thing a real customer almost never provides: a written reason another option won.

Two thousand one hundred reasons across the run. The most-cited reason agents picked someone else: price, value, or pricing transparency. Then brand and vibe. Then need fit. Then reviews. Price was not an afterthought. It was the first wound.

Two clinics with the same star rating and review tone, one with a posted $14/unit Botox price and one with no price published, did not read as the same business. They read as the credible one and the silent one.

Different losses imply different fixes. Distance loss: open closer. Review loss: build proof around the specific treatment. Vibe loss: rewrite the offer against the local set. Pricing loss: post the price, or accept that the shopper has already started to leave.

Visual 8 · Why shoppers picked someone else

The losses now lead with price and pricing transparency.

Coded reasons from the written rejections of the four made-up clinics. A single rejection sentence can mention more than one theme.

Visual 9 · What the shoppers actually said

A win and a loss, in the shoppers' own words.

Press play to watch the contrast once: first the review-and-price stack that won, then the premium offer that lost to a better-value shelf neighbor.

Win Loss
A Ukrainian Village win · busy parent, Botox

"I'd book Cool Medspa. It is basically in my neighborhood, the Botox price is clearly posted, and the reviews match exactly what I want: calm, natural-looking work without feeling upsold."

The winning location paired neighborhood convenience with a real, posted price and consultative review language. The combination is the wedge. A price tag is not enough on its own, and a vibe is not enough on its own.
A Lincoln Park loss · a pre-event refresh shopper

"The premium, polished vibe is appealing, but the $16/unit pricing plus consultation fee makes it pricier than the best-value trust option, and I don't need the most upscale choice."

The flagship had its prices posted, its reviews high, its positioning on point. It still lost, because the shopper could see, on the same shelf, other 4.9-star clinics charging less. Premium-priced and trust-equivalent loses to trust-equivalent-and-cheaper.
Part 7 · Past one medspa

Coverage has been the easy story.

A chain of local service businesses tends to be underwritten the same way. Look at a map. Draw a radius. Count the households. The inputs are visible. The customer is not.

The same setup works for dental, vet, urgent care, fitness, pet services, optical, home services, tutoring. Anywhere the customer is trying to avoid getting it wrong. Distance gets a place onto the shortlist. Reviews and language get it considered. Pricing transparency, in this run, helped decide who got booked.

For the operator, the question shifts from "where can a new location open?" to "where can the brand become the obvious safe choice, at a visible price?" Those produce different lists.

Part 8 · The validated persona business

The next AI business turns synthetic panels into underwritable markets.

The 300 agents are already useful. The next leap is calibration. The company that ties synthetic choices to actual bookings, transactions, and repeat behavior turns a sharp instrument into an underwritable one. That gap is the business.

Synthetic respondent panels already exist: fire a thousand agents at a question for a few hundred dollars instead of fielding a $50,000 survey. The bottleneck is trust. The buyer needs to know which panel reflects which real market, and how tightly.

The business that solves the gap does one hard thing:

It ties synthetic panels to real behavior: bookings, transactions, loyalty data, CRM outcomes. Every run gets graded against what customers actually did. The first customers are operators with enough first-party data to check the instrument. The validation becomes the moat.

The agent panel is the front end. Verified behavior is the substrate. Whoever owns the largest catalog of real consumer choice owns the category.

Method · the same thing in four steps

The whole study, on one page.

01
Build the shoppers300 imaginary Chicago shoppers with a real home neighborhood, what they want, how they shop, and their history with medspas.
02
Build the shelfEach agent sees 8 nearby clinics: real ones from Google plus four made-up ones, mixed together with no labels. Real published prices shown where the operator posted them. Missing prices left visibly missing.
03
Force a choicePick one to book, name two backups, explain why each of the other seven lost. In writing. No multiple choice.
04
Read the lossesEvery rejected made-up clinic came back with a reason. The reasons get sorted into themes that point at the operating fixes.
Operating notes

What this can do, and where it gets dangerous.

Directional instrument

The agents are AI models simulating shoppers. A single run produces a ranked set of places to look, not a court verdict. For an operator, that is already a different starting point.

Why the three controls matter

If the method were pure mush, every room would read seventy degrees. It did not. Different treatments produced different answers. Different shoppers produced different answers. Competitor wins also carried higher confidence than made-up-brand wins, 4.38 versus 4.05. That is not full market validation. It is signal with teeth.

Data provenance

The competitor clinics, ratings, review counts, and review snippets came from Google's official Places API. Google caps snippets, so the review set was real and recent, though not guaranteed to be Google's strictly-newest ordering. The four made-up clinics had synthetic review profiles, stripped of any "test brand" provenance before the agents saw them.

Positioning was varied

The four made-up clinics were deliberately given different positioning: a polished flagship, a warm neighborhood spot, a trendy urban brand, and a budget challenger. The result is not just a neighborhood test. It is a positioning test. A panel like this shows which concepts survive contact with a shopper before any of them get built.

Calibration path

The 300 agents were assembled from plausible distributions of Chicago neighborhood, treatment, and demographic mix. The next version ties those personas to spending data, transaction histories, or longitudinal panels. That is how the instrument becomes underwritable.

Source run: 300 GPT-5.4-mini agents · 44 made-up brand wins · 256 competitor wins · 8 options per agent context shelf · 52 facilities (48 real Chicago competitors plus 4 made-up locations) · 30 competitors with some pricing evidence · 15 with structured core pricing · 7 written rejection reasons per agent · 2,100 rejections total.

Bottom line

Not a survey. An instrument.

The result is not that the made-up brand won 14.7% of bookings. The result is that four positioning ideas, shown blind against the real shelf, produced four different answers. And the sharpest observed split came from something almost no operator treats like brand: whether the clinic published a price on its own website.

A persona panel does not replace the customer. It gives the operator a new first screen: where trust formed, where it broke, and which shelf to investigate next.