The Agent Panel.
Three hundred agents were built to choose a Chicago medspa the way a person does while scrolling on her phone: real Google reviews, star ratings, distance from home, treatment prices where published, and the uneasy silence where prices were missing.
This is the experiment: build a fictional medspa brand with four possible Chicago locations, drop those locations into the real competitive market, then ask 300 independent agents to choose where they would book. Each agent receives an agent context shelf, made up of the eight nearby options and the same facts a human shopper would consider on her phone: Google reviews, star ratings, distance from home, visible prices, and recent review language. The useful part is not that an agent picked a winner. The useful part is seeing why every other option lost.
If you have thirty seconds.
A made-up medspa company with four Chicago locations was thrown into the real local market. Then 300 agents were built. Each got a neighborhood, treatment need, budget, decision style, and an agent context shelf: eight nearby clinics with the same information a human would see while searching on a phone. Some clinics were real. Four were made up. The agents were not told which was which.
Ukrainian Village won 33.7% of the time it appeared. Lincoln Park won 11.6%. South Loop 7.8%. West Loop 3.1%. Across the whole field, options with no visible pricing won 1.9% of their appearances. Options with structured core pricing won 24.0%. The sharpest split in a 52-clinic run was the price tag.
This is a synthetic instrument, not a census. That is the point: before buying a site or rewriting a brand, you can now pressure-test the agent context shelf (the actual set of alternatives, proof, distance, reviews, and prices a shopper compares) in an afternoon.
One orchestrator. Three hundred agents. One results file.
"Three hundred agents" means three hundred separate model calls, each with its own persona, neighborhood, and context shelf of nearby clinics.
Each agent has a private context window. It does not know about the other 299. It does not know which clinics belong to the made-up brand. It picks one place to book and explains why the other seven lost. Three hundred independent decisions, all in an afternoon.
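The fan-out described above can be sketched in a few lines. This is a minimal illustration, not the study's orchestrator: `run_agent`, `orchestrate`, and `Decision` are hypothetical names, the clinic names are placeholders, and the model call is replaced by a stub that picks at random so the sketch runs offline.

```python
import asyncio
import random
from dataclasses import dataclass


@dataclass
class Decision:
    agent_id: int
    chosen: str        # the one clinic the agent books
    rejections: dict   # clinic name -> written reason it lost


async def run_agent(agent_id: int, shelf: list[str]) -> Decision:
    """One isolated call: the agent sees only its own shelf.

    Hypothetical stand-in for a model call. A real run would send the
    persona and shelf to the model here; this stub picks at random.
    """
    rng = random.Random(agent_id)  # deterministic per agent
    chosen = rng.choice(shelf)
    rejections = {c: "stub reason" for c in shelf if c != chosen}
    return Decision(agent_id, chosen, rejections)


async def orchestrate(shelves: list[list[str]]) -> list[Decision]:
    # No shared state between tasks: each agent gets a private context.
    tasks = [run_agent(i, shelf) for i, shelf in enumerate(shelves)]
    return await asyncio.gather(*tasks)


# 300 agents, each with an eight-clinic shelf (placeholder names).
shelves = [[f"clinic_{(i + k) % 52}" for k in range(8)] for i in range(300)]
decisions = asyncio.run(orchestrate(shelves))
total_rejections = sum(len(d.rejections) for d in decisions)
print(len(decisions), total_rejections)  # 300 decisions, 2100 rejections
```

The point of the structure is isolation: nothing an agent returns is visible to any other agent, so the 300 decisions stay independent.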
Build a shopper, build a context shelf, let her pick blind.
The architecture above shows the wiring. The panels below show one agent's context shelf: who she is, what she sees, and what she chooses.
Repeat it 300 times and you get 300 booking choices, 2,100 written rejection reasons, and a 5-dimensional tradeoff score for every booking. That's the study. The rest is reading what came back.
What an agent actually sees inside its context shelf.
The context shelf is the agent's search-results page. It holds eight nearby clinics, each assembled from Google Places data: name, address, distance from home, star rating, review count, real review snippets, and visible prices where the operator publishes them. Below is one row from Agent 47's shelf.
The agent reads the same kind of context shelf an online shopper sees: reviews, stars, distance from home, and prices where prices exist. The thesis depends on that density: a 4.9-star clinic with 1,247 reviews, repeated natural-results language, and clear unit pricing is not the same object as a 4.9-star clinic with vague praise and hidden prices. When a clinic hides the price, the gap stays visible. The agents react to the silence.
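One way to picture a shelf row is as a small record whose price field is allowed to be empty. The sketch below is an assumption about shape, not the study's actual schema: `ShelfEntry`, `render_entry`, the clinic name, and the distance are all hypothetical. The star rating and review count echo the example above, and the missing-price wording echoes the language the agents used.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ShelfEntry:
    name: str
    distance_miles: float
    stars: float
    review_count: int
    snippets: list[str] = field(default_factory=list)
    price_note: Optional[str] = None  # None means no price was published


def render_entry(entry: ShelfEntry) -> str:
    """Format one clinic the way an agent would read it."""
    # A missing price is stated outright, never filled with an average.
    price = entry.price_note or "no published pricing found"
    lines = [
        f"{entry.name} ({entry.distance_miles:.1f} mi)",
        f"{entry.stars:.1f} stars, {entry.review_count:,} reviews",
        f"Pricing: {price}",
    ]
    lines += [f'Review: "{s}"' for s in entry.snippets]
    return "\n".join(lines)


# Placeholder clinic with hidden prices.
silent = ShelfEntry("Example Clinic", 0.8, 4.9, 1247, ["Natural results."])
print(render_entry(silent))
```

Keeping the gap as an explicit line in the rendered text is what lets an agent react to the silence instead of never noticing it.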
Three hundred personas, a plausible composite of the Chicago medspa market.
The mix below is a working composite: neighborhood, treatment, budget, decision style, and category history. It gives the shelf enough shape to test where trust forms and where it breaks.
Same brand, four locations, one clear wedge.
How often each of the four made-up clinics was picked when it appeared on a shopper's eight-clinic shelf.
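"Picked when it appeared" is a per-exposure rate: wins divided by the number of shelves a clinic was shown on, not by 300. A minimal sketch of that calculation, using a hypothetical function name and toy three-clinic shelves rather than the study's data:

```python
from collections import Counter


def win_rate_per_exposure(decisions):
    """decisions: iterable of (shelf, chosen) pairs, one per agent."""
    appearances, wins = Counter(), Counter()
    for shelf, chosen in decisions:
        appearances.update(shelf)  # every clinic shown counts as an exposure
        wins[chosen] += 1
    # Counter returns 0 for clinics that never won.
    return {c: wins[c] / appearances[c] for c in appearances}


# Toy data: three agents, three clinics per shelf for brevity.
toy = [
    (["A", "B", "C"], "A"),
    (["A", "B", "D"], "B"),
    (["A", "C", "D"], "A"),
]
rates = win_rate_per_exposure(toy)
for clinic, rate in sorted(rates.items()):
    print(f"{clinic}: {rate:.1%}")
```

Because each clinic has its own denominator, a location that appears on few shelves can still be compared fairly with one that appears on many.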
An AI agent panel is a focus group that fits on a laptop.
Twelve real people in a conference room used to be one of the only ways to watch a customer reason through a choice. Three weeks, fifteen-to-thirty thousand dollars, one room. A language model now turns the first pass into software: 300 agents, one afternoon, one structured decision file.
This study used GPT-5.4-mini. Three hundred independent calls, fired through an Orchestrator Agent. Each agent had its own persona and context shelf. None knew which clinics belonged to the made-up brand. Each picked one and explained why.
Traditional market research asks people what they think. An agent panel forces a choice, then reads the story each shopper tells herself about why everyone else lost.
Three design choices do most of the work.
A study like this only matters if the shoppers are plausible and the test is fair. Three design choices carry most of the weight.
The made-up brand was blind. No "test brand" tag in the prompt. The four locations appeared with the same kind of name, rating, and review snippets as everyone else. Shoppers picked or rejected them the way they picked or rejected any other clinic.
Every rejection was written down. Eight clinics shown, one chosen, seven rejected. Across 300 agents, that is 2,100 sentences explaining what tipped the choice elsewhere. The losses turn a yes-or-no into an operating punch list.
Prices were real, or visibly missing. Pricing was scraped from competitors' actual websites, only where an official price page existed. Of 48 real Chicago competitors in the run, 30 published some pricing evidence; 15 showed structured core pricing. Missing prices were left missing. No averages were imputed. The shopper saw what a real online shopper would see: a price tag, or a noticeable silence where one should have been.
From raw data to a results file, in one picture.
This zooms out from the agent layer and shows where the data came from.
Every dot is one agent, and one written booking decision.
Brass dots picked one of the four made-up clinics. Bone dots picked a real competitor. None of them ever knew which was which.
Of the four made-up clinics, one became a real neighborhood business in the agents' eyes. The other three got seen, sometimes shortlisted, and almost never booked. One was even the closest option a shopper had.
One operator, one city, one location that pulled away.
On a regular Chicago map, the four clinics would look like a sensible portfolio. Same brand, four neighborhoods, decent spread.
In the agents' choices, the map stopped being equal. Ukrainian Village converted roughly a third of the agents who saw it. Lincoln Park got shortlisted constantly and still barely converted. South Loop and West Loop mostly became noise.
One neighborhood pulled away. Three got seen, sometimes shortlisted, almost never booked.
Lincoln Park got shortlisted almost as often as Ukrainian Village. It just didn't convert.
For each made-up clinic, the bar shows who saw it, who shortlisted it, and who actually booked it. Ukrainian Village converted 33.7% of shoppers who saw it. Lincoln Park converted 11.6%, even though most shoppers considered it.
Lincoln Park is the failure mode worth sitting with. 61 of 69 agents who saw it shortlisted it. 4.9 stars. 342 reviews. Premium positioning. Polished language. Real posted prices. By every coverage-first underwriting test, this looked like the strongest site in the portfolio.
It converted into 8 bookings. The other 53 agents shortlisted it and walked. The rejection language repeats across dozens of agents: "The premium, polished vibe is appealing, but $16/unit plus a $100 consultation makes it pricier than the best-value trust option."
The concept was not the problem. Premium-priced and trust-equivalent loses to trust-equivalent-and-cheaper. That distinction never shows up on a map.
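The Lincoln Park funnel reduces to three ratios. The sketch below uses only the counts reported above (69 agents saw it, 61 shortlisted it, 8 booked it); the function and key names are hypothetical.

```python
def funnel(seen: int, shortlisted: int, booked: int) -> dict:
    """Summarize a seen -> shortlisted -> booked funnel."""
    return {
        "shortlist_rate": shortlisted / seen,
        "conversion_per_exposure": booked / seen,
        "conversion_per_shortlist": booked / shortlisted,
        "shortlisted_then_walked": shortlisted - booked,
    }


lincoln_park = funnel(seen=69, shortlisted=61, booked=8)
print(f"shortlisted: {lincoln_park['shortlist_rate']:.1%}")               # 88.4%
print(f"booked per exposure: {lincoln_park['conversion_per_exposure']:.1%}")   # 11.6%
print(f"booked per shortlist: {lincoln_park['conversion_per_shortlist']:.1%}")  # 13.1%
print(f"walked after shortlisting: {lincoln_park['shortlisted_then_walked']}")  # 53
```

The gap between the first ratio and the third is the failure mode: nearly nine in ten agents considered the clinic, and roughly one in eight of those booked it.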
Clinics that did not show their prices won 1.9% of the time they appeared.
The sharpest split in the run came from something most operators still treat like a back-office detail: whether the clinic publishes prices on its own website.
Posted prices won. Hidden prices lost.
Win rate per option exposure, sorted by the kind of pricing evidence the clinic published on its own website. Missing prices were left visibly missing.
Of 300 winning bookings, 286 went to clinics with some published pricing evidence. The cleaner split was structured core pricing: 24.0% win rate when the price was legible, 4.2% when it was not. Hidden prices looked like a credibility tax.
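The arithmetic behind that split is short enough to check by hand. The sketch below uses only the figures quoted above; the variable names are illustrative.

```python
total_bookings = 300
priced_bookings = 286   # went to clinics with some pricing evidence

unpriced_bookings = total_bookings - priced_bookings
priced_share = priced_bookings / total_bookings

structured_rate = 24.0  # win rate (%) with structured core pricing
other_rate = 4.2        # win rate (%) without it
lift = structured_rate / other_rate

print(f"{unpriced_bookings} bookings went to clinics with no pricing evidence")  # 14
print(f"{priced_share:.1%} of bookings went to priced clinics")                  # 95.3%
print(f"structured pricing won about {lift:.1f}x as often per appearance")       # 5.7x
```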
Hundreds of rejections used some version of "no published pricing found" or "harder to compare on budget." That language came from premium-comfortable, balanced, and budget-aware agents alike. Posted prices were not just a price signal. They were a trust signal.
This also explains Lincoln Park. Pricing was published; that part was working. The price itself was the problem: $16/unit plus a $100 consultation, on a shelf that included other 4.9-star clinics at $13 or $14 per unit. Transparent and premium loses to transparent and reasonable.
A thermometer that reads seventy in every room isn't a thermometer.
A panel of imaginary shoppers is only useful if the answer moves when the inputs move.
Change the treatment. The made-up brand won 26% of filler-consult decisions, 20% of Botox decisions, and 0% of laser hair removal decisions. The brand was positioned around injectables, and the agents picked up on that.
Change the shopper. "Premium-comfortable" agents picked the brand 23% of the time. "Balanced" and "budget-aware" agents picked it 11% of the time. Same brand, same shelf, different shopper, different answer.
Look at who beat the brand. The winners were not random no-name clinics. They were recognizable local incumbents with stronger same-shelf trust, price, or service fit.
And the confidence check: agents reported 4.05/5 confidence on made-up-brand wins and 4.38/5 on competitor wins. Competitors won with more conviction, which is the right direction for an honest panel when one side has stronger same-shelf alternatives.
Four checks. Four ways the method could have looked broken. None did.
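The thermometer test can be made explicit: measure how far the answer moves across segments and flag any cut where it barely moves. The rates below are the ones reported above. The `spread` helper and the 5-point threshold are assumptions for illustration, not part of the study.

```python
def spread(rates: dict[str, float]) -> float:
    """Percentage-point gap between the highest and lowest segment."""
    return max(rates.values()) - min(rates.values())


# Made-up-brand pick rates (%), as reported in the text.
by_treatment = {"filler consult": 26.0, "Botox": 20.0, "laser hair removal": 0.0}
by_shopper = {"premium-comfortable": 23.0, "balanced / budget-aware": 11.0}

MIN_SPREAD = 5.0  # hypothetical threshold for "the reading moved"

for label, rates in [("treatment", by_treatment), ("shopper", by_shopper)]:
    gap = spread(rates)
    verdict = "moves" if gap >= MIN_SPREAD else "flat"
    print(f"{label}: {gap:.0f}-point spread ({verdict})")
```

A flat result on every cut would be the seventy-in-every-room case. Here the treatment cut spans 26 points and the shopper cut spans 12.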
Injectables pulled the brand up. Laser pulled it down.
Each card is a slice of the panel where the made-up brand had a different conversion rate.
The most useful part of this study is not why people booked. It is why they didn't.
When a real customer books somewhere else, the operator learns nothing. She doesn't knock on the door to explain. The persona run produces the one thing a real customer almost never provides: a written reason another option won.
Two thousand one hundred reasons across the run. The most-cited reason agents picked someone else: price, value, or pricing transparency. Then brand and vibe. Then need fit. Then reviews. Price was not an afterthought. It was the first wound.
Two clinics with the same star rating and review tone, one with a posted $14/unit Botox price and one with no price published, did not read as the same business. They read as the credible one and the silent one.
Different losses imply different fixes. Distance loss: open closer. Review loss: build proof around the specific treatment. Vibe loss: rewrite the offer against the local set. Pricing loss: post the price, or accept that the shopper has already started to leave.
The losses now lead with price and pricing transparency.
Coded reasons from the written rejections of the four made-up clinics. A single rejection sentence can mention more than one theme.
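Multi-theme coding of this kind can be sketched with simple keyword matching. The study does not say how its rejections were coded, so treat this as one possible approach: the theme names follow the categories named earlier, while the keyword lists and function names are hypothetical.

```python
from collections import Counter

# Hypothetical keyword lists, for illustration only.
THEMES = {
    "price": ("price", "pricing", "pricier", "budget", "/unit", "fee"),
    "brand_vibe": ("vibe", "polished", "upscale", "premium"),
    "need_fit": ("treatment", "specialty", "laser"),
    "reviews": ("review", "rating", "stars"),
}


def code_rejection(sentence: str) -> set[str]:
    """Return every theme whose keywords appear in the sentence."""
    text = sentence.lower()
    return {
        theme
        for theme, words in THEMES.items()
        if any(word in text for word in words)
    }


def theme_counts(sentences) -> Counter:
    counts = Counter()
    for sentence in sentences:
        counts.update(code_rejection(sentence))  # one sentence may add several themes
    return counts


sample = [
    "The premium, polished vibe is appealing, but the $16/unit pricing plus "
    "consultation fee makes it pricier than the best-value trust option, "
    "and I don't need the most upscale choice.",
    "No published pricing found.",
]
print(theme_counts(sample))
```

Because themes are counted per mention, the totals can exceed the number of rejection sentences. Substring matching is crude, and a production pass would likely need human review or a model-based coder.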
A win and a loss, in the shoppers' own words.
The contrast, in order: first the review-and-price stack that won, then the premium offer that lost to a better-value shelf neighbor.
"I'd book Cool Medspa. It is basically in my neighborhood, the Botox price is clearly posted, and the reviews match exactly what I want: calm, natural-looking work without feeling upsold."
"The premium, polished vibe is appealing, but the $16/unit pricing plus consultation fee makes it pricier than the best-value trust option, and I don't need the most upscale choice."
Coverage has been the easy story.
A chain of local service businesses tends to be underwritten the same way. Look at a map. Draw a radius. Count the households. The inputs are visible. The customer is not.
The same setup works for dental, vet, urgent care, fitness, pet services, optical, home services, tutoring. Anywhere the customer is trying to avoid getting it wrong. Distance gets a place onto the shortlist. Reviews and language get it considered. Pricing transparency, in this run, helped decide who got booked.
For the operator, the question shifts from "where can a new location open?" to "where can the brand become the obvious safe choice, at a visible price?" Those produce different lists.
The next AI business turns synthetic panels into underwritable markets.
The 300 agents are already useful. The next leap is calibration. The company that ties synthetic choices to actual bookings, transactions, and repeat behavior turns a sharp instrument into an underwritable one. That gap is the business.
Synthetic respondent panels already exist: fire a thousand agents at a question for a few hundred dollars instead of fielding a $50,000 survey. The bottleneck is trust. The buyer needs to know which panel reflects which real market, and how tightly.
The business that solves the gap does one hard thing:
It ties synthetic panels to real behavior: bookings, transactions, loyalty data, CRM outcomes. Every run gets graded against what customers actually did. The first customers are operators with enough first-party data to check the instrument. The validation becomes the moat.
The agent panel is the front end. Verified behavior is the substrate. Whoever owns the largest catalog of real consumer choice owns the category.
The whole study, on one page.
What this can do, and where it gets dangerous.
The agents are AI models simulating shoppers. A single run produces a ranked set of places to look, not a court verdict. For an operator, that is already a different starting point.
If the method were pure mush, every room would read seventy degrees. It did not. Different treatments produced different answers. Different shoppers produced different answers. Competitor wins also carried higher confidence than made-up-brand wins, 4.38 versus 4.05. That is not full market validation. It is signal with teeth.
The competitor clinics, ratings, review counts, and review snippets came from Google's official Places API. Google caps snippets, so the review set was real and recent, though not guaranteed to be Google's strictly newest ordering. The four made-up clinics had synthetic review profiles, stripped of any "test brand" provenance before the agents saw them.
The four made-up clinics were deliberately given different positioning: a polished flagship, a warm neighborhood spot, a trendy urban brand, and a budget challenger. The result is not just a neighborhood test. It is a positioning test. A panel like this shows which concepts survive contact with a shopper before any of them get built.
The 300 agents were assembled from plausible distributions of Chicago neighborhood, treatment, and demographic mix. The next version ties those personas to spending data, transaction histories, or longitudinal panels. That is how the instrument becomes underwritable.
Source run: 300 GPT-5.4-mini agents · 44 made-up brand wins · 256 competitor wins · 8 options per agent context shelf · 52 facilities (48 real Chicago competitors plus 4 made-up locations) · 30 competitors with some pricing evidence · 15 with structured core pricing · 7 written rejection reasons per agent · 2,100 rejections total.
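The source-run figures are internally consistent, which a few lines of arithmetic confirm. The dictionary keys below are illustrative labels; the numbers are taken from the line above.

```python
run = {
    "agents": 300,
    "brand_wins": 44,
    "competitor_wins": 256,
    "options_per_shelf": 8,
    "facilities": 52,
    "real_competitors": 48,
    "made_up_locations": 4,
    "rejections_total": 2100,
}

checks = {
    "wins sum to agents":
        run["brand_wins"] + run["competitor_wins"] == run["agents"],
    "facilities add up":
        run["real_competitors"] + run["made_up_locations"] == run["facilities"],
    "rejections = agents x (options - 1)":
        run["agents"] * (run["options_per_shelf"] - 1) == run["rejections_total"],
}
brand_share = run["brand_wins"] / run["agents"]

for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
print(f"made-up brand share of bookings: {brand_share:.1%}")  # 14.7%
```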
Not a survey. An instrument.
The result is not that the made-up brand won 14.7% of bookings. The result is that four positioning ideas, shown blind against the real shelf, produced four different answers. And the sharpest observed split came from something almost no operator treats like brand: whether the clinic published a price on its own website.
A persona panel does not replace the customer. It gives the operator a new first screen: where trust formed, where it broke, and which shelf to investigate next.