AI-Native Intelligence · Essay · 12 min read
The Principal and the Swarm
The Shift
AI agents make it free to build intelligence systems no one would have funded, staffed, or subscribed to before.
Everyone is focused on faster reports, cheaper analysis, and broader screening. That is real, and it is already commoditizing. The bigger shift is new intelligence: questions that used to be uneconomic are suddenly buildable.
The shift
Not just faster diligence. New diligence.
The important change is not speed alone. It is the ability to ask questions that would never have justified a team, a budget, or a software product before.
What matters now
Judgment. Agency. Ability to drive execution.
As execution gets cheap, the scarce resource shifts upward. The edge is framing the problem, picking the right lane, and getting the work all the way through.
One system. Eight experiments. A map of what is possible now.
Part 1 shows the operating system. Part 2 shows the experiments it produced, from Chicago permit data to consumer market simulation to research work.
Who this is for: people in finance and operations who want to see what is actually possible now, and builders who want to see where the leverage is.
I have spent the last several months building these systems, and one finding keeps recurring: as the cost of building drops toward zero, the value of knowing what to build goes up.
First, the operating system. Then, the experiments it made possible.
If you want the macro view of what this shift means, read The New Stable Orbits. This post shows the operating system and the experiments; that one zooms out to the broader structural change.
The Operating System
One person. A research firm's output.
This is the working shape of the system. One principal directs the work. Stable associates handle writing, analysis, engineering, and audit. A wider specialist bench comes in when needed.
The tools matter. What matters more is judgment, agency, and the ability to drive execution.
Project Workspace
Claude Code
Separate harness
Orchestration · Principal
Frontier Reasoning Model
Routes tasks, manages context, coordinates the team
OpenAI Codex
Separate harness
Adjacent workstreams in the same project workspace.
Associates
Creative
Writing, design, strategy
Analysis
Research, analysis, fast iteration
Engineering
Code, debugging, architecture
Audit
QA, review, verification
Specialist Pool · On-Demand
Open-source and frontier models on US servers, called when specialized capabilities are needed.
The infrastructure is OpenClaw and Hermes agents, open-source orchestration layers. The system remembers context across sessions, can be reached from anywhere, and spawns specialized sub-agents whose work gets reviewed before it counts. The principal reviews every associate's work and respawns with corrections, like marking up a teammate's draft before it ships.
The entire system runs on a laptop. What remains expensive is judgment, agency, and the ability to drive execution: framing the problem, asking the right questions, and directing specialized intelligence where it matters most.
Eight Probes into AI-Native Intelligence
These are high-level overviews of selected experiments; several will get deeper write-ups in future blog posts. Everything below started the same way: a question relevant to investing or business intelligence, described in plain English. Most went from idea to working MVP in a single session. None required hiring a developer or engaging a consultant.
The point isn't finished products. These are experiments: quick, cheap probes into whether AI agents can produce genuine intelligence in domains that previously required expensive infrastructure.
Experiment 1
Business Intelligence · 100 AI Analysts Debate Peloton's Future
Structured adversarial simulation
The Question
Can a swarm of differently motivated AI analysts stress-test a company thesis better than a one-shot investment memo?
The Setup
100 agents are assigned distinct archetypes: dedicated subscriber, growth investor, short seller, skeptic, brand loyalist, cost-conscious consumer, and more. They debate Peloton over five rounds. The visual shows whether disagreement stays real or collapses into consensus.
Round 1. The debate starts dispersed, with only a slight bearish edge.
55% bearish
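The swarm mechanic can be sketched in a few lines. Here a toy stance-update rule stands in for the per-agent model calls, and the archetype list, `bear_pull` parameter, and update dynamics are all illustrative assumptions, not the actual system:

```python
import random

ARCHETYPES = ["growth investor", "short seller", "skeptic",
              "brand loyalist", "cost-conscious consumer"]  # illustrative

def run_debate(n_agents=100, n_rounds=5, bear_pull=0.15, seed=0):
    """Toy stand-in for the LLM swarm: each agent holds a stance in
    [-1, 1] (bearish to bullish) and drifts toward the round's mean,
    plus a thesis-specific pull. A real run would replace this update
    rule with a model call per archetype-conditioned agent."""
    rng = random.Random(seed)
    stances = [rng.uniform(-1, 1) for _ in range(n_agents)]
    history = []
    for _ in range(n_rounds):
        m = sum(stances) / n_agents
        stances = [
            max(-1.0, min(1.0, s + 0.3 * (m - s) - bear_pull + rng.gauss(0, 0.05)))
            for s in stances
        ]
        history.append(sum(1 for s in stances if s < 0) / n_agents)
    return history  # bearish fraction per round

# A thesis with a real bearish pull converges; a contested one does not.
fragile = run_debate(bear_pull=0.15)
contested = run_debate(bear_pull=0.0)
```

The useful diagnostic is the trajectory, not the final number: a fragile thesis collapses toward consensus round over round, while a contested one keeps its spread.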
The Finding
Peloton converged to a strong bearish consensus: 91% bearish by round five. The Lululemon control did not converge. The point is not that swarms always agree. It is that they can separate fragile theses from genuinely contested ones.
Gemini ran the agent swarm. Opus synthesized the output. Total cost: a few dollars.
The pattern
Agent swarms can pressure-test a thesis against structured diversity in an afternoon instead of a month.
Experiment 2
Business Intelligence · 300 AI Personas Simulate a Consumer Market
Real reviews, real pricing, real distances, simulated decisions at $1.80
The Question
Can AI personas grounded in real market inputs reveal hidden consumer demand faster than a traditional research project?
The Setup
Each persona gets a home location, household archetype, and price sensitivity. They are shown nearby facilities using real Google reviews, published pricing, and GPS-derived distance so the simulation reflects actual tradeoffs rather than abstract survey answers.
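The tradeoff logic can be sketched roughly like this; the facility coordinates, prices, ratings, scoring weights, and the "would consider" tolerance are invented stand-ins for the real scraped inputs:

```python
import math, random

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) pairs in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

FACILITIES = {  # hypothetical stand-ins for scraped pricing and reviews
    "A": {"loc": (41.88, -87.63), "price": 35, "rating": 4.6},
    "B": {"loc": (41.92, -87.70), "price": 28, "rating": 4.1},
    "C": {"loc": (41.95, -87.65), "price": 24, "rating": 4.4},
}

def simulate(n_personas=300, seed=1):
    rng = random.Random(seed)
    first = {k: 0 for k in FACILITIES}
    consider = {k: 0 for k in FACILITIES}
    for _ in range(n_personas):
        home = (41.85 + rng.random() * 0.15, -87.75 + rng.random() * 0.15)
        price_w = rng.uniform(0.5, 2.0)  # per-persona price sensitivity
        scores = {}
        for name, f in FACILITIES.items():
            dist = haversine_km(home, f["loc"])
            scores[name] = f["rating"] * 2 - price_w * f["price"] / 10 - dist * 0.4
        best = max(scores, key=scores.get)
        first[best] += 1
        for name, s in scores.items():
            if s > scores[best] - 2.0:  # within tolerance = "would consider"
                consider[name] += 1
    return first, consider

first, consider = simulate()
```

The "consider minus first" gap per facility is exactly the latent-demand signal the experiment reads.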
The Finding
The useful signal lives in the gap between "first choice" and "would consider." Location C is not weak; it is under-converted: 71% of personas would consider it, but only 8% pick it first. That is latent demand a real operator can act on.
Traditional version: $5K+ budget and roughly three months. This version: $1.80, same day, rerunnable whenever the market changes.
The pattern
AI persona simulations grounded in real market data can produce consumer intelligence that used to require months and thousands of dollars, then rerun whenever inputs change.
Experiment 3
Business Intelligence · Chicago Permit Velocity
Where public data meets investment intelligence
The Question
Where are renovation permits moving fastest and slowest in Chicago, and is that changing over time?
The Setup
10,047 permits across all 77 community areas are normalized into a neighborhood ranking surface. The view can switch between current speed and six-month momentum so the reader can distinguish a slow market from one that is improving.
Neighborhood intelligence: the spread between the fastest and slowest permit markets in the same city.
Median: 42 days · Ranked: 75 neighborhoods, fastest → slowest
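The ranking logic reduces to a per-area median plus a recent-versus-prior momentum delta. A minimal sketch, using a handful of invented permit rows in place of the 10,047-record extract:

```python
from statistics import median

# Hypothetical permit records: (community_area, days_to_issue, months_ago)
PERMITS = [
    ("Logan Square", 21, 2), ("Logan Square", 30, 8),
    ("Austin", 95, 1), ("Austin", 80, 9),
    ("Hyde Park", 40, 3), ("Hyde Park", 55, 7),
]

def rank_velocity(permits, window_months=6):
    """Median issue time per area, plus six-month momentum
    (negative = speeding up). A real run would load the full
    Chicago open-data extract instead of these toy rows."""
    areas = {}
    for area, days, age in permits:
        bucket = "recent" if age <= window_months else "prior"
        areas.setdefault(area, {"recent": [], "prior": []})[bucket].append(days)
    out = []
    for area, b in areas.items():
        cur = median(b["recent"]) if b["recent"] else None
        prev = median(b["prior"]) if b["prior"] else None
        momentum = cur - prev if cur is not None and prev is not None else None
        out.append((area, cur, momentum))
    return sorted(out, key=lambda r: r[1])  # fastest first

ranked = rank_velocity(PERMITS)
```

Sorting on the current median while carrying momentum alongside is what lets a reader distinguish a slow market from an improving one.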
The Finding
Permit timing is not one citywide market. It is a collection of neighborhood markets with materially different operating conditions. That makes permit velocity useful for underwriting, expansion planning, and local business timing.
Codex wrote the pipeline. Opus audited the methodology. Total software cost: zero.
The pattern
Any public data source that is technically accessible but practically ignored can become a free intelligence layer.
Experiment 4
Business Intelligence · Local Market Intelligence on Autopilot
Daily composite scoring across 6 markets, zero API cost
The Question
Can a fully automated public-data stack produce a daily market brief that feels closer to internal strategy intelligence than generic macro commentary?
The Setup
20+ collectors pull from TSA, FRED, gas, weather, Google Trends, news, RSS, and local signals. Those inputs are scored into a composite market index and synthesized into a daily brief without manual work.
Composite Market Index
Market Comparison · 6 regions
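The scoring step can be sketched as a weighted sum of z-scored series: each collector's history is normalized against itself, then blended per day. The collector names, five-day histories, and weights below are illustrative assumptions:

```python
from statistics import mean, pstdev

def zscore(series):
    m, s = mean(series), pstdev(series)
    return [(x - m) / s if s else 0.0 for x in series]

def composite_index(signals, weights):
    """Blend heterogeneous collectors into one daily index:
    z-score each series over its own history, then take a
    weighted sum per day."""
    zs = {n: zscore(v) for n, v in signals.items()}
    days = len(next(iter(signals.values())))
    return [sum(weights[n] * zs[n][d] for n in signals) for d in range(days)]

signals = {  # hypothetical 5-day history per collector
    "tsa_throughput": [100, 102, 101, 105, 110],
    "gas_price":      [3.5, 3.6, 3.4, 3.3, 3.2],  # falling = tailwind
    "search_trends":  [40, 42, 45, 50, 55],
}
weights = {"tsa_throughput": 0.4, "gas_price": -0.2, "search_trends": 0.4}
index = composite_index(signals, weights)
```

Z-scoring each series against its own history is what lets a passenger count and a gas price live on the same scale; the negative weight encodes that cheaper gas is a positive local signal.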
The Finding
This is the shape of CEO-grade local intelligence built entirely from free public data. Once the collection, scoring, and synthesis loop is automated, the brief becomes cheap to rerun and easy to expand to new markets.
20+ collectors. 400+ days of history. Zero manual input.
The pattern
Repeatable local intelligence layers, composite scoring, daily collection, and AI synthesis can be built from public data for almost any market at near-zero cost.
Experiment 5
Personal Experiment · The System That Taught Itself to Sell Volatility
SPX options, AI-directed trading with feedback loops
The Question
Can an AI-directed system learn which options structures consistently survive across regimes instead of relying on static trading heuristics?
The Setup
Three layers work together: AutoResearcher for backtests, a regime playbook for market context, and an LLM override for real-time macro and news. The system reviews outcomes nightly and updates its preferences over time.
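The nightly review step can be approximated as an exponential update of a per-structure win-rate estimate. This is a simplified stand-in for the real feedback loop, and the structure names and outcome sequence are invented:

```python
def update_preferences(prefs, structure, won, lr=0.1):
    """Nightly feedback step: nudge the tracked win-rate estimate
    for the traded structure toward the latest outcome."""
    est = prefs.get(structure, 0.5)  # start agnostic at 50%
    prefs[structure] = est + lr * ((1.0 if won else 0.0) - est)
    return prefs

prefs = {}
# Invented outcome history: credit spreads mostly win, debit spreads mostly lose.
outcomes = ([("credit_spread", True)] * 9 + [("credit_spread", False)]
            + [("debit_spread", True)] * 4 + [("debit_spread", False)] * 6)
for structure, won in outcomes:
    update_preferences(prefs, structure, won)

preferred = max(prefs, key=prefs.get)
```

The exponential update is the simplest rule with the property the experiment cares about: recent outcomes matter more than old ones, so the system can adapt when a regime changes rather than averaging over its whole history.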
The Finding
The system started with no preference and taught itself that credit spreads were the durable answer. That held up in a second, uncorrelated market. The important point is not a single win-rate number. It is that the loop can learn, veto, and adapt.
BTC cross-check: credit spreads at 91.8% win rate versus 40.2% for debit spreads.
The pattern
Any domain with measurable outcomes and repeatable decisions is a candidate for learning systems that compound over time.
Experiment 6
Business Intelligence · Pricing Analysis of Pet Services Market
Published rates, review context, and radius-to-location mapping turned into a market benchmark
The Question
Can scattered public pricing pages, review context, and geography be turned into a usable competitive benchmark for a fragmented local market?
The Setup
Public boarding and daycare rates are scraped, normalized, and mapped to the nearest operating location. Reviews and package structure add context, so the output becomes a market ladder instead of a pile of disconnected websites.
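The tiering step can be sketched as a percentile split over normalized rates. The operator names and prices below are invented; tertile cut points stand in for whatever tier boundaries the real analysis used:

```python
from statistics import quantiles

RATES = {  # hypothetical scraped nightly boarding rates, normalized to USD/night
    "Happy Paws": 32, "Bark Inn": 45, "City Kennel": 28,
    "Luxe Pets": 72, "North Dogs": 38, "Tail Lodge": 55,
}

def market_ladder(rates):
    """Split published rates into Budget / Core / Premium tiers
    at the tertile cut points of the observed distribution."""
    lo, hi = quantiles(rates.values(), n=3)  # two cut points
    tiers = {"Budget": [], "Core": [], "Premium": []}
    for name, price in sorted(rates.items(), key=lambda kv: kv[1]):
        tier = "Budget" if price <= lo else "Core" if price <= hi else "Premium"
        tiers[tier].append((name, price))
    return tiers

ladder = market_ladder(RATES)
```

Once rates are on a common per-night basis, the ladder falls out of the distribution itself; no private data is required.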
Benchmark snapshot
One pass turns scattered public pricing into a market ladder you can actually benchmark against.
Market ladder
Budget
Core
Premium
The Finding
Once public pricing is normalized, the market stops looking like isolated websites and starts looking like structure: a floor, a midpoint, a premium tier, and a real spatial competitive set.
No private data and no hidden geography on this page. Just the broader lesson: public rates, review context, and basic geo logic become a useful benchmark surprisingly fast.
The pattern
Public websites already contain a large share of local market structure. With scraping, normalization, and simple geo logic, competitor pricing stops being scattered pages and becomes a usable benchmark.
Experiment 7
Research · A Physics Paper Without a Physics Degree
Get Physics Done, from curiosity to research artifact in days
The Question
Can an AI-native research workflow turn curiosity in a domain I do not know into something that actually looks and feels like a structured research artifact?
The Setup
Multiple model passes were used to explore competing physics frameworks, draft arguments, and clean the output into a coherent artifact. The point is not peer review yet; it is whether the workflow can get from zero to serious-looking research in days.
QFT
Quantum Field Theory
FVD
False Vacuum Decay
CCC
Penrose Conformal Cycles
BIO
Bohm Implicate Order
5 models · ~40 pages · structured research artifact
Multiple frameworks, multiple model passes, quality control at every stage.
The Finding
The surprise was not the specific physics claim. It was how quickly the workflow moved from loose curiosity to a real research-shaped artifact. Whether the conclusion survives expert scrutiny is a separate question from whether the pipeline works.
The pattern
Domain expertise barriers are falling. AI-native workflows collapse the distance between curiosity and a structured first-pass research artifact from semesters to days.
Experiment 8
Research · Can Relational Structure Predict the Future?
Intelligence Density, measuring predictive signal in how things connect
The Question
If you know how the parts of a system relate to each other, not just what the parts are, can that relational structure improve prediction?
The Setup
The framework compares a relation-aware predictor against a matched baseline that only sees local state. First it is tested in a synthetic coupled system, then in real language-model traces, to see whether structure itself carries signal.
Synthetic · Coupled Markov
13.8 ± 3.0 pp over matched baseline
At Scale · GPT-2 Prompt Traces
n = 500 prompts · relation-aware prediction still beats the baseline
Accuracy shift: 0.301 (baseline) → 0.317 (relation-aware)
The signal survives contact with a real language model, even if it is much smaller than in the synthetic system.
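The synthetic comparison can be sketched with two coupled binary chains. This is an illustrative toy, not the paper's actual setup: chain A copies chain B's current state with some probability, so a predictor that can see B (the relational structure) beats one that only sees A's own history:

```python
import random

def simulate(steps=2000, coupling=0.9, seed=7):
    """Two binary chains where A's next state copies B's current
    state with probability `coupling`; B resamples randomly each
    step. Compare a local (A-only) predictor with a relation-aware
    one that also reads B."""
    rng = random.Random(seed)
    a, b = 0, 1
    local_hits = rel_hits = 0
    for _ in range(steps):
        next_a = b if rng.random() < coupling else rng.randint(0, 1)
        local_hits += (next_a == a)  # baseline: persistence of A
        rel_hits += (next_a == b)    # relation-aware: read the coupling
        a, b = next_a, rng.randint(0, 1)
    return local_hits / steps, rel_hits / steps

baseline, relation_aware = simulate()
```

In this toy the gap is large by construction; the experiment's point is that the same matched-baseline comparison can be run on real systems, where any surviving gap is evidence that structure carries signal.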
The Finding
Relational structure appears to carry predictive information. It is strong in a synthetic coupled system and still positive in real language-model traces. The contribution is not just the result; it is a reusable way to ask whether structure itself contains signal.
In plain English: how things connect may help predict what happens next, beyond knowing the things alone.
The pattern
Relational structure, how components connect and not just what they are, may be a measurable predictive signal. The framework to test that is reusable across domains.
What's Next
The experiments continue
None of these are finished products. They're probes: fast, cheap tests of whether AI agents guided by human judgment can produce real intelligence.
The consistent finding: they can. And the marginal cost of each new experiment is approaching zero because the operating system is reusable. Every new question just needs a brief and a session.
The firms that build this capability will compound an information advantage with every deal, every quarter, every new question they think to ask. The ones waiting for someone to package it into a SaaS product will pay a subscription for yesterday's intelligence.
The tools are open source. The compute is cheap. The scarce resource is the same one it's always been: knowing what to ask.
Coming Soon
This post is a primer
Several experiments will get their own posts; this piece is meant to be the high-level map first.
Want the 20,000-foot view?
The Principal and the Swarm shows what these systems can do. The New Stable Orbits is the macro companion piece: what happens when the cost of building intelligence collapses, which business shapes become newly viable, and why the middle gets squeezed.