AI-Native Intelligence · Essay · 12 min read

The Principal and the Swarm

PART 0

The Shift

AI agents make it free to build intelligence systems no one would have funded, staffed, or subscribed to before.

Everyone is focused on faster reports, cheaper analysis, and broader screening. That is real, and it is already commoditizing. The bigger shift is new intelligence: questions that used to be uneconomic are suddenly buildable.

The shift

Not just faster diligence. New diligence.

The important change is not speed alone. It is the ability to ask questions that would never have justified a team, a budget, or a software product before.

What matters now

Judgment. Agency. Ability to drive execution.

As execution gets cheap, the scarce resource shifts upward. The edge is framing the problem, picking the right lane, and getting the work all the way through.

One system. Eight experiments. A map of what is possible now.

Part 1 shows the operating system. Part 2 shows the experiments it produced, from Chicago permit data to consumer market simulation to research work.

Who this is for: people in finance and operations who want to see what is actually possible now, and builders who want to see where the leverage is.

I have spent the last several months building these systems, and one finding keeps recurring: as the cost of building drops toward zero, the value of knowing what to build goes up.

First, the operating system. Then, the experiments it made possible.

If you want the macro view of what this shift means, read The New Stable Orbits. This post shows the operating system and the experiments; that one zooms out to the broader structural change.

PART 1

The Operating System

One person. A research firm's output.

This is the working shape of the system. One principal directs the work. Stable associates handle writing, analysis, engineering, and audit. A wider specialist bench comes in when needed.

The tools matter. What matters more is judgment, agency, and the ability to drive execution.

Project Workspace

Orchestration · Principal

Frontier reasoning model that routes tasks, manages context, and coordinates the team.

Claude Code · separate harness
OpenAI Codex · separate harness

Adjacent workstreams in the same project workspace.

Associates

Creative: writing, design, strategy
Analysis: research, analysis, fast iteration
Engineering: code, debugging, architecture
Audit: QA, review, verification

Specialist Pool · On-Demand

Gemini · Grok · Kimi · Minimax · Claude · GPT 5.4 · others

Open-source and frontier models on US servers, called when specialized capabilities are needed.

The infrastructure is OpenClaw and Hermes agents, open-source orchestration layers. The system remembers context across sessions, can be reached from anywhere, and spawns specialized sub-agents whose work gets reviewed before it counts. The principal reviews every associate's work and respawns with corrections, like marking up a teammate's draft before it ships.

The entire system runs on a laptop. What remains expensive is judgment, agency, and the ability to drive execution: framing the problem, asking the right questions, and directing specialized intelligence where it matters most.
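The spawn-review-respawn loop at the heart of this setup can be sketched in a few lines. This is an illustrative toy, not the actual OpenClaw or Hermes implementation; the function names and quality scores are invented for the sketch.

```python
def associate(task, feedback=None):
    """Hypothetical associate: drafts improve once the principal gives feedback."""
    quality = 0.5 if feedback is None else 0.9
    return {"task": task, "quality": quality}

def principal_review(draft, threshold=0.8):
    """The principal accepts only drafts above a quality bar."""
    return draft["quality"] >= threshold

def run(task, max_rounds=3):
    """Spawn an associate, review its work, respawn with corrections until it passes."""
    feedback = None
    for round_num in range(1, max_rounds + 1):
        draft = associate(task, feedback)
        if principal_review(draft):
            return draft, round_num
        feedback = "marked-up corrections"  # the principal's markup on the draft
    return draft, max_rounds

draft, rounds = run("permit-velocity memo")
```

The design point is the gate, not the scores: nothing an associate produces counts until the principal has reviewed it.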

PART 2

Eight Probes into AI-Native Intelligence

These are high-level overviews of selected experiments; future posts will go deeper on individual ones. Everything below started the same way: a question relevant to investing or business intelligence, described in plain English. Most went from idea to working MVP in a single session. None required hiring a developer or engaging a consultant.

The point isn't finished products. These are experiments: quick, cheap probes into whether AI agents can produce genuine intelligence in domains that previously required expensive infrastructure.

Experiment 1

Business Intelligence

100 AI Analysts Debate Peloton's Future

Structured adversarial simulation

The Question

Can a swarm of differently motivated AI analysts stress-test a company thesis better than a one-shot investment memo?

The Setup

100 agents are assigned distinct archetypes: dedicated subscriber, growth investor, short seller, skeptic, brand loyalist, cost-conscious consumer, and more. They debate Peloton over five rounds. The visual shows whether disagreement stays real or collapses into consensus.
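The convergence mechanic the visual is measuring can be sketched as a toy simulation. Everything here is an assumption for illustration: real runs use model-generated arguments, not a numeric pull toward the swarm's mean, and the archetype names are abbreviated.

```python
import random
from statistics import pstdev

ARCHETYPES = ["subscriber", "growth_investor", "short_seller",
              "skeptic", "brand_loyalist", "cost_conscious"]

def spawn_swarm(n=100, seed=7):
    """Each agent gets an archetype and an initial bearish score in [0, 1]."""
    rng = random.Random(seed)
    return [{"archetype": ARCHETYPES[i % len(ARCHETYPES)],
             "bearish": rng.random()} for i in range(n)]

def debate_round(swarm, pull=0.4):
    """One round: each agent updates toward the swarm's current center of gravity."""
    center = sum(a["bearish"] for a in swarm) / len(swarm)
    for a in swarm:
        a["bearish"] += pull * (center - a["bearish"])

swarm = spawn_swarm()
before = pstdev(a["bearish"] for a in swarm)
for _ in range(5):
    debate_round(swarm)
after = pstdev(a["bearish"] for a in swarm)
```

The quantity to watch is the dispersion of opinion across rounds: a thesis is fragile when dispersion collapses toward consensus, and genuinely contested when it doesn't.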

Round 1. The debate starts dispersed, with only a slight bearish edge: 55% bearish.

The Finding

Peloton converged to a strong bearish consensus: 91% bearish by round five. The Lululemon control did not converge. The point is not that swarms always agree. It is that they can separate fragile theses from genuinely contested ones.

Gemini ran the agent swarm. Opus synthesized the output. Total cost: a few dollars.

The pattern

Agent swarms can pressure-test a thesis against structured diversity in an afternoon instead of a month.

Experiment 2

Business Intelligence

300 AI Personas Simulate a Consumer Market

Real reviews, real pricing, real distances, simulated decisions at $1.80

The Question

Can AI personas grounded in real market inputs reveal hidden consumer demand faster than a traditional research project?

The Setup

Each persona gets a home location, household archetype, and price sensitivity. They are shown nearby facilities using real Google reviews, published pricing, and GPS-derived distance so the simulation reflects actual tradeoffs rather than abstract survey answers.
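A minimal sketch of how one persona's tradeoff might be scored, assuming a simple utility over rating, price, and distance. The sensitivity parameters, the consider margin, and the demo data are invented for illustration; the real simulation runs on model-generated personas and scraped facility data.

```python
import math

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def utility(persona, facility):
    """Toy tradeoff: rating pulls up, price and distance pull down, scaled by persona traits."""
    dist = haversine_km(persona["home"], facility["loc"])
    return (facility["rating"]
            - persona["price_sensitivity"] * facility["price"] / 10
            - persona["distance_sensitivity"] * dist)

def choices(persona, facilities, consider_margin=1.0):
    """First choice is the best-scoring facility; 'would consider' is anything close behind."""
    ranked = sorted(facilities, key=lambda f: utility(persona, f), reverse=True)
    best = utility(persona, ranked[0])
    consider = [f["name"] for f in ranked[1:]
                if best - utility(persona, f) <= consider_margin]
    return ranked[0]["name"], consider

persona = {"home": (41.88, -87.63), "price_sensitivity": 1.0, "distance_sensitivity": 0.5}
facilities = [{"name": "A", "loc": (41.88, -87.63), "rating": 4.5, "price": 30},
              {"name": "B", "loc": (41.95, -87.70), "rating": 4.7, "price": 45}]
first, consider = choices(persona, facilities)
```

Splitting the output into a first choice plus a consideration set is what makes the first-choice versus would-consider gap measurable at all.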

300 personas · 38 facilities · real data

Location A: 34% first choice · 22% would consider
Location B: 26% first choice · 31% would consider
Location C (hidden demand): 8% first choice · 71% would consider
Location D: 18% first choice · 14% would consider

The Finding

The useful signal lives in the gap between first choice and would consider. Location C is not weak; it is under-converted. 71% of personas would consider it, but only 8% pick it first. That is latent demand a real operator can act on.

Traditional version: $5K+ budget and roughly three months. This version: $1.80, same day, rerunnable whenever the market changes.

The pattern

AI persona simulations grounded in real market data can produce consumer intelligence that used to require months and thousands of dollars, then rerun whenever inputs change.

Experiment 3

Business Intelligence

Chicago Permit Velocity

Where public data meets investment intelligence

The Question

Where are renovation permits moving fastest and slowest in Chicago, and is that changing over time?

The Setup

10,047 permits across all 77 community areas are normalized into a neighborhood ranking surface. The view can switch between current speed and six-month momentum so the reader can distinguish a slow market from one that is improving.
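The normalization step can be sketched as a small aggregation, assuming each permit record carries a community area, days-to-issue, and a period tag ("now" vs. "prior") for the six-month momentum comparison. This is a toy with invented demo data, not the pipeline Codex actually wrote.

```python
from collections import defaultdict
from statistics import median

def velocity_table(permits):
    """Rank community areas by median days-to-issue; momentum compares two periods."""
    by_area = defaultdict(lambda: {"now": [], "prior": []})
    for p in permits:
        by_area[p["area"]][p["period"]].append(p["days"])
    rows = []
    for area, buckets in by_area.items():
        now = median(buckets["now"])
        prior = median(buckets["prior"]) if buckets["prior"] else now
        rows.append({"area": area, "median_days": now,
                     "momentum": prior - now})  # positive = speeding up
    return sorted(rows, key=lambda r: r["median_days"])  # fastest first

permits = [{"area": "X", "period": "now", "days": 10},
           {"area": "X", "period": "now", "days": 30},
           {"area": "X", "period": "prior", "days": 40},
           {"area": "Y", "period": "now", "days": 50},
           {"area": "Y", "period": "prior", "days": 50}]
rows = velocity_table(permits)
```

Using the median rather than the mean keeps a handful of stalled permits from distorting a neighborhood's ranking.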

Neighborhood intelligence

7.7× spread between the fastest and slowest permit markets in the same city
42-day citywide median
75 areas ranked fastest to slowest

The Finding

Permit timing is not one citywide market. It is a collection of neighborhood markets with materially different operating conditions. That makes permit velocity useful for underwriting, expansion planning, and local business timing.

Codex wrote the pipeline. Opus audited the methodology. Total software cost: zero.

The pattern

Any public data source that is technically accessible but practically ignored can become a free intelligence layer.

Experiment 4

Business Intelligence

Local Market Intelligence on Autopilot

Daily composite scoring across 6 markets, zero API cost

The Question

Can a fully automated public-data stack produce a daily market brief that feels closer to internal strategy intelligence than generic macro commentary?

The Setup

20+ collectors pull from TSA, FRED, gas, weather, Google Trends, news, RSS, and local signals. Those inputs are scored into a composite market index and synthesized into a daily brief without manual work.
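The scoring step can be sketched as a weighted composite using the post's published thresholds (>65 bullish, <40 bearish). The collector weights, historical ranges, and demo readings are invented placeholders, not the system's actual parameters.

```python
WEIGHTS = {"tsa": 0.20, "fred": 0.25, "gas": 0.15, "trends": 0.25, "weather": 0.15}

def normalize(value, lo, hi):
    """Scale a raw collector reading to 0-100 within its historical range."""
    return max(0.0, min(100.0, 100 * (value - lo) / (hi - lo)))

def composite(readings, ranges):
    """Weighted 0-100 composite across all collectors."""
    return sum(WEIGHTS[k] * normalize(readings[k], *ranges[k]) for k in WEIGHTS)

def label(score):
    """The post's published thresholds: >65 bullish, <40 bearish."""
    return "bullish" if score > 65 else "bearish" if score < 40 else "neutral"

readings = {"tsa": 80, "fred": 70, "gas": 60, "trends": 75, "weather": 50}
ranges = {k: (0, 100) for k in readings}
score = composite(readings, ranges)
```

Normalizing each collector against its own history is what lets signals in wildly different units (passenger counts, gas prices, search volume) share one index.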

Composite Market Index

Scored 0 to 100 · >65 bullish · <40 bearish

Market Comparison · 6 regions

Market 1: 72
Market 2: 67
Market 3: 61
Market 4: 55
Market 5: 48
Market 6: 38

Collectors: FRED · EIA · TSA · Trends · Zillow · Weather · BLS · +5 → AI synthesis

The Finding

This is the shape of CEO-grade local intelligence built entirely from free public data. Once the collection, scoring, and synthesis loop is automated, the brief becomes cheap to rerun and easy to expand to new markets.

20+ collectors. 400+ days of history. Zero manual input.

The pattern

Repeatable local intelligence layers, composite scoring, daily collection, and AI synthesis can be built from public data for almost any market at near-zero cost.

Experiment 5

Personal Experiment

The System That Taught Itself to Sell Volatility

SPX options, AI-directed trading with feedback loops

The Question

Can an AI-directed system learn which options structures consistently survive across regimes instead of relying on static trading heuristics?

The Setup

Three layers work together: AutoResearcher for backtests, a regime playbook for market context, and an LLM override for real-time macro and news. The system reviews outcomes nightly and updates its preferences over time.
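The nightly review can be sketched as a running win-rate update with a veto floor. The learning rate, floor, and demo outcomes are invented for illustration; the real system's preference logic is more involved.

```python
def update_preferences(prefs, outcomes, lr=0.1, floor=0.45):
    """Nightly review: nudge each structure's running win rate toward its latest
    outcomes, and veto any structure that falls below the floor."""
    for structure, won in outcomes:
        stats = prefs.setdefault(structure, {"win_rate": 0.5, "active": True})
        stats["win_rate"] += lr * ((1.0 if won else 0.0) - stats["win_rate"])
        if stats["win_rate"] < floor:
            stats["active"] = False  # eliminated, the way debit spreads were
    return prefs

prefs = {}
for _ in range(10):
    update_preferences(prefs, [("bull_put_spread", True)])
for _ in range(5):
    update_preferences(prefs, [("debit_spread", False)])
```

Starting every structure at a neutral 0.5 and letting outcomes move it is the "no preference" starting point described above; the veto is what turns a losing streak into elimination rather than a smaller allocation.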

Bull put spreads (dominant): 96.3% win rate
Iron condors: 70.8%
Debit spreads (eliminated): 40.2%

The Finding

The system started with no preference and taught itself that credit spreads were the durable answer. That held up in a second, uncorrelated market. The important point is not a single win-rate number. It is that the loop can learn, veto, and adapt.

BTC cross-check: credit spreads at 91.8% win rate versus 40.2% for debit spreads.

The pattern

Any domain with measurable outcomes and repeatable decisions is a candidate for learning systems that compound over time.

Experiment 6

Business Intelligence

Pricing Analysis of Pet Services Market

Published rates, review context, and radius-to-location mapping turned into a market benchmark

The Question

Can scattered public pricing pages, review context, and geography be turned into a usable competitive benchmark for a fragmented local market?

The Setup

Public boarding and daycare rates are scraped, normalized, and mapped to the nearest operating location. Reviews and package structure add context, so the output becomes a market ladder instead of a pile of disconnected websites.
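Once rates are normalized, the ladder itself is a small reduction. This sketch assumes each scraped record carries an optional boarding price; the operator names and the fourth record are invented demo data (the tier prices match the snapshot below).

```python
from statistics import median

def market_ladder(rates):
    """Collapse scraped operator rates into floor / core / premium tiers."""
    prices = sorted(r["boarding"] for r in rates if r.get("boarding") is not None)
    return {"floor": prices[0],
            "core": median(prices),
            "premium": prices[-1],
            "coverage": len(prices) / len(rates)}  # share with public pricing

rates = [{"name": "Op1", "boarding": 60},
         {"name": "Op2", "boarding": 43},
         {"name": "Op3", "boarding": 75},
         {"name": "Op4", "boarding": None}]  # no public pricing found
ladder = market_ladder(rates)
```

Tracking coverage alongside the tiers matters: a ladder built from a third of the market means something different from one built from two-thirds.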

Benchmark snapshot

One pass turns scattered public pricing into a market ladder you can actually benchmark against.

Boarding floor: $43
Core boarding: $60
Premium boarding: $75
Public pricing coverage: 66%
Market ladder

Budget: boarding $43 · daycare $29 · rating 4.2
Core: boarding $60 · daycare $36 · rating 4.5
Premium: boarding $75 · daycare $43 · rating 4.7

Review context and distance-to-location mapping explain where an operator sits in the market ladder, not just what price appears on the website.

The Finding

Once public pricing is normalized, the market stops looking like isolated websites and starts looking like structure: a floor, a midpoint, a premium tier, and a real spatial competitive set.

No private data and no hidden geography on this page. Just the broader lesson: public rates, review context, and basic geo logic become a useful benchmark surprisingly fast.

The pattern

Public websites already contain a large share of local market structure. With scraping, normalization, and simple geo logic, competitor pricing stops being scattered pages and becomes a usable benchmark.

Experiment 7

Research

A Physics Paper Without a Physics Degree

Get Physics Done, from curiosity to research artifact in days

The Question

Can an AI-native research workflow turn curiosity in a domain I do not know into something that actually looks and feels like a structured research artifact?

The Setup

Multiple model passes were used to explore competing physics frameworks, draft arguments, and clean the output into a coherent artifact. The point is not peer review yet; it is whether the workflow can get from zero to serious-looking research in days.

QFT · Quantum Field Theory
FVD · False Vacuum Decay
CCC · Penrose Conformal Cycles
BIO · Bohm Implicate Order

5 models · ~40 pages · structured research artifact

Multiple frameworks, multiple model passes, quality control at every stage.

The Finding

The surprise was not the specific physics claim. It was how quickly the workflow moved from loose curiosity to a real research-shaped artifact. Whether the conclusion survives expert scrutiny is a separate question from whether the pipeline works.

The pattern

Domain expertise barriers are falling. AI-native workflows collapse the distance between curiosity and a structured first-pass research artifact from semesters to days.

Experiment 8

Research

Can Relational Structure Predict the Future?

Intelligence Density, measuring predictive signal in how things connect

The Question

If you know how the parts of a system relate to each other, not just what the parts are, can that relational structure improve prediction?

The Setup

The framework compares a relation-aware predictor against a matched baseline that only sees local state. First it is tested in a synthetic coupled system, then in real language-model traces, to see whether structure itself carries signal.
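The synthetic test can be reproduced in miniature: chain B copies chain A's previous state most of the time, so a predictor that sees A beats one that only sees B's own history. The coupling strength, chain length, and binary state space are invented for the sketch; the real framework is more general.

```python
import random

def simulate(n=20000, seed=3):
    """Chain A flips a fair coin; chain B copies A's previous state 80% of the time."""
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(n)]
    b = [0]
    for t in range(1, n):
        b.append(a[t - 1] if rng.random() < 0.8 else rng.randint(0, 1))
    return a, b

def accuracy(a, b, relation_aware):
    """Predict b[t]: the baseline sees only b[t-1]; relation-aware sees a[t-1]."""
    hits = 0
    for t in range(1, len(b)):
        guess = a[t - 1] if relation_aware else b[t - 1]
        hits += guess == b[t]
    return hits / (len(b) - 1)

a, b = simulate()
uplift = accuracy(a, b, True) - accuracy(a, b, False)
```

Here the baseline hovers near chance while the relation-aware predictor approaches the 80% coupling rate: the uplift exists only because the predictor knows how the chains connect, which is exactly the claim being tested.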

Synthetic · Coupled Markov

Uplift: 13.8 ± 3.0 pp over the matched baseline
Coupled system: +13.8 pp · Null control: ≈ 0 · Shuffled control: ≈ 0

At Scale · GPT-2 Prompt Traces

Small but real signal: n = 500 prompts · relation-aware prediction still beats the baseline
Accuracy shift: baseline .301 → relation-aware .317

The signal survives contact with a real language model, even if it is much smaller than in the synthetic system.

The Finding

Relational structure appears to carry predictive information. It is strong in a synthetic coupled system and still positive in real language-model traces. The contribution is not just the result; it is a reusable way to ask whether structure itself contains signal.

In plain English: how things connect may help predict what happens next, beyond knowing the things alone.

The pattern

Relational structure, how components connect and not just what they are, may be a measurable predictive signal. The framework to test that is reusable across domains.

What's Next

The experiments continue

None of these are finished products. They're probes: fast, cheap tests of whether AI agents paired with human judgment can produce real intelligence.

The consistent finding: they can. And the marginal cost of each new experiment is approaching zero because the operating system is reusable. Every new question just needs a brief and a session.

The firms that build this capability will compound an information advantage with every deal, every quarter, every new question they think to ask. The ones waiting for someone to package it into a SaaS product will pay a subscription for yesterday's intelligence.

The tools are open source. The compute is cheap. The scarce resource is the same one it's always been: knowing what to ask.

Coming Soon

This post is a primer

Various experiments will have their own upcoming posts. This piece is meant to be the high-level map first.

Want the 20,000-foot view?

The Principal and the Swarm shows what these systems can do. The New Stable Orbits is the macro companion piece: what happens when the cost of building intelligence collapses, which business shapes become newly viable, and why the middle gets squeezed.