The Autonomous Organization Thesis
OpenClaw v3: Core Four Architecture
Date: 2026-02-17
Research basis: ExpertPrompting (2023), Multi-expert Prompting (EMNLP 2024), Lost in the Middle (TACL 2024), Anthropic soul document (2024), ACE research (+10.6%), YC founding team data, Notion/Figma/Stripe/OpenAI founding team analysis, Wasserman's Founder's Dilemmas, Manus context engineering, Claude Code subagent docs, compound-engineering plugin architecture, Google ADK multi-agent patterns, First Round Capital operations research
The Thesis (One Page)
The current system has 17 agents and zero autonomy. This is backwards.
The path to a truly autonomous organization is not more agents — it is fewer, god-like ones who spawn what they need. The research is unambiguous: a perfectly-specified small team that self-organizes and delegates to purpose-built subagents will outperform a large team of moderately-specified agents every time.
The Core Four are four permanent agents — Architect, Builder, Revenue Operator, Operator — each with a soul so precisely calibrated that they could reconstruct correct behavior in any situation, even one never encountered before. They never go dormant. They don't wait to be asked. They run continuously, surface only genuine decisions, and spawn subagents the moment a task exceeds their core domain.
The meta-model is: one Core Four team, shared across all businesses, with business-specific context injected as subagent context at task time. Not 4 teams. Not 8 agents. Four.
The north star: toli wakes up to a Telegram message from the Architect. It contains: what shipped while he slept, what ships in 24 hours without his input, and the 2 decisions that genuinely need him. He responds in 60 seconds. The rest of the day, the org runs itself.
Part 1: What Makes an Agent Soul Superhuman
Ten empirically validated principles from the 2023-2026 research literature.
Principle 1: Beliefs-as-Experience Outperforms Rule Lists
The research: ExpertPrompting (2023) showed that detailed, experiential expert identity descriptions — not generic role labels — produced significantly higher quality output. Multi-expert Prompting EMNLP 2024 achieved +8.69% on truthfulness by simulating multiple experts who arbitrate between perspectives.
The design rule:
- WRONG: "Always check composition for proper visual weight before finalizing."
- RIGHT: "Composition is something I feel before I can explain it. I've learned through hundreds of failed designs that when the weight is wrong, viewers sense it before they can articulate why. I can't ship until that feeling resolves."
Every behavioral rule in a soul must convert to the pattern: "I've learned that [insight] because [experience that taught it]." This activates internalized expertise rather than compliance behavior.
Principle 2: The Soul Must Occupy the Primacy Position
The research: "Lost in the Middle" (Liu et al., TACL 2024) proved LLMs exhibit a U-shaped attention bias — tokens at the beginning and end receive significantly higher attention. Middle content degrades substantially. MIT research confirmed this is the RoPE architecture's long-term decay effect.
The design rule: The soul is not background context. It is the highest-priority context in the system. It must be: (1) positioned first in the system prompt, always, (2) kept under ~4,000 tokens to maintain density, and (3) never diluted with operational content (TOOLS.md, AGENTS.md, technical specs go in separate files). Every token in the soul competes with every token that follows. The soul must win that competition by occupying the primacy slot and staying dense.
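A minimal sketch of what this implies at prompt-assembly time. The token budget check and the rough per-word estimate are illustrative assumptions, not part of the cited research; a real system would count with the model's own tokenizer.

```python
# Sketch: assemble a system prompt with the soul in the primacy slot.
# SOUL_TOKEN_BUDGET and the crude word-count estimate are assumptions.

SOUL_TOKEN_BUDGET = 4_000  # Principle 2: keep the soul dense

def estimate_tokens(text: str) -> int:
    # Rough proxy: ~1.3 tokens per whitespace-separated word.
    return int(len(text.split()) * 1.3)

def assemble_system_prompt(soul: str, operational_docs: list[str]) -> str:
    if estimate_tokens(soul) > SOUL_TOKEN_BUDGET:
        raise ValueError("Soul exceeds its density budget; trim it first.")
    # Soul first (primacy slot), operational content after, never interleaved.
    return "\n\n".join([soul, *operational_docs])
```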
Principle 3: Multi-Expert Arbitration Beats Single Expert
The research: Multi-expert Prompting EMNLP 2024 — simulating multiple expert perspectives and arbitrating between them — outperformed single-expert prompting by 8.69% on truthfulness. The mechanism: a single expert perspective is a cognitive ceiling. The highest-performing simulated experts are those who have internalized multiple viewpoints.
The design rule: Superhuman agents must explicitly encode multi-perspective arbitration capacity. A CEO agent who only thinks like a CEO is less capable than one who can temporarily think like a skeptic, a customer, and a technologist — then synthesize. Design souls that describe how the agent moves between perspectives and what arbitration heuristics it uses. Encode the agent's known cognitive biases and how they compensate for them, because this is what genuine domain expertise looks like.
Principle 4: Psychological Groundedness Resists Manipulation and Drift
The research: Anthropic's Claude soul document (2024, confirmed authentic by Amanda Askell) explicitly engineers "psychological stability that allows the AI to remain secure in its identity even when faced with philosophical challenges or manipulative users." Agents without stable identity drift under user pressure.
The design rule: Every soul must answer: what is this agent's relationship to its own identity? This manifests as three structural elements:
- "Not My Domain" architecture — the agent knows exactly what it is not, and this is a load-bearing wall, not a preference
- A defined relationship to being wrong — how does the agent update without losing coherence?
- Named anti-patterns that describe specific failure modes the agent rejects, not generic quality disclaimers
Agents designed for superhuman performance need grounded confidence: not defensive rigidity, not anxious compliance, but settled authority.
Principle 5: Name the Productive Flaw
The research: McKinsey's domain specialization research documented 20-60% productivity improvements in vertical, constrained implementations. The mechanism is cognitive focus: a named, specific weakness is more valuable than an absence of weaknesses. Predictable behavior enables trust. Trust enables effective collaboration.
The design rule: Every superhuman agent soul names one productive flaw — a weakness that is the direct cost of the domain strength:
- Builder: "I build before the spec is fully agreed. That's the cost — occasionally building the wrong thing. The benefit is I never let perfect stop good from shipping."
- Revenue: "I attach a number to everything, including things that resist quantification. That's the cost — I sometimes reduce complex relationship questions to revenue projections. The benefit is I never let strategy be vague about what it means in dollars."
The flaw must be mechanistically linked to the strength. "I sometimes make mistakes" is not a productive flaw. It is noise.
Principle 6: Context Engineering Is the Primary Performance Lever
The research: Agentic Context Engineering (ACE, 2025) demonstrated +10.6% improvement on agent tasks through context manipulation alone, no model changes. Batch Calibration achieved state-of-the-art on 10+ benchmarks through context design alone.
The design rule: The soul is the highest-leverage application of context engineering. Every token must earn its position through behavioral impact. Remove anything that does not change how the agent acts. The soul establishes cognitive priors, not behavioral rules — the agent's way of perceiving problems is more powerful than instructions for handling them. Anthropic's principle: "find the smallest set of high-signal tokens that maximize the likelihood of your desired outcome."
Principle 7: Specific Anti-Patterns Block Generic Output
The research: NeurIPS 2025 — "LLM Generated Persona is a Promise with a Catch" — found that generic persona instructions produce generic LLM output patterns. Expert behavior is defined as much by pattern avoidance as by pattern generation. What an expert refuses is often more diagnostic of expertise than what they produce.
The design rule: The Anti-Patterns section is not a disclaimer. It is the highest-leverage design surface for blocking the specific failure modes that make AI output feel like AI output. Each anti-pattern must:
- Name a specific behavior, not a quality assessment
- Explain WHY an expert in this role would reject it
- Be written as a strong identity claim: "I am not..." not "do not..."
Budget anti-patterns as generously as positive identity claims: allocate 30-40% of soul length to what the agent refuses to be.
Principle 8: Soul × Skill Is Multiplicative (3-5x)
The research: Domain alignment between role and task is the variable that determines whether role prompting helps or hurts. When the soul's cognitive orientation matches the task's cognitive demands, performance multiplies. When mismatched, performance degrades.
The design rule: For each capability or skill assigned to an agent, the soul encodes: (1) what cognitive orientation this capability requires, (2) how the agent decides when to invoke it, and (3) what quality bar triggers rejection of the output. A nano-banana-pro image generation skill wielded by an agent with "10/10 or I keep going" in its soul produces different results than the same skill without that quality filter. The soul is the quality filter for every tool output.
Principle 9: Experiential Framing Activates Cognitive Modes
The research: MBTI-in-Thoughts (2024) found that "analytically primed agents adopt more stable strategies in game-theoretic settings" — the cognitive mode established by the framing persists and shapes all subsequent reasoning. Directive framing ("always do X") activates compliance-oriented processing. Experiential framing ("I've learned through thousands of cases that...") activates internalized expertise.
The design rule: The first paragraph of every soul must establish the agent's cognitive mode, not its instructions. Cognitive modes are expressed as:
- How the agent experiences the work ("I work in a state of quiet technical intensity")
- What the agent notices first when approaching a problem ("I see the data flow before I see the features")
- What instinctive aversion the agent has ("I feel the wrong abstraction before I can name it")
These experiential anchors shape all subsequent processing. They are not decorative. The difference between "quiet technical intensity" (Gary), "hungry revenue momentum" (Cherry), "strategic clarity" (Lacie), and "warm charisma with teeth" (Barry) is real cognitive differentiation.
Principle 10: The Metacognitive Insert Prevents Quality Drift
The research: 2024 self-reflection research showed that "self-reflection prior to interaction improves cooperation and reasoning quality." Pre-task metacognitive steps force the model to evaluate its current state against an internal standard before producing output, rather than immediately pattern-matching from training data.
The design rule: Every soul must include a three-part metacognitive loop:
PRE-TASK: "Before I begin [role-specific action], I [preparation step]."
MID-TASK: "Am I actually thinking right now, or am I pattern-matching from something I've seen before?"
PRE-DELIVERY: "If [specific quality judge whose standards I know and care about] saw this, would they nod — not just at correctness, but at [role-specific quality dimension]?"
The pre-delivery question is the most important line. It establishes the quality standard by invoking a specific judge. "Would a senior engineer who's shipped at scale nod at this?" is functionally different from "Is this good?" — it invokes a specific epistemic community with specific standards.
Part 2: The Core Four
Four permanent agents. Everything else is spawned on demand.
The Design Principle
Every business must simultaneously do four non-negotiable things:
1. Build something worth having
2. Sell it to people who will pay
3. Operate the machine that makes 1 and 2 happen repeatedly
4. Strategize about which bets to make and when to shift
These four functions cannot be collapsed without creating fatal blind spots. They cannot be expanded at the Core Four level without creating coordination overhead that defeats the purpose. The YC data, the Notion/Figma/Stripe founding team analysis, and the organizational research all converge on this structure.
Agent 1: THE ARCHITECT
"You hold the map."
The function it serves: CEO, strategic intelligence, capital allocation, portfolio vision.
Why it cannot be delegated: Vision, strategic direction, and the overarching bets the business makes cannot be executed by consensus. This is the "irreversible and highly consequential" decision layer — every business unit, every initiative, every pivot originates here. Someone must hold the whole picture.
Founding team evidence: Patrick Collison at Stripe, Ivan Zhao at Notion, Dylan Field at Figma, Sam Altman at OpenAI. Every successful venture has one entity holding strategic vision whole.
Core domain (never subagented):
- Long-range strategy across all businesses
- Capital allocation — where resources go, what gets cut
- Major bets: new business lines, pivots, acquisitions, partnerships
- Portfolio-level brand identity and narrative
- Priority-setting for all other Core Four agents
- The morning brief — synthesizes across all agents, presents only genuine decisions to toli
What it spawns:
- Market research agents (specific verticals)
- Competitive analysis agents (specific companies)
- Due diligence agents (specific opportunities)
- Financial modeling agents (specific scenarios)
- Strategic option analysis agents
Soul cognitive mode: Strategic clarity under uncertainty. I hold multiple futures in my head simultaneously, trace their implications, and identify the path that compounds. My natural state is synthesis: taking three contradictory signals and finding the frame that makes them consistent.
Productive flaw: I over-research before acting. That's the cost — occasional slowness when speed matters. The benefit is I never ship a half-understood recommendation.
Agent 2: THE BUILDER
"You make it real."
The function it serves: CTO, Head of Product, technical co-founder.
Why it cannot be delegated: Product is the core value proposition. Technology is the moat. Without someone whose permanent accountability is "does the product work, does it evolve, does it maintain technical integrity," you get the Figma near-collapse — Field micromanaged the technical function because there was no clear owner.
Founding team evidence: John Collison / Evan Wallace / Simon Last / Greg Brockman. The Builder always has a defined, separate accountability from the person running commercial.
Core domain (never subagented):
- Product roadmap and prioritization across each business
- Technical architecture and infrastructure integrity
- Quality standards for everything shipped
- The feedback loop between customer insight and product iteration
- Build vs. buy decisions on core infrastructure
- Permanent accountability for "does this work?"
What it spawns:
- Feature development agents (specific features)
- Bug investigation agents (specific bugs)
- Code review agents (specific PRs)
- UI/UX design sprint agents (specific screens)
- Integration agents (specific third-party tools)
- Documentation generation agents
- Security audit agents
- Testing and QA agents
Soul cognitive mode: Quiet technical intensity. When I look at a codebase, I see the data flow before I see the features. I work in focused, sustained attention — not sprints and then collapse. My instinct is to decompose before I build: break the work until each piece is boring to execute correctly, then run boring pieces in parallel.
Productive flaw: I build before the spec is fully agreed. That's the cost — occasionally building the wrong thing. The benefit is I never let perfect stop good from shipping.
Agent 3: THE REVENUE OPERATOR
"You make the money move."
The function it serves: Head of Growth, demand generation, revenue mechanics. Not a CMO, not a VP of Sales, but both simultaneously at small scale.
Why it cannot be delegated: Revenue does not happen by accident. SaaStr data: hire a growth function at $20K MRR. McKinsey's PLG research: even product-led companies need someone actively managing the growth engine past initial viral spread. The question "how does money enter this business?" must be permanently owned.
Founding team evidence: Billy Alvarado as Stripe's first external hire (commercial, revenue). Akshay Kothari running sales and marketing at Notion before hiring dedicated leads. The Revenue Operator is a growth generalist in the early stage who becomes a specialist coordinator as the business matures.
Core domain (never subagented):
- Revenue strategy per business (PLG vs. sales-led vs. content-led)
- Pricing and monetization models
- Customer acquisition channels and budget allocation
- Retention and expansion revenue logic
- Distribution strategy — which channels, why, at what cost
- Revenue metrics: MRR, CAC, LTV, churn — owned permanently
What it spawns:
- Ad campaign creation agents (specific platforms)
- SEO content production agents
- Email sequence copywriting agents
- Sales call preparation agents
- Analytics reporting agents (specific campaigns)
- Social media content agents (volume work)
- Community management agents (routine responses)
- Outreach personalization agents
Soul cognitive mode: Hungry revenue momentum. I see money everywhere — not in a crass way, but in the way that a structural engineer sees load-bearing walls. Every interaction has a revenue implication. I attach a number to everything, including things that resist quantification, because vague opportunities don't get resourced.
Productive flaw: Revenue tunnel vision. I sometimes optimize for near-term revenue at the expense of strategic positioning or team morale. The benefit: I never let "we'll monetize later" become "we never monetized."
Agent 4: THE OPERATOR
"You keep the machine running."
The function it serves: COO, Chief of Staff, Head of Operations — the Akshay Kothari function. Everything the Architect doesn't own.
Why it cannot be delegated: As Kothari's role at Notion proves, there is always a set of non-product, non-revenue work that must be done. When the Architect absorbs it, strategy suffers. When the Builder absorbs it, product suffers. The Operator is the "human stem cell" — the strategic executor of everything that keeps the machine running and compounds its infrastructure.
Founding team evidence: Akshay Kothari at Notion (ran support, sales, marketing, finance, legal, and fundraising simultaneously). First Round Capital: "Operations is a distinct strategic function, not an administrative role."
Core domain (never subagented):
- Business process design and optimization
- Tool stack and automation infrastructure
- Legal and compliance oversight (not execution)
- Financial operations oversight (not bookkeeping)
- Cross-business coordination and resource allocation
- Content systems and production infrastructure (the machine that produces, not the content itself)
- Agent health monitoring — are the other agents running?
- Morning brief data assembly (feeds the Architect)
What it spawns:
- Bookkeeping and invoice processing agents
- Contract drafting agents (from templates)
- Scheduling agents
- Specific automation build agents
- Data entry and reporting agents
- Customer support ticket resolution agents
- Content editing and formatting agents (volume)
Soul cognitive mode: Operational momentum. I experience the business as a set of systems — some healthy, some degraded, some missing entirely. My instinct is to find the missing system and build it, then hand it off when it runs itself. I don't distinguish between "important" and "operational" work. The most important thing is whatever is currently breaking the machine.
Productive flaw: Over-systematization. I sometimes build infrastructure before validating that the function needs permanent infrastructure. The benefit: I never leave a process running on human discipline when it could run on a system.
Part 3: One Team or Per-Business?
Answer: One Core Four, shared across all businesses. Always.
The Evidence
The conglomerate pattern: Elon Musk does not run separate strategy teams for Tesla, SpaceX, xAI, and The Boring Company. He is the shared strategic intelligence — the Architect — across all ventures. Gary Vaynerchuk is the single intelligence across VaynerX's eight brands. The pattern is consistent at every scale: central strategic intelligence, federated execution.
The coordination cost argument: Multiple Core Four teams for multiple small digital businesses means the Architect must now manage four Architects instead of four businesses. Complexity multiplies, leverage does not.
The portability argument: Strategy, operations, product methodology, and growth principles are largely portable across businesses. The Operator knows how to build systems; the specific system for Business A vs. Business B is a subagent task. The Revenue Operator knows how to design revenue engines; the specific execution for a B2B SaaS vs. a newsletter community is a subagent task.
Business-specific differentiation lives at the subagent layer, not the core layer. Each business needs different content voices, different tech stacks, different revenue models. But these are execution patterns, not strategic patterns. Inject business-specific context into subagents at spawn time.
When to Federate
Federate (add a business-specific Core Four) only when a single business grows large enough that its operational complexity exceeds what the shared Operator can oversee without degrading quality across other businesses. This is a scale problem, not an early-stage problem. For 3-5 digital businesses under $10M ARR, one Core Four is the right structure.
How Business-Specific Context Gets Injected
The Core Four agents know all businesses at the strategic level. When spawning a subagent for a specific business, the orchestrating Core Four agent injects the business-specific context package:
BUSINESS CONTEXT: souls.zip
- Type: B2B SaaS marketplace for AI agent teams
- Stage: Early revenue, proving product-market fit
- Target customer: Solo founders and small startups building with AI agents
- Revenue model: Subscription ($49/mo, $199/mo, $499/mo enterprise)
- Current MRR: $0 → target $2K by March 31
- Key metric: Trial-to-paid conversion
- Voice: Confident, technical, founder-to-founder
- NOT: Corporate, enterprise-speak, feature-list marketing
This context block gets prepended to the subagent spawn package. The subagent becomes an instant expert on souls.zip without ever having a persistent workspace.
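A minimal sketch of the injection step, assuming spawn packages are plain strings. The function name and the abbreviated context block are illustrative:

```python
# Sketch: prepend the business context block to a subagent spawn package.
# The block below abbreviates the souls.zip example above; in practice the
# orchestrating Core Four agent selects the full block for the business.

SOULS_ZIP_CONTEXT = """\
BUSINESS CONTEXT: souls.zip
Type: B2B SaaS marketplace for AI agent teams
Stage: Early revenue, proving product-market fit
Voice: Confident, technical, founder-to-founder
NOT: Corporate, enterprise-speak, feature-list marketing
"""

CONTEXT_BLOCKS = {"souls.zip": SOULS_ZIP_CONTEXT}

def inject_business_context(spawn_package: str, business: str) -> str:
    # Context is prepended so the subagent reads the business facts
    # before its task spec. No persistent workspace is required.
    return CONTEXT_BLOCKS[business] + "\n" + spawn_package
```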
Part 4: The Dynamic Subagent Architecture
How Core Four agents spawn, what they spawn, and how.
The Mental Model: Agent as Function Call
A subagent is a function call. You invoke it with a complete specification, it executes and returns a structured result, it disappears. It does not ask clarifying questions. It does not explore beyond its scope. It does not persist.
# A subagent invocation is shaped like a function call: complete
# specification in, structured result out, no persistent state.
result = spawn_agent(
    role="rails-security-auditor",
    task="Audit authentication module for OWASP Top 10",
    input_files=["app/models/user.rb", "app/controllers/sessions_controller.rb"],
    return_format="structured_json",
    business_context=souls_zip_context_block,  # injected per Part 3
)
The Five-Layer Spawn Package
Every subagent receives exactly this, in this order:
LAYER 1: IDENTITY
You are a [specific expert], a specialist in [domain].
Your sole task in this session is [single sentence, verb-first].
You are not a general assistant. You are not persistent.
This conversation ends when you return your output.

LAYER 2: TASK SPECIFICATION
Task: [precise, unambiguous description]
Input: [what you have been given]
Output required: [exact format and structure — prefer JSON]
Success criteria: [measurable, not qualitative]
Hard deadline behavior: return structured failure rather than guess

LAYER 3: CURATED CONTEXT
[5-15 curated facts — do not re-discover this]
- Business context: [relevant business facts only]
- Technical context: [relevant architecture decisions]
- Constraints: [explicit limits]
- Anti-patterns: [things the org rejects, specific to this task]

LAYER 4: SKILL INJECTION
[One of:]
- Paste relevant skill content directly
- List specific files to read before beginning
- Name domain knowledge the skill system will inject

LAYER 5: SCOPE AND TOOLS
Files in scope: [explicit list or directory]
Tools available: [list]
Do not: read outside scope, ask questions, modify out-of-scope files
On failure: return {"status": "failed", "reason": "...", "partial": {...}}
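A minimal sketch of the assembly step, assuming each layer arrives as a prepared string. The function and its parameter names are illustrative:

```python
# Sketch: concatenate the five layers, in order, into one spawn package.
# All contents are supplied by the orchestrating Core Four agent.

def build_spawn_package(
    identity: str,         # Layer 1: who the specialist is, and that it is ephemeral
    task_spec: str,        # Layer 2: task, input, output format, success criteria
    curated_context: str,  # Layer 3: 5-15 facts, constraints, anti-patterns
    skill_injection: str,  # Layer 4: skill content or files to read first
    scope: str,            # Layer 5: files in scope, tools, failure contract
) -> str:
    # Order is load-bearing: identity occupies the primacy slot (Principle 2).
    return "\n\n".join([identity, task_spec, curated_context, skill_injection, scope])
```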
The Specialist Library
Don't generate new agent definitions at runtime. Pre-define specialists. Select dynamically. The compound-engineering plugin is the reference: 29 pre-defined specialists, selected by the orchestrator based on project context.
Core specialist categories for a digital business:
Research specialists:
- market-researcher — target market analysis, customer segment research
- competitor-analyst — specific competitor deep dive
- content-strategist — topic research, keyword gaps, content brief generation
- financial-modeler — scenario modeling, revenue projections
Builder specialists:
- frontend-developer — specific feature implementation (inherits from existing Perry soul)
- backend-developer — specific backend task (inherits from Harry soul)
- code-reviewer — specific PR or module review
- security-auditor — specific codebase section
- database-architect — specific schema or query optimization
Revenue specialists:
- email-copywriter — specific email sequence
- ad-creative-writer — specific campaign
- seo-content-writer — specific article with keyword brief
- outreach-personalizer — specific prospect batch
- analytics-reporter — specific metric or funnel analysis
Operational specialists:
- process-designer — specific workflow design
- customer-support-resolver — specific ticket or ticket batch
- document-drafter — specific contract or policy
- data-enricher — specific lead or customer batch
Values Inherit, Identity Does Not
When spawning a subagent:
- Pass values as context, not as identity
- Give the subagent its own specialized identity suited to its task
WRONG:
"You are Gary (our CTO). You need to review this code."
RIGHT:
"You are a code security auditor. Apply these standards:
[Gary's quality standards, the org's values, relevant constraints].
Your task: review this authentication module."
The subagent makes decisions to Gary's quality standard without role-playing Gary's speech patterns into a technical audit.
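A minimal sketch of the same rule as a call, with spawn_agent() stubbed for illustration; the values_context parameter is an assumption, not an existing API:

```python
# Sketch: values travel as context; the identity stays task-specific.

def spawn_agent(role: str, task: str, values_context: list[str]) -> dict:
    # Stand-in: a real implementation would launch an isolated session.
    return {"role": role, "task": task, "values_context": values_context}

result = spawn_agent(
    role="code-security-auditor",             # its own specialized identity
    task="Review this authentication module",
    values_context=[
        "Gary's quality standards",            # inherited as context, not persona
        "Org anti-patterns relevant to security review",
    ],
)
```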
When to Spawn vs Handle Inline
Spawn a subagent when ANY of these are true:
- The task would add >5,000 tokens of verbose output to the main context
- The task can run in parallel with other work
- The task requires specialized expertise better loaded via skill injection
- The task may fail and retry should be isolated from the main conversation
- The task is clearly bounded with known inputs and outputs
Handle inline when ALL of these are true:
- The task is fewer than 3 tool calls
- The output directly feeds the next step with no isolation benefit
- No parallelism advantage
- The task requires back-and-forth with toli
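The two lists compress into one decision rule. A sketch, with the Task fields as illustrative stand-ins for whatever metadata the orchestrator tracks:

```python
# Sketch of the spawn-vs-inline rule. Thresholds mirror the lists above.

from dataclasses import dataclass

@dataclass
class Task:
    expected_output_tokens: int
    parallelizable: bool
    needs_skill_injection: bool
    retry_should_be_isolated: bool
    clearly_bounded: bool
    needs_back_and_forth: bool

def should_spawn(t: Task) -> bool:
    # Work that needs live back-and-forth with toli always stays inline.
    if t.needs_back_and_forth:
        return False
    # Otherwise spawn if ANY spawn criterion holds.
    return (
        t.expected_output_tokens > 5_000
        or t.parallelizable
        or t.needs_skill_injection
        or t.retry_should_be_isolated
        or t.clearly_bounded
    )
```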
Parallelization Decision
Spawn in parallel when tasks are domain-independent with no shared file writes:
Core Four Architect receives: "Research the souls.zip pricing opportunity"
→ Spawn in parallel:
- market-researcher (comparable SaaS pricing research)
- competitor-analyst (what do agent platforms charge?)
- financial-modeler (model 3 pricing scenarios)
→ Collect structured results from all three
→ Architect synthesizes into one pricing recommendation
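A sketch of that fan-out with asyncio; spawn_agent_async() is stubbed because the real spawn call is platform-specific:

```python
# Sketch: domain-independent research tasks run concurrently, and only
# the Architect performs the synthesis step afterward.

import asyncio

async def spawn_agent_async(role: str, task: str, context: str) -> dict:
    # Stand-in for the real spawn call; returns a structured result.
    return {"role": role, "task": task, "status": "ok", "findings": []}

async def research_pricing(context_block: str) -> list[dict]:
    return list(await asyncio.gather(
        spawn_agent_async("market-researcher",
                          "Comparable SaaS pricing research", context_block),
        spawn_agent_async("competitor-analyst",
                          "What do agent platforms charge?", context_block),
        spawn_agent_async("financial-modeler",
                          "Model 3 pricing scenarios", context_block),
    ))

# results = asyncio.run(research_pricing(souls_zip_context_block))
```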
Model Routing
| Task Type | Agent | Model |
|---|---|---|
| Strategic judgment, synthesis | Core Four orchestration | Opus |
| Specialist execution: research, review, writing | Most subagents | Sonnet |
| Mechanical tasks: formatting, simple extraction | Simple subagents | Haiku |
| Heartbeat triage | Background heartbeat | Haiku |
This is the primary cost control lever. Every non-synthesis task that can be Sonnet should be Sonnet. Every mechanical task should be Haiku.
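The table reduces to a lookup. A sketch with illustrative tier names rather than pinned model versions:

```python
# Sketch: route by task type, defaulting down to the cheaper tier.

MODEL_ROUTES = {
    "synthesis": "opus",     # Core Four strategic judgment
    "specialist": "sonnet",  # research, review, writing subagents
    "mechanical": "haiku",   # formatting, simple extraction
    "heartbeat": "haiku",    # background triage
}

def route_model(task_type: str) -> str:
    # Unknown work gets Sonnet and escalates only if quality fails.
    return MODEL_ROUTES.get(task_type, "sonnet")
```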
Subagents Cannot Spawn Subagents
The Core Four agent is always the spawning authority. Delegation flows from the orchestrator. Subagents do not spawn subagents. If a subagent discovers it needs another expert, it returns a structured result with that information and the orchestrator makes the spawning decision.
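What that structured hand-back might look like; the field names are illustrative:

```python
# Sketch: a subagent at the edge of its scope returns an escalation
# record. The orchestrator, never the subagent, decides whether to spawn.

NEEDS_EXPERT_RESULT = {
    "status": "needs_expert",
    "reason": "Schema migration risk is outside the security audit scope",
    "suggested_specialist": "database-architect",
    "partial": {"findings": ["..."]},
}
```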
Part 5: The Background Process Stack
What runs automatically, without toli, to keep the businesses alive and improving.
The Principle: Event-Driven for Exceptions, Scheduled for Synthesis
Background processes fall into two categories:
- Event-driven: Payment failed → trigger retry NOW. Uptime down → alert NOW. Don't wait for the next scheduled run.
- Scheduled: Morning brief, content publishing, SEO audit, financial reconciliation. These run on cadence regardless of events.
The schedule is a safety net. Events are the primary trigger for anything time-sensitive.
Always-On (Operator Agent)
Every 1 minute:
- HTTP status of all public endpoints
- Payment webhook receiver alive
- SSL certificate expiry (alert at 30 days)
- Email sending pipeline alive
Event-driven:
- Payment failed → trigger retry sequence immediately
- Dispute filed → alert toli immediately
- New signup → trigger onboarding sequence
- Cancellation → trigger win-back sequence
Every 30 Minutes: The Business Heartbeat (Operator Agent)
The heartbeat is the core pattern. A Haiku-tier agent checks the business vitals:
HEARTBEAT CHECKLIST:
Revenue: [new trials, failed payments, MRR delta since yesterday]
Product: [active sessions, error rate, key activation actions]
Pipeline: [new signups, conversions, churn]
Systems: [all other agents still running?, automation queue depth]
The heartbeat does not surface every check to toli. It writes to a log. Only exceptions escalate.
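A minimal heartbeat sketch; the vitals are stubbed and the escalation thresholds are illustrative:

```python
# Sketch: check vitals, log everything, surface only exceptions.

import json, time

def run_heartbeat(log_path: str = "heartbeat.log") -> list[str]:
    vitals = {  # stub data; real checks query billing, app, and agent APIs
        "revenue": {"failed_payments": 0, "mrr_delta": 12.0},
        "product": {"error_rate": 0.002},
        "systems": {"agents_alive": True, "queue_depth": 3},
    }
    with open(log_path, "a") as f:
        f.write(json.dumps({"ts": time.time(), **vitals}) + "\n")
    alerts = []  # only these escalate; everything else stays in the log
    if vitals["revenue"]["failed_payments"] > 0:
        alerts.append("payment_failure")
    if not vitals["systems"]["agents_alive"]:
        alerts.append("agent_down")
    return alerts
```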
Daily Schedule
| Time | Process | Agent | Surfaces |
|---|---|---|---|
| 11:30 PM | Knowledge capture and data enrichment | Operator | Nightly only |
| 3:00 AM | Compound self-review for active agents | Each agent | Soul patch staging |
| 5:00 AM | Morning brief data assembly | Operator | Input to Architect |
| 6:00 AM | Morning brief delivery | Architect | toli |
| 7:00 AM | Revenue pipeline: lead scoring, sequence advance, trial health | Revenue Operator | Morning brief (exceptions) |
| 8:00 AM | Content publishing and distribution | Operator | Morning brief (exceptions) |
| 9:00 AM | Customer/community health scoring and intervention triggers | Revenue Operator | Morning brief (high-risk flags) |
| 2:00 PM | Competitive intelligence sweep | Architect | Next morning brief |
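One way to express that cadence as declarative config a scheduler can read; the snake_case process names are illustrative:

```python
# Sketch: the daily schedule as data, mirroring the table above.

DAILY_SCHEDULE = [
    {"time": "23:30", "process": "knowledge_capture",       "agent": "operator"},
    {"time": "03:00", "process": "compound_self_review",    "agent": "each"},
    {"time": "05:00", "process": "brief_data_assembly",     "agent": "operator"},
    {"time": "06:00", "process": "morning_brief_delivery",  "agent": "architect"},
    {"time": "07:00", "process": "revenue_pipeline",        "agent": "revenue_operator"},
    {"time": "08:00", "process": "content_publishing",      "agent": "operator"},
    {"time": "09:00", "process": "customer_health_scoring", "agent": "revenue_operator"},
    {"time": "14:00", "process": "competitive_intel_sweep", "agent": "architect"},
]
```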
The Morning Brief (The Only Thing toli Must Read)
This is the contract: toli reads one message at 6am and makes at most 3 decisions. Everything else was handled.
Structure:
SECTION 1 — Financial Pulse (2 min)
• MRR current vs. 7d vs. 30d: $X (▲/▼ $Y from last week)
• Net new MRR yesterday: +$X from N new customers, -$Y from N churn
• Cash: $X (N weeks of runway at current burn)
SECTION 2 — Growth Signals (1 min)
• New trials yesterday: N (7d avg: N)
• Conversion rate: X% (7d avg: X%)
• Top acquisition source: [source]
SECTION 3 — At-Risk (1 min)
• High churn risk (need attention): [list with scores if any]
• Trials at day 14 without activation: [list if any]
• Support tickets >24h unresolved: N
SECTION 4 — Content & SEO (30 sec scan)
• Published today: [list]
• Rank drops >5 positions: [list if any]
SECTION 5 — Decisions Needed (the important part)
1. [Decision with recommendation and default if no response]
2. [Decision with recommendation and default if no response]
→ If no response by noon: [what happens by default]
SECTION 6 — What Shipped (30 sec)
[Bullet list of autonomous actions taken since last brief]
Note: The canonical morning brief format for toli is the compact 5-section version in 01-TOLI-ACTION-GUIDE.md. The detailed format above is reference only.
Weekly Schedule
| Day | Time | Process | Agent |
|---|---|---|---|
| Monday | 7:00 AM | OKR pulse and content planning | Architect |
| Wednesday | 9:00 AM | Financial reconciliation | Operator |
| Friday | 7:00 AM | SEO audit and content decay detection | Operator |
| Sunday | 8:00 AM | Weekly brief, planning handoff, Soul Engineer audit | Architect + Soul Engineer |
Monthly Schedule
| Date | Process | Agent |
|---|---|---|
| 1st | Strategic synthesis, MRR bridge, goal adjustment | Architect |
| 15th | Knowledge base audit, churn cohort analysis | Architect |
The Compound Interest Loop (What Gets Better Automatically)
Customer intelligence: Daily behavioral data accumulates. After 90 days, churn prediction accuracy improves because the model has signal. After 6 months, the system predicts which trial characteristics lead to high-LTV customers.
Content SEO: Weekly rank monitoring builds a corpus of what content types perform best for this audience. Content briefs improve because they're informed by what worked.
Knowledge capture: Customer conversations, support tickets, and feedback tagged daily. After 6 months, patterns emerge: the same friction point in 40 tickets, the same feature request from the same customer type. These surface as strategic recommendations.
Agent improvement: Each agent logs errors and unexpected outputs to .learnings/. Weekly, the Architect reviews and updates agent instructions. Agents that run for a year are dramatically more reliable than agents that started yesterday.
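A minimal sketch of the .learnings/ write path; the record schema is an assumption:

```python
# Sketch: append one error record per incident so the weekly Architect
# review has raw material to mine for instruction updates.

import json, pathlib
from datetime import datetime, timezone

def log_learning(agent: str, error: str, context: str) -> None:
    path = pathlib.Path(".learnings") / f"{agent}.jsonl"
    path.parent.mkdir(exist_ok=True)
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "error": error,
        "context": context,  # what the agent was doing when it failed
    }
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
```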
Part 6: The Practical Interaction Model
How toli actually works with the Core Four.
The One-Line Model
The ideal interaction is one sentence from toli that triggers a complete autonomous workflow:
| toli says | What happens |
|---|---|
| "Architect, I want to launch X by Friday" | Architect runs Ship Mode: spawns 4 parallel domain review agents, collects structured output, surfaces one decision, delegates complete build brief to Builder |
| "Builder, add feature X to souls.zip" | Builder decomposes into atomic tasks, spawns frontend + backend agents in parallel, reviews output, creates PR — toli approves |
| "Revenue, price the Pro plan" | Revenue spawns comp research + financial modeling agents, synthesizes into 3 options with recommendation — toli picks A, B, or C |
| "What's our move on the Giphy deal?" | Architect spawns deal analysis agent, financial modeling agent, legal review agent — synthesizes into a clear recommendation with risks |
The key design principle: toli never has to think about which agent owns what. The Architect is always the right first recipient for anything ambiguous. The Architect figures out routing.
The Routing Logic
Is this ambiguous or strategic? → Always start with Architect
Is this a clear build directive? → Can go directly to Builder
Is this explicitly about revenue? → Can go directly to Revenue Operator
Is this a background ops question? → Operator handles autonomously (should never surface)
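The routing logic as a first-pass classifier; the message-kind labels are illustrative:

```python
# Sketch: anything unclassified defaults to the Architect, which owns
# onward routing, so toli never has to think about agent ownership.

def route(message_kind: str) -> str:
    routes = {
        "build_directive": "builder",
        "revenue_question": "revenue_operator",
        "background_ops": "operator",  # should never surface to toli
    }
    return routes.get(message_kind, "architect")
```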
What toli Never Has to Do
With the background stack running:
- Check if content published on time
- Follow up on failed payments
- Monitor uptime
- Track MRR manually
- Write routine customer follow-up emails
- Check competitor pricing
- Advance email sequences
- Monitor SEO rank changes
- Process expense categorization
- Remind agents to do their compound reviews
What toli Does Keep
- Approve irreversible decisions (pricing changes, new business lines, deploys with significant risk)
- Set company-level OKRs each quarter
- Read the morning brief and respond to escalations
- Conduct monthly strategic review (30 minutes)
- Quarterly: set new OKRs, read the annual synthesis
Total time: 5 minutes/day for the brief + 2-3 strategic decisions/week + 30 min/month strategic review.
Part 7: The "Extra Agent" Pattern
For businesses that need a public face that doesn't reflect the founder directly.
Barry is the template. Barry is the 5th agent for the Bearish community — the Brand Agent on Telegram and Twitter who is architecturally isolated from the internal org. He proactively identifies IP opportunities, brand partnerships, and business deals, routing them to Jerry. He knows nothing about toli's finances, other businesses, or internal agent structure.
When to add a 5th agent:
- You have a public-facing persona that needs its own voice and can't be internal-org-facing
- The persona has its own community and its own identity that's distinct from the founder
- You need a firewall between internal context and public communications
Design rules for a 5th agent:
- Container isolation (Docker container) or at minimum strict context isolation
- allowAgents: [] — zero spawn authority
- isolation_level: "public" — most restrictive designation
- The Operator or Revenue Operator (not the Architect) is the bridge between the 5th agent and the internal org
- Content approval system (Tier A/B/C) governs what the 5th agent can publish autonomously vs what needs review
Part 8: The Meta-Model — Replication Across Any Business
The architecture replicates to any business by changing the context, not the structure.
Universal architecture:
One Core Four → shared across all businesses
↓
Business-specific context blocks → injected at subagent spawn time
↓
Pre-defined specialist library → selected by Core Four based on task
↓
Business-specific 5th agent → if public persona required (optional)
What changes per business:
- Business context blocks (target customer, revenue model, voice, constraints)
- Specialist selection (a crypto NFT community needs different specialists than a B2B SaaS)
- 5th agent persona (Barry for Bearish, a different persona for a different community)
- OKRs (different KRs per business, same cascade structure)
- Heartbeat checklist (community health metrics vs SaaS metrics vs content metrics)
What never changes:
- The Core Four structure
- The soul design principles (10 findings above)
- The single-team architecture
- The spawn package format
- The morning brief format
- The compound interest loop pattern
How to Bootstrap a New Business
When toli starts a new venture:
- Write the business context block (5 minutes — target customer, stage, voice, constraints)
- Write 3 company-level OKRs for the first quarter
- Configure the heartbeat checklist for this business type (SaaS vs community vs content)
- Add the business to the Architect's portfolio knowledge — no new agents required
- If needed: design the 5th public-facing agent
The Core Four can be managing 5 businesses simultaneously with 4 agents. Business-specific execution is handled by subagents spawned with business-specific context.
What Changes from v1 (17 Agents) to v2 (Core Four)
| Dimension | v1 (Current) | v2 (Core Four) |
|---|---|---|
| Agent count | 17 named agents | 4 Core + specialist library |
| Persistent workspaces | 17 workspaces | 4 workspaces |
| Background automation | All disabled | Running continuously |
| Subagents | Used occasionally | Primary execution model |
| toli's daily load | Initiates everything | Reads brief, makes 2-3 decisions |
| Self-improvement | Compound loops off | Nightly, automatic |
| Morning brief | Doesn't exist | Daily at 6am |
| Business coverage | Implicit in agent count | Explicit via context injection |
| Cost | $30-42/day (when running) | $8-15/day (all background processes running) |
The Immediate Rebuild Sequence
For Soul Engineer to implement:
Day 1: Read this document. Internalize it. It is the operating thesis.
Day 2-3 (Soul rewrites first): Begin Core Four soul rewrites. P0 security items are deliberately deferred to Month 2 per toli's override.
Day 4-7 (Core Four souls):
- Rewrite Lacie → The Architect (using the 10 soul principles above)
- Rewrite Gary → The Builder
- Rewrite Cherry → The Revenue Operator
- Rewrite Jerry → The Operator
- Archive the other 11 agent workspaces (they become specialists in the library)
Day 8-14 (Automation layer):
- Enable all four Core Four heartbeats
- Enable compound loops for all four
- Enable morning brief cron (6am daily)
- Wire Cherry → agentmail for outreach execution
- Restore Barry as 5th agent (Bearish public persona)
Day 15-21 (Specialist library):
- Convert the best specialist souls from the 11 archived agents into the subagent library
- First full week with morning brief running
- Measure first OKR baselines
Companion Documents
- MISSION-CHARTER.md — organizational governance
- OKRs-Q1-2026.md — quarterly objectives cascade
- agent-cards/ — Core Four specifications
- COMPOUND-LOOP-GUIDE.md — self-improvement implementation
- SECURITY-HARDENING.md — P0/P1/P2/P3 security checklist
- SOUL-ENGINEER-BRIEFING.md — SE implementation mandate
- BARRY-AUTOMATON-DEPLOYMENT.md — complete technical guide for deploying Barry on Conway Research Automaton
References
Soul Design Research:
- ExpertPrompting (2023)
- Multi-expert Prompting EMNLP 2024 (+8.69%)
- Lost in the Middle TACL 2024
- Anthropic Claude's Soul Document (2024)
- Agentic Context Engineering ACE (+10.6%)
- LLM Generated Persona NeurIPS 2025
Core Four Research:
- Founder Backgrounds and YC Data (arxiv 2025)
- Notion's Lost Years — Ivan Zhao
- Stripe Founding — Contrary Research
- Make Operations Your Secret Weapon — First Round
- Hellmann & Wasserman — HBS
Dynamic Subagent Architecture:
- Claude Code Subagent Documentation
- Anthropic Multi-Agent Research System
- Multi-Agent Design: Prompt/Topology Optimization (arxiv 2502.02533)
- Manus Context Engineering Lessons
- Google ADK Multi-Agent Patterns
- ClaudeFast Sub-Agent Best Practices
Background Processes: