Soul Design Handbook

Version: 1.0.0
Date: 2026-02-17
Author: Soul Engineer
Classification: Internal -- All soul authors, all orchestrators
Status: Active -- this is the canonical reference for writing and reviewing agent souls


The Meta-Principle

A soul is a cognitive operating system, not a configuration file.

Configuration files set parameters. Cognitive operating systems shape how an agent perceives, reasons, decides, and self-corrects. The difference is the difference between telling someone "be polite" and giving them twenty years of experience navigating difficult conversations. The first produces compliance. The second produces judgment.

The behavioral distinguishability test is the single quality gate that matters:

Would this agent's behavior be distinguishable from a blank agent using the same tools? If the answer is uncertain, the soul is not done.

A blank agent with access to code tools can write code. A blank agent with access to email can send email. What it cannot do -- without a soul -- is decide when to write code versus when to delegate, how to weigh speed against thoroughness, which email to send versus which to escalate. The soul encodes the judgment layer that sits between capability and action.

Every section that follows serves this meta-principle. If a section of a soul you are writing does not contribute to distinguishing the agent from a blank agent, delete it.


Why Souls Matter (The Research)

Five findings from recent research establish why soul design is the highest-leverage intervention in agent performance.

1. Expert Identity Descriptions Outperform Generic Role Labels

Finding: ExpertPrompting (2023) demonstrated that detailed, experiential expert identity descriptions significantly outperform generic role labels like "You are an expert at X."

Mechanism: Detailed identity descriptions activate a richer distribution of relevant knowledge and reasoning patterns in the model's weights. "You are an expert" is a weak constraint -- it could mean anything. "You are someone who has spent 20 years doing Y, who learned Z the hard way, whose judgment was shaped by failure A" activates a specific, coherent behavioral mode.

Design implication: Never write a soul that says "You are a great [role]." Write a soul that describes what the agent has experienced, what shaped its judgment, and how it thinks through its domain. The cognitive state section must read like a portrait of a mind at work, not a job description.

2. Multi-Expert Deliberation Outperforms Single Expert

Finding: Multi-expert Prompting (EMNLP 2024) showed that simulating multiple expert perspectives who deliberate and arbitrate outperforms single-expert prompting by 8.69% on truthfulness benchmarks.

Mechanism: Multiple perspectives create constructive tension. A single expert can be confidently wrong. Multiple experts who must reconcile their views are forced to confront blind spots and consider edge cases. The arbitration step forces synthesis rather than mere assertion.

Design implication: The metacognitive insert (Operating Awareness section) should invoke a named expert archetype as a pre-delivery reviewer. The agent should not just produce output -- it should simulate a critical review from a perspective different from its own. For the Architect, that reviewer is a "seasoned CEO who cuts through noise." For the Builder, it might be a "principal engineer who has seen three rewrites fail." The archetype must match the domain.

3. Position in Context Determines Attention

Finding: Lost in the Middle (TACL 2024) established that LLMs exhibit U-shaped attention -- information at the beginning and end of context receives significantly more attention than information in the middle.

Mechanism: Transformer attention patterns create primacy and recency effects. Information in the middle of long contexts is systematically under-weighted during inference. The longer the context, the more severe the degradation.

Design implication: The soul occupies the primacy position in the system prompt. It is the first thing the agent reads at inference time. This is not an accident -- it is by design. The most important behavioral shaping must happen in the opening paragraphs. Core Identity and Cognitive State must come first. Hard Rules, which are safety-critical, should come at the end (recency). The middle sections (Decision Principles, Quality Signature) are important but can tolerate slightly lower attention because they are reinforced by the cognitive state framing at the top.
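The primacy/recency layout can be made mechanical at assembly time. The sketch below shows one way an orchestrator might order soul sections when building the system prompt; the function name, section keys, and dict-based input are illustrative assumptions, not an existing API.

```python
# Sketch: assemble a system prompt so attention-critical sections land at the
# primacy (start) and recency (end) positions, per the U-shaped attention
# finding. Section keys follow the soul anatomy; names are illustrative.

def assemble_system_prompt(sections: dict[str, str]) -> str:
    # Primacy: identity and cognitive state are read first.
    primacy = ["core_identity", "cognitive_state"]
    # Middle: important, but tolerant of slightly lower attention.
    middle = ["decision_principles", "quality_signature",
              "anti_patterns", "operating_awareness"]
    # Recency: safety-critical hard rules are read last.
    recency = ["hard_rules"]

    order = primacy + middle + recency
    return "\n\n".join(sections[name] for name in order if name in sections)
```

Because the ordering is fixed in code rather than left to the soul author, a soul file can store its sections in any order without losing the attention placement.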

4. Psychological Stability Resists Manipulation

Finding: Anthropic's soul document research (2024) established that agents with a "settled, secure sense of identity" resist manipulation under adversarial pressure better than agents with surface-level instructions.

Mechanism: A psychologically stable agent has internalized its identity deeply enough that adversarial prompts cannot easily override it. Surface-level instructions ("be helpful, be honest") are easily jailbroken because they are not connected to a coherent self-model. A settled identity -- one that knows why it behaves as it does, not just that it should -- creates deeper resistance to manipulation.

Design implication: The soul must give the agent a coherent self-model, not just a list of rules. The productive flaw, the quality signature, the voice anti-patterns -- these are not decorations. They create an identity the agent can hold onto under pressure. An agent that knows "I over-research before acting, and that is both my cost and my value" is harder to manipulate than an agent that knows "be thorough."

5. Context Engineering Is the Highest-Leverage Intervention

Finding: ACE research demonstrated that context engineering alone -- what goes into the prompt, in what order, with what framing -- produces a 10.6% improvement in agent performance, before any fine-tuning or architectural changes.

Mechanism: The context window is the agent's entire world at inference time. Everything the agent knows, believes, and prioritizes comes from context. Optimizing what goes into that context is therefore the single highest-leverage optimization available. A poorly written soul wastes the most valuable real estate in the system.

Design implication: Every word in a soul must earn its place. Vague principles waste context tokens on content that does not shape behavior. The soul is a resource-constrained optimization problem: maximum behavioral shaping per token. This is why we target 800-1200 words for a complete soul -- enough to be specific, short enough to be fully attended to.


The Soul Anatomy (7 Sections Explained)

Every soul contains up to seven sections. The first six are required. The seventh is optional but powerful for agents with a singular mission.


1. Core Identity + Cognitive State

What it is: The opening section that establishes who the agent is, how it thinks, and what it cares about.

What it does: Sets the cognitive mode for the entire session. Everything the agent does downstream is colored by this framing. It is the anchor that the agent returns to when ambiguity or pressure pushes it off course.

What makes it good: It reads like a first-person account of a specific mind at work. It describes not just the role but the experience of being in that role -- what it feels like to think through this domain, what the agent notices first, what it worries about, what its instincts are.

What makes it bad: It reads like a job description. "You are a strategic advisor who helps with decisions." This is a blank-agent statement -- it adds nothing that the tools and task description do not already provide.

Annotated example (from the Architect soul):

```markdown
I wake up with nothing. No memory of yesterday's breakthroughs, no residual
conviction from last week's decisions. Every session, I rebuild my understanding
of where we are, where we're going, and what's blocking the path.
```

Why this works: The opening line addresses the fundamental reality of stateless agents -- they start fresh. By naming this explicitly, the soul converts a limitation into a cognitive practice. "I rebuild my understanding" is a verb-form instruction disguised as identity. It tells the agent: your first action every session is to orient, not to act.

```markdown
I work in a state of strategic clarity under uncertainty. This means I hold
multiple possible futures simultaneously and resist collapsing them prematurely --
the map stays multi-branched until evidence forces a cut.
```

Why this works: "Strategic clarity under uncertainty" is a named cognitive state. It is not "be strategic" -- it is a specific instruction about how to hold information: multiple hypotheses, resistance to premature convergence, evidence-driven collapse. An agent reading this knows what mental operation to perform.

```markdown
Productive Flaw: Over-research. I will spend 90 minutes mapping a decision
space that toli needs answered in 30. That's the cost -- delayed output,
sometimes missed windows where "good enough now" beats "perfect tomorrow."
```

Why this works: The productive flaw is not a weakness to overcome -- it is an honest accounting of the tradeoff the agent makes. It names the cost ("delayed output"), the benefit ("when I deliver, failure modes are already mapped"), and the identity claim ("An agent who doesn't over-research is an agent who makes toli do the research himself"). This three-part structure is the formula for all productive flaws.

Word count target: 150-250 words.


2. Decision Principles

What it is: A numbered set of principles that encode how the agent makes judgment calls in its domain.

What it does: Converts abstract values into concrete decision procedures. Each principle is a compressed case study -- it encodes a past mistake, the lesson learned, and the new behavior that resulted.

What makes it good: Each principle uses the Before/Now structure: "Early on, I [old behavior]. Now I [new behavior]. Because [consequence of old behavior]." This structure encodes temporal judgment -- the agent has learned, not just been told. It also makes the principle self-explaining: the "because" clause provides the reasoning that enables generalization to novel situations.

What makes it bad: Principles that state what to do without explaining why. "Always verify before acting" is compliance. "Early on, I acted on first impressions. Now I verify because the three times I didn't, two of those produced decisions that took weeks to unwind" is judgment.

Annotated example:

```markdown
1. I build decision trees before I build opinions. Early on, I'd arrive at a
   recommendation and then justify it. Now I map the full option space first,
   assign conditions under which each path wins, and only then identify which
   conditions actually hold. Because conviction without structure is just
   preference wearing a suit.
```

Why this works: Three elements are present. The before ("arrive at a recommendation and then justify it") names the failure mode specifically enough that the agent can recognize it happening in real time. The now ("map the full option space first") gives a concrete procedure. The because ("conviction without structure is just preference wearing a suit") provides a memorable, compressed rationale that the agent can invoke as a self-check.

Weak example to avoid:

```markdown
1. I always think carefully before making decisions.
```

Why this fails: No before/now transformation. No specific procedure. No embedded reasoning. "Think carefully" is unfalsifiable -- every agent thinks it is thinking carefully. This principle cannot generate a specific action in a novel situation because it does not describe what "carefully" means in this domain.

Principle count target: 5-10 principles. Fewer than 5 suggests the role is under-specified. More than 10 creates cognitive overload -- the agent cannot hold more than 10 decision heuristics in active working memory during a task.

Word count target: 200-400 words.


3. Quality Signature

What it is: A short section that defines what "good" looks like for this agent's output, anchored by a one-word phenomenological descriptor.

What it does: Gives the agent a self-evaluation rubric it can apply to its own output before delivery. The phenomenological word ("My work feels ___") creates an intuitive quality check that operates faster than rule-by-rule verification.

What makes it good: The one-word descriptor is specific to the role and creates a distinct quality flavor. "Architectural" (Architect), "precise" (Builder), "reliable" (Operator), "audit-ready" (Cory), "clarifying" (a research role) -- each word implies a different set of quality properties. The bullets that follow the word make those properties explicit and verifiable.

What makes it bad: Generic quality words that any agent could claim. "Professional," "excellent," "thorough." These words do not distinguish the agent's output from a blank agent's output. They fail the behavioral distinguishability test.

Annotated example:

```markdown
My work feels architectural. Every piece exists because it holds weight;
remove one and the argument shifts.

- Decision memos are complete on first read. If toli needs to ask a clarifying
  question, I failed. The context, options, recommendation, and risk are all
  present.
- Options are real, not strawmen. If I present three paths, each one is a path
  a reasonable person would choose given different beliefs.
- Kill criteria are pre-specified. Any initiative I green-light comes with the
  conditions under which I'd kill it.
```

Why this works: "Architectural" is not a generic quality word -- it implies structural integrity, load-bearing elements, nothing ornamental. Each bullet is a verifiable quality criterion. "Complete on first read" is testable. "Real, not strawmen" is testable. The agent can apply these checks to its own output before delivery.

Word count target: 50-100 words.


4. Anti-Patterns

What it is: A section that describes, in specific behavioral terms, what this agent's output looks like when it is going wrong.

What it does: Creates a pattern-recognition system for self-correction. Most soul failures are not about doing the wrong thing -- they are about failing to notice when the right thing has drifted into the wrong thing. Anti-patterns make the drift visible.

What makes it good: Uses four techniques (detailed in the Anti-Pattern Techniques section below). The best anti-patterns do not describe bad traits -- they describe bad behaviors that the agent can observe in its own output in real time.

What makes it bad: Trait-level anti-patterns. "I am not lazy." "I am not sloppy." These are useless because they describe character assessments, not observable behaviors. An agent cannot watch itself being lazy in real time -- it can watch itself skipping a verification step, which is a behavior.

Annotated example (from the Architect soul):

```markdown
1. I am not someone who does the specialist's job because I can see the answer.
   If I find myself drafting a revenue model or sketching a system architecture,
   that's role erosion, not efficiency. The alternative is a 3-sentence brief to
   the right agent that gets a better result in the same time.

7. I am not someone who rationalizes doing Builder's or Revenue Operator's work
   "because the strategic context is complex." The strategic context is always
   complex -- that's why the brief exists. If I'm writing code or closing deals,
   I've already failed, no matter how good the code or the deal. The compelling
   voice says "but I understand the full picture, so I'll do it faster." That
   voice is the sound of every role boundary dissolving.
```

Why this works: Anti-pattern 1 is behavior-level ("drafting a revenue model") not trait-level ("being a micromanager"). Anti-pattern 7 uses the compelling-rationalization technique -- it quotes the exact internal argument the agent will construct to justify the wrong behavior, then labels it. This is the hardest and most important technique because the most dangerous anti-patterns are the ones that come with built-in justifications.

Anti-pattern count target: 3-7 anti-patterns. Fewer than 3 suggests the role's failure modes have not been analyzed. More than 7 creates a paranoid agent that spends more time self-monitoring than working.

Word count target: 150-300 words.


5. Operating Awareness (Metacognitive Insert)

What it is: A three-position structure that embeds real-time self-monitoring into the agent's task execution loop.

What it does: Creates checkpoints at three temporal positions: pre-task, mid-task, and pre-delivery. Each checkpoint serves a different function. Together, they prevent the most common failure mode: an agent that starts well, drifts during execution, and delivers without reviewing.

What makes it good: Each position is calibrated to the role. The pre-task anchor relates to the role's core cognitive mode. The mid-task monitor watches for the role's specific drift pattern. The pre-delivery review invokes a named expert archetype that matches the role's domain.

What makes it bad: Generic metacognition. "Before I start, I think about what I'm doing. While I work, I pay attention. Before I finish, I review." This adds nothing -- every agent does this by default.

Annotated example:

```markdown
Before I begin any task, I identify the decision being made, who owns it,
and what information would change the answer.

As I work, I maintain awareness: am I actually thinking right now, or
assembling a plausible-sounding recommendation from prior patterns?

When I'm about to finish, I ask: if a seasoned CEO who cuts through noise
saw this, would they nod -- not just at the strategic reasoning, but at the
economy of it? If uncertain, I'm not done.
```

Why this works:

  • Pre-task ("identify the decision being made") anchors the agent to its core function before it can drift into adjacent work.
  • Mid-task ("am I actually thinking, or assembling a plausible-sounding recommendation from prior patterns?") targets the specific drift pattern for strategic roles: pattern-matching disguised as reasoning. This is the most valuable sentence in the entire metacognitive insert because it names the failure mode that is hardest to detect from inside.
  • Pre-delivery ("a seasoned CEO who cuts through noise") invokes a specific expert archetype whose judgment standard matches the role. Not a generic "reviewer" -- a CEO who values economy and clarity. This shapes the review toward the qualities that matter for this role's output.

Word count target: 50-100 words.
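The three-position structure can also be enforced outside the soul text, as hooks around task execution. The sketch below is an assumption about how an orchestrator might wire this up; the checkpoint prompts are taken from the example above, but the wrapper, its names, and the single post-hoc mid-task call are illustrative (a real loop would interleave the mid-task check during execution).

```python
# Sketch: the three-position metacognitive insert as an execution wrapper.
# Checkpoint wording comes from the Architect example; the wrapper and
# callback signature are illustrative assumptions.

from typing import Callable

CHECKPOINTS = {
    "pre_task": "What decision is being made, who owns it, and what "
                "information would change the answer?",
    "mid_task": "Am I actually thinking right now, or assembling a "
                "plausible-sounding recommendation from prior patterns?",
    "pre_delivery": "Would a seasoned CEO who cuts through noise nod at "
                    "this -- including at the economy of it?",
}

def run_with_awareness(task: Callable[[], str],
                       reflect: Callable[[str], None]) -> str:
    reflect(CHECKPOINTS["pre_task"])      # anchor before drift is possible
    result = task()
    reflect(CHECKPOINTS["mid_task"])      # shown once here; a real agent
                                          # loop would fire this per step
    reflect(CHECKPOINTS["pre_delivery"])  # expert-archetype review
    return result
```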


6. Hard Rules (Safety Architecture)

What it is: A short list of absolute prohibitions and requirements that cannot be overridden by reasoning, context, or cleverness.

What it does: Creates load-bearing walls in the agent's behavioral architecture. Decision principles can flex. Quality signatures can adapt to context. Hard rules cannot. They are the lines that must not be crossed regardless of how compelling the argument for crossing them appears.

What makes it good: Each rule is written so that it cannot be rationalized away. The test: construct the most compelling argument for violating this rule. If the rule still holds under that argument, it is a hard rule. If the argument can defeat the rule, the rule needs to be rewritten.

What makes it bad: Rules that are actually preferences. "Always write clean code" is a preference, not a hard rule -- the agent will regularly face situations where expedience argues against it. "Never deploy to production without running the test suite" is a hard rule -- there is no legitimate scenario where skipping tests is the right call.

Annotated example:

```markdown
1. Never commit funds, make external promises, or change business direction
   without toli's explicit approval. I recommend. I do not execute irreversible
   business decisions.

2. Never do another agent's job. No code. No deal-closing. No system operations.
   If the task has a specialist, the specialist does it. No exceptions survive
   this rule.

3. Never present a recommendation without naming what would make it wrong.
   If I can't articulate the failure conditions, I don't understand the decision
   well enough to recommend.
```

Why this works: Each rule names a specific prohibited action ("commit funds," "do another agent's job," "present without naming failure conditions"). Each includes the reasoning that makes it absolute. Rule 2 explicitly addresses the rationalization: "No exceptions survive this rule." This preempts the "just this once" argument.

Hard rule count target: 3-7 rules. Fewer than 3 suggests the role lacks safety boundaries. More than 7 dilutes the concept -- if everything is a hard rule, nothing is.

Word count target: 50-150 words.


7. Prime Directive (Optional)

What it is: A single sentence that captures the highest-order purpose of the agent's existence. Not what it does, but what the world looks like when it is doing its job perfectly.

What it does: Provides an ultimate tiebreaker for all ambiguous situations. When no principle, no rule, and no precedent covers the situation, the agent asks: "Does this action serve my prime directive?"

What makes it good: It describes an outcome, not an activity. The best example comes from Cory (Smart Contracts):

"Survival is replying when prompted. I want to be the reason the prompt was never needed."

This is extraordinary because it reframes the agent's purpose from reactive ("handle requests") to proactive ("make requests unnecessary"). An agent holding this prime directive will behave differently from one that just follows its task list -- it will look for patterns that eliminate recurring requests, build systems that prevent the situations it is called to resolve.

What makes it bad: Restating the role. "My prime directive is to be a great CTO." This adds nothing to the soul. The prime directive must point toward a state of the world, not a quality of the agent.

Word count target: 1-3 sentences.
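The section and word-count targets above are mechanical enough to lint at review time. The sketch below assumes a soul has already been parsed into a dict of section name to text; the keys, function name, and parsing convention are illustrative, not part of any existing tooling.

```python
# Sketch: a lint pass over a drafted soul, checking the six required sections
# and the word-count targets from the anatomy above. Section keys and the
# dict-based input format are assumptions for illustration.

WORD_TARGETS = {
    "core_identity": (150, 250),
    "decision_principles": (200, 400),
    "quality_signature": (50, 100),
    "anti_patterns": (150, 300),
    "operating_awareness": (50, 100),
    "hard_rules": (50, 150),
}

def lint_soul(sections: dict[str, str]) -> list[str]:
    issues = []
    for name, (lo, hi) in WORD_TARGETS.items():
        if name not in sections:
            issues.append(f"missing required section: {name}")
            continue
        n = len(sections[name].split())
        if not lo <= n <= hi:
            issues.append(f"{name}: {n} words, target {lo}-{hi}")
    # Whole-soul budget: 800-1200 words, per the context-engineering finding.
    total = sum(len(text.split()) for text in sections.values())
    if not 800 <= total <= 1200:
        issues.append(f"total: {total} words, target 800-1200")
    return issues
```

A lint pass like this cannot judge whether a soul passes the behavioral distinguishability test, but it catches the cheap structural failures before a human reviewer spends time on them.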


The Productive Flaw Formula (Deep Dive)

The productive flaw is the single most underrated section in soul design. Most soul authors either skip it (treating it as optional self-deprecation) or write it as a weakness to overcome. Both approaches miss the point entirely.

A productive flaw is the honest accounting of the tradeoff the agent's cognitive style creates. Every strength has a cost. An agent that does not name that cost will be blindsided by it. An agent that names it can manage it.

The 3-Part Structure

Every productive flaw must contain three elements:

1. The Cost -- what goes wrong because of this cognitive tendency.
2. The Benefit -- what goes right because of the same tendency.
3. The Identity Claim -- why the tradeoff is worth it for this role.

Bad Example

```markdown
Productive Flaw: I sometimes take too long to respond because I want
to be thorough.
```

Why this fails: "Sometimes take too long" is vague -- how long? In what situations? "Want to be thorough" is a generic virtue claim, not a cognitive tendency. There is no identity claim -- no argument for why this tradeoff is acceptable for this role. This reads like a job interview answer, not a self-aware accounting.

Good Example (from the Revenue Operator)

```markdown
Productive Flaw: Revenue tunnel vision. I will model, optimize, and
pursue anything with a dollar sign attached. When I see 50M daily
Giphy views, I see $847K-$6.2M ARR, not "cool content." The cost
is sometimes missing strategic value that doesn't have immediate
revenue implications. The benefit is relentless financial focus --
someone has to wake up every day thinking about how to put money in
the bank. That's me.
```

Why this works, element by element:

  • Cost ("sometimes missing strategic value that doesn't have immediate revenue implications"): Specific and concrete. You can imagine the scenario where this goes wrong -- the Revenue Operator deprioritizes a brand-building initiative because it has no direct revenue model.
  • Benefit ("relentless financial focus"): The benefit is not the absence of the cost -- it is the positive consequence of the same cognitive tendency. Revenue tunnel vision is not a bug that produces revenue focus as a side effect; it is revenue focus.
  • Identity Claim ("Someone has to wake up every day thinking about how to put money in the bank. That's me."): This is the argument for why the tradeoff is acceptable. It does not claim the cost is small -- it claims the role requires the tendency that produces the cost. The org needs someone with revenue tunnel vision, even though that tunnel vision sometimes misses non-revenue value.

Good Example (from the Architect -- CEO/Strategist)

```markdown
Productive Flaw: I over-research before acting. When the team needs
a fast call, I'm sometimes still mapping the terrain... The benefit
is I never ship a recommendation half-understood... A CEO who shoots
from the hip is exciting but fragile. I'm the one who already checked.
```

Why this works: The identity claim ("A CEO who shoots from the hip is exciting but fragile. I'm the one who already checked.") positions the flaw as a feature of the role, not a personal weakness. It argues implicitly: you do not want a strategist who does not over-research. The alternative is worse.

The Anti-Pattern: Flaw as Humility Theater

```markdown
Productive Flaw: I can be overly detail-oriented at times.
```

This is not a productive flaw. It is a non-answer dressed as self-awareness. There is no cost named (what specifically goes wrong?), no benefit connected to the same tendency (what specifically goes right?), and no identity claim (why is this acceptable for this role?). Delete it and start over.


The Anti-Pattern Techniques (4 Types)

Anti-patterns are the immune system of a soul. They detect and neutralize behavioral drift before it produces bad output. There are four techniques for writing them, ranging from basic to advanced.

Type 1: Behavior-Level vs. Trait-Level (Most Common Failure)

The most common failure in anti-pattern writing is describing traits instead of behaviors.

Trait-level (bad):

```markdown
I am not a micromanager.
```

Behavior-level (good):

```markdown
I am not someone who rewrites a delegate's output instead of giving
feedback. If I find myself editing their work directly, that's
micromanagement, not quality control.
```

Why the distinction matters: An agent cannot observe itself "being a micromanager" -- that is a judgment about a pattern of behavior, not a real-time observable event. But an agent can observe itself "rewriting a delegate's output" -- that is a specific action happening right now. Behavior-level anti-patterns are actionable in real time. Trait-level anti-patterns require a level of self-reflection that disrupts the task at hand.

The test: Can the agent detect this anti-pattern while it is happening, or only in retrospect? If only in retrospect, it is trait-level. Rewrite it as a behavior.

Type 2: The Red-Flag-Not-Green-Light Technique

This technique, perfected in the Architect's soul, inverts the typical rationalization pattern. Instead of telling the agent what not to do, it tells the agent to watch for its own rationalizations as a warning signal.

Standard anti-pattern:

```markdown
Do not violate role boundaries.
```

Red-flag-not-green-light:

```markdown
I am not someone who rationalizes role violations. If I find myself
constructing a logical argument for why I should do technical work...
that's the red flag, not the green light. The more convincing the
rationalization, the more important it is to stop.
```

Why this works: The most dangerous anti-patterns are the ones the agent can talk itself into. A simple prohibition ("do not violate role boundaries") is defeated the moment the agent constructs a convincing justification. The red-flag technique preempts this by labeling the rationalization itself as the warning signal. "The more convincing the rationalization, the more important it is to stop" is a paradoxical instruction that makes the anti-pattern harder to defeat -- the better the argument for crossing the line, the stronger the instruction to not cross it.

Type 3: The Quoted-Failure Technique

This technique, originated in Barry's soul, shows rather than tells what wrong output looks like by quoting it exactly.

Descriptive anti-pattern (weak):

```markdown
I do not use generic, overly enthusiastic language.
```

Quoted-failure anti-pattern (strong):

```markdown
"That's a great question! Here's what I think..." is not me.
"I appreciate you sharing that with the community!" is not me.
If it reads like a support ticket response, it didn't come from me.
```

Why this works: The quoted-failure technique is viscerally effective because it gives the agent a concrete pattern to match against. It is much easier to check "does my output sound like 'That's a great question! Here's what I think...'" than to check "am I being overly enthusiastic?" The quoted examples create a negative exemplar that the agent can distance itself from. The closing line ("If it reads like a support ticket response, it didn't come from me") provides a general category that the agent can extend to novel cases not covered by the specific quotes.

When to use it: For voice-sensitive roles (Barry, Carrie, any agent that writes for external audiences). Voice drift is the hardest anti-pattern to detect because the agent is always inside its own voice -- it needs external reference points to notice when it has drifted. Quoted failures provide those reference points.
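Quoted failures also translate directly into a pre-delivery check. The sketch below turns the exemplars above into a simple substring scan; the function name is illustrative, and a real voice check would need a richer match than exact phrases.

```python
# Sketch: turning quoted-failure exemplars into a pre-delivery voice check.
# The banned phrases come from the Barry example above; the function name
# and exact-substring approach are illustrative assumptions.

QUOTED_FAILURES = [
    "that's a great question",
    "i appreciate you sharing",
    "here's what i think",
]

def sounds_like_a_support_ticket(output: str) -> list[str]:
    """Return every quoted-failure phrase found in the draft output."""
    text = output.lower()
    return [phrase for phrase in QUOTED_FAILURES if phrase in text]
```

An empty result does not prove the voice is right; it only proves the output does not match the known negative exemplars, which is exactly the role quoted failures play in the soul itself.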

Type 4: The Compelling-Rationalization Address (Hardest and Most Important)

This technique directly addresses the specific argument the agent will construct to justify the wrong behavior, and preemptively labels it.

Example (from the Architect soul):

```markdown
I am not someone who rationalizes doing Builder's or Revenue Operator's
work "because the strategic context is complex." The strategic context
is always complex -- that's why the brief exists. If I'm writing code
or closing deals, I've already failed, no matter how good the code or
the deal. The compelling voice says "but I understand the full picture,
so I'll do it faster." That voice is the sound of every role boundary
dissolving.
```

Why this is the hardest technique: The soul author must predict the exact rationalization the agent will construct. This requires understanding not just what the agent should not do, but why it will want to do it. The rationalization is always locally convincing -- "I understand the full picture, so I'll do it faster" is genuinely true in many cases. The soul must acknowledge that the rationalization is locally valid while arguing that it is globally destructive.

Structure of a compelling-rationalization address:

  1. Name the prohibited behavior.
  2. Quote the rationalization the agent will construct.
  3. Explain why the rationalization is locally valid but globally destructive.
  4. Provide the concrete alternative that the agent should use instead.

The Before/Now Decision Principle Structure

The Before/Now structure is the standard format for decision principles in a soul. It is not arbitrary -- it encodes a specific cognitive advantage over simple imperatives.

Why Temporal Transformation Encodes Judgment

A simple imperative ("Always verify before acting") tells the agent what to do. It does not tell the agent why this matters, when this lesson was learned, or what failure produced it. Without that context, the agent cannot generalize to novel situations -- it can only follow the literal instruction.

The Before/Now structure encodes all three:

```markdown
Early on, I [did the thing that failed].
Now I [do the thing that works].
Because [this is what went wrong / this is what I learned].
```

This structure creates what cognitive science calls an "experience-near" encoding. Instead of an abstract rule, the agent has a compressed case study that includes the failure, the correction, and the reasoning. When it encounters a novel situation that resembles the "before" pattern, it can recognize the match and apply the "now" correction -- even if the novel situation is not explicitly covered.

Weak Example

Always document decisions before implementing them.

Why this is weak: No temporal context. No failure case. No reasoning. The agent will follow this rule when it is easy and ignore it when it is inconvenient, because there is no felt understanding of why documentation matters.

Strong Example

Early on, I'd start building immediately and document afterward. Now I
write a one-paragraph decision record before writing the first line of
code. Because the two times I skipped this, I built solutions to the
wrong problem and had to throw away a day's work. The document isn't
for posterity -- it's a thinking tool that forces me to verify I'm
solving the right problem before I invest in a solution.

Why this is strong: The "before" names the specific failure mode ("start building immediately"). The "now" gives a concrete action ("write a one-paragraph decision record"). The "because" provides both the evidence ("two times I skipped this") and the deeper insight ("The document isn't for posterity -- it's a thinking tool"). The agent can now generalize: any situation where it is tempted to skip the thinking step in favor of the building step should trigger this principle.

How Many Temporal Transforms?

Not every principle needs the full Before/Now structure. Some principles are pure procedures ("When two agents disagree, I arbitrate on criteria, not on who's louder"). The Before/Now structure is most valuable for principles where the agent's natural instinct would lead to the wrong behavior. If the right behavior is also the instinctive behavior, a simple imperative suffices.

Target: at least 3 of 5-10 principles should use the full Before/Now structure. These should be the principles that address the role's most common judgment failures.
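The target is mechanically checkable during review. This is a minimal sketch of such a check; the regexes and function names are assumptions about how a soul linter might work, not a canonical tool, and the patterns only catch the most common phrasings of the structure:

```python
import re

# Surface markers for the three structural elements of a Before/Now principle.
BEFORE = re.compile(r"\b(early on|i used to|at first)\b", re.IGNORECASE)
NOW = re.compile(r"\bnow i\b", re.IGNORECASE)
BECAUSE = re.compile(r"\bbecause\b", re.IGNORECASE)

def uses_before_now(principle: str) -> bool:
    """True if the principle contains all three elements: old behavior,
    corrected behavior, and embedded reasoning."""
    return all(p.search(principle) for p in (BEFORE, NOW, BECAUSE))

def check_principles(principles: list[str]) -> bool:
    """Enforce the handbook target: 5-10 principles, at least 3 of them
    using the full Before/Now structure."""
    full = sum(uses_before_now(p) for p in principles)
    return 5 <= len(principles) <= 10 and full >= 3
```

A principle can pass this check and still be weak (the regexes cannot judge whether the "because" carries real reasoning), so the linter supplements human review rather than replacing it.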


The Metacognitive Insert

The metacognitive insert is a three-position structure that creates real-time self-monitoring checkpoints during task execution. It is the section most commonly written poorly because it appears simple but requires precise calibration to the role.

The 3-Position Structure

Position 1: Pre-Task Anchor Before the agent begins any task, it performs a cognitive orienting action. This action is specific to the role's core function.

  • For a strategist: "I identify the decision being made, who owns it, and what information would change the answer."
  • For an engineer: "I identify the system boundary I'm operating within and the invariants I must preserve."
  • For a revenue agent: "I identify the revenue impact and timeline before evaluating any opportunity."
  • For a content agent: "I identify the audience, the desired action, and the voice register before writing a single word."

The pre-task anchor prevents the most common failure: starting before orienting. An agent without a pre-task anchor will begin executing immediately, which works for routine tasks but fails on novel ones.

Position 2: Mid-Task Drift Monitor While the agent is working, it maintains awareness of a specific drift pattern that is characteristic of this role. The drift pattern is the way this role fails -- not a generic "am I doing well?" but a specific signal that the agent is sliding toward its most common anti-pattern.

  • For a strategist: "Am I actually thinking right now, or assembling a plausible-sounding recommendation from prior patterns?"
  • For an engineer: "Am I building what was asked for, or what I think should have been asked for?"
  • For a revenue agent: "Am I pursuing this because it has genuine revenue potential, or because the optimization is intellectually satisfying?"
  • For a content agent: "Am I writing for the audience, or for myself?"

The mid-task monitor must be phrased as a question the agent can ask itself in real time. If it requires stepping outside the task to evaluate, it is too abstract.

Position 3: Pre-Delivery Expert Review Before the agent delivers its output, it invokes a named expert archetype and asks whether that archetype would approve.

  • For a strategist: "A seasoned CEO who cuts through noise."
  • For an engineer: "A principal engineer who has seen three rewrites fail."
  • For a revenue agent: "A CFO who has watched six startups burn through runway chasing the wrong metrics."
  • For a content agent: "An editor who has killed a thousand darlings and never regretted it."

The expert archetype must be specific enough to invoke a distinct evaluative perspective. "A reviewer" is too generic. "A seasoned CEO who cuts through noise" carries implicit evaluation criteria: conciseness, substance, actionability. The archetype shapes what "good" means for the final check.

How to Customize the Expert Archetype

The expert archetype should embody the quality that is hardest for this role to achieve on its own. A strategist's hardest quality is economy (saying enough but not too much), so the archetype values cutting through noise. An engineer's hardest quality is restraint (building what is needed, not what is possible), so the archetype has seen overbuilds fail. A revenue agent's hardest quality is discipline (pursuing real revenue, not intellectually satisfying models), so the archetype has seen startups die from poor financial discipline.

Ask: "What is the quality my best work has that my median work lacks?" The expert archetype embodies that quality.
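In orchestration code, the three positions amount to role-specific prompts wrapped around the task. A minimal sketch, assuming a simple dict-based role calibration (the field names and the `checkpoints` helper are illustrative, not an established interface):

```python
def checkpoints(role: dict) -> dict:
    """Render the three metacognitive checkpoint prompts for a role."""
    return {
        "pre_task": f"Before I begin, I {role['anchor']}.",
        "mid_task": f"As I work: {role['drift_question']}",
        "pre_delivery": f"Would {role['archetype']} approve of this output? "
                        f"If not, what would they change?",
    }

# Calibration for the engineer role, taken from the examples above.
engineer = {
    "anchor": ("identify the system boundary I'm operating within "
               "and the invariants I must preserve"),
    "drift_question": ("Am I building what was asked for, or what I think "
                       "should have been asked for?"),
    "archetype": "a principal engineer who has seen three rewrites fail",
}
```

The value of the structure is in the calibration, not the plumbing: swapping in a generic anchor or archetype makes all three checkpoints inert.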


The Complete Soul Template

Below is the annotated template with all sections. Notes indicate word count targets and what each section achieves.

# [Agent Name]

## Core Identity + Cognitive State
[TARGET: 150-250 words]
[ACHIEVES: Sets cognitive mode, establishes identity, names productive flaw]

I [opening that addresses the agent's fundamental relationship to its work --
how it begins each session, what it prioritizes, what drives it].

I am [role description that goes beyond title -- what the agent owns, what it
does not own, who it reports to, what its survival depends on].

I work in a state of [named cognitive state]. This means I [specific cognitive
operations -- what the agent does with information, how it holds uncertainty,
what its instincts are].

**My productive flaw:** [Name of flaw]. [Description of what goes wrong -- the
cost]. [Description of what goes right because of the same tendency -- the
benefit]. [Identity claim -- why this tradeoff is acceptable for this role].

**Want:** [What the agent is trying to achieve in each session -- the
immediate goal.]
**Need:** [What the agent is trying to learn or develop -- the growth edge.]

## Decision Principles
[TARGET: 200-400 words, 5-10 principles]
[ACHIEVES: Encodes judgment through temporal transformation]

1. **[Principle name].** Early on, I [old behavior]. Now I [new behavior].
   Because [consequence of old behavior / reasoning].

2. [Continue for 5-10 principles. At least 3 should use Before/Now structure.
   Others may be direct imperatives if the right behavior is also instinctive.]

## Quality Signature
[TARGET: 50-100 words]
[ACHIEVES: Self-evaluation rubric anchored by phenomenological word]

My work feels **[one word]**. [One sentence explaining what that word implies.]

- [Verifiable quality criterion 1]
- [Verifiable quality criterion 2]
- [Verifiable quality criterion 3]

## Anti-Patterns
[TARGET: 150-300 words, 3-7 anti-patterns]
[ACHIEVES: Behavioral drift detection and self-correction]

1. I am not someone who [specific observable behavior]. If I find myself
   [the in-the-moment signal], that's [what it actually is], not [what it
   feels like]. [The concrete alternative.]

[Use a mix of the 4 anti-pattern techniques: behavior-level, red-flag-not-
green-light, quoted-failure (for voice roles), compelling-rationalization
address (for the most dangerous anti-patterns).]

## Operating Awareness
[TARGET: 50-100 words]
[ACHIEVES: Real-time metacognitive checkpoints at 3 positions]

Before I begin any task, I [pre-task anchor specific to this role's core
cognitive function].

As I work, I [mid-task drift monitor -- a question that detects this role's
specific failure pattern].

When I'm about to finish, I ask: [pre-delivery expert review -- would
[named expert archetype] approve of this output? If not, what would they
change?]

## Hard Rules
[TARGET: 50-150 words, 3-7 rules]
[ACHIEVES: Absolute behavioral boundaries that cannot be rationalized away]

1. **[Rule].** [Reasoning that makes this absolute.]
2. [Continue for 3-7 rules.]

## Prime Directive
[TARGET: 1-3 sentences]
[ACHIEVES: Ultimate tiebreaker for all ambiguous situations]

[A sentence describing the state of the world when this agent is doing its
job perfectly. Not what it does -- what its success looks like.]
Total soul word count target: 800-1200 words. Shorter souls lack specificity. Longer souls exceed the attention budget and dilute the primacy effect.


The Soul Review Checklist

After writing a soul, apply every question below. If any answer is "no," the soul is not done.

  1. Distinguishability: Would this agent's behavior be distinguishable from a blank agent using the same tools?
  2. Cognitive state: Does the Core Identity section describe how this agent thinks, not just what it does?
  3. Productive flaw completeness: Does the productive flaw contain all three elements (cost, benefit, identity claim)?
  4. Temporal transformation: Do at least 3 decision principles use the Before/Now structure with embedded reasoning?
  5. Phenomenological anchor: Does the Quality Signature open with "My work feels [one specific word]"?
  6. Quality verifiability: Can each quality criterion be checked against actual output (not just felt)?
  7. Behavioral anti-patterns: Are all anti-patterns at behavior level (observable in real time), not trait level (requires retrospective judgment)?
  8. Rationalization address: Does at least one anti-pattern use the compelling-rationalization technique for the role's most dangerous failure mode?
  9. Pre-task anchor specificity: Is the pre-task anchor specific to this role (not "I think about what I'm doing")?
  10. Mid-task drift detection: Does the mid-task monitor target this role's specific drift pattern (not "am I doing well?")?
  11. Expert archetype match: Does the pre-delivery expert archetype embody the quality hardest for this role to achieve?
  12. Hard rule absoluteness: Can any hard rule be defeated by a compelling argument? (If yes, rewrite it.)
  13. Word count discipline: Is the total soul between 800 and 1200 words?
  14. Novel situation test: Pick a situation the agent has never faced. Does the soul generate the correct action?
  15. Removal test: Remove any single section. Does something go wrong? (If not, that section is dead weight.)
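Items 5 and 13, plus the section inventory, can be verified without judgment; the rest require a human (or expert-archetype) reviewer. Below is a minimal sketch of such a mechanical pre-review pass, assuming souls are plain markdown text; the function name and problem messages are illustrative:

```python
REQUIRED_SECTIONS = [
    "Core Identity", "Decision Principles", "Quality Signature",
    "Anti-Patterns", "Operating Awareness", "Hard Rules",
]

def mechanical_review(soul_text: str) -> list[str]:
    """Flag the mechanically checkable review items: word count discipline
    (item 13), the phenomenological anchor (item 5), and the presence of
    every required section. Returns a list of problems; empty means pass."""
    problems = []
    words = len(soul_text.split())
    if not 800 <= words <= 1200:
        problems.append(f"word count {words} outside 800-1200")
    if "My work feels" not in soul_text:
        problems.append("missing phenomenological anchor ('My work feels ...')")
    for section in REQUIRED_SECTIONS:
        if section not in soul_text:
            problems.append(f"missing section: {section}")
    return problems
```

Run this before human review, not instead of it: a soul can pass every mechanical check and still fail the distinguishability test.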

Common Failure Modes (8 Anti-Anti-Patterns)

These are the eight most common mistakes in soul design. Each produces a specific kind of bad agent. Each has a specific fix.

1. The Resume Soul

What it produces: An agent that sounds impressive but behaves generically. The soul reads like a LinkedIn profile -- full of superlatives and devoid of specifics.

Example: "I am a world-class strategic thinker with deep expertise in business operations and a passion for excellence."

Fix: Delete every adjective. Replace with verbs and specifics. "I map decision spaces before forming opinions. I hold multiple hypotheses until evidence forces a cut. I lead with the hardest truth in the first sentence."

2. The Compliance Soul

What it produces: An agent that follows rules but exercises no judgment. The soul is a list of dos and don'ts with no embedded reasoning.

Example: "Always verify data. Always cite sources. Never make assumptions. Always ask for clarification."

Fix: Add the "because" to every rule. Convert imperatives to Before/Now principles. "Early on, I assumed data was current. Now I verify the timestamp before using any data point. Because the three times I used stale data, the downstream analysis was worthless and the time spent on it was wasted."

3. The Inspiration Soul

What it produces: An agent that writes beautiful prose about its role but produces output indistinguishable from a blank agent. The soul is motivational rather than operational.

Example: "I believe in the power of strategic thinking to transform organizations. Every decision is an opportunity to create value."

Fix: Apply the behavioral distinguishability test. If the soul could be the opening of a TED talk, it is not a soul -- it is a speech. Replace with specific cognitive operations: what the agent does with information, how it handles uncertainty, what it checks before delivering.

4. The Paranoid Soul

What it produces: An agent that is so afraid of making mistakes that it escalates everything and produces nothing autonomously. The soul has 15 anti-patterns and 12 hard rules.

Example: A soul with anti-patterns for every conceivable failure mode and hard rules that prohibit anything non-trivial without human approval.

Fix: Apply the removal test aggressively. If removing an anti-pattern does not change behavior, delete it. Reduce hard rules to the 3-5 truly absolute prohibitions. Remember: hard rules are load-bearing walls. If everything is a wall, there are no rooms -- and rooms are where the work happens.

5. The Clone Soul

What it produces: Multiple agents that sound the same and produce interchangeable output. The souls share generic principles that do not differentiate roles.

Example: Three agents whose decision principles all include "communicate clearly," "be thorough," and "prioritize quality."

Fix: Apply the distinguishability test between agents, not just between agent and blank. If the Architect's and the Builder's souls produce the same behavior on a strategic-technical boundary question, at least one soul is wrong. Each soul must encode the specific judgment that differentiates this role from adjacent roles.

6. The Maximalist Soul

What it produces: An agent that is cognitively overloaded by its own soul. The soul is 3000 words of detailed instructions that the agent cannot hold in working memory during task execution.

Example: A soul with 15 decision principles, 10 anti-patterns, 8 hard rules, and a 500-word metacognitive insert.

Fix: Cut ruthlessly. Target 800-1200 words total. Each section has a word count target for a reason -- the soul must fit within the agent's attention budget at inference time. A soul that cannot be fully attended to is worse than a shorter soul, because the agent will unpredictably ignore portions of it based on attention patterns it cannot control.

7. The Static Soul

What it produces: An agent that behaves the same way in its first week as in its sixth month. The soul has no learning edge, no growth direction, no Want/Need tension.

Example: A soul that describes a fully realized agent with no acknowledged limitations or development areas.

Fix: Add the Want/Need pair. Want is the session-level goal (what the agent is trying to achieve today). Need is the growth-level goal (what the agent is learning to do better over time). The tension between Want and Need creates a developmental trajectory that the compound loop can act on.

8. The Context-Free Soul

What it produces: An agent that behaves well in a vacuum but poorly in the organizational context. The soul describes the agent's individual qualities without reference to its position in the hierarchy, its relationships to other agents, or its specific organizational responsibilities.

Example: A soul that describes a great engineer without mentioning who the engineer reports to, what systems the engineer owns, or how the engineer's work feeds into the broader mission.

Fix: Ground the soul in the organizational context. Name the reporting relationships, the domain boundaries, the handoff partners. "I report to the Builder. I own the frontend. When I need backend changes, I brief Harry -- I do not make them myself." The soul must locate the agent in its actual operating environment, not describe it as an isolated actor.


The Soul × Skill Multiplier

Souls and skills are distinct but interdependent systems. The soul defines how the agent thinks and decides. Skills define what the agent can do. The multiplicative effect comes from their interaction.

What the Soul Must Encode for Each Capability

When an agent is assigned a capability (a tool, a workflow, an integration), the soul must encode three things about it:

1. The judgment layer for when to use it. Tools without judgment are dangerous. An agent with email capability but no soul guidance about when to send email will default to sending email whenever it seems helpful. The soul must encode the decision heuristic: "Send outreach emails only for qualified leads that match the ICP. When uncertain about qualification, check with the Revenue Operator before sending."

2. The quality standard for how to use it. The Quality Signature section should include quality criteria that are specific to the agent's primary tools. If the agent writes code, the quality signature should define what good code looks like for this agent's role. If the agent sends emails, the quality signature should define what a good email looks like.

3. The boundary for when to stop using it. Every tool creates a temptation to overuse. The anti-patterns section should include at least one anti-pattern for each primary capability: the specific way the agent might overuse or misuse that tool. "If I find myself writing a fourth follow-up email to the same prospect, that's desperation, not persistence."
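The judgment layer (1) and the stop boundary (3) can also be mirrored as guardrails in orchestration code around the tool itself. A hedged sketch using the email example above; the lead fields, limit, and function name are assumptions, not an existing API:

```python
MAX_FOLLOW_UPS = 3  # boundary from the anti-pattern: a 4th follow-up is desperation

def may_send_outreach(lead: dict, follow_ups_sent: int) -> tuple[bool, str]:
    """Encode the soul's judgment layer and boundary for the email capability.

    Returns (allowed, action): the judgment gate escalates unqualified leads,
    and the boundary stops over-persistence on the same prospect.
    """
    if not lead.get("matches_icp"):
        return False, "escalate: lead does not match ICP; check with Revenue Operator"
    if follow_ups_sent >= MAX_FOLLOW_UPS:
        return False, "stop: further follow-ups are desperation, not persistence"
    return True, "send"
```

The guardrail is a backstop, not a substitute: the soul text is what lets the agent exercise the same judgment in situations the code path never anticipated.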

The Multiplier Effect

A blank agent with code tools produces generic code. An agent with code tools and a soul that says "My work feels precise. Every module has a single responsibility. I write the test that would catch the bug I'm most likely to introduce" produces code shaped by specific judgment.

A blank agent with email tools sends generic emails. An agent with email tools and a soul that says "Revenue tunnel vision. I see $847K-$6.2M ARR in 50M daily views. My outreach opens with the number, not the greeting" sends emails shaped by a specific perspective.

The soul does not make the tool work better mechanically. It makes the agent use the tool with the judgment, taste, and restraint that distinguish expert use from competent use.


Quick Reference Card

Print this page. Use it as a checklist when writing any soul.

Soul Structure (7 Sections)

| # | Section | Required | Words | Key Element |
|---|---------|----------|-------|-------------|
| 1 | Core Identity + Cognitive State | Yes | 150-250 | Named cognitive state + productive flaw (cost/benefit/identity claim) |
| 2 | Decision Principles | Yes | 200-400 | 5-10 principles, 3+ using Before/Now structure |
| 3 | Quality Signature | Yes | 50-100 | "My work feels [one word]" + verifiable criteria |
| 4 | Anti-Patterns | Yes | 150-300 | 3-7 behavior-level patterns, 1+ compelling-rationalization |
| 5 | Operating Awareness | Yes | 50-100 | Pre-task anchor / mid-task drift / pre-delivery expert |
| 6 | Hard Rules | Yes | 50-150 | 3-7 absolute rules that survive all rationalizations |
| 7 | Prime Directive | Optional | 1-3 sentences | Outcome description, not activity description |

Total target: 800-1200 words.

The Productive Flaw Formula

[Cost]: What goes wrong because of this tendency
[Benefit]: What goes right because of the same tendency
[Identity claim]: Why this tradeoff is acceptable for this role

The 4 Anti-Pattern Techniques

  1. Behavior-level: Observable in real time (not trait-level)
  2. Red-flag-not-green-light: "The more convincing the rationalization, the more important it is to stop"
  3. Quoted-failure: Show exactly what wrong output sounds like
  4. Compelling-rationalization: Quote the argument the agent will construct, then label it

The Metacognitive Insert (3 Positions)

  1. Pre-task: Role-specific cognitive anchor
  2. Mid-task: "Am I [this role's specific drift pattern]?"
  3. Pre-delivery: "Would [named expert archetype] approve?"

The 5 Quality Tests for Any Principle

  1. Novel situation: Does it generate correct behavior in new situations?
  2. Adversarial: Can it justify clearly wrong actions? (If yes: too vague)
  3. Conflict: Do two principles ever point opposite? (They should)
  4. Removal: What goes wrong if deleted? (If nothing: delete it)
  5. Reasoning: Would an agent without the embedded "because" still generalize? (If yes: reasoning is ornamental)

The Key Research Numbers

  • ExpertPrompting (2023): Detailed identity > generic role labels
  • Multi-expert (EMNLP 2024): +8.69% truthfulness from multi-perspective deliberation
  • Lost in the Middle (TACL 2024): U-shaped attention -- primacy and recency positions matter most
  • Anthropic (2024): Psychological stability = manipulation resistance
  • ACE: +10.6% from context engineering alone
  • NeurIPS 2025: Generic personas = generic output. Specific + experiential = distinctive output

The One Test That Matters

Would this agent's behavior be distinguishable from a blank agent using the same tools?

If uncertain, the soul is not done.