
Chapter 8: Prompt Chaining

Why Single Prompts Fall Short

You've likely noticed that when you ask a language model to handle something genuinely complex, the results disappoint. Ask it to analyze customer feedback, generate recommendations, and write a marketing pitch all in one prompt, and you get something that does none of them well. The analysis stays shallow. The recommendations feel generic. The pitch misses the mark. This isn't a limitation of the model itself. It's a design problem.

Here's what happens: when you compress multiple objectives into a single request, the model spreads its attention across all of them simultaneously. It cannot deeply understand your audience while also crafting persuasive language and extracting insights from raw data. The cognitive load is too high, and the output suffers.

Think of how a marketing director actually works. She doesn't attempt market research, message development, creative execution, and channel strategy in parallel. Instead, each phase builds on the previous one. Market research informs positioning. Positioning guides creative development. Both shape which channels make sense. Each step gains clarity and focus because it works with concrete input from the step before.

Prompt chaining applies this same principle. You break a complex objective into a sequence of focused prompts, each handling a single well-defined task. The output of one prompt becomes input for the next. This layered approach produces work that is deeper, more accurate, and far more useful than what a single prompt can achieve.

From Decomposition to Chaining

Task decomposition gave you a framework for breaking complex work into manageable pieces. Chaining operationalizes that insight by turning those conceptual subtasks into an actual workflow of sequential prompts.

Consider what happens when a single prompt asks the model to analyze customer feedback, identify themes, rank them by frequency, and suggest product improvements. The model distributes its processing power thinly across too many simultaneous demands, producing shallow work on all fronts.

Chaining reverses this. Instead of asking for everything at once, you route the output of one prompt into the input of the next. The first prompt might focus entirely on extracting key complaints from feedback. Its output becomes the raw material for a second prompt that categorizes those complaints. A third prompt then ranks categories by frequency. A fourth generates specific product recommendations based on that ranking.
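The four-stage pipeline above can be sketched as a simple relay. This is a minimal illustration, not a definitive implementation: `call_model` is a hypothetical stand-in for whatever LLM API you use, stubbed here so the flow runs as-is.

```python
# Minimal sequential chain sketch. `call_model` is a hypothetical stand-in
# for a real LLM API call; here it is stubbed so the flow can be run as-is.
def call_model(prompt: str) -> str:
    # In practice: send `prompt` to your model and return its text response.
    return f"[model output for: {prompt[:40]}...]"

def run_chain(raw_feedback: str) -> str:
    # Stage 1: extraction only -- one well-defined job.
    complaints = call_model(f"Extract the key complaints from this feedback:\n{raw_feedback}")
    # Stage 2: categorization, using stage 1's output as raw material.
    categories = call_model(f"Group these complaints into categories:\n{complaints}")
    # Stage 3: ranking by frequency.
    ranked = call_model(f"Rank these categories by frequency:\n{categories}")
    # Stage 4: recommendations grounded in the ranking.
    return call_model(f"Suggest product improvements based on this ranking:\n{ranked}")

result = run_chain("The app crashes on login. Checkout is slow.")
```

Each stage receives only the previous stage's output, which is exactly the control-point structure described above.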

What changes fundamentally is focus. Each individual prompt has a single, well-defined job. The model generates output for that job alone, producing depth rather than breadth. You then feed that concentrated output forward.

This creates a deliberate control point between each step. You see exactly what the model produced at stage one before stage two runs. If the extraction missed something, you catch it before wasting processing on categorization. If a categorization went wrong, you adjust the prompt and rerun only that step, not the entire analysis.

The subtasks you identified through decomposition become the boundaries between prompts. Chaining transforms that planning tool into an executable system.

How Prompt Chains Work in Practice

A prompt chain operates as a relay system. You send a focused request to the model, collect its output, shape that output into input for the next prompt, and repeat. Each step narrows, transforms, or builds on what came before.

Start by mapping your decomposed task into discrete stages. If you're analyzing customer feedback to generate product improvements, your stages might be: extract themes, rate severity, suggest solutions. Write a prompt for each stage, making clear what input it expects and what format its output should take. Specificity here prevents drift.

Execute the first prompt with your raw material. The model generates output. You then review this output and decide how to present it to the next prompt. This is the critical juncture. You might extract key phrases, reformat a list, or add clarifying context. This shaping step is what makes chains effective. Without deliberate formatting between stages, information degrades or becomes unusable.

Feed the shaped output into your second prompt along with any necessary context. The model processes it according to your instructions and generates new output. Repeat this cycle for each stage in your chain.

The power emerges from focus. Each prompt handles one well-defined job rather than asking the model to juggle everything at once. This reduces confusion in the model's output, makes failures easier to diagnose, and lets you refine individual steps without redesigning the entire workflow. When a chain produces weak results, you can pinpoint which stage faltered and adjust only that prompt.

Structured Outputs and Explicit Constraints

When you build a chain, each step produces output that becomes input for the next. That handoff is where fragility enters. If one prompt produces loose, inconsistent text, the next prompt struggles to parse it reliably. The model might misinterpret the data, extract the wrong information, or fail to process it at all.

Structured outputs solve this by imposing format discipline on what the model produces. Instead of asking for a response in natural language, you request a specific format: a numbered list, a JSON block, a table with defined columns. The model then generates text that conforms to that shape. When the next prompt receives that formatted output, it knows exactly where to find what it needs.

Explicit constraints work alongside structure. These are direct instructions about what must or must not appear in the response. "List only the three most important findings, not commentary." "Use exactly 50 words or fewer." "Include a confidence score between 0 and 1 for each claim." Constraints narrow the variation in output, making it more predictable downstream.
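A sketch of how a downstream step might enforce those constraints before passing output forward. The JSON shape and the limits (three findings, confidence between 0 and 1) come from the examples above; field names like "finding" and "confidence" are illustrative assumptions.

```python
import json

# Validate one stage's structured output before handing it to the next prompt.
# Field names ("finding", "confidence") are illustrative assumptions.
def validate_stage_output(raw: str) -> list[dict]:
    data = json.loads(raw)  # fails loudly if the model broke the JSON format
    if len(data) > 3:
        raise ValueError("constraint violated: more than three findings")
    for item in data:
        if not (0.0 <= item["confidence"] <= 1.0):
            raise ValueError(f"confidence out of range: {item['confidence']}")
    return data

model_output = '''[
  {"finding": "Login crashes spiked after v2.3", "confidence": 0.9},
  {"finding": "Checkout latency doubled", "confidence": 0.7}
]'''
findings = validate_stage_output(model_output)
```

Failing loudly at the handoff is the point: a format error surfaces here, at a known stage, instead of silently corrupting every step downstream.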

The payoff is measurable. A chain that receives well-structured, constrained input from the previous step processes faster, interprets more accurately, and produces more consistent results. Errors in one step don't cascade as readily. If downstream prompts fail, debugging is simpler because you know the format and content limits of what came before.

Building structure into your chain requires forethought. You must decide in advance what shape each output should take and what limits matter for the next step. That upfront clarity pays dividends in reliability and reusability.

Grounding Claims in Measured Evidence

A nonprofit director sits down to write a grant proposal. The program has run for two years. It's helping people. The funder wants evidence of impact, and the development team is eager to claim the program reduces recidivism by 40%, improves employment outcomes, and builds community connections. But the director knows something critical: they haven't measured recidivism or long-term employment yet. The evaluation data they do have tells a different story.

This is where a structured evidence chain becomes essential. The director builds the chain by verifying each claim against the evaluation data.

First, extract what was actually measured. The evaluation reports contain specific numbers: 82% completion rate (n=156 enrollments), 71% of completers earned their diploma or GED within 12 months (n=128), and 85% of participants report increased confidence in academic abilities via post-program survey. The director checks for employment outcomes: long-term employment tracking requires 24-month follow-up, and the program is only 14 months old. That data simply doesn't exist yet.

The director verifies each claim against these findings. Three claims hold up: completion, educational attainment, and confidence. Employment outcomes and recidivism reduction fail the verification check because no data supports them yet.
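The verification step can be sketched as a simple lookup: every proposed claim must name a metric that actually exists in the evaluation data. The metric keys below are illustrative labels for the figures quoted above, not the director's actual data schema.

```python
# Sketch of the claim-verification step. Metric keys are illustrative labels
# for the figures quoted in the evaluation reports above.
measured = {
    "completion_rate": "82% (n=156 enrollments)",
    "diploma_within_12mo": "71% of completers (n=128)",
    "confidence_increase": "85% via post-program survey",
}

claims = [
    ("Strong program completion", "completion_rate"),
    ("Educational attainment", "diploma_within_12mo"),
    ("Increased academic confidence", "confidence_increase"),
    ("Reduced recidivism by 40%", "recidivism_rate"),      # never measured
    ("Improved long-term employment", "employment_24mo"),  # data doesn't exist yet
]

supported = [(claim, measured[metric]) for claim, metric in claims if metric in measured]
unsupported = [claim for claim, metric in claims if metric not in measured]
```

Only claims in `supported` make it into the proposal; everything in `unsupported` is either dropped or reframed as a future measurement plan.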

The resulting proposal presents only what was measured. When the funder asks about employment, the director explains the tracking timeline honestly. This builds trust. The claims are specific, grounded in evaluation data, and realistic about what remains unknown. Overstating impact to win funding creates a credibility crisis later when funders discover claims exceeded the evidence. Evidence chains protect you by forcing each claim to prove its source.

Chaining Architectures and When to Use Them

The grant proposal scenario worked because the stages followed a natural sequence: gather data, understand impact, then write persuasively. That linear flow is one chaining architecture. But not every problem unfolds in a line.

As you build longer workflows, you'll encounter three core patterns, each suited to different kinds of work.

Sequential chaining moves through stages one after another, each output feeding into the next input. Use this when later steps depend entirely on earlier results. The grant proposal follows this pattern because you need the impact analysis before you can write credibly about outcomes. The process is straightforward to build and debug because failures point clearly to a single stage.

Branching chaining splits into parallel paths after an initial stage, then recombines them. Imagine that nonprofit director needing two things at once: a compelling story about a specific beneficiary and quantified program metrics. You'd run both prompts simultaneously rather than sequentially, then merge the outputs into a final synthesis step. This architecture saves time and works well when independent analyses need to feed into one comprehensive view.

Iterative chaining loops back on itself, refining output through repeated passes. You might generate a first draft, evaluate it against criteria, then feed both the draft and the evaluation back into the model for revision. Each loop tightens the result. This pattern works when you need precision over speed, or when the problem benefits from progressive refinement rather than a single pass.

Start sequential. Add branches only when parallel work genuinely saves effort. Use iteration only when single-pass output consistently falls short of your standards.
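The branching pattern can be sketched with two independent prompts running in parallel, then a merge step that sees both outputs. This is one possible orchestration under stated assumptions: `call_model` is a stub for a real LLM call, and threads are just one way to run branches concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Branching sketch: two independent prompts run in parallel, then a merge
# step synthesizes both. `call_model` is a hypothetical stub for an LLM call.
def call_model(prompt: str) -> str:
    return f"[output: {prompt.splitlines()[0]}]"

def branching_chain(program_data: str) -> str:
    story_prompt = f"Write a beneficiary story.\n{program_data}"
    metrics_prompt = f"Summarize program metrics.\n{program_data}"
    with ThreadPoolExecutor() as pool:
        story_future = pool.submit(call_model, story_prompt)
        metrics_future = pool.submit(call_model, metrics_prompt)
        story, metrics = story_future.result(), metrics_future.result()
    # Recombine: the final synthesis step sees both branch outputs.
    return call_model(f"Merge into one proposal section.\nSTORY: {story}\nMETRICS: {metrics}")

section = branching_chain("Two-year workforce program, 156 participants.")
```

The two branches never see each other's output; only the merge step does, which is what makes them safe to run in parallel.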

Building Chains With Critique and Revision

Critique-and-revise chains embed the iteration loop directly into your workflow as explicit stages. You build a chain where the model first generates content, then critiques it against specific standards, then revises based on that critique. This transforms a single prompt into a quality-improvement engine.

When you ask the model to generate and critique in the same stage, it produces safe, middling work. The model defends what it just created rather than examining it rigorously. Separating these stages changes the output. The model critiques against real standards instead of justifying. Revision then becomes targeted rather than vague.

Consider a recruiter refreshing a job posting for a customer service supervisor role. The initial posting covers responsibilities, required skills, and salary. But it reads generic. Phrases like "oversee operations" and "excellent communication" appear in dozens of identical postings. The posting describes the job without painting a picture of daily experience, growth path, or what makes this company different.

Here's how the chain works:

Stage 1: Generate the initial posting using your standard job description template.

Stage 2: Critique explicitly. Prompt the model to identify which phrases feel generic, what's missing that would help strong candidates envision themselves in the role, and what details differentiate this company from competitors.

Stage 3: Revise based on critique. Replace vague language with specifics. The team handles 500+ daily interactions. Success means coaching reps to solve problems independently. The role offers leadership development and influence over customer experience decisions.

Stage 4: Critique again for remaining gaps. Does it clearly explain day-to-day work, team size, and growth paths? Are vague phrases lingering?
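The four stages above form a loop that can be sketched as follows. The stub responses are hard-coded stand-ins for real model output (the "500+ daily interactions" detail from Stage 3 is reused as the stub's signal that the draft is specific enough); a real chain would send each prompt to an LLM.

```python
# Critique-and-revise loop sketch: generate, critique, revise, repeat until
# the critique passes or a round limit is hit. `call_model` is a stub whose
# canned responses stand in for real model output.
MAX_ROUNDS = 3

def call_model(prompt: str) -> str:
    if prompt.startswith("Critique:"):
        draft = prompt.split("\n", 1)[1]
        # Stub heuristic: a draft with a concrete number passes the critique.
        return "NO ISSUES" if "500+ daily interactions" in draft else "Too generic: add concrete details."
    if prompt.startswith("Revise:"):
        return "Supervisor role: coach a team handling 500+ daily interactions."
    return "Oversee operations and ensure excellent communication."  # generic first draft

def critique_and_revise(task: str) -> str:
    draft = call_model(f"Generate: {task}")
    for _ in range(MAX_ROUNDS):
        critique = call_model(f"Critique:\n{draft}")
        if critique == "NO ISSUES":
            break
        draft = call_model(f"Revise:\n{draft}\nCRITIQUE: {critique}")
    return draft

posting = critique_and_revise("job posting for customer service supervisor")
```

Keeping generation, critique, and revision as separate prompts is what prevents the model from defending its own first draft.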

The refined posting attracts the right fit because specificity filters. It stops working for everyone and starts working for someone. That precision is the chain's purpose.

Recognizing and Fixing Chain Failures

Chains break in predictable ways. The most common failure occurs when outputs drift apart because later steps don't have access to earlier decisions. A technical writer building a product guide across three stages discovers this quickly. Step 1 produces an overview listing five key features. Step 2 writes detailed sections, but the model adds two additional features not mentioned in the overview. Step 3 generates a troubleshooting guide that references both the original five features and the new ones, creating reader confusion. Someone reading the troubleshooting section encounters instructions like "Reset the Sync Module" without ever learning what the Sync Module is in the prior sections.

The root cause is predictable: no validation between steps means inconsistencies go undetected until they harm the final output. Each prompt operated independently.

The fix is equally straightforward. Make the output from your first step a structural anchor that all downstream steps reference. Instead of asking for an overview, ask for a structured outline listing features in order. Format it clearly: "1. Feature Name | One-sentence description." Now Step 2 doesn't invent features. It writes detailed sections using that outline as a template, covering only the features already listed. Before moving to Step 3, a quick validation confirms the sections match the outline.

Step 3 then receives both the outline and the sections, along with an explicit constraint: "Only reference features already introduced in prior sections." This rule prevents the model from adding new material or inventing details.
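The between-step validation can be sketched as a parser plus a set check: read the Step 1 outline, then confirm Step 2's sections cover only features the outline already lists. The outline line format is the one given above; the feature names are illustrative.

```python
# Sketch of the between-step validation: parse the Step 1 outline, then
# flag any Step 2 section that introduces a feature the outline never listed.
def parse_outline(outline: str) -> set[str]:
    # Expected line format: "1. Feature Name | One-sentence description"
    return {line.split("|")[0].split(".", 1)[1].strip()
            for line in outline.strip().splitlines()}

def find_invented_features(outline: str, section_titles: list[str]) -> list[str]:
    allowed = parse_outline(outline)
    return [title for title in section_titles if title not in allowed]

outline = """1. Quick Setup | Connect the device in under five minutes.
2. Auto Backup | Files sync to the cloud automatically."""

extras = find_invented_features(outline, ["Quick Setup", "Auto Backup", "Sync Module"])
```

A non-empty `extras` list means Step 2 drifted, and you catch it before Step 3 ever runs.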

The result is consistency. All three outputs reinforce each other. The troubleshooting guide now references only features the reader has already encountered, and every reference points back to where that feature was explained.

Knowing When Not to Chain

The instinct to break work into steps is powerful once you master chaining. You see complexity and immediately think: if I divide this into substeps, I'll get better results. Sometimes that's true. Often it's not.

A single well-crafted prompt frequently outperforms a chain for tasks that are genuinely straightforward. Consider categorizing customer feedback into sentiment and topic. One prompt that specifies both outputs, with clear examples of each combination, will likely succeed on the first try. Building a chain that analyzes sentiment first, then extracts topics adds overhead for almost no benefit. Each step multiplies opportunities for drift, introduces latency, and requires you to maintain more prompts.

The trade-off becomes real when you compare the cost and complexity against the risk of failure. A chain makes sense when downstream steps genuinely depend on high-quality outputs from earlier ones, or when the task is so complex that a single prompt becomes unwieldy and produces inconsistent results. A single prompt makes sense when you can describe the entire task clearly within one context, when the outputs don't cascade into each other in risky ways, and when few-shot examples can reliably guide the model toward accuracy.

Here's the practical test: if you can write a few-shot prompt with four or five clear examples and it performs well, stop there. The overhead of chaining is not worth the marginal improvement. Save chains for tasks where you've actually seen single prompts fail, or where the logic genuinely requires sequential reasoning that cannot be expressed in one go.

Design Your Prompt Chain Before Building It

The difference between a chain that works and one that fails often comes down to planning. When you're tempted to string prompts together, resist the urge to start writing immediately. Instead, spend fifteen minutes mapping the entire workflow on paper or in a document. This prevents the most common failure mode: handoffs that look sensible in your head but break in practice.

Start by writing your end goal, then list every subtask required to reach it in sequence. Be specific about what each step must produce. Don't write "information about X." Write "a ranked list of three options with pros and cons for each, formatted as bullet points." This output specification becomes the input specification for the next prompt. Vague handoffs create vague results.

Once you have your steps mapped, examine each handoff ruthlessly. Does the output of Step 1 contain exactly what Step 2 needs, and nothing else? Excess information, such as source material mixed into the analysis, tangential context, or stray reasoning, clutters the payload, confuses the next prompt, and introduces errors. Use structured delimiters like tags or labeled sections to separate the output you need from anything else. <ANALYSIS> and <RECOMMENDATION> boundaries are explicit signals that prevent the model from mixing content types in downstream steps.
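Extracting only the tagged payload is a few lines of code. This sketch uses the <ANALYSIS> and <RECOMMENDATION> tags from the text; the surrounding prose in the example output is invented filler standing in for the model's untagged reasoning.

```python
import re

# Pull only the tagged payload forward, discarding surrounding reasoning,
# so the next prompt receives a clean input.
def extract_section(text: str, tag: str) -> str:
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    if match is None:
        raise ValueError(f"missing <{tag}> block in upstream output")
    return match.group(1).strip()

stage_output = """The data shows a clear pattern worth discussing...
<ANALYSIS>Churn concentrates in month two.</ANALYSIS>
<RECOMMENDATION>Add an onboarding check-in at week six.</RECOMMENDATION>"""

next_input = extract_section(stage_output, "RECOMMENDATION")
```

Raising on a missing tag, rather than passing an empty string forward, turns a silent handoff failure into an immediate, diagnosable one.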

Then decide how you'll run this chain: in one continuous conversation, as separate chat interactions, or as a reusable template. Each approach carries context differently. Single conversations preserve memory. Separate steps allow you to integrate external data between prompts. Templates enable repeatable use without redesign.

Identify two or three validation points where you'll manually inspect output quality before feeding it forward. Early detection prevents cascading failures. Finally, test the entire chain end-to-end with real data. Real-world testing reveals handoff mismatches that planning often misses. Document what breaks and why so you can refine the design.

When and Why to Chain Prompts

Chaining extends your capabilities beyond what a single prompt can accomplish. The key is recognizing when a problem genuinely benefits from multiple steps versus when you're overcomplicating a task that works better in one go.

Start by asking yourself: Am I asking the model to do two fundamentally different things? If you need the model to first extract information, then analyze it, then generate recommendations, those are distinct operations. Each one has different success criteria. Each benefits from being isolated and checked. That's when chaining shines. When you try to combine them into one prompt, you're asking the model to juggle competing priorities, and clarity suffers.

Chaining also lets you apply different expertise at different stages. You might use a domain expert role for analysis, then shift to an editor role for refinement. Remember from role-based prompting that roles shape how the model approaches a task. Switching roles between steps means each step gets precision-focused behavior rather than generic output.

The practical limit is complexity and cost. Each additional step adds latency and expense. A three-step chain is often worth it. A ten-step chain rarely is. Chain when the output of one step genuinely informs and improves the next, not when you're simply breaking tasks into smaller pieces for psychological comfort.

Plan your chain using TIP principles at each step. A clear Task, complete Information, and a well-specified Product matter just as much in step two as they do in step one. A well-designed chain produces reliable, composable output that you can depend on across professional work.
