
Prompt Chaining: How to Build Sequential AI Workflows

AI Agents · Alin Dobra · 14 min read

You've hit the wall. Your single prompt is getting longer, more complex, and increasingly unreliable. The LLM sometimes nails it, sometimes completely misses. Sound familiar?

Prompt chaining is the solution: break your mega-prompt into smaller, focused steps where each LLM call does one thing well.

This guide shows you how to build reliable prompt chains, when to use them, and how to avoid the common pitfalls that trip up most developers.

What Is Prompt Chaining?

Prompt chaining connects multiple LLM calls in sequence. The output of one prompt becomes the input for the next:

text
             ┌───────────┐     ┌─────────────┐     ┌──────────┐
 Raw Data ─▶ │ Prompt 1  │ ──▶ │  Prompt 2   │ ──▶ │ Prompt 3 │ ─▶ Final Output
             │ Extract   │     │  Transform  │     │ Format   │
             └───────────┘     └─────────────┘     └──────────┘
                              (Structured Data flows between steps)

Instead of asking the LLM to do everything at once:

text
"Read this document, extract the key points, translate them to Spanish,
summarize each point, and format as a newsletter"

You break it into steps:

text
Step 1: "Extract key points from this document"
Step 2: "Translate these points to Spanish"
Step 3: "Summarize each point in one sentence"
Step 4: "Format these summaries as a newsletter"

Each step is simpler, more reliable, and easier to debug.
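
To see the mechanics, here's a minimal sketch of those four steps as chained calls (the ask helper is illustrative, assuming the OpenAI Python client):

python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    # One focused LLM call per step (illustrative helper)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

document = "..."  # your source document

points = ask(f"Extract key points from this document:\n\n{document}")
spanish = ask(f"Translate these points to Spanish:\n\n{points}")
summaries = ask(f"Summarize each point in one sentence:\n\n{spanish}")
newsletter = ask(f"Format these summaries as a newsletter:\n\n{summaries}")

Every intermediate variable is an inspection point: if the newsletter looks wrong, check points, spanish, and summaries to see exactly where things went sideways.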

Why Prompt Chaining Works

1. Reduced Cognitive Load

LLMs perform better on focused tasks. A prompt that does one thing well consistently outperforms a prompt trying to juggle five things.

Research insight: LLM accuracy drops measurably as the number of subtasks packed into a single prompt grows. Decomposing a five-step task into five focused prompts is commonly reported to improve end-to-end accuracy by 20-40%.

2. Debuggability

When something goes wrong in a monolithic prompt, good luck figuring out where. With chains, you can inspect each intermediate output:

python
# Easy to debug
step1_output = extract_entities(document)      # Check: Are entities correct?
step2_output = classify_entities(step1_output) # Check: Are classifications correct?
step3_output = generate_summary(step2_output)  # Check: Is summary accurate?

3. Reusability

Chain steps become building blocks. Your "translate to Spanish" step works in any pipeline:

python
# Reuse across different workflows
translate_step = TranslatePrompt(target_language="Spanish")

workflow_a = Chain([extract, translate_step, summarize])
workflow_b = Chain([user_input, translate_step, respond])

4. Cost Optimization

You can use smaller, cheaper models for simpler steps and reserve expensive models for complex reasoning:

python
chain = [
    Step("Extract dates", model="gpt-3.5-turbo"),       # Simple extraction: cheap model
    Step("Parse to ISO format", model="gpt-3.5-turbo"), # Formatting: cheap model
    Step("Analyze timeline", model="gpt-4o"),           # Complex reasoning: powerful model
]

Basic Prompt Chain Implementation

Here's a minimal but complete implementation:

python
import openai
from dataclasses import dataclass

@dataclass
class ChainStep:
    name: str
    prompt_template: str
    model: str = "gpt-4o"

class PromptChain:
    def __init__(self, steps: list[ChainStep]):
        self.steps = steps
        self.client = openai.OpenAI()
        self.trace = []  # For debugging

    def run(self, initial_input: str) -> str:
        current_input = initial_input

        for step in self.steps:
            # Format prompt with current input
            prompt = step.prompt_template.format(input=current_input)

            # Call LLM
            response = self.client.chat.completions.create(
                model=step.model,
                messages=[{"role": "user", "content": prompt}]
            )

            output = response.choices[0].message.content

            # Save trace for debugging
            self.trace.append({
                "step": step.name,
                "input": current_input[:200],  # Truncate for readability
                "output": output[:200]
            })

            # Output becomes next input
            current_input = output

        return current_input

    def debug(self):
        """Print execution trace"""
        for i, step in enumerate(self.trace):
            print(f"\n{'='*50}")
            print(f"Step {i+1}: {step['step']}")
            print(f"Input: {step['input']}...")
            print(f"Output: {step['output']}...")


# Usage
chain = PromptChain([
    ChainStep(
        name="Extract",
        prompt_template="Extract all person names from this text:\n\n{input}"
    ),
    ChainStep(
        name="Deduplicate",
        prompt_template="Remove duplicates from this list of names:\n\n{input}"
    ),
    ChainStep(
        name="Format",
        prompt_template="Format these names as a numbered list:\n\n{input}"
    )
])

result = chain.run("John met Sarah at the coffee shop. Sarah introduced John to Mike...")
print(result)
chain.debug()  # See what happened at each step

Output:

text
1. John
2. Sarah
3. Mike

==================================================
Step 1: Extract
Input: John met Sarah at the coffee shop. Sarah introduced John to Mike...
Output: John, Sarah, John, Mike, Sarah...

==================================================
Step 2: Deduplicate
Input: John, Sarah, John, Mike, Sarah...
Output: John, Sarah, Mike...

==================================================
Step 3: Format
Input: John, Sarah, Mike...
Output: 1. John
2. Sarah
3. Mike...

Real-World Example: Document Processing Pipeline

Let's build a practical document processing chain that:

  1. Extracts key information
  2. Validates the extraction
  3. Transforms to structured data
  4. Generates a summary
python
from hopx import Sandbox
import openai
import json

class DocumentProcessor:
    def __init__(self):
        self.client = openai.OpenAI()

    def process(self, document: str) -> dict:
        # Step 1: Extract key information
        extracted = self._extract(document)

        # Step 2: Validate extraction (with code execution)
        validated = self._validate(extracted)

        # Step 3: Structure the data
        structured = self._structure(validated)

        # Step 4: Generate summary
        summary = self._summarize(structured)

        return {
            "extracted": extracted,
            "validated": validated,
            "structured": structured,
            "summary": summary
        }

    def _extract(self, document: str) -> str:
        """Step 1: Extract key entities and facts"""
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": """Extract the following from the document:
                - People mentioned (with roles)
                - Dates and deadlines
                - Action items
                - Key decisions

                Format as a structured list."""
            }, {
                "role": "user",
                "content": document
            }]
        )
        return response.choices[0].message.content

    def _validate(self, extracted: str) -> str:
        """Step 2: Validate with code execution"""
        sandbox = Sandbox.create(template="code-interpreter")

        try:
            # Use code to validate dates, check for inconsistencies
            # (raw f-string so the regex backslashes survive intact)
            validation_code = rf'''
import re
from datetime import datetime

text = """{extracted}"""

# Find all dates
date_patterns = [
    r'\d{{1,2}}/\d{{1,2}}/\d{{4}}',
    r'\d{{4}}-\d{{2}}-\d{{2}}',
    r'(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{{1,2}},?\s+\d{{4}}'
]

dates_found = []
for pattern in date_patterns:
    dates_found.extend(re.findall(pattern, text))

# Check for potential issues
issues = []
if len(dates_found) == 0:
    issues.append("No dates found - verify manually")

# Output validation result
print("VALIDATION RESULT")
print(f"Dates found: {{dates_found}}")
print(f"Issues: {{issues if issues else 'None'}}")
print("---")
print(text)
'''

            sandbox.files.write("/app/validate.py", validation_code)
            result = sandbox.commands.run("python /app/validate.py")

            return result.stdout
        finally:
            sandbox.kill()

    def _structure(self, validated: str) -> dict:
        """Step 3: Convert to structured JSON"""
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": """Convert this information to JSON with the schema:
                {
                    "people": [{"name": "", "role": ""}],
                    "dates": [{"date": "", "event": ""}],
                    "action_items": [{"task": "", "owner": "", "due": ""}],
                    "decisions": [""]
                }"""
            }, {
                "role": "user",
                "content": validated
            }],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)

    def _summarize(self, structured: dict) -> str:
        """Step 4: Generate executive summary"""
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": "Write a 2-3 sentence executive summary of this meeting/document."
            }, {
                "role": "user",
                "content": json.dumps(structured, indent=2)
            }]
        )
        return response.choices[0].message.content


# Usage
processor = DocumentProcessor()
result = processor.process("""
Meeting Notes - Product Launch Planning
Date: January 15, 2025

Attendees: Sarah Chen (PM), Mike Johnson (Engineering Lead), Lisa Park (Marketing)

Discussion:
Sarah presented the launch timeline. Target launch date is March 1, 2025.
Mike raised concerns about the API stability - needs 2 more weeks of testing.
Lisa confirmed marketing materials will be ready by February 15.

Decisions:
- Soft launch to beta users on February 20
- Full public launch on March 1
- Mike to own the stability testing

Action Items:
- Mike: Complete API load testing by February 1
- Lisa: Finalize press release by February 10
- Sarah: Coordinate with sales team by January 20
""")

print(json.dumps(result, indent=2))

Prompt Chaining Patterns

Pattern 1: Linear Chain

The simplest pattern—each step feeds into the next:

text
Input → [A] → [B] → [C] → Output

python
def linear_chain(text):
    extracted = extract(text)
    translated = translate(extracted)
    formatted = format_output(translated)
    return formatted

Best for: Sequential transformations, document processing, data pipelines.

Pattern 2: Branching Chain

Different paths based on intermediate results:

text
              ┌→ [B1] ─┐
Input → [A] ──┤        ├──→ [D] → Output
              └→ [B2] ─┘

python
def branching_chain(text):
    classification = classify(text)

    if classification == "technical":
        processed = technical_processor(text)
    else:
        processed = general_processor(text)

    return finalize(processed)

Best for: Content routing, specialized processing, conditional logic.

Pattern 3: Parallel Chain

Multiple independent steps that merge:

text
           ┌→ [A] ─┐
Input ─────┼→ [B] ─┼──→ [Merge] → Output
           └→ [C] ─┘

python
import concurrent.futures

def parallel_chain(text):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_summary = executor.submit(summarize, text)
        future_entities = executor.submit(extract_entities, text)
        future_sentiment = executor.submit(analyze_sentiment, text)

        summary = future_summary.result()
        entities = future_entities.result()
        sentiment = future_sentiment.result()

    return merge_results(summary, entities, sentiment)

Best for: Independent analyses, multi-perspective processing, speed optimization.

Pattern 4: Iterative Chain (Loop)

Repeat until a condition is met:

text
Input → [Process] → [Check] ──(done)──→ Output
             ▲          │
             └(not done)┘

python
def iterative_chain(text, max_iterations=5):
    current = text

    for i in range(max_iterations):
        # Process
        improved = improve(current)

        # Check if good enough
        score = evaluate(improved)
        if score > 0.9:
            return improved

        current = improved

    return current

Best for: Refinement tasks, quality improvement, self-correction.

Pattern 5: Fallback Chain

Try multiple approaches, use first success:

text
Input → [A] ──(fail)──→ [B] ──(fail)──→ [C]
         │ (success)     │ (success)     │ (success)
         ▼               ▼               ▼
       Output          Output          Output

python
def fallback_chain(text):
    strategies = [
        ("precise", precise_extract),
        ("fuzzy", fuzzy_extract),
        ("llm_only", llm_extract)
    ]

    for name, strategy in strategies:
        try:
            result = strategy(text)
            if validate(result):
                return result
        except Exception as e:
            print(f"{name} failed: {e}")
            continue

    raise ValueError("All strategies failed")

Best for: Robust systems, graceful degradation, handling edge cases.

Adding Code Execution to Chains

Many chain steps benefit from actual code execution—not just LLM reasoning. This is where sandboxed execution becomes essential:

python
from hopx import Sandbox
import openai

class CodeAugmentedChain:
    def __init__(self):
        self.client = openai.OpenAI()

    def analyze_data(self, data_description: str, question: str) -> dict:
        """
        Chain:
        1. LLM generates analysis code
        2. Code executes in sandbox
        3. LLM interprets results
        """

        # Step 1: Generate analysis code
        code = self._generate_code(data_description, question)

        # Step 2: Execute in sandbox
        execution_result = self._execute_code(code)

        # Step 3: Interpret results
        interpretation = self._interpret_results(question, execution_result)

        return {
            "code": code,
            "raw_output": execution_result,
            "interpretation": interpretation
        }

    def _generate_code(self, data_description: str, question: str) -> str:
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": """Generate Python code to analyze data and answer the question.
                Use pandas for data manipulation.
                Print results clearly.
                Do not use plt.show() - save plots to files instead."""
            }, {
                "role": "user",
                "content": f"Data: {data_description}\n\nQuestion: {question}"
            }]
        )

        # Extract code from response
        content = response.choices[0].message.content
        if "```python" in content:
            code = content.split("```python")[1].split("```")[0]
        else:
            code = content

        return code.strip()

    def _execute_code(self, code: str) -> str:
        sandbox = Sandbox.create(template="code-interpreter")

        try:
            # Install required packages
            sandbox.commands.run("pip install pandas numpy -q")

            # Write and execute code
            sandbox.files.write("/app/analysis.py", code)
            result = sandbox.commands.run("python /app/analysis.py")

            if result.exit_code != 0:
                return f"ERROR:\n{result.stderr}"

            return result.stdout

        finally:
            sandbox.kill()

    def _interpret_results(self, question: str, raw_output: str) -> str:
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": "Interpret these analysis results in plain English. Be specific and cite numbers."
            }, {
                "role": "user",
                "content": f"Question: {question}\n\nAnalysis Output:\n{raw_output}"
            }]
        )
        return response.choices[0].message.content


# Usage
chain = CodeAugmentedChain()
result = chain.analyze_data(
    data_description="CSV file at /app/sales.csv with columns: date, product, revenue, units_sold",
    question="What was the best-selling product in Q4 2024?"
)

Error Handling in Chains

Chains fail. Here's how to handle it gracefully:

python
from dataclasses import dataclass
from typing import Optional
import time
import traceback

@dataclass
class ChainResult:
    success: bool
    output: Optional[str]
    failed_step: Optional[str]
    error: Optional[str]
    partial_results: dict

class RobustChain:
    def __init__(self, steps: list):
        self.steps = steps

    def run(self, initial_input: str) -> ChainResult:
        current_input = initial_input
        partial_results = {}

        for step in self.steps:
            try:
                output = step.execute(current_input)
                partial_results[step.name] = output
                current_input = output

            except Exception as e:
                return ChainResult(
                    success=False,
                    output=None,
                    failed_step=step.name,
                    error=f"{type(e).__name__}: {str(e)}\n{traceback.format_exc()}",
                    partial_results=partial_results
                )

        return ChainResult(
            success=True,
            output=current_input,
            failed_step=None,
            error=None,
            partial_results=partial_results
        )


# With retry logic
class RetryableChain(RobustChain):
    def run(self, initial_input: str, max_retries: int = 3) -> ChainResult:
        current_input = initial_input
        partial_results = {}

        for step in self.steps:
            for attempt in range(max_retries):
                try:
                    output = step.execute(current_input)
                    partial_results[step.name] = output
                    current_input = output
                    break  # Success, move to next step

                except Exception as e:
                    if attempt == max_retries - 1:
                        return ChainResult(
                            success=False,
                            output=None,
                            failed_step=step.name,
                            error=str(e),
                            partial_results=partial_results
                        )
                    # Wait before retry (exponential backoff)
                    time.sleep(2 ** attempt)

        return ChainResult(
            success=True,
            output=current_input,
            failed_step=None,
            error=None,
            partial_results=partial_results
        )

When NOT to Use Prompt Chaining

Chaining isn't always the answer. Avoid it when:

Scenario                           Why Chaining Hurts            Better Alternative
Simple, single-step task           Unnecessary complexity        Single prompt
Highly interdependent reasoning    Context loss between steps    Long-context model
Real-time latency requirements     Each step adds latency        Cached/precomputed
Very short inputs                  Overhead exceeds benefit      Single prompt
Exploratory/creative tasks         Structure kills creativity    Open-ended prompt

Signs You're Over-Chaining

  • Each step is trivial (could be done with string formatting)
  • You're passing the same context through every step
  • The chain is slower than a single smart prompt
  • Steps are so coupled they always fail/succeed together

Performance Optimization

1. Parallelize Independent Steps

python
import asyncio

async def optimized_chain(text):
    # These can run in parallel
    summary_task = asyncio.create_task(summarize(text))
    entities_task = asyncio.create_task(extract_entities(text))

    summary, entities = await asyncio.gather(summary_task, entities_task)

    # This depends on previous results
    final = await generate_report(summary, entities)

    return final

2. Use Smaller Models for Simple Steps

python
steps = [
    Step("Format cleanup", model="gpt-3.5-turbo"),      # Simple
    Step("Entity extraction", model="gpt-3.5-turbo"),   # Pattern matching
    Step("Complex reasoning", model="gpt-4o"),          # Needs power
    Step("Final formatting", model="gpt-3.5-turbo"),    # Simple
]
# Cost: ~60% less than using gpt-4o for everything

3. Cache Repeated Steps

python
import hashlib

# Simple in-memory cache: (input hash, step name) -> output
_step_cache: dict[tuple[str, str], str] = {}

def chain_with_cache(text):
    # Key the cache on a hash of the input plus the step name
    key = (hashlib.md5(text.encode()).hexdigest(), "extract")

    # Check cache first
    if key in _step_cache:
        return _step_cache[key]

    # Process and cache the result
    result = extract(text)
    _step_cache[key] = result
    return result

4. Stream Long Chains

python
async def streaming_chain(text):
    """Yield results as each step completes"""

    yield {"step": "extract", "status": "starting"}
    extracted = await extract(text)
    yield {"step": "extract", "status": "complete", "preview": extracted[:100]}

    yield {"step": "transform", "status": "starting"}
    transformed = await transform(extracted)
    yield {"step": "transform", "status": "complete", "preview": transformed[:100]}

    yield {"step": "format", "status": "starting"}
    final = await format_output(transformed)
    yield {"step": "format", "status": "complete", "result": final}

Prompt Chaining vs. Agent Loops

Don't confuse chaining with agentic systems:

Prompt Chaining               Agent Loops
Fixed sequence of steps       Dynamic, decides next step
Predictable execution path    Unpredictable path
Faster, cheaper               More flexible, expensive
Easier to debug               Harder to debug
Best for known workflows      Best for open-ended tasks

Use chaining when you know the steps upfront.
Use agents when the LLM needs to figure out the steps.

Many production systems combine both: an agent that decides what to do, then triggers chains to do it.
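
Here's a minimal sketch of that hybrid (the router prompt and CHAINS registry are illustrative; extract, summarize, translate, and format_output stand in for the step functions from the pattern examples above):

python
from openai import OpenAI

client = OpenAI()

# Known workflows, each a fixed chain (step functions assumed defined elsewhere)
CHAINS = {
    "summarize_document": lambda text: format_output(summarize(extract(text))),
    "translate_document": lambda text: format_output(translate(extract(text))),
}

def route_and_run(user_request: str, document: str) -> str:
    # Agent step: the LLM decides WHICH chain to run
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Pick the best workflow for this request. "
                       f"Options: {', '.join(CHAINS)}. "
                       f"Request: {user_request}\n"
                       f"Reply with the workflow name only."
        }]
    )
    choice = response.choices[0].message.content.strip()

    # Chain step: a fixed, predictable pipeline does the actual work
    workflow = CHAINS.get(choice, CHAINS["summarize_document"])
    return workflow(document)

The routing decision is the only unpredictable part; everything downstream stays in debuggable chain territory.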

Building Your First Chain: Quickstart

python
# Install
# pip install openai hopx

from openai import OpenAI

client = OpenAI()

def chain_step(prompt: str, input_text: str, model: str = "gpt-4o") -> str:
    """Single chain step"""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{prompt}\n\nInput:\n{input_text}"}]
    )
    return response.choices[0].message.content

# Your first chain
text = "The quick brown fox jumps over the lazy dog. This is a sample text."

step1 = chain_step("Count the words in this text", text)
step2 = chain_step("Is this count correct? Verify.", step1)
step3 = chain_step("Summarize your findings in one sentence.", step2)

print(step3)

Once you're comfortable, add:

  1. Error handling
  2. Logging/tracing (see the sketch below)
  3. Parallel execution
  4. Code execution with sandboxes
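
For example, here's a minimal sketch of step 2, wrapping the chain_step helper from the quickstart with Python's standard logging module (the logger name and truncation lengths are arbitrary choices):

python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("prompt_chain")

def traced_step(name: str, prompt: str, input_text: str) -> str:
    # Log timing plus truncated input/output for every step
    start = time.time()
    logger.info("step=%s starting, input=%r", name, input_text[:100])
    output = chain_step(prompt, input_text)
    logger.info("step=%s done in %.1fs, output=%r",
                name, time.time() - start, output[:100])
    return output

# Drop-in replacement for the bare chain_step calls above
step1 = traced_step("count", "Count the words in this text", text)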

Conclusion

Prompt chaining transforms unreliable mega-prompts into robust, debuggable pipelines:

  • Break complex tasks into focused steps
  • Debug easily by inspecting intermediate outputs
  • Optimize costs by using right-sized models per step
  • Build reusable components for multiple workflows

Start simple—a 2-3 step chain. Add complexity only when needed.

The best chains feel invisible: they just work, every time.


Ready to add code execution to your chains? Get started with HopX — sandboxes that spin up in 100ms.
