Parallelization

Run multiple LLM calls simultaneously via sectioning or voting to reduce latency and improve reliability.

●●●○○ Complexity

Overview

Parallelization runs multiple LLM calls simultaneously rather than sequentially. It comes in two flavors. Sectioning splits a task into independent subtasks that can each be processed by a separate LLM call at the same time, then aggregates the results. Voting runs the same prompt multiple times and uses consensus or majority-vote logic to select the final answer. Both sub-patterns exploit the fact that LLM calls are I/O-bound, so running them concurrently dramatically reduces wall-clock time while improving either coverage (sectioning) or reliability (voting).

How It Works

  1. Identify the parallelization strategy. Decide whether your task benefits from sectioning (different subtasks) or voting (same task, multiple attempts), or a combination of both.
  2. For sectioning: decompose the task into independent parts. Each part must be self-contained — it cannot depend on the output of another parallel branch. Assign each part its own prompt.
  3. For voting: define the prompt that will be run multiple times (typically 3-5 runs). Decide on an aggregation strategy: majority vote, highest-confidence answer, or LLM-based adjudication.
  4. Dispatch all calls concurrently. Use async/parallel execution to fire all LLM calls at once.
  5. Aggregate results. For sectioning, merge the outputs into a unified result. For voting, apply your consensus logic to select the final answer.
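Steps 3-5 for the voting variant can be sketched as follows. This is a minimal, self-contained sketch: `fake_llm_vote` is a stand-in for a real async LLM client (any async function with the same signature works), and the canned labels are purely illustrative.

```python
import asyncio
from collections import Counter
from itertools import cycle

async def run_vote(prompt: str, call, n: int = 3) -> str:
    """Fire n identical LLM calls concurrently and return the majority answer."""
    answers = await asyncio.gather(*(call(prompt) for _ in range(n)))
    # Normalize before counting so "Spam" and "spam" vote together.
    normalized = [a.strip().lower() for a in answers]
    return Counter(normalized).most_common(1)[0][0]

# Stand-in for a real async LLM client; cycles through canned labels.
_canned = cycle(["spam", "spam", "not spam"])

async def fake_llm_vote(prompt: str) -> str:
    return next(_canned)

result = asyncio.run(run_vote("Is this message spam? ...", fake_llm_vote))
# Majority of ["spam", "spam", "not spam"] is "spam".
```

Majority vote works best when answers come from a small, closed label set; for open-ended outputs, an LLM-based adjudicator that reads all candidate answers is usually more robust.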

When to Use

  • Sectioning: The task has independent components that do not depend on each other (e.g., analyzing different aspects of a document, running multiple safety checks on the same content).
  • Sectioning: You need to reduce latency for a multi-part task by eliminating sequential waits.
  • Voting: The task has a verifiable answer but a single LLM call is unreliable (e.g., classification tasks with borderline cases).
  • Voting: The cost of a wrong answer is high, and you want to increase confidence through redundancy.

When Not to Use

  • The subtasks have dependencies on each other — use Prompt Chaining instead.
  • The task requires dynamic decomposition that cannot be determined in advance — use Orchestrator-Worker instead.
  • A single LLM call already produces reliable, high-quality results — adding parallelism adds cost with no benefit.
  • You are budget-constrained and cannot afford the additional API calls that voting requires.

Example

# Parallelization (Sectioning): Content moderation with multiple independent checks.
# `llm` is assumed to be an async LLM client exposing async_call(system=..., prompt=...).

import asyncio

async def check_toxicity(content: str) -> dict:
    """Check content for toxic or hateful language."""
    response = await llm.async_call(
        system="You are a toxicity classifier. Respond with 'safe' or 'toxic' and a confidence score.",
        prompt=content
    )
    return {"check": "toxicity", "result": response.text}

async def check_pii(content: str) -> dict:
    """Check content for personally identifiable information."""
    response = await llm.async_call(
        system="You are a PII detector. List any PII found, or respond 'none'.",
        prompt=content
    )
    return {"check": "pii", "result": response.text}

async def check_factuality(content: str) -> dict:
    """Check content for obviously false or misleading claims."""
    response = await llm.async_call(
        system="You are a fact-checker. Flag any claims that are clearly false.",
        prompt=content
    )
    return {"check": "factuality", "result": response.text}

async def moderate_content(content: str) -> list[dict]:
    """Run all moderation checks in parallel and aggregate results."""
    results = await asyncio.gather(
        check_toxicity(content),
        check_pii(content),
        check_factuality(content),
    )
    return results

# Run the parallel checks
results = asyncio.run(moderate_content("The user-submitted article text goes here."))

# Simple keyword heuristic: flag any check whose result mentions neither "safe"
# nor "none". Production code should request structured output from each check.
flagged = [
    r for r in results
    if "safe" not in r["result"].lower() and "none" not in r["result"].lower()
]

Related Patterns

  • Prompt Chaining — Use chaining when subtasks have sequential dependencies; use parallelization when they do not.
  • Orchestrator-Worker — When the set of parallel subtasks must be determined dynamically rather than defined in advance.
  • Evaluator-Optimizer — Voting is a lightweight alternative to a full evaluator loop when you need higher reliability without iterative refinement.