Routing

Classify incoming requests and direct them to specialized handlers for optimized, domain-specific processing.

●●○○○ Complexity

Overview

Routing uses an LLM (or a lightweight classifier) to categorize an incoming request and direct it to a specialized handler: a dedicated prompt, a different model, or an entirely separate processing pipeline. Instead of building a single prompt that tries to handle every possible input type, you create focused handlers that each excel at one category. The router acts as a dispatcher, reading the input and deciding which path to take. This separation of concerns tends to produce better results because each handler can be optimized independently for its own domain, provided the router classifies inputs reliably.

How It Works

  1. Define the categories. Identify the distinct types of input your system needs to handle. Each category should map to a meaningfully different processing strategy (e.g., “billing question,” “technical support,” “general inquiry”).
  2. Build the router. Write a prompt (or train a classifier) that takes the raw input and outputs a category label. Keep the router’s job narrow: classification only, no answer generation.
  3. Create specialized handlers. For each category, design a dedicated prompt or sub-workflow optimized for that type of input. These handlers can use different system messages, few-shot examples, tools, or even different models.
  4. Dispatch the input. Based on the router’s classification, forward the original input to the appropriate handler.
  5. Return the handler’s output. The result from the selected handler is returned as the final response. Optionally, include metadata about which route was selected for observability.
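
The five steps above can be sketched end to end without calling a model. This is a minimal sketch, not a definitive implementation: `keyword_router`, the lambda handlers, and the `RoutedResponse` type are all hypothetical stand-ins introduced for illustration, and a real router would typically be an LLM call or a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RoutedResponse:
    route: str   # which handler was selected (kept for observability)
    output: str  # the handler's result


def keyword_router(text: str) -> str:
    """Step 2 stand-in: a trivial keyword classifier instead of an LLM call."""
    lowered = text.lower()
    if any(word in lowered for word in ("charge", "invoice", "refund")):
        return "billing"
    if any(word in lowered for word in ("error", "crash", "install")):
        return "technical"
    return "general"


# Step 3 stand-ins: each handler would normally wrap its own prompt or model.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "billing": lambda text: f"[billing] {text}",
    "technical": lambda text: f"[technical] {text}",
    "general": lambda text: f"[general] {text}",
}


def route(text: str) -> RoutedResponse:
    label = keyword_router(text)                              # classification only
    handler = HANDLERS.get(label, HANDLERS["general"])        # step 4: dispatch
    return RoutedResponse(route=label, output=handler(text))  # step 5: result plus metadata


result = route("I was charged twice for my subscription.")
print(result.route, "->", result.output)
```

Returning the selected route alongside the output makes it easier to log and audit routing decisions later.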

When to Use

  • Your inputs fall into distinct categories that benefit from different handling strategies.
  • A single general-purpose prompt struggles to maintain quality across all input types.
  • You want to optimize cost by routing simple queries to smaller, cheaper models and complex queries to more capable ones.
  • You need clear separation of concerns for maintainability — each handler can be tested and updated independently.
  • Classification accuracy is high enough that routing errors are rare.
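
The cost-optimization case can be illustrated with a small sketch that maps an estimated complexity level to a model tier. Everything here is an assumption made for illustration: the model names are placeholders, and `estimate_complexity` is a crude heuristic standing in for a real router.

```python
# Hypothetical model names; substitute whatever your provider offers.
MODEL_BY_COMPLEXITY = {
    "simple": "small-cheap-model",
    "complex": "large-capable-model",
}


def estimate_complexity(text: str) -> str:
    """Crude stand-in for a router: long or multi-question inputs count as complex."""
    if len(text.split()) > 40 or text.count("?") > 1:
        return "complex"
    return "simple"


def pick_model(text: str) -> str:
    """Return the model tier that should handle this input."""
    return MODEL_BY_COMPLEXITY[estimate_complexity(text)]


print(pick_model("What are your opening hours?"))
print(pick_model("Why did the export fail? And how do I recover the lost rows?"))
```

In practice the complexity estimate would come from the same kind of classifier as the category router, and the savings depend on how much traffic the cheaper tier can absorb without a drop in quality.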

When Not to Use

  • All inputs are similar enough that a single prompt handles them well.
  • You have only two or three categories and the handling differences are minor — a single prompt with conditional instructions may suffice.
  • The categories are ambiguous or overlapping, making reliable classification difficult.
  • Latency constraints do not allow the extra round-trip for classification.
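
One way to judge whether categories are too ambiguous for routing is to run the router over a small hand-labeled sample and look at which categories get confused with each other. The sketch below assumes a router exposed as a plain function from text to label. `toy_router` and the sample data are hypothetical and exist only to make the example runnable.

```python
from collections import Counter
from typing import Callable, List, Tuple


def evaluate_router(
    router: Callable[[str], str],
    labeled: List[Tuple[str, str]],
) -> Tuple[float, Counter]:
    """Return accuracy and a Counter of (expected, predicted) misroutes."""
    confusions: Counter = Counter()
    correct = 0
    for text, expected in labeled:
        predicted = router(text)
        if predicted == expected:
            correct += 1
        else:
            confusions[(expected, predicted)] += 1
    accuracy = correct / len(labeled) if labeled else 0.0
    return accuracy, confusions


def toy_router(text: str) -> str:
    # Hypothetical stand-in for the real router.
    return "billing" if "invoice" in text.lower() else "general"


sample = [
    ("Where is my invoice?", "billing"),
    ("I was charged twice.", "billing"),
    ("What are your opening hours?", "general"),
    ("The invoice page crashes.", "technical"),
]
accuracy, confusions = evaluate_router(toy_router, sample)
print(f"accuracy={accuracy:.2f}")
for (expected, predicted), count in confusions.most_common():
    print(f"{expected} -> {predicted}: {count}")
```

If a pair of categories dominates the misroutes, that pair may be a candidate for merging, or a sign that a single prompt would serve those inputs better.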

Example

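# Routing: Customer support system that routes tickets to specialized handlers.
# Note: `llm` is a placeholder for your LLM client. The example assumes that
# llm.call(system=..., prompt=...) returns an object with a .text attribute.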

from enum import Enum

class TicketCategory(Enum):
    BILLING = "billing"
    TECHNICAL = "technical"
    GENERAL = "general"

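def classify_ticket(message: str) -> TicketCategory:
    """Router: Classify the customer message into a category."""
    response = llm.call(
        system=(
            "Classify the customer message into exactly one category: "
            "billing, technical, or general. Respond with the category name only."
        ),
        prompt=message
    )
    label = response.text.strip().lower()
    try:
        return TicketCategory(label)
    except ValueError:
        # Unexpected router output: fall back to the general handler instead of crashing.
        return TicketCategory.GENERAL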

def handle_billing(message: str) -> str:
    return llm.call(
        system="You are a billing specialist. You have access to account and invoice data.",
        prompt=message
    ).text

def handle_technical(message: str) -> str:
    return llm.call(
        system="You are a senior technical support engineer. Provide step-by-step troubleshooting.",
        prompt=message
    ).text

def handle_general(message: str) -> str:
    return llm.call(
        system="You are a friendly customer service representative.",
        prompt=message
    ).text

HANDLERS = {
    TicketCategory.BILLING: handle_billing,
    TicketCategory.TECHNICAL: handle_technical,
    TicketCategory.GENERAL: handle_general,
}

# Run the router
ticket = "I was charged twice for my subscription last month."
category = classify_ticket(ticket)
response = HANDLERS[category](ticket)

Related Patterns

  • Prompt Chaining: Routing is often the first step in a chain, selecting which downstream chain to execute.
  • Parallelization: When inputs could belong to multiple categories, you can run multiple handlers in parallel and merge results.
  • Orchestrator-Worker: For tasks where the set of categories is not fixed, an orchestrator can dynamically decide what workers to invoke.
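
The Parallelization case above, where an input belongs to more than one category, might look like the following. This is a sketch under stated assumptions: the two handlers are hypothetical stand-ins returning canned text, and joining the outputs with newlines is only one possible merge strategy.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical handlers; real ones would call a model.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "billing": lambda text: "Billing: a refund can be issued for the duplicate charge.",
    "technical": lambda text: "Technical: clear the cache and retry the payment page.",
}


def run_handlers(text: str, categories: List[str]) -> str:
    """Run every selected handler concurrently and join outputs in category order."""
    with ThreadPoolExecutor(max_workers=max(1, len(categories))) as pool:
        futures = [pool.submit(HANDLERS[name], text) for name in categories]
        results = [future.result() for future in futures]
    return "\n".join(results)


merged = run_handlers(
    "I was charged twice and the payment page errors out.",
    ["billing", "technical"],
)
print(merged)
```

Threads suit this case because handler calls are usually I/O-bound network requests. The router would need to return a list of labels rather than a single one for this to apply.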