Catering Orders by Phone: Why They Break Every Voice AI System (Including Ours)

May 28, 2026

Last fall we got a support ticket that read, approximately: "Your system took a catering order for 80 people and confirmed it, but the order summary was completely wrong. The customer showed up and half the food wasn't what they ordered." That ticket kicked off three months of rethinking how we handle a category of calls that doesn't get enough attention in voice AI discussions: catering.

We're writing about this not because we've solved it — we haven't, fully — but because we've learned enough that it's worth sharing with operators who are thinking about whether voice AI can handle their catering line. The short answer: not the way it handles your regular ordering line, and here's why.

How Catering Orders Are Structurally Different

A regular phone order at a QSR location involves one customer, ordering food for immediate consumption, with a menu they've probably seen before. The interaction is typically under three minutes, involves fewer than five line items, and has a low-stakes error environment: if something is wrong, the customer is there in person to fix it.

A catering order is a different transaction class:

High item counts with quantity multipliers. "Twenty chicken sandwiches, half of them no pickle, ten burgers with extra cheese, thirty sides of fries" is a single order that names three menu items but resolves to four line items once the sandwiches split into two modifier groups, and a real catering order keeps going from there. The branching factor is 10–15x a regular order.
Future scheduling. The order is typically placed days in advance. The customer needs a pickup or delivery time confirmation, a callback if there are availability issues, and often a reference number for corporate expense purposes.
Negotiation and substitution. Catering callers frequently ask questions: "Do you do bulk pricing?" "Can we substitute the standard sides for a specific item?" "What's the minimum order for delivery?" These rarely have a simple yes/no answer, and they're not handled by the standard order flow.
High-stakes error environment. A wrong item on an individual order is a minor inconvenience. A wrong item on 20 of 80 sandwiches for a corporate lunch is a reputation event. The customer isn't there to catch the error at the counter.

Voice AI systems built around individual order flows assume a particular conversational structure: greet, take item, apply modifiers, confirm, close. Catering calls don't follow that structure. They meander. The customer checks prices, asks about capacity, places the main order, then circles back to change an item they forgot, then asks about a discount they heard about from another location.

Where Our System Failed

The incident that generated the support ticket above happened because our order confirmation logic was designed to summarize the last coherent order state it could construct from the conversation. On a complex 20-minute catering call with multiple mid-call edits, the "last coherent state" logic built a summary that missed a round of corrections the customer had made in the final five minutes.

We had three compounding failures:

State management across long calls. Our conversation state model was optimized for sub-five-minute interactions. A 20-minute catering call pushed it into territory where mid-call edits weren't being applied correctly to the running order.
Modifier inheritance at quantity. When a customer says "make ten of those no pickle," the modifier needs to apply to a subset of a previously confirmed quantity. Our modifier logic was built for single-item specification, not partial-batch modification.
No explicit order review step. For a regular order, the confirmation read-back is the review. For a catering order, the customer needs to hear a structured summary — by item category, with quantities and modifiers — before committing. We didn't have a different confirmation flow for high-item-count orders.
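
The first failure is easier to see with a toy example. The sketch below is not our production state model; it assumes a hypothetical append-only log of `Edit` records and a `replay` function, and shows how a summary built from an earlier snapshot differs from one built by replaying every edit the caller made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One caller instruction. The action names are illustrative assumptions."""
    action: str        # "set" or "remove"
    item: str
    quantity: int = 0


def replay(edits: list[Edit]) -> dict[str, int]:
    """Rebuild the running order by applying every edit in call order."""
    order: dict[str, int] = {}
    for edit in edits:
        if edit.action == "set":
            order[edit.item] = edit.quantity
        elif edit.action == "remove":
            if edit.item not in order:
                # Fail loudly instead of silently dropping the correction.
                raise ValueError(f"cannot remove {edit.item!r}: not on the order")
            del order[edit.item]
        else:
            raise ValueError(f"unknown action {edit.action!r}")
    return order


call_log = [
    Edit("set", "chicken sandwich", 20),
    Edit("set", "burger", 10),
    Edit("set", "fries", 30),
    # Corrections made late in the call:
    Edit("set", "burger", 15),
    Edit("remove", "fries"),
]

stale_summary = replay(call_log[:3])   # snapshot taken before the corrections
final_summary = replay(call_log)       # every edit applied
```

Here `stale_summary` still lists ten burgers and the fries, while `final_summary` reflects what the caller actually asked for. Confirming from anything other than the full replay reproduces the kind of error in the ticket.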

What We Changed (and What We Still Can't Automate)

We've made three architectural changes specifically for catering calls:

First, we now detect catering calls by item count threshold — when an order exceeds eight line items or the total item quantity exceeds fifteen, the system switches to a different conversation mode with extended state management and a structured read-back protocol. The caller hears "Let me read that back to you by category before we confirm" and gets a full itemized summary they can correct before the order is placed.
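
A minimal sketch of that detection rule and the category read-back, using the two thresholds stated above. The `LineItem` shape, the category names, and the function names are assumptions made for illustration, not our actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# Thresholds from the text: more than eight line items,
# or more than fifteen items in total.
MAX_LINE_ITEMS = 8
MAX_TOTAL_QUANTITY = 15


@dataclass
class LineItem:
    name: str
    category: str                      # hypothetical grouping, e.g. "Sandwiches"
    quantity: int
    modifiers: tuple[str, ...] = ()


def is_catering_order(items: list[LineItem]) -> bool:
    """True once the order exceeds either threshold."""
    total_quantity = sum(item.quantity for item in items)
    return len(items) > MAX_LINE_ITEMS or total_quantity > MAX_TOTAL_QUANTITY


def read_back_by_category(items: list[LineItem]) -> list[str]:
    """Build the read-back, grouped by category in first-mentioned order."""
    grouped: dict[str, list[LineItem]] = defaultdict(list)
    for item in items:
        grouped[item.category].append(item)

    lines = ["Let me read that back to you by category before we confirm."]
    for category, members in grouped.items():
        lines.append(f"{category}:")
        for item in members:
            mods = f" ({', '.join(item.modifiers)})" if item.modifiers else ""
            lines.append(f"  {item.quantity} x {item.name}{mods}")
    return lines


order = [
    LineItem("chicken sandwich", "Sandwiches", 10),
    LineItem("chicken sandwich", "Sandwiches", 10, ("no pickle",)),
    LineItem("burger", "Sandwiches", 10, ("extra cheese",)),
    LineItem("fries", "Sides", 30),
]

if is_catering_order(order):
    print("\n".join(read_back_by_category(order)))
```

The example order has only four line items, but its total quantity of 60 trips the second threshold, which is why both conditions are checked.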

Second, we rebuilt modifier handling to support quantity-scoped specifications. "Half of those no pickle" is now a parseable instruction that applies to 50% of the last referenced item quantity. We tested this against a library of real catering call transcripts before deploying it.
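
As an illustration of what "quantity-scoped" means, the sketch below splits one line item into a modified batch and an untouched remainder. It uses a trimmed-down, hypothetical `LineItem`, and the choice to reject fractions that do not divide evenly (so the caller can be asked for an exact number) is an assumption, not a description of how our parser resolves them.

```python
from dataclasses import dataclass, replace
from fractions import Fraction


@dataclass(frozen=True)
class LineItem:
    name: str
    quantity: int
    modifiers: tuple[str, ...] = ()


def resolve_fraction(quantity: int, fraction: Fraction) -> int:
    """Turn "half of those" into a count; refuse to guess on uneven splits."""
    count = quantity * fraction
    if count.denominator != 1:
        raise ValueError("uneven split: ask the caller for an exact number")
    return int(count)


def apply_scoped_modifier(item: LineItem, modifier: str, count: int) -> list[LineItem]:
    """Apply a modifier to `count` units of an item, leaving the rest as-is."""
    if not 0 < count <= item.quantity:
        raise ValueError(f"count must be between 1 and {item.quantity}")
    modified = replace(item, quantity=count, modifiers=item.modifiers + (modifier,))
    if count == item.quantity:
        return [modified]
    untouched = replace(item, quantity=item.quantity - count)
    return [untouched, modified]


sandwiches = LineItem("chicken sandwich", 20)
half = resolve_fraction(sandwiches.quantity, Fraction(1, 2))
split = apply_scoped_modifier(sandwiches, "no pickle", half)
```

After the split, the order holds ten plain sandwiches and ten with no pickle, and the total is still twenty. Keeping the total invariant is the property a single-item modifier model cannot guarantee.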

Third, we now automatically route all catering orders above a certain dollar threshold to a human review queue before sending them to the kitchen or generating a confirmation. The AI collects the order and structures it, but a human does the final confirmation call to the customer. This adds friction but eliminates the high-stakes error scenario.
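
The routing rule itself is simple. In the sketch below the threshold value is a placeholder (we have not stated a figure here, and operators would set their own), and `Route` and `route_order` are illustrative names.

```python
from decimal import Decimal
from enum import Enum

# Placeholder value for illustration only; not a figure from this post.
EXAMPLE_REVIEW_THRESHOLD = Decimal("500.00")


class Route(Enum):
    KITCHEN = "kitchen"
    HUMAN_REVIEW = "human_review"


def route_order(
    order_total: Decimal,
    is_catering: bool,
    threshold: Decimal = EXAMPLE_REVIEW_THRESHOLD,
) -> Route:
    """Hold catering orders above the threshold for a human confirmation call."""
    if is_catering and order_total > threshold:
        return Route.HUMAN_REVIEW
    return Route.KITCHEN


large = route_order(Decimal("2000.00"), is_catering=True)
small = route_order(Decimal("120.00"), is_catering=True)
```

With the placeholder threshold, the $2,000 order is held for review and the $120 order goes straight through. Using `Decimal` rather than floats avoids rounding surprises when totals sit right at the threshold.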

What we have not solved is the negotiation layer. When a catering caller asks about bulk pricing that isn't formalized in the menu, or wants to substitute a menu item that doesn't have a configured alternative, the AI correctly escalates to a human. We're not trying to automate that conversation. The business logic is too operator-specific and the downside of getting it wrong is too high.

The Honest Recommendation for Operators

If catering represents more than 10% of your phone order volume, you should think of voice AI as a catering call pre-screener and structured intake system rather than a full-service catering agent. The AI can collect the order information, confirm standard menu items and pricing, and package the order clearly for human review — which saves your staff significant time compared to taking the full call manually. But a human should be in the loop for order confirmation on anything above a threshold you're comfortable with.

If catering is less than 5% of your phone order volume, the current system (catering-mode detection at high item counts, structured read-back, human review above a dollar threshold) handles it reasonably well for the majority of calls. You'll still have occasional complex catering calls that require human intervention, but the system will identify them correctly rather than attempting to handle them and failing silently.

What We're Working On

The longer-term solution we're developing is a catering-specific conversation model that understands the multi-segment structure of a catering call from the start, maintains persistent order state across call pauses and edits, and generates a structured order draft that can be reviewed and confirmed asynchronously rather than in-call. Think of it as the AI taking a complete intake and then sending the customer a structured confirmation they can approve by text before the order is finalized.

We're being transparent that this isn't in production yet. What is in production — the quantity-threshold detection, extended state management, and human review queue — reduces the error rate on catering calls significantly compared to where we were. But we'd rather tell operators clearly where the boundaries are than oversell capabilities that will fail them on a $2,000 catering order.