April 4, 2026 • 8 min read
The most expensive line of code in AI-powered software isn't the one that calls GPT. It's the one that calls GPT when it didn't need to.
The Reflex Every AI Startup Gets Wrong
When you're building an AI-native product, there's an intoxicating moment early on when AI is your hammer and everything feels like a nail. Customer replied? Call AI. Lead went quiet? Call AI. Form submitted? Call AI. Someone sneezed near a keyboard? Believe it or not - call AI.
We did this too. And then we looked at the data from Foto Master - our first customer and the real-world proving ground where we stress-test everything we build.
Our AI engine processes leads every five minutes. Each reasoning call costs a credit. Even half-a-cent API calls add up to real money - and more importantly, real latency. Every unnecessary AI call is two to four seconds of processing time for a lead that didn't need it.
So we started asking a different question: when can we not call AI and still give the right answer?
The answer turned out to be: surprisingly often.
"Thanks, I'll Take a Look" Is Not a Sales Objection
Here's a scenario we see play out dozens of times a day at Foto Master:
- A rep sends a quote for a product
- The customer replies: "Thanks, I'll review it and get back to you"
- Our system detects a new inbound message and queues the lead for AI processing
Should we burn a credit to have GPT analyse that reply? Absolutely not. A human can tell in half a second that "Thanks, I'll review it" means wait. So can a few lines of code.
The logic is simple. We maintain two lists: acknowledgment phrases and explicit ask phrases.
Acknowledgment phrases include things like "thanks," "got it," "sounds good," "I'll review," "take a look," "discuss internally," "get back to you," and "think about it." These are the signals that someone received your message and is processing it. They're not asking for anything. They're telling you to hold.
Explicit ask phrases include question marks, "can you," "please send," "what," "when," "how," and "need." These are the signals that someone wants something from you right now.
If the customer's latest reply matches an acknowledgment phrase without containing an explicit ask, the system classifies it as "acknowledged, reviewing" - no AI call needed. The lead gets snoozed to Future Tasks with a stage-appropriate follow-up timer. Twenty-four hours for demos. Forty-eight for negotiations. Seventy-two for proposals.
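A minimal sketch of the check, in Python. The phrase lists and snooze values here are illustrative shorthand for what the article describes; the production lists are longer and tuned to real traffic:

```python
import re

# Illustrative phrase lists - the real ones are longer.
ACK_PHRASES = [
    "thanks", "got it", "sounds good", "i'll review",
    "take a look", "discuss internally", "get back to you", "think about it",
]
ASK_PATTERNS = [
    r"\?", r"\bcan you\b", r"\bplease send\b",
    r"\bwhat\b", r"\bwhen\b", r"\bhow\b", r"\bneed\b",
]

# Stage-appropriate follow-up timers, in hours (72h default).
SNOOZE_HOURS = {"demo": 24, "negotiation": 48, "proposal": 72}

def classify_reply(text: str, stage: str):
    """Return ('acknowledged', snooze_hours) or ('needs_ai', None)."""
    lowered = text.lower()
    has_ack = any(phrase in lowered for phrase in ACK_PHRASES)
    has_ask = any(re.search(pattern, lowered) for pattern in ASK_PATTERNS)
    if has_ack and not has_ask:
        return "acknowledged", SNOOZE_HOURS.get(stage, 72)
    return "needs_ai", None
```

The ask check deliberately wins ties: "Thanks - can you send pricing?" contains an acknowledgment phrase, but the explicit ask routes it to AI anyway.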
This single check eliminates roughly 20-30% of AI calls at Foto Master. "Sounds good, let me review" and its variants are the single most common kind of reply we see - and not one of them needs GPT to tell you the customer is reviewing.
The Subtle Part: Reading Only What They Actually Wrote
There's a trap here that's easy to miss. When someone replies "Thanks, I'll take a look," their email client appends the entire previous thread below the signature. If you run pattern matching against the full email body, you'll find question marks and "can you" phrases in the quoted history - from messages the rep sent days ago.
So before any matching happens, we strip the quoted thread. We look for markers like "On [date] [person] wrote:" or lines starting with >, and we cut everything below them. Only the customer's actual new text gets analysed.
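The stripping step fits in a few lines. A sketch, with illustrative marker patterns - real email clients vary widely, so a production list needs more of them:

```python
import re

# Illustrative quote markers; production needs more variants.
QUOTE_MARKERS = [
    re.compile(r"^On .+ wrote:$"),                           # "On Mon, Apr 1 ... wrote:"
    re.compile(r"^>"),                                       # classic quoted-line prefix
    re.compile(r"^-+\s*Original Message\s*-+$", re.IGNORECASE),
]

def strip_quoted_thread(body: str) -> str:
    """Keep only the customer's new text above the first quote marker."""
    kept = []
    for line in body.splitlines():
        if any(marker.match(line.strip()) for marker in QUOTE_MARKERS):
            break  # everything below is quoted history
        kept.append(line)
    return "\n".join(kept).strip()
```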
This is the kind of thing that takes five minutes to build and saves you from a category of false negatives that would silently erode trust in the entire system.
If the Threads Haven't Changed, Neither Has the Answer
Every lead in our system has email threads associated with it. The AI engine reads these threads to generate reasoning - classification, urgency, suggested actions, draft replies.
But here's the thing: most of the time, when the engine picks up a lead, nothing has changed since the last run. The same threads, the same messages, the same content. Calling GPT with identical input will produce a near-identical output. So why pay for it?
We built a thread fingerprinting system. Gmail assigns a historyId to each thread that increments whenever a new message arrives. Before making an AI call, we compute a fingerprint from every thread ID paired with its history ID, sort them, and hash the result.
If the fingerprint matches what we stored from the lead's last AI run, and that run was less than twenty-four hours ago, we skip the call entirely and reuse the previous result. Same input, same output - zero latency, zero cost.
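A sketch of the fingerprinting, assuming the lead's threads are available as a mapping of thread ID to historyId. Sorting before hashing makes the fingerprint stable regardless of the order threads come back in:

```python
import hashlib

def thread_fingerprint(threads: dict) -> str:
    """Hash of (thread_id, history_id) pairs, order-independent."""
    parts = sorted(f"{tid}:{hid}" for tid, hid in threads.items())
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

def should_skip_ai(threads, last_fingerprint, hours_since_last_run):
    """Same fingerprint within 24h means the cached reasoning still holds."""
    return (thread_fingerprint(threads) == last_fingerprint
            and hours_since_last_run < 24)
```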
This layer catches another 15-25% of would-be AI calls, particularly for leads in slower-moving stages where threads don't change daily. And because we check the hash before we even fetch the full thread content from Gmail, we also save the API calls to pull those threads.
CRM Webhooks Are Not Customer Inquiries
Foto Master connects external CRMs and lead sources that fire webhooks on field changes - stage updates, owner reassignments, call logs. These arrive as "form submissions" in our system.
Early on, the AI engine would dutifully analyse these as if they were fresh buyer inquiries. GPT would look at a payload containing fields like "stage: closed-won" and "owner: Sarah" and try to compose a sales email to someone who'd already placed their order. Not great.
The fix is a pre-AI filter that recognises CRM lifecycle events by their field signatures. If a submission contains two or more CRM lifecycle fields - things like assigned_to, pipeline, call_ended, stage, or owner - it's flagged as a relay, not a lead. No AI call.
If the submission contains a terminal stage value like "closed-won" or "lost," same thing. This isn't a buyer reaching out. It's your CRM telling you a deal already ended.
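A sketch of the filter. The field and stage names are the examples from above; a real deployment tunes both sets to the CRMs it actually integrates:

```python
# Illustrative field signatures - tune to the CRMs you integrate.
CRM_LIFECYCLE_FIELDS = {"assigned_to", "pipeline", "call_ended", "stage", "owner"}
TERMINAL_STAGES = {"closed-won", "closed-lost", "lost"}

def is_crm_relay(submission: dict) -> bool:
    """True if this 'form submission' is a CRM lifecycle event, not a buyer."""
    lifecycle_hits = CRM_LIFECYCLE_FIELDS & set(submission)
    if len(lifecycle_hits) >= 2:
        return True  # two or more lifecycle fields: a relay, not a lead
    stage = str(submission.get("stage", "")).lower()
    return stage in TERMINAL_STAGES  # a deal that already ended
```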
This is a smaller percentage of total calls, but it prevents the most embarrassing AI outputs - the kind where your tool drafts a cold outreach email to a client whose deal closed last week.
Auto-Replies and Bounces Don't Need Analysis
When a rep sends an email and the recipient's server responds with "I'm out of office until Monday" or "This mailbox is no longer monitored," that generates an inbound message in Gmail. Our system sees it as activity on the thread and queues the lead.
But an out-of-office reply after an outbound message is not a customer interaction. It's a system message. We detect this by checking the message metadata: if the latest system message is an auto-reply that arrived after the rep's last outbound message, the lead skips AI reasoning entirely and goes straight into a "waiting for reply" state with an appropriate snooze timer.
Bounces are handled similarly, but more permanently. A bounced email means there's no inbox on the other end. The lead gets flagged with exclude_from_ai permanently, because there's no one to sell to. No point spending credits on a dead address.
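Roughly, the routing looks like this. A sketch: the `Message` fields stand in for real metadata checks (auto-replies are typically flagged by headers such as `Auto-Submitted`, bounces by a mailer-daemon sender):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Message:
    sent_at: datetime
    direction: str        # "inbound" or "outbound"
    is_auto_reply: bool   # e.g. from an Auto-Submitted header
    is_bounce: bool       # e.g. from a mailer-daemon sender

def route_system_messages(messages, lead: dict) -> str:
    """Handle auto-replies and bounces deterministically; else call AI."""
    latest = max(messages, key=lambda m: m.sent_at)
    last_outbound = max((m for m in messages if m.direction == "outbound"),
                        key=lambda m: m.sent_at, default=None)
    if latest.is_bounce:
        lead["exclude_from_ai"] = True  # dead address - stop spending credits
        return "retired"
    if (latest.is_auto_reply and last_outbound
            and latest.sent_at > last_outbound.sent_at):
        return "waiting_for_reply"      # snooze, no AI call
    return "needs_ai"
```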
Deterministic Follow-Up Timing
This is the one that surprises people most.
When a Foto Master rep completes a task and the outcome is "waiting on customer," the system needs to decide: when should we follow up? The intuitive approach is to ask AI. It knows the context, the deal stage, the customer's communication patterns. Surely it can pick the optimal timing.
We tried that. It cost a credit per scheduling decision. The AI's answers were... fine. But looking at the pattern across Foto Master's leads, they clustered tightly around predictable values. Demos got one-day follow-ups. Proposals got three-day follow-ups. Negotiations got two-day follow-ups. The AI was basically implementing a lookup table, but charging us for the privilege.
So we replaced it with an actual lookup table.
Consultation stage? Follow up in twenty-four hours. Demo? Twenty-four hours. Negotiation? Forty-eight hours. Proposal? Seventy-two hours. Everything else? Default to seventy-two hours.
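As a sketch, the entire "scheduling decision" collapses to a dictionary lookup with a 72-hour default:

```python
# Deterministic follow-up timing by deal stage, in hours.
FOLLOW_UP_HOURS = {"consultation": 24, "demo": 24, "negotiation": 48, "proposal": 72}

def follow_up_hours(stage: str) -> int:
    return FOLLOW_UP_HOURS.get(stage.lower(), 72)
```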
Zero API calls. The follow-up timing is now deterministic, instant, and free. And because the timing feeds into a timezone-aware scheduling system that respects rep availability windows, the delivered result is actually more precise than what the AI was producing. The follow-up lands during working hours in the rep's timezone, not at 3 AM because GPT said "follow up in 2.5 days."
The Compound Effect
Each layer on its own is a modest optimisation. Together, they're transformative.
Here's what our AI engine does before it even considers calling OpenAI:
- Is this a CRM webhook, not a customer inquiry? Skip.
- Is the latest reply an acknowledgment with no explicit ask? Snooze, skip.
- Is the latest message an auto-reply or bounce? Handle deterministically, skip.
- Have the email threads changed since the last AI run? If not, use cache, skip.
- Is the follow-up timing the only decision needed? Use lookup table, skip.
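The gate sequence itself can be sketched as an ordered list of predicates checked before any model call. The flag names on the lead dict here are hypothetical stand-ins for the real checks described above:

```python
def run_gates(lead: dict, gates) -> str:
    """Apply deterministic gates in order; first match wins, else call AI."""
    for name, predicate in gates:
        if predicate(lead):
            return f"skip:{name}"
    return "call_ai"

# Hypothetical gate order, mirroring the checklist above.
GATES = [
    ("crm_relay",      lambda lead: lead.get("is_crm_relay", False)),
    ("acknowledgment", lambda lead: lead.get("is_ack", False)),
    ("system_message", lambda lead: lead.get("is_auto_reply_or_bounce", False)),
    ("cache_hit",      lambda lead: lead.get("fingerprint_unchanged", False)),
    ("timing_only",    lambda lead: lead.get("timing_only", False)),
]
```

Ordering matters: the cheapest and most decisive checks run first, so a CRM relay never even reaches the fingerprint comparison.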
Only after passing through all five gates does a lead reach the AI model. In practice, based on what we observe at Foto Master, 40-60% of leads that would have triggered an AI call don't. That's the difference between a sustainable cost structure and a margin crisis.
But the benefits go beyond cost:
Latency. A pattern match takes less than a millisecond. An API call takes two to four seconds. Leads that hit cache or deterministic paths get their tasks updated instantly. When a customer sends "sounds good," the system has already snoozed the lead and scheduled the follow-up before the rep finishes reading the email.

Reliability. No API means no rate limits, no timeouts, no model degradation. The deterministic paths never go down. They never hallucinate. They never return a confident-sounding wrong answer. They just work.

Predictability. When a Foto Master rep asks "why did the system snooze this lead?", the answer is "because the customer said 'I'll review it' and the deal is in proposal stage - standard 72-hour follow-up." That's more trustworthy than "the AI decided." You can explain it. You can audit it. You can predict what it will do next. Try saying that about GPT.

When to Call AI (And When You're Just Showing Off)
After building these layers, we developed a simple heuristic for deciding whether a decision needs AI:
Call AI when the answer depends on meaning. What does this email thread imply about the customer's intent? What action would a skilled rep take here? What product is this person interested in based on the conversation? These require understanding context, nuance, and domain knowledge. Language models are genuinely good at this.

Don't call AI when the answer depends on structure. Is this an acknowledgment or a question? Has the data changed? What stage is this deal in? How many hours until the next follow-up? These have deterministic answers derivable from the data itself. A pattern match, a hash comparison, or a lookup table will give you the same answer - faster, cheaper, and more reliably.

The temptation in AI-native products is to route everything through the model because it can handle everything. But "can" and "should" are different questions. A language model that analyses "Thanks, I'll review it" and concludes "the customer is in a review phase, recommend following up in 72 hours" is doing a lookup table's job at language-model prices. You're paying for the wrapper, not the insight.
The Counterintuitive Takeaway
The smartest AI system we've built is one that calls AI less.
Not because AI isn't valuable - it absolutely is. Our reasoning engine, when it runs, generates genuinely useful action recommendations, draft replies, and pipeline classifications that sales reps love. The value per AI call is high precisely because we've filtered out the noise. Every credit spent goes toward a lead that actually needs intelligent analysis.
The AI calls that remain are the hard ones. The ambiguous emails. The leads with complex thread histories. The deals where the customer's intent isn't obvious from surface-level patterns. These are the decisions where a language model earns its keep.
Everything else? Pattern matching, hash comparisons, and lookup tables. Boring, fast, free, and correct.
Sometimes the best thing your AI can do is nothing at all.
This is part of a series on building AI-native sales software. Previously: AI Credits: The Greatest Markup Since Movie Theater Popcorn - why your CRM charges 400x the actual cost of an AI call.