Engineering · 2026-06-02 · 7 min

Our transactional email said it sent for weeks — and delivered nothing

John C. Thomas

Founder, BlueWave Projects

Here is a failure mode that is almost worse than an outage: software that reports success while doing nothing.

For a few weeks, one of our products sent transactional email that never arrived. Welcome emails, billing receipts, form notifications — the API returned success every single time, the logs were clean, and not one message reached an inbox. We only caught it because someone asked, "hey, did that test email come through?" It had not. None of them had.

This is the post-mortem, because the root cause is a trap any team on a modern email API can fall into.

The symptom

The endpoint that sends a notification returned HTTP 200. The send helper wrapped the provider call in a try/catch, did not throw, and returned an ok response. Everything downstream saw success — the dashboard logged the event, the user got a friendly "message received" screen, and the email went nowhere.

A silent failure is worse than a loud one. A 500 gets investigated within the hour. A cheerful 200 that does nothing can run for weeks.

The root cause: an unverified sender domain

We send through Resend, but the lesson generalizes to SES, Postmark, SendGrid, or any other provider that verifies sending domains. To send from an address, the provider has to have verified that you own its domain. We had verified a subdomain dedicated to sending, mail.ourdomain.com. But an environment variable pointed the from address at the bare apex, an address at ourdomain.com.

The provider had no verification record for the apex, so it rejected every send with a 403: the ourdomain.com domain is not verified.
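
A mismatch like this can be caught before any send is attempted. The following is a minimal sketch, not our production code: the function names and the `hello@` local part are hypothetical, and the exact-match rule is a conservative assumption, since providers may differ on how they treat subdomains of a verified domain.

```typescript
// Extract the domain from a from-address such as
// "Acme <hello@mail.ourdomain.com>" or "hello@mail.ourdomain.com".
function senderDomain(from: string): string {
  const match = from.match(/<([^<>]+)>\s*$/);
  const address = (match?.[1] ?? from).trim();
  const at = address.lastIndexOf("@");
  if (at < 1 || at === address.length - 1) {
    throw new Error(`Cannot parse a domain from sender "${from}"`);
  }
  return address.slice(at + 1).toLowerCase();
}

// Exact match on purpose: verifying mail.ourdomain.com says nothing
// about the apex ourdomain.com, which is the trap described above.
function isVerifiedSender(
  from: string,
  verifiedDomains: readonly string[],
): boolean {
  const domain = senderDomain(from);
  return verifiedDomains.some((d) => d.toLowerCase() === domain);
}

// Intended to run once at startup so a bad variable fails loudly.
function assertVerifiedSender(
  from: string,
  verifiedDomains: readonly string[],
): void {
  if (!isVerifiedSender(from, verifiedDomains)) {
    throw new Error(
      `Sender domain "${senderDomain(from)}" is not in the verified list`,
    );
  }
}
```

Calling `assertVerifiedSender` at boot with the configured from address and the list of domains verified with the provider would turn this class of typo into a startup failure rather than a runtime rejection.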

Here is the part that turned a one-line config typo into a weeks-long silent outage: the provider returned the rejection as a value, not an exception. Modern SDKs increasingly hand back an object shaped like data-or-error instead of throwing. Our code awaited the send and returned success without ever inspecting the error field. A thrown error would have hit our catch and shown up in the logs. A returned error sailed straight through.

So: unverified domain, provider returns an error object, code ignores the error object, success response, silent failure across every email path in the product.
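
That chain is easier to see in code. This is a simplified reproduction, not our actual helper: `fakeProviderSend` is a stand-in for an SDK call, the result type is an assumed data-or-error shape, and the `hello@` address is hypothetical.

```typescript
// Assumed data-or-error shape: exactly one of the two fields is set.
type SendError = { statusCode: number; message: string };
type SendResult =
  | { data: { id: string }; error: null }
  | { data: null; error: SendError };

// Stand-in for a provider SDK call. It never throws on a rejection;
// it resolves with an error value instead.
async function fakeProviderSend(from: string): Promise<SendResult> {
  const domain = from.slice(from.lastIndexOf("@") + 1);
  if (domain !== "mail.ourdomain.com") {
    return {
      data: null,
      error: {
        statusCode: 403,
        message: "The ourdomain.com domain is not verified",
      },
    };
  }
  return { data: { id: "msg_123" }, error: null };
}

// The buggy shape: the result is awaited and then discarded.
async function naiveNotify(from: string): Promise<{ ok: boolean }> {
  try {
    await fakeProviderSend(from); // error field is never inspected
    return { ok: true };
  } catch {
    // Only reached if the call throws, for example on a network failure.
    return { ok: false };
  }
}
```

With these definitions, `naiveNotify("hello@ourdomain.com")` resolves to `{ ok: true }` even though the stand-in provider returned a 403, because nothing in the `try` block ever throws.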

Why "it worked in dev" did not save us

In development, sends are a no-op — no API key, or a dry-run flag. Nobody ever saw a real rejection locally. The first real send happened in production, where the failure was invisible by construction. The gap between "the code ran" and "the mail arrived" was exactly the gap we had no test for.

The fix, in two parts

First, point the sender at the verified domain. One environment change moved the from address to the verified subdomain, mail.ourdomain.com. We proved it with a direct API call before trusting anything: sending from the apex returned a 403, sending from the verified subdomain returned a real message id. That five-minute check is the entire difference between "I think it works" and "it works."

Second, stop ignoring the provider's error value. The send helper now inspects the returned error and treats it the same as a thrown one — it logs, surfaces, and on the paths that matter, fails loudly instead of returning a cheerful 200. We also hardened the code's default from address, which used to point at a domain we did not even control; now the fallback is the verified sending domain, so a missing variable degrades to "still works" instead of "silently 403s."
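
Both halves of that change are small. The sketch below shows one way to write them, under assumptions: the result type mirrors a generic data-or-error SDK response, and `EmailSendError`, `unwrap`, `resolveFrom`, the `EMAIL_FROM` variable name, and the `notifications@` address are all hypothetical names chosen for illustration.

```typescript
type SendError = { statusCode: number; message: string };
type SendResult<T> = { data: T | null; error: SendError | null };

class EmailSendError extends Error {
  readonly statusCode: number;
  constructor(error: SendError) {
    super(`Email provider rejected the send (${error.statusCode}): ${error.message}`);
    this.name = "EmailSendError";
    this.statusCode = error.statusCode;
  }
}

// Convert a returned error into a thrown one, so it reaches the same
// catch block (and the same logs) as any other failure.
function unwrap<T>(result: SendResult<T>): T {
  if (result.error !== null) {
    throw new EmailSendError(result.error);
  }
  if (result.data === null) {
    throw new Error("Email provider returned neither data nor error");
  }
  return result.data;
}

// Safe fallback: an address on the verified sending domain (hypothetical).
const DEFAULT_FROM = "notifications@mail.ourdomain.com";

// A missing or blank variable degrades to the verified default.
function resolveFrom(env: Record<string, string | undefined>): string {
  const configured = env["EMAIL_FROM"]?.trim();
  return configured ? configured : DEFAULT_FROM;
}
```

In a send helper, wrapping the awaited provider call in `unwrap(...)` inside the existing try/catch is enough for a returned 403 to be logged and surfaced like a thrown error.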

Send from a subdomain on purpose

There is a second lesson hiding in the first. People reach for the clean apex address because it looks nicer in a from-line. For sending, a dedicated subdomain like mail.brand.com is the better practice, not a downgrade:

  • It isolates your sending reputation from your root domain. If a campaign ever gets you flagged, it does not poison the domain your customers type into a browser.
  • The provider's required DNS records — the DKIM key and a return-path — live on dedicated subdomains and do not collide with your apex's existing SPF or inbound mail routing.
  • DMARC still aligns. Under relaxed alignment, the DKIM signing domain and the from domain only need to share the same organizational (root) domain, so mail from the subdomain passes.
  • The apex is not a spam-filter win. Alignment, sender reputation, and engagement are what land you in the inbox. Send from the subdomain and keep the apex clean.
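
The alignment point in the list above can be made concrete. This is an illustrative sketch only: real DMARC evaluators derive the organizational domain from the Public Suffix List, while `naiveOrgDomain` here simply takes the last two labels, which is wrong for suffixes such as co.uk. The function names are hypothetical.

```typescript
// Lowercase and drop a trailing dot so comparisons are consistent.
function normalizeDomain(domain: string): string {
  return domain.toLowerCase().replace(/\.$/, "");
}

// ASSUMPTION: last two labels stand in for the organizational domain.
// A real implementation would consult the Public Suffix List.
function naiveOrgDomain(domain: string): string {
  return normalizeDomain(domain).split(".").slice(-2).join(".");
}

// Strict alignment needs an exact domain match; relaxed alignment only
// needs both domains to share an organizational domain.
function isAligned(
  fromDomain: string,
  dkimDomain: string,
  mode: "relaxed" | "strict",
): boolean {
  if (mode === "strict") {
    return normalizeDomain(fromDomain) === normalizeDomain(dkimDomain);
  }
  return naiveOrgDomain(fromDomain) === naiveOrgDomain(dkimDomain);
}
```

Under these simplifications, a from domain of mail.brand.com and a signing domain of brand.com align in relaxed mode but not in strict mode.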

What I would tell another team

  • A 200 is not a delivery. Verify the outcome, not the call. For email, that means at least one real send to a real inbox through the verified domain after every change to the sender or its domain.
  • Check the value, not just the exception. If your SDK returns an error field instead of throwing, an unchecked error is a silent failure waiting to happen. Give returned errors the same weight as thrown ones.
  • Make the fallback safe. Hardcoded defaults should point at infrastructure you actually control. A default that 403s is a landmine for the day someone forgets the variable.
  • Test the gap your dev environment hides. If sends are mocked locally, your only real test is in production — so build that test deliberately instead of discovering it by accident.
The bug was a one-line config mistake. The damage was multiplied by an API shape that fails by returning instead of throwing, and a code path that trusted the call instead of the outcome. We run a lot of small products on a small team, and the discipline that came out of this (verify the outcome, never the 200) is now wired into every send path we ship.

If you want the kind of team that finds the silent failures before your customers do, [come say hello](https://bluewaveprojects.com/booking).
