Validating Addresses in an Unbounded Namespace

by J.D. Falk
Director of Product Strategy

Earlier this week, we wrote about the expansion of top-level domain names, and the decreasing importance of domain names to users looking for web content.

End users don’t always realize it, but accurate domain names matter to them when it comes to their email addresses, and they matter just as much to anyone who collects email addresses, for any purpose.

Mistyped email addresses can have far-reaching consequences. One of the canonical examples is the story of Nadine, where someone input the wrong address when signing up at a sweepstakes site in 2001 — and the owner of the domain name still receives more than 70 spam messages addressed to her every day.

Closer to home, there’s somebody out there named James Falk (no relation to me) who occasionally gives sites my email address, which I’ve had since 1996 or ’97. One of those legal document email sites (a competitor to our partner RPost) even sent me what appeared to be the lease to his new house! There was no confirmation (or “double opt in”) step, no “this is not me” link, no way to unsubscribe. Often, these sites will happily send all sorts of personal information — except, unfortunately, his actual email address so I can inform him of his mistake.

In both of these cases, a user typed in the wrong address at a valid domain. There’s no way to gather statistics, but I’m sure it’s far more common that typos point to entirely invalid domains: yahoo.cmo, or returnpath.nett. These can be caught in software, but it’s still not as easy as it looks.

Consider a regular expression such as:


That would match email addresses at the original six generic top-level domains, or gTLDs. It wouldn’t match two-letter country code TLDs (ccTLDs), but there are hundreds of those, so let’s include them more simply:


For those who can’t read regular expressions, this means: from the start of the line, match any number of any characters, then an @ symbol, then any number of any characters, then a . symbol, then either: one of com, edu, gov, int, mil, net, or org, or two characters — after which the line ends.
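As a rough illustration, the expression described in the previous paragraph could be written in Python something like this. The pattern is reconstructed from the prose description, so it may differ in detail from the one originally shown, and the names `NAIVE_PATTERN` and `looks_plausible` are made up for this sketch.

```python
import re

# Reconstructed from the prose description (an assumption, not the original
# expression): anything, "@", anything, ".", then one of the listed TLDs
# or any two characters, then end of line.
NAIVE_PATTERN = re.compile(r"^.*@.*\.(com|edu|gov|int|mil|net|org|..)$")


def looks_plausible(address: str) -> bool:
    """Hypothetical helper: True if the address passes the naive pattern."""
    return NAIVE_PATTERN.match(address) is not None


print(looks_plausible("someone@yahoo.com"))        # True
print(looks_plausible("someone@example.co.uk"))    # True, via the two-character branch
print(looks_plausible("someone@yahoo.cmo"))        # False, "cmo" is not in the list
print(looks_plausible("someone@returnpath.nett"))  # False, not listed and not two characters
```

Note that the two-character branch accepts any two characters at all, not just real country codes.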

But now there are more gTLDs. If they were all three characters long, it’d be easy — but they’re not, so we’re left with:


And with ICANN poised to add more soon, that list will keep getting longer — requiring constant maintenance just for this deceptively simple and woefully incomplete email address checking algorithm.
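To make the maintenance burden concrete, here is a small Python sketch that builds the same kind of pattern from a hand-maintained list. The list below is illustrative and incomplete, not the expression from this post, and `KNOWN_GTLDS`, `build_pattern`, and the made-up TLD `newtld` are assumptions for the example.

```python
import re

# Illustrative, hand-maintained list (an assumption for this sketch).
KNOWN_GTLDS = [
    "com", "edu", "gov", "int", "mil", "net", "org",
    "aero", "asia", "biz", "cat", "coop", "info", "jobs",
    "mobi", "museum", "name", "pro", "tel", "travel",
]


def build_pattern(tlds):
    """Build the naive pattern from a list of TLDs plus the two-character branch."""
    alternatives = "|".join(re.escape(tld) for tld in tlds)
    return re.compile(r"^.*@.*\.(" + alternatives + r"|..)$", re.IGNORECASE)


pattern = build_pattern(KNOWN_GTLDS)
print(bool(pattern.match("curator@example.museum")))   # True
print(bool(pattern.match("someone@example.newtld")))   # False, not in the list yet

# Every newly delegated TLD means another edit and another deployment.
updated = build_pattern(KNOWN_GTLDS + ["newtld"])
print(bool(updated.match("someone@example.newtld")))   # True
```

Keeping the list in data rather than in the expression itself makes updates less painful, but somebody still has to notice each new TLD and add it.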

Why incomplete? For one thing, it only tells you that the domain might exist, not that it does. It also lets through all sorts of characters which aren’t valid, and has nothing to prevent SQL injection or similar attacks. Seriously, it’s just an example. Don’t use it.
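A quick Python sketch shows how permissive this kind of pattern is. The pattern is my reconstruction from the description earlier, the sample strings are invented, and the `subscribers` table is hypothetical. The last part uses a parameterized query from the standard `sqlite3` module, which is one common way to keep hostile input from being interpreted as SQL; it is not a substitute for real address validation.

```python
import re
import sqlite3

# Reconstructed naive pattern (an assumption, not the original expression).
NAIVE_PATTERN = re.compile(r"^.*@.*\.(com|edu|gov|int|mil|net|org|..)$")

suspicious = [
    "not an address@@..com",                        # spaces, doubled "@" and "."
    "x'); DROP TABLE subscribers; --@example.com",  # SQL metacharacters
    "someone@no-such-domain-registered.com",        # pattern cannot tell if this is registered
]

# All three are accepted by the naive pattern.
for candidate in suspicious:
    print(repr(candidate), bool(NAIVE_PATTERN.match(candidate)))

# Storing the value with a placeholder keeps it as data, never as SQL text.
hostile = suspicious[1]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (address TEXT)")
conn.execute("INSERT INTO subscribers (address) VALUES (?)", (hostile,))
rows = conn.execute("SELECT address FROM subscribers").fetchall()
print(len(rows))  # the table still exists and holds exactly one row
```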

Last year Steve Atkins wrote about the legal components of an email address (which are far more limited than my simple regular expressions here), and gave a list of things to check to make sure an address is valid. His list is a lot longer and more accurate than my example above, and should be proof against the now-unbounded TLD namespace.

I still see web sites from time to time which do their address verification poorly, usually disallowing things that should be allowed and allowing things that could never be valid. Using standard PHP, Perl, or JavaScript address checking libraries helps, but you have to keep them up to date, just as you would if you’d written the check yourself.

And, remember: just because a domain is valid, and the email address is correctly formed, doesn’t mean that there isn’t a spam trap on the other end — or that the recipient, if there is one, wants to receive your email. Luckily, there’s an easy way to ask — and to make sure the address is valid at the same time.
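Assuming the “easy way to ask” is a confirmation message of the kind mentioned earlier (confirmed or “double” opt-in), its core can be sketched in Python as a signed token that only the person who receives the message can return. Everything named here (`SECRET_KEY`, `make_token`, `verify_token`) is hypothetical, and a real system would also need token expiry, storage, and actual mail delivery.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real deployment would load a persistent key.
SECRET_KEY = secrets.token_bytes(32)


def make_token(address: str) -> str:
    """Derive a confirmation token to embed in the link mailed to the address."""
    digest = hmac.new(SECRET_KEY, address.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def verify_token(address: str, token: str) -> bool:
    """True only if the token was issued for this exact address."""
    expected = make_token(address).encode("utf-8")
    return hmac.compare_digest(expected, token.encode("utf-8"))


token = make_token("user@example.com")
print(verify_token("user@example.com", token))   # True
print(verify_token("other@example.com", token))  # False
```

If the link is never clicked, the address is treated as unconfirmed, which covers typos, spam traps, and people who simply don’t want the mail.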

It’s just a shame that invalid addresses can’t be caught as easily in software anymore.
