So, at work today there was some discussion about validating email addresses by regex, because a third-party provider couldn’t accept one of our customers’ email addresses: it contained a Spanish ñ character.

Too many email address regexes assume that only ASCII characters will be present in an address, and thus fall foul of more recent innovations which allow UTF-8 characters.
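To make the problem concrete, here is a minimal Python sketch. Both patterns are my own illustrative assumptions, not the provider's actual regex and not a complete validator; they only show how an ASCII-only character class rejects an ñ while a Unicode-aware one lets it through.

```python
import re

# Hypothetical ASCII-only pattern of the kind often seen in the wild.
ASCII_ONLY = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

# A looser sketch: for str patterns in Python 3, \w also matches
# non-ASCII letters such as ñ. Still nowhere near full validation.
UNICODE_AWARE = re.compile(r"^[\w.%+-]+@[\w.-]+\.\w{2,}$")


def accepts(pattern, address):
    """Return True if the whole address matches the given pattern."""
    return pattern.match(address) is not None


# Made-up example address with a precomposed ñ (U+00F1).
address = "pe\u00f1a@example.com"

print(accepts(ASCII_ONLY, address))
print(accepts(UNICODE_AWARE, address))
```

The first check rejects the address and the second accepts it, which is roughly the situation our customer ran into.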

Mind you, even those that are just checking for ASCII characters arguably fall short of decent validation, as Phil Haack makes clear in his old blog entry here: http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx

Who knew that there were so many potential valid (but effectively unusable) variants on email addresses?

My attention was drawn to the note about Gmail addresses, though: if you are someone@gmail.com, you can give out an email address in the form someone+mytag@gmail.com and mail sent to it will still be routed to you. The advantage is that if you get an email from someone unexpected, the +mytag you set tells you who allowed your email address to be passed on.
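If you wanted to pull the tag back out of such an address, a rough Python sketch might look like this. The function name split_plus_tag is hypothetical, and the sketch assumes the tag is whatever follows the first + in the part before the @.

```python
def split_plus_tag(address):
    """Split a tagged address into (base_address, tag).

    Assumption: the tag is everything after the first '+' in the
    local part. The tag is an empty string if there is no '+'.
    """
    if "@" not in address:
        raise ValueError("not an email address: %r" % address)
    # Split on the last '@' so the domain is always the final piece.
    local, _, domain = address.rpartition("@")
    base, _, tag = local.partition("+")
    return base + "@" + domain, tag


print(split_plus_tag("someone+mytag@gmail.com"))
```

Note that this only works if the address survived validation in the first place, so any regex sitting in front of it has to allow the + character.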