Putting Email Obfuscation to the Test
A few decades ago, when spam was relatively new, spammers would collect email addresses by crawling the web. It isn’t difficult to write a script that downloads a webpage, follows every link it finds, and works through millions of pages a week (this was the late 90s). Along the way, the script would scan each page for email addresses (also a simple task). In this way spammers built long lists of addresses to send spam to. These were the addresses of unwitting students and researchers publishing academic papers, salespeople anxious for leads, and businesses large and small that wanted the public to be able to contact them.
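The scanning step really is simple. Here is a minimal sketch of a harvester’s inner loop: one regular expression run over a page’s raw HTML. The pattern and the `harvestEmails` name are illustrative assumptions. Real harvesters’ patterns aren’t known, and they are surely more thorough than this.

```typescript
// Illustrative, simplified pattern: a local part, "@", then a domain containing at least one dot.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Return each distinct address-like string in the raw HTML, in order of first appearance.
function harvestEmails(html: string): string[] {
  // With the g flag, match() returns every match (or null when there are none).
  const found: string[] = html.match(EMAIL_RE) ?? [];
  // Drop duplicates, e.g. an address that appears both in a mailto: link and as its text.
  return found.filter((addr, i) => found.indexOf(addr) === i);
}

const page =
  '<p>Contact <a href="mailto:sales@example.com">sales@example.com</a> or help@example.org.</p>';
console.log(harvestEmails(page)); // → ["sales@example.com", "help@example.org"]
```

Nothing here executes the page. The harvester only reads the HTML text it downloaded, and that assumption is exactly what the obfuscation methods below try to exploit.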
Ever since, publishing your email address on the internet has been a terrible idea. It is still one of the best ways to get spam.
Over the years, we developed ways of publishing email addresses that would confound those crawlers, and every year the crawlers got smarter and figured them out. So how well do these methods work? It’s not easy to know unless you publish your own email address, which is what I’m doing in this post. Of course I’m not using my personal address; instead I’m using throw-away addresses from free email providers that should be around for years.
The addresses used in these tests look scrambled because they are, in fact, random letters. I didn’t want guessable addresses either, as that would skew the results: spammers don’t need to crawl the web to find addresses made of ordinary words. All they need to do is run through the dictionary, add @gmail.com to the end of every word, and they will probably hit someone’s address. I’m sorry if your email address is aardvark at whatever dot com.
Testing 1 way to hide emails
(More to come)
- The JavaScript Scramble
This is a simple obfuscation in which noise is mixed into the original string. A tiny piece of JavaScript removes the noise and writes the original HTML to the document. It is an older method that relies on the assumption that crawlers only download and parse the HTML and do not run the JavaScript.
The test address for this method has been published on this page since February 26, 2022. Emails received so far: 0 (last updated 2022-03-01).
Advantages: You can scramble complex HTML, and the browser processes all of it as though it had never been scrambled.
Disadvantages: It requires a back-end script (PHP, Ruby, etc.) to scramble the HTML.
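The scramble can be sketched in a few lines. This is a minimal sketch, not the script used on this page: the noise scheme (one random filler character after every real character, so the real HTML sits at the even indices) and the names `NOISE`, `scramble`, and `unscramble` are assumptions for illustration. In practice `scramble` would run on the back end, and only `unscramble` would ship to the browser.

```typescript
// Hypothetical filler alphabet; the noise characters used on this page are not specified.
const NOISE = "qzxjkvwy0123456789";

// Back-end step (PHP, Ruby, etc. in practice; TypeScript here for illustration):
// append one random noise character after every character of the HTML,
// so the original characters end up at the even indices.
function scramble(html: string): string {
  let out = "";
  for (let i = 0; i < html.length; i++) {
    const noise = NOISE[Math.floor(Math.random() * NOISE.length)];
    out += html[i] + noise;
  }
  return out;
}

// Browser step: keep only the even-indexed characters to recover the original HTML.
function unscramble(scrambled: string): string {
  let out = "";
  for (let i = 0; i < scrambled.length; i += 2) {
    out += scrambled[i];
  }
  return out;
}

// Usage: the page would embed only the scrambled string, and a small inline
// script would then call something like document.write(unscramble(payload)).
const original = '<a href="mailto:someone@example.com">someone@example.com</a>';
const payload = scramble(original);
console.log(payload); // differs on every run
console.log(unscramble(payload) === original); // true
```

Interleaving at fixed positions keeps the decoder tiny, which matters because it ships inline with the page. The trade-off is that any crawler that does execute JavaScript gets the original HTML for free.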
It is well known that big crawlers such as Google’s can and do run the JavaScript on the pages they crawl, but Google isn’t trying to send spam. Are the malicious crawlers sophisticated enough to run the JavaScript on this page? Let’s find out.