Spam: once just a tasty lunchtime snack, it's now a curse on your inbox. It follows a long history of unwanted communication that includes junk mail and cold calling. Alexander Graham Bell probably had people cold calling him about his PPI. But where spam differs is that much of what is received is incomprehensible gobbledygook that appears to serve no purpose.
Spam can be divided into two forms: direct and form-based.
Direct spam is the stuff that comes direct to your inbox from another email address. Your email provider will normally push the worst of this to a junk folder.
Form-based spam is where a website acts as a midpoint between the sender and recipient. In such cases automated spam bots traverse the internet looking for forms to drop information into and submit. Unlike with direct spam, a company like Bronco can implement techniques on a website to reduce the amount of spam that is submitted via a form.
The evil Captcha
The Captcha is the technique most people associate with spam prevention, as it is the one most visible to users. When a client receives a lot of spam they may ask specifically for it, often forgetting the number of times they've had bad experiences trying to decipher the answer themselves.
For me a Captcha is a solution of last resort. Even the most capable and internet savvy people have issues with this technique so just imagine the problems those with poor eyesight or blindness have with them; not to mention other disabilities too.
Some Captchas try to address these problems, but what a Captcha is really saying to a user is "We're getting lots of spam we don't want. To stop it we're going to make things harder for you".
And by making things harder you risk damaging a website's conversion rate. This article details how Animoto saw a 33% increase in conversions when they ditched their Captcha.
So what else can be done?
At Bronco we're constantly reviewing how we deal with spam. We understand that even with a Captcha it can be difficult to stop 100% of spam, so instead we look to reduce it to a more manageable amount.
We currently implement a combination of 4 techniques:
Reduce the number of forms
To increase its conversion rate a website may replace Call to Action buttons with the form that button would otherwise be linking to. At Bronco when we used this technique on our own website it led to a large increase in spam.
The increase was due to the fact that a spam bot would submit the form multiple times on multiple pages. These days we include only a single form by default and add additional forms as necessary.
Having limited the number of forms, we work to improve the appearance of the Calls to Action that navigate users to this form. In doing this we don't remove any functionality from the user but do necessitate an extra click.
It's likely that testing would prove additional forms provide a much higher conversion rate but by starting from this position we can always make changes when circumstances or data call for it.
This technique, often called a honeypot, has been around for quite some time, and because of this it is less successful, as spam bots are now built to recognise and work around it.
It works by adding an extra input field to the form and hiding it from the user. On the assumption that a user will never be able to fill in an input box they cannot see, but a spam bot will fill in whatever it finds, we only accept submissions where this field remains blank. As mentioned, though, we have seen more spam get past this technique, either because spam bots are able to determine which fields are hidden or because the input box is named in such a way that the bot is not enticed to enter any information.
Even with our concerns that this technique has become less effective, we still include it, as we do not yet have any accurate data on whether it is successfully blocking spam.
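The hidden-field check described above can be sketched in a few lines of Python. This is only an illustration, not Bronco's actual code: the function name `is_honeypot_tripped` and the field name `website` are assumptions, chosen because a plausible-looking field name is more likely to tempt a bot into filling it in.

```python
from typing import Mapping


def is_honeypot_tripped(form_data: Mapping[str, str],
                        honeypot_field: str = "website") -> bool:
    """Return True when the hidden field holds anything other than whitespace.

    `honeypot_field` is a hypothetical name for the input that is hidden
    from real users; a missing field is treated the same as a blank one.
    """
    return bool(form_data.get(honeypot_field, "").strip())


# A real user never sees the hidden field, so it arrives blank.
human = {"name": "Ann", "email": "ann@example.com", "website": ""}
# A bot that fills in every field it finds gives itself away.
bot = {"name": "x", "email": "x@example.com", "website": "http://example.com"}

print(is_honeypot_tripped(human))
print(is_honeypot_tripped(bot))
```

A form handler would reject any submission where this returns True and carry on as normal otherwise.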
Spam bots are speedy buggers and can fill out a form and submit it in less than a second. Humans, on the other hand, take some time; even on a short form it can take over 5 seconds to read the labels and type the relevant information.
So with this knowledge we can analyse the difference between the time the page loads and the time the form is submitted. In our case, if that time is less than 5 seconds we mark the submission as spam; otherwise we allow it to be submitted.
Even if we do think it's spam we'll still show a relevant error message just in case a user happens to have super-fast fingers. This allows them to understand why their information has not been submitted and try again.
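As a rough sketch of the timing check, assuming the server has recorded when the page was loaded (for example in a session, which is an assumption here rather than something described above), the comparison might look like this in Python. The names `is_too_fast` and `MIN_SECONDS` are hypothetical.

```python
import time
from typing import Optional

MIN_SECONDS = 5.0  # threshold taken from the text: under 5 seconds is treated as spam


def is_too_fast(loaded_at: float,
                submitted_at: Optional[float] = None,
                min_seconds: float = MIN_SECONDS) -> bool:
    """Return True when the form came back sooner than a human plausibly could.

    Both timestamps are seconds since the epoch. `loaded_at` is assumed to
    have been stored server-side when the page was rendered.
    """
    if submitted_at is None:
        submitted_at = time.time()
    return (submitted_at - loaded_at) < min_seconds


if is_too_fast(loaded_at=100.0, submitted_at=100.8):
    # Show an error rather than failing silently, so a genuinely fast
    # typist understands what happened and can try again.
    print("That was very quick! Please wait a moment and submit again.")
```

Keeping the threshold as a parameter makes it easy to relax for very short forms, where a real person might legitimately finish quickly.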
Why send one email when you can send loads; this seems to be the mantra of the spam bot.
When spam bots do submit a form multiple times in quick succession the information submitted is usually exactly the same, especially the IP address. So on each successful submission we store the IP address of a user in a text file to compare against the next submission. If the two IP addresses match, and both submissions occur within a specific time period, we will mark it as spam.
We don't try to stop spam altogether, as that's really difficult; instead we implement techniques to reduce it. This technique will always allow some spam through but works to stop any repetition.
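The repeat-submission check could be sketched as below. It is a minimal illustration under stated assumptions: the file layout (one line holding the IP and a timestamp), the 60-second window and the function name `is_repeat_submission` are all invented for the example, since the text only says that the IP is stored in a text file and compared within "a specific time period".

```python
import time
from pathlib import Path
from typing import Optional


def is_repeat_submission(ip: str,
                         store: Path,
                         window_seconds: float = 60.0,
                         now: Optional[float] = None) -> bool:
    """Return True if `ip` matches the last accepted submission within the window.

    `store` is a small text file holding "<ip> <timestamp>" for the most
    recent accepted submission. It is only rewritten when a submission is
    accepted, mirroring "on each successful submission" in the text.
    """
    now = time.time() if now is None else now
    repeat = False
    if store.exists():
        parts = store.read_text().strip().split(" ")
        if len(parts) == 2:
            last_ip, last_time = parts
            try:
                repeat = last_ip == ip and (now - float(last_time)) <= window_seconds
            except ValueError:
                repeat = False  # unreadable timestamp: treat as no previous record
    if not repeat:
        store.write_text(f"{ip} {now}")
    return repeat
```

Calling this once per submission lets the first message from an address through and marks an immediate duplicate from the same address as spam. A single shared file is only suitable for a low-traffic form; a busier site would need locking or a proper data store.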
Sometimes when you look through the spam you're receiving you begin to see a pattern: maybe the same IP address, or a certain field always containing the same keyword. If you begin to see a pattern then it may be possible to exclude submissions based on it, but you have to remember there are risks. If you have a form submission where the 'Name' field is just random characters that you'd likely never see a real person enter, then excluding based on this is fairly safe. But if that text is an actual name or word then you might find yourself blocking real submissions.
The same is true if you search for sub-strings. In the past, with so much spam containing porn-related terms, many filtered out these types of terms, much to the annoyance of those in the Lincolnshire town of Scunthorpe.
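To illustrate the sub-string risk, here is a small Python sketch comparing a naive sub-string match with a whole-word match. Both function names and the blocked term are assumptions made for the example; whole-word matching reduces false positives of this kind but is still a blunt tool, as real words and names can appear in genuine messages.

```python
import re
from typing import Sequence


def contains_term_substring(text: str, terms: Sequence[str]) -> bool:
    """Naive check: flags the text if a term appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def contains_term_whole_word(text: str, terms: Sequence[str]) -> bool:
    """Stricter check: flags the text only when a term appears as a word on its own."""
    if not terms:
        return False
    pattern = r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


blocked = ["sex"]  # hypothetical blocked term
message = "Hello, I'm writing from Essex about a quote."

print(contains_term_substring(message, blocked))   # genuine enquiry gets flagged
print(contains_term_whole_word(message, blocked))  # genuine enquiry gets through
```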
Blocking by IP can feel a little safer, but with many homes and open networks sharing IPs this can be risky, as you may also block genuine people who just happen to be using an IP you've blocked.
Exclusions are only short term fixes. They block an existing spammer that may have already stopped and may also never return, or at least not with the same details. These techniques are fairly ineffective over the long term.
A never ending battle
So far we've had good success rates with these techniques, with any remaining spam largely coming from humans rather than bots. Unfortunately, blocking humans is much harder, as all the techniques above, including Captchas, assume we're dealing with bots of limited intelligence.
In the future a central resource such as Akismet might be needed for spam protection. Currently used on many blogs, Akismet is able to pool far more data, not only to identify emails that look like spam but also to catch spam that is hitting a lot of websites in a short time period.