CAPTCHA Limitations in Bot Mitigation
An essential part of technological evolution is creating systems, machines and applications that autonomously and independently create, collect and communicate data. This automation frees IT staff to focus on other tasks. Currently, such bots generate more than half of all internet traffic, but unfortunately every evolution brings with it some form of abuse. Various ‘bad’ bots pursue different goals, among them web scraping, web application DDoS and clickjacking. While simple script-based bots are not much of a challenge to detect and block, advanced bots dramatically complicate mitigation with techniques such as mimicking user behavior, using dynamic IP addresses, and operating behind anonymous proxies and CDNs.
CAPTCHA means “Completely Automated Public Turing Test to tell Computers and Humans Apart”.
As bots grow in sophistication, they become a complex threat for many site operators. Solutions on the market today rely on several techniques, each with its own limitations, in particular those based on IP addresses, such as blacklisting. Others add challenges during the handshake, but these can be tricked too. On websites, one of the most popular ways of distinguishing bots from humans is captcha. Unfortunately, it does not deliver the required level of protection compared to the more advanced device fingerprinting approach.
Is Captcha Failing the Test?
Let’s try to look at it from a bot’s point of view. If I am a bot and I want to bypass a fingerprint mechanism, the effort is much greater and more complex than bypassing a captcha. First, captcha requires human intervention and cooperation. We all know how unpleasant captcha can be: sometimes we must answer a question by clicking pictures, by typing a distorted word or number, or even by typing a word heard in an audio clip. Bots are becoming more and more sophisticated and use new technologies to bypass captcha, such as speech-to-text technology that transcribes the word played in the audio clip.
A little over a year ago, Google presented reCAPTCHA. reCAPTCHA is more resilient than the classic captcha; however, it still requires human interaction (to check the “I’m not a robot” box), and it also has limitations in some use cases. reCAPTCHA treats a request as coming from a legitimate user rather than a bot until proven otherwise. Multiple factors are used to identify a bot before the reCAPTCHA is presented (the number of reCAPTCHAs per domain and per browser, simultaneous requests per day, and more), and only then is the reCAPTCHA presented.
The human interaction slows down communication and can lead to false positives, and both contribute to a bad customer experience, which is the last thing a website owner wants.
No User Interaction
The device fingerprint is a transparent mechanism that does not require any user interaction. It can be used early in the process as a detection mechanism rather than only as a mitigation/escalation mechanism during a bot attack. To uniquely identify an attack source (and create a fingerprint), the first step is a JS challenge-response. If the bot does not answer the challenge, the fingerprint mechanism blocks it immediately. If the bot is sophisticated enough to answer the challenge by simulating human behavior, the challenge still collects cues that every user unwittingly provides. These cues are compiled into a unique fingerprint that identifies the source and tracks its activity. All of the cues are very hard to spoof. Only the most sophisticated, high-end bots can reproduce real human behavior such as mouse movement or keyboard typing.
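The challenge-and-fingerprint flow described above can be sketched on the server side roughly as follows. This is a minimal illustration under stated assumptions, not how any particular product works: the names `issue_challenge`, `expected_answer`, `handle_response` and `SERVER_KEY` are hypothetical, the "puzzle" is a stand-in for whatever the client-side script would really compute, and the cues are whatever attributes the script reports.

```python
import hashlib
import hmac
import json
import secrets
from typing import Optional

# Hypothetical per-deployment secret used to sign challenges.
SERVER_KEY = secrets.token_bytes(32)


def _sign(nonce: str) -> str:
    return hmac.new(SERVER_KEY, nonce.encode(), hashlib.sha256).hexdigest()


def issue_challenge():
    """Return (nonce, tag). The tag lets the server verify the nonce later."""
    nonce = secrets.token_hex(16)
    return nonce, _sign(nonce)


def expected_answer(nonce: str) -> str:
    # Stand-in for the computation the client-side script would perform.
    return hashlib.sha256(("solve:" + nonce).encode()).hexdigest()


def fingerprint(cues: dict) -> str:
    """Hash the collected cues in a canonical order so key order is irrelevant."""
    canonical = json.dumps(cues, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def handle_response(nonce: str, tag: str, answer: Optional[str],
                    cues: dict) -> Optional[str]:
    """Return a fingerprint if the challenge was answered, else None (block)."""
    if not hmac.compare_digest(tag.encode(), _sign(nonce).encode()):
        return None  # forged or tampered challenge
    if answer is None:
        return None  # client never ran the script
    if not hmac.compare_digest(answer.encode(), expected_answer(nonce).encode()):
        return None  # wrong answer
    return fingerprint(cues)
```

A client that never answers gets `None` and can be blocked right away; a client that answers is assigned a stable identifier derived from its cues, which later requests can be tracked against.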
Captcha doesn’t challenge the source; it only adds a manual step to pass the security check. As explained above, bots today can easily bypass captcha. A very simple Internet search will turn up websites where anyone can buy tools to bypass it. The transparent device fingerprint tracks sources and fully protects against malicious activity. Source activity tracking that is based only on an IP address usually leads to a high false-positive rate, and since bots can dynamically change their IP addresses, they evade solutions that rely on signatures or other forms of IP-based filtering. The challenge mechanism identifies the source uniquely.
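To make the contrast between IP-based and fingerprint-based tracking concrete, here is a small sketch with made-up data. The request log, the `LIMIT` threshold and the `flag_sources` helper are all hypothetical; the IP addresses come from documentation ranges and the fingerprint labels are placeholders.

```python
from collections import Counter

# Hypothetical request log: (source IP, device fingerprint).
# One bot rotates through four IPs; two humans share one IP (e.g. behind NAT).
REQUESTS = [
    ("203.0.113.5", "fp-bot"),
    ("203.0.113.9", "fp-bot"),
    ("198.51.100.7", "fp-bot"),
    ("198.51.100.20", "fp-bot"),
    ("192.0.2.10", "fp-alice"),
    ("192.0.2.10", "fp-bob"),
]

LIMIT = 3  # assumed threshold: flag any source with this many requests


def flag_sources(requests, limit):
    """Return (flagged IPs, flagged fingerprints) for the same traffic."""
    by_ip = Counter(ip for ip, _ in requests)
    by_fp = Counter(fp for _, fp in requests)
    flagged_ips = {ip for ip, n in by_ip.items() if n >= limit}
    flagged_fps = {fp for fp, n in by_fp.items() if n >= limit}
    return flagged_ips, flagged_fps


flagged_ips, flagged_fps = flag_sources(REQUESTS, LIMIT)
print("flagged by IP:", flagged_ips)
print("flagged by fingerprint:", flagged_fps)
```

In this toy data, counting by IP never sees the rotating bot cross the threshold, while counting by fingerprint attributes all four requests to the same source. Lowering the IP threshold to catch it would instead flag the shared address used by two legitimate users.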
Bot Management Requires Revealing the True Identity
Device fingerprinting will block the vast majority of bots, simply because they cannot respond to the challenge. Bots that pass the fingerprinting mechanism (including ‘good’ bots such as Google’s crawler) are tracked based on their behavioral attributes, so that they can be detected and blocked if they turn out to be engaged in malicious activity. For this purpose, each source (a single HTTP client identified by the fingerprinting algorithm) gets a score according to its conduct. The fingerprint is necessary not only to determine whether a request comes from a human or a bot; it is also required in fraud cases to distinguish a real user from someone using stolen account credentials.
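The per-source scoring idea can be illustrated with a short sketch. The event names, penalty weights, threshold and the `SourceTracker` class below are all assumptions made for illustration; a real system would use its own behavioral signals and tuning.

```python
from collections import defaultdict

# Hypothetical penalty weights per suspicious event, and a blocking threshold.
PENALTIES = {"failed_login": 10, "rate_exceeded": 25, "scraping_pattern": 40}
BLOCK_THRESHOLD = 100


class SourceTracker:
    """Keep a running score per fingerprint, independent of the source IP."""

    def __init__(self):
        self.scores = defaultdict(int)

    def record(self, fingerprint: str, event: str) -> int:
        """Add the penalty for an event (0 if unknown) and return the new score."""
        self.scores[fingerprint] += PENALTIES.get(event, 0)
        return self.scores[fingerprint]

    def is_blocked(self, fingerprint: str) -> bool:
        # .get avoids creating entries for sources that were never scored.
        return self.scores.get(fingerprint, 0) >= BLOCK_THRESHOLD


tracker = SourceTracker()
for event in ["scraping_pattern", "rate_exceeded", "scraping_pattern"]:
    tracker.record("fp-crawler", event)
tracker.record("fp-user", "failed_login")

print(tracker.is_blocked("fp-crawler"))  # score 105, at or above the threshold
print(tracker.is_blocked("fp-user"))     # score 10, below the threshold
```

Because the score is keyed by fingerprint rather than by address, a source that passed the initial challenge keeps accumulating penalties even if it changes IP between requests.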
A combination of challenging the bot, device fingerprinting and activity tracking provides successful bot management and accurate bot-attack mitigation.
Device fingerprinting allows unique source identification, which contributes to detection accuracy. It helps mitigate multiple attack types: HTTP parsing attacks, SQL injection, XSS, brute force, session hijacking, and Layer 7 DDoS attacks such as Slowloris.