Blocking Bot Traffic and Excluding It in Reports

Websites around the world receive different kinds of bot traffic every day. From a business perspective, it is important to understand that bot traffic can harm your company’s website and make your reporting and analytics inaccurate.

What is bot traffic?

Bot traffic is automated web traffic generated by bots, or computer programs and scripts. These bots can be malicious or non-malicious depending on their creator or user. They can find your website and visit it just to crawl it, scrape content off it, post spam comments, commit ad fraud, or even carry out cyberattacks such as credential stuffing or distributed denial of service (DDoS) attacks. Bots can be administered from a single computer or from a large network of computers such as a botnet (often made up of hacked computers). Bots can disguise themselves pretty well by pretending to be humans, and they can even route their traffic through residential IP addresses and anonymizing proxies.

Bot traffic is huge. Some estimates put it on par with natural human-generated traffic. There is also confusion and overlap between bot traffic and dark traffic. Dark traffic is web traffic that arrives in significant volumes without any attribution. It shows up in your analytics as “Direct” but leaves you thinking – there can’t be that many people who have typed the URL into their browsers. Dark traffic comes mostly from the deep web, protected by privacy tools and stripped of any information on the referral source, and it is a growing concern for digital marketing.

How can bot traffic be identified?

In general, non-malicious bots are easily identified through certain behaviours and fingerprints. For instance, some of them don’t load JavaScript or don’t maintain a session cookie, and most of them happily identify themselves as bots in their User Agent information and obey the robots.txt instructions you have set up on your website. Examples include search engine crawlers. You don’t have to worry about these bots because they are friendly, and they are either excluded by default or very easy to exclude or block if you wish.
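
For example, friendly crawlers will respect directives placed in a robots.txt file at the root of your site. A minimal illustration (the path and bot name here are hypothetical placeholders):

    # robots.txt – honoured by well-behaved bots such as search engine crawlers
    User-agent: *               # applies to all bots
    Disallow: /admin/           # hypothetical area you don't want crawled
    Crawl-delay: 10             # ask bots to pause between requests (not all crawlers honour this)

    User-agent: ExampleBadBot   # hypothetical bot you want to keep out entirely
    Disallow: /

Keep in mind that malicious bots simply ignore robots.txt, which is exactly why it only works for the friendly ones.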

Malicious bots on the other hand try to mimic human behaviour, and they continue to evolve and become more sophisticated. On a basic level they can pose as a human using a specific browser mentioned in the User Agent request header, connecting through a home IP address, and having a normal session with a random duration spanning multiple pages. On a more sophisticated level they can fake page scrolls, video views, and even mouse movements. This type of bot traffic is not easy to identify.

The Interactive Advertising Bureau (IAB) maintains an international list of known spiders and bots, mostly crawlers and commercial bots, and this list is used by major analytics software such as Google Analytics, Adobe Analytics, and HubSpot Analytics. As a result, all these tools can take advantage of the list to help you exclude bot traffic from your reporting. This takes care of a significant chunk of bot traffic, but certainly not all of it. The IAB list is based mainly on User Agent information. The IAB does not keep track of IP addresses, and cybersecurity intelligence to detect malicious bots is not part of its line of business.

Can malicious bots be blocked?

There are numerous solutions dedicated to protecting businesses from bots, and they continue to get more powerful and sophisticated as they leverage artificial intelligence.

One simple action you can take today to protect your website from malicious bot attacks is to use CloudFlare and take advantage of its DDoS attack mitigation and its Web Application Firewall (WAF), which protects against certain targeted attacks.
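
As a rough sketch, a custom rule in CloudFlare’s firewall matches requests with an expression, and the action (block, challenge, etc.) is chosen separately when you create the rule. Something along these lines could challenge obvious scripted clients while leaving CloudFlare’s verified good bots alone (the user agent strings are only examples):

    (http.user_agent contains "python-requests" or http.user_agent contains "curl") and not cf.client.bot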

Depending on how serious the situation is to your business you can go further and invest in dedicated solutions such as CloudFlare Bot Management, DataDome, Imperva, etc.

Why is it important to exclude bot traffic in reporting?

Apart from the obvious fact that bot traffic is meaningless traffic, it can badly skew your numbers. In a lot of cases, bot traffic will fluctuate from one month to another, and it will have a 100% bounce rate and an average session duration of less than a second. For example, if 1,000 genuine sessions with a 40% bounce rate are joined by 500 bot sessions that all bounce, your reported bounce rate jumps to 60%. Imagine what this does to your month-on-month changes, average bounce rate, and average session duration figures.

Excluding bot traffic in analytics and reporting is essential to produce reliable data that you can use in your marketing decisions and activities.

How to exclude bot traffic?

In addition to excluding all known bots listed by the IAB, you can take extra steps to analyze your traffic and identify suspected bots to exclude by applying custom rules and filters. This may not be a very reliable or practical measure, but it can in many cases help you find obvious culprits and take them out.

In Google Analytics

Bot Filtering in Google Analytics is available on the View level. Go to View Settings and check the box that says “Exclude all hits from known bots and spiders”. This would take care of all the bots listed by the IAB.

To identify suspicious traffic, look for patterns and fingerprints such as the following (a small script sketch for flagging these automatically follows the list):

  • Outdated or “Not set” browsers and operating systems, Linux operating system, “0-bit” screen colours, “Not set” screen resolutions, “No” Java support – under Audience > Technology > Browser & OS
  • Unreasonably high traffic from a single service provider with 100% bounce rate and 00:00:00 session duration (e.g. Microsoft Corp) – under Audience > Technology > Network [Unfortunately this is no longer available, as Google dropped the “Service Provider” dimension in early February 2020]
  • Countries you never expect traffic from – under Audience > Geo > Location
  • Unreasonably high traffic from a single source – under Acquisition > All Traffic > Referrals

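If you export this kind of data (for example, a CSV of sessions with their browser, screen, and engagement details), a short script can flag the obvious candidates for review. The sketch below is purely illustrative; the field names are hypothetical and depend on how you export the data:

    // Illustrative sketch: flag sessions that match common bot fingerprints.
    // The field names (browser, operatingSystem, screenColors, screenResolution,
    // bounced, sessionDuration) are hypothetical export columns, not a real API.
    const sessions = [
      { browser: 'Chrome', operatingSystem: 'Windows', screenColors: '24-bit', screenResolution: '1920x1080', bounced: false, sessionDuration: 95 },
      { browser: '(not set)', operatingSystem: '(not set)', screenColors: '0-bit', screenResolution: '(not set)', bounced: true, sessionDuration: 0 },
    ];

    function looksLikeBot(session) {
      const badBrowser = session.browser === '(not set)' || session.operatingSystem === '(not set)';
      const badScreen = session.screenColors === '0-bit' || session.screenResolution === '(not set)';
      const zeroEngagement = session.bounced && session.sessionDuration === 0;
      return (badBrowser || badScreen) && zeroEngagement;
    }

    const suspected = sessions.filter(looksLikeBot);
    console.log('Flagged ' + suspected.length + ' of ' + sessions.length + ' sessions as likely bots');
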
Once you are certain about the bot traffic you want to exclude, take note of the dimensions and values you used to identify it. You can then use filters or segments to exclude the selected traffic. The main difference between a filter and a segment is that a filter will cause Google Analytics to stop counting that traffic completely until the filter is removed, while a segment ensures that Google Analytics counts all the traffic but only displays the relevant data after excluding unwanted traffic. We recommend using segments over filters to avoid losing any data.

  • Filters are added on the account level by going to All Filters under Account in the Admin section. You can use a set of predefined filters from Google such as ISP domain, IP addresses, subdirectories, and host names, or you can create custom filters using any dimension available from Google Analytics.
  • Segments are created on the View level, and they are available in almost every section within Google Analytics, where you’ll see the +Add Segment option at the top next to the default All Users segment.

In Adobe Analytics

Just like Google Analytics, Adobe Analytics will allow you to exclude traffic from IAB’s list of bots as a first line of defence. In your Adobe Analytics head over to the Admin tab and go to Report Suites. In the Report Suite Manager go to Edit Settings > General > Bot Rules and check the box that says “Enable IAB Bot Filtering Rules”.

Adobe Analytics also allows you to set custom rules to filter out bot traffic, but these rules can only be defined using IP addresses or the User Agent request header. It is unlikely that you will use the IP address filters very often, because it would require you to identify the IP addresses of suspected bots. Granted, there is flexibility in using IP addresses in the rules: you can use wildcards (e.g. 52.108.44.*) and you can also specify IP address ranges (e.g. Start: 20.36.32.0 End: 20.36.32.19).

On the other hand, the User Agent filter can be quite useful because you can use a “contains” condition and add words like “bot”, “spider”, and “crawler” to eliminate a lot of bot traffic. This is an advantage for Adobe over Google, which doesn’t import the User Agent request header by default (a workaround using Google Tag Manager is possible by creating a custom JavaScript variable with the navigator.userAgent property).
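
As a sketch of that workaround, a Custom JavaScript variable in Google Tag Manager is just an anonymous function that returns a value; returning the User Agent lets you map it to a Google Analytics custom dimension and build exclusion segments on it:

    // Google Tag Manager Custom JavaScript variable:
    // returns the raw User Agent string so it can be mapped to a
    // custom dimension on your Google Analytics tags.
    function() {
      return navigator.userAgent || '(not set)';
    }

You could then build a segment that excludes sessions where this dimension contains “bot”, “spider”, or “crawler”.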

To go a step further in identifying bot traffic to exclude from reporting, you can use segments based on data dimensions in a similar fashion to Google Analytics. In the Analytics Workspace, look for outdated browsers, “unknown” operating systems and browser versions, Linux operating systems, countries you never expect traffic from, odd hours, etc. You would then apply the exclusion segments in the Virtual Report Suite when producing your reports.

In HubSpot Analytics

When it comes to bot traffic, HubSpot Analytics provides you with basic tools to filter it out of your reports. Just like Google and Adobe, HubSpot allows you to filter out the activity of all known bots. HubSpot doesn’t say its filter is based on the IAB’s list, but it most likely is.

To enable bot filtering in HubSpot head over to your settings and go to Reports > Tracking Code > Advanced Tracking and tick the box that says “Bot filtering”.

In addition, HubSpot will also allow you to exclude traffic from specific IP addresses or ranges of addresses, as well as from specific referring domains. This can be done from the same section under Exclude Traffic. These filters work just like Google Analytics’ filters: they prevent such traffic from being recorded at all, which effectively means that data is lost.

There is no way in HubSpot to apply segmentation to analytics based on data dimensions the way it’s done in Google or Adobe, so there are no advanced tools for identifying bot traffic to exclude. It is important to remember that HubSpot’s analytics is more of a useful component within an integrated suite of marketing tools than a standalone analytics product. Like all the other components, it was designed for ease of use and intended to be mostly self-service without requiring much technical knowledge.

Featured Image Credits:

From Walt Disney’s animated movie Big Hero 6: the Yokai (a phantom, in Japanese) commanding a swarm of microbots. The microbots, invented by the movie’s main character Hiro Hamada, are micro-sized robots capable of linking and forging together into whatever shape or function their controller has the mental capacity to command using a neurocranial transmitter. Although they were created for a good purpose, the movie’s villain, Professor Callaghan, steals them and uses them as a dangerous weapon.
