March 27th 2020

A bot is a piece of software programmed to carry out automated tasks on the internet. Bots generally handle highly repetitive tasks that would take humans many more hours to complete. Most commonly, bots crawl the internet fetching and analysing information from web pages (the most typical example being search engine crawlers, which collect this information to help decide which pages to rank for each query) and are not malicious in nature. However, some bots are used to scan for vulnerabilities that could open the door to attacks such as hacking.

What types of bots are there?

As we previously discussed, the most common bot is a search engine crawler, but there are loads of different types of bots you may encounter on the internet. Some you may not even realise are bots at all!

  • Chatbots: talk to users online in a similar fashion to a human in customer support. For example, they may answer FAQs, take bookings or help you place orders.
  • Search Engine Crawlers (or “spiders”): use links on web pages to discover new pages and collect important information about each page they visit along the way.
  • Malicious bots: also described as malware, these bots scan for vulnerabilities that give them access to steal your information, scrape and steal your content, post spam comments and more. They are generally operated by cyber criminals.
  • Social bots: post automated information on social profiles. For example, they may tweet sports score updates, weather updates or even automated news bulletins.
  • Spambots: post spam content or comments online, on forums, social media platforms etc.

When are bots harmful?

It goes without saying that malicious bots trying to access sensitive information, steal your content or post spam are likely to be harmful to your site. But there are cases where bots can be harmful even when there’s no criminal intent. Any bot reaching your site makes requests to your server, which takes up bandwidth, increases your server costs and may result in server errors that stop genuine visitors from being able to access your site.

Bot traffic can also skew precious analytics data, and it can be quite difficult to separate bot data from real user data in order to conduct accurate site analysis. This means that if you’re experiencing high levels of bot traffic that isn’t being correctly filtered out of Google Analytics, your marketing efforts may be in vain. Yikes.

One of the first things we do with new clients is recommend a full Google Analytics Audit to ensure that we have good quality, clean data to work with, and that any bot traffic is being filtered out correctly.

How can you spot bot activity?

Provided your Google Analytics account is set up correctly, most bot traffic should not appear in your account.

Traffic Changes

From time to time, however, a new bot not yet known to Google Analytics may reach your site and begin to contaminate your data. You can spot bot traffic by looking for unexplained traffic spikes, or abnormally high or very low bounce rates over a limited timeframe. Often, when you dig into this traffic using secondary dimensions, you will see that the spike is caused by a small number of “users” from a location that’s unusual for your site. Bot traffic is usually attributed to the Direct channel, so pay close attention to changing patterns in this channel if you suspect you’re experiencing a large amount of bot traffic.

Site Speed

Keeping an eye on your site’s speed over time can highlight any possible increase in bot activity. Even if the bots are being filtered out of Google Analytics, the impact they have on server performance, and therefore site speed, is often still evident.

Server Logs

Server logs are invaluable for analysing all site traffic, bots included! If you have access to your site’s server logs, you will be able to see every request made to your server over a set period: the request method, the type of resource requested and the page it was requested from, and the user agent and IP address behind the request. This information makes it much easier to identify outliers and harmful bots, and to differentiate them from useful bots such as tools and search engine crawlers.
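As a rough illustration, the short Python sketch below tallies requests per IP address and per user agent from an access log in the common “combined” format. The log path and the top-10 cut-off are assumptions — adapt them to your own server setup.

```python
import re
from collections import Counter

# Assumed path to the server's access log -- adjust for your own setup.
LOG_PATH = "access.log"

# Combined log format: IP - - [date] "METHOD /path HTTP/x" status size "referer" "user agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

ip_counts = Counter()
agent_counts = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.match(line)
        if not match:
            continue
        ip_counts[match.group("ip")] += 1
        agent_counts[match.group("agent")] += 1

print("Top 10 IPs by request volume:")
for ip, count in ip_counts.most_common(10):
    print(f"  {ip}: {count}")

print("\nTop 10 user agents by request volume:")
for agent, count in agent_counts.most_common(10):
    print(f"  {agent}: {count}")
```

A single IP or unfamiliar user agent making a disproportionate share of requests is a good starting point for further investigation.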

How can you reduce the impact of bots on site performance?

An easy way to limit the impact of bots on your site is to make it difficult for them to reach it in the first place.

Requiring users to complete a CAPTCHA when submitting contact forms or comments is a great place to start (see the sketch below). Make sure you keep a close eye on your analytics to learn what’s ‘normal’ for your site, so you can spot any emerging patterns that look suspect.
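If you use a hosted CAPTCHA service, the token submitted with the form also needs to be checked on the server before the submission is accepted. The sketch below assumes Google reCAPTCHA and the Python requests library; the secret key and token handling are placeholders for your own values.

```python
import requests

# Placeholder -- use the secret key issued for your reCAPTCHA site.
RECAPTCHA_SECRET = "your-secret-key"

def is_human(captcha_token, client_ip=None):
    """Verify a reCAPTCHA token (the form's 'g-recaptcha-response' value)
    against Google's siteverify endpoint. Returns True if it checks out."""
    result = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={
            "secret": RECAPTCHA_SECRET,
            "response": captcha_token,
            "remoteip": client_ip,
        },
        timeout=5,
    ).json()
    return result.get("success", False)
```

Only process the form (send the email, post the comment, create the booking) when the check returns True.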

Robots.txt files can be configured to discourage bots from crawling your site. Bear in mind, however, that this isn’t foolproof: robots.txt is a set of instructions, and bots can simply ignore those instructions if they are programmed to do so.
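As an illustration, a robots.txt along these lines keeps well-behaved crawlers out of areas they don’t need to see, and asks a specific crawler to stay away entirely. The paths and the bot name are examples only.

```
# Keep compliant crawlers out of admin and internal search pages
User-agent: *
Disallow: /admin/
Disallow: /search

# Ask one specific crawler to stay away entirely
# (only effective if that bot actually respects robots.txt)
User-agent: BadBot
Disallow: /
```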

By analysing your server logs, you should be able to build a list of suspicious IP addresses, which can then be blocked from accessing your site using a tool such as a web application firewall (WAF).
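If you don’t have a WAF in place, the same idea can be sketched at the web server level. The nginx snippet below is a minimal example, not a substitute for a proper WAF; the addresses are documentation-range placeholders, so swap in the suspicious IPs you found in your logs.

```nginx
server {
    listen 80;
    server_name example.com;

    # Block traffic from suspicious addresses found in the server logs
    deny 203.0.113.45;       # a single suspicious IP (placeholder)
    deny 198.51.100.0/24;    # an entire suspicious range (placeholder)
    allow all;

    location / {
        root /var/www/html;
    }
}
```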

Finally, if bot traffic is still having a significant negative impact on your site, you may want to consider a bot management tool. These use a range of techniques to identify malicious bot traffic and protect your site from attacks.