
Google Analytics, despite ad blockers

You may have noticed that ad blockers also block Google Analytics in a valiant attempt to protect the user's privacy. If you're one of the normals then you use Analytics so you have some idea what is going on when people visit your website, not to create a vast database of users. Indeed Google Analytics itself anonymises some of its data to try to allay the fears of the truly paranoid.

You therefore need a way of tracking what is going on.

First, let's have a quick look at what is going on with the Analytics code.

Once you have created your property, you are told to add a code snippet to your pages:

(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create','UA-12345678-90','auto');
ga('send','pageview');

This delightful bit of script sets up the ga queue so that it can be immediately used, and then attempts to load the rest of the goodness asynchronously. Once fully loaded the data can be sent to the GA server.
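To make the queueing a bit less mysterious, here is a readable sketch of just the stub part of the snippet, with the script-tag injection left out. The function name installGaStub and the fakeWindow object are mine, made up for illustration; they are not part of Google's code.

```javascript
// Readable equivalent of the queueing part of the snippet.
// `win` stands in for the browser's `window` object.
function installGaStub(win, name) {
  win.GoogleAnalyticsObject = name;
  win[name] = win[name] || function () {
    // Until analytics.js arrives, every call is simply remembered in order.
    (win[name].q = win[name].q || []).push(arguments);
  };
  win[name].l = 1 * new Date(); // timestamp of when the stub was created
  return win[name];
}

// Usage with a plain object standing in for window:
const fakeWindow = {};
const ga = installGaStub(fakeWindow, 'ga');
ga('create', 'UA-12345678-90', 'auto');
ga('send', 'pageview');
console.log(ga.q.length);        // 2 (two commands waiting in the queue)
console.log(ga.loaded === true); // false (the real library never loaded)
```

The last line is the important one for what follows: if the real script never arrives, ga.loaded never becomes true and the commands just sit in ga.q.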

If your visitor has an ad blocker enabled then the request to the GA server is blocked straight away. The queue will continue to fill but the data will not have anywhere to go.

The solution, then, is to give the data somewhere to go. You can't simply send the data to the IP address of a GA server because the servers appear to be virtually hosted (on IPv4 anyway). As the user is already on your site, we can assume that your server is not being blocked. We therefore need to hijack[1] the data and channel it through our own server.

In this example I have used PHP, but any server-side scripting language should be able to handle it.

<script type='application/javascript'>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create','UA-12345678-90','auto');
ga('send','pageview');
</script>

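<script type='application/javascript'>
// Fetch the proxied copy of analytics.js from our own server and run it.
// Uses jQuery's $.ajax, so jQuery must already be loaded on the page.
function loadbackup(){
  $.ajax( {
    url: '/include/backupanalytics.html',
    dataType: 'script'
  } );
}
</script>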

<body onload="setTimeout( function() {if (ga.loaded != true) { loadbackup() }}, 2000 )">

And in /include/backupanalytics.html (which your server must be set up to run through PHP, despite the .html extension):

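<?php
  // Tell the browser this is JavaScript, even though the file is named .html.
  header('Content-Type: application/javascript');
  $analytics = file_get_contents('https://www.google-analytics.com/analytics.js');
  // str_replace does a literal swap; in a regex the unescaped dots would match any character.
  $hijack = str_replace('www.google-analytics.com', $_SERVER['HTTP_HOST'], $analytics);
  echo $hijack;
?>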

Now, what does this all mean?

The first part we have already discussed: the GA queue is created and the browser will attempt to load the rest of the code from the server, which will fail.

The second part defines loadbackup(), which asks your server to pull in the Analytics code itself. Because your server is talking directly to the GA server, the ad blocker has no say in what is going on. The request is wrapped in a function so that it runs only when it is needed and not automatically. You don't want two Analytics scripts running. The backupanalytics.html page also edits the script on the fly, removing www.google-analytics.com as the server and replacing it with your server.
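The on-the-fly edit is nothing more than a literal search and replace on the script text. Here is the same idea sketched in JavaScript so you can try it in a console; rewriteHost and the sample string are made up for illustration, and the real analytics.js is of course much longer.

```javascript
// Hypothetical helper: swap the GA hostname in a script's text for our own.
function rewriteHost(source, ownHost) {
  // split/join does a literal replacement, so the dots in the hostname
  // cannot accidentally match other characters as they would in a regex.
  return source.split('www.google-analytics.com').join(ownHost);
}

// A made-up one-line stand-in for the downloaded script:
const sample = 'var u="https://www.google-analytics.com/collect";';
console.log(rewriteHost(sample, 'your-server.net'));
// var u="https://your-server.net/collect";
```

Every occurrence is replaced, which is what you want here: anything the script would have sent to the GA host now goes to yours.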

The third part checks to see if the GA code loaded correctly the first time. If it has failed then it was probably blocked, so the browser is told to go ahead and use the second copy of the code, which comes via our server.

The fourth part, in the <body> tag, tells the browser to run that check once the main body of the page has loaded, after waiting 2 seconds, and hence to see if the code was loaded properly the first time around.

If, at this point, the GA script still hasn't loaded then the backup script is loaded instead. The queue then starts to empty as the data is being sent to your server.
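If you would rather not depend on jQuery for the fallback, the check and the load can be done with a plain script element. This is only a sketch: loadBackupIfBlocked is a name I've made up, and win and doc are passed in (rather than using window and document directly) purely so the function can be tried outside a browser.

```javascript
// Sketch: load the proxied copy only if the real analytics.js never arrived.
// `win` and `doc` stand in for window and document.
function loadBackupIfBlocked(win, doc, backupUrl) {
  if (win.ga && win.ga.loaded === true) {
    return false; // the normal script made it through, nothing to do
  }
  var script = doc.createElement('script');
  script.async = true;
  script.src = backupUrl;
  doc.head.appendChild(script);
  return true; // the backup was requested
}

// In a page this could stand in for the onload attribute and the jQuery call:
// window.addEventListener('load', function () {
//   setTimeout(function () {
//     loadBackupIfBlocked(window, document, '/include/backupanalytics.html');
//   }, 2000);
// });
```

The 2 second delay is the same guess as before: long enough for the real script to have arrived if it was ever going to.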

We now need to make sure that your server knows why you're sending Analytics data to it, and what to do with it. Essentially, you need to reverse everything. This is fairly easy. As the script is now sending data to your-server.net/collect you will need a /collect to process it:

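<?php

// Forwards GET hits only: the path and query string are passed on as-is.
$uri = $_SERVER['REQUEST_URI'];
// Drop only the leading slash; removing every slash would mangle other paths.
$urim = ltrim($uri, '/');

$url = 'https://www.google-analytics.com/'.$urim;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_exec($ch);
curl_close($ch);

?>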

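All that's happening here is that the data sent to /collect is forwarded on to the GA server. Again, because the data is going via your own server, the ad blocker has no say in it. The payload should arrive at the GA server intact, although the request itself now comes from your server, so GA sees your server's address and not the visitor's.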
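The forwarding boils down to building one URL from another. Here is that step sketched in JavaScript; buildUpstreamUrl is a made-up name and the IP address is from the documentation range. The optional second argument appends the visitor's address using the Measurement Protocol's uip (IP override) parameter, which the PHP above does not do; treat that part as a suggestion to test, not something I have run in production.

```javascript
// Hypothetical helper mirroring what /collect does: map the incoming request
// URI onto the GA host, and optionally append the visitor's IP as `uip`.
function buildUpstreamUrl(requestUri, clientIp) {
  var path = requestUri.replace(/^\/+/, ''); // strip leading slash(es) only
  var url = 'https://www.google-analytics.com/' + path;
  if (clientIp) {
    var separator = url.indexOf('?') === -1 ? '?' : '&';
    url += separator + 'uip=' + encodeURIComponent(clientIp);
  }
  return url;
}

console.log(buildUpstreamUrl('/collect?v=1&t=pageview&dp=%2Fhome', '203.0.113.7'));
// https://www.google-analytics.com/collect?v=1&t=pageview&dp=%2Fhome&uip=203.0.113.7
```

Note that slashes inside the query string are left alone; only the one at the front of the path is removed.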

Is it safe?

Probably not. There'll be some exploit I haven't thought of. In fact I haven't thought about it at all. When is PHP ever safe?
You should probably put some sort of safeguard in to make sure that your redirect is not exploited. Shouldn't be too hard. As always make sure your server settings are secure.
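One simple safeguard is to forward only the paths you expect and refuse everything else. The check is sketched here in JavaScript to match the page code, but it is a couple of lines in any language. The allowlist holding just /collect is an assumption based on the endpoint used in this post; check what your copy of the script actually requests before relying on it.

```javascript
// Assumed allowlist: only the endpoint named in this post.
var ALLOWED_PATHS = ['/collect'];

// Hypothetical check to run before forwarding anything to GA.
function isForwardable(requestUri) {
  var path = requestUri.split('?')[0]; // ignore the query string
  return ALLOWED_PATHS.indexOf(path) !== -1;
}

console.log(isForwardable('/collect?v=1&t=pageview')); // true
console.log(isForwardable('/analytics.js'));           // false
console.log(isForwardable('//evil.example/collect'));  // false
```

Anything that fails the check should get an error response and never reach curl.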

Does this violate the Analytics ToS?

There's no reason that it should. There's nothing underhand here. The data ends up at its originally-intended destination. Google have made Analytics extremely versatile and this is one of many, many ways in which data can be sent to the service. The data is all processed in the same way, and it's your data.

How do I know that it is working?

You should set up a test property for these sorts of things. You can also add a custom variable to show up in Analytics.
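As a sketch of the custom variable idea, you could queue a couple of extra commands just before requesting the backup script. In analytics.js the equivalent of a custom variable is a custom dimension. The index dimension1 is an assumption and must match a dimension you have created in the property, and tagAsProxied plus the event category and action names are made up for illustration.

```javascript
// Sketch: mark traffic that went through the fallback.
// `ga` is the command queue function set up by the standard snippet.
function tagAsProxied(ga) {
  // Applies to hits sent after this point on the page.
  ga('set', 'dimension1', 'proxied');
  // A non-interaction event, so the marker shows up without affecting bounce rate.
  ga('send', 'event', 'analytics', 'proxied', { nonInteraction: true });
}

// Example: call it at the top of loadbackup(), before the $.ajax request:
// tagAsProxied(window.ga);
```

The pageview is already in the queue ahead of these commands, so it goes out before the dimension is set; it is the event that carries the marker.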



Won't this increase the load on my server?

Of course it will. Test it. It's up to you whether you want to capture this otherwise-missing data. If you can amend the code so that the backup script is inserted dynamically (after checking whether ga.loaded == true) then feel free to do so.

Does this increase the load on the Analytics servers?

Not in this version of the code. The backup script is only requested when the first attempt has failed, so the GA servers are contacted only once to download the script, either way: by the browser, or by your server on the browser's behalf. Had two copies of the script been pulled in for each page that loads, the load would have increased by 100%, although the script is small and in both cases would more than likely be served by separate GGC servers.

Is this method safe from the ad blockers?

No. Nothing is safe from the ad blockers. Indeed this can be manually blocked by any Thom, Ricky or Harriette. This is an arms race, not a permanent solution. There are many ways to work around the ad blockers and they're only effective until someone writes the code to block them. Of course nobody is going to be daft enough to write individual code for every single server implementation of a workaround.

Please let me know your elegant solutions to the ad blocker problem. I'm always keen to copy!

[1] “Hijack” is a strong word as it is your data and it was produced by your website. You're effectively proxying the data anyway.
