How to stop personal information getting into your Google Analytics

Google has strict policies against sending personally identifiable information (PII) to Google Analytics. This covers any hit that contains:

  • User emails
  • User addresses
  • User phone numbers
  • User full names or usernames

Depending on your tracking setup, the most common place for these to show up accidentally is inside URLs. So how do we fix this? The best way to keep PII out of the account is to clean the data before it is sent. If you use Google Tag Manager for all your tracking (and you should!), this is quite easy, as GTM lets you make structured transformations to your tracking at scale. If you aren’t familiar with GTM variables, it would be a good idea to learn the basics of GTM first.

There is a good solution on Simo Ahava’s blog; however, our solution is more generic since it also works for Google Analytics 4 setups.

The solution is to create a tag that strips any PII from both the current page and the previous page (the referrer) so that GA4 only gets the masked versions.

  • Create a new Custom HTML tag.
  • Name the tag according to your naming convention, eg. “Mask PII”.
  • Set the trigger to be as early as possible: “Consent Initialization”.
  • Copy the code below into the tag and save it.

Code to be pasted


<script>
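	//Anything that contains an "@" (raw or URL-encoded) is treated as an email
	function maskEmail(x){
		if(x.includes("@")||x.includes("%40"))return "XXXX";	
		return x;
	}

	//Keys that usually carry PII; case-insensitive so "Email" and "userName" also match
	function checkKey(x){
		return /email|phone|name|mobile/i.test(x);
	}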
		
	function processAtom(x){	
		var i = x.indexOf("=");
		if(i > -1){
			//Key-value pair: split on the first "=" only, so values containing "=" are not truncated
			var key = x.substring(0,i);
			var value = x.substring(i+1);
			if(checkKey(key))return key+"=XXXX";
			else return key+"="+maskEmail(value);
		}
		//Standard text
		else return maskEmail(x);
	}

	function processAtoms(x,delim){
		return x.split(delim).map(processAtom).join(delim);
	}

	function processURL(x) {
		if(x=="")return x;	//The referrer might be blank
		var url = new URL(x);
		var result = url.protocol +"//"+ url.host;	//host keeps a non-default port, hostname would drop it
		result += processAtoms(url.pathname,"/");
		if(url.search != "")result += "?" + processAtoms(url.search.replace(/^\?/,""),"&");
		if(url.hash != "")result += url.hash;	//The hash is passed through unmasked
		return result;		
	}

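	function processURLs() {
		window.dataLayer = window.dataLayer || [];	//Make sure the data layer exists
		window.dataLayer.push({
			"url":processURL(document.location.href),
			"referrer":processURL(document.referrer)
		});
	}

	processURLs();	//Run the masking as soon as the tag fires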
</script>

What else do I need to do?

Every GA4 tag now needs to use the data layer values called “url” and “referrer”. Create 2 data layer variables with those names, then set the following fields in your GA4 configuration tag so that every subsequent event on the page uses the masked values:

  • page_location = {{The name of your data layer variable for url}}
  • page_referrer = {{The name of your data layer variable for referrer}}
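
Before wiring up the fields, it can help to confirm that the masked values really are in the data layer. The helper below is a minimal sketch for the browser console: latestValue is a hypothetical name, not part of GTM, and it only approximates how a data layer variable resolves a flat key (the most recent push wins). It does not handle nested keys. The example array is made up to show the shape.

```javascript
// Hypothetical debugging helper, not part of GTM.
// Returns the most recently pushed value for a flat key, or undefined.
function latestValue(layer, key) {
  for (var i = layer.length - 1; i >= 0; i--) {
    var entry = layer[i];
    if (entry && typeof entry === "object" &&
        Object.prototype.hasOwnProperty.call(entry, key)) {
      return entry[key];
    }
  }
  return undefined;
}

// Made-up data layer, roughly as it might look after the masking tag has fired.
var exampleLayer = [
  { event: "gtm.init_consent" },
  { url: "https://yourdomain.com.au/?testemail=XXXX", referrer: "" },
  { event: "gtm.js" }
];

console.log(latestValue(exampleLayer, "url"));
```

On a live page you would call latestValue(window.dataLayer, "url") and latestValue(window.dataLayer, "referrer") instead of using the example array.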

If you have a SPA (single page application) you probably need to do a bit more work to ensure that the function to mask URLs runs each time there is a soft page reload and that it does this based on the real previous/next page. If you’re not sure how to do this feel free to contact us.
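
One possible shape for the SPA case is sketched below. It keeps track of the real previous URL itself and pushes a fresh masked pair on every route change. The names createSpaMasker, onNavigate and the event name “virtual_page_view” are hypothetical, and the wiring assumes that processURL from the tag above is available as a global function and that your app navigates with history.pushState. Apps that use replaceState or a hash router would need extra hooks.

```javascript
// Sketch only: createSpaMasker, onNavigate and "virtual_page_view" are hypothetical names.
// maskUrl is any function that masks a URL string (for example processURL from the tag).
function createSpaMasker(maskUrl, initialHref, push) {
  var currentHref = initialHref;
  return function onNavigate(newHref) {
    if (newHref === currentHref) return false; // ignore duplicate notifications
    var previousHref = currentHref;
    currentHref = newHref;
    push({
      event: "virtual_page_view",
      url: maskUrl(currentHref),       // masked new page
      referrer: maskUrl(previousHref)  // masked real previous page
    });
    return true;
  };
}

// Browser wiring; skipped when there is no window or no global processURL.
if (typeof window !== "undefined" && typeof processURL === "function") {
  window.dataLayer = window.dataLayer || [];
  var onNavigate = createSpaMasker(processURL, window.location.href, function (obj) {
    window.dataLayer.push(obj);
  });
  // Wrap pushState so every soft navigation triggers a new masked push.
  var originalPushState = window.history.pushState;
  window.history.pushState = function () {
    var result = originalPushState.apply(window.history, arguments);
    onNavigate(window.location.href);
    return result;
  };
  // Back and forward buttons fire popstate instead of pushState.
  window.addEventListener("popstate", function () {
    onNavigate(window.location.href);
  });
}
```

A GA4 event tag triggered on the custom event would then pick up the updated “url” and “referrer” values through the same two data layer variables.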

How to test this in GTM

  • Open a preview session in Google Tag Manager. When specifying which URL to open, append the test parameters to your home page URL, eg. “https://yourdomain.com.au/?testemail=test@gmail.com&phone=0400000000&test=testvalue”.
  • Click on your GA4 measurement ID and click on any event like “Page View”.
  • When you look at the parameters you should see that page location has the values masked: https://yourdomain.com.au/?testemail=XXXX&phone=XXXX&test=testvalue.
  • Navigate to another page and click on an event from that page in your GTM preview.
  • Check that the referrer field is the same masked URL, with the XXXXs.
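
If you would rather not eyeball long URLs, the checks above can be partly automated. The function below is a rough sketch: findUnmaskedPii is a hypothetical helper, and it assumes the same key pattern as the tag (email, phone, name, mobile) and the same “XXXX” mask. Paste it into the browser console together with a page_location or page_referrer value copied from the preview.

```javascript
// Hypothetical checker: lists anything in a URL that still looks like PII.
// The patterns mirror the ones used in the masking tag.
var SENSITIVE_KEY = /email|phone|name|mobile/i;
var EMAIL_LIKE = /@|%40/i;

function findUnmaskedPii(urlString) {
  var problems = [];
  var url = new URL(urlString);
  if (EMAIL_LIKE.test(url.pathname)) {
    problems.push("path contains an email-like value");
  }
  var query = url.search.replace(/^\?/, "");
  if (query === "") return problems;
  query.split("&").forEach(function (pair) {
    var i = pair.indexOf("=");
    var key = i > -1 ? pair.substring(0, i) : "";
    var value = i > -1 ? pair.substring(i + 1) : pair;
    if (i > -1 && SENSITIVE_KEY.test(key)) {
      // Sensitive keys must carry exactly the mask
      if (value !== "XXXX") problems.push("parameter '" + key + "' is not masked");
    } else if (EMAIL_LIKE.test(value)) {
      problems.push("a query value still contains an email-like string");
    }
  });
  return problems;
}

console.log(findUnmaskedPii("https://yourdomain.com.au/?testemail=XXXX&phone=XXXX&test=testvalue"));
console.log(findUnmaskedPii("https://yourdomain.com.au/?testemail=test@gmail.com&phone=0400000000&test=testvalue"));
```

An empty array means nothing suspicious was found. For the unmasked test URL, the function reports both the testemail and phone parameters.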