Published On: September 22, 2022

Abstract

With the 90-day deadline given to Italian companies to adopt a GDPR-compliant alternative to Universal Analytics approaching, should companies now look at Google Analytics 4? The present analysis aims to provide an overall perspective on the relevant matters concerning GA4’s compliance with the GDPR: although designed to have “privacy at its core to provide a better experience” for both GA4 customers and website users, and despite some changes in privacy settings, Google Analytics 4 still collects personal data (unique user IDs) and processes it outside the EU. Finally, Google Analytics 4 is still a product developed and maintained by Google, a U.S. entity subject to U.S. data surveillance laws such as FISA and the Cloud Act.

Citizens, entrepreneurs, and professionals are called upon for analysis worthy of the best intelligence in a thicket of interpretive and application problems. Indeed, in its decision, the Italian DPA identified the website operator (namely Caffeina Media S.r.l.) as the data controller and Google LLC as the processor concerning the personal data processed by Google Analytics: pursuant to Article 24 GDPR (accountability), the controller shall implement appropriate technical and organizational measures to ensure and to be able to demonstrate that processing is performed in accordance with this Regulation, including applicable international data transfers to the U.S. The concerns of web analytics use touch on issues of online user privacy, government use of personal information, and information on website user activity. The only certainty in all this uncertainty is that the European digital sector involves hundreds of thousands of companies and workers who suffer firsthand the situation of uncertainty that has arisen. The abstractionist exercise of European jurisprudence, and what is ensuing in terms of enforcement, leads us to the distortion of the principle of accountability: who should be in charge of assessing the adequacy of foreign countries? According to GDPR, at least this task should be left to the European Commission; unfortunately, however, Article 45 seems to work only in an additive-positive sense: if the Commission does not express itself, the foreign country stands in a grey area of doubt. And here is the extreme, distorted accountability: the assessment of the constitutional and legislative suitability of the destination state is left to each exporter (data controller or processor) and his advisors. Is it fair to expect companies to find a way out rather than Google? 
We wonder whether Google LLC can be considered the de facto data controller rather than the processor, as its mission is to organize the world’s information and make it universally accessible and useful.

Google Analytics (known as Universal Analytics, hereinafter “UA”) is a service that can be integrated with websites (e.g. e-commerce sites) to measure the number of visits by Internet users. UA works by including a piece of JavaScript code on the pages of a website. When a user visits a webpage, this code triggers the loading of a JavaScript file and then performs the tracking operation for UA by assigning a unique identifier to each website visitor. UA collects user data, including pages visited, browser information, operating system, screen resolution, selected language, date and time of page views, and the user device’s IP address, which are transferred to Google Analytics servers, all hosted in the United States.

One might therefore think that simply inhibiting the transfer of IP addresses overseas would be enough to overcome any objection. However, in this regard, it should be noted that online identifiers, such as IP addresses or information stored in cookies can commonly be used to identify a user, particularly when combined with other similar types of information (for instance the browser used, the date and time of navigation). This is illustrated by Recital 30 GDPR, according to which the assignment of online identifiers such as IP addresses and cookie identifiers to natural persons or their devices may “leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.”
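
The reasoning above can be illustrated with a small sketch. The visitor records and attribute names below are invented for the example and are not taken from any real analytics payload; the point is only that each additional attribute shrinks the group of visitors who share the same combination, even when no IP address is involved.

```python
from collections import Counter

# Hypothetical visitor records: no IP address, only "harmless" attributes.
visitors = [
    {"browser": "Firefox", "language": "it", "hour": 9},
    {"browser": "Firefox", "language": "it", "hour": 14},
    {"browser": "Chrome", "language": "it", "hour": 9},
    {"browser": "Chrome", "language": "en", "hour": 9},
    {"browser": "Chrome", "language": "it", "hour": 14},
]


def count_singled_out(records, attributes):
    """Count visitors whose combination of the given attributes is unique."""
    keys = [tuple(record[name] for name in attributes) for record in records]
    occurrences = Counter(keys)
    return sum(1 for key in keys if occurrences[key] == 1)


by_browser = count_singled_out(visitors, ["browser"])
by_browser_language = count_singled_out(visitors, ["browser", "language"])
by_all = count_singled_out(visitors, ["browser", "language", "hour"])
print(by_browser, by_browser_language, by_all)
```

In this toy data set nobody is singled out by the browser alone, while the combination of browser, language and time of navigation distinguishes every visitor.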

However hard Google has worked to find a solution, a margin of uncertainty persists as to whether GA4 effectively resolves the concern regarding data transfer to the U.S. More precisely, the question is whether the (potential) identifiability of a user always requires, as a necessary prerequisite, the processing of the IP address together with the other data, or whether the other data collected, notably when the website visitor is logged in to a Google account, may make the person identifiable even when not associated with the IP address. The Italian DPA is silent on this point.

The present analysis aims to give an overall perspective on the relevant matters concerning the compliance of GA4 with Regulation (EU) 2016/679. In this regard, the “Analytics Help” guide made available by Google gives us insights into how GA4 works. Some of the main innovations, those most relevant to this inquiry, will be briefly discussed below.

a) EVENTS
GA4 is a new property with an innovative approach. Events represent a fundamental data model difference between the older and the newer analytics properties. It is useful to take a step backwards: the activity of tracking visitor behaviour within the website is carried out through a snippet of code that is provided by Google and becomes an integral part of the website itself. When a user visits the website, this code is loaded and executed within the browser. Inside the snippet, there is always a unique tracking identifier for the website (Tracking ID or Measurement ID) assigned by Google itself.

The technology by which Google Analytics code is embedded on the website has changed: in the UA version it was JavaScript code called within the HTML; GA4 is typically deployed through the Google tag (gtag.js) or through Google Tag Manager. Nevertheless, in both cases, once the website user visits a page with Google Analytics tracking code, a piece of code is automatically executed. This code transfers to Google’s servers, via the TCP/IP and HTTP protocols, a certain amount of information that is sufficient in most cases to identify the visitor, in particular when associated with the IP address. Here lies the concern for user privacy.

Going back to events, GA4 is based entirely on them, that is on user actions rather than on sessions, with a much higher level of granularity than in the UA version. An event allows you to measure distinct user interactions on a website or app. For example, loading a page, clicking a link, and completing a purchase are all interactions you can measure with events. Each “hit” corresponds to an event. A session recorded by UA, conversely, is a group of user interactions with a website that take place within a given time frame.
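
The difference between the two data models can be sketched as follows. This is a simplified illustration, not Google’s implementation: the `Event` class, the `group_into_sessions` helper and the 30-minute inactivity window are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Assumed inactivity window after which a new session starts (illustrative).
SESSION_TIMEOUT_SECONDS = 30 * 60


@dataclass
class Event:
    """One measured interaction, in the spirit of the event-based model."""
    name: str        # e.g. "page_view", "click", "purchase"
    timestamp: int   # seconds since an arbitrary starting point
    params: dict = field(default_factory=dict)


def group_into_sessions(events, timeout=SESSION_TIMEOUT_SECONDS):
    """Session-based view: group interactions separated by less than `timeout`."""
    sessions = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if sessions and event.timestamp - sessions[-1][-1].timestamp <= timeout:
            sessions[-1].append(event)   # still within the current session
        else:
            sessions.append([event])     # inactivity gap: open a new session
    return sessions


events = [
    Event("page_view", 0, {"page": "/home"}),
    Event("click", 40, {"link": "/shop"}),
    Event("purchase", 5000, {"value": 19.9}),
]
sessions = group_into_sessions(events)
print(len(events), "events grouped into", len(sessions), "sessions")
```

The same three interactions are three separate records in the event-based view, each with its own parameters, whereas the session-based view only sees two groups.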

b) IP-ANONYMIZATION
It has been made clear on several occasions that, when collecting data, Google Analytics 4 does not log or store IP addresses. In the section of Analytics Help dedicated to EU-focused data and privacy, we read that the IP addresses of European users will no longer be stored; instead, IP addresses will be used to determine the optimal local data center (i.e. the domains and servers located in the EU) and to derive approximate geographic location data, namely the following metadata: city (plus derived latitude and longitude of the city), continent, country, geographic area, subcontinent (and ID-based equivalents). From this perspective, it therefore appears that IP addresses are processed only in a volatile and temporary manner on EU servers to extract approximate geographical location data: the latter is the only data transferred for processing to servers in the U.S.
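
The mechanism described above (use the IP address once to derive coarse location, then discard it) can be sketched as follows. The lookup table, the function names and the record layout are hypothetical and serve only as an illustration; how Google actually performs the lookup is not documented in the source.

```python
import ipaddress

# Hypothetical lookup table; the address ranges are reserved documentation ranges.
GEO_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): {"city": "Milan", "country": "IT", "continent": "Europe"},
    ipaddress.ip_network("198.51.100.0/24"): {"city": "Paris", "country": "FR", "continent": "Europe"},
}


def coarse_location(ip_string):
    """Derive approximate location metadata from an IP address."""
    address = ipaddress.ip_address(ip_string)
    for network, location in GEO_TABLE.items():
        if address in network:
            return dict(location)
    return {"city": None, "country": None, "continent": None}


def build_record(ip_string, event_name):
    """Build the record to be forwarded: the IP address itself is never included."""
    return {"event": event_name, **coarse_location(ip_string)}


record = build_record("203.0.113.42", "page_view")
print(record)
```

Only the derived city, country and continent leave the function; whether the address is held in memory or logs in the meantime is exactly the open question discussed in the final statements.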

c) DISABLING DEFAULT FUNCTIONS
Another brand-new feature offered by GA4 is the option to enable or disable the collection of granular location and device data (collected by default) on a per-region basis. By disabling such collection, the administrator of the individual account brings about the “generalization” of data, a technique that reduces its granularity. Thus, less precise data are disclosed than the source data (the original data collected). In practical terms, once the function is enabled, GA4 will not collect the following data: city, latitude and longitude (of the city), browser minor version, browser User-Agent string, device brand, model and name, operating system minor version, platform minor version, screen resolution.
However, Google warns: “If you disable the collection of granular location and device data for a region, then the modelled-conversion volume is significantly reduced for that region. Downstream conversion modelling and reporting in linked Google Ads and Search Ads 360 accounts are also impacted.”
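
As a rough illustration of generalization on a per-region basis, the sketch below drops the granular fields listed above when collection is disabled for the visitor’s region. The field names, the `generalize` function and the settings dictionary are hypothetical; in GA4 the option is configured in the property settings, not in code.

```python
# Hypothetical field names mirroring the list of granular data above.
GRANULAR_FIELDS = frozenset({
    "city", "latitude", "longitude",
    "browser_minor_version", "user_agent",
    "device_brand", "device_model", "device_name",
    "os_minor_version", "platform_minor_version",
    "screen_resolution",
})


def generalize(hit, granular_enabled_by_region):
    """Drop granular fields when collection is disabled for the hit's region."""
    # Collection is assumed to be on by default, as stated in the text.
    enabled = granular_enabled_by_region.get(hit.get("country"), True)
    if enabled:
        return dict(hit)
    return {key: value for key, value in hit.items() if key not in GRANULAR_FIELDS}


hit = {
    "event": "page_view",
    "country": "IT",
    "city": "Milan",
    "browser": "Firefox",
    "screen_resolution": "1920x1080",
}
generalized = generalize(hit, {"IT": False})
print(generalized)
```

The generalized record keeps the country and the browser family but no longer carries the city or the screen resolution.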

d) SERVER-SIDE TAGGING
Server-side tagging allows a site administrator to move measurement tag instrumentation from its website or app to a server-side processing container on Google Cloud Platform (GCP), or any other platform of its choosing. Server-side tagging offers a few advantages over client-side tags: a) improved performance, since fewer measurement tags in the website means less code to run on the client side; b) visitor data is better protected and more secure when collected and distributed in a customer-managed server-side environment. Data is sent to a cloud instance where it is then processed and routed by other tags.
The French DPA (CNIL), in banning the use of UA, published FAQs on its enforcement actions regarding the use of UA (not GA4), as well as guidance on bringing audience measurement tools into compliance with the GDPR. However, CNIL acknowledged that the implementation of the measures it recommends may be costly and complex.
Among the potentially effective solutions envisaged by the CNIL is the use of a proxy server, subject to conditions. CNIL highlighted that it must be ensured that the proxy server is hosted under conditions guaranteeing that the data it processes will not be transferred outside the EU. Additionally, the server acting as the proxy will have to implement measures to ensure the following:
    • the absence of transfer of the IP address to the servers of the measurement tool;
    • the replacement of the user identifier by the proxy server;
    • the deletion of the referring site information external to the site;
    • the deletion of any parameter contained in the URLs collected (e.g. UTMs and URL parameters allowing the internal routing of the site);
    • the reprocessing of information that can participate in the generation of a fingerprint, such as ‘user agents’, to remove the rarest configurations that can lead to re-identification;
    • the absence of any collection of cross-site identifiers; and
    • the deletion of any other data that may lead to re-identification.
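
The measures listed above can be sketched as a single sanitizing step performed by the proxy before anything is forwarded. This is a minimal illustration under stated assumptions, not CNIL’s or Google’s reference implementation: the field names, the site host, the list of common browsers and the use of a keyed hash to replace the user identifier are all choices made for the example.

```python
import hashlib
import hmac
from urllib.parse import urlsplit, urlunsplit

PROXY_SECRET = b"key-held-only-by-the-proxy"   # hypothetical secret
SITE_HOST = "example.com"                      # hypothetical site
COMMON_BROWSERS = {"Chrome", "Firefox", "Safari", "Edge"}


def strip_parameters(url):
    """Remove the query string and fragment (e.g. UTM parameters) from a URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def sanitize_hit(hit):
    """Build the forwarded hit from an allow-list; everything else is dropped."""
    clean = {"event": hit["event"]}
    # The IP address and any cross-site identifier are never copied.
    # The user identifier is replaced by a value only the proxy can derive.
    digest = hmac.new(PROXY_SECRET, hit["user_id"].encode(), hashlib.sha256)
    clean["user_id"] = digest.hexdigest()[:16]
    clean["page_url"] = strip_parameters(hit["page_url"])
    # Referrers external to the site are deleted.
    referrer = hit.get("referrer", "")
    internal = urlsplit(referrer).hostname == SITE_HOST
    clean["referrer"] = strip_parameters(referrer) if internal else ""
    # Rare configurations are collapsed to limit fingerprinting.
    browser = hit.get("browser", "")
    clean["browser"] = browser if browser in COMMON_BROWSERS else "Other"
    return clean


raw_hit = {
    "event": "page_view",
    "ip": "203.0.113.42",
    "user_id": "GA1.2.123456789.987654321",
    "page_url": "https://example.com/shop?utm_source=newsletter&step=2",
    "referrer": "https://search.example.org/?q=shoes",
    "browser": "RareBrowser",
    "ad_network_id": "abc-123",
}
clean_hit = sanitize_hit(raw_hit)
print(clean_hit)
```

Building the outgoing record from an allow-list, rather than deleting fields from the incoming one, means that any unexpected field is dropped by default. Note that a keyed hash is pseudonymization rather than anonymization, so whether it satisfies the CNIL condition would still need to be assessed.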

In this regard, the Google guide highlights that this is precisely what can be achieved by configuring server-side tracking in Google Analytics 4. Notably, one solution lies in using Google Tag Manager (GTM) Server-Side through manual configuration. With this configuration, the GA4 script will send the data to GTM Server-Side, where it will be possible to anonymize the personal data in the hits or to remove it partially or completely.

FINAL STATEMENTS
While regulators agreed in principle on a new transatlantic data transfer framework in March, a formal framework likely will not be announced until the end of the year. Until then, website operators serving European consumers should be on notice that their analytics services may land them in hot water with regulators. And with fines reaching as high as 4 per cent of a business’s total worldwide annual turnover, most organizations can’t afford to roll the dice.

Is Google Analytics 4 GDPR compliant? The short answer in my opinion, although with some uncertainties, should be yes. The use of GA4 has not been admonished either by the Italian DPA or by the various European Data Protection Supervisory Authorities – so far. In addition, it seems unlikely that in the short term any of them will release a judgement banning its use.

GA4 may at this juncture be considered compliant with the applicable law to the extent that we accept the mere omission of IP addresses as a conclusive remedy to the problem of data transfer to the U.S. The foregoing concerns persist because, in all its analytics properties, Google itself holds the encryption key and removes personal data only after receiving them. Companies that opt to migrate from UA to GA4 should be aware of the uncertainty and risk surrounding that option: it is technically unclear whether GA4’s volatile processing of the IP address allows some form of storage of the address, albeit temporary, in which case U.S. authorities could access it (Google states that there is no such contingency).
To get the big picture, Google may be forced to transmit personal data of European website users to U.S. government agencies under the applicable U.S. regulations, all without the awareness of the controller (i.e., the website operator). In this regard, recall that the Cloud Act allows U.S. authorities, law enforcement and intelligence agencies to acquire computer data from cloud computing service operators regardless of where the data are located, including outside the U.S. At least one of the following conditions must be met: a) the operators are subject to U.S. jurisdiction, b) they are European companies with a subsidiary in the U.S., or c) they operate in the U.S. market. However, it is still debated whether this hypothesis can actually come true. If IP addresses are processed in the EU/EEA for a very short time, with their deletion almost simultaneous with the collection of geographic location data, then the possibility of a request for access by U.S. authorities would presumably be of no real consequence.
Overall, companies that collect data on EU residents would be well advised to rethink their choices now, to prepare for any scenario. The most privacy-friendly approach would be to switch to an EU-based analytics platform, such as Plausible, Piwik PRO or Visitor Analytics, that protects user data and offers secure hosting, ideally in an EU-owned data center, or to opt for self-hosted and open-source analytics platforms such as Matomo and Umami. Hopefully, in the short term Google will move in this direction by collecting data on servers hosted in the EU, so guaranteeing that the data it processes will not be transferred outside the EU. This would be the preferable option: indeed, to date, Google Analytics 4 remains the best analysis tool on the market in terms of performance, even before considering that the platform is free to use.
