Google Previews the Coming Tussle Between GDPR and DMA Article 6(11)

Cite this Article
Mikolaj Barczentewicz, Google Previews the Coming Tussle Between GDPR and DMA Article 6(11), Truth on the Market (May 07, 2024),

Among the less-discussed requirements of the European Union’s Digital Markets Act (DMA) is the data-sharing obligation created by Article 6(11). This provision requires firms designated under the law as “gatekeepers” to share “ranking, query, click and view data” with third-party online search engines, while ensuring that any personal data is anonymized.

Given how restrictively the notion of “anonymization” has been interpreted under the EU’s General Data Protection Regulation (GDPR), the DMA creates significant tension without pointing to a clear resolution. Sophie Stalla-Bourdillon and Bárbara da Rosa Lazarotto recently published a helpful analysis of the relevant legal questions on the European Law Blog. In this post, I will examine Google’s proposed solution.

Google’s Solution

Google published a report in early March detailing how it has ensured compliance with Article 6(11), which requires gatekeepers to provide third-party online search engines access to “ranking, query, click and view data in relation to free and paid search generated by end users.” It also mandates that any personal data included be anonymized.

According to Google’s compliance report, before DMA implementation, the company made some search data publicly available via Google Trends, which provides insights on query popularity based on real-time and historical data samples. To comply fully with Article 6(11), Google developed a new European search-dataset licensing program. Under this program, third-party online search engines can obtain a dataset covering “more than one billion distinct queries across all 30 EEA countries.” 

The dataset includes query, click, view, and ranking data. To meet the anonymization requirement, Google applies frequency thresholding. The company is also working on additional mechanisms to add data that doesn’t pass the frequency thresholds in a privacy-safe manner.

Google representatives provided more details about their solution during a DMA compliance workshop organized by the European Commission (which I have discussed previously):

At a high level, what we’ve come up with are packages of a data set that will be released quarterly that represents approximately 1 billion distinct queries across 30 EEA countries. So that is about 1.5 terabytes of anonymized search data per quarter with around 53 billion rows that cover the query string itself and then extensive information on the Google results shown in their ranking.

They further elaborated on the anonymization methods used:

We went through a variety of methods for looking at anonymization, worked with privacy experts, and we came up with the frequency thresholding method to protect personal data. The data set is going to include all queries that have been entered at least 30 times by signed-in users globally over the past 13 months before the end of a relevant quarter.

Licensees can pay Google for the entire European Economic Area (EEA) dataset, or for subsets based on their target market. Pricing starts at 3 euros per thousand distinct queries, with discounts for smaller subsets. The data is delivered quarterly in JSON format via Google Cloud.
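At the stated starting rate, a license’s cost scales linearly with the number of distinct queries. A minimal sketch in Python, assuming the published base rate and ignoring the subset discounts, whose terms are not publicly specified:

```python
# Base rate stated in Google's program: 3 euros per 1,000 distinct queries.
# Subset discounts are mentioned but not publicly specified, so this sketch
# computes the undiscounted base cost only.
BASE_RATE_EUR_PER_1000_QUERIES = 3.0

def base_license_cost_eur(distinct_queries: int) -> float:
    """Undiscounted cost in euros for a dataset of `distinct_queries` queries."""
    return distinct_queries / 1000 * BASE_RATE_EUR_PER_1000_QUERIES

# At this rate, the full ~1 billion-query EEA dataset would come to roughly
# 3 million euros per quarter, before any discounts or negotiated adjustments.
print(base_license_cost_eur(1_000_000_000))  # 3000000.0
```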

To be eligible for the program, an applicant must be an online search engine operating in the EEA, have a track record of safeguarding user data, be financially viable, and have no connection to non-EEA state actors.

While Google claims this solution’s development involved an iterative process with other search engines, some competitors like DuckDuckGo have expressed dissatisfaction with the outcome. Google responded by emphasizing the challenges in balancing data utility with privacy requirements, particularly for smaller regions where the combination of query, country, and URL data could lead to exposure, even with anonymization measures.

Assessing Google’s Solution

Google appears to have put significant effort into developing an anonymization solution that aims to balance the utility of shared search data with its users’ privacy. The approach has several positive aspects from a privacy standpoint.

Using frequency thresholding to include only queries entered by a minimum number of users globally over an extended period reduces the risk that a query is unique to, and thus identifying of, a single user. Google also applied an additional threshold requiring a minimum number of signed-in users per country and device type when combining query data with metadata like ranks, clicks, and views. The company found this allowed it to share four times more data than using the global threshold alone.
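The two-tier thresholding described above can be sketched as a simple filter. This is an illustrative reconstruction, not Google’s implementation; the function names and data shapes are assumptions, while the thresholds themselves (30 distinct signed-in users globally, five per country and device type) come from Google’s own description:

```python
# Thresholds taken from Google's public description; everything else
# (data shapes, function names) is an illustrative assumption.
GLOBAL_THRESHOLD = 30      # distinct signed-in users globally, over 13 months
PER_SLICE_THRESHOLD = 5    # distinct signed-in users per (country, device type)

def shareable_queries(query_users):
    """query_users: dict mapping query string -> set of distinct global user IDs.
    A query is shareable only if enough distinct users entered it."""
    return {q for q, users in query_users.items() if len(users) >= GLOBAL_THRESHOLD}

def shareable_metadata_rows(rows, shareable):
    """rows: list of dicts with 'query', 'country', 'device', 'user_id' keys.
    Keeps only rows whose query passed the global threshold AND whose
    (query, country, device) slice meets the per-slice user threshold."""
    slice_users = {}
    for r in rows:
        key = (r["query"], r["country"], r["device"])
        slice_users.setdefault(key, set()).add(r["user_id"])
    return [
        r for r in rows
        if r["query"] in shareable
        and len(slice_users[(r["query"], r["country"], r["device"])]) >= PER_SLICE_THRESHOLD
    ]
```

The stricter per-slice threshold is what bites in small countries: a query popular globally can still be dropped for a country where too few signed-in users entered it, which is the tension Google pointed to in the workshop.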

The operational measures that Google proposes also appear appropriate. Providing flexibility for licensees to obtain only subsets of data relevant to their needs and markets should contribute to minimizing the data shared on a per-recipient basis. Similarly, requiring licensees to demonstrate a track record of safeguarding user data and having no connection to non-EEA state actors may provide some reassurance the recipients will handle the data responsibly.

Google’s solution will, however, likely be criticized from a privacy perspective. The anonymized data, while less risky than raw personal data, may remain sufficiently rich to enable some degree of user profiling, and to draw inferences about the behavior and interests of user groups, even if not directly tied to identified individuals. Aggregation and thresholding reduce, but do not eliminate, the risk.

It’s also unclear if Google’s thresholds of 30 users globally and five users per country are high enough to reliably anonymize data for users in smaller countries and linguistic communities. Rare names, locations, or highly specific queries could remain identifying, even with thresholding.

Google also doesn’t specify how long licensees may retain the data. Long retention periods expand the window for potential misuse. Strict deletion requirements upon license termination could mitigate this risk.

Naturally, Google’s proposal will also be criticized from a different perspective: that of Google’s competitors, who hope to benefit from the DMA. Each of the privacy-protecting measures inconveniences them, and they may complain that those measures undermine the effectiveness of the DMA’s requirements. Competitors will point out what I noted in my previous comment:

The second temptation stems from the fact that some privacy- and security-protecting measures do, in fact, align with gatekeeper business interests beyond their interest to uphold a reputation of providing safe services. The uncomfortable truth for the DMA’s authors is that increasing personal data privacy and security by limiting data sharing and keeping it under “lock and key” is in the economic interest of the large service providers, to whom users are happy to provide “first party” data.

But I also noted:

DMA enforcement that chooses to remain blind to this duality of interest will only see the gatekeepers acting in their self-interest, dismissing the user benefits of such actions. Moreover, it will fail to account for the risks that come from the (very much self-interested) actions of those businesses who wish to gain from DMA enforcement. 

I’m not aware of exactly what Google’s competitors are demanding. Given that they are likely demanding forced large-scale handovers of Europeans’ data with minimal safeguards, perhaps we should expect them to make their case publicly. Of course, they may worry that this would make them unpopular. It could be a bad look for a “challenger” that advertises itself as privacy-focused if it became public that the company was fighting against privacy protections and for access to the data of users who did not choose to share it.

It would also be very valuable for the EU data-privacy authorities to communicate their perspective on how this and other DMA obligations can be implemented in compliance with the GDPR. Alas, I’m afraid there will be very little appetite for that, especially if it could be perceived as “helping big tech.”