Pseudonymous data should not be exempted from data protection.

April 3, 2013

Europe is currently discussing an update of its data protection regime. The Albrecht Report suggests several amendments to the Commission's proposal for a new regulation. One of the proposals is to limit the protection for pseudonymous data. I think this is a dangerous idea.

In the privacy debate, pseudonyms are a red herring. They offer only a weak level of protection. People often believe that they can hide behind a pseudonym. But this belief is wrong. Pseudonyms only provide context separation. They make it impossible to link data about me in one context with data about me in another context. Within one context, pseudonyms act like real identifiers, and behave just like real names.

Whether a data record refers to me by name, by email address, or by a strong pseudonym (one that cryptographically prevents the pseudonym from being linked to me) does not really matter. The data refers to me, and the data will be used, within that context, to judge me, make decisions about me, etc. Therefore, the protection offered by the regulation is just as necessary for pseudonymous data as it is for non-pseudonymous data. This is not to say that pseudonyms are useless. They are a sane technical measure in the privacy-by-design toolchest. But they should not provide an escape route to avoid compliance with the data protection regulation.

, 2013-04-03 15:09:28

, 2013-04-03 16:09:31

This assumes that whoever holds the pseudonymous data can also access the data that puts it into context (that is to say, the information that the data originates from you). If the data is given to a researcher, that researcher can tell something about the data, but as long as he or she cannot link the data back to an actual person, the data is anonymous (depending somewhat on the type of data). The crucial part is where the results of the researcher are linked back to you as a person.

, 2013-04-03 20:39:18

You are referring to a very specific application, where, say, health data is pseudonymised before it is stored. This would allow researchers to detect patterns in the data (for instance, to determine the effectiveness of certain treatments).

But pseudonyms are used in many more applications. Another common example (and one advocated in the Netherlands as well for the use of DigiD in the private sector) is to derive a sector-specific identifier (i.e. a pseudonym) from your social security number.
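To make this concrete, here is a minimal sketch of one way such a sector-specific identifier could be derived, using a keyed hash (HMAC-SHA256). This is an illustration only and not a description of how DigiD or any real scheme works; the function name `sector_pseudonym`, the key, the sector labels and the example number are all assumptions made up for the example.

```python
import hashlib
import hmac


def sector_pseudonym(secret_key: bytes, national_id: str, sector: str) -> str:
    """Derive a pseudonym that is stable within one sector (hypothetical scheme)."""
    # Length-prefix the sector label so that (sector, id) pairs cannot collide
    # by accident when the two strings are concatenated.
    message = f"{len(sector)}:{sector}:{national_id}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical key, assumed to be held only by the party issuing pseudonyms.
key = b"demo-key-held-by-the-issuing-authority"

health_1 = sector_pseudonym(key, "123456789", "health")
health_2 = sector_pseudonym(key, "123456789", "health")
banking = sector_pseudonym(key, "123456789", "banking")

print(health_1 == health_2)  # True: within a sector the pseudonym acts like a name
print(health_1 == banking)   # False: across sectors the records cannot be joined
```

The two printed lines show exactly the property under discussion: the pseudonym separates contexts, but inside a single context every record about the same person still carries the same identifier.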

Pseudonyms act like identifiers that make people traceable across all their interactions within the context in which they use that identifier. In that sense they are similar to IP addresses, or identifiers in RFID tags, smartcards, etc. If the context is small, if pseudonyms are refreshed often, or if they are used only a few times for a limited purpose, the harm is limited. But this should be determined using the general principles prescribed by the data protection regulation.

So even though pseudonymous data is not the same as personal data (the former cannot be traced back to a named individual, and the latter can), in practice the difference is irrelevant in my opinion.

, 2013-04-04 06:46:51

In general, I hope we agree that a pseudonym should be unique for each and every dataset. As soon as a pseudonym, or any other ID, is used over and over again, it acts as a name for an individual: the two become synonymous. But, in my opinion, there is a difference between the pseudonym and pseudonymous data. I agree that it matters greatly how one deals with the pseudonym, and how safe the location is where the identifier linking the data set to the personal data is kept.

