Using icons to summarise privacy policies: an analysis and a proposal.

September 21, 2016

Privacy policies are hard to read. They are very long, and written in 'legalese' that very few people understand. As a result, people don't read them. To allow people to nevertheless learn how websites, apps or services treat their personal data, the use of privacy icons has been proposed. These icons should, when properly designed and used, summarise the privacy policy and convey its essential privacy characteristics. In this blog post I will discuss and analyse the main proposals, and suggest some steps forward.

In what follows I will not really discuss the visual design of such privacy icons, and how they should be embedded into the user interface. This is much better left to the experts: graphic designers and user interface designers. Instead I will focus on the privacy characteristics: which possible elements of a privacy policy, which essential aspects of the processing of personal data, must be represented by an icon?

First I will discuss some existing (mostly abandoned) approaches, and some earlier analysis of these approaches. After that I will synthesise the findings and propose a minimum set of privacy characteristics I think should be represented by an appropriate icon set. I will also discuss some important conditions to make it more likely that such a set of icons becomes successful in practice.

Overview of existing approaches

Based on the success of the Creative Commons icons to quickly communicate the license under which content can be shared, several proposals to use icons to summarise a privacy policy have been made in the past. Roughly in chronological order these are:

the 'Iconset for Data-Privacy Declarations' by Martin Mehldau,
the 'Policy Coding Methodology' of the KnowPrivacy project,
the Mozilla Privacy Icons,
the Disconnect.me icons,
PrimeLife's icons, and
a proposal by TNO.

I will briefly summarise and discuss each of these approaches below.

Martin Mehldau's Iconset for Data-Privacy Declarations

One early approach by Martin Mehldau from 2007, the Iconset for Data-Privacy Declarations, distinguished the following characteristics that each deserve a separate icon, grouped into four different categories:

what data is processed: real name, username, address, IP address/files/time, email address, comments/conversations, mails/messages, contacts/friends, favourites/interests, edits, and cookies.
how is it processed (and by whom): deleted, saved, anonymised, encrypted, published, passed on, for friends/contacts, for friends of friends, and whether you have a choice.
for what purpose: statistics, advertisement, or shopping.
for how long: for a session (end of usage/logout), until the end of a contract, for hours, for days, for weeks, for months, for years, for the time being (i.e. forever).

It is interesting to note that Mehldau considers IP addresses to be in the same characteristic as files and time information, and that he considers things like conversations and favourites as separate characteristics that deserve a separate icon. Location is missing as a characteristic.

Mehldau makes no distinction between how the data is handled, and with/by whom: both types of icons are in the same category.

I particularly like the distinction Mehldau makes between different kinds of data retention periods:

for the current session,
for the current contract,
for some fixed period of time, or
indefinitely.

Apart from that, the list of characteristics is quite large and rather unstructured.

The KnowPrivacy Policy Coding Methodology

The KnowPrivacy project published a set of icons in 2009, as part of their Policy Coding Methodology. They distinguished the following three categories of privacy characteristics.

types of data collected: contact information (name, address, email address, phone number), computer data (IP address, browser type, OS information), interactive data (browsing behaviour, search history), financial data (account status, credit information, purchase history), or content (personal communications, stored documents or media).
general data collection practices: ad customisation (user data used to customise advertising), third party tracking (third parties are allowed to track user behaviour), public display (information contributed by the user may be displayed publicly), user control (users can access and correct personal information), data retention (including an indication of the retention period).
data sharing practices: shared with affiliates (bound by the same privacy practices), contractors (bound by the same privacy practices), or third parties (not bound by the same privacy practices).

I like the consolidation into only five different types of data (compared to Mehldau's more unwieldy and less structured set of data types). Unfortunately, location is again missing. Moreover, legally important data types like medical data or special data (information about race, religion or sex) are also missing.

I also like the fact that the classification underlines the significance of whether historical data (search queries, financial records) is collected.

The Mozilla Privacy Icons

The Mozilla Privacy Icons propose a much simpler set of icons, distinguishing only the following characteristics:

retention period,
third party use,
ad networks, and
law enforcement access.

Given the other approaches discussed so far, this set of characteristics is disappointingly minimal. It is however interesting as it recognises the importance of law enforcement access, which does not appear in any of the other icon sets proposed. Mozilla proposes to distinguish two cases: a statutory process (where organisations require the government to comply, at a minimum, with the legal process provided by the law before getting users’ data) and a transparent process (where organisations always follow a publicly documented and consistent process).

Such an icon could also be accompanied by a link to an (annual) transparency report, or warrant canary.

The Disconnect.me Icons

The Disconnect.me Icons evolved from a Mozilla-led working group. It is unfortunately unclear whether this is the same group that designed the icons mentioned in the section above. They distinguish the following characteristics:

expected use (does the service use data in ways other than you would reasonably expect given the site’s service?)
expected collection (does the service allow other companies like ad providers and analytics firms to track users on the site?)
precise location (does the service track a user's actual location?)
data retention (how long does the service retain personal data?)
do not track (does it honour user do-not-track preferences?)
children's privacy (has this website received TRUSTe’s Children’s Privacy Certification?)

They also distinguish, quite oddly and rather specifically, whether the service offers SSL support and whether it is affected by the Heartbleed bug. Again the proposed set of icons is minimalistic.

What I do like in the Disconnect.me approach is the quite innovative idea to use a reasonable expectation of privacy of the average user of the service as a point of departure. Of course the question then becomes what exactly is reasonable and who determines that. This may be very hard to make objective and easy to understand in practice.

The PrimeLife Approach

The EU funded PrimeLife research project investigated the use of icons to represent privacy policies in some depth around 2010. They took the icon sets designed by Mary Rundle (whose icons seem to have disappeared from the web) and Martin Mehldau as point of departure. PrimeLife distinguishes the following two important categories.

data types: personal data, sensitive data, medical data, payment data.
processing steps (including references to common purposes): legal obligations, shipping, user tracking, profiling, storage (including an indication of the length of the data retention period), deletion, pseudonymisation, anonymisation, data disclosure to third parties, data collection from third parties.

PrimeLife aimed to design a limited set of icons¹, restricted to data types and purposes users often cope with in the online world. Explicitly included are icons representing positive steps taken by data controllers, like the use of encryption or anonymisation techniques.

According to PrimeLife it makes sense to design a general set of icons applicable in all application domains, as well as additional icons for specific application domains, e.g. social networks. For this they introduced icons for the following additional category:

groups of recipients: friends, friends of friends, selected individuals, the public.

Interestingly enough, PrimeLife did not consider making similar distinctions between recipients of data in the general case (although it did suggest specifying this with an optional text string alongside the 'data disclosure to third parties' or 'data collection from third parties' icons).

TNO's approach

As a last contribution I will briefly discuss a recent but not very well known approach from Johanneke Siljee of TNO. She proposed a totally different approach in 2015, doing away with separate icon categories, and instead proposing to signal the following characteristics using an icon:

can the service be used anonymously or not; does the site collect anonymous usage statistics?
does the service implement user choice through opt-in or opt-out, or not at all?
access rights: can you see which personal data the service collects, and can you have it corrected?
does the service collect (the legally important) sensitive or special data?
does the service perform or allow profiling and data mining?
does the service disclose or sell personal data to third parties?
does the service disclose personal data to other countries, or countries outside the EU?
how long does the service retain your data?

What I like here is the recognition of both user choice and access rights. Also, the important question whether data is shared with other countries, especially those outside the EU, is covered in this icon set.

Earlier analysis of icon based approaches

Some of the suggested approaches to summarise privacy policies using icons have been analysed by other scholars before. I will briefly summarise both the analysis of Van den Berg and Van der Hof as well as the analysis of Edwards and Abel here.

The analysis of Van den Berg and Van der Hof

Van den Berg and Van der Hof are quite critical about trying to summarise and present privacy policies to users in a graphical way. They conducted a survey among Internet users to determine the kind of information (i.e. the characteristics) about a privacy policy they care most about, when they would like to be informed about this, and how.

Their survey revealed that most users like to be informed about the privacy policy in simple, everyday language. Visual feedback was much less often mentioned as an acceptable form of providing information. It should be noted that the respondents in the survey were (very) concerned about their privacy. Also, people were not asked to rank possible ways of providing information about the privacy policy, but were instead asked to select all preferred options from the following list: legal text, everyday speech, with examples, with visual information.

According to the same survey, the kind of information users are mostly interested in are:

which of their personal data are collected,
how these data are used,
whether or not their data are passed on to third parties,
whether their data is handled securely, and
whether they can object to the use of their data.

Based on their findings, Van den Berg and Van der Hof proceed to develop a 'Privacy Wheel', which looks more like a privacy labelling approach than an icon based approach, and which is loosely based on the OECD Fair Information Processing Practices. For us, the above list of five types of information most relevant to users is the most significant result when considering which privacy characteristics to display using icons.

The analysis of Edwards and Abel

Edwards and Abel have written a very nice report discussing the icon based approaches mentioned above. The following list summarises their main findings.

Icons allow for quick comprehension regardless of social and cultural backgrounds of users.
The use of icons to clearly communicate the essence of a privacy policy has not really been tested in practice yet.
Critical mass is essential, yet hard to achieve; government mandates or co-sponsorship may be beneficial.
Icons need to be simple (and hence sacrifice legal detail).
A standardised graphical approach across multiple jurisdictions is best (it creates more trust, less confusion, and creates the best opportunities to create critical mass).
It is helpful to include icons that indicate which legal regime(s) the privacy policy satisfies.
Layered privacy policies are less prominent now than several years ago².

Edwards and Abel argue that there is a problem with icon schemes that aim for an international scope. I do not agree. (If it were true, there would be no point in introducing privacy icons...) In fact, I think it should be quite feasible to indicate whether a privacy policy satisfies the data protection rules of a country or region. Moreover, I think that most of the characteristics proposed so far are quite factual and not open to a lot of different interpretations: things like retention periods, data types and processing steps are quite clearly defined. (Of course the interpretation of the associated privacy risk may be culturally dependent...)

My own analysis

Although Van den Berg and Van der Hof are quite critical about the use of privacy icons, I interpret their results differently (also in light of the analysis of Edwards and Abel). Their result shows that providing information about the privacy policy in everyday speech is a very important part of an effective communication strategy to inform users about this privacy policy. However, the use of icons is an additional form of communication that allows users to have a more immediate understanding of the main points of the privacy policy, about which they can learn more by reading the summary of it in everyday speech.

Practical applications and real-world experiments using privacy icons are sorely needed to support this claim and to better understand what does and does not work in practice. This research needs to be done by scientists with the appropriate background and skill sets to perform this kind of user study, and should be based on icon sets designed by graphic design professionals.

All of the icon sets analysed have issues expressing the purpose of the processing. They all focus on expressing only very specific, sometimes only 'harmful', purposes (like shopping, advertising or profiling). This is not surprising given the fact that personal data may be processed for a great variety of reasons, ranging from optimising or personalising the service, through big data analytics or collecting information to complete an order and ship the items bought, all the way to profiling and tracking users. Some attempts to capture some of these aspects in icons resulted in very complex designs that were poorly understood (e.g. the PrimeLife icons). We conclude that the purpose of the processing cannot be expressed graphically and should instead be explained using a short sentence in everyday speech.

What definitely needs to be expressed is what type of personal data is collected. This is what almost all analysed schemes do, in varying degrees of detail. In the proposal outlined below a common sense balance between detail and simplicity is struck, making sure that legally significant classes of personal data (like health data and so called 'special' data) are clearly distinguished.

Intuitively, it makes sense to define a characteristic that specifies who processes the data (where processing includes having access to the data). However, it matters a great deal whether data is processed locally (on a user device but by an externally provided app) or remotely, even if in both cases the same (external) party is responsible for the data processing (i.e. is the data controller). So instead we define a slightly broader characteristic that specifies where the data is processed.

Based on the results of the survey of Van den Berg and Van der Hof, I believe it is also important to express how the data is processed: is this done in a secure fashion, and in accordance with certain legal requirements (as suggested by Edwards and Abel)? This category should also contain information about the retention period (where I draw on the ideas of Mehldau). I also think it makes sense to include information about how governmental data access requests are being dealt with (as originally proposed by Mozilla), although this is the least important characteristic in my opinion. If in the end the number of characteristics proves to be unwieldy, this one could be omitted.

A proposal for a set of essential privacy characteristics

Given the overview of existing proposals (and their shortcomings), and based on the analysis above I propose the following set of privacy characteristics to be shown using icons as a minimum:

what personal data is processed,
where it is processed (including by whom), and
how (i.e. with which safeguards) it is processed.

As explained above, the purpose of the processing cannot be adequately expressed by an icon. As a result there is no why category for icons. This should be expressed by a short statement in everyday language instead.

Note that the scope of the icons is broadened by referring to data processing (as defined in the Data Protection Directive (DPD) as well as the upcoming General Data Protection Regulation (GDPR)) instead of only referring to data collection (as most of the previous proposals have done, although they probably intended to include all forms of personal data processing).

As said before, this is only a proposal for a set of essential privacy characteristics. It does not include actual icons to graphically represent these. I would love to include those however. So: if you are a professional graphics designer and would like to contribute, please send me your designs!

With that out of the way, let me describe the privacy characteristics in a bit more detail.

What: which type of personal data is processed?

Show an icon for each of the following types of personal data if it is processed.

Contact data: name, address, email address, phone number.
Financial data: account status, credit information, purchase history.
Medical data: DNA, medicine/drug usage, patient dossier, biometric data.
Special data: religious beliefs, criminal records, gender, race.
Behavioural data: browsing and search history, energy consumption patterns, location data. In other words: what you did and where you did it. (This corresponds to metadata, or the observed data class from the WEF classification.)
Content: personal communications, stored documents or media.
Tracking data: cookies, IP address, browser type, OS information.

Where: at which location/device is the data processed, and under whose responsibility?

Indicate where the personal data is processed:

locally, at the user device
centrally, at the data controller
shared with third parties

How: what are the safeguards related to the data processing?

Show icons for each of the following safeguards related to the data processing taking place.

Jurisdiction: Indicate which legal regimes (US, EU, specific country) the privacy policy satisfies. Multiple icons possible if multiple legal regimes are satisfied.
Retention period: Indicate the period for which the personal data is retained:
- the current session,
- the current contract,
- some fixed period of time, or
- indefinitely.
Security: Indicate whether personal data is processed securely.
Consent: Indicate whether personal data is only processed after explicit consent of the data subject.
Governmental access: Indicate whether access to personal data by law enforcement, tax agencies, intelligence services and the like is restricted using a statutory process and/or a transparent process.
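
To make the three categories a little more tangible, they could be written down as a small machine-readable model from which a website derives the icons it has to display. The sketch below is only an illustration: the class names (DataType, Location, Retention, PolicySummary), the field names and the icon identifier strings are all assumptions made for this example, not part of the proposal itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataType(Enum):
    """What: the types of personal data listed above."""
    CONTACT = "contact"
    FINANCIAL = "financial"
    MEDICAL = "medical"
    SPECIAL = "special"
    BEHAVIOURAL = "behavioural"
    CONTENT = "content"
    TRACKING = "tracking"


class Location(Enum):
    """Where: the place of processing."""
    LOCAL = "local"              # at the user device
    CENTRAL = "central"          # at the data controller
    THIRD_PARTY = "third-party"  # shared with third parties


class Retention(Enum):
    """How: the four retention periods."""
    SESSION = "session"
    CONTRACT = "contract"
    FIXED_PERIOD = "fixed-period"
    INDEFINITE = "indefinite"


@dataclass
class PolicySummary:
    data_types: set[DataType]
    locations: set[Location]
    retention: Retention
    jurisdictions: set[str] = field(default_factory=set)  # e.g. {"EU", "US"}
    secure: bool = False
    explicit_consent: bool = False
    statutory_process: bool = False
    transparent_process: bool = False

    def icons(self) -> list[str]:
        """Return (hypothetical) icon identifiers in what/where/how order."""
        what = [f"what:{d.value}" for d in DataType if d in self.data_types]
        where = [f"where:{loc.value}" for loc in Location if loc in self.locations]
        how = [f"how:jurisdiction-{j}" for j in sorted(self.jurisdictions)]
        how.append(f"how:retention-{self.retention.value}")
        # Safeguard icons are shown only when the safeguard is in place.
        safeguards = [
            ("secure", self.secure),
            ("consent", self.explicit_consent),
            ("statutory-process", self.statutory_process),
            ("transparent-process", self.transparent_process),
        ]
        how += [f"how:{name}" for name, present in safeguards if present]
        return what + where + how
```

A service that processes contact and tracking data centrally, keeps it for the duration of the contract, and claims EU compliance would then be described by a single PolicySummary value, and calling icons() on it yields the list of icons to render. The purpose of the processing is deliberately absent from this model, in line with the argument that it should be stated in everyday language instead.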

Closing remarks

An alternative approach that I haven't explored yet is related to the idea behind the Disconnect.me approach to express deviations from the expected collection and expected use of personal data. As discussed above, this particular idea is hard to make objective and (thus) understandable for the average user. However, the idea could be transformed into a benchmark approach. Different services could be scored against a benchmark and given icons to represent whether they perform better or worse, compared to the average. In this case, no icons means that the privacy protection is average. Green icons could be used to indicate a service performs better than average on a certain characteristic. Red could be used when a service performs worse than average on a characteristic.
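
As a rough illustration of how such a benchmark could work, consider the sketch below. It assumes that every service has somehow been given a numeric score per characteristic, with higher meaning more privacy friendly; how such scores would be obtained is left open here. The function name benchmark_icons and the tolerance band around the average are likewise assumptions made for this example.

```python
from statistics import mean


def benchmark_icons(scores, service, tolerance=0.5):
    """Compare one service against the average of all scored services.

    `scores` maps a service name to a dict {characteristic: score}, where a
    higher score is assumed to mean better privacy protection. The result
    maps each characteristic to "green", "red" or None (no icon: average).
    """
    result = {}
    for characteristic, own_score in scores[service].items():
        # Average over every service that has a score for this characteristic.
        average = mean(
            s[characteristic] for s in scores.values() if characteristic in s
        )
        if own_score > average + tolerance:
            result[characteristic] = "green"
        elif own_score < average - tolerance:
            result[characteristic] = "red"
        else:
            result[characteristic] = None
    return result
```

The tolerance band is there so that a service that is only marginally above or below the average shows no icon at all, which keeps the display quiet for the typical case.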

Another approach is to only specify privacy characteristics that pose a risk. Then the best (i.e. most privacy friendly) service is one that has no icons (because it induces no risks). In this approach care has to be taken to convert privacy protective measures (e.g. the use of anonymisation) into their opposite, privacy risk inducing, qualities.
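
The conversion of protective measures into their risk inducing opposites could look roughly as follows. The fact names and icon labels in this sketch are hypothetical and chosen only for illustration.

```python
# Protective measures, mapped to the risk icon shown when the measure is absent.
RISK_IF_ABSENT = {
    "anonymised": "risk:identifiable-data",
    "encrypted": "risk:unencrypted",
    "explicit_consent": "risk:no-consent",
}

# Practices that are a risk when present.
RISK_IF_PRESENT = {
    "shared_with_third_parties": "risk:third-party-sharing",
    "retained_indefinitely": "risk:indefinite-retention",
}


def risk_icons(facts):
    """Return the risk icons for a service; an empty list is the best result.

    `facts` maps fact names to booleans. A fact that is not stated is treated
    conservatively: a missing protective measure counts as absent.
    """
    icons = [
        icon for measure, icon in RISK_IF_ABSENT.items()
        if not facts.get(measure, False)
    ]
    icons += [
        icon for practice, icon in RISK_IF_PRESENT.items()
        if facts.get(practice, False)
    ]
    return icons
```

A service that anonymises, encrypts and asks for consent, and that neither shares data nor retains it indefinitely, ends up with no icons at all.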

References

A great analysis of current approaches to using icons to represent privacy policies is provided by:

Lilian Edwards, Wiebke Abel, "The Use of Privacy Icons and Standard Contract Terms for Generating Consumer Trust and Confidence in Digital Services", CREATe Working Paper 2014/15 (October 2014).

For PrimeLife icons, see

Leif-Erik Holtz, Katharina Nocun, Marit Hansen, "Towards displaying privacy information with icons", Privacy and Identity Management for Life, 6th IFIP Summer School, Helsingborg, Sweden, August 2-6, Springer, 2010, pp. 338-348.
Leif-Erik Holtz, Harald Zwingelberg, Marit Hansen, "Privacy Policy Icons", Privacy and Identity Management for Life, Springer, 2011, pp. 270-285.
"Policy Icons and Tests", Chapter 3, PrimeLife Deliverable D4.3.2.

For the TNO approach, see

Johanneke Siljee, "Privacy Transparency Patterns", Chapter 9, The Privacy & Identity Lab, 4 years later, ISBN: 978-90-82483 5-0-5, November 2015.
Johanneke Siljee, "Privacy Transparency Patterns", EuroPLoP '15, July 08 - 12, 2015, Kaufbeuren, Germany.

See also the following references for additional material

Privacy Nutrition Labels developed by CMU for a slightly different approach.
Privicons, that offer a simple set of icons to express privacy preferences for emails that you send.
B. van den Berg and S. van der Hof, “What Happens to my Data? A Novel Approach to Informing Users of Data Processing Practices” (2012) 7:2 First Monday.

¹ A draft set of icons included for example singularisation, tracking and cross-site tracking, but these were omitted from the final set.
² The authors see this as yet another indication that self-regulation is not working in the privacy marketplace.


Jeroen, 2016-09-23 07:43:10

I would add another design consideration: who is going to publish the icons? It does not make sense to have an icon describing illegal activities when the company is publishing the icons itself.

Also, if companies were to publish them, there is a chance of conforming behaviour, i.e. changing behaviour so they fit the icons better. This may be a good or bad thing, depending heavily on the categories and design. (If others publish them there’s also a chance of this, but I would expect a much weaker effect.)

Jaap-Henk, 2016-09-23 08:05:15

The intention is that companies publish these icons themselves on their own website. This makes it easiest for users to get informed about the privacy policy when visiting the website. Of course some independent organisation needs to check that the icons displayed honestly represent how the company treats your personal data.

W.r.t. your second point: this is indeed the intended effect. We hope that by using the icons (perhaps being forced because all competitors use them) companies become aware of their privacy practices and start improving them.

De Week « Bits of Freedom, 2016-10-14 14:39:37

[…] Hoepman has compared the various proposals that have been made to make long privacy terms clearer […]

Winfried Tilanus, 2016-10-17 07:32:56

All these approaches, except partially the one of disconnect.me, fail because they stick to the legal and/or the technical perspective. People share data because that is part of a certain kind of relationship with another person or an organization. A big part of the current privacy problem is that many organizations pretend to offer one kind of relationship (access to information, personal contact) while in reality their product is another one (profiling, advertisement). Being able to choose here and control this really matters to people. The technical and the legal perspective still provide a hiding place for the real purpose of the data gathering and usage by organizations.

So an effective icon system must force organizations to be open about what kind of relationship they engage their users in and how data is used to shape that relationship. But I don’t expect the internet industry to be mature enough to do this on a self-regulatory basis.
