Please find below a summary of the lectures given on day #2 of the Interdisciplinary Summerschool on Privacy (ISP 2016), held at Berg en Dal this week. There were lectures by George Danezis on privacy-friendly services and by Helen Nissenbaum on contextual integrity.
Computer security solutions (focused on confidentiality) cannot be applied straightforwardly to solve privacy problems. Privacy enhancing technologies (PETs) have a different threat model, captured by the 5 C's: Cost, Collusion, Compulsion, Corruption, Carelessness. In PETs, 10% is about confidentiality and 90% is about making sure no one cheats.
First, there is a huge asymmetry of power: weak actors face a powerful adversary. The actors are ordinary users, who need to use off-the-shelf components or software. (Contrast this with computer security, whose mental model is a military organisation or a big company with an IT department defending itself against external adversaries.) The audience challenged this point: aren't PETs technologies that companies use to protect the data they collect about their users (data subjects), in order to comply with data protection laws?
Second, there is a risk of compulsion. To use a system or an app, users have to agree to the privacy policy; if not, they cannot use the app. Often, the use of the system is mandatory (e.g. systems for public transport, or filing tax returns). In extreme cases (like journalists or NGOs working in failed states) there is even a risk of physical compulsion.
Third, we cannot assume the existence of a trusted third party. In computer security, often all involved parties work in the same organisation, or all have an incentive to protect themselves against a third (truly external) party. In privacy, sometimes some of the involved parties are not that trusted. Or the involved parties do not know each other well enough to determine whether they share a third party that they trust.
There are also different design principles within privacy engineering. First, we need to rely on end-user devices (which is actually a challenge, because such devices are notoriously insecure). Second, we have to distribute trust over several (partially) trusted parties and allow users to choose whom they trust. Third, we have to use cryptography (it is all we have to bootstrap from). Finally, keep only short-term secrets where possible, because we need perfect forward secrecy.
Suppose Alice and Bob (the traditional parties involved in a cryptographic protocol) are communicating securely. Perfect forward secrecy guarantees that even if an adversary records all exchanged (encrypted) data, once Alice and Bob stop talking and close the conversation, the adversary cannot retrieve whatever they said, even if he forces Alice and Bob to tell him all they know. The way to make systems perfectly forward secure is to throw away the keys used to encrypt the conversation immediately after the conversation is finished, and to ensure that these keys cannot be retrieved from the messages used to set up the conversation. (This is, surprisingly perhaps, possible using public key cryptography.) This prevents any kind of after-the-fact compulsion attack and is thus very important when designing PETs.
Note that if messages are stored locally, all bets are off! (Many apps do this, by the way.)
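To make the ephemeral-key idea concrete, here is a minimal sketch using the Python `cryptography` package. It only illustrates the principle: each conversation gets a fresh key pair, the session key is derived from an ephemeral Diffie-Hellman exchange, and everything is deleted afterwards. Authentication of the exchanged public values (normally done with long-term signing keys) is left out.

```python
# Minimal sketch of forward secrecy via ephemeral Diffie-Hellman (X25519).
# Assumes the 'cryptography' package; authentication of the public values
# is omitted for brevity.
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes

# Each party generates a fresh, one-conversation key pair.
alice_eph = X25519PrivateKey.generate()
bob_eph = X25519PrivateKey.generate()

# They exchange only the public halves and derive the same shared secret.
alice_shared = alice_eph.exchange(bob_eph.public_key())
bob_shared = bob_eph.exchange(alice_eph.public_key())
assert alice_shared == bob_shared

# The session key is derived from the shared secret ...
session_key = HKDF(algorithm=hashes.SHA256(), length=32,
                   salt=None, info=b"conversation").derive(alice_shared)

# ... and forward secrecy comes from deleting the ephemeral private keys
# (and the session key) as soon as the conversation ends: nothing that was
# sent over the wire allows the session key to be recomputed later.
del alice_eph, bob_eph, alice_shared, bob_shared, session_key
```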
Hiding meta-data (i.e. data about the conversation like who is talking with whom, and even the fact that a communication is taking place) is much harder than protecting the content of the communication. This meta-data is surprisingly sensitive: a lot of personal information can be derived from seemingly innocuous bits of meta-data.
Anonymous communication is the study of protecting this meta-data. (This will be discussed further in the Friday morning lecture.) One way to hide who is talking to whom is to use a sequence of proxies that each forward the message to the next proxy on the path from Alice to Bob, where each proxy only knows how to forward the message to the next hop on the path. The proxies have no knowledge of the sender or the ultimate recipient. Each of the proxies needs to be only semi-trusted: as long as at least one of the proxies is honest, privacy is guaranteed. This is a property you want to have for all privacy enhancing technologies.
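As an illustration, here is a toy sketch of the layered encryption behind such proxy chains. The helper names are mine, and symmetric Fernet keys stand in for the per-hop public keys (and padding, routing headers, etc.) that a real mix network or onion router would use; the only point is that each proxy can remove exactly one layer.

```python
# Toy onion-style layering with the 'cryptography' package's Fernet.
from cryptography.fernet import Fernet

proxy_keys = [Fernet.generate_key() for _ in range(3)]  # one key per proxy

def wrap(message: bytes, keys) -> bytes:
    """Sender encrypts for the last proxy first, then wraps outwards."""
    data = message
    for key in reversed(keys):
        data = Fernet(key).encrypt(data)
    return data

def peel(data: bytes, key: bytes) -> bytes:
    """A single proxy removes its own layer and forwards the rest."""
    return Fernet(key).decrypt(data)

onion = wrap(b"hello Bob", proxy_keys)
for key in proxy_keys:          # the message travels through the chain
    onion = peel(onion, key)
assert onion == b"hello Bob"    # only Bob sees the plaintext
```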
Private information retrieval is another basic technique to protect privacy, in this case in the context of databases. Here the meta-data is information about who is looking for which data item in the database. (This is a property that intelligence agencies, law enforcement and even big companies (think patents) desire.) Private information retrieval guarantees users of a database that they can access any record in the database without revealing (to either an external observer or even the database manager itself) which record they want.
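A minimal sketch of the classic information-theoretic construction with two non-colluding servers (the function names and the integer record encoding are mine): the client sends each server a random-looking selection vector, each server XORs together the selected records, and the client XORs the two answers to recover exactly the record it wanted. Each server on its own sees only a uniformly random bit vector.

```python
# Toy two-server private information retrieval over a database of integers.
import secrets

def make_queries(db_size: int, index: int):
    q1 = [secrets.randbelow(2) for _ in range(db_size)]    # random vector
    q2 = [b ^ (1 if i == index else 0) for i, b in enumerate(q1)]
    return q1, q2                                          # q1 XOR q2 = e_index

def server_answer(db, query):
    ans = 0
    for record, bit in zip(db, query):
        if bit:
            ans ^= record           # XOR of all selected records
    return ans

db = [42, 17, 99, 7]                # both servers hold the same database
q1, q2 = make_queries(len(db), index=2)
record = server_answer(db, q1) ^ server_answer(db, q2)
assert record == db[2]              # the client learns its record; the servers learn nothing
```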
It is even possible for a group of people (each holding a private, sensitive input) to compute a general, arbitrary function of these private inputs. For example, two people can determine which of them is older, without revealing their actual age to each other (or anyone else). Another example is privacy-friendly auctions: each user has a private bid, and the function computes the winning bid without anybody learning the input bids. There are two fundamental techniques to build systems that do this: Secure Multiparty Computation and (Fully) Homomorphic Encryption. The first technique is tried and tested, and some commercial offerings exist. Fully homomorphic computations are hugely inefficient; secure multiparty computations are slightly more efficient but still warp you back (in terms of computing power) to the sixties of the previous century. This partially explains why these techniques are not deployed a lot (if at all) in practice. Also, there are huge deployment hurdles: who will provide a platform to deploy these techniques? What is the business model? Whom do you trust? And how much more protection does this really offer, especially if the parties involved already trust each other, or at least believe they themselves can be trusted?
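The general protocols (for comparisons, auctions, and arbitrary functions) are beyond a blog post, but the flavour of secure multiparty computation can be shown with a toy example for a much simpler function: the sum of private inputs, computed via additive secret sharing so that no single computation server ever sees an individual input. All names below are mine.

```python
# Toy secure summation via additive secret sharing modulo a large prime.
# Each participant splits its private input into random shares, one per
# server; any strict subset of the servers sees only uniformly random values.
import secrets

P = 2**61 - 1   # a Mersenne prime, large enough for toy inputs

def share(value: int, n_servers: int):
    shares = [secrets.randbelow(P) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % P)   # shares sum to the value mod P
    return shares

private_inputs = [34, 29, 41]                  # e.g. three private bids
n_servers = 3

# Each server receives one share from every participant and adds them up.
server_totals = [0] * n_servers
for value in private_inputs:
    for server, s in enumerate(share(value, n_servers)):
        server_totals[server] = (server_totals[server] + s) % P

# Combining the per-server totals reveals only the sum, not the inputs.
assert sum(server_totals) % P == sum(private_inputs)
```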
One final important, and fascinating, technique is zero-knowledge proofs, which allow you to prove statements about secrets without ever revealing the secret itself. It is like a magician being able to convince you that he has a bunny in the hat, without ever showing you the bunny!
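To give a feel for how this can work at all, here is a toy sketch of a classic interactive proof (a Schnorr-style proof of knowledge of a discrete logarithm), with deliberately tiny, insecure parameters of my choosing: the prover convinces the verifier that it knows x with y = g^x mod p, without revealing x.

```python
# Toy Schnorr identification protocol (NOT secure parameters: p and q are tiny).
import secrets

p, q, g = 23, 11, 2        # g has prime order q in Z_p*
x = secrets.randbelow(q)   # the prover's secret
y = pow(g, x, p)           # the public value

# 1. Commitment: the prover picks a fresh random r and sends t = g^r.
r = secrets.randbelow(q)
t = pow(g, r, p)

# 2. Challenge: the verifier sends a random c.
c = secrets.randbelow(q)

# 3. Response: the prover sends s = r + c*x mod q (this leaks nothing about x,
#    because the fresh random r acts as a one-time pad on c*x).
s = (r + c * x) % q

# 4. Verification: g^s must equal t * y^c, which holds exactly when the
#    prover really knows x.
assert pow(g, s, p) == (t * pow(y, c, p)) % p
```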
George is skeptical about data anonymisation techniques. To him this is magical thinking that offers no real guarantees. However, if the database itself is kept secret but you allow (selected) queries to it, then some guarantees can be given using differential privacy. Such queries should not involve single individuals but always involve groups of individuals of a sufficient size. This technique adds noise to query results, to prevent maliciously crafted queries from singling out information about individuals, for example by comparing query results for groups of people that differ by a single person.
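A minimal sketch of the basic mechanism (the Laplace mechanism for a counting query; the function names are mine): the amount of noise depends on the privacy parameter epsilon and on how much a single individual can change the true answer.

```python
# Toy Laplace mechanism for a counting query ("how many people in the group
# satisfy some property?"). A single individual changes the count by at most
# 1 (the sensitivity), so noise with scale 1/epsilon hides their presence.
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponentials is Laplace(0, scale) distributed.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def noisy_count(true_count: int, epsilon: float) -> float:
    sensitivity = 1.0                  # one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon)

# Smaller epsilon = stronger privacy = more noise on the released answer.
print(noisy_count(true_count=412, epsilon=0.5))
```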
Some people are suggesting that we can no longer control the collection of personal data, and that we should instead control (bad) use of personal data. This doesn't help, at least from a technological perspective: it is just as hard.
PETs try to prevent things from going wrong. What about corrective actions? In a way, perfect forward secrecy recovers from the fact that keys are compromised. If you view PETs as retaining control, then yes, correction is possible. If you see PETs as keeping secrets, then all bets are off: once the cat is out of the bag, there is nothing we can do.
What about trust? In medicine, we test drugs, and they are certified. But that domain is much less adversarial: there is no real incentive to create malicious or bad drugs. In PETs, the first certifying authority that comes to mind is the government. All intelligence services have a group responsible for certifying technologies for government use, for example. The problem is, of course, that the government is an adversary to the privacy of users. The question is then whom to trust to certify the software you use. Android runs SELinux, which contains millions of lines of code contributed by the NSA. Someone found a very subtle bug (a race condition), but was this malicious or simply an accident?
When it comes to computing specific functions in private, however, the situation is more positive, as the following examples show.
Pay as you drive (PAYD): pay a car insurance fee that depends on when, how and how much you drive. This was introduced in the UK as early as the beginning of the 21st century, in a totally privacy-unfriendly fashion in which the insurance company collected all information about driving patterns. The resulting damage to its public image made the company withdraw the scheme. (Note that in the Netherlands a similar, privacy-invasive scheme has been introduced recently.)
George designed, together with others, a system called PriPAYD in 2007: a privacy-friendly PAYD design. The central idea was to install a black box in the car that collects and processes all personal data locally, and computes the premium locally using an algorithm provided by the insurance company. There are some issues with this approach, however. Mainly, the black box has to be trusted by everybody (by the user for protecting privacy, and by the insurance company for security). Also: what if the GPS data is inaccurate (or manipulated to reduce the insurance fee)? In a centralised scheme this can be detected, but not in a decentralised one. Another problem is how to ensure that the algorithm does not leak personal information to the insurance company in a subtle way, and how this can be guaranteed after a software update. (Note that such systems also have an impact on solidarity: there will be groups of consumers that suffer a negative impact, e.g. a larger premium, because of their driving habits. This is not the best domain to argue for solidarity, though, because car drivers have a large degree of agency; such a system even promises to reduce insurance fees for everybody because driving becomes much safer. The health domain is a better example, because there is much less agency and much more 'bad luck'.)
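The essence of the design can be caricatured in a few lines. Everything below, including the tariff function and the data structure, is an invented placeholder rather than the actual PriPAYD algorithm; the point is only that the raw location trace stays on the in-car device and just the aggregate premium leaves it.

```python
# Caricature of the PriPAYD idea: the raw GPS trace never leaves the car.
# The tariff below is a made-up placeholder, not the real scheme.
from dataclasses import dataclass

@dataclass
class TripSegment:
    km: float
    night: bool       # night driving priced higher in this toy tariff
    urban: bool       # so is urban driving

def premium(trace: list[TripSegment]) -> float:
    """Runs inside the in-car black box; only its result is transmitted."""
    total = 0.0
    for seg in trace:
        rate = 0.05                        # base rate per km (made up)
        rate += 0.03 if seg.night else 0.0
        rate += 0.02 if seg.urban else 0.0
        total += seg.km * rate
    return round(total, 2)

# The insurer receives a single number per billing period, not the trace.
trace = [TripSegment(12.4, night=False, urban=True),
         TripSegment(80.0, night=True, urban=False)]
print(premium(trace))
```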
PrETP was a similar system for toll charging that aimed to reduce the level of trust required in the blackbox, in particular the toll company no longer needs to trust the blackbox to compute the toll charge correctly. The idea was to enforce honesty of the blackbox using spot checks. The advantage of this approach is that it allows users to choose the device to run the PrETP system on. It could, theoretically, be the user's own mobile phone!
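Very roughly, and with invented helper names (the actual PrETP protocol uses homomorphic commitments and zero-knowledge proofs so that the total fee can also be checked), the spot-check idea looks like this: the device commits to each road segment and its fee, the toll company later demands the opening of the commitments for segments where it happened to observe the car (e.g. by roadside camera), and any mismatch exposes a cheating device.

```python
# Toy hash-based commitments illustrating the spot-check idea.
import hashlib
import secrets

def commit(segment: str, fee_cents: int):
    nonce = secrets.token_hex(16)
    digest = hashlib.sha256(f"{segment}|{fee_cents}|{nonce}".encode()).hexdigest()
    return digest, (segment, fee_cents, nonce)   # (public part, kept by device)

def check_opening(digest, opening) -> bool:
    segment, fee_cents, nonce = opening
    return digest == hashlib.sha256(f"{segment}|{fee_cents}|{nonce}".encode()).hexdigest()

# The device sends only the commitments, alongside the total fee.
records = [commit("A2:km 41-42", 12), commit("A2:km 42-43", 12)]
commitments = [digest for digest, _ in records]

# Spot check: a camera saw the car on "A2:km 42-43", so the toll company asks
# the device to open that commitment and verifies the declared fee.
digest, opening = records[1]
assert check_opening(digest, opening) and opening[0] == "A2:km 42-43"
```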
It is very hard to design systems that have to be trusted by many different parties for often conflicting aims. This is especially the case if we want to offer law enforcement access to the data collected by the blackbox as well: the privacy preference of the user conflicts with the aim of law enforcement. For this reason they did not offer such forms of escrow.
Smart meters were rolled out for many use cases, some of them unknown when the systems were designed. Infrastructures have open ended purposes. This collides with a basic privacy design principle that requires you to know the purpose for which the system must be designed. As a result, in smart metering there is basic communications security and only procedural countermeasures to protect access to the personally sensitive data.
The question is: can we design a privacy-friendly system that also offers a certain amount of utility and flexibility? A system that allows the computation of aggregate statistics in real time, as well as the computation of functions over (sets of) individual readings. One possible approach is to encrypt data locally using a homomorphic encryption scheme. The approach also creates a mechanism that allows the utility companies to verify that the input values used to compute the function (e.g. the bill) are indeed valid, without revealing the individual meter readings. The underlying technology uses zero-knowledge proofs.
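As a flavour of what "computing on encrypted readings" means, here is a toy additively homomorphic (Paillier-style) sketch with laughably small, hard-coded primes of my choosing; it is not the scheme from the lecture (and omits the zero-knowledge validity proofs entirely). Meters encrypt their readings, anyone can combine the ciphertexts, and only the holder of the decryption key learns the aggregate.

```python
# Toy Paillier encryption (tiny, insecure parameters) showing the additive
# homomorphism behind privacy-friendly smart-meter aggregation.
import math
import secrets

p_, q_ = 1009, 1013                    # toy primes; real ones are 1024+ bits
n = p_ * q_
n2 = n * n
g = n + 1
lam = math.lcm(p_ - 1, q_ - 1)
mu = pow(lam, -1, n)                   # exists because gcd(lam, n) == 1 here

def encrypt(m: int) -> int:
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:         # r must be invertible modulo n
        r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    x = pow(c, lam, n2)
    return (((x - 1) // n) * mu) % n

# Each meter encrypts its own reading; multiplying ciphertexts corresponds
# to adding the plaintext readings underneath the encryption.
readings = [231, 198, 305]             # e.g. Wh per household in one interval
aggregate_ct = 1
for reading in readings:
    aggregate_ct = (aggregate_ct * encrypt(reading)) % n2

assert decrypt(aggregate_ct) == sum(readings)
```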
Helen discussed two lines of research she has been pursuing in parallel over the last years:
Values in design. How does technology influence certain values in society, and their understanding? For example, security professionals understand trust differently: for them a secure system creates trust, whereas in the 'ordinary' interpretation trust is something you need if there is no security to rely on...
Consider a smartphone. What is the phone? The software. It can be a paperweight. It's personal. It has a brand. But the same can be said of a water bottle. To talk about the functioning of the phone, about what it is, is only possible if you place it in the much larger context and environment in which it needs to function. Values in technology are not simply a function of use but of design. Ethical values emerge from technologies as they function within particular human, societal settings.
If this is the case, why not make the "practical turn" and design for (these) values? This is what "values at play" is about. The process consists of the following phases (that take place more or less in parallel!)
These values are not the main functional requirements. The main function of a toaster is to toast bread, not to be safe or reliable; those things are important quality attributes however, that need to be taken into account too when designing the system.
Values in design stands in opposition to the neutrality thesis (Chomsky, saying that technology can be used both for good and bad purposes), because the way systems are designed create affordances that make one purpose more natural or logical than another.
The question is where the values come from, and who decides what values to incorporate within the design of a system. Should this be the market? Legal approaches? Engineers? The values of engineers should not decide for the rest of society! To quote: "Technology is the political philosophy of our day"!
Technological developments have created several capabilities: collection/monitoring (GPS, RFID, video capture, cookies), aggregation/analysis (machine learning, big data), communication (the Internet, Web 2.0). With technological advancements, there were outcries by people saying that their privacy was being violated. This is where Helen got interested in privacy, because she wanted to know why people felt their privacy was violated. What is the disruptive character of these technologies that created this upheaval about privacy loss? How do you conceptualise privacy (to understand how these developments threaten 'it')?
The idea of contextual integrity originated from the observation (in 1997) that there is a need for privacy in public: the collection of bits of public information (for which we have no reasonable expectation of privacy) leads, when aggregated and analysed, to insights into our personal life that are considered harmful.
The four essential claims of contextual integrity are:
Contextual integrity wants you to acknowledge that privacy promotes democracy (for example, when applied to voting), i.e. it wants you to acknowledge the importance of societal norms, beyond the personal benefits of having privacy.
Note: contextual integrity really pushes back on the idea of privacy as control (by data subjects) over information, and instead focuses on putting constraints on information flows, constraints that can be expressed by both data subjects and recipients of the information.
Note: in user experience research a lot of work is done on privacy, but there the focus is on making sure users are not surprised. Changes are made in the user interface, not in the actual workings of the system.
The question asked often is who defines the norms. But norms are societal expectations. They evolve. Nobody defines them. It is actually hard to determine what 'the norms' are at any given time in any given context...