Highlights of the 15th Privacy Enhancing Technologies Symposium 2015, day #2.

July 1, 2015

The 15th Privacy Enhancing Technologies Symposium (PETS) was held June 30 - July 2, 2015 at Drexel University, Philadelphia. Please find below a summary of some of the presentations given on day #2. (I have not included a summary of all talks, and do not pay an equal amount of attention to each talk.)

I have also made reports for day #1.

Networks and Anonymity (Chair: Paul Syverson)

Portrait of a Privacy Invasion: Detecting Relationships Through Large-scale Photo Analysis

By: Y. Shoshitaishvili, C. Kruegel, G. Vigna. Presented by: Yan Shoshitaishvili. Paper link.

In 2014, the number of pictures uploaded to social networks daily exceeded 1.8 billion.

This work shows that relationship information can be retrieved from such large-scale photo collections.

Android third-party apps have access to the pictures you store on your SD card. If you upload your pictures to Facebook, then Facebook itself, Facebook apps, and your friends (and their apps!) can access your pictures. If your friends have the Facebook app installed, even their third-party apps can access your pictures.

Pictures represent data about you, which can be useful for targeted advertising. For example, they could reveal the brands you are wearing (we are not there yet). Where we are now is understanding with whom you are having a romantic relationship.

The authors developed "Creepic". The system takes a bunch of pictures, analyses them, and predicts who is dating whom. To train it, they need a set of 'mugshots' of the people they want to be able to recognise and whose relationship status they want to determine. Moreover, the classifier (SVM) needs to be trained with labeled feature sets (extracted from sample pictures). The data used to train the system was taken from a Facebook app installed by volunteers, as well as from data sets from Zimbio. Zimbio is a database of pictures of celebrities, together with precise timestamps of when their relationships started and ended.
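
For illustration, here is a minimal sketch of that classification step: an SVM trained on labeled feature vectors. The feature values and their meaning below are invented for the example, and the actual feature extraction (face recognition, co-occurrence statistics) is the hard part and is only stubbed as ready-made numbers.

```python
# A hedged sketch, assuming invented per-pair features: (number of photos
# together, fraction where the pair appears adjacent, average closeness in
# the frame). Not Creepic's actual feature set.
from sklearn import svm

train_features = [
    [42, 0.80, 0.9],   # couple
    [35, 0.75, 0.8],   # couple
    [ 3, 0.10, 0.2],   # not a couple
    [ 5, 0.05, 0.3],   # not a couple
]
train_labels = [1, 1, 0, 0]   # 1 = dating, 0 = not dating

classifier = svm.SVC(kernel="rbf", gamma="scale")
classifier.fit(train_features, train_labels)

# Predict the relationship status of a new pair from their photo features.
print(classifier.predict([[30, 0.7, 0.85]]))   # -> [1]
```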

Results: after training, applying Creepic to the Facebook data set gives a 62% true positive rate and a 12% false positive rate. On the Zimbio data set the results were a 66% true positive rate and an 8% false positive rate.

Are there countermeasures? They think uploading chaotic pictures to your social networks to confuse Creepic might help, but this is not fully understood yet. I fear this will just add a bit of noise, but will not fundamentally make the problem harder.

Note: the system raises additional issues if you consider gay relationships, as it then leaks sexual preferences.

Constructing Elastic Distinguishability Metrics for Location Privacy

By: K. Chatzikokolakis, C. Palamidessi, M. Stronati. Presented by: Marco Stronati. Paper link.

To provide privacy for location based services, systems typically transform the actual location x of a user to another location M(x) that is less precise. There are several ways to do this, with different properties. One property one might want is that if two locations are close, it is hard to determine which of the two corresponds to a certain 'mapped' location M(x), whereas if the distance is large, it may be much easier. In other words, the distinguishability of the mapped locations M(x) and M(x') is bounded from above by the distance d(x,x') between the input locations x and x'.

This latter distance can be set as some c times the actual Euclidean distance between x and x'. Then c in effect sets the required privacy level. However, the actual privacy this provides depends on the context. If you are in a city, there are many possible locations a person is likely to be (a cafe, restaurant, etc.) in a small area, whereas in the countryside the same sized area contains far fewer likely places someone could be, thus reducing the actual privacy.

The authors propose an alternative definition of distance that compensates for this, such that a given c corresponds to some context-independent level of location privacy.

For this they extract something they call privacy mass from OpenStreetMap. In essence, this measures the number of 'interesting places' in a certain area, and assigns a higher privacy mass to an area with many interesting places. The new distance measure between two locations x and x' is then the shortest weighted path from x to x' through the density map corresponding to the privacy mass.
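
To make the idea concrete, here is a small sketch under my own assumptions: treat the map as a grid of privacy-mass values, and compute the distance between two cells as the cheapest path through the grid, where crossing a high-mass (dense) cell is cheap. The grid values and the exact cost function are invented for illustration.

```python
# A minimal sketch of the elastic distance idea: Dijkstra over a 2D grid of
# 'privacy mass' values, where dense (high-mass) cells are cheap to cross.
import heapq

def elastic_distance(mass, src, dst):
    """Shortest weighted path between grid cells src and dst."""
    rows, cols = len(mass), len(mass[0])
    dist = {src: 0.0}
    queue = [(0.0, src)]
    while queue:
        d, (r, c) = heapq.heappop(queue)
        if (r, c) == dst:
            return d
        if d > dist.get((r, c), float("inf")):
            continue
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                step = 1.0 / (1.0 + mass[nr][nc])  # dense areas => shorter steps
                nd = d + step
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(queue, (nd, (nr, nc)))
    return float("inf")

city        = [[9, 9, 9], [9, 9, 9], [9, 9, 9]]   # many interesting places
countryside = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]   # hardly any

# The same Euclidean displacement is 'closer' in the city than in the
# country, so a fixed privacy level c buys comparable protection in both.
assert elastic_distance(city, (0, 0), (2, 2)) < elastic_distance(countryside, (0, 0), (2, 2))
```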

DP5: A Private Presence Service

By: N. Borisov, G. Danezis, I. Goldberg. Presented by: Ian Goldberg. Paper link.

See my summary of the same talk at Real World Cryptography workshop (Session 13).

Cloud Computation (Chair: Amir Houmansadr)

Know Thy Neighbor: Crypto Library Detection in Cloud

By: G. Irazoqui, M. Inci, B. Sunar, T. Eisenbarth. Presented by: Thomas Eisenbarth. Paper link.

People increasingly use the cloud to outsource computational tasks, for example by hosting a virtual machine in the cloud. In such a setup each tenant gets their own virtual machine (VM), running isolated (at least in theory) on shared hardware under a hypervisor. However, because the hardware is shared, VMs may 'feel' the load of other VMs. In a cloud setting, information leaked across VMs can be used to extract private keys, for example [ZJRR12,YF14].

In this work the authors show how one can detect which cryptographic library a co-located VM is using, and even which version of the library. The method uses the L3 cache as a side channel. This cache is shared across all VMs, and the adversary can monitor it to see whether its cache entries have been evicted or loaded (indicating activity by others).

Many hypervisors perform memory de-duplication to ensure that only one copy of duplicate data is actually stored in physical RAM (these days this is optional for hypervisors; default for OSes). So suppose a spy and a victim share a memory page (e.g. a code page of a crypto library). The spy flushes that page from the L3 cache and then waits to see what the victim does. If the victim loads the de-duplicated page, then when the spy loads it later (to test) it will load faster than if the victim did not touch that page.
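
Here is a toy model of that flush+reload logic (my own sketch; the real attack needs the native clflush instruction and cycle-accurate timers on physically shared hardware, none of which Python can provide, so the cache and timings below are simulated):

```python
# A toy simulation of the FLUSH+RELOAD decision logic. The cache is a set,
# and access times are invented constants standing in for measured cycles.
cache = set()                       # lines currently in the simulated shared L3
SHARED_PAGE = "libcrypto:page42"    # hypothetical de-duplicated code page

def flush(line):
    cache.discard(line)

def access(line):
    """Return a simulated access time: fast on a cache hit, slow on a miss."""
    hit = line in cache
    cache.add(line)
    return 50 if hit else 300       # 'cycles', illustrative numbers only

def victim_step(uses_library):
    if uses_library:
        access(SHARED_PAGE)         # victim executes the shared code page

def spy_probe(victim_uses_library):
    flush(SHARED_PAGE)                    # 1. flush the shared line
    victim_step(victim_uses_library)      # 2. wait while the victim runs
    return access(SHARED_PAGE) < 100      # 3. reload: fast => victim touched it

assert spy_probe(True) is True
assert spy_probe(False) is False
```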

With this method you can detect whether a victim is using a particular cryptographic library, by letting the spy use the suspected library as well (i.e. you need to guess first) and observing the load times as described above.

Secure and Scalable Match: Overcoming the Universal Circuit Bottleneck using Group Programs

By: R. Krishnan, R. Sundaram. Presented by: Ravi Sundaram (by recorded video). Paper link.

In so-called content-centric networking, data is automatically pushed to people based on their interests, using a publish/subscribe mechanism provided by brokers.

The basic idea for doing this in a privacy-friendly way runs as follows. The publisher publishes encrypted metadata, the subscriber publishes encrypted keywords, and the broker matches these using some algorithm. It is known that any predicate can be matched privately in polynomial time by the broker, using techniques from secure two-party computation.

The authors propose SPARMatch as a more efficient approach. SPARMatch's construction is based on Barrington's theorem, which states that any function computable by a circuit of depth d can be computed by a constant-width branching program of length 4^d.
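
The permutation trick at the heart of Barrington's theorem is small enough to show in a few lines. The sketch below is my own toy, not SPARMatch itself: it evaluates AND(a, b) as the group program alpha^a beta^b alpha^(-a) beta^(-b) over two 5-cycles whose commutator is again a 5-cycle, so the product is the identity permutation exactly when the AND is false.

```python
# Barrington-style group program for AND over S5 (a toy, not SPARMatch).
IDENT = (0, 1, 2, 3, 4)
ALPHA = (1, 2, 3, 4, 0)   # the 5-cycle (0 1 2 3 4)
BETA  = (2, 0, 4, 1, 3)   # the 5-cycle (0 2 4 3 1); [ALPHA, BETA] is a 5-cycle

def compose(p, q):
    """Return the permutation 'first apply q, then p'."""
    return tuple(p[q[i]] for i in range(5))

def inverse(p):
    inv = [0] * 5
    for i, img in enumerate(p):
        inv[img] = i
    return tuple(inv)

def and_gate(a, b):
    """Evaluate AND(a, b) as the product alpha^a beta^b alpha^-a beta^-b."""
    steps = [
        ALPHA if a else IDENT,
        BETA if b else IDENT,
        inverse(ALPHA) if a else IDENT,
        inverse(BETA) if b else IDENT,
    ]
    out = IDENT
    for step in steps:
        out = compose(out, step)
    # If a=0 or b=0 the product telescopes to the identity; if a=b=1 it is
    # the commutator, a 5-cycle, hence not the identity.
    return out != IDENT

for a in (0, 1):
    for b in (0, 1):
        assert and_gate(a, b) == bool(a and b)
```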

Substring-Searchable Symmetric Encryption

By: M. Chase, E. Shen. Presented by: Melissa Chase. Paper link.

Suppose you are outsourcing storage of data, e.g. genomic data, that you want to be able to query. In particular, suppose you want to see whether a query string is a substring of one of the stored strings. For security and privacy reasons you want to store the data encrypted in the cloud. In this model the user storing the string and the user(s) querying it share a secret key.

If you solve the problem straightforwardly, multiple queries may allow the cloud server to tell, for example, whether several queries match the same strings at related positions. This leaks information. The question is how to prevent this. Known (very privacy-friendly) solutions are very inefficient. The authors therefore try to balance privacy and efficiency.

As the underlying data structure they use a so-called suffix tree. Such a tree stores each suffix of the string as a path from the root to some leaf. To match a query substring, you traverse the suffix tree character by character along the query substring, until all its characters have been consumed. If that is possible the query matches; otherwise it does not.
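
A plaintext toy of that traversal is sketched below (my own illustration; the actual scheme compresses the structure into a proper suffix tree and, crucially, encrypts every node and every query character). A trie over all suffixes makes the matching logic obvious.

```python
# Substring matching by walking a trie of all suffixes (plaintext toy).
def build_suffix_trie(s):
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:          # insert the suffix starting at position i
            node = node.setdefault(ch, {})
    return root

def is_substring(trie, query):
    node = trie
    for ch in query:              # consume the query character by character
        if ch not in node:
            return False          # traversal stuck: no match
        node = node[ch]
    return True                   # all characters consumed: match

trie = build_suffix_trie("GATTACA")
assert is_substring(trie, "TTAC")
assert not is_substring(trie, "TAG")
```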

In the actual protocol, each character of the stored string is encrypted in a semantically secure manner before being uploaded to the server. The query is encrypted similarly. The resulting protocol is somewhat more complicated in how characters are matched. In terms of performance, the approach requires O(n·k) storage and O(m·k) query communication (where n is the length of the stored string, m the length of the query, and k depends on the character set). Executing a query requires O(m) symmetric key operations and O(m) table lookups. The access patterns still leak some information though, so full privacy is not achieved.

The research raises some interesting broader questions, like how to analyse leakage, and what types of data structures we can encrypt efficiently.

Town hall meeting: The Future Evolution of PETS and PoPETs (Chairs: Apu Kapadia and Steven Murdoch)

The Town hall meeting was organised to discuss the future evolution of PETS and PoPETs.

Some history first. It was decided to have four deadlines each year, on a predictable schedule, instead of a deadline each month, mostly because there is a significant cost associated with each deadline.

Going for a journal-style publication has created new types of verdicts for submitted papers: accepted with minor revisions (like a shepherded paper at a conference) and accepted with major revisions. The journal has a quick turnaround: authors receive a notification after two months, and the paper appears after two more months.

The journal is attracting more papers, and at the same quality level this means more papers are presented at PETS. How to deal with that? Extend PETS by one more day? Shorter talks? Go for multiple tracks? Only accept certain papers for presentation?

This question was posed to the audience. The following comments were made.

  • People expressed support for extending PETS by one day.
  • There are far too many computer science conferences; the CS field is way too fragmented. PETS should be prevented from splitting up.
  • Allowing major revisions as a possible decision has resulted in more papers receiving this verdict than is desirable (instead of being rejected outright).
  • The PETS/PoPETs model may be an opportunity to push against the Least Publishable Unit model (i.e. the fact that the intellectual content of papers gets smaller year after year). To counter this, the journal publication could be required to be long and content rich, while the presentation can still be short.
  • The new setup may have affected WPES, which attracted fewer submissions this year. (This was not intentional at all!)
  • Note that a more radical model (where all submissions and all reviews would have been published) was discussed internally. One could go even further and sort of outsource the role of the PC to the community at large, as part of a more general reputation-based system.
  • One could totally separate the journal from the symposium, and only accept some papers for presentation, based on a separate 'submission for presentation'.
  • Other people expressed the desire to make all reviews public. (I think in that case the name of the reviewer should be published as well, to discourage 'bullyish' reviews.)
  • Going for more radical approaches really requires us to consider the position of more junior researchers, both as reviewers and as authors.

User Profiling (Chair: Matthew Wright)

Information Flow Experiments on Ad Privacy Settings

By: A. Datta, M. Tschantz, A. Datta. Presented by: Amit Datta. Paper link. (The title of the presentation was changed.)

Ads served by Google's ad network on the web are based on a behavioural model built by Google. For transparency reasons, Google offers Ad Settings, where users can view some inferred attributes and change certain settings. How these settings influence ad behaviour is unclear, because the ad ecosystem is a black box. The goal of the research is to obtain more information about this ecosystem. For this the authors developed the AdFisher testing platform.

They studied a few questions.

The first question was whether discrimination based on attributes (e.g. sex) takes place. The authors found that men browsing websites related to finding jobs are offered different ads than women.

The second question was whether the Ad Settings offered by Google are transparent, in the sense that the Ad Settings show a significant attribute whenever the ads offered actually show a significant difference. In other words: is there important information flow (profiling) towards the advertisers that is not revealed to the user through the Ad Settings panel? The authors found that the panel is not transparent: browsing websites related to substance abuse did result in different ads being shown later, while this did not result in any notice on the Ad Settings panel.

The third question was whether changes in the Ad Settings led to significant changes in the ads displayed, i.e. whether user choice is implemented properly. The authors found that changing Ad Settings related to religious beliefs did change the ads significantly.
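
All three questions boil down to the same statistical task: deciding whether two groups of (otherwise identical) browser agents receive significantly different ads. As a flavour of what such a test involves, here is a toy permutation test (my own sketch with invented numbers, not AdFisher's actual pipeline, which uses a machine-learning classifier plus a permutation test):

```python
# Toy permutation test: is the difference in target-ad counts between a
# treatment and a control group of browser agents statistically significant?
import random

treatment = [9, 11, 8, 12, 10]   # target-ad counts per agent (invented data)
control   = [3,  2, 4,  1,  3]

def mean_diff(a, b):
    return sum(a) / len(a) - sum(b) / len(b)

observed = mean_diff(treatment, control)
pooled = treatment + control
hits, trials = 0, 10000
for _ in range(trials):
    random.shuffle(pooled)       # relabel agents at random
    if mean_diff(pooled[:len(treatment)], pooled[len(treatment):]) >= observed:
        hits += 1
print(f"p ~ {hits / trials:.4f}")  # small p => the groups' ads really differ
```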

The second part of the talk discussed the research methodology; for this I refer to the paper.

An Automated Approach for Complementing Ad Blockers' Blacklists

By: D. Gugelmann, M. Happe, B. Ager, V. Lenders. Presented by: David Gugelmann. Paper link.

Research shows that 25-55% of websites load objects from more than 10 third parties. These third parties collect (personal) information using cookies or scripts running in the user's browser. To protect against this, users typically use either

  • default-block policies with a whitelist, like NoScript.
  • default-accept policies with a blacklist, like Adblock Plus.

The latter is preferred because blocking scripts by default renders most websites unusable. However, it is hard to keep the blacklists up to date, because the tracker landscape changes fast, and the list providers may not necessarily be trusted (Adblock Plus reportedly received money from companies in exchange for not blocking their ads, for example).

This research investigates whether such blacklists can be compiled automatically using a machine learning approach. An important aspect is selecting which features to extract from web traffic so that tracking and advertising third-party services can be identified reliably. Significant features are the number of distinct referrer domains and the ratio between received and sent bytes for a particular service. (There are services for which this ratio even drops below one, which is typical for services that collect information about users instead of providing information to them.)
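
To illustrate the feature idea (my own toy, not the authors' pipeline; the flow log, field names, and threshold are all invented), one could aggregate logged HTTP flows per third-party service and flag services that are embedded on many different sites yet mostly receive data:

```python
# Toy per-service feature extraction: distinct referrer domains and the
# download/upload byte ratio, as discussed above.
from collections import defaultdict

# One entry per request: (third-party service, referrer domain,
# bytes uploaded by the client, bytes downloaded by the client).
flows = [
    ("tracker.example", "news.com", 900, 40),
    ("tracker.example", "shop.com", 850, 35),
    ("tracker.example", "blog.net", 920, 50),
    ("cdn.example",     "news.com", 200, 90000),
]

stats = defaultdict(lambda: {"referrers": set(), "up": 0, "down": 0})
for service, referrer, up, down in flows:
    entry = stats[service]
    entry["referrers"].add(referrer)
    entry["up"] += up
    entry["down"] += down

for service, entry in stats.items():
    ratio = entry["down"] / entry["up"]   # < 1: the user uploads more than is served back
    suspect = len(entry["referrers"]) >= 3 and ratio < 1.0
    print(f"{service}: referrers={len(entry['referrers'])} ratio={ratio:.2f} tracker={suspect}")
```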

The classifier identified (in August 2013) around 200 such objects, 70 of which had been independently added to the Adblock Plus blacklist by volunteers by February 2015, supporting the validity of the approach taken by the authors.

Privacy Games: Optimal User-Centric Data Obfuscation

By: R. Shokri. Presented by: Reza Shokri. Paper link.

Skipped.

Applied Cryptography (Chair: Nick Hopper)

A Practical Set-Membership Proof for Privacy-Preserving NFC Mobile Ticketing

By: G. Arfaoui, J. Lalande, J. Traoré, N. Desmoulins, P. Berthomé, S. Gharout. Presented by: Ghada Arfaoui. Paper link.

In Paris you can buy booklets of 10 metro tickets. It would be great if you could store such a booklet of metro tickets on your smartphone, and use the phone's NFC interface to validate a ticket and open the gate. A protocol for mobile ticketing by Rupp et al., presented at Financial Crypto 2013, assumes an honest-but-curious transport operator. But what if the operator is malicious?

The authors present a privacy-friendly protocol that is unlinkable and works even if the battery of the smartphone is off (so no computation can be delegated to the smartphone, i.e. the NFC secure element (SE) has to do all the work), while the validation time stays below 300 ms.

The protocol uses set-membership proofs as a primitive. Existing set-membership protocols cannot be applied in this setting, because they use pairings at the prover side (which cannot be computed efficiently by a SIM card/SE). To avoid pairings at the prover side, the authors propose a protocol based on Boneh-Boyen signatures.

Accountable Metadata-Hiding Escrow: A Group Signature Case Study

By: M. Kohlweiss, I. Miers. Presented by: Ian Miers. Paper link.

Politicians are demanding key escrow again, now that strong encryption is finally being used at large scale by the general public (TextSecure, iMessage). You can't have one golden key for everybody, because that makes everybody insecure.

The authors try to design an accountable escrow scheme that allows users to retroactively determine whether their communication was opened.

(Unfortunately the paper was presented so fast - or I was too jetlagged to keep up - that I lost track of the main ideas. That's why this summary is so short.)

During the Q&A a discussion ensued about the threat model and who would use this scheme.

Optimal Rate Private Information Retrieval from Homomorphic Encryption

By: A. Kiayias, N. Leonardos, H. Lipmaa, K. Pavlyk, Q. Tang. Presented by: Qiang Tang. Paper link.

Scenario: I want to download a particular movie (10 GB) from a Netflix movie database, without revealing which one. This is an instance of Private Information Retrieval (PIR).

Let t be the size of the item queried, and T the size of the total communication required to retrieve the item. Let's define the communication rate as t/T. When t is large (as in the Netflix case), even a rate of 0.5 is unacceptable.
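
A quick worked example of the rate definition (numbers invented):

```python
# At rate 0.5 you ship as much overhead as useful data: the 10 GB movie
# costs 20 GB of total communication. An optimal-rate scheme pushes T
# towards t, so the rate approaches 1.
t = 10.0             # GB: size of the queried item (the movie)
overhead = 10.0      # GB: query upload plus ciphertext expansion (invented)
T = t + overhead     # total communication
print(f"rate t/T = {t / T:.2f}")   # -> rate t/T = 0.50
```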

Lipmaa's CPIR scheme uses homomorphic encryption, allowing the PIR server to return just the homomorphic encryption of the queried item. The communication rate then depends on the overhead incurred by the homomorphic encryption scheme, and on how the query is sent to the server. The authors present a protocol with an optimal rate, based on branching programs.

Rump Session (Chair: Roger Dingledine)

Seda Guerses: announces the IEEE S&P International Workshop on Privacy Engineering (IWPE). IWPE wants to alternate between the US and Europe in the future. And: see the movie 'What We Do in the Shadows'.

Peeter Laud (speaking for Sharemind): Should IT students work during their studies? Should they drop out and go to work? They answered these questions using government databases and Sharemind's secure multiparty computation system. The largest secure MPC application ever (623k records on education and 10.6 million tax records). But: what were the answers???

Ian Miers: presents Libforwardsec: forward security for messaging. I wonder how this compares to what Textsecure and Silent Circle do…

Zack: poses a research challenge for a game-theory wizard: figure out whether (Tor) guards are really a good idea. What is worse: p% of the time you are linked when setting up a new circuit (the case without guards), versus p% of the time you are linked for the whole duration of the guard rotation period (when using guards)?

George Danezis: presented two things.

  • Shameless plug: UCL has lots of PhD and postdoc positions available.
  • Take PETs out of the lab: why are PETs not used? Crypto papers are scary, understanding what to implement is hard, and implementing crypto properly is hard. Therefore George created a Python library called petlib, containing a set of PETs.

Roger Dingledine: Tor is looking for an executive director: someone who is good at fundraising in the foundation domain, good at managing other people, and who knows the open source / free software ethos.

Kate (Tor's new director of communications): explains how to get media attention for defenses against attacks on Tor. (Attacks on Tor get way too much attention ;-)

Adam: presents the Open Technology Fund. OTF is a private non-profit entirely funded by the US government. It supports internet freedom worldwide, focusing on censorship circumvention and protection against surveillance. Project funding ranges from $10k to $900k. Every individual or organisation is eligible.

JC: Why doesn't the privacy community have a flag? Say, one with a logo like a blind eagle (signaling concern as well as patriotism). We should have the right to take nude pictures and store them in the cloud, without risk. (That was just one example.) We need a privacy bill of rights. Blog, tweet, etc. with hashtag #privacybillofrights.

?? from the Guardian Project (I apologise, I didn't catch the name): working on OrFox, bringing the Tor Browser to Android.

Mark: working on defenses against website fingerprinting for Tor. Current defenses are way too expensive, partly because the effect of the attacks is overestimated (and the countermeasures try to defend against this exaggerated effect).

Claudia Diaz: from KU Leuven in Belgium: famous for great chocolate, great beer, and great crypto. They are hiring for the privacy research group, and for the crypto and security group in general. They are looking for PhD students and postdocs.

Francesca: wrote a master's thesis in Italy about the revelations of Edward Snowden in relation to international law. Currently interning at KU Leuven to get to know the technology side. Now looking for a PhD position: if you are interested, let her know.

Nikita Borisov: studying attribute hiding in pictures (versus hiding identity by face blurring), i.e. how to hide what you are doing. Instead of blurring the face of a person, blur the fact that he is not wearing pants ;-) Someone in the crowd shouted: "We have our flag!" (referring to JC's statement).

Rob: as a research community we should realise that the real-world impact of our work should be improved. Rob wants to help by creating an incentive to do this more. He therefore wants to organise a pre-PETS workshop to discuss deployment challenges, and how deployment has influenced real-world users. Rob is looking for support, e.g. PC members.

Nick: ten-year-old programs are impossible to build today because of horrible project dependencies. To solve this: collect all your dependencies together with your code, stuff it all in a VM, and store that all together.

Harry Halpin: W3C and IETF are looking to improve the security and privacy impact of the standards they produce. They would like to have a unified security and privacy review policy. This fall they are looking to hire someone to do this, as a postdoc at MIT (under Wendy Seltzer).

(The remaining talks were not video-recorded. I therefore will not blog about them.)

In case you spot any errors on this page, please notify me!
Or, leave a comment.