The 15th Privacy Enhancing Technologies Symposium (PETS) was held June 30 – July 20 2015 at Drexel University, Philadelphia. Please find below a summary of some of the presentations given on day #1. (I have not included a summary of all talks, and do not pay an equal amount of attention to all talks).


Session 1: Censorship Resistance (Chair: Rob Jansen)

Analyzing the Great Firewall of China Over Space and Time

By: R. Ensafi, P. Winter, A. Mueen, J. Crandall.
Presented by: Roya Ensafi.
Paper link.

Roya introduced a technique called hybrid idle scans. It is based on the idle scan technique for port scanning. This relies on the predictability of a so-called IPID that is embedded in IP packets from a given source. Typical TCP/IP protocol implementation assigns values to this mandatory field generally by a fixed value (1) increment. (Although more recent OS-es have patched protocols that randomise this number.)

The idea is that a probe helps testing whether communication between a client and server is blocked in either direction. The probe connects to the client obtaining the client IPID. The probe forwards this to the server who tries to connect to the client. If the connection is successful, there is no blocking. If the server sees no responses, blocking takes place. The probe can determine the direction of the block using the IPID. If the probe connects again to client and sees the IPID is incremented by only 1, then the client never saw the message from the server (i.e. the block is from server to client). If the IPID is higher, the client tried to sent a few messages to the server that never reached it. In other words, the block is client to server.

They applied their technique to measure the Great Firewall of China. They found the following. There were no geographical differences found in what servers were blocked, and how. Most blocks are server to client. Some Tor relays were accessible throughout the day. Those were the Tor relays that were not recognised as Tor relays by the active probes sent by China.

A Glance through the VPN Looking Glass: IPv6 Leakage and DNS Hijacking in Commercial VPN clients

By: V. Perta, M. Barbera, G. Tyson, H. Haddadi, A. Mei.
Presented by: Marco Barbera.
Paper link.

VPNs are increasingly popular, to protect against an untrusted local network (like e.g. a public WiFi network). The authors tested several popular VPN providers and found two main issues: IPv6 leakage, and DNS hijacking.

IPv6 leakage.
Clients prefer IPv6 connections if available. If the VPN is not properly configured to also tunnel IPv6 traffic, this leaks the fact that you connect to a particular server directly. It is hard to spot by ordinary users, and clients are easily triggered by an attacker to connect over IPv6 instead of IPv4.

DNS hijacking.
An adversary can easily convince the host network that the DNS server is a local host, which is therefore not protected by the VPN. In OpenVPN this works as follows. The local, untrusted, gateway creates a second interface with the same address as the DNS server, and advertises this gateway address to the client. As a result, the client thinks that the DNS server is local, and a request is sent unencrypted to the router, unprotected by the VPN.

Blocking-Resistant Communication through Domain Fronting

By: D. Fifield, C. Lan, R. Hynes, P. Wegmann, V. Paxson.
Presented by David Fifield.
Paper link.

Suppose you want to connect to an outside server through a firewall that sees all traffic. Domain fronting helps you circumvent any blocks set by the firewall. The idea is to use different domain names at different communication layers. The censor sees a domain name he does not want to (in practice cannot) block, like a very popular website. Embedded in the stream is the actual domain name of the host you want to connect to.

Specifically: HTTP specifies a Host field in the http request header. This is typically used for co-located domains, to specify which specific site you want to access when accessing the generic domain. Domain fronting connects to the generic domain using https, and has the actual domain name embedded in the Host field (which is encrypted because of the https connection). As result, the censor cannot see this value. This requires cooperation from the non-censored domain to forward the request specified to the Host. Typically you run a proxy there.

The main idea is similar to what other censorship prevention mechanisms, like decoy routers (TeLeX), or cloud transport, use. They all use TLS and some invisible tag that the censor cannot see, but can be seen by some network intermediary. The intermediary uses the tag to forward the message to the intended recipient.

Why does this idea works so well? Censors have the following techniques they can use: address blocking, content blocking, or active probing. Domain fronting renders all three techniques useless.

Domain fronting is expensive though. To front Tor traffic (11 GB in May 2015) costs about $2.000 a month.

Session 2: Anonymous Communications (Chair: Nick Mathewson)

20,000 In League Under the Sea: Anonymous Communication, Trust, MLATs, and Undersea Cables

By: A. D. Jaggard, A. Johnson, S. Cortes, P. Syverson, J. Feigenbaum.
Presented by: Aaron Jaggard.
Paper link.

A known problem for Tor is traffic-correlation attacks. In such attacks and adversary who monitors the link from the client to the Tor network (the guard node) as well as the link from the Tor network to the endpoint server (the exit node) can correlate these by looking at the similarity of both streams (in terms of traffic patterns; the content is of course encrypted and hence incomparable).

One solution to this problem is network trust that allows users to avoid relays that are not trusted (enough). The paper describes an ontology and a belief language that is automatically translated into probability tables for an Bayesian belief network. This belief network can then be used to determine the level of trust to assign to a particular relay, and hence influences the decision to use a particular Tor relay.

Beliefs are not something users specify themselves; rather they would obtain it from trusted third parties, like NGO’s. The paper also tries to model how things like Mutual legal assistance treaties (MLAT’s) should contribute to trust beliefs. They try to similarly model adversary types.

Guard Sets for Onion Routing

By: J. Hayes, G. Danezis.
Presented by: Jamie Hayes.
Paper link.

In Tor, if you randomly select an entry and exit node each time you connect to a server, the probability that at some point both are under control of the attacker increases. In that case traffic correlation attacks (see above) become possible.

As a countermeasure, fixed guard nodes are proposed that serve as the entry node for a prolonged period of time (typically 3 months). This approach means that new guards are initially underused, because it takes on average 1.5 months before they are being considered as replacements. This makes choosing your rotation time hard: you either have easy compromise, or under-utilisation.

Jamie discussed how the construction of guard sets can overcome this problem.

Defending Tor from Network Adversaries: A Case Study of Network Path Prediction

By: J. Juen, A. Johnson, A. Das, N. Borisov, M. Caesar
Presented by: Joshua Juen
Paper link.

Skipped.

Session 3: 13:45 Identity Protection (Chair: Sadia Afroz)

De-anonymizing Genomic Databases using Phenotypic Traits

By: M. Humbert, K. Huguenin, J. Hugonot, E. Ayday, J. Hubaux
Presented by: Mathias Humbert.
Paper link.

We are facing a genomic data deluge because of the decreasing cost of DNA sequencing. 23andme claim to have about 1 million genotyped people in their database. Genomic data brings great benefits, like early diagnosis of dangerous illnesses. On the other hand predisposition to certain diseases can lead to discrimination. This is not science fiction: the local football team in Lausanne has genotyped their team to better understand the health situation of their players!

Anonymization of genomic data does not provide enough protection. E.g. statistical relationships between genomic data and phenotypic relationships exist. For example, one can reconstruct facial shapes using genomic data, used by certain police forces (and this work of art).

A typical identification attack scenario runs as follows. Adversary knows someone is a member of a genomic database, containing n genotypes. The adversary computes the maximum likelihood that the phenotype of the person matches the genotypes, over all genotypes in the database.

This can be extended to a perfect matching attack, with n genotypes and n phenotypes to determine the best pairing of phenotypes (i.e. known people) with a set of known genotypes.

The authors trained algorithms to perform these attacks and reached the following results. The identification attack outperforms the baseline (just observing sex) by 3-8 times. The perfect matching attack algorithm matches correctly 23% of the times when matching 50 individuals with 50 genotypes.

They launched a new community website, collecting recent research and relevant events, for example.

Toward Mending Two Nation-Scale Brokered Identification Systems

By: L. Brandão, N. Christin, G. Danezis, Anonymous
Presented by: Luis Brandão.
Paper link.

Several countries are pushing for a kind of federated identity management systems, where users access services by being redirected to an identity provider to prove their identity. Typically a hub is used to allow the user to select which identity provider to sign in to. The Federal Cloud Credential Exchange, now called connect.gov developed in the US in the context of NSTIC, uses this approach. Also Gov.uk verify in the UK uses this approach. This creates potential for undetected and unaccountable surveillance.

The main problem is linkability: the hub essentially sees all information exchanged.

For example, the hub sees the IdP user pseudonym when computing the service specific pseudonym. The authors propose as a solution to use secure two party multi-party computation between IdP and the hub that allows both to compute the service specific pseudonym, without the hub learning the IdP pseudonym, while the IdP does not learn the service specific identifier.

This reminds me a lot of our earlier work on the Identity Crisis, which discusses these (and other problems) extensively.

In terms of solutions, we believe the problem should not be found within the frame of federated identity management. Instead approaches based on attributed based credentials provide more fundamental solutions to the typical problems of federated identity management schemes.

PETS Keynote Address: Secure computation: Where do we go from here?

The keynote address was given by Jonathan Katz (University of Maryland). I’m not going to even try to summarise a summary of tens of years of research… Let me just say I totally enjoyed the overview, and give you Jonathan’s insights on where we will go from here.

Secure computation will have a tremendous impact on real world security and privacy computing. This has been said before, but there now is real world interests. There are four companies trying to commercialise secure multi-party computations: Partisia, Sharemind, Sepior, Dyadic. Also there is interest from government (IARPA, DARPA, US Air Force Office of Financial Research).

The question is: will it be a niche interest, or get more widespread? Also: what is the business model? And what are the right security requirements (especially because most results are in the semi-honest model)? What kind of questions are relevant from a practical perspective to be studied in research: what is the right ecosystem, what are suitable cost metrics, what are real barriers to real-world application, and how can they be addressed?

Session 4: Secure Computation (Chair: Nikita Borisov)

Practical Forward-Secure Range and Sort Queries with Update-Oblivious Linked Lists

By: E. Blass, T. Mayberry, G. Noubir
Presented by: Travis Mayberry
Paper link.

The setting is the following. You want to outsource some data to a cloud service, but the data has numerical values you want to search on later. In fact, you have a slightly more general setting, where a set of users upload data, and a separate surveyor is going to search the data. Example: physicians uploading data, and a hospital or medical association wanting to

The goal is to achieve the following forward privacy requirement. Suppose if you issue a range query (e.g. give me all entries with numerical value in the range 1 to 5) at some point. Does the server learn which files that are uploaded after the query was issued match this queries. If not, then the scheme provides forward privacy.

None of the existing solutions are ideal, as the following table shows.

communication computation range privacy forward privacy
order preserving encrytion + +
oblivious RAM + + +
homomorphic encryption + + +
pairing based +

Note that pairing based schemes are order of magnitudes more efficient than homomorphic encryption based schemes.

The idea is store a linked list on a linear array modeling the cloud storage, one list for each possible numerical value. The start of this list (i.e. the cell in which the first value is stored) is the hash of a known constant and a shared key (shared between surveyor and users). Subsequent locations are selected at random and stored as forward pointers together with the current value to be appended to the list. Users remember this next to be written cell for the list. As a result, users access a cell at most once. Note that the server does not learn the forward linking: a cell that is currently written can belong to any of the existing lists. The adversary only learns this if the lists are traversed as the result of answering a range query (explained below).

When issuing a query, the server returns the elements in all the associated lists in the queries range, by traversing the lists. A special trick is used to ensure that if you read beyond the end of the list (accessing a cell that does not contain a new value), this does not make it possible to link the location in the array read in vain to the next value appended to this list (that would be stored at exactly this location).

Parallel Oblivious Array Access for Secure Multiparty Computation and Privacy-Preserving Minimum Spanning Trees

By: P. Laud
Presented by: Peeter Laud.
Paper link.

Setting: n parties with n inputs want to securely compute the result of a function over all inputs, revealing nothing more than the function result to all parties. One can use garbled circuits, homomorphic encryption techniques or secret sharing techniques to solve this. The latter are faster.

One important subroutine used within these techniques is private read and write access to an array. One option to implement this is using oblivious RAM. These are inherently sequential. If you work with secret sharing you want the reads and writes in parallel. Peeter presented a way to implement this primitive.

Recursive Trees for Practical ORAM

By: T. Moataz, E. Blass, G. Noubir
Presented by: Tarik Moataz
Paper link.

Skipped.

PET Award Reception (Chair: Rachel Greenstadt and Nick Hopper) LeBow 220

18:30 Closing (Apu Kapadia and Steven Murdoch)

Observations

Like last year in Amsterdam, PETS again was very well attended. The large room was quite full of people; I estimate more than 100 people attended.

PETS vs. PoPETS

PETS papers are published in a new journal, the Proceedings on Privacy Enhancing Technologies (PoPETs). This brings journal style reviewing to conferences. The journal has multiple deadlines a year. They are moving towards 4 deadlines each year. Papers accepted on or before the February deadline will be presented at PETS that year. This model allows papers to get tentatively accepted with major revisions that need to be addressed in the next submission. The journal is open access, published by De Gruyter.

Some figures. The expected reviewing load for a PC member is 4 papers per 2 months. The expected number of reviews per paper is 4. PoPETS issue 1 received 46 submissions, issue 2 received 73 submissions (including 6 revisions).