Do they really need to know who you are?

June 26, 2019

attribute-based-credentials identity-management privacy-by-design

Suppose you want to go to the movies tonight. Or perhaps your favourite band is coming to town. To secure a ticket for the event, you decide to buy one online. You select the event details, make sure you selected the right date and time, choose the e-ticket option (provided the shop even offers alternative delivery options), and you are ready to proceed to checkout and pay.

But wait.

Somewhere along the ordering process you are required to sign in to your account at the online ticket shop. If you don't have an account yet, you'll have to create one, and you will probably be asked to provide your full name, home address, phone number and email address. In some cases you will have to provide more information, like your age, and perhaps your credit card number (for future purchases). Doesn't that surprise you? No? Perhaps you are so used to it now, so conditioned to it, that you no longer really notice this identification step, let alone question it. Apparently you have bought into the myth that 'they' always need to know who you are. But do they, really?

At first sight it appears they do. After all, an account is personal, and so it must be tied to a person. And for many transactions, a name, email address or physical address is necessary to complete the transaction and to email the e-ticket or to ship the ordered items. But is that really necessary?

In fact, selling you a ticket really should not involve your identity at all. Just like brick-and-mortar box offices do not ask for your identity when you buy a ticket there. There is a simple exchange of money for tickets, and that's it.

In abstract terms, the exchange of money for tickets happens within a so-called session. The session starts as soon as you approach the counter of the box office and start talking with the cashier behind the counter. The session terminates as soon as you walk away (with or without your tickets). Everything that happens within this session is relevant for the duration of the session. As soon as the session ends, however, you and the cashier can (and should) forget all about it. The way the interaction is scripted within the session (you asking for a certain number of tickets for a particular show, the cashier checking that there are still seats available before asking you to pay, you paying the requested amount, the cashier checking the amount before printing the tickets, and you checking the tickets before walking off) ensures the integrity of the transaction. If something goes wrong, it should also be dealt with within the session. If you go home after getting your tickets and only then find out you got too few, you may try to return to the box office and pray to be able to get this sorted out, but in all likelihood this will fail: there may be another person behind the counter, the person behind the counter may not remember you, or they may simply not believe you.

This model can also be used online. Instead of asking you for personal details, the ticket shop could immediately proceed to the payment phase of the shopping process. Once the ticket shop successfully receives the payment (and yes, as we will see below, even this step can be done in a privacy preserving way), the ticket shop can simply display the e-ticket (whether this is a PDF with a bar code or an HTML page with a QR code) on the purchase confirmation page. You can then print the PDF, or, if you bought the ticket using your smartphone, ask for the ticket to be stored in your wallet. This `session-based' design for the ticket purchase process is privacy friendly: the ticket shop only learns that someone visited the shop, browsed for tickets, selected some for purchase, and paid for them, after which the tickets were displayed in that person's browser (to be printed or stored in a wallet). The ticket shop does not need to remember much after the session ends.

People might object that this approach is perhaps privacy friendly but not very user friendly: what if something goes wrong in the purchase step? What if someone accidentally closes the browser window with the e-ticket before printing or storing it? If you had an account, all the tickets you ever purchased would still be there, waiting for you.

This is a very valid objection, and one that many privacy friendly approaches do not sufficiently address. One way to solve this problem in this particular case is to make smart use of transaction numbers during the payment process. Suppose every payment is tagged with a unique transaction number, for which the ticket shop knows the corresponding e-tickets that were bought during that transaction. This transaction number will be part of the transaction information of your credit card statement or the overview of your debit card payments kept by your bank. This allows you to look up a transaction number for a transaction that went astray. If the ticket shop provides a way to retrieve e-tickets after entering a transaction number for which the ticket shop indeed received the payment, such mistakes can be solved. Clearly the transaction number should be random and long enough to make it impossible for someone to successfully guess a transaction number and subsequently download the corresponding e-tickets that actually belong to someone else.
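The retrieval idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any real ticket shop works: the in-memory dictionary stands in for the shop's database, the function names are made up for this example, and 128 bits of randomness is one reasonable choice for "random and long enough".

```python
import secrets
from typing import Dict, List, Optional

# Hypothetical store: transaction number -> e-tickets bought in that transaction.
# A real shop would use a database; a dict keeps the sketch self-contained.
_tickets_by_transaction: Dict[str, List[str]] = {}


def new_transaction_number() -> str:
    """Return an unguessable transaction number: 128 random bits, hex encoded."""
    return secrets.token_hex(16)


def record_paid_purchase(tickets: List[str]) -> str:
    """Call once the payment has been received; returns the transaction number
    that is attached to the payment and so ends up on the bank statement."""
    txn = new_transaction_number()
    _tickets_by_transaction[txn] = list(tickets)
    return txn


def retrieve_tickets(txn: str) -> Optional[List[str]]:
    """Return the e-tickets for a known transaction number, or None otherwise.
    Nothing else about the buyer is needed (or stored)."""
    tickets = _tickets_by_transaction.get(txn)
    return list(tickets) if tickets is not None else None
```

Note that the shop stores nothing about the buyer: whoever can present the transaction number gets the tickets, which is exactly why the number must be infeasible to guess.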

So buying e-tickets does not require the use of your name, phone number or email address. There is no need for the ticket shop to know you. But what about other online services? Do they really need to know who you are? Let's consider a few cases.

Your real and full identity is legally required in certain cases, when dealing with the government for example (filing tax returns, requesting permits, applying for social benefits), or when opening a bank account or an account at a cryptocurrency exchange (due to Know Your Customer (KYC) and Anti Money Laundering (AML) regulations). Similarly, your identity needs to be verified when you vote to ensure you are eligible, but great care is taken subsequently to ensure the anonymity of your actual vote.

This last example is in fact a nice case where your actual identity does not matter (at least not when entering the polling station). All that matters is that your eligibility to vote can be reliably established. Which would be a typical example of the use of attribute based credentials, except for the fact that people should not be allowed to vote twice.

(Because of the strong unlinkability properties of attribute based credentials a voter could present their eligibility attribute many times to vote many times without anybody noticing. Attribute based credentials can actually prevent this by forcing the voter to also show a fixed and voting-specific pseudonym that nevertheless is unknown to the issuer and hence cannot be linked back to the identity of the voter.)
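To make the idea of a fixed, voting-specific pseudonym slightly more concrete, here is a minimal Python sketch. It is emphatically not how attribute based credentials work internally: a real scheme would also have to prove, without revealing the secret, that the pseudonym was derived from the secret bound to the credential. Here HMAC-SHA256 merely stands in for that derivation, and the names `scoped_pseudonym` and `BallotBox` are assumptions made up for this example.

```python
import hashlib
import hmac


def scoped_pseudonym(user_secret: bytes, scope: str) -> str:
    """Derive a pseudonym that is fixed for one (secret, scope) pair.
    Stand-in only: HMAC replaces the derivation a real credential scheme uses."""
    return hmac.new(user_secret, scope.encode("utf-8"), hashlib.sha256).hexdigest()


class BallotBox:
    """Admits each pseudonym at most once for a single election (the scope)."""

    def __init__(self, scope: str) -> None:
        self.scope = scope
        self._seen = set()  # pseudonyms that have already voted

    def admit(self, pseudonym: str) -> bool:
        """Return True the first time a pseudonym shows up, False afterwards."""
        if pseudonym in self._seen:
            return False
        self._seen.add(pseudonym)
        return True
```

Because the pseudonym depends on the scope, the same voter gets a different, unrelated-looking pseudonym in every election, while a second attempt within the same election yields the same value and is refused.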

Much more often online service providers (legally) need or simply want to ascertain certain properties about you, instead of obtaining your full identity. Many services are required to verify that you are above a certain age, to establish your current country of residence, or to confirm your nationality. Some of these services rely on your honesty in this when creating an account. Others actually require you to submit a scan of your identity card or passport so they can verify these pieces of information. In these cases, the creation of an account is unnecessary. Attribute based credentials can be used to prove all kinds of properties (attributes like your age, your nationality, your achievements) about yourself in a totally privacy friendly fashion, without revealing your identity. All online services that claim they need to know who you are because they need to know your age, your gender, your whatever are simply lying. There is a privacy friendly (and probably even much more secure) alternative available.

In fact a personal account at an online service need not be identifying at all. All that matters is that the service provider can reliably establish that the same person that created the account is now returning to access it. Who that person is exactly is completely irrelevant in many cases. Traditional ways of accessing an account using usernames and passwords at least provided the option to create an arbitrary silly pseudonym as username. Moreover, you could choose a different username for each service, making it impossible for the services to link each other's accounts. Nowadays, the use of email addresses (that are in principle identifying and at least allow such linking of accounts) as username is ubiquitous. Again we see a usability trade-off here: with an email address as username, you are unlikely to forget the username of an account and you at least have the option to ask for a password reset should you have forgotten your password.

Subscriptions to online newspapers, video or music streaming services, etcetera, can be perfectly anonymous while still being strictly personal, for example by storing your preferences, your playlists, or your saved news articles. Of course, the more personal the preferences you store in such accounts are, the more likely it becomes that you are identifiable based on these preferences. This does require a certain amount of effort from the service provider, however, and so this is a far cry from services that simply require you to provide your real name, address, email address or phone number. Radically different approaches that dispense with the accounts concept altogether in fact also exist. Instead of having an account at each and every service provider, users maintain their own account information once, on their own devices, and share this information with service providers on request.

Digital payments can be made as anonymous as ordinary cash. In particular, credit card payments can be done in such a way that the merchant does not learn the credit card number while the credit card company does not learn details of the goods being purchased. For this VISA and Mastercard developed, already in 1996, the Secure Electronic Transaction (SET) protocol. It never caught on, unfortunately. A simple trick would similarly shield your bank account number from the online merchant when paying by debit card online: your bank could feed your payment through a central clearing account that would then show up on the payment information at the merchant side, instead of your own bank account number.

Another interesting class of applications is messaging services. Most of these use your mobile phone number (which is tied to your identity in many cases, because mobile phone operators are in many countries also bound to KYC requirements) as the main identifier for you and your contacts. Again, this is not really necessary. In fact in certain cases this is downright undesirable: the current state of affairs requires the contact list of an investigative journalist or foreign correspondent to contain the phone numbers of whistle blowers or informants. All the service needs is some kind of service specific identifier that ensures that messages intended for me are indeed delivered to me. This identifier could even be different for each of my contacts, i.e. the identifier you use to reach me does not have to be the same as the identifier someone else uses to reach me. Clearly the service provider is still able to link all these identifiers, unless the scheme is set up in a special way.
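A rough Python sketch of such per-contact identifiers could look as follows. It is a toy model under stated assumptions: the class and method names are invented for this example, everything lives in memory, and, as noted above, the service's own routing table still links all identifiers that belong to one mailbox.

```python
import secrets
from typing import Dict, List


class MessagingService:
    """Toy service that routes messages by random identifiers, not phone numbers."""

    def __init__(self) -> None:
        self._route: Dict[str, str] = {}            # contact identifier -> mailbox
        self._mailboxes: Dict[str, List[str]] = {}  # mailbox -> pending messages

    def open_mailbox(self) -> str:
        """Create a mailbox known only by a random handle (no name, no number)."""
        mailbox = secrets.token_hex(16)
        self._mailboxes[mailbox] = []
        return mailbox

    def new_contact_identifier(self, mailbox: str) -> str:
        """Mint a fresh identifier to hand to exactly one contact."""
        identifier = secrets.token_hex(16)
        self._route[identifier] = mailbox
        return identifier

    def send(self, identifier: str, message: str) -> bool:
        """Deliver a message; returns False if the identifier is unknown."""
        mailbox = self._route.get(identifier)
        if mailbox is None:
            return False
        self._mailboxes[mailbox].append(message)
        return True

    def fetch(self, mailbox: str) -> List[str]:
        """Return and clear the pending messages of a mailbox."""
        messages = self._mailboxes[mailbox]
        self._mailboxes[mailbox] = []
        return messages
```

Two contacts who compare the identifiers they were given cannot tell that they are talking to the same person, but the service can, via `_route`. Hiding that link from the service as well is the "special way" of setting things up, and it is beyond this sketch.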

But perhaps we should look at the problem from another angle, and not ask ourselves how to make privacy invasive systems less privacy invasive by applying a certain privacy enhancing technology (while leaving the underlying mechanisms that lead to the privacy invasion more or less intact). Instead we should challenge these very mechanisms and assumptions and try to create systems differently from the ground up. (Admittedly the use of for example attribute based credentials is by itself already quite a paradigm shift, but it still fundamentally relies on the use of personal attributes to grant or deny access.)

A small example of a totally different approach to access control, i.e. deciding who gets access to what, is the use of (physical) keys. I use my house key to enter my house. I use my car key to open and start my car. I use my bicycle key to unlock my bike (and untie it from the lamppost). My house does not `know' who I am. Neither does my car, nor my bike. In fact, who gets access is determined not by considering identity (i.e. tying access conditions to fixed identities) but simply by who has (a copy of) the key that opens the lock. The same mechanism is used to open lockers and safes, to turn switches, etc.
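The digital counterpart of such a lock is easy to sketch. The Python below is an illustration only, with invented names (`Lock`, `make_lock_and_key`): the lock stores nothing but a hash of the key and opens for whoever presents a matching key, without any notion of who that is.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


class Lock:
    """Opens for anyone holding the right key; it has no concept of identity."""

    def __init__(self, key_digest: bytes) -> None:
        # Only a hash of the key is kept, so the lock itself cannot leak the key.
        self._key_digest = key_digest

    def opens_with(self, key: bytes) -> bool:
        """Return True if the presented key matches (constant-time comparison)."""
        presented = hashlib.sha256(key).digest()
        return hmac.compare_digest(presented, self._key_digest)


def make_lock_and_key() -> Tuple[Lock, bytes]:
    """Create a lock together with its random 256-bit key."""
    key = secrets.token_bytes(32)
    return Lock(hashlib.sha256(key).digest()), key
```

Handing someone a copy of the key bytes grants them access, just as copying a house key does. Revoking access means changing the lock, not editing a list of names.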

As the above discussion illustrates: knowing you is hardly ever necessary!

iemand, 2019-06-26 19:47:29

Good article! A few remarks, though: 1) Been typing a bit too much LaTeX lately? 2) Why do I have to fill in my email address for this comment as well?

Jaap-Henk, 2019-06-27 09:57:40

Fair point ;-) This has now been changed: name and email address are now optional.

Qqwy/W-M, 2019-06-27 12:04:41

I think one big problem with how the current (digitally provided) services are structured is that a capitalist ‘free market’ system is essentially evolutionary. Contrary to popular belief, this is not “survival of the fittest”, but instead “survival of the ‘good enough’”, where the bar for ‘good enough’ might be ludicrously low.

More concretely, I think there is a really large group of organizations that ask for data “because we can get away with it, since our competitors do it too”. And if ‘fittest’ means ‘earns the most money’, then gathering more data about who is using your service means that your service is more likely to survive. Why? Because data can (a) either be sold to a third party, making a quick buck, or (b) be used to get more insight on who is actually using the system, which can be used to make the service more streamlined (which does not mean ‘better for the user’! Rather, it means that maybe more people will use the system, or the average user will spend more money than before. Neither of which might actually be in the best interest of the user themselves.)

This I believe to be the reason why there are so few services that actually attempt to reduce the amount of data they require you to fill in. Except the few services that actually use privacy as a core feature to stand out from the competition, they have no reason to do so.

About etickets: For movie tickets etc. your story is iron-clad. For tickets to other types of venues, one reason they ask people (both online and in person(!)) for their information, is to prevent ticket resale (which is in many situations considered detrimental to the potential visitor). Of course, using attribute-based credentials for this use-case would solve this problem, I think. :-)

Jaap-Henk, 2019-06-27 12:17:46

Indeed, attribute credentials can be used to prevent the resale of tickets very well, as I explained in an [older](https://blog.xot.nl/2012/11/06/irma-using-attribute-based-credentials-to-stop-resale-of-tickets/) blog post.