Now that many more people are joining Signal, some interesting new privacy concerns about its contact discovery and notification protocol are popping up. In particular, people are surprised that other people who are not (or no longer) in their contact list can still discover them when they join Signal. Moreover, some people who join Signal are annoyed that other people are automatically notified of this fact.
Let me be clear: Signal makes pretty sure that Signal itself cannot learn who is connected with whom. The problem is the gap between how people expect contact discovery to work and how it actually works, together with a lack of control over how discovery works and who gets notified of what.
The way Signal contact discovery works is as follows. When you register, a hash of your phone number is stored by Signal. Every once in a while, your Signal app computes a hash of each phone number in your contact list, and sends this list of hashes to Signal. Signal compares this list with the list of hashes of known registered users, and tells your app of any matches. The contacts that match are added to Signal’s own address book.
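The matching step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Signal's actual implementation: the choice of SHA-256, the `DiscoveryServer` class, the `discover` helper and the phone numbers are all assumptions made for the example.

```python
import hashlib


def hash_number(number: str) -> str:
    """Hash a phone number. SHA-256 is an assumption; the post does not name the hash."""
    return hashlib.sha256(number.encode("utf-8")).hexdigest()


class DiscoveryServer:
    """Hypothetical server that stores only hashes of registered numbers."""

    def __init__(self) -> None:
        self.registered: set[str] = set()

    def register(self, number: str) -> None:
        self.registered.add(hash_number(number))

    def match(self, hashed_contacts: list[str]) -> set[str]:
        # Return the submitted hashes that belong to registered users.
        return self.registered.intersection(hashed_contacts)


def discover(server: DiscoveryServer, contacts: list[str]) -> set[str]:
    """Client side: hash every contact, ask the server, map matches back to numbers."""
    by_hash = {hash_number(n): n for n in contacts}
    return {by_hash[h] for h in server.match(list(by_hash))}


# Made-up numbers: Bob registers; Alice has Bob and Carol in her contact list.
BOB, CAROL = "+31600000001", "+31600000002"
server = DiscoveryServer()
server.register(BOB)
found = discover(server, [BOB, CAROL])
```

Here `found` contains only Bob's number. Note that Bob's own contact list never enters the computation, which is exactly why he can be discovered by anyone who has his number.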
This means that if your number is on my contact list, your number will be added to Signal’s own address book on my phone, and I will be able to contact you via Signal. This happens even if my number is (no longer) in your contact list. On top of that, Signal notifies me when a person in my contact list joins Signal (again irrespective of whether I appear in that person’s contact list). This is not necessarily what people expect.
Some people would only want to be discovered by people that are on their own contact list, and not by unknown people that happen to know their phone number, or by people that they used to know but parted ways with (these could be partners, but also previous colleagues, or patients). Moreover, notification of discovering new contacts should be optional, that is: the person discovered should be able to control whether others are notified of this fact.
Indeed Signal’s matching process could be made such that people are only notified if both have the other person’s number in their contact list. But note that Signal would be next to unusable if I were not able to tell at all whether you are on Signal when I want to send you a message with it - if you were able to block discovery of your number entirely, you would never receive a Signal message. So there is a privacy versus usability trade-off there. It would be good to allow users to choose their own position regarding that trade-off. You might even want to offer users the option to make themselves discoverable to a selection of their contacts only, meaning that only the selected contacts will ever know that you are on Signal, and can contact you there.
Regarding notification, note that even if Signal did not notify users of new contacts, users could very easily test whether someone in their contact list is on Signal (by trying to send a message and seeing what happens). With some clever scripting, (malicious) users could easily automate this. But as almost all users will never actually try this, limiting notifications does matter in practice. Again something that should be implemented.
Note there are other, more privacy friendly, ways to establish contact and send messages anonymously (note that once you exchange messages the Signal server by necessity learns the social graph). The underlying issue is that all these services use phone numbers as contact points. You’d rather use something more ephemeral to establish contact, something that is not public, and something that you can revoke for particular contacts. See this paper I wrote five years ago (there is also an open access pdf version).
(This blog post was based on some Twitter discussions. See this tweet and all its branches.)
P.S.: Interestingly enough a matching algorithm that only discovers contacts when both A’s number is in B’s list of contacts and vice versa is inherently more privacy friendly.
Let C(A) denote the list of phone numbers on the contact list of A. Let KDF be a key derivation function, i.e. a hash function that takes a considerable time to compute. Because of this, when given O = KDF(I) it is impractical to do a brute force search for I. But of course you can still test a suspicion for I to see if KDF(I) matches O.
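To make the difference between searching and testing a suspicion concrete, here is a minimal sketch. It uses PBKDF2 from Python's standard library as a stand-in for the KDF. The fixed public salt, the iteration count and the `matches` helper are assumptions chosen for illustration, not something the scheme prescribes.

```python
import hashlib
import hmac

SALT = b"contact-discovery-demo"  # hypothetical fixed, public salt
ITERATIONS = 100_000              # illustrative cost; higher means slower


def kdf(value: str) -> bytes:
    """Deliberately slow hash: PBKDF2-HMAC-SHA256 as a stand-in for KDF."""
    return hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), SALT, ITERATIONS)


def matches(suspected_input: str, output: bytes) -> bool:
    """Test a single suspicion I against an observed O = KDF(I)."""
    return hmac.compare_digest(kdf(suspected_input), output)


observed = kdf("+31600000001")             # O, as seen by an observer
right = matches("+31600000001", observed)  # True: the suspicion is confirmed
wrong = matches("+31600000002", observed)  # False
```

Testing one suspicion costs a single KDF evaluation. An exhaustive search costs one evaluation per candidate input, which is what the slowness of the KDF is meant to make expensive.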
Define S_AB to be KDF(A|B), where A|B denotes the concatenation of the phone numbers of A and B.
The basic algorithm would run as follows.
If A discovers B, this means that B sent S_BA in the first round. A also must have sent S_AB in the first round. As B will also send S_AB in the second round, B will also discover A.
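A minimal sketch of one way the two rounds could fit together, consistent with the argument above. It assumes that in the first round every user A submits KDF(A|B) for each B in C(A), and that in the second round A submits KDF(B|A) for each B in C(A) and learns which of those values the server already saw in the first round. The `MutualDiscoveryServer` class, the `run` helper, the low iteration count and the phone numbers are all hypothetical.

```python
import hashlib


def kdf(value: str) -> bytes:
    # Stand-in for a slow KDF; the low iteration count only keeps the demo fast.
    return hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), b"demo-salt", 1_000)


def s(a: str, b: str) -> bytes:
    """KDF(A|B): the value user A submits about contact B in the first round."""
    return kdf(f"{a}|{b}")


class MutualDiscoveryServer:
    """Hypothetical server that only ever sees KDF outputs."""

    def __init__(self) -> None:
        self.first_round: set[bytes] = set()

    def round_one(self, values) -> None:
        self.first_round.update(values)

    def round_two(self, values: list[bytes]) -> list[bool]:
        # For each submitted value, report whether it was seen in round one.
        return [v in self.first_round for v in values]


def run(contacts: dict[str, list[str]]) -> dict[str, set[str]]:
    server = MutualDiscoveryServer()
    # Round 1: every user A submits KDF(A|B) for each B in C(A).
    for a, c in contacts.items():
        server.round_one(s(a, b) for b in c)
    # Round 2: every user A submits KDF(B|A) for each B in C(A).
    # A hit means B submitted that value in round 1, i.e. A is in C(B).
    discovered: dict[str, set[str]] = {}
    for a, c in contacts.items():
        hits = server.round_two([s(b, a) for b in c])
        discovered[a] = {b for b, hit in zip(c, hits) if hit}
    return discovered


ALICE, BOB, CAROL = "+31600000001", "+31600000002", "+31600000003"
result = run({ALICE: [BOB, CAROL], BOB: [ALICE], CAROL: []})
```

In this example Alice and Bob discover each other, while Carol, who has nobody in her contact list, is discovered by no one even though Alice has her number.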
Note that this way it becomes harder for the server to search for users that registered for the service. Instead of running a brute force search for a number A that matches a given hash of A, the server now has to brute force all possible combinations of A and B. Even testing a suspicion that A joined the service becomes harder, because you either have to know a contact B of A (and test KDF(A|B)), or brute force this for all possible values of B.