The European Commission is proposing a regulation on preventing and combatting the sexual abuse and sexual exploitation of children. I recently wrote (in Dutch) about some concerns I have about the proposal, especially the requirement it creates for large messaging platforms to implement client side scanning for child sexual abuse material (CSAM). Today I learned I forgot something: it appears to be possible to DDoS the system!
Client-side scanning is a technique where the messaging app matches, on the device of the user, every image about to be sent against a set of fingerprints (also called digests or hashes) encoding known CSAM. A match is reported to a dedicated abuse centre for further investigation. Today, a colleague of mine pointed me to a recent paper showing that it is possible for an attacker to create a variant of some innocent source image that matches a target fingerprint belonging to known CSAM.
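To make this more concrete, here is a minimal, hypothetical sketch of what such on-device matching could look like. The open-source imagehash library stands in for a proprietary perceptual hash such as PhotoDNA or NeuralHash, and the database contents and matching threshold are made up for illustration.

```python
# Hypothetical sketch of client-side scanning: compute a perceptual hash of an
# outgoing image and compare it against a database of known fingerprints.
# The open-source `imagehash` library stands in for a proprietary algorithm
# such as PhotoDNA or NeuralHash; database values and threshold are invented.
import imagehash
from PIL import Image

# Fingerprints of known material, stored as hex strings (illustrative values).
FINGERPRINT_DB = [
    imagehash.hex_to_hash("d1c4f0e2a3b49687"),
    imagehash.hex_to_hash("ffe0c18304a5b2d9"),
]
MAX_HAMMING_DISTANCE = 4  # how 'close' a hash must be to count as a match

def should_report(image_path: str) -> bool:
    """Return True if the outgoing image matches any known fingerprint."""
    fingerprint = imagehash.phash(Image.open(image_path))
    return any(fingerprint - known <= MAX_HAMMING_DISTANCE
               for known in FINGERPRINT_DB)

if should_report("outgoing_photo.jpg"):
    print("match found: report image to the abuse centre")
```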
This leads to an interesting avenue of attack, allowing activists opposing client-side scanning to DDoS the system as follows. Using the method outlined in the paper, activists could create innocuous images that nevertheless match a digest for some known CSAM. Note that unlike ‘normal’ attacks, these images could actually be totally meaningless and look like noise, because their sole purpose is to trigger a match and nothing else. This makes such images much easier to generate. Added on 2023-10-10: There are claims that Apple’s NeuralHash has a false positive rate of 3 in 100 million, while Microsoft’s PhotoDNA is claimed to have a false positive rate of 1 in 10 billion. These rates are within the realm of brute-force testing. Both algorithms are proprietary, so these claims have not been verified independently. There are reports that PhotoDNA is in fact not irreversible, and also not that good at matching similar pictures: it does not detect flips, mirroring, 90-degree rotations, or inversions of images.
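The paper uses more sophisticated techniques, but the core idea can be illustrated with a toy sketch: if the perceptual hash is differentiable (as a neural-network based hash like NeuralHash essentially is), gradient descent on the pixels of a noise image can drive its hash bits towards any chosen target. In the sketch below a random linear projection stands in for the real hash model; everything here is illustrative and not the method of the paper.

```python
# Toy illustration (not the paper's method): gradient descent on a noise image
# until its hash bits agree with a chosen target fingerprint. A random linear
# projection followed by sign() stands in for a real differentiable
# perceptual hash such as NeuralHash.
import torch

torch.manual_seed(0)
NUM_BITS, PIXELS = 96, 3 * 64 * 64
projection = torch.randn(NUM_BITS, PIXELS) / PIXELS ** 0.5  # stand-in hash model

def hash_bits(image: torch.Tensor) -> torch.Tensor:
    """The (non-differentiable) hash: the sign of each projected value."""
    return torch.sign(projection @ image.flatten())

target_bits = torch.sign(torch.randn(NUM_BITS))    # pretend: a known fingerprint
image = torch.rand(3, 64, 64, requires_grad=True)  # start from pure noise
optimizer = torch.optim.Adam([image], lr=0.05)

for _ in range(500):
    optimizer.zero_grad()
    logits = projection @ image.flatten()
    # Hinge loss: push every projected value across the sign of the target bit.
    loss = torch.relu(1.0 - logits * target_bits).sum()
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        image.clamp_(0.0, 1.0)                     # keep pixel values valid

match = (hash_bits(image) == target_bits).float().mean()
print(f"fraction of hash bits matching the target: {match.item():.0%}")
```

The result is exactly the kind of image described above: it looks like noise, yet its fingerprint agrees with the chosen target.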
Activists could subsequently decide to regularly send these images to one another, thus triggering the detection mechanism implemented through client-side scanning. The image would be reported to the abuse centre, which needs to apply its investigation procedure to determine whether the image actually is some known CSAM. Of course it is not (it is after all an innocuous image that was only constructed to falsely trigger the matching algorithm), but the centre needs to devote some of its time and resources to arrive at this conclusion. Depending on the number of participating activists, and the number of images they exchange, this might overwhelm and thus DDoS the centre.
A prerequisite for this attack to work is to have knowledge of some fingerprints for known CSAM. Is this a reasonable assumption?
Of course, law enforcement wants to keep the database of fingerprints of known CSAM confidential, in particular to prevent child abusers from evading detection. But a straightforward implementation of client-side scanning stores the full database of all these fingerprints on the device of the user. (This is how Apple’s proposed and later withdrawn system was supposed to work.) Even if this database is scrambled, experience shows that skilful and determined attackers will eventually be able to retrieve such a database. An alternative would be to only store a copy of the database at the messaging service provider and implement an interactive private matching protocol, where the device proves to the service that it is not sending an image with a fingerprint in the database. This would surely increase the load on the messaging service provider beyond acceptable levels, as it would have to run such a protocol for each and every image being sent. Even if this turns out to be inefficient but doable, it still depends on the messaging service providers properly guarding this database and keeping the fingerprints confidential. And on the protocol being able to protect even against a malicious device set up to learn which fingerprints lead to a match. It thus appears likely that the database of fingerprints of known CSAM eventually leaks.
Edited on 2023-10-10 because it is more complicated than I originally thought. Storing the plain database on the device is of course a bad idea, so Apple’s proposed and later withdrawn system instead stores a blinded copy of the fingerprints on the device of the user (thus making it mathematically impossible to recover the actual CSAM fingerprints), and uses a relatively straightforward set intersection protocol to find matches on the server. This does increase the load on the client, the network and the server somewhat, because with every encrypted image an additional ‘safety voucher’ needs to be computed, sent, and matched. But this only adds a linear cost, as images need to be encrypted, sent and processed anyway. Note that if the service providers themselves are not supposed to be involved in the matching process and learn the outcome of the match, this infrastructure needs to be created and maintained by the European abuse centre; this is a significant cost. Because matches are only computed on the server, the device does not learn whether a picture matches. This prevents attackers from running a dictionary attack against the blinded database of fingerprints, using the matching protocol to find collisions. Provided all providers use a similarly secure matching protocol, leaking (a subset of) the database of fingerprints of known CSAM is not very likely.
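To see why this blinded, server-side matching makes extracting fingerprints so much harder, here is a deliberately oversimplified, hypothetical sketch of the underlying idea. A toy Diffie-Hellman-style group replaces the elliptic-curve machinery of Apple’s actual PSI protocol (which also involves cuckoo tables and threshold secret sharing), and all names, parameters and ‘fingerprints’ below are made up for illustration.

```python
# Hypothetical, heavily simplified sketch of blinded server-side matching,
# loosely inspired by (but much simpler than) Apple's PSI design. A toy
# Diffie-Hellman group replaces the elliptic-curve machinery; the 'bucket'
# indexing replaces the cuckoo-table layout. Illustration only, not secure.
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (far too small for real use)
G = 5           # toy generator

def hash_to_group(fingerprint: bytes) -> int:
    exponent = int.from_bytes(hashlib.sha256(fingerprint).digest(), "big")
    return pow(G, exponent, P)

def bucket(fingerprint: bytes) -> int:
    return int.from_bytes(hashlib.sha256(b"bucket" + fingerprint).digest()[:2], "big")

def derive_key(shared_secret: int) -> bytes:
    return hashlib.sha256(shared_secret.to_bytes(16, "big")).digest()

# --- Server: blind every known fingerprint with its secret key k. Devices
# --- only ever receive the blinded table; k never leaves the server.
k = secrets.randbelow(P - 2) + 2
known = [b"fingerprint-of-known-image-A", b"fingerprint-of-known-image-B"]
blinded_table = {bucket(fp): pow(hash_to_group(fp), k, P) for fp in known}

# --- Device: for an outgoing image with fingerprint x, encrypt a 'safety
# --- voucher' under a key the server can only reconstruct if x is in its
# --- database. The device itself never learns whether there was a match.
def make_voucher(x: bytes, voucher: bytes) -> tuple[int, bytes]:
    r = secrets.randbelow(P - 2) + 2
    blinded_x = pow(hash_to_group(x), r, P)
    entry = blinded_table.get(bucket(x), G)                  # blinded candidate, if any
    key = derive_key(pow(entry, r, P))                       # H(f)^(k*r) when the bucket holds x
    ciphertext = bytes(a ^ b for a, b in zip(voucher, key))  # toy one-time pad
    return blinded_x, ciphertext

# --- Server: re-derive the key with k; the plaintext only makes sense when
# --- the device's fingerprint equals the one stored in the matching bucket.
def server_try_open(blinded_x: int, ciphertext: bytes) -> bytes:
    key = derive_key(pow(blinded_x, k, P))                   # H(x)^(r*k)
    return bytes(a ^ b for a, b in zip(ciphertext, key))

print(server_try_open(*make_voucher(b"fingerprint-of-known-image-A", b"VOUCHER")))   # b'VOUCHER'
print(server_try_open(*make_voucher(b"fingerprint-of-harmless-image", b"VOUCHER")))  # garbage
```

The point of the sketch: the device only ever holds values of the form H(f)^k, from which the actual fingerprints cannot be recovered, and only the server (holding k) learns whether a voucher decrypts, i.e. whether an image matched.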
Another question is whether the abuse centre can protect itself from such a DDoS attack. It is not immediately clear that it could: DDoS attacks are notoriously hard to defend against.
Of course the abuse centre will quickly learn which fingerprints are being targeted by such a DDoS attack, as they repeatedly trigger false positives. Removing these fingerprints from the database is clearly not an option, as this would prevent the detection of actual CSAM that matches these fingerprints. Perhaps the client-side scanning algorithm could be extended to do some additional testing when a match with a fingerprint of known CSAM is found, to filter out any known images used in a previous DDoS attack. This would lead to a cat-and-mouse game, with activists generating new trigger images that then need to be added to this ever-growing pre-filtering database. It is unclear whether this converges to a situation that is manageable both for the abuse centre and the messaging service providers. And it is also unclear to what extent such pre-filtering could adversely affect the detection of actual CSAM.
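One hypothetical way to implement such pre-filtering on the device: after a perceptual-hash match, also check an exact cryptographic hash of the image against a blocklist of trigger images identified in earlier attacks, and suppress the report on a hit. The sketch below is purely illustrative; as noted, attackers can simply generate fresh trigger images.

```python
# Hypothetical pre-filter sketch: only report a perceptual-hash match if the
# image is not an exact copy of a trigger image seen in an earlier DDoS attack.
# Names and data are illustrative; attackers can trivially re-generate their
# trigger images, which is what makes this a cat-and-mouse game.
import hashlib

# SHA-256 hex digests of trigger images already identified by the abuse centre.
KNOWN_TRIGGER_IMAGES: set[str] = set()

def should_report(image_bytes: bytes, matches_csam_fingerprint: bool) -> bool:
    if not matches_csam_fingerprint:
        return False
    exact_hash = hashlib.sha256(image_bytes).hexdigest()
    return exact_hash not in KNOWN_TRIGGER_IMAGES  # suppress known DDoS images
```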
Although the objectives and incentives are entirely different, the difficulty of designing sufficiently strong digital rights management systems should be a clear warning sign. And perhaps more thought should be devoted to determining whether such a DDoS threat can be properly mitigated before pushing ahead with this (anyway ill-conceived) proposal.