Tainting the CSAM client-side scanning database.

October 11, 2023

The European Commission's proposal for a regulation on preventing and combating the sexual abuse and sexual exploitation of children is currently being discussed in the Dutch parliament. I recently wrote about some concerns and the risk of a DDoS attack. It turns out it is also possible to taint the database of fingerprints of known child sexual abuse material (CSAM), allowing an adversary to trick the client-side scanning system into also triggering an alarm for other, non-CSAM, material. Client-side scanning could thus be vulnerable to undetectable function creep.

Client-side scanning works by matching images on a user's device against the fingerprints stored in this central database. There is no independent way for the service providers or the devices to verify that the fingerprints in this database actually correspond to CSAM. The devices therefore have to trust that this is indeed the case. As such, the proposal already suffers from a significant risk of function creep: the scope of the fingerprints contained in the database could be extended to allow the detection of other material, such as material propagating terrorism or other hateful or abusive content. To be clear, this is absolutely out of scope of the current proposal for a regulation, which is strictly limited to preventing the spread of CSAM. But other legislation could be enacted in the future to broaden the scope, without any need to change the detection system already in place to detect CSAM. This significantly lowers the barrier to enacting such legislation. Hence the risk of function creep. But at least any push for such function creep would happen in plain sight.
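To make the matching step concrete, here is a minimal sketch of what a client-side scanner conceptually does, assuming a perceptual hash with a fixed-length output and a Hamming-distance threshold. Both the `perceptual_hash` stand-in and the threshold value are illustrative placeholders of my own, not the actual (proprietary) PhotoDNA algorithm or any deployed parameter.

```python
MATCH_THRESHOLD = 10  # assumed: maximum number of differing bits still counted as a match


def perceptual_hash(image_bytes: bytes, length: int = 16) -> bytes:
    """Toy stand-in for a perceptual hash such as PhotoDNA.

    A real perceptual hash is computed from image features and is designed so
    that visually similar images yield similar fingerprints; this byte-block
    averaging only mimics that interface in a crude way.
    """
    block = max(1, len(image_bytes) // length)
    return bytes(sum(image_bytes[i:i + block]) // block
                 for i in range(0, block * length, block))


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length fingerprints."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def scan_outgoing_image(image_bytes: bytes, database: list[bytes]) -> bool:
    """Flag the image if its fingerprint is close to any fingerprint in the database."""
    fingerprint = perceptual_hash(image_bytes)
    return any(hamming_distance(fingerprint, entry) <= MATCH_THRESHOLD
               for entry in database)
```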

I am more concerned about a surreptitious form of function creep that client-side scanning makes possible.

Clearly, as new CSAM is discovered, its spread needs to be prevented by adding its fingerprints to the database. Strict procedures are (purportedly) in place to guarantee that only fingerprints of known CSAM are added to this database. (Otherwise, fingerprints of other offensive content could easily be added.)

Unfortunately, it is impossible to guarantee this.

Consider an entity that is allowed to propose new entries to the CSAM database. A recent paper by Jonathan Prokos et al. shows that it is possible for an attacker to create a variant of an arbitrary source image that matches a given target fingerprint. A malicious entity could surreptitiously taint the database with other offensive content using these techniques as follows (a code sketch of the core collision step follows the list):

  • It takes an image A of some offensive content that is not CSAM, for which it wants to add a fingerprint to the database. The entity keeps A secret.
  • It computes its fingerprint f(A). (Although the algorithms to compute this are proprietary, researchers have reverse engineered the commonly used PhotoDNA algorithm, and a binary implementation of PhotoDNA is available to compute the fingerprint.)
  • It creates a convincing synthetic image that passes as CSAM using recent AI techniques (or waits until it discovers a new image that is CSAM). Call this image B.
  • Using the techniques from Prokos et al., it subtly modifies this source image B to create another image B′ whose fingerprint f(B′) matches the target fingerprint f(A). In other words, f(B′) = f(A). Note that the modifications to B do not visibly alter the image, so B′ will still pass as CSAM.
  • The malicious entity submits the altered image B′ for inclusion in the database.
  • Because B′ is visibly indistinguishable from B, which was constructed or selected to be CSAM, the fingerprint f(B′) of B′ is accepted for inclusion in the database.
  • But since, by construction, f(B′) = f(A), the malicious entity has now ensured that the client-side scanning algorithm will also report any occurrence of A (the non-CSAM but otherwise deemed offensive image) to the abuse centre.
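Purely to illustrate the shape of this collision search, here is a minimal sketch that reuses the placeholder `perceptual_hash`, `hamming_distance` and `MATCH_THRESHOLD` defined in the earlier sketch. It is a naive, random hill-climb; Prokos et al. use far more effective, gradient-guided techniques against real perceptual hashes, so this only shows the structure of the attack, not its actual difficulty.

```python
import random


def create_tainted_image(image_a: bytes, image_b: bytes, max_steps: int = 100_000) -> bytes | None:
    """Return B', a slightly perturbed copy of B whose fingerprint matches f(A), or None.

    image_a: the non-CSAM image the attacker wants the scanner to flag (kept secret).
    image_b: the image that will pass human review as CSAM.
    """
    target = perceptual_hash(image_a)                 # f(A): the fingerprint to collide with
    candidate = bytearray(image_b)                    # start from B
    best = hamming_distance(perceptual_hash(bytes(candidate)), target)

    for _ in range(max_steps):
        if best <= MATCH_THRESHOLD:                   # f(B') now matches f(A)
            return bytes(candidate)                   # this is the B' that gets submitted
        i = random.randrange(len(candidate))
        old = candidate[i]
        candidate[i] ^= 1 << random.randrange(8)      # flip one bit, standing in for an imperceptible pixel tweak
        distance = hamming_distance(perceptual_hash(bytes(candidate)), target)
        if distance < best:
            best = distance                           # keep perturbations that move f(B') towards f(A)
        else:
            candidate[i] = old                        # undo the others
    return None                                       # no collision found within the step budget
```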

This shows that the database can be tainted with non-CSAM material by an entity that can submit entries to it. In the case of the proposed European regulation these are the Coordinating Authorities set up by each of the member states. Some of the member states (Hungary and Poland, for example) have shown certain authoritarian streaks, and should perhaps be less trusted with this new detection capability…

The above attack is theoretical. The real question, of course, is whether it would be an issue in practice.

Suppose the database is tainted with the fingerprint of image A (deemed offensive, but not CSAM) as described above. Then any person about to send this image will be notified that the scanner detected a possible case of CSAM, and that a report has been filed with the central EU reporting centre. Even though the EU Centre will quickly determine that this is not a case of actual CSAM, it could still lead to chilling effects: people may feel reluctant to use messaging apps to discuss anything sensitive, as there is a (perceived) risk of being reported to the central EU abuse centre, even if this happens only occasionally.

To really judge the impact of this attack in practice, it is important to also look at what happens when such a report is filed with the abuse centre. The report contains the image A (offensive perhaps, but not CSAM) and the fingerprint against which it matched. The abuse centre will look up the image B′ corresponding to that fingerprint (the CSAM-looking image submitted by the malicious entity) and immediately conclude that A does not look at all like B′ and is also not CSAM. One could conclude that the abuse centre could thus easily spot the malicious entry in its database after a few of these reports, all concerning the same image, and delete the corresponding fingerprint from the database. If that were the case, this avenue of attack would not be a problem in practice. This assumes, however, that the abuse centre keeps track of such false positives over time to detect maliciously uploaded fingerprints. This may or may not be the case.
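Such bookkeeping could be as simple as counting false positives per fingerprint. The sketch below is an assumption on my part: the names and the review threshold are illustrative, and not part of the proposed regulation or of any existing system.

```python
from collections import Counter


class ReportLog:
    """Track how often each database fingerprint produced a false positive."""

    def __init__(self, review_threshold: int = 3):
        self.false_positive_counts: Counter[bytes] = Counter()
        self.review_threshold = review_threshold      # assumed policy value, purely illustrative

    def record_false_positive(self, matched_fingerprint: bytes) -> bool:
        """Record a report that turned out not to be CSAM.

        Returns True once the fingerprint has accumulated enough false positives
        to be flagged for manual review as a possibly tainted database entry.
        """
        self.false_positive_counts[matched_fingerprint] += 1
        return self.false_positive_counts[matched_fingerprint] >= self.review_threshold
```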

Moreover, we cannot consider this particular attack and its potential mitigation in isolation: we also need to consider other avenues of attack, and whether the mitigation proposed above perhaps enables other forms of attack. This brings us back to the DDoS attack on client-side scanning that I wrote about earlier. That attack essentially exploits the possibility of finding fingerprint collisions in the opposite direction: it creates an innocent image whose fingerprint matches the fingerprint of known CSAM in the database.
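Structurally, this is the same collision search with the roles of the two images reversed. Under the same toy assumptions as the earlier sketches, the hypothetical `create_tainted_image` helper could be reused as follows; the two placeholder byte strings merely stand in for actual images.

```python
# Illustrative placeholders; in a real attack these would be actual image files.
innocent_image = bytes(range(256)) * 4
known_csam_image = bytes(reversed(range(256))) * 4

# Perturb the innocent image until its fingerprint collides with that of known CSAM.
flooding_image = create_tainted_image(known_csam_image, innocent_image)
```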

As discussed there, removing these fingerprints from the database is clearly not an option, as this would prevent the detection of actual CSAM that matches these fingerprints.

So here we find the EU abuse centre confronted with an image that is clearly not CSAM, but that could perhaps be considered offensive to certain governments. How does it determine whether the fingerprint being matched belongs to actual CSAM (uploaded in good faith) or to a synthesised image (uploaded maliciously)? The judgement would be highly contextual and be wrong every so often. The mere fact that a particular fingerprint is or is not the target of a DDoS attack (the number of matches could at least give that away) does not necessarily offer any resolution. And even a non-CSAM image that is sent only occasionally but that triggers the client-side detection system could just as well be an image constructed to match the fingerprint of actual CSAM, precisely to trick the abuse centre into removing that fingerprint from the database.

Also note that if the client-side scanning system were structured slightly differently, with reports of possible CSAM only being forwarded to certain authorities (e.g. reports concerning a Hungarian user being forwarded only to the Hungarian authorities), this form of function creep would not necessarily be detected. This shows that the overall design of the full client-side scanning system, including the way reports are handled, matters a lot.
