(This is a provocation for the workshop “10 Years Of Profiling The European Citizen”, June 12-13, 2018, Brussels, for the panel on “Transparency theory for data driven decision making”)
Perhaps Louis Brandeis can be considered the father of all transparency theory because of this famous quote of his:
“Publicity is justly commended as a remedy for social and industrial diseases. Sunlight is said to be the best of disinfectants; electric light the most efficient policeman.”
Indeed transparency is commonly seen as an important tool to counter the ill effects of automated, data driven, decision making.
But I cannot help but wonder: what if the sun does not shine? Wouldn’t that render transparency useless? Indeed, wouldn’t that turn transparency into a perfect cover-up, allowing organisations to hide in plain sight, pretending not to be engaged in any nefarious activities?
Below I will discuss the limits of transparency and give six different reasons why transparency by itself is not enough. First, transparency only helps if there are enough experts to verify the information provided. Second, transparency is useless if subjects do not have agency and have no meaningful way to challenge a decision. Third, transparency requirements may be subverted or sidestepped by providing information in an opaque way. Fourth, certain decision making processes are hard to explain to begin with. Fifth, a decision may be hard to challenge because scrutinising it requires domain expertise and sufficient (computational) resources. And finally, transparency may conflict with business or government interests.
These six arguments are presented in detail below, followed by a brief conclusion.
It is a common mantra in the open source community: “given enough eyeballs, all bugs are shallow”. In fact it is one of the main arguments why the source code of all software we develop should be open. By publishing the source code of the software, one allows public scrutiny of that code by other, independent, experts. Bugs (i.e. programming mistakes) will be found that would otherwise lie undetected in the source code forever. And more fundamental design decisions can be challenged, possibly leading to improved designs. These improvements do not only apply to the system under scrutiny: other systems can and will benefit from these new insights as well.
However….
The mantra assumes three things. First, that an unlimited number of eyeballs, i.e. independent experts, is available to scrutinise the growing pool of open source projects. Second, that these experts have an interest or incentive to spend some of their (valuable) time on this. And third, that every open source project is equally likely to attract the attention of a sufficient number of experts.
All three assumptions are unfounded.
The number of experts is severely limited. These experts may often be inclined to start their own open source project rather than contributing to someone else’s. And many open source projects remain unnoticed. Only a few high profile open source projects receive the eyeballs they need.
Translating this to the use of transparency to balance data driven decision making, we observe the same set of potential problems. Even if all automated decision making by all government organisations and all businesses is done in a transparent way, there will always be only a limited number of experts that can scrutinise and challenge these decisions. Which decisions will actually be challenged depends on the incentives; again we may suspect that only high profile cases attract the attention they deserve. (There is one mitigating circumstance, however: in the case of data driven decision making there is always a subject confronted with the decision, who has every incentive to challenge it.)
Let’s assume transparency works in the sense that ‘bugs’, i.e. improper data driven decisions, come to light and people want to take action. Transparency by itself does not allow them to do so, however. (Note that for exactly this reason a large class of open source software is in fact free, as in free speech. This allows anyone with the necessary technical capabilities to change the source code, fix whatever bug they find, and redistribute the solution.)
In many cases people have no agency whatsoever. Computer says no, tells you why, but no matter how you try, you will not be able to successfully challenge that decision. This is caused by several factors.
The first, most important one, is the lack of power. A single person, wronged by a decision of a large organisation, is but an itch that is easily scratched. Even if the case involves a larger, powerful group of subjects that are collectively impacted by the decision, or if the case is taken over by a powerful consumer organisation or a fancy law firm, you still need laws and regulations that create a (legal) basis on which the decision can be challenged. Finally, the process of appealing a decision may be so cumbersome that the effort of challenging the decision outweighs the benefit of doing so. Individuals easily get stuck in bureaucratic swamps. And do note that businesses as well as governments are masters at creating such swamps for their own benefit.
In other words, transparency by itself is useless without agency. You need the means (law, process) to challenge the decision, and the power (resources, capabilities) to apply them. Moreover, the process needs to be effective, i.e. the cost of executing the process should be proportional to the expected gain: for low impact decisions the process needs to be lightweight, whereas for high impact decisions the process can be more extensive.
A mirror is made of glass, but it is not transparent. A house of mirrors is a seemingly transparent maze where one easily gets lost.
The same problem plagues transparency theory: you may be transparent about the decision making process, but your description may in effect be opaque, hard to understand, hard to access/find, and/or hard to compare with others.
Many privacy policies are overly legalistic, making them unintelligible to the average user. They are often far too long as well, requiring so much reading time that no one ever reads all the privacy policies of all the sites they visit.
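A rough back-of-the-envelope calculation makes the reading-time point concrete. All numbers below are assumptions chosen purely for illustration, not measurements:

```python
# Back-of-the-envelope sketch; every number below is an assumption, not a measurement.
words_per_policy  = 2500   # assumed length of a typical privacy policy
reading_speed_wpm = 250    # assumed reading speed, in words per minute
sites_per_year    = 500    # assumed number of distinct sites visited per year

minutes_per_policy = words_per_policy / reading_speed_wpm      # 10 minutes per policy
hours_per_year = sites_per_year * minutes_per_policy / 60      # roughly 83 hours per year

print(f"{minutes_per_policy:.0f} minutes per policy, "
      f"about {hours_per_year:.0f} hours per year just reading policies")
```

Even under these modest assumptions, reading every policy one is asked to accept would take weeks of working time per year.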
Even if you honestly try to be transparent about the decision making process and honestly aim to explain a particular decision to the subject of that decision, this explanation may still be too complex to understand. The explanation may use jargon, may depend on complex rules (if rule-based at all), and may depend on so many variables that one easily loses track.
These properties may also be put to use disingenuously, to make the explanation unintelligible on purpose, while claiming to be transparent. We can observe a similar effect in the telecommunications market, where mobile phone subscription plans are complex and different operators use incomparable price categories. As a result ordinary users have a hard time figuring out which offer suits them best (and a whole market of comparison services was born, not only for the telecommunications market, but also for insurance, for example).
Whether it is easy to supply a proper explanation for every decision made depends very much on the decision making process itself. In classical rule based expert systems this is certainly possible (by disclosing the rules applied and the facts/data/propositions to which they were applied), but in modern machine learning settings this is much less clear. In many cases the machine learning system constructs an internal representation ‘explaining’ the example cases presented to it during the learning phase. But this internal representation, the model of the type of cases the algorithm is supposed to be applied to, is not necessarily close to how we, humans, understand these types of cases and the logic we apply to decide them. A complex vector of weighting factors representing a neural network does nothing to explain (at least in any human interpretation of the concept of ‘explanation’) the decision made with that neural network.
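A toy sketch illustrates the contrast. The rules, facts and weights below are entirely hypothetical, invented for the example and not taken from any real system:

```python
# Toy sketch (hypothetical rules and numbers) contrasting a rule based decision,
# which can report the rules and facts it applied, with a learned weight vector,
# which offers no such account.

# Rule based: the explanation is simply the trace of rules that fired.
facts = {"income": 18000, "has_debt": True}
rules = [
    ("income below 20000", lambda f: f["income"] < 20000),
    ("existing debt",      lambda f: f["has_debt"]),
]

fired = [name for name, test in rules if test(facts)]
decision = "reject" if fired else "accept"
print(decision, "because:", fired)
# -> reject because: ['income below 20000', 'existing debt']

# Learned model: the 'explanation' is a vector of weights.
weights = [0.73, -1.42, 0.05, 2.31]   # illustrative numbers only
# Nothing in these numbers tells the subject, in human terms, why they were rejected.
```

The rule based trace can be read back to the subject almost verbatim; the weight vector cannot, however faithfully it is disclosed.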
Challenging a decision is hard. Even when given the explanation of the decision and the data underlying the decision, it may be hard to verify that the decision is valid. This is caused by several factors.
First of all, you need the necessary domain knowledge to understand the explanation, and to spot potential problems or inconsistencies in it. For example, to understand whether a decision in, say, environmental law is correct you need to be an expert in environmental law yourself. (This partially overlaps the first argument of the difficulty of finding and incentivising experts to challenge a decision.)
Secondly, the validity of a decision depends both on the interpretation of the data on which it is based, and on the interpretation of the rules used to arrive at the decision. Moreover, the selection of the rules matters a lot: it may very well be that applying a different set of rules would have led to an entirely different set of decisions. (And all this assumes that the decision making is in fact rule based, allowing such a clear interpretation.)
Thirdly, the data set may be so large and the model used to ‘compute’ the decision so complex, that even a basic verification of the consistency of the decision itself (let alone any complex ‘what-if’ scenario analysis) cannot be done ‘by hand’ and thus requires access to sufficiently powerful data processing resources. In the worst case the problem is so complex that only the source of the decision (i.e. the organisation making it) has enough resources to perform such an analysis. This totally undermines the principle of independent oversight.
Lastly, the explanation of the decision may be valid and reasonable, but may not be the actual reason the decision was made. A common example is the (inadvertent) use of proxies (like home address or neighbourhood) for sensitive personal data categories like race or religion. Sometimes this happens on purpose, sometimes this is a mistake.
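A small sketch of how such a proxy effect can arise, using entirely synthetic data (the groups, postcodes and percentages are made up for the illustration):

```python
# Synthetic sketch, illustrative data only: the decision rule never sees the
# sensitive attribute, yet a correlated proxy reproduces much of the same split.
import random

random.seed(0)
people = []
for _ in range(10_000):
    group = random.choice(["A", "B"])   # sensitive attribute, never given to the rule
    # In this made-up town, group B happens to live mostly in postcode area 2.
    weights = [0.8, 0.2] if group == "A" else [0.2, 0.8]
    postcode = random.choices([1, 2], weights=weights)[0]
    people.append((group, postcode))

# A seemingly neutral rule that only looks at the postcode...
rejected = [group for group, postcode in people if postcode == 2]

share_b = rejected.count("B") / len(rejected)
print(f"Share of group B among rejections: {share_b:.0%}")
# -> roughly 80%, even though the rule never used the group attribute itself.
```

An explanation that truthfully says “rejected because of postcode area 2” is thus valid on its face, while the effective reason remains hidden.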
Even if the system used allows for a proper explanation of all decisions made, publishing these explanations may reveal too much information about the underlying model used to arrive at the decision. Of course, that is the whole point of requiring transparency. However, certain organisations may wish to keep their decision making logic secret, and may have a legitimate interest in doing so. For example, law enforcement or intelligence agencies have every reason not to reveal the models they use to identify potential terrorists (for fear that terrorists would change their modus operandi to evade detection). Similar arguments apply to fraud detection algorithms, for example. Businesses, like credit scoring agencies, may not want to reveal their models, as these algorithms, these models, may be the only true asset, the crown jewels, of the company.
We have discussed six arguments to show that transparency by itself is insufficient to counterbalance the ill effects of automated, data driven, decision making. For transparency to work, agency is a prerequisite. We need suitably incentivised experts that can help challenge decisions. Proper enforcement of transparency requirements is necessary, to ensure that the information provided is accessible and intelligible. Using hard to explain decision making processes should be made illegal. And independent verification platforms that make it possible to verify and analyse decisions based on complex models and data sets must be made available. Finally, where transparency conflicts with other legitimate interests, a clear set of principles is necessary to decide when an explanation is not required.
Without sun, transparency is the perfect cover, hiding in plain sight what everyone fails to see.