The other day a few students asked me whether the most secure password actually was a passphrase consisting of at least three words. They had read this here (which is more or less a Dutch translation of this post). Passphrases are also recommended by Bits of Freedom. The source of the students was a bit dodgy however (claiming that the passphrase "This is fun" is secure forever!), so I decided to investigate.

In our analysis we assume the following (based on the extensive blog entry by Jesper Johansson on this matter, but extended and adjusted based on other sources).

- Password characters are typically selected from a set of 76 symbols: 26 lowercase letters, 26 uppercase letters, 10 digits and 14 other symbols (!@#$%^&*()-_+=). Not all of them are as frequently used however. Many passwords are restricted to a subset of the 32 most common symbols. A truly random password can use all printable ASCII characters, of which there are 95.
- English language has over 600.000 words. The average passive vocabulary (receptive knowledge) of a native speaker ranges from 10.000 to 20.000 words.
- The number of words actually used (productive knowledge) by a native speaker is much lower than that. Jesper Johansson estimates this to be 300, but I think this is highly conservative. One source claims a random copy of the Sun (a popular UK newspaper) contains 8000 distinct words. Basic word lists on the other hand contain 850-3000 words.
- Frequency of word use is highly unbalanced. The first 25 most common words make up about one-third of all printed material in English, and the first 100 make up about one-half of all written material. This may bias words used in passphrases.

We distinguish the following classes of passwords and passphrases.

**mutilated word passwords**, where a single uncommon word is taken as a password, after which certain characters are replaced with similar looking ones, and some non-letter symbols are appended (to comply with the password policy).**random passwords**where each character is selected at random from all 95 available symbols.**common word passphrases**, where each word is taken from the 1000 most basic words.**uncommon word passphrases**, where each word is taken from an extended list of 10000 words. (The motivation here is that for a strong passphrase, people should pick words they know, not necessarily ones that they would ever use in actual conversation.)

The case of a single word used as a password (like "password", or a personal name) is covered by the common word passphrase of length one.

A common means to express password strength is to compute the *entropy* of the class to which it belongs. Without getting to be too mathematical, the entropy of a set of words loosely corresponds to the number bits needed to uniquely represent each word in the set. A set of 16 words that are all equally likely to occur have an entropy of 4 bits.The expected number of tries needed to find the one that matches an entry in a password file is given by the *guessing entropy* of the set. If all these words are equally likely, to find the one that matches will on average require you to try half of them. In other words in this case the guessing entropy is [latex]2^{\text{entropy}-1}[/latex] tries.

We make the (simplifying) assumption that, within each class, each password or passphrase is equally likely. In general this is not the case, so in general the attacker is expected to discover the password or passphrase sooner than suggested by the entropy alone. In practice, the attacker will launch several attacks in parallel, using different assumptions about the construction of the password used. For example, the attacker will try a list of common passwords (like "password") in one of these threads. By distinguishing the classes above, we crudely approximate this. In any case, without any data to approximate the distribution of passwords or passphrases in a single class, this is the best we can do.

Given the above assumptions, we can compute the entropy [latex]H[/latex] of entries in each class. In the following, [latex]n[/latex] is the password length (in characters) or passphrase length (in words). In other words, [latex]n[/latex] is the number of units a user needs to memorise.

**mutilated word passwords**: [latex]H() = \log(10.000) + 4 + 2*\log(42)=28[/latex]. Here 4 bits of entropy are added for substitutions in the uncommon word used as password, and 2 times [latex]\log(42)[/latex] is added for adding two non-letter characters at the end of the password.**random passwords**: [latex]H(n)=n*\log(95)[/latex], where 95 is the number of printable ASCII characters.**common word passphrases**: [latex]H(n)=n*\log(1000)[/latex], where 1000 is the number of common words.**uncommon word passphrases**: [latex]H(n)=n*\log(10.000)[/latex], where 10.000 is the number of uncommon words.

The following table presents the entropy for entries of different length in each class.

length |
||||||||||

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |

mutilated word passwords |
28 | |||||||||

random passwords |
6 | 11 | 17 | 23 | 29 | 34 | 40 | 46 | 51 | 57 |

common word passphrases |
10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |

uncommon word passphrases |
13 | 27 | 40 | 53 | 66 | 80 | 93 | 106 | 120 | 133 |

The amount of time an attacker needs to verify a password or passphrase guess depends on several factors.

For an online attack, a typical estimate for the number of guesses an attacker can try is a 1000 per second. However, we will not consider this case here as online services may easily make such attacks impractical by using increasingly more time to respond to a login for an account that has many failed login attempts. Note that blocking the account after a few tries is not a good idea, as this can be abused for a denial-of-service attack.

In an offline attack, where the attackers has a copy of the password file from the server, it heavily depends on the resources (i.e. money) of the attacker and the hashfunction used to map the password onto the digest stored in the password file. It is for this reason that the hashfunction used for passwords should be special and take a significant time to compute. An attacker can easily try a million of guesses a second, and even more with dedicated hardware. Against such an attacker, a password or passphrase in a class of 46 bits of entropy takes on average a year to crack, and a 28 bit password or passphrase only a day.

Note that the Ecrypt II recommendations for cryptographic key lenght specify 80 bits as the minimum entropy for the smallest general purpose security level. This corresponds to a passphrase consisting of 6 uncommon words.

This clearly shows that the passphrase "This is fun" is really not that secure, contrary to what is claimed here. It is important to stress why this is the case: an adversary will first try the short list of common words. This is the same reason why a password like "jskerv" is risky while "J4fS<2" is more secure. As long as many people choose letter only passwords, a smart attacker first tries only such passwords before trying more complex ones.

An empirical study from Cambridge University found that passphrases are easier to remember than strong random passwords. Moreover, mnemomic passwords derived from such passphrases (by using the first letter of each word as a password character) are just as secure as those random passwords (of the same length).

However, the above discussion shows that using a full passphrase (instead of a mnemomic password derived from it) is as secure as a random password of twice the length!

So, xkcd is right: passphrases are easier to remember and harder to crack, while passwords are hard to remember and still easy to crack. (Although the example password he uses is not really random.)

In case you spot any errors on this page, please
notify me!

Or, leave a comment.

[…] te kiezen. De vraag is dan natuurlijk: hoe doe je dat? Hierbij een aantal tips gebaseerd op een analyse die ik eerder […]