This week I attended an interesting workshop on online tracking in Brussels. Up until recently, online tracking of users was restricted to single devices. But users increasingly use mobile devices (sometimes owning both a smartphone and a tablet), and use apps instead of browsers to access the internet. This makes it harder for websites to recognise returning visitors. Cross device tracking aims to solve this problem. Its goal is to determine that a cellphone, work computer, home computer or tablet are linked and belong to the same person. I tried to think of ways one could achieve this in practice, and this is what I came up with.
In principle there are three approaches to linking devices to each other. The most obvious approach is to try to detect that the same user is using them. The easiest case is when users reveal their online identity. Another, harder, method involves analysing the behaviour of a user, and trying to recognise a user based on previous behavioural patterns. Secondly, one can look at the device itself. Users have preferences and personalise their devices, often in unique ways. Finally, one can consider contextual information, e.g. whether devices are used on the same (home) network, or at the same physical location.
Which method is most effective in linking devices depends on the party that tries to do the linking, and how it tries to do it. A simple website sees very little traffic (only the interactions of the user with that specific site) and may not obtain enough information this way. A website where users have accounts, on the other hand, is in a much better position. A large network operator (or a state-sponsored surveillance agency…) sees a lot of traffic, and if that traffic is unencrypted the network operator has access to a lot of information it can use. Websites may collaborate, and enlist the help of third parties (as in cross-domain tracking using third-party cookies). Totally passive mechanisms, which have no access to the device itself but merely rely on eavesdropping on communication patterns, are less accurate than active mechanisms that can interrogate or access the device itself.
Now let’s study the different approaches in a bit more detail.
The user: account or behaviour based
If you use several devices to access an online account, for example Facebook or Twitter, this makes it trivial for the service provider to determine that these devices belong to the same person. By using cookies and/or mobile device identifiers, these services can track you for ever after, across all your devices, even if you never sign in again. Similarly, if users shop online and use the same credit card on multiple devices, this can be used to link them.
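To make the account-based case concrete, here is a minimal sketch. It assumes the tracker keeps a log of (device, identifier) pairs, where an identifier is either an account name or a hashed credit card number. The log format, device names and function names are all hypothetical. Any two devices that share an identifier are linked.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical tracker log: (device_id, identifier) pairs. An identifier is an
# account name or a hashed credit card number observed on that device.
observations = [
    ("laptop-1", "account:alice"),
    ("phone-7", "account:alice"),
    ("phone-7", "card:9f2c"),
    ("tablet-3", "card:9f2c"),
    ("desktop-9", "account:bob"),
]


def devices_by_identifier(obs):
    """Map each identifier to the set of devices on which it was observed."""
    groups = defaultdict(set)
    for device, identifier in obs:
        groups[identifier].add(device)
    return groups


def linked_pairs(obs):
    """Return every pair of devices that share at least one identifier."""
    pairs = set()
    for devices in devices_by_identifier(obs).values():
        pairs.update(combinations(sorted(devices), 2))
    return pairs


print(sorted(linked_pairs(observations)))
# [('laptop-1', 'phone-7'), ('phone-7', 'tablet-3')]
```

Here laptop-1 and tablet-3 are not linked directly, even though both are linked to phone-7.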
Alternatively, users can be recognised by the way they use their devices. Let’s consider a few examples of behaviour that may allow recognition of a user, listed in order of decreasing accuracy.
Users have very specific browsing patterns. Everyone has their own list of favourite websites (recall that your top five favourite movies are quite identifying). In fact, your browsing history becomes unique after only a few visited websites as well. And people read their favourites in a fixed order, much like other daily routines such as making breakfast or walking the dog. Such routines typically take place at a specific time of the day, which makes time another distinguishing feature. The time spent on a particular page or website is significant too: reading speed varies among people, and the level of interest in a website’s topic determines how long people linger on a site.
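One simple way to compare browsing patterns is to treat each device’s history as a set of visited sites and compute the Jaccard similarity between those sets. The sketch assumes the tracker already has such per-device histories. The domains, device names and the choice of Jaccard similarity are illustrative assumptions. A real system would probably also use the order and timing of visits described above.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical browsing histories (sets of visited domains) per device.
histories = {
    "work-pc": {"news.example", "weather.example", "football.example", "recipes.example"},
    "phone": {"news.example", "weather.example", "football.example", "transit.example"},
    "tablet-2": {"shop.example", "video.example"},
}


def best_match(device, histories):
    """Return the other device whose history is most similar, with its score."""
    scores = {
        other: jaccard(histories[device], history)
        for other, history in histories.items()
        if other != device
    }
    winner = max(scores, key=scores.get)
    return winner, scores[winner]


print(best_match("work-pc", histories))  # ('phone', 0.6)
```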
Similar to browsing patterns, people access the apps on their mobile device in a specific, routine way. Such access patterns may be visible on the network if the apps access the network when opened (which most apps do). As with browsing, the time spent within an app, the time of day an app is used, and even the number of times an app is used each day may profile a user. Many websites have apps to allow easy access to their content on a mobile device; examples are public transport apps, news sites, etc.
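The time-of-day part of such a profile can be sketched as a 24-bin histogram of app launches per hour, compared with cosine similarity. This is only one possible choice of feature and metric, assumed here for illustration. The launch hours below are made up.

```python
import math


def hourly_profile(launch_hours):
    """Count app launches per hour of day (0-23) into a 24-bin histogram."""
    profile = [0] * 24
    for hour in launch_hours:
        profile[hour % 24] += 1
    return profile


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical hours at which a news app was opened on three devices.
phone = hourly_profile([7, 7, 8, 12, 22, 22])
tablet = hourly_profile([7, 8, 8, 22, 22, 23])
night_owl = hourly_profile([1, 2, 2, 3, 3, 4])

print(round(cosine(phone, tablet), 3), round(cosine(phone, night_owl), 3))  # 0.8 0.0
```

Cosine similarity ignores how often the app is used overall and only compares the shape of the daily routine. The number of launches per day would be a separate feature.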
If you read your email on both your PC and your mobile devices, the servers contacted by the mail program to fetch your mail already provide a good indication of your identity. This is especially true if you have several different email accounts. And this holds even if you fetch your email securely: the tracker does not need to know your email address.
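A crude version of this can be sketched as follows. The sketch assumes an eavesdropper can see which mail server hostnames each device contacts, for example from DNS lookups or from the server name sent in the TLS handshake. It reduces that set to an order-independent signature and groups devices with identical signatures. The hostnames and device names are hypothetical. Exact matching is a simplification: a real tracker would more likely score partial overlap and combine it with other signals.

```python
import hashlib
from collections import defaultdict


def server_signature(hostnames):
    """Order-independent signature of the set of mail servers a device contacts."""
    canonical = "\n".join(sorted({h.lower() for h in hostnames}))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical mail hosts observed per device.
observed = {
    "home-pc": ["imap.mail.example", "imap.work.example", "pop.club.example"],
    "phone": ["IMAP.work.example", "pop.club.example", "imap.mail.example"],
    "laptop": ["imap.mail.example"],
}


def group_by_signature(observed):
    """Return groups of two or more devices that contact exactly the same servers."""
    groups = defaultdict(list)
    for device, hosts in observed.items():
        groups[server_signature(hosts)].append(device)
    return [sorted(devices) for devices in groups.values() if len(devices) > 1]


print(group_by_signature(observed))  # [['home-pc', 'phone']]
```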
All approaches above assume that the tracker has access to this information. Only simple behavioural patterns that involve just the tracking website itself are easy to implement. More complex patterns involving several websites require collaboration across these websites, or the eavesdropping capacity of a network operator. Trying to measure this on the device itself may be hard, given that apps and browser pages are increasingly sandboxed.
Measuring device movement, and access to other sensors on the device, can also be used to recognise a person. This depends on the sensitivity of the sensors, and how well they correspond to actual movement of your body. With more and more sensors present on devices, this method becomes more accurate.
Most of these methods require statistical analysis, and typically link devices to users with only a limited degree of accuracy, although I expect that browsing patterns identify individual users quite accurately.
Finally, one could also use biometrics (face recognition or fingerprints, both actually used to unlock phones these days) to recognise a user across several devices.
The device: Personalisation
If a tracker has access to the device itself (for example by embedding tracking scripts in the web pages you visit, or by being embedded in an app that requests permission to access information stored on the device), it can collect information about how the device is configured and personalised to the taste of the user. This includes the plugins installed in the browser, or the apps installed on the device (people tend to use the same app, for example a note-taking app like Simplenote, on all of their mobile devices). System preferences, account settings and your alarm clock settings are other examples.
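Not every shared app is equally telling: almost everyone has a browser, but few people share a niche note-taking app. The sketch below assumes the tracker knows what fraction of all devices has each app installed. It weights each app by -log2 of that fraction, so rare apps count more, and then computes a weighted overlap. The popularity figures, app names and the default for unknown apps are hypothetical.

```python
import math

# Hypothetical share of all devices that have each app installed.
POPULARITY = {
    "browser": 0.95,
    "maps": 0.80,
    "simplenote": 0.02,
    "birdwatch-log": 0.001,
}
UNKNOWN_APP_SHARE = 0.001  # assumption: apps missing from the table are rare


def weight(app):
    """Evidence carried by a shared app: -log2 of its share, so rarer means heavier."""
    return -math.log2(POPULARITY.get(app, UNKNOWN_APP_SHARE))


def weighted_overlap(apps_a, apps_b):
    """Weighted Jaccard-style score: weight of shared apps / weight of all apps."""
    a, b = set(apps_a), set(apps_b)
    total = sum(weight(app) for app in a | b)
    if total == 0:
        return 0.0
    return sum(weight(app) for app in a & b) / total


phone = {"browser", "maps", "simplenote", "birdwatch-log"}
tablet = {"browser", "simplenote", "birdwatch-log"}
stranger = {"browser", "maps"}

print(round(weighted_overlap(phone, tablet), 2), round(weighted_overlap(phone, stranger), 2))
```

The phone and the stranger share two apps, yet score far lower than the phone and the tablet, because the apps they share are ones almost everybody has.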
I believe there are some limits to this approach, however. Mobile devices are quite different from standard PCs and laptops. A typical user does not install a plugin in their mobile browser (if that is even possible), for example. Configuration options are quite different between the two classes of devices as well, and may even differ among mobile devices running different operating systems. Perhaps the only exception occurs when a user uses different devices running variants of an operating system from a single vendor (e.g. Microsoft Windows, or Apple’s operating systems). In that case the chances of linking devices this way may increase.
(Note: this is related to the concept of device fingerprinting. The EFF developed a tool called Panopticlick that you can use to determine how unique your browser is; in other words, how unique your fingerprint is. Note that the outcome is an optimistic estimate: your fingerprint is actually much more precise than the score suggests, as it is typically combined with other information (your IP address, for example) to make it unique. I have been told that passive (non-script-based) techniques alone are sufficient to obtain a unique fingerprint. But browser fingerprinting by itself does not appear to be very helpful for cross device linking.)
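Fingerprint uniqueness is usually expressed in bits of identifying information. A value shared by 1 in n users carries log2(n) bits. If attributes are independent, their bits add up, which is why combining a browser fingerprint with, say, an IP address range makes it so much sharper. About 33 bits suffice to single out one person among 2^33 (roughly 8.6 billion). The frequencies below are made up, and independence is a simplifying assumption: real attributes are often correlated, which lowers the true total.

```python
import math


def surprisal_bits(one_in_n):
    """Identifying information, in bits, of a value shared by 1 in `one_in_n` users."""
    return math.log2(one_in_n)


def combined_bits(*attribute_sets):
    """Total bits over several attribute sets, ASSUMING the attributes are independent."""
    return sum(surprisal_bits(n) for attrs in attribute_sets for n in attrs.values())


# Hypothetical frequencies: "1 in n users shares this value".
browser_attributes = {"user_agent": 1500, "screen_size": 20, "timezone": 30}
network_attributes = {"ip_prefix": 4000}

browser_only = combined_bits(browser_attributes)
with_network = combined_bits(browser_attributes, network_attributes)

for label, bits in [("browser only", browser_only), ("browser + IP", with_network)]:
    print(f"{label}: {bits:.1f} bits, about 1 in {2 ** bits:,.0f} users")
```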
The context: location and network
A very strong technique to link devices is to use information about the context in which a device is used. In particular, information about the location in which the device is used is very useful. If two devices are often seen at the same location (at home, at work, etc.) they probably belong together. If a device appears to be at a particular location every night, one can assume the location is the address of the owner. Even devices that do not themselves provide their location directly (like PCs) can be linked because the network they connect to may be known to be at a particular location.
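Both ideas in the paragraph above, inferring a home location from night-time sightings and counting how often two devices are seen in the same place, can be sketched in a few lines. The sketch assumes the tracker has coarse location sightings per device (for example a grid cell or a known network location). The data, the cell names and the definition of "night" are hypothetical.

```python
from collections import Counter

# Hypothetical sightings: (device_id, day, hour, coarse location cell).
SIGHTINGS = [
    ("phone", 1, 2, "cell-A"), ("phone", 1, 10, "cell-W"),
    ("phone", 2, 3, "cell-A"), ("phone", 2, 11, "cell-W"),
    ("tablet", 1, 23, "cell-A"), ("tablet", 2, 1, "cell-A"),
    ("laptop", 1, 10, "cell-W"), ("laptop", 2, 14, "cell-W"),
    ("coworker", 1, 2, "cell-Z"), ("coworker", 2, 10, "cell-W"),
]

# Assumption: "night" means 22:00 to 05:59.
NIGHT_HOURS = set(range(0, 6)) | {22, 23}


def home_cell(device, sightings):
    """Most frequent location of a device during night hours, or None if unseen."""
    nights = Counter(cell for dev, _, hour, cell in sightings
                     if dev == device and hour in NIGHT_HOURS)
    return nights.most_common(1)[0][0] if nights else None


def co_location_count(dev_a, dev_b, sightings):
    """Number of (day, cell) combinations in which both devices were seen."""
    def places(device):
        return {(day, cell) for dev, day, _, cell in sightings if dev == device}
    return len(places(dev_a) & places(dev_b))


for other in ["tablet", "laptop", "coworker"]:
    print(other, co_location_count("phone", other, SIGHTINGS),
          home_cell(other, SIGHTINGS) == home_cell("phone", SIGHTINGS))
```

The coworker shares one place with the phone (the office), which shows why a single shared location is weak evidence on its own. A shared night-time location, as for the tablet, is much stronger.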
Alternatively, devices that appear to be connected to the same local network (which can be determined by looking at the IP addresses the devices use) are probably related. If that network is a home network, chances are quite high that all devices belong to the same person (or to a close relative).
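From a website’s point of view, devices behind the same home router typically share a public IP address. Here is a minimal sketch that groups devices repeatedly seen behind the same address. The thresholds are assumptions meant to filter out one-off visits and large shared networks such as public Wi-Fi or carrier-grade NAT. The log is hypothetical and uses documentation address ranges.

```python
import ipaddress
from collections import Counter, defaultdict

# Hypothetical server-side log: (device_id, source IP address of a request).
# The addresses come from the documentation ranges 203.0.113.0/24 and 198.51.100.0/24.
REQUESTS = [
    ("phone", "203.0.113.7"), ("laptop", "203.0.113.7"), ("tablet", "203.0.113.7"),
    ("phone", "203.0.113.7"), ("laptop", "203.0.113.7"), ("tablet", "203.0.113.7"),
    ("phone", "198.51.100.20"), ("stranger", "198.51.100.20"),  # one visit to a cafe
]


def household_candidates(requests, min_sightings=2, max_devices=6):
    """Group devices repeatedly seen behind the same public IP address.

    A device only counts for an address after `min_sightings` requests from it,
    and addresses with more than `max_devices` such devices (public Wi-Fi,
    carrier-grade NAT) are ignored. Both thresholds are assumptions.
    """
    counts = Counter((ipaddress.ip_address(ip), device) for device, ip in requests)
    groups = defaultdict(set)
    for (ip, device), n in counts.items():
        if n >= min_sightings:
            groups[ip].add(device)
    return {str(ip): devices for ip, devices in groups.items()
            if 2 <= len(devices) <= max_devices}


print(household_candidates(REQUESTS))
```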
If you switch devices, for example because you bought a new smartphone, then the locations or networks you visit with your new phone will probably be the same. This allows linking the profile associated with your old phone with your new one.
Linking is transitive. If I can link device A to device B, and device B to device C, then devices A and C are linked too, with high probability.
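Transitive linking maps naturally onto a union-find (disjoint-set) structure: every pairwise link merges two groups, and each resulting group is one presumed person. The sketch below is a standard union-find with path compression and union by size. The class and device names are hypothetical. Because each link is uncertain, long chains become less reliable, so in practice one would probably only record links above some confidence threshold.

```python
class DeviceLinks:
    """Union-find (disjoint-set) over device ids: linking is made transitive."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, device):
        """Return the representative of the group containing `device`."""
        self.parent.setdefault(device, device)
        self.size.setdefault(device, 1)
        root = device
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while device != root:
            next_device = self.parent[device]
            self.parent[device] = root
            device = next_device
        return root

    def link(self, a, b):
        """Record that devices a and b belong to the same person."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        if self.size[root_a] < self.size[root_b]:  # union by size
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]

    def groups(self):
        """All groups of linked devices, as sorted lists."""
        result = {}
        for device in list(self.parent):
            result.setdefault(self.find(device), []).append(device)
        return sorted(sorted(members) for members in result.values())


links = DeviceLinks()
for a, b in [("A", "B"), ("B", "C"), ("D", "E")]:
    links.link(a, b)
print(links.groups())  # [['A', 'B', 'C'], ['D', 'E']]
```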
Note that if you can recognise the same user across several devices by looking at their behaviour, then you can also distinguish different users sharing the same device. This allows an advertising company to serve you different ads than the other users of the device.
As I wrote in the introduction, I tried to come up with as many ways as I could think of that might be useful to implement cross device tracking of users. If I made mistakes, or if you can think of other approaches, I’d love to hear about them in the comments.
(Update 13-02-2014: Added the option to recognise users based on their browsing history, or their credit card. Thanks to Claude Castelluccia for pointing this out.)