
We owe today's understanding of healthcare to science and, of course, to the technology that implements it.

However, healthcare information is intimate by definition: it not only defines a person, but can also be turned into a very powerful weapon in the wrong hands.

Lately, many tech companies that focus on consumer technology and general services for users have also discovered that health information, and health-enhancing devices, can be a profitable business to run.

Tech companies in healthcare

Several companies have lately been shipping devices to monitor your health. These range from innocuous SpO2 sensors that you simply clip onto your finger, to sophisticated Bluetooth LE-enabled weighing scales and smartwatches capable of taking a mobile EKG.

In particular, you may already know about Google Health, a division where they hope to save your life one day. Besides Google, Alphabet Inc.'s corporate structure lists Verily and Calico among its "Other Bets" for health-related endeavours.

Google's strong AI divisions, Google Brain and DeepMind, have been collaborating with healthcare providers to produce research and tools for better healthcare. For that to happen, however, Google has needed to obtain confidential patient data from those providers. In particular, it is known that UK National Health Service trusts have agreements with Google to use its technology. Even if the NHS has successfully prevented Google from using such data for other purposes (a boundary that is itself very hard to define), the intelligence gained from that research will be crucial for other agreements that follow, which may lower the standard for data protection, especially in jurisdictions where data protection is not required by law.

Image by Sofia Prósper (CC-BY)

We know that in the past, even where such regulations were in effect, Google and other tech companies have failed to comply.

Fitbit is also part of the Google corporation (since November 2019). This gives Google the ability to engage a lot of "health data customers" and mine their health information while they busy themselves with EKGs from their expensive OLED-equipped wristbands, just as Apple and Samsung do.

The fact that Google (or any tech company whose business essentially requires it to "know you") holds a huge amount of information about you cannot be disputed. It probably knows a lot about your health already, too:

  • Have you ever searched online for condoms or other contraceptive methods? Then they know you are sexually active.
  • Have you recently searched a lot about how to tell STIs apart? Then you might have had unprotected sex with someone whose sexual history you don't trust.

And these are just the obvious examples.

Complex data mining techniques could exploit much more intricate patterns in our psychological behaviour, patterns we ourselves may be entirely unaware of.

Mixing-in healthcare information

There is a kind of data that is much more sensitive. Do you have cancer? A genetic predisposition to coronary disease? Have you ever had an abortion? Do you suffer from a psychological disorder? Have you been diagnosed with an early stage of a terminal condition?

You may think that none of these apply to you and that, as such, you have nothing to hide. But consider for a moment that one did: then you would have something to hide. And if you would want to hide the answer when it is true, you should also want to hide it when it is false, because otherwise an observer could simply keep asking until you stopped feeling comfortable answering (and then they would know!).

Imagine having cancer and promptly finding out via a targeted ad that your specific type of cancer can now be cured. It may not be true, but you will click that link.

Health information is especially sensitive (so much so that the European GDPR mentions it explicitly in Article 9), and by using private information to train neural networks we embed that personal information into the model's weights. Re-identification of data embedded through training sets has not yet been studied thoroughly enough to assert conclusively that it cannot be recovered (and in some neural network constructions, such as autoencoders built from compressor-decompressor units, the idea is precisely to exploit that possibility).


Google has recently published the COVID-19 Community Mobility Reports, a set of reports built with differential privacy from Google users' personal data, using the same data paths and techniques as other Maps features, such as traffic estimation; essentially, data from users who consented to location tracking. Keep in mind that this is only data Google already had, now published in aggregate form, in a responsible way that does not break its privacy policy while still shedding some light (where the number of tracked Google users is representative enough of the population).
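The core idea behind differential privacy is simple: before releasing an aggregate statistic, add random noise calibrated so that no single person's presence or absence in the data can be inferred from the output. Here is a minimal sketch of the classic Laplace mechanism for a counting query; the function names are mine, and Google's real pipeline is of course far more elaborate than this:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a Laplace(0, scale) distribution via inverse CDF."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon is enough to mask any individual's contribution.
    """
    return true_count + laplace_noise(1.0 / epsilon)

# Example: publish how many people visited a park today, without the
# published number betraying whether any particular person was there.
random.seed(42)
noisy_visits = dp_count(1280, epsilon=0.5)
```

Individual releases are noisy, but over a large population the aggregate remains informative, which is exactly the trade-off the Mobility Reports rely on: useful trends, no individual trajectories.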

And, in my opinion, it is great that they do this, and that they have found ways, within their own privacy policy, to turn their mass collection of information into something useful to the general public without harming, as far as we know today, any specific person.

However, if Google were handed even pseudonymised health data, then given the database of personal information it holds about most of us, re-identification would be well within its compute power, so long as the data is not aggregated and is needed, say, for individual treatment advice.

This brings us to the current COVID-19 pandemic and the various apps built to track its spread. Many private corporations want to help, and that is great: the whole of civil society should be helping to tackle this pandemic globally.

However, great care must be taken when handling personal health information. In particular, companies that process personal data should use their publicly known infrastructure: in Google's case, for example, its own Google Cloud Platform rather than the internal Borg, so that all code, frontend and backend, can be open-sourced. This would increase public confidence in the solution and allow outside experts to audit and assess its adequacy.

A decentralized solution that does not hoard data in central datacenters, public or private, and that keeps users sovereign over records of their possible encounters with COVID-19-positive patients, would be most beneficial. First, because under the GDPR's principle of data minimization there is no need to keep a central ledger of who came into contact with whom; but more importantly, because enabling high-detail tracing of such information now can become a huge privacy liability in the future. Singapore is currently finishing the reference documentation for https://bluetrace.io/, the tool it has deployed, which only requires citizens to keep Bluetooth active.

The BlueTrace documentation is not yet available, but from the reference documents given to the Singaporean public, the system holds your telephone number in a government database so that contact tracers can call you if they need to trace back to you, along with some other form of ID, which we suspect will be a public key. Every time you come near other people, your phones send each other an attestation that you have been together. If any of you becomes ill, your app knows which other public keys you were in contact with (and possibly at what time). When you hand that information to the tracing personnel, they can trace the public keys back to their telephone numbers, contact those people to work the trace backwards, and possibly advise them to seek medical attention or self-isolate.
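Since the actual specification was not yet published at the time of writing, the flow described above can only be sketched from those public reference documents. Here is a toy model of that flow; every class, method, and field name is hypothetical, and the real protocol uses rotating encrypted tokens rather than a single static ID:

```python
import secrets

class HealthAuthority:
    """Government-held registry mapping opaque IDs back to phone numbers.

    Hypothetical sketch, not the real BlueTrace specification.
    """
    def __init__(self) -> None:
        self._registry: dict[str, str] = {}

    def enroll(self, phone_number: str) -> str:
        token = secrets.token_hex(16)  # opaque ID handed to the citizen's app
        self._registry[token] = phone_number
        return token

    def trace(self, seen_tokens: list[str]) -> list[str]:
        """Resolve the IDs a diagnosed user collected into phone numbers."""
        return [self._registry[t] for t in seen_tokens if t in self._registry]

class Phone:
    """A citizen's device: holds its own ID plus the IDs it has encountered."""
    def __init__(self, token: str) -> None:
        self.token = token
        self.encounters: list[str] = []

    def exchange(self, other: "Phone") -> None:
        # Over Bluetooth LE, each device records the other's ID: the
        # "attestation that you've been together".
        self.encounters.append(other.token)
        other.encounters.append(self.token)
```

Note the key privacy property even in this toy version: phones only ever see opaque tokens, and only the health authority, which already knows your phone number from enrollment, can map encounters back to people.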


Even though we don't usually share the same information with everybody (friends, physicians, psychologists, employers, law enforcement), we do not seem to apply the same expectations of privacy when talking to non-human services.

Collectively, our expectation of privacy when talking to other people's machines seems to be as relaxed as if those machines were intimates of ours, at least in proportion to the emotional weight of the activity the device performs.

A smartwatch or a phone can seemingly track us everywhere (our health and effort through the day, our sleep patterns through the night), and individually the benefit of that tracking seems to outweigh the cost; but by the time it no longer does, it will be too late for regret.