27 April 2016 marked a turning point for many countries and for a lot of businesses worldwide: EU Regulation 2016/679 (better known as the General Data Protection Regulation, abbreviated GDPR) was adopted by the European Parliament and the Council [1]. Readers from countries outside the EU in particular might ask: “Why should this be of interest to me?”
The point is: if your business deals with data of EU citizens (e.g. because you run an online shop selling goods in the EU, or operate a social network platform whose users are EU citizens), you fall under GDPR. This is regulated in Article 3, paragraph 2 of the regulation: “This Regulation applies to the processing of personal data of data subjects residing in the Union by a controller not established in the Union, where the processing activities are related to:
(a) the offering of goods or services to such data subjects in the Union; or
(b) the monitoring of their behaviour.”
If you are reading these lines, you are probably realizing (if you had not already) that your business is most likely affected by GDPR as well. The purpose of this blog post, however, is not to explain the basics of GDPR but to discuss one particular, interesting aspect of the regulation: pseudonymisation, and how it might support you on your way to GDPR compliance.
However, as a general disclaimer: we at ERNW are not lawyers and are not permitted to give legal advice. If you want to be on the safe side, please consult the attorney of your choice. What we present here is food for thought from a technical, and in particular an information security auditor’s, point of view.
In Article 6 (“Lawfulness of processing”), paragraph 4, point (e), GDPR names, among the factors a controller has to take into account when processing personal data for a purpose other than the one it was collected for, “[…] the existence of appropriate safeguards, which may include encryption or pseudonymisation”. This explicitly recognizes pseudonymisation as an appropriate control, which is noteworthy as the regulation rarely names concrete controls. In fact, pseudonymisation is considered quite a strong safeguard, which is underlined by Article 25, where it is again named among the “[…] appropriate technical and organisational measures, such as pseudonymisation, which are designed to implement data-protection principles […]”. That is great news, right? We apply pseudonymisation everywhere and should be good to go.
The only question is: what exactly does pseudonymisation mean in the context of the regulation? According to [2], pseudonymisation is a technique “[…] which means the use of false names, pseudonyms.” More precisely, pseudonymisation is the removal of the true identities of individuals or organisations and their replacement by pseudonyms. Unlike simply removing the identifiers of an identity (e.g. name, address, social security number etc.), “[…] pseudonymisation still enables the linkage of data associated to the pseudo-identities (pseudo-IDs)” [2]. To put it simply: replace the name and address (and all other uniquely identifying data of a data subject) in a data set with a random ID that only you can link back to the data subject, and you have achieved pseudonymity within your system. You would also have implemented appropriate data-protection safeguards according to GDPR, if and only if the non-pseudonymised data is not accessible to unauthorized entities.
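To make the “replace identifying data with a random ID” idea more concrete, here is a minimal Python sketch, assuming a simple dictionary-based record format. The field names (`name`, `address`, `email`), the `PseudonymVault` class and the `pseudonymise` function are hypothetical illustrations, not taken from GDPR or from [2]. The vault holding the mapping between pseudo-IDs and identities is meant to be stored separately from the pseudonymised data and to be accessible to authorized parties only; reusing the same pseudo-ID for the same identity preserves the linkage that [2] describes.

```python
import secrets
from dataclasses import dataclass, field

# Hypothetical set of directly identifying fields; a real data set needs its own review.
IDENTIFYING_FIELDS = ("name", "address", "email")


@dataclass
class PseudonymVault:
    """Mapping between pseudo-IDs and identities.

    Intended to be stored separately from the pseudonymised data and
    to be readable by authorized parties only.
    """
    _by_pseudo_id: dict = field(default_factory=dict)
    _by_identity: dict = field(default_factory=dict)

    def pseudo_id_for(self, identity: tuple) -> str:
        # Reuse an existing pseudo-ID so records of one data subject stay linkable.
        if identity not in self._by_identity:
            pseudo_id = secrets.token_hex(16)  # 128 random bits, unrelated to the identity
            self._by_identity[identity] = pseudo_id
            self._by_pseudo_id[pseudo_id] = identity
        return self._by_identity[identity]

    def reidentify(self, pseudo_id: str) -> dict:
        # Only possible for whoever has access to the vault.
        return dict(zip(IDENTIFYING_FIELDS, self._by_pseudo_id[pseudo_id]))


def pseudonymise(record: dict, vault: PseudonymVault) -> dict:
    """Drop the identifying fields and replace them with a pseudo-ID."""
    identity = tuple(record[f] for f in IDENTIFYING_FIELDS)
    out = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    out["pseudo_id"] = vault.pseudo_id_for(identity)
    return out


# Usage with made-up sample orders.
vault = PseudonymVault()
orders = [
    {"name": "Alice Example", "address": "1 Main St", "email": "alice@example.org", "item": "book"},
    {"name": "Alice Example", "address": "1 Main St", "email": "alice@example.org", "item": "lamp"},
    {"name": "Bob Example", "address": "2 Side St", "email": "bob@example.org", "item": "pen"},
]
public = [pseudonymise(o, vault) for o in orders]
```

The two orders of the same customer share one pseudo-ID, so they can still be analysed together without revealing who placed them. Which fields count as identifying has to be decided per data set; combinations of quasi-identifiers such as birth date and postcode can also single out a person.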
However, there might be companies or services that need to authenticate a data subject but do not need the data subject’s personally identifying data (e.g., consider a loyalty-program broker that collects the points earned in various loyalty programs together with your customer numbers, but no personal data). Using a third-party authentication mechanism for identity vouching, such as OpenID, GoogleID, or Facebook Login, would allow your service to store only (a) an identifier in the form of a pseudo-ID provided by the identity provider (if you limit the data access scope of such mechanisms to the identifier and do not request access to any personal data) and (b) non-identifying data (customer numbers, which can themselves be treated as pseudo-IDs as long as you do not possess data relating them to a person, and the points collected). Following the GDPR rationale, you would then have implemented appropriate safeguards from a data-protection perspective, as you do not process any personally identifiable data of data subjects but only handle pseudo-IDs.
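For the loyalty-broker example, a minimal sketch could look as follows, assuming OpenID Connect as the third-party mechanism. In OpenID Connect, the `openid` scope on its own requests only identifier-related claims such as `iss` (issuer) and `sub` (subject), while scopes like `profile` or `email` request personal data; since `sub` is only unique per issuer, the sketch keys accounts on the pair. `LoyaltyBroker`, `LoyaltyAccount` and all sample values are hypothetical, and validation of the ID token (signature, audience, expiry) is assumed to have been done beforehand by an OIDC library and is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class LoyaltyAccount:
    issuer: str    # identity provider, taken from the OIDC "iss" claim
    subject: str   # provider-issued pseudo-ID, taken from the OIDC "sub" claim
    customer_numbers: list = field(default_factory=list)
    points: int = 0


class LoyaltyBroker:
    """Hypothetical broker that keeps only pseudo-IDs and non-identifying data."""

    def __init__(self) -> None:
        self._accounts: dict = {}

    def login(self, verified_claims: dict) -> LoyaltyAccount:
        # Assumes the claims come from an already validated ID token.
        # Only "iss" and "sub" are read; any other claim is ignored, not stored.
        key = (verified_claims["iss"], verified_claims["sub"])
        if key not in self._accounts:
            self._accounts[key] = LoyaltyAccount(issuer=key[0], subject=key[1])
        return self._accounts[key]

    def credit(self, account: LoyaltyAccount, customer_number: str, points: int) -> None:
        # Customer numbers of partner programs are stored as opaque strings.
        if customer_number not in account.customer_numbers:
            account.customer_numbers.append(customer_number)
        account.points += points


# Usage: even if a provider sends an extra e-mail claim, it is never stored.
broker = LoyaltyBroker()
claims = {"iss": "https://idp.example", "sub": "248289761001", "email": "alice@example.org"}
acct = broker.login(claims)
broker.credit(acct, "C-1001", 50)
broker.credit(broker.login(claims), "C-1001", 25)
```

Reading only the two claims the broker needs, rather than relying on the provider to omit everything else, keeps the stored data limited to pseudo-IDs even when a provider returns more than was requested.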
This situation leads to quite interesting thoughts. Consider publicly exposed systems that process only pseudonymised data:
- a data breach would not make you liable to notify the authorities or the affected users of the breach
- you would not be subject to the quite significant fines that can be imposed under GDPR
- on internal systems, you still keep the link between the pseudo-ID and the data subject, including the personally identifiable data, enabling you to perform all the kinds of data processing you are accustomed to
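The split described in the list above (pseudonymised data on exposed systems, the link back to the data subject only on internal ones) could be sketched as below. Note that this swaps in a technique the post does not mention: instead of storing a random ID, the pseudo-ID is derived from an internal customer ID with a keyed hash (HMAC-SHA256), so only systems holding the secret key can recompute it and re-link records. `PSEUDONYM_KEY`, `pseudo_id`, `orders_for`, `internal_view` and the sample data are hypothetical; in a real deployment the key would have to stay off the publicly exposed systems, because anyone holding both the key and the customer IDs can re-identify the data.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key; it should exist only on internal systems.
PSEUDONYM_KEY = secrets.token_bytes(32)


def pseudo_id(customer_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    # Deterministic for the key holder, not computable without the key.
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Internal system: full customer records, never exposed publicly.
internal_customers = {
    "cust-42": {"name": "Alice Example", "email": "alice@example.org"},
    "cust-43": {"name": "Bob Example", "email": "bob@example.org"},
}

# Publicly exposed system: only pseudo-IDs and non-identifying data.
public_orders = [
    {"pseudo_id": pseudo_id("cust-42"), "item": "book"},
    {"pseudo_id": pseudo_id("cust-43"), "item": "lamp"},
    {"pseudo_id": pseudo_id("cust-42"), "item": "pen"},
]


def orders_for(customer_id: str) -> list:
    """Internal only: recompute the pseudo-ID with the key and filter."""
    pid = pseudo_id(customer_id)
    return [o["item"] for o in public_orders if o["pseudo_id"] == pid]


def internal_view() -> list:
    """Internal only: join public orders back to customer names."""
    by_pid = {pseudo_id(cid): data["name"] for cid, data in internal_customers.items()}
    return [(by_pid[o["pseudo_id"]], o["item"]) for o in public_orders]
```

A random pseudo-ID with a separate mapping table avoids key management but requires storing and protecting that table; the keyed hash avoids the table but turns the key itself into the asset that must be protected.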
Basically, based on what is known about GDPR so far, you would be in a quite solid position in an unfortunate situation such as a data breach. So, considering pseudonymisation approaches for data exposed on (in any way) publicly accessible systems definitely makes sense and should be pursued by responsible architects and developers, both to reduce the legal risk for their own company and to better protect their customers’ data.
[1] Regulation (EU) 2016/679 (General Data Protection Regulation): http://eur-lex.europa.eu/legal-content/DE/TXT/?uri=CELEX%3A32016R0679
[2] Rose Tinabo, Fred Mtenzi, Brendan O’Shea: Anonymisation Vs. Pseudonymisation: Which one is the most useful for both privacy protection and usefulness of e-healthcare data