Over the last few years I have spent a fair amount of time discussing GDPR (the new EU legislation on personal data privacy) and designing SaaS solutions for the public sector (a.k.a. the state sector), with all that comes with it (laws, tenancy isolation etc.). Both required a lot of thinking about how to limit access to sensitive data and to the data of other tenants or users. Azure offers several ways to protect your data and mitigate data security risks. I am not a specialized DBA, lawyer or CSO (Chief Security Officer) but rather a generic (Cloud/Azure) Solution Architect, and in this series of posts I aim to demonstrate how you can secure your data to a higher degree. In the upcoming posts I will limit the platform to SQL Databases in Azure.
These posts are aimed at other solution architects and (lead) developers. They give step-by-step instructions for setting up the suggested features, which you can follow even if you have never done it before. My examples show one way to do it, so before you implement the same solutions in your own projects you should check whether your scenario differs and, if needed, tweak the solution to match your own needs for privacy and security.
The plan is to include the following topics. The individual articles will be published during autumn and winter 2018-2019:
- Introduction (this post)
- Basic Database Protection
- Column Encryption (Always Encrypted) with Azure Key Vault
- Row-level security (for tenancy or user-data isolation)
- Dynamic Data Masking
GDPR summary/hints for solution architects
GDPR covers the EU region (people in the EU), but other regions and countries have similar regulations on how personal information may be handled, stored, used etc. GDPR cannot really be discussed in full in a few sentences in a post like this, but a small introduction for solution architects could still be useful. As a Solution Architect (SA) you cannot be expected to be a qualified lawyer or an expert in IT security. Handling personal information should be a combined effort between the Legal, DPO, EA, IT Architect and SA roles. In the end, though, we SAs design and detail the solution and make sure that the application follows the decided strategy. I recommend that you as system designers have at least a general awareness of the legislation on the protection of personal information. I will supply some links in the references sections below.
GDPR concerns all personal information that can be connected to a person, directly or indirectly. An SSN is an example of an attribute that is directly connected to a person; it is basically a direct link to an individual. It is important to note that other attributes can also be used to identify an individual. For example, age and village in combination could be enough to identify a person in a small village. So basically you are affected by GDPR if you handle ANY information about a user that can be used to identify an individual, such as email, phone, address, car registration number etc.
So if there is any doubt whether your system is affected by GDPR, it is better to assume that it is and plan for it. After all, GDPR does not forbid you from storing personal information.
The cloud and Azure as such have little or nothing to do with GDPR; it applies just as much to on-premises solutions. The part about data leaving the European Union is of somewhat more concern for cloud solutions than for a solution locked away at a physical location. However, if your hosting partner uses offshore resources that can access the data, you might still be in breach of the regulations.
So what should you think about especially as a solution architect? I will mention a few points that have great significance to you as solution designers.
Your company’s role
Your legal obligations differ a bit depending on what kind of company/situation you are in:
Controller: Has the main legal responsibility, as the controller can be viewed as the owner of the personal information. If your company develops an e-commerce site aimed at the consumer market you are the controller, while if you develop a SaaS application for employee management you are rather a data processor, meaning that the subscribers to your service are actually the controllers.
Processor: An entity that processes personal information on behalf of a controller, such as a SaaS solution provider.
Neither: For example, you work as an on-site IT consultant for a company that handles personal information. Your legal obligations are then defined by NDAs and cooperation contracts instead. Your job is still to make sure that whatever you do does not breach the regulations that apply to the Controller or Processor, but your consultancy firm does not itself hold a GDPR role or the obligations that come with it.
It is not uncommon for a company to have both these roles simultaneously, i.e. both own the data and process it. It is vital that you understand your company’s role to make sure that you fulfill your obligations.
Evaluate all personal data and make sure that the information you store is relevant to your solution. For example, political views or sexual preference probably do not need to be stored in a system handling employee data, as there is no legitimate reason to include them in your data store. However, if there are laws forcing you to keep a record of all employees, you have a lawful reason to store information about those employees.
Consent & data content
Make sure that the purpose of storing user data is obvious (see the previous point) and that you have consent from the person whose data you are storing. You need to take extra care if you are handling the personal information of minors (younger than 16; member states may set a lower limit, down to 13), in which case you may need parental consent. The consent must also be an explicit "opt-in". Automatic opt-in when the person does not respond is not permitted. The data subjects should also be able to access the information stored about them and to have faulty information rectified. There are special concerns for sensitive data, such as sexual preference, political views, health information and criminal records, to name a few.
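To make the opt-in rule concrete, here is a minimal sketch of how an application could decide whether it may process data for a given purpose. The `ConsentRecord` type, its field names and the `DIGITAL_CONSENT_AGE` constant are my own assumptions for illustration, not something defined by GDPR or Azure, and the age limit must be checked against the rules of the country you operate in.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: 16 is used as the age limit; adjust to the applicable national rule.
DIGITAL_CONSENT_AGE = 16


@dataclass
class ConsentRecord:
    """Hypothetical consent record for one person and one purpose."""
    purpose: str
    age: int
    opted_in: Optional[bool] = None   # None means the person never answered
    parental_consent: bool = False
    withdrawn: bool = False


def consent_is_valid(record: ConsentRecord) -> bool:
    # Silence is not consent: only an explicit True counts as opt-in.
    if record.opted_in is not True:
        return False
    # A withdrawn consent no longer allows processing.
    if record.withdrawn:
        return False
    # Minors additionally need consent from a parent or guardian.
    if record.age < DIGITAL_CONSENT_AGE and not record.parental_consent:
        return False
    return True


print(consent_is_valid(ConsentRecord("newsletter", age=30)))                 # no answer
print(consent_is_valid(ConsentRecord("newsletter", age=30, opted_in=True)))  # explicit opt-in
```

Note that the default for `opted_in` is "no answer", so forgetting to record a choice can never turn into an automatic opt-in.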
The right to be forgotten
The person opting in should also be able to withdraw his/her consent and even to have the data "erased" (unless that contradicts other legislation). However, according to information I have received (I can't recall the source), you cannot be required to delete individual records from the long-term backups of your database. The backups should be stored as-is for as long as you keep them. You should however have a backup plan that covers the removal of old backups when they are no longer needed for either practical or legislative reasons.
Anonymization / pseudonymization
So by now you might feel like "hey, I don't want anything to do with personal information" and start looking at alternative setups. In some cases you simply cannot avoid it, for example if your system is an e-commerce site and you need to store personal information, email addresses, orders etc. to be able to send the goods. In these cases you have a lawful reason for storing the data, but you should take action to protect it (which we look deeper into in later posts), consider how you can archive and delete the information, and make sure that customers can see and correct their stored data.
In other cases you may be able to anonymize the data. Say you have a medical system and want to publish statistical health information per region to the public. In that case you could anonymize the data by removing SSN, name, address etc. and grouping by postal code or similar. You could also replace the values with random values or random identifiers. The important thing is that it must not be possible to restore the data to its original state or to identify individuals from the remaining data. Be careful with indirect identifiers, which can identify an individual in combination with other columns or data sources. Say you simplify your medical records to only store diagnosis, age and town; you might still identify Howard, 94 years old and living in VerySmallVillage, by his age and town, as he is likely the only one of that age living in that small village. But if you remove all factors that can be used, by themselves or in combination with other fields in the data, to identify someone, you have effectively anonymized the data. Just don't underestimate the number of factors that can be used to indirectly identify individuals.
A similar approach is pseudonymization, which removes the specifics and generalizes the data to the point where it no longer directly contains the identifiers but replaces them with pseudonyms. So I could replace the SSN with a GUID and the town with a RandomTownId but leave the rest of the data intact, or I could replace birth date with birth year, name with a random name etc. The data remains traceable and can possibly be restored to its original state, but only with the use of other data sources. The advantage of this approach is that you can keep many columns that are useful for research while removing only the exact details that identify an individual. You could also trace errors back to their source by using the pseudonyms. I would however be sparing with this kind of solution and only use it if you have a valid scenario that cannot be solved by anonymizing the data or by simply accepting that you have personal data.
In general, if you store personal information, anonymized/pseudonymized or not, you should combine the data store with encryption to make the solution even more secure against would-be leaks.
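The Howard example can be checked mechanically before you publish a data set. The sketch below counts how many records share each combination of indirect identifiers and reports the combinations that occur fewer than `k` times (an idea usually called k-anonymity). The function name `risky_groups`, the field names and the sample records are made up for illustration.

```python
from collections import Counter


def risky_groups(records, quasi_identifiers, k=2):
    """Return the identifier combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical, already "simplified" medical records.
records = [
    {"diagnosis": "flu", "age": 94, "town": "VerySmallVillage"},
    {"diagnosis": "asthma", "age": 41, "town": "BigCity"},
    {"diagnosis": "flu", "age": 41, "town": "BigCity"},
]

# Age and town together still single out one person.
print(risky_groups(records, ["age", "town"]))
# → {(94, 'VerySmallVillage'): 1}
```

Any combination that shows up here needs further generalization (for example age ranges or larger regions) or removal before the data can be considered anonymized. A check like this only covers the columns you pass in, so it does not protect against identification through other data sources.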
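As a rough sketch of the idea, the code below replaces the SSN with a random GUID, reduces the birth date to a birth year and drops the name. The lookup table that maps SSNs to GUIDs is the "other data source" needed to trace a record back, so it has to be stored separately and protected at least as well as the original data. The function name, field names and sample values are assumptions made for this example.

```python
import uuid
from datetime import date


def pseudonymize(records):
    """Return (pseudonymized rows, lookup of original SSN -> pseudonym)."""
    lookup = {}          # keep this apart from the pseudonymized data
    pseudonymized = []
    for record in records:
        ssn = record["ssn"]
        if ssn not in lookup:
            # The same person always gets the same random pseudonym.
            lookup[ssn] = str(uuid.uuid4())
        pseudonymized.append({
            "person_id": lookup[ssn],                  # replaces the SSN
            "birth_year": record["birth_date"].year,   # generalized birth date
            "diagnosis": record["diagnosis"],          # kept for research
        })
    return pseudonymized, lookup


# Made-up sample data; the SSNs are placeholders, not real numbers.
records = [
    {"ssn": "000-00-0001", "name": "Howard", "birth_date": date(1924, 5, 17), "diagnosis": "flu"},
    {"ssn": "000-00-0002", "name": "Jane", "birth_date": date(1977, 1, 3), "diagnosis": "asthma"},
    {"ssn": "000-00-0001", "name": "Howard", "birth_date": date(1924, 5, 17), "diagnosis": "fracture"},
]

pseudonymized, lookup = pseudonymize(records)
for row in pseudonymized:
    print(row)
```

Because both of Howard's rows get the same `person_id`, researchers can still follow one patient over time without seeing who the patient is.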
While transfer of personal data outside the country border (read: the EU virtual border) is generally prohibited, the EU-US Privacy Shield has had statements amended so that it can still match GDPR, such as the possibility to delete personal information and the requirement that third-party companies processing data on behalf of Privacy Shield members guarantee the same level of protection. This is likely what allows companies in the EU to use the cloud services of Microsoft, as Microsoft has subcontractors as well. Microsoft has a complete set of legally binding regulations in their Online Services Terms (OST). You cannot simply hand those terms to your own customers if you host a SaaS solution and claim that Microsoft guarantees GDPR processor accountability. In fact, Microsoft holds data processor accountability towards your company, while your customers can only hold your company responsible and cannot sue Microsoft directly. Any legal action against Microsoft would have to be filed by your company, as you own the subscription and therefore are the one who has a legal agreement with Microsoft.
While you cannot copy the OST to your customers, you could in theory offer your customers the same conditions, as Microsoft is guaranteeing those terms to you. To put it in other words: "if Microsoft promises you something, then why shouldn't you promise the same conditions to your customers?".
In reality it is not always that easy, and you may encounter organisations that require a contract with all subcontractors. The good thing is that you can deliver Azure applications in other ways that enable the OST to be between your customer and Microsoft.
- Install in the customer's Azure subscription. This way the OST is automatically in the customer's hands. You could still deliver your solution as a SaaS application and script the environment to open access for your DevOps team and limit the customer's own users.
- Use the CSP (Cloud Solution Provider) setup. This basically allows the Azure subscription to be created FOR the customer (with OST included) while you handle the actual installation and operation. This takes a bit longer to enable, as the customer must approve you as a CSP for their organisation. Also, your company might not be a tier 1 (direct) or tier 2 (indirect) CSP, in which case you have to go through a process to become one. Read more about CSP here https://partner.microsoft.com/en-us/cloud-solution-provider/csp-partner.
Breaches are what we as solution designers should try to avoid the most. A breach could be a "leak" of data, storing of non-lawful data, transfer of data to third parties, tenants seeing other tenants' data (regarding personal information), hackers gaining access or developers misusing the data. This series of posts aims to highlight a few scenarios where better protection of data can be accomplished. A breach must be reported to the supervisory authority within 72 hours of your becoming aware of it.
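The 72-hour window is simple arithmetic, but it is easy to get wrong across time zones, so it can be worth encoding in your incident tooling. A minimal sketch, assuming you record the moment the breach was discovered as a timezone-aware timestamp (the function names are my own):

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(became_aware_at: datetime) -> datetime:
    """Latest time at which the breach should have been reported."""
    return became_aware_at + NOTIFICATION_WINDOW


def hours_left(became_aware_at: datetime, now: datetime) -> float:
    """Hours remaining until the deadline (negative if it has passed)."""
    remaining = notification_deadline(became_aware_at) - now
    return remaining.total_seconds() / 3600


# Example: a breach discovered on a Monday morning (UTC).
aware = datetime(2018, 11, 5, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(aware))                       # 2018-11-08 09:00:00+00:00
print(hours_left(aware, aware + timedelta(hours=24)))     # 48.0
```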
Privacy-by-design is mentioned in GDPR and states, in a nice way, that privacy should be a design issue and not an afterthought. Companies should design their systems with regard to what information is stored, how it should be accessed and who should be able to access it. The systems should prevent unauthorized access to personal data and support consent, revocation of consent, and secure access and storage of data. Document the access rights and the reason for storing the information, as well as a strategy for how you handle a person requesting their information or its deletion (it may just be a select statement, or a policy stating that upon request you delete the user in the database or run a script to clean up inactive users). This enables you to comply with privacy by design.
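The "select statement" and "clean-up script" mentioned above can be very small. The sketch below uses an in-memory SQLite database from the Python standard library purely as a stand-in so that it runs anywhere; against Azure SQL Database you would use your normal driver and schema. The `users` table, its columns and the function names are hypothetical.

```python
import sqlite3


def create_demo_db():
    """Build a throwaway database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, last_login TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (id, email, last_login) VALUES (?, ?, ?)",
        [(1, "anna@example.com", "2018-10-01"), (2, "bo@example.com", "2015-02-11")],
    )
    return conn


def export_user(conn, user_id):
    """Subject access request: return everything stored about one user."""
    row = conn.execute(
        "SELECT id, email, last_login FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return dict(row) if row else None


def erase_user(conn, user_id):
    """Erasure request: delete the user and report how many rows were removed."""
    cursor = conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    conn.commit()
    return cursor.rowcount


def purge_inactive(conn, cutoff_date):
    """Clean-up: delete users whose last login (ISO date text) is before the cutoff."""
    cursor = conn.execute("DELETE FROM users WHERE last_login < ?", (cutoff_date,))
    conn.commit()
    return cursor.rowcount


conn = create_demo_db()
print(export_user(conn, 1))
print(purge_inactive(conn, "2017-01-01"))  # removes the user inactive since 2015
```

In a real system personal data is usually spread over several tables, so the export and erase routines need to cover all of them, and the list of tables is a good thing to keep in your privacy-by-design documentation.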
Use Azure as an enabler of GDPR
While some companies complain about the insecurity and the risk of moving to the cloud (and thus also Azure), I personally see Azure as an enabler of GDPR-compliant solutions. The Microsoft Online Services Terms are very detailed about the responsibility Microsoft takes for storing and protecting your sensitive information. Azure also uses machine learning and other security features to detect abnormal access to some of your data sources and can alert you to suspicious behavior, which can actually help you keep your data safer than in your self-hosted services.
It is important to note that if you add a web service that publishes the entire SQL database, Microsoft's attorneys will void any claims that you have. Microsoft will make sure that the platform is secure, but you as solution designers are responsible for the applications (just as in an on-premises solution). From a GDPR perspective, publishing an on-premises data source through your own networks is exactly the same thing, cloud or not.
My experience is that Azure is a far better platform (security-wise) than the operational platforms of most of the customers I have met during my 23 years of consultancy. This is especially true for physical access: at many customers I could just walk in behind an employee, sit down in an empty room and connect my computer. Sometimes the customer's employees would even hold the door open for me. Inside the corporate network the security is usually much lower than what you meet if you try to penetrate it from the outside.
GDPR does not say anything about tenancy (at least from what I have found). So whether you store your customers in the same database or put them in separate ones is not a concern of GDPR, as long as you have clearly documented your privacy-by-design policies that ensure that the data of one customer is not accessible to other customers. There are other benefits and disadvantages to the different tenancy solutions, but it is important to note that it isn't GDPR that dictates these rules.