A Trusted Third Party
Digital Health and Care Wales – A trusted third party protecting individuals’ identities
SAIL Databank does not receive or handle identifiable data. We make anonymised data available for genuine research purposes only where there is a potential for benefit. Commonly recognised identifying details are removed before datasets come to SAIL Databank and, once anonymised, they cannot be reconstructed. Because SAIL holds only anonymised data, researchers carry out their work without knowing the identities of the individuals represented in the data. Continue to the ‘Anonymisation Process’ to find out more about how we protect individual identities in partnership with our Trusted Third Party provider.
The Anonymisation & Linkage Process
Splitting the Datasets
Datasets are split into:
- a demographic component (comprising commonly-recognised identifiers), and
- a clinical or event component (such as medication records and procedures).
The demographic component is transported to our Trusted Third Party (TTP), Digital Health and Care Wales, whilst the clinical component goes to SAIL Databank using a web-based secure file upload and switching service.
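The split described above can be illustrated with a minimal Python sketch. It is not SAIL's implementation: the field names in `DEMOGRAPHIC_FIELDS` and the `record_key` used to tie the two halves together are hypothetical, chosen only to show the idea.

```python
from typing import Dict, List, Tuple

# Hypothetical identifier fields; a real dataset defines its own.
DEMOGRAPHIC_FIELDS = {"name", "date_of_birth", "gender", "postcode"}

Record = Dict[str, object]


def split_records(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split each record into a demographic part and a clinical/event part.

    Both parts carry the same record_key (an assumption of this sketch)
    so that they can be matched up again later.
    """
    demographic: List[Record] = []
    clinical: List[Record] = []
    for record_key, record in enumerate(records, start=1):
        demo: Record = {"record_key": record_key}
        clin: Record = {"record_key": record_key}
        for field, value in record.items():
            target = demo if field in DEMOGRAPHIC_FIELDS else clin
            target[field] = value
        demographic.append(demo)
        clinical.append(clin)
    return demographic, clinical
```

In this sketch the demographic list is what would go to the TTP and the clinical list is what would go to SAIL Databank, so neither party holds both halves in identifiable form.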
Anonymisation and Encryption
The TTP anonymises and encrypts the demographic data, which is then subjected to quality assurance to ensure that its content is anonymous. Each individual record is assigned an Anonymous Linking Field (ALF), or a Residential Anonymous Linking Field (RALF) for places of residence.
Re-Combining the Datasets
The anonymised demographic elements of the datasets are then sent to SAIL Databank ready to be loaded. They contain only the ALF, week of birth, gender code and area of residence (a Lower Super Output Area with a population of approximately 1,500). They are then recombined with the clinical/event component of the dataset, making them ready for linkage to other datasets.
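As a rough illustration of the recombination step, the sketch below joins anonymised demographic rows to clinical/event rows. The `record_key` join field is a hypothetical name used only for this example; the other demographic fields mirror those listed above.

```python
from typing import Dict, List

Row = Dict[str, object]


def recombine(anonymised_demographics: List[Row], clinical_events: List[Row]) -> List[Row]:
    """Attach the anonymised demographic fields to each clinical/event row.

    Rows are matched on record_key, a hypothetical join field. Events with
    no matching demographic row are left out of the combined result.
    """
    by_key = {row["record_key"]: row for row in anonymised_demographics}
    combined: List[Row] = []
    for event in clinical_events:
        demo = by_key.get(event["record_key"])
        if demo is None:
            continue
        combined.append({**demo, **event})
    return combined
```

Each combined row then holds only the ALF, week of birth, gender code and area of residence alongside the clinical or event fields.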
As an added layer of security, and in addition to the standard anonymisation process, SAIL Databank carries out further encryption of the ALF to form an ALF-E before loading. Linkage across datasets is conducted using the ALF-E.
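The algorithm used to form the ALF-E is not specified here. As a sketch only, the example below uses a keyed hash (HMAC-SHA256 from the Python standard library) as a stand-in for the encryption step. The function name and the key are hypothetical.

```python
import hashlib
import hmac


def encrypt_alf(alf: str, secret_key: bytes) -> str:
    """Derive an ALF-E-like value from an ALF with a keyed hash.

    HMAC-SHA256 is a stand-in chosen for this sketch, not the method
    SAIL uses. The same ALF and key always give the same output, which
    is what lets records be linked across datasets.
    """
    return hmac.new(secret_key, alf.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the output is deterministic for a given key, two datasets loaded with the same key can be linked on the derived value without the original ALF being present in either.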
Where only a small number of individuals are being studied, as in the case of a rare disease, the data are aggregated before statistical analysis to avoid any possibility of identification.
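One common way to handle small groups is to report counts per group and withhold any count that falls below a minimum size. The sketch below assumes that approach; the threshold of 5 is a hypothetical default, not a figure stated by SAIL.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def aggregate_with_suppression(groups: Iterable[str], threshold: int = 5) -> Dict[str, Optional[int]]:
    """Count individuals per group, suppressing counts below the threshold.

    A suppressed count is returned as None so that small groups cannot
    be used to single out individuals.
    """
    counts = Counter(groups)
    return {group: (n if n >= threshold else None) for group, n in counts.items()}
```

For example, six individuals in one area and two in another would yield a count of 6 for the first area and a suppressed value for the second.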
Once a project is completed all datasets are archived.
Technical, Physical & Procedural Safeguards
The SAIL Gateway – Secure Remote Access
The Independent Governance Review Panel (IGRP) gives careful consideration to each project to ensure the proper and appropriate use of SAIL Databank data. When access has been granted, the requested data can be viewed using the SAIL Gateway, a privacy-protecting safe haven and remote access system.
This means that research can be carried out in a secure and protected environment and it safeguards the data from external linkage (jigsaw) attacks that may risk individual privacy. SAIL’s unique remote access system provides time-limited access to the datasets and is subject to researcher verification, a data access agreement, and physical and procedural controls.
SAIL Gateway has a number of levels of security that ensure its safe and effective operation:
- fire-walled Virtual Private Network (VPN)
- enhanced user authentication
- auditing of all SQL commands
- configuration controls to ensure that data cannot be removed or transferred unless authorised.
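To give a feel for what auditing SQL commands can involve, the sketch below records every statement run on a connection. It uses Python's built-in `sqlite3` module and its `set_trace_callback` hook purely as a stand-in; the SAIL Gateway's actual database and audit mechanism are not described here, and the function name and log format are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Tuple

AuditEntry = Tuple[str, str, str]  # (UTC timestamp, user, SQL statement)


def open_audited_connection(user: str, audit_log: List[AuditEntry]) -> sqlite3.Connection:
    """Open an in-memory database whose SQL statements are all logged."""
    conn = sqlite3.connect(":memory:")

    def record(statement: str) -> None:
        # Called by sqlite3 for each SQL statement executed on this connection.
        timestamp = datetime.now(timezone.utc).isoformat()
        audit_log.append((timestamp, user, statement))

    conn.set_trace_callback(record)
    return conn
```

A reviewer could then inspect the log to see which statements a given user ran and when.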
When presenting the linked data views to researchers for analysis via the SAIL Gateway we employ a variety of measures that help maximise utility whilst minimising any risk of disclosure. These include:
- masking of practitioner codes
- aggregation and suppression
- limiting the numbers of variables provided
- project-specific encryption of the ALF-E to prevent cross-linkage where data users are involved in multiple projects.
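The measures above can be sketched together as a function that builds a project's view of the data. This is an illustration under stated assumptions: the column names `alf_e` and `practitioner_code` are hypothetical, and a keyed hash (HMAC-SHA256) stands in for both the masking and the project-specific encryption, whose real methods are not given here.

```python
import hashlib
import hmac
from typing import Dict, List, Sequence

Row = Dict[str, str]

# Hypothetical names for the columns that are re-keyed per project.
REKEYED_COLUMNS = ("alf_e", "practitioner_code")


def build_project_view(rows: List[Row], approved_columns: Sequence[str], project_key: bytes) -> List[Row]:
    """Return only the approved columns, re-keying sensitive codes per project."""
    view: List[Row] = []
    for row in rows:
        out: Row = {}
        for column in approved_columns:
            value = row[column]
            if column in REKEYED_COLUMNS:
                # Prefix with the column name so equal values in different
                # columns do not produce the same masked output.
                message = f"{column}:{value}".encode("utf-8")
                value = hmac.new(project_key, message, hashlib.sha256).hexdigest()
            out[column] = value
        view.append(out)
    return view
```

With a different key for each project, the same individual appears under unrelated identifiers in each view, so a user working on two projects cannot join them, while identifiers stay consistent within a single project.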
SAIL data are managed within the dedicated Data Science Building, which was built to house our population data science initiatives. The building has a set of physical safeguards, including building-level access control and limits on access to floors and zones within the building, with particularly strict controls on who has access to areas where data are prepared and loaded into SAIL Databank.
Once a researcher has completed their analysis, they can only remove their results from the SAIL Gateway following scrutiny by a SAIL Data Guardian. The SAIL Data Guardian assesses the proposed outputs to ensure that any risk of disclosure has been mitigated. Once the Data Guardian is satisfied, the results are released to the researcher.