Frequently Asked Questions


SAIL stands for Secure Anonymised Information Linkage. SAIL Databank is a Wales-wide research resource focused on improving health, well-being and services. Its databank of anonymised data about the population of Wales is recognised worldwide.

SAIL Databank receives core funding from the Welsh Government’s Health and Care Research Wales. A range of anonymised, person-based datasets are held in SAIL Databank, and, subject to safeguards and approvals, these can be anonymously linked together to address important research questions.

SAIL Databank is committed to working with researchers, the NHS and other health-related stakeholders to conduct projects that lead to enhanced patient care, public benefit and improvements in health and well-being. SAIL Databank makes data available for genuine research purposes only where there is a potential for benefit. Because SAIL Databank holds only anonymised data, researchers carry out their work without knowing the identities of the individuals represented in the datasets.

Yes – SAIL Databank does not receive or handle identifiable data. The commonly-recognised identifying details are removed before datasets come to SAIL Databank, and so SAIL Databank cannot reconstruct the identifiable datasets. More details can be found here.

As SAIL datasets can be linked with each other, they can help provide answers to very complex and precise research questions. Here we give a few examples:

  • Is disease x increasing or decreasing?
  • What are the long-term outcomes of the Welsh Government’s anti-smoking policy?
  • Might medication in early childhood later affect how well children do in school?
  • How many patients would benefit from a new treatment (supported by the National Institute for Health and Care Excellence) and how much would this cost?
  • If care is redesigned in a particular way, what will be the likely impact on GP services and hospital services, and on different populations – for example different age groups?
  • How much does poverty affect the need and demand for health services?
  • Are there enough patients suitable for a new clinical trial in a specific area in Wales?

SAIL Databank is an anonymised databank and does not receive or handle identifiable data. Organisations who provide data to SAIL Databank do so via Digital Health and Care Wales, which acts as our Trusted Third Party (TTP) for anonymisation and encryption.

Digital Health and Care Wales replaces the commonly-recognised identifiable items (including name, address and date of birth) for each person with an encrypted code and sends this, along with minimal information (on gender, area of residence and week of birth) to SAIL.

By using an encrypted code that is unique to each person represented in the datasets, data can be anonymously linked together and used for research whilst safeguarding individual privacy.

SAIL Databank is proud to be involved with the many organisations that provide an anonymised version of their datasets to the SAIL Databank. A list of providers to date and their core SAIL datasets can be seen here:

  • Office for National Statistics
  • Annual District Birth Extract
  • Annual District Death Extract
  • Digital Health and Care Wales
  • Emergency Department Data Set
  • National Community Child Health Database
  • Outpatient Data Set
  • Patient Episode Database for Wales
  • Welsh Demographic Service
  • Welsh Cancer Intelligence and Surveillance Unit
  • Public Health Wales
  • Bowel Screening Wales
  • Breast Test Wales
  • Cervical Screening Wales
  • GP Practices signed up to SAIL
  • Primary Care GP Dataset
  • Congenital Anomaly Register & Information Service

Datasets are provided by various organisations. The data provider splits each dataset into two components.

  1. Demographic Component – this holds the identifying information to be anonymised.
  2. Content Component – this holds other details, such as diagnosis, medication, etc.

The demographic component (1) is sent to Digital Health and Care Wales, where it is validated and each record is anonymised and assigned a unique, non-identifiable code. This code, together with minimal information on gender, area of residence and week of birth, is then sent to SAIL Databank.

The content component (2) is sent directly to SAIL Databank where the two components of the dataset (1 and 2) are linked together. The complete de-identified dataset can now be accessed for research, subject to approvals.
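The split described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tooling used by SAIL Databank or by any data provider: the field names, the split_record helper and the use of a random UUID as a shared record key are all assumptions made for the example.

```python
import uuid

# Hypothetical field names; real datasets will differ.
DEMOGRAPHIC_FIELDS = {"name", "address", "date_of_birth", "gender", "nhs_number"}


def split_record(record):
    """Split one source record into (demographic, content) parts.

    Both parts carry the same meaningless key so that they can be
    re-combined later, while the identifying fields never travel
    together with the content.
    """
    join_key = uuid.uuid4().hex  # random; carries no information about the person
    demographic = {k: v for k, v in record.items() if k in DEMOGRAPHIC_FIELDS}
    content = {k: v for k, v in record.items() if k not in DEMOGRAPHIC_FIELDS}
    demographic["join_key"] = join_key
    content["join_key"] = join_key
    return demographic, content


# Made-up example record.
record = {
    "name": "Example Person",
    "address": "1 Example Street",
    "date_of_birth": "1980-01-15",
    "gender": "F",
    "nhs_number": "0000000000",
    "diagnosis": "sprained ankle",
    "length_of_stay_days": 0,
}
demographic, content = split_record(record)
```

In this sketch the demographic part would go to the Trusted Third Party and the content part directly to the databank, so no single transfer contains both identifiers and clinical details.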

Stage 1: How are the data provided?

Many organisations collect and hold electronic data about the people for whom they provide services or care. This could be data on hospital admissions, a register of individuals receiving a care package, or a database of blood test results.

If an organisation is considering whether it would like to provide its dataset to SAIL Databank, SAIL Databank supports them in their ‘due diligence processes’. This means providing the organisation with information so that they can make an informed decision on whether they should provide their dataset to SAIL Databank. If data provision is approved, SAIL Databank provides technical support so that everything that arrives at SAIL is in an anonymous form.

Stage 2: How are the data anonymised?

The data provider divides their data into two parts: the demographic component and the content component. These go on two different journeys, and how this works is shown below in Diagram 1.

The demographic component containing name, address, gender, date of birth and, for NHS datasets, NHS number is sent to Digital Health and Care Wales. These items are required at Digital Health and Care Wales for data checking and matching. If the NHS number is not present in the data received, Digital Health and Care Wales will locate it from the Welsh Demographic Service and add it. The NHS number is then used to generate a unique, non-identifiable and encrypted code. This is referred to as an Anonymous Linking Field (ALF). An ALF is assigned to each person represented in the dataset. The ALFs and minimal demographic information on area of residence (Lower Layer Super Output Area, LSOA), week of birth and gender are sent to SAIL Databank.

The content component containing information – such as length of hospital stay, medicines prescribed and test results – is sent directly to SAIL Databank.
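The anonymisation of the demographic component can be illustrated with a short Python sketch. The actual algorithm used to generate an ALF is not described here, so the example swaps in a keyed hash (HMAC-SHA256) as a stand-in. The secret, the function names, the field names, the choice to represent week of birth as the Monday of that week, and the placeholder LSOA code are all assumptions.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Assumption: a secret known only to the trusted third party.
TTP_SECRET = b"example-secret-held-by-the-trusted-third-party"


def make_alf(nhs_number, secret=TTP_SECRET):
    """Derive a stable, non-identifying code from an NHS number (keyed hash)."""
    return hmac.new(secret, nhs_number.encode("utf-8"), hashlib.sha256).hexdigest()


def week_of_birth(dob):
    """Coarsen a date of birth to the Monday of the week it falls in."""
    return dob - timedelta(days=dob.weekday())


def anonymise_demographics(row):
    """Return only the minimal fields that leave the trusted third party."""
    return {
        "join_key": row["join_key"],
        "alf": make_alf(row["nhs_number"]),
        "week_of_birth": week_of_birth(row["date_of_birth"]).isoformat(),
        "gender": row["gender"],
        "lsoa": row["lsoa"],  # assumed to be derived from the address beforehand
    }


# Made-up example row; the LSOA code is a placeholder.
row = {
    "join_key": "k1",
    "name": "Example Person",
    "nhs_number": "0000000000",
    "date_of_birth": date(1980, 1, 15),
    "gender": "F",
    "lsoa": "W01000001",
}
out = anonymise_demographics(row)
```

Because the same NHS number always yields the same code under the same secret, records for one person in different datasets receive the same ALF, which is what makes later linkage possible without names or NHS numbers.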

Diagram 1. Digital Health and Care Wales and data anonymisation

Stage 3: How are the data linked?

When the data provider divides their dataset into the demographic and content components, they assign a ‘join key’ in each part of the data. This key has no meaning of its own: its only use is to enable the two parts of the dataset to be re-combined at SAIL Databank. In other words, the purpose of the join key is to enable an anonymised version of the original dataset to be assembled. Although the ALF is already an encrypted code, SAIL Databank encrypts this again as an additional safeguard. This double encryption means that neither SAIL Databank nor Digital Health and Care Wales can decrypt the data to patient identifiers.
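A minimal Python sketch of the re-combination and second encryption step follows. It is an illustration under stated assumptions, not SAIL Databank's implementation: a keyed hash (HMAC-SHA256) stands in for the second encryption, whose actual algorithm is not described here, and the secret, the function names, the stored_alf field and the decision to drop the join key after use are hypothetical.

```python
import hashlib
import hmac

# Assumption: a second secret known only to the databank.
DATABANK_SECRET = b"example-secret-held-by-the-databank"


def re_encrypt(alf, secret=DATABANK_SECRET):
    """Apply a second keyed hash so the stored code differs from the one received."""
    return hmac.new(secret, alf.encode("utf-8"), hashlib.sha256).hexdigest()


def recombine(anonymised_demographics, content_rows):
    """Join the two components on the join key, then drop the key."""
    by_key = {row["join_key"]: row for row in anonymised_demographics}
    linked = []
    for content in content_rows:
        demo = by_key[content["join_key"]]
        merged = {k: v for k, v in content.items() if k != "join_key"}
        merged.update(
            stored_alf=re_encrypt(demo["alf"]),
            week_of_birth=demo["week_of_birth"],
            gender=demo["gender"],
            lsoa=demo["lsoa"],
        )
        linked.append(merged)
    return linked


# Made-up example data for the two components arriving at the databank.
demographics = [
    {"join_key": "k1", "alf": "alf-from-ttp-1", "week_of_birth": "1980-01-14",
     "gender": "F", "lsoa": "W01000001"},
]
content = [
    {"join_key": "k1", "diagnosis": "sprained ankle", "length_of_stay_days": 0},
]
linked = recombine(demographics, content)
```

In this sketch the code stored in the databank is not the code the trusted third party issued, so neither secret on its own is enough to map a stored record back to an NHS number.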

Diagram 2. SAIL Databank and data linkage

The ALF is the key to linking different anonymised datasets together so that they can be used in research studies whilst maintaining individual privacy. An example of this would be to link the Emergency Department Data Set with the Primary Care GP Dataset. This would provide the data to answer a research question such as ‘How much follow-up care does a GP provide when someone has attended A&E after a fall in the home resulting in a sprained ankle?’
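The linkage example above can be sketched in Python as follows. The rows, field names and the gp_followups helper are made up for illustration, and the sketch assumes at most one emergency attendance per person. Real analyses would be more involved.

```python
from collections import defaultdict

# Made-up rows; "alf" is the shared anonymous linking field.
emergency = [
    {"alf": "a1", "attendance_date": "2020-03-01", "diagnosis": "sprained ankle"},
    {"alf": "a2", "attendance_date": "2020-03-05", "diagnosis": "sprained ankle"},
]
gp_events = [
    {"alf": "a1", "event_date": "2020-03-03"},
    {"alf": "a1", "event_date": "2020-03-20"},
    {"alf": "a3", "event_date": "2020-03-04"},
]


def gp_followups(emergency_rows, gp_rows):
    """Count GP events on or after each emergency attendance, per ALF."""
    gp_by_alf = defaultdict(list)
    for row in gp_rows:
        gp_by_alf[row["alf"]].append(row["event_date"])
    result = {}
    for row in emergency_rows:
        # ISO-format date strings compare correctly as plain strings.
        later = [d for d in gp_by_alf.get(row["alf"], [])
                 if d >= row["attendance_date"]]
        result[row["alf"]] = len(later)
    return result


followups = gp_followups(emergency, gp_events)
```

The two datasets are matched on the ALF alone; no name, address or NHS number is needed at any point.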

Stage 4: How are the data accessed?

Before any data can be accessed, approval must be given by the independent Information Governance Review Panel (IGRP). The IGRP comprises representatives from the British Medical Association, Public Health Wales, National Research Ethics Service, Digital Health and Care Wales and the Consumer Panel for Data Linkage Research. The IGRP gives careful consideration to each project to ensure proper and appropriate use of SAIL Databank data.

When access has been granted, it is gained through a privacy-protecting safe haven and remote access system referred to as the SAIL Gateway. This means that research can be carried out in a secure and protected environment.

The standard operating model of SAIL Databank is to provide access to data via the Gateway rather than to release linked datasets. Thus the data are safeguarded from linkage (jigsaw) attacks that may risk individual privacy. These attacks can sometimes occur where someone has a set of anonymised data and uses it in conjunction with other information they hold to attempt to re-identify individuals. Further:

  1. All access to SAIL Databank is monitored closely.
  2. Even though all data are anonymous, SAIL Databank has additional safeguards. One example is aggregation. If a rare event or medical condition is being studied, so that there are only small numbers of cases, these can be presented in groups rather than as individual records. For example, cases will be presented in age groups, such as 20-29, 30-39, 40-49 etc.
  3. Datasets are archived at the end of the project.
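The aggregation safeguard mentioned in point 2 can be illustrated with a small Python sketch. The function name and the ten-year band width are assumptions chosen to match the age groups given above. It is not a description of how SAIL Databank applies aggregation in practice.

```python
from collections import Counter


def counts_by_age_band(ages, width=10):
    """Replace individual ages with counts per age band (e.g. 20-29)."""
    counts = Counter((age // width) * width for age in ages)
    # Sort by the numeric lower bound so that bands appear in age order.
    return {f"{lo}-{lo + width - 1}": n for lo, n in sorted(counts.items())}


# Made-up ages for a small number of cases.
ages = [23, 27, 31, 38, 39, 45]
banded = counts_by_age_band(ages)
```

Here six individual ages become three grouped counts, so no single row of the output corresponds to one person.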

ISO 27001 Accreditation

ISO 27001 is an internationally recognised best practice standard for an Information Security Management System (ISMS). An ISMS is a framework of policies and procedures that includes all legal, physical and technical controls that an organisation has in place to secure information and data throughout their lifetime.

The SAIL Programme has implemented an ISO 27001 Information Security Management System (ISMS), which was externally certified by independent industry assessors in December 2015. An externally certified ISO 27001 ISMS demonstrates an organisation’s commitment to both securing data and the continuous improvement of its information security management system and associated controls.

The SAIL Databank team strongly believe in the benefits of research using anonymised data, and have robust policies and security in place to prevent misuse of individuals’ data. Nevertheless, we understand that some individuals may not want information about them used for research. If someone is concerned about having their anonymised records included in SAIL, they should first read the information on this website or contact us for any further information they require.

Because SAIL Databank holds only anonymised data and is not able to identify individuals, we do not ourselves have the ability to process opt-out requests. Anyone wishing to opt out of having anonymised data related to them sent to SAIL or used for other secondary purposes should ask the relevant data provider(s) what options they offer for opting out. For primary care records, individuals can opt out by making a request to their GP.

Working With SAIL

We would prefer that you contact us as early as possible to discuss your research ideas. Our analysts can advise on whether SAIL data is a good fit for your research question and how to best align your research interests with the available data.

Unlike many providers of research data, we do not charge for access to the data. We only charge for the support and infrastructure costs related to your project (such as time for data preparation and use of computing resources). In practice, this means that there is a cost for most projects. This varies based on the complexity of the project, the support you would like from us, and the type of funding. These costs will be detailed during the scoping process.

Informed consent is usually required to collect identifiable information. If it is not practicable to obtain consent, it is possible to seek a section 251 waiver via the Confidentiality Advisory Group of the Health Research Authority. But when data is used only anonymously, there is no legal requirement under the Data Protection Act for individual consent to be obtained. Usually this means that consent is not required for loading new datasets into SAIL Databank. However, there may be additional requirements for some datasets, based on factors such as the agreement under which the data was collected or additional conditions imposed by the data owner.

Where datasets are gathered prospectively, informed consent is required for the use of the identifiable data being collected and, following anonymisation, for subsequent linkage to health record data in SAIL Databank.

Sometimes studies that have informed consent (for example, clinical trials) want to be able to link SAIL data back to identifiable individuals. In these cases, the individuals must have given explicit consent to link their medical records. A copy of the consent form showing this will need to be provided to SAIL Databank. With appropriate patient-level consent, and where this is in line with the data provider agreement, we can create the linkage in cooperation with Digital Health and Care Wales, so that no identifiable information is brought into SAIL.

* Please note that if you plan to bring data into SAIL Databank for linkage there are key fields required for the matching and anonymisation process. For more information about the split file system, please click here.

Yes, and we have experience of supporting projects using external datasets such as Hospital Episode Statistics (HES) from England. It will be the researcher’s responsibility to get access to the non-SAIL data, but we can assist in this process. For example, English data providers often ask how the data will be managed and secured, and we can help answer these questions.

It is the researcher’s responsibility to be aware of and obtain all relevant regulatory and governance approvals pertaining to their study. In accordance with Health Research Authority guidance, ethical approval is not mandatory for studies using only anonymised data. The need for peer review is dependent on the project funder and sponsor requirements.

Beyond this, all SAIL Databank projects are reviewed for approval by the Information Governance Review Panel (IGRP). If you are applying to use data from outside SAIL Databank, the data provider may have additional requirements. If ethical approval and/or peer review are required for a specific project, they can be done in parallel with IGRP submission.

It is possible to remotely access the data within a secure remote desktop platform, called the SAIL Gateway, which is designed to provide access to approved researchers. Data for anonymised research projects can only be accessed within this secure environment, so typically all data preparation and analysis for a study must take place within the Gateway. The environment is a standard Windows desktop with major statistical software packages pre-installed. We also provide the ability for researchers to install other software when needed.

For more information about the SAIL Gateway click here.

Working with SAIL data in its raw form typically involves writing SQL queries on the database. After initial data preparation in the database, researchers typically use common statistical packages or scientific programming languages to work with the data (such as R, Stata, Python, SAS, or SPSS). However, whatever your background, skills, or desire for hands-on involvement with the data, we have a team of experienced data scientists who can partner with you and offer a variable level of support. Please get in touch to discuss options.
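As a rough illustration of this workflow, the Python sketch below runs an SQL query and then continues the analysis in a programming language. It uses an in-memory SQLite database purely as a stand-in so that the example is self-contained. The table name, column names and values are hypothetical and do not reflect the actual SAIL database or its schema.

```python
import sqlite3
from statistics import mean

# In-memory stand-in for a project database; all names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admissions (alf TEXT, admission_date TEXT, length_of_stay_days INTEGER);
INSERT INTO admissions VALUES
  ('a1', '2020-01-10', 2),
  ('a1', '2020-06-02', 5),
  ('a2', '2020-02-14', 1);
""")

# Step 1: data preparation in SQL (one summary row per person).
rows = conn.execute("""
SELECT alf, COUNT(*) AS n_admissions, SUM(length_of_stay_days) AS total_days
FROM admissions
GROUP BY alf
ORDER BY alf
""").fetchall()
conn.close()

# Step 2: further analysis in the programming language of choice.
mean_total_days = mean(total for _, _, total in rows)
```

The same two-step pattern applies whichever statistical package is used for the second step.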