SAIL stands for Secure Anonymised Information Linkage. SAIL Databank is a Wales-wide research resource focused on improving health, well-being and services. Its databank of anonymised data about the population of Wales is internationally recognised.
SAIL Databank receives core funding from the Welsh Government’s Health and Care Research Wales. A range of anonymised, person-based datasets are held in SAIL Databank, and, subject to safeguards and approvals, these can be anonymously linked together to address important research questions.
SAIL Databank is committed to working with researchers, the NHS and other health-related stakeholders to conduct projects that lead to enhanced patient care, public benefit and improvements in health and well-being. SAIL Databank makes data available for genuine research purposes only where there is a potential for benefit. Because SAIL Databank holds only anonymised data, researchers carry out their work without knowing the identities of the individuals represented in the datasets.
Yes – SAIL Databank does not receive or handle identifiable data. The commonly-recognised identifying details are removed before datasets come to SAIL Databank, and so SAIL Databank cannot reconstruct the identifiable datasets. More details can be found here.
As SAIL datasets can be linked with each other, they can help provide answers to very complex and precise research questions. Here we give a few examples:
- Is disease x increasing or decreasing?
- What are the long term outcomes of the Welsh Government’s anti-smoking policy?
- Might medication taken in early childhood later affect how well children do in school?
- How many patients would benefit from a new treatment (supported by the National Institute for Health and Clinical Excellence) and how much would this cost?
- If care is redesigned in a particular way, what will be the likely impact on GP services and hospital services, and on different populations – for example different age groups?
- How much does poverty affect the need and demand for health services?
- Are there enough patients suitable for a new clinical trial in a specific area in Wales?
SAIL Databank is an anonymised databank and does not receive or handle identifiable data. Organisations that provide data to SAIL Databank do so via Digital Health and Care Wales, which acts as our Trusted Third Party (TTP) for anonymisation and encryption.
Digital Health and Care Wales replaces the commonly-recognised identifiable items (including name, address and date of birth) for each person with an encrypted code and sends this, along with minimal information (on gender, area of residence and week of birth), to SAIL.
By using an encrypted code that is unique to each person represented in the datasets, data can be anonymously linked together and used for research whilst safeguarding individual privacy.
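As a rough illustration of the idea, the sketch below replaces identifying items with a per-person code and keeps only the minimal fields. It is not the method used by Digital Health and Care Wales, which this page does not describe: the keyed hash (HMAC-SHA256), the secret key, the field names and the example area code are all assumptions made for the example.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret held only by the trusted third party (an assumption).
SECRET_KEY = b"example-key-held-by-trusted-third-party"


def encrypted_code(name: str, address: str, date_of_birth: date) -> str:
    """Derive a stable per-person code from identifying items with a keyed hash."""
    identity = "|".join(
        [name.strip().lower(), address.strip().lower(), date_of_birth.isoformat()]
    )
    return hmac.new(SECRET_KEY, identity.encode("utf-8"), hashlib.sha256).hexdigest()


def week_of_birth(date_of_birth: date) -> date:
    """Coarsen a date of birth to the Monday of the week it falls in."""
    return date_of_birth - timedelta(days=date_of_birth.weekday())


def anonymise(person: dict) -> dict:
    """Keep only the code and the minimal fields; drop name, address and exact birth date."""
    return {
        "code": encrypted_code(
            person["name"], person["address"], person["date_of_birth"]
        ),
        "gender": person["gender"],
        "area": person["area"],
        "week_of_birth": week_of_birth(person["date_of_birth"]),
    }


# Made-up example person; none of these values come from a real dataset.
person = {
    "name": "Example Person",
    "address": "1 Example Street",
    "date_of_birth": date(1980, 5, 14),
    "gender": "F",
    "area": "W01000001",
}
record = anonymise(person)
```

Because the same person always yields the same code, records about that person from different datasets can be matched on the code alone, without the name, address or date of birth ever being present.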
SAIL Databank is proud to be involved with the many organisations that provide an anonymised version of their datasets to the SAIL Databank. A list of providers to date and their core SAIL datasets can be seen here:
- Office for National Statistics
- Annual District Birth Extract
- Annual District Death Extract
- Digital Health and Care Wales
- Emergency Department Data Set
- National Community Child Health Database
- Outpatient Data Set
- Patient Episode Database for Wales
- Welsh Demographic Service
- Welsh Cancer Intelligence and Surveillance Unit
- Public Health Wales
- Bowel Screening Wales
- Breast Test Wales
- Cervical Screening Wales
- GP Practices signed up to SAIL
- Primary Care GP Dataset
- Congenital Anomaly Register & Information Service
Datasets are provided by various organisations. The data provider splits each dataset into two components.
1. Demographic Component – this holds the identifying information to be anonymised.
2. Content Component – this holds other details, such as diagnosis, medication, etc.
The demographic component (1) is sent to Digital Health and Care Wales, where it is validated and each record is anonymised and assigned a unique, non-identifiable code. This code, together with minimal information on gender, area of residence and week of birth, is then sent to SAIL Databank.
The content component (2) is sent directly to SAIL Databank where the two components of the dataset (1 and 2) are linked together. The complete de-identified dataset can now be accessed for research, subject to approvals.
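The split-file flow described above can be sketched in a few lines. This is a simplified illustration, not SAIL's implementation: the `row_id` join key, the field names, the `P000001`-style codes and the lookup-table approach to assigning codes are all assumptions, and the minimal fields (gender, area of residence, week of birth) are left out for brevity.

```python
from itertools import count

# Hypothetical names for the identifying fields (an assumption for this example).
DEMOGRAPHIC_FIELDS = ("name", "address", "date_of_birth")


def split_dataset(rows):
    """Data provider: split each row into a demographic part and a content part.

    The shared 'row_id' is an assumed join key so the parts can be re-joined later.
    """
    demographic, content = [], []
    for row_id, row in enumerate(rows):
        demographic.append(
            {"row_id": row_id, **{f: row[f] for f in DEMOGRAPHIC_FIELDS}}
        )
        content.append(
            {
                "row_id": row_id,
                **{k: v for k, v in row.items() if k not in DEMOGRAPHIC_FIELDS},
            }
        )
    return demographic, content


class TrustedThirdParty:
    """Stand-in for the anonymisation step: one code per distinct person."""

    def __init__(self):
        self._codes = {}
        self._next = count(1)

    def anonymise(self, demographic):
        coded = []
        for rec in demographic:
            identity = tuple(rec[f] for f in DEMOGRAPHIC_FIELDS)
            if identity not in self._codes:
                self._codes[identity] = f"P{next(self._next):06d}"
            coded.append({"row_id": rec["row_id"], "code": self._codes[identity]})
        return coded


def recombine(coded, content):
    """Databank: join the coded part to the content part, then drop the join key."""
    code_by_row = {rec["row_id"]: rec["code"] for rec in coded}
    return [
        {
            "code": code_by_row[c["row_id"]],
            **{k: v for k, v in c.items() if k != "row_id"},
        }
        for c in content
    ]


# Made-up rows; the first and third describe the same person.
rows = [
    {"name": "A Person", "address": "1 Example Street", "date_of_birth": "1980-05-14", "diagnosis": "asthma"},
    {"name": "B Person", "address": "2 Example Street", "date_of_birth": "1975-03-05", "diagnosis": "diabetes"},
    {"name": "A Person", "address": "1 Example Street", "date_of_birth": "1980-05-14", "diagnosis": "eczema"},
]
demographic, content = split_dataset(rows)
coded = TrustedThirdParty().anonymise(demographic)
linked = recombine(coded, content)
```

In this sketch the party that sees names never sees diagnoses, and the party that sees diagnoses never sees names, which is the point of splitting the file before anonymisation.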
ISO 27001 Accreditation
ISO 27001 is an internationally recognised best-practice standard for an Information Security Management System (ISMS). An ISMS is a framework of policies and procedures that includes all the legal, physical and technical controls that an organisation has in place to secure information throughout its lifetime.
The SAIL Programme has implemented an ISO 27001 Information Security Management System (ISMS), which was externally certified by independent industry assessors in December 2015. An externally certified ISO 27001 ISMS demonstrates an organisation's commitment to both securing data and continuously improving its information security management system and associated controls.
The SAIL Databank team strongly believe in the benefits of research using anonymised data, and have robust policies and security in place to prevent misuse of individuals’ data. Nevertheless, we understand that some individuals may not want information about them used for research. If someone is concerned about having their anonymised records included in SAIL, they should first read the information on this website or contact us for any further information they require.
Because SAIL Databank holds only anonymised data and is not able to identify individuals, we do not ourselves have the ability to process opt-out requests. Anyone wishing to opt out of having anonymised data about them sent to SAIL or used for other secondary purposes should ask the relevant data provider(s) what opt-out options they offer. For primary care records, individuals can opt out by making a request to their GP.
Working with SAIL Databank
We would prefer that you contact us as early as possible to discuss your research ideas. Our analysts can advise on whether SAIL data is a good fit for your research question and how to best align your research interests with the available data.
Unlike many providers of research data, we do not charge for access to the data. We only charge for the support and infrastructure costs related to your project (such as time for data preparation and use of computing resources). In practice, this means that there is a cost for most projects. This varies based on the complexity of the project, the support you would like from us, and the type of funding. These costs will be detailed during the scoping process.
Informed consent is usually required to collect identifiable information. If it is not practicable to obtain consent, it is possible to seek a section 251 waiver via the Confidentiality Advisory Group of the Health Research Authority. However, when data is used only anonymously, there is no legal requirement under the Data Protection Act and UK GDPR for individual consent to be obtained. Usually this means that consent is not required for loading new datasets into SAIL Databank. There may nevertheless be additional requirements for some datasets, based on factors such as the agreement under which the data was collected or additional conditions imposed by the data owner.
Where datasets are gathered prospectively, informed consent is required for the use of the identifiable data being collected and, following anonymisation, for subsequent linkage to health record data in SAIL Databank.
Sometimes studies that have informed consent (for example, clinical trials) want to be able to link SAIL data back to identifiable individuals. In these cases, the individuals must have given explicit consent to link their medical records. A copy of the consent form showing this will need to be provided to SAIL Databank. With appropriate patient-level consent, and where this is consistent with the data provider agreement, we can create the linkage in cooperation with Digital Health and Care Wales, so that no identifiable information is brought into SAIL.
* Please note that if you plan to bring data into SAIL Databank for linkage there are key fields required for the matching and anonymisation process. For more information about the split file system, please click here.
Yes, and we have experience of supporting projects using external datasets such as Hospital Episode Statistics (HES) from England. It will be the researcher’s responsibility to get access to the non-SAIL data, but we can assist in this process. For example, English data providers often ask how the data will be managed and secured, and we can help answer these questions.
It is the researcher’s responsibility to be aware of and obtain all relevant regulatory and governance approvals pertaining to their study. In accordance with Health Research Authority guidance, ethical approval is not mandatory for studies using only anonymised data. The need for peer review is dependent on the project funder and sponsor requirements.
Beyond this, all SAIL Databank projects are reviewed for approval by the Information Governance Review Panel (IGRP). If you are applying to use data from outside SAIL Databank, the data provider may have additional requirements. If ethical approval and/or peer review are required for a specific project, they can be done in parallel with IGRP submission.
It is possible to remotely access the data within a secure remote desktop platform, called the SAIL Gateway, which is designed to provide access to approved researchers. Data for anonymised research projects can only be accessed within this secure environment, so typically all data preparation and analysis for a study must take place within the Gateway. The environment is a standard Windows desktop with major statistical software packages pre-installed. We also provide the ability for researchers to install other software when needed.
For more information about the SAIL Gateway click here.
Working with SAIL data in its raw form typically involves writing SQL queries on the database. After initial data preparation in the database, researchers typically use common statistical packages or scientific programming languages to work with the data (such as R, Stata, Python, SAS, or SPSS). However, whatever your background, skills, or desire for hands-on involvement with the data, we have a team of experienced data scientists who can partner with you and offer a variable level of support. Please get in touch to discuss options.
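To give a feel for this workflow, the sketch below runs a SQL query that links two tables on an anonymised person code and then continues the analysis in Python. Everything in it is illustrative: the table and column names are hypothetical, the rows are made up, and an in-memory SQLite database stands in for the real database only so that the example is self-contained.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for this example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE demographics (person_code TEXT PRIMARY KEY, gender TEXT, week_of_birth TEXT);
    CREATE TABLE admissions (person_code TEXT, admission_date TEXT, length_of_stay INTEGER);
    INSERT INTO demographics VALUES
        ('P000001', 'F', '1980-05-12'),
        ('P000002', 'M', '1975-03-03');
    INSERT INTO admissions VALUES
        ('P000001', '2020-01-10', 3),
        ('P000001', '2021-06-02', 5),
        ('P000002', '2020-09-15', 1);
    """
)

# Data preparation in SQL: link the two tables on the anonymised code and aggregate.
query = """
SELECT d.gender,
       COUNT(*) AS admissions,
       SUM(a.length_of_stay) AS total_days
FROM admissions AS a
JOIN demographics AS d ON d.person_code = a.person_code
GROUP BY d.gender
ORDER BY d.gender
"""
rows = conn.execute(query).fetchall()

# Follow-on analysis in Python: mean length of stay per admission, by gender.
mean_stay = {gender: total_days / admissions for gender, admissions, total_days in rows}
conn.close()
```

The same division of labour applies with R, Stata, SAS or SPSS in place of Python: the SQL step reduces the raw tables to an analysis-ready extract, and the statistical package takes over from there.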