SAIL Databank & National Core Studies: Accelerating the availability and accessibility of UK health data

The Secure Anonymised Information Linkage (SAIL) Databank began as a local pilot project for the Swansea area and is now the national Trusted Research Environment (TRE) for Wales.

SAIL Databank’s Programme Manager, Chris Orton, from Population Data Science at Swansea University Medical School, told us about its evolution and its role in advancing health data research.

First published by HDR UK on 15th Nov 2022

How did the SAIL Databank come about?

The SAIL Databank was set up in 2007, initially as a pilot infrastructure for the secondary use of healthcare and social care data within the Swansea area. A governance structure and related principles were established so that data transfer, anonymisation, linkage, independent project governance, access controls, and output review could be combined into an end-to-end research platform.

The pilot was run in partnership with Health and Care Research Wales and with local and national data providers within the NHS and local authorities. Since 2008, SAIL has grown into the national TRE for Wales, housing population-scale linked health and administrative data across multiple tiers of care and settings. It also takes part in several TRE-as-a-service collaborations.

Have the National Core Studies supported the development of your TRE?

During the pandemic, the National Core Studies (NCS) supported SAIL in continuing to onboard COVID-19-specific datasets and in increasing the update frequency of datasets already established within SAIL, so that they could be used rapidly for pandemic-related research. The NCS also provided a framework for supporting researchers using data held by SAIL, and a network of other TREs and policy makers to drive forward cross-TRE integration and standardisation.

What data is available in your TRE, and how can researchers access and provide appropriate accreditation for this data?

The primary datasets represent the health status of the population of Wales and are linked with administrative information such as education and social care data. SAIL also hosts data for the BREATHE Health Data Research Hub and for several cohort studies such as COVIDENCE and UK REACH, and supports UK-wide linkage projects by providing centralised TRE services for secure linked data analysis.

All SAIL data is listed on the HDR UK Innovation Gateway and our application process is detailed on our website.

How have the public and patients been involved in the development of the TRE, and how do you maintain trust and transparency?

SAIL has a consumer panel with lay members, where developments, projects, and ongoing work are presented, and where feedback and advice on best practice in patient and public involvement (PPI) and general operations are reviewed.

SAIL convenes an independent Information Governance Review Panel (IGRP) to review each project application; lay representation on the panel is mandatory. Reviews ensure that data requests are appropriate and that the research is in the public interest.

Once researchers have completed their analysis, they can only take their results out of the environment after a SAIL Data Guardian has scrutinised them, to avoid any risk of disclosure.
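To make the idea of an output check concrete, here is a minimal sketch of one common statistical disclosure control step: flagging and suppressing small counts in a results table before release. This is an illustration only. The interview does not describe the checks SAIL's Data Guardians apply, and the threshold of 5, the function names, and the table layout are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical threshold: small-count rules vary between organisations,
# and the value used by any particular TRE is an assumption here.
MIN_CELL_COUNT = 5


def cells_needing_review(table: Dict[str, int],
                         threshold: int = MIN_CELL_COUNT) -> List[str]:
    """Return the labels of cells whose count is non-zero but below the threshold."""
    return sorted(label for label, count in table.items()
                  if 0 < count < threshold)


def suppress_small_counts(table: Dict[str, int],
                          threshold: int = MIN_CELL_COUNT) -> Dict[str, Optional[int]]:
    """Return a copy of the table with small non-zero counts replaced by None."""
    flagged = set(cells_needing_review(table, threshold))
    return {label: (None if label in flagged else count)
            for label, count in table.items()}


if __name__ == "__main__":
    # Made-up example counts, not real data.
    results = {"group_a": 120, "group_b": 3, "group_c": 0}
    print(cells_needing_review(results))
    print(suppress_small_counts(results))
```

A real review would also consider issues such as differencing between tables and secondary suppression, which this sketch does not attempt.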

What is the maturity level of the TRE, and how do you see this evolving?

With the expansion of the SAIL infrastructure into the overarching Secure eResearch Platform (SeRP), SAIL is a mature TRE that sets high standards of service, research capability, data hosting, and worldwide collaboration.

We manage over 70 data sources which are available for access, and maintain over 500 data sharing agreements. SAIL also maintains an ISO 27001-certified Information Security Management System (ISMS), Digital Economy Act (DEA) Accredited Processor status, NHS Data Security and Protection Toolkit (DSPT) compliance, and Cyber Essentials certification.

SAIL has supported around 600 research projects, and invests in developing technologies that increase capacity and capability, improve user experience, and support research. A major area of development is the storage and use of medical images.

Another evolving capability is the use of Natural Language Processing (NLP) to bring together data derived from handwritten notes such as clinical reports. Meaningful information can be extracted from these unstructured data sources, enabling the creation of research-ready datasets.

Tell us about some key use case examples for the SAIL Databank

The International Perinatal Outcomes in the Pandemic (iPOP) study, an International COVID-19 Data Alliance (ICODA) project, recognised that COVID-19 lockdowns resulted in substantial fluctuations in preterm birth and stillbirth numbers across different countries. SAIL Databank provided the TRE service for ICODA, housing data from over 30 international data contributors, with the aim of finding out why and how lockdown-connected factors affect perinatal health worldwide. Initial findings have been published by the Wellcome Trust and Nature.

A Strategic Approach to Social Care Data in Wales is a report published by Social Care Wales to develop a data strategy to support social services and social care in Wales. SAIL can anonymously link large, population-level social care datasets. These include data on family justice, children in care, and care leavers, as well as education and health data. SAIL provides the capacity to properly understand children’s journeys both into and following care proceedings.
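For readers unfamiliar with anonymised linkage, the sketch below shows one generic way two datasets can be joined without the analyst seeing the original identifier: each identifier is replaced by a keyed hash (HMAC-SHA256), and records are matched on that pseudonym. This is not a description of SAIL's actual method, which the interview does not detail. The field names, the key, and the sample records are all hypothetical.

```python
import hashlib
import hmac
from typing import Any, Dict, List

Record = Dict[str, Any]


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a pseudonym using a keyed hash (HMAC-SHA256)."""
    normalised = identifier.strip().upper()  # tolerate spacing and case differences
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymise(records: List[Record], secret_key: bytes,
              id_field: str = "person_id") -> Dict[str, Record]:
    """Index records by pseudonym and drop the original identifier field."""
    return {
        pseudonymise(rec[id_field], secret_key):
            {k: v for k, v in rec.items() if k != id_field}
        for rec in records
    }


def link(left: Dict[str, Record], right: Dict[str, Record]) -> Dict[str, Record]:
    """Inner-join two anonymised datasets on their shared pseudonyms."""
    return {p: {**left[p], **right[p]} for p in left.keys() & right.keys()}


if __name__ == "__main__":
    key = b"hypothetical-secret-held-by-a-trusted-party"
    # Made-up records, not real data.
    health = anonymise([{"person_id": "abc1", "asthma": True},
                        {"person_id": "xyz9", "asthma": False}], key)
    care = anonymise([{"person_id": " ABC1 ", "in_care": True}], key)
    for pseudonym, merged in link(health, care).items():
        print(pseudonym[:8], merged)
```

In this sketch the secret key would need to be held by a party separate from the researchers, since anyone holding it could recompute pseudonyms from known identifiers.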