Research Officer – Software Developer for SAIL

Are you a software developer who is interested in using your skills to advance a cutting-edge field of research and improve patients’ lives?

Consider applying for a job with SAIL Analytical Services.

The role

We are seeking a data scientist – software developer to contribute to the development of tools and infrastructure that will enable ground-breaking approaches to research using anonymised, linked medical records.

Data routinely collected by the NHS and other service providers holds great potential for health research.  Technical challenges related to the complexity of working with the data are one of the barriers to realising this potential.

This post provides the opportunity to work on a team that is seeking to overcome these barriers by developing novel methods of reusable research.  This will enable the data to be used effectively as an evidence base supporting improved care, increasing the health and well-being of the population.

Find out more and apply on the Swansea University website (posting closes 15th July).

The problems that occur when health data is not used

By Kerina Jones, SAIL Databank Associate Professor

Health data is more than just statistics or numbers. It can be collected, used and shared in lots of different ways. But ignoring certain medical data has the potential to change the way you are treated, how your care is provided and what happens to you as a result.

When you visit a doctor or hospital, the questions you’re asked and the treatment you’re given are all taken down in medical records. These are confidential, but when anonymised this information holds immeasurable potential for public benefit, so long as it is used safely and effectively. All of this information combined ultimately forms the evidence for new clinical guidelines, drug development and government health policy.

However, because it could have such a big impact, the data needs to be relevant, accurate, comprehensive and timely. But despite the best will in the world, and various high-profile campaigns to promote data use, important data often lies unused.

We have recently completed an international study looking at clinical records, research, and regulatory frameworks, to find out why such useful data is often not used, and to explore the implications of this for citizens and society.

We found that there are many reasons for the non-use of health data, and that it is strongly implicated in the deaths of many thousands of people and the potential waste of billions of pounds.

Medical records

The UK’s NHS is universally held in high regard, but it’s common knowledge that hospitals, doctors’ surgeries and other health services are overstretched. We found that this includes IT systems too: their availability varied considerably between locations, often with a continuing reliance solely on paper records. There were limitations on data completeness and availability, and a lack of data sharing between departments too. This has knock-on effects for clinical audits and research relying on the records, to the disadvantage of patient well-being. At the patient level, this may mean tests need to be repeated and treatments delayed. It can also impact the ability of clinicians to make an accurate diagnosis.

Although the majority of errors are corrected in good time or have little ill effect, it has been estimated that 40,000 to 80,000 deaths a year in the US alone are due to misdiagnosis. There will be other reasons, but the non-use of clinical data is highly likely to be a major contributing factor. The high demand on staff to uphold standards of care as a priority, along with other service constraints and budget limitations, means problems persist in relation to clinical data collection and use.


We also found that research carried out by pharmaceutical companies and universities was subject to various types of data non-use. Sometimes data were withheld intentionally, but in the majority of cases research is conducted with integrity and problems are inadvertent.

Pharmaceutical companies have been known to delay, or selectively use, clinical trial data for market advantage to the detriment of individuals and healthcare budgets. And even within academia the pressure to perform can influence work carried out and subsequently published. Furthermore, peer-reviewed journals strongly favour publishing positive results, making it harder to share valid but negative or inconclusive findings. Altogether, this leads to publication bias.

In his 2012 book, science writer Ben Goldacre stated that over 100,000 people died in the 1980s due to the inappropriate use of one particular heart drug. Sadly, there had been a small, previous study that indicated a likely problem, but it hadn’t been published at the time. Now there are initiatives like the AllTrials campaign which strive to promote the proper reporting of clinical trials. Even so, it has been estimated that fewer than 50% of findings are reported within two years of study completion. Clearly, more needs to be done here and across the spectrum of research studies to ensure data are used in a timely way.


Our research also found that problems in using data arise due to regulations for the proper use of data about people. These exist to safeguard patients, the public and professionals. Sometimes, however, these regulations may be implemented in an over-cautious manner, and/or there may be lengthy processes to follow before the data can be used. This may be due to unclear responsibilities or the fear of making a serious mistake.

Relying on individual consent can sometimes limit studies of groups that are difficult to reach, of problems such as substance misuse, and of any issues seen as sensitive. In the US, for example, people’s records with information on substance misuse have been withheld from research datasets because of privacy concerns. Without being able to access data on this difficult problem, research that would help individuals and their families may be delayed or abandoned. The problem of substance misuse is associated with over 60,000 deaths a year in the US alone, vast costs to society and untold emotional damage.

What this means

The implications of data non-use are massive, but none of these problems stand in isolation. It is fully appreciated that there are often very good reasons that health data cannot be used to its best advantage, but there are also many areas where improvements can be made. Clearly, every step in data collection and use is crucial if we are to avoid harm due to non-use.

It can be argued that data non-use is a greater risk to well-being than data misuse. The non-use of data is a global problem and one that can be difficult to quantify. As individuals, we have a role to play in supporting the safe use of data and taking part where we are able.

SAIL Associate Professor Kerina Jones is the academic lead for data governance and public engagement.

Data platform promises major change in young people’s mental health research

Mental health research charity MQ has partnered with a team at Swansea University Medical School to develop a major new young people’s mental health data platform.

The MQ Adolescent Data Platform for Mental Health Research will be hosted by SAIL Databank and will anonymously bring together data for research from a large range of existing and new information relating to the mental health of young people aged 10-24 from across the UK.

This will provide an unprecedented resource for researchers and policy-makers to improve understanding of mental illness in young people, address historical service challenges, and tackle inequalities in mental health.

Project launch

In total, billions of pieces of data will be included, ranging from administrative health, social and education data to psychological and clinical data, as well as information from research studies, all held within the privacy-protecting SAIL Databank.

With funding of £800K for 2018-2020, Professor Ann John and her team of world-leading data scientists from the Farr Institute will spend the first year building the infrastructure, working with other researchers, securing data agreements, preparing and linking data.

The team will begin preliminary data analysis on available data within the first 12 months – and will be working with other researchers across the UK to grow the size of the hub, nations covered, and breadth and depth of data during that period.

The platform, which will officially launch in 2018, addresses a significant gap in young people’s mental health research.

Potential impact

The UK boasts a wealth of unique and rich data on mental health, including that of young people. Data that could fill the many gaps in our knowledge of mental health is being collected constantly – in hospitals, in GP surgeries, in schools. These details are, however, not being systematically brought together to drive improvements.

Following recent Government and industry reports calling for greater use of data in healthcare, this new project will be a UK hub for data-driven research and young people’s mental health policy internationally.

It will make it easier for researchers worldwide to use and learn from data, reducing the costs and time involved in mental health research and creating vast new potential insights.

Commenting, MQ Director of Research Sophie Dix said:

“This initiative promises a step-change in research efforts to understand and transform young people’s mental health. We hope to see the platform become a global focal point for young people’s research and we are confident it will create much-needed momentum in the field. It has the potential to help researchers uncover obstacles and facilitate leaps in understanding and intervention that have been lacking for decades.”

Professor of Public Health and Psychiatry at Swansea University Medical School and project lead, Ann John said:

“The MQ Adolescent Data Platform will be an unparalleled resource bringing together world class scientists from across universities. Our aim is to transform research into children and young people’s mental health, making it easier and faster for scientists to deliver progress in tackling historical issues of under-treatment and under-recognition. As well as supporting policies to improve access and quality in young people’s services.”

SAIL Analytical Services

Analytical services team supports SAIL users conducting research

SAIL Analytical Services is an interdisciplinary team that provides research support for the SAIL Databank.

Our team of data scientists support SAIL users throughout the research life cycle, from an initial idea to publication and beyond. They collaborate with internal and external research groups on SAIL research projects, and lead their own methodological research projects in developing new methods of working with large datasets.

Research Support Services

Our scoping service helps potential SAIL researchers understand what is possible and shape research questions to best fit our data, as well as providing a quote of how much support will be needed for the project and what it will cost. During the application stage, senior members of the analytical services team participate in internal review of applications to use the data.

Once a project becomes active, our team provisions projects with data and provides basic support for data-related queries, in addition to any specific role which our team may have been funded to undertake as part of the project (such as preparing and reformatting data).

When it is time for research results to be reported, senior team members review outputs from the secure environment to ensure that they are safe and in line with the project proposal.

Research Collaboration

We also collaborate with researchers, both at Swansea University and external institutions, on a variety of projects.  Our role varies based on the needs of the research team, from basic data preparation to taking the lead role in planning and executing the research from beginning to end.  Our team has a wide range of skills and experience, including:

  • Manipulation of large datasets
  • Assessing data quality and data cleaning
  • Software development
  • Research study design
  • Statistics

In addition, our extensive experience with SAIL datasets means that we can quickly get started with delivering a project, without the long learning curve that new users of the data experience.

Methodological Development

Our team also leads methodological research and software development to enable faster, more efficient, higher quality research with SAIL data.  Examples of past and current projects include:

  • Standardized methods for identifying populations and data coverage.
  • Reusable code for matching control groups.
  • Automated generation of flags from clinical coded data.
  • Automatic quality checking and reporting for datasets.
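As an illustration of what reusable code for matching control groups might look like, here is a minimal Python sketch. It is not the team's actual method: the function name, the record fields (`id`, `sex`, `birth_year`) and the choice of exact matching without replacement are all assumptions made for the example.

```python
import random
from collections import defaultdict


def match_controls(cases, pool, keys=("sex", "birth_year"), ratio=1, seed=0):
    """Match each case to up to `ratio` controls with identical values for `keys`.

    Hypothetical helper: records are plain dicts with an "id" field.
    Controls are drawn without replacement, so no control is used twice.
    """
    rng = random.Random(seed)  # fixed seed keeps the matching reproducible

    # Group candidate controls by their matching-key values.
    strata = defaultdict(list)
    for person in pool:
        strata[tuple(person[k] for k in keys)].append(person)
    for candidates in strata.values():
        rng.shuffle(candidates)

    matches = {}
    for case in cases:
        candidates = strata[tuple(case[k] for k in keys)]
        n = min(ratio, len(candidates))  # a case may get fewer than `ratio`
        # pop() removes each chosen control from the pool.
        matches[case["id"]] = [candidates.pop()["id"] for _ in range(n)]
    return matches
```

Calling `match_controls(cases, pool, ratio=2)` would request two controls per case. Cases with no eligible control receive an empty list, which a real study would need to report.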

Find out more about the SAIL Analytical Services team

Are you planning a SAIL project?

We would be happy to speak about potentially collaborating on your research.

Contact the SAIL Databank team to discuss your research.


New study reveals how electronic health records can improve clinical trial follow up

A new study by Swansea University academics has indicated that SAIL Databank can provide a simple, cost-effective way to follow up participants after the completion of randomised controlled trials (RCTs).

The study, “Long term extension of a randomised controlled trial of probiotics using electronic health records”, led by researchers in the Swansea University Medical School and the College of Human and Health Sciences, was published in Scientific Reports.

The findings demonstrate the potential of using de-identified routinely collected electronic health records, such as those linked in SAIL, for more complete trial results. Results showed that SAIL can help track trial participants, with long term monitoring of medical interventions and health outcomes, and new insights into population health.

Typically, RCTs are relatively short term and, because of costs and resources, have limited opportunity to be revisited or extended. This means the effects of treatments cannot be scrutinised beyond the duration of the study, typically 1-2 years.

With patients’ consent, data analysts can match patients to their records and access data quickly. As a result, the cost of follow-up using routine data is potentially relatively small and does not increase with the number of participants.


The study

The original RCT investigated the impact of probiotics taken during pregnancy on childhood asthma and eczema in a group of children at 6 months and then 2 years of age.

Professor Sue Jordan of Swansea University’s College of Human and Health Sciences, who led the study, said:

“In this study we reported on the feasibility and efficiency of electronic follow up, and compared it with traditional trial follow up. We gained new insights from outcomes electronically recorded 3 years after the end of the trial, and could then identify the differences between trial data and electronic data.

“The use of electronic databases in clinical trials has been hailed as one of the major benefits of a nationwide electronic health records system. However, few studies have demonstrated this benefit, or formally assessed the relationship between traditional trial data and electronic health records databases.”

Key research findings

  • Using SAIL, the retention of children from lower socio-economic groups was improved which helped reduce volunteer bias.
  • Results from the electronic follow up were more reliable due to reduced risk of bias, unreliability or inaccuracy in participants’ recall.
  • New insights were gained from the electronic five year follow up, particularly for asthma, which typically appears after 2 years of age.
  • For the electronic follow up at five years, retention was still high and free of bias in socio-economic status.
  • Any future extension of the trial is straightforward.


Figure 1
Figure 2

Research impact

Follow up of trial participants on anonymised routine electronic health care databases such as SAIL offers great potential to maximise the economic efficiency of trials and allow access to a fuller range of health information.

SAIL Associate Professor Kerina Jones is the academic lead for data governance and public engagement.  She said:

“SAIL is a world-class, privacy-protecting data linkage system that securely brings together routinely-collected health data.

“SAIL is part-funded by the Welsh Government, and makes person-based health data available for genuine research purposes only where there is a potential for benefit. Because SAIL removes the identities of participants to protect their privacy and holds only de-identified data, researchers carry out their work without knowing the identities of the individuals.”

Professor Jordan said:

“The number of participants volunteering for RCTs is decreasing, particularly amongst the most economically disadvantaged. Trial data are vulnerable to misunderstandings of questionnaires or definitions of illness.”

Professor Michael Gravenor, who led the data analysis for the study said:

“These results lead us to conclude that using electronic health records have benefits relating to the cost-effective, long term monitoring of complex interventions which could have a positive impact for future clinical trial design.”

Epilepsy drug exposure in womb linked to significantly poorer school test results

Epilepsy research conducted by Arron Lacey at Swansea University Medical School found exposure to epilepsy drugs in the womb is linked to significantly poorer school test results among seven year olds.

Arron is part of the Prudent Healthcare Research Team and the Swansea Neurology Research Group that includes SAIL analysts, clinicians and academics. They conduct research using the SAIL Databank as well as analysing unstructured text in medical records. Having already published several population based studies exploring the effects of epilepsy on social deprivation, the effects of epilepsy drugs, as well as prescribing trends in epilepsy, they are using natural language processing (NLP) to extract clinical data from clinic letters for epilepsy research.

As part of a study published online in the BMJ Journal of Neurology, Neurosurgery and Psychiatry, the team studied mothers who had epilepsy, recorded the type of epilepsy drug that they were prescribed during pregnancy and analysed their children’s school test results, comparing them with a matched control group.

The study

Several previous studies have shown that epilepsy drugs, particularly sodium valproate, when taken during pregnancy, are associated with neurodevelopmental disorders, but few of these studies have been based on population data.

Arron and the team used routinely collected healthcare information from SAIL Databank and national school test data at Key Stage 1 to compare the academic performance of seven year olds in Wales born to mothers with epilepsy to the matched control group.

Prescription patterns were divided into five categories: treatment with one drug (carbamazepine, lamotrigine or sodium valproate); a combination of epilepsy drugs; and no drug treatment.

Research findings

The results showed that children born to mums prescribed sodium valproate during their pregnancy performed 10.5 to just under 13 per cent less well on all Key Stage 1 tests than those in the control group.

Children born to mums who had been prescribed epilepsy drugs in combination achieved worse results still, with scores 19–22 per cent lower than those of the control group.

Children born to mums who had been prescribed carbamazepine or lamotrigine, or nothing, performed just as well as those born to mums in the control group.

The research team acknowledge that they weren’t able to account for certain potentially influential factors, such as the mothers’ IQ, weight or alcohol consumption; the doses of epilepsy drugs prescribed; or intake of folic acid around conception.

Co-author and clinical neurologist Dr Owen Pickrell suggested it’s possible that the mum’s epilepsy severity may affect their child’s school performance.

“This might be due to a more severe underlying brain pathology, which is partly genetic and may be passed on to the child.”

Dr Pickrell also pointed out that it’s generally women with more severe epilepsy who are receiving polytherapy during pregnancy.

Key outcomes

Arron states in the research paper:

“While this study highlights the risk of cognitive effects in the children of mothers prescribed sodium valproate or multiple [anti-epilepsy drugs], it is important to acknowledge that some epilepsies are difficult to treat without these treatment regimens.”

“Women with epilepsy who need drugs to control their seizures are currently advised to continue taking them during pregnancy because convulsions can harm both mother and unborn child.”

“Women with epilepsy should be informed of this risk and alternative treatment regimens should be discussed before their pregnancy with a physician that specialises in epilepsy.”

In a linked commentary, Dr Richard Chin of the University of Edinburgh’s Muir Maxwell Epilepsy Centre, emphasises the importance of a study that is based on population data as this can be used to inform preventive/interventional strategies and help women to better understand the implications of epilepsy treatment while pregnant.

“By providing ‘functional’ outcome data from their study, the authors have now provided information that prospective parents may find readily tangible: it should be included in information given to women with epilepsy prior to pregnancy.”


Hands-on Work Experience with Research using Population Health Data

What is this about?

The Secure Anonymised Information Linkage (SAIL) Databank, based in Swansea University Medical School, is a world-class resource for conducting population health research, holding billions of records from a variety of data sources covering the Welsh population.

An interdisciplinary workforce in the Data Science Building conducts a variety of methodological and applied research projects, as well as supporting and further developing SAIL Databank. Skills used include software development, database development and management, epidemiology, statistics, data mining, data visualisation and more.

This internship program offers a 12 week paid position to students and others interested in gaining experience with real-world data analysis, to develop and apply their skills in this highly secure state of the art technical environment. It is a great opportunity to develop your skills working as part of a multi-disciplinary team, enhance your experience and work with some of the latest technology to assist you on your path to further employment.

This work placement may be especially relevant for students in the Health Data Science, Health Informatics, and Computer Science degree programs at Swansea University, though applications are welcome from interested candidates in other disciplines who have similar skills, expertise and technical background.

What are we looking for?

We have a wide range of opportunities which require a diverse array of skills. The SAIL analytical services team, for example, use SQL, R, Python and a range of other tools to undertake the research activities of the department.

We are looking for people who can work well as part of a team and are interested in contributing to data analysis, software development, and related research activities.

Placement Departments and Example Projects

SAIL Analytical Services is an interdisciplinary team that provides research support for the SAIL Databank, collaborates with internal and external research groups on SAIL research projects, and leads its own methodological research projects in developing new methods of working with large datasets. Examples of possible projects with our team:

  • Analysis and Visualisation of GP Dataset Coverage – SAIL Databank holds primary care data for 75% of the practices in Wales.  A common question is how representative the data we hold is.  The goal of this project is to undertake analysis to compare the population with primary care data to the Welsh population overall, and develop visualisations showing the geographic, historical, and demographic profile of the dataset.
  • Accuracy of Address Records – SAIL Databank holds resident address data from individuals in several different datasets.  The accuracy of address data could be assessed by comparing different sources.
  • Automated Cohort Creation – Develop tools to assist in the automatic selection and creation of cohorts from the general population for research, including capturing the relevant variables that will be used in a study.
  • Health Data Research UK (HDR UK) supports world-leading research to develop cutting-edge analytical tools and methodologies to address the most pressing health research challenges. These tools and methods allow us to use complex and diverse data at an unprecedented depth and scale. As a national informatics research programme, HDR UK can capitalise on the UK’s unique research strengths and data assets including those in the SAIL Databank.
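To give a flavour of the Accuracy of Address Records project listed above, a first step could be to measure how often two sources agree for the same person. The sketch below is illustrative only: the function name, the use of anonymised person identifiers as dictionary keys, and the area codes are all hypothetical and do not describe the structure of any SAIL dataset.

```python
def address_agreement(source_a, source_b):
    """Share of people present in both sources whose address codes agree.

    Each source is a dict mapping an anonymised person ID to an address
    code (hypothetical structure). Returns None if no one is in both.
    """
    shared = source_a.keys() & source_b.keys()
    if not shared:
        return None
    agree = sum(1 for pid in shared if source_a[pid] == source_b[pid])
    return agree / len(shared)


# Made-up example data: two people appear in both sources, one of them agrees.
gp_records = {"a1": "X100", "a2": "X200", "a3": "X300"}
hospital_records = {"a1": "X100", "a2": "X250", "a4": "X400"}
print(address_agreement(gp_records, hospital_records))
```

Agreement between sources is only a proxy for accuracy, since both sources can hold the same out-of-date address. A fuller analysis would also look at record dates.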

We hope that applicants joining us will add value by making the most of the available routine health-related data, as well as UK cohorts, surveys and non-health administrative data, using novel and reproducible methods and techniques to produce an output by the end of their internship that contributes to our ongoing research portfolio and strategic goals.


Work placements are paid, up to full time, for a period of 12 weeks. Start and end dates can be flexible based on students’ schedules.

The payment is £16,654 per annum on a pro rata basis, with the number of hours per week to be agreed with the student. Some students will be limited in the hours they can work by university policy (a 12 hour per week limit for those in full time masters’ programs at Swansea University, and a 20 hour per week limit for international students on tier 4 visas).
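As a rough guide to what the pro rata figure could mean in practice, the sketch below estimates gross pay for a placement. Only the annual salary and the 12 week length come from this advert. The 35 hour full-time week and the 52 week year are assumptions, so actual pay may differ.

```python
ANNUAL_SALARY = 16_654   # pounds per annum, as stated in the advert
WEEKS_PER_YEAR = 52      # assumption: simple 52 week year
FULL_TIME_HOURS = 35     # assumption: full-time hours are not stated


def placement_pay(hours_per_week, weeks=12):
    """Estimate gross pay in pounds for a placement, before any deductions."""
    full_time_weekly = ANNUAL_SALARY / WEEKS_PER_YEAR
    fraction = hours_per_week / FULL_TIME_HOURS
    return round(full_time_weekly * fraction * weeks, 2)


print(placement_pay(35))  # full time for 12 weeks
print(placement_pay(12))  # at the 12 hour per week limit
print(placement_pay(20))  # at the 20 hour per week limit
```

Under these assumptions, a full-time 12 week placement comes to roughly £3,843 before deductions.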

Who can apply?

Anyone who is either a student in a relevant degree program (at Swansea University or elsewhere), or has some relevant experience, is welcome to apply, provided the applicant is eligible to work in the UK.


For a brief informal discussion about accessing work placement opportunities at the Data Science building, please contact Dan Thayer, Senior Data Scientist – Team Lead, SAIL Analytical Services, or Ashley Akbari, Senior Research Officer, Health Data Research UK.

Learn more about Data Science at Swansea University Medical School.

Application process

Please send a CV and covering letter to Dan Thayer and Ashley Akbari by May 11th 2018.