Testing Gastrointestinal Disease Outbreaks with Statistical Modelling

30th May, 2019
    Nikola Ondrikova, PhD Student

"Winter is coming!" is the motto of House Stark, known to all Game of Thrones fans. One interpretation of it concerns the White Walkers coming to destroy the living. As you may have gathered from the title, this blog is not aimed at discussing the series. However, the threat looming over us every winter has its equivalent in our world: norovirus. The winter vomiting bug is a highly contagious virus causing gastrointestinal illness in humans, mostly transmitted by direct contact with an infected person. Although it can be caught all year round, the majority of infections come during winter, like the Night King with his crew.

Unfortunately, norovirus is not alone. We are at risk of gastrointestinal illness from a range of pathogens entering our bodies every day. These agents differ dramatically in their seasonal profiles, geographic patterns and risk factors for infection. For example, Campylobacter is a bacterial pathogen observed more frequently in summer and associated with the consumption of contaminated food, as opposed to the winter norovirus mentioned above. Outbreaks of gastrointestinal disease cause disruption in institutions such as schools, hospitals and care homes, with children missing school days, patients staying longer in hospital and the elderly being put at risk of severe health complications.

Extending the knowledge of how various gastrointestinal pathogens spread within a population could help to enhance outbreak management. Nevertheless, the process of disease transmission from one person to another is dynamic and can change over time, making it challenging to estimate the burden of the disease in real time.

Have no fear! A multidisciplinary approach is coming to the rescue! Four years, one PhD student and multiple supervisors from diverse backgrounds give you an A-team to help tackle the challenge. With financial support from EPSRC, ESRC and Public Health England as collaborators for this project, we will improve the understanding of gastrointestinal illness epidemiology using statistical modelling methods.

Example of how the results of the statistical model could be implemented and used via a user interface. (Author: Nikola Ondrikova)

Statistical models help us decipher the relationship between one or multiple factors on one side, and the phenomenon we would like to understand better on the other. In other words, we can use mathematical equations to find patterns in the world which are too complicated for the human mind. In our case, we are interested in how the number of people getting sick from gastrointestinal pathogens changes over time and in space. For instance, if an outbreak of norovirus is reported in a school within a highly populated urban area, the risk of the infection spreading is likely to be higher than when the same happens on the outskirts of a city. In this example, to understand the burden of the disease, you need to have a dataset with information on population density and rural-urban classification for a particular area. To represent time, you need to have information on the date and the number of cases reported for that date. Luckily, we have access to a public health surveillance dataset consisting of weekly reports on cases of gastroenteritis from laboratories, GPs and individuals across England and Wales.
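
To make the idea of "factors on one side, phenomenon on the other" a little more concrete, here is a minimal Python sketch of one possible model shape: a log-linear (Poisson-style) model in which the expected weekly case count depends on population density and on a yearly seasonal cycle. This is only an illustration under assumptions, not the model used in the project. The function name, the coefficients and the assumed peak week are all hypothetical, and nothing here has been estimated from real surveillance data.

```python
import math

# Illustrative coefficients only; none of these values are estimated from data.
BASELINE_LOG_RATE = 0.5     # log of expected weekly cases at density 1, mid-season
DENSITY_EFFECT = 0.3        # effect of log population density (assumed)
SEASONAL_AMPLITUDE = 0.8    # strength of the yearly cycle (assumed)
PEAK_WEEK = 2               # assumed winter peak in early January
WEEKS_PER_YEAR = 52


def expected_weekly_cases(week, density_per_km2):
    """Expected number of reported cases under a simple log-linear model."""
    # A cosine term gives one peak and one trough per year.
    seasonal = SEASONAL_AMPLITUDE * math.cos(
        2 * math.pi * (week - PEAK_WEEK) / WEEKS_PER_YEAR
    )
    log_rate = (
        BASELINE_LOG_RATE
        + DENSITY_EFFECT * math.log(density_per_km2)
        + seasonal
    )
    return math.exp(log_rate)


if __name__ == "__main__":
    # Hypothetical densities for an urban and a rural area.
    for label, density in [("urban", 4000), ("rural", 100)]:
        winter = expected_weekly_cases(2, density)
        summer = expected_weekly_cases(28, density)
        print(f"{label}: winter {winter:.1f}, summer {summer:.1f}")
```

With these made-up coefficients, the sketch expects more cases in winter than in summer and more in the dense urban area than in the rural one. In a real analysis the coefficients would be estimated from the surveillance data rather than chosen by hand.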

As good as it all sounds, there is still one more problem. The data is biased towards the older population because hospitals and care homes report outbreaks of gastrointestinal infection more readily than the rest of the community. Also, different agents cause different levels of severity of the symptoms, and so the mechanism of reporting, or rather non-reporting, varies as well. For instance, organisms causing severe symptoms such as Campylobacter are more likely to appear in national statistics than milder, self-limiting pathogens such as norovirus. Adding computer science to the mix, we can analyse media coverage data to determine the periods throughout the year when the public is aware of gastrointestinal outbreaks. By comparing the number of reports inside the time window defined as "a high-diarrhoeal awareness period" with the number outside of this window, we can quantify the uncertainty in the dataset and account for the under-reporting mechanisms in the statistical model.
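
The comparison of reports inside and outside an awareness window can be sketched in a few lines of Python. The example below computes the ratio of weekly reporting rates between the two periods, with an approximate 95% interval based on the usual Poisson approximation for a log rate ratio. The function name and the counts are invented for illustration. This is a simplified stand-in for the idea, not the project's method.

```python
import math


def reporting_rate_ratio(reports_in, weeks_in, reports_out, weeks_out):
    """Ratio of weekly report rates inside vs outside an awareness window.

    Returns the ratio and an approximate 95% interval, treating both
    counts as Poisson.
    """
    rate_in = reports_in / weeks_in
    rate_out = reports_out / weeks_out
    ratio = rate_in / rate_out
    # Standard error of the log rate ratio for two Poisson counts.
    se_log = math.sqrt(1 / reports_in + 1 / reports_out)
    low = math.exp(math.log(ratio) - 1.96 * se_log)
    high = math.exp(math.log(ratio) + 1.96 * se_log)
    return ratio, (low, high)


if __name__ == "__main__":
    # Made-up counts: 240 reports in 8 high-awareness weeks,
    # 880 reports in the remaining 44 weeks of the year.
    ratio, (low, high) = reporting_rate_ratio(240, 8, 880, 44)
    print(f"rate ratio {ratio:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
```

A raw ratio like this mixes the effect of awareness with ordinary seasonality, since media coverage tends to rise when cases do. That is one reason to handle the comparison inside a statistical model that also accounts for the time of year.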

In conclusion, gastrointestinal outbreaks are an issue affecting health services and population health. The combination of state-of-the-art statistical modelling methods with analysis of media outlets has the potential to significantly enhance our understanding of the temporal and geographical epidemiology as well as the under-reporting mechanism of gastrointestinal illness.