From 4b7e6fc64f7d24a5999b7bd3e24cc7243363cce4 Mon Sep 17 00:00:00 2001
From: Jason Pott <43917006+jasonpott@users.noreply.github.com>
Date: Tue, 1 Oct 2024 16:33:08 +0100
Subject: [PATCH 1/3] Update synthetic_news_data.Rmd

Added in information about the NEWS score and some code examples that can be used to calculate the individual NEWS sub scores
---
 vignettes/synthetic_news_data.Rmd | 91 +++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/vignettes/synthetic_news_data.Rmd b/vignettes/synthetic_news_data.Rmd
index 7cf0a15..41f0266 100644
--- a/vignettes/synthetic_news_data.Rmd
+++ b/vignettes/synthetic_news_data.Rmd
@@ -101,6 +101,97 @@ This dataset is available from the [NHSRDatasets](https://CRAN.R-project.org/pac

 For mode information about the [synthpop](http://gradientdescending.com/generating-synthetic-data-sets-with-synthpop-in-r/) package.

+## What is NEWS?
+
+NEWS is short for the National Early Warning Score. [NHS England have provided a detailed introduction here](https://www.england.nhs.uk/ourwork/clinical-policy/sepsis/nationalearlywarningscore/)
+
+The latest iteration of the NEWS score is NEWS2
+
+The premise of NEWS is that physiology such as heart rate (pulse), respiration rate, consciousness (GCS or AVPU) are all routinely measured.
+GCS = Glasgow Coma Score (Categorical score 3-15) measuring the Eyes, verbal and motor responses.
+AVPU = A categorical description of how concious a patient is A - Alert, V - Responds to voice, P - Responds to painful stimuli, U - Unresponsive
+
+However there are a range of professional groups who use these measurements, and it can e challenging to recognise the deteriorating patient from the raw measurements alone especially if you do not often work with acutely unwell patients
+
+NEWS(2) provides categorical classifications for distinct ranges of physiology. Each category is scored 0-3
+
+The more abnormal a measure of physiology the greater the categorical score attributed. The score is supposed to be calculated at the time the physiology is measured. In a hospital this is often when the nurse or healthcare assistant completes their observation rounds.
+
+The categorical NEWS score then is linked to distinct actions that should be followed. These actions will typically be localised by organisations depending on the level of resource that is available to support medical emergencies.
+
+There are some criticisms of NEWS that were addressed by NEWS2. These were that normal measures of Oxygen saturation (SpO2) were not universal and often meant over escalation of "normal" abnormal physiology in patients with respiratory diseases such as COPD. These were addressed though adjusted ranges for SpO2.
+
+There have also been concerns that in some cases the NEWS score has been introduced to settings (often mandatory) where it has not been validated. The Score was developed by the Royal College of Physicians. They often represent clinical specialties who work in, in-patient medicine. As such the data that was used to develop the score was based on data from patients who were typically out of the acute phase of their illness and so abnormal physiology was a measure post therapeutic interventions. In most Cases NEWS has been shown to be robust to these criticisms.
+
+NEWS is more work for (typically nursing) staff to complete, NEWS is also not validated as an incomplete score for example where just a heart rate, Blood pressure and SpO2 are recorded which is a common set of measurements in most outpatient settings.
+
+ ### Here are some code chunks for the calculation of NEWS sub scores:
+
+#### Systolic Blood pressure
+```{r}
+sbp_news <- NEWS_var%>%
+  mutate (sbp = as.numeric(pulse)) %>%
+  mutate(news = case_when(
+    sbp <= 90 | sbp >=220 ~ 3,
+    sbp %in% c(91:100) ~ 2,
+    sbp %in% c(101:110) ~ 1,
+    !is.numeric(pulse) ~ NA_real_,
+    TRUE ~ 0)
+)
+```
+
+#### Heart Rate
+```{r}
+hr_news <- NEWS_var%>%
+  mutate (pulse = as.numeric(pulse)) %>%
+  mutate(news = case_when(
+    pulse <= 40 | pulse >=131 ~ 3,
+    pulse %in% c(111:130) ~ 2,
+    pulse %in% c(41:50,91:110) ~ 1,
+    !is.numeric(pulse) ~ NA_real_,
+    TRUE ~ 0)
+)
+```
+
+#### Resp Rate
+```{r}
+rr_news <- NEWS_var%>%
+  mutate (resp_rate = as.numeric(resp_rate)) %>%
+  mutate(news = case_when(
+    resp_rate <= 8 | resp_rate >=25 ~ 3,
+    resp_rate %in% c(21:24) ~ 2,
+    resp_rate %in% c(9:11) ~ 1,
+    !is.numeric(resp_rate) ~ NA_real_,
+    TRUE ~ 0)
+)
+```
+
+#### SpO2
+```{r}
+NEWS_var%>%
+  mutate(news = case_when(
+    spo2 <= 91 ~3,
+    spo2 %in% c(92:93) ~2,
+    spo2 %in% c(94:95) ~1,
+    !is.numeric(spo2) ~ NA_real_,
+    TRUE ~ 0
+  ))
+```
+
+#### Temperature
+```{r}
+NEWS_var%>%
+  mutate(news = case_when(
+    temperature <= 35 ~ 3,
+    temperature >= 39.1 ~ 2,
+    temperature %in% c(38.1:39,35.1:36) ~ 1,
+    !is.numeric(temperature) ~ NA_real_,
+    TRUE ~ 0))
+```
+
+In addition NEWS2 has altered ranges for patients with known respiratory diseases. These need additional logic on a per patient basis to implement.
+
+
 ## Summary

 In many ways, synthetic data reflects George Box’s observation that “all models are wrong, but some are useful” while providing a “useful approximation [of] those found in the real world,”

From e49552b7a6760722603f72a6fbb880f8eb05f1ba Mon Sep 17 00:00:00 2001
From: Lextuga007
Date: Tue, 1 Oct 2024 17:57:26 +0100
Subject: [PATCH 2/3] Removed original vignette to move to Quarto website as a blog

---
 vignettes/synthetic_news_data.Rmd | 82 +------------------------------
 1 file changed, 1 insertion(+), 81 deletions(-)

diff --git a/vignettes/synthetic_news_data.Rmd b/vignettes/synthetic_news_data.Rmd
index 41f0266..7c53529 100644
--- a/vignettes/synthetic_news_data.Rmd
+++ b/vignettes/synthetic_news_data.Rmd
@@ -1,6 +1,6 @@
 ---
 title: "Synthetic NEWS Data"
-author: "Dr Muhammad Faisal, Gary Hutson, and Professor Mohammed A Mohammed"
+author: "Jason Pott"
 output: rmarkdown::html_vignette
 vignette: >
   %\VignetteIndexEntry{Synthetic NEWS Data}
@@ -15,86 +15,6 @@ knitr::opts_chunk$set(
 )
 ```

-## What is Synthetic data?
-
-The goal is to generate a data set which contains no real units, therefore safe for public release and retains the structure of the data.
-
-In other words, one can say that synthetic data contains all the characteristics of original data minus the sensitive content.
-
-Synthetic data is generally made to validate mathematical models. This data is used to compare the behaviour of the real data against the one generated by the model.
-
-
-## How we generate synthetic data?
-
-The principle is to observe real-world statistic distributions from the original data and reproduce fake data by drawing simple numbers.
-
-Consider a data set with $p$ variables. In a nutshell, synthesis follows these steps:
-
-1. Take a simple random sample of $x_{1,obs}$ and set as $x_{1,syn}$
-2. Fit model $f(x_{2,obs}|x_{1,obs})$ and draw $x_{2,syn}$ from $f(x_{2,syn}|x_{1,syn})$
-3. Fit model $f(x_{3,obs}|x_{1,obs},x_{2,obs})$ and draw $x_{3,syn}$ from $f(x_{3,syn}|x_{1,syn},x_{2,syn})$
-4. And so on, until $f(x_{p,syn}|x_{1,syn},x_{2,syn},...,x_{p-1,syn})$
-
-Fitting statistical models to the original data and generating completely new records for public release.
-Joint distribution $f(x_1,x_2,x_3,…,x_p)$ is approximated by a set of conditional distributions $f(x_2|x_1)$.
-
-
-## Synthetic data generation - National early warning score (NEWS) utilising real data
-
-The data this is based on is the [NEWS](https://www.rcplondon.ac.uk/projects/outputs/national-early-warning-score-news-2) Score devised by the Royal College of Physicians.
-
-Synthetic data can be generated from new data, utilising the above methodology, on the real observed data:
-
-```{r observed_data_generation}
-library(readr)
-library(dplyr)
-df <- suppressWarnings(read_csv("https://raw.githubusercontent.com/StatsGary/SyntheticNEWSData/main/observed_news_data.csv") %>%
-  dplyr::select(everything(), -X1))
-
-glimpse(df)
-```
-
-This reads in the observed NEWS data from the GitHub repository. Now, we will utilise the `synthpop` package to create a synthetically generated dataset.
-
- ## Generating synthetic NEWS dataset using synthpop package
-
-As stated, now we will use the real observed data and generate a synthetic set, utilising the equations and process mapped out in the preceding sections:
-
-```{r synth}
-library(synthpop)
-syn_df <- syn(df, seed = 4321)
-#### synthetic data
-synthetic_news_data <- syn_df$syn
-glimpse(synthetic_news_data)
-```
-
-```{r visuals}
-library(ggplot2)
-# Create temperature tibbles to compare observed vs synthetically generated labels
-obs <- tibble(label = "observed_data", value = df$temp)
-synth <- tibble(label = "synthetic_data", value = synthetic_news_data$temp)
-
-# Merge the frames together to get a comparison
-merged <- obs %>%
-  bind_rows(synth)
-
-# Create the plot
-plot <- merged %>%
-  ggplot(aes(value, fill = label)) +
-  geom_histogram(alpha = 0.9, position = "identity") +
-  theme_minimal() +
-  scale_fill_manual(values = c("#BCBDC1", "#2061AC")) +
-  labs(
-    title = "Observed vs Synthetically NEWS values",
-    subtitle = "Based on NEWS Temperature score",
-    x = "NEWS Temperature Score", y = "Score frequency"
-  ) +
-  theme(legend.position = "none")
-
-print(plot)
-```
-
-
 ## Loading the dataset from NHSRDatasets

 This dataset is available from the [NHSRDatasets](https://CRAN.R-project.org/package=NHSRdatasets) package and similar comparisons can be made with the above. These examples can be used for data wrangling and data visualisation.
From ae90f7f161bd1df08138e5e50ddb20127964fda1 Mon Sep 17 00:00:00 2001
From: Lextuga007
Date: Tue, 1 Oct 2024 19:49:05 +0100
Subject: [PATCH 3/3] Minor edits, applied styler, added NHSRdatasets

---
 vignettes/synthetic_news_data.Rmd | 98 ++++++++++++++++++-------------
 1 file changed, 58 insertions(+), 40 deletions(-)

diff --git a/vignettes/synthetic_news_data.Rmd b/vignettes/synthetic_news_data.Rmd
index 7c53529..cf6e63b 100644
--- a/vignettes/synthetic_news_data.Rmd
+++ b/vignettes/synthetic_news_data.Rmd
@@ -19,102 +19,120 @@ knitr::opts_chunk$set(

 This dataset is available from the [NHSRDatasets](https://CRAN.R-project.org/package=NHSRdatasets) package and similar comparisons can be made with the above. These examples can be used for data wrangling and data visualisation.

+```{r}
+library(NHSRdatasets)
+
+NEWS_var <- NHSRdatasets::synthetic_news_data
+
+```
+
 For mode information about the [synthpop](http://gradientdescending.com/generating-synthetic-data-sets-with-synthpop-in-r/) package.

 ## What is NEWS?

 NEWS is short for the National Early Warning Score. [NHS England have provided a detailed introduction here](https://www.england.nhs.uk/ourwork/clinical-policy/sepsis/nationalearlywarningscore/)

-The latest iteration of the NEWS score is NEWS2
+The latest iteration of the NEWS score is NEWS2.
+
+The premise of NEWS is that physiology such as heart rate (pulse), respiration rate, consciousness (GCS or AVPU) are all routinely measured.

-The premise of NEWS is that physiology such as heart rate (pulse), respiration rate, consciousness (GCS or AVPU) are all routinely measured.
 GCS = Glasgow Coma Score (Categorical score 3-15) measuring the Eyes, verbal and motor responses.
-AVPU = A categorical description of how concious a patient is A - Alert, V - Responds to voice, P - Responds to painful stimuli, U - Unresponsive
+AVPU = A categorical description of how concious a patient is A - Alert, V - Responds to voice, P - Responds to painful stimuli, U - Unresponsive.

-However there are a range of professional groups who use these measurements, and it can e challenging to recognise the deteriorating patient from the raw measurements alone especially if you do not often work with acutely unwell patients
+However there are a range of professional groups who use these measurements, and it can e challenging to recognise the deteriorating patient from the raw measurements alone especially if you do not often work with acutely unwell patients.

-NEWS(2) provides categorical classifications for distinct ranges of physiology. Each category is scored 0-3
+NEWS(2) provides categorical classifications for distinct ranges of physiology. Each category is scored 0-3.

 The more abnormal a measure of physiology the greater the categorical score attributed. The score is supposed to be calculated at the time the physiology is measured. In a hospital this is often when the nurse or healthcare assistant completes their observation rounds.

 The categorical NEWS score then is linked to distinct actions that should be followed. These actions will typically be localised by organisations depending on the level of resource that is available to support medical emergencies.

+### Criticisms of NEWS
+
 There are some criticisms of NEWS that were addressed by NEWS2. These were that normal measures of Oxygen saturation (SpO2) were not universal and often meant over escalation of "normal" abnormal physiology in patients with respiratory diseases such as COPD. These were addressed though adjusted ranges for SpO2.

-There have also been concerns that in some cases the NEWS score has been introduced to settings (often mandatory) where it has not been validated. The Score was developed by the Royal College of Physicians. They often represent clinical specialties who work in, in-patient medicine. As such the data that was used to develop the score was based on data from patients who were typically out of the acute phase of their illness and so abnormal physiology was a measure post therapeutic interventions. In most Cases NEWS has been shown to be robust to these criticisms.
+There have also been concerns that in some cases the NEWS score has been introduced to settings (often mandatory) where it has not been validated. The Score was developed by the Royal College of Physicians. They often represent clinical specialties who work in-patient medicine. As such the data that was used to develop the score was based on data from patients who were typically out of the acute phase of their illness and so abnormal physiology was a measure post therapeutic interventions. In most Cases NEWS has been shown to be robust to these criticisms.

 NEWS is more work for (typically nursing) staff to complete, NEWS is also not validated as an incomplete score for example where just a heart rate, Blood pressure and SpO2 are recorded which is a common set of measurements in most outpatient settings.
  ### Here are some code chunks for the calculation of NEWS sub scores:

-#### Systolic Blood pressure
+#### Systolic Blood pressure (column `syst`)
+
 ```{r}
-sbp_news <- NEWS_var%>%
-  mutate (sbp = as.numeric(pulse)) %>%
+library(NHSRdatasets)
+library(dplyr)
+
+sbp_news <- NEWS_var |>
+  mutate(sbp = as.numeric(syst)) |>
   mutate(news = case_when(
-    sbp <= 90 | sbp >=220 ~ 3,
+    sbp <= 90 | sbp >= 220 ~ 3,
     sbp %in% c(91:100) ~ 2,
     sbp %in% c(101:110) ~ 1,
     !is.numeric(pulse) ~ NA_real_,
-    TRUE ~ 0)
-)
+    TRUE ~ 0
+  ))
 ```

-#### Heart Rate
+#### Heart Rate (column `pulse`)
+
 ```{r}
-hr_news <- NEWS_var%>%
-  mutate (pulse = as.numeric(pulse)) %>%
+hr_news <- NEWS_var |>
+  mutate(pulse = as.numeric(pulse)) |>
   mutate(news = case_when(
-    pulse <= 40 | pulse >=131 ~ 3,
+    pulse <= 40 | pulse >= 131 ~ 3,
     pulse %in% c(111:130) ~ 2,
-    pulse %in% c(41:50,91:110) ~ 1,
+    pulse %in% c(41:50, 91:110) ~ 1,
     !is.numeric(pulse) ~ NA_real_,
-    TRUE ~ 0)
-)
+    TRUE ~ 0
+  ))
 ```

-#### Resp Rate
+#### Resp Rate (column `resp`)
+
 ```{r}
-rr_news <- NEWS_var%>%
-  mutate (resp_rate = as.numeric(resp_rate)) %>%
+rr_news <- NEWS_var |>
+  mutate(resp_rate = as.numeric(resp)) |>
   mutate(news = case_when(
-    resp_rate <= 8 | resp_rate >=25 ~ 3,
+    resp_rate <= 8 | resp_rate >= 25 ~ 3,
     resp_rate %in% c(21:24) ~ 2,
     resp_rate %in% c(9:11) ~ 1,
     !is.numeric(resp_rate) ~ NA_real_,
-    TRUE ~ 0)
-)
+    TRUE ~ 0
+  ))
 ```

-#### SpO2
+#### SpO2 Oxygen Saturation (column `sat`)
+
 ```{r}
-NEWS_var%>%
+NEWS_var |>
   mutate(news = case_when(
-    spo2 <= 91 ~3,
-    spo2 %in% c(92:93) ~2,
-    spo2 %in% c(94:95) ~1,
-    !is.numeric(spo2) ~ NA_real_,
+    sat <= 91 ~ 3,
+    sat %in% c(92:93) ~ 2,
+    sat %in% c(94:95) ~ 1,
+    !is.numeric(sat) ~ NA_real_,
     TRUE ~ 0
   ))
 ```

-#### Temperature
+#### Temperature (column `temp`)
+
 ```{r}
-NEWS_var%>%
+NEWS_var |>
   mutate(news = case_when(
-    temperature <= 35 ~ 3,
-    temperature >= 39.1 ~ 2,
-    temperature %in% c(38.1:39,35.1:36) ~ 1,
-    !is.numeric(temperature) ~ NA_real_,
-    TRUE ~ 0))
+    temp <= 35 ~ 3,
+    temp >= 39.1 ~ 2,
+    temp %in% c(38.1:39, 35.1:36) ~ 1,
+    !is.numeric(temp) ~ NA_real_,
+    TRUE ~ 0
+  ))
 ```

 In addition NEWS2 has altered ranges for patients with known respiratory diseases. These need additional logic on a per patient basis to implement.

-
 ## Summary

-In many ways, synthetic data reflects George Box’s observation that “all models are wrong, but some are useful” while providing a “useful approximation [of] those found in the real world,”
+In many ways, synthetic data reflects George Box's observation that "all models are wrong, but some are useful" while providing a "useful approximation [of] those found in the real world".

 The connection between the clinical outcomes of a patient visits and costs rarely exist in practice, so being able to assess these trade-offs in synthetic data allow for measurement and enhancement of the value of care – cost divided by outcomes.
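
Review note on the temperature chunk in the final patch: in R, `38.1:39` steps by 1 and so yields only `38.1`, which means `temp %in% c(38.1:39, 35.1:36)` matches just the two values 38.1 and 35.1. Likewise `!is.numeric(temp)` is evaluated on the whole column and never flags a missing reading. The sketch below is one possible alternative, not the authors' method: it uses ordered upper bounds and an explicit `is.na()` check. The helper name `news_temp_score` is hypothetical and not part of the patch; the bands are the ones the patch itself encodes, with each band running up to its upper bound.

```r
library(dplyr)

# Hypothetical helper (not part of the patch): NEWS temperature sub-score.
# Arms are checked top to bottom, so each arm only needs its upper bound.
news_temp_score <- function(temp) {
  case_when(
    is.na(temp) ~ NA_real_, # missing reading: leave the sub-score missing
    temp <= 35.0 ~ 3,       # 35.0 and below
    temp <= 36.0 ~ 1,       # 35.1 to 36.0
    temp <= 38.0 ~ 0,       # 36.1 to 38.0
    temp <= 39.0 ~ 1,       # 38.1 to 39.0
    TRUE ~ 2                # 39.1 and above
  )
}

# Example call on a few illustrative readings, including a missing one
news_temp_score(c(34.9, 35.6, 37.0, 38.5, 39.4, NA))
```

Assuming the dataset column is `temp`, as the patch states, this could be applied with `NEWS_var |> mutate(news = news_temp_score(temp))`. The same ordered-bounds pattern would carry over to the other sub-scores if non-integer or missing values are expected there.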