SignalDetectionTool.Rmd
The SignalDetectionTool (SDT) is an R package which contains an R shiny app to perform signal detection on infectious disease surveillance data and visualise results. Furthermore the functions of this R package can also be used standalone without running the shiny application. This vignette gives guidance on how to use the functions within the R package.
This section is extensively described in the README of the package thus we give minimal instructions here. First install and load the SDT package.
library(SignalDetectionTool)
Run the app entering the following command in the console.
run_app()
It is possible to run the app with customised settings using a yaml file. Please have a look at the README.
The input to the SDT is a linelist of infectious disease cases
structured according to the format defined in
input_metadata
. A sample linelist
(input_example
) and its corresponding metadata
specification (input_metadata
) is available in the package
as internal datasets and can be accessed using:
To apply signal detection to your data for the most recent 6 weeks
you first need to preprocess your linelist and then apply
get_signals()
. Inside get_signals()
the
linelist is aggregated to a timeseries containing weekly counts of
cases.
To apply different signal detection methods use the parameter method. To
generate stratified signals use the parameter stratification. Please
have a look at the function documentation for more details. This is an
example for signals generated using the default paramters with
FarringtonFlexible and no stratification:
data_prepro <- input_example %>% preprocess_data()
signals <- data_prepro %>% get_signals()
The generated signals
output is a data.frame containing
the aggregated number of cases per week with additional columns
cases_in_outbreak
, alarms
,
upperbound
, expected
, category
,
stratum
, method
and
number_of_weeks
.
cases_in_outbreak
is only generated if the column
outbreak_status
was available in your provided linelist. It
counts how many cases in this week were already part of a known
outbreak.alarms
is a column with booleans indicating for which
weeks a signal has been generated. Of course the majority of weeks in
the aggregated timeseries contain NA
in this column as only
the most recent selected number_of_weeks
are filled with
TRUE
of FALSE
.upperbound
contains a numeric threshold calculated with
the signal detection methods. As for alarms
the majority of
weeks in the aggregated timeseries contain NA
as this is
only calculated for the number_of_weeks
for which signals
are generated. Furthermore some algorithms such as EARS do not calculate
a threshold.expected
is a numeric value giving the expected number
of cases by the signal detection algorithm. Only filled for the weeks
for which signals are generated.category
Character string specifying to which stratum
timeseries belongs to when having added stratification parameters (more
details in the next example). For unstratified timeseries it is
NA
.stratum
Character string specifying to which category
of the stratum the timeseries belongs to when having added
stratification parameters (more details in the next example). For
unstratified timeseries it is NA
.method
Character string specifying the signal detection
method which was used.number_of_weeks
Integer specifying the number of weeks
for which signals were generated.Generating stratified signals using EARS:
data_prepro <- input_example %>% preprocess_data()
signals_ears_stratified <- data_prepro %>% get_signals(
method = "ears",
stratification = c("age_group", "county")
)
The signals_ears_stratified
output is a data frame in
long format, containing multiple time series of weekly aggregated case
counts. Each time series corresponds to a unique combination of the
specified stratification variables, with all strata stacked
vertically.
The columns category
and stratum
identify
the individual timeseries. In this example we obtain the following
values:
* category
is filled with the column names provided in the
stratification
parameter.
* stratum
with the values of these variables.
In this example we obtain:
* category
is filled with “age_group” and “county”
* stratum
contains “00–04”, “05–09”, “10–14”, “15–19”,
“20–24”, “25–29”, “30–34”, “35–39”, “40–44”, “45–49”, “50–54”, “55–59”,
“60–64”, “65–69”, “70–74”, “75–79”, “80–84”, “85–89”, “90–94”, “95–99”,
“100–104”, “105–109” for rows with category == "age_group"
.
For rows with category == "county"
the values of
stratum
are “Burgenland”, “Kärnten”, “Niederösterreich”,
“Oberösterreich”, “Salzburg”, “Steiermark”, “Tirol”, “Vorarlberg”,
“Wien”.
You can directly generate an HTML or Word report containing signal
results and visualizations using run_report()
. Simply pass
your surveillance linelist—structured according to the format defined in
input_metadata
—to run_report()
. The data will
be preprocessed automatically within the function. The following code
chunk generates a Word report using EARS without any stratification
(default) for the past six weeks (default):
run_report(input_example,
report_format = "DOCX",
method = "EARS"
)
This generates an HTML report (default) with stratification by age group, county and sex for the last 2 weeks using FarringtonFlexible (default):
run_report(input_example,
strata = c("age_group", "county", "sex"),
number_of_weeks = 2
)