tlf-baseline.Rmd

# Baseline characteristics

Following [ICH E3 guidance](https://database.ich.org/sites/default/files/E3_Guideline.pdf),
we need to summarize critical demographic and baseline characteristics of the participants
in Section 11.2, Demographic and Other Baseline Characteristics.

In this chapter, we illustrate how to create a simplified
baseline characteristics table for a study.

```{r, include=FALSE}
source("common.R")
```

```{r, out.width = "100%", out.height = "400px", echo = FALSE, fig.align = "center"}
knitr::include_graphics("tlf/tlf_base.pdf")
```

There are many R packages that can efficiently summarize baseline information.
The [`table1`](https://github.com/benjaminrich/table1) R package is one of them.

```{r}
library(table1)
library(r2rtf)
library(haven)
library(dplyr)
library(tidyr)
library(stringr)
library(tools)
```

As in previous chapters, we first read the `adsl` dataset that contains all
the required information for the baseline characteristics table.

```{r}
adsl <- read_sas("data-adam/adsl.sas7bdat")
```

For simplicity, we only analyze `SEX`, `AGE` and, `RACE` in this example
using the `table1` R package. More details of the `table1` R package can
be found in the package
[vignettes](https://benjaminrich.github.io/table1/vignettes/table1-examples.html).

The `table1` R package directly creates an HTML report.

```{r}
ana <- adsl %>%
  mutate(
    SEX = factor(SEX, c("F", "M"), c("Female", "Male")),
    RACE = toTitleCase(tolower(RACE))
  )

tbl <- table1(~ SEX + AGE + RACE | TRT01P, data = ana)
tbl
```

The code below transfer the output into a dataframe
that only contains ASCII characters
recommended by regulatory agencies.
`tbl_base` is used as input for `r2rtf` to create the final report.

```{r}
tbl_base <- tbl %>%
  as.data.frame() %>%
  as_tibble() %>%
  mutate(across(
    everything(),
    ~ str_replace_all(.x, intToUtf8(160), " ")
  ))


names(tbl_base) <- str_replace_all(names(tbl_base), intToUtf8(160), " ")
tbl_base
```

We define the format of the output. We highlight items that
are not discussed in previous discussion.

`text_indent_first` and `text_indent_left` are used to control the indent
space of text. They are helpful when you need to control the white space
of a long phrase, "AMERICAN INDIAN OR ALASKA NATIVE"
in the table provides an example.

```{r}
colheader1 <- paste(names(tbl_base), collapse = "|")
colheader2 <- paste(tbl_base[1, ], collapse = "|")
rel_width <- c(2.5, rep(1, 4))

tbl_base[-1, ] %>%
  rtf_title(
    "Baseline Characteristics of Participants",
    "(All Participants Randomized)"
  ) %>%
  rtf_colheader(colheader1,
    col_rel_width = rel_width
  ) %>%
  rtf_colheader(colheader2,
    border_top = "",
    col_rel_width = rel_width
  ) %>%
  rtf_body(
    col_rel_width = rel_width,
    text_justification = c("l", rep("c", 4)),
    text_indent_first = -240,
    text_indent_left = 180
  ) %>%
  rtf_encode() %>%
  write_rtf("tlf/tlf_base.rtf")
```

```{r, include=FALSE}
rtf2pdf("tlf/tlf_base.rtf")
```

```{r, out.width = "100%", out.height = "400px", echo = FALSE, fig.align = "center"}
knitr::include_graphics("tlf/tlf_base.pdf")
```

In conclusion, the procedure to generate demographic and baseline characteristics table is summarized as follows:

- Step 1: Read the data set.
- Step 2: Use `table1::table1()` to get the baseline characteristics table.
- Step 3: Transfer the output from Step 2 into a data frame that only contains ASCII characters.
- Step 4: Define the format of the RTF table by using the R package `r2rtf`.