From f1d4121d1c375abc7e088e014263f8cc3f3403f3 Mon Sep 17 00:00:00 2001
From: ntaranto
Date: Thu, 31 Oct 2024 17:33:59 -0400
Subject: [PATCH 01/14] Add files via upload

---
 FinalProject.qmd | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 FinalProject.qmd

diff --git a/FinalProject.qmd b/FinalProject.qmd
new file mode 100644
index 000000000..dcfc7cf39
--- /dev/null
+++ b/FinalProject.qmd
@@ -0,0 +1,32 @@
+---
+title: "Final Presentation"
+format: html
+editor: visual
+---
+
+## Final Project Overview: Instructions -- The overview consists of 2-3 sentences summarizing the project and goals.
+
+
+## Introduction: For the introduction, the first paragraph describes the problem addressed, its significance, and some background to motivate the problem.
+
+After treatment for early-stage breast cancer, recurrence remains a problem in up to 30% of individuals, even though these patients have completed definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy). It remains unclear who will experience disease recurrence and who will not, and whether biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA (ctDNA) can serve as risk factors for recurrence. In the SURMOUNT study, we have followed patients who completed definitive treatment for early-stage breast cancer, monitoring them for recurrence, and have also obtained bone marrow and peripheral blood assessments, looking for DTCs in the bone marrow and ctDNA in the blood.
+
+# In the second paragraph, explain why your problem is interdisciplinary, what fields can contribute to its understanding, and incorporate background related to what you learned from meeting with faculty/staff.
+
+
+## Methods-- Start working on the Methods/Results section, which consists of code and its output along with text describing what you are doing. Push your draft to GitHub and provide a link to it.
+
+```{r}
+1 + 1
+```
+
+
+## Results
+You can add options to executable code like this
+
+```{r}
+#| echo: false
+2 * 2
+```
+
+The `echo: false` option disables the printing of code (only output is displayed).

From 1c0eef806467c0357bd3868963c56317b1c2489b Mon Sep 17 00:00:00 2001
From: ntaranto
Date: Tue, 12 Nov 2024 17:01:35 -0500
Subject: [PATCH 02/14] Update FinalProject.qmd

---
 FinalProject.qmd | 2918 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 2910 insertions(+), 8 deletions(-)

diff --git a/FinalProject.qmd b/FinalProject.qmd
index dcfc7cf39..5d928cf86 100644
--- a/FinalProject.qmd
+++ b/FinalProject.qmd
@@ -7,26 +7,2928 @@ editor: visual
 ## Final Project Overview: Instructions -- The overview consists of 2-3 sentences summarizing the project and goals.
 
 
-## Introduction: For the introduction, the first paragraph describes the problem addressed, its significance, and some background to motivate the problem.
+## Introduction:
 
 After treatment for early-stage breast cancer, recurrence remains a problem in up to 30% of individuals, even though these patients have completed definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy). It remains unclear who will experience disease recurrence and who will not, and whether biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA (ctDNA) can serve as risk factors for recurrence. In the SURMOUNT study, we have followed patients who completed definitive treatment for early-stage breast cancer, monitoring them for recurrence, and have also obtained bone marrow and peripheral blood assessments, looking for DTCs in the bone marrow and ctDNA in the blood.
 
-# In the second paragraph, explain why your problem is interdisciplinary, what fields can contribute to its understanding, and incorporate background related to what you learned from meeting with faculty/staff.
+This is a translational study, and so we will look at clinical risk factors for ctDNA and DTC positivity.
 
+## Methods--
 
-## Methods-- Start working on the Methods/Results section, which consists of code and its output along with text describing what you are doing. Push your draft to GitHub and provide a link to it.
+In SURMOUNT,
 
 ```{r}
-1 + 1
+library(here)
+library(dplyr)
+d <- read.csv(file = here("..", "Datasets",
+                          "surmount184_merged_20241108.csv"))
+names(d)
+
+### I'm not sure what else goes into the methods for this vs results
+
 ```
 
 
 ## Results
-You can add options to executable code like this
 
 ```{r}
-#| echo: false
-2 * 2
+
+
+#summary variables: final_overall_stage final_t_stage final_n_stage
+# final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') final_tumor_grade final_histology
+# demo_race_final fu_locreg_site_num (numeric values for local regional site)
+# fu_locreg_site_char (character values for local regional site)
+# fu_dist_site_num (numeric values for distant site)
+# fu_dist_site_char (character values for distant site)
+# censor_date (most recent fu_date_to among patients who are alive without local or distant progression)
+
+
+# identify if 22-021 and 21-033 are in here
+participant_check <- c("28115-22-021", "28115-21-033") %in% d$participant_id
+
+# Print results
+names(participant_check) <- c("28115-22-021", "28115-21-033")
+print(participant_check)
+#neither is in here! great!
+
+
+str(d) #to look at what structure the variables are in --> we will need to do some stuff with dates
+#timepoint is a character (ok) -- do we need this to be a factor?
+#collection_date is a character, should convert to date
+#eVAF numeric
+#mean_VAF numeric
+#total variants integer
+#ctDNA_detected = character, ok
+#ctdna_cohort = integer (but there are some NAs)-- we may want to
+#censor_date --> should convert to date
+#fu_dist_Date should convert to date
+#_fu_locreg_date should convert to date
+
+
+
+####date nonsense, come back to this#### NOT DOING THIS PART
+library(dplyr)
+#this does not quite work as it turns a bunch into NAs. May need to do this for each individual line based on how the data is structured
+date_columns <- grep("date", colnames(d), value = TRUE, ignore.case = TRUE)
+date_columns
+#d[date_columns] <- lapply(d[date_columns], function(x) as.Date(x, format = "%d/%b/%Y"))
+#str(d) #converted most of the date columns to dates
+
+
+###### ctDNA to limit to ctDNA cohort (but ok to include NAs as long as they were ever ctDNA cohort == 1) --> shall call this subset_data
+
+# Step 1: Identify all participant_ids where ctdna_cohort == 1
+valid_participants <- d |>
+  filter(ctdna_cohort == 1) |>
+  pull(participant_id) |>
+  unique()
+
+# Step 2: Subset the data to include all rows where participant_id is in the valid list
+subset_data <- d |>
+  filter(participant_id %in% valid_participants)
+
+# Count the number of unique participant_ids in the subset_data
+unique_count <- subset_data |>
+  summarise(unique_participants = n_distinct(participant_id))
+
+# View the result == 109!!
+unique_count
+
+#now we have the subset_data = the ctDNA cohort (n=109), we can start looking at demographics. First we will do overall, then we will do divided by ctDNA_detected
+#ctDNA_detected = character, ok
+
+names(subset_data)
+### Excluding the FAILS from this cohort
+######create the ctDNA Ever positive variable
+table(subset_data$ctDNA_detected) # 2 missing and then FALSE, TRUE
+
+
+# Create the 'ctDNA_ever' variable:
+# TRUE if ctDNA_detected was TRUE for any record for the participant, otherwise FALSE.
+subset_data <- subset_data %>%
+  group_by(participant_id) %>%
+  mutate(ctDNA_ever = if_else(any(ctDNA_detected == TRUE, na.rm = TRUE), TRUE, FALSE)) %>%
+  ungroup()
+
+# View the updated data
+table(subset_data$participant_id, subset_data$ctDNA_ever)
+
+#100 negatives, 9 positives, as it should be! for ctDNA_ever
+subset_data |>
+  group_by(participant_id) |>
+  summarize(ctDNA_ever = first(ctDNA_ever)) |>
+  count(ctDNA_ever)
+
+####### DTCS -- create DTC_ever -- come back to this ########
+#do the same thing for DTCs --ever dtc positive --> this is a little wonky, want to ensure we aren't actually eliminating any of the dtc results in our subsetting...
+
+names(subset_data) #looking at the names of variables to find the DTC indicator variable
+library(stringr)
+
+#variable is dtc_ihc_result_final
+#dtc_ihc_summary_count
+#dtc_final_result_ date
+
+#looking at the unique counts for the different dtc variables, dtc_ihc_result_final has more than FINAL_RESULT so is the combined one
+unique(subset_data$dtc_ihc_result_final) #0, NA, 1
+table(subset_data$dtc_ihc_result_final) # 152 zeros, 41 positives (table() omits NAs by default; they are counted below)
+table(d$dtc_ihc_result_final)
+sum(is.na(subset_data$dtc_ihc_result_final)) #128 NAs
+sum(is.na(subset_data$FINAL_RESULT)) # 447 NAs
+
+
+unique(subset_data$FINAL_RESULT)
+table(subset_data$FINAL_RESULT) #79 people using this one
+
+
+subset_data <- subset_data |>
+  group_by(participant_id) |>
+  mutate(dtc_ever = if_else(any(dtc_ihc_result_final == 1, na.rm = TRUE), 1, 0)) |>
+  ungroup()
+
+# View the updated data
+table(subset_data$participant_id, subset_data$dtc_ever)
+
+#70 DTC ever negatives, 39 positives, correct!
+subset_data |>
+  group_by(participant_id) |>
+  summarize(dtc_ever = first(dtc_ever)) |>
+  count(dtc_ever)
+
+
+########### Variables to look at for Table 1 #########
+
+###### median age at diagnosis
+
+names(subset_data) #to identify the variables I want to use
+str(subset_data$diag_date_1) #character
+str(subset_data$demo_dob) #character
+
+d$diag_date_1 <- as.Date(d$diag_date_1, format = "%m/%d/%Y")
+d$demo_dob <- as.Date(d$demo_dob, format = "%m/%d/%Y")
+
+str(d$diag_date_1) #dates!
+str(d$demo_dob) #dates!
+
+### doing the same for subset_data: it was created before the conversion above, so it still holds the character versions
+subset_data$diag_date_1 <- as.Date(subset_data$diag_date_1, format = "%m/%d/%Y")
+subset_data$demo_dob <- as.Date(subset_data$demo_dob, format = "%m/%d/%Y")
+
+# calculating age at diagnosis (in years) from date of birth to date of diagnosis
+subset_data$age_at_diag <- as.numeric(difftime(subset_data$diag_date_1, subset_data$demo_dob, units = "days")) / 365.25
+head(subset_data$age_at_diag)
+
+summary(subset_data$age_at_diag) #median 48.75
+
+# subset_data has several rows (timepoints) per participant; keep one row per
+# participant so each person counts once in the group summaries and the test
+age_by_id <- subset_data %>%
+  distinct(participant_id, .keep_all = TRUE)
+
+age_summary <- age_by_id %>%
+  group_by(ctDNA_ever) %>%
+  summarise(
+    mean_age = mean(age_at_diag, na.rm = TRUE),     # Calculate mean age
+    median_age = median(age_at_diag, na.rm = TRUE), # Calculate median age
+    sd_age = sd(age_at_diag, na.rm = TRUE),         # Calculate standard deviation of age
+    n = n()                                         # Number of participants in each group
+  )
+
+print(age_summary)
+
+# Perform the Wilcoxon rank-sum test to compare age between the ctDNA_ever positive and negative groups
+wilcox_test_result <- wilcox.test(age_at_diag ~ ctDNA_ever, data = age_by_id)
+
+# Print the result
+print(wilcox_test_result)
+
+#looking at range of age for the ctDNA pos vs neg groups
+age_summary <- age_by_id %>%
+  group_by(ctDNA_ever) %>%
+  summarise(
+    min_age = min(age_at_diag, na.rm = TRUE), # Minimum age
+    max_age = max(age_at_diag, na.rm = TRUE), # Maximum age
+    .groups = "drop"
+  )
+
+# View the summary table
+print(age_summary)
+
+##### Race: demo_race_final
+
+# Get the count of unique participant_ids for each category in demo_race_final
+race_counts_unique_percent <- subset_data %>%
+  group_by(demo_race_final) %>%
+  summarise(unique_participants = n_distinct(participant_id)) %>%
+  mutate(percent = unique_participants / sum(unique_participants) * 100)
+
+# View the result
+print(race_counts_unique_percent)
+
+
+
+# Count distinct participant_ids by ctDNA_ever and demo_race_final
+count_distinct_participants <- subset_data %>%
+  group_by(demo_race_final, ctDNA_ever) %>%
+  summarise(distinct_participant_count = n_distinct(participant_id), .groups = "drop")
+
+# Print the result
+count_distinct_participants
+
+
+
+library(dplyr)
+
+# Step 1: Summarize by unique participant_id
+summarized_data <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    ctDNA_ever = first(ctDNA_ever),           # Taking the first observed value of ctDNA_ever for each participant
+    demo_race_final = first(demo_race_final), # Taking the first observed value of demo_race_final for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table <- table(summarized_data$ctDNA_ever, summarized_data$demo_race_final)
+contingency_table
+# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table)
+
+# Step 4: Print the result (p-value 0.91)
+chisq_test
+
+
+
+#receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+')
+
+# Breakdown of final_receptor_group by unique participant_id
+receptor_status_by_participant <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(final_receptor_group = first(final_receptor_group), # Or choose the most frequent group if needed
+            .groups = "drop")
+
+# View the result
+table(receptor_status_by_participant$final_receptor_group)
+
+# Summarizing data by participant_id, final_receptor_group, and ctDNA_ever
+receptor_ctDNA_status <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    final_receptor_group = first(final_receptor_group), # Or the most frequent if needed
+    ctDNA_ever = first(ctDNA_ever),                     # Taking the first observed value for ctDNA_ever
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table_receptor <- table(receptor_ctDNA_status$final_receptor_group, receptor_ctDNA_status$ctDNA_ever)
+contingency_table_receptor
+
+# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table_receptor)
+
+# Step 4: Print the result (p-value 0.10)
+chisq_test
+
+
+#curious about whether a TNBC vs non-TNBC association exists (or HR+ vs non-HR+)
+#start with TNBC (using QDC)
+#inclusion criteria inc_dx_crit___1 = TNBC
+
+
+#inc_dx_crit_list___1
+
+TNBC_ctDNA_status <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    inc_dx_crit_list___1 = first(inc_dx_crit_list___1), # Or the most frequent if needed
+    ctDNA_ever = first(ctDNA_ever),                     # Taking the first observed value for ctDNA_ever
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table_TNBC <- table(TNBC_ctDNA_status$inc_dx_crit_list___1, TNBC_ctDNA_status$ctDNA_ever)
+contingency_table_TNBC
+
+# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table_TNBC)
+
+# Step 4: p-val is 0.12
+chisq_test
+
+
+#ER vs non-ER
+#first create HR_status variable
+subset_data <- subset_data |>
+  mutate(HR_status = case_when(
+    final_receptor_group %in% c(2, 3) ~ "HR+",
+    final_receptor_group %in% c(1, 4) ~ "Non-HR+",
+    TRUE ~ NA_character_ # In case there are missing or other unexpected values
+  ))
+
+# View the new HR_status variable
+table(subset_data$HR_status)
+
+HR_status_by_participant <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(HR_status = first(HR_status), # Or use the most frequent value if a participant's rows disagree
+            .groups = "drop")
+
+# View the result
+table(HR_status_by_participant$HR_status) #aligns with final receptor status (ultimately 60 HR+, 49 HR-)
+
+# Summarize ctDNA_ever status by HR_status, for each unique participant_id
+summary_data <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    HR_status = first(HR_status),     # Get the HR_status for the participant
+    ctDNA_status = first(ctDNA_ever), # Get the ctDNA_ever status for the participant
+    .groups = "drop"
+  )
+
+contingency_table_HR <- table(summary_data$ctDNA_status, summary_data$HR_status)
+contingency_table_HR
+chisq_test <- chisq.test(contingency_table_HR)
+
+# Print chi-squared test results (p-value 0.28)
+chisq_test
+
+
+
+
+###tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not reported
+
+# Exclude rows where final_tumor_grade is 3 (not reported), want to exclude the 2 pts who had grade not reported
+summary_data <- subset_data %>%
+  filter(final_tumor_grade != 3) %>%  # Exclude code 3 (= not reported)
+  group_by(participant_id) %>%
+  summarise(
+    grade = first(final_tumor_grade), # Get the final_tumor_grade for each participant
+    ctDNA_ever = first(ctDNA_ever),   # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of grade vs ctDNA_ever
+contingency_table <- table(summary_data$grade, summary_data$ctDNA_ever)
+
+# View the contingency table
+print(contingency_table)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# View the Chi-squared test result -- p-value 0.0229
+print(chisq_test)
+
+######histology #people have different combinations of histology (1-15)
+table(subset_data$participant_id, subset_data$final_histology)
+
+histology_summary <- subset_data %>%
+  distinct(participant_id, final_histology) %>% # Get unique participant-histology combinations
+  group_by(final_histology) %>%                 # Group by histology type
+  summarise(count = n())                        # Count the number of participants per histology type
+
+# View the summary table
+print(histology_summary) #the different combinations add up to 109. There are a decent number who had both ductal and lobular histology
+
+#trying to create Ductal, lobular, both, or other variables
+# match 3 and 14 as whole codes so that, e.g., code 13 is not read as ductal (3)
+subset_data <- subset_data %>%
+  mutate(histology_category = case_when(
+    grepl("(^|[^0-9])3([^0-9]|$)", as.character(final_histology)) &
+      grepl("(^|[^0-9])14([^0-9]|$)", as.character(final_histology)) ~ "Both Ductal and Lobular", # Both Ductal and Lobular
+    grepl("(^|[^0-9])3([^0-9]|$)", as.character(final_histology)) ~ "Ductal",                      # Ductal
+    grepl("(^|[^0-9])14([^0-9]|$)", as.character(final_histology)) ~ "Lobular",                    # Lobular
+    TRUE ~ "Other"                                                                                 # Any other combination
+  ))
+
+# Count the number of participants in each histology category
+histology_counts <- subset_data %>%
+  group_by(histology_category) %>%
+  summarise(count = n_distinct(participant_id)) # Count distinct participants
+
+# View the counts -- adds up to 109!
+print(histology_counts)
+
+#contingency table
+library(tidyr)
+contingency_table <- subset_data %>%
+  distinct(participant_id, histology_category, ctDNA_ever) %>% # Ensure each patient is counted once
+  count(histology_category, ctDNA_ever) %>%
+  pivot_wider(names_from = ctDNA_ever, values_from = n, values_fill = list(n = 0)) # Pivot the table to get ctDNA_ever as columns
+
+# 3. Perform the Chi-squared test of independence
+chisq_test <- chisq.test(contingency_table[, -1]) # Remove the histology_category column for the test
+
+# 4. Print the contingency table
+print(contingency_table)
+
+# 5. Print the result of the Chi-squared test (p-value 0.2276)
+print(chisq_test)
+
+
+
+#### Stage -- N stage --> come back to this N stage stuff
+
+table(subset_data$participant_id, subset_data$final_n_stage) #we have 0, 1, 2, 3 (N0, N1, N2, N3)
+
+nodal_summary <- subset_data %>%
+  distinct(participant_id, final_n_stage) %>% # Get unique participant-stage combinations
+  group_by(final_n_stage) %>%                 # Group by stage
+  summarise(count = n())                      # Count the number of participants per nodal stage
+
+#View the summary table --adds up to 109, 46 = pN0, 63 = pN1
+print(nodal_summary)
+
+subset_data_by_id <- subset_data %>%
+  filter(final_n_stage %in% c(0, 1, 2, 3)) %>% # Include only relevant nodal stages
+  group_by(participant_id) %>%
+  summarise(
+    nodal_status = first(final_n_stage), # Use final_n_stage as nodal_status for each participant
+    ctDNA_ever = first(ctDNA_ever),      # Get ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 3: Create a contingency table of nodal_status vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$nodal_status, subset_data_by_id$ctDNA_ever)
+
+# Step 4: Check if any cells in the contingency table have zero counts, which could affect test validity
+print(contingency_table)
+
+# Step 5: Perform Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Step 6: Print the Chi-squared test result p = 0.0001
+print(chisq_test)
+
+
+#### Creating node-negative vs node-positive variable from summary indicator variable
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    node_status = ifelse(first(final_n_stage) == 0, "Node Negative", "Node Positive"), # Node negative if 0, positive otherwise
+    ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of node_status vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$node_status, subset_data_by_id$ctDNA_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Step 4: Print the contingency table and Chi-squared test results
+print(contingency_table)
+print(chisq_test)
+
+
+####### EXTRA CODE/CONFIRMATION / slightly different but ignore for our analysis
+#cross-check with indicator pN0 in our data that reflects nodal positivity.... there is 1 patient that is node-negative by the summary variable but node-positive by the indicator variable
+## should double check this at some point
+node_pos <- subset_data %>%
+  distinct(participant_id, inc_dx_crit_list___2) %>% # Get unique participant-indicator combinations
+  group_by(inc_dx_crit_list___2) %>%                 # Group by nodal indicator
+  summarise(count = n())                             # Count the number of participants per indicator value
+
+print(node_pos)
+
+contingency_table <- subset_data %>%
+  distinct(participant_id, inc_dx_crit_list___2, ctDNA_ever) %>% # Ensure unique participants
+  count(inc_dx_crit_list___2, ctDNA_ever) %>%                    # Count occurrences
+  spread(key = ctDNA_ever, value = n, fill = 0)                  # Spread data into a matrix
+
+# View the contingency table
+print(contingency_table)
+
+# Perform the Chi-square test (p-value 0.3902)
+chi_square_result <- chisq.test(contingency_table[, -1]) # Exclude the first column with the levels
+print(chi_square_result)
+
+
+
+
+
+#######t stage final_t_stage
+
+table(subset_data$final_t_stage) #we have values in 1, 2, 3, 4, and 99 (99 = pTx) so can proceed with this
+
+t_summary <- subset_data %>%
+  distinct(participant_id, final_t_stage) %>% # Get unique participant-stage combinations
+  group_by(final_t_stage) %>%                 # Group by stage
+  summarise(count = n())                      # Count the number of participants per T stage
+
+# View the summary table --adds up to 109
+print(t_summary)
+
+
+#### T stage, for our T stage table, will use T1 vs T2 or greater to simplify
+#exclude 99 (the pTx)
+subset_data_clean <- subset_data %>%
+  filter(final_t_stage != 99, ctDNA_ever != 99)
+
+# Combine final_t_stage into T1 vs. T2 or greater
+subset_data_clean <- subset_data_clean %>%
+  mutate(final_t_stage_combined = ifelse(final_t_stage == 1, "T1", "T2 or greater"))
+
+# Summarize the data by participant_id after creating the new combined t_stage
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    final_t_stage_combined = first(final_t_stage_combined), # Get the combined T status for each participant
+    ctDNA_ever = first(ctDNA_ever)                          # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of final_t_stage_combined vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results
+print(contingency_table)
+print(chisq_test)
+
+#### Try for T stage stats using T3 or greater as cutoff -- not super useful, so DO NOT USE THIS FOR TABLE
+
+#exclude 99 (the pTx)
+subset_data_clean <- subset_data %>%
+  filter(final_t_stage != 99, ctDNA_ever != 99)
+
+# Combine final_t_stage into T1/T2 or T3 or greater
+subset_data_clean <- subset_data_clean %>%
+  mutate(final_t_stage_combined = case_when(
+    final_t_stage == 1 | final_t_stage == 2 ~ "T1 or T2", # Group T1 and T2 together
+    final_t_stage >= 3 ~ "T3 or greater",                 # Group T3 and higher as a separate category
+    TRUE ~ NA_character_                                  # Handle any unexpected values
+  ))
+
+
+# Summarize the data by participant_id after creating the new combined t_stage
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    final_t_stage_combined = first(final_t_stage_combined), # Get the combined T status for each participant
+    ctDNA_ever = first(ctDNA_ever)                          # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of final_t_stage_combined vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> not significant so ignore this
+print(contingency_table)
+print(chisq_test)
+
+
+
+########stage of disease -- final_overall_stage
+
+table(subset_data$final_overall_stage) #we have values in 1,2,3,4, and 99 so can proceed with this
+
+stage_summary <- subset_data %>%
+  distinct(participant_id, final_overall_stage) %>% # Get unique participant-stage combinations
+  group_by(final_overall_stage) %>%                 # Group by stage
+  summarise(count = n())                            # Count the number of participants per stage
+
+# View the summary table --adds up to 109 (1 is 99), most are stage II, some are stage I, some are stage III, no stage IV (yay)
+print(stage_summary)
+
+#exclude the 99
+subset_data_clean <- subset_data %>%
+  filter(final_overall_stage != 99, ctDNA_ever != 99)
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    final_overall_stage = first(final_overall_stage), # Get the final_overall_stage for each participant
+    ctDNA_ever = first(ctDNA_ever)                    # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of final_overall_stage vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$final_overall_stage, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> hot dogggg p val = 0.006
+print(contingency_table)
+print(chisq_test)
+
+
+
+
+###########surgical treatment diag_surgery_type_1 (or diag_surgery_type_2)
+
+
+table(subset_data$diag_surgery_type_1) #1 = partial mastectomy/lumpectomy, 2 = mastectomy. no missingness
+
+surgery <- subset_data %>%
+  distinct(participant_id, diag_surgery_type_1) %>% # Get unique participant-surgery combinations
+  group_by(diag_surgery_type_1) %>%                 # Group by surgery type
+  summarise(count = n())                            # Count the number of participants per surgery type
+
+# View the summary table
+print(surgery)
+
+
+# Summarize the data by participant_id (full cohort, not the stage-filtered subset_data_clean)
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    surgery = first(diag_surgery_type_1), # Get the surgery type for each participant
+    ctDNA_ever = first(ctDNA_ever)        # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of surgery type vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$surgery, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 1....
+print(contingency_table)
+print(chisq_test)
+
+
+
+######## axillary management. Really just care about whether or not someone got an axillary dissection. diag_axillary_type___2_1 (or diag_axillary_type___2_2 IF 2 biopsy forms)
+
+table(subset_data$diag_axillary_type___2_1)
+table(subset_data$diag_axillary_type___2_2) #a handful of axillary dissections on the second biopsy form so i think we want to combine these two
+
+# Create a binary variable to identify participants who had axillary dissection
+subset_data_clean <- subset_data %>%
+  mutate(axillary_dissection = ifelse(diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1, 1, 0))
+
+# Ensure every participant has a ctDNA_ever and axillary_dissection value
+# Ensure every patient has an axillary dissection category, where 0 means no axillary dissection, and 1 means they had one
+subset_data_clean <- subset_data %>%
+  mutate(axillary_dissection = case_when(
+    diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1 ~ 1, # Had axillary dissection
+    TRUE ~ 0                                                           # No axillary dissection (includes missing values)
+  ))
+
+# Summarize the data by participant_id, including the axillary_dissection and ctDNA_ever variables
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    axillary_dissection = first(axillary_dissection), # Get the axillary dissection status for each participant
+    ctDNA_ever = first(ctDNA_ever)                    # Get the ctDNA_ever status for each participant
+  )
+
+contingency_table <- table(subset_data_by_id$axillary_dissection, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+fishers <- fisher.test(contingency_table)
+print(fishers)
+
+# Print the contingency table and Chi-squared test results --> p-value 0.173 (used chisq for consistency...)
+print(contingency_table)
+print(chisq_test)
+
+####inflammatory inflamm_yn -- IGNORE THIS for Table 1
+table(d$inflamm_yn_1) ### i think the inflammatory folks must just not be in the subset of patients (6 yes's in the overall cohort, but none in the subset data for either inflamm variable)
+table(d$inflamm_yn_2) ### I think inflammatory folks just not in subset of patients in the ctDNA cohort
+table(subset_data$inflamm_yn_1)
+table(subset_data$inflamm_yn_2)
+
+#### radiation prtx_radiation
+table(subset_data$prtx_radiation)
+
+radiation <- subset_data |>
+  distinct(participant_id, prtx_radiation) |>
+  group_by(prtx_radiation) |> # Group by radiation yes/no
+  summarise(count = n())      # Count the number of participants per group
+
+# View the summary table
+print(radiation)
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    radiation = first(prtx_radiation), # Get the radiation status for each participant
+    ctDNA_ever = first(ctDNA_ever)     # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of radiation vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$radiation, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+fishers <- fisher.test(contingency_table)
+print(fishers)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.33
+print(contingency_table)
+print(chisq_test)
+
+
+#### chemotherapy prtx_chemo
+table(subset_data$prtx_chemo)
+
+chemo <- subset_data |>
+  distinct(participant_id, prtx_chemo) |>
+  group_by(prtx_chemo) |> # Group by chemo yes/no
+  summarise(count = n())  # Count the number of participants per group
+
+# View the summary table
+print(chemo) #3 people did not get chemo in this cohort
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    chemo = first(prtx_chemo),     # Get the chemo status for each participant
+    ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of chemo vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$chemo, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+fishers <- fisher.test(contingency_table)
+print(fishers)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.33
+print(contingency_table)
+print(chisq_test)
+
+
+
+####neoadjuvant chemo diag_neoadj_chemo_1 or diag_neoadj_chemo_2
+
+table(subset_data$diag_neoadj_chemo_1)
+table(subset_data$diag_neoadj_chemo_2) #there were no ppl who were recorded on the second biopsy form as having neoadjuvant therapy so can use only the first variable
+
+nact <- subset_data |>
+  distinct(participant_id, diag_neoadj_chemo_1) |>
+  group_by(diag_neoadj_chemo_1) |> # Group by neoadjuvant chemo yes/no
+  summarise(count = n())           # Count the number of participants per group
+
+# View the summary table of neoadjuvant chemo
+print(nact)
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    nact = first(diag_neoadj_chemo_1), # Get the neoadjuvant chemo status for each participant
+    ctDNA_ever = first(ctDNA_ever)     # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of neoadjuvant chemo vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$nact, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.95
+print(contingency_table)
+print(chisq_test)
+
+
+####hormone therapy prtx_endo
+
+table(subset_data$prtx_endo)
+
+endo <- subset_data |>
+  distinct(participant_id, prtx_endo) |>
+  group_by(prtx_endo) |> # Group by endocrine therapy yes/no
+  summarise(count = n()) # Count the number of participants per group
+
+# View the summary table
+print(endo) #most ppl did get endo (62 of the 109)
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    endo = first(prtx_endo),       # Get the endocrine therapy status for each participant
+    ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of endocrine therapy vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$endo, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.95
+print(contingency_table)
+print(chisq_test)
+
+
+
+
+####bone modifying agents prtx_bonemod
+
+table(subset_data$prtx_bonemod)
+
+bonemod <- subset_data |>
+  distinct(participant_id, prtx_bonemod) |>
+  group_by(prtx_bonemod) |> # Group by bone-modifying agent yes/no
+  summarise(count = n())    # Count the number of participants per group
+
+# View the summary table
+print(bonemod) #39 got bone-modifying agents
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    bonemod = first(prtx_bonemod), # Get the bone-modifying agent status for each participant
+    ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of bonemod vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$bonemod, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.84
+print(contingency_table)
+print(chisq_test)
+
+
+
+####referred to trial fu_trial_yn --> this variable seems to have disappeared. I can re-make it based on fu_trial_pid
+
+names(d)
+
+table(subset_data$fu_trial_yn) #this variable does not exist in our data set
+
+subset_data <- subset_data %>%
+  mutate(trial = ifelse(!is.na(fu_trial_pid) & fu_trial_pid != "", "Yes", "No"))
+print(subset_data$trial)
+
+
+trial <- subset_data |>
+  distinct(participant_id, trial) |>
+  group_by(trial) |>     # Group by trial yes/no
+  summarise(count = n()) # Count the number of participants per group
+
+# View the summary table
+print(trial) #38 pts went on trial
+
+# Summarize the data by participant_id (trial was added to subset_data, not subset_data_clean)
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    trial = first(trial),          # Get the trial status for each participant
+    ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of trial vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$trial, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.95
+print(contingency_table)
+print(chisq_test)
+
+
+
+
+
+#Later
+#2 = non-pcr, 1 = pcr
+#path cr diag_pcr_1 or diag_pcr_2
+table(subset_data$diag_pcr_1)
+table(subset_data$diag_pcr_2) #none recorded here so can just use pcr_1
+
+pcr <- subset_data %>%
+  mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) %>% # Convert "."
to NA + filter(!is.na(diag_pcr_1)) %>% # Exclude rows where diag_pcr_1 is NA + distinct(participant_id, diag_pcr_1) %>% + group_by(diag_pcr_1) %>% + summarise(count = n()) # Count the number of participants per histology type + +# View the summary table +print(pcr) #this is correct bc 19 ppl got NACT, and we have 19 ppl recorded for pcr data + +# Summarize the data by participant_id +subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% + summarise( + pcr = first(diag_pcr_1), # Get the final_overall_stage for each participant + ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + ) + +# Create a contingency table of pcr vs ctDNA_ever +contingency_table <- table(subset_data_by_id$pcr, subset_data_by_id$ctDNA_ever) + +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) + +# Print the contingency table and Chi-squared test results --> p-val = 0.86 -- does not seem to be association among those who got pcr (but also we have a group with 1 in it...) 
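The pCR comparison above notes a cell holding a single participant. Pearson's chi-squared approximation is generally considered unreliable when expected cell counts fall below about 5 (R's `chisq.test()` warns in that situation), and Fisher's exact test, already used for the chemotherapy table, is the usual alternative. A minimal sketch under that rule of thumb; the helper name `choose_assoc_test`, its threshold argument, and the counts below are hypothetical, not study data.

```r
# Hypothetical helper: use Fisher's exact test when any expected count is below a threshold
choose_assoc_test <- function(tab, min_expected = 5) {
  # Expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < min_expected)) {
    list(method = "fisher", result = fisher.test(tab), expected = expected)
  } else {
    list(method = "chisq", result = chisq.test(tab), expected = expected)
  }
}

# Made-up 2x2 table (rows = pCR status, columns = ctDNA ever detected)
toy_tab <- matrix(c(12, 1, 5, 1), nrow = 2,
                  dimnames = list(pcr = c("1", "2"), ctDNA_ever = c("FALSE", "TRUE")))
chosen <- choose_assoc_test(toy_tab)
print(chosen$method)
print(chosen$expected)
```

In the notebook this could be called as `choose_assoc_test(contingency_table)` in place of the bare `chisq.test()` call; computing the expected counts directly avoids triggering the chi-squared warning just to inspect them.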
+print(contingency_table)
+print(chisq_test)
+
+
+
+########recurrence
+#local first, then distant, then create summary variable of either locreg or distant
+#local fu_locreg_prog
+
+# Step 1: Summarize data by unique participant_id
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    fu_locreg_prog = first(fu_locreg_prog), # Get fu_locreg_prog status for each participant
+    ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of fu_locreg_prog vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$fu_locreg_prog, subset_data_by_id$ctDNA_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Step 4: Print the contingency table and Chi-squared test results -- p-val of <<<< 0.000001
+print(contingency_table)
+print(chisq_test)
+
+####looking at sites of locoregional progression: local site fu_locreg_site_num or fu_locreg_site_char
+### Just want to look at site distribution here
+
+# Summarize the distribution of fu_locreg_site_char by unique participant_id
+site_distribution <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    site = first(fu_locreg_site_char), # Get the site for each unique participant
+    .groups = "drop"
+  ) %>%
+  count(site) # Count the occurrences of each site
+
+# View the distribution of sites -- mostly axillary nodes. 4 of 6 axillary. 1 internal mammary. 2 supraclav. 2 ipsilateral breast
+print(site_distribution)
+
+#####distant recurrence: distant fu_dist_prog
+
+# Step 1: Summarize data by unique participant_id
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    fu_dist_prog = first(fu_dist_prog), # Get fu_dist_prog status for each participant
+    ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of dist prog vs ctDNA_ever --> 12 who had distant progression
+contingency_table <- table(subset_data_by_id$fu_dist_prog, subset_data_by_id$ctDNA_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Step 4: Print the contingency table and Chi-squared test results -- p-val of <<<< 0.000001
+print(contingency_table)
+print(chisq_test)
+
+
+### Distant sites
+#distant site fu_dist_site_num #fu_dist_site_char -- start just looking at the locations
+
+# Summarize the distribution of fu_dist_site_char by unique participant_id
+dist_site_distribution <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    site = first(fu_dist_site_char), # Get the site for each unique participant
+    .groups = "drop"
+  ) %>%
+  count(site) # Count the occurrences of each site
+
+# View the distribution of sites -- Bone in 7. Liver in 3. Lung in 2. 1 intra-abdominal
+print(dist_site_distribution)
+
+#any recurrence
+#either fu_locreg_prog or fu_dist_prog
+
+subset_data <- subset_data %>%
+  mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No"))
+
+# link by participant id
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant
+    ctDNA_ever = first(ctDNA_ever), # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results -- total 14 relapses, 8 were ctDNA+, 6 were not ever ctDNA positive
+print(contingency_table)
+print(chisq_test)
+
+#### Relapse and DTC
+#using ever_relapsed
+
+# link by participant id
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant
+    dtc = first(dtc_ever), # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs dtc_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results
+print(contingency_table)
+print(chisq_test)
+
+# Identify participants missing data in either `ever_relapsed` or `dtc_ever`
+missing_data <- subset_data_by_id %>%
+  filter(is.na(ever_relapsed) | is.na(dtc))
+
+# Print the IDs of participants with missing data
+print(missing_data$participant_id) #two individuals do not have relapse data (17-021, 18-032, as we had seen above)
+
+### look at ever_relapsed by ctDNA
+
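The `ever_relapsed` flag above depends on how `|` treats missing values in R: `NA | TRUE` is `TRUE`, but `NA | FALSE` is `NA`. A participant with no distant progression and a missing locoregional field therefore gets a missing relapse status rather than `"No"`, which is one way participants can end up without relapse data. A small sketch with made-up values for four hypothetical participants:

```r
# Made-up follow-up values for four hypothetical participants
fu_locreg_prog <- c(1, 0, NA, NA)
fu_dist_prog   <- c(0, 0, 1, 0)

# Same rule as above: relapsed if either locoregional or distant progression == 1
ever_relapsed <- ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No")

# Participant 3: NA | TRUE is TRUE, so "Yes"; participant 4: NA | FALSE is NA, so NA
print(ever_relapsed)
```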
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant
+    ctDNA = first(ctDNA_ever), # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results
+print(contingency_table)
+print(chisq_test)
+
+####survival analysis fu_surv
+
+table(subset_data$fu_surv)
+
+surv <- subset_data %>%
+  distinct(participant_id, fu_surv) %>%
+  group_by(fu_surv) %>%
+  summarise(count = n()) # Count the number of participants per survival status
+
+# View the summary table
+print(surv) #dead = 5, alive = 103, and 1 NA patient (identified below)
+
+na_participant <- subset_data %>%
+  filter(is.na(fu_surv)) %>%
+  select(participant_id, fu_surv)
+
+# Print the result -- 28115-17-021 -- no follow up data for this pt looking in REDCap, everyone else has some survival data in the ctDNA cohort.
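Comparing vital status (alive/dead) in a contingency table ignores how long each participant was followed. If a follow-up time and a death indicator can be derived for each participant, a Kaplan-Meier comparison with a log-rank test is a common alternative. A minimal sketch using the `survival` package on made-up data; the columns `fu_time_months` and `died` are hypothetical and are not variables shown in this notebook.

```r
library(survival)

# Made-up participant-level data: follow-up time, death indicator, ctDNA status
toy <- data.frame(
  fu_time_months = c(12, 24, 30, 36, 8, 15, 40, 22),
  died           = c(0, 0, 1, 0, 1, 1, 0, 0),  # 1 = died, 0 = alive at last follow-up (censored)
  ctDNA_ever     = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Kaplan-Meier survival curves by ctDNA status
km_fit <- survfit(Surv(fu_time_months, died) ~ ctDNA_ever, data = toy)
print(km_fit)

# Log-rank test comparing the two curves
logrank <- survdiff(Surv(fu_time_months, died) ~ ctDNA_ever, data = toy)
print(logrank)
```

With only five deaths in the cohort, any such comparison would be very low-powered; the sketch only shows the mechanics.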
+print(na_participant)
+
+# Summarize data by unique participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    surv = first(fu_surv), # Get survival status for each participant
+    ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of surv vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$surv, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results
+print(contingency_table)
+print(chisq_test)
+
+
+############ Test Characteristics and Baseline versus cumulative positivity (ctDNA to start)#######
+
+
+### DTC by ctDNA (ever positive)
+
+# link by participant id
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    dtc = first(DTC_ever), # Get the DTC ever-positive status for each participant
+    ctDNA_ever = first(ctDNA_ever), # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of dtc vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$dtc, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results, p-val 0.839
+print(contingency_table)
+print(chisq_test)
+
+
+
+##### Test stuff (#s and such of tests)
+
+#number of tests (ctDNA)
+library(dplyr)
+
+# The ctDNA status variable is `ctDNA_detected`, both in d and in subset_data
+status_summary_d <- d %>%
+  group_by(ctDNA_detected) %>%
+  summarise(total_samples = n(), .groups = "drop")
+
+# Print the summary -- we've got 385 FALSE, 8 FAILS, 11 TRUES
+print(status_summary_d)
+
+#looking at the number of Fails by unique participant_id
+fail_count <- d %>%
+  filter(ctDNA_detected == "Fail") %>% # Filter for rows where status is "Fail"
+  distinct(participant_id) %>% # Get unique participant IDs
+  summarise(total_fails = n()) # Count unique participant IDs
+
+# Print the result -- 4 individuals with Fail results, which is what we got in the CONSORT diagram
+print(fail_count)
+
+fail_count <- subset_data %>%
+  filter(ctDNA_detected == "Fail") %>% # Filter for rows where status is "Fail"
+  distinct(participant_id) %>% # Get unique participant IDs
+  summarise(total_fails = n()) # Count unique participant IDs
+
+# Print the result -- none of the fails were pulled into the ctDNA cohort
+print(fail_count)
+
+#number of DTC tests in this cohort of 109 patients
+
+unique(subset_data$dtc_ihc_result_final) #NA, 0, and 1
+
+status_summary_subset <- subset_data %>%
+  group_by(dtc_ihc_result_final) %>%
+  summarise(total_samples = n(), .groups = "drop")
+
+# Print the summary -- we've got 221 negatives, 49 positives, 128 NAs, across 39 patients (positive) and 70 patients (negative)
+#### confirm with Nick that we are not missing the NAs, but I suspect based on the below that we are fine and these are just ctDNA-only timepoints
+print(status_summary_subset)
+
+### looking at NAs -- all of them have FALSE ctDNA results (so I think these are all the ctDNA-only timepoints)
+na_participants_dtc <- subset_data %>%
+  filter(is.na(dtc_ihc_result_final)) %>%
+  select(participant_id, dtc_ihc_result_final, FINAL_RESULT, ORIG_RSLT_DTC, ctDNA_detected, timepoint)
+
+# Print the list of participant IDs with NA in `dtc_ihc_result_final` -- they all have FALSE ctDNA results, so these are the ctDNA timepoints
+#all of the timepoints are long-term except for CLEVER baseline.
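The ctDNA status column mixes `TRUE`/`FALSE` calls with `"Fail"` results. Later summaries test `ctDNA_detected == TRUE` after dropping only `NA`, so a `"Fail"` row would be counted as negative rather than excluded. The fail count above suggests none reached `subset_data`, but an explicit recode guards against it. A minimal sketch, assuming the column is stored as character with the values `"TRUE"`, `"FALSE"` and `"Fail"`; the helper name `recode_ctDNA` is hypothetical.

```r
# Hypothetical helper: map "TRUE"/"FALSE" to logical and anything else (e.g. "Fail") to NA
recode_ctDNA <- function(x) {
  x <- as.character(x)
  ifelse(x == "TRUE", TRUE, ifelse(x == "FALSE", FALSE, NA))
}

# Made-up raw values, including a failed assay and a missing sample
ctDNA_raw <- c("FALSE", "TRUE", "Fail", NA, "FALSE")
ctDNA_clean <- recode_ctDNA(ctDNA_raw)
print(ctDNA_clean)
```

In the pipelines above this would slot in as `mutate(ctDNA_detected = recode_ctDNA(ctDNA_detected))` before any `filter(!is.na(ctDNA_detected))` step.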
+print(na_participants_dtc, n=128)
+
+#look at timepoints
+unique_timepoints <- unique(subset_data$timepoint)
+print(unique_timepoints)
+
+#Identify participant_ids with more than one of the SURMOUNT and CLEVER-Baseline timepoints listed below
+participants_dual_timepoints <- subset_data %>%
+  filter(timepoint %in% c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up",
+                          "Year 3 Follow Up", "Year 4 Follow Up", "CLEVER-Baseline")) %>% # Filter for relevant timepoints
+  group_by(participant_id) %>%
+  filter(n_distinct(timepoint) > 1) %>% # Keep participants with more than one of these timepoints
+  ungroup() %>%
+  select(participant_id, timepoint, date) %>% # Select participant_id, timepoint, and date
+  distinct()
+
+# Same selection, pulling the DTC date and result instead of the sample date
+participants_dual_timepoints <- subset_data %>%
+  filter(timepoint %in% c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up",
+                          "Year 3 Follow Up", "Year 4 Follow Up", "CLEVER-Baseline")) %>%
+  group_by(participant_id) %>%
+  filter(n_distinct(timepoint) > 1) %>%
+  ungroup() %>%
+  select(participant_id, timepoint, dtc_ihc_date_final, dtc_ihc_result_final) %>%
+  distinct()
+
+# Print the list of participant_ids with multiple timepoints -- great, all the CLEVER-Baselines are NAs in this as they should be (blood only)
+print(participants_dual_timepoints, n=190)
+
+
+##### eVAF
+names(subset_data) #use eVAF
+
+# Calculate the median and range (min and max) for `eVAF` as percentages for samples with `ctDNA_detected == TRUE`
+eVAF_range_ctDNA_detected_percent <- subset_data %>%
+  filter(ctDNA_detected == TRUE) %>% # Filter for samples with ctDNA detected
+  summarise(
+    median_eVAF_percent = median(eVAF, na.rm = TRUE) * 100, # Convert median to percentage
+    min_eVAF_percent = min(eVAF, na.rm = TRUE) * 100, # Convert minimum to percentage
+    max_eVAF_percent = max(eVAF, na.rm = TRUE) * 100 # Convert maximum to percentage
+  )
+
+# Print the result
+print(eVAF_range_ctDNA_detected_percent)
+
+#### DTC counts
+names(subset_data) #use dtc_ihc_summary_count_final
+
+# Calculate the median and range (min and max) of DTC counts for samples with DTCs detected
+dtc_count <- subset_data %>%
+  filter(dtc_ihc_result_final == 1) %>% # Filter for samples with DTCs detected
+  summarise(
+    median_dtc_count = median(dtc_ihc_summary_count_final, na.rm = TRUE),
+    min_dtc_count = min(dtc_ihc_summary_count_final, na.rm = TRUE),
+    max_dtc_count = max(dtc_ihc_summary_count_final, na.rm = TRUE)
+  )
+
+# Print the result
+print(dtc_count)
+
+
+#### Number of timepoints we see
+
+# Timepoints per patient (median, range)
+timepoints_per_patient <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    total_timepoints = n_distinct(timepoint), # Count distinct timepoints for each patient
+    .groups = "drop"
+  ) %>%
+  summarise(
+    median_timepoints = median(total_timepoints, na.rm = TRUE), # Calculate median
+    min_timepoints = min(total_timepoints, na.rm = TRUE), # Calculate minimum
+    max_timepoints = max(total_timepoints, na.rm = TRUE) # Calculate maximum
+  )
+
+# Timepoints of ctDNA assessment (`ctDNA_detected`)
+ctDNA_timepoints <- subset_data %>%
+  filter(!is.na(ctDNA_detected)) %>% # Filter out NA values for ctDNA_detected
+  group_by(participant_id) %>%
+  summarise(
+    ctDNA_timepoints = n_distinct(timepoint), # Count distinct timepoints of ctDNA assessment
+    .groups = "drop"
+  ) %>%
+  summarise(
+    median_ctDNA_timepoints = median(ctDNA_timepoints, na.rm = TRUE), # Calculate median
+    min_ctDNA_timepoints = min(ctDNA_timepoints, na.rm = TRUE), # Calculate minimum
+    max_ctDNA_timepoints = max(ctDNA_timepoints, na.rm = TRUE) # Calculate maximum
+  )
+
+# Timepoints of DTC assessment (`dtc_ihc_result_final`)
+dtc_timepoints <- subset_data %>%
+  filter(!is.na(dtc_ihc_result_final)) %>% # Filter out NA values for dtc_ihc_result_final
+  group_by(participant_id) %>%
+  summarise(
+    dtc_timepoints = n_distinct(timepoint), # Count distinct timepoints of DTC assessment
+    .groups = "drop"
+  ) %>%
+  summarise(
+    median_dtc_timepoints = median(dtc_timepoints, na.rm = TRUE), # Calculate median
+    min_dtc_timepoints = min(dtc_timepoints, na.rm = TRUE), # Calculate minimum
+    max_dtc_timepoints = max(dtc_timepoints, na.rm = TRUE) # Calculate maximum
+  )
+
+# Print all summaries
+print("Timepoints per patient:")
+print(timepoints_per_patient)
+
+print("Timepoints of ctDNA assessment:")
+print(ctDNA_timepoints)
+
+print("Timepoints of DTC assessment:")
+print(dtc_timepoints)
+
+### timepoints on clinical trial ### Ask Nick -- should we include all the timepoints on trial technically
+#(CLEVER-Baseline, EOO, C3, C6, C12, 6M F/U, etc.) or just the ones while on treatment (C3, C6, C12)
+#, or only the ones while patients are
+unique_timepoints <- unique(subset_data$timepoint)
+print(unique_timepoints)
+
+
+trial_timepoints <- c("CLEVER-Baseline", "EOO", "C3", "C6", "C12", "6M F/U", "12M F/U", "18M F/U", "24M F/U", "30M F/U", "36M F/U")
+
+# Count the number of samples by timepoint (for specific clinical trial timepoints)
+samples_by_trial_timepoint <- subset_data %>%
+  filter(timepoint %in% trial_timepoints) %>% # Filter for relevant timepoints
+  group_by(timepoint) %>% # Group by timepoint
+  summarise(
+    total_samples = n_distinct(participant_id), # Count distinct participant_ids (samples)
+    .groups = "drop" # Remove grouping after summarizing
+  )
+
+# Print the result
+print(samples_by_trial_timepoint) #total samples on trial (ctDNA and DTC)
+
+#### ctDNA on trial
+
+ctDNA_samples_by_timepoint <- subset_data %>%
+  filter(timepoint %in% trial_timepoints, !is.na(ctDNA_detected)) %>% # Filter for relevant timepoints with a ctDNA result
+  group_by(timepoint) %>% # Group by timepoint
+  summarise(
+    total_samples_ctDNA = n_distinct(participant_id), # Count distinct participant_ids (ctDNA samples)
+    .groups = "drop" # Remove grouping after summarizing
+  )
+
+# Print the result for ctDNA samples -- 32 CLEVER-Baseline, 9 EOO, 20 C3, 28 C6, 4 C12, 27 6M, 18 12M, 13 18M, 13 24M, 12 30M, 18 36M
+print(ctDNA_samples_by_timepoint)
+
+
+##### DTC by trial timepoint
+# Count the number of DTC samples by timepoint (for specific clinical trial timepoints)
+dtc_samples_by_timepoint <- subset_data %>%
+  filter(timepoint %in% trial_timepoints, !is.na(dtc_ihc_result_final)) %>% # Filter for relevant timepoints and DTC results
+  group_by(timepoint) %>% # Group by timepoint
+  summarise(
+    total_samples_dtc = n_distinct(participant_id), # Count distinct participant_ids (DTC samples)
+    .groups = "drop" # Remove grouping after summarizing
+  )
+
+# Print the result for DTC samples -- makes sense, no CLEVER baseline timepoints, 9 EOO, 19 C3, 28 C6, 4 C12, 19 6M F/U
+print(dtc_samples_by_timepoint)
+
+#### Number of ctDNA timepoints on SURMOUNT
+print(unique_timepoints)
+surmount_timepoints <- c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up", "Year 3 Follow Up", "Year 4 Follow Up", "Long Term FU 1", "Long Term FU 2")
+
+ctDNA_surmount <- subset_data %>%
+  filter(timepoint %in% surmount_timepoints, !is.na(ctDNA_detected)) %>% # Filter for relevant timepoints with a ctDNA result
+  group_by(timepoint) %>% # Group by timepoint
+  summarise(
+    total_samples_ctDNA = n_distinct(participant_id), # Count distinct participant_ids (ctDNA samples)
+    .groups = "drop" # Remove grouping after summarizing
+  )
+
+# Print the result for ctDNA samples -- 109 Baseline, Y1 FU 40, Y2 FU 25, Y3 FU 14, Y4 FU 4, LT FU 10, LT 2 FU 2
+print(ctDNA_surmount)
+
+
+### number of DTC timepoints on SURMOUNT
+# Count the number of DTC samples by timepoint
+dtc_timepoint_surmount <- subset_data %>%
+  filter(timepoint %in% surmount_timepoints, !is.na(dtc_ihc_result_final)) %>% # Filter for relevant timepoints and DTC results
+  group_by(timepoint) %>% # Group by timepoint
+  summarise(
+    total_samples_dtc = n_distinct(participant_id), # Count distinct participant_ids (DTC samples)
+    .groups = "drop" # Remove grouping after summarizing
+  )
+
+# Print the result for DTC samples
+print(dtc_timepoint_surmount)
+
+
+#### positivity by timepoint -- ctDNA
+
+ctDNA_pos_rate_by_timepoint <- subset_data %>%
+  filter(!is.na(ctDNA_detected)) %>% # Ensure we are considering only non-missing ctDNA_detected values
+  group_by(timepoint, participant_id) %>% # Group by timepoint and participant
+  summarise(
+    ctDNA_pos = max(ctDNA_detected == TRUE), # If any value is TRUE, participant is ctDNA positive at that timepoint
+    .groups = "drop"
+  ) %>%
+  group_by(timepoint) %>% # Group again by timepoint to calculate the positivity rate
+  summarise(
+    positivity_rate = mean(ctDNA_pos), # Calculate the positivity rate for each timepoint
+    total_samples = n_distinct(participant_id), # Count the number of distinct participants
+    .groups = "drop"
+  )
+
+# Print the result for ctDNA positivity rate by timepoint
+print(ctDNA_pos_rate_by_timepoint)
+
+# Calculate running (pooled) ctDNA positivity rate across timepoints
+# NOTE: timepoint is a text label, so arrange() sorts it alphabetically, not chronologically,
+# and participants sampled at several timepoints are counted once per timepoint
+ctDNA_pos_rate_cumulative <- ctDNA_pos_rate_by_timepoint %>%
+  arrange(timepoint) %>%
+  mutate(
+    cumulative_pos_rate = cumsum(positivity_rate * total_samples) / cumsum(total_samples) # Running pooled positivity rate
+  )
+
+print(ctDNA_pos_rate_cumulative)
+
+#### Cumulative positivity ctDNA
+
+library(dplyr)
+
+# Calculate ctDNA positivity by participant
+ctDNA_pos_rate <- subset_data %>%
+  filter(!is.na(ctDNA_detected)) %>% # Exclude missing ctDNA results
+  group_by(participant_id) %>% # Group by participant
+  summarise(
+    ctDNA_pos = max(ctDNA_detected == TRUE), # If any value is TRUE, participant is ctDNA positive
+    .groups = "drop"
+  )
+
+# Calculate cumulative positivity rate
+ctDNA_pos_rate_cumulative <- ctDNA_pos_rate %>%
+  summarise(
+    total_pos = sum(ctDNA_pos), # Total number of ctDNA positive participants
+    total_samples = n(), # Total number of participants
+    cumulative_pos_rate = total_pos / total_samples # Cumulative positivity rate
+  )
+
+# Print the cumulative positivity rate
+print(ctDNA_pos_rate_cumulative)
+
+
+# Count the number of positive ctDNA samples and total samples
+ctDNA_pos_vs_total <- subset_data %>%
+  filter(!is.na(ctDNA_detected)) %>% # Exclude missing ctDNA results
+  summarise(
+    total_samples = n(), # Total number of ctDNA samples
+    positive_samples = sum(ctDNA_detected == TRUE), # Count of positive ctDNA samples
+    .groups = "drop"
+  ) %>%
+  mutate(
+    positivity_rate = positive_samples / total_samples # Proportion of positive ctDNA samples
+  )
+
+# Print the results
+print(ctDNA_pos_vs_total)
+
+
+#### cumulative positivity DTC
+
+# Calculate DTC positivity by participant
+DTC_pos_rate <- subset_data %>%
+  filter(!is.na(dtc_ihc_result_final)) %>% # Exclude missing DTC results
+  group_by(participant_id) %>% # Group by participant
+  summarise(
+    dtc = max(dtc_ihc_result_final == 1), # If any result is positive, participant is DTC positive
+    .groups = "drop"
+  )
+
+# Calculate cumulative positivity rate
+DTC_pos_rate_cumulative <- DTC_pos_rate %>%
+  summarise(
+    total_pos = sum(dtc), # Total number of DTC positive participants
+    total_samples = n(), # Total number of participants
+    cumulative_pos_rate = total_pos / total_samples # Cumulative positivity rate
+  )
+
+# Print the cumulative positivity rate
+print(DTC_pos_rate_cumulative)
+
+
+# Count the number of positive DTC samples and total samples
+dtc_pos_vs_total <- subset_data %>%
+  filter(!is.na(dtc_ihc_result_final)) %>% # Exclude missing DTC results
+  summarise(
+    total_samples = n(), # Total number of DTC samples
+    positive_samples = sum(dtc_ihc_result_final == 1), # Count of positive DTC samples
+    .groups = "drop"
+  ) %>%
+  mutate(
+    positivity_rate = positive_samples / total_samples # Proportion of positive DTC samples
+  )
+
+# Print the results
+print(dtc_pos_vs_total)
+
+
+###### Test characteristics ctDNA
+
+# Collapse to one row per participant: ever_relapsed, ctDNA_ever and dtc_ever are participant-level
+test_data <- subset_data %>%
+  distinct(participant_id, .keep_all = TRUE)
+
+true_results <- test_data$ever_relapsed # Actual labels ("Yes" = relapsed, "No" = not relapsed)
+predicted_results <- test_data$ctDNA_ever # Predicted labels (ctDNA ever detected vs not)
+
+# Create confusion matrix (rows = Predicted, columns = Actual; the second level of each is the positive class)
+conf_matrix <- table(Predicted = predicted_results, Actual = true_results)
+
+# Calculate sensitivity, specificity, PPV, and NPV
+sensitivity <- conf_matrix[2, 2] / sum(conf_matrix[, 2]) # True Positives / (True Positives + False Negatives)
+specificity <- conf_matrix[1, 1] / sum(conf_matrix[, 1]) # True Negatives / (True Negatives + False Positives)
+ppv <- conf_matrix[2, 2] / sum(conf_matrix[2, ]) # True Positives / (True Positives + False Positives)
+npv <- conf_matrix[1, 1] / sum(conf_matrix[1, ]) # True Negatives / (True Negatives + False Negatives)
+
+# Print the results
+cat("Sensitivity: ", sensitivity, "\n")
+cat("Specificity: ", specificity, "\n")
+cat("Positive Predictive Value (PPV): ", ppv, "\n")
+cat("Negative Predictive Value (NPV): ", npv, "\n")
+
+
+### Test characteristics for DTC
+
+
+# True results (ever relapsed) and predicted results (DTC ever detected), one row per participant
+true_results <- test_data$ever_relapsed # Actual labels ("Yes" = relapsed, "No" = not relapsed)
+predicted_results <- test_data$dtc_ever # Predicted labels (DTC ever detected vs not)
+
+# Create confusion matrix (rows = Predicted, columns = Actual)
+conf_matrix_2 <- table(Predicted = predicted_results, Actual = true_results)
+
+# Calculate sensitivity, specificity, PPV, and NPV
+sensitivity <- conf_matrix_2[2, 2] / sum(conf_matrix_2[, 2]) # True Positives / (True Positives + False Negatives)
+specificity <- conf_matrix_2[1, 1] / sum(conf_matrix_2[, 1]) # True Negatives / (True Negatives + False Positives)
+ppv <- conf_matrix_2[2, 2] / sum(conf_matrix_2[2, ]) # True Positives / (True Positives + False Positives)
+npv <- conf_matrix_2[1, 1] / sum(conf_matrix_2[1, ]) # True Negatives / (True Negatives + False Negatives)
+
+# Print the results
+cat("Sensitivity: ", sensitivity, "\n")
+cat("Specificity: ", specificity, "\n")
+cat("Positive Predictive Value (PPV): ", ppv, "\n")
+cat("Negative Predictive Value (NPV): ", npv, "\n")
+
+#### how many DTC pts went on trial?
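With `table(Predicted = ..., Actual = ...)`, rows hold the test call and columns hold the outcome, so sensitivity and specificity divide by column totals while PPV and NPV divide by row totals. A minimal sketch of a helper that names each cell explicitly, assuming the positive class is the second level of each factor (as for `FALSE`/`TRUE` and `"No"`/`"Yes"`); the function name `test_characteristics` and the toy calls below are hypothetical.

```r
# Hypothetical helper: test characteristics from a 2x2 table laid out as
# rows = Predicted (negative, positive), columns = Actual (negative, positive)
test_characteristics <- function(conf_matrix) {
  tp <- conf_matrix[2, 2]  # predicted positive, actually positive
  tn <- conf_matrix[1, 1]  # predicted negative, actually negative
  fp <- conf_matrix[2, 1]  # predicted positive, actually negative
  fn <- conf_matrix[1, 2]  # predicted negative, actually positive
  c(
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn)
  )
}

# Made-up participant-level calls and outcomes
predicted <- factor(c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE), levels = c(FALSE, TRUE))
actual    <- factor(c("Yes", "No", "Yes", "No", "No", "Yes", "No", "No"), levels = c("No", "Yes"))
toy_matrix <- table(Predicted = predicted, Actual = actual)
print(toy_matrix)
print(round(test_characteristics(toy_matrix), 3))
```

Setting the factor levels explicitly keeps the negative class in row and column 1 even if one category happens to be absent from the data.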
+
+names(subset_data)
+
+# DTC+ patients who went on trial (those who have a fu_trial_pid)
+
+library(dplyr)
+
+# Total unique DTC+ patients (counting participants, not samples)
+total_dtc_plus <- subset_data %>%
+  filter(dtc_ihc_result_final == 1) %>%
+  distinct(participant_id) %>%
+  nrow()
+
+# Unique DTC+ patients who went on trial (non-missing, non-empty trial ID fu_trial_pid, matching the `trial` definition above)
+dtc_plus_trial <- subset_data %>%
+  filter(dtc_ihc_result_final == 1 & !is.na(fu_trial_pid) & fu_trial_pid != "") %>%
+  distinct(participant_id) %>%
+  nrow()
+
+# Proportion of DTC+ patients who went on trial
+proportion_trial <- dtc_plus_trial / total_dtc_plus
+
+# Display results
+cat("Total unique DTC+ patients:", total_dtc_plus, "\n")
+cat("Unique DTC+ patients who went on trial:", dtc_plus_trial, "\n")
+cat("Proportion of DTC+ patients who went on trial:", proportion_trial, "\n")
+
+# All DTC+ patients went on trial (39/39, counted before empty trial IDs were excluded)
+
+
+##### Concordance between DTC and ctDNA
+
+
+### concordance overall
+
+# Get one row per participant by participant_id
+concordance_overall_unique <- subset_data %>%
+  distinct(participant_id, .keep_all = TRUE) %>%
+  mutate(concordance = ifelse(dtc_ever == ctDNA_ever, "Concordant", "Discordant"))
+
+# Count total concordant and discordant pairs for unique participants
+overall_concordant <- sum(concordance_overall_unique$concordance == "Concordant")
+overall_discordant <- sum(concordance_overall_unique$concordance == "Discordant")
+
+# Proportion of concordance
+proportion_concordant <- overall_concordant / (overall_concordant + overall_discordant)
+
+cat("Overall Concordant (unique participants):", overall_concordant, "\n")
+cat("Overall Discordant (unique participants):", overall_discordant, "\n")
+cat("Overall Proportion Concordant (unique participants):", proportion_concordant, "\n")
+
+#Proportion concordance 63% (ever positive)
+
+# Ensure that dtc_ihc_result_final is converted to TRUE/FALSE for consistency with ctDNA_detected
+concordance_by_timepoint <- subset_data %>%
+  filter(!is.na(dtc_ihc_result_final) & !is.na(ctDNA_detected)) %>%
+  mutate(
+    # Convert dtc_ihc_result_final (1 = TRUE, 0 = FALSE) to match ctDNA format (TRUE/FALSE)
+    dtc_ihc_result_final_bool = ifelse(dtc_ihc_result_final == 1, TRUE, FALSE),
+    # Create a column to determine concordance (both DTC and ctDNA being TRUE or both being FALSE)
+    concordance = ifelse(dtc_ihc_result_final_bool == ctDNA_detected, "Concordant", "Discordant")
+  ) %>%
+  group_by(timepoint) %>%
+  summarise(
+    total_concordant = sum(concordance == "Concordant"),
+    total_discordant = sum(concordance == "Discordant"),
+    total_samples = n(), # Total number of samples at this timepoint
+    concordance_rate = total_concordant / total_samples # Concordance rate per timepoint
+  )
+
+# Print concordance results for each timepoint
+print(concordance_by_timepoint)
+
+# Now calculate overall concordance across all timepoints
+overall_concordance <- sum(concordance_by_timepoint$total_concordant) /
+  sum(concordance_by_timepoint$total_samples)
+
+cat("Overall Concordance Rate across all timepoints:", overall_concordance, "\n")
+#concordance, considering testing by timepoint, is 80%, versus 63% when you consider the tests separately. Does this make sense?
+
+
+############### DTC Demographics ##########
+
+###### time from diagnosis to consent
+
+names(subset_data) #to identify the variables I want to use
+str(subset_data$diag_date_1) #character
+str(subset_data$org_consent_date) #character
+str(subset_data$collection_date) #character
+
+d$diag_date_1 <- as.Date(d$diag_date_1, format = "%m/%d/%Y")
+d$org_consent_date <- as.Date(d$org_consent_date, format = "%m/%d/%Y")
+d$collection_date <- as.Date(d$collection_date, format = "%m/%d/%Y")
+
+
+str(d$diag_date_1) #dates!
+str(d$org_consent_date) #dates!
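Converting each date column on its own line makes it easy to parse the wrong source column, for example filling `org_consent_date` from `demo_dob`. A minimal base-R sketch that converts a list of character columns in one step, assuming they all use the `%m/%d/%Y` format; the data frame is made up and `date_cols` is just an illustrative vector of the column names used above.

```r
# Made-up character dates in month/day/year format
toy <- data.frame(
  diag_date_1      = c("03/15/2016", "11/02/2017"),
  org_consent_date = c("01/10/2018", "06/30/2019"),
  stringsAsFactors = FALSE
)

# Convert every listed column with the same format string
date_cols <- c("diag_date_1", "org_consent_date")
toy[date_cols] <- lapply(toy[date_cols], as.Date, format = "%m/%d/%Y")
str(toy)

# Time from diagnosis to consent, in years
toy$time_to_consent <- as.numeric(difftime(toy$org_consent_date, toy$diag_date_1, units = "days")) / 365.25
print(round(toy$time_to_consent, 2))
```

Because each column is converted from itself, a copy-paste slip can no longer overwrite one date with another.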
+
+### doing the same for subset_data as it didn't carry over into that data set
+subset_data$diag_date_1 <- as.Date(subset_data$diag_date_1, format = "%m/%d/%Y")
+subset_data$org_consent_date <- as.Date(subset_data$org_consent_date, format = "%m/%d/%Y")
+subset_data$collection_date <- as.Date(subset_data$collection_date, format = "%m/%d/%Y")
+
+
+# calculating time (in years) from date of diagnosis to date of consent
+subset_data$time_to_consent <- as.numeric(difftime(subset_data$org_consent_date, subset_data$diag_date_1, units = "days")) / 365.25
+head(subset_data$time_to_consent)
+
+# same interval in months (approximate: 30-day months)
+subset_data$time_to_consent_month <- as.numeric(difftime(subset_data$org_consent_date, subset_data$diag_date_1, units = "days")) / 30
+head(subset_data$time_to_consent_month)
+
+summary(subset_data$time_to_consent) #median
+
+time_to_consent <- subset_data %>%
+  group_by(ctDNA_ever) %>%
+  summarise(
+    mean_time_to_consent = mean(time_to_consent, na.rm = TRUE), # Calculate mean time to consent
+    median_time_to_consent = median(time_to_consent, na.rm = TRUE), # Calculate median time to consent
+    sd_time_to_consent = sd(time_to_consent, na.rm = TRUE), # Calculate standard deviation of time to consent
+    n = n() # Number of rows (samples, not unique participants) in each group
+  )
+
+print(time_to_consent)
+
+# Perform the Wilcoxon rank-sum test to compare time to consent between ctDNA_ever positive and negative groups
+wilcox_test_result <- wilcox.test(time_to_consent ~ ctDNA_ever, data = subset_data)
+
+# Print the result
+print(wilcox_test_result)
+
+#looking at range of time to consent by ctDNA_ever status
+consent_summ <- subset_data %>%
+  group_by(ctDNA_ever) %>%
+  summarise(
+    min_time_to_consent = min(time_to_consent, na.rm = TRUE), # Minimum time to consent
+    max_time_to_consent = max(time_to_consent, na.rm = TRUE), # Maximum time to consent
+    .groups = "drop"
+  )
+
+# View the summary table
+print(consent_summ)
+
+
+
+#### Age at Dx (by DTC)
+
+names(subset_data) #to identify the variables I want to use
+str(subset_data$diag_date_1) #Date (converted above)
+str(subset_data$demo_dob) #character
+
+d$diag_date_1 <- as.Date(d$diag_date_1, format = "%m/%d/%Y")
+d$demo_dob <- as.Date(d$demo_dob, format = "%m/%d/%Y")
+
+str(d$diag_date_1) #dates!
+str(d$demo_dob) #dates!
+
+### doing the same for subset_data, as the conversion didn't carry over into that data set
+subset_data$diag_date_1 <- as.Date(subset_data$diag_date_1, format = "%m/%d/%Y")
+subset_data$demo_dob <- as.Date(subset_data$demo_dob, format = "%m/%d/%Y")
+
+# age at diagnosis: years from date of birth to date of diagnosis
+subset_data$age_at_diag <- as.numeric(difftime(subset_data$diag_date_1, subset_data$demo_dob, units = "days")) / 365.25
+head(subset_data$age_at_diag)
+
+summary(subset_data$age_at_diag) #median 48.75
+
+age_summary <- subset_data %>%
+  group_by(dtc_ever) %>%
+  summarise(
+    mean_age = mean(age_at_diag, na.rm = TRUE), # mean age
+    median_age = median(age_at_diag, na.rm = TRUE), # median age
+    sd_age = sd(age_at_diag, na.rm = TRUE), # standard deviation of age
+    n = n() # number of rows (not unique participants) in each group
+  )
+
+print(age_summary) #interesting dtc ever are slightly more positive
+
+# Wilcoxon rank-sum test comparing age at diagnosis between dtc_ever positive and negative groups
+wilcox_test_result <- wilcox.test(age_at_diag ~ dtc_ever, data = subset_data)
+
+# Print the result
+print(wilcox_test_result)
+
+# range of age at diagnosis by dtc_ever status (overwrites age_summary above)
+age_summary <- subset_data %>%
+  group_by(dtc_ever) %>%
+  summarise(
+    min_age = min(age_at_diag, na.rm = TRUE), # minimum age
+    max_age = max(age_at_diag, na.rm = TRUE), # maximum age
+    .groups = "drop"
+  )
+
+# View the summary table
+print(age_summary)
+
+
+##### Race: demo_race_final
+
+# Count unique participant_ids (and their percentage) for each category of demo_race_final
+race_counts_unique_percent <- subset_data %>%
+  group_by(demo_race_final) %>%
+  summarise(unique_participants = n_distinct(participant_id)) %>%
+  mutate(percent = unique_participants / sum(unique_participants) * 100)
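# The per-participant summaries in this script keep one value per participant
# with first(). That is only safe when the variable is constant across all of
# a participant's rows. A minimal base-R sketch of a check for this, assuming
# a long data frame with a participant_id column. The helper name
# varies_within_participant() and the toy values are hypothetical.
varies_within_participant <- function(df, var, id = "participant_id") {
  # number of distinct non-missing values of `var` for each participant
  n_values <- tapply(df[[var]], df[[id]], function(x) length(unique(x[!is.na(x)])))
  # IDs of participants whose rows disagree with each other
  names(n_values)[n_values > 1]
}

# Hypothetical long data: participant B has conflicting HR_status values
toy_long <- data.frame(
  participant_id = c("A", "A", "B", "B", "C"),
  dtc_ever       = c(1, 1, 0, 0, 1),
  HR_status      = c("HR+", "HR+", "HR+", "Non-HR+", NA)
)
varies_within_participant(toy_long, "dtc_ever")   # character(0): first() is safe
varies_within_participant(toy_long, "HR_status")  # "B": first() would silently pick one

# On the real data this could be run as, e.g.,
# varies_within_participant(subset_data, "demo_race_final")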
+
+# View the result
+print(race_counts_unique_percent)
+
+
+
+# Count distinct participant_ids by dtc_ever and demo_race_final
+count_distinct_participants <- subset_data %>%
+  group_by(demo_race_final, dtc_ever) %>%
+  summarise(distinct_participant_count = n_distinct(participant_id), .groups = "drop")
+
+# Print the result
+count_distinct_participants
+
+
+
+library(dplyr)
+
+# Step 1: Summarize by unique participant_id
+summarized_data <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    dtc_ever = first(dtc_ever), # first observed value of dtc_ever for each participant
+    demo_race_final = first(demo_race_final), # first observed value of demo_race_final for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table <- table(summarized_data$dtc_ever, summarized_data$demo_race_final)
+contingency_table
+# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table)
+
+# Step 4: Print the result (p = 0.65)
+chisq_test
+
+
+
+#receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+')
+
+# Breakdown of final_receptor_group by unique participant_id
+receptor_status_by_participant <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(final_receptor_group = first(final_receptor_group), # or choose the most frequent group if needed
+            .groups = "drop")
+
+# View the result
+table(receptor_status_by_participant$final_receptor_group)
+
+# Summarize final_receptor_group and dtc_ever by participant_id
+receptor_dtc_status <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    final_receptor_group = first(final_receptor_group), # or the most frequent if needed
+    dtc_ever = first(dtc_ever), # first observed value of dtc_ever
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table_receptor <- table(receptor_dtc_status$final_receptor_group,
+                                    receptor_dtc_status$dtc_ever)
+contingency_table_receptor
+
+# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table_receptor)
+
+# Step 4: Print the result (p = 0.14) -- interesting: DTC+ looks more evenly distributed across TNBC than ctDNA+ does
+chisq_test
+
+
+#curious about whether a TNBC vs non-TNBC association exists (or HR+ vs non-HR+)
+#start with TNBC (using QDC)
+#inclusion criteria inc_dx_crit___1 = TNBC
+
+
+#inc_dx_crit_list___1
+
+TNBC_dtc_status <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    inc_dx_crit_list___1 = first(inc_dx_crit_list___1), # or the most frequent if needed
+    dtc_ever = first(dtc_ever), # first observed value of dtc_ever
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table_TNBC <- table(TNBC_dtc_status$inc_dx_crit_list___1, TNBC_dtc_status$dtc_ever)
+contingency_table_TNBC
+
+# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table_TNBC)
+
+# Step 4: p = 0.17
+chisq_test
+
+
+#ER vs non-ER
+#first create HR_status variable
+subset_data <- subset_data |>
+  mutate(HR_status = case_when(
+    final_receptor_group %in% c(2, 3) ~ "HR+",
+    final_receptor_group %in% c(1, 4) ~ "Non-HR+",
+    TRUE ~ NA_character_ # In case there are missing or other unexpected values
+  ))
+
+# View the new HR_status variable
+table(subset_data$HR_status)
+
+HR_status_by_participant <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(HR_status = first(HR_status), # assumes HR_status is constant within each participant
+            .groups = "drop")
+
+# View the result
+table(HR_status_by_participant$HR_status) #aligns with final receptor status (ultimately 60 HR+, 49 HR-)
+
+# Summarize dtc_ever status by HR_status, for each unique participant_id
+summary_data <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    HR_status = first(HR_status), # HR_status for the participant
+    dtc_status = first(dtc_ever), # dtc_ever status for the participant
+    .groups = "drop"
+  )
+
+contingency_table_HR <- table(summary_data$dtc_status, summary_data$HR_status)
+contingency_table_HR
+chisq_test <- chisq.test(contingency_table_HR)
+
+# Print chi-squared test results (p = 0.28)
+chisq_test
+
+
+
+
+###tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not reported
+
+# Exclude rows where final_tumor_grade is 3 (not reported), i.e., the 2 pts whose grade was not reported
+summary_data <- subset_data %>%
+  filter(final_tumor_grade != 3) %>% # Exclude code 3 (= not reported, not grade 3)
+  group_by(participant_id) %>%
+  summarise(
+    grade = first(final_tumor_grade), # final_tumor_grade for each participant
+    dtc_ever = first(dtc_ever), # dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of grade vs dtc_ever
+contingency_table <- table(summary_data$grade, summary_data$dtc_ever)
+
+# View the contingency table
+print(contingency_table)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# View the Chi-squared test result -- p-value 0.12, NOT SIG for DTCs
+print(chisq_test)
+
+######histology #people have different combinations of histology (1-15)
+table(subset_data$participant_id, subset_data$final_histology)
+
+histology_summary <- subset_data %>%
+  distinct(participant_id, final_histology) %>% # Get unique participant-histology combinations
+  group_by(final_histology) %>% # Group by histology type
+  summarise(count = n()) # Count the number of participants per histology type
+
+# View the summary table
+print(histology_summary) #the different combinations add up to 109. There are a decent number who had both ductal and lobular histology
+
+#create a histology category: Ductal (code 3), Lobular (code 14), both, or other
+# NOTE: grepl("3", ...) also matches code 13, so a pattern that matches whole codes only
+# (e.g., "\\b3\\b") may be needed, depending on how final_histology is stored
+subset_data <- subset_data %>%
+  mutate(histology_category = case_when(
+    grepl("3", as.character(final_histology)) & grepl("14", as.character(final_histology)) ~ "Both Ductal and Lobular", # Both Ductal and Lobular
+    grepl("3", as.character(final_histology)) ~ "Ductal", # Ductal
+    grepl("14", as.character(final_histology)) ~ "Lobular", # Lobular
+    TRUE ~ "Other" # Any other combination
+  ))
+
+# Count the number of participants in each histology category
+histology_counts <- subset_data %>%
+  group_by(histology_category) %>%
+  summarise(count = n_distinct(participant_id)) # Count distinct participants
+
+# View the counts -- adds up to 109!
+print(histology_counts)
+
+#contingency table
+library(tidyr)
+contingency_table <- subset_data %>%
+  distinct(participant_id, histology_category, dtc_ever) %>% # Ensure each patient is counted once
+  count(histology_category, dtc_ever) %>%
+  pivot_wider(names_from = dtc_ever, values_from = n, values_fill = list(n = 0)) # Pivot the table to get dtc_ever as columns
+
+# 3. Perform the Chi-squared test of independence
+chisq_test <- chisq.test(contingency_table[,-1]) # Remove the histology_category column for the test
+
+# 4. Print the contingency table
+print(contingency_table)
+
+# 5. Print the result of the Chi-squared test (p = 0.03) ### More ductal positive generally, compared to all histologies
+print(chisq_test)
+
+
+
+#### Stage -- N stage --> come back to this N stage stuff
+
+table(subset_data$participant_id, subset_data$final_n_stage) #we have 0, 1, 2, 3 (N0, N1, N2, N3)
+
+nodal_summary <- subset_data %>%
+  distinct(participant_id, final_n_stage) %>% # Get unique participant-stage combinations
+  group_by(final_n_stage) %>% # Group by stage
+  summarise(count = n()) # Count the number of participants per N stage
+
+#View the summary table --adds up to 109, 46 = pN0 63 = pN1
+print(nodal_summary)
+
+subset_data_by_id <- subset_data %>%
+  filter(final_n_stage %in% c(0, 1, 2, 3)) %>% # Include only relevant nodal stages
+  group_by(participant_id) %>%
+  summarise(
+    nodal_status = first(final_n_stage), # Use final_n_stage as nodal_status for each participant
+    dtc_ever = first(dtc_ever), # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 3: Create a contingency table of nodal_status vs dtc_ever
+contingency_table <- table(subset_data_by_id$nodal_status, subset_data_by_id$dtc_ever)
+
+# Step 4: Print the table to check for zero or small cells, which could affect test validity
+print(contingency_table)
+
+# Step 5: Perform Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Step 6: Print the Chi-squared test result (p = 0.0001)
+print(chisq_test)
+
+
+#### Creating Node - vs node + variable from summary variable
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    node_status = ifelse(first(final_n_stage) == 0, "Node Negative", "Node Positive"), # Node negative if 0, positive otherwise
+    dtc_ever = first(dtc_ever), # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of node_status vs dtc_ever
+contingency_table <- table(subset_data_by_id$node_status, subset_data_by_id$dtc_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Step 4: Print the contingency table and Chi-squared test results
+print(contingency_table)
+print(chisq_test)
+
+
+####### EXTRA CODE/CONFIRMATION / slightly different but ignore for our analysis
+#cross-check with indicator pN0 in our data that reflects nodal positivity.... there is 1 patient that is node - by summary variable but node + by indicator variable
+## should double check this at some point
+node_pos <- subset_data %>%
+  distinct(participant_id, inc_dx_crit_list___2) %>% # Get unique participant-indicator combinations
+  group_by(inc_dx_crit_list___2) %>% # Group by the nodal indicator
+  summarise(count = n()) # Count the number of participants per indicator value
+
+print(node_pos)
+
+contingency_table <- subset_data %>%
+  distinct(participant_id, inc_dx_crit_list___2, dtc_ever) %>% # Ensure unique participants
+  count(inc_dx_crit_list___2, dtc_ever) %>% # Count occurrences
+  spread(key = dtc_ever, value = n, fill = 0) # Spread data into a matrix
+
+# View the contingency table
+print(contingency_table)
+
+# Perform the Chi-square test (p = 0.3902)
+chi_square_result <- chisq.test(contingency_table[, -1]) # Exclude the first column with the levels
+print(chi_square_result)
+
+
+
+
+
+#######t stage final_t_stage
+
+table(subset_data$final_t_stage) #we have values in 1,2,3,4, and 99 (99 = pTx) so can proceed with this
+
+t_summary <- subset_data %>%
+  distinct(participant_id, final_t_stage) %>% # Get unique participant-stage combinations
+  group_by(final_t_stage) %>% # Group by stage
+  summarise(count = n()) # Count the number of participants per T stage
+
+# View the summary table --adds up to 109
+print(t_summary)
+
+
+#### T stage, for our T stage table, will use T1 vs T2 or greater to simplify
+#exclude 99 (the pTx)
+subset_data_clean <- subset_data %>%
+  filter(final_t_stage != 99, dtc_ever != 99)
+
+# Combine final_t_stage into T1 vs. T2 or greater
+subset_data_clean <- subset_data_clean %>%
+  mutate(final_t_stage_combined = ifelse(final_t_stage == 1, "T1", "T2 or greater"))
+
+# Summarize the data by participant_id after creating the new combined t_stage
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    final_t_stage_combined = first(final_t_stage_combined), # Get the combined T stage for each participant
+    dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of final_t_stage_combined vs dtc_ever
+contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results
+print(contingency_table)
+print(chisq_test)
+
+#### Try T stage stats using T3 or greater as the cutoff -- not very useful, so DO NOT USE THIS FOR THE TABLE
+
+#exclude 99 (the pTx)
+subset_data_clean <- subset_data %>%
+  filter(final_t_stage != 99, dtc_ever != 99)
+
+# Combine final_t_stage into T1/T2 or T3 or greater
+subset_data_clean <- subset_data_clean %>%
+  mutate(final_t_stage_combined = case_when(
+    final_t_stage == 1 | final_t_stage == 2 ~ "T1 or T2", # Group T1 and T2 together
+    final_t_stage >= 3 ~ "T3 or greater", # Group T3 and higher as a separate category
+    TRUE ~ NA_character_ # Handle any unexpected values
+  ))
+
+
+# Summarize the data by participant_id after creating the new combined t_stage
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    final_t_stage_combined = first(final_t_stage_combined), # Get the combined T stage for each participant
+    dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of final_t_stage_combined vs dtc_ever
+contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> not significant, so ignore this
+print(contingency_table)
+print(chisq_test)
+
+
+
+########stage of disease -- final_overall_stage
+
+table(subset_data$final_overall_stage) #we have values in 1,2,3,4, and 99 so can proceed with this
+
+stage_summary <- subset_data %>%
+  distinct(participant_id, final_overall_stage) %>% # Get unique participant-stage combinations
+  group_by(final_overall_stage) %>% # Group by stage
+  summarise(count = n()) # Count the number of participants per stage
+
+# View the summary table --adds up to 109 (1 is 99); most are stage II, some stage I, some stage III, no stage IV (yay)
+print(stage_summary)
+
+#exclude the 99
+subset_data_clean <- subset_data %>%
+  filter(final_overall_stage != 99, dtc_ever != 99)
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    final_overall_stage = first(final_overall_stage), # Get the final_overall_stage for each participant
+    dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of final_overall_stage vs dtc_ever
+contingency_table <- table(subset_data_by_id$final_overall_stage, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> kind of interesting, stage doesn't seem to predict DTC positivity (p = 0.80)
+print(contingency_table)
+print(chisq_test)
+
+
+
+
+###########surgical treatment diag_surgery_type_1 (or diag_surgery_type_2)
+
+
+table(subset_data$diag_surgery_type_1) #1 = partial mastectomy/lumpectomy, 2 = mastectomy; no missingness
+
+surgery <- subset_data %>%
+  distinct(participant_id, diag_surgery_type_1) %>% # Get unique participant-surgery combinations
+  group_by(diag_surgery_type_1) %>% # Group by surgery type
+  summarise(count = n()) # Count the number of participants per surgery type
+
+# View the summary table
+print(surgery)
+
+
+# Summarize the data by participant_id
+# (subset_data_clean at this point still excludes final_overall_stage == 99)
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    surgery = first(diag_surgery_type_1), # Get the surgery type for each participant
+    dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of surgery type vs dtc_ever
+contingency_table <- table(subset_data_by_id$surgery, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.48....
+print(contingency_table)
+print(chisq_test)
+
+
+
+######## axillary management. Really just care about whether or not someone got an axillary dissection: diag_axillary_type___2_1 (or diag_axillary_type___2_2 IF 2 biopsy forms)
+
+table(subset_data$diag_axillary_type___2_1)
+table(subset_data$diag_axillary_type___2_2) #a handful of axillary dissections on the second biopsy form, so i think we want to combine these two
+
+# Create a binary variable to identify participants who had axillary dissection
+# (superseded by the case_when() version below; ifelse() returns NA if a value is missing and neither variable is 1)
+subset_data_clean <- subset_data %>%
+  mutate(axillary_dissection = ifelse(diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1, 1, 0))
+
+# Ensure every participant has a dtc_ever and axillary_dissection value
+# Ensure every patient has an axillary dissection category, where 0 means no axillary dissection, and 1 means they had one
+subset_data_clean <- subset_data %>%
+  mutate(axillary_dissection = case_when(
+    diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1 ~ 1, # Had axillary dissection
+    TRUE ~ 0 # No axillary dissection (includes missing values)
+  ))
+
+# Summarize the data by participant_id, including the axillary_dissection and dtc_ever variables
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    axillary_dissection = first(axillary_dissection), # Get the axillary dissection status for each participant
+    dtc_ever = first(dtc_ever) # Get the dtc_ever status for each participant
+  )
+
+contingency_table <- table(subset_data_by_id$axillary_dissection, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test (and Fisher's exact test for comparison)
+chisq_test <- chisq.test(contingency_table)
+fishers <- fisher.test(contingency_table)
+print(fishers)
+
+# Print the contingency table and Chi-squared test results --> p-value 0.173 (used chisq for consistency...)
+print(contingency_table)
+print(chisq_test)
+
+####inflammatory inflamm_yn -- IGNORE THIS for Table 1
+table(d$inflamm_yn_1) ### i think the inflammatory folks must just not be in the subset of patients (6 yes's in the overall cohort, but none in the subset data for either inflamm variable)
+table(d$inflamm_yn_2) ### I think inflammatory folks just not in subset of patients in the dtc cohort
+table(subset_data$inflamm_yn)
+
+#### radiation prtx_radiation
+table(subset_data$prtx_radiation)
+
+radiation <- subset_data |>
+  distinct(participant_id, prtx_radiation) |>
+  group_by(prtx_radiation) |> # Group by radiation status
+  summarise(count = n()) # Count the number of participants per radiation status
+
+# View the summary table
+print(radiation)
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    radiation = first(prtx_radiation), # Get the radiation status for each participant
+    dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of radiation vs dtc_ever
+contingency_table <- table(subset_data_by_id$radiation, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test (and Fisher's exact test for comparison)
+chisq_test <- chisq.test(contingency_table)
+fishers <- fisher.test(contingency_table)
+print(fishers)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.77
+print(contingency_table)
+print(chisq_test)
+
+
+#### chemotherapy prtx_chemo
+table(subset_data$prtx_chemo)
+
+chemo <- subset_data |>
+  distinct(participant_id, prtx_chemo) |>
+  group_by(prtx_chemo) |> # Group by chemotherapy status
+  summarise(count = n()) # Count the number of participants per chemotherapy status
+
+# View the summary table
+print(chemo) #3 people did not get chemo in this cohort
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    chemo = first(prtx_chemo), # Get the chemotherapy status for each participant
+    dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of chemo vs dtc_ever
+contingency_table <- table(subset_data_by_id$chemo, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test (and Fisher's exact test for comparison)
+chisq_test <- chisq.test(contingency_table)
+fishers <- fisher.test(contingency_table)
+print(fishers)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.60
+print(contingency_table)
+print(chisq_test)
+
+
+
+####neoadjuvant chemo diag_neoadj_chemo_1 or diag_neoadj_chemo_2
+
+table(subset_data$diag_neoadj_chemo_1)
+table(subset_data$diag_neoadj_chemo_2) #no one was recorded on the second biopsy form as having neoadjuvant therapy, so we can use only the first variable
+
+nact <- subset_data |>
+  distinct(participant_id, diag_neoadj_chemo_1) |>
+  group_by(diag_neoadj_chemo_1) |> # Group by NACT status
+  summarise(count = n()) # Count the number of participants per NACT status
+
+# View the summary table
+print(nact) #19 people got NACT in this cohort (matches the pCR counts below)
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    nact = first(diag_neoadj_chemo_1), # Get the NACT status for each participant
+    dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of nact vs dtc_ever
+contingency_table <- table(subset_data_by_id$nact, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.37, a slightly stronger trend than with ctDNA
+print(contingency_table)
+print(chisq_test)
+
+
+####hormone therapy prtx_endo
+
+table(subset_data$prtx_endo)
+
+endo <- subset_data |>
+  distinct(participant_id, prtx_endo) |>
+  group_by(prtx_endo) |> # Group by endocrine therapy status
+  summarise(count = n()) # Count the number of participants per endocrine therapy status
+
+# View the summary table
+print(endo) #most ppl did get endo (62 of the 109)
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    endo = first(prtx_endo), # Get the endocrine therapy status for each participant
+    dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of endo vs dtc_ever
+contingency_table <- table(subset_data_by_id$endo, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.50
+print(contingency_table)
+print(chisq_test)
+
+
+
+
+####bone modifying agents prtx_bonemod
+
+table(subset_data$prtx_bonemod)
+
+bonemod <- subset_data |>
+  distinct(participant_id, prtx_bonemod) |>
+  group_by(prtx_bonemod) |> # Group by bone-modifying agent status
+  summarise(count = n()) # Count the number of participants per bone-modifying agent status
+
+# View the summary table
+print(bonemod) #39 got bone-modifying agents
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    bonemod = first(prtx_bonemod), # Get the bone-modifying agent status for each participant
+    dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of bonemod vs dtc_ever
+contingency_table <- table(subset_data_by_id$bonemod, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 1
+print(contingency_table)
+print(chisq_test)
+
+
+
+####referred to trial fu_trial_yn --> this variable seems to have disappeared. I can re-make it based on fu_trial_pid
+#### NOTE: summarise trial status from subset_data, not subset_data_clean. subset_data_clean was created
+#### before `trial` was added, so first(trial) there picks up a global object named trial instead of a column
+names(d)
+
+table(subset_data$fu_trial_yn) #this variable does not exist in our data set
+
+subset_data <- subset_data %>%
+  mutate(trial = ifelse(!is.na(fu_trial_pid) & fu_trial_pid != "", "Yes", "No"))
+print(subset_data$trial)
+
+
+trial_counts <- subset_data |>
+  distinct(participant_id, trial) |>
+  group_by(trial) |> # Group by trial yes/no
+  summarise(count = n()) # Count the number of participants per trial status
+
+# View the summary table
+print(trial_counts) #38 pts went on trial based on fu_trial_pid
+
+# Summarize the data by participant_id (from subset_data, which has the trial column)
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    trial = first(trial), # Get trial status for each participant
+    dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of trial vs dtc_ever
+contingency_table <- table(subset_data_by_id$trial, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results
+print(contingency_table)
+print(chisq_test)
+
+
+
+
+
+# Later
+#2 = non-pcr, 1 = pcr
+#path cr diag_pcr_1 or diag_pcr_2
+table(subset_data$diag_pcr_1)
+table(subset_data$diag_pcr_2) #none recorded here, so can just use pcr_1
+
+pcr <- subset_data %>%
+  mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) %>% # Convert "." to NA
+  filter(!is.na(diag_pcr_1)) %>% # Exclude rows where diag_pcr_1 is NA
+  distinct(participant_id, diag_pcr_1) %>%
+  group_by(diag_pcr_1) %>%
+  summarise(count = n()) # Count the number of participants per pCR status
+
+# View the summary table
+print(pcr) #this is correct bc 19 ppl got NACT, and we have 19 ppl recorded for pcr data
+
+# Summarize the data by participant_id
+# NOTE: diag_pcr_1 is not recoded here; if it contains "." values, they form their own row in the
+# table below. Applying na_if(diag_pcr_1, ".") first, as above, would restrict the test to NACT patients
+subset_data_by_id <- subset_data_clean %>%
+  group_by(participant_id) %>%
+  summarise(
+    pcr = first(diag_pcr_1), # Get the pCR status for each participant
+    dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of pcr vs dtc_ever
+contingency_table <- table(subset_data_by_id$pcr, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.86 -- there does not seem to be an association among those who got pcr (but we also have a group with 1 in it...)
+print(contingency_table)
+print(chisq_test)
+
+
+
+########recurrence
+#local first, then distant, then create a summary variable of either locreg or distant
+#local fu_locreg_prog
+
+# Step 1: Summarize data by unique participant_id
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    fu_locreg_prog = first(fu_locreg_prog), # Get fu_locreg_prog status for each participant
+    dtc_ever = first(dtc_ever), # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of fu_locreg_prog vs dtc_ever
+contingency_table <- table(subset_data_by_id$fu_locreg_prog, subset_data_by_id$dtc_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Step 4: Print the contingency table and Chi-squared test results -- p-val of 0.74, less of an association (but pts on trial)
+print(contingency_table)
+print(chisq_test)
+
+####looking at sites of locoregional progression: local site fu_locreg_site_num or fu_locreg_site_char
+### Just want to look at site distribution here
+
+# Summarize the distribution of fu_locreg_site_char by unique participant_id
+site_distribution <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    site = first(fu_locreg_site_char), # Get the site for each unique participant
+    .groups = "drop"
+  ) %>%
+  count(site) # Count the occurrences of each site
+
+# View the distribution of sites -- mostly axillary nodes: 4 of 6 axillary, 1 internal mammary, 2 supraclavicular, 2 ipsilateral breast
+print(site_distribution)
+
+#####distant recurrence: distant fu_dist_prog
+
+# Step 1: Summarize data by unique participant_id
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    fu_dist_prog = first(fu_dist_prog), # Get fu_dist_prog status for each participant
+    dtc_ever = first(dtc_ever), # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of dist prog vs dtc_ever --> 12 who had distant progression
+contingency_table <- table(subset_data_by_id$fu_dist_prog, subset_data_by_id$dtc_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Step 4: Print the contingency table and Chi-squared test results -- p-val 0.63
+print(contingency_table)
+print(chisq_test)
+
+
+### Distant sites
+#distant site fu_dist_site_num #fu_dist_site_char -- start just looking at the locations
+
+# Summarize the distribution of fu_dist_site_char by unique participant_id
+dist_site_distribution <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    site = first(fu_dist_site_char), # Get the site for each unique participant
+    .groups = "drop"
+  ) %>%
+  count(site) # Count the occurrences of each site
+
+# View the distribution of sites -- bone in 7, liver in 3, lung in 2, 1 intra-abdominal
+print(dist_site_distribution)
+
+#any recurrence
+#either fu_locreg_prog or fu_dist_prog
+
+subset_data <- subset_data %>%
+  mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No"))
+
+# link by participant id
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant
+    dtc_ever = first(dtc_ever), # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs dtc_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results -- total 14 relapses: 10 were DTC-negative, 4 were DTC-positive
+print(contingency_table)
+print(chisq_test)
+
+#### Relapse and DTC
+#using ever_relapsed
+
+# link by participant id
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant
+    dtc = first(dtc_ever), # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs dtc_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results
+print(contingency_table)
+print(chisq_test)
+
+# Identify participants missing data in either `ever_relapsed` or `dtc_ever`
+missing_data <- subset_data_by_id %>%
+  filter(is.na(ever_relapsed) | is.na(dtc))
+
+# Print the IDs of participants with missing data
+print(missing_data$participant_id) #two individuals do not have relapse data (17-021, 18-032, as we had seen above)
+
+
+
+####survival analysis fu_surv
+
+table(subset_data$fu_surv)
+
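# Several tables in this script have very small cells (for example, only a
# handful of deaths or relapses). For such tables chisq.test() warns
# "Chi-squared approximation may be incorrect", which it does when any
# expected count is below 5. A minimal sketch, assuming that same rule of
# thumb, of switching to Fisher's exact test for those tables. The helper
# name association_test() and the toy counts are hypothetical, not study
# results.
association_test <- function(tab) {
  # expected counts under independence (silence the warning being checked for)
  expected <- suppressWarnings(chisq.test(tab)$expected)
  if (any(expected < 5)) {
    fisher.test(tab)   # exact test for sparse tables
  } else {
    chisq.test(tab)    # large-sample approximation is reasonable here
  }
}

# Hypothetical 2x2 table: survival status by DTC-ever status
toy_tab <- matrix(c(40, 2, 60, 3), nrow = 2,
                  dimnames = list(surv = c("Alive", "Dead"),
                                  dtc_ever = c("0", "1")))
res <- association_test(toy_tab)
res$method   # "Fisher's Exact Test for Count Data"
res$p.value
# If tests are mixed like this, noting which test was used for each row of
# Table 1 keeps the reported p-values transparent.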
+surv <- subset_data %>% + distinct(participant_id, fu_surv) %>% + group_by(fu_surv) %>% + summarise(count = n()) # Count the number of participants per survival status + +# View the summary table +print(surv) #dead = 5, alive = 103, and 1 NA patient (identified below). + +na_participant <- subset_data %>% + filter(is.na(fu_surv)) %>% + select(participant_id, fu_surv) + +# Print the result -- 28115-17-021 has no follow-up data in REDCap; everyone else in the DTC cohort has some survival data. +print(na_participant) + +# Summarize data by unique participant_id +subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% + summarise( + surv = first(fu_surv), # Get survival status for each participant + dtc_ever = first(dtc_ever), # Get dtc_ever status for each participant + .groups = "drop" + ) + +# Create a contingency table of surv vs dtc_ever +contingency_table <- table(subset_data_by_id$surv, subset_data_by_id$dtc_ever) + +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) + +# Print the contingency table and Chi-squared test results +print(contingency_table) +print(chisq_test) + +########## + +### looking at the population of ctDNA + and other testing + +###### Messing around with code to look at ctDNA and relapse + +# Filter for participants with ctDNA ever positive +ctDNA_ever_participants <- subset(subset_data, ctDNA_ever == 1) + +# Select relevant columns (participant_id, timepoint, test results, relapse status) +ctDNA_test_results <- ctDNA_ever_participants %>% + select(participant_id, timepoint, ctDNA_ever, ctDNA_detected, collection_date, dtc_ever, ever_relapsed, fu_locreg_date, fu_dist_date, fu_date_to) + +# View results +print(ctDNA_test_results, n=26) + +#### We do NOT see clearance in the 1 patient who has not recurred, 21-024 (but that may just be because we do not have any follow-up testing on that patient); we do have f/u data on that
patient and +#### she has not recurred as of 10/2024. + +### for the patients who went on to CLEVER -- we see clearance (17-032 cleared after several) + + +#### ctDNA positivity graph #### this was not working; code moved to the extra code doc, and Nick is putting it together + + +#### swimmers + +library(khroma) +library(here) +library(dplyr) + +bright <- khroma::color("bright") +light <- khroma::color("light") +hicon <- khroma::color("high contrast") + +# cols <- khroma::color("vibrant")(4) +cols <- hicon(3) +her2col <- "forestgreen" +hrcol <- "forestgreen" + +d <- read.csv(file = here("..", "Datasets from BAC (Colleen and Jean)", + "surmount184_merged_20241108.csv")) + +# Step 1: Identify all participant_ids where ctdna_cohort == 1 +valid_participants <- d |> + filter(ctdna_cohort == 1) |> + pull(participant_id) |> + unique() + +# Step 2: Subset the data to include all rows where participant_id is in the +# valid list +subset_data <- d |> + filter(participant_id %in% valid_participants) + +# Count the number of unique participant_ids in the subset_data +unique_count <- subset_data |> + summarise(unique_participants = n_distinct(participant_id)) |> as.numeric() + +names(subset_data) + +######create the ctDNA Ever positive variable + +# Create the 'ctDNA_ever' variable: This will be TRUE if ctDNA_detected was TRUE for +# any record for the participant, otherwise FALSE. +subset_data <- subset_data %>% + group_by(participant_id) %>% + mutate(ctDNA_ever = if_else(any(ctDNA_detected == TRUE), TRUE, FALSE)) %>% + ungroup() + +# View the updated data +head(subset_data) +table(subset_data$participant_id, subset_data$ctDNA_ever) + +#100 negatives, 9 positives, as it should be!
for ctDNA_ever +subset_data |> + group_by(participant_id) |> + summarize(ctDNA_ever = first(ctDNA_ever)) |> + count(ctDNA_ever) + +####### DTCS -- create DTC_ever -- come back to this ######## do the same thing +#for DTCs --ever DTC positive --> this is a little wonky, want to ensure we +#aren't actually eliminating any of the DTC results in our subsetting... + +names(subset_data) +#looking at the names of variables to find the DTC indicator variable +library(stringr) + +#variable is dtc_ihc_result_final +unique(subset_data$dtc_ihc_result_final) #0, NA, 1 +table(subset_data$dtc_ihc_result_final) +# 152 zeros, 41 positives; NAs are not shown because table() uses useNA = "no" by default + +subset_data <- subset_data |> + group_by(participant_id) |> + mutate(dtc_ever = if_else(any(dtc_ihc_result_final == 1, na.rm = TRUE), 1, 0)) |> + ungroup() + +# View the updated data +table(subset_data$participant_id, subset_data$dtc_ever) +# there are still some "NAs" --> is this just because some people have ctDNA +# results but no DTC results in this cohort?
+ + +#70 DTC ever negatives, 39 positives +subset_data |> + group_by(participant_id) |> + summarize(dtc_ever = first(dtc_ever)) |> + count(dtc_ever) + +subset_data <- subset_data %>% + mutate(trial = ifelse(!is.na(fu_trial_pid) & fu_trial_pid != "", TRUE, FALSE)) + +dsub <- subset(subset_data, + !(ID %in% c(17021, 18032)), + select = c("ID", "timepoint", "org_consent_date", "trial", + "final_receptor_group", + "collection_date", "ctDNA_detected", + "dtc_ihc_date_final", "dtc_ihc_result_final", + "fu_date_to", "fu_date_death", + "fu_locreg_date", "fu_dist_date")) + +dsub$collection_date <- as.Date(dsub$collection_date, format = "%d%b%Y") +dsub$dtc_ihc_date_final <- as.Date(dsub$dtc_ihc_date_final, + format = "%d%b%y:%H:%M:%S") +dsub[, c("org_consent_date", "fu_date_to", "fu_date_death", + "fu_locreg_date", "fu_dist_date")] <- + lapply(dsub[, c("org_consent_date", "fu_date_to", "fu_date_death", + "fu_locreg_date", "fu_dist_date")], + \(x) as.Date(x, format = "%m/%d/%Y")) + +key_dates <- subset(dsub, timepoint == "SURMOUNT-Baseline") +key_dates$start <- pmin(key_dates$collection_date, + key_dates$dtc_ihc_date_final, na.rm = T) +key_dates$end <- pmin(key_dates$fu_date_to, key_dates$fu_date_death, + na.rm = T) +key_dates$time_on_study <- as.numeric(key_dates$end - key_dates$start) +key_dates$death_time <- as.numeric(key_dates$fu_date_death - key_dates$start) + +key_dates$rank <- NA +key_dates$rank[key_dates$trial] <- + rank(key_dates$time_on_study[key_dates$trial], na.last = F) + +key_dates$rank[!key_dates$trial] <- + rank(key_dates$time_on_study[!key_dates$trial], na.last = F) + + max(key_dates$rank[key_dates$trial]) + 3 + +key_dates$locreg <- as.numeric(key_dates$fu_locreg_date - key_dates$start) +key_dates$dist <- as.numeric(key_dates$fu_dist_date - key_dates$start) +key_dates$prog <- with(key_dates, pmin(locreg, dist, na.rm = T)) + +dsub <- merge(dsub, subset(key_dates, + select = c("ID", "start", "end", + "time_on_study", "death_time", "rank", + "locreg", "dist", 
"prog")), + by = "ID", all = T) + +dsub$DTC_days <- as.numeric(dsub$dtc_ihc_date_final - dsub$start) +dsub$ctDNA_days <- as.numeric(dsub$collection_date - dsub$start) + +# Set up a blank plotting space without any points +# Set up a blank plotting space without any points +plot.new() +par(mar = c(4, 1, 1, 4) + 0.1) +plot.window(xlim = c(-200, 3000), ylim = c(0, 107)) + +# Draw simple lines for each patient's time on study +segments(x0 = 0, y0 = key_dates$rank, y1 = key_dates$rank, + x1 = key_dates$time_on_study, col = "#e7e7e7", lwd = 1) + +# Add x-axis with year labels +axis(side = 1, at = 0:8 * 365.25, labels = 0:8, line = 0) + +# Add x-axis label +mtext("Years on Study", side = 1, line = 2.5, cex = 1) + +# Add labels on the y-axis +mtext("DTC Positive", side = 2, line = 0, + at = mean(key_dates$rank[key_dates$trial]), las = 0) +mtext("DTC Negative", side = 2, line = 0, + at = mean(key_dates$rank[!key_dates$trial]), las = 0) + +# No frame around the plot +box(which = "plot", lty = "blank") + +# Receptor Status +dsub$hrPos <- dsub$final_receptor_group %in% c(2, 3) +dsub$her2Pos <- dsub$final_receptor_group %in% c(3, 4) +points(x = c(rep(-200, nrow(key_dates)), rep(-100, nrow(key_dates))), + y = rep(key_dates$rank, 2), + col = c(rep(her2col, nrow(key_dates)), rep(hrcol, nrow(key_dates))), + pch = 22, cex = .5, + lwd = .5) +points(x = rep(-100, sum(dsub$her2Pos)), y = dsub$rank[dsub$her2Pos], pch = 22, + col = her2col, bg = her2col, cex = .5) +points(x = rep(-200, sum(dsub$hrPos)), y = dsub$rank[dsub$hrPos], pch = 22, + col = hrcol, bg = hrcol, cex = .5) +text(x = -50, y = 109.5, "HER2+", pos = 3, cex = .5, xpd = NA, adj = 0) +text(x = -250, y = 109.5, "HR+", pos = 3, cex = .5, xpd = NA, adj = 1) + +# Recolor segments +dsub_ctDNApos <- subset(dsub, ctDNA_detected == "TRUE") +dsub_ctDNApos <- + do.call(rbind, + lapply(split(dsub_ctDNApos, dsub_ctDNApos$ID), \(x) { + x[which.min(x$ctDNA_days), ] + })) + +with(subset(dsub_ctDNApos, !is.na(prog)), + segments(x0 = 
ctDNA_days, + x1 = prog, + y0 = rank, + col = cols[3], lwd = 1)) + +with(subset(key_dates, !is.na(prog)), + segments(x0 = prog, + x1 = time_on_study, + y0 = rank, + col = "darkgray", lwd = 1)) + +# Deaths +points(x = key_dates$death_time, y = key_dates$rank, pch = 4, lwd = 1) + +# Locoregional Recurrence +points(x = key_dates$locreg, y = key_dates$rank, pch = 8, lwd = 2, + col = cols[1], cex = .6) + +# Distant Recurrence +points(x = key_dates$dist, y = key_dates$rank, pch = 3, lwd = 2, + col = cols[1], cex = .6) + +# ctDNA Testing +## ctDNA- +with(subset(dsub, ctDNA_detected == "FALSE"), + points(x = jitter(ctDNA_days), y = rank, pch = 23, + col = cols[3])) + +## ctDNA+ +with(subset(dsub, ctDNA_detected == "TRUE"), { + points(x = jitter(ctDNA_days), y = rank, pch = 23, # Diamond shape + col = "red", bg = "red", cex = 1) +}) + +#############3 +#KEY + +# Legend +legend(x = 2700, y = 65, + legend = c("ctDNA+", "Locoreg Recur", + "Dist Recur", "Death", + "ctDNA+ to Recurrence", + "Recurrence to End F/U"), + pch = c(23, 8, 3, 4, NA, NA), + lwd = c(rep(NA, 5), 2, 2), + col = c(cols[c(3, 2, 1, 1)], "black", cols[3], "darkgray"), + pt.lwd = c(rep(1, 4), rep(2, 3)), + pt.bg = c(paste0(cols[3], "88"), + paste0(cols[2], "88")), + xpd = NA, cex = .7) + + +legend(x = 2700, y = 65, + legend = c("ctDNA+", "Locoreg Recur", "Dist Recur", "Death", + "ctDNA+ to Recurrence", "Recurrence to End F/U"), + pch = c(23, 8, 3, 4, NA, NA), # Symbols: diamond, star, plus, cross + lwd = c(rep(NA, 4), 2, 2), # Line width for segments + col = c("red", cols[1], cols[1], "black", cols[3], "darkgray"), # Outline colors + pt.bg = c("red", NA, NA, NA, NA, NA), # Fill color, "red" for ctDNA+ + pt.cex = c(1, 1, 1, 1, NA, NA), # Point sizes for shapes + pt.lwd = c(1, 1, 1, 1, 2, 2), # Point line widths + xpd = NA, cex = 0.7) + +#### Extra Code + + +# DTC Testing +## DTC- +with(subset(dsub, dtc_ihc_result_final == 0), + points(x = jitter(DTC_days), y = rank, pch = 24, + col = cols[2])) + +## DTC+/ctDNA- 
+with(subset(dsub, dtc_ihc_result_final == 1), #& ctDNA_detected == "FALSE"), + points(x = DTC_days, y = rank, pch = 24, + col = cols[2], bg = paste0(cols[2], "88"), + cex = 0.6)) + +## ctDNA+/DTC+ +with(subset(dsub, ctDNA_detected == "TRUE" & dtc_ihc_result_final == 1), + points(x = ctDNA_days, y = rank, pch = 24, + col = cols[5], bg = paste0(cols[5], "90"))) +# +with(subset(dsub, ctDNA_detected == "TRUE" & dtc_ihc_result_final == 1), + points(x = ctDNA_days, y = rank, pch = 25, + col = cols[5], bg = paste0(cols[5], "88"))) + +# Legend +legend(x = 2700, y = 65, + legend = c("ctDNA+", "DTC+", "Locoreg Recur", + "Dist Recur", "Death", + "ctDNA+ to Recur", + "Recur to End F/U"), + pch = c(23, 24, 8, 3, 4, NA, NA), + lwd = c(rep(NA, 5), 2, 2), + col = c(cols[c(3, 2, 1, 1)], "black", cols[3], "darkgray"), + pt.lwd = c(rep(1, 4), rep(2, 3)), + pt.bg = c(paste0(cols[3], "88"), + paste0(cols[2], "88")), + xpd = NA, cex = .7) + + + + + + ``` -The `echo: false` option disables the printing of code (only output is displayed). 
From 2738b4ff22796e219b6179affc5434a6b77173a0 Mon Sep 17 00:00:00 2001 From: ntaranto Date: Mon, 25 Nov 2024 17:32:13 -0500 Subject: [PATCH 03/14] Update and rename FinalProject.qmd to FinalProjectTaranto.qmd --- FinalProject.qmd => FinalProjectTaranto.qmd | 592 ++++++-------------- 1 file changed, 171 insertions(+), 421 deletions(-) rename FinalProject.qmd => FinalProjectTaranto.qmd (84%) diff --git a/FinalProject.qmd b/FinalProjectTaranto.qmd similarity index 84% rename from FinalProject.qmd rename to FinalProjectTaranto.qmd index 5d928cf86..a0f816b63 100644 --- a/FinalProject.qmd +++ b/FinalProjectTaranto.qmd @@ -1,89 +1,84 @@ --- -title: "Final Presentation" +title: "Predictors of ctDNA positivity" +subtitle: "BMIN503/EPID600 Final Project" +author: "Eleanor Taranto" format: html editor: visual +number-sections: true +embed-resources: true --- -## Final Project Overview: Instructions -- The overview consists of 2-3 sentences summarizing the project and goals. +------------------------------------------------------------------------ +## Overview {#sec-overview} -## Introduction: +Link to my final project: https://github.com/ntaranto/BMIN503_Final_Project -After treatment for early stage breast cancer, recurrence remains a problem in up to 30% of individuals, in spite of these patients completing definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy) for their early stage breast cancer. It remains unclear who will experience disease recurrence and who will not, and whether we can identify biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA as risk factors for recurrence. In the SURMOUNT study, we have followed patients who have undergone definitive treatment for their early stage breast cancer for recurrence, and also obtained bone marrow and peripheral blood assessment, looking for disseminated tumor cells in the bone marrow and circulating tumor DNA in the blood. 
+After treatment for early-stage breast cancer, recurrence remains a problem in up to 30% of individuals, even though these patients have completed definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy). It remains unclear who will experience disease recurrence and who will not, and whether biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA (ctDNA) can serve as risk factors for recurrence. Historically, and in retrospective observational studies, these markers of minimal residual disease (MRD) have shown strong associations with disease relapse. It remains unclear, however, whether these associations persist under modern treatment paradigms, what the time course of positivity is, and which clinical risk factors most strongly predict DTC or ctDNA positivity and subsequent relapse. -This is a translational study, and so we will look at clinical risk factors for ctDNA and DTC positivity. +Optimizing detection, intervention, and surveillance for MRD after breast cancer treatment, using highly sensitive, real-time assays to identify patients with dormant tumor cells and those undergoing tumor reactivation in transit to recurrence, would fundamentally change breast cancer follow-up from a paradigm of “watchful waiting” to a proactive approach that enables ongoing monitoring for subclinical disease and suppression or eradication of cells that could metastasize.
Ultimately, this approach could reassure patients with definitively negative MRD testing that they are unlikely ever to relapse; enable effective MRD-based surveillance, detection, and treatment strategies for those in whom MRD is detectable; and prevent recurrence and death from metastatic breast cancer while preserving quality of life for millions of breast cancer survivors around the world. -## Methods-- +In the SURMOUNT study, we have followed patients who completed definitive treatment for early-stage breast cancer for recurrence, and have also collected bone marrow and peripheral blood to look for disseminated tumor cells (DTCs) in the bone marrow by immunohistochemistry (DTC-IHC) and circulating tumor DNA (ctDNA) in the blood. In the overall study, we are assessing the relationship between ongoing MRD surveillance (using standard DTC-IHC and a tumor-informed ctDNA assay) and recurrence, assessing the trajectory of MRD over time, optimizing the type and number of tests needed to predict recurrence, outcomes, and lead time, and evaluating the long-term impact of our prior therapeutic interventions. In this specific analysis, we will examine clinical predictors of ctDNA and DTC positivity to understand the strongest drivers of MRD among patients with high-risk early breast cancer who have finished definitive treatment. To do so, we will analyze the clinical characteristics, DTC and ctDNA results, and clinical follow-up of patients enrolled in the SURMOUNT (Surveillance Markers of Utility for Recurrence after (Neo)adjuvant Therapy for Breast Cancer) study between 2016 and 2021 who had ctDNA assessment performed. -In SURMOUNT, +For this analysis, I have spoken with Dr. Nicholas Seewald, my biostatistical mentor, and Dr.
Angela DeMichele, my epidemiology mentor, a breast oncologist with expertise in biomarkers of recurrence and the PI of the SURMOUNT study. Dr. Seewald has been critical in shaping the multivariable analysis of clinical predictors of ctDNA and DTC positivity, as well as the survival analyses of the association between biomarker positivity and relapse. Dr. DeMichele has been instrumental in designing the SURMOUNT study and the analysis of this initial surveillance cohort, and in thinking more broadly about biomarkers of breast cancer recurrence and dormancy. -```{r} -library(here) -library(dplyr) - d <- read.csv(file = here("..", "Datasets", - "surmount184_merged_20241108.csv")) -names(d) +## Introduction {#sec-introduction} -### I'm not sure what else goes into the methods for this vs results +Breast cancer is the most prevalent cancer because it is both common and treatable, with 5-year survival rates approaching 90%. Of the 14.5 million cancer survivors alive today in the U.S., nearly 1 in 4 is a breast cancer survivor. Unfortunately, up to 30% of patients who receive adjuvant therapy for breast cancer will ultimately experience recurrence and die from their disease, typically as a consequence of metastatic recurrence. Because recurrent breast cancer is incurable, the propensity of cancers to recur after treatment is arguably the most important determinant of clinical outcome.
-``` +Recurrent breast cancers arise from the pool of breast cancer cells that remain after initial treatment, likely in the form of minimal residual disease (MRD): local and disseminated residual tumor cells (RTCs) that survive in their host in a presumed dormant state after treatment of the primary breast cancer. Incurable metastatic disease develops from this persistent pool of residual disease, which results from escape of cells from the primary tumor, intravasation and survival in the circulation, and eventual extravasation and metastatic seeding. Many breast cancers pass through a latent phase in which disseminated tumor cells (DTCs) persist in niches where they may remain dormant for months to decades. These DTCs exist in a temporary, reversible state of cell-cycle arrest (quiescence), from which some cells may eventually reactivate, resume proliferation, and recirculate, at which point tumor-derived DNA may be detectable in the blood as circulating tumor DNA (ctDNA). Longitudinal studies demonstrate that detection of DTCs in the bone marrow is associated with poorer disease-free, breast cancer recurrence-free, and overall survival compared with patients without DTCs. Several mechanisms implicated in this process by preclinical studies are therapeutically targetable, and we have developed several interventional trials, fed by the SURMOUNT surveillance study, that target these DTCs. +In the SURMOUNT surveillance study, patients with early-stage (i.e., curable) but high-risk breast cancer are enrolled and undergo a baseline bone marrow assessment (BMA) for DTCs, as well as peripheral blood collection for retrospective ctDNA assessment. Patients who screen DTC positive, either at baseline or on yearly surveillance BMA, are referred for interventional trials. Patients who screen DTC negative remain in the SURMOUNT surveillance cohort and undergo yearly DTC assessment and peripheral blood collection.
The first interventional trial, CLEVER, completed enrollment in 2021, so this initial analysis of the surveillance cohort focuses on patients enrolled during accrual to this first interventional trial. -## Results +Despite years of progress in breast cancer diagnostics and therapeutics, identifying the individuals at risk of recurrence, and determining how to manage and minimize their elevated risk, remains a challenge. In this study, we seek to assess the clinical validity of DTC and ctDNA assessment and to better understand the population in which these biomarkers may be useful. -```{r} +## Methods {#sec-methods} +“PENN SURMOUNT” is a single-center, prospective, longitudinal cohort study examining MRD biomarkers among patients (pts) within 5 years (y) of breast cancer (BC) diagnosis who had completed all curative treatment except endocrine therapy. Eligible pts must have had: 1) TNBC; 2) HER2+ or HR+ BC with positive lymph nodes (LN) and/or residual disease after neoadjuvant therapy; or 3) HR+ BC with a 21-gene Recurrence Score \>25 and/or a high-risk MammaPrint result. Pts underwent annual bone marrow aspiration (BMA) for DTCs by immunohistochemistry (using the methods of Naume et al.). DTC+ pts went on a therapeutic trial; DTC- pts had up to 5 y of annual BMA and blood testing. ctDNA was assessed retrospectively using the RaDaR assay, which targets pt-specific somatic mutations identified by whole-exome sequencing (WES) of primary tumor tissue.
-#summary variables: final_overall_stage final_t_stage final_n_stage -# final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') final_tumor_grade final_histology -# demo_race_final fu_locreg_site_num (numeric values for local regional site) -# fu_locreg_site_char (character values for local regional site) -# fu_dist_site_num (numeric values for distant site) -# fu_dist_site_char (character values for distant site) -# censor_date (most recent fu_date_to among patients who are alive without local or distant progression) +The ctDNA assessment was performed by NeoGenomics, Inc. on peripheral blood from 109 patients, after development of a bespoke panel from each patient's tumor tissue, and the results were returned to the research team in .csv format, with the last data drop on July 30, 2024. DTC results from bone marrow assessment were entered into the REDCap database through the same date. Clinical and demographic factors, along with follow-up data, were abstracted by the TCE research team through October 2024 and entered into the same REDCap database. The TCE data manager exported the data in mid-October 2024 and merged them with the ctDNA data before hand-off for this analysis. The final locked and merged dataset, "surmount184_merged_20241108.csv", is maintained in the TCE box for the ctDNA analysis, and a copy is stored in FinalProject_files. +**First,** we will import the CSV of final data, "surmount184_merged_20241108.csv". -# identify if 22-021 and 21-033 are in here -participant_check <- c("28115-22-021", "28115-21-033") %in% d$participant_id
+d <- read.csv(file = here("FinalProject_files", + "surmount184_merged_20241108.csv")) +``` -str(d) #to look at what structure variables are in -- > we will need to do some stuff with dates -#timepoint is a character (ok) -- do we need this to be factor? -#collection_date is a character, should convert to date -#eVAF numeric -#mean_VAF numeric -#total variants interger -#ctDNA_detected = character, ok -#ctdna_cohort = integer (but there are some NAs)-- we may want to -#censor_date --> should convert to date -#fu_dist_Date should conver to date -#_fu_locreg_date should convert to date +**Next,** we will limit data to the 109 patients who had ctDNA tested, of the 184 individuals. we will look at the names and structures of the variables in the dataset "d", of which there are 387, the majority of which are clinical variables, but some of which are outcome variables. +```{r} +#looking at the names of the variables, and the structure of the variables. +names(d) +str(d) -####date nonsense, come back to this#### NOT DOING THIS PART -library(dplyr) -#this does not quite work as it turns a bunch into NAs. May need to do this for each individual line based on how the data is structured -date_columns <- grep("date", colnames(d), value = TRUE, ignore.case = TRUE) -date_columns -#d[date_columns] <- lapply(d[date_columns], function(x) as.Date(x, format = "%d/%b/%Y")) -#str(d) #converted most of the date columns to dates +``` + +**Summary variables:** We have a few different important summary variables which we've identified. 
+The key summary variables are: final_overall_stage, final_t_stage, final_n_stage, final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+'), final_tumor_grade, final_histology, demo_race_final, fu_locreg_site_num (numeric values for locoregional site), fu_locreg_site_char (character values for locoregional site), fu_dist_site_num (numeric values for distant site), fu_dist_site_char (character values for distant site), and censor_date (most recent fu_date_to among patients who are alive without local or distant progression). +**Limiting from the overall cohort (184) to the ctDNA cohort**: This merged dataset contains 184 individuals (the overall cohort screened for the CLEVER interventional study on SURMOUNT), but the separate ctDNA csv and the NeoGenomics summary data show that ctDNA was assessed in 109 of them. We therefore need to limit the dataset "d" to this ctDNA cohort, which we will call "subset_data", using the indicator variable "ctdna_cohort". +```{r} +#looking at the names of the variables, and the structure of the variables. +names(d) +str(d) ###### ctDNA to limit to ctDNA cohort (but ok to include NAs as long as they were ever ctDNA cohort == 1) --> shall call this subset_data -# Step 1: Identify all participant_ids where ctDNA_cohort == 1 +# Identify all participant_ids where ctdna_cohort == 1 valid_participants <- d |> filter(ctdna_cohort == 1) |> pull(participant_id) |> unique() -# Step 2: Subset the data to include all rows where participant_id is in the valid list +# Subset the data to include all rows where participant_id is in the valid list subset_data <- d |> filter(participant_id %in% valid_participants) @@ -91,70 +86,85 @@ subset_data <- d |> unique_count <- subset_data |> summarise(unique_participants = n_distinct(participant_id)) -# View the result == 109!! +# View the result == 109!
This is the correct # of patients. unique_count -#now we have the subset_data = the ctDNA cohort (n=109), we can start looking at demographics. First we will do overall, then we will do divided by ctDNA_detected +``` + +Now that subset_data holds the ctDNA cohort (n = 109), we can start looking at demographics, first overall and then stratified by ctDNA_detected. The sample-level table of ctDNA_detected (FALSE = ctDNA not detected; TRUE = ctDNA detected) shows 385 negative and 11 positive samples within the ctDNA cohort. Next, we will create the ctDNA_ever variable, which indicates for each participant_id (the unique study ID) whether that participant ever had ctDNA detected. + +``` {r} #ctDNA_detected = character, ok names(subset_data) ### Excluding the FAILS from this cohort ######create the ctDNA Ever positive variable -table(subset_data$ctDNA_detected) # 2 missing and then FALSE, TRUE - +table(subset_data$ctDNA_detected) #385 FALSE, 11 TRUE # Create the 'ctDNA_ever' variable: # This will be 1 if ctDNA_detected was 1 for any record for the participant, otherwise 0. -subset_data <- subset_data %>% - group_by(participant_id) %>% - mutate(ctDNA_ever = if_else(any(ctDNA_detected == TRUE), TRUE, FALSE)) %>% +subset_data <- subset_data |> + group_by(participant_id) |> + mutate(ctDNA_ever = if_else(any(ctDNA_detected == TRUE), TRUE, FALSE)) |> ungroup() # View the updated data table(subset_data$participant_id, subset_data$ctDNA_ever) -#100 negatives, 9 positives, as it should be! for ctDNA_ever subset_data |> group_by(participant_id) |> summarize(ctDNA_ever = first(ctDNA_ever)) |> count(ctDNA_ever) -####### DTCS -- create DTC_ever -- come back to this ######## -#do the same thing for DTCs --ever dtc positive --> this is a little wonky, want to ensure we aren't actually eliminating any of the dtc results in our subsetting...
+``` + +The summary variable ctDNA_ever shows that 100 individuals were always ctDNA negative and 9 were ever ctDNA positive, which matches our original ctDNA source data. + +**Ever DTC Positive** +Next, we will create a variable indicating whether a participant ever had a positive DTC test. To do this, we will use the final result variable "dtc_ihc_result_final", which records, for a given sample/date, whether the DTC result was positive ("1") or negative ("0"). At the sample level, this dataset contains 221 negatives and 49 positives, which aligns with our prior data and CONSORT diagrams. + +``` {r} names(subset_data) #looking at the names of variables to find the DTC indicator variable library(stringr) -#variable is dtc_ihc_result_final -#dtc_ihc_summary_count -#dtc_final_result_ date - -#looking at the unique counts for the different dtc variables, dtc_ihc_result_final has more than final_result so is the combined one -unique(subset_data$dtc_ihc_result_final) #0, NA, 1 -table(subset_data$dtc_ihc_result_final) # 152 zeros, 41 positives as I can see, where are the NAs? -table(d$dtc_ihc_result_final) -sum(is.na(subset_data$dtc_ihc_result_final)) #128 NAs -sum(is.na(subset_data$FINAL_RESULT)) # 447 NAs - - -unique(subset_data$FINAL_RESULT) -table(subset_data$FINAL_RESULT) #79 people using this one +#final result variable is dtc_ihc_result_final (recorded per sample, not per participant). +#final count for DTCs is dtc_ihc_summary_count +#final result date is dtc_final_result_ date +table(subset_data$dtc_ihc_result_final) #221 negatives, 49 positives +#making the dtc_ever variable subset_data <- subset_data |> group_by(participant_id) |> mutate(dtc_ever = if_else(any(dtc_ihc_result_final == 1, na.rm = TRUE), 1, 0)) |> ungroup() -# View the updated data table(subset_data$participant_id, subset_data$dtc_ever) -#70 DTC ever negatives, 39 positives, correct!
subset_data |> group_by(participant_id) |> summarize(dtc_ever = first(dtc_ever)) |> count(dtc_ever) +``` +Counting by unique participant, we see 70 participants who were never DTC positive and 39 who were ever DTC positive, which aligns with our source data for this ctDNA cohort. + + +Describe the data used and general methodological approach used to address the problem described in the @sec-introduction. Subsequently, incorporate full R code necessary to retrieve and clean data, and perform analysis. Be sure to include a description of code so that others (including your future self) can understand what you are doing and why. + +------------------------------------------------------------------------ + +```{r} +1 + 1 +``` + +## Results {#sec-results} + +You can add options to executable code like this + +```{r} +#| echo: false #disables the printing of code (only output is displayed) ########### Variables to look at for Table 1 ######### @@ -848,49 +858,6 @@ print(contingency_table) print(chisq_test) - -####referred to trial fu_trial_yn --> this variable seems to have disappeared.
I can re-make it based on fu_trial_pid - -names(d) - -table(subset_data$fu_trial_yn) #this variable does not exist in our data set - -subset_data <- subset_data %>% - mutate(trial = ifelse(!is.na(fu_trial_pid) & fu_trial_pid != "", "Yes", "No")) -print(subset_data$trial) - - -trial <- subset_data |> - distinct(participant_id,trial) |> - group_by(trial) |> # Group by trial yes/no - summarise(count = n()) # Count the number of participants per histology type - -# View the summary table -print(trial) #most ppl did get endo (62 of the 109) --> 38 pts went on trial - -# Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% - summarise( - trial = first(trial), # Get the final_overall_stage for each participant - ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant - ) - -# Create a contingency table of final_overall_stage vs ctDNA_ever -contingency_table <- table(subset_data_by_id$trial, subset_data_by_id$ctDNA_ever) - -# Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) - -# Print the contingency table and Chi-squared test results --> p-val = 0.95 -print(contingency_table) -print(chisq_test) - - - - - -Later #2 = non-pcr, 1 = pcr #path cr diag_pcr_1 or diag_pcr_2 table(subset_data$diag_pcr_1) @@ -1055,6 +1022,61 @@ missing_data <- subset_data_by_id %>% # Print the IDs of participants with missing data print(missing_data$participant_id) #two individuals do not have relapse data (17-021, 18-032, as we had seen above) +### look at relapses and ctDNA in the patinets who were dtc+ + +# Filter for DTC-positive patients who have NOT relapsed +dtc_not_recurred <- subset(subset_data, subset_data$dtc_ever == 1 & subset_data$fu_locreg_prog == 0 & subset_data$fu_dist_prog == 0) + +# Summarize the ctDNA positivity status within this subset +table(dtc_not_recurred$ctDNA_ever) + +### timing of ctDNA testing among those DTC + patients who relapsed +library(dplyr) + +# Step 1: Filter 
for DTC-positive participants with any relapse and arrange by date +dtc_relapsed_data <- subset_data %>% + filter(dtc_ihc_result_final == 1, ever_relapsed == 1) %>% + arrange(participant_id, timepoint) + +# Step 2: Calculate overall relapse date as the earliest of locoregional and distant relapse dates +dtc_relapsed_timing <- dtc_relapsed_data %>% + group_by(participant_id) %>% + summarize( + dtc_date = min(timepoint[dtc_ihc_result_final == 1], na.rm = TRUE), + relapse_date = pmin(fu_locreg_date, fu_dist_date, na.rm = TRUE), # Calculate earliest relapse date + .groups = "drop" + ) %>% + filter(!is.na(relapse_date)) # Keep only those with a relapse after DTC+ + +# Step 3: Find ctDNA tests before relapse for these participants +ctDNA_before_relapse <- subset_data %>% + filter(!is.na(ctDNA_test_date)) %>% + inner_join(dtc_relapsed_timing, by = "participant_id") %>% + filter(ctDNA_test_date < relapse_date) %>% + arrange(participant_id, ctDNA_test_date) + +# Step 4: Summarize ctDNA test counts and dates before relapse +ctDNA_summary <- ctDNA_before_relapse %>% + group_by(participant_id) %>% + summarize( + num_ctDNA_tests_before_relapse = n(), + ctDNA_test_dates = list(ctDNA_test_date), + .groups = "drop" + ) + +# View the results +ctDNA_summary + +# Filter for participant 28115-17-025 and select DTC and ctDNA-related columns +patient_results <- subset_data %>% + filter(participant_id == "28115-17-025") %>% + select(participant_id, timepoint, dtc_ever, dtc_ihc_date_final, dtc_ihc_result_final, + ctDNA_ever, collection_date, ctDNA_detected, ever_relapsed, fu_locreg_date, fu_dist_date) + +# View the results +patient_results + + ### look at ever_relapsed by ctDNA subset_data_by_id <- subset_data %>% @@ -1145,7 +1167,7 @@ print(chisq_test) #number of tests (ctDNA) library(dplyr) -# Assuming the status variable is named `ctDNA_status` in d, and then in subset +# Assuming the status variable is named `ctDNA_detected` in d, and then in subset status_summary_d <- d %>% 
group_by(ctDNA_detected) %>% summarise(total_samples = n(), .groups = "drop") @@ -2616,319 +2638,47 @@ chisq_test <- chisq.test(contingency_table) print(contingency_table) print(chisq_test) -########## - -### looking at the population of ctDNA + and other testing - -###### Messing around with code to look at ctDNA and relapse - -# Filter for participants with ctDNA ever positive -ctDNA_ever_participants <- subset(subset_data, ctDNA_ever == 1) - -# Select relevant columns (participant_id, timepoint, test results, relapse status) -ctDNA_test_results <- ctDNA_ever_participants %>% - select(participant_id, timepoint, ctDNA_ever, ctDNA_detected, collection_date, dtc_ever, ever_relapsed, fu_locreg_date, fu_dist_date, fu_date_to) - -# View results# View resultsctDNA_ever -print(ctDNA_test_results, n=26) - -#### We do NOT see clearance in the 1 patient who has not recurred 21-024 (but that may just be because we do not have any follow up testing on that patient)--we do have f/u data on that patient and -#### she has not recurred as of 10/2024. - -### for the patietns who went onto clever -- we see clearance (17-032 cleared after several) - - -#### ctDNA positivity graph #### none of this was working, code in extra code doc. 
nick putting together - - -#### swimmers - -library(khroma) -library(here) -library(dplyr) - -bright <- khroma::color("bright") -light <- khroma::color("light") -hicon <- khroma::color("high contrast") - -# cols <- khroma::color("vibrant")(4) -cols <- hicon(3) -her2col <- "forestgreen" -hrcol <- "forestgreen" - -d <- read.csv(file = here("..", "Datasets from BAC (Colleen and Jean)", - "surmount184_merged_20241108.csv")) +####### Making Table 1 ######### +library(tableone) -# Step 1: Identify all participant_ids where ctDNA_cohort == 1 -valid_participants <- d |> - filter(ctdna_cohort == 1) |> - pull(participant_id) |> - unique() +# Specify variables +continuous_vars <- c("age_summary", "dtc_count", "dtc_ihc_summary_count_final", + "timepoints_per_patient", "total_timepoints", "ctDNA_timepoints", "dtc_timepoints") -# Step 2: Subset the data to include all rows where participant_id is in the -# valid list -subset_data <- d |> - filter(participant_id %in% valid_participants) +categorical_vars <- c("demo_race_final", "final_receptor_group", "HR_status_by_participant", + "final_tumor_grade", "histology_category", "final_n_stage", + "nodal_status", "final_t_stage", "final_t_stage_combined", "final_overall_stage", + "diag_surgery_type_1", "axillary_dissection", "prtx_radiation", "prtx_chemo", + "diag_neoadj_chemo_1", "prtx_endo", "prtx_bonemod", "diag_pcr_1", "fu_locreg_prog", + "fu_locreg_site_char", "fu_dist_prog", "fu_dist_site_char", "ever_relapsed", + "fu_surv", "unique_timepoints") -# Count the number of unique participant_ids in the subset_data -unique_count <- subset_data |> - summarise(unique_participants = n_distinct(participant_id)) |> as.numeric() - -names(subset_data) -######create the ctDNA Ever positive variable - -# Create the 'ctDNA_ever' variable: This will be 1 if ctDNA_detected was 1 for -# any record for the participant, otherwise 0. 
+# Ensure categorical variables are factors with meaningful labels subset_data <- subset_data %>% - group_by(participant_id) %>% - mutate(ctDNA_ever = if_else(any(ctDNA_detected == TRUE), TRUE, FALSE)) %>% - ungroup() + mutate(across(all_of(categorical_vars), as.factor)) -# View the updated data -head(subset_data) -table(subset_data$participant_id, subset_data$ctDNA_ever) +## labels race +subset_data$demo_race_final <- factor(subset_data$demo_race_final, + levels = c("1", "2", "3"), + labels = c("White", "Black", "Asian")) -#100 negatives, 9 positives, as it should be! for ctDNA_ever -subset_data |> - group_by(participant_id) |> - summarize(ctDNA_ever = first(ctDNA_ever)) |> - count(ctDNA_ever) +### labels +subset_data$final_receptor_group <- factor(subset_data$final_receptor_group, + levels = c("ER+", "ER-", "HER2+"), + labels = c("ER Positive", "ER Negative", "HER2 Positive")) -####### DTCS -- create DTC_ever -- come back to this ######## do the same thing -#for DTCs --ever dtc positive --> this is a little wonky, want to ensure we -#aren't actually eliminating any of the dtc results in our subsetting... - -names(subset_data) -#looking at the names of variables to find the DTC indicator variable -library(stringr) - -#variable is dtc_ihc_result_final -unique(subset_data$dtc_ihc_result_final) #0, NA, 1 -table(subset_data$dtc_ihc_result_final) -# 152 zeros, 41 positives as I can see, where are the NAs? - -subset_data <- subset_data |> - group_by(participant_id) |> - mutate(dtc_ever = if_else(any(dtc_ihc_result_final == 1, na.rm = TRUE), 1, 0)) |> - ungroup() - -# View the updated data -table(subset_data$participant_id, subset_data$dtc_ever) -# there are still some "NAs" --> is this just because some people have ctDNA -# results but no dTC results in this cohort? 
- - -#70 DTC ever negatives, 39 positives -subset_data |> - group_by(participant_id) |> - summarize(dtc_ever = first(dtc_ever)) |> - count(dtc_ever) - -subset_data <- subset_data %>% - mutate(trial = ifelse(!is.na(fu_trial_pid) & fu_trial_pid != "", TRUE, FALSE)) - -dsub <- subset(subset_data, - !(ID %in% c(17021, 18032)), - select = c("ID", "timepoint", "org_consent_date", "trial", - "final_receptor_group", - "collection_date", "ctDNA_detected", - "dtc_ihc_date_final", "dtc_ihc_result_final", - "fu_date_to", "fu_date_death", - "fu_locreg_date", "fu_dist_date")) - -dsub$collection_date <- as.Date(dsub$collection_date, format = "%d%b%Y") -dsub$dtc_ihc_date_final <- as.Date(dsub$dtc_ihc_date_final, - format = "%d%b%y:%H:%M:%S") -dsub[, c("org_consent_date", "fu_date_to", "fu_date_death", - "fu_locreg_date", "fu_dist_date")] <- - lapply(dsub[, c("org_consent_date", "fu_date_to", "fu_date_death", - "fu_locreg_date", "fu_dist_date")], - \(x) as.Date(x, format = "%m/%d/%Y")) - -key_dates <- subset(dsub, timepoint == "SURMOUNT-Baseline") -key_dates$start <- pmin(key_dates$collection_date, - key_dates$dtc_ihc_date_final, na.rm = T) -key_dates$end <- pmin(key_dates$fu_date_to, key_dates$fu_date_death, - na.rm = T) -key_dates$time_on_study <- as.numeric(key_dates$end - key_dates$start) -key_dates$death_time <- as.numeric(key_dates$fu_date_death - key_dates$start) - -key_dates$rank <- NA -key_dates$rank[key_dates$trial] <- - rank(key_dates$time_on_study[key_dates$trial], na.last = F) - -key_dates$rank[!key_dates$trial] <- - rank(key_dates$time_on_study[!key_dates$trial], na.last = F) + - max(key_dates$rank[key_dates$trial]) + 3 - -key_dates$locreg <- as.numeric(key_dates$fu_locreg_date - key_dates$start) -key_dates$dist <- as.numeric(key_dates$fu_dist_date - key_dates$start) -key_dates$prog <- with(key_dates, pmin(locreg, dist, na.rm = T)) - -dsub <- merge(dsub, subset(key_dates, - select = c("ID", "start", "end", - "time_on_study", "death_time", "rank", - "locreg", "dist", 
"prog")), - by = "ID", all = T) - -dsub$DTC_days <- as.numeric(dsub$dtc_ihc_date_final - dsub$start) -dsub$ctDNA_days <- as.numeric(dsub$collection_date - dsub$start) - -# Set up a blank plotting space without any points -# Set up a blank plotting space without any points -plot.new() -par(mar = c(4, 1, 1, 4) + 0.1) -plot.window(xlim = c(-200, 3000), ylim = c(0, 107)) - -# Draw simple lines for each patient's time on study -segments(x0 = 0, y0 = key_dates$rank, y1 = key_dates$rank, - x1 = key_dates$time_on_study, col = "#e7e7e7", lwd = 1) - -# Add x-axis with year labels -axis(side = 1, at = 0:8 * 365.25, labels = 0:8, line = 0) - -# Add x-axis label -mtext("Years on Study", side = 1, line = 2.5, cex = 1) - -# Add labels on the y-axis -mtext("DTC Positive", side = 2, line = 0, - at = mean(key_dates$rank[key_dates$trial]), las = 0) -mtext("DTC Negative", side = 2, line = 0, - at = mean(key_dates$rank[!key_dates$trial]), las = 0) - -# No frame around the plot -box(which = "plot", lty = "blank") - -# Receptor Status -dsub$hrPos <- dsub$final_receptor_group %in% c(2, 3) -dsub$her2Pos <- dsub$final_receptor_group %in% c(3, 4) -points(x = c(rep(-200, nrow(key_dates)), rep(-100, nrow(key_dates))), - y = rep(key_dates$rank, 2), - col = c(rep(her2col, nrow(key_dates)), rep(hrcol, nrow(key_dates))), - pch = 22, cex = .5, - lwd = .5) -points(x = rep(-100, sum(dsub$her2Pos)), y = dsub$rank[dsub$her2Pos], pch = 22, - col = her2col, bg = her2col, cex = .5) -points(x = rep(-200, sum(dsub$hrPos)), y = dsub$rank[dsub$hrPos], pch = 22, - col = hrcol, bg = hrcol, cex = .5) -text(x = -50, y = 109.5, "HER2+", pos = 3, cex = .5, xpd = NA, adj = 0) -text(x = -250, y = 109.5, "HR+", pos = 3, cex = .5, xpd = NA, adj = 1) - -# Recolor segments -dsub_ctDNApos <- subset(dsub, ctDNA_detected == "TRUE") -dsub_ctDNApos <- - do.call(rbind, - lapply(split(dsub_ctDNApos, dsub_ctDNApos$ID), \(x) { - x[which.min(x$ctDNA_days), ] - })) - -with(subset(dsub_ctDNApos, !is.na(prog)), - segments(x0 = 
ctDNA_days, - x1 = prog, - y0 = rank, - col = cols[3], lwd = 1)) - -with(subset(key_dates, !is.na(prog)), - segments(x0 = prog, - x1 = time_on_study, - y0 = rank, - col = "darkgray", lwd = 1)) - -# Deaths -points(x = key_dates$death_time, y = key_dates$rank, pch = 4, lwd = 1) - -# Locoregional Recurrence -points(x = key_dates$locreg, y = key_dates$rank, pch = 8, lwd = 2, - col = cols[1], cex = .6) - -# Distant Recurrence -points(x = key_dates$dist, y = key_dates$rank, pch = 3, lwd = 2, - col = cols[1], cex = .6) - -# ctDNA Testing -## ctDNA- -with(subset(dsub, ctDNA_detected == "FALSE"), - points(x = jitter(ctDNA_days), y = rank, pch = 23, - col = cols[3])) - -## ctDNA+ -with(subset(dsub, ctDNA_detected == "TRUE"), { - points(x = jitter(ctDNA_days), y = rank, pch = 23, # Diamond shape - col = "red", bg = "red", cex = 1) -}) - -#############3 -#KEY - -# Legend -legend(x = 2700, y = 65, - legend = c("ctDNA+", "Locoreg Recur", - "Dist Recur", "Death", - "ctDNA+ to Recurrence", - "Recurrence to End F/U"), - pch = c(23, 8, 3, 4, NA, NA), - lwd = c(rep(NA, 5), 2, 2), - col = c(cols[c(3, 2, 1, 1)], "black", cols[3], "darkgray"), - pt.lwd = c(rep(1, 4), rep(2, 3)), - pt.bg = c(paste0(cols[3], "88"), - paste0(cols[2], "88")), - xpd = NA, cex = .7) - - -legend(x = 2700, y = 65, - legend = c("ctDNA+", "Locoreg Recur", "Dist Recur", "Death", - "ctDNA+ to Recurrence", "Recurrence to End F/U"), - pch = c(23, 8, 3, 4, NA, NA), # Symbols: diamond, star, plus, cross - lwd = c(rep(NA, 4), 2, 2), # Line width for segments - col = c("red", cols[1], cols[1], "black", cols[3], "darkgray"), # Outline colors - pt.bg = c("red", NA, NA, NA, NA, NA), # Fill color, "red" for ctDNA+ - pt.cex = c(1, 1, 1, 1, NA, NA), # Point sizes for shapes - pt.lwd = c(1, 1, 1, 1, 2, 2), # Point line widths - xpd = NA, cex = 0.7) - -#### Extra Code - - -# DTC Testing -## DTC- -with(subset(dsub, dtc_ihc_result_final == 0), - points(x = jitter(DTC_days), y = rank, pch = 24, - col = cols[2])) - -## DTC+/ctDNA- 
-with(subset(dsub, dtc_ihc_result_final == 1), #& ctDNA_detected == "FALSE"), - points(x = DTC_days, y = rank, pch = 24, - col = cols[2], bg = paste0(cols[2], "88"), - cex = 0.6)) - -## ctDNA+/DTC+ -with(subset(dsub, ctDNA_detected == "TRUE" & dtc_ihc_result_final == 1), - points(x = ctDNA_days, y = rank, pch = 24, - col = cols[5], bg = paste0(cols[5], "90"))) -# -with(subset(dsub, ctDNA_detected == "TRUE" & dtc_ihc_result_final == 1), - points(x = ctDNA_days, y = rank, pch = 25, - col = cols[5], bg = paste0(cols[5], "88"))) - -# Legend -legend(x = 2700, y = 65, - legend = c("ctDNA+", "DTC+", "Locoreg Recur", - "Dist Recur", "Death", - "ctDNA+ to Recur", - "Recur to End F/U"), - pch = c(23, 24, 8, 3, 4, NA, NA), - lwd = c(rep(NA, 5), 2, 2), - col = c(cols[c(3, 2, 1, 1)], "black", cols[3], "darkgray"), - pt.lwd = c(rep(1, 4), rep(2, 3)), - pt.bg = c(paste0(cols[3], "88"), - paste0(cols[2], "88")), - xpd = NA, cex = .7) +# Create table +table1 <- CreateTableOne(vars = c(continuous_vars, categorical_vars), + strata = "ctDNA_ever", + data = subset_data, + factorVars = categorical_vars) +# Print the table with p-values +print(table1, showAllLevels = TRUE, pDigits = 3) ``` - From b235d467405e27907c4efd00b2202d9beea86e51 Mon Sep 17 00:00:00 2001 From: ntaranto Date: Tue, 26 Nov 2024 15:16:50 -0500 Subject: [PATCH 04/14] Update FinalProjectTaranto.qmd --- FinalProjectTaranto.qmd | 503 +++++++++++++++++++++++++--------------- 1 file changed, 314 insertions(+), 189 deletions(-) diff --git a/FinalProjectTaranto.qmd b/FinalProjectTaranto.qmd index a0f816b63..0951c3872 100644 --- a/FinalProjectTaranto.qmd +++ b/FinalProjectTaranto.qmd @@ -156,7 +156,10 @@ Describe the data used and general methodological approach used to address the p ------------------------------------------------------------------------ ```{r} -1 + 1 + + + + ``` ## Results {#sec-results} @@ -166,6 +169,8 @@ You can add options to executable code like this ```{r} #| echo: false #disables the printing of code 
(only output is displayed) +library(dplyr) + ########### Variables to look at for Table 1 ######### ###### median age at diagnosis @@ -216,7 +221,7 @@ age_summary <- subset_data %>% .groups = "drop" ) -# View the summary table +# View the summary table for age print(age_summary) ##### Race: demo_race_final @@ -241,9 +246,6 @@ count_distinct_participants <- subset_data %>% count_distinct_participants - -library(dplyr) - # Step 1: Summarize by unique participant_id summarized_data <- subset_data %>% group_by(participant_id) %>% @@ -263,8 +265,7 @@ chisq_test <- chisq.test(contingency_table) chisq_test - -#receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') +#####receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') # Breakdown of final_receptor_group by unique participant_id receptor_status_by_participant <- subset_data %>% @@ -295,11 +296,8 @@ chisq_test <- chisq.test(contingency_table_receptor) chisq_test -#curious about whether TNBC vs non-TNBC association exists (or HR + vs non-HR positive) -#start with TNBC (using QDC) -#inclusion criteria inc_dx_crit___1 = TNBC - - +#I was curious about whether TNBC vs non-TNBC association exists (or HR + vs non-HR positive) +#inclusion criteria inc_dx_crit___1 = TNBC (This has been confirmed with the study team) #inc_dx_crit_list___1 TNBC_ctDNA_status <- subset_data %>% @@ -321,8 +319,8 @@ chisq_test <- chisq.test(contingency_table_TNBC) chisq_test -#ER vs non-ER -#first create HR_status variable +### HR positive vs HR negative (Hormone receptor positive vs hormone receptor negative) +#first, I need to create a HR positive variable (HR_status) subset_data <- subset_data |> mutate(HR_status = case_when( final_receptor_group %in% c(2, 3) ~ "HR+", @@ -358,10 +356,7 @@ chisq_test <- chisq.test(contingency_table_HR) chisq_test - - ###tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not reported - # Exclude rows where 
final_tumor_grade is 3 (not reported), want to exclude the 2 pts who had grade not reported summary_data <- subset_data %>% filter(final_tumor_grade != 3) %>% # Exclude grade == 3 @@ -384,8 +379,9 @@ chisq_test <- chisq.test(contingency_table) # View the Chi-squared test result -- p-value 0.0229 print(chisq_test) -######histology #people have different combinations of histology (1-15) - table(subset_data$participant_id, subset_data$final_histology) +######histology (final histology) +#people have different combinations of histology (1-15) +table(subset_data$participant_id, subset_data$final_histology) histology_summary <- subset_data %>% distinct(participant_id, final_histology) %>% # Get unique participant-histology combinations @@ -395,7 +391,7 @@ print(chisq_test) # View the summary table print(histology_summary) #the different combinations add up to 109. There are a decent number who had both ductal and lobular histology - #trying to create Ductal, lobular, both, or other variables + #trying to create Ductal, lobular, both, or other variables --> histology_category subset_data <- subset_data %>% mutate(histology_category = case_when( grepl("3", as.character(final_histology)) & grepl("14", as.character(final_histology)) ~ "Both Ductal and Lobular", # Both Ductal and Lobular @@ -430,11 +426,11 @@ print(chisq_test) - #### Stage -- N stage --> come back to this N stage stuff +#### Staging N stage (Nodal stage) table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 (No, N1, N2, N3) - nodal_summary <- subset_data %>% +nodal_summary <- subset_data %>% distinct(participant_id, final_n_stage) %>% # Get unique participant-stage combinations group_by(final_n_stage) %>% # Group by stage summarise(count = n()) # Count the number of participants per histology type @@ -451,10 +447,10 @@ table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 .groups = "drop" ) - # Step 3: Create a contingency table of nodal_status vs ctDNA_ever 
+ #Create a contingency table of nodal_status vs ctDNA_ever contingency_table <- table(subset_data_by_id$nodal_status, subset_data_by_id$ctDNA_ever) - # Step 4: Check if any cells in the contingency table have zero counts, which could affect test validity + # Check if any cells in the contingency table have zero counts, which could affect test validity print(contingency_table) # Step 5: Perform Chi-squared test @@ -464,7 +460,7 @@ table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 print(chisq_test) - #### Creating Node - vs node + variable from summary indicator variable + #### Node positive versus node negative: Using the final n stage to create a Node - vs node + variable from this summary indicator variable subset_data_by_id <- subset_data %>% group_by(participant_id) %>% summarise( @@ -473,46 +469,25 @@ table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 .groups = "drop" ) - # Step 2: Create a contingency table of node_status vs ctDNA_ever + #adding node_status to subset_data + subset_data <- subset_data %>% + left_join(subset_data_by_id %>% select(participant_id, node_status), by = "participant_id") + + + #Create a contingency table of node_status vs ctDNA_ever contingency_table <- table(subset_data_by_id$node_status, subset_data_by_id$ctDNA_ever) - # Step 3: Perform the Chi-squared test + # Perform the Chi-squared test chisq_test <- chisq.test(contingency_table) - # Step 4: Print the contingency table and Chi-squared test results + #Print the contingency table and Chi-squared test results print(contingency_table) print(chisq_test) - - ####### EXTRA CODE/CONFIRMATION / slightly different but ignore for our analysis - #cross-check with indicator pN0 in our data that reflects nodal positivity.... 
there is 1 patient that is node - by summary variable but node + by indicator variable - ## should double check this at some point - node_pos <- subset_data %>% - distinct(participant_id, inc_dx_crit_list___2) %>% # Get unique participant-stage combinations - group_by(inc_dx_crit_list___2) %>% # Group by stage - summarise(count = n()) # Count the number of participants per histology type - - print(node_pos) - - contingency_table <- subset_data %>% - distinct(participant_id, inc_dx_crit_list___2, ctDNA_ever) %>% # Ensure unique participants - count(inc_dx_crit_list___2, ctDNA_ever) %>% # Count occurrences - spread(key = ctDNA_ever, value = n, fill = 0) # Spread data into a matrix - - # View the contingency table - print(contingency_table) - - # Perform the Chi-square test =0.3902 - chi_square_result <- chisq.test(contingency_table[, -1]) # Exclude the first column with the levels - print(chi_square_result) - - - +#######Looking at T stage or tumor size: the variable is final_t_stage -#######t stage final_t_stage - - table(subset_data$final_t_stage) #we have values in 1,2,3,4, and 99 (90 = pTx) so can proceed with this + table(subset_data$final_t_stage) #we have values in 1,2,3,4, and 99 (99 = pTx, cannot evaluate) so can proceed with this t_summary <- subset_data %>% distinct(participant_id, final_t_stage) %>% # Get unique participant-stage combinations @@ -523,8 +498,7 @@ table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 print(t_summary) - #### T stage, for our T stage table, will use T1 vs T2 or greater to simplify - #exclude 99 (the pTx) + #for our T stage table, will use T1 vs T2 or greater to simplify, and we want to exclude 99 (the pTx). We will create "final_t_stage_combined" to represent this. 
subset_data_clean <- subset_data %>% filter(final_t_stage != 99, ctDNA_ever != 99) @@ -546,11 +520,11 @@ table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 # Perform the Chi-squared test chisq_test <- chisq.test(contingency_table) - # Print the contingency table and Chi-squared test results + # Print the contingency table and Chi-squared test results. P value = 0.6 print(contingency_table) print(chisq_test) -#### TRy for T stage stats using T3 or greater as cutoff -- not super useful, So DONOT USE THIS FOR TABLE +#### I looked at a different cut-off for T stage stats using T3 or greater as cutoff and didn't see any significant difference so am not using this for the table. #exclude 99 (the pTx) subset_data_clean <- subset_data %>% @@ -585,7 +559,7 @@ table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 - ########stage of disease -- final_overall_stage + ########Overall stage of disease -- final_overall_stage table(subset_data$final_overall_stage) #we have values in 1,2,3,4, and 99 so can proceed with this @@ -615,7 +589,7 @@ table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 # Perform the Chi-squared test chisq_test <- chisq.test(contingency_table) - # Print the contingency table and Chi-squared test results --> hot dogggg p val = 0.006 + # Print the contingency table and Chi-squared test results --> hot dogggg p val = 0.006. Higher stage is associated with ctDNA_ever. print(contingency_table) print(chisq_test) @@ -624,7 +598,6 @@ table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 ###########surgical treatment diag_surgery_type_1 (or diag_surgery_type_2) - table(subset_data$diag_surgery_type_1) #1= partial mastectomy/lumpectomy, 2 = mastectomy. no missingness surgery <- subset_data %>% @@ -656,7 +629,7 @@ table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 -######## axillary management. 
Really just care about whether or not someone got an axillary dissection. diag_axillary_type___2_1 (or diag_axillary_type___2_2 IF 2 biopsy forms) +######## axillary management. Really just care about whether or not someone got an axillary dissection. diag_axillary_type___2_1 (or diag_axillary_type___2_2 IF 2 biopsy forms). Created a new variable (axillary_dissection) table(subset_data$diag_axillary_type___2_1) table(subset_data$diag_axillary_type___2_2) #a handful of axillary dissection on the second biopsy form so i think we want to combine these two @@ -665,6 +638,9 @@ table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 subset_data_clean <- subset_data %>% mutate(axillary_dissection = ifelse(diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1, 1, 0)) + subset_data <- subset_data %>% + mutate(axillary_dissection = ifelse(diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1, 1, 0)) + # Ensure every participant has a ctDNA_ever and axillary_dissection value # Ensure every patient has an axillary dissection category, where 0 means no axillary dissection, and 1 means they had one subset_data_clean <- subset_data %>% @@ -685,18 +661,17 @@ table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 # Perform the Chi-squared test chisq_test <- chisq.test(contingency_table) - fishers <- fisher.test(contingency_table) - print(fishers) - # Print the contingency table and Chi-squared test results --> p-value 0.173 (used chisq for consistency...) + # Print the contingency table and Chi-squared test results --> p-value 0.173 print(contingency_table) print(chisq_test) -####inflammatory inflamm_yn -- IGNORE THIS for Table 1 +####inflammatory (variable inflamm_yn)-- do not include inflammatory variable as there were NO inflammatory breast cancers in the ctDNA cohort. 
table(d$inflamm_yn_1) ### i think the inflammatory folks must just not be in the subset of patients (6 yes's in the overall cohort, but none in the subset data for etierh inflamm variable) table(d$inflamm_yn_2) ### I think inflammatory folks just not in subset of patients in the ctDNA cohort -table(subset_data$inflamm_yn) +table(subset_data$inflamm_yn) + #### radiation prtx_radiation table(subset_data$prtx_radiation) @@ -712,7 +687,7 @@ print(radiation) subset_data_by_id <- subset_data_clean %>% group_by(participant_id) %>% summarise( - radiation = first(prtx_radiation), # Get the final_overall_stage for each participant + radiation = first(prtx_radiation), # xrt for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant ) @@ -721,8 +696,6 @@ contingency_table <- table(subset_data_by_id$radiation, subset_data_by_id$ctDNA_ # Perform the Chi-squared test chisq_test <- chisq.test(contingency_table) -fishers <- fisher.test(contingency_table) -print(fishers) # Print the contingency table and Chi-squared test results --> p-val = 0.33 print(contingency_table) @@ -738,13 +711,13 @@ chemo <- subset_data |> summarise(count = n()) # Count the number of participants per histology type # View the summary table -print(chemo) #3 people didn not get chemo in this cohort +print(chemo) #3 people did not get chemo in this cohort # Summarize the data by participant_id subset_data_by_id <- subset_data_clean %>% group_by(participant_id) %>% summarise( - chemo = first(prtx_chemo), # Get the final_overall_stage for each participant + chemo = first(prtx_chemo), # chemo for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant ) @@ -753,16 +726,14 @@ contingency_table <- table(subset_data_by_id$chemo, subset_data_by_id$ctDNA_ever # Perform the Chi-squared test chisq_test <- chisq.test(contingency_table) -fishers <- fisher.test(contingency_table) -print(fishers) -# Print the contingency table and Chi-squared test 
results --> p-val = 0.33 +# Print the contingency table and Chi-squared test results --> p-val = 0.59 print(contingency_table) print(chisq_test) -####neoadjuvant chemo diag_neoadj_chemo_1 or diag_neoadj_chemo_2 +####neoadjuvant chemo -- there are two variables for this that could theoretically be included: diag_neoadj_chemo_1 or diag_neoadj_chemo_2 table(subset_data$diag_neoadj_chemo_1) table(subset_data$diag_neoadj_chemo_2) #there were no ppl who were recorded on the second biopsy form as having neoadjuvant therapy so can use only the first variable @@ -773,17 +744,17 @@ nact <- subset_data |> summarise(count = n()) # Count the number of participants per histology type # View the summary table -print(nact) #3 people didn not get chemo in this cohort +print(nact) #3 people did not get chemo in this cohort # Summarize the data by participant_id subset_data_by_id <- subset_data_clean %>% group_by(participant_id) %>% summarise( - nact = first(diag_neoadj_chemo_1), # Get the final_overall_stage for each participant + nact = first(diag_neoadj_chemo_1), # NACT for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant ) -# Create a contingency table of final_overall_stage vs ctDNA_ever +# Create a contingency table of NACT vs ctDNA_ever contingency_table <- table(subset_data_by_id$nact, subset_data_by_id$ctDNA_ever) # Perform the Chi-squared test @@ -820,13 +791,12 @@ contingency_table <- table(subset_data_by_id$endo, subset_data_by_id$ctDNA_ever) # Perform the Chi-squared test chisq_test <- chisq.test(contingency_table) -# Print the contingency table and Chi-squared test results --> p-val = 0.95 +# Print the contingency table and Chi-squared test results --> p-val = 0.33 print(contingency_table) print(chisq_test) - ####bone modifying agents prtx_bonemod table(subset_data$prtx_bonemod) @@ -843,7 +813,7 @@ print(bonemod) #most ppl did get endo (39 got bonemod) subset_data_by_id <- subset_data_clean %>% group_by(participant_id) %>% 
summarise( - bonemod = first(prtx_bonemod), # Get the final_overall_stage for each participant + bonemod = first(prtx_bonemod), # Get bone mod status for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant ) @@ -857,11 +827,11 @@ chisq_test <- chisq.test(contingency_table) print(contingency_table) print(chisq_test) - -#2 = non-pcr, 1 = pcr -#path cr diag_pcr_1 or diag_pcr_2 +#### PCR -- did NOT include this in Table 1 as it aligns closely with NACT) +# 2 = non-pcr, 1 = pcr +#the variables of interest for path cr: diag_pcr_1 or diag_pcr_2 table(subset_data$diag_pcr_1) -table(subset_data$diag_pcr_2) #none recorded here si can just use pcr_1 +table(subset_data$diag_pcr_2) #none recorded here so can just use diag_pcr_1 pcr <- subset_data %>% mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) %>% # Convert "." to NA @@ -877,7 +847,7 @@ print(pcr) #this is correct bc 19 ppl got NACT, and we have 19 ppl recorded for subset_data_by_id <- subset_data_clean %>% group_by(participant_id) %>% summarise( - pcr = first(diag_pcr_1), # Get the final_overall_stage for each participant + pcr = first(diag_pcr_1), # Get pcr for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant ) @@ -968,9 +938,9 @@ dist_site_distribution <- subset_data %>% # View the distribution of sites -- Bone in 7. Liver in 3. Lung in 2. 
1 intra-abdominal print(dist_site_distribution) -#any recurrence -#either fu_locreg_prog or fu_dist_prog +##### ANY Recurrence -- this includes either fu_locreg_prog or fu_dist_prog +#create ever_relapsed variable subset_data <- subset_data %>% mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No")) @@ -993,8 +963,8 @@ chisq_test <- chisq.test(contingency_table) print(contingency_table) print(chisq_test) -#### Relapse and DTC -#using ever_relapsed +#### Relapse and DTCs +#using ever_relapsed and dtc_ever # link by participant id subset_data_by_id <- subset_data %>% @@ -1022,61 +992,6 @@ missing_data <- subset_data_by_id %>% # Print the IDs of participants with missing data print(missing_data$participant_id) #two individuals do not have relapse data (17-021, 18-032, as we had seen above) -### look at relapses and ctDNA in the patinets who were dtc+ - -# Filter for DTC-positive patients who have NOT relapsed -dtc_not_recurred <- subset(subset_data, subset_data$dtc_ever == 1 & subset_data$fu_locreg_prog == 0 & subset_data$fu_dist_prog == 0) - -# Summarize the ctDNA positivity status within this subset -table(dtc_not_recurred$ctDNA_ever) - -### timing of ctDNA testing among those DTC + patients who relapsed -library(dplyr) - -# Step 1: Filter for DTC-positive participants with any relapse and arrange by date -dtc_relapsed_data <- subset_data %>% - filter(dtc_ihc_result_final == 1, ever_relapsed == 1) %>% - arrange(participant_id, timepoint) - -# Step 2: Calculate overall relapse date as the earliest of locoregional and distant relapse dates -dtc_relapsed_timing <- dtc_relapsed_data %>% - group_by(participant_id) %>% - summarize( - dtc_date = min(timepoint[dtc_ihc_result_final == 1], na.rm = TRUE), - relapse_date = pmin(fu_locreg_date, fu_dist_date, na.rm = TRUE), # Calculate earliest relapse date - .groups = "drop" - ) %>% - filter(!is.na(relapse_date)) # Keep only those with a relapse after DTC+ - -# Step 3: Find ctDNA tests before relapse 
for these participants -ctDNA_before_relapse <- subset_data %>% - filter(!is.na(ctDNA_test_date)) %>% - inner_join(dtc_relapsed_timing, by = "participant_id") %>% - filter(ctDNA_test_date < relapse_date) %>% - arrange(participant_id, ctDNA_test_date) - -# Step 4: Summarize ctDNA test counts and dates before relapse -ctDNA_summary <- ctDNA_before_relapse %>% - group_by(participant_id) %>% - summarize( - num_ctDNA_tests_before_relapse = n(), - ctDNA_test_dates = list(ctDNA_test_date), - .groups = "drop" - ) - -# View the results -ctDNA_summary - -# Filter for participant 28115-17-025 and select DTC and ctDNA-related columns -patient_results <- subset_data %>% - filter(participant_id == "28115-17-025") %>% - select(participant_id, timepoint, dtc_ever, dtc_ihc_date_final, dtc_ihc_result_final, - ctDNA_ever, collection_date, ctDNA_detected, ever_relapsed, fu_locreg_date, fu_dist_date) - -# View the results -patient_results - - ### look at ever_relapsed by ctDNA subset_data_by_id <- subset_data %>% @@ -1093,11 +1008,11 @@ contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ct # Perform the Chi-squared test chisq_test <- chisq.test(contingency_table) -# Print the contingency table and Chi-squared test results +# Print the contingency table and Chi-squared test results, p < 0.00001 print(contingency_table) print(chisq_test) -####survival analysis fu_survival +####survival: fu_survival table(subset_data$fu_surv) @@ -1131,21 +1046,31 @@ contingency_table <- table(subset_data_by_id$surv, subset_data_by_id$ctDNA_ever) # Perform the Chi-squared test chisq_test <- chisq.test(contingency_table) -# Print the contingency table and Chi-squared test results +# Print the contingency table and Chi-squared test results, p<0.00001 print(contingency_table) print(chisq_test) -############ Test Characteristics and Baseline versus cumulative positivity (ctDNA to start)####### +``` -### DTC by ctDNA (ever positive) + + +**Test Characteristics** + +Next, we will look 
at ctDNA and DTC test characteristics. First we will look at the association between ctDNA and DTC positivity. Next we will look at the number of tests. + +``` {r} + +############ Test Characteristics and Baseline versus cumulative positivity (ctDNA to start)####### + +### DTC by ctDNA (ever positive), association between test positivity. # link by participant id subset_data_by_id <- subset_data %>% group_by(participant_id) %>% summarise( - dtc = first(DTC_ever), # Get the ever dtc for each participant + dtc = first(dtc_ever), # Get the ever dtc for each participant ctDNA_ever = first(ctDNA_ever), # Get the ctDNA_ever status for each participant .groups = "drop" ) @@ -1507,50 +1432,143 @@ dtc_pos_vs_total <- subset_data %>% # Print the results print(dtc_pos_vs_total) + + +``` + +**Test Characteristics of ctDNA assay**: +Next we will look at the sensitivity and specificity of the ctDNA assay. +``` {r} ###### Test characteristics ctDNA +#trying to do ctDNA 2x2 with ever relapsed on a patient level -true_results <- subset_data$ever_relapsed # Actual labels (True: 1 = relapsed, 0 = not relapsed) -predicted_results <- subset_data$ctDNA_ever # Predicted labels (True: 1 = ctDNA ever detected, 0 = not detected) +library(dplyr) -# Create confusion matrix -conf_matrix <- table(Predicted = predicted_results, Actual = true_results) +# Exclude participants with all NA for `ctDNA_ever` or `ever_relapsed` +summarized_data <- subset_data %>% + filter(!is.na(ctDNA_ever) | !is.na(ever_relapsed)) %>% # Keep rows with at least one non-NA value + group_by(participant_id) %>% + summarize( + ctDNA_ever = if (all(is.na(ctDNA_ever))) NA else max(ctDNA_ever, na.rm = TRUE), # NA (not -Inf) when a participant has no result + ever_relapsed = if (all(is.na(ever_relapsed))) NA else max(ever_relapsed, na.rm = TRUE) + ) %>% filter(!is.na(ctDNA_ever), !is.na(ever_relapsed)) # Drop participants missing either value -# Calculate sensitivity, specificity, PPV, and NPV -sensitivity <- conf_matrix[2, 2] / sum(conf_matrix[2, ]) # True Positives / (True Positives + False Negatives) -specificity <- conf_matrix[1, 1] / sum(conf_matrix[1, ]) # True Negatives / (True Negatives + False Positives) -ppv <- conf_matrix[2, 2] /
sum(conf_matrix[, 2]) # True Positives / (True Positives + False Positives) -npv <- conf_matrix[1, 1] / sum(conf_matrix[, 1]) # True Negatives / (True Negatives + False Negatives) +# Create the confusion matrix +confusion_matrix <- table(summarized_data$ctDNA_ever, summarized_data$ever_relapsed) -# Print the results -cat("Sensitivity: ", sensitivity, "\n") -cat("Specificity: ", specificity, "\n") -cat("Positive Predictive Value (PPV): ", ppv, "\n") -cat("Negative Predictive Value (NPV): ", npv, "\n") +# Extract counts from the confusion matrix +TP <- ifelse(!is.na(confusion_matrix[2, 2]), confusion_matrix[2, 2], 0) # True Positives +FP <- ifelse(!is.na(confusion_matrix[2, 1]), confusion_matrix[2, 1], 0) # False Positives +TN <- ifelse(!is.na(confusion_matrix[1, 1]), confusion_matrix[1, 1], 0) # True Negatives +FN <- ifelse(!is.na(confusion_matrix[1, 2]), confusion_matrix[1, 2], 0) # False Negatives + +# Calculate performance metrics +sensitivity <- TP / (TP + FN) # Sensitivity +specificity <- TN / (TN + FP) # Specificity +PPV <- TP / (TP + FP) # Positive Predictive Value +NPV <- TN / (TN + FN) # Negative Predictive Value + +# Create a data frame for the table +performance_table <- data.frame( + Metric = c("Sensitivity", "Specificity", "Positive Predictive Value (PPV)", "Negative Predictive Value (NPV)"), + Value = c(sensitivity, specificity, PPV, NPV) +) + +# Print the table +print(performance_table) + +#Format the table for better readability +library(knitr) +kable(performance_table, digits = 2, col.names = c("Metric", "Value")) +``` + +This ctDNA assay has high specificity (99%), with a high positive predictive value for relapse (94%) and also a high negative predictive value (88%). 
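The 2x2 extraction above indexes cells by position (`confusion_matrix[2, 2]` and so on), which silently assumes that both variables were observed at exactly two levels, in a fixed order. A more defensive alternative can be sketched as follows. `test_performance()` is a hypothetical helper rather than the code used in this analysis, and it assumes both inputs are coded 0/1. As a check, it is given the counts in the results summary of this document (9 ctDNA-positive patients, 8 of whom relapsed; 87 ctDNA-negative patients, of whom 1 + 5 = 6 relapsed), and it reproduces the PPV of 89% and NPV of 93% reported there.

```r
# Hypothetical helper (not the code used in the analysis above): accuracy of a
# binary test against a binary outcome. Assumes both inputs are coded 0/1;
# logical TRUE/FALSE should be converted with as.integer() first.
test_performance <- function(test_pos, relapsed) {
  keep <- !is.na(test_pos) & !is.na(relapsed)  # drop pairs with a missing value
  # Fixed factor levels guarantee a full 2x2 table even when a cell count is zero
  tab <- table(
    test    = factor(test_pos[keep], levels = c(0, 1)),
    outcome = factor(relapsed[keep], levels = c(0, 1))
  )
  TP <- tab["1", "1"]; FP <- tab["1", "0"]
  FN <- tab["0", "1"]; TN <- tab["0", "0"]
  c(
    sensitivity = TP / (TP + FN),  # test-positive among those who relapsed
    specificity = TN / (TN + FP),  # test-negative among those who did not relapse
    PPV         = TP / (TP + FP),  # relapsed among test-positive
    NPV         = TN / (TN + FN)   # relapse-free among test-negative
  )
}

# Counts from the results summary: 9 ctDNA+ (8 relapsed), 87 ctDNA- (6 relapsed)
ctdna    <- c(rep(1, 9), rep(0, 87))
relapsed <- c(rep(1, 8), 0, rep(1, 6), rep(0, 81))
round(test_performance(ctdna, relapsed), 2)
# sensitivity 0.57, specificity 0.99, PPV 0.89, NPV 0.93
```

Because the cells are looked up by label (`"0"`/`"1"`) rather than by position, a cell that happens to be empty shows up as a zero count. It cannot shift the remaining rows or columns.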
+``` {r} ### Test characteristics for DTC +library(dplyr) -# Example data: True results (ever relapsed) and predicted results (DTC ever detected) -true_results <- subset_data$ever_relapsed # Actual labels (True: 1 = relapsed, 0 = not relapsed) -predicted_results <- subset_data$dtc_ever # Predicted labels (True: 1 = DTC ever detected, 0 = not detected) +# Total unique DTC+ patients +total_dtc_plus <- subset_data %>% + filter(dtc_ihc_result_final == 1) %>% + distinct(participant_id) %>% + nrow() -# Create confusion matrix -conf_matrix_2 <- table(Predicted = predicted_results, Actual = true_results) +# Unique DTC+ patients who went on trial (those who have a trial ID fu_trial_pid) +dtc_plus_trial <- subset_data %>% + filter(dtc_ihc_result_final == 1 & !is.na(fu_trial_pid)) %>% + distinct(participant_id) %>% + nrow() -# Calculate sensitivity, specificity, PPV, and NPV -sensitivity <- conf_matrix_2[2, 2] / sum(conf_matrix_2[2, ]) # True Positives / (True Positives + False Negatives) -specificity <- conf_matrix_2[1, 1] / sum(conf_matrix_2[1, ]) # True Negatives / (True Negatives + False Positives) -ppv <- conf_matrix_2[2, 2] / sum(conf_matrix_2[, 2]) # True Positives / (True Positives + False Positives) -npv <- conf_matrix_2[1, 1] / sum(conf_matrix_2[, 1]) # True Negatives / (True Negatives + False Negatives) +# Proportion of DTC+ patients who went on trial +proportion_trial <- dtc_plus_trial / total_dtc_plus + +# Display results +cat("Total unique DTC+ patients:", total_dtc_plus, "\n") +cat("Unique DTC+ patients who went on trial:", dtc_plus_trial, "\n") +cat("Proportion of DTC+ patients who went on trial:", proportion_trial, "\n") + +# All DTC + patients went on trial (39/39) + +``` + +All of the 39 individuals who were DTC positive went onto an interventional treatment trial aimed at eliminating the presence of the DTCs. 
It is therefore difficult to interpret the sensitivity and specificity of the DTC test against relapse: every DTC-positive patient received an intervention intended to eliminate DTCs, and that intervention may itself have changed their risk of relapse. + +``` {r} +##### Concordance between DTC and ctDNA + + +### concordance overall + +# Filter and get unique participants by participant_id +concordance_overall_unique <- subset_data %>% + distinct(participant_id, .keep_all = TRUE) %>% + mutate(concordance = ifelse(dtc_ever == ctDNA_ever, "Concordant", "Discordant")) + +# Count total concordant and discordant pairs for unique participants (na.rm = TRUE skips participants missing either result) +overall_concordant <- sum(concordance_overall_unique$concordance == "Concordant", na.rm = TRUE) +overall_discordant <- sum(concordance_overall_unique$concordance == "Discordant", na.rm = TRUE) + +# Proportion of concordance +proportion_concordant <- overall_concordant / (overall_concordant + overall_discordant) + +cat("Overall Concordant (unique participants):", overall_concordant, "\n") +cat("Overall Discordant (unique participants):", overall_discordant, "\n") +cat("Overall Proportion Concordant (unique participants):", proportion_concordant, "\n") + +#Proportion concordance 63% (ever positive) + +# Ensure that dtc_ihc_result_final is converted to TRUE/FALSE for consistency with ctDNA_detected +concordance_by_timepoint <- subset_data %>% + filter(!is.na(dtc_ihc_result_final) & !is.na(ctDNA_detected)) %>% + mutate( + # Convert dtc_ihc_result_final (1 = TRUE, 0 = FALSE) to match ctDNA format (TRUE/FALSE) + dtc_ihc_result_final_bool = ifelse(dtc_ihc_result_final == 1, TRUE, FALSE), + # Create a column to determine concordance (both DTC and ctDNA being TRUE or both being FALSE) + concordance = ifelse(dtc_ihc_result_final_bool == ctDNA_detected, "Concordant", "Discordant") + ) %>% + group_by(timepoint) %>% + summarise( + total_concordant = sum(concordance == "Concordant"), + total_discordant = sum(concordance == "Discordant"), + total_samples = n(), # Total number of samples at this timepoint +
concordance_rate = total_concordant / total_samples # Concordance rate per timepoint + ) + +# Print concordance results for each timepoint +print(concordance_by_timepoint) + +# Now calculate overall concordance across all timepoints +overall_concordance <- sum(concordance_by_timepoint$total_concordant) / + sum(concordance_by_timepoint$total_samples) + +cat("Overall Concordance Rate across all timepoints:", overall_concordance, "\n") +#concordance, considering testing by timepoint, is 80%, versus 63% when you consider the tests separately. Does this make sense? -# Print the results -cat("Sensitivity: ", sensitivity, "\n") -cat("Specificity: ", specificity, "\n") -cat("Positive Predictive Value (PPV): ", ppv, "\n") -cat("Negative Predictive Value (NPV): ", npv, "\n") #### how many DTC pts went on trial? @@ -2642,16 +2660,15 @@ print(chisq_test) library(tableone) # Specify variables -continuous_vars <- c("age_summary", "dtc_count", "dtc_ihc_summary_count_final", - "timepoints_per_patient", "total_timepoints", "ctDNA_timepoints", "dtc_timepoints") +continuous_vars <- c("age_summary", "dtc_count", "dtc_ihc_summary_count_final") -categorical_vars <- c("demo_race_final", "final_receptor_group", "HR_status_by_participant", +categorical_vars <- c("demo_race_final", "final_receptor_group", "final_tumor_grade", "histology_category", "final_n_stage", - "nodal_status", "final_t_stage", "final_t_stage_combined", "final_overall_stage", + "final_t_stage", "final_overall_stage", "diag_surgery_type_1", "axillary_dissection", "prtx_radiation", "prtx_chemo", "diag_neoadj_chemo_1", "prtx_endo", "prtx_bonemod", "diag_pcr_1", "fu_locreg_prog", "fu_locreg_site_char", "fu_dist_prog", "fu_dist_site_char", "ever_relapsed", - "fu_surv", "unique_timepoints") + "fu_surv") # Ensure categorical variables are factors with meaningful labels @@ -2660,8 +2677,16 @@ subset_data <- subset_data %>% ## labels race subset_data$demo_race_final <- factor(subset_data$demo_race_final, - levels = c("1", "2",
"3"), + levels = c("5", "1", "3"), labels = c("White", "Black", "Asian")) +print(subset_data$demo_race_final) + +as.factor(d$demo_race_final) +#5 = white +#1 = black +#3 = asian + + ### labels subset_data$final_receptor_group <- factor(subset_data$final_receptor_group, @@ -2679,6 +2704,106 @@ table1 <- CreateTableOne(vars = c(continuous_vars, categorical_vars), # Print the table with p-values print(table1, showAllLevels = TRUE, pDigits = 3) +names(subset_data) + + +``` + + +I have chosen to perform a multivariable logistic regression to identify predictors of ctDNA positivity, because ctDNA positivity is a binary outcome. We have already performed univariable (chi-squared) analyses above to look at the unadjusted association between each clinical variable and ctDNA positivity. + +``` {r} +library(dplyr) + +# Univariable logistic regression +# Define the outcome variable +outcome <- "ctDNA_ever" + +# Continuous predictors +continuous_vars <- c("age_at_diag", "dtc_ihc_summary_count_final") + +# Categorical predictors +categorical_vars <- c("demo_race_final", "final_receptor_group", "HR_status", + "final_tumor_grade", "histology_category", "final_n_stage", + "final_t_stage", "final_overall_stage", + "diag_surgery_type_1", "axillary_dissection", "prtx_radiation", "prtx_chemo", + "diag_neoadj_chemo_1", "prtx_endo", "prtx_bonemod", "diag_pcr_1", "fu_locreg_prog", + "fu_locreg_site_char", "fu_dist_prog", "fu_dist_site_char", "ever_relapsed", + "fu_surv") + +# Univariable regression for continuous variables +univariable_results_continuous <- sapply(continuous_vars, function(var) { + formula <- as.formula(paste(outcome, "~", var)) + model <- glm(formula, data = subset_data, family = "binomial") +
summary(model)$coefficients[2, c(1, 4)] # Extract coefficient and p-value +}) + +# Univariable regression for categorical variables (assuming factors are properly encoded) +univariable_results_categorical <- sapply(categorical_vars, function(var) { + formula <- as.formula(paste(outcome, "~", var)) + model <- glm(formula, data = subset_data, family = "binomial") + summary(model)$coefficients[2, c(1, 4)] # Extract coefficient and p-value for the first non-reference level only +}) + +# Combine continuous and categorical results +univariable_results <- data.frame( + Variable = c(continuous_vars, categorical_vars), + Estimate = c(univariable_results_continuous[1,], univariable_results_categorical[1,]), + p_value = c(univariable_results_continuous[2,], univariable_results_categorical[2,]) +) + +# Print univariable results +print(univariable_results) ``` + +** do i need to do univariate associations + +We will next think about our multivariable regression model. Several variables were significant in our univariable (chi-squared) analyses: age at diagnosis, longer time from diagnosis to enrollment, higher tumor grade, and higher initial stage at diagnosis. Variables that were not significant but that could be considered include histology, nodal positivity, higher T stage, and receptor status. Recurrence (both distant and local) and worse survival are significantly associated with ctDNA positivity, but these are outcomes that we think of as following ctDNA positivity temporally. They therefore should not be included as predictors in our model. + +In thinking about what these variables represent, we consider the extent of treatment that patients have received as one major category, and intrinsic tumor risk factors as another. + +There is no single prescribed method for choosing variables. We follow the purposeful selection approach of Hosmer and Lemeshow, which begins with a univariable analysis of each candidate variable. Any variable with a univariable p-value below 0.25 is selected as a candidate for the multivariable model, as more stringent cutoffs can fail to identify variables known to be important. In the multivariable model, variables are retained if significant at the 0.1 alpha level. A variable is also kept as a confounder if removing it changes any remaining parameter estimate by more than 15% to 20% compared with the full model.
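The first step of purposeful selection (the univariable screen at p < 0.25) can be sketched as follows. This is a minimal illustration, not the analysis used here. The helper `univariable_screen()` and the simulated data frame `toy` are hypothetical, and the effect sizes are made up. One design choice differs from the loop above: instead of taking the Wald p-value of the second coefficient row, this sketch compares each one-variable model against the intercept-only model using a likelihood-ratio test, fitted to the same complete cases. As a result, a multi-level factor such as tumor grade contributes a single p-value.

```r
# Hypothetical helper (not part of the original analysis): univariable screen,
# the first step of purposeful selection. Each candidate is compared with the
# intercept-only model on the same complete cases via a likelihood-ratio test,
# so a multi-level factor yields a single p-value.
univariable_screen <- function(data, outcome, candidates, threshold = 0.25) {
  lrt_p <- vapply(candidates, function(var) {
    cc <- data[stats::complete.cases(data[, c(outcome, var)]), , drop = FALSE]
    null_fit <- glm(reformulate("1", response = outcome), data = cc, family = binomial)
    full_fit <- glm(reformulate(var, response = outcome), data = cc, family = binomial)
    anova(null_fit, full_fit, test = "Chisq")[2, "Pr(>Chi)"]
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(variable = candidates, lrt_p = lrt_p, candidate = lrt_p < threshold,
             stringsAsFactors = FALSE)
}

# Simulated toy data (made-up effect sizes): y depends on grade and age, not on noise
set.seed(1)
n <- 200
toy <- data.frame(
  grade = factor(sample(c("1", "2", "3"), n, replace = TRUE)),
  age   = rnorm(n, mean = 55, sd = 10),
  noise = rnorm(n)
)
toy$y <- rbinom(n, 1, plogis(-2 + 1.5 * (toy$grade == "3") + 0.05 * (toy$age - 55)))

univariable_screen(toy, "y", c("grade", "age", "noise"))
```

Variables flagged `candidate = TRUE` would then enter the initial multivariable model. The later steps (retention at the 0.1 level and the 15% to 20% change-in-estimate check for confounding) are not shown.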
+ +``` {r} + +# Multivariable logistic regression +# Combine all variables into a formula (note: categorical_vars still includes the recurrence and survival outcomes, which the text above says should not be predictors) +multivariable_formula <- as.formula(paste(outcome, "~", paste(c(continuous_vars, categorical_vars), collapse = " + "))) + +# Fit the multivariable logistic regression model +multivariable_model <- glm(multivariable_formula, data = subset_data, family = "binomial") + +# Summary of the multivariable model +summary(multivariable_model) + +# Extract coefficients and p-values from the multivariable model +multivariable_results <- data.frame( + Variable = rownames(summary(multivariable_model)$coefficients), + Estimate = summary(multivariable_model)$coefficients[, 1], + Std_Error = summary(multivariable_model)$coefficients[, 2], + z_value = summary(multivariable_model)$coefficients[, 3], + p_value = summary(multivariable_model)$coefficients[, 4] +) + +# Print multivariable results +print(multivariable_results) + + +``` + + + +Of 184 pts enrolled from 2016 – 2021, 121 had tissue available; 114/121 (94%) had successful WES. A total of \_\_\_\_\_ plasma samples from 109 pts (median 2 timepoints each, range 1-12) have been successfully tested by RaDaR to date. + +Overall, ctDNA was detected in 11 samples from 9/96 pts (9.4%) with a median eVAF of 0.009% (range 0.002-0.084%). Two pts were ctDNA+ at baseline (BL), and 7 became positive on surveillance. 87/96 (90.6%) were ctDNA- across all timepoints. + +34/96 pts (35%) were DTC+, either at BL (n=24, 25%) or after (n=10, 10%). Considering all timepoints, concordance was 64%. Of 34 ever-DTC+ pts, 4 (12%) were ctDNA+ (of whom 3/4 recurred) and 30 remained ctDNA- (with 1/30 who recurred). Among the 62 pts who remained DTC-, 5 (8%) were ctDNA+ (with 5/5 who recurred), and 57 remained ctDNA- (of whom 5/57 recurred). All ctDNA positivity in DTC+ pts occurred at the time of or after DTC positivity.
Over median follow-up (f/u) of 65 months (m), BC recurrence occurred in 14/96 pts (15%), with 2 locoregional-only and 12 distant +/- locoregional recurrences (involving the bone, liver, lung/pleura, and brain); 8/14 pts (57%) were ctDNA+ prior to relapse. 7/12 (58%) with distant recurrences were ctDNA+ prior to metastatic diagnosis, at a median lead time of 15 m (range 0 – 25). Overall, ctDNA+ pts experienced a median lead time from ctDNA positivity to recurrence of 13 m (range 0 – 25). Only 1 of 9 ctDNA+ pts has not recurred; this pt was DTC+ and went on therapeutic trial, without evidence of recurrence over 20 m f/u. 30/34 DTC+ pts (88%) who went on therapeutic trial have not had ctDNA detected during f/u and have not recurred. Overall, ctDNA status was significantly associated with relapse (p\<0.01), with a PPV of 89% and NPV of 93%. Of the 24 BL DTC+ pts, 2 became ctDNA+ at subsequent timepoints, an average of 18 m after DTC assessment, and both relapsed (3 and 5 m from ctDNA detection, respectively). + +Describe your results and include relevant tables, plots, and code/comments used to obtain them. You may refer to the @sec-methods as needed. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you'd like, but this is not required. + +## Conclusion {#sec-conclusion} + + +In this study, X was associated with Y. From 99c87df3a3210d5c193da0c273b118c2cdca4d29 Mon Sep 17 00:00:00 2001 From: ntaranto Date: Fri, 29 Nov 2024 17:19:52 -0500 Subject: [PATCH 05/14] Add files via upload --- FinalProject.html | 553 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 553 insertions(+) create mode 100644 FinalProject.html diff --git a/FinalProject.html b/FinalProject.html new file mode 100644 index 000000000..b07acefc5 --- /dev/null +++ b/FinalProject.html @@ -0,0 +1,553 @@ +Final Presentation

Final Presentation


Final Project Overview: Instructions – The overview consists of 2-3 sentences summarizing the project and goals. For the introduction, the first paragraph describes the problem addressed, its significance, and some background to motivate the problem.


After treatment for early stage breast cancer, recurrence remains a problem in up to 30% of individuals, in spite of these patients completing definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy) for their early stage breast cancer. It remains unclear who will experience disease recurrence and who will not, and whether we can identify biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA as risk factors for recurrence. In the SURMOUNT study, we have followed patients who have undergone definitive treatment for their early stage breast cancer for recurrence, and also obtained bone marrow and peripheral blood assessment, looking for disseminated tumor cells in the bone marrow and circulating tumor DNA in the blood.

+ + + + + \ No newline at end of file From 0bfd9a5b0abde647e6143f0d15a24235cb03aa84 Mon Sep 17 00:00:00 2001 From: ntaranto Date: Mon, 2 Dec 2024 07:12:55 -0500 Subject: [PATCH 06/14] Update FinalProjectTaranto.qmd --- FinalProjectTaranto.qmd | 3376 ++++++++++++++++++++++++--------------- 1 file changed, 2054 insertions(+), 1322 deletions(-) diff --git a/FinalProjectTaranto.qmd b/FinalProjectTaranto.qmd index 0951c3872..b51dce736 100644 --- a/FinalProjectTaranto.qmd +++ b/FinalProjectTaranto.qmd @@ -20,7 +20,7 @@ Optimizing detection, intervention and surveillance for MRD after breast cancer In the SURMOUNT study, we have followed patients who have undergone definitive treatment for their early stage breast cancer for recurrence, and also obtained bone marrow and peripheral blood assessment, looking for disseminated tumor cells (DTCs) in the bone marrow by immunohistochemistry (DTC-IHC) and circulating tumor DNA (ctDNA) in the blood. In this overall study, we are assessing the relationship between ongoing surveillance for MRD (utilizing standard DTC-IHC and ctDNA by a tumor-informed assay) and recurrence, assess the trajectory of MRD over time, optimizing the type and number of tests needed to predict recurrence, outcomes and lead time, and further evaluating the long-term impact of our prior therapeutic interventions. In this specific analysis, we will look at clinical predictors of ctDNA and DTC positivity to try to understand the strongest drivers of MRD in the population of patients with high-risk early breast breast cancer who have finished definitive treatment. We will do so by looking at the clinical characteristics, DTC and ctDNA assessment, and clinical follow-up for the cohort of patients enrolled onto the SURMOUNT (Surveillance Markers of Utility for Recurrence after (Neo)adjuvant Therapy for Breast Cancer) study between 2016 and 2021 who had ctDNA assessment performed. -For this analysis, I have spoken with Dr. 
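Nicholas Seewald, my biostatistical mentor, and Dr. Angela DeMichele, my epidemiology mentor and a breast oncologist with expertise in the biomarkers of recurrence (and the PI on the SURMOUNT study). Dr. Seewald has been critical to thinking about the approach to multivariable analysis to assess the clinical predictors of ctDNA and DTC positivity as well as survival analyses to understand the association between biomarker positivity and relapse. Dr. Demichele has been instrumental in the development of the SURMOUNT study design, designing the analysis of this initial surveillance cohort, and thinking about the biomarkers of breast cancer recurrence and dormance more broadly. +For this analysis, I have spoken with Dr. Nicholas Seewald, my biostatistical mentor, and Dr. Angela DeMichele, my epidemiology mentor and a breast oncologist with expertise in biomarkers of recurrence (and the PI of the SURMOUNT study). Dr. Seewald has been critical in planning the multivariable analysis of clinical predictors of ctDNA and DTC positivity, as well as the survival analyses of the associations between biomarker positivity and relapse. Dr. DeMichele has been instrumental in developing the SURMOUNT study design, designing the analysis of this initial surveillance cohort, and thinking more broadly about biomarkers of breast cancer recurrence and dormancy. ## Introduction {#sec-introduction} @@ -28,15 +28,15 @@ Breast cancer is the most prevalent cancer since it is both common and treatable Recurrent breast cancers arise from the pool of breast cancer cells that remain after initial treatment, likely in the form of minimal residual disease (MRD): local and disseminated residual tumor cells (RTCs) that survive in their host in a presumed dormant state following treatment of the primary breast cancer.The development of incurable metastatic disease is due to this persistent pool of residual disease resulting from escape of cells from the primary tumor, intravasation and survival in the circulation, and eventual extravasation and metastatic seeding. Many breast cancers pass through a latent phase in which disseminated tumor cells (DTCs) persist in niches where they may reside in a dormant state for months to decades. These DTCs exist in a temporary quiescent state of reversible cell-cycle arrest, from which some cells may eventually reactivate, resume proliferation and recirculate, at which point they can be detected as circulating tumor DNA (ctDNA) in the blood. Longitudinal studies demonstrate that the detection of DTCs in the bone marrow in such patients is associated with poorer disease-free, breast cancer recurrence-free, and overall survival compared to patients without DTCs. Several mechanisms implicated in this process by preclinical studies are therapeutically targetable, and we have developed several interventional trials aimed at targeting these DTCs that are fed by the SURMOUNT surveillance study.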
## Introduction {#sec-introduction} @@ -28,15 +28,15 @@ Breast cancer is the most prevalent cancer since it is both common and treatable Recurrent breast cancers arise from the pool of breast cancer cells that remain after initial treatment, likely in the form of minimal residual disease (MRD): local and disseminated residual tumor cells (RTCs) that survive in their host in a presumed dormant state following treatment of the primary breast cancer.The development of incurable metastatic disease is due to this persistent pool of residual disease resulting from escape of cells from the primary tumor, intravasation and survival in the circulation, and eventual extravasation and metastatic seeding. Many breast cancers pass through a latent phase in which disseminated tumor cells (DTCs) persist in niches where they may reside in a dormant state for months to decades. These DTCs exist in a temporary quiescent state of reversible cell-cycle arrest, from which some cells may eventually reactivate, resume proliferation and recirculate, at which point they can be detected as circulating tumor DNA (ctDNA) in the blood. Longitudinal studies demonstrate that the detection of DTCs in the bone marrow in such patients is associated with poorer disease-free, breast cancer recurrence-free, and overall survival compared to patients without DTCs. Several mechanisms implicated in this process by preclinical studies are therapeutically targetable, and we have developed several interventional trials aimed at targeting these DTCs that are fed by the SURMOUNT surveillance study. -In the SURMOUNT surveillance study, patients with early stage (i.e. curable) but high-risk breast cancer are enrolled and undergo initial baseline bone marrow assessment (BMA) for evaluation of DTCs, as well as peripheral blood assessment for retrospective ctDNA assesmsent. Patients who screen DTC positive--either at baseline or on yearly surveillance BMA--are referred for interventional trials. 
Patients who screen DTC negative remain in the SURMOUNT surveillance cohort and undergo yearly DTC assessment and peripheral blood collection. The first interventional trial, CLEVER, completed enrollment in 2021, and so this initial analysis of the surveillance cohort is focused on the patients who were enrolled for the purposes of accruing this first interventional trial. +In the SURMOUNT surveillance study, patients with early-stage (i.e., curable) but high-risk breast cancer are enrolled and undergo an initial baseline bone marrow assessment (BMA) for evaluation of DTCs, as well as peripheral blood collection for retrospective ctDNA assessment. Patients who screen DTC positive--either at baseline or on yearly surveillance BMA--are referred for interventional trials. Patients who screen DTC negative remain in the SURMOUNT surveillance cohort and undergo yearly DTC assessment and peripheral blood collection for ctDNA assessment. The first interventional trial, CLEVER, completed enrollment in 2021, so this initial analysis focuses on the patients who were enrolled to accrue that first trial. -Despite years of progress in breast cancer diagnostics and therapeutics, identifying the individuals at risk of recurrence -- and figuring out how to manage and minimize their elevated risk--remains a challenge. In this study, we seek to assess the clinical validity of DTC and ctDNA assessment and to better understand the population in which they may be useful. +Despite years of progress in breast cancer diagnostics and therapeutics, identifying the individuals at risk of recurrence--and figuring out how to manage and minimize their elevated risk--remains a challenge. In this study, we seek to assess the clinical validity of DTC and ctDNA assessment and to better understand the population in which they may be useful.
Specifically, in this analysis, we looked at overall rates of ctDNA and DTC positivity in this cohort and at the clinical factors associated with each. ## Methods {#sec-methods} “PENN SURMOUNT” is a single center, prospective, longitudinal cohort study examining MRD biomarkers among pts within 5 years (y) of BC diagnosis who completed all curative treatment except endocrine therapy. Eligible pts must have had: 1) TNBC, or 2) HER2+ or HR+ BC with positive LN and/or residual disease after neoadjuvant therapy, or 3) HR+ BC with a 21-gene Recurrence score \>25 and/or high risk Mammaprint. Pts had annual bone marrow aspirate (BMA) for DTCs by immunohistochemistry (using methods of Naume et al.). DTC+ pts went on therapeutic trial; DTC- pts had up to 5y of annual BMA and blood testing. ctDNA was retrospectively assessed using the RaDaR assay, which targets pt-specific somatic mutations identified by whole-exome sequencing (WES) of primary tumor tissue. -The ctDNA assessment was performed after bespoke panel development on tissue on peripheral blood from 109 patients by Neogenomics, inc. and provided back to the research team in .csv format, with the last data drop occurring July 30, 2024. DTC assessment was performed based on bone marrow assessment and ultimately entered into REDCap database through this same follow-up date. Clinical and demographic factors--and follow-up data--were abstracted by the TCE research team through October 2024 and entered into the same REDCap database. Data were exported in mid-October 2024 by the TCE data manager, and merged with the ctDNA data prior to hand-off for this analysis. The final locked and merged dataset, labeled "surmount184_merged_20241108.csv" is maintained in the TCE box for the ctDNA analysis, and a copy is being stored in the FinalProject_files. +ctDNA was assessed by Neogenomics, Inc. on peripheral blood from 109 patients, using a bespoke panel developed from each patient's tumor tissue;
results were returned to the research team in .csv format, with the last data drop on July 30, 2024. DTC assessment was performed on bone marrow aspirates, and results were entered into the REDCap database by the research team through this same follow-up date. Clinical and demographic factors, and follow-up data, were abstracted by the research team through October 2024 and entered into the same REDCap database. Data were exported in mid-October 2024 by the TCE data manager and merged with the ctDNA data prior to hand-off for this analysis. The final locked and merged dataset, labeled "surmount184_merged_20241108.csv", is maintained in the TCE box for the ctDNA analysis, and a copy is stored in the FinalProject_files folder. **First,** we will import csv of final data, which is entitled "surmount184_merged_20241108.csv" @@ -49,7 +49,7 @@ d <- read.csv(file = here("FinalProject_files", ``` -**Next,** we will limit data to the 109 patients who had ctDNA tested, of the 184 individuals. we will look at the names and structures of the variables in the dataset "d", of which there are 387, the majority of which are clinical variables, but some of which are outcome variables. +**Next,** we will limit the data to the 109 patients who had ctDNA tested, out of the 184 individuals who were included in the initial CLEVER trial screening group. We will look at the names and structures of the 387 variables in the dataset "d". Most are clinical variables (often categorical dummy variables), but some are outcome variables related to local relapse, distant relapse, and survival, or come from the pathology report documenting DTC status. ```{r} @@ -59,10 +59,9 @@ str(d) ``` -**Summary variables:** We have a few different important summary variables which we've identified.
-summary variables: final_overall_stage final_t_stage final_n_stage, final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') final_tumor_grade final_histology, demo_race_final fu_locreg_site_num (numeric values for local regional site), fu_locreg_site_char (character values for local regional site), fu_dist_site_num (numeric values for distant site), fu_dist_site_char (character values for distant site), censor_date (most recent fu_date_to among patients who are alive without local or distant progression). +**Summary variables:** We have identified several important summary variables: final_overall_stage, final_t_stage, final_n_stage, final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+'), final_tumor_grade, final_histology, demo_race_final, fu_locreg_site_num (numeric values for locoregional site), fu_locreg_site_char (character values for locoregional site), fu_dist_site_num (numeric values for distant site), fu_dist_site_char (character values for distant site), and censor_date (most recent fu_date_to among patients who are alive without local or distant progression). -**Limiting from the overall cohort (184) to the ctDNA cohort**: We know that this data merge contains 184 individuals (as this was the overall cohort of individuals that were screened for the CLEVER interventional study on SURMOUNT), but also know, from the separate ctDNA csv and the information from the Neogenomics summary data, that there were 109 individuals on whom ctDNA was assessed. We need to limit the data set "d" to this "ctDNA cohort"--we will call the ctDNA cohort "subset_data." We have an indicator variable "ctDNA_cohort" with which we can limit this subset.
+**Limiting from the overall cohort (184) to the ctDNA cohort**: This merged dataset contains 184 individuals (the overall cohort screened for the CLEVER interventional study on SURMOUNT), but we know from the separate ctDNA csv and the NeoGenomics summary data that ctDNA was assessed in 109 individuals. We therefore limit the dataset "d" to this "ctDNA cohort", which we will call "subset_data", using the indicator variable "ctDNA_cohort". ```{r} @@ -91,15 +90,16 @@ unique_count ``` -Now that we have the subset_data = the ctDNA cohort (n=109), we can start looking at demographics. First we will do overall, then we will do divided by ctDNA_detected. We can see, looking at the table by sample count using the ctDNA_detected variable (false = negative/ctDNA was NOT detected, true = positive/ctDNA was detected), that there were 385 negative samples, and 11 positive samples within the ctDNA cohort. Next, we will create the ctDNA_ever variable that will, by participant_id (which is the unique study ID), tell us if that participant ever had ctDNA detected. +Now that subset_data contains the ctDNA cohort (n=109), we can start looking at demographics, first overall and then by ctDNA status. Tabulating samples by the ctDNA_detected variable (FALSE = negative, ctDNA NOT detected; TRUE = positive, ctDNA detected) shows 385 negative and 11 positive samples within the ctDNA cohort. Next, we will create the ctDNA_ever variable, which tells us, for each participant_id (the unique study ID), whether that participant ever had ctDNA detected.
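As a minimal, self-contained sketch of the "ever positive" logic described above, the per-participant flag is `any()` over that participant's successfully tested samples. The sketch is written in base R rather than the dplyr pipeline used in this file, and it runs on a small made-up data frame, not the SURMOUNT data. It assumes ctDNA_detected is stored as the character values "TRUE", "FALSE", or "Fail". The object names `samples`, `tested`, `ever`, and `ctDNA_ever_by_id` are hypothetical.

```r
# Hypothetical toy data (NOT study data): one row per plasma sample.
# ctDNA_detected is assumed to be character: "TRUE", "FALSE", or "Fail".
samples <- data.frame(
  participant_id = c("P01", "P01", "P02", "P02", "P02", "P03", "P04"),
  ctDNA_detected = c("FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "Fail"),
  stringsAsFactors = FALSE
)

# Keep only successfully tested samples; P04 (failed samples only) drops out here
tested <- samples[samples$ctDNA_detected %in% c("TRUE", "FALSE"), ]

# Participant-level flag: TRUE if any of that participant's samples was TRUE
ever <- tapply(tested$ctDNA_detected == "TRUE", tested$participant_id, any)
ctDNA_ever_by_id <- data.frame(
  participant_id = names(ever),
  ctDNA_ever = as.vector(ever),
  stringsAsFactors = FALSE
)

# Attach the participant-level flag back onto every sample row
tested <- merge(tested, ctDNA_ever_by_id, by = "participant_id")

# Counts of never- vs ever-positive participants
table(ctDNA_ever_by_id$ctDNA_ever)
```

Participants whose only samples failed disappear at the filtering step. This mirrors the exclusion of all-fail individuals from the ctDNA cohort.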
-``` {r} +```{r} #ctDNA_detected = character, ok names(subset_data) ### Excluding the FAILS from this cohort ######create the ctDNA Ever positive variable table(subset_data$ctDNA_detected) #385 FALSE, 11 TRUE +table(d$ctDNA_detected) # Create the 'ctDNA_ever' variable: # This will be 1 if ctDNA_detected was 1 for any record for the participant, otherwise 0. @@ -118,12 +118,11 @@ subset_data |> ``` -We can see here using the summary variable ctDNA_ever that there are 100 individuals with always negative results, and 9 individuals with "ever positive" ctDNA results, which matches our original ctDNA source data. +Using the summary variable ctDNA_ever, we can see that 100 individuals had only negative results and 9 individuals had "ever positive" ctDNA results, which matches our original ctDNA source data. -**Ever DTC Positive** -Next, we will create a variable to represent whether someone ever had a DTC positive test. To do this, we will use the final result variable "dtc_ihc_result_final" which tells us, for a given sample/date, whether that DTC result was positive ("1") or negative ("0"). We see in this data set, by sample, that there are 221 negatives, and 49 positives, which aligns with our prior data and consorts. +**Ever DTC Positive** Next, we will create a variable indicating whether someone ever had a positive DTC test. To do this, we will use the final result variable "dtc_ihc_result_final", which tells us, for a given sample/date, whether that DTC result was positive ("1") or negative ("0"). By sample, this dataset contains 221 negative and 49 positive results, which aligns with our prior data and CONSORT diagrams.
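Here is a matching base-R sketch for the DTC flag, again on hypothetical toy data. Because dtc_ihc_result_final is coded 1/0, the per-participant maximum is 1 if any result was positive. The sketch assumes that visits without a bone marrow result are stored as NA. The formula method of `aggregate()` drops those rows by default (`na.action = na.omit`), so they neither count as negative nor turn the maximum into NA. The names `dtc` and `dtc_ever_by_id` are illustrative only.

```r
# Hypothetical toy data (NOT study data): one row per participant visit.
# dtc_ihc_result_final: 1 = DTC positive, 0 = DTC negative, NA = no BMA result
dtc <- data.frame(
  participant_id = c("P01", "P01", "P02", "P02", "P03"),
  dtc_ihc_result_final = c(0, 1, 0, NA, 0),
  stringsAsFactors = FALSE
)

# The formula method drops NA rows by default (na.action = na.omit), so max()
# runs over tested visits only: 1 if ever DTC positive, otherwise 0
dtc_ever_by_id <- aggregate(dtc_ihc_result_final ~ participant_id,
                            data = dtc, FUN = max)
names(dtc_ever_by_id)[2] <- "dtc_ever"

dtc_ever_by_id
table(dtc_ever_by_id$dtc_ever)
```

A participant with no non-missing DTC result at all would be absent from `dtc_ever_by_id`. It is therefore worth checking that the participant count still matches the cohort size.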
-``` {r} +```{r} names(subset_data) #looking at the names of variables to find the DTC indicator variable library(stringr) @@ -148,1575 +147,1589 @@ subset_data |> count(dtc_ever) ``` -Looking at the number of DTC positives by unique participant, we see 70 DTC ever negatives, 39 positives, which aligns with our source data for this specific ctDNA cohort. - -Describe the data used and general methodological approach used to address the problem described in the @sec-introduction. Subsequently, incorporate full R code necessary to retrieve and clean data, and perform analysis. Be sure to include a description of code so that others (including your future self) can understand what you are doing and why. +Looking at DTC status by unique participant, we see 70 participants who were never DTC positive and 39 who were ever DTC positive, which aligns with our source data on DTC positivity for this specific ctDNA cohort. ------------------------------------------------------------------------ -```{r} +## Results {#sec-results} +**Sample and Testing Information:** +In this cohort of 109 individuals who had ctDNA and DTC testing, 100 remained persistently ctDNA negative and 70 remained persistently DTC negative; correspondingly, 9 individuals were ever ctDNA positive and 39 were ever DTC positive. Of 184 pts enrolled from 2016 – 2021, 121 had tissue available; 114/121 (94%) had successful WES (prior data/NeoGenomics data).
+``` {r} +#counts for ctDNA positivity +subset_data |> + filter(ctDNA_ever == "TRUE") |> + summarize(unique_participants = n_distinct(participant_id)) -``` +table(subset_data$ctDNA_detected) #385 FALSE, 11 TRUE +table(d$ctDNA_detected) #385 false, 11 true, 8 fails -## Results {#sec-results} +# Count unique participants with FAIL in ctDNA_detected (this is in database d, the original database, not in the ctDNA cohort, as these patients were excluded from the cohort) +num_fail <- d |> + filter(ctDNA_detected == "Fail") |> # Filter rows where ctDNA_detected is FAIL + distinct(participant_id) |> # Select unique participant_id + nrow() # Count the number of rows -You can add options to executable code like this +num_fail #4 individuals with Fails in original d dataset -```{r} -#| echo: false #disables the printing of code (only output is displayed) -library(dplyr) -########### Variables to look at for Table 1 ######### -###### median age at diagnosis +#timepoints of positivity. 2 at baseline, 7 after. +subset_data |> + filter(ctDNA_ever == "TRUE") |> + group_by(participant_id) |> + summarize(positive_timepoints = list(timepoint)) -names(subset_data) #to identify the variables I want to use -str(subset_data$diag_date_1) #character -str(subset_data$demo_dob) #character -d$diag_date_1 <- as.Date(d$diag_date_1, format = "%m/%d/%Y") -d$demo_dob <- as.Date(d$demo_dob, format = "%m/%d/%Y") +subset_data |> + filter(ctDNA_detected == "TRUE", timepoint == "SURMOUNT-Baseline") |> + summarize(count_SURMOUNT_Baseline = n()) -str(d$diag_date_1) #dates! -str(d$demo_dob) #dates! 
+#eVAF -### doing the same for subset_data as it didn't carry over into that data set -subset_data$diag_date_1 <- as.Date(subset_data$diag_date_1, format = "%m/%d/%Y") -subset_data$demo_dob <- as.Date(subset_data$demo_dob, format = "%m/%d/%Y") +subset_data |> + filter(ctDNA_ever == "TRUE") |> + summarize( + mean_eVAF = mean(eVAF, na.rm = TRUE), + median_eVAF = median(eVAF, na.rm = TRUE), + sd_eVAF = sd(eVAF, na.rm = TRUE), + min_eVAF = min(eVAF, na.rm = TRUE), + max_eVAF = max(eVAF, na.rm = TRUE) + ) -# calculating age from date of diagnosis to dob -subset_data$age_at_diag <- as.numeric(difftime(subset_data$diag_date_1, subset_data$demo_dob, units = "days")) / 365.25 -head(subset_data$age_at_diag) +#### DTC counts -summary(subset_data$age_at_diag) #median 48.75 +#counts for DTC positivity --> 39 +subset_data |> + filter(dtc_ever == 1) |> + summarize(unique_participants = n_distinct(participant_id)) -age_summary <- subset_data %>% - group_by(ctDNA_ever) %>% - summarise( - mean_age = mean(age_at_diag, na.rm = TRUE), # Calculate mean age - median_age = median(age_at_diag, na.rm = TRUE), # Calculate median age - sd_age = sd(age_at_diag, na.rm = TRUE), # Calculate standard deviation of age - n = n() # Number of participants in each group - ) +#timepoints of positivity. 
+subset_data |> + filter(dtc_ever == 1) |> + select(participant_id, timepoint) -print(age_summary) +# numbers at baseline -# Perform the Wilcoxon rank-sum test to compare the medians of age between ctDNA_ever positive and negative groups -wilcox_test_result <- wilcox.test(age_at_diag ~ ctDNA_ever, data = subset_data) +subset_data |> + filter(dtc_ihc_result_final == 1, timepoint == "SURMOUNT-Baseline") |> + summarize(count_SURMOUNT_Baseline = n()) -# Print the result -print(wilcox_test_result) +### Timepoint Data (# timepoints per patient) -#looking at range of age for the ctDNA pos vs neg groups -age_summary <- subset_data %>% - group_by(ctDNA_ever) %>% +# Timepoints per patient (median, range), overall +timepoints_per_patient <- subset_data %>% + group_by(participant_id) %>% summarise( - min_age = min(age_at_diag, na.rm = TRUE), # Minimum age - max_age = max(age_at_diag, na.rm = TRUE), # Maximum age + total_timepoints = n_distinct(timepoint), # Count distinct timepoints for each patient .groups = "drop" + ) %>% + summarise( + median_timepoints = median(total_timepoints, na.rm = TRUE), # Calculate median + min_timepoints = min(total_timepoints, na.rm = TRUE), # Calculate minimum + max_timepoints = max(total_timepoints, na.rm = TRUE) # Calculate maximum ) +timepoints_per_patient -# View the summary table for age -print(age_summary) - -##### Race: demo_race_final - -# Get the count of unique participant_ids for each category in demo_race_final -race_counts_unique_percent <- subset_data %>% - group_by(demo_race_final) %>% - summarise(unique_participants = n_distinct(participant_id)) %>% - mutate(percent = unique_participants / sum(unique_participants) * 100) - -# View the result -print(race_counts_unique_percent) - - - -# Count distinct participant_ids by ctDNA_ever and demo_race_final -count_distinct_participants <- subset_data %>% - group_by(demo_race_final, ctDNA_ever) %>% - summarise(distinct_participant_count = n_distinct(participant_id), .groups = "drop") - -# 
Print the result -count_distinct_participants - +# Timepoints of ctDNA assessment (`ctDNA_detected`) +ctDNA_timepoints <- subset_data %>% + filter(!is.na(ctDNA_detected)) %>% # Filter out NA values for ctDNA_detected + group_by(participant_id) %>% + summarise( + ctDNA_timepoints = n_distinct(timepoint), # Count distinct timepoints of ctDNA assessment + .groups = "drop" + ) %>% + summarise( + median_ctDNA_timepoints = median(ctDNA_timepoints, na.rm = TRUE), # Calculate median + min_ctDNA_timepoints = min(ctDNA_timepoints, na.rm = TRUE), # Calculate minimum + max_ctDNA_timepoints = max(ctDNA_timepoints, na.rm = TRUE) # Calculate maximum + ) +ctDNA_timepoints -# Step 1: Summarize by unique participant_id -summarized_data <- subset_data %>% +# Timepoints of DTC assessment (`dtc_ihc_result_final`) +dtc_timepoints <- subset_data %>% + filter(!is.na(dtc_ihc_result_final)) %>% # Filter out NA values for dtc_ihc_result_final group_by(participant_id) %>% summarise( - ctDNA_ever = first(ctDNA_ever), # Taking the first observed value of ctDNA_ever for each participant - demo_race_final = first(demo_race_final), # Taking the first observed value of demo_race_final for each participant + dtc_timepoints = n_distinct(timepoint), # Count distinct timepoints of DTC assessment .groups = "drop" + ) %>% + summarise( + median_dtc_timepoints = median(dtc_timepoints, na.rm = TRUE), # Calculate median + min_dtc_timepoints = min(dtc_timepoints, na.rm = TRUE), # Calculate minimum + max_dtc_timepoints = max(dtc_timepoints, na.rm = TRUE) # Calculate maximum ) +dtc_timepoints -# Step 2: Create the contingency table -contingency_table <- table(summarized_data$ctDNA_ever, summarized_data$demo_race_final) -contingency_table -# Step 3: Perform the chi-squared test of independence -chisq_test <- chisq.test(contingency_table) +# Print all summaries +print("Timepoints per patient:") +print(timepoints_per_patient) -# Step 4: Print the result p val - 0.91 -chisq_test +print("Timepoints of ctDNA
assessment:") +print(ctDNA_timepoints) +print("Timepoints of DTC assessment:") +print(dtc_timepoints) - -#####receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') -# Breakdown of final_receptor_group by unique participant_id -receptor_status_by_participant <- subset_data %>% - group_by(participant_id) %>% - summarise(final_receptor_group = first(final_receptor_group), # Or choose the most frequent group if needed - .groups = "drop") +``` +A total of 396 plasma samples from 109 pts (median 2 timepoints each, range 1-12) have been successfully tested by RaDaR to date; an additional 8 samples from 4 unique individuals failed. These 4 individuals were excluded from the ctDNA cohort because they did not ultimately have a successful ctDNA assessment. The 109 individuals in the cohort had a median of 2 DTC assessment timepoints (range 1-6). -# View the result -table(receptor_status_by_participant$final_receptor_group) +Overall, ctDNA was detected in 11 samples from 9/109 pts, with a mean eVAF of 0.009% (range 0.002-0.084%). +Two pts were ctDNA+ at baseline (BL), and 7 became positive on surveillance. 100/109 were ctDNA- across all timepoints. 39/109 pts were DTC+, either at BL (n=26) or after (n=13).
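The median and range of timepoints per patient reported above come from counting distinct timepoints per participant and then summarizing across participants. A base-R sketch of that two-step summary follows. The toy data frame `visits` is hypothetical, as are its timepoint labels other than "SURMOUNT-Baseline", which appears in this file.

```r
# Hypothetical toy data (NOT study data): one row per participant per timepoint
visits <- data.frame(
  participant_id = c("P01", "P01", "P01", "P02", "P03", "P03"),
  timepoint = c("SURMOUNT-Baseline", "Year 1", "Year 2",
                "SURMOUNT-Baseline",
                "SURMOUNT-Baseline", "Year 1"),
  stringsAsFactors = FALSE
)

# Step 1: number of distinct timepoints for each participant
n_tp <- as.vector(tapply(visits$timepoint, visits$participant_id,
                         function(x) length(unique(x))))

# Step 2: median and range across participants, as reported in the text
tp_summary <- c(median = median(n_tp), min = min(n_tp), max = max(n_tp))
tp_summary
```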
-# Summarizing data by participant_id, final_receptor_group, and ctDNA_ever -receptor_ctDNA_status <- subset_data %>% - group_by(participant_id) %>% - summarise( - final_receptor_group = first(final_receptor_group), # Or the most frequent if needed - ctDNA_ever = first(ctDNA_ever), # Taking the first observed value for ctDNA_ever - .groups = "drop" - ) -# Step 2: Create the contingency table -contingency_table_receptor <- table(receptor_ctDNA_status$final_receptor_group, receptor_ctDNA_status$ctDNA_ever) -contingency_table_receptor +``` {r} +# Filter and get unique participants by participant_id +concordance_overall_unique <- subset_data |> + distinct(participant_id, .keep_all = TRUE) |> + mutate(concordance = ifelse(dtc_ever == ctDNA_ever, "Concordant", "Discordant")) -# Step 3: Perform the chi-squared test of independence -chisq_test <- chisq.test(contingency_table_receptor) +# Count total concordant and discordant pairs for unique participants +overall_concordant <- sum(concordance_overall_unique$concordance == "Concordant") +overall_discordant <- sum(concordance_overall_unique$concordance == "Discordant") -# Step 4: Print the result # p-value 0.10 -chisq_test +# Proportion of concordance +proportion_concordant <- overall_concordant / (overall_concordant + overall_discordant) +cat("Overall Concordant (unique participants):", overall_concordant, "\n") +cat("Overall Discordant (unique participants):", overall_discordant, "\n") +cat("Overall Proportion Concordant (unique participants):", proportion_concordant, "\n") -#I was curious about whether TNBC vs non-TNBC association exists (or HR + vs non-HR positive) -#inclusion criteria inc_dx_crit___1 = TNBC (This has been confirmed with the study team) -#inc_dx_crit_list___1 +#Proportion concordance 63% (ever positive) +unique <- subset_data |> + group_by(participant_id) |> + summarize( + dtc_ever = max(dtc_ever, na.rm = TRUE), # Ensures 1 if DTC is ever detected + ctDNA_ever = max(ctDNA_ever, na.rm = TRUE) # Ensures 1 
if ctDNA is ever detected + ) -TNBC_ctDNA_status <- subset_data %>% - group_by(participant_id) %>% +# Create the 2x2 table +table_ctDNA_dtc <- table(unique$ctDNA_ever, unique$dtc_ever) +print(table_ctDNA_dtc) +``` + +``` {r} +#Concordance by timepoint + +# Ensure that dtc_ihc_result_final is converted to TRUE/FALSE for consistency with ctDNA_detected +concordance_by_timepoint <- subset_data |> + filter(!is.na(dtc_ihc_result_final) & !is.na(ctDNA_detected)) |> + mutate( + # Convert dtc_ihc_result_final (1 = TRUE, 0 = FALSE) to match ctDNA format (TRUE/FALSE) + dtc_ihc_result_final_bool = ifelse(dtc_ihc_result_final == 1, TRUE, FALSE), + # Create a column to determine concordance (both DTC and ctDNA being TRUE or both being FALSE) + concordance = ifelse(dtc_ihc_result_final_bool == ctDNA_detected, "Concordant", "Discordant") + ) %>% + group_by(timepoint) %>% summarise( - inc_dx_crit_list___1 = first(inc_dx_crit_list___1), # Or the most frequent if needed - ctDNA_ever = first(ctDNA_ever), # Taking the first observed value for ctDNA_ever - .groups = "drop" + total_concordant = sum(concordance == "Concordant"), + total_discordant = sum(concordance == "Discordant"), + total_samples = n(), # Total number of samples at this timepoint + concordance_rate = total_concordant / total_samples # Concordance rate per timepoint ) -# Step 2: Create the contingency table -contingency_table_TNBC <- table(TNBC_ctDNA_status$inc_dx_crit_list___1, TNBC_ctDNA_status$ctDNA_ever) -contingency_table_TNBC +# Print concordance results for each timepoint +print(concordance_by_timepoint) -# Step 3: Perform the chi-squared test of independence -chisq_test <- chisq.test(contingency_table_TNBC) +# Now calculate overall concordance across all timepoints +overall_concordance <- sum(concordance_by_timepoint$total_concordant) / + sum(concordance_by_timepoint$total_samples) -# Step 4: p-val is 0.12 -chisq_test - +cat("Overall Concordance Rate across all timepoints:", overall_concordance, "\n") 
+#concordance, considering testing by timepoint, is 80% -### HR positive vs HR negative (Hormone receptor positive vs hormone receptor negative) -#first, I need to create a HR positive variable (HR_status) -subset_data <- subset_data |> - mutate(HR_status = case_when( - final_receptor_group %in% c(2, 3) ~ "HR+", - final_receptor_group %in% c(1, 4) ~ "Non-HR+", - TRUE ~ NA_character_ # In case there are missing or other unexpected values - )) +``` -# View the new HR_status variable -table(subset_data$HR_status) +At the participant level (ever positive on each test), DTC and ctDNA status were concordant in 63% of individuals. Concordance was higher (80%) when paired results were compared at each timepoint. Of 39 ever-DTC+ pts, 4 were ctDNA+ (of whom 3/4 recurred) and 35 remained ctDNA- (with 1/30 who recurred). -HR_status_by_participant <- subset_data %>% - group_by(participant_id) %>% - summarise(HR_status = first(HR_status), # Or use mode() if you have multiple rows per participant - .groups = "drop") -# View the result -table(HR_status_by_participant$HR_status) #aligns with final receptor status (ultimately 60 HR+, 49 HR-) -# Summarize ctDNA_detected status by HR_status, for each unique participant_id -summary_data <- subset_data %>% - group_by(participant_id) %>% - summarise( - HR_status = first(HR_status), # Get the HR_status for the participant - ctDNA_status = first(ctDNA_ever), # Get the ctDNA_detected status for the participant - .groups = "drop" - ) +**Test Characteristics** -contingency_table_HR <- table(summary_data$ctDNA_status, summary_data$HR_status) -contingency_table_HR -chisq_test <- chisq.test(contingency_table_HR) +Next, we will look at ctDNA and DTC test characteristics: first, the association between ctDNA and DTC positivity, and then the number of tests performed.
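The 2x2 table behind the 63% participant-level concordance can be rebuilt from counts reported in this section, as a worked check. There were 39 ever-DTC+ participants (4 ctDNA+, 35 ctDNA-) and 9 ever-ctDNA+ participants, so 5 were ctDNA+ but DTC-. With 109 participants in total, 65 were negative on both tests. Overall percent agreement is the diagonal over the total, 69/109. With only 9 ctDNA+ participants, at least one expected cell count falls below 5, the usual threshold at which `chisq.test()` warns that its approximation may be inaccurate. The Fisher's exact test line is offered as one common alternative for a sparse 2x2 table. It is a suggestion, not the test used in this analysis.

```r
# Participant-level 2x2 table rebuilt from counts reported in the text:
# rows = DTC ever status, columns = ctDNA ever status
tab <- matrix(
  c(65, 5,   # DTC negative: ctDNA negative, ctDNA positive
    35, 4),  # DTC positive: ctDNA negative, ctDNA positive
  nrow = 2, byrow = TRUE,
  dimnames = list(DTC = c("neg", "pos"), ctDNA = c("neg", "pos"))
)

# Overall percent agreement: concordant cells (the diagonal) / all participants
concordance <- sum(diag(tab)) / sum(tab)
round(concordance, 3)

# Expected counts under independence; cells below 5 make the
# chi-squared approximation unreliable
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
round(expected, 2)

# Fisher's exact test as an alternative for a sparse 2x2 table
p_fisher <- fisher.test(tab)$p.value
p_fisher
```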
-# Print chi-squared test results #0.28 -chisq_test +```{r} +############ Test Characteristics and Baseline versus cumulative positivity (ctDNA to start)####### -###tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not reported -# Exclude rows where final_tumor_grade is 3 (not reported), want to exclude the 2 pts who had grade not reported -summary_data <- subset_data %>% - filter(final_tumor_grade != 3) %>% # Exclude grade == 3 +### DTC by ctDNA (ever positive), association between test positivity. + +# link by participant id +subset_data_by_id <- subset_data %>% group_by(participant_id) %>% summarise( - grade = first(final_tumor_grade), # Get the final_tumor_grade for each participant - ctDNA_ever = first(ctDNA_ever), # Get the ctDNA_ever status for each participant + dtc = first(dtc_ever), # Get the ever dtc for each participant + ctDNA_ever = first(ctDNA_ever), # Get the ctDNA_ever status for each participant .groups = "drop" ) -# Create a contingency table of grade vs ctDNA_ever -contingency_table <- table(summary_data$grade, summary_data$ctDNA_ever) - -# View the contingency table -print(contingency_table) +# Create a contingency table of dtc vs ctDNA_ever +contingency_table <- table(subset_data_by_id$dtc, subset_data_by_id$ctDNA_ever) # Perform the Chi-squared test chisq_test <- chisq.test(contingency_table) -# View the Chi-squared test result -- p-value 0.0229 +# Print the contingency table and Chi-squared test results, p-val 0.839 +print(contingency_table) print(chisq_test) -######histology (final histology) -#people have different combinations of histology (1-15) -table(subset_data$participant_id, subset_data$final_histology) - - histology_summary <- subset_data %>% - distinct(participant_id, final_histology) %>% # Get unique participant-histology combinations - group_by(final_histology) %>% # Group by histology type - summarise(count = n()) # Count the number of participants per histology type - - # View the summary 
table - print(histology_summary) #the different combinations add up to 109. There are a decent number who had both ductal and lobular histology - - #trying to create Ductal, lobular, both, or other variables --> histology_category - subset_data <- subset_data %>% - mutate(histology_category = case_when( - grepl("3", as.character(final_histology)) & grepl("14", as.character(final_histology)) ~ "Both Ductal and Lobular", # Both Ductal and Lobular - grepl("3", as.character(final_histology)) ~ "Ductal", # Ductal - grepl("14", as.character(final_histology)) ~ "Lobular", # Lobular - TRUE ~ "Other" # Any other combination - )) - - # Count the number of participants in each histology category - histology_counts <- subset_data %>% - group_by(histology_category) %>% - summarise(count = n_distinct(participant_id)) # Count distinct participants - - # View the counts -- adds up to 109! - print(histology_counts) - - #contingency table - library(tidyr) - contingency_table <- subset_data %>% - distinct(participant_id, histology_category, ctDNA_ever) %>% # Ensure each patient is counted once - count(histology_category, ctDNA_ever) %>% - pivot_wider(names_from = ctDNA_ever, values_from = n, values_fill = list(n = 0)) # Pivot the table to get ctDNA_ever as columns - - # 3. Perform the Chi-squared test of independence - chisq_test <- chisq.test(contingency_table[,-1]) # Remove the histology_category column for the test - - # 4. Print the contingency table - print(contingency_table) - - # 5. 
Print the result of the Chi-squared test p-value - 0.2276 - print(chisq_test) - - - -#### Staging N stage (Nodal stage) -table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 (No, N1, N2, N3) - -nodal_summary <- subset_data %>% - distinct(participant_id, final_n_stage) %>% # Get unique participant-stage combinations - group_by(final_n_stage) %>% # Group by stage - summarise(count = n()) # Count the number of participants per histology type - -#View the summary table --adds up to 109, 46 = pN0 63 = pN1 - print(nodal_summary) - - subset_data_by_id <- subset_data %>% - filter(final_n_stage %in% c(0, 1, 2, 3)) %>% # Include only relevant nodal stages - group_by(participant_id) %>% - summarise( - nodal_status = first(final_n_stage), # Use final_n_stage as nodal_status for each participant - ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant - .groups = "drop" - ) - - #Create a contingency table of nodal_status vs ctDNA_ever - contingency_table <- table(subset_data_by_id$nodal_status, subset_data_by_id$ctDNA_ever) - - # Check if any cells in the contingency table have zero counts, which could affect test validity - print(contingency_table) - - # Step 5: Perform Chi-squared test - chisq_test <- chisq.test(contingency_table) - - # Step 6: Print the Chi-squared test result p = 0.0001 - print(chisq_test) - - - #### Node positive versus node negative: Using the final n stage to create a Node - vs node + variable from this summary indicator variable - subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% - summarise( - node_status = ifelse(first(final_n_stage) == 0, "Node Negative", "Node Positive"), # Node negative if 0, positive otherwise - ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant - .groups = "drop" - ) - - #adding node_status to subset_data - subset_data <- subset_data %>% - left_join(subset_data_by_id %>% select(participant_id, node_status), by = "participant_id") - - - 
#Create a contingency table of node_status vs ctDNA_ever - contingency_table <- table(subset_data_by_id$node_status, subset_data_by_id$ctDNA_ever) - - # Perform the Chi-squared test - chisq_test <- chisq.test(contingency_table) - - #Print the contingency table and Chi-squared test results - print(contingency_table) - print(chisq_test) - -#######Looking at T stage or tumor size: the variable is final_t_stage - - table(subset_data$final_t_stage) #we have values in 1,2,3,4, and 99 (99 = pTx, cannot evaluate) so can proceed with this - - t_summary <- subset_data %>% - distinct(participant_id, final_t_stage) %>% # Get unique participant-stage combinations - group_by(final_t_stage) %>% # Group by stage - summarise(count = n()) # Count the number of participants per histology type - - # View the summary table --adds up to 109, most are pN0 or cN0, a good number are pN1 or cN1, or pN2 - print(t_summary) - - - #for our T stage table, will use T1 vs T2 or greater to simplify, and we want to exclude 99 (the pTx). We will create "final_t_stage_combined" to represent this. - subset_data_clean <- subset_data %>% - filter(final_t_stage != 99, ctDNA_ever != 99) - - # Combine final_t_stage into T1 vs. 
T2 or greater - subset_data_clean <- subset_data_clean %>% - mutate(final_t_stage_combined = ifelse(final_t_stage == 1, "T1", "T2 or greater")) - - # Summarize the data by participant_id after creating the new combined t_stage - subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% - summarise( - final_t_stage_combined = first(final_t_stage_combined), # Get the combined t status for each participant - ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant - ) - - # Create a contingency table of final_t_stage_combined vs ctDNA_ever - contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$ctDNA_ever) - - # Perform the Chi-squared test - chisq_test <- chisq.test(contingency_table) - - # Print the contingency table and Chi-squared test results. P value = 0.6 - print(contingency_table) - print(chisq_test) +##### Test stuff (#s and such of tests) -#### I looked at a different cut-off for T stage stats using T3 or greater as cutoff and didn't see any significant difference so am not using this for the table. 
- - #exclude 99 (the pTx) - subset_data_clean <- subset_data %>% - filter(final_t_stage != 99, ctDNA_ever != 99) - - # Combine final_t_stage into T1/T2 or T3 or greater - subset_data_clean <- subset_data_clean %>% - mutate(final_t_stage_combined = case_when( - final_t_stage == 1 | final_t_stage == 2 ~ "T1 or T2", # Group T1 and T2 together - final_t_stage >= 3 ~ "T3 or greater", # Group T3 and higher as a separate category - TRUE ~ NA_character_ # Handle any unexpected values - )) - - - # Summarize the data by participant_id after creating the new combined t_stage - subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% - summarise( - final_t_stage_combined = first(final_t_stage_combined), # Get the combined t status for each participant - ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant - ) - - # Create a contingency table of final_t_stage_combined vs ctDNA_ever - contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$ctDNA_ever) - - # Perform the Chi-squared test - chisq_test <- chisq.test(contingency_table) - - # Print the contingency table and Chi-squared test results --> not significant so ignore this - print(contingency_table) - print(chisq_test) - - - - ########Overall stage of disease -- final_overall_stage +#number of tests (ctDNA) +library(dplyr) - table(subset_data$final_overall_stage) #we have values in 1,2,3,4, and 99 so can proceed with this - - stage_summary <- subset_data %>% - distinct(participant_id, final_overall_stage) %>% # Get unique participant-stage combinations - group_by(final_overall_stage) %>% # Group by stage - summarise(count = n()) # Count the number of participants per histology type - - # View the summary table --adds up to 109 (1 99), most are stage 2, some are stage I some are stage III, no stage IV (yay) - print(stage_summary) - - #exclude the 99 - subset_data_clean <- subset_data %>% - filter(final_overall_stage != 99, ctDNA_ever != 99) - - # Summarize 
the data by participant_id - subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% - summarise( - final_overall_stage = first(final_overall_stage), # Get the final_overall_stage for each participant - ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant - ) - - # Create a contingency table of final_overall_stage vs ctDNA_ever - contingency_table <- table(subset_data_by_id$final_overall_stage, subset_data_by_id$ctDNA_ever) - - # Perform the Chi-squared test - chisq_test <- chisq.test(contingency_table) - - # Print the contingency table and Chi-squared test results --> hot dogggg p val = 0.006. Higher stage is associated with ctDNA_ever. - print(contingency_table) - print(chisq_test) - - +# Assuming the status variable is named `ctDNA_detected` in d, and then in subset +status_summary_d <- d %>% + group_by(ctDNA_detected) %>% + summarise(total_samples = n(), .groups = "drop") +# Print the summary -- we've got 385 FALSE, 8 FAILS, 11 TRUES +print(status_summary_d) -###########surgical treatment diag_surgery_type_1 (or diag_surgery_type_2) - - table(subset_data$diag_surgery_type_1) #1= partial mastectomy/lumpectomy, 2 = mastectomy. 
no missingness - - surgery <- subset_data %>% - distinct(participant_id, diag_surgery_type_1) %>% # Get unique participant-stage combinations - group_by(diag_surgery_type_1) %>% # Group by stage - summarise(count = n()) # Count the number of participants per histology type - - # View the summary table - print(surgery) - - - # Summarize the data by participant_id - subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% - summarise( - surgery = first(diag_surgery_type_1), # Get the final_overall_stage for each participant - ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant - ) - - # Create a contingency table of final_overall_stage vs ctDNA_ever - contingency_table <- table(subset_data_by_id$surgery, subset_data_by_id$ctDNA_ever) - - # Perform the Chi-squared test - chisq_test <- chisq.test(contingency_table) - - # Print the contingency table and Chi-squared test results --> p-val = 1.... - print(contingency_table) - print(chisq_test) - - +#looking at the number of Fails by unique participant_id +fail_count <- d %>% + filter(ctDNA_detected == "Fail") %>% # Filter for rows where status is "FAIL" + distinct(participant_id) %>% # Get unique participant IDs + summarise(total_fails = n()) # Count unique participant IDs -######## axillary management. Really just care about whether or not someone got an axillary dissection. diag_axillary_type___2_1 (or diag_axillary_type___2_2 IF 2 biopsy forms). 
Created a new variable (axillary_dissection) +# Print the result -- 4 individuals with FAIL results, which is what we got in the consort +print(fail_count) +fail_count <- subset_data %>% + filter(ctDNA_detected == "Fail") %>% # Filter for rows where status is "FAIL" + distinct(participant_id) %>% # Get unique participant IDs + summarise(total_fails = n()) # Count unique participant IDs - table(subset_data$diag_axillary_type___2_1) - table(subset_data$diag_axillary_type___2_2) #a handful of axillary dissection on the second biopsy form so i think we want to combine these two - - # Create a binary variable to identify participants who had axillary dissection - subset_data_clean <- subset_data %>% - mutate(axillary_dissection = ifelse(diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1, 1, 0)) - - subset_data <- subset_data %>% - mutate(axillary_dissection = ifelse(diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1, 1, 0)) - - # Ensure every participant has a ctDNA_ever and axillary_dissection value - # Ensure every patient has an axillary dissection category, where 0 means no axillary dissection, and 1 means they had one - subset_data_clean <- subset_data %>% - mutate(axillary_dissection = case_when( - diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1 ~ 1, # Had axillary dissection - TRUE ~ 0 # No axillary dissection (includes missing values) - )) - - # Summarize the data by participant_id, including the axillary_dissection and ctDNA_ever variables - subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% - summarise( - axillary_dissection = first(axillary_dissection), # Get the axillary dissection status for each participant - ctDNA_ever = first(ctDNA_ever) # Get the ctDNA_ever status for each participant - ) - - contingency_table <- table(subset_data_by_id$axillary_dissection, subset_data_by_id$ctDNA_ever) - - # Perform the Chi-squared test - chisq_test <- chisq.test(contingency_table) - - # Print the contingency 
table and Chi-squared test results --> p-value 0.173 - print(contingency_table) - print(chisq_test) +# Print the result -- none of the fails were pulled into the ctDNA cohort +print(fail_count) -####inflammatory (variable inflamm_yn)-- do not include inflammatory variable as there were NO inflammatory breast cancers in the ctDNA cohort. -table(d$inflamm_yn_1) ### i think the inflammatory folks must just not be in the subset of patients (6 yes's in the overall cohort, but none in the subset data for etierh inflamm variable) -table(d$inflamm_yn_2) ### I think inflammatory folks just not in subset of patients in the ctDNA cohort -table(subset_data$inflamm_yn) - +#number of DTC tests in this cohort of 109 patients -#### radiation prtx_radiation -table(subset_data$prtx_radiation) +unique(subset_data$dtc_ihc_result_final) #NA, 0, and 1 -radiation <- subset_data |> - distinct(participant_id,prtx_radiation) |> - group_by(prtx_radiation) |> # Group by stage - summarise(count = n()) # Count the number of participants per histology type +status_summary_subset <- subset_data %>% + group_by(dtc_ihc_result_final) %>% + summarise(total_samples = n(), .groups = "drop") -# View the summary table -print(radiation) +# Print the summary -- we've got 221 negatives, 49 positives, 128 NAs, across 39 patients (positive) and 70 patients (negative) +#### confirm with Nick that we are not missing the NAs; based on the check below, I suspect these are just ctDNA-only timepoints +print(status_summary_subset) -# Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +### looking at NAs -- all of them have FALSE ctDNA results (so I think these are all the ctDNA-only timepoints) +na_participants_dtc <- subset_data %>% + filter(is.na(dtc_ihc_result_final)) %>% + select(participant_id, dtc_ihc_result_final, FINAL_RESULT, ORIG_RSLT_DTC, ctDNA_detected, timepoint) + +# Print the list of participant IDs with NA in `dtc_ihc_result_final`--
they all have FALSE ctDNA results, so these are the ctDNA timepoints +#all of the timepoints are long-term except for CLEVER baseline. +print(na_participants_dtc, n=128) + +#look at timepoints +unique_timepoints <- unique(subset_data$timepoint) +print(unique_timepoints) + + + +##### eVAF +names(subset_data) #use eVAF + +# Calculate the median and range (min and max) for `eVAF` as percentages for participants with `ctDNA_detected == TRUE` +eVAF_range_ctDNA_detected_percent <- subset_data %>% + filter(ctDNA_detected == TRUE) %>% # Filter for those with ctDNA detected summarise( - radiation = first(prtx_radiation), # xrt for each participant - ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + median_eVAF_percent = median(eVAF, na.rm = TRUE) * 100, # Convert median to percentage + min_eVAF_percent = min(eVAF, na.rm = TRUE) * 100, # Convert minimum to percentage + max_eVAF_percent = max(eVAF, na.rm = TRUE) * 100 # Convert maximum to percentage ) -# Create a contingency table of final_overall_stage vs ctDNA_ever -contingency_table <- table(subset_data_by_id$radiation, subset_data_by_id$ctDNA_ever) +# Print the result +print(eVAF_range_ctDNA_detected_percent) -# Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) +#### DTC counts +names(subset_data) #use dtc_ihc_summary_count_final -# Print the contingency table and Chi-squared test results --> p-val = 0.33 -print(contingency_table) -print(chisq_test) +# Calculate the median and range (min and max) of DTC counts among DTC-positive samples (`dtc_ihc_result_final == 1`) +dtc_count <- subset_data %>% + filter(dtc_ihc_result_final == 1) %>% # Filter for samples with DTCs detected + summarise( + median_dtc_count = median(dtc_ihc_summary_count_final, na.rm = TRUE), + min_dtc_count = min(dtc_ihc_summary_count_final, na.rm = TRUE), + max_dtc_count = max(dtc_ihc_summary_count_final, na.rm = TRUE) + ) +# Print the result +print(dtc_count) -#### chemotherapy prtx_chemo
-table(subset_data$prtx_chemo) -chemo <- subset_data |> - distinct(participant_id,prtx_chemo) |> - group_by(prtx_chemo) |> # Group by stage - summarise(count = n()) # Count the number of participants per histology type +#### Number of timepoints we see -# View the summary table -print(chemo) #3 people did not get chemo in this cohort +# Timepoints per patient (median, range) +timepoints_per_patient <- subset_data %>% + group_by(participant_id) %>% + summarise( + total_timepoints = n_distinct(timepoint), # Count distinct timepoints for each patient + .groups = "drop" + ) %>% + summarise( + median_timepoints = median(total_timepoints, na.rm = TRUE), # Calculate median + min_timepoints = min(total_timepoints, na.rm = TRUE), # Calculate minimum + max_timepoints = max(total_timepoints, na.rm = TRUE) # Calculate maximum + ) -# Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% +# Timepoints of ctDNA assessment (`ctDNA_detected`) +ctDNA_timepoints <- subset_data %>% + filter(!is.na(ctDNA_detected)) %>% # Filter out NA values for ctDNA_detected + group_by(participant_id) %>% + summarise( + ctDNA_timepoints = n_distinct(timepoint), # Count distinct timepoints of ctDNA assessment + .groups = "drop" + ) %>% + summarise( + median_ctDNA_timepoints = median(ctDNA_timepoints, na.rm = TRUE), # Calculate median + min_ctDNA_timepoints = min(ctDNA_timepoints, na.rm = TRUE), # Calculate minimum + max_ctDNA_timepoints = max(ctDNA_timepoints, na.rm = TRUE) # Calculate maximum + ) + +# Timepoints of DTC assessment (`dtc_ihc_result_final`) +dtc_timepoints <- subset_data %>% + filter(!is.na(dtc_ihc_result_final)) %>% # Filter out NA values for dtc_ihc_result_final group_by(participant_id) %>% summarise( - chemo = first(prtx_chemo), # chemo for each participant - ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + dtc_timepoints = n_distinct(timepoint), # Count distinct timepoints of DTC assessment + .groups = "drop" + ) %>% +
summarise( + median_dtc_timepoints = median(dtc_timepoints, na.rm = TRUE), # Calculate median + min_dtc_timepoints = min(dtc_timepoints, na.rm = TRUE), # Calculate minimum + max_dtc_timepoints = max(dtc_timepoints, na.rm = TRUE) # Calculate maximum ) -# Create a contingency table of final_overall_stage vs ctDNA_ever -contingency_table <- table(subset_data_by_id$chemo, subset_data_by_id$ctDNA_ever) +# Print all summaries +print("Timepoints per patient:") +print(timepoints_per_patient) -# Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) +print("Timepoints of ctDNA assessment:") +print(ctDNA_timepoints) -# Print the contingency table and Chi-squared test results --> p-val = 0.59 -print(contingency_table) -print(chisq_test) +print("Timepoints of DTC assessment:") +print(dtc_timepoints) +### timepoints on clinical trial ### Ask Nick -- should we include all the timepoints on trial technically +#(CLEVER-Baseline, EOO, C3, C6, C12, 6M F/U, etc.) or just the ones while on treatment (C3, C6, C12) +#, or only the ones while patients are +unique_timepoints <- unique(subset_data$timepoint) +print(unique_timepoints) -####neoadjuvant chemo -- there are two variables for this that could theoretically be included: diag_neoadj_chemo_1 or diag_neoadj_chemo_2 +trial_timepoints <- c("CLEVER-Baseline", "EOO", "C3", "C6", "C12", "6M F/U", "12M F/U", "18M F/U", "24M F/U", "30M F/U", "36M F/U") -table(subset_data$diag_neoadj_chemo_1) -table(subset_data$diag_neoadj_chemo_2) #there were no ppl who were recorded on the second biopsy form as having neoadjuvant therapy so can use only the first variable +# Count the number of samples by timepoint (for specific clinical trial timepoints) +samples_by_trial_timepoint <- subset_data %>% + filter(timepoint %in% trial_timepoints) %>% # Filter for relevant timepoints + group_by(timepoint) %>% # Group by timepoint + summarise( + total_samples = n_distinct(participant_id), # Count distinct participant_ids (samples) + .groups = "
"drop" # Remove grouping after summarizing + ) -nact <- subset_data |> - distinct(participant_id,diag_neoadj_chemo_1) |> - group_by(diag_neoadj_chemo_1) |> # Group by stage - summarise(count = n()) # Count the number of participants per histology type +# Print the result +print(samples_by_trial_timepoint) #total samples on trial (ctDNA and DTC) -# View the summary table -print(nact) #3 people did not get chemo in this cohort +#### ctDNA on trial -# Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +ctDNA_samples_by_timepoint <- subset_data %>% + filter(timepoint %in% trial_timepoints, !is.na(ctDNA_detected)) %>% # Filter for relevant timepoints with a ctDNA result (non-missing ctDNA_detected) + group_by(timepoint) %>% # Group by timepoint summarise( - nact = first(diag_neoadj_chemo_1), # NACT for each participant - ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + total_samples_ctDNA = n_distinct(participant_id), # Count distinct participant_ids (ctDNA samples) + .groups = "drop" # Remove grouping after summarizing ) -# Create a contingency table of NACT vs ctDNA_ever -contingency_table <- table(subset_data_by_id$nact, subset_data_by_id$ctDNA_ever) +# Print the result for ctDNA samples -- 32 CLEVER-Baseline, 9 EOO, 20 C3, 28 C6, 4 C12, 27 6M, 18 12M, 13 18M, 13 24M, 12 30M, 18 36M +print(ctDNA_samples_by_timepoint) -# Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) -# Print the contingency table and Chi-squared test results --> p-val = 0.95 -print(contingency_table) -print(chisq_test) +##### DTC by trial timepoint +# Count the number of DTC samples by timepoint (for specific clinical trial timepoints) +dtc_samples_by_timepoint <- subset_data %>% + filter(timepoint %in% trial_timepoints, !is.na(dtc_ihc_result_final)) %>% # Filter for relevant timepoints and DTC results + group_by(timepoint) %>% # Group by timepoint + summarise( + total_samples_dtc = 
n_distinct(participant_id), #
Count distinct participant_ids (DTC samples) + .groups = "drop" # Remove grouping after summarizing + ) +# Print the result for DTC samples -- makes sense, no CLEVER baseline timepoints, 9 EOO, 19 C3, 28 C6, 4 C12, 19 6M F/U +print(dtc_samples_by_timepoint) -####hormone therapy prtx_endo +#### Number of ctDNA timepoints on SURMOUNT +print(unique_timepoints) +surmount_timepoints <- c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up", "Year 3 Follow Up", "Year 4 Follow Up", "Long Term FU 1", "Long Term FU 2") -table(subset_data$prtx_endo) +ctDNA_surmount <- subset_data %>% + filter(timepoint %in% surmount_timepoints, !is.na(ctDNA_detected)) %>% # Filter for relevant timepoints with a ctDNA result (non-missing ctDNA_detected) + group_by(timepoint) %>% # Group by timepoint + summarise( + total_samples_ctDNA = n_distinct(participant_id), # Count distinct participant_ids (ctDNA samples) + .groups = "drop" # Remove grouping after summarizing + ) -endo <- subset_data |> - distinct(participant_id,prtx_endo) |> - group_by(prtx_endo) |> # Group by stage - summarise(count = n()) # Count the number of participants per histology type +# Print the result for ctDNA samples -- 109 Baseline, Y1 FU 40, Y2 FU 25, Y3 FU 14, Y4 FU 4, LT FU 10, LT 2 FU 2 +print(ctDNA_surmount) -# View the summary table -print(endo) #most ppl did get endo (62 of the 109) -# Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +### number of DTC timepoints on SURMOUNT +# Count the number of DTC samples by timepoint +dtc_timepoint_surmount <- subset_data %>% + filter(timepoint %in% surmount_timepoints, !is.na(dtc_ihc_result_final)) %>% # Filter for relevant timepoints and DTC results + group_by(timepoint) %>% # Group by timepoint summarise( - endo = first(prtx_endo), # Get the final_overall_stage for each participant - ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + total_samples_dtc = n_distinct(participant_id), # Count distinct
participant_ids (DTC samples) + .groups = "drop" # Remove grouping after summarizing ) -# Create a contingency table of final_overall_stage vs ctDNA_ever -contingency_table <- table(subset_data_by_id$endo, subset_data_by_id$ctDNA_ever) +# Print the result for DTC samples +print(dtc_timepoint_surmount) -# Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) -# Print the contingency table and Chi-squared test results --> p-val = 0.33 -print(contingency_table) -print(chisq_test) +#### positivity by timepoint -- ctDNA + +ctDNA_pos_rate_by_timepoint <- subset_data %>% + filter(!is.na(ctDNA_detected)) %>% # Ensure we are considering only non-missing ctDNA_detected values + group_by(timepoint, participant_id) %>% # Group by timepoint and participant + summarise( + ctDNA_pos = max(ctDNA_detected == TRUE), # If any value is TRUE, participant is ctDNA positive at that timepoint + .groups = "drop" + ) %>% + group_by(timepoint) %>% # Group again by timepoint to calculate the positivity rate + summarise( + positivity_rate = mean(ctDNA_pos), # Calculate the positivity rate for each timepoint + total_samples = n_distinct(participant_id), # Count the number of distinct participants + .groups = "drop" + ) +# Print the result for ctDNA positivity rate by timepoint +print(ctDNA_pos_rate_by_timepoint) +# Calculate cumulative ctDNA positivity rate by timepoint +ctDNA_pos_rate_cumulative <- ctDNA_pos_rate_by_timepoint %>% + arrange(timepoint) %>% # NOTE: if timepoint is stored as character, this sorts labels alphabetically, not chronologically + mutate( + cumulative_pos_rate = cumsum(positivity_rate * total_samples) / cumsum(total_samples) # Cumulative positivity rate + ) -####bone modifying agents prtx_bonemod +print(ctDNA_pos_rate_cumulative) -table(subset_data$prtx_bonemod) +#### Cumulative positivity ctDNA -bonemod <- subset_data |> - distinct(participant_id,prtx_bonemod) |> - group_by(prtx_bonemod) |> # Group by stage - summarise(count = n()) # Count the number of participants per 
histology type +library(dplyr) -#
View the summary table -print(bonemod) #most ppl did get endo (39 got bonemod) +# Calculate ctDNA positivity rate by participant +ctDNA_pos_rate <- subset_data %>% + filter(!is.na(ctDNA_detected)) %>% # Exclude missing ctDNA results + group_by(participant_id) %>% # Group by participant + summarise( + ctDNA_pos = max(ctDNA_detected == TRUE), # If any value is TRUE, participant is ctDNA positive + .groups = "drop" + ) -# Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +# Calculate cumulative positivity rate +ctDNA_pos_rate_cumulative <- ctDNA_pos_rate %>% summarise( - bonemod = first(prtx_bonemod), # Get bone mod status for each participant - ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + total_pos = sum(ctDNA_pos), # Total number of ctDNA positive participants + total_samples = n(), # Total number of participants + cumulative_pos_rate = total_pos / total_samples # Cumulative positivity rate ) -# Create a contingency table of bonemod vs ctDNA_ever -contingency_table <- table(subset_data_by_id$bonemod, subset_data_by_id$ctDNA_ever) +# Print the cumulative positivity rate +print(ctDNA_pos_rate_cumulative) -# Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) -# Print the contingency table and Chi-squared test results --> p-val = 0.84 -print(contingency_table) -print(chisq_test) +# Count the number of positive ctDNA samples and total samples +ctDNA_pos_vs_total <- subset_data %>% + filter(!is.na(ctDNA_detected)) %>% # Exclude missing ctDNA results + summarise( + total_samples = n(), # Total number of ctDNA samples + positive_samples = sum(ctDNA_detected == TRUE), # Count of positive ctDNA samples + .groups = "drop" + ) %>% + mutate( + positivity_rate = positive_samples / total_samples # Proportion of positive ctDNA samples + ) -#### PCR -- did NOT include this in Table 1 as it aligns closely with NACT) -# 2 = non-pcr, 1 = pcr -#the variables of interest 
for path cr: diag_pcr_1 or diag_pcr_2 -table(subset_data$diag_pcr_1) -table(subset_data$diag_pcr_2) #none recorded here so can just use diag_pcr_1 +# Print the results +print(ctDNA_pos_vs_total) -pcr <- subset_data %>% - mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) %>% # Convert "." to NA - filter(!is.na(diag_pcr_1)) %>% # Exclude rows where diag_pcr_1 is NA - distinct(participant_id, diag_pcr_1) %>% - group_by(diag_pcr_1) %>% - summarise(count = n()) # Count the number of participants per histology type -# View the summary table -print(pcr) #this is correct bc 19 ppl got NACT, and we have 19 ppl recorded for pcr data +#### cumulative positivity DTC -# Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +# Calculate DTC positivity by participant +DTC_pos_rate <- subset_data %>% + filter(!is.na(dtc_ihc_result_final)) %>% # Exclude missing DTC results + group_by(participant_id) %>% # Group by participant summarise( - pcr = first(diag_pcr_1), # Get pcr for each participant - ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + dtc = max(dtc_ihc_result_final == 1), # If any DTC result is positive (1), participant is DTC positive + .groups = "drop" ) -# Create a contingency table of pcr vs ctDNA_ever -contingency_table <- table(subset_data_by_id$pcr, subset_data_by_id$ctDNA_ever) - -# Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) - -# Print the contingency table and Chi-squared test results --> p-val = 0.86 -- does not seem to be association among those who got pcr (but also we have a group with 1 in it...)
-print(contingency_table) -print(chisq_test) - +# Calculate cumulative positivity rate +DTC_pos_rate_cumulative <- DTC_pos_rate %>% + summarise( + total_pos = sum(dtc), # Total number of DTC positive participants + total_samples = n(), # Total number of participants + cumulative_pos_rate = total_pos / total_samples # Cumulative positivity rate + ) +# Print the cumulative positivity rate +print(DTC_pos_rate_cumulative) -########recurrence -#local first, then distant.then create summary variable of either locreg or distant -#local fu_locreg_prog -# Step 1: Summarize data by unique participant_id -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% +# Count the number of positive DTC samples and total samples +dtc_pos_vs_total <- subset_data %>% + filter(!is.na(dtc_ihc_result_final)) %>% # Exclude missing DTC results summarise( - fu_locreg_prog = first(fu_locreg_prog), # Get fu_locreg_prog status for each participant - ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant + total_samples = n(), # Total number of DTC samples + positive_samples = sum(dtc_ihc_result_final == 1), # Count of positive DTC samples .groups = "drop" + ) %>% + mutate( + positivity_rate = positive_samples / total_samples # Proportion of positive DTC samples ) -# Step 2: Create a contingency table of fu_locreg_prog vs ctDNA_ever -contingency_table <- table(subset_data_by_id$fu_locreg_prog, subset_data_by_id$ctDNA_ever) +# Print the results +print(dtc_pos_vs_total) + -# Step 3: Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) +``` -# Step 4: Print the contingency table and Chi-squared test results -- p-val of <<<< 0.000001 -print(contingency_table) -print(chisq_test) +**Test Characteristics of ctDNA assay**: Next we will look at the sensitivity, specificity, and positive and negative predictive values of the ctDNA assay at the patient level, comparing ever having detectable ctDNA with ever relapsing.
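For reference, the four metrics are defined from the patient-level 2x2 table of test result (ctDNA ever detected or not) against outcome (ever relapsed or not). Here TP, FP, TN and FN are the true-positive, false-positive, true-negative and false-negative counts:

$$
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN}, & \text{Specificity} &= \frac{TN}{TN + FP}, \\
\text{PPV} &= \frac{TP}{TP + FP}, & \text{NPV} &= \frac{TN}{TN + FN}.
\end{aligned}
$$

Sensitivity and specificity condition on the outcome, so they do not depend on how common relapse is in the cohort. PPV and NPV condition on the test result, so they do depend on the relapse rate.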
-####looking at sites of locoregional progression: local site fu_locreg_site_num or fu_locreg_site_char -### Just want to look at site distribution here +```{r} +###### Test characteristics ctDNA +#trying to do ctDNA 2x2 with ever relapsed on a patient level -# Summarize the distribution of fu_locreg_site_char by unique participant_id -site_distribution <- subset_data %>% - group_by(participant_id) %>% - summarise( - site = first(fu_locreg_site_char), # Get the site for each unique participant - .groups = "drop" - ) %>% - count(site) # Count the occurrences of each site +library(dplyr) -# View the distribution of sites -- mostly axillary nodes. 4 of 6 axillary. 1 internal mammary. 2 supraclav. 2 ipsilateral breast -print(site_distribution) +#create ever_relapsed variable +subset_data <- subset_data %>% + mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No")) -#####distant recurrence: distant fu_dist_prog -# Step 1: Summarize data by unique participant_id -subset_data_by_id <- subset_data %>% +# Exclude participants with all NA for `ctDNA_ever` or `ever_relapsed` +summarized_data <- subset_data %>% + filter(!is.na(ctDNA_ever) | !is.na(ever_relapsed)) %>% # Keep rows with at least one non-NA value group_by(participant_id) %>% - summarise( - fu_dist_prog = first(fu_dist_prog), # Get fu_dist_prog status for each participant - ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant - .groups = "drop" + summarize( + ctDNA_ever = max(ctDNA_ever, na.rm = TRUE), + ever_relapsed = max(ever_relapsed, na.rm = TRUE) ) -# Step 2: Create a contingency table of dist prog vs ctDNA_ever --> 12 who had distant progression -contingency_table <- table(subset_data_by_id$fu_dist_prog, subset_data_by_id$ctDNA_ever) +# Create the confusion matrix +confusion_matrix <- table(summarized_data$ctDNA_ever, summarized_data$ever_relapsed) -# Step 3: Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) +# Extract counts from 
the confusion matrix +TP <- ifelse(!is.na(confusion_matrix[2, 2]), confusion_matrix[2, 2], 0) # True Positives +FP <- ifelse(!is.na(confusion_matrix[2, 1]), confusion_matrix[2, 1], 0) # False Positives +TN <- ifelse(!is.na(confusion_matrix[1, 1]), confusion_matrix[1, 1], 0) # True Negatives +FN <- ifelse(!is.na(confusion_matrix[1, 2]), confusion_matrix[1, 2], 0) # False Negatives -# Step 4: Print the contingency table and Chi-squared test results -- p-val of <<<< 0.000001 -print(contingency_table) -print(chisq_test) +# Calculate performance metrics +sensitivity <- TP / (TP + FN) # Sensitivity +specificity <- TN / (TN + FP) # Specificity +PPV <- TP / (TP + FP) # Positive Predictive Value +NPV <- TN / (TN + FN) # Negative Predictive Value +# Create a data frame for the table +performance_table <- data.frame( + Metric = c("Sensitivity", "Specificity", "Positive Predictive Value (PPV)", "Negative Predictive Value (NPV)"), + Value = c(sensitivity, specificity, PPV, NPV) +) -### Distant sites -#distant site fu_dist_site_num #fu_dist_site_char -- start justl ooking at the locations +# Print the table +print(performance_table) -# Summarize the distribution of fu_dist_site_char by unique participant_id -dist_site_distribution <- subset_data %>% - group_by(participant_id) %>% - summarise( - site = first(fu_dist_site_char), # Get the site for each unique participant - .groups = "drop" - ) %>% - count(site) # Count the occurrences of each site +#Format the table for better readability +library(knitr) +kable(performance_table, digits = 2, col.names = c("Metric", "Value")) -# View the distribution of sites -- Bone in 7. Liver in 3. Lung in 2. 1 intra-abdominal -print(dist_site_distribution) +``` -##### ANY Recurrence -- this includes either fu_locreg_prog or fu_dist_prog +This ctDNA assay has high specificity (99%), with a high positive predictive value for relapse (89%) and also a high negative predictive value (94%). 
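The chunk above reads cells of `confusion_matrix` by position, for example `confusion_matrix[2, 2]` for true positives. If either variable has only one observed level, `table()` returns fewer rows or columns. Positional indexing then stops with a "subscript out of bounds" error, and the `is.na()` checks do not guard against this. The DTC chunk below uses the same pattern.

Below is a minimal sketch of a more defensive version. It assumes one logical value per participant for the test result and one for the outcome. The function name `test_characteristics` and the toy data are hypothetical and are not part of the study code.

```r
# Hypothetical helper: patient-level test characteristics from two logical
# vectors with one element per participant (TRUE = test positive / relapsed).
test_characteristics <- function(test_pos, outcome_pos) {
  # Drop participants missing either the test result or the outcome
  keep <- !is.na(test_pos) & !is.na(outcome_pos)

  # Fixed factor levels keep the table 2 x 2 even when a cell count is zero,
  # so the cells below can be looked up by name rather than by position
  tab <- table(
    test    = factor(test_pos[keep],    levels = c(FALSE, TRUE)),
    outcome = factor(outcome_pos[keep], levels = c(FALSE, TRUE))
  )

  TP <- tab["TRUE", "TRUE"]    # test positive, relapsed
  FP <- tab["TRUE", "FALSE"]   # test positive, no relapse
  TN <- tab["FALSE", "FALSE"]  # test negative, no relapse
  FN <- tab["FALSE", "TRUE"]   # test negative, relapsed

  # A metric whose denominator is zero is returned as NaN instead of an error
  data.frame(
    Metric = c("Sensitivity", "Specificity",
               "Positive Predictive Value (PPV)", "Negative Predictive Value (NPV)"),
    Value  = c(TP / (TP + FN), TN / (TN + FP), TP / (TP + FP), TN / (TN + FN))
  )
}

# Toy example with made-up values (not study data); the NA pair is dropped
toy_test    <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, NA)
toy_relapse <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
test_characteristics(toy_test, toy_relapse)
```

With the objects from the chunk above, a call might look like `test_characteristics(summarized_data$ctDNA_ever == 1, summarized_data$ever_relapsed == "Yes")`. This assumes `ctDNA_ever` is coded 0/1, as the `!= 99` filters elsewhere suggest. The same call pattern would apply to `dtc_ever`.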
-#create ever_relapsed variable -subset_data <- subset_data %>% - mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No")) +```{r} +### Test characteristics for DTC -- and trial #s -# link by participant id -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% - summarise( - ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant - ctDNA_ever = first(ctDNA_ever), # Get the ctDNA_ever status for each participant - .groups = "drop" - ) +library(dplyr) -# Create a contingency table of ever_relapsed vs ctDNA_ever -contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA_ever) +# Total unique DTC+ patients +total_dtc_plus <- subset_data %>% + filter(dtc_ihc_result_final == 1) %>% + distinct(participant_id) %>% + nrow() -# Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) +# Unique DTC+ patients who went on trial (those who have a trial ID fu_trial_pid) +dtc_plus_trial <- subset_data %>% + filter(dtc_ihc_result_final == 1 & !is.na(fu_trial_pid)) %>% + distinct(participant_id) %>% + nrow() -# Print the contingency table and Chi-squared test results -- total 14 relapses, 8 were ctDNA +, 6 were not ever ctDNA positive -print(contingency_table) -print(chisq_test) +# Proportion of DTC+ patients who went on trial +proportion_trial <- dtc_plus_trial / total_dtc_plus -#### Relapse and DTCs -#using ever_relapsed and dtc_ever +# Display results +cat("Total unique DTC+ patients:", total_dtc_plus, "\n") +cat("Unique DTC+ patients who went on trial:", dtc_plus_trial, "\n") +cat("Proportion of DTC+ patients who went on trial:", proportion_trial, "\n") -# link by participant id -subset_data_by_id <- subset_data %>% +# All DTC + patients went on trial (39/39) + + +# Exclude participants with all NA for `dtc_ever` or `ever_relapsed` +summarized_data <- subset_data %>% + filter(!is.na(dtc_ever) | !is.na(ever_relapsed)) %>% # Keep rows with at least one non-NA 
value group_by(participant_id) %>% - summarise( - ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant - dtc = first(dtc_ever), # Get the ctDNA_ever status for each participant - .groups = "drop" + summarize( + dtc_ever = max(dtc_ever, na.rm = TRUE), + ever_relapsed = max(ever_relapsed, na.rm = TRUE) ) -# Create a contingency table of ever_relapsed vs ctDNA_ever -contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc) +# Create the confusion matrix +confusion_matrix <- table(summarized_data$dtc_ever, summarized_data$ever_relapsed) -# Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) +# Extract counts from the confusion matrix +TP <- ifelse(!is.na(confusion_matrix[2, 2]), confusion_matrix[2, 2], 0) # True Positives +FP <- ifelse(!is.na(confusion_matrix[2, 1]), confusion_matrix[2, 1], 0) # False Positives +TN <- ifelse(!is.na(confusion_matrix[1, 1]), confusion_matrix[1, 1], 0) # True Negatives +FN <- ifelse(!is.na(confusion_matrix[1, 2]), confusion_matrix[1, 2], 0) # False Negatives -# Print the contingency table and Chi-squared test results -print(contingency_table) -print(chisq_test) +# Calculate performance metrics +sensitivity <- TP / (TP + FN) # Sensitivity +specificity <- TN / (TN + FP) # Specificity +PPV <- TP / (TP + FP) # Positive Predictive Value +NPV <- TN / (TN + FN) # Negative Predictive Value -# Identify participants missing data in either `ever_relapsed` or `dtc_ever` -missing_data <- subset_data_by_id %>% - filter(is.na(ever_relapsed) | is.na(dtc)) +# Create a data frame for the table +performance_table <- data.frame( + Metric = c("Sensitivity", "Specificity", "Positive Predictive Value (PPV)", "Negative Predictive Value (NPV)"), + Value = c(sensitivity, specificity, PPV, NPV) +) -# Print the IDs of participants with missing data -print(missing_data$participant_id) #two individuals do not have relapse data (17-021, 18-032, as we had seen above) +# Print the table 
+print(performance_table) -### look at ever_relapsed by ctDNA +# Format the table for readability +library(knitr) +kable(performance_table, digits = 2, col.names = c("Metric", "Value")) -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% - summarise( - ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant - ctDNA = first(ctDNA_ever), # Get the ctDNA_ever status for each participant - .groups = "drop" - ) +``` -# Create a contingency table of ever_relapsed vs ctDNA_ever -contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA) +All 39 DTC-positive individuals went on to enroll in an interventional treatment trial aimed at eliminating their DTCs. This differs from the ctDNA workflow: ctDNA was assessed retrospectively, sometimes several years after the samples were collected, and was never used to make trial or treatment decisions. Interpreting the sensitivity and specificity of the DTC test is therefore difficult, because relapse is the outcome and every DTC-positive patient received an intervention intended to eliminate DTCs and thereby prevent relapse. This intervention likely explains, in part, the test's low positive predictive value and low sensitivity. The high negative predictive value of 0.86, however, is based only on participants who remained DTC negative and therefore received no intervention. It suggests that repeatedly negative DTC testing (remaining DTC negative on every test) is valuable in predicting a good outcome (i.e., no relapse during follow-up). -# Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) + **Table 1** + + Next, we build Table 1, summarizing key clinical and demographic variables in the ctDNA cohort.
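Two steps recur in this analysis. First, the DTC metrics are read from the confusion matrix by position (`[2, 2]` and so on). Second, each Table 1 comparison that follows collapses the long data to one row per participant before calling `chisq.test()`, even when some cells are sparse.

The chunk below is a minimal base-R sketch of both steps, not the analysis as run. `diagnostic_metrics()` and `association_test()` are hypothetical helper names. The sketch makes these assumptions:

- The test result is coded 0/1.
- The outcome is coded "No"/"Yes", as `ever_relapsed` is.
- The first row per participant is representative, which is what `first()` assumes in the original code.

It also applies the common rule of thumb of switching to Fisher's exact test when any expected count is below 5.

```{r}
# Sketch only: hypothetical helpers, not part of the original analysis.
# Uses base R (stats) only; no packages are loaded.

# Sensitivity, specificity, PPV and NPV with cells looked up by name.
# Assumes `test` is coded 0/1 and `outcome` is coded "No"/"Yes".
diagnostic_metrics <- function(test, outcome) {
  # Fixed levels keep the table 2x2 even if a category is absent
  tab <- table(
    test    = factor(test, levels = c(0, 1)),
    outcome = factor(outcome, levels = c("No", "Yes"))
  )
  TP <- tab["1", "Yes"]
  FP <- tab["1", "No"]
  TN <- tab["0", "No"]
  FN <- tab["0", "Yes"]
  # A metric with a zero denominator comes out as NaN
  data.frame(
    Metric = c("Sensitivity", "Specificity",
               "Positive Predictive Value (PPV)",
               "Negative Predictive Value (NPV)"),
    Value = unname(c(TP / (TP + FN), TN / (TN + FP),
                     TP / (TP + FP), TN / (TN + FN)))
  )
}

# One row per participant, then a test of association between a
# covariate and a grouping variable (column names passed as strings).
association_test <- function(data, id, covariate, group) {
  # Keep each participant's first row, as first() does in the original code
  per_person <- data[!duplicated(data[[id]]), , drop = FALSE]
  tab <- table(per_person[[covariate]], per_person[[group]])
  # Expected counts under independence; the small-count warning is
  # suppressed because the switch below handles that case
  expected <- suppressWarnings(chisq.test(tab)$expected)
  test <- if (any(expected < 5)) fisher.test(tab) else chisq.test(tab)
  list(table = tab, test = test)
}
```

For instance, `association_test(subset_data, "participant_id", "final_receptor_group", "ctDNA_ever")` would run the receptor-status comparison on one row per participant. Fixing the factor levels in `diagnostic_metrics()` keeps the confusion matrix 2x2 even when a category is missing, so a cell can never shift position.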
-# Print the contingency table and Chi-squared test results, p < 0.00001 -print(contingency_table) -print(chisq_test) +```{r} -####survival: fu_survival +library(dplyr) -table(subset_data$fu_surv) +########### Variables to look at for Table 1 ######### +names(subset_data) #to identify the variables I want to use -surv <- subset_data %>% - distinct(participant_id, fu_surv) %>% - group_by(fu_surv) %>% - summarise(count = n()) # Count the number of participants per histology type +###### median age at diagnosis: the date variables are stored as character strings, so they must first be converted to Date +str(subset_data$diag_date_1) #character; needs to be converted to Date +str(subset_data$demo_dob) #character; needs to be converted to Date -# View the summary table -print(surv) #1 NA patient --> identify the NA patient below dead = 5, alive 103. There is 1 that's an NA. +d$diag_date_1 <- as.Date(d$diag_date_1, format = "%m/%d/%Y") +d$demo_dob <- as.Date(d$demo_dob, format = "%m/%d/%Y") -na_participant <- subset_data %>% - filter(is.na(fu_surv)) %>% - select(participant_id, fu_surv) +str(d$diag_date_1) #dates! +str(d$demo_dob) #dates! -# Print the result -- 28115-17-021 -- no follow up data for this pt looking in redcap, everyone else has some survival data in the ctDNA cohort.
-print(na_participant) +### doing the same for subset_data as it didn't carry over into that data set +subset_data$diag_date_1 <- as.Date(subset_data$diag_date_1, format = "%m/%d/%Y") +subset_data$demo_dob <- as.Date(subset_data$demo_dob, format = "%m/%d/%Y") -# Summarize data by unique participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +# age at diagnosis in years (days from date of birth to diagnosis / 365.25) +subset_data$age_at_diag <- as.numeric(difftime(subset_data$diag_date_1, subset_data$demo_dob, units = "days")) / 365.25 +head(subset_data$age_at_diag) + +summary(subset_data$age_at_diag) #median 48.75 + +# keep one row per participant so that participants with several timepoints are not counted more than once +age_by_participant <- subset_data |> distinct(participant_id, .keep_all = TRUE) + +age_summary <- age_by_participant |> + group_by(ctDNA_ever) |> summarise( - surv = first(fu_surv), # Get survival status for each participant - ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant - .groups = "drop" + mean_age = mean(age_at_diag, na.rm = TRUE), # Calculate mean age + median_age = median(age_at_diag, na.rm = TRUE), # Calculate median age + sd_age = sd(age_at_diag, na.rm = TRUE), # Calculate standard deviation of age + n = n() # Number of participants in each group ) -# Create a contingency table of surv vs ctDNA_ever -contingency_table <- table(subset_data_by_id$surv, subset_data_by_id$ctDNA_ever) +print(age_summary) -# Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) +# Wilcoxon rank-sum test comparing age at diagnosis between ctDNA_ever positive and negative participants +wilcox_test_result <- wilcox.test(age_at_diag ~ ctDNA_ever, data = age_by_participant) -# Print the contingency table and Chi-squared test results, p<0.00001 -print(contingency_table) -print(chisq_test) +# Print the result +print(wilcox_test_result) +#looking at range of age for the ctDNA pos vs neg groups +age_summary <- age_by_participant |> + group_by(ctDNA_ever) |> + summarise( + min_age = min(age_at_diag, na.rm = TRUE), # Minimum age + max_age = max(age_at_diag, na.rm = TRUE), # Maximum age + .groups = "drop" + ) +#
View the summary table for age +print(age_summary) ``` +``` {r} + +##### Race: demo_race_final + +# Get the count of unique participant_ids for each category in demo_race_final +race_counts_unique_percent <- subset_data %>% + group_by(demo_race_final) %>% + summarise(unique_participants = n_distinct(participant_id)) %>% + mutate(percent = unique_participants / sum(unique_participants) * 100) +# View the result +print(race_counts_unique_percent) -**Test Characteristics** -Next, we will look at ctDNA and DTC test characteristics. First we will look at the association between ctDNA and DTC positivity. Next we will look at the number of tests. -``` {r} +# Count distinct participant_ids by ctDNA_ever and demo_race_final +count_distinct_participants <- subset_data %>% + group_by(demo_race_final, ctDNA_ever) %>% + summarise(distinct_participant_count = n_distinct(participant_id), .groups = "drop") -############ Test Characteristics and Baseline versus cumulative positivity (ctDNA to start)####### +# Print the result +count_distinct_participants -### DTC by ctDNA (ever positive), association between test positivity. 
-# link by participant id -subset_data_by_id <- subset_data %>% +# Step 1: Summarize by unique participant_id +summarized_data <- subset_data %>% group_by(participant_id) %>% summarise( - dtc = first(dtc_ever), # Get the ever dtc for each participant - ctDNA_ever = first(ctDNA_ever), # Get the ctDNA_ever status for each participant + ctDNA_ever = first(ctDNA_ever), # Taking the first observed value of ctDNA_ever for each participant + demo_race_final = first(demo_race_final), # Taking the first observed value of demo_race_final for each participant .groups = "drop" ) -# Create a contingency table of dtc vs ctDNA_ever -contingency_table <- table(subset_data_by_id$dtc, subset_data_by_id$ctDNA_ever) - -# Perform the Chi-squared test +# Step 2: Create the contingency table +contingency_table <- table(summarized_data$ctDNA_ever, summarized_data$demo_race_final) +contingency_table +# Step 3: Perform the chi-squared test of independence chisq_test <- chisq.test(contingency_table) -# Print the contingency table and Chi-squared test results, p-val 0.839 -print(contingency_table) -print(chisq_test) +# Step 4: Print the result p val - 0.91 +chisq_test +#####receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') -##### Test stuff (#s and such of tests) +# Breakdown of final_receptor_group by unique participant_id +receptor_status_by_participant <- subset_data %>% + group_by(participant_id) %>% + summarise(final_receptor_group = first(final_receptor_group), # Or choose the most frequent group if needed + .groups = "drop") -#number of tests (ctDNA) -library(dplyr) +# View the result +table(receptor_status_by_participant$final_receptor_group) -# Assuming the status variable is named `ctDNA_detected` in d, and then in subset -status_summary_d <- d %>% - group_by(ctDNA_detected) %>% - summarise(total_samples = n(), .groups = "drop") +# Summarizing data by participant_id, final_receptor_group, and ctDNA_ever +receptor_ctDNA_status <- subset_data 
%>% + group_by(participant_id) %>% + summarise( + final_receptor_group = first(final_receptor_group), # Or the most frequent if needed + ctDNA_ever = first(ctDNA_ever), # Taking the first observed value for ctDNA_ever + .groups = "drop" + ) -# Print the summary -- we've got 385 FALSE, 8 FAILS, 11 TRUES -print(status_summary_d) +# Step 2: Create the contingency table +contingency_table_receptor <- table(receptor_ctDNA_status$final_receptor_group, receptor_ctDNA_status$ctDNA_ever) +contingency_table_receptor -#looking at the number of Fails by unique participant_id -fail_count <- d %>% - filter(ctDNA_detected == "Fail") %>% # Filter for rows where status is "FAIL" - distinct(participant_id) %>% # Get unique participant IDs - summarise(total_fails = n()) # Count unique participant IDs +# Step 3: Perform the chi-squared test of independence +chisq_test <- chisq.test(contingency_table_receptor) -# Print the result -- 4 individuals with FAIL results, which is what we got in the consort -print(fail_count) -fail_count <- subset_data %>% - filter(ctDNA_detected == "Fail") %>% # Filter for rows where status is "FAIL" - distinct(participant_id) %>% # Get unique participant IDs - summarise(total_fails = n()) # Count unique participant IDs +# Step 4: Print the result # p-value 0.10 +chisq_test -# Print the result -- none of the fails were pulled into the ctDNA cohort -print(fail_count) -#number of DTC tests in this cohort of 109 patients +#I was curious about whether TNBC vs non-TNBC association exists (or HR + vs non-HR positive) +#inclusion criteria inc_dx_crit___1 = TNBC (This has been confirmed with the study team) +#inc_dx_crit_list___1 -unique(subset_data$dtc_ihc_result_final) #NA, 0, and 1 +TNBC_ctDNA_status <- subset_data %>% + group_by(participant_id) %>% + summarise( + inc_dx_crit_list___1 = first(inc_dx_crit_list___1), # Or the most frequent if needed + ctDNA_ever = first(ctDNA_ever), # Taking the first observed value for ctDNA_ever + .groups = "drop" + ) 
-status_summary_subset <- subset_data %>% - group_by(dtc_ihc_result_final) %>% - summarise(total_samples = n(), .groups = "drop") +# Step 2: Create the contingency table +contingency_table_TNBC <- table(TNBC_ctDNA_status$inc_dx_crit_list___1, TNBC_ctDNA_status$ctDNA_ever) +contingency_table_TNBC -# Print the summary -- we've got 221 negatives, 49 positives, 128 NAs, across 39 patients (positive) and 70 patients (negative) -#### confirm with nick that not missing the NAs, but I suspect based on the below that we are fine and thse are just ctDNA only timepoints -print(status_summary_subset) +# Step 3: Perform the chi-squared test of independence +chisq_test <- chisq.test(contingency_table_TNBC) -### looking at NAs -- all of them have FALSE (so i think these are all the ones that had ctDNA timepoints ) -na_participants_dtc <- subset_data %>% - filter(is.na(dtc_ihc_result_final)) %>% - select(participant_id, dtc_ihc_result_final, FINAL_RESULT, ORIG_RSLT_DTC, ctDNA_detected, timepoint) +# Step 4: p-val is 0.12 +chisq_test + -# Print the list of participant IDs with NA in `dtc_ihc_result_final`-- they all have FALSE ctDNA results, so these are the ctDNA timepoints -#all of the timepoints are long-term except for CLEVER baseline. 
-print(na_participants_dtc, n=128) +### HR positive vs HR negative (Hormone receptor positive vs hormone receptor negative) +#first, I need to create a HR positive variable (HR_status) +subset_data <- subset_data |> + mutate(HR_status = case_when( + final_receptor_group %in% c(2, 3) ~ "HR+", + final_receptor_group %in% c(1, 4) ~ "Non-HR+", + TRUE ~ NA_character_ # In case there are missing or other unexpected values + )) -#look at timepoints -unique_timepoints <- unique(subset_data$timepoint) -print(unique_timepoints) +# View the new HR_status variable +table(subset_data$HR_status) -#Identify participant_ids with both "SURMOUNT" and "CLEVER Screening" timepoints -participants_dual_timepoints <- subset_data %>% - filter(timepoint %in% c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up", - "Year 3 Follow Up", "Year 4 Follow Up", "CLEVER-Baseline")) %>% # Filter for relevant timepoints - group_by(participant_id) %>% - filter(n_distinct(timepoint) > 1) %>% # Ensure participant has both timepoints - ungroup() %>% - select(participant_id, timepoint, date) %>% # Select participant_id, timepoint, and date - distinct() - -participants_dual_timepoints <- subset_data %>% - filter(timepoint %in% c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up", - "Year 3 Follow Up", "Year 4 Follow Up", "CLEVER-Baseline")) %>% +HR_status_by_participant <- subset_data %>% group_by(participant_id) %>% - filter(n_distinct(timepoint) > 1) %>% - ungroup() %>% - select(participant_id, timepoint, dtc_ihc_date_final, dtc_ihc_result_final) %>% - distinct() - -# Print the list of participant_ids with both timepoints -- great, all the CLEVER-Baselines are NAs in this as they should (blood only) -print(participants_dual_timepoints, n=190) - - -##### eVAF -names(subset_data) #use eVAF - -# Calculate the median and range (min and max) for `eVAF` as percentages for participants with `ctDNA_detected == TRUE` -eVAF_range_ctDNA_detected_percent <- subset_data %>% - filter(ctDNA_detected == 
TRUE) %>% # Filter for those with ctDNA detected - summarise( - median_eVAF_percent = median(eVAF, na.rm = TRUE) * 100, # Convert median to percentage - min_eVAF_percent = min(eVAF, na.rm = TRUE) * 100, # Convert minimum to percentage - max_eVAF_percent = max(eVAF, na.rm = TRUE) * 100 # Convert maximum to percentage - ) - -# Print the result -print(eVAF_range_ctDNA_detected_percent) + summarise(HR_status = first(HR_status), # first() assumes HR_status is the same on every row for a participant (base R mode() returns the storage mode, not the most frequent value) + .groups = "drop") -#### DTC counts -names(subset_data) #use dtc_ihc_summary_count_final +# View the result +table(HR_status_by_participant$HR_status) #aligns with final receptor status (ultimately 60 HR+, 49 HR-) -# Calculate the median and range (min and max) for `eVAF` as percentages for participants with `ctDNA_detected == TRUE` -dtc_count <- subset_data %>% - filter(dtc_ihc_result_final == 1) %>% # Filter for those with dtcs detected +# Summarize ctDNA_ever status by HR_status, one row per participant +summary_data <- subset_data %>% + group_by(participant_id) %>% summarise( - median_dtc_count = median(dtc_ihc_summary_count_final, na.rm = TRUE), - min_dtc_count = min(dtc_ihc_summary_count_final, na.rm = TRUE), - max_dtc_count = max(dtc_ihc_summary_count_final, na.rm = TRUE) + HR_status = first(HR_status), # Get the HR_status for the participant + ctDNA_status = first(ctDNA_ever), # Get the ctDNA_ever status for the participant + .groups = "drop" ) -# Print the result -print(dtc_count) +contingency_table_HR <- table(summary_data$ctDNA_status, summary_data$HR_status) +contingency_table_HR +chisq_test <- chisq.test(contingency_table_HR) +# Print chi-squared test results #0.28 +chisq_test -#### Number of timepoints we see -# Timepoints per patient (median, range) -timepoints_per_patient <- subset_data %>% +###tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not reported +# Exclude rows where final_tumor_grade is 3 (not reported), want to
exclude the 2 pts who had grade not reported +summary_data <- subset_data %>% + filter(final_tumor_grade != 3) %>% # Exclude grade == 3 group_by(participant_id) %>% summarise( - total_timepoints = n_distinct(timepoint), # Count distinct timepoints for each patient + grade = first(final_tumor_grade), # Get the final_tumor_grade for each participant + ctDNA_ever = first(ctDNA_ever), # Get the ctDNA_ever status for each participant .groups = "drop" - ) %>% - summarise( - median_timepoints = median(total_timepoints, na.rm = TRUE), # Calculate median - min_timepoints = min(total_timepoints, na.rm = TRUE), # Calculate minimum - max_timepoints = max(total_timepoints, na.rm = TRUE) # Calculate maximum ) -# Timepoints of ctDNA assessment (`ctDNA_detected`) -ctDNA_timepoints <- subset_data %>% - filter(!is.na(ctDNA_detected)) %>% # Filter out NA values for ctDNA_detected - group_by(participant_id) %>% - summarise( - ctDNA_timepoints = n_distinct(timepoint), # Count distinct timepoints of ctDNA assessment - .groups = "drop" - ) %>% - summarise( - median_ctDNA_timepoints = median(ctDNA_timepoints, na.rm = TRUE), # Calculate median - min_ctDNA_timepoints = min(ctDNA_timepoints, na.rm = TRUE), # Calculate minimum - max_ctDNA_timepoints = max(ctDNA_timepoints, na.rm = TRUE) # Calculate maximum - ) +# Create a contingency table of grade vs ctDNA_ever +contingency_table <- table(summary_data$grade, summary_data$ctDNA_ever) -# Timepoints of DTC assessment (`dtc_ihc_results_final`) -dtc_timepoints <- subset_data %>% - filter(!is.na(dtc_ihc_result_final)) %>% # Filter out NA values for dtc_ihc_result_final - group_by(participant_id) %>% - summarise( - dtc_timepoints = n_distinct(timepoint), # Count distinct timepoints of DTC assessment - .groups = "drop" - ) %>% - summarise( - median_dtc_timepoints = median(dtc_timepoints, na.rm = TRUE), # Calculate median - min_dtc_timepoints = min(dtc_timepoints, na.rm = TRUE), # Calculate minimum - max_dtc_timepoints = max(dtc_timepoints, na.rm = 
TRUE) # Calculate maximum - ) +# View the contingency table +print(contingency_table) + +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) + +# View the Chi-squared test result -- p-value 0.0229 +print(chisq_test) + +######histology (final histology) +#participants can have combinations of histology codes (1-15) +table(subset_data$participant_id, subset_data$final_histology) + + histology_summary <- subset_data %>% + distinct(participant_id, final_histology) %>% # Get unique participant-histology combinations + group_by(final_histology) %>% # Group by histology type + summarise(count = n()) # Count the number of participants per histology type + + # View the summary table + print(histology_summary) #the different combinations add up to 109. A substantial number had both ductal and lobular histology + + #collapse histology into Ductal (code 3), Lobular (code 14), Both, or Other --> histology_category + #word boundaries (\\b) keep a code such as 13 from matching 3 + subset_data <- subset_data %>% + mutate(histology_category = case_when( + grepl("\\b3\\b", as.character(final_histology)) & grepl("\\b14\\b", as.character(final_histology)) ~ "Both Ductal and Lobular", # Both Ductal and Lobular + grepl("\\b3\\b", as.character(final_histology)) ~ "Ductal", # Ductal + grepl("\\b14\\b", as.character(final_histology)) ~ "Lobular", # Lobular + TRUE ~ "Other" # Any other combination + )) + + # Count the number of participants in each histology category + histology_counts <- subset_data %>% + group_by(histology_category) %>% + summarise(count = n_distinct(participant_id)) # Count distinct participants + + # View the counts -- adds up to 109! + print(histology_counts) + + #contingency table + library(tidyr) + contingency_table <- subset_data %>% + distinct(participant_id, histology_category, ctDNA_ever) %>% # Ensure each patient is counted once + count(histology_category, ctDNA_ever) %>% + pivot_wider(names_from = ctDNA_ever, values_from = n, values_fill = list(n = 0)) # Pivot the table to get ctDNA_ever as columns + + # 3.
Perform the Chi-squared test of independence + chisq_test <- chisq.test(contingency_table[,-1]) # Remove the histology_category column for the test + + # 4. Print the contingency table + print(contingency_table) + + # 5. Print the result of the Chi-squared test p-value - 0.2276 + print(chisq_test) + + + +#### Staging N stage (Nodal stage) -# Print all summaries -print("Timepoints per patient:") -print(timepoints_per_patient) +table(subset_data$participant_id, subset_data$final_n_stage) #we have 0, 1, 2, 3 (N0, N1, N2, N3) + +nodal_summary <- subset_data %>% + distinct(participant_id, final_n_stage) %>% # Get unique participant-stage combinations + group_by(final_n_stage) %>% # Group by stage + summarise(count = n()) # Count the number of participants per nodal stage + +#View the summary table: adds up to 109, 46 = pN0, 63 = pN1 + print(nodal_summary) + + subset_data_by_id <- subset_data %>% + filter(final_n_stage %in% c(0, 1, 2, 3)) %>% # Include only relevant nodal stages + group_by(participant_id) %>% + summarise( + nodal_status = first(final_n_stage), # Use final_n_stage as nodal_status for each participant + ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant + .groups = "drop" + ) + + #Create a contingency table of nodal_status vs ctDNA_ever + contingency_table <- table(subset_data_by_id$nodal_status, subset_data_by_id$ctDNA_ever) + + # Inspect the table for sparse cells: expected counts below 5 make the chi-squared approximation unreliable + print(contingency_table) + + # Step 5: Perform Chi-squared test + chisq_test <- chisq.test(contingency_table) + + # Step 6: Print the Chi-squared test result p = 0.0001 + print(chisq_test) + + + #### Node positive versus node negative: use final_n_stage to create a node-negative vs node-positive variable + subset_data_by_id <- subset_data %>% + group_by(participant_id) %>% + summarise( + node_status = ifelse(first(final_n_stage) == 0, "Node Negative", "Node Positive"), # Node
negative if 0, positive otherwise + ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant + .groups = "drop" + ) + + #adding node_status to subset_data + subset_data <- subset_data %>% + left_join(subset_data_by_id %>% select(participant_id, node_status), by = "participant_id") + + + #Create a contingency table of node_status vs ctDNA_ever + contingency_table <- table(subset_data_by_id$node_status, subset_data_by_id$ctDNA_ever) + + # Perform the Chi-squared test + chisq_test <- chisq.test(contingency_table) + + #Print the contingency table and Chi-squared test results + print(contingency_table) + print(chisq_test) + -print("Timepoints of ctDNA assessment:") -print(ctDNA_timepoints) +#######Looking at T stage or tumor size: the variable is final_t_stage + + table(subset_data$final_t_stage) #we have values in 1,2,3,4, and 99 (99 = pTx, cannot evaluate) so can proceed with this + + t_summary <- subset_data %>% + distinct(participant_id, final_t_stage) %>% # Get unique participant-stage combinations + group_by(final_t_stage) %>% # Group by stage + summarise(count = n()) # Count the number of participants per T stage + + # View the summary table (adds up to 109) + print(t_summary) + + + #for the T stage comparison, use T1 vs T2 or greater to simplify, and exclude 99 (pTx). The new variable "final_t_stage_combined" holds this grouping. + subset_data_clean <- subset_data %>% + filter(final_t_stage != 99, ctDNA_ever != 99) + + # Combine final_t_stage into T1 vs.
T2 or greater + subset_data_clean <- subset_data_clean %>% + mutate(final_t_stage_combined = ifelse(final_t_stage == 1, "T1", "T2 or greater")) + + # Summarize the data by participant_id after creating the new combined t_stage + subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% + summarise( + final_t_stage_combined = first(final_t_stage_combined), # Get the combined t status for each participant + ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + ) + + # Create a contingency table of final_t_stage_combined vs ctDNA_ever + contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$ctDNA_ever) + + # Perform the Chi-squared test + chisq_test <- chisq.test(contingency_table) + + # Print the contingency table and Chi-squared test results. P value = 0.6 + print(contingency_table) + print(chisq_test) -print("Timepoints of DTC assessment:") -print(dtc_timepoints) +#### I looked at a different cut-off for T stage stats using T3 or greater as cutoff and didn't see any significant difference so am not using this for the table. 
+ + #exclude 99 (the pTx) + subset_data_clean <- subset_data %>% + filter(final_t_stage != 99, ctDNA_ever != 99) + + # Combine final_t_stage into T1/T2 or T3 or greater + subset_data_clean <- subset_data_clean %>% + mutate(final_t_stage_combined = case_when( + final_t_stage == 1 | final_t_stage == 2 ~ "T1 or T2", # Group T1 and T2 together + final_t_stage >= 3 ~ "T3 or greater", # Group T3 and higher as a separate category + TRUE ~ NA_character_ # Handle any unexpected values + )) + + + # Summarize the data by participant_id after creating the new combined t_stage + subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% + summarise( + final_t_stage_combined = first(final_t_stage_combined), # Get the combined t status for each participant + ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + ) + + # Create a contingency table of final_t_stage_combined vs ctDNA_ever + contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$ctDNA_ever) + + # Perform the Chi-squared test + chisq_test <- chisq.test(contingency_table) + + # Print the contingency table and Chi-squared test results --> not significant so ignore this + print(contingency_table) + print(chisq_test) + + + + ########Overall stage of disease -- final_overall_stage -### timepoints on clinical trial ### Ask Nick -- should we include all the timepoints on trial technically -#(CLEVER-Baseline, EOO, C3, C6, C12, 6M F/U, etc.) 
or just the ones while on treatment (C3, C6, C12) -#, or only the ones while patiennts are -unique_timepoints <- unique(subset_data$timepoint) -print(unique_timepoints) + table(subset_data$final_overall_stage) #check which stage values are present; 99 is excluded before testing + + stage_summary <- subset_data %>% + distinct(participant_id, final_overall_stage) %>% # Get unique participant-stage combinations + group_by(final_overall_stage) %>% # Group by stage + summarise(count = n()) # Count the number of participants per overall stage + + # View the summary table: adds up to 109 (one participant coded 99); most are stage II, some are stage I or III, and none are stage IV + print(stage_summary) + + #exclude the 99 + subset_data_clean <- subset_data %>% + filter(final_overall_stage != 99, ctDNA_ever != 99) + + # Summarize the data by participant_id + subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% + summarise( + final_overall_stage = first(final_overall_stage), # Get the final_overall_stage for each participant + ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + ) + + # Create a contingency table of final_overall_stage vs ctDNA_ever + contingency_table <- table(subset_data_by_id$final_overall_stage, subset_data_by_id$ctDNA_ever) + + # Perform the Chi-squared test + chisq_test <- chisq.test(contingency_table) + + # Print the contingency table and Chi-squared test results --> p = 0.006; higher stage is associated with ctDNA_ever. + print(contingency_table) + print(chisq_test) + + -trial_timepoints <- c("CLEVER-Baseline", "EOO", "C3", "C6", "C12", "6M F/U", "12M F/U", "18M F/U", "24M F/U", "30M F/U", "36M F/U") +###########surgical treatment diag_surgery_type_1 (or diag_surgery_type_2) + + table(subset_data$diag_surgery_type_1) #1 = partial mastectomy/lumpectomy, 2 = mastectomy; no missing values
no missingness + + surgery <- subset_data %>% + distinct(participant_id, diag_surgery_type_1) %>% # Get unique participant-stage combinations + group_by(diag_surgery_type_1) %>% # Group by stage + summarise(count = n()) # Count the number of participants per histology type + + # View the summary table + print(surgery) + + + # Summarize the data by participant_id + subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% + summarise( + surgery = first(diag_surgery_type_1), # Get the final_overall_stage for each participant + ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + ) + + # Create a contingency table of final_overall_stage vs ctDNA_ever + contingency_table <- table(subset_data_by_id$surgery, subset_data_by_id$ctDNA_ever) + + # Perform the Chi-squared test + chisq_test <- chisq.test(contingency_table) + + # Print the contingency table and Chi-squared test results --> p-val = 1.... + print(contingency_table) + print(chisq_test) + + -# Count the number of samples by timepoint (for specific clinical trial timepoints) -samples_by_trial_timepoint <- subset_data %>% - filter(timepoint %in% trial_timepoints) %>% # Filter for relevant timepoints - group_by(timepoint) %>% # Group by timepoint - summarise( - total_samples = n_distinct(participant_id), # Count distinct participant_ids (samples) - .groups = "drop" # Remove grouping after summarizing - ) +######## axillary management. Really just care about whether or not someone got an axillary dissection. diag_axillary_type___2_1 (or diag_axillary_type___2_2 IF 2 biopsy forms). 
Created a new variable (axillary_dissection) -# Print the result -print(samples_by_trial_timepoint) #total samples on trial (ctDNA and dtC) + + table(subset_data$diag_axillary_type___2_1) + table(subset_data$diag_axillary_type___2_2) #a handful of axillary dissections are recorded on the second biopsy form, so combine the two + + # Create a binary variable to identify participants who had axillary dissection (NA when neither field is 1 and at least one is missing; recoded to 0 below) + subset_data <- subset_data %>% + mutate(axillary_dissection = ifelse(diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1, 1, 0)) + + # Ensure every patient has an axillary dissection category, where 0 means no axillary dissection, and 1 means they had one + subset_data_clean <- subset_data %>% + mutate(axillary_dissection = case_when( + diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1 ~ 1, # Had axillary dissection + TRUE ~ 0 # No axillary dissection (includes missing values) + )) + + # Summarize the data by participant_id, including the axillary_dissection and ctDNA_ever variables + subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% + summarise( + axillary_dissection = first(axillary_dissection), # Get the axillary dissection status for each participant + ctDNA_ever = first(ctDNA_ever) # Get the ctDNA_ever status for each participant + ) + + contingency_table <- table(subset_data_by_id$axillary_dissection, subset_data_by_id$ctDNA_ever) + + subset_data <- subset_data %>% + mutate(axillary_dissection = ifelse(is.na(axillary_dissection), 0, axillary_dissection)) +table(subset_data$axillary_dissection) + + # Perform the Chi-squared test + chisq_test <- chisq.test(contingency_table) + + # Print the contingency table and Chi-squared test results --> p-value 0.173 + print(contingency_table) +
print(chisq_test) -#### ctDNA on trial +####inflammatory (variable inflamm_yn): not included in Table 1 because there were no inflammatory breast cancers in the ctDNA cohort. +table(d$inflamm_yn_1) ### 6 yes responses in the overall cohort (d), but none in the ctDNA subset for either inflammatory variable +table(d$inflamm_yn_2) ### likewise, no inflammatory cases in the ctDNA cohort +table(subset_data$inflamm_yn) + -ctDNA_samples_by_timepoint <- subset_data %>% - filter(timepoint %in% trial_timepoints, !is.na(ctDNA_detected)) %>% # Filter for relevant timepoints and ctDNA detected - group_by(timepoint) %>% # Group by timepoint - summarise( - total_samples_ctDNA = n_distinct(participant_id), # Count distinct participant_ids (ctDNA samples) - .groups = "drop" # Remove grouping after summarizing - ) +#### radiation prtx_radiation +table(subset_data$prtx_radiation) -# Print the result for ctDNA samples -- 32 CLEVER-Baseline, 9 EOO, 20 C3, 28 C6, 4 C12, 27 6M, 18 12M, 13 18M, 13 24M, 12 30M, 18 36M -print(ctDNA_samples_by_timepoint) +radiation <- subset_data |> + distinct(participant_id, prtx_radiation) |> + group_by(prtx_radiation) |> # Group by radiation status + summarise(count = n()) # Count the number of participants per radiation status +# View the summary table +print(radiation) -##### DTC by trial timepoint -# Count the number of DTC samples by timepoint (for specific clinical trial timepoints) -dtc_samples_by_timepoint <- subset_data %>% - filter(timepoint %in% trial_timepoints, !is.na(dtc_ihc_result_final)) %>% # Filter for relevant timepoints and DTC results - group_by(timepoint) %>% # Group by timepoint +# Summarize the data by participant_id +subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% summarise( - total_samples_dtc = n_distinct(participant_id), # Count distinct participant_ids (DTC samples) - .groups = "drop" # Remove
grouping after summarizing + radiation = first(prtx_radiation), # xrt for each participant + ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant ) -# Print the result for DTC samples -- makes sense, no CLEVER baseline timepoints, 9 EOO, 19 C3, 28 C6, 4 C12, 19 6M F/U -print(dtc_samples_by_timepoint) - -#### Number of ctDNA timepoints on surmount -print(unique_timepoints) -surmount_timepoints <- c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up", "Year 3 Follow Up", "Year 4 Follow Up", "Long Term FU 1", "Long Term FU 2") - -ctDNA_surmount <- subset_data %>% - filter(timepoint %in% surmount_timepoints, !is.na(ctDNA_detected)) %>% # Filter for relevant timepoints and ctDNA detected - group_by(timepoint) %>% # Group by timepoint - summarise( - total_samples_ctDNA = n_distinct(participant_id), # Count distinct participant_ids (ctDNA samples) - .groups = "drop" # Remove grouping after summarizing - ) +# Create a contingency table of radiation vs ctDNA_ever +contingency_table <- table(subset_data_by_id$radiation, subset_data_by_id$ctDNA_ever) -# Print the result for ctDNA samples -- 109 Baseline, Y1 FU 40, Y2 FU 25, Y3 FU 14, Y4 FU 4, LT FU 10, LT 2 FU 2 -print(ctDNA_surmount) +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) +# Print the contingency table and Chi-squared test results --> p-val = 0.33 +print(contingency_table) +print(chisq_test) -### number of DTC timepoints on surmount -# Count the number of DTC samples by timepoint -dtc_timepoint_surmount <- subset_data %>% - filter(timepoint %in% surmount_timepoints, !is.na(dtc_ihc_result_final)) %>% # Filter for relevant timepoints and DTC results - group_by(timepoint) %>% # Group by timepoint - summarise( - total_samples_dtc = n_distinct(participant_id), # Count distinct participant_ids (DTC samples) - .groups = "drop" # Remove grouping after summarizing - ) -# Print the result for DTC samples -- -print(dtc_timepoint_surmount) +#### chemotherapy
prtx_chemo +table(subset_data$prtx_chemo) +chemo <- subset_data |> + distinct(participant_id,prtx_chemo) |> + group_by(prtx_chemo) |> # Group by chemotherapy status + summarise(count = n()) # Count the number of participants per chemotherapy status -#### positivity by timepoint -- ctDNA +# View the summary table +print(chemo) #3 people did not get chemo in this cohort -ctDNA_pos_rate_by_timepoint <- subset_data %>% - filter(!is.na(ctDNA_detected)) %>% # Ensure we are considering only non-missing ctDNA_detected values - group_by(timepoint, participant_id) %>% # Group by timepoint and participant - summarise( - ctDNA_pos = max(ctDNA_detected == TRUE), # If any value is TRUE, participant is ctDNA positive at that timepoint - .groups = "drop" - ) %>% - group_by(timepoint) %>% # Group again by timepoint to calculate the positivity rate +# Summarize the data by participant_id +subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% summarise( - positivity_rate = mean(ctDNA_pos), # Calculate the positivity rate for each timepoint - total_samples = n_distinct(participant_id), # Count the number of distinct participants - .groups = "drop" - ) - -# Print the result for ctDNA positivity rate by timepoint -print(ctDNA_pos_rate_by_timepoint) - -# Calculate cumulative ctDNA positivity rate by timepoint -ctDNA_pos_rate_cumulative <- ctDNA_pos_rate_by_timepoint %>% - arrange(timepoint) %>% # Ensure the data is sorted by timepoint - mutate( - cumulative_pos_rate = cumsum(positivity_rate * total_samples) / cumsum(total_samples) # Cumulative positivity rate + chemo = first(prtx_chemo), # chemo for each participant + ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant ) -print(ctDNA_pos_rate_cumulative) - -#### Cumulative positivity ctDNA - -library(dplyr) - -# Calculate ctDNA positivity rate by participant -ctDNA_pos_rate <- subset_data %>% - filter(!is.na(ctDNA_detected)) %>% # Exclude missing ctDNA results - group_by(participant_id) %>% # Group by participant
- summarise( - ctDNA_pos = max(ctDNA_detected == TRUE), # If any value is TRUE, participant is ctDNA positive - .groups = "drop" - ) +# Create a contingency table of chemo vs ctDNA_ever +contingency_table <- table(subset_data_by_id$chemo, subset_data_by_id$ctDNA_ever) -# Calculate cumulative positivity rate -ctDNA_pos_rate_cumulative <- ctDNA_pos_rate %>% - summarise( - total_pos = sum(ctDNA_pos), # Total number of ctDNA positive participants - total_samples = n(), # Total number of participants - cumulative_pos_rate = total_pos / total_samples # Cumulative positivity rate - ) +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) -# Print the cumulative positivity rate -print(ctDNA_pos_rate_cumulative) +# Print the contingency table and Chi-squared test results --> p-val = 0.59 +print(contingency_table) +print(chisq_test) -# Count the number of positive ctDNA samples and total samples -ctDNA_pos_vs_total <- subset_data %>% - filter(!is.na(ctDNA_detected)) %>% # Exclude missing ctDNA results - summarise( - total_samples = n(), # Total number of ctDNA samples - positive_samples = sum(ctDNA_detected == TRUE), # Count of positive ctDNA samples - .groups = "drop" - ) %>% - mutate( - positivity_rate = positive_samples / total_samples # Proportion of positive ctDNA samples - ) -# Print the results -print(ctDNA_pos_vs_total) +####neoadjuvant chemo -- there are two variables for this that could theoretically be included: diag_neoadj_chemo_1 or diag_neoadj_chemo_2 +table(subset_data$diag_neoadj_chemo_1) +table(subset_data$diag_neoadj_chemo_2) #there were no ppl who were recorded on the second biopsy form as having neoadjuvant therapy so can use only the first variable -#### cumulative positivity DTC +nact <- subset_data |> + distinct(participant_id,diag_neoadj_chemo_1) |> + group_by(diag_neoadj_chemo_1) |> # Group by neoadjuvant chemo status + summarise(count = n()) # Count the number of participants per neoadjuvant chemo status -# Calculate ctDNA positivity rate by
participant -DTC_pos_rate <- subset_data %>% - filter(!is.na(dtc_ihc_result_final)) %>% # Exclude missing ctDNA results - group_by(participant_id) %>% # Group by participant - summarise( - dtc = max(dtc_ihc_result_final == 1), # If any value is TRUE, participant is ctDNA positive - .groups = "drop" - ) +# View the summary table +print(nact) #participants with vs. without neoadjuvant chemo (19 received NACT) -# Calculate cumulative positivity rate -DTC_pos_rate_cumulative <- DTC_pos_rate %>% +# Summarize the data by participant_id +subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% summarise( - total_pos = sum(dtc), # Total number of ctDNA positive participants - total_samples = n(), # Total number of participants - cumulative_pos_rate = total_pos / total_samples # Cumulative positivity rate + nact = first(diag_neoadj_chemo_1), # NACT for each participant + ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant ) -# Print the cumulative positivity rate -print(DTC_pos_rate_cumulative) - +# Create a contingency table of NACT vs ctDNA_ever +contingency_table <- table(subset_data_by_id$nact, subset_data_by_id$ctDNA_ever) -# Count the number of positive ctDNA samples and total samples -dtc_pos_vs_total <- subset_data %>% - filter(!is.na(dtc_ihc_result_final)) %>% # Exclude missing ctDNA results - summarise( - total_samples = n(), # Total number of ctDNA samples - positive_samples = sum(dtc_ihc_result_final == 1), # Count of positive ctDNA samples - .groups = "drop" - ) %>% - mutate( - positivity_rate = positive_samples / total_samples # Proportion of positive ctDNA samples - ) +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) -# Print the results -print(dtc_pos_vs_total) - +# Print the contingency table and Chi-squared test results --> p-val = 0.95 +print(contingency_table) +print(chisq_test) -``` +####hormone therapy prtx_endo -**Test Characteristics of ctDNA assay**: -Next we will look at the sensitivity and specificity of
the ctDNA assay. +table(subset_data$prtx_endo) -``` {r} -###### Test characteristics ctDNA -#trying to do ctDNA 2x2 with ever relapsed on a patient level +endo <- subset_data |> + distinct(participant_id,prtx_endo) |> + group_by(prtx_endo) |> # Group by endocrine therapy status + summarise(count = n()) # Count the number of participants per endocrine therapy status -library(dplyr) +# View the summary table +print(endo) #most ppl did get endo (62 of the 109) -# Exclude participants with all NA for `ctDNA_ever` or `ever_relapsed` -summarized_data <- subset_data %>% - filter(!is.na(ctDNA_ever) | !is.na(ever_relapsed)) %>% # Keep rows with at least one non-NA value +# Summarize the data by participant_id +subset_data_by_id <- subset_data_clean %>% group_by(participant_id) %>% - summarize( - ctDNA_ever = max(ctDNA_ever, na.rm = TRUE), - ever_relapsed = max(ever_relapsed, na.rm = TRUE) + summarise( + endo = first(prtx_endo), # Get endocrine therapy status for each participant + ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant ) -# Create the confusion matrix -confusion_matrix <- table(summarized_data$ctDNA_ever, summarized_data$ever_relapsed) - -# Extract counts from the confusion matrix -TP <- ifelse(!is.na(confusion_matrix[2, 2]), confusion_matrix[2, 2], 0) # True Positives -FP <- ifelse(!is.na(confusion_matrix[2, 1]), confusion_matrix[2, 1], 0) # False Positives -TN <- ifelse(!is.na(confusion_matrix[1, 1]), confusion_matrix[1, 1], 0) # True Negatives -FN <- ifelse(!is.na(confusion_matrix[1, 2]), confusion_matrix[1, 2], 0) # False Negatives - -# Calculate performance metrics -sensitivity <- TP / (TP + FN) # Sensitivity -specificity <- TN / (TN + FP) # Specificity -PPV <- TP / (TP + FP) # Positive Predictive Value -NPV <- TN / (TN + FN) # Negative Predictive Value - -# Create a data frame for the table -performance_table <- data.frame( - Metric = c("Sensitivity", "Specificity", "Positive Predictive Value (PPV)", "Negative Predictive Value (NPV)"), - Value = c(sensitivity,
specificity, PPV, NPV) -) +# Create a contingency table of endo vs ctDNA_ever +contingency_table <- table(subset_data_by_id$endo, subset_data_by_id$ctDNA_ever) -# Print the table -print(performance_table) +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) -#Format the table for better readability -library(knitr) -kable(performance_table, digits = 2, col.names = c("Metric", "Value")) +# Print the contingency table and Chi-squared test results --> p-val = 0.33 +print(contingency_table) +print(chisq_test) -``` -This ctDNA assay has high specificity (99%), with a high positive predictive value for relapse (94%) and also a high negative predictive value (88%). -``` {r} -### Test characteristics for DTC +####bone modifying agents prtx_bonemod -library(dplyr) +table(subset_data$prtx_bonemod) -# Total unique DTC+ patients -total_dtc_plus <- subset_data %>% - filter(dtc_ihc_result_final == 1) %>% - distinct(participant_id) %>% - nrow() +bonemod <- subset_data |> + distinct(participant_id,prtx_bonemod) |> + group_by(prtx_bonemod) |> # Group by bone-modifying agent status + summarise(count = n()) # Count the number of participants per bone-modifying agent status -# Unique DTC+ patients who went on trial (those who have a trial ID fu_trial_pid) -dtc_plus_trial <- subset_data %>% - filter(dtc_ihc_result_final == 1 & !is.na(fu_trial_pid)) %>% - distinct(participant_id) %>% - nrow() +# View the summary table +print(bonemod) #39 participants received a bone-modifying agent -# Proportion of DTC+ patients who went on trial -proportion_trial <- dtc_plus_trial / total_dtc_plus +# Summarize the data by participant_id +subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% + summarise( + bonemod = first(prtx_bonemod), # Get bone mod status for each participant + ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + ) -# Display results -cat("Total unique DTC+ patients:", total_dtc_plus, "\n") -cat("Unique DTC+ patients who went on trial:",
dtc_plus_trial, "\n") -cat("Proportion of DTC+ patients who went on trial:", proportion_trial, "\n") +# Create a contingency table of bonemod vs ctDNA_ever +contingency_table <- table(subset_data_by_id$bonemod, subset_data_by_id$ctDNA_ever) -# All DTC + patients went on trial (39/39) +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) -``` +# Print the contingency table and Chi-squared test results --> p-val = 0.84 +print(contingency_table) +print(chisq_test) -All of the 39 individuals who were DTC positive went onto an interventional treatment trial aimed at eliminating the presence of the DTCs. It is therefore challenging to interpret the sensitivity and specificity of the DTC test, as relapse is the outcome and all of these patients are receiving an intervention aimed at eliminating the presence of the DTCs. +#### PCR -- did NOT include this in Table 1 as it aligns closely with NACT) +# 2 = non-pcr, 1 = pcr +#the variables of interest for path cr: diag_pcr_1 or diag_pcr_2 +table(subset_data$diag_pcr_1) +table(subset_data$diag_pcr_2) #none recorded here so can just use diag_pcr_1 -``` {r} -##### Concordance between DTC and ctDNA +pcr <- subset_data %>% + mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) %>% # Convert "." 
to NA + filter(!is.na(diag_pcr_1)) %>% # Exclude rows where diag_pcr_1 is NA + distinct(participant_id, diag_pcr_1) %>% + group_by(diag_pcr_1) %>% + summarise(count = n()) # Count the number of participants per histology type +# View the summary table +print(pcr) #this is correct bc 19 ppl got NACT, and we have 19 ppl recorded for pcr data -### concordance overall +# Summarize the data by participant_id +subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% + summarise( + pcr = first(diag_pcr_1), # Get pcr for each participant + ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant + ) -# Filter and get unique participants by participant_id -concordance_overall_unique <- subset_data %>% - distinct(participant_id, .keep_all = TRUE) %>% - mutate(concordance = ifelse(dtc_ever == ctDNA_ever, "Concordant", "Discordant")) +# Create a contingency table of pcr vs ctDNA_ever +contingency_table <- table(subset_data_by_id$pcr, subset_data_by_id$ctDNA_ever) -# Count total concordant and discordant pairs for unique participants -overall_concordant <- sum(concordance_overall_unique$concordance == "Concordant") -overall_discordant <- sum(concordance_overall_unique$concordance == "Discordant") +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) -# Proportion of concordance -proportion_concordant <- overall_concordant / (overall_concordant + overall_discordant) +# Print the contingency table and Chi-squared test results --> p-val = 0.86 -- does not seem to be association among those who got pcr (but also we have a group with 1 in it...) 
+print(contingency_table) +print(chisq_test) -cat("Overall Concordant (unique participants):", overall_concordant, "\n") -cat("Overall Discordant (unique participants):", overall_discordant, "\n") -cat("Overall Proportion Concordant (unique participants):", proportion_concordant, "\n") -#Proportion concordance 63% (ever positive) -# Ensure that dtc_ihc_result_final is converted to TRUE/FALSE for consistency with ctDNA_detected -concordance_by_timepoint <- subset_data %>% - filter(!is.na(dtc_ihc_result_final) & !is.na(ctDNA_detected)) %>% - mutate( - # Convert dtc_ihc_result_final (1 = TRUE, 0 = FALSE) to match ctDNA format (TRUE/FALSE) - dtc_ihc_result_final_bool = ifelse(dtc_ihc_result_final == 1, TRUE, FALSE), - # Create a column to determine concordance (both DTC and ctDNA being TRUE or both being FALSE) - concordance = ifelse(dtc_ihc_result_final_bool == ctDNA_detected, "Concordant", "Discordant") - ) %>% - group_by(timepoint) %>% +########recurrence +#local first, then distant.then create summary variable of either locreg or distant +#local fu_locreg_prog + +# Step 1: Summarize data by unique participant_id +subset_data_by_id <- subset_data %>% + group_by(participant_id) %>% summarise( - total_concordant = sum(concordance == "Concordant"), - total_discordant = sum(concordance == "Discordant"), - total_samples = n(), # Total number of samples at this timepoint - concordance_rate = total_concordant / total_samples # Concordance rate per timepoint + fu_locreg_prog = first(fu_locreg_prog), # Get fu_locreg_prog status for each participant + ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant + .groups = "drop" ) -# Print concordance results for each timepoint -print(concordance_by_timepoint) - -# Now calculate overall concordance across all timepoints -overall_concordance <- sum(concordance_by_timepoint$total_concordant) / - sum(concordance_by_timepoint$total_samples) - -cat("Overall Concordance Rate across all timepoints:", 
overall_concordance, "\n") -#concordance, considering testing by timepoint, is 80%, versus 63% when you consider the tests separately. Does this make sense? +# Step 2: Create a contingency table of fu_locreg_prog vs ctDNA_ever +contingency_table <- table(subset_data_by_id$fu_locreg_prog, subset_data_by_id$ctDNA_ever) +# Step 3: Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) -#### how many DTC pts went on trial? +# Step 4: Print the contingency table and Chi-squared test results -- p-val of <<<< 0.000001 +print(contingency_table) +print(chisq_test) -# Total DTC+ patients -total_dtc_plus <- nrow(subset(subset_data, dtc_ihc_result_final == 1)) +####looking at sites of locoregional progression: local site fu_locreg_site_num or fu_locreg_site_char +### Just want to look at site distribution here -names(subset_data) +# Summarize the distribution of fu_locreg_site_char by unique participant_id +site_distribution <- subset_data %>% + group_by(participant_id) %>% + summarise( + site = first(fu_locreg_site_char), # Get the site for each unique participant + .groups = "drop" + ) %>% + count(site) # Count the occurrences of each site -# DTC+ patients who went on trial (those who have a fu_trial_pid) +# View the distribution of sites -- mostly axillary nodes. 4 of 6 axillary. 1 internal mammary. 2 supraclav. 
2 ipsilateral breast +print(site_distribution) -library(dplyr) +#####distant recurrence: distant fu_dist_prog -# Total unique DTC+ patients -total_dtc_plus <- subset_data %>% - filter(dtc_ihc_result_final == 1) %>% - distinct(participant_id) %>% - nrow() +# Step 1: Summarize data by unique participant_id +subset_data_by_id <- subset_data %>% + group_by(participant_id) %>% + summarise( + fu_dist_prog = first(fu_dist_prog), # Get fu_dist_prog status for each participant + ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant + .groups = "drop" + ) -# Unique DTC+ patients who went on trial (those who have a trial ID fu_trial_pid) -dtc_plus_trial <- subset_data %>% - filter(dtc_ihc_result_final == 1 & !is.na(fu_trial_pid)) %>% - distinct(participant_id) %>% - nrow() +# Step 2: Create a contingency table of dist prog vs ctDNA_ever --> 12 who had distant progression +contingency_table <- table(subset_data_by_id$fu_dist_prog, subset_data_by_id$ctDNA_ever) -# Proportion of DTC+ patients who went on trial -proportion_trial <- dtc_plus_trial / total_dtc_plus +# Step 3: Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) -# Display results -cat("Total unique DTC+ patients:", total_dtc_plus, "\n") -cat("Unique DTC+ patients who went on trial:", dtc_plus_trial, "\n") -cat("Proportion of DTC+ patients who went on trial:", proportion_trial, "\n") +# Step 4: Print the contingency table and Chi-squared test results -- p-val of <<<< 0.000001 +print(contingency_table) +print(chisq_test) -# All DTC + patients went on trial (39/39) +### Distant sites +#distant site fu_dist_site_num #fu_dist_site_char -- start just looking at the locations -##### Concordance between DTC and ctDNA +# Summarize the distribution of fu_dist_site_char by unique participant_id +dist_site_distribution <- subset_data %>% + group_by(participant_id) %>% + summarise( + site = first(fu_dist_site_char), # Get the site for each unique participant + .groups = "drop" + ) %>% 
+ count(site) # Count the occurrences of each site +# View the distribution of sites -- Bone in 7. Liver in 3. Lung in 2. 1 intra-abdominal +print(dist_site_distribution) -### concordance overall +##### ANY Recurrence -- this includes either fu_locreg_prog or fu_dist_prog -# Filter and get unique participants by participant_id -concordance_overall_unique <- subset_data %>% - distinct(participant_id, .keep_all = TRUE) %>% - mutate(concordance = ifelse(dtc_ever == ctDNA_ever, "Concordant", "Discordant")) +# link by participant id +subset_data_by_id <- subset_data %>% + group_by(participant_id) %>% + summarise( + ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant + ctDNA_ever = first(ctDNA_ever), # Get the ctDNA_ever status for each participant + .groups = "drop" + ) -# Count total concordant and discordant pairs for unique participants -overall_concordant <- sum(concordance_overall_unique$concordance == "Concordant") -overall_discordant <- sum(concordance_overall_unique$concordance == "Discordant") +# Create a contingency table of ever_relapsed vs ctDNA_ever +contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA_ever) -# Proportion of concordance -proportion_concordant <- overall_concordant / (overall_concordant + overall_discordant) +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) -cat("Overall Concordant (unique participants):", overall_concordant, "\n") -cat("Overall Discordant (unique participants):", overall_discordant, "\n") -cat("Overall Proportion Concordant (unique participants):", proportion_concordant, "\n") +# Print the contingency table and Chi-squared test results -- total 14 relapses, 8 were ctDNA +, 6 were not ever ctDNA positive +print(contingency_table) +print(chisq_test) -#Proportion concordance 63% (ever positive) +#### Relapse and DTCs +#using ever_relapsed and dtc_ever -# Ensure that dtc_ihc_result_final is converted to TRUE/FALSE for consistency 
with ctDNA_detected -concordance_by_timepoint <- subset_data %>% - filter(!is.na(dtc_ihc_result_final) & !is.na(ctDNA_detected)) %>% - mutate( - # Convert dtc_ihc_result_final (1 = TRUE, 0 = FALSE) to match ctDNA format (TRUE/FALSE) - dtc_ihc_result_final_bool = ifelse(dtc_ihc_result_final == 1, TRUE, FALSE), - # Create a column to determine concordance (both DTC and ctDNA being TRUE or both being FALSE) - concordance = ifelse(dtc_ihc_result_final_bool == ctDNA_detected, "Concordant", "Discordant") - ) %>% - group_by(timepoint) %>% +# link by participant id +subset_data_by_id <- subset_data %>% + group_by(participant_id) %>% summarise( - total_concordant = sum(concordance == "Concordant"), - total_discordant = sum(concordance == "Discordant"), - total_samples = n(), # Total number of samples at this timepoint - concordance_rate = total_concordant / total_samples # Concordance rate per timepoint + ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant + dtc = first(dtc_ever), # Get the dtc_ever status for each participant + .groups = "drop" ) -# Print concordance results for each timepoint -print(concordance_by_timepoint) +# Create a contingency table of ever_relapsed vs dtc_ever +contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc) -# Now calculate overall concordance across all timepoints -overall_concordance <- sum(concordance_by_timepoint$total_concordant) / - sum(concordance_by_timepoint$total_samples) +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) -cat("Overall Concordance Rate across all timepoints:", overall_concordance, "\n") -#concordance, considering testing by timepoint, is 80%, versus 63% when you consider the tests separately. Does this make sense?
+# Print the contingency table and Chi-squared test results +print(contingency_table) +print(chisq_test) +# Identify participants missing data in either `ever_relapsed` or `dtc_ever` +missing_data <- subset_data_by_id %>% + filter(is.na(ever_relapsed) | is.na(dtc)) -############### DTC Demographics ########## +# Print the IDs of participants with missing data +print(missing_data$participant_id) #two individuals do not have relapse data (17-021, 18-032, as we had seen above) + +### look at ever_relapsed by ctDNA -###### median age at diagnosis +subset_data_by_id <- subset_data %>% + group_by(participant_id) %>% + summarise( + ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant + ctDNA = first(ctDNA_ever), # Get the ctDNA_ever status for each participant + .groups = "drop" + ) -names(subset_data) #to identify the variables I want to use -str(subset_data$diag_date_1) #character -str(subset_data$org_consent_date) #character -str(subset_data$collection_date) #character +# Create a contingency table of ever_relapsed vs ctDNA_ever +contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA) -d$diag_date_1 <- as.Date(d$diag_date_1, format = "%m/%d/%Y") -d$org_consent_date <- as.Date(d$demo_dob, format = "%m/%d/%Y") -d$collection_date <- as.Date(d$demo_dob, format = "%m/%d/%Y") +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) +# Print the contingency table and Chi-squared test results, p < 0.00001 +print(contingency_table) +print(chisq_test) -str(d$diag_date_1) #dates! -str(d$org_consent_date) #dates! 
+####survival: fu_survival -### doing the same for subset_data as it didn't carry over into that data set -subset_data$diag_date_1 <- as.Date(subset_data$diag_date_1, format = "%m/%d/%Y") -subset_data$org_consent_date <- as.Date(subset_data$org_consent_date, format = "%m/%d/%Y") -subset_data$collection_date <- as.Date(subset_data$collection_date, format = "%m/%d/%Y") +table(subset_data$fu_surv) +surv <- subset_data %>% + distinct(participant_id, fu_surv) %>% + group_by(fu_surv) %>% + summarise(count = n()) # Count the number of participants per histology type -# calculating age from date of diagnosis to dob -subset_data$time_to_consent <- as.numeric(difftime(subset_data$org_consent_date, subset_data$diag_date_1, units = "days")) / 365.25 -head(subset_data$time_to_consent) +# View the summary table +print(surv) #1 NA patient --> identify the NA patient below dead = 5, alive 103. There is 1 that's an NA. -subset_data$time_to_consent_month <- as.numeric(difftime(subset_data$org_consent_date, subset_data$diag_date_1, units = "days")) / 30 -head(subset_data$time_to_consent_month) +na_participant <- subset_data %>% + filter(is.na(fu_surv)) %>% + select(participant_id, fu_surv) -summary(subset_data$time_to_consent) #median +# Print the result -- 28115-17-021 -- no follow up data for this pt looking in redcap, everyone else has some survival data in the ctDNA cohort. 
+print(na_participant) -time_to_consent <- subset_data %>% - group_by(ctDNA_ever) %>% +# Summarize data by unique participant_id +subset_data_by_id <- subset_data_clean %>% + group_by(participant_id) %>% summarise( - mean_time_to_consent = mean(time_to_consent, na.rm = TRUE), # Calculate mean age - median_time_to_consent = median(time_to_consent, na.rm = TRUE), # Calculate median age - sd_age = sd(time_to_consent, na.rm = TRUE), # Calculate standard deviation of age - n = n() # Number of participants in each group + surv = first(fu_surv), # Get survival status for each participant + ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant + .groups = "drop" ) -print(time_to_consent) #interesting dtc ever are slightly more positive +# Create a contingency table of surv vs ctDNA_ever +contingency_table <- table(subset_data_by_id$surv, subset_data_by_id$ctDNA_ever) -# Perform the Wilcoxon rank-sum test to compare the medians of age between dtc_ever positive and negative groups -wilcox_test_result <- wilcox.test(time_to_consent ~ ctDNA_ever, data = subset_data) +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) -# Print the result -print(wilcox_test_result) +# Print the contingency table and Chi-squared test results, p<0.00001 +print(contingency_table) +print(chisq_test) -#looking at range of age for the dtc pos -consent_summ <- subset_data %>% - group_by(ctDNA_ever) %>% - summarise( - min_time_to_consent = min(time_to_consent, na.rm = TRUE), # Minimum age - max_time_to_consent = max(time_to_consent, na.rm = TRUE), # Maximum age - .groups = "drop" - ) -# View the summary table -print(consent_summ) +``` + +``` {r} +############### DTC Demographics ########## +###### median age at diagnosis #### Age at Dx (by DTC) @@ -1768,6 +1781,9 @@ age_summary <- subset_data %>% # View the summary table print(age_summary) +``` + +```{r} ##### Race: demo_race_final @@ -2409,53 +2425,14 @@ print(contingency_table) print(chisq_test) +``` 
-####referred to trial fu_trial_yn --> this variable seems to have disappeared. I can re-make it based on fu_trial_pid -#### SOMETHING WEIRD HAPPENING WITH TRIAL REFERRAL HERE WHEN I LOOK AT IT BY DTC -names(d) - -table(subset_data$fu_trial_yn) #this variable does not exist in our data set - -subset_data <- subset_data %>% - mutate(trial = ifelse(!is.na(fu_trial_pid) & fu_trial_pid != "", "Yes", "No")) -print(subset_data$trial) - - -trial <- subset_data |> - distinct(participant_id,trial) |> - group_by(trial) |> # Group by trial yes/no - summarise(count = n()) # Count the number of participants per histology type - -# View the summary table -print(trial) #38 pts went on trial based on this fu_trial_id - -# Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% - summarise( - trial = first(trial), # Get the final_overall_stage for each participant - dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant - ) - -# Create a contingency table of final_overall_stage vs dtc_ever -contingency_table <- table(subset_data_by_id$trial, subset_data_by_id$dtc_ever) - -# Perform the Chi-squared test -chisq_test <- chisq.test(contingency_table) - -# Print the contingency table and Chi-squared test results --> ### something weird -print(contingency_table) -print(chisq_test) - - - - - -Later +```{r} +#Later #2 = non-pcr, 1 = pcr -#path cr diag_pcr_1 or diag_pcr_2 +#path cr diag_pcr_1 or diag_pcr_2 (as this could be on either of the two diagnosis and staging forms, there are 2 variables for this) table(subset_data$diag_pcr_1) -table(subset_data$diag_pcr_2) #none recorded here si can just use pcr_1 +table(subset_data$diag_pcr_2) #none recorded here so can just use pcr_1 pcr <- subset_data %>% mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) %>% # Convert "." 
to NA @@ -2481,7 +2458,7 @@ contingency_table <- table(subset_data_by_id$pcr, subset_data_by_id$dtc_ever) # Perform the Chi-squared test chisq_test <- chisq.test(contingency_table) -# Print the contingency table and Chi-squared test results --> p-val = 0.86 -- does not seem to be association among those who got pcr (but also we have a group with 1 in it...) +# Print the contingency table and Chi-squared test results --> p-val = 0.27-- does not seem to be association among those who got pcr (but also we have a group with 1 in it...). print(contingency_table) print(chisq_test) @@ -2656,66 +2633,660 @@ chisq_test <- chisq.test(contingency_table) print(contingency_table) print(chisq_test) -####### Making Table 1 ######### -library(tableone) +``` + +Now that we have run the univariate associations for all the important demographic and clinical factors for both ctDNA and DTC, we will work on actually making our table 1, first by ctDNA status and a second table by DTC status. -# Specify variables -continuous_vars <- c("age_summary", "dtc_count", "dtc_ihc_summary_count_final") +```{r} +####### Making Table 1--first for ctDNA ######### -categorical_vars <- c("demo_race_final", "final_receptor_group", - "final_tumor_grade", "histology_category", "final_n_stage", - "final_t_stage", "final_overall_stage", - "diag_surgery_type_1", "axillary_dissection", "prtx_radiation", "prtx_chemo", - "diag_neoadj_chemo_1", "prtx_endo", "prtx_bonemod", "diag_pcr_1", "fu_locreg_prog", - "fu_locreg_site_char", "fu_dist_prog", "fu_dist_site_char", "ever_relapsed", - "fu_surv") +## https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html TRY LATER +## lasso https://stats.stackexchange.com/questions/72251/an-example-lasso-regression-using-glmnet-for-binary-outcome +## https://www.r-bloggers.com/2020/05/quick-tutorial-on-lasso-regression-with-example/#google_vignette +#Table 1 Code +library(table1) +names(subset_data) #to choose variables -# Ensure categorical variables are 
factors with meaningful labels -subset_data <- subset_data %>% - mutate(across(all_of(categorical_vars), as.factor)) +library(dplyr) +library(tidyr) +library(stringr) + +# Prepare the dataset +unique_subset_data <- subset_data %>% + mutate( + # Convert "Missing" and 99 to NA in relevant columns + final_t_stage = na_if(as.character(final_t_stage), "Missing"), + final_t_stage = na_if(final_t_stage, "99"), + final_overall_stage = na_if(as.character(final_overall_stage), "Missing"), + final_overall_stage = na_if(final_overall_stage, "99"), + final_tumor_grade = na_if(final_tumor_grade, 3), + diag_pcr_1 = na_if(diag_pcr_1, "."), + # Replace 99 with NA in all numeric columns + across(where(is.numeric), ~ na_if(.x, 99)) + ) %>% + filter( + # Remove rows with NA values for specific columns before summarizing + !is.na(final_t_stage), + !is.na(final_overall_stage), + !is.na(final_receptor_group), + !is.na(demo_race_final), + !is.na(final_tumor_grade), + !is.na(final_n_stage), + !is.na(histology_category), + !is.na(axillary_dissection), + !is.na(diag_surgery_type_1), + !is.na(diag_neoadj_chemo_1), + !is.na(diag_pcr_1), + !is.na(ctDNA_ever) + ) %>% + group_by(participant_id) %>% + summarize( + age_at_diag = first(na.omit(age_at_diag)), + final_receptor_group = first(na.omit(final_receptor_group)), + demo_race_final = first(na.omit(demo_race_final)), + final_tumor_grade = first(na.omit(final_tumor_grade)), + final_overall_stage = first(na.omit(final_overall_stage)), + final_t_stage = first(na.omit(final_t_stage)), + final_n_stage = first(na.omit(final_n_stage)), + histology_category = first(na.omit(histology_category)), + prtx_radiation = first(na.omit(prtx_radiation)), + prtx_chemo = first(na.omit(prtx_chemo)), + prtx_endo = first(na.omit(prtx_endo)), + prtx_bonemod = first(na.omit(prtx_bonemod)), + node_status = first(na.omit(node_status)), + axillary_dissection = first(na.omit(axillary_dissection)), + diag_surgery_type_1 = first(na.omit(diag_surgery_type_1)), + 
diag_neoadj_chemo_1 = first(na.omit(diag_neoadj_chemo_1)), diag_pcr_1 = first(na.omit(diag_pcr_1)), + ctDNA_ever = first(na.omit(ctDNA_ever)) + ) + +# Check how missing values are coded in diag_pcr_1 + +unique(subset_data$diag_pcr_1) # missing values appear as "." + +####### +#Add labels for: +#final_receptor_group +#demo_race_final +#final_tumor_grade +#final_overall_stage +#final_t_stage +#final_n_stage +#histology_category +#prtx_radiation +#prtx_chemo +#prtx_endo +#prtx_bonemod +#node_status +#axillary_dissection +#diag_surgery_type_1 +#diag_neoadj_chemo_1 +#ctDNA_ever +#diag_pcr_1 + + +label(unique_subset_data$age_at_diag) <- "Age at Diagnosis" +units(unique_subset_data$age_at_diag) <- "years" + +#Final receptor group: 1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+' + + +# assign `final_receptor_group` factor levels and labels to `unique_subset_data` +unique_subset_data <- unique_subset_data %>% + mutate( + final_receptor_group = factor(final_receptor_group, levels = c(1, 2, 3, 4), + labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+")) + ) -## labels race -subset_data$demo_race_final <- factor(subset_data$demo_race_final, - levels = c("5", "1", "3"), - labels = c("White", "Black", "Asian")) -print(subset_data$demo_race_final) +label(unique_subset_data$final_receptor_group) <- "Final Receptor Group" -as.factor(d$demo_race_final) -#5 = white -#1 = black -#3 = asian +table(unique_subset_data$final_receptor_group) +##demo_race_final +table(unique_subset_data$demo_race_final) #1, 3, 5 -- 5 = white, 1 = black, 3 = asian -### labels -subset_data$final_receptor_group <- factor(subset_data$final_receptor_group, - levels = c("ER+", "ER-", "HER2+"), - labels = c("ER Positive", "ER Negative", "HER2 Positive")) +unique_subset_data$demo_race_final <- + factor(unique_subset_data$demo_race_final, levels=c(1,3,5), + labels=c("Black", + "Asian", "White")) +label(unique_subset_data$demo_race_final) <- "Race" +table(unique_subset_data$demo_race_final) +#final_tumor_grade
+table(unique_subset_data$final_tumor_grade) # 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not Reported. Added 3 to NA up above for table 1 code so it will be considered N/A. -# Create table -table1 <- CreateTableOne(vars = c(continuous_vars, categorical_vars), - strata = "ctDNA_ever", - data = subset_data, - factorVars = categorical_vars) +unique_subset_data$final_tumor_grade <- + factor(unique_subset_data$final_tumor_grade, levels=c(0,1,2), + labels=c("Grade 3", + "Grade 1", "Grade 2")) +label(unique_subset_data$final_tumor_grade) <- "Tumor Grade" +table(unique_subset_data$final_tumor_grade) -# Print the table with p-values -print(table1, showAllLevels = TRUE, pDigits = 3) -names(subset_data) +#final_overall_stage + +table(unique_subset_data$final_overall_stage) # 1 = stage I 2 = stage II 3 = stage III + +unique_subset_data$final_overall_stage <- + factor(unique_subset_data$final_overall_stage, levels=c(1,2,3), + labels=c("Stage I", + "Stage II", "Stage III")) +label(unique_subset_data$final_overall_stage) <- "Overall Stage" +table(unique_subset_data$final_overall_stage) + +#final_t_stage +table(unique_subset_data$final_t_stage) # 1 = T1 2 = T2 3 = T3 4 = T4 + +unique_subset_data$final_t_stage <- + factor(unique_subset_data$final_t_stage, levels=c(1,2,3,4), + labels=c("T1", + "T2", "T3", "T4")) +label(unique_subset_data$final_t_stage) <- "T Stage" +table(unique_subset_data$final_t_stage) + + +#final_n_stage + +table(unique_subset_data$final_n_stage) # 0 = N0 1 = N1 2 = N2 3 = N3 + +unique_subset_data$final_n_stage <- + factor(unique_subset_data$final_n_stage, levels=c(0,1,2,3), + labels=c("N0", + "N1", "N2", "N3")) +label(unique_subset_data$final_n_stage) <- "N Stage" +table(unique_subset_data$final_n_stage) + +#histology_category + +table(unique_subset_data$histology_category) #These are labeled already correctly as both ductal and lobular, ductal, lobular, and other +label(unique_subset_data$histology_category) <- "Histology Category" + + +#prtx_radiation + 
+table(unique_subset_data$prtx_radiation) #1 = radiation 0 = no + +unique_subset_data$prtx_radiation <- + factor(unique_subset_data$prtx_radiation, levels=c(0,1), + labels=c("No Radiation", "Radiation")) +label(unique_subset_data$prtx_radiation) <- "Radiation" +table(unique_subset_data$prtx_radiation) + +#prtx_chemo + +table(unique_subset_data$prtx_chemo) #1 = chemo 0 = no +table(subset_data$prtx_chemo) + +unique_subset_data$prtx_chemo <- +factor(unique_subset_data$prtx_chemo, levels=c(0,1), + labels=c("No Chemo", "Chemo")) +label(unique_subset_data$prtx_chemo) <- "Chemo" +table(unique_subset_data$prtx_chemo) + +#prtx_endo + + +table(unique_subset_data$prtx_endo) #1 = hormone therapy 0 = no +table(subset_data$prtx_endo) + +unique_subset_data$prtx_endo <- +factor(unique_subset_data$prtx_endo, levels=c(0,1), + labels=c("No Endocrine Therapy", "Endocrine Therapy")) +label(unique_subset_data$prtx_endo) <- "Endocrine Therapy" +table(unique_subset_data$prtx_endo) + +#prtx_bonemod + +table(unique_subset_data$prtx_bonemod) #1 = bonemod 0 = no +table(unique_subset_data$prtx_bonemod) + +unique_subset_data$prtx_bonemod <- +factor(unique_subset_data$prtx_bonemod, levels=c(0,1), + labels=c("No Bone Modifying Treatment", "Bone Modifying Treatment")) +label(unique_subset_data$prtx_bonemod) <- "Bone Modifying Treatment" +table(unique_subset_data$prtx_bonemod) + + + +#node_status +table(unique_subset_data$node_status) #already positive and negative +label(unique_subset_data$node_status) <- "Node Status" + +#axillary_dissection + +table(unique_subset_data$axillary_dissection) #1 = axillary dissection 0 = no dissection + +unique_subset_data$axillary_dissection <- +factor(unique_subset_data$axillary_dissection, levels=c(0,1), + labels=c("No Axillary Dissection", "Axillary Dissection")) +label(unique_subset_data$axillary_dissection) <- "Axillary Dissection" +table(unique_subset_data$axillary_dissection) + +#diag_surgery_type_1 +table(unique_subset_data$diag_surgery_type_1) #1 = 
Lumpectomy 2 = Mastectomy + +unique_subset_data$diag_surgery_type_1 <- +factor(unique_subset_data$diag_surgery_type_1, levels=c(1,2), + labels=c("Lumpectomy", "Mastectomy")) +label(unique_subset_data$diag_surgery_type_1) <- "Surgery Type" +table(unique_subset_data$diag_surgery_type_1) + +#diag_neoadj_chemo_1 + +table(unique_subset_data$diag_neoadj_chemo_1) #1 = Neoadj Chemo 0 = No Neoadjuv + +unique_subset_data$diag_neoadj_chemo_1 <- +factor(unique_subset_data$diag_neoadj_chemo_1, levels=c(0,1), + labels=c("No Neoadjuvant Chemo", "Neoadjuvant Chemo")) +label(unique_subset_data$diag_neoadj_chemo_1) <- "Neoadjuvant Chemo" +table(unique_subset_data$diag_neoadj_chemo_1) + +#pCR +table(unique_subset_data$diag_pcr_1) #1 = pCR 2 = non-PCR + +unique_subset_data$diag_pcr_1<- +factor(unique_subset_data$diag_pcr_1, levels=c(1,2), + labels=c("pCR", "Non-pCR")) +label(unique_subset_data$diag_pcr_1) <- "Pathologic Complete Response" +table(unique_subset_data$diag_pcr_1) + + + +#ctDNA_ever +table(unique_subset_data$ctDNA_ever) #FALSE = ctDNA Negative TRUE = ctDNA Positive + +unique_subset_data$ctDNA_ever <- +factor(unique_subset_data$ctDNA_ever, levels=c("FALSE", "TRUE"), + labels=c("ctDNA Negative", "ctDNA Positive")) +label(unique_subset_data$ctDNA_ever) <- "ctDNA Status" +table(unique_subset_data$ctDNA_ever) + +caption <- "Table 1 by ctDNA Status" + +# Generate the table1 summary +table1( + ~ age_at_diag + final_receptor_group + demo_race_final + + final_tumor_grade + final_overall_stage + + final_t_stage + final_n_stage + + histology_category + prtx_radiation + + prtx_chemo + prtx_endo + prtx_bonemod + + node_status + axillary_dissection + + diag_surgery_type_1 + diag_neoadj_chemo_1 + diag_pcr_1 | + ctDNA_ever, + data = unique_subset_data, overall=c(left="Total"), caption=caption) ``` +```{r} +#Adding P-values and tests of significance to the code. 
+ +# Step 1: Create table1 output +table1_output <- table1( + ~ age_at_diag + final_receptor_group + demo_race_final + + final_tumor_grade + final_overall_stage + + final_t_stage + final_n_stage + + histology_category + prtx_radiation + + prtx_chemo + prtx_endo + prtx_bonemod + + node_status + axillary_dissection + + diag_surgery_type_1 + diag_neoadj_chemo_1 | + ctDNA_ever, + data = unique_subset_data, + overall = c(left = "Total"), + caption = "Table 1: Summary of demographic and clinical variables by ctDNA status" +) -I have chosen to perform a multivariable logistic regression to identify predictors of ctDNA positivity. ctDNA positivity is a binary outcome, and we have performed univariable analyses as above already to look at -``` {r} +#### +pvalue_function <- function(x, ...) { + # Drop the overall column if table1 passes it in; it is labelled "Total" here (overall = c(left = "Total")), not "Overall" + x <- x[!tolower(names(x)) %in% c("overall", "total")] + y <- unlist(x) + g <- factor(rep(1:length(x), times = sapply(x, length))) + + # Only compare exactly two groups (ctDNA Negative vs ctDNA Positive) + if (length(unique(g)) != 2) { + return(NA) # Return NA if not comparing exactly two groups + } + + # Perform the appropriate test based on the type of variable + if (is.numeric(y)) { + # For continuous variables, perform a t-test + p <- t.test(y ~ g)$p.value + } else { + # For categorical variables, perform a chi-squared test or Fisher's exact test + table_result <- table(y, g) + + # Choose the test based on expected cell counts (rule of thumb: Fisher's test if any expected count is < 5) + expected <- suppressWarnings(chisq.test(table_result)$expected) + if (any(expected < 5)) { + p <- fisher.test(table_result)$p.value # Use Fisher's test for low counts + } else { + p <- chisq.test(table_result)$p.value # Use chi-squared test otherwise + } + } + + # Format the p-value; the leading "" places it on the row below the variable label, and "<" is escaped for HTML output + formatted_p <- format.pval(p, digits = 3, eps = 0.001) + return(c("", sub("<", "&lt;", formatted_p))) +} + + +# Generate table1 with the p-value column +table1_p <- table1( + ~ age_at_diag + final_receptor_group + demo_race_final + + final_tumor_grade +
final_overall_stage + + final_t_stage + final_n_stage + + histology_category + prtx_radiation + + prtx_chemo + prtx_endo + prtx_bonemod + + node_status + axillary_dissection + + diag_surgery_type_1 + diag_neoadj_chemo_1 | + ctDNA_ever, + data = unique_subset_data, + overall = c(left = "Total"), + extra.col = list("P-value" = pvalue_function), # Add p-value function + extra.col.pos = 4 # Position of the extra column +) + +table1_p #Still not adding p-values....grrr + +``` + +**Table of demographics and clinical factors by DTC status** + +```{r} + +####### Table of clinical and demographic factors by DTC status ######### + +## https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html TRY LATER +## lasso https://stats.stackexchange.com/questions/72251/an-example-lasso-regression-using-glmnet-for-binary-outcome +## https://www.r-bloggers.com/2020/05/quick-tutorial-on-lasso-regression-with-example/#google_vignette +#Table 1 Code +library(table1) +library(dplyr) +library(tidyr) +library(stringr) + +# Prepare the dataset +unique_subset_data <- subset_data %>% + mutate( + # Convert "Missing" and 99 to NA in relevant columns + final_t_stage = na_if(as.character(final_t_stage), "Missing"), + final_t_stage = na_if(final_t_stage, "99"), + final_overall_stage = na_if(as.character(final_overall_stage), "Missing"), + final_overall_stage = na_if(final_overall_stage, "99"), + final_tumor_grade = na_if(final_tumor_grade, 3), + # Replace 99 with NA in all numeric columns + across(where(is.numeric), ~ na_if(.x, 99)) + ) %>% + filter( + # Remove rows with NA values for specific columns before summarizing + !is.na(final_t_stage), + !is.na(final_overall_stage), + !is.na(final_receptor_group), + !is.na(demo_race_final), + !is.na(final_tumor_grade), + !is.na(final_n_stage), + !is.na(histology_category), + !is.na(axillary_dissection), + !is.na(diag_surgery_type_1), + !is.na(diag_neoadj_chemo_1), + !is.na(ctDNA_ever) + ) %>% + group_by(participant_id) %>% + summarize( +
age_at_diag = first(na.omit(age_at_diag)), + final_receptor_group = first(na.omit(final_receptor_group)), + demo_race_final = first(na.omit(demo_race_final)), + final_tumor_grade = first(na.omit(final_tumor_grade)), + final_overall_stage = first(na.omit(final_overall_stage)), + final_t_stage = first(na.omit(final_t_stage)), + final_n_stage = first(na.omit(final_n_stage)), + histology_category = first(na.omit(histology_category)), + prtx_radiation = first(na.omit(prtx_radiation)), + prtx_chemo = first(na.omit(prtx_chemo)), + prtx_endo = first(na.omit(prtx_endo)), + prtx_bonemod = first(na.omit(prtx_bonemod)), + node_status = first(na.omit(node_status)), + axillary_dissection = first(na.omit(axillary_dissection)), + diag_surgery_type_1 = first(na.omit(diag_surgery_type_1)), + diag_neoadj_chemo_1 = first(na.omit(diag_neoadj_chemo_1)), + ctDNA_ever = first(na.omit(ctDNA_ever)) + ) + +# Generate the table1 summary +table1( + ~ age_at_diag + factor(final_receptor_group) + factor(demo_race_final) + + factor(final_tumor_grade) + factor(final_overall_stage) + + factor(final_t_stage) + factor(final_n_stage) + + factor(histology_category) + factor(prtx_radiation) + + factor(prtx_chemo) + factor(prtx_endo) + factor(prtx_bonemod) + + factor(node_status) + factor(axillary_dissection) + + factor(diag_surgery_type_1) + factor(diag_neoadj_chemo_1) | + ctDNA_ever, + data = unique_subset_data +) + +####### +#Add labels for: +#final_receptor_group +#demo_race_final +#final_tumor_grade +#final_overall_stage +#final_t_stage +#final_n_stage +#histology_category +#prtx_radiation +#prtx_chemo +#prtx_endo +#prtx_bonemod +#node_status +#axillary_dissection +#diag_surgery_type_1 +#diag_neoadj_chemo_1 +#ctDNA_ever + + +label(unique_subset_data$age_at_diag) <- "Age at Diagnosis" +units(unique_subset_data$age_at_diag) <- "years" + +#Final receptor group: 1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+' + + +# assign `final_receptor_group` factor levels and labels to `unique_subset_data`
+unique_subset_data <- unique_subset_data %>% + mutate( + final_receptor_group = factor(final_receptor_group, levels = c(1, 2, 3, 4), + labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+")) + ) + +label(unique_subset_data$final_receptor_group) <- "Final Receptor Group" + +table(unique_subset_data$final_receptor_group) + +##demo_race_final + +table(unique_subset_data$demo_race_final) #1, 3, 5 -- 5 = white, 1 = black, 3 = asian + +unique_subset_data$demo_race_final <- + factor(unique_subset_data$demo_race_final, levels=c(1,3,5), + labels=c("Black", + "Asian", "White")) +label(unique_subset_data$demo_race_final) <- "Race" +table(unique_subset_data$demo_race_final) + + +#final_tumor_grade +table(unique_subset_data$final_tumor_grade) # 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not Reported. Added 3 to NA up above for table 1 code so it will be considered N/A. + +unique_subset_data$final_tumor_grade <- + factor(unique_subset_data$final_tumor_grade, levels=c(0,1,2), + labels=c("Grade 3", + "Grade 1", "Grade 2")) +label(unique_subset_data$final_tumor_grade) <- "Tumor Grade" +table(unique_subset_data$final_tumor_grade) + + +#final_overall_stage + +table(unique_subset_data$final_overall_stage) # 1 = stage I 2 = stage II 3 = stage III + +unique_subset_data$final_overall_stage <- + factor(unique_subset_data$final_overall_stage, levels=c(1,2,3), + labels=c("Stage I", + "Stage II", "Stage III")) +label(unique_subset_data$final_overall_stage) <- "Overall Stage" +table(unique_subset_data$final_overall_stage) + +#final_t_stage +table(unique_subset_data$final_t_stage) # 1 = T1 2 = T2 3 = T3 4 = T4 + +unique_subset_data$final_t_stage <- + factor(unique_subset_data$final_t_stage, levels=c(1,2,3,4), + labels=c("T1", + "T2", "T3", "T4")) +label(unique_subset_data$final_t_stage) <- "T Stage" +table(unique_subset_data$final_t_stage) + + +#final_n_stage + +table(unique_subset_data$final_n_stage) # 0 = N0 1 = N1 2 = N2 3 = N3 + +unique_subset_data$final_n_stage <- + 
factor(unique_subset_data$final_n_stage, levels=c(0,1,2,3), + labels=c("N0", + "N1", "N2", "N3")) +label(unique_subset_data$final_n_stage) <- "N Stage" +table(unique_subset_data$final_n_stage) + +#histology_category + +table(unique_subset_data$histology_category) #These are labeled already correctly as both ductal and lobular, ductal, lobular, and other +label(unique_subset_data$histology_category) <- "Histology Category" + + +#prtx_radiation + +table(unique_subset_data$prtx_radiation) #1 = radiation 0 = no + +unique_subset_data$prtx_radiation <- + factor(unique_subset_data$prtx_radiation, levels=c(0,1), + labels=c("No Radiation", "Radiation")) +label(unique_subset_data$prtx_radiation) <- "Radiation" +table(unique_subset_data$prtx_radiation) + +#prtx_chemo + +table(unique_subset_data$prtx_chemo) #1 = chemo 0 = no +table(subset_data$prtx_chemo) + +unique_subset_data$prtx_chemo <- +factor(unique_subset_data$prtx_chemo, levels=c(0,1), + labels=c("No Chemo", "Chemo")) +label(unique_subset_data$prtx_chemo) <- "Chemo" +table(unique_subset_data$prtx_chemo) + +#prtx_endo + + +table(unique_subset_data$prtx_endo) #1 = hormone therapy 0 = no +table(subset_data$prtx_endo) + +unique_subset_data$prtx_endo <- +factor(unique_subset_data$prtx_endo, levels=c(0,1), + labels=c("No Endocrine Therapy", "Endocrine Therapy")) +label(unique_subset_data$prtx_endo) <- "Endocrine Therapy" +table(unique_subset_data$prtx_endo) + +#prtx_bonemod + +table(unique_subset_data$prtx_bonemod) #1 = bonemod 0 = no +table(unique_subset_data$prtx_bonemod) + +unique_subset_data$prtx_bonemod <- +factor(unique_subset_data$prtx_bonemod, levels=c(0,1), + labels=c("No Bone Modifying Treatment", "Bone Modifying Treatment")) +label(unique_subset_data$prtx_bonemod) <- "Bone Modifying Treatment" +table(unique_subset_data$prtx_bonemod) + + + +#node_status +table(unique_subset_data$node_status) #already positive and negative +label(unique_subset_data$node_status) <- "Node Status" + +#axillary_dissection + 
+table(unique_subset_data$axillary_dissection) #1 = axillary dissection 0 = no dissection + +unique_subset_data$axillary_dissection <- +factor(unique_subset_data$axillary_dissection, levels=c(0,1), + labels=c("No Axillary Dissection", "Axillary Dissection")) +label(unique_subset_data$axillary_dissection) <- "Axillary Dissection" +table(unique_subset_data$axillary_dissection) + +#diag_surgery_type_1 +table(unique_subset_data$diag_surgery_type_1) #1 = Lumpectomy 2 = Mastectomy + +unique_subset_data$diag_surgery_type_1 <- +factor(unique_subset_data$diag_surgery_type_1, levels=c(1,2), + labels=c("Lumpectomy", "Mastectomy")) +label(unique_subset_data$diag_surgery_type_1) <- "Surgery Type" +table(unique_subset_data$diag_surgery_type_1) + +#diag_neoadj_chemo_1 + +table(unique_subset_data$diag_neoadj_chemo_1) #1 = Neoadj Chemo 0 = No Neoadjuv + +unique_subset_data$diag_neoadj_chemo_1 <- +factor(unique_subset_data$diag_neoadj_chemo_1, levels=c(0,1), + labels=c("No Neoadjuvant Chemo", "Neoadjuvant Chemo")) +label(unique_subset_data$diag_neoadj_chemo_1) <- "Neoadjuvant Chemo" +table(unique_subset_data$diag_neoadj_chemo_1) + + +#ctDNA_ever +table(unique_subset_data$ctDNA_ever) #FALSE = ctDNA Negative TRUE = ctDNA Positive + +unique_subset_data$ctDNA_ever <- +factor(unique_subset_data$ctDNA_ever, levels=c("FALSE", "TRUE"), + labels=c("ctDNA Negative", "ctDNA Positive")) +label(unique_subset_data$ctDNA_ever) <- "ctDNA Status" +table(unique_subset_data$ctDNA_ever) + +caption <- "Table 1 by ctDNA Status" + +# Generate the table1 summary +table1( + ~ age_at_diag + final_receptor_group + demo_race_final + + final_tumor_grade + final_overall_stage + + final_t_stage + final_n_stage + + histology_category + prtx_radiation + + prtx_chemo + prtx_endo + prtx_bonemod + + node_status + axillary_dissection + + diag_surgery_type_1 + diag_neoadj_chemo_1 | + ctDNA_ever, + data = unique_subset_data, overall=c(left="Total"), caption=caption) + + + +``` + + +I have chosen to perform a 
multivariable logistic regression to identify predictors of ctDNA positivity, because we suspect ctDNA is a biomarker of relapse and, even in our dataset, it is strongly associated with relapse and worse overall survival. Identifying predictors of positivity would help us understand whom to consider screening for ctDNA, as this testing is expensive, takes time and resources, and may not benefit everyone. ctDNA positivity is a binary outcome, and we have already performed the univariable analyses above to look for potentially significant relationships. Several types of variables are worth considering: demographic and clinical factors; disease factors (tumor aggressiveness as measured by histology and grade, hormone receptor status of the tumor, and stage of disease at diagnosis); and treatment factors (surgery type, radiation or no radiation, neoadjuvant chemotherapy or none, and endocrine, i.e. anti-hormone, therapy). These are all time-invariant factors, and all were present at enrollment on study, before ctDNA positivity. + +LASSO (penalized logistic regression) selects variables automatically by shrinking the coefficients of weak predictors to exactly zero, with the amount of shrinkage chosen by cross-validation rather than by p-value cutoffs; this tends to produce a parsimonious model. We considered stepwise model building based on p-values, but this approach has fallen out of favor because it relies on somewhat arbitrary p-value cutoffs and can exclude relevant and important variables. Because we have already tested univariable associations with chi-squared tests above, we do not need to perform univariable logistic regression. + +```{r} + +### DELETE + library(dplyr) -# Univariable logistic regression +# Univariable logistic regression: not needed, as we already ran the chi-squared tests # Define the outcome variable outcome <- "ctDNA_ever" @@ -2758,52 +3329,213 @@ print(univariable_results) ``` -** do i need to do univariate associations +We will next think about our multivariable regression model.
We have several variables that were significant in our univariable (chi-squared) analyses. These include median age at diagnosis, longer time from diagnosis to enrollment, higher tumor grade, and higher initial stage at diagnosis. Variables that were not significant but could still be considered include histology, nodal positivity, higher T stage, and receptor status. Recurrence (both distant and local) and worse survival are also significantly associated with ctDNA positivity, but these outcomes follow ctDNA positivity temporally and therefore should not be included as predictors in our model. + +These variables fall into two major categories: the extent of treatment that patients received, and intrinsic tumor risk factors. + +There is no single accepted method for choosing variables, but purposeful selection generally begins with univariable analysis, which we have already performed. Next, we will use LASSO to select variables. + +```{r} +library(glmnet) + +# Prepare the response variable +y <- unique_subset_data$ctDNA_ever + +# Predictor matrix, excluding the outcome variable (as.matrix() turns the factor columns into a character matrix, which glmnet cannot use; they are converted to numeric further below) +X <- as.matrix(unique_subset_data[, c("age_at_diag", "final_receptor_group", "demo_race_final", + "final_tumor_grade", "final_overall_stage", + "final_t_stage", "final_n_stage", + "histology_category", "prtx_radiation", + "prtx_chemo", "prtx_endo", "prtx_bonemod", + "node_status", "axillary_dissection", + "diag_surgery_type_1", "diag_neoadj_chemo_1")]) + + +# Fit lasso model +lasso_model <- glmnet(X, y, family = "binomial", alpha = 1) # alpha = 1 for lasso + +#Cross-validation. This approach performs k-fold cross validation to find the best lambda. The smaller the lambda, the more variables are included.
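+# cv.glmnet() assigns observations to cross-validation folds at random, so lambda.min can change between renders. Fixing the seed (2024 is an arbitrary value) makes the cross-validation results reproducible; lambda values quoted in the comments below came from unseeded runs and may differ.
+set.seed(2024)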
+cv_lasso_model <- cv.glmnet(X, y, family = "binomial", alpha = 1) + +#Plot the cross-validated deviance across values of lambda +plot(cv_lasso_model) + +#Get the best lambda +best_lambda <- cv_lasso_model$lambda.min +print(paste("Best lambda:", best_lambda)) + +#Fit the final model with the optimal lambda +final_lasso_model <- glmnet(X, y, family = "binomial", alpha = 1, lambda = best_lambda) + +#Which coefficients are retained in the model? The age_at_diag coefficient is 0.057 (a small effect) +coef(final_lasso_model) + +### Trying with fewer variables that are less collinear (excluding node status, and chemo because a majority of people received chemo) + +#Check that the variables are represented in the dataset we will use for the lasso +table(unique_subset_data$ctDNA_ever) +table(unique_subset_data$demo_race_final) + +#most of these are factors or characters +sapply(unique_subset_data[, c("age_at_diag", "final_receptor_group", "demo_race_final", + "final_tumor_grade", "final_overall_stage", + "final_t_stage", "final_n_stage", + "histology_category", "prtx_radiation", + "prtx_chemo", "prtx_endo", "prtx_bonemod", + "node_status", "axillary_dissection", + "diag_surgery_type_1", "diag_neoadj_chemo_1")], class) + +#Convert factors/characters to numeric codes (note: this treats each categorical variable as a single score ordered by factor level, not as dummy variables) +unique_subset_data$final_receptor_group <- as.numeric(as.factor(unique_subset_data$final_receptor_group)) +unique_subset_data$demo_race_final <- as.numeric(as.factor(unique_subset_data$demo_race_final)) +unique_subset_data$final_tumor_grade <- as.numeric(as.factor(unique_subset_data$final_tumor_grade)) +unique_subset_data$final_overall_stage <- as.numeric(as.factor(unique_subset_data$final_overall_stage)) +unique_subset_data$final_t_stage<- as.numeric(as.factor(unique_subset_data$final_t_stage)) +unique_subset_data$final_n_stage <- as.numeric(as.factor(unique_subset_data$final_n_stage)) +unique_subset_data$histology_category <- as.numeric(as.factor(unique_subset_data$histology_category))
+unique_subset_data$prtx_radiation <- as.numeric(as.factor(unique_subset_data$prtx_radiation)) +unique_subset_data$prtx_chemo <- as.numeric(as.factor(unique_subset_data$prtx_chemo)) +unique_subset_data$prtx_endo <- as.numeric(as.factor(unique_subset_data$prtx_endo)) +unique_subset_data$prtx_bonemod <- as.numeric(as.factor(unique_subset_data$prtx_bonemod)) +unique_subset_data$node_status <- as.numeric(as.factor(unique_subset_data$node_status)) +unique_subset_data$axillary_dissection <- as.numeric(as.factor(unique_subset_data$axillary_dissection)) +unique_subset_data$diag_surgery_type_1 <- as.numeric(as.factor(unique_subset_data$diag_surgery_type_1)) +unique_subset_data$diag_neoadj_chemo_1 <- as.numeric(as.factor(unique_subset_data$diag_neoadj_chemo_1)) + + +# Check for NAs in the predictors (none found) +sum(is.na(unique_subset_data[, c("age_at_diag", "final_receptor_group", "demo_race_final", + "final_tumor_grade", "final_overall_stage", + "final_t_stage", "final_n_stage", + "histology_category", "prtx_radiation", + "prtx_chemo", "prtx_endo", "prtx_bonemod", + "node_status", "axillary_dissection", + "diag_surgery_type_1", "diag_neoadj_chemo_1")])) + +### Let's try LASSO again +# Prepare the response variable +y <- unique_subset_data$ctDNA_ever + +#Build the predictor matrix (all columns are now numeric) +X1 <- as.matrix(unique_subset_data[, c("age_at_diag", "final_receptor_group", "demo_race_final", + "final_tumor_grade", "final_overall_stage", + "final_t_stage", "final_n_stage", + "histology_category", "prtx_radiation", + "prtx_chemo", "prtx_endo", "prtx_bonemod", + "node_status", "axillary_dissection", + "diag_surgery_type_1", "diag_neoadj_chemo_1")]) + +lasso_model <- glmnet(X1, y, family = "binomial", alpha = 1) # alpha = 1 for lasso +#When I initially fit this model, I got an "NAs introduced by coercion" warning and realized that a number of these variables were non-numeric, so above I converted to numeric the variables that were initially
factors + + +#Cross-validation. This approach performs k-fold cross validation to find the best lambda. The smaller the lambda, the more variables are included. glmnet warns that one outcome class has fewer than 8 observations, so this model may be unstable. +cv_lasso_model <- cv.glmnet(X1, y, family = "binomial", alpha = 1) + +#Plot the cross-validated deviance across values of lambda +plot(cv_lasso_model) + +#Get the best lambda (0.0041359 in our run) +best_lambda <- cv_lasso_model$lambda.min +print(paste("Best lambda:", best_lambda)) + +#Fit the final model with the optimal lambda +final_lasso_model <- glmnet(X1, y, family = "binomial", alpha = 1, lambda = best_lambda) + +#Which coefficients are retained in the model? Only age_at_diag is retained, but its coefficient is essentially 0, suggesting it has very little effect. +coef(final_lasso_model) + + +#### Repeating LASSO with fewer variables, excluding some that we would expect to be collinear (such as NACT and chemo). We will exclude chemo because a majority of individuals had chemo, and node_status because it is a derivative/summary variable of final_n_stage --> X2 + +y <- unique_subset_data$ctDNA_ever +X2<- as.matrix(unique_subset_data[, c("age_at_diag", "final_receptor_group", "demo_race_final", + "final_tumor_grade", "final_overall_stage", + "final_t_stage", "final_n_stage", + "histology_category", "prtx_radiation", + "prtx_endo", "prtx_bonemod", + "axillary_dissection", + "diag_surgery_type_1", "diag_neoadj_chemo_1")]) + +lasso_model <- glmnet(X2, y, family = "binomial", alpha = 1) # alpha = 1 for lasso + + +#Cross-validation. This approach performs k-fold cross validation to find the best lambda.
The smaller the lambda, the more variables are included. +cv_lasso_model <- cv.glmnet(X2, y, family = "binomial", alpha = 1) + +#Plot the cross-validated deviance across values of lambda +plot(cv_lasso_model) + +#Get the best lambda (0.06 in our run) +best_lambda <- cv_lasso_model$lambda.min +print(paste("Best lambda:", best_lambda)) + +#Fit the final model with the optimal lambda +final_lasso_model <- glmnet(X2, y, family = "binomial", alpha = 1, lambda = best_lambda) + +#Which coefficients are retained in the model? Only age_at_diag has an entry (its s0 value is 0); all the others show dots, meaning their coefficients were shrunk to exactly zero because they did not contribute enough to predicting the outcome. +coef(final_lasso_model) + +### tutorial to use -- https://www.statology.org/lasso-regression-in-r/ +``` -We will next think about our multivariable regression model. We have several variables that were significant in our univariable analyses (chi-squared). These include median age-at-diagnosis, longer time from diagnosis to enrollment, higher tumor grade, and higher initial stage at diagnosis. Variables that were not significant but that could be considered include histology, nodal positivity, higher t-stage, and receptor status. While recurrence (both distant and local) as well as worse survival are significantly associated with ctDNA positivity, these are outcomes that we think of as following ctDNA positivity temporally and therefore should not be included in our predictive model as predictors. +Modeling ctDNA positivity with any of these approaches is challenging because only 9 individuals in this cohort had positive results; with so few events, it is hard to identify predictors. -In thinking about what these variables represent, we think about the extent of treatment that patients have received as one major category.
We also think about intrinsic tumor risk factors as another +The intercept (-2) is the log-odds of the outcome (dtc_ever, i.e., DTC positivity) when all predictor variables are zero. The coefficients are interpreted on the same scale: for age at diagnosis, each 1-year increase in age decreases the log-odds of testing DTC positive by 0.0086, holding all other variables constant. This is a very small effect (an odds ratio of about 0.99 per year). -There is no specific method to choose variables, but generally purposeful selection begins with univariate analysis. Any varialbe having significance is selected as a cndidate for multivariate analysis, based on a p-value cut-offp oint of 0.25, as more stringent cutoffs can fail to identify variables known to be important. Significance is evaluated at the 0.1 alpha level, and confounding as a change in the parameter estimate greater than 15% or 20% compared to the full model. This is as per Hosmer and Lemeshow selection methodology. ``` {r} -# Multivariable logistic regression -# Combine all variables into a formula -multivariable_formula <- as.formula(paste(outcome, "~", paste(c(continuous_vars, categorical_vars), collapse = " + "))) +#### DTC predictions.
-# Fit the multivariable logistic regression model -multivariable_model <- glm(multivariable_formula, data = subset_data, family = "binomial") +table(subset_data$dtc_ever) #dtc_ever is in subset_data (but not yet in unique_subset_data) +# Merge the datasets to include dtc_ever by participant_id; unique() keeps one row per participant (assuming dtc_ever is constant within participant) so that participants with several timepoints in subset_data are not duplicated +unique_subset_data <- merge(unique_subset_data, unique(subset_data[, c("participant_id", "dtc_ever")]), by = "participant_id", all.x = TRUE) +table(unique_subset_data$dtc_ever) #dtc_ever is now in unique_subset_data -# Summary of the multivariable model -summary(multivariable_model) +#run the lasso for DTC status. This might work better, as there are more DTC-positive results +y1 <- unique_subset_data$dtc_ever +X3 <- as.matrix(unique_subset_data[, c("age_at_diag", "final_receptor_group", "demo_race_final", + "final_tumor_grade", "final_overall_stage", + "final_t_stage", "final_n_stage", + "histology_category", "prtx_radiation", + "prtx_endo", "prtx_bonemod", + "axillary_dissection", + "diag_surgery_type_1", "diag_neoadj_chemo_1")]) -# Extract coefficients and p-values from the multivariable model -multivariable_results <- data.frame( - Variable = rownames(summary(multivariable_model)$coefficients), - Estimate = summary(multivariable_model)$coefficients[, 1], - Std_Error = summary(multivariable_model)$coefficients[, 2], - z_value = summary(multivariable_model)$coefficients[, 3], - p_value = summary(multivariable_model)$coefficients[, 4] -) +lasso_model <- glmnet(X3, y1, family = "binomial", alpha = 1) # alpha = 1 for lasso (alpha = 0 would give ridge) -# Print multivariable results -print(multivariable_results) + +#Cross-validation. This approach performs k-fold cross-validation to find the best lambda. The smaller the lambda, the more variables are included.
+cv_lasso_model <- cv.glmnet(X3, y1, family = "binomial", alpha = 1) +#plotting the results to look at model performance across different lambda values +plot(cv_lasso_model) -``` +#getting the best lambda -- best lambda is 0.00345 (smaller than for ctDNA, i.e., less shrinkage) +best_lambda <- cv_lasso_model$lambda.min +print(paste("Best lambda:", best_lambda)) +#Fitting the final model with the optimal lambda +final_lasso_model <- glmnet(X3, y1, family = "binomial", alpha = 1, lambda = best_lambda) +#Which coefficients are retained in the model. For the model of DTC positivity, more coefficients are retained. Most notable is the effect of axillary dissection (vs none) on the log-odds of DTC positivity. Other influential factors are surgery type (mastectomy vs lumpectomy increasing the log-odds of DTC positivity) and neoadjuvant chemotherapy (receipt of NACT increasing the log-odds of DTC positivity). +coef(final_lasso_model) -Of 184 pts enrolled from 2016 – 2021, 121 had tissue available; 114/121 (94%) had successful WES. A total of \_\_\_\_\_ plasma samples from 109 pts (median 2 timepoints each, range 1-12) have been successfully tested by RaDaR to date. -Overall, ctDNA was detected in 11 samples from 9/96 pts (9.3%) with a median eVAF of 0.009% (range 0.002-0.084%). Two pts were ctDNA+ at baseline (BL), and 7 became positive on surveillance. 87/96 (90.6%) were ctDNA- across all timepoints. +``` + +For the LASSO model of DTC positivity, more coefficients are retained and the optimal lambda is smaller (less shrinkage) than for ctDNA positivity, suggesting that these clinical variables carry more predictive signal for DTC positivity, likely helped by the larger number of DTC-positive participants. The most notable factor in the LASSO for DTC positivity is axillary dissection (vs none). Other influential factors are surgery type (mastectomy vs lumpectomy increasing the log-odds of DTC positivity) and neoadjuvant chemotherapy (receipt of NACT increasing the log-odds of DTC positivity).
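To put the log-odds coefficients discussed above on a more intuitive scale, the sketch below converts them to a probability and to odds ratios, and adds a rough events-per-variable check for the ctDNA model. It simply reuses numbers quoted in the text (intercept -2, age coefficient -0.0086, 9 ctDNA-positive participants, 14 columns in X2); these are inputs assumed for illustration, not values read from the fitted objects. Two caveats: the intercept corresponds to every predictor equal to zero (including age 0), so the baseline probability is not clinically meaningful, and LASSO estimates are shrunk toward zero, so these odds ratios tend to sit closer to 1 than unpenalized estimates would.

```{r}
# Values quoted in the text above (assumed for illustration; in practice
# they would be taken from coef(final_lasso_model))
intercept <- -2
beta_age  <- -0.0086   # change in log-odds per 1-year increase in age

# Inverse logit: predicted probability when every predictor equals 0
p_baseline <- plogis(intercept)

# Exponentiated coefficients are odds ratios
or_age_1yr  <- exp(beta_age)        # per 1 year of age
or_age_10yr <- exp(10 * beta_age)   # per 10 years of age

# Rough events-per-variable check for the ctDNA model:
# 9 ctDNA-positive participants and 14 candidate columns in X2
epv_ctdna <- 9 / 14

round(c(p_baseline = p_baseline, or_age_1yr = or_age_1yr,
        or_age_10yr = or_age_10yr, epv_ctdna = epv_ctdna), 3)
```

With these inputs the printed values are about 0.119, 0.991, 0.918 and 0.643. An events-per-variable ratio well below the commonly cited rule of thumb of roughly 10 is consistent with the difficulty of identifying ctDNA predictors noted above. Since cv.glmnet assigns folds at random, calling set.seed() before each cross-validation would also make the selected lambda reproducible.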
-34/96 pts (35%) were DTC+, either at BL (n=24, 25%) or after (n=10, 10%). Considering all timepoints, concordance was 64%. Of 34 ever-DTC+ pts, 4 (12%) were ctDNA+ (of whom 3/4 recurred) and 30 remained ctDNA- (with 1/30 who recurred). Among the 62 pts who remained DTC-, 5 (8%) were ctDNA+ (with 5/5 who recurred), and 57 remained ctDNA- (of whom 5/57 recurred). All ctDNA positivity in DTC+ pts occurred at the time of or after DTC positivity. Over median follow-up (f/u) of 65 months (m), BC recurrence occurred in 14/96 pts (15%), with 2 locoregional-only and 12 distant +/- locoregional recurrences (involving the bone, liver, lung/pleura, and brain); 8/14 pts (57%) were ctDNA+ prior to relapse. 7/12 (58%) with distant recurrences were ctDNA+ prior to metastatic diagnosis, at a median lead time of 15 m (range 0 – 25). Overall, ctDNA+ pts experienced a median lead time from ctDNA positivity to recurrence of 13 m (range 0 – 25). Only 1 of 9 ctDNA+ pts has not recurred; this pt was DTC+ and went on therapeutic trial, without evidence of recurrence over 20 m f/u. 30/34 DTC+ pts (89%) who went on therapeutic trial have not had ctDNA detected during f/u and have not recurred. Overall, ctDNA status was significantly associated with relapse (p\<0.01), with a PPV of 89% and NPV of 93%. Of the 24 BL DTC+ pts, 2 became ctDNA+ at subsequent timepoints, an average of 18 m after DTC assessment, and both relapsed (3 and 5 m from ctDNA detection, respectively). + +Among the 62 pts who remained DTC-, 5 (8%) were ctDNA+ (with 5/5 who recurred), and 57 remained ctDNA- (of whom 5/57 recurred). All ctDNA positivity in DTC+ pts occurred at the time of or after DTC positivity. Over median follow-up (f/u) of 65 months (m), BC recurrence occurred in 14/96 pts (15%), with 2 locoregional-only and 12 distant +/- locoregional recurrences (involving the bone, liver, lung/pleura, and brain); 8/14 pts (57%) were ctDNA+ prior to relapse. 
7/12 (58%) with distant recurrences were ctDNA+ prior to metastatic diagnosis, at a median lead time of 15 m (range 0 – 25). Overall, ctDNA+ pts experienced a median lead time from ctDNA positivity to recurrence of 13 m (range 0 – 25). Only 1 of 9 ctDNA+ pts has not recurred; this pt was DTC+ and went on therapeutic trial, without evidence of recurrence over 20 m f/u. 30/34 DTC+ pts (88%) who went on therapeutic trial have not had ctDNA detected during f/u and have not recurred. Overall, ctDNA status was significantly associated with relapse (p\<0.01), with a PPV of 89% and NPV of 93%. Of the 24 BL DTC+ pts, 2 became ctDNA+ at subsequent timepoints, an average of 18 m after DTC assessment, and both relapsed (3 and 5 m from ctDNA detection, respectively). Describe your results and include relevant tables, plots, and code/comments used to obtain them. You may refer to the @sec-methods as needed. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you'd like, but this is not required. ## Conclusion {#sec-conclusion} - -In this study, X was associated with Y. +In this study, X was associated with Y. From c139fd2c47cb7d44463c884792a4744a59dc529b Mon Sep 17 00:00:00 2001 From: ntaranto Date: Mon, 9 Dec 2024 14:44:38 -0500 Subject: [PATCH 07/14] Update FinalProject.html --- FinalProject.html | 16520 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 16452 insertions(+), 68 deletions(-) diff --git a/FinalProject.html b/FinalProject.html index b07acefc5..dfc896185 100644 --- a/FinalProject.html +++ b/FinalProject.html @@ -6,8 +6,9 @@ + -Final Presentation +Predictors of ctDNA positivity - - - - - - - - - - + + + + + + + + + + + + + + + + +
+ +
+ +
+
+

Predictors of ctDNA positivity

+

BMIN503/EPID600 Final Project

+
+ + + +
+ +
+
Author
+
+

Eleanor Taranto

+
+
+ + + +
+ + + +
+ + +
+
+

1 Overview

+

Link to my final project: https://github.com/ntaranto/BMIN503_Final_Project

+

Recurrence remains a problem in up to 30% of individuals treated for early-stage breast cancer, despite completion of definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy). It remains unclear who will experience disease recurrence and who will not, and whether we can identify biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA (ctDNA) as risk factors for recurrence. Historically, in retrospective, observational studies, these markers of minimal residual disease (MRD) have demonstrated strong associations with disease recurrence. But it remains unclear whether these associations persist with modern treatment paradigms, what the time course of positivity is, and which clinical risk factors put patients at greatest risk of DTC or ctDNA positivity and subsequent relapse, as well as which factors most strongly predict biomarker positivity.

+

Optimizing detection, intervention, and surveillance for MRD after breast cancer treatment, using highly sensitive, real-time assays to identify patients with dormant tumor cells and those undergoing tumor reactivation in transit to recurrence, would fundamentally change the paradigm of breast cancer follow-up from one of “watchful waiting” to a proactive approach that could enable ongoing monitoring for subclinical disease and suppression or eradication of cells that could potentially metastasize. An approach using MRD biomarkers could reassure patients with definitively negative MRD testing that they are unlikely to experience a relapse, could enable detection and treatment strategies for those in whom MRD is detectable, and could ultimately prevent recurrence and subsequent death from metastatic breast cancer while preserving quality of life for millions of breast cancer survivors around the world.

+

In the SURMOUNT study, we have followed patients for recurrence after definitive treatment for early-stage breast cancer, and have also obtained bone marrow and peripheral blood samples to look for disseminated tumor cells (DTCs) in the bone marrow by immunohistochemistry (DTC-IHC) and circulating tumor DNA (ctDNA) in the blood. Patients are followed over multiple years after their breast cancer treatment, both for these markers and clinically for relapse and survival events. The goals of this study are to assess the relationship between ongoing surveillance for MRD (using standard DTC-IHC and ctDNA by a tumor-informed assay) and recurrence; to assess the trajectory of MRD over time and the risk factors for DTC and ctDNA positivity; to optimize the type and number of tests needed to predict recurrence; and to further evaluate the long-term impact of therapeutic interventions aimed at eliminating DTCs.

+

In this specific analysis, we will look at the clinical predictors of ctDNA and DTC positivity to try to understand the strongest drivers of MRD in patients with high-risk early breast cancer who have finished definitive treatment. We will do so using the clinical characteristics, DTC and ctDNA assessments, and clinical follow-up of the cohort of patients enrolled in the SURMOUNT (Surveillance Markers of Utility for Recurrence after (Neo)adjuvant Therapy for Breast Cancer) study between 2016 and 2021 who had ctDNA assessment performed.

+

For this analysis, I have spoken with Dr. Nicholas Seewald, my biostatistical mentor, and Dr. Angela DeMichele, my epidemiology mentor and a breast oncologist with expertise in the biomarkers of recurrence (and the PI of the SURMOUNT study). Dr. Seewald has been critical in shaping the approach to multivariable analysis of the clinical predictors of ctDNA and DTC positivity, as well as the survival analyses of the associations between biomarker positivity and relapse. Dr. DeMichele has been instrumental in developing the SURMOUNT study design, designing the analysis of this initial surveillance cohort, and thinking about clinical predictors of positivity and the biomarkers of breast cancer recurrence and dormancy more broadly.

+
+
+

2 Introduction

+

Breast cancer is the most prevalent cancer since it is both common and treatable, with 5-year survival rates approaching 90%. Of the 14.5 million cancer survivors alive today in the U.S., nearly 1 in 4 are breast cancer survivors. Unfortunately, up to 30% of patients who receive adjuvant therapy for breast cancer will ultimately recur and die from their disease, typically as a consequence of metastatic recurrence. Since recurrent breast cancer is incurable, the propensity of cancers to recur following treatment is arguably the most important determinant of clinical outcome. Recurrent breast cancers arise from the pool of breast cancer cells that remain after initial treatment, likely in the form of minimal residual disease (MRD): local and disseminated residual tumor cells that survive in their host in a presumed dormant state following treatment of the primary breast cancer. The development of incurable metastatic disease is thought to be due to this persistent pool of residual disease resulting from escape of cells from the primary tumor, intravasation and survival in the circulation, and eventual extravasation and metastatic seeding. Many breast cancers pass through a latent phase in which disseminated tumor cells (DTCs) may persist in niches where they may reside in a dormant state for months to decades. These DTCs are thought to exist in a temporary quiescent state of reversible cell-cycle arrest, from which some cells may eventually reactivate, resume proliferation and recirculate, at which point they can be detected as circulating tumor DNA (ctDNA) in the blood. Longitudinal studies demonstrate that the detection of DTCs in the bone marrow in such patients is associated with poorer disease-free, breast cancer recurrence-free, and overall survival compared to patients without DTCs. 
Several mechanisms implicated in this process by preclinical studies are therapeutically targetable, and the research group in the 2-PREVENT Breast Cancer Translational Center of Excellence (TCE) has developed several interventional trials aimed at targeting these DTCs.

+

However, it remains unclear how exactly the presence of DTCs and/or ctDNA predicts relapse in the era of modern breast cancer treatment, including chemotherapy, immunotherapy, surgery, targeted treatments, and radiation. Questions remain about who will develop DTC/ctDNA positivity, which patients with DTC positivity will have these cells reactivate, whether and when DTC positivity leads to ctDNA positivity, and which patients with these markers will develop relapse and subsequent metastatic disease. In the SURMOUNT surveillance study, patients with early-stage (i.e., curable) but high-risk breast cancer are enrolled and undergo an initial baseline bone marrow assessment (BMA) for evaluation of DTCs by immunohistochemistry (IHC), as well as peripheral blood collection for retrospective ctDNA assessment. Patients who screen DTC positive (either at baseline or on yearly surveillance BMA) are referred for interventional trials aimed at eliminating dormant cells prior to clinical relapse. Patients who screen DTC negative remain in the SURMOUNT surveillance cohort and undergo yearly DTC assessment and peripheral blood collection for ctDNA assessment. All patients are followed for recurrence events and survival. The first intervention trial, CLEVER, completed enrollment in 2021, so this initial analysis focuses on the patients who were enrolled in SURMOUNT for the purpose of accruing this first trial.

+

Despite years of progress in breast cancer diagnostics and therapeutics, identifying the individuals at risk of recurrence, and determining how to manage and minimize their elevated risk, remains a challenge. In this study, we sought to assess the clinical validity of DTC and ctDNA assessment and to better understand the population in which they may be useful. Specifically, in this analysis, we looked at overall rates of ctDNA and DTC positivity in this cohort and the clinical factors associated with each.

+
+
+

3 Methods

+

“PENN SURMOUNT”: SURMOUNT is a single-center, prospective, longitudinal cohort study examining MRD biomarkers among pts within 5 years (y) of breast cancer (BC) diagnosis who completed all curative treatment except endocrine therapy. Eligible pts must have had: 1) TNBC, or 2) HER2+ or HR+ BC with positive lymph nodes (LN) and/or residual disease after neoadjuvant therapy, or 3) HR+ BC with a 21-gene Recurrence Score >25 and/or a high-risk MammaPrint result. Pts had annual bone marrow aspirate (BMA) for DTCs by immunohistochemistry (using the methods of Naume et al.). DTC+ pts went on a therapeutic trial; DTC- pts had up to 5 y of annual BMA and blood testing. ctDNA was retrospectively assessed using the RaDaR assay (NeoGenomics), which targets pt-specific somatic mutations identified by whole-exome sequencing (WES) of primary tumor tissue.
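The three tumor-based eligibility pathways above can be expressed as a single vectorized rule. The function below is a hypothetical sketch: its name and argument names are illustrative and are not fields from the SURMOUNT REDCap database, each tumor is simplified to one subtype label, and the timing (within 5 y of diagnosis) and treatment-completion requirements are not encoded.

```{r}
# Hypothetical encoding of the SURMOUNT tumor-based eligibility pathways.
# subtype: "TNBC", "HER2+" or "HR+" (simplified to one label per tumor)
# node_positive, residual_after_neoadj, mammaprint_high: logical
# recurrence_score: 21-gene Recurrence Score, NA if not performed
is_surmount_eligible <- function(subtype, node_positive, residual_after_neoadj,
                                 recurrence_score = NA, mammaprint_high = FALSE) {
  # Pathway 1: triple-negative breast cancer
  tnbc <- subtype == "TNBC"
  # Pathway 2: HER2+ or HR+ with positive nodes and/or residual disease after NAT
  nodal_or_residual <- subtype %in% c("HER2+", "HR+") &
    (node_positive | residual_after_neoadj)
  # Pathway 3: HR+ with Recurrence Score > 25 and/or high-risk MammaPrint
  # (a missing score counts as not high, rather than propagating NA)
  genomic_high <- subtype == "HR+" &
    ((!is.na(recurrence_score) & recurrence_score > 25) | mammaprint_high)
  tnbc | nodal_or_residual | genomic_high
}

# Usage: one element per patient
is_surmount_eligible(
  subtype = c("TNBC", "HER2+", "HR+", "HR+"),
  node_positive = c(FALSE, FALSE, FALSE, FALSE),
  residual_after_neoadj = c(FALSE, TRUE, FALSE, FALSE),
  recurrence_score = c(NA, NA, 31, 18)
)
```

For these four hypothetical patients the call returns TRUE, TRUE, TRUE, FALSE: the last patient is HR+ with negative nodes, no residual disease, and a Recurrence Score of 18.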

+

Data Collection and Merge: After developing a bespoke, tumor-informed panel from each patient's tumor tissue, NeoGenomics, Inc. performed ctDNA assessment on peripheral blood from 109 patients and returned the results to the research team in .csv format, with the last data drop on July 30, 2024. DTC assessment was performed on bone marrow samples, and the results were entered into a REDCap database by the research team through this same follow-up date. Clinical and demographic factors, as well as follow-up data, were abstracted by the research team through October 2024 and entered into the same REDCap database. Data were exported in mid-October 2024 by the TCE data manager and merged with the ctDNA data prior to hand-off for this analysis. The final locked and merged dataset, “surmount184_merged_20241108.csv”, is maintained in the TCE box for the ctDNA analysis, and a copy is stored in FinalProject_files.
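The merge step described above combines one row per participant from REDCap with one row per blood draw from the ctDNA file. The toy example below uses hypothetical IDs and values and base R `merge()`, which is not necessarily the tooling the TCE data manager used. It shows what a left join on participant_id produces: participants with several draws contribute several rows, and participants without ctDNA results are kept with missing ctDNA fields.

```{r}
# Toy stand-ins (hypothetical values) for the two sources described above
clinical <- data.frame(               # REDCap export: one row per participant
  participant_id = c("P01", "P02", "P03"),
  final_receptor_group = c("TNBC", "HR+/HER2-", "HER2+")
)
ctdna <- data.frame(                  # ctDNA results: one row per blood draw
  participant_id = c("P01", "P01", "P02"),
  timepoint = c("BL", "Y1", "BL"),
  ctDNA_detected = c(0, 1, 0)
)

# Left join on participant_id: keep every clinical participant
merged <- merge(clinical, ctdna, by = "participant_id", all.x = TRUE)

nrow(merged)                           # rows: one per draw, plus P03 with NA
length(unique(merged$participant_id))  # participants: unchanged
merged
```

Because the merged data are long (one row per timepoint), participant-level summaries such as ever-positive status need to be computed per participant before fitting participant-level models.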

+

First, we will import the final merged dataset, “surmount184_merged_20241108.csv”.

+
+
library(here)
+
+
here() starts at /Users/NoraTaranto/BMIN503_Final_Project
+
+
library(dplyr) 
+
+

+Attaching package: 'dplyr'
+
+
+
The following objects are masked from 'package:stats':
+
+    filter, lag
+
+
+
The following objects are masked from 'package:base':
+
+    intersect, setdiff, setequal, union
+
+
d <- read.csv(file = here("data",
+                          "surmount184_merged_20241108.csv"))
+
+

Next, we will limit the data to the 109 patients who had ctDNA testing, out of the 184 individuals included in the initial CLEVER trial screening group. First, we look at the names and structure of the 387 variables in the dataset “d”. Most are clinical variables (often categorical dummy variables); others are outcome variables for local relapse, distant relapse, and survival, or come from the pathology report of the DTC assessment. This list will help us identify the factors to include in our multivariable model to predict positivity of these markers. We also look at the structure of the variables, since some may need to be reformatted for analysis.
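One way to restrict long-format data to participants with ctDNA testing, and then collapse it to one row per participant, is sketched below on toy data. It assumes, hypothetically, that ctDNA_detected is coded 0/1 and is NA for rows without a ctDNA result; the real coding in “d” should be checked first, for example with `table(d$ctDNA_detected, useNA = "ifany")`.

```{r}
# Toy long-format data (hypothetical): one row per participant-timepoint
d_toy <- data.frame(
  participant_id = c("P01", "P01", "P02", "P03", "P03"),
  ctDNA_detected = c(0, 1, NA, 0, 0)
)

# Participants with at least one non-missing ctDNA result
tested_ids <- unique(d_toy$participant_id[!is.na(d_toy$ctDNA_detected)])
d_tested <- d_toy[d_toy$participant_id %in% tested_ids, ]

# Collapse to one row per participant: ever ctDNA-positive = max over timepoints.
# By default the formula interface omits rows where ctDNA_detected is NA.
ctdna_ever <- aggregate(ctDNA_detected ~ participant_id, data = d_tested, FUN = max)
names(ctdna_ever)[2] <- "ctDNA_ever"

length(tested_ids)   # number of tested participants
ctdna_ever
```

In this toy example P02 has no ctDNA result and is dropped, P01 is ever-positive, and P03 is never positive.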

+
+
#looking at the names and structure of the variables
+names(d) 
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                    
+
+
str(d)
+
+
'data.frame':   579 obs. of  387 variables:
+ $ ID                              : int  16001 16001 16001 16001 16001 16002 16003 16004 16005 16005 ...
+ $ trialID                         : int  16001 16001 16001 16001 16001 NA NA 16004 1813 1813 ...
+ $ participant_id                  : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
+ $ patient_id                      : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
+ $ fu_trial_pid                    : chr  "" "" "" "" ...
+ $ timepoint                       : chr  "SURMOUNT-Baseline" "Year 1 Follow Up" "Year 2 Follow Up" "Long Term FU 1" ...
+ $ project                         : chr  "Penn-SURMOUNT" "Penn-SURMOUNT" "Penn-SURMOUNT" "Penn-SURMOUNT" ...
+ $ surmount_id                     : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
+ $ panel_id                        : chr  "23A05190P01" "23A05190P01" "23A05190P01" "23A05190P01" ...
+ $ accession                       : chr  "23A07639" "23A07640" "23A07641" "23A07642" ...
+ $ sample_id                       : chr  "23A0763907_pl" "23A0764006_pl" "23A0764105_pl" "23A0764206_pl" ...
+ $ collection_date                 : chr  "03AUG2016" "18SEP2017" "06AUG2018" "11NOV2021" ...
+ $ extracted_plasma_volume_ml      : num  5.61 5.15 3.68 4.87 5.05 ...
+ $ input                           : int  4999 4999 2775 1550 2725 NA NA 4999 NA NA ...
+ $ input_sample                    : int  19996 19996 11100 6200 10900 NA NA 19996 NA NA ...
+ $ physical_run_name               : chr  "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" ...
+ $ workflow_name                   : chr  "RaDaR" "RaDaR" "RaDaR" "RaDaR" ...
+ $ eVAF                            : num  2.07e-10 2.86e-10 5.64e-12 1.08e-09 2.23e-13 ...
+ $ mutant_molecules                : int  0 0 0 0 0 NA NA 0 NA NA ...
+ $ mean_VAF                        : num  1.48e-05 1.21e-05 7.21e-06 1.82e-05 1.10e-05 NA NA 2.85e-05 NA NA ...
+ $ Score                           : num  -1.013 -0.882 -0.74 -0.495 -0.724 ...
+ $ all_pass_variants               : int  16 16 16 16 16 NA NA 47 NA NA ...
+ $ total_variants                  : int  48 48 48 48 48 NA NA 48 NA NA ...
+ $ n_positive_variants             : int  0 0 0 0 0 NA NA 0 NA NA ...
+ $ ctDNA_detected                  : chr  "FALSE" "FALSE" "FALSE" "FALSE" ...
+ $ ctdna_cohort                    : int  1 1 1 NA NA 0 0 1 0 0 ...
+ $ dtc_ihc_date_final              : chr  "03AUG16:00:00:00" "18SEP17:00:00:00" "06AUG18:10:30:00" "" ...
+ $ dtc_ihc_cytospinnum_final       : int  10 5 5 NA NA 10 10 10 10 5 ...
+ $ dtc_ihc_result_final            : int  0 0 0 NA NA 0 0 0 0 0 ...
+ $ dtc_ihc_summary_count_final     : int  0 0 0 NA NA 0 0 0 0 0 ...
+ $ dtc_final_result_date           : chr  "08/03/2016" "09/18/2017" "08/06/2018" "" ...
+ $ pt                              : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ bma_date                        : chr  "" "" "" "" ...
+ $ ORIG_RSLT_DTC                   : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ ORIG_RSLT_DTC_COUNT             : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ FINAL_RESULT                    : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ FINAL_COUNT                     : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ org_consent_date                : chr  "06/09/2016" "06/09/2016" "06/09/2016" "06/09/2016" ...
+ $ demo_initials                   : chr  "LB" "LB" "LB" "LB" ...
+ $ demo_dob                        : chr  "09/21/1957" "09/21/1957" "09/21/1957" "09/21/1957" ...
+ $ demo_sex                        : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ demo_ethnicity                  : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ demo_race___1                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___2                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___3                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___4                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___5                   : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ demo_race___88                  : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___99                  : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race_other                 : logi  NA NA NA NA NA NA ...
+ $ prtx_radiation                  : int  1 1 1 1 1 1 1 0 1 1 ...
+ $ prtx_rad_start                  : chr  "04/29/2014" "04/29/2014" "04/29/2014" "04/29/2014" ...
+ $ prtx_rad_end                    : chr  "06/13/2014" "06/13/2014" "06/13/2014" "06/13/2014" ...
+ $ prtx_chemo                      : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ prtx_endo                       : int  1 1 1 1 1 0 0 1 0 0 ...
+ $ prtx_bonemod                    : int  0 0 0 0 0 0 0 0 1 1 ...
+ $ prior_therapy_complete          : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ inc_dx_crit                     : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ inc_dx_crit_list___1            : int  0 0 0 0 0 0 1 0 0 0 ...
+ $ inc_dx_crit_list___2            : int  1 1 1 1 1 0 1 0 1 1 ...
+ $ inc_dx_crit_list___3            : int  0 0 0 0 0 1 0 0 0 0 ...
+ $ inc_dx_crit_list___4            : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ final_receptor_group            : int  2 2 2 2 2 4 1 2 4 4 ...
+ $ demo_race_final                 : int  5 5 5 5 5 5 5 5 5 5 ...
+ $ final_histology                 : chr  "14,3" "14,3" "14,3" "14,3" ...
+ $ final_tumor_grade               : int  2 2 2 2 2 2 3 2 0 0 ...
+ $ final_overall_stage             : int  3 3 3 3 3 2 2 1 3 3 ...
+ $ final_t_stage                   : int  2 2 2 2 2 2 2 1 3 3 ...
+ $ final_n_stage                   : int  3 3 3 3 3 0 1 0 3 3 ...
+ $ fu_date_to                      : chr  "03/28/2024" "03/28/2024" "03/28/2024" "03/28/2024" ...
+ $ fu_surv                         : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ fu_date_death                   : chr  "" "" "" "" ...
+ $ fu_dec_bc_pres                  : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ fu_dec_bc_cause                 : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ fu_locreg_site_num              : chr  "" "" "" "" ...
+ $ fu_locreg_site_char             : chr  "" "" "" "" ...
+ $ fu_locreg_prog                  : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ fu_locreg_date                  : chr  "" "" "" "" ...
+ $ fu_dist_site_num                : chr  "" "" "" "" ...
+ $ fu_dist_site_char               : chr  "" "" "" "" ...
+ $ fu_dist_prog                    : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ fu_dist_date                    : chr  "" "" "" "" ...
+ $ censor_date                     : chr  "03/28/2024" "03/28/2024" "03/28/2024" "03/28/2024" ...
+ $ chemo_indication_1              : int  1 1 1 1 1 2 2 1 2 2 ...
+ $ chemo_name_1                    : int  2 2 2 2 2 5 2 1 7 7 ...
+ $ chemo_name_other_1              : chr  "" "" "" "" ...
+ $ chemo_start_date_1              : chr  "10/03/2013" "10/03/2013" "10/03/2013" "10/03/2013" ...
+ $ start_date_exact_1              : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ chemo_end_date_1                : chr  "01/16/2014" "01/16/2014" "01/16/2014" "01/16/2014" ...
+ $ end_date_exact_1                : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ chemo_notes_1                   : chr  "" "" "" "" ...
+ $ prior_chemotherapy_complete_1   : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ chemo_indication_2              : int  NA NA NA NA NA 2 NA NA 3 3 ...
+ $ chemo_name_2                    : int  NA NA NA NA NA 3 NA NA 25 25 ...
+ $ chemo_name_other_2              : chr  "" "" "" "" ...
+ $ chemo_start_date_2              : chr  "" "" "" "" ...
+ $ start_date_exact_2              : int  NA NA NA NA NA 1 NA NA 1 1 ...
+ $ chemo_end_date_2                : chr  "" "" "" "" ...
+ $ end_date_exact_2                : chr  "" "" "" "" ...
+  [list output truncated]
+
+
+

Summary variables: We identified the following key summary variables:

- `final_overall_stage`, `final_t_stage`, `final_n_stage`: final overall stage, T stage, and N stage
- `final_receptor_group`: receptor subtype (1 = 'TNBC', 2 = 'HR+ Her2-', 3 = 'HR+ Her2+', 4 = 'HR- Her2+')
- `final_tumor_grade` and `final_histology`
- `demo_race_final`
- `fu_locreg_site_num` (numeric values for the locoregional site) and `fu_locreg_site_char` (character values for the locoregional site)
- `fu_dist_site_num` (numeric values for the distant site) and `fu_dist_site_char` (character values for the distant site)
- `censor_date`: the most recent `fu_date_to` among patients who are alive without locoregional or distant progression
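A minimal sketch of how these coded summary variables might be made analysis-ready in R: `final_receptor_group` is mapped to a labeled factor using the 1 to 4 coding listed above, and the `MM/DD/YYYY` date strings (such as `censor_date` and `fu_dist_date`) are parsed with `as.Date()`. The data frame `d_demo` and its values are hypothetical stand-ins for `d`, not real records; on the real data the same calls would be applied to `d`.

```r
# Hypothetical toy rows standing in for `d`; the values are illustrative only.
d_demo <- data.frame(
  ID                   = c(16001L, 16002L, 16003L, 16004L),
  final_receptor_group = c(2L, 4L, 1L, 2L),
  censor_date          = c("03/28/2024", "03/28/2024", "03/28/2024", "03/28/2024"),
  fu_dist_date         = c("", "", "", ""),
  stringsAsFactors     = FALSE
)

# Label the receptor-group codes 1-4 as described in the text.
receptor_labels <- c("TNBC", "HR+ Her2-", "HR+ Her2+", "HR- Her2+")
d_demo$receptor_group <- factor(d_demo$final_receptor_group,
                                levels = 1:4, labels = receptor_labels)

# Date columns are "MM/DD/YYYY" strings; with an explicit format,
# empty strings ("") become NA instead of raising an error.
d_demo$censor_date  <- as.Date(d_demo$censor_date,  format = "%m/%d/%Y")
d_demo$fu_dist_date <- as.Date(d_demo$fu_dist_date, format = "%m/%d/%Y")

# Counts per receptor group (empty groups are kept because all 4 levels are declared).
table(d_demo$receptor_group)
```

Declaring all four levels explicitly means a group with no patients (here, 'HR+ Her2+') still appears in tables and plots with a count of zero rather than silently disappearing.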

+

Limiting from the overall cohort (184) to the ctDNA cohort: This dataset contains 184 individuals (579 rows, because individuals can contribute several timepoints), which was the overall cohort of individuals screened for the CLEVER interventional study on SURMOUNT. From the separate ctDNA CSV and the NeoGenomics summary data, we also know that ctDNA was assessed in 109 individuals. We therefore need to limit the data set `d` to this ctDNA cohort, which we will call `subset_data`, using the indicator variable `ctdna_cohort`.
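The subsetting step could look like the sketch below. It assumes, as the `str(d)` output suggests, that `ctdna_cohort` is a row-level 0/1 indicator that can be `NA` for some timepoints (for ID 16001 it reads `1 1 1 NA NA`). `d_demo` reproduces the first ten `ID` and `ctdna_cohort` values shown in that output; on the full data the same code would be run on `d`, and `length(unique(subset_data$ID))` is the check that should return 109 if the indicator agrees with the NeoGenomics summary.

```r
# Toy rows copied from the first ten ID / ctdna_cohort values in str(d):
# several rows (timepoints) per patient, NA where the indicator is missing.
d_demo <- data.frame(
  ID           = c(16001L, 16001L, 16001L, 16001L, 16001L,
                   16002L, 16003L, 16004L, 16005L, 16005L),
  ctdna_cohort = c(1L, 1L, 1L, NA, NA, 0L, 0L, 1L, 0L, 0L)
)

# Number of individuals (not rows) in the data.
n_individuals <- length(unique(d_demo$ID))

# Row-level subset: `%in%` treats NA as "not in", so NA rows are dropped
# (using `==` inside `[` would instead return all-NA rows for them).
subset_rows <- d_demo[d_demo$ctdna_cohort %in% 1, ]

# Patient-level subset: keep every row of any patient flagged at least once.
cohort_ids  <- unique(d_demo$ID[d_demo$ctdna_cohort %in% 1])
subset_data <- d_demo[d_demo$ID %in% cohort_ids, ]

n_individuals                     # individuals in the toy data
length(unique(subset_data$ID))    # individuals in the ctDNA cohort
```

Which version is appropriate is a study decision: the row-level subset drops timepoints where the indicator is missing (the last two rows for ID 16001), while the patient-level subset keeps every row for anyone in the cohort.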

+
+
#looking at the names of the variables, and the structure of the variables. 
str(d)
+
+
'data.frame':   579 obs. of  387 variables:
+ $ ID                              : int  16001 16001 16001 16001 16001 16002 16003 16004 16005 16005 ...
+ $ trialID                         : int  16001 16001 16001 16001 16001 NA NA 16004 1813 1813 ...
+ $ participant_id                  : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
+ $ patient_id                      : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
+ $ fu_trial_pid                    : chr  "" "" "" "" ...
+ $ timepoint                       : chr  "SURMOUNT-Baseline" "Year 1 Follow Up" "Year 2 Follow Up" "Long Term FU 1" ...
+ $ project                         : chr  "Penn-SURMOUNT" "Penn-SURMOUNT" "Penn-SURMOUNT" "Penn-SURMOUNT" ...
+ $ surmount_id                     : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
+ $ panel_id                        : chr  "23A05190P01" "23A05190P01" "23A05190P01" "23A05190P01" ...
+ $ accession                       : chr  "23A07639" "23A07640" "23A07641" "23A07642" ...
+ $ sample_id                       : chr  "23A0763907_pl" "23A0764006_pl" "23A0764105_pl" "23A0764206_pl" ...
+ $ collection_date                 : chr  "03AUG2016" "18SEP2017" "06AUG2018" "11NOV2021" ...
+ $ extracted_plasma_volume_ml      : num  5.61 5.15 3.68 4.87 5.05 ...
+ $ input                           : int  4999 4999 2775 1550 2725 NA NA 4999 NA NA ...
+ $ input_sample                    : int  19996 19996 11100 6200 10900 NA NA 19996 NA NA ...
+ $ physical_run_name               : chr  "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" ...
+ $ workflow_name                   : chr  "RaDaR" "RaDaR" "RaDaR" "RaDaR" ...
+ $ eVAF                            : num  2.07e-10 2.86e-10 5.64e-12 1.08e-09 2.23e-13 ...
+ $ mutant_molecules                : int  0 0 0 0 0 NA NA 0 NA NA ...
+ $ mean_VAF                        : num  1.48e-05 1.21e-05 7.21e-06 1.82e-05 1.10e-05 NA NA 2.85e-05 NA NA ...
+ $ Score                           : num  -1.013 -0.882 -0.74 -0.495 -0.724 ...
+ $ all_pass_variants               : int  16 16 16 16 16 NA NA 47 NA NA ...
+ $ total_variants                  : int  48 48 48 48 48 NA NA 48 NA NA ...
+ $ n_positive_variants             : int  0 0 0 0 0 NA NA 0 NA NA ...
+ $ ctDNA_detected                  : chr  "FALSE" "FALSE" "FALSE" "FALSE" ...
+ $ ctdna_cohort                    : int  1 1 1 NA NA 0 0 1 0 0 ...
+ $ dtc_ihc_date_final              : chr  "03AUG16:00:00:00" "18SEP17:00:00:00" "06AUG18:10:30:00" "" ...
+ $ dtc_ihc_cytospinnum_final       : int  10 5 5 NA NA 10 10 10 10 5 ...
+ $ dtc_ihc_result_final            : int  0 0 0 NA NA 0 0 0 0 0 ...
+ $ dtc_ihc_summary_count_final     : int  0 0 0 NA NA 0 0 0 0 0 ...
+ $ dtc_final_result_date           : chr  "08/03/2016" "09/18/2017" "08/06/2018" "" ...
+ $ pt                              : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ bma_date                        : chr  "" "" "" "" ...
+ $ ORIG_RSLT_DTC                   : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ ORIG_RSLT_DTC_COUNT             : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ FINAL_RESULT                    : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ FINAL_COUNT                     : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ org_consent_date                : chr  "06/09/2016" "06/09/2016" "06/09/2016" "06/09/2016" ...
+ $ demo_initials                   : chr  "LB" "LB" "LB" "LB" ...
+ $ demo_dob                        : chr  "09/21/1957" "09/21/1957" "09/21/1957" "09/21/1957" ...
+ $ demo_sex                        : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ demo_ethnicity                  : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ demo_race___1                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___2                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___3                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___4                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___5                   : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ demo_race___88                  : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___99                  : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race_other                 : logi  NA NA NA NA NA NA ...
+ $ prtx_radiation                  : int  1 1 1 1 1 1 1 0 1 1 ...
+ $ prtx_rad_start                  : chr  "04/29/2014" "04/29/2014" "04/29/2014" "04/29/2014" ...
+ $ prtx_rad_end                    : chr  "06/13/2014" "06/13/2014" "06/13/2014" "06/13/2014" ...
+ $ prtx_chemo                      : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ prtx_endo                       : int  1 1 1 1 1 0 0 1 0 0 ...
+ $ prtx_bonemod                    : int  0 0 0 0 0 0 0 0 1 1 ...
+ $ prior_therapy_complete          : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ inc_dx_crit                     : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ inc_dx_crit_list___1            : int  0 0 0 0 0 0 1 0 0 0 ...
+ $ inc_dx_crit_list___2            : int  1 1 1 1 1 0 1 0 1 1 ...
+ $ inc_dx_crit_list___3            : int  0 0 0 0 0 1 0 0 0 0 ...
+ $ inc_dx_crit_list___4            : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ final_receptor_group            : int  2 2 2 2 2 4 1 2 4 4 ...
+ $ demo_race_final                 : int  5 5 5 5 5 5 5 5 5 5 ...
+ $ final_histology                 : chr  "14,3" "14,3" "14,3" "14,3" ...
+ $ final_tumor_grade               : int  2 2 2 2 2 2 3 2 0 0 ...
+ $ final_overall_stage             : int  3 3 3 3 3 2 2 1 3 3 ...
+ $ final_t_stage                   : int  2 2 2 2 2 2 2 1 3 3 ...
+ $ final_n_stage                   : int  3 3 3 3 3 0 1 0 3 3 ...
+ $ fu_date_to                      : chr  "03/28/2024" "03/28/2024" "03/28/2024" "03/28/2024" ...
+ $ fu_surv                         : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ fu_date_death                   : chr  "" "" "" "" ...
+ $ fu_dec_bc_pres                  : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ fu_dec_bc_cause                 : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ fu_locreg_site_num              : chr  "" "" "" "" ...
+ $ fu_locreg_site_char             : chr  "" "" "" "" ...
+ $ fu_locreg_prog                  : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ fu_locreg_date                  : chr  "" "" "" "" ...
+ $ fu_dist_site_num                : chr  "" "" "" "" ...
+ $ fu_dist_site_char               : chr  "" "" "" "" ...
+ $ fu_dist_prog                    : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ fu_dist_date                    : chr  "" "" "" "" ...
+ $ censor_date                     : chr  "03/28/2024" "03/28/2024" "03/28/2024" "03/28/2024" ...
+ $ chemo_indication_1              : int  1 1 1 1 1 2 2 1 2 2 ...
+ $ chemo_name_1                    : int  2 2 2 2 2 5 2 1 7 7 ...
+ $ chemo_name_other_1              : chr  "" "" "" "" ...
+ $ chemo_start_date_1              : chr  "10/03/2013" "10/03/2013" "10/03/2013" "10/03/2013" ...
+ $ start_date_exact_1              : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ chemo_end_date_1                : chr  "01/16/2014" "01/16/2014" "01/16/2014" "01/16/2014" ...
+ $ end_date_exact_1                : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ chemo_notes_1                   : chr  "" "" "" "" ...
+ $ prior_chemotherapy_complete_1   : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ chemo_indication_2              : int  NA NA NA NA NA 2 NA NA 3 3 ...
+ $ chemo_name_2                    : int  NA NA NA NA NA 3 NA NA 25 25 ...
+ $ chemo_name_other_2              : chr  "" "" "" "" ...
+ $ chemo_start_date_2              : chr  "" "" "" "" ...
+ $ start_date_exact_2              : int  NA NA NA NA NA 1 NA NA 1 1 ...
+ $ chemo_end_date_2                : chr  "" "" "" "" ...
+ $ end_date_exact_2                : chr  "" "" "" "" ...
+  [list output truncated]
+
+
###### Limit to the ctDNA cohort (rows with NA are kept as long as the participant ever had ctdna_cohort == 1); this subset is called subset_data
+
+# Identify all participant_ids where ctdna_cohort == 1
+valid_participants <- d |> 
+  filter(ctdna_cohort == 1) |> 
+  pull(participant_id) |> 
+  unique()
+
+# Subset the data to include all rows where participant_id is in the valid list
+subset_data <- d |> 
+  filter(participant_id %in% valid_participants)
+
+# Count the number of unique participant_ids in the subset_data
+unique_count <- subset_data |> 
+  summarise(unique_participants = n_distinct(participant_id))
+
+# View the result: 109, which is the correct number of patients.
+unique_count
+
+
  unique_participants
+1                 109
+
+
+

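The cohort restriction above keeps every row for any participant who was flagged `ctdna_cohort == 1` at least once, including that participant's rows where the flag is `NA`. A minimal base-R sketch on a small made-up data frame (the participant IDs and values are hypothetical, not study data) shows that rule alongside an equivalent per-participant `ave()` formulation, which could serve as an independent cross-check of the dplyr pipeline:

```r
# Hypothetical toy data: one row per sample, as in `d`
toy <- data.frame(
  participant_id = c("A", "A", "B", "B", "C"),
  ctdna_cohort   = c(1, NA, NA, 0, 1)
)

# Approach 1 (mirrors the dplyr code): collect IDs ever flagged, then keep all their rows
valid_ids <- unique(toy$participant_id[which(toy$ctdna_cohort == 1)])
subset_1 <- toy[toy$participant_id %in% valid_ids, ]

# Approach 2: flag each row by whether its participant was ever in the cohort
ever_cohort <- ave(
  as.numeric(toy$ctdna_cohort %in% 1),  # NA and 0 both count as "not flagged"
  toy$participant_id,
  FUN = max
) == 1
subset_2 <- toy[ever_cohort, ]

# Both keep A (including its NA row) and C, and drop B
identical(subset_1, subset_2)
length(unique(subset_1$participant_id))
```
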
Creating the ctDNA_ever positive indicator: Now that we have subset_data, the ctDNA cohort (n = 109), we can start looking at demographics, first overall and then stratified by ctDNA_detected. The sample-level table of ctDNA_detected (FALSE = negative, ctDNA not detected; TRUE = positive, ctDNA detected) shows 385 negative and 11 positive samples within the ctDNA cohort, plus 2 samples with a blank result. Next, we will create the ctDNA_ever variable, which indicates, for each participant_id (the unique study ID), whether that participant ever had ctDNA detected.

+
+
# ctDNA_detected is a character variable (checked, OK)
+
+names(subset_data)
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                    
+
+
### Excluding the 'Fail' results from this cohort
+###### Create the ctDNA_ever positive variable
+table(subset_data$ctDNA_detected) # 2 blank, 385 FALSE, 11 TRUE (no 'Fail')
+
+

+      FALSE  TRUE 
+    2   385    11 
+
+
table(d$ctDNA_detected) # full dataset, for comparison: 175 blank, 8 Fail, 385 FALSE, 11 TRUE
+
+

+       Fail FALSE  TRUE 
+  175     8   385    11 
+
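Because `ctDNA_detected` is stored as character with four observed values (`""`, `"Fail"`, `"FALSE"`, `"TRUE"`, as in the `table()` output above), it can help to recode it explicitly before aggregating, so that failed or missing assays become `NA` rather than being treated as negative. A minimal sketch, assuming those four values are the only ones present; the helper name `recode_ctdna` is hypothetical:

```r
# Hypothetical helper: map the character result to TRUE / FALSE / NA
recode_ctdna <- function(x) {
  out <- rep(NA, length(x))            # "" and "Fail" stay NA (no usable result)
  out[which(x == "TRUE")]  <- TRUE     # which() skips any NA comparisons safely
  out[which(x == "FALSE")] <- FALSE
  out
}

raw <- c("", "Fail", "FALSE", "TRUE", "FALSE")
recoded <- recode_ctdna(raw)

# Tabulate, keeping NA visible so failed or missing assays are not hidden
table(recoded, useNA = "ifany")
```
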
+
# Create the 'ctDNA_ever' variable: 
+# TRUE if ctDNA was detected in any sample from the participant, otherwise FALSE.
+subset_data <- subset_data  |> 
+  group_by(participant_id) |>
+  mutate(ctDNA_ever = any(ctDNA_detected == "TRUE")) |>  # compare to the string, since the column is character
+  ungroup()
+
+# Check the result: number of samples per participant, by ctDNA_ever
+table(subset_data$participant_id, subset_data$ctDNA_ever)
+
+
              
+               FALSE TRUE
+  28115-16-001     5    0
+  28115-16-004     1    0
+  28115-16-010     1    0
+  28115-16-014     1    0
+  28115-16-015    12    0
+  28115-16-017     3    0
+  28115-16-020     0    1
+  28115-16-021     9    0
+  28115-16-023     1    0
+  28115-16-025     1    0
+  28115-16-026    10    0
+  28115-16-027     3    0
+  28115-16-029     2    0
+  28115-16-033     2    0
+  28115-16-035     1    0
+  28115-17-001     8    0
+  28115-17-002     9    0
+  28115-17-006     1    0
+  28115-17-008     9    0
+  28115-17-009     1    0
+  28115-17-010     5    0
+  28115-17-011     9    0
+  28115-17-012    10    0
+  28115-17-016     4    0
+  28115-17-017     5    0
+  28115-17-019     9    0
+  28115-17-021     1    0
+  28115-17-022     1    0
+  28115-17-023     0    2
+  28115-17-024     4    0
+  28115-17-025     0    2
+  28115-17-027     8    0
+  28115-17-030     3    0
+  28115-17-031     5    0
+  28115-17-032     0   10
+  28115-17-036     7    0
+  28115-17-039     2    0
+  28115-17-040     4    0
+  28115-17-045     1    0
+  28115-17-046    10    0
+  28115-17-047     3    0
+  28115-17-048     2    0
+  28115-17-050     0    3
+  28115-17-051     9    0
+  28115-17-052     3    0
+  28115-18-001     7    0
+  28115-18-002     2    0
+  28115-18-004     2    0
+  28115-18-006     1    0
+  28115-18-009     1    0
+  28115-18-011     5    0
+  28115-18-014     2    0
+  28115-18-015     5    0
+  28115-18-017     1    0
+  28115-18-020     8    0
+  28115-18-021     8    0
+  28115-18-022    12    0
+  28115-18-023     3    0
+  28115-18-024     2    0
+  28115-18-027     1    0
+  28115-18-028     1    0
+  28115-18-029     4    0
+  28115-18-030     2    0
+  28115-18-031     3    0
+  28115-18-032     6    0
+  28115-18-034     1    0
+  28115-19-001     0    1
+  28115-19-002     2    0
+  28115-19-003     5    0
+  28115-19-004     1    0
+  28115-19-005     3    0
+  28115-19-006     8    0
+  28115-19-007     5    0
+  28115-19-009     6    0
+  28115-19-011     1    0
+  28115-19-012     3    0
+  28115-19-014     2    0
+  28115-19-016     2    0
+  28115-19-017     2    0
+  28115-19-019     3    0
+  28115-19-020     2    0
+  28115-19-021     4    0
+  28115-19-022     2    0
+  28115-19-025     6    0
+  28115-19-028     2    0
+  28115-20-004     2    0
+  28115-20-007     2    0
+  28115-20-009     4    0
+  28115-20-010     1    0
+  28115-21-001     1    0
+  28115-21-002     4    0
+  28115-21-003     0    2
+  28115-21-006     2    0
+  28115-21-007     0    3
+  28115-21-009     3    0
+  28115-21-011     1    0
+  28115-21-013     4    0
+  28115-21-014     2    0
+  28115-21-015     2    0
+  28115-21-016     8    0
+  28115-21-019     1    0
+  28115-21-020     3    0
+  28115-21-021     3    0
+  28115-21-022     1    0
+  28115-21-024     0    2
+  28115-21-025     2    0
+  28115-21-026     2    0
+  28115-21-027     2    0
+  28115-21-028     1    0
+
+
subset_data |> 
+  group_by(participant_id) |> 
+  summarize(ctDNA_ever = first(ctDNA_ever)) |> 
+  count(ctDNA_ever)
+
+
# A tibble: 2 × 2
+  ctDNA_ever     n
+  <lgl>      <int>
+1 FALSE        100
+2 TRUE           9
+
+
+

Using the participant-level summary variable ctDNA_ever, we see that 100 individuals had only negative results and 9 individuals had at least one positive (“ever positive”) ctDNA result, which matches our original ctDNA source data.
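
The same sample-to-participant collapse can be cross-checked without dplyr. A minimal base-R sketch on a hypothetical toy data frame (IDs and results are made up): `tapply()` applies `any()` within each participant, and `table()` then counts ever-positive versus always-negative participants, analogous to the `summarize()`/`count()` step above. Comparing against the string `"TRUE"` makes explicit that the column is character:

```r
# Hypothetical toy data: sample-level results stored as character, as in subset_data
toy <- data.frame(
  participant_id = c("A", "A", "A", "B", "B", "C"),
  ctDNA_detected = c("FALSE", "TRUE", "FALSE", "FALSE", "", "FALSE")
)

# One value per participant: TRUE if any of their samples was "TRUE"
ctDNA_ever_by_pt <- tapply(toy$ctDNA_detected == "TRUE", toy$participant_id, any)

# Participant-level counts of always-negative (FALSE) vs ever-positive (TRUE)
table(ctDNA_ever_by_pt)
```
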

+

Creating the Ever DTC Positive Variable: Next, we will create a variable indicating whether someone ever had a positive DTC test. To do this, we will use the final result variable “dtc_ihc_result_final”, which gives the DTC result for a given sample/date as positive (“1”) or negative (“0”). By sample, this dataset contains 221 negative and 49 positive samples across 109 patients, 39 of whom were DTC positive, which aligns with our prior data and CONSORT diagrams.
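
The DTC indicator can be built the same way, with one extra decision: `dtc_ihc_result_final` is `NA` on rows without a bone marrow result, so a participant whose rows are all `NA` could be coded either as "never positive" or as "unknown". A minimal base-R sketch on hypothetical toy data which, as an assumption and not necessarily the study's rule, returns `NA` when a participant has no DTC result at all; the helper name `ever_positive` is hypothetical:

```r
# Hypothetical helper: TRUE if any non-missing result is 1,
# FALSE if all non-missing results are 0, NA if there are no results
ever_positive <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  any(x == 1)
}

# Hypothetical toy data: sample-level DTC results (1 = positive, 0 = negative)
toy <- data.frame(
  participant_id       = c("A", "A", "B", "B", "C"),
  dtc_ihc_result_final = c(0L, 1L, 0L, NA, NA)
)

dtc_ever_by_pt <- tapply(toy$dtc_ihc_result_final, toy$participant_id, ever_positive)

# Sample-level counts (NA dropped) and participant-level counts (NA shown)
table(toy$dtc_ihc_result_final)
table(dtc_ever_by_pt, useNA = "ifany")
```

Using `any(x == 1, na.rm = TRUE)` instead would code participant C as FALSE; which convention fits depends on how the analysis treats participants without a bone marrow result.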

+
+
names(subset_data) # list the variable names to find the DTC indicator variable
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                     "ctDNA_ever"                      
+
+
library(stringr)
+
+#final result variable is dtc_ihc_result_final; note that it is recorded per sample, not per participant.
+#final count for DTCs is dtc_ihc_summary_count
+#final result date is dtc_final_result_ date
+
+table(subset_data$dtc_ihc_result_final) #221 negatives, 49 positives 
+
+

+  0   1 
+221  49 
+
+
#making the dtc_ever variable 
+subset_data <- subset_data |> 
+  group_by(participant_id) |> 
+  mutate(dtc_ever = if_else(any(dtc_ihc_result_final == 1, na.rm = TRUE), 1, 0)) |> 
+  ungroup()
+
+table(subset_data$participant_id, subset_data$dtc_ever) 
+
+
              
+                0  1
+  28115-16-001  5  0
+  28115-16-004  1  0
+  28115-16-010  1  0
+  28115-16-014  1  0
+  28115-16-015  0 12
+  28115-16-017  3  0
+  28115-16-020  1  0
+  28115-16-021  0  9
+  28115-16-023  1  0
+  28115-16-025  1  0
+  28115-16-026  0 10
+  28115-16-027  3  0
+  28115-16-029  2  0
+  28115-16-033  2  0
+  28115-16-035  1  0
+  28115-17-001  0  8
+  28115-17-002  0  9
+  28115-17-006  1  0
+  28115-17-008  0  9
+  28115-17-009  1  0
+  28115-17-010  0  5
+  28115-17-011  0  9
+  28115-17-012  0 10
+  28115-17-016  0  4
+  28115-17-017  0  5
+  28115-17-019  0  9
+  28115-17-021  1  0
+  28115-17-022  1  0
+  28115-17-023  2  0
+  28115-17-024  0  4
+  28115-17-025  0  2
+  28115-17-027  0  8
+  28115-17-030  3  0
+  28115-17-031  0  5
+  28115-17-032  0 10
+  28115-17-036  0  7
+  28115-17-039  2  0
+  28115-17-040  4  0
+  28115-17-045  1  0
+  28115-17-046  0 10
+  28115-17-047  3  0
+  28115-17-048  2  0
+  28115-17-050  0  3
+  28115-17-051  0  9
+  28115-17-052  3  0
+  28115-18-001  0  7
+  28115-18-002  2  0
+  28115-18-004  2  0
+  28115-18-006  1  0
+  28115-18-009  1  0
+  28115-18-011  5  0
+  28115-18-014  2  0
+  28115-18-015  0  5
+  28115-18-017  1  0
+  28115-18-020  0  8
+  28115-18-021  0  8
+  28115-18-022  0 12
+  28115-18-023  0  3
+  28115-18-024  2  0
+  28115-18-027  1  0
+  28115-18-028  1  0
+  28115-18-029  4  0
+  28115-18-030  2  0
+  28115-18-031  0  3
+  28115-18-032  0  6
+  28115-18-034  1  0
+  28115-19-001  1  0
+  28115-19-002  2  0
+  28115-19-003  5  0
+  28115-19-004  1  0
+  28115-19-005  3  0
+  28115-19-006  0  8
+  28115-19-007  5  0
+  28115-19-009  0  6
+  28115-19-011  1  0
+  28115-19-012  3  0
+  28115-19-014  2  0
+  28115-19-016  0  2
+  28115-19-017  0  2
+  28115-19-019  3  0
+  28115-19-020  2  0
+  28115-19-021  4  0
+  28115-19-022  0  2
+  28115-19-025  0  6
+  28115-19-028  0  2
+  28115-20-004  2  0
+  28115-20-007  2  0
+  28115-20-009  4  0
+  28115-20-010  1  0
+  28115-21-001  1  0
+  28115-21-002  4  0
+  28115-21-003  2  0
+  28115-21-006  2  0
+  28115-21-007  3  0
+  28115-21-009  3  0
+  28115-21-011  1  0
+  28115-21-013  4  0
+  28115-21-014  2  0
+  28115-21-015  2  0
+  28115-21-016  0  8
+  28115-21-019  1  0
+  28115-21-020  3  0
+  28115-21-021  3  0
+  28115-21-022  1  0
+  28115-21-024  0  2
+  28115-21-025  2  0
+  28115-21-026  2  0
+  28115-21-027  0  2
+  28115-21-028  1  0
+
+
subset_data |> 
+  group_by(participant_id) |> 
+  summarize(dtc_ever = first(dtc_ever)) |> 
+  count(dtc_ever)
+
+
# A tibble: 2 × 2
+  dtc_ever     n
+     <dbl> <int>
+1        0    70
+2        1    39
+
+
+

Counting by unique participant, 70 participants were never DTC-positive and 39 were DTC-positive at one or more timepoints. This matches the source data on DTC positivity for this ctDNA cohort.
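The dtc_ever flag above is built with `any(..., na.rm = TRUE)` within each participant. One property is worth confirming: a participant whose DTC results are all missing gets dtc_ever = 0, which is the same code as a participant who tested negative. The base-R sketch below uses hypothetical toy data (participants A, B and C are invented). It applies the same collapsing logic and also records whether each participant had any DTC result at all.

```r
# Hypothetical sample-level data: 1 = DTC-positive, 0 = negative,
# NA = no DTC result at that timepoint (e.g., a ctDNA-only draw).
samples <- data.frame(
  participant_id = c("A", "A", "B", "B", "C"),
  dtc_ihc_result_final = c(0, 1, 0, NA, NA)
)

# 1 if any sample is positive, else 0; mirrors the if_else(any(...)) logic.
# any() over an all-NA vector with na.rm = TRUE is FALSE, so C is coded 0.
dtc_ever <- tapply(samples$dtc_ihc_result_final, samples$participant_id,
                   function(x) as.integer(any(x == 1, na.rm = TRUE)))

# TRUE if the participant has at least one non-missing DTC result.
dtc_tested <- tapply(samples$dtc_ihc_result_final, samples$participant_id,
                     function(x) any(!is.na(x)))

# Count ever-positive vs never-positive among participants actually tested.
table(dtc_ever[dtc_tested])
```

Every participant in this cohort had DTC testing, so the tested and untested counts should agree here. The check simply makes that assumption explicit.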

+
+
+
+

4 Results

+

Sample and Testing Information: This cohort includes 109 individuals who had both ctDNA and DTC testing on SURMOUNT, at baseline or during follow-up. Of these, 100 remained ctDNA-negative and 70 remained DTC-negative at every timepoint; the remaining 9 were ctDNA-positive and 39 were DTC-positive at least once. Of 184 pts enrolled from 2016 to 2021, 121 had tissue available, and 114/121 (94%) had successful WES (prior data/NeoGenomics data).

+
+
#counts for ctDNA positivity 
+subset_data |>
+  filter(ctDNA_ever == "TRUE") |>
+  summarize(unique_participants = n_distinct(participant_id))
+
+
# A tibble: 1 × 1
+  unique_participants
+                <int>
+1                   9
+
+
table(subset_data$ctDNA_detected) #385 FALSE, 11 TRUE, 2 blank
+
+

+      FALSE  TRUE 
+    2   385    11 
+
+
table(d$ctDNA_detected) #385 FALSE, 11 TRUE, 8 Fail, 175 blank
+
+

+       Fail FALSE  TRUE 
+  175     8   385    11 
+
+
# Count unique participants with a "Fail" result in ctDNA_detected (this uses d, the original dataset rather than the ctDNA cohort, because these patients were excluded from the cohort)
+num_fail <- d |> 
+  filter(ctDNA_detected == "Fail") |>   # Filter rows where ctDNA_detected is "Fail"
+  distinct(participant_id) |>          # Select unique participant_id
+  nrow()                                # Count the number of rows
+
+num_fail #4 individuals with Fails in original d dataset 
+
+
[1] 4
+
+
#all sampled timepoints for each ever-ctDNA+ participant (not only the positive ones); 2 were positive at baseline, 7 after
+subset_data |>
+  filter(ctDNA_ever == "TRUE") |>
+  group_by(participant_id) |>
+  summarize(positive_timepoints = list(timepoint))
+
+
# A tibble: 9 × 2
+  participant_id positive_timepoints
+  <chr>          <list>             
+1 28115-16-020   <chr [1]>          
+2 28115-17-023   <chr [2]>          
+3 28115-17-025   <chr [2]>          
+4 28115-17-032   <chr [10]>         
+5 28115-17-050   <chr [3]>          
+6 28115-19-001   <chr [1]>          
+7 28115-21-003   <chr [2]>          
+8 28115-21-007   <chr [3]>          
+9 28115-21-024   <chr [2]>          
+
+
subset_data |>
+  filter(ctDNA_detected == "TRUE", timepoint == "SURMOUNT-Baseline") |>
+  summarize(count_SURMOUNT_Baseline = n())
+
+
# A tibble: 1 × 1
+  count_SURMOUNT_Baseline
+                    <int>
+1                       2
+
+
#eVAF summary across all samples from ever-ctDNA+ participants (includes their ctDNA-negative samples)
+
+subset_data |>
+  filter(ctDNA_ever == "TRUE") |>
+  summarize(
+    mean_eVAF = mean(eVAF, na.rm = TRUE),
+    median_eVAF = median(eVAF, na.rm = TRUE),
+    sd_eVAF = sd(eVAF, na.rm = TRUE),
+    min_eVAF = min(eVAF, na.rm = TRUE),
+    max_eVAF = max(eVAF, na.rm = TRUE)
+  )
+
+
# A tibble: 1 × 5
+  mean_eVAF median_eVAF  sd_eVAF min_eVAF max_eVAF
+      <dbl>       <dbl>    <dbl>    <dbl>    <dbl>
+1 0.0000893 0.000000413 0.000219 2.14e-18 0.000836
+
+
#### DTC counts 
+
+#counts for DTC positivity --> 39 
+subset_data |>
+  filter(dtc_ever == 1) |>
+  summarize(unique_participants = n_distinct(participant_id))
+
+
# A tibble: 1 × 1
+  unique_participants
+                <int>
+1                  39
+
+
#all sampled timepoints for each ever-DTC+ participant (not only the positive ones)
+subset_data |>
+  filter(dtc_ever == 1) |>
+  select(participant_id, timepoint)
+
+
# A tibble: 249 × 2
+   participant_id timepoint        
+   <chr>          <chr>            
+ 1 28115-16-015   SURMOUNT-Baseline
+ 2 28115-16-015   Year 1 Follow Up 
+ 3 28115-16-015   Year 2 Follow Up 
+ 4 28115-16-015   Year 3 Follow Up 
+ 5 28115-16-015   CLEVER-Baseline  
+ 6 28115-16-015   C6               
+ 7 28115-16-015   6M F/U           
+ 8 28115-16-015   12M F/U          
+ 9 28115-16-015   18M F/U          
+10 28115-16-015   24M F/U          
+# ℹ 239 more rows
+
+
# number of DTC-positive samples at SURMOUNT baseline
+
+subset_data |>
+  filter(dtc_ihc_result_final == 1, timepoint == "SURMOUNT-Baseline") |>
+  summarize(count_SURMOUNT_Baseline = n())
+
+
# A tibble: 1 × 1
+  count_SURMOUNT_Baseline
+                    <int>
+1                      26
+
+
### Timepoint Data (# timepoints per patient)
+
+# Timepoints per patient (median, range), overall
+timepoints_per_patient <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    total_timepoints = n_distinct(timepoint),  # Count distinct timepoints for each patient
+    .groups = "drop"
+  ) |>
+  summarise(
+    median_timepoints = median(total_timepoints, na.rm = TRUE),  # Calculate median
+    min_timepoints = min(total_timepoints, na.rm = TRUE),        # Calculate minimum
+    max_timepoints = max(total_timepoints, na.rm = TRUE)         # Calculate maximum
+  )
+timepoints_per_patient
+
+
# A tibble: 1 × 3
+  median_timepoints min_timepoints max_timepoints
+              <int>          <int>          <int>
+1                 2              1             12
+
+
#  Timepoints of ctDNA assessment (`ctDNA_detected`)
+ctDNA_timepoints <- subset_data |>
+  filter(!is.na(ctDNA_detected)) |>  # Filter out NA values for ctDNA_detected
+  group_by(participant_id) |>
+  summarise(
+    ctDNA_timepoints = n_distinct(timepoint),  # Count distinct timepoints of ctDNA assessment
+    .groups = "drop"
+  ) |>
+  summarise(
+    median_ctDNA_timepoints = median(ctDNA_timepoints, na.rm = TRUE),  # Calculate median
+    min_ctDNA_timepoints = min(ctDNA_timepoints, na.rm = TRUE),        # Calculate minimum
+    max_ctDNA_timepoints = max(ctDNA_timepoints, na.rm = TRUE)         # Calculate maximum
+  )
+ctDNA_timepoints
+
+
# A tibble: 1 × 3
+  median_ctDNA_timepoints min_ctDNA_timepoints max_ctDNA_timepoints
+                    <int>                <int>                <int>
+1                       2                    1                   12
+
+
#  Timepoints of DTC assessment (`dtc_ihc_result_final`)
+dtc_timepoints <- subset_data |>
+  filter(!is.na(dtc_ihc_result_final)) |>  # Filter out NA values for dtc_ihc_result_final
+  group_by(participant_id) |>
+  summarise(
+    dtc_timepoints = n_distinct(timepoint),  # Count distinct timepoints of DTC assessment
+    .groups = "drop"
+  ) |>
+  summarise(
+    median_dtc_timepoints = median(dtc_timepoints, na.rm = TRUE),  # Calculate median
+    min_dtc_timepoints = min(dtc_timepoints, na.rm = TRUE),        # Calculate minimum
+    max_dtc_timepoints = max(dtc_timepoints, na.rm = TRUE)         # Calculate maximum
+  )
+dtc_timepoints
+
+
# A tibble: 1 × 3
+  median_dtc_timepoints min_dtc_timepoints max_dtc_timepoints
+                  <int>              <int>              <int>
+1                     2                  1                  6
+
+
# Print all summaries
+print("Timepoints per patient:")
+
+
[1] "Timepoints per patient:"
+
+
print(timepoints_per_patient)
+
+
# A tibble: 1 × 3
+  median_timepoints min_timepoints max_timepoints
+              <int>          <int>          <int>
+1                 2              1             12
+
+
print("Timepoints of ctDNA assessment:")
+
+
[1] "Timepoints of ctDNA assessment:"
+
+
print(ctDNA_timepoints)
+
+
# A tibble: 1 × 3
+  median_ctDNA_timepoints min_ctDNA_timepoints max_ctDNA_timepoints
+                    <int>                <int>                <int>
+1                       2                    1                   12
+
+
print("Timepoints of DTC assessment:")
+
+
[1] "Timepoints of DTC assessment:"
+
+
print(dtc_timepoints)
+
+
# A tibble: 1 × 3
+  median_dtc_timepoints min_dtc_timepoints max_dtc_timepoints
+                  <int>              <int>              <int>
+1                     2                  1                  6
+
+
+

Timepoints of samples: To date, RaDaR has successfully tested 396 plasma samples from 109 pts (median 2 timepoints each, range 1-12). A further 8 samples failed, and these came from 4 unique individuals. Those 4 individuals were excluded from the ctDNA cohort because they never had a successful ctDNA assessment. The 109 individuals had a median of 2 DTC assessment timepoints (range 1-6).
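The ctDNA timepoint summary above filters with `!is.na(ctDNA_detected)`. However, that column is stored as text and contains empty strings (2 in subset_data, 175 in d), and `is.na()` does not catch empty strings. The base-R sketch below uses invented toy data and recodes blanks to NA before counting distinct assessed timepoints per participant. Whether the blank rows change the reported median of 2 (range 1-12) would need to be checked against the real data.

```r
# Hypothetical toy data: ctDNA_detected stored as text, with "" for
# timepoints that have no ctDNA result.
samples <- data.frame(
  participant_id = c("A", "A", "A", "B", "B"),
  timepoint = c("Baseline", "Year 1", "Year 2", "Baseline", "Year 1"),
  ctDNA_detected = c("FALSE", "TRUE", "", "FALSE", "")
)

# is.na("") is FALSE, so recode empty strings to NA before filtering.
samples$ctDNA_detected[samples$ctDNA_detected == ""] <- NA

assessed <- samples[!is.na(samples$ctDNA_detected), ]

# Distinct assessed timepoints per participant (A = 2, B = 1).
per_participant <- tapply(assessed$timepoint, assessed$participant_id,
                          function(x) length(unique(x)))

timepoint_summary <- c(median = median(per_participant),
                       min = min(per_participant),
                       max = max(per_participant))
timepoint_summary
```

Without the recoding step, participant A would be counted as having 3 assessed timepoints instead of 2.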

+

Overall, ctDNA was detected in 11 samples from 9/109 pts with a mean eVAF of 0.009% (range 0.002-0.084%). Two pts were ctDNA+ at baseline (BL), and 7 became positive on surveillance. 100/109 were ctDNA- across all timepoints. 39/109 pts were DTC+, either at BL (n=26) or after (n=13).
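The eVAF summary code above filters on ctDNA_ever. It therefore pools every sample from the 9 ever-positive participants, including their ctDNA-negative timepoints; its minimum of 2.14e-18 is effectively zero. Summarizing only the 11 detected samples would require filtering on ctDNA_detected instead. eVAF is a fraction, so multiplying by 100 gives the percentages quoted in the text; for example, the mean of 0.0000893 corresponds to about 0.009%. The base-R sketch below contrasts the two filters on hypothetical toy values.

```r
# Hypothetical samples from two ever-ctDNA+ participants; eVAF is a fraction.
samples <- data.frame(
  participant_id = c("A", "A", "B", "B"),
  ctDNA_ever = c(TRUE, TRUE, TRUE, TRUE),
  ctDNA_detected = c("TRUE", "FALSE", "TRUE", "FALSE"),
  eVAF = c(0.0002, 0, 0.0008, 0)
)

# Mean, minimum and maximum eVAF, expressed as percentages.
evaf_pct <- function(v) {
  100 * c(mean = mean(v, na.rm = TRUE),
          min = min(v, na.rm = TRUE),
          max = max(v, na.rm = TRUE))
}

# Pools every sample from ever-positive participants (as in the code above).
all_samples <- evaf_pct(samples$eVAF[samples$ctDNA_ever])

# Restricts to samples in which ctDNA was actually detected.
detected_only <- evaf_pct(samples$eVAF[samples$ctDNA_detected == "TRUE"])
```

In this toy example, pooling halves the mean (0.025% vs 0.05%) and pulls the minimum down to 0%.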

+
+
# Filter and get unique participants by participant_id
+concordance_overall_unique <- subset_data |> 
+  distinct(participant_id, .keep_all = TRUE) |> 
+  mutate(concordance = ifelse(dtc_ever == ctDNA_ever, "Concordant", "Discordant"))
+
+# Count total concordant and discordant pairs for unique participants
+overall_concordant <- sum(concordance_overall_unique$concordance == "Concordant")
+overall_discordant <- sum(concordance_overall_unique$concordance == "Discordant")
+
+# Proportion of concordance
+proportion_concordant <- overall_concordant / (overall_concordant + overall_discordant)
+
+cat("Overall Concordant (unique participants):", overall_concordant, "\n")
+
+
Overall Concordant (unique participants): 69 
+
+
cat("Overall Discordant (unique participants):", overall_discordant, "\n")
+
+
Overall Discordant (unique participants): 40 
+
+
cat("Overall Proportion Concordant (unique participants):", proportion_concordant, "\n")
+
+
Overall Proportion Concordant (unique participants): 0.6330275 
+
+
#Proportion concordance 63% (ever positive)
+unique <- subset_data |>
+  group_by(participant_id) |>
+  summarize(
+    dtc_ever = max(dtc_ever, na.rm = TRUE),    # Ensures 1 if DTC is ever detected
+    ctDNA_ever = max(ctDNA_ever, na.rm = TRUE) # Ensures 1 if ctDNA is ever detected
+  )
+
+# Create the 2x2 table
+table_ctDNA_dtc <- table(unique$ctDNA_ever, unique$dtc_ever)
+print(table_ctDNA_dtc)
+
+
   
+     0  1
+  0 65 35
+  1  5  4
+
+
+
+
#Concordance by timepoint 
+
+# Ensure that dtc_ihc_result_final is converted to TRUE/FALSE for consistency with ctDNA_detected
+concordance_by_timepoint <- subset_data |> 
+  filter(!is.na(dtc_ihc_result_final) & !is.na(ctDNA_detected)) |> 
+  mutate(
+    # Convert dtc_ihc_result_final (1 = TRUE, 0 = FALSE) to match ctDNA format (TRUE/FALSE)
+    dtc_ihc_result_final_bool = ifelse(dtc_ihc_result_final == 1, TRUE, FALSE),
+    # Create a column to determine concordance (both DTC and ctDNA being TRUE or both being FALSE)
+    concordance = ifelse(dtc_ihc_result_final_bool == ctDNA_detected, "Concordant", "Discordant")
+  ) |>
+  group_by(timepoint) |>
+  summarise(
+    total_concordant = sum(concordance == "Concordant"),
+    total_discordant = sum(concordance == "Discordant"),
+    total_samples = n(),  # Total number of samples at this timepoint
+    concordance_rate = total_concordant / total_samples  # Concordance rate per timepoint
+  )
+
+# Print concordance results for each timepoint
+print(concordance_by_timepoint)
+
+
# A tibble: 10 × 5
+   timepoint    total_concordant total_discordant total_samples concordance_rate
+   <chr>                   <int>            <int>         <int>            <dbl>
+ 1 6M F/U                     17                2            19            0.895
+ 2 C12                         4                0             4            1    
+ 3 C3                         17                2            19            0.895
+ 4 C6                         26                2            28            0.929
+ 5 EOO                         5                4             9            0.556
+ 6 SURMOUNT-Ba…               80               29           109            0.734
+ 7 Year 1 Foll…               31                9            40            0.775
+ 8 Year 2 Foll…               21                3            24            0.875
+ 9 Year 3 Foll…               11                3            14            0.786
+10 Year 4 Foll…                3                1             4            0.75 
+
+
# Now calculate overall concordance across all timepoints
+overall_concordance <- sum(concordance_by_timepoint$total_concordant) / 
+  sum(concordance_by_timepoint$total_samples)
+
+cat("Overall Concordance Rate across all timepoints:", overall_concordance, "\n")
+
+
Overall Concordance Rate across all timepoints: 0.7962963 
+
+
#concordance, considering testing by timepoint, is 80% 
+
+

Concordance of DTC and ctDNA testing: At the participant level (ever-positive status across all timepoints), concordance was 63% (69/109). Comparing paired DTC and ctDNA results sample by sample at each timepoint, concordance was higher, at 80% (215/270). Of 39 ever-DTC+ pts, 4 were ctDNA+ (3/4 recurred) and 35 remained ctDNA- (1/30 recurred).
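Percent agreement treats the 65 double-negative participants as agreement. Because most participants are negative on both tests, some agreement is expected by chance alone. Cohen's kappa corrects for this chance agreement. It is offered here as a supplementary statistic and was not part of the original analysis. The base-R sketch below uses the participant-level counts from table_ctDNA_dtc.

```r
# Participant-level counts from table_ctDNA_dtc
# (rows: ctDNA ever 0/1, columns: DTC ever 0/1); filled column by column.
tab <- matrix(c(65, 5, 35, 4), nrow = 2,
              dimnames = list(ctDNA_ever = c("0", "1"),
                              dtc_ever = c("0", "1")))

n <- sum(tab)

# Observed agreement: both negative or both positive.
p_obs <- sum(diag(tab)) / n

# Agreement expected by chance from the marginal positivity rates.
p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2

# Cohen's kappa: agreement beyond chance, scaled to its maximum.
cohen_kappa <- (p_obs - p_exp) / (1 - p_exp)

round(c(p_obs = p_obs, p_exp = p_exp, kappa = cohen_kappa), 3)
# p_obs 0.633, p_exp 0.619, kappa 0.038
```

A kappa this close to 0 would suggest that, at the participant level, the two assays agree little beyond what their positivity rates alone would produce.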

+

Test Characteristics

+

Next, we examine ctDNA and DTC test characteristics: first the association between ctDNA and DTC positivity, then the number of tests performed and failed.
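The chi-squared test in this section prints a warning that the approximation may be incorrect. `chisq.test()` issues that warning when any expected cell count is below 5, which happens here because only 9 participants were ever ctDNA-positive. Fisher's exact test is a common alternative for sparse 2x2 tables. The sketch below applies it to the counts in contingency_table; it is offered as a supplementary check, not as the analysis reported in this document.

```r
# Counts from contingency_table (rows: dtc 0/1, columns: ctDNA_ever FALSE/TRUE).
tab <- matrix(c(65, 35, 5, 4), nrow = 2,
              dimnames = list(dtc = c("0", "1"),
                              ctDNA_ever = c("FALSE", "TRUE")))

# Expected counts under independence; chisq.test() warns when any is below 5.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
min(expected)  # 39 * 9 / 109, about 3.2

# Fisher's exact test makes no large-sample approximation.
fisher_res <- fisher.test(tab)
fisher_res$p.value
fisher_res$estimate  # conditional maximum-likelihood odds ratio
```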

+
+
############ Test Characteristics and Baseline versus cumulative positivity (ctDNA to start)#######
+
+### DTC by ctDNA (ever positive), association between test positivity. 
+
+# link by participant id 
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    dtc = first(dtc_ever),  # Get the ever dtc for each participant
+    ctDNA_ever = first(ctDNA_ever),        # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of dtc vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$dtc, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results, p-val 0.839 
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    65    5
+  1    35    4
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.041269, df = 1, p-value = 0.839
+
+
##### Tests (#s and such of tests)
+
+#number of tests (ctDNA)
+library(dplyr)
+
+# Assuming the status variable is named `ctDNA_detected` in d, and then in subset 
+status_summary_d <- d |>
+  group_by(ctDNA_detected) |>
+  summarise(total_samples = n(), .groups = "drop")
+
+# Print the summary -- 385 FALSE, 8 Fail, 11 TRUE, and 175 blank
+print(status_summary_d)
+
+
# A tibble: 4 × 2
+  ctDNA_detected total_samples
+  <chr>                  <int>
+1 ""                       175
+2 "FALSE"                  385
+3 "Fail"                     8
+4 "TRUE"                    11
+
+
#looking at the number of Fails by unique participant_id
+fail_count <- d |>
+  filter(ctDNA_detected == "Fail") |>   # Filter for rows where status is "Fail"
+  distinct(participant_id) |>          # Get unique participant IDs
+  summarise(total_fails = n())          # Count unique participant IDs
+
+# Print the result -- 4 individuals with Fail results, matching the CONSORT diagram
+print(fail_count)
+
+
  total_fails
+1           4
+
+
fail_count <- subset_data |>
+  filter(ctDNA_detected == "Fail") |>   # Filter for rows where status is "Fail"
+  distinct(participant_id) |>          # Get unique participant IDs
+  summarise(total_fails = n())          # Count unique participant IDs
+
+# Print the result -- none of the fails were pulled into the ctDNA cohort  
+print(fail_count)
+
+
# A tibble: 1 × 1
+  total_fails
+        <int>
+1           0
+
+
#number of DTC tests in this cohort of 109 patients 
+
+unique(subset_data$dtc_ihc_result_final) #NA, 0, and 1 
+
+
[1]  0 NA  1
+
+
status_summary_subset <- subset_data |>
+  group_by(dtc_ihc_result_final) |>
+  summarise(total_samples = n(), .groups = "drop")
+
+# Print the summary -- 221 negatives, 49 positives, 128 NAs, across 39 patients (ever positive) and 70 patients (never positive)
+#### TODO: confirm with Nick that no DTC results are missing; the check below suggests these NAs are ctDNA-only timepoints
+print(status_summary_subset)
+
+
# A tibble: 3 × 2
+  dtc_ihc_result_final total_samples
+                 <int>         <int>
+1                    0           221
+2                    1            49
+3                   NA           128
+
+
### looking at NAs -- all of them have FALSE ctDNA results, so these appear to be the ctDNA-only timepoints
+na_participants_dtc <- subset_data |>
+  filter(is.na(dtc_ihc_result_final)) |>
+  select(participant_id, dtc_ihc_result_final, FINAL_RESULT, ORIG_RSLT_DTC, ctDNA_detected, timepoint)
+
+# Print the list of participant IDs with NA in `dtc_ihc_result_final`-- they all have FALSE ctDNA results, so these are the ctDNA timepoints
+#all of the timepoints are long-term except for CLEVER baseline. 
+print(na_participants_dtc, n=128)
+
+
# A tibble: 128 × 6
+    participant_id dtc_ihc_result_final FINAL_RESULT ORIG_RSLT_DTC
+    <chr>                         <int>        <int>         <int>
+  1 28115-16-001                     NA           NA            NA
+  2 28115-16-001                     NA           NA            NA
+  3 28115-16-015                     NA           NA            NA
+  4 28115-16-015                     NA           NA            NA
+  5 28115-16-015                     NA           NA            NA
+  6 28115-16-015                     NA           NA            NA
+  7 28115-16-015                     NA           NA            NA
+  8 28115-16-015                     NA           NA            NA
+  9 28115-16-021                     NA           NA            NA
+ 10 28115-16-021                     NA           NA            NA
+ 11 28115-16-021                     NA           NA            NA
+ 12 28115-16-021                     NA           NA            NA
+ 13 28115-16-026                     NA           NA            NA
+ 14 28115-16-026                     NA           NA            NA
+ 15 28115-16-026                     NA           NA            NA
+ 16 28115-16-026                     NA           NA            NA
+ 17 28115-16-026                     NA           NA            NA
+ 18 28115-16-026                     NA           NA            NA
+ 19 28115-16-033                     NA           NA            NA
+ 20 28115-17-001                     NA           NA            NA
+ 21 28115-17-001                     NA           NA            NA
+ 22 28115-17-001                     NA           NA            NA
+ 23 28115-17-001                     NA           NA            NA
+ 24 28115-17-001                     NA           NA            NA
+ 25 28115-17-002                     NA           NA            NA
+ 26 28115-17-002                     NA           NA            NA
+ 27 28115-17-002                     NA           NA            NA
+ 28 28115-17-002                     NA           NA            NA
+ 29 28115-17-002                     NA           NA            NA
+ 30 28115-17-008                     NA           NA            NA
+ 31 28115-17-008                     NA           NA            NA
+ 32 28115-17-008                     NA           NA            NA
+ 33 28115-17-008                     NA           NA            NA
+ 34 28115-17-008                     NA           NA            NA
+ 35 28115-17-008                     NA           NA            NA
+ 36 28115-17-010                     NA           NA            NA
+ 37 28115-17-011                     NA           NA            NA
+ 38 28115-17-011                     NA           NA            NA
+ 39 28115-17-011                     NA           NA            NA
+ 40 28115-17-011                     NA           NA            NA
+ 41 28115-17-012                     NA           NA            NA
+ 42 28115-17-012                     NA           NA            NA
+ 43 28115-17-012                     NA           NA            NA
+ 44 28115-17-012                     NA           NA            NA
+ 45 28115-17-012                     NA           NA            NA
+ 46 28115-17-012                     NA           NA            NA
+ 47 28115-17-016                     NA           NA            NA
+ 48 28115-17-017                     NA           NA            NA
+ 49 28115-17-017                     NA           NA            NA
+ 50 28115-17-019                     NA           NA            NA
+ 51 28115-17-019                     NA           NA            NA
+ 52 28115-17-019                     NA           NA            NA
+ 53 28115-17-019                     NA           NA            NA
+ 54 28115-17-019                     NA           NA            NA
+ 55 28115-17-024                     NA           NA            NA
+ 56 28115-17-027                     NA           NA            NA
+ 57 28115-17-027                     NA           NA            NA
+ 58 28115-17-027                     NA           NA            NA
+ 59 28115-17-027                     NA           NA            NA
+ 60 28115-17-031                     NA           NA            NA
+ 61 28115-17-032                     NA           NA            NA
+ 62 28115-17-032                     NA           NA            NA
+ 63 28115-17-032                     NA           NA            NA
+ 64 28115-17-032                     NA           NA            NA
+ 65 28115-17-032                     NA           NA            NA
+ 66 28115-17-032                     NA           NA            NA
+ 67 28115-17-036                     NA           NA            NA
+ 68 28115-17-036                     NA           NA            NA
+ 69 28115-17-046                     NA           NA            NA
+ 70 28115-17-046                     NA           NA            NA
+ 71 28115-17-046                     NA           NA            NA
+ 72 28115-17-046                     NA           NA            NA
+ 73 28115-17-046                     NA           NA            NA
+ 74 28115-17-046                     NA           NA            NA
+ 75 28115-17-050                     NA           NA            NA
+ 76 28115-17-050                     NA           NA            NA
+ 77 28115-17-051                     NA           NA            NA
+ 78 28115-17-051                     NA           NA            NA
+ 79 28115-17-051                     NA           NA            NA
+ 80 28115-17-051                     NA           NA            NA
+ 81 28115-17-051                     NA           NA            NA
+ 82 28115-17-052                     NA           NA            NA
+ 83 28115-18-001                     NA           NA            NA
+ 84 28115-18-001                     NA           NA            NA
+ 85 28115-18-001                     NA           NA            NA
+ 86 28115-18-001                     NA           NA            NA
+ 87 28115-18-004                     NA           NA            NA
+ 88 28115-18-015                     NA           NA            NA
+ 89 28115-18-020                     NA           NA            NA
+ 90 28115-18-020                     NA           NA            NA
+ 91 28115-18-020                     NA           NA            NA
+ 92 28115-18-020                     NA           NA            NA
+ 93 28115-18-021                     NA           NA            NA
+ 94 28115-18-021                     NA           NA            NA
+ 95 28115-18-021                     NA           NA            NA
+ 96 28115-18-021                     NA           NA            NA
+ 97 28115-18-021                     NA           NA            NA
+ 98 28115-18-021                     NA           NA            NA
+ 99 28115-18-022                     NA           NA            NA
+100 28115-18-022                     NA           NA            NA
+101 28115-18-022                     NA           NA            NA
+102 28115-18-022                     NA           NA            NA
+103 28115-18-022                     NA           NA            NA
+104 28115-18-022                     NA           NA            NA
+105 28115-18-023                     NA           NA            NA
+106 28115-18-023                     NA           NA            NA
+107 28115-18-029                     NA           NA            NA
+108 28115-18-031                     NA           NA            NA
+109 28115-18-032                     NA           NA            NA
+110 28115-18-032                     NA           NA            NA
+111 28115-18-032                     NA           NA            NA
+112 28115-18-032                     NA           NA            NA
+113 28115-19-002                     NA           NA            NA
+114 28115-19-005                     NA           NA            NA
+115 28115-19-005                     NA           NA            NA
+116 28115-19-006                     NA           NA            NA
+117 28115-19-006                     NA           NA            NA
+118 28115-19-006                     NA           NA            NA
+119 28115-19-009                     NA           NA            NA
+120 28115-19-025                     NA           NA            NA
+121 28115-19-028                     NA           NA            NA
+122 28115-20-007                     NA           NA            NA
+123 28115-21-006                     NA           NA            NA
+124 28115-21-016                     NA           NA            NA
+125 28115-21-016                     NA           NA            NA
+126 28115-21-016                     NA           NA            NA
+127 28115-21-016                     NA           NA            NA
+128 28115-21-025                     NA           NA            NA
+# ℹ 2 more variables: ctDNA_detected <chr>, timepoint <chr>
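The listing above repeats the same `ID` across rows because each row is a sample, not a participant, so it can help to count both before summarizing. This is a minimal base-R sketch on a small hypothetical data frame (`toy`, with made-up IDs `P1` to `P3`). On the real data the same calls would apply to `subset_data$ID`.

```r
# Hypothetical toy data standing in for subset_data: one row per sample,
# so a participant ID can appear at several timepoints.
toy <- data.frame(
  ID = c("P1", "P1", "P1", "P2", "P3", "P3"),
  timepoint = c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up",
                "SURMOUNT-Baseline", "SURMOUNT-Baseline", "Year 1 Follow Up")
)

n_samples <- nrow(toy)                    # rows = samples
n_participants <- length(unique(toy$ID))  # distinct participant IDs
samples_per_id <- table(toy$ID)           # samples contributed by each participant

cat("Samples:", n_samples, "\nParticipants:", n_participants, "\n")
print(samples_per_id)
```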
+
+
# Look at the distinct timepoint labels
+unique_timepoints <- unique(subset_data$timepoint)
+print(unique_timepoints)
+
+
 [1] "SURMOUNT-Baseline" "Year 1 Follow Up"  "Year 2 Follow Up" 
+ [4] "Long Term FU 1"    "Long Term FU 2"    "Year 3 Follow Up" 
+ [7] "CLEVER-Baseline"   "C6"                "6M F/U"           
+[10] "12M F/U"           "18M F/U"           "24M F/U"          
+[13] "30M F/U"           "36M F/U"           "C3"               
+[16] "EOO"               "C12"               "Year 4 Follow Up" 
+
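The timepoint labels mix several naming schemes. The sketch below uses a hypothetical helper, `timepoint_to_months`, which is not part of the original analysis. It maps the labels whose timing can be read from the text onto approximate months from the corresponding baseline. It assumes "Year N Follow Up" means 12*N months and "NM F/U" means N months. Labels whose timing is not stated here (C3, C6, C12, EOO, Long Term FU) are left as `NA` rather than guessed.

```r
# Hypothetical helper: approximate months from the relevant study baseline.
# Assumptions: "*Baseline" = 0, "Year N Follow Up" = 12*N, "NM F/U" = N.
# Any other label (C3, C6, C12, EOO, Long Term FU ...) is returned as NA.
timepoint_to_months <- function(tp) {
  months <- rep(NA_real_, length(tp))
  months[grepl("Baseline$", tp)] <- 0

  is_year <- grepl("^Year [0-9]+ Follow Up$", tp)
  months[is_year] <- 12 * as.numeric(sub("^Year ([0-9]+) Follow Up$", "\\1", tp[is_year]))

  is_month <- grepl("^[0-9]+M F/U$", tp)
  months[is_month] <- as.numeric(sub("^([0-9]+)M F/U$", "\\1", tp[is_month]))

  months
}

labels <- c("SURMOUNT-Baseline", "Year 2 Follow Up", "18M F/U", "C6", "Long Term FU 1")
print(timepoint_to_months(labels))
```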
+
##### eVAF 
+names(subset_data) # eVAF is the column used below
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                     "ctDNA_ever"                      
+[389] "dtc_ever"                        
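Scanning all 389 column names by eye is slow. This is a sketch that uses base R `grep()` to pull out the DTC- and ctDNA-related columns by name pattern instead. Here `toy` is a hypothetical one-row data frame that borrows a few of the column names listed above. Because of `ignore.case = TRUE`, the DTC pattern would also match columns such as `ORIG_RSLT_DTC` in the real data.

```r
# Hypothetical one-row data frame reusing some column names from subset_data.
toy <- data.frame(
  ID = "A", eVAF = 0.0001, ctDNA_detected = "TRUE",
  dtc_ihc_result_final = "Negative", dtc_ihc_summary_count_final = 0,
  dtc_ever = "No"
)

# Select column names by pattern (case-insensitive) rather than by position.
dtc_cols <- grep("dtc", names(toy), value = TRUE, ignore.case = TRUE)
ctdna_cols <- grep("ctdna|evaf", names(toy), value = TRUE, ignore.case = TRUE)

print(dtc_cols)
print(ctdna_cols)
```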
+
+
# Calculate the median and range (min and max) of `eVAF`, as percentages, across all samples with ctDNA detected (a participant with several positive samples contributes more than once)
+eVAF_range_ctDNA_detected_percent <- subset_data |>
+  filter(ctDNA_detected == "TRUE") |>   # ctDNA_detected is a character column, so compare to the string "TRUE"
+  summarise(
+    median_eVAF_percent = median(eVAF, na.rm = TRUE) * 100,  # Convert median to percentage
+    min_eVAF_percent = min(eVAF, na.rm = TRUE) * 100,        # Convert minimum to percentage
+    max_eVAF_percent = max(eVAF, na.rm = TRUE) * 100         # Convert maximum to percentage
+  )
+
+# Print the result
+print(eVAF_range_ctDNA_detected_percent)
+
+
# A tibble: 1 × 3
+  median_eVAF_percent min_eVAF_percent max_eVAF_percent
+                <dbl>            <dbl>            <dbl>
+1             0.00901          0.00165           0.0836
+
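Because `subset_data` holds one row per sample, the median above gives more weight to participants with several ctDNA-positive samples. One option, sketched here as an assumption rather than the study's definition, is to collapse each participant to a single value before summarizing. This sketch uses the highest eVAF among that participant's positive samples. It is written in base R on a hypothetical `toy` data frame, and the same steps map onto `group_by(ID)` and `summarise()` in dplyr.

```r
# Hypothetical toy data: ctDNA_detected stored as character, as in subset_data.
toy <- data.frame(
  ID = c("A", "A", "B", "C", "C"),
  ctDNA_detected = c("TRUE", "TRUE", "FALSE", "TRUE", "FALSE"),
  eVAF = c(0.0001, 0.0003, NA, 0.0002, 0.00005)
)

# Keep ctDNA-positive samples with a recorded eVAF.
pos <- toy[toy$ctDNA_detected == "TRUE" & !is.na(toy$eVAF), ]

# One value per participant (assumption: the maximum eVAF observed).
per_participant <- aggregate(eVAF ~ ID, data = pos, FUN = max)

# Median and range across participants, expressed as percentages.
summary_pct <- c(
  median = median(per_participant$eVAF) * 100,
  min    = min(per_participant$eVAF) * 100,
  max    = max(per_participant$eVAF) * 100
)

print(per_participant)
print(summary_pct)
```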
+
#### DTC counts 
+names(subset_data) # use dtc_ihc_summary_count_final for DTC counts
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                     "ctDNA_ever"                      
+[389] "dtc_ever"                        
+
+
# Calculate the median and range (min and max) of DTC counts for samples with DTCs detected (`dtc_ihc_result_final == 1`)
+dtc_count <- subset_data |>
+  filter(dtc_ihc_result_final == 1) |>   # Keep samples with DTCs detected
+  summarise(
+    median_dtc_count = median(dtc_ihc_summary_count_final, na.rm = TRUE), 
+    min_dtc_count = min(dtc_ihc_summary_count_final, na.rm = TRUE),        
+    max_dtc_count = max(dtc_ihc_summary_count_final, na.rm = TRUE)         
+  )
+
+# Print the result
+print(dtc_count)
+
+
# A tibble: 1 × 3
+  median_dtc_count min_dtc_count max_dtc_count
+             <int>         <int>         <int>
+1                2             1            10
+
+
#### Number of timepoints per participant
+
+# Timepoints per patient (median, range)
+timepoints_per_patient <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    total_timepoints = n_distinct(timepoint),  # Count distinct timepoints for each patient
+    .groups = "drop"
+  ) |>
+  summarise(
+    median_timepoints = median(total_timepoints, na.rm = TRUE),  # Calculate median
+    min_timepoints = min(total_timepoints, na.rm = TRUE),        # Calculate minimum
+    max_timepoints = max(total_timepoints, na.rm = TRUE)         # Calculate maximum
+  )
+
+#  Timepoints of ctDNA assessment (`ctDNA_detected`)
+ctDNA_timepoints <- subset_data |>
+  filter(!is.na(ctDNA_detected)) |>  # Filter out NA values for ctDNA_detected
+  group_by(participant_id) |>
+  summarise(
+    ctDNA_timepoints = n_distinct(timepoint),  # Count distinct timepoints of ctDNA assessment
+    .groups = "drop"
+  ) |>
+  summarise(
+    median_ctDNA_timepoints = median(ctDNA_timepoints, na.rm = TRUE),  # Calculate median
+    min_ctDNA_timepoints = min(ctDNA_timepoints, na.rm = TRUE),        # Calculate minimum
+    max_ctDNA_timepoints = max(ctDNA_timepoints, na.rm = TRUE)         # Calculate maximum
+  )
+
+# Timepoints of DTC assessment (`dtc_ihc_result_final`)
+dtc_timepoints <- subset_data |>
+  filter(!is.na(dtc_ihc_result_final)) |>  # Filter out NA values for dtc_ihc_result_final
+  group_by(participant_id) |>
+  summarise(
+    dtc_timepoints = n_distinct(timepoint),  # Count distinct timepoints of DTC assessment
+    .groups = "drop"
+  ) |>
+  summarise(
+    median_dtc_timepoints = median(dtc_timepoints, na.rm = TRUE),  # Calculate median
+    min_dtc_timepoints = min(dtc_timepoints, na.rm = TRUE),        # Calculate minimum
+    max_dtc_timepoints = max(dtc_timepoints, na.rm = TRUE)         # Calculate maximum
+  )
+
+# Print all summaries
+print("Timepoints per patient:")
+
+
[1] "Timepoints per patient:"
+
+
print(timepoints_per_patient)
+
+
# A tibble: 1 × 3
+  median_timepoints min_timepoints max_timepoints
+              <int>          <int>          <int>
+1                 2              1             12
+
+
print("Timepoints of ctDNA assessment:")
+
+
[1] "Timepoints of ctDNA assessment:"
+
+
print(ctDNA_timepoints)
+
+
# A tibble: 1 × 3
+  median_ctDNA_timepoints min_ctDNA_timepoints max_ctDNA_timepoints
+                    <int>                <int>                <int>
+1                       2                    1                   12
+
+
print("Timepoints of DTC assessment:")
+
+
[1] "Timepoints of DTC assessment:"
+
+
print(dtc_timepoints)
+
+
# A tibble: 1 × 3
+  median_dtc_timepoints min_dtc_timepoints max_dtc_timepoints
+                  <int>              <int>              <int>
+1                     2                  1                  6
+
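The three timepoint summaries above repeat the same filter, count, and summarise pattern for different assessment columns. Below is a minimal base-R sketch of a reusable helper. The function name `summarise_timepoints` and the toy data are hypothetical. The sketch assumes that the data have `participant_id` and `timepoint` columns, as `subset_data` does.

```r
# Hypothetical helper: how many distinct timepoints each participant contributed,
# summarised as median, min, and max across participants.
# If `assessment_col` is given, only rows with a non-missing result in that column count.
summarise_timepoints <- function(data, assessment_col = NULL) {
  if (!is.null(assessment_col)) {
    # Keep only rows where the assessment has a result
    data <- data[!is.na(data[[assessment_col]]), , drop = FALSE]
  }
  # Distinct timepoints per participant
  per_participant <- tapply(data$timepoint, data$participant_id,
                            function(tp) length(unique(tp)))
  c(median = median(per_participant),
    min    = min(per_participant),
    max    = max(per_participant))
}

# Toy data (hypothetical, not study data): three participants with varying follow-up
toy <- data.frame(
  participant_id = c("A", "A", "A", "B", "B", "C"),
  timepoint      = c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up",
                     "SURMOUNT-Baseline", "Year 1 Follow Up", "SURMOUNT-Baseline"),
  ctDNA_detected = c(FALSE, TRUE, NA, FALSE, FALSE, NA)
)

summarise_timepoints(toy)                    # all timepoints
summarise_timepoints(toy, "ctDNA_detected")  # timepoints with a ctDNA result only
```

Participants with no result for the chosen assessment drop out of that summary entirely, which matches the `filter(!is.na(...))` step in the dplyr version.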
+
### Timepoints on clinical trial ### Ask Nick: should we include all of the timepoints on trial
+# (CLEVER-Baseline, EOO, C3, C6, C12, 6M F/U, etc.), just the ones while on treatment (C3, C6, C12),
+# or only the ones while patients are
+unique_timepoints <- unique(subset_data$timepoint)
+print(unique_timepoints) 
+
+
 [1] "SURMOUNT-Baseline" "Year 1 Follow Up"  "Year 2 Follow Up" 
+ [4] "Long Term FU 1"    "Long Term FU 2"    "Year 3 Follow Up" 
+ [7] "CLEVER-Baseline"   "C6"                "6M F/U"           
+[10] "12M F/U"           "18M F/U"           "24M F/U"          
+[13] "30M F/U"           "36M F/U"           "C3"               
+[16] "EOO"               "C12"               "Year 4 Follow Up" 
+
+
trial_timepoints <- c("CLEVER-Baseline", "EOO", "C3", "C6", "C12", "6M F/U", "12M F/U", "18M F/U", "24M F/U", "30M F/U", "36M F/U")
+
+# Count the number of samples by timepoint (for specific clinical trial timepoints)
+samples_by_trial_timepoint <- subset_data |>
+  filter(timepoint %in% trial_timepoints) |>  # Filter for relevant timepoints
+  group_by(timepoint) |>                      # Group by timepoint
+  summarise(
+    total_samples = n_distinct(participant_id),  # Count distinct participant_ids (samples)
+    .groups = "drop"  # Remove grouping after summarizing
+  )
+
+# Print the result
+print(samples_by_trial_timepoint) # participants sampled at each trial timepoint (ctDNA and DTC)
+
+
# A tibble: 11 × 2
+   timepoint       total_samples
+   <chr>                   <int>
+ 1 12M F/U                    18
+ 2 18M F/U                    13
+ 3 24M F/U                    13
+ 4 30M F/U                    12
+ 5 36M F/U                    18
+ 6 6M F/U                     27
+ 7 C12                         4
+ 8 C3                         20
+ 9 C6                         28
+10 CLEVER-Baseline            32
+11 EOO                         9
+
+
#### ctDNA on trial 
+
+ctDNA_samples_by_timepoint <- subset_data |>
+  filter(timepoint %in% trial_timepoints, !is.na(ctDNA_detected)) |>  # Keep trial timepoints with a ctDNA result (detected or not)
+  group_by(timepoint) |>                      # Group by timepoint
+  summarise(
+    total_samples_ctDNA = n_distinct(participant_id),  # Count distinct participant_ids (ctDNA samples)
+    .groups = "drop"  # Remove grouping after summarizing
+  )
+
+# Print the result for ctDNA samples -- 32 CLEVER-Baseline, 9 EOO, 20 C3, 28 C6, 4 C12, 27 6M, 18 12M, 13 18M, 13 24M, 12 30M, 18 36M 
+print(ctDNA_samples_by_timepoint)
+
+
# A tibble: 11 × 2
+   timepoint       total_samples_ctDNA
+   <chr>                         <int>
+ 1 12M F/U                          18
+ 2 18M F/U                          13
+ 3 24M F/U                          13
+ 4 30M F/U                          12
+ 5 36M F/U                          18
+ 6 6M F/U                           27
+ 7 C12                               4
+ 8 C3                               20
+ 9 C6                               28
+10 CLEVER-Baseline                  32
+11 EOO                               9
+
+
##### DTC by trial timepoint 
+# Count the number of DTC samples by timepoint (for specific clinical trial timepoints)
+dtc_samples_by_timepoint <- subset_data |>
+  filter(timepoint %in% trial_timepoints, !is.na(dtc_ihc_result_final)) |>  # Filter for relevant timepoints and DTC results
+  group_by(timepoint) |>                      # Group by timepoint
+  summarise(
+    total_samples_dtc = n_distinct(participant_id),  # Count distinct participant_ids (DTC samples)
+    .groups = "drop"  # Remove grouping after summarizing
+  )
+
+# Print the result for DTC samples -- makes sense, no CLEVER baseline timepoints, 9 EOO, 19 C3, 28 C6, 4 C12, 19 6M F/U 
+print(dtc_samples_by_timepoint)
+
+
# A tibble: 5 × 2
+  timepoint total_samples_dtc
+  <chr>                 <int>
+1 6M F/U                   19
+2 C12                       4
+3 C3                       19
+4 C6                       28
+5 EOO                       9
+
+
#### Number of ctDNA samples by SURMOUNT timepoint
+print(unique_timepoints) 
+
+
 [1] "SURMOUNT-Baseline" "Year 1 Follow Up"  "Year 2 Follow Up" 
+ [4] "Long Term FU 1"    "Long Term FU 2"    "Year 3 Follow Up" 
+ [7] "CLEVER-Baseline"   "C6"                "6M F/U"           
+[10] "12M F/U"           "18M F/U"           "24M F/U"          
+[13] "30M F/U"           "36M F/U"           "C3"               
+[16] "EOO"               "C12"               "Year 4 Follow Up" 
+
+
surmount_timepoints <- c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up", "Year 3 Follow Up", "Year 4 Follow Up", "Long Term FU 1", "Long Term FU 2") 
+
+ctDNA_surmount <- subset_data |>
+  filter(timepoint %in% surmount_timepoints, !is.na(ctDNA_detected)) |>  # Keep SURMOUNT timepoints with a ctDNA result (detected or not)
+  group_by(timepoint) |>                      # Group by timepoint
+  summarise(
+    total_samples_ctDNA = n_distinct(participant_id),  # Count distinct participant_ids (ctDNA samples)
+    .groups = "drop"  # Remove grouping after summarizing
+  )
+
+# Print the result for ctDNA samples -- 109 Baseline, Y1 FU 40, Y2 FU 25, Y3 FU 14, Y4 FU 4, LT FU1 10, LT FU2 2
+print(ctDNA_surmount)
+
+
# A tibble: 7 × 2
+  timepoint         total_samples_ctDNA
+  <chr>                           <int>
+1 Long Term FU 1                     10
+2 Long Term FU 2                      2
+3 SURMOUNT-Baseline                 109
+4 Year 1 Follow Up                   40
+5 Year 2 Follow Up                   25
+6 Year 3 Follow Up                   14
+7 Year 4 Follow Up                    4
+
+
### Number of DTC samples by SURMOUNT timepoint
+# Count the number of DTC samples by timepoint 
+dtc_timepoint_surmount <- subset_data |>
+  filter(timepoint %in% surmount_timepoints, !is.na(dtc_ihc_result_final)) |>  # Filter for relevant timepoints and DTC results
+  group_by(timepoint) |>                      # Group by timepoint
+  summarise(
+    total_samples_dtc = n_distinct(participant_id),  # Count distinct participant_ids (DTC samples)
+    .groups = "drop"  # Remove grouping after summarizing
+  )
+
+# Print the result for DTC samples -- 109 Baseline, Y1 FU 40, Y2 FU 24, Y3 FU 14, Y4 FU 4
+print(dtc_timepoint_surmount)
+
+
# A tibble: 5 × 2
+  timepoint         total_samples_dtc
+  <chr>                         <int>
+1 SURMOUNT-Baseline               109
+2 Year 1 Follow Up                 40
+3 Year 2 Follow Up                 24
+4 Year 3 Follow Up                 14
+5 Year 4 Follow Up                  4
+
+
#### positivity by timepoint -- ctDNA 
+
+ctDNA_pos_rate_by_timepoint <- subset_data |>
+  filter(!is.na(ctDNA_detected)) |>  # Ensure we are considering only non-missing ctDNA_detected values
+  group_by(timepoint, participant_id) |>  # Group by timepoint and participant
+  summarise(
+    ctDNA_pos = max(ctDNA_detected == TRUE),  # If any value is TRUE, participant is ctDNA positive at that timepoint
+    .groups = "drop"
+  ) |>
+  group_by(timepoint) |>  # Group again by timepoint to calculate the positivity rate
+  summarise(
+    positivity_rate = mean(ctDNA_pos),  # Calculate the positivity rate for each timepoint
+    total_samples = n_distinct(participant_id),  # Count the number of distinct participants
+    .groups = "drop"
+  )
+
+# Print the result for ctDNA positivity rate by timepoint
+print(ctDNA_pos_rate_by_timepoint)
+
+
# A tibble: 18 × 3
+   timepoint         positivity_rate total_samples
+   <chr>                       <dbl>         <int>
+ 1 12M F/U                    0.0556            18
+ 2 18M F/U                    0                 13
+ 3 24M F/U                    0                 13
+ 4 30M F/U                    0                 12
+ 5 36M F/U                    0.0556            18
+ 6 6M F/U                     0.0370            27
+ 7 C12                        0                  4
+ 8 C3                         0                 20
+ 9 C6                         0                 28
+10 CLEVER-Baseline            0                 32
+11 EOO                        0                  9
+12 Long Term FU 1             0                 10
+13 Long Term FU 2             0                  2
+14 SURMOUNT-Baseline          0.0183           109
+15 Year 1 Follow Up           0.125             40
+16 Year 2 Follow Up           0.04              25
+17 Year 3 Follow Up           0                 14
+18 Year 4 Follow Up           0                  4
+
+
# Calculate a running (pooled, sample-level) ctDNA positivity rate across timepoints
+ctDNA_pos_rate_cumulative <- ctDNA_pos_rate_by_timepoint |>
+  arrange(timepoint) |>  # Note: sorts timepoint labels alphabetically, not chronologically
+  mutate(
+    cumulative_pos_rate = cumsum(positivity_rate * total_samples) / cumsum(total_samples)  # Positives so far / samples so far
+  )
+
+print(ctDNA_pos_rate_cumulative)
+
+
# A tibble: 18 × 4
+   timepoint         positivity_rate total_samples cumulative_pos_rate
+   <chr>                       <dbl>         <int>               <dbl>
+ 1 12M F/U                    0.0556            18              0.0556
+ 2 18M F/U                    0                 13              0.0323
+ 3 24M F/U                    0                 13              0.0227
+ 4 30M F/U                    0                 12              0.0179
+ 5 36M F/U                    0.0556            18              0.0270
+ 6 6M F/U                     0.0370            27              0.0297
+ 7 C12                        0                  4              0.0286
+ 8 C3                         0                 20              0.024 
+ 9 C6                         0                 28              0.0196
+10 CLEVER-Baseline            0                 32              0.0162
+11 EOO                        0                  9              0.0155
+12 Long Term FU 1             0                 10              0.0147
+13 Long Term FU 2             0                  2              0.0146
+14 SURMOUNT-Baseline          0.0183           109              0.0159
+15 Year 1 Follow Up           0.125             40              0.0282
+16 Year 2 Follow Up           0.04              25              0.0289
+17 Year 3 Follow Up           0                 14              0.0279
+18 Year 4 Follow Up           0                  4              0.0276
+
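Because `arrange(timepoint)` sorts character labels alphabetically, the running rate above accumulates timepoints in alphabetical rather than chronological order. Below is a minimal base-R sketch that orders by the protocol schedule instead. It uses the SURMOUNT ctDNA results printed above, with the number of positives recovered as positivity_rate × total_samples. The schedule order is an assumption based on the timepoint names. The Long Term FU timepoints are left out because their place in the schedule is not stated here.

```r
# SURMOUNT ctDNA results transcribed from the printed per-timepoint output
surmount_order <- c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up",
                    "Year 3 Follow Up", "Year 4 Follow Up")  # assumed chronological order
by_tp <- data.frame(
  timepoint     = c("Year 1 Follow Up", "SURMOUNT-Baseline", "Year 3 Follow Up",
                    "Year 2 Follow Up", "Year 4 Follow Up"),
  positives     = c(5, 2, 0, 1, 0),
  total_samples = c(40, 109, 14, 25, 4)
)

# A factor with explicit levels sorts by level order, not alphabetically
by_tp$timepoint <- factor(by_tp$timepoint, levels = surmount_order)
by_tp <- by_tp[order(by_tp$timepoint), ]

# Running pooled positivity: positive samples so far / samples so far
by_tp$cumulative_pos_rate <- cumsum(by_tp$positives) / cumsum(by_tp$total_samples)
by_tp
```

In the dplyr pipeline, the equivalent change would be `arrange(factor(timepoint, levels = surmount_order))`. This is still a pooled sample-level rate. A participant who gives samples at several timepoints is counted once per sample.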
+
#### Cumulative positivity ctDNA 
+
+library(dplyr)
+
+# Calculate ctDNA positivity rate by participant
+ctDNA_pos_rate <- subset_data |>
+  filter(!is.na(ctDNA_detected)) |>  # Exclude missing ctDNA results
+  group_by(participant_id) |>  # Group by participant
+  summarise(
+    ctDNA_pos = max(ctDNA_detected == TRUE),  # If any value is TRUE, participant is ctDNA positive
+    .groups = "drop"
+  )
+
+# Calculate cumulative positivity rate
+ctDNA_pos_rate_cumulative <- ctDNA_pos_rate |>
+  summarise(
+    total_pos = sum(ctDNA_pos),  # Total number of ctDNA positive participants
+    total_samples = n(),  # Total number of participants
+    cumulative_pos_rate = total_pos / total_samples  # Cumulative positivity rate
+  )
+
+# Print the cumulative positivity rate
+print(ctDNA_pos_rate_cumulative)
+
+
# A tibble: 1 × 3
+  total_pos total_samples cumulative_pos_rate
+      <int>         <int>               <dbl>
+1         9           109              0.0826
+
+
# Count the number of positive ctDNA samples and total samples
+ctDNA_pos_vs_total <- subset_data |>
+  filter(!is.na(ctDNA_detected)) |>  # Exclude missing ctDNA results
+  summarise(
+    total_samples = n(),  # Total number of ctDNA samples
+    positive_samples = sum(ctDNA_detected == TRUE),  # Count of positive ctDNA samples
+    .groups = "drop"
+  ) |>
+  mutate(
+    positivity_rate = positive_samples / total_samples  # Proportion of positive ctDNA samples
+  )
+
+# Print the results
+print(ctDNA_pos_vs_total)
+
+
# A tibble: 1 × 3
+  total_samples positive_samples positivity_rate
+          <int>            <int>           <dbl>
+1           398               11          0.0276
+
+
#### cumulative positivity DTC 
+
+# Calculate DTC positivity by participant
+DTC_pos_rate <- subset_data |>
+  filter(!is.na(dtc_ihc_result_final)) |>  # Exclude missing DTC results
+  group_by(participant_id) |>  # Group by participant
+  summarise(
+    dtc = max(dtc_ihc_result_final == 1),  # If any result is positive (1), participant is DTC positive
+    .groups = "drop"
+  )
+
+# Calculate cumulative positivity rate
+DTC_pos_rate_cumulative <- DTC_pos_rate |>
+  summarise(
+    total_pos = sum(dtc),  # Total number of DTC positive participants
+    total_samples = n(),  # Total number of participants
+    cumulative_pos_rate = total_pos / total_samples  # Cumulative positivity rate
+  )
+
+# Print the cumulative positivity rate
+print(DTC_pos_rate_cumulative)
+
+
# A tibble: 1 × 3
+  total_pos total_samples cumulative_pos_rate
+      <int>         <int>               <dbl>
+1        39           109               0.358
+
+
# Count the number of positive DTC samples and total samples
+dtc_pos_vs_total <- subset_data |>
+  filter(!is.na(dtc_ihc_result_final)) |>  # Exclude missing DTC results
+  summarise(
+    total_samples = n(),  # Total number of DTC samples
+    positive_samples = sum(dtc_ihc_result_final == 1),  # Count of positive DTC samples
+    .groups = "drop"
+  ) |>
+  mutate(
+    positivity_rate = positive_samples / total_samples  # Proportion of positive DTC samples
+  )
+
+# Print the results
+print(dtc_pos_vs_total)
+
+
# A tibble: 1 × 3
+  total_samples positive_samples positivity_rate
+          <int>            <int>           <dbl>
+1           270               49           0.181
+
+
+

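We see the distribution of test samples by timepoint. Most samples for both assays were collected at SURMOUNT-Baseline (109 participants), although the per-timepoint ctDNA positivity rate was highest at Year 1 Follow Up (12.5% of 40 participants, vs. 1.8% at SURMOUNT-Baseline). Repeat testing identified additional positive participants for both assays: the proportion of participants who were ever positive (ctDNA 9 of 109, 8.3%; DTC 39 of 109, 35.8%) is higher than the proportion of positive samples (ctDNA 11 of 398, 2.8%; DTC 49 of 270, 18.1%).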

+
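These positivity rates are based on small numbers of positive results, so they carry considerable uncertainty. Below is a minimal sketch that adds exact (Clopper-Pearson) 95% confidence intervals with base R's `binom.test()`. It uses counts transcribed from the printed output in this section. The `counts` data frame is illustrative and is not an object in the analysis.

```r
# Counts transcribed from the printed output in this section
counts <- data.frame(
  measure  = c("ctDNA, participants ever positive", "ctDNA, positive samples",
               "DTC, participants ever positive", "DTC, positive samples"),
  positive = c(9, 11, 39, 49),
  total    = c(109, 398, 109, 270)
)

# Exact (Clopper-Pearson) 95% confidence interval for each proportion
ci <- t(mapply(function(x, n) binom.test(x, n)$conf.int,
               counts$positive, counts$total))

counts$rate  <- counts$positive / counts$total
counts$lower <- ci[, 1]
counts$upper <- ci[, 2]
print(counts, digits = 3)
```

Sample-level rates treat repeat samples from the same participant as independent. The participant-level rows are the safer basis for interval estimates.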

Test Characteristics of ctDNA Assay: next, we look at the sensitivity, specificity, and positive and negative predictive values of the ctDNA assay for relapse.

+
+
###### Test characteristics: ctDNA
+# Participant-level 2x2 table of ctDNA ever detected vs. ever relapsed
+
+library(dplyr)
+library(knitr)
+
+#create ever_relapsed variable 
+subset_data <- subset_data |>
+  mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No"))
+
+
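+# Keep rows where `ctDNA_ever` or `ever_relapsed` is non-missing, then collapse to one row per participant;
+# participants with no recorded relapse status get NA (see the warning below) and are dropped by table()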
+summarized_data <- subset_data |>
+  filter(!is.na(ctDNA_ever) | !is.na(ever_relapsed)) |> # Keep rows with at least one non-NA value
+  group_by(participant_id) |>
+  summarize(
+    ctDNA_ever = max(ctDNA_ever, na.rm = TRUE),       
+    ever_relapsed = max(ever_relapsed, na.rm = TRUE)  
+  )
+
+
Warning: There were 2 warnings in `summarize()`.
+The first warning was:
+ℹ In argument: `ever_relapsed = max(ever_relapsed, na.rm = TRUE)`.
+ℹ In group 27: `participant_id = "28115-17-021"`.
+Caused by warning in `max()`:
+! no non-missing arguments, returning NA
+ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
+
+
# Create the confusion matrix
+confusion_matrix <- table(summarized_data$ctDNA_ever, summarized_data$ever_relapsed)
+
+# Extract counts from the confusion matrix
+TP <- ifelse(!is.na(confusion_matrix[2, 2]), confusion_matrix[2, 2], 0)  # True Positives
+FP <- ifelse(!is.na(confusion_matrix[2, 1]), confusion_matrix[2, 1], 0)  # False Positives
+TN <- ifelse(!is.na(confusion_matrix[1, 1]), confusion_matrix[1, 1], 0)  # True Negatives
+FN <- ifelse(!is.na(confusion_matrix[1, 2]), confusion_matrix[1, 2], 0)  # False Negatives
+
+# Calculate performance metrics
+sensitivity <- TP / (TP + FN)  # Sensitivity
+specificity <- TN / (TN + FP)  # Specificity
+PPV <- TP / (TP + FP)          # Positive Predictive Value
+NPV <- TN / (TN + FN)          # Negative Predictive Value
+
+# Create a data frame for the table
+performance_table <- data.frame(
+  Metric = c("Sensitivity", "Specificity", "Positive Predictive Value (PPV)", "Negative Predictive Value (NPV)"),
+  Value = c(sensitivity, specificity, PPV, NPV)
+)
+
+# Print the table
+print(performance_table)
+
+
                           Metric     Value
+1                     Sensitivity 0.5714286
+2                     Specificity 0.9892473
+3 Positive Predictive Value (PPV) 0.8888889
+4 Negative Predictive Value (NPV) 0.9387755
+
+
#Format the table for better readability
+kable(performance_table, digits = 2, col.names = c("Metric", "Value"))
+
| Metric | Value |
|---|---|
| Sensitivity | 0.57 |
| Specificity | 0.99 |
| Positive Predictive Value (PPV) | 0.89 |
| Negative Predictive Value (NPV) | 0.94 |
+
+
+

This ctDNA assay has high specificity (99%), a high positive predictive value for relapse (89%), and a high negative predictive value (94%), but only moderate sensitivity (57%): 6 of the 14 participants who relapsed never had detectable ctDNA.

+
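Only 14 participants relapsed, so these point estimates are imprecise. Below is a minimal base-R sketch that reports each metric with an exact 95% confidence interval from `binom.test()`. The helper name `diag_metrics` is hypothetical. The counts are the TP, FP, TN, and FN values from the participant-level ctDNA table in this section (8, 1, 92, and 6).

```r
# Hypothetical helper: sensitivity, specificity, PPV, NPV from 2x2 counts,
# each with an exact (Clopper-Pearson) 95% confidence interval
diag_metrics <- function(tp, fp, tn, fn) {
  est <- function(x, n) {
    ci <- binom.test(x, n)$conf.int
    c(estimate = x / n, lower = ci[1], upper = ci[2])
  }
  rbind(
    Sensitivity = est(tp, tp + fn),
    Specificity = est(tn, tn + fp),
    PPV         = est(tp, tp + fp),
    NPV         = est(tn, tn + fn)
  )
}

# ctDNA ever detected vs. ever relapsed
ctdna <- diag_metrics(tp = 8, fp = 1, tn = 92, fn = 6)
round(ctdna, 2)
```

The same helper applies to the DTC table in this section, e.g. `diag_metrics(tp = 4, fp = 34, tn = 59, fn = 10)`. The sensitivity interval for ctDNA is wide, which reflects the small number of relapses.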
+
### Test characteristics for DTC -- and trial #s 
+
+library(dplyr)
+
+# Total unique DTC+ patients
+total_dtc_plus <- subset_data |>
+  filter(dtc_ihc_result_final == 1) |>
+  distinct(participant_id) |>
+  nrow()
+
+# Unique DTC+ patients who went on trial (those who have a trial ID fu_trial_pid)
+dtc_plus_trial <- subset_data |>
+  filter(dtc_ihc_result_final == 1 & !is.na(fu_trial_pid)) |>
+  distinct(participant_id) |>
+  nrow()
+
+# Proportion of DTC+ patients who went on trial
+proportion_trial <- dtc_plus_trial / total_dtc_plus
+
+# Display results
+cat("Total unique DTC+ patients:", total_dtc_plus, "\n")
+
+
Total unique DTC+ patients: 39 
+
+
cat("Unique DTC+ patients who went on trial:", dtc_plus_trial, "\n")
+
+
Unique DTC+ patients who went on trial: 39 
+
+
cat("Proportion of DTC+ patients who went on trial:", proportion_trial, "\n")
+
+
Proportion of DTC+ patients who went on trial: 1 
+
+
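# All DTC+ patients went on trial (39/39)
+
+
+# Keep rows where `dtc_ever` or `ever_relapsed` is non-missing, then collapse to one row per participant;
+# participants with no recorded relapse status get NA (see the warning below) and are dropped by table()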
+summarized_data <- subset_data |>
+  filter(!is.na(dtc_ever) | !is.na(ever_relapsed)) |> # Keep rows with at least one non-NA value
+  group_by(participant_id) |>
+  summarize(
+    dtc_ever = max(dtc_ever, na.rm = TRUE),       
+    ever_relapsed = max(ever_relapsed, na.rm = TRUE)  
+  )
+
+
Warning: There were 2 warnings in `summarize()`.
+The first warning was:
+ℹ In argument: `ever_relapsed = max(ever_relapsed, na.rm = TRUE)`.
+ℹ In group 27: `participant_id = "28115-17-021"`.
+Caused by warning in `max()`:
+! no non-missing arguments, returning NA
+ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
+
+
# Create the confusion matrix
+confusion_matrix <- table(summarized_data$dtc_ever, summarized_data$ever_relapsed)
+
+# Extract counts from the confusion matrix
+TP <- ifelse(!is.na(confusion_matrix[2, 2]), confusion_matrix[2, 2], 0)  # True Positives
+FP <- ifelse(!is.na(confusion_matrix[2, 1]), confusion_matrix[2, 1], 0)  # False Positives
+TN <- ifelse(!is.na(confusion_matrix[1, 1]), confusion_matrix[1, 1], 0)  # True Negatives
+FN <- ifelse(!is.na(confusion_matrix[1, 2]), confusion_matrix[1, 2], 0)  # False Negatives
+
+# Calculate performance metrics
+sensitivity <- TP / (TP + FN)  # Sensitivity
+specificity <- TN / (TN + FP)  # Specificity
+PPV <- TP / (TP + FP)          # Positive Predictive Value
+NPV <- TN / (TN + FN)          # Negative Predictive Value
+
+# Create a data frame for the table
+performance_table <- data.frame(
+  Metric = c("Sensitivity", "Specificity", "Positive Predictive Value (PPV)", "Negative Predictive Value (NPV)"),
+  Value = c(sensitivity, specificity, PPV, NPV)
+)
+
+# Print the table
+print(performance_table)
+
+
                           Metric     Value
+1                     Sensitivity 0.2857143
+2                     Specificity 0.6344086
+3 Positive Predictive Value (PPV) 0.1052632
+4 Negative Predictive Value (NPV) 0.8550725
+
+
#Format the table for better readability
+library(knitr)
+kable(performance_table, digits = 2, col.names = c("Metric", "Value"))
+
| Metric | Value |
|---|---|
| Sensitivity | 0.29 |
| Specificity | 0.63 |
| Positive Predictive Value (PPV) | 0.11 |
| Negative Predictive Value (NPV) | 0.86 |
+
+
+

All 39 individuals who were DTC positive went on to an interventional treatment trial aimed at eliminating DTCs. This differs from the ctDNA workflow: ctDNA was assessed retrospectively, sometimes several years after sample collection, and was not used for any trial or treatment decision. This makes the sensitivity and specificity of the DTC test hard to interpret. Relapse is the outcome, and every DTC-positive patient received an intervention intended to eliminate DTCs and thereby prevent relapse. The intervention after DTC detection may partly explain the test's low positive predictive value (0.11) and low sensitivity (0.29). The negative predictive value of 0.86 reflects outcomes only among participants who remained DTC negative on all testing (i.e., those who did NOT receive an intervention). It suggests that remaining DTC negative on repeat testing is associated with a good outcome (i.e., no relapse during follow-up). However, this NPV is close to the cohort's overall proportion without relapse (93 of 107, about 0.87). This cohort alone therefore cannot show that repeat negative DTC testing adds predictive value beyond that baseline.

+
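One way to put the DTC negative predictive value in context is to compare it with the proportion of all participants who did not relapse. If a test were unrelated to the outcome, its NPV would equal that baseline. Below is a minimal base-R sketch using counts from the DTC-ever vs. ever-relapsed table in this section. It adds an exact 95% confidence interval for the NPV from `binom.test()`.

```r
# Counts from the DTC-ever vs. ever-relapsed contingency table
dtc_neg_no_relapse <- 59   # never DTC positive, no relapse
dtc_neg_relapse    <- 10   # never DTC positive, relapsed
total_no_relapse   <- 93   # all participants without relapse
total_participants <- 107  # participants with known DTC and relapse status

npv      <- dtc_neg_no_relapse / (dtc_neg_no_relapse + dtc_neg_relapse)
baseline <- total_no_relapse / total_participants  # NPV expected from an uninformative test

# Exact 95% confidence interval for the NPV
npv_ci <- binom.test(dtc_neg_no_relapse, dtc_neg_no_relapse + dtc_neg_relapse)$conf.int

round(c(NPV = npv, baseline_no_relapse = baseline,
        NPV_lower = npv_ci[1], NPV_upper = npv_ci[2]), 3)
```

If the baseline falls inside the NPV interval, the data are consistent with DTC negativity adding little beyond the overall no-relapse rate. The intervention in DTC-positive patients still complicates this comparison.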

Associations with Relapse

+
+
## ctDNA association with relapse ## 
+# link by participant id 
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    ctDNA = first(ctDNA_ever),        # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results. ctDNA has a strong association with relapse (p<0.0001). 
+print(contingency_table)
+
+
     
+      FALSE TRUE
+  No     92    1
+  Yes     6    8
+
+
print(chisq_test)  
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 42.642, df = 1, p-value = 6.573e-11
+
+
#DTC association with relapse## 
+
+# link by participant id 
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    dtc = first(dtc_ever),        # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs dtc_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results. No significant association with relapse (p = 0.777)
+print(contingency_table)
+
+
     
+       0  1
+  No  59 34
+  Yes 10  4
+
+
print(chisq_test)  
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.079932, df = 1, p-value = 0.7774
+
+
+

Looking at how our two biomarkers are associated with relapse using univariable tests of association, we can see that ctDNA positivity is strongly associated with relapse (p < 0.001), but DTC positivity is not (p = 0.78). Both chi-squared tests warned that the approximation may be incorrect, because some expected cell counts are small. It is also important to keep in mind that DTC positivity was the basis for enrollment onto interventional clinical trials aimed at eliminating DTCs and preventing relapse, and all DTC-positive individuals in this cohort enrolled on such trials. This likely confounds our ability to measure the association of DTC positivity with relapse. ctDNA assessment, meanwhile, was performed retrospectively and not used for clinical decision-making.

+
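Both `chisq.test()` calls warned that the chi-squared approximation may be incorrect. This warning appears when expected cell counts are small; here, only 9 participants were ever ctDNA positive and 14 relapsed. Fisher's exact test is a common alternative for sparse 2x2 tables. Below is a minimal sketch that rebuilds the two printed contingency tables as matrices. It is a suggested check, not part of the original analysis.

```r
# Contingency tables transcribed from the printed output:
# rows = ever relapsed (No, Yes), columns = biomarker ever positive (Neg, Pos)
ctdna_tab <- matrix(c(92, 6, 1, 8), nrow = 2,
                    dimnames = list(relapsed = c("No", "Yes"),
                                    ctDNA = c("Neg", "Pos")))
dtc_tab <- matrix(c(59, 10, 34, 4), nrow = 2,
                  dimnames = list(relapsed = c("No", "Yes"),
                                  DTC = c("Neg", "Pos")))

# Fisher's exact test computes the p-value from the hypergeometric
# distribution instead of a large-sample chi-squared approximation
ctdna_fisher <- fisher.test(ctdna_tab)
dtc_fisher   <- fisher.test(dtc_tab)

c(ctDNA_p = ctdna_fisher$p.value, DTC_p = dtc_fisher$p.value)
```

The `estimate` element of each result is a conditional maximum-likelihood odds ratio. It can differ slightly from the simple cross-product odds ratio.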

Demographics and Clinical Factor Assessment: Univariable associations by ctDNA status

+

Next we will start to build our Table 1, looking at important clinical and demographic variables in this ctDNA cohort. To start, we test each variable for a univariable association with ctDNA status, using chi-squared tests of association for categorical variables and t-tests for continuous variables.

+
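The per-variable testing described above can be sketched as a small helper that uses a t-test for numeric columns and a chi-squared test for categorical ones. This is a minimal base-R illustration on toy data. The helper `univariable_test`, the column `age_at_dx`, and all values are hypothetical. Only the name `final_tumor_grade` comes from `subset_data`, and its real coding may differ. Note that `t.test()` defaults to Welch's unequal-variance test.

```r
# Toy participant-level data (hypothetical values, not study data)
toy <- data.frame(
  ctDNA_ever        = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  age_at_dx         = c(45, 52, 61, 38, 49, 57, 63, 70),  # hypothetical column
  final_tumor_grade = c("1", "2", "3", "2", "1", "3", "2", "1")
)

# Hypothetical helper: test one variable for association with ctDNA status
univariable_test <- function(data, var, group = "ctDNA_ever") {
  x <- data[[var]]
  g <- data[[group]]
  if (is.numeric(x)) {
    # Continuous variable: two-sample t-test (Welch by default in t.test())
    test <- "t-test"
    p <- t.test(x ~ g)$p.value
  } else {
    # Categorical variable: chi-squared test of independence on the x-by-g table;
    # small-expected-count warnings are suppressed in this toy example
    test <- "chi-squared"
    p <- suppressWarnings(chisq.test(table(x, g))$p.value)
  }
  data.frame(variable = var, test = test, p_value = p)
}

results <- do.call(rbind, lapply(c("age_at_dx", "final_tumor_grade"),
                                 function(v) univariable_test(toy, v)))
results
```

For the real Table 1, the same helper could be applied to the chosen columns of a one-row-per-participant version of `subset_data`. Sparse categorical tables would again call for care with the chi-squared approximation.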
+
library(dplyr)
+
+########### Variables to look at for Table 1 #########
+names(subset_data) #to identify the variables I want to use 
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                     "ctDNA_ever"                      
+[389] "dtc_ever"                         "ever_relapsed"                   
+
+
###### median age at diagnosis -- this first requires some variable manipulation, because the date variables are stored as character strings, not dates
+str(subset_data$diag_date_1) #character -- needs to be converted to Date
+
+
 chr [1:398] "08/15/2013" "08/15/2013" "08/15/2013" "08/15/2013" ...
+
+
str(subset_data$demo_dob) #character -- needs to be converted to Date
+
+
 chr [1:398] "09/21/1957" "09/21/1957" "09/21/1957" "09/21/1957" ...
+
+
d$diag_date_1 <- as.Date(d$diag_date_1, format = "%m/%d/%Y")  
+d$demo_dob <- as.Date(d$demo_dob, format = "%m/%d/%Y")  
+
+str(d$diag_date_1) #dates! 
+
+
 Date[1:579], format: "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" ...
+
+
str(d$demo_dob) #dates! 
+
+
 Date[1:579], format: "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" ...
+
+
### repeating the conversion for subset_data, since converting the columns in d does not carry over to that data set
+subset_data$diag_date_1 <- as.Date(subset_data$diag_date_1, format = "%m/%d/%Y")  
+subset_data$demo_dob <- as.Date(subset_data$demo_dob, format = "%m/%d/%Y")  
+
+# age at diagnosis in years: days between date of birth and diagnosis date, divided by 365.25
+subset_data$age_at_diag <- as.numeric(difftime(subset_data$diag_date_1, subset_data$demo_dob, units = "days")) / 365.25
+head(subset_data$age_at_diag)
+
+
[1] 55.89870 55.89870 55.89870 55.89870 55.89870 49.25667
+
+
summary(subset_data$age_at_diag) #median 48.75 (over all 398 sample rows, not one row per participant)
+
+
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
+  27.34   41.73   48.75   49.35   57.63   68.94 
+
+
age_summary <- subset_data |> 
+  group_by(ctDNA_ever) |> 
+  summarise(
+    mean_age = mean(age_at_diag, na.rm = TRUE),  # Calculate mean age
+    median_age = median(age_at_diag, na.rm = TRUE),  # Calculate median age
+    sd_age = sd(age_at_diag, na.rm = TRUE),  # Calculate standard deviation of age
+    n = n()  # Number of rows (samples) per group; participants with repeated samples are counted more than once
+  )
+
+print(age_summary)
+
+
# A tibble: 2 × 5
+  ctDNA_ever mean_age median_age sd_age     n
+  <lgl>         <dbl>      <dbl>  <dbl> <int>
+1 FALSE          49.1       48.5   9.77   372
+2 TRUE           53.3       50.4   7.64    26
+
+
# Wilcoxon rank-sum test comparing age at diagnosis between ctDNA_ever positive and negative groups
+# (note: this uses sample-level rows, so participants with repeated samples are counted more than once)
+wilcox_test_result <- wilcox.test(age_at_diag ~ ctDNA_ever, data = subset_data)
+
+# Print the result
+print(wilcox_test_result)
+
+

+    Wilcoxon rank sum test with continuity correction
+
+data:  age_at_diag by ctDNA_ever
+W = 3499, p-value = 0.01842
+alternative hypothesis: true location shift is not equal to 0
+
+
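The age summaries and the Wilcoxon test above run on sample-level rows: 372 and 26 rows drawn from 109 participants. Participants with several blood draws therefore carry more weight, and their repeated ages are treated as independent observations. The base-R sketch below shows how sample-level and participant-level summaries can diverge. It uses made-up toy values, not study data. Applying the same steps to `subset_data` is an assumption that has not been run here.

```r
# Toy data for illustration only (made-up values, not SURMOUNT data):
# participant A contributes three samples, everyone else one
toy <- data.frame(
  participant_id = c("A", "A", "A", "B", "C", "D"),
  ctDNA_ever     = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  age_at_diag    = c(60, 60, 60, 40, 45, 50)
)

# Sample-level median among ctDNA-positive rows: A's age is counted three times
row_median_pos <- median(toy$age_at_diag[toy$ctDNA_ever])

# Participant-level: keep the first row per participant, then summarize
per_participant <- toy[!duplicated(toy$participant_id), ]
participant_median <- tapply(per_participant$age_at_diag,
                             per_participant$ctDNA_ever, median)
n_participants <- table(per_participant$ctDNA_ever)

row_median_pos      # 60
participant_median  # FALSE 47.5, TRUE 50
n_participants      # 2 participants per group

# Wilcoxon rank-sum test on one row per participant
wilcox.test(age_at_diag ~ ctDNA_ever, data = per_participant, exact = FALSE)
```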
#looking at range of age for the ctDNA pos vs neg groups 
+age_summary <- subset_data |> 
+  group_by(ctDNA_ever) |> 
+  summarise(
+    min_age = min(age_at_diag, na.rm = TRUE),  # Minimum age
+    max_age = max(age_at_diag, na.rm = TRUE),  # Maximum age
+    .groups = "drop"
+  )
+
+# View the summary table for age 
+print(age_summary)
+
+
# A tibble: 2 × 3
+  ctDNA_ever min_age max_age
+  <lgl>        <dbl>   <dbl>
+1 FALSE         27.3    68.9
+2 TRUE          38.6    64.4
+
+
+
+
##### Race: demo_race_final
+
+# Get the count of unique participant_ids for each category in demo_race_final
+race_counts_unique_percent <- subset_data |>
+  group_by(demo_race_final) |>
+  summarise(unique_participants = n_distinct(participant_id)) |>
+  mutate(percent = unique_participants / sum(unique_participants) * 100)
+
+# View the result
+print(race_counts_unique_percent)
+
+
# A tibble: 3 × 3
+  demo_race_final unique_participants percent
+            <int>               <int>   <dbl>
+1               1                   9   8.26 
+2               3                   1   0.917
+3               5                  99  90.8  
+
+
# Count distinct participant_ids by ctDNA_ever and demo_race_final
+count_distinct_participants <- subset_data |>
+  group_by(demo_race_final, ctDNA_ever) |>
+  summarise(distinct_participant_count = n_distinct(participant_id), .groups = "drop")
+
+# Print the result
+count_distinct_participants
+
+
# A tibble: 5 × 3
+  demo_race_final ctDNA_ever distinct_participant_count
+            <int> <lgl>                           <int>
+1               1 FALSE                               8
+2               1 TRUE                                1
+3               3 FALSE                               1
+4               5 FALSE                              91
+5               5 TRUE                                8
+
+
# Step 1: Summarize by unique participant_id
+summarized_data <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    ctDNA_ever = first(ctDNA_ever),   # Taking the first observed value of ctDNA_ever for each participant
+    demo_race_final = first(demo_race_final),  # Taking the first observed value of demo_race_final for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table <- table(summarized_data$ctDNA_ever, summarized_data$demo_race_final)
+contingency_table
+
+
       
+         1  3  5
+  FALSE  8  1 91
+  TRUE   1  0  8
+
+
# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 4: Print the result (p-value = 0.91)
+chisq_test
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 0.19084, df = 2, p-value = 0.909
+
+
#####receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') 
+
+# Breakdown of final_receptor_group by unique participant_id
+receptor_status_by_participant <- subset_data |>
+  group_by(participant_id) |>
+  summarise(final_receptor_group = first(final_receptor_group),  # Or choose the most frequent group if needed
+            .groups = "drop")
+
+# View the result
+table(receptor_status_by_participant$final_receptor_group)
+
+

+ 1  2  3  4 
+45 52  8  4 
+
+
# Step 1: Summarize to one row per participant_id, keeping final_receptor_group and ctDNA_ever
+receptor_ctDNA_status <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    final_receptor_group = first(final_receptor_group),  # Or the most frequent if needed
+    ctDNA_ever = first(ctDNA_ever),  # Taking the first observed value for ctDNA_ever
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table_receptor <- table(receptor_ctDNA_status$final_receptor_group, receptor_ctDNA_status$ctDNA_ever)
+contingency_table_receptor
+
+
   
+    FALSE TRUE
+  1    44    1
+  2    45    7
+  3     8    0
+  4     3    1
+
+
# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table_receptor)
+
+
Warning in chisq.test(contingency_table_receptor): Chi-squared approximation
+may be incorrect
+
+
# Step 4: Print the result # p-value 0.10
+chisq_test
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table_receptor
+X-squared = 6.2231, df = 3, p-value = 0.1012
+
+
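`chisq.test()` issues the "Chi-squared approximation may be incorrect" warning whenever any expected cell count is below 5. The sketch below checks the expected counts directly. The participant-level receptor table is typed in by hand from the counts printed above, not read from `subset_data`.

```r
# Participant-level counts copied from the receptor-group table above
# (rows: final_receptor_group 1-4; columns: ctDNA_ever FALSE/TRUE)
tab <- matrix(c(44, 45, 8, 3,   # ctDNA_ever = FALSE
                1, 7, 0, 1),    # ctDNA_ever = TRUE
              ncol = 2,
              dimnames = list(receptor_group = c("1", "2", "3", "4"),
                              ctDNA_ever = c("FALSE", "TRUE")))

# Expected counts under independence: row total * column total / N
expected <- suppressWarnings(chisq.test(tab))$expected
round(expected, 2)

# Number and share of cells with an expected count below 5
n_small <- sum(expected < 5)
n_small                      # 5 of 8 cells
n_small / length(expected)
```

A common rule of thumb (Cochran's rule) asks that no more than 20% of expected counts fall below 5. Here five of the eight cells do, mostly in the ctDNA-positive column, so the receptor-group p-value should be interpreted with caution.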
#I was curious whether a TNBC vs non-TNBC association exists (or an HR+ vs HR- association)
+#inclusion criterion inc_dx_crit_list___1 = TNBC (this has been confirmed with the study team)
+# Step 1: Summarize to one row per participant_id
+
+TNBC_ctDNA_status <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    inc_dx_crit_list___1 = first(inc_dx_crit_list___1),  # Or the most frequent if needed
+    ctDNA_ever = first(ctDNA_ever),  # Taking the first observed value for ctDNA_ever
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table_TNBC <- table(TNBC_ctDNA_status$inc_dx_crit_list___1, TNBC_ctDNA_status$ctDNA_ever)
+contingency_table_TNBC
+
+
   
+    FALSE TRUE
+  0    56    8
+  1    44    1
+
+
# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table_TNBC)
+
+
Warning in chisq.test(contingency_table_TNBC): Chi-squared approximation may be
+incorrect
+
+
# Step 4: p-val is 0.12 
+chisq_test
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table_TNBC
+X-squared = 2.4526, df = 1, p-value = 0.1173
+
+
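This 2 × 2 table contains a cell count of 1. In that situation, Fisher's exact test is a common alternative to the Yates-corrected chi-squared test. `fisher.test()` also reports a conditional maximum-likelihood odds ratio with an exact confidence interval. The sketch below types in the participant-level TNBC counts printed above. It is an illustration, not a re-analysis reported by the study.

```r
# Participant-level counts copied from the TNBC table above
# (rows: inc_dx_crit_list___1, 0 = non-TNBC, 1 = TNBC; columns: ctDNA_ever)
tab_tnbc <- matrix(c(56, 44,   # ctDNA_ever = FALSE
                     8, 1),    # ctDNA_ever = TRUE
                   ncol = 2,
                   dimnames = list(TNBC = c("0", "1"),
                                   ctDNA_ever = c("FALSE", "TRUE")))

fit <- fisher.test(tab_tnbc)
fit$p.value
fit$estimate   # conditional MLE of the odds ratio
fit$conf.int   # exact 95% confidence interval

# Sample (unconditional) odds ratio for comparison: (56 * 1) / (8 * 44)
sample_or <- (tab_tnbc[1, 1] * tab_tnbc[2, 2]) / (tab_tnbc[1, 2] * tab_tnbc[2, 1])
sample_or
```

The rows are ordered non-TNBC, then TNBC. With that ordering, an odds ratio below 1 means lower odds of ever being ctDNA positive among TNBC participants.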
### HR positive vs HR negative (Hormone receptor positive vs hormone receptor negative)
+#first, I need to create a HR positive variable (HR_status)
+subset_data <- subset_data |> 
+  mutate(HR_status = case_when(
+    final_receptor_group %in% c(2, 3) ~ "HR+",
+    final_receptor_group %in% c(1, 4) ~ "Non-HR+",
+    TRUE ~ NA_character_  # In case there are missing or other unexpected values
+  ))
+
+# View the new HR_status variable
+table(subset_data$HR_status)
+
+

+    HR+ Non-HR+ 
+    225     173 
+
+
HR_status_by_participant <- subset_data |>
+  group_by(participant_id) |>
+  summarise(HR_status = first(HR_status),  # first() assumes HR_status is the same on every row for a participant
+            .groups = "drop")
+
+# View the result 
+table(HR_status_by_participant$HR_status) #aligns with final receptor status (ultimately 60 HR+, 49 HR-)
+
+

+    HR+ Non-HR+ 
+     60      49 
+
+
# Summarize HR_status and ctDNA_ever status for each unique participant_id
+summary_data <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    HR_status = first(HR_status),  # Get the HR_status for the participant
+    ctDNA_status = first(ctDNA_ever),  # Get the ctDNA_ever status for the participant
+    .groups = "drop"
+  )
+
+contingency_table_HR <- table(summary_data$ctDNA_status, summary_data$HR_status)
+contingency_table_HR
+
+
       
+        HR+ Non-HR+
+  FALSE  53      47
+  TRUE    7       2
+
+
chisq_test <- chisq.test(contingency_table_HR)
+
+
Warning in chisq.test(contingency_table_HR): Chi-squared approximation may be
+incorrect
+
+
# Print chi-squared test results #0.28 
+chisq_test
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table_HR
+X-squared = 1.1696, df = 1, p-value = 0.2795
+
+
###tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not reported  
+# Exclude rows where final_tumor_grade is 3 (not reported), which removes the 2 participants whose grade was not reported
+summary_data <- subset_data |>
+  filter(final_tumor_grade != 3) |>  # Exclude code 3 (grade not reported)
+  group_by(participant_id) |>
+  summarise(
+    grade = first(final_tumor_grade),  # Get the final_tumor_grade for each participant
+    ctDNA_ever = first(ctDNA_ever),    # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of grade vs ctDNA_ever
+contingency_table <- table(summary_data$grade, summary_data$ctDNA_ever)
+
+# View the contingency table
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    75    4
+  1    17    5
+  2     6    0
+
+
# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# View the Chi-squared test result -- p-value 0.0229 
+print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 7.5533, df = 2, p-value = 0.0229
+
+
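The grade table has small expected counts in the ctDNA-positive column, so the asymptotic p-value of 0.0229 may be inaccurate. `chisq.test()` can instead compute a Monte Carlo p-value by simulating tables with the same margins (`simulate.p.value = TRUE`). The sketch below rebuilds the table from the printed counts. The seed and the number of replicates `B` are arbitrary choices. The test statistic itself is unchanged (X-squared ≈ 7.55); only the way its p-value is obtained differs.

```r
# Participant-level counts copied from the tumor grade table above
# (rows: final_tumor_grade codes 0, 1, 2; columns: ctDNA_ever FALSE/TRUE)
tab_grade <- matrix(c(75, 17, 6,   # ctDNA_ever = FALSE
                      4, 5, 0),    # ctDNA_ever = TRUE
                    ncol = 2,
                    dimnames = list(grade_code = c("0", "1", "2"),
                                    ctDNA_ever = c("FALSE", "TRUE")))

set.seed(2024)  # arbitrary seed so the simulated p-value is reproducible
sim <- chisq.test(tab_grade, simulate.p.value = TRUE, B = 10000)
sim$statistic   # same X-squared as the asymptotic test
sim$p.value     # Monte Carlo p-value based on 10,000 simulated tables
```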
######histology (final_histology)
+#participants have different combinations of histology codes (1-16); this table counts sample rows per participant and histology combination
+table(subset_data$participant_id, subset_data$final_histology)
+
+
              
+                1 1,13,14,3 1,3 11,3 12,3 13,3 13,3,5 14 14,15 14,15,3 14,3
+  28115-16-001  0         0   0    0    0    0      0  0     0       0    5
+  28115-16-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-010  0         0   0    0    1    0      0  0     0       0    0
+  28115-16-014  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-015  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-017  0         0   0    0    0    0      0  3     0       0    0
+  28115-16-020  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-021  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-023  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-026  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-029  0         0   0    0    0    0      0  2     0       0    0
+  28115-16-033  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-035  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-001  0         0   8    0    0    0      0  0     0       0    0
+  28115-17-002  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-006  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-008  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-009  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-010  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-012  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-016  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-017  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-019  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-021  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-022  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-023  0         0   2    0    0    0      0  0     0       0    0
+  28115-17-024  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-030  0         0   3    0    0    0      0  0     0       0    0
+  28115-17-031  0         0   0    0    0    0      0  5     0       0    0
+  28115-17-032  0         0   0   10    0    0      0  0     0       0    0
+  28115-17-036  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-039  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-040  0         0   4    0    0    0      0  0     0       0    0
+  28115-17-045  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-046  0         0   0    0    0   10      0  0     0       0    0
+  28115-17-047  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-048  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-050  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-051  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-052  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-001  0         0   0    0    0    0      0  0     7       0    0
+  28115-18-002  0         0   2    0    0    0      0  0     0       0    0
+  28115-18-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-006  0         0   0    0    0    0      0  1     0       0    0
+  28115-18-009  0         0   0    0    0    0      0  1     0       0    0
+  28115-18-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-014  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-015  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-017  0         0   0    0    0    0      0  1     0       0    0
+  28115-18-020  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-021  0         0   0    0    0    0      0  8     0       0    0
+  28115-18-022  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-023  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-024  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-028  1         0   0    0    0    0      0  0     0       0    0
+  28115-18-029  0         0   0    0    0    0      0  4     0       0    0
+  28115-18-030  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-031  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-032  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-034  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-001  0         0   0    0    0    0      0  1     0       0    0
+  28115-19-002  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-003  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-005  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-006  0         0   0    0    0    8      0  0     0       0    0
+  28115-19-007  0         0   0    0    0    0      0  0     0       0    5
+  28115-19-009  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-012  0         3   0    0    0    0      0  0     0       0    0
+  28115-19-014  0         0   0    0    0    0      0  0     0       0    2
+  28115-19-016  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-017  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-019  0         0   0    0    0    0      0  0     0       0    3
+  28115-19-020  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-021  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-022  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-028  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-007  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-009  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-010  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-001  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-002  0         0   0    0    0    0      0  0     0       0    4
+  28115-21-003  0         0   0    0    0    0      0  2     0       0    0
+  28115-21-006  0         0   0    0    0    2      0  0     0       0    0
+  28115-21-007  0         0   0    0    0    0      0  3     0       0    0
+  28115-21-009  0         0   0    0    0    0      3  0     0       0    0
+  28115-21-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-013  0         0   0    0    0    0      0  0     0       0    4
+  28115-21-014  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-015  0         0   0    0    0    0      0  0     0       2    0
+  28115-21-016  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-019  0         0   0    0    0    0      0  1     0       0    0
+  28115-21-020  0         0   3    0    0    0      0  0     0       0    0
+  28115-21-021  0         0   0    0    0    0      0  3     0       0    0
+  28115-21-022  0         0   0    0    0    1      0  0     0       0    0
+  28115-21-024  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-026  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-028  0         0   0    0    0    0      0  0     0       0    1
+              
+               16,3  3 3,5 3,7  5
+  28115-16-001    0  0   0   0  0
+  28115-16-004    0  1   0   0  0
+  28115-16-010    0  0   0   0  0
+  28115-16-014    0  1   0   0  0
+  28115-16-015    0 12   0   0  0
+  28115-16-017    0  0   0   0  0
+  28115-16-020    0  0   1   0  0
+  28115-16-021    0  9   0   0  0
+  28115-16-023    0  1   0   0  0
+  28115-16-025    0  1   0   0  0
+  28115-16-026    0 10   0   0  0
+  28115-16-027    0  3   0   0  0
+  28115-16-029    0  0   0   0  0
+  28115-16-033    0  2   0   0  0
+  28115-16-035    0  1   0   0  0
+  28115-17-001    0  0   0   0  0
+  28115-17-002    0  9   0   0  0
+  28115-17-006    0  1   0   0  0
+  28115-17-008    0  9   0   0  0
+  28115-17-009    0  1   0   0  0
+  28115-17-010    0  5   0   0  0
+  28115-17-011    0  9   0   0  0
+  28115-17-012    0 10   0   0  0
+  28115-17-016    0  4   0   0  0
+  28115-17-017    0  5   0   0  0
+  28115-17-019    0  9   0   0  0
+  28115-17-021    0  1   0   0  0
+  28115-17-022    0  1   0   0  0
+  28115-17-023    0  0   0   0  0
+  28115-17-024    0  0   0   4  0
+  28115-17-025    0  2   0   0  0
+  28115-17-027    0  8   0   0  0
+  28115-17-030    0  0   0   0  0
+  28115-17-031    0  0   0   0  0
+  28115-17-032    0  0   0   0  0
+  28115-17-036    0  7   0   0  0
+  28115-17-039    0  2   0   0  0
+  28115-17-040    0  0   0   0  0
+  28115-17-045    0  0   1   0  0
+  28115-17-046    0  0   0   0  0
+  28115-17-047    0  3   0   0  0
+  28115-17-048    0  2   0   0  0
+  28115-17-050    0  3   0   0  0
+  28115-17-051    0  9   0   0  0
+  28115-17-052    0  0   0   0  3
+  28115-18-001    0  0   0   0  0
+  28115-18-002    0  0   0   0  0
+  28115-18-004    0  2   0   0  0
+  28115-18-006    0  0   0   0  0
+  28115-18-009    0  0   0   0  0
+  28115-18-011    0  5   0   0  0
+  28115-18-014    0  2   0   0  0
+  28115-18-015    0  5   0   0  0
+  28115-18-017    0  0   0   0  0
+  28115-18-020    0  8   0   0  0
+  28115-18-021    0  0   0   0  0
+  28115-18-022    0  0   0  12  0
+  28115-18-023    0  3   0   0  0
+  28115-18-024    0  0   0   2  0
+  28115-18-027    0  1   0   0  0
+  28115-18-028    0  0   0   0  0
+  28115-18-029    0  0   0   0  0
+  28115-18-030    0  2   0   0  0
+  28115-18-031    0  3   0   0  0
+  28115-18-032    0  6   0   0  0
+  28115-18-034    0  1   0   0  0
+  28115-19-001    0  0   0   0  0
+  28115-19-002    0  2   0   0  0
+  28115-19-003    0  5   0   0  0
+  28115-19-004    0  1   0   0  0
+  28115-19-005    0  3   0   0  0
+  28115-19-006    0  0   0   0  0
+  28115-19-007    0  0   0   0  0
+  28115-19-009    0  6   0   0  0
+  28115-19-011    0  1   0   0  0
+  28115-19-012    0  0   0   0  0
+  28115-19-014    0  0   0   0  0
+  28115-19-016    0  2   0   0  0
+  28115-19-017    0  2   0   0  0
+  28115-19-019    0  0   0   0  0
+  28115-19-020    0  2   0   0  0
+  28115-19-021    0  4   0   0  0
+  28115-19-022    0  2   0   0  0
+  28115-19-025    0  6   0   0  0
+  28115-19-028    0  2   0   0  0
+  28115-20-004    0  2   0   0  0
+  28115-20-007    0  2   0   0  0
+  28115-20-009    0  4   0   0  0
+  28115-20-010    0  1   0   0  0
+  28115-21-001    0  1   0   0  0
+  28115-21-002    0  0   0   0  0
+  28115-21-003    0  0   0   0  0
+  28115-21-006    0  0   0   0  0
+  28115-21-007    0  0   0   0  0
+  28115-21-009    0  0   0   0  0
+  28115-21-011    0  1   0   0  0
+  28115-21-013    0  0   0   0  0
+  28115-21-014    0  2   0   0  0
+  28115-21-015    0  0   0   0  0
+  28115-21-016    0  8   0   0  0
+  28115-21-019    0  0   0   0  0
+  28115-21-020    0  0   0   0  0
+  28115-21-021    0  0   0   0  0
+  28115-21-022    0  0   0   0  0
+  28115-21-024    0  2   0   0  0
+  28115-21-025    0  2   0   0  0
+  28115-21-026    2  0   0   0  0
+  28115-21-027    0  2   0   0  0
+  28115-21-028    0  0   0   0  0
+
+
  histology_summary <- subset_data |>
+    distinct(participant_id, final_histology) |>  # Get unique participant-histology combinations
+    group_by(final_histology) |>  # Group by histology type
+    summarise(count = n())  # Count the number of participants per histology type
+  
+  # View the summary table
+  print(histology_summary) # the 16 code combinations sum to 109 participants; 9 participants have both ductal (3) and lobular (14) codes
+
+
# A tibble: 16 × 2
+   final_histology count
+   <chr>           <int>
+ 1 1                   1
+ 2 1,13,14,3           1
+ 3 1,3                 6
+ 4 11,3                1
+ 5 12,3                1
+ 6 13,3                4
+ 7 13,3,5              1
+ 8 14                 13
+ 9 14,15               1
+10 14,15,3             1
+11 14,3                7
+12 16,3                1
+13 3                  65
+14 3,5                 2
+15 3,7                 3
+16 5                   1
+
+
  # Collapse the histology codes into one category per participant (histology_category):
+  # Ductal (code 3), Lobular (code 14), both, or Other. Codes are matched as whole
+  # comma-separated tokens, so a code such as "13" is not mistaken for "3".
+  subset_data <- subset_data |>
+    mutate(histology_category = case_when(
+      grepl("(^|,)3(,|$)", final_histology) & grepl("(^|,)14(,|$)", final_histology) ~ "Both Ductal and Lobular",
+      grepl("(^|,)3(,|$)", final_histology) ~ "Ductal",
+      grepl("(^|,)14(,|$)", final_histology) ~ "Lobular",
+      TRUE ~ "Other"  # Any other combination
+    ))
+  
+  # Count the number of participants in each histology category
+  histology_counts <- subset_data |>
+    group_by(histology_category) |>
+    summarise(count = n_distinct(participant_id))  # Count distinct participants
+  
+  # View the counts (they sum to 109 participants)
+  print(histology_counts)
+
+
# A tibble: 4 × 2
+  histology_category      count
+  <chr>                   <int>
+1 Both Ductal and Lobular     9
+2 Ductal                     84
+3 Lobular                    14
+4 Other                       2
+
+
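The original `grepl("3", ...)` calls match any code that merely *contains* the digit 3, so a code such as "13" on its own would be counted as ductal. In this cohort every string containing 13 also contains a standalone 3, so the counts above are unaffected, but a whole-token match is safer. A minimal base R sketch, assuming the codes are comma-separated with no spaces (as in the printed summary); `has_code()` is a hypothetical helper and the bare "13" entry is made up for illustration:

```r
# Histology strings in the same comma-separated format as final_histology;
# "13" on its own is a hypothetical value, not observed in this cohort
histology <- c("3", "13", "1,13,14,3", "14,15", "13,3,5", "5")

# Substring match: "3" also matches inside "13"
substring_hit <- grepl("3", histology)

# Whole-token match: the code must be bounded by the string start/end or a comma
has_code <- function(x, code) {
  grepl(paste0("(^|,)", code, "(,|$)"), x)
}
whole_code_hit <- has_code(histology, "3")

data.frame(histology, substring_hit, whole_code_hit)
```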
  # Contingency table of histology_category vs ctDNA_ever (each participant counted once)
+  library(tidyr)
+  contingency_table <- subset_data |>
+    distinct(participant_id, histology_category, ctDNA_ever) |>  # Ensure each patient is counted once
+    count(histology_category, ctDNA_ever) |>
+    pivot_wider(names_from = ctDNA_ever, values_from = n, values_fill = list(n = 0))  # Pivot the table to get ctDNA_ever as columns
+  
+  # Perform the chi-squared test of independence
+  chisq_test <- chisq.test(contingency_table[,-1])  # Remove the histology_category column for the test
+
+
Warning in chisq.test(contingency_table[, -1]): Chi-squared approximation may
+be incorrect
+
+
  # Print the contingency table
+  print(contingency_table) 
+
+
# A tibble: 4 × 3
+  histology_category      `FALSE` `TRUE`
+  <chr>                     <int>  <int>
+1 Both Ductal and Lobular       9      0
+2 Ductal                       78      6
+3 Lobular                      11      3
+4 Other                         2      0
+
+
  # Print the chi-squared test result (p = 0.2276)
+  print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table[, -1]
+X-squared = 4.334, df = 3, p-value = 0.2276
+
+
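The "Chi-squared approximation may be incorrect" warning appears when some expected cell counts fall below 5. As a suggested check (not part of the original analysis), the sketch below rebuilds the printed histology-by-ctDNA table as a matrix with counts copied from the output above, inspects the expected counts, and runs Fisher's exact test, which does not rely on the large-sample approximation:

```r
# Histology x ctDNA_ever counts copied from the contingency table printed above
histology_tab <- matrix(
  c(9, 78, 11, 2,   # ctDNA_ever = FALSE
    0,  6,  3, 0),  # ctDNA_ever = TRUE
  ncol = 2,
  dimnames = list(
    histology = c("Both Ductal and Lobular", "Ductal", "Lobular", "Other"),
    ctDNA_ever = c("FALSE", "TRUE")
  )
)

# Expected counts under independence; chisq.test() warns when any are below 5
expected <- suppressWarnings(chisq.test(histology_tab))$expected
round(expected, 2)

# Fisher's exact test avoids the chi-squared approximation
fisher.test(histology_tab)
```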
#### Nodal (N) stage
+
+table(subset_data$participant_id, subset_data$final_n_stage) # values are 0, 1, 2, 3 (N0, N1, N2, N3)
+
+
              
+                0  1  2  3
+  28115-16-001  0  0  0  5
+  28115-16-004  1  0  0  0
+  28115-16-010  0  0  0  1
+  28115-16-014  1  0  0  0
+  28115-16-015 12  0  0  0
+  28115-16-017  0  0  3  0
+  28115-16-020  0  0  1  0
+  28115-16-021  0  0  9  0
+  28115-16-023  1  0  0  0
+  28115-16-025  1  0  0  0
+  28115-16-026 10  0  0  0
+  28115-16-027  0  3  0  0
+  28115-16-029  2  0  0  0
+  28115-16-033  0  2  0  0
+  28115-16-035  1  0  0  0
+  28115-17-001  0  8  0  0
+  28115-17-002  9  0  0  0
+  28115-17-006  0  1  0  0
+  28115-17-008  9  0  0  0
+  28115-17-009  1  0  0  0
+  28115-17-010  5  0  0  0
+  28115-17-011  0  0  0  9
+  28115-17-012  0  0  0 10
+  28115-17-016  0  4  0  0
+  28115-17-017  0  5  0  0
+  28115-17-019  9  0  0  0
+  28115-17-021  1  0  0  0
+  28115-17-022  1  0  0  0
+  28115-17-023  0  0  2  0
+  28115-17-024  4  0  0  0
+  28115-17-025  2  0  0  0
+  28115-17-027  0  8  0  0
+  28115-17-030  3  0  0  0
+  28115-17-031  5  0  0  0
+  28115-17-032  0  0 10  0
+  28115-17-036  7  0  0  0
+  28115-17-039  2  0  0  0
+  28115-17-040  0  0  4  0
+  28115-17-045  1  0  0  0
+  28115-17-046 10  0  0  0
+  28115-17-047  0  3  0  0
+  28115-17-048  0  0  2  0
+  28115-17-050  3  0  0  0
+  28115-17-051  9  0  0  0
+  28115-17-052  3  0  0  0
+  28115-18-001  0  0  7  0
+  28115-18-002  0  2  0  0
+  28115-18-004  0  0  2  0
+  28115-18-006  0  1  0  0
+  28115-18-009  1  0  0  0
+  28115-18-011  0  5  0  0
+  28115-18-014  0  2  0  0
+  28115-18-015  5  0  0  0
+  28115-18-017  0  1  0  0
+  28115-18-020  8  0  0  0
+  28115-18-021  0  8  0  0
+  28115-18-022 12  0  0  0
+  28115-18-023  0  3  0  0
+  28115-18-024  0  2  0  0
+  28115-18-027  0  1  0  0
+  28115-18-028  1  0  0  0
+  28115-18-029  0  4  0  0
+  28115-18-030  2  0  0  0
+  28115-18-031  0  3  0  0
+  28115-18-032  0  6  0  0
+  28115-18-034  1  0  0  0
+  28115-19-001  0  0  0  1
+  28115-19-002  0  2  0  0
+  28115-19-003  0  5  0  0
+  28115-19-004  0  1  0  0
+  28115-19-005  3  0  0  0
+  28115-19-006  0  8  0  0
+  28115-19-007  0  5  0  0
+  28115-19-009  0  0  0  6
+  28115-19-011  0  1  0  0
+  28115-19-012  0  3  0  0
+  28115-19-014  0  0  0  2
+  28115-19-016  2  0  0  0
+  28115-19-017  2  0  0  0
+  28115-19-019  0  3  0  0
+  28115-19-020  2  0  0  0
+  28115-19-021  0  4  0  0
+  28115-19-022  0  2  0  0
+  28115-19-025  0  6  0  0
+  28115-19-028  2  0  0  0
+  28115-20-004  2  0  0  0
+  28115-20-007  0  0  2  0
+  28115-20-009  4  0  0  0
+  28115-20-010  0  1  0  0
+  28115-21-001  0  1  0  0
+  28115-21-002  0  4  0  0
+  28115-21-003  0  0  2  0
+  28115-21-006  0  2  0  0
+  28115-21-007  0  0  3  0
+  28115-21-009  0  0  3  0
+  28115-21-011  1  0  0  0
+  28115-21-013  0  4  0  0
+  28115-21-014  0  2  0  0
+  28115-21-015  0  2  0  0
+  28115-21-016  8  0  0  0
+  28115-21-019  0  1  0  0
+  28115-21-020  0  3  0  0
+  28115-21-021  0  3  0  0
+  28115-21-022  1  0  0  0
+  28115-21-024  2  0  0  0
+  28115-21-025  0  2  0  0
+  28115-21-026  0  2  0  0
+  28115-21-027  2  0  0  0
+  28115-21-028  1  0  0  0
+
+
nodal_summary <- subset_data |>
+    distinct(participant_id, final_n_stage) |>  # Get unique participant-stage combinations
+    group_by(final_n_stage) |>  # Group by stage
+    summarise(count = n())  # Count the number of participants per nodal stage
+  
+# View the summary table: sums to 109; 46 are N0 and 63 are node positive (N1 to N3)
+  print(nodal_summary)
+
+
# A tibble: 4 × 2
+  final_n_stage count
+          <int> <int>
+1             0    46
+2             1    43
+3             2    13
+4             3     7
+
+
  subset_data_by_id <- subset_data |>
+    filter(final_n_stage %in% c(0, 1, 2, 3)) |>  # Include only relevant nodal stages
+    group_by(participant_id) |>
+    summarise(
+      nodal_status = first(final_n_stage),  # Use final_n_stage as nodal_status for each participant
+      ctDNA_ever = first(ctDNA_ever),       # Get ctDNA_ever status for each participant
+      .groups = "drop"
+    )
+  
+  #Create a contingency table of nodal_status vs ctDNA_ever
+  contingency_table <- table(subset_data_by_id$nodal_status, subset_data_by_id$ctDNA_ever)
+  
+  # Check if any cells in the contingency table have zero counts, which could affect test validity
+  print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    43    3
+  1    43    0
+  2     8    5
+  3     6    1
+
+
  # Perform the chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  # Print the chi-squared test result (p = 0.00017)
+  print(chisq_test) 
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 20.045, df = 3, p-value = 0.0001661
+
+
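Because nodal stage is ordered (N0 to N3), a test for trend in proportions is one option alongside the 4 x 2 chi-squared test. The sketch below applies base R's `prop.trend.test()` to counts copied from the printed nodal table. The proportions are not monotone (no N1 participant was ever ctDNA-positive), so the trend test and the overall test answer different questions, and both remain large-sample tests with these small counts:

```r
# Ever ctDNA-positive (x) out of all participants (n) at each nodal stage,
# copied from the nodal_status x ctDNA_ever table printed above
x <- c(N0 = 3, N1 = 0, N2 = 5, N3 = 1)
n <- c(N0 = 46, N1 = 43, N2 = 13, N3 = 7)

# Proportion ever ctDNA-positive per stage
round(x / n, 3)

# Chi-squared test for a linear trend in proportions across the ordered stages;
# scores 0 to 3 mirror the final_n_stage coding
prop.trend.test(x, n, score = 0:3)
```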
  #### Node positive versus node negative: derive node_status from final_n_stage (0 = node negative; 1 to 3 = node positive)
+  subset_data_by_id <- subset_data |>
+    group_by(participant_id) |>
+    summarise(
+      node_status = ifelse(first(final_n_stage) == 0, "Node Negative", "Node Positive"),  # Node negative if 0, positive otherwise
+      ctDNA_ever = first(ctDNA_ever),  # Get ctDNA_ever status for each participant
+      .groups = "drop"
+    )
+  
+  #adding node_status to subset_data 
+ subset_data <- subset_data |>
+  left_join(subset_data_by_id |> select(participant_id, node_status), by = "participant_id")
+  
+  
+  #Create a contingency table of node_status vs ctDNA_ever
+  contingency_table <- table(subset_data_by_id$node_status, subset_data_by_id$ctDNA_ever)
+  
+  # Perform the Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  #Print the contingency table and Chi-squared test results
+  print(contingency_table)
+
+
               
+                FALSE TRUE
+  Node Negative    43    3
+  Node Positive    57    6
+
+
  print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.044142, df = 1, p-value = 0.8336
+
+
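A p-value alone says little about the size of the node-status difference. As a sketch, `fisher.test()` on the printed 2 x 2 table (counts copied from the output above) also returns a conditional maximum-likelihood odds ratio and a 95% confidence interval for the odds of ever being ctDNA-positive in node-positive versus node-negative participants:

```r
# Node status x ctDNA_ever counts copied from the printed 2 x 2 table above
node_tab <- matrix(
  c(43, 57,   # ctDNA_ever = FALSE
     3,  6),  # ctDNA_ever = TRUE
  ncol = 2,
  dimnames = list(node_status = c("Node Negative", "Node Positive"),
                  ctDNA_ever = c("FALSE", "TRUE"))
)

ft <- fisher.test(node_tab)
ft$estimate   # conditional maximum-likelihood odds ratio (node positive vs negative)
ft$conf.int   # 95% confidence interval for the odds ratio
ft$p.value

# Sample (unconditional) odds ratio for comparison: (43 * 6) / (3 * 57)
(node_tab[1, 1] * node_tab[2, 2]) / (node_tab[1, 2] * node_tab[2, 1])
```

The sample odds ratio is about 1.51; the conditional estimate from `fisher.test()` is close to, but not identical to, that value, and with 9 ctDNA-positive participants the interval is wide.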
#######Looking at T stage or tumor size: the variable is final_t_stage 
+  
+  table(subset_data$final_t_stage) # values are 1, 2, 3, 4, and 99 (99 = pTx, cannot be assessed)
+
+

+  1   2   3   4  99 
+173 168  46  10   1 
+
+
  t_summary <- subset_data |>
+    distinct(participant_id, final_t_stage) |>  # Get unique participant-stage combinations
+    group_by(final_t_stage) |>  # Group by stage
+    summarise(count = n())  # Count the number of participants per T stage
+  
+  # View the summary table: sums to 109; most are T1 or T2, with 12 T3, 1 T4, and 1 Tx (99)
+  print(t_summary)
+
+
# A tibble: 5 × 2
+  final_t_stage count
+          <int> <int>
+1             1    51
+2             2    44
+3             3    12
+4             4     1
+5            99     1
+
+
  # For the T stage table, group T1 vs T2 or greater (final_t_stage_combined) and exclude 99 (pTx).
+  subset_data_clean <- subset_data |>
+    filter(final_t_stage != 99, !is.na(ctDNA_ever))  # ctDNA_ever is TRUE/FALSE, so a "!= 99" check only drops missing values
+  
+  # Combine final_t_stage into T1 vs. T2 or greater
+  subset_data_clean <- subset_data_clean |>
+    mutate(final_t_stage_combined = ifelse(final_t_stage == 1, "T1", "T2 or greater"))
+  
+  # Summarize the data by participant_id after creating the new combined t_stage
+  subset_data_by_id <- subset_data_clean |>
+    group_by(participant_id) |>
+    summarise(
+      final_t_stage_combined = first(final_t_stage_combined),  # Get the combined t status for each participant
+      ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+    )
+  
+  # Create a contingency table of final_t_stage_combined vs ctDNA_ever
+  contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$ctDNA_ever)
+  
+  # Perform the Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  # Print the contingency table and Chi-squared test results. P value = 0.6
+  print(contingency_table)
+
+
               
+                FALSE TRUE
+  T1               48    3
+  T2 or greater    51    6
+
+
  print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.27357, df = 1, p-value = 0.6009
+
+
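For 2 x 2 tables `chisq.test()` applies Yates' continuity correction by default, which is why these results are labelled "with Yates' continuity correction". As a worked check, the T1 vs T2-or-greater statistic can be reproduced by hand from counts copied from the output above: each expected count is row total * column total / n, and the statistic is the sum of (|O - E| - 0.5)^2 / E. (R caps the 0.5 at the smallest |O - E|; here every |O - E| is 1.25, so the cap does not bind.)

```r
# T stage x ctDNA_ever counts copied from the printed 2 x 2 table above
t_tab <- matrix(
  c(48, 51,   # ctDNA_ever = FALSE
     3,  6),  # ctDNA_ever = TRUE
  ncol = 2,
  dimnames = list(t_stage = c("T1", "T2 or greater"),
                  ctDNA_ever = c("FALSE", "TRUE"))
)

# Expected counts: row total * column total / grand total
E <- outer(rowSums(t_tab), colSums(t_tab)) / sum(t_tab)

# Yates-corrected statistic and its p-value on 1 degree of freedom
yates_stat <- sum((abs(t_tab - E) - 0.5)^2 / E)
yates_p <- pchisq(yates_stat, df = 1, lower.tail = FALSE)

c(statistic = yates_stat, p.value = yates_p)
```

This should agree with the printed `X-squared = 0.27357, p-value = 0.6009`.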
#### Alternative T stage cutoff (T1/T2 vs T3 or greater): no significant difference, so this grouping is not used in the table.
+  
+  # Exclude 99 (pTx) and participants with missing ctDNA_ever
+  subset_data_clean <- subset_data |>
+    filter(final_t_stage != 99, !is.na(ctDNA_ever))
+  
+  # Combine final_t_stage into T1/T2 or T3 or greater
+  subset_data_clean <- subset_data_clean |>
+    mutate(final_t_stage_combined = case_when(
+      final_t_stage == 1 | final_t_stage == 2 ~ "T1 or T2",  # Group T1 and T2 together
+      final_t_stage >= 3 ~ "T3 or greater",  # Group T3 and higher as a separate category
+      TRUE ~ NA_character_  # Handle any unexpected values
+    ))
+  
+  
+  # Summarize the data by participant_id after creating the new combined t_stage
+  subset_data_by_id <- subset_data_clean |>
+    group_by(participant_id) |>
+    summarise(
+      final_t_stage_combined = first(final_t_stage_combined),  # Get the combined t status for each participant
+      ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+    )
+  
+  # Create a contingency table of final_t_stage_combined vs ctDNA_ever
+  contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$ctDNA_ever)
+  
+  # Perform the Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  # Print the contingency table and chi-squared test results (p = 0.66; not significant, so not used)
+  print(contingency_table)
+
+
               
+                FALSE TRUE
+  T1 or T2         88    7
+  T3 or greater    11    2
+
+
  print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.19875, df = 1, p-value = 0.6557
+
+
  ######## Overall stage of disease: final_overall_stage
+
+  table(subset_data$final_overall_stage) # values are 1, 2, 3, and 99 (no stage 4)
+
+

+  1   2   3  99 
+124 167 105   2 
+
+
  stage_summary <- subset_data |>
+    distinct(participant_id, final_overall_stage) |>  # Get unique participant-stage combinations
+    group_by(final_overall_stage) |>  # Group by stage
+    summarise(count = n())  # Count the number of participants per overall stage
+  
+  # View the summary table: sums to 109 (one with stage 99); stage II is the largest group (47), then stage I (35) and stage III (26); no stage IV
+  print(stage_summary)
+
+
# A tibble: 4 × 2
+  final_overall_stage count
+                <int> <int>
+1                   1    35
+2                   2    47
+3                   3    26
+4                  99     1
+
+
  # Exclude overall stage 99 and participants with missing ctDNA_ever
+  subset_data_clean <- subset_data |>
+    filter(final_overall_stage != 99, !is.na(ctDNA_ever))
+  
+  # Summarize the data by participant_id
+  subset_data_by_id <- subset_data_clean |>
+    group_by(participant_id) |>
+    summarise(
+      final_overall_stage = first(final_overall_stage),  # Get the final_overall_stage for each participant
+      ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+    )
+  
+  # Create a contingency table of final_overall_stage vs ctDNA_ever
+  contingency_table <- table(subset_data_by_id$final_overall_stage, subset_data_by_id$ctDNA_ever)
+  
+  # Perform the Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  # Print the contingency table and chi-squared test results (p = 0.006): higher overall stage is associated with ever being ctDNA-positive, although expected counts are small (see warning)
+  print(contingency_table)
+
+
   
+    FALSE TRUE
+  1    33    2
+  2    46    1
+  3    20    6
+
+
  print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 10.082, df = 2, p-value = 0.006466
+
+
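Overall stage is the one variable here with a small p-value, but every expected count in the ctDNA-positive column is below 5, so it is worth checking the result without the chi-squared approximation. A sketch using counts copied from the printed table: per-stage ctDNA positivity, a Monte Carlo p-value from `chisq.test(simulate.p.value = TRUE)`, and Fisher's exact test. The seed and `B = 10000` are arbitrary choices, and the simulated p-value will vary slightly with them:

```r
# Overall stage x ctDNA_ever counts copied from the printed table above
stage_tab <- matrix(
  c(33, 46, 20,   # ctDNA_ever = FALSE
     2,  1,  6),  # ctDNA_ever = TRUE
  ncol = 2,
  dimnames = list(stage = c("I", "II", "III"),
                  ctDNA_ever = c("FALSE", "TRUE"))
)

# Proportion of participants at each stage who were ever ctDNA-positive
round(prop.table(stage_tab, margin = 1)[, "TRUE"], 3)

# Monte Carlo p-value: samples tables with the same margins instead of
# relying on the chi-squared approximation
set.seed(2024)
chisq.test(stage_tab, simulate.p.value = TRUE, B = 10000)

# Fisher's exact test as a second check
fisher.test(stage_tab)
```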
###########surgical treatment diag_surgery_type_1 (or diag_surgery_type_2) 
+  
+  table(subset_data$diag_surgery_type_1) # 1 = partial mastectomy/lumpectomy, 2 = mastectomy; no missing values
+
+

+  1   2 
+158 240 
+
+
  surgery <- subset_data |>
+    distinct(participant_id, diag_surgery_type_1) |>  # Get unique participant-surgery combinations
+    group_by(diag_surgery_type_1) |>  # Group by surgery type
+    summarise(count = n())  # Count the number of participants per surgery type
+  
+  # View the summary table
+  print(surgery)
+
+
# A tibble: 2 × 2
+  diag_surgery_type_1 count
+                <int> <int>
+1                   1    45
+2                   2    64
+
+
  # Summarize the data by participant_id. subset_data_clean still carries the overall-stage
+  # filter from above, so the participant with stage 99 is excluded here (n = 108)
+  subset_data_by_id <- subset_data_clean |>
+    group_by(participant_id) |>
+    summarise(
+      surgery = first(diag_surgery_type_1),  # Get the surgery type for each participant
+      ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+    )
+  
+  # Create a contingency table of surgery type vs ctDNA_ever
+  contingency_table <- table(subset_data_by_id$surgery, subset_data_by_id$ctDNA_ever)
+  
+  # Perform the Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  # Print the contingency table and chi-squared test results (p = 1)
+  print(contingency_table)
+
+
   
+    FALSE TRUE
+  1    41    4
+  2    58    5
+
+
  print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0, df = 1, p-value = 1
+
+
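The same collapse, tabulate, and test pattern is repeated for every variable in this section, which is where the copy-pasted comments drift. One way to reduce the repetition is a small helper. The sketch below is hypothetical (`ctdna_association()` is not part of the original analysis) and uses base R only: it keeps the first row per participant (like `first()` in the `summarise()` calls above), drops missing values, and falls back to Fisher's exact test when any expected count is below 5. The toy data are made up and are not study data:

```r
# Hypothetical helper: one row per participant, cross-tabulate `var` against the
# outcome, and choose a test based on the expected counts
ctdna_association <- function(data, var, id = "participant_id",
                              outcome = "ctDNA_ever", min_expected = 5) {
  # Keep the first row per participant, like first() inside summarise()
  per_person <- data[!duplicated(data[[id]]), c(id, var, outcome)]
  # Drop participants with a missing value in either variable
  keep <- !is.na(per_person[[var]]) & !is.na(per_person[[outcome]])
  per_person <- per_person[keep, ]

  tab <- table(per_person[[var]], per_person[[outcome]])
  # Expected counts under independence: row total * column total / n
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

  result <- if (any(expected < min_expected)) {
    fisher.test(tab)   # exact test when the chi-squared approximation is doubtful
  } else {
    chisq.test(tab)    # Pearson chi-squared (Yates-corrected for 2 x 2 tables)
  }
  list(table = tab, test = result$method, p.value = result$p.value)
}

# Made-up toy data with several rows per participant, as in subset_data
toy <- data.frame(
  participant_id = c("A", "A", "B", "C", "C", "D", "E", "F"),
  surgery        = c(1, 1, 2, 1, 1, 2, 2, 1),
  ctDNA_ever     = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)
res <- ctdna_association(toy, "surgery")
res$table
res$test
res$p.value
```

Called as `ctdna_association(subset_data, "diag_surgery_type_1")`, it would give a per-participant table equivalent to the ones above, assuming the variable is constant within each participant. The 5-count threshold is a common rule of thumb, not a fixed requirement, and a variable with only one observed level would need separate handling.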
######## Axillary management: the question is whether a participant had an axillary dissection, recorded in diag_axillary_type___2_1 (or diag_axillary_type___2_2 if there were two biopsy forms). The two are combined into a new variable, axillary_dissection.
+
+  
+  table(subset_data$diag_axillary_type___2_1) 
+
+

+  0   1 
+215 183 
+
+
  table(subset_data$diag_axillary_type___2_2) # a handful of axillary dissections are recorded on the second biopsy form, so the two variables are combined
+
+

+ 0  1 
+16  4 
+
+
  # Create a binary variable flagging participants who had an axillary dissection on
+  # either biopsy form: 1 = dissection, 0 = none. case_when() treats an NA condition as
+  # not matched, so a missing second form counts as "no dissection".
+  subset_data <- subset_data |>
+    mutate(axillary_dissection = case_when(
+      diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1 ~ 1,  # Had axillary dissection
+      TRUE ~ 0  # No axillary dissection (includes missing values)
+    ))
+  
+  # Note: this resets subset_data_clean to the full cohort, so the overall-stage filter
+  # applied earlier no longer holds and the tables below include all 109 participants
+  subset_data_clean <- subset_data
+  
+  # Summarize the data by participant_id, including the axillary_dissection and ctDNA_ever variables
+  subset_data_by_id <- subset_data_clean |>
+    group_by(participant_id) |>
+    summarise(
+      axillary_dissection = first(axillary_dissection),  # Get the axillary dissection status for each participant
+      ctDNA_ever = first(ctDNA_ever)  # Get the ctDNA_ever status for each participant
+    )
+  
+  contingency_table <- table(subset_data_by_id$axillary_dissection, subset_data_by_id$ctDNA_ever)
+  
+# Row-level counts of axillary_dissection (rows, not participants)
+table(subset_data$axillary_dissection)
+
+

+  0   1 
+214 184 
+
+
  # Perform the Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  # Print the contingency table and Chi-squared test results --> p-value 0.173 
+  print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    52    2
+  1    48    7
+
+
  print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 1.8588, df = 1, p-value = 0.1728
+
+
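The axillary-dissection flag depends on how R combines `NA` with `|`: `TRUE | NA` is `TRUE`, but `FALSE | NA` is `NA`, so `ifelse()` returns `NA` for anyone with no dissection on form 1 and a missing form 2. `case_when()` treats an `NA` condition as not matched, which is why it yields 0 in those rows. A base R sketch on four made-up rows, with `%in%` shown as one NA-safe alternative:

```r
# Four made-up rows of biopsy-form flags; form 2 is usually missing
form1 <- c(1, 0, 0, NA)
form2 <- c(NA, NA, 1, NA)

# `|` is TRUE if either side is TRUE, but NA when the other side is unknown
form1 == 1 | form2 == 1                  # TRUE NA TRUE NA

# ifelse() propagates that NA
ifelse(form1 == 1 | form2 == 1, 1, 0)    # 1 NA 1 NA

# %in% never returns NA, so a missing form counts as "no dissection"
as.integer(form1 %in% 1 | form2 %in% 1)  # 1 0 1 0
```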
#### Inflammatory breast cancer (inflamm_yn_1, inflamm_yn_2): not included in Table 1 because there were no inflammatory breast cancers in the ctDNA cohort.
+table(d$inflamm_yn_1) # inflammatory cases appear in the overall cohort (6 'yes' cases) but none are in the ctDNA subset, for either inflammatory variable
+
+

+  0   1 
+568  11 
+
+
table(d$inflamm_yn_2)  # no inflammatory cases are recorded on the second form
+
+

+ 0 
+24 
+
+
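table(subset_data$inflamm_yn) # subset_data has no column named inflamm_yn (the variables are inflamm_yn_1 and inflamm_yn_2), hence the warning and empty table below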
+
+
Warning: Unknown or uninitialised column: `inflamm_yn`.
+
+
+
< table of extent 0 >
+
+
#### radiation prtx_radiation 
+table(subset_data$prtx_radiation) 
+
+

+  0   1 
+116 282 
+
+
radiation <- subset_data |> 
+  distinct(participant_id,prtx_radiation) |> 
+  group_by(prtx_radiation) |>  # Group by radiation status
+  summarise(count = n())  # Count the number of participants per radiation status
+
+# View the summary table
+print(radiation)
+
+
# A tibble: 2 × 2
+  prtx_radiation count
+           <int> <int>
+1              0    34
+2              1    75
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    radiation = first(prtx_radiation),  # xrt for each participant
+    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of radiation vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$radiation, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> p-val  = 0.33 
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    33    1
+  1    67    8
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.96444, df = 1, p-value = 0.3261
+
+
#### chemotherapy prtx_chemo 
+table(subset_data$prtx_chemo) 
+
+

+  0   1 
+ 18 380 
+
+
chemo <- subset_data |> 
+  distinct(participant_id,prtx_chemo) |> 
+  group_by(prtx_chemo) |>  # Group by chemotherapy status
+  summarise(count = n())  # Count the number of participants per chemotherapy status
+
+# View the summary table
+print(chemo) # 3 participants did not receive chemotherapy in this cohort
+
+
# A tibble: 2 × 2
+  prtx_chemo count
+       <int> <int>
+1          0     3
+2          1   106
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    chemo = first(prtx_chemo),  # chemo for each participant
+    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of chemotherapy vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$chemo, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> p-val  = 0.59 
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0     2    1
+  1    98    8
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.28802, df = 1, p-value = 0.5915
+
+
#### Neoadjuvant chemotherapy (NACT): two candidate variables, diag_neoadj_chemo_1 and diag_neoadj_chemo_2
+
+table(subset_data$diag_neoadj_chemo_1) 
+
+

+  0   1 
+327  71 
+
+
table(subset_data$diag_neoadj_chemo_2) # no participants are recorded as having neoadjuvant therapy on the second biopsy form, so only the first variable is used
+
+

+ 0 
+20 
+
+
nact <- subset_data |> 
+  distinct(participant_id,diag_neoadj_chemo_1) |> 
+  group_by(diag_neoadj_chemo_1) |>  # Group by NACT status
+  summarise(count = n())  # Count the number of participants per NACT status
+
+# View the summary table
+print(nact) # 19 participants received neoadjuvant chemotherapy
+
+
# A tibble: 2 × 2
+  diag_neoadj_chemo_1 count
+                <int> <int>
+1                   0    90
+2                   1    19
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    nact = first(diag_neoadj_chemo_1),  # NACT for each participant
+    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of NACT vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$nact, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> p-val  = 0.95 
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    82    8
+  1    18    1
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.0039839, df = 1, p-value = 0.9497
+
+
####hormone therapy prtx_endo 
+
+table(subset_data$prtx_endo) 
+
+

+  0   1 
+156 242 
+
+
endo <- subset_data |> 
+  distinct(participant_id,prtx_endo) |> 
+  group_by(prtx_endo) |>  # Group by endocrine therapy status
+  summarise(count = n())  # Count the number of participants per endocrine therapy status
+
+# View the summary table
+print(endo) # most participants received endocrine therapy (62 of 109)
+
+
# A tibble: 2 × 2
+  prtx_endo count
+      <int> <int>
+1         0    47
+2         1    62
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    endo = first(prtx_endo),  # Get endocrine therapy status for each participant
+    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of endocrine therapy vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$endo, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> p-val  = 0.33 
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    45    2
+  1    55    7
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.94139, df = 1, p-value = 0.3319
+
+
####bone modifying agents prtx_bonemod 
+
+table(subset_data$prtx_bonemod) 
+
+

+  0   1 
+238 160 
+
+
bonemod <- subset_data |> 
+  distinct(participant_id,prtx_bonemod) |> 
+  group_by(prtx_bonemod) |>  # Group by bone-modifying agent status (0/1)
+  summarise(count = n())  # Count the number of participants per group
+
+# View the summary table
+print(bonemod) # most participants did not receive bone-modifying agents (39 of 109 did)
+
+
# A tibble: 2 × 2
+  prtx_bonemod count
+         <int> <int>
+1            0    70
+2            1    39
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    bonemod = first(prtx_bonemod),  # Get bone mod status for each participant
+    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of bonemod vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$bonemod, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results: p-value = 0.84
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    65    5
+  1    35    4
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.041269, df = 1, p-value = 0.839
+
+
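With cells this sparse, Fisher's exact test (`fisher.test()` in base R) is a common alternative because it does not rely on the large-sample chi-squared approximation. The sketch below applies it to the bone-modifying-agent counts printed above; it is a suggested check rather than part of the original analysis, so no p-value is quoted here:

```r
# Bone-modifying agents (rows: prtx_bonemod 0/1) vs ever ctDNA-positive;
# counts copied from the contingency table printed above
bonemod_tab <- matrix(c(65, 35, 5, 4), nrow = 2,
                      dimnames = list(prtx_bonemod = c("0", "1"),
                                      ctDNA_ever = c("FALSE", "TRUE")))

# Exact conditional test for a 2 x 2 table; no expected-count requirement
bonemod_fisher <- fisher.test(bonemod_tab)
bonemod_fisher$p.value
bonemod_fisher$estimate  # conditional maximum-likelihood odds ratio
```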
#### pCR (pathologic complete response): NOT included in Table 1 because it closely tracks NACT
+# diag_pcr coding: 1 = pCR, 2 = non-pCR, "." = not recorded
+# Variables of interest for pathologic complete response: diag_pcr_1 or diag_pcr_2
+table(subset_data$diag_pcr_1) 
+
+

+  .   1   2 
+327   8  63 
+
+
table(subset_data$diag_pcr_2) #none recorded here so can just use diag_pcr_1 
+
+

+      . 
+378  20 
+
+
pcr <- subset_data |>
+  mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) |>  # Convert "." to NA
+  filter(!is.na(diag_pcr_1)) |>  # Exclude rows where diag_pcr_1 is NA
+  distinct(participant_id, diag_pcr_1) |>
+  group_by(diag_pcr_1) |>
+  summarise(count = n()) # Count the number of participants per pCR category
+
+# View the summary table
+print(pcr) # 19 participants have pCR data, matching the 19 who received NACT
+
+
# A tibble: 2 × 2
+  diag_pcr_1 count
+  <chr>      <int>
+1 1              1
+2 2             18
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    pcr = first(diag_pcr_1),  # Get pCR status for each participant ("." is kept as its own category)
+    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of pcr vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$pcr, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results: p-value = 0.86, no evident association; note that the table includes the "." (no pCR data) row and the pCR row has only 1 participant, so this test has very little power
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  .    82    8
+  1     1    0
+  2    17    1
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 0.31085, df = 2, p-value = 0.8561
+
+
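The 3 x 2 table above mixes the "." category (no pCR recorded, which here appears to mean no NACT) with the pCR and non-pCR groups. A narrower check, sketched below under that reading of ".", keeps only the 19 participants with pCR data and uses Fisher's exact test (`fisher.test()`), which does not need large expected counts:

```r
# pCR among NACT recipients only (rows: diag_pcr_1 1 = pCR, 2 = non-pCR)
# vs ever ctDNA-positive; counts copied from the table printed above,
# dropping the "." row
pcr_tab <- matrix(c(1, 17, 0, 1), nrow = 2,
                  dimnames = list(diag_pcr_1 = c("1", "2"),
                                  ctDNA_ever = c("FALSE", "TRUE")))

pcr_fisher <- fisher.test(pcr_tab)
pcr_fisher$p.value  # 1: the observed table is the most likely one given the margins
```

With a single pCR participant and a single ctDNA-positive participant, this comparison cannot say much either way.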
######## Recurrence
+# Locoregional first, then distant; then create a summary variable for either locoregional or distant recurrence
+# Locoregional: fu_locreg_prog
+
+# Step 1: Summarize data by unique participant_id
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    fu_locreg_prog = first(fu_locreg_prog),  # Get fu_locreg_prog status for each participant
+    ctDNA_ever = first(ctDNA_ever),          # Get ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of fu_locreg_prog vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$fu_locreg_prog, subset_data_by_id$ctDNA_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 4: Print the contingency table and Chi-squared test results: p-value = 5.8e-06
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    96    5
+  1     2    4
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 20.564, df = 1, p-value = 5.768e-06
+
+
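A p-value on its own does not convey how strong the association is. One simple effect measure for a 2 x 2 table is the sample odds ratio, `(a*d)/(b*c)`. Using the locoregional-progression counts printed above:

```r
# Locoregional progression (rows: fu_locreg_prog 0/1) vs ever ctDNA-positive;
# counts copied from the contingency table printed above
locreg_tab <- matrix(c(96, 2, 5, 4), nrow = 2,
                     dimnames = list(fu_locreg_prog = c("0", "1"),
                                     ctDNA_ever = c("FALSE", "TRUE")))

# Odds of progression if ctDNA+ (4/5) divided by odds if never ctDNA+ (2/96)
locreg_or <- (locreg_tab["0", "FALSE"] * locreg_tab["1", "TRUE"]) /
  (locreg_tab["0", "TRUE"] * locreg_tab["1", "FALSE"])
locreg_or  # 38.4
```

With only 4 and 5 participants in the ctDNA-positive cells this estimate is very imprecise; `fisher.test(locreg_tab)$conf.int` would give an exact confidence interval (built around its own conditional estimate, which differs slightly from the sample odds ratio).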
####looking at sites of locoregional progression: local site fu_locreg_site_num or fu_locreg_site_char 
+### Just want to look at site distribution here 
+
+# Summarize the distribution of fu_locreg_site_char by unique participant_id
+site_distribution <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    site = first(fu_locreg_site_char),  # Get the site for each unique participant
+    .groups = "drop"
+  ) |>
+  count(site)  # Count the occurrences of each site
+
+# View the distribution of sites: mostly axillary nodes (4 of 6 participants). Counting each listed site: 4 axillary, 1 internal mammary, 2 supraclavicular, 2 ipsilateral breast (some participants have several sites)
+print(site_distribution)
+
+
# A tibble: 6 × 2
+  site                                                              n
+  <chr>                                                         <int>
+1 ""                                                              103
+2 "Axillary Nodes"                                                  2
+3 "Axillary Nodes,Internal Mammary Nodes,Supraclavicular Nodes"     1
+4 "Ipsilateral Breast"                                              1
+5 "Ipsilateral Breast,Axillary Nodes"                               1
+6 "Supraclavicular Nodes"                                           1
+
+
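Because `fu_locreg_site_char` stores multiple sites as one comma-separated string, `count(site)` tallies combinations rather than individual sites. A sketch that splits the strings, using the six non-empty values printed above, reproduces the per-site counts noted in the comment:

```r
# Non-empty locoregional site strings, copied from the tibble printed above
locreg_sites <- c(rep("Axillary Nodes", 2),
                  "Axillary Nodes,Internal Mammary Nodes,Supraclavicular Nodes",
                  "Ipsilateral Breast",
                  "Ipsilateral Breast,Axillary Nodes",
                  "Supraclavicular Nodes")

# Split each entry on commas so every listed site is counted once per participant
site_counts <- table(unlist(strsplit(locreg_sites, ",", fixed = TRUE)))
site_counts
```

On the real data the empty strings would need to be dropped first (for example `site[site != ""]`), and the same split applies to `fu_dist_site_char`.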
#####distant recurrence: distant fu_dist_prog 
+
+# Step 1: Summarize data by unique participant_id
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    fu_dist_prog = first(fu_dist_prog),  # Get fu_dist_prog status for each participant
+    ctDNA_ever = first(ctDNA_ever),          # Get ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of dist prog vs ctDNA_ever --> 12 who had distant progression 
+contingency_table <- table(subset_data_by_id$fu_dist_prog, subset_data_by_id$ctDNA_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 4: Print the contingency table and Chi-squared test results: p-value = 1.4e-09
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    93    2
+  1     5    7
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 36.73, df = 1, p-value = 1.356e-09
+
+
### Distant sites 
+#distant site fu_dist_site_num #fu_dist_site_char  -- start just looking at the locations 
+
+# Summarize the distribution of fu_dist_site_char by unique participant_id
+dist_site_distribution <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    site = first(fu_dist_site_char),  # Get the site for each unique participant
+    .groups = "drop"
+  ) |>
+  count(site)  # Count the occurrences of each site
+
+# View the distribution of sites -- Bone in 7. Liver in 3. Lung in 2. 1 intra-abdominal 
+print(dist_site_distribution)
+
+
# A tibble: 8 × 2
+  site                  n
+  <chr>             <int>
+1 ""                   97
+2 "Bone"                5
+3 "Bone,Other"          1
+4 "Intra-abdominal"     1
+5 "Liver"               2
+6 "Liver,Bone"          1
+7 "Lung"                1
+8 "Pleura,Lung"         1
+
+
##### ANY Recurrence -- this includes either fu_locreg_prog or fu_dist_prog 
+
+# link by participant id 
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    ctDNA_ever = first(ctDNA_ever),        # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results -- total 14 relapses, 8 were ctDNA +, 6 were not ever ctDNA positive 
+print(contingency_table)
+
+
     
+      FALSE TRUE
+  No     92    1
+  Yes     6    8
+
+
print(chisq_test) 
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 42.642, df = 1, p-value = 6.573e-11
+
+
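The recurrence section plans a summary variable for either locoregional or distant recurrence, and `ever_relapsed` plays that role above, but how it was built is not shown in this section. A minimal sketch of one way to derive such a flag, on hypothetical toy data with the same column names and 0/1 coding as `fu_locreg_prog` and `fu_dist_prog`:

```r
# Hypothetical toy data: one row per participant, progression flags coded 0/1
toy <- data.frame(
  participant_id = c("A", "B", "C", "D", "E"),
  fu_locreg_prog = c(0L, 1L, 0L, NA, 0L),
  fu_dist_prog   = c(0L, 0L, 1L, NA, NA)
)

# "Yes" if either progression flag is 1, "No" if both are 0, otherwise NA
toy$ever_relapsed_derived <- ifelse(
  toy$fu_locreg_prog == 1 | toy$fu_dist_prog == 1, "Yes",
  ifelse(toy$fu_locreg_prog == 0 & toy$fu_dist_prog == 0, "No", NA)
)
toy$ever_relapsed_derived  # "No" "Yes" "Yes" NA NA
```

A participant with one flag missing and the other 0 is left as NA rather than counted as "No"; whether that matches how `ever_relapsed` was actually coded is an assumption worth checking.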
#### Relapse and DTCs  
+#using ever_relapsed and dtc_ever
+
+# link by participant id 
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    dtc = first(dtc_ever),        # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs dtc_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results
+print(contingency_table)
+
+
     
+       0  1
+  No  59 34
+  Yes 10  4
+
+
print(chisq_test) 
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.079932, df = 1, p-value = 0.7774
+
+
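Row proportions make the DTC comparison easier to read: DTC positivity was, if anything, less common among participants who relapsed (4 of 14) than among those who did not (34 of 93), consistent with the non-significant test. A sketch using the counts printed above:

```r
# Ever-relapsed (rows) vs DTC ever-positive (columns: dtc_ever 0/1);
# counts copied from the contingency table printed above
dtc_tab <- matrix(c(59, 10, 34, 4), nrow = 2,
                  dimnames = list(ever_relapsed = c("No", "Yes"),
                                  dtc_ever = c("0", "1")))

# Proportions within each row: share of DTC-negative / DTC-positive per relapse group
p_rows <- prop.table(dtc_tab, margin = 1)
round(p_rows, 3)  # No: 0.634 / 0.366; Yes: 0.714 / 0.286
```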
# Identify participants missing data in either `ever_relapsed` or `dtc_ever`
+missing_data <- subset_data_by_id |>
+  filter(is.na(ever_relapsed) | is.na(dtc))
+
+# Print the IDs of participants with missing data
+print(missing_data$participant_id) #two individuals do not have relapse data (17-021, 18-032, as we had seen above)
+
+
[1] "28115-17-021" "28115-18-032"
+
+
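The relapse tables above sum to 107 rather than 109 because `table()` excludes NA values by default (`useNA = "no"`), so these two participants drop out silently. Passing `useNA = "ifany"` makes them visible; a sketch on hypothetical toy vectors:

```r
# Hypothetical toy vectors: relapse status with one missing value
relapsed <- c("No", "Yes", NA, "No")
dtc      <- c(0, 1, 1, 0)

table(relapsed, dtc)                   # the NA participant is silently dropped
table(relapsed, dtc, useNA = "ifany")  # adds an <NA> row so missingness is visible
```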
### Look at ever_relapsed by ctDNA (same comparison as the ANY Recurrence block above)
+
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    ctDNA = first(ctDNA_ever),        # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results, p < 0.00001 
+print(contingency_table)
+
+
     
+      FALSE TRUE
+  No     92    1
+  Yes     6    8
+
+
print(chisq_test) 
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 42.642, df = 1, p-value = 6.573e-11
+
+
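The ever-relapsed by ctDNA table can also be summarised descriptively. Treating ever-positive ctDNA as if it were a test for relapse, the usual 2 x 2 metrics follow directly from the printed counts. This is a sketch: it ignores timing (ctDNA may have been detected only around or after clinical relapse), so the figures describe this cohort rather than predictive performance:

```r
# Ever-relapsed (rows) vs ever ctDNA-positive (columns); counts copied from
# the contingency table printed above
relapse_tab <- matrix(c(92, 6, 1, 8), nrow = 2,
                      dimnames = list(ever_relapsed = c("No", "Yes"),
                                      ctDNA_ever = c("FALSE", "TRUE")))

tp <- relapse_tab["Yes", "TRUE"]   # relapsed, ctDNA+
fn <- relapse_tab["Yes", "FALSE"]  # relapsed, never ctDNA+
fp <- relapse_tab["No", "TRUE"]    # no relapse, ctDNA+
tn <- relapse_tab["No", "FALSE"]   # no relapse, never ctDNA+

metrics <- c(sensitivity = tp / (tp + fn),  # 8/14
             specificity = tn / (tn + fp),  # 92/93
             ppv         = tp / (tp + fp),  # 8/9
             npv         = tn / (tn + fn))  # 92/98
round(metrics, 3)  # 0.571 0.989 0.889 0.939
```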
#### Survival: fu_surv (1 = alive, 0 = dead)
+
+table(subset_data$fu_surv)
+
+

+  0   1 
+  8 389 
+
+
surv <- subset_data |>
+  distinct(participant_id, fu_surv) |>
+  group_by(fu_surv) |>
+  summarise(count = n()) # Count the number of participants per survival status
+
+# View the summary table
+print(surv) # dead = 5, alive = 103, and 1 participant with NA (identified below)
+
+
# A tibble: 3 × 2
+  fu_surv count
+    <int> <int>
+1       0     5
+2       1   103
+3      NA     1
+
+
na_participant <- subset_data |>
+  filter(is.na(fu_surv)) |>
+  select(participant_id, fu_surv)
+
+# Print the result: 28115-17-021 has no follow-up data in REDCap; everyone else in the ctDNA cohort has some survival data
+print(na_participant)
+
+
# A tibble: 1 × 2
+  participant_id fu_surv
+  <chr>            <int>
+1 28115-17-021        NA
+
+
# Summarize data by unique participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    surv = first(fu_surv),          # Get survival status for each participant
+    ctDNA_ever = first(ctDNA_ever),  # Get ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of surv vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$surv, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results, p<0.00001
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0     1    4
+  1    98    5
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 26.099, df = 1, p-value = 3.243e-07
+
+
+
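The chi-squared test above compares alive versus dead at last follow-up and ignores how long each participant was followed. A time-to-event analysis would account for censoring. A minimal sketch using the `survival` package (a recommended package shipped with R) on hypothetical toy data; on the real data, follow-up times would presumably come from date columns such as `fu_date_to`, `fu_date_death`, or `censor_date`, which is an assumption not checked here:

```r
library(survival)

# Hypothetical toy data; fu_surv coded as in the document (1 = alive, 0 = dead)
toy_surv <- data.frame(
  months     = c(12, 30, 45, 60, 18, 24, 50, 70),
  fu_surv    = c(0, 1, 1, 1, 0, 0, 1, 1),
  ctDNA_ever = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)
toy_surv$died <- 1 - toy_surv$fu_surv  # Surv() expects 1 = event (death)

# Kaplan-Meier curves by ctDNA status
km_fit <- survfit(Surv(months, died) ~ ctDNA_ever, data = toy_surv)
summary(km_fit)

# Log-rank test comparing the two curves
lr_test <- survdiff(Surv(months, died) ~ ctDNA_ever, data = toy_surv)
lr_test
```

With only 5 deaths in the cohort, any such comparison would be underpowered, but it would at least use the follow-up time the study collected.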

DTC Demographics and Univariable tests of association: Next we will look at the univariable tests of association by DTC status.

+
+
############### DTC Demographics ########## 
+
+###### median age at diagnosis 
+
+#### Age at Dx (by DTC)
+
+names(subset_data) #to identify the variables I want to use 
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                     "ctDNA_ever"                      
+[389] "dtc_ever"                         "ever_relapsed"                   
+[391] "age_at_diag"                      "HR_status"                       
+[393] "histology_category"               "node_status"                     
+[395] "axillary_dissection"             
+
+
str(subset_data$diag_date_1) # check the class: already Date
+
+
 Date[1:398], format: "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" ...
+
+
str(subset_data$demo_dob) # check the class: already Date
+
+
 Date[1:398], format: "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" ...
+
+
d$diag_date_1 <- as.Date(d$diag_date_1, format = "%m/%d/%Y")  
+d$demo_dob <- as.Date(d$demo_dob, format = "%m/%d/%Y")  
+
+str(d$diag_date_1) #dates! 
+
+
 Date[1:579], format: "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" ...
+
+
str(d$demo_dob) #dates! 
+
+
 Date[1:579], format: "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" ...
+
+
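The `str()` output above shows that `diag_date_1` and `demo_dob` are already of class `Date`, so the `as.Date(..., format = "%m/%d/%Y")` calls should leave them unchanged. A small hypothetical helper (`to_date` is an illustrative name, not part of the original code) makes that intent explicit by parsing only non-Date input:

```r
# Hypothetical helper: parse month/day/year strings, but return Date input as-is
to_date <- function(x, format = "%m/%d/%Y") {
  if (inherits(x, "Date")) {
    return(x)  # already parsed; re-parsing is unnecessary
  }
  as.Date(as.character(x), format = format)
}

to_date("08/15/2013")           # "2013-08-15"
to_date(as.Date("2013-08-15"))  # returned unchanged
```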
### Apply the same conversion to subset_data (harmless if the columns are already Date, as str() showed above)
+subset_data$diag_date_1 <- as.Date(subset_data$diag_date_1, format = "%m/%d/%Y")  
+subset_data$demo_dob <- as.Date(subset_data$demo_dob, format = "%m/%d/%Y")  
+
+# Age at diagnosis in years: (diagnosis date - date of birth) in days / 365.25
+subset_data$age_at_diag <- as.numeric(difftime(subset_data$diag_date_1, subset_data$demo_dob, units = "days")) / 365.25
+head(subset_data$age_at_diag)
+
+
[1] 55.89870 55.89870 55.89870 55.89870 55.89870 49.25667
+
+
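As a check on the age formula, the first participant's dates from the `str()` output above reproduce the first value printed by `head()`:

```r
# First participant's dates, taken from the str() output above
diag_date <- as.Date("2013-08-15")
dob       <- as.Date("1957-09-21")

days <- as.numeric(difftime(diag_date, dob, units = "days"))
days            # 20417
days / 365.25   # 55.8987, matching the first value printed by head()
```

Dividing by 365.25 averages over leap years and gives fractional ages; a whole-year age would need `floor()`.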
summary(subset_data$age_at_diag) #median 48.75 
+
+
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
+  27.34   41.73   48.75   49.35   57.63   68.94 
+
+
age_summary <- subset_data |>
+  group_by(dtc_ever) |>
+  summarise(
+    mean_age = mean(age_at_diag, na.rm = TRUE),  # Calculate mean age
+    median_age = median(age_at_diag, na.rm = TRUE),  # Calculate median age
+    sd_age = sd(age_at_diag, na.rm = TRUE),  # Calculate standard deviation of age
+    n = n()  # Number of rows (samples) in each group; a participant can contribute several rows
+  )
+
+print(age_summary) # DTC-positive rows are slightly younger at diagnosis (median 47.3 vs 51.8)
+
+
# A tibble: 2 × 5
+  dtc_ever mean_age median_age sd_age     n
+     <dbl>    <dbl>      <dbl>  <dbl> <int>
+1        0     50.6       51.8   9.48   149
+2        1     48.6       47.3   9.75   249
+
+
# Wilcoxon rank-sum test comparing age at diagnosis between dtc_ever positive and negative groups (run on row-level data, so participants with several rows are counted more than once)
+wilcox_test_result <- wilcox.test(age_at_diag ~ dtc_ever, data = subset_data)
+
+# Print the result
+print(wilcox_test_result)
+
+

+    Wilcoxon rank sum test with continuity correction
+
+data:  age_at_diag by dtc_ever
+W = 20838, p-value = 0.03946
+alternative hypothesis: true location shift is not equal to 0
+
+
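The age summary and Wilcoxon test above run on `subset_data`, which has one row per sample (149 + 249 = 398 rows) rather than one per participant (109). Repeated rows make the test behave as if it had more independent observations than it does. A sketch of collapsing to one row per participant first, on hypothetical toy data:

```r
# Hypothetical toy data: several sample rows per participant
toy_age <- data.frame(
  participant_id = c("P1", "P1", "P1", "P2", "P3", "P3", "P4", "P5", "P6"),
  age_at_diag    = c(55.9, 55.9, 55.9, 49.3, 61.0, 61.0, 42.1, 47.5, 58.2),
  dtc_ever       = c(0, 0, 0, 1, 0, 0, 1, 1, 0)
)

# Keep one row per participant (age and DTC status are constant within a participant)
per_participant <- toy_age[!duplicated(toy_age$participant_id), ]
nrow(per_participant)  # 6 participants, not 9 rows

age_test <- wilcox.test(age_at_diag ~ dtc_ever, data = per_participant)
age_test
```

With dplyr, `subset_data |> distinct(participant_id, .keep_all = TRUE)` would do the same collapsing before the summary and the test.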
# Range of age at diagnosis by DTC status
+age_summary <- subset_data |>
+  group_by(dtc_ever) |>
+  summarise(
+    min_age = min(age_at_diag, na.rm = TRUE),  # Minimum age
+    max_age = max(age_at_diag, na.rm = TRUE),  # Maximum age
+    .groups = "drop"
+  )
+
+# View the summary table
+print(age_summary)
+
+
# A tibble: 2 × 3
+  dtc_ever min_age max_age
+     <dbl>   <dbl>   <dbl>
+1        0    27.3    68.9
+2        1    30.7    67.7
+
+
+
+
##### Race: demo_race_final
+
+# Get the count of unique participant_ids for each category in demo_race_final
+race_counts_unique_percent <- subset_data |>
+  group_by(demo_race_final) |>
+  summarise(unique_participants = n_distinct(participant_id)) |>
+  mutate(percent = unique_participants / sum(unique_participants) * 100)
+
+# View the result
+print(race_counts_unique_percent)
+
+
# A tibble: 3 × 3
+  demo_race_final unique_participants percent
+            <int>               <int>   <dbl>
+1               1                   9   8.26 
+2               3                   1   0.917
+3               5                  99  90.8  
+
+
# Count distinct participant_ids by dtc_ever and demo_race_final
+count_distinct_participants <- subset_data |>
+  group_by(demo_race_final, dtc_ever) |>
+  summarise(distinct_participant_count = n_distinct(participant_id), .groups = "drop")
+
+# Print the result
+count_distinct_participants
+
+
# A tibble: 5 × 3
+  demo_race_final dtc_ever distinct_participant_count
+            <int>    <dbl>                      <int>
+1               1        0                          5
+2               1        1                          4
+3               3        0                          1
+4               5        0                         64
+5               5        1                         35
+
+
library(dplyr)
+
+# Step 1: Summarize by unique participant_id
+summarized_data <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    dtc_ever = first(dtc_ever),   # Taking the first observed value of dtc_ever for each participant
+    demo_race_final = first(demo_race_final),  # Taking the first observed value of demo_race_final for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table <- table(summarized_data$dtc_ever, summarized_data$demo_race_final)
+contingency_table
+
+
   
+     1  3  5
+  0  5  1 64
+  1  4  0 35
+
+
# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 4: Print the result (p = 0.65; several expected counts are small, so interpret with caution)
+chisq_test
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 0.85903, df = 2, p-value = 0.6508
+
+
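The chi-squared warning above is caused by small expected counts. This base R check uses the race × DTC counts copied from the contingency table above. It computes the expected count in each cell under independence. A common rule of thumb (Cochran's) is that no more than 20% of cells should have expected counts below 5, and none should fall below 1.

```r
# Race x DTC counts copied from the contingency table above
# (rows: dtc_ever 0/1; columns: demo_race_final codes 1, 3, 5)
race_tab <- matrix(c(5, 4, 1, 0, 64, 35), nrow = 2,
                   dimnames = list(dtc_ever = c("0", "1"),
                                   race = c("1", "3", "5")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(race_tab), colSums(race_tab)) / sum(race_tab)
round(expected, 2)

# Share of cells with an expected count below 5
mean(expected < 5)
```

Here half of the cells fall below 5, and the race code 3 column holds a single participant. The p-value of 0.65 is therefore best read as descriptive. Common alternatives are Fisher's exact test (`fisher.test()`) or collapsing sparse categories.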
#receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') 
+
+# Breakdown of final_receptor_group by unique participant_id
+receptor_status_by_participant <- subset_data |>
+  group_by(participant_id) |>
+  summarise(final_receptor_group = first(final_receptor_group),  # Or choose the most frequent group if needed
+            .groups = "drop")
+
+# View the result
+table(receptor_status_by_participant$final_receptor_group)
+
+

+ 1  2  3  4 
+45 52  8  4 
+
+
# Summarizing data by participant_id, final_receptor_group, and dtc_ever
+receptor_dtc_status <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    final_receptor_group = first(final_receptor_group),  # Or the most frequent if needed
+    dtc_ever = first(dtc_ever),  # Taking the first observed value for dtc_ever
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table_receptor <- table(receptor_dtc_status$final_receptor_group, receptor_dtc_status$dtc_ever)
+contingency_table_receptor
+
+
   
+     0  1
+  1 25 20
+  2 37 15
+  3  4  4
+  4  4  0
+
+
# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table_receptor)
+
+
Warning in chisq.test(contingency_table_receptor): Chi-squared approximation
+may be incorrect
+
+
# Step 4: Print the result (p = 0.14). Interesting: DTC positivity appears more evenly distributed in TNBC than ctDNA positivity was
+chisq_test
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table_receptor
+X-squared = 5.4909, df = 3, p-value = 0.1392
+
+
# Is DTC status associated with TNBC vs non-TNBC (or with HR+ vs non-HR+)?
+# Start with TNBC (using QDC)
+# Inclusion criterion inc_dx_crit_list___1 = TNBC
+
+
+TNBC_dtc_status <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    inc_dx_crit_list___1 = first(inc_dx_crit_list___1),  # Or the most frequent if needed
+    dtc_ever = first(dtc_ever),  # Taking the first observed value for dtc_ever
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table_TNBC <- table(TNBC_dtc_status$inc_dx_crit_list___1, TNBC_dtc_status$dtc_ever)
+contingency_table_TNBC
+
+
   
+     0  1
+  0 45 19
+  1 25 20
+
+
# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table_TNBC)
+
+# Step 4: p-val is 0.17 
+chisq_test
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table_TNBC
+X-squared = 1.903, df = 1, p-value = 0.1677
+
+
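The chi-squared test gives a p-value but no effect size. As a sketch, `fisher.test()` can be run on the same 2×2 counts, copied from the TNBC table above. It also reports an odds ratio with a 95% confidence interval. Its estimate is the conditional maximum-likelihood odds ratio, so it differs slightly from the simple cross-product ratio computed at the end.

```r
# TNBC (inc_dx_crit_list___1) x DTC counts copied from the table above
# rows: TNBC indicator 0/1; columns: dtc_ever 0/1
tnbc_tab <- matrix(c(45, 25, 19, 20), nrow = 2,
                   dimnames = list(tnbc = c("0", "1"), dtc_ever = c("0", "1")))

# Fisher's exact test: conditional MLE odds ratio (TNBC vs non-TNBC) and 95% CI
ft <- fisher.test(tnbc_tab)
ft$estimate
ft$conf.int
ft$p.value

# Simple sample odds ratio for comparison: (a * d) / (b * c)
or_simple <- (tnbc_tab[1, 1] * tnbc_tab[2, 2]) / (tnbc_tab[1, 2] * tnbc_tab[2, 1])
or_simple
```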
# HR+ vs non-HR+ (hormone receptor status)
+#first create HR_status variable 
+subset_data <- subset_data |> 
+  mutate(HR_status = case_when(
+    final_receptor_group %in% c(2, 3) ~ "HR+",
+    final_receptor_group %in% c(1, 4) ~ "Non-HR+",
+    TRUE ~ NA_character_  # In case there are missing or other unexpected values
+  ))
+
+# View the new HR_status variable
+table(subset_data$HR_status)
+
+

+    HR+ Non-HR+ 
+    225     173 
+
+
HR_status_by_participant <- subset_data |>
+  group_by(participant_id) |>
+  summarise(HR_status = first(HR_status),  # First recorded value; note base R mode() gives the storage mode, not the most frequent value
+            .groups = "drop")
+
+# View the result 
+table(HR_status_by_participant$HR_status) #aligns with final receptor status (ultimately 60 HR+, 49 HR-)
+
+

+    HR+ Non-HR+ 
+     60      49 
+
+
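Base R's `mode()` returns an object's storage mode (for example `"character"`), not its most frequent value. If a participant's rows ever disagree, the helper below could replace `first()`. `most_frequent()` is hypothetical: it is a sketch, not a function from dplyr or base R.

```r
# Base R's mode() reports the storage mode of an object, not its most common value
mode(c("HR+", "HR+", "Non-HR+"))

# Hypothetical helper: most frequent non-missing value (ties go to the value seen first)
most_frequent <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

most_frequent(c("HR+", "Non-HR+", "HR+", NA))
```

Inside the pipeline, this would read `summarise(HR_status = most_frequent(HR_status), .groups = "drop")`.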
# Summarize dtc_ever status and HR_status for each unique participant_id
+summary_data <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    HR_status = first(HR_status),  # Get the HR_status for the participant
+    dtc_status = first(dtc_ever),  # Get the dtc_ever status for the participant
+    .groups = "drop"
+  )
+
+contingency_table_HR <- table(summary_data$dtc_status, summary_data$HR_status)
+contingency_table_HR
+
+
   
+    HR+ Non-HR+
+  0  41      29
+  1  19      20
+
+
chisq_test <- chisq.test(contingency_table_HR)
+
+# Print chi-squared test results (p = 0.43)
+chisq_test
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table_HR
+X-squared = 0.62484, df = 1, p-value = 0.4293
+
+
###tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not reported  
+
+# Exclude rows where final_tumor_grade is 3 (not reported), want to exclude the 2 pts who had grade not reported 
+summary_data <- subset_data |>
+  filter(final_tumor_grade != 3) |>  # Exclude grade == 3
+  group_by(participant_id) |>
+  summarise(
+    grade = first(final_tumor_grade),  # Get the final_tumor_grade for each participant
+    dtc_ever = first(dtc_ever),    # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of grade vs dtc_ever
+contingency_table <- table(summary_data$grade, summary_data$dtc_ever)
+
+# View the contingency table
+print(contingency_table)
+
+
   
+     0  1
+  0 46 33
+  1 18  4
+  2  4  2
+
+
# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# View the Chi-squared test result -- p-value 0.12 NOT SIG for DTCs 
+print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 4.1608, df = 2, p-value = 0.1249
+
+
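`final_tumor_grade` stores grade 3 as 0, which makes the contingency table easy to misread. One option is to recode the codes into an ordered factor with readable labels and treat the not-reported code 3 as missing. The sketch below uses toy values and follows the coding noted at the top of this section. `grade_codes` and `grade_label` are hypothetical names.

```r
# Coding from the note above: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = not reported
grade_codes <- c(0, 1, 2, 3, 0)   # toy values for illustration

grade_label <- factor(
  ifelse(grade_codes == 3, NA, grade_codes),   # not reported -> missing
  levels = c(1, 2, 0),                         # order from low to high grade
  labels = c("Grade 1", "Grade 2", "Grade 3"),
  ordered = TRUE
)
grade_label
table(grade_label, useNA = "ifany")
```

In the pipeline, the same `factor()` call could go inside `mutate()`. Note that `filter(final_tumor_grade != 3)` above also drops rows where the grade is `NA`, because `filter()` discards rows whose condition evaluates to `NA`.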
###### Histology: participants can have combinations of histology codes (codes 1-16 appear in this cohort)
+table(subset_data$participant_id, subset_data$final_histology)
+
+
              
+                1 1,13,14,3 1,3 11,3 12,3 13,3 13,3,5 14 14,15 14,15,3 14,3
+  28115-16-001  0         0   0    0    0    0      0  0     0       0    5
+  28115-16-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-010  0         0   0    0    1    0      0  0     0       0    0
+  28115-16-014  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-015  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-017  0         0   0    0    0    0      0  3     0       0    0
+  28115-16-020  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-021  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-023  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-026  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-029  0         0   0    0    0    0      0  2     0       0    0
+  28115-16-033  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-035  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-001  0         0   8    0    0    0      0  0     0       0    0
+  28115-17-002  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-006  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-008  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-009  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-010  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-012  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-016  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-017  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-019  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-021  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-022  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-023  0         0   2    0    0    0      0  0     0       0    0
+  28115-17-024  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-030  0         0   3    0    0    0      0  0     0       0    0
+  28115-17-031  0         0   0    0    0    0      0  5     0       0    0
+  28115-17-032  0         0   0   10    0    0      0  0     0       0    0
+  28115-17-036  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-039  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-040  0         0   4    0    0    0      0  0     0       0    0
+  28115-17-045  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-046  0         0   0    0    0   10      0  0     0       0    0
+  28115-17-047  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-048  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-050  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-051  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-052  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-001  0         0   0    0    0    0      0  0     7       0    0
+  28115-18-002  0         0   2    0    0    0      0  0     0       0    0
+  28115-18-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-006  0         0   0    0    0    0      0  1     0       0    0
+  28115-18-009  0         0   0    0    0    0      0  1     0       0    0
+  28115-18-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-014  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-015  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-017  0         0   0    0    0    0      0  1     0       0    0
+  28115-18-020  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-021  0         0   0    0    0    0      0  8     0       0    0
+  28115-18-022  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-023  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-024  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-028  1         0   0    0    0    0      0  0     0       0    0
+  28115-18-029  0         0   0    0    0    0      0  4     0       0    0
+  28115-18-030  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-031  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-032  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-034  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-001  0         0   0    0    0    0      0  1     0       0    0
+  28115-19-002  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-003  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-005  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-006  0         0   0    0    0    8      0  0     0       0    0
+  28115-19-007  0         0   0    0    0    0      0  0     0       0    5
+  28115-19-009  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-012  0         3   0    0    0    0      0  0     0       0    0
+  28115-19-014  0         0   0    0    0    0      0  0     0       0    2
+  28115-19-016  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-017  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-019  0         0   0    0    0    0      0  0     0       0    3
+  28115-19-020  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-021  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-022  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-028  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-007  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-009  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-010  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-001  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-002  0         0   0    0    0    0      0  0     0       0    4
+  28115-21-003  0         0   0    0    0    0      0  2     0       0    0
+  28115-21-006  0         0   0    0    0    2      0  0     0       0    0
+  28115-21-007  0         0   0    0    0    0      0  3     0       0    0
+  28115-21-009  0         0   0    0    0    0      3  0     0       0    0
+  28115-21-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-013  0         0   0    0    0    0      0  0     0       0    4
+  28115-21-014  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-015  0         0   0    0    0    0      0  0     0       2    0
+  28115-21-016  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-019  0         0   0    0    0    0      0  1     0       0    0
+  28115-21-020  0         0   3    0    0    0      0  0     0       0    0
+  28115-21-021  0         0   0    0    0    0      0  3     0       0    0
+  28115-21-022  0         0   0    0    0    1      0  0     0       0    0
+  28115-21-024  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-026  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-028  0         0   0    0    0    0      0  0     0       0    1
+              
+               16,3  3 3,5 3,7  5
+  28115-16-001    0  0   0   0  0
+  28115-16-004    0  1   0   0  0
+  28115-16-010    0  0   0   0  0
+  28115-16-014    0  1   0   0  0
+  28115-16-015    0 12   0   0  0
+  28115-16-017    0  0   0   0  0
+  28115-16-020    0  0   1   0  0
+  28115-16-021    0  9   0   0  0
+  28115-16-023    0  1   0   0  0
+  28115-16-025    0  1   0   0  0
+  28115-16-026    0 10   0   0  0
+  28115-16-027    0  3   0   0  0
+  28115-16-029    0  0   0   0  0
+  28115-16-033    0  2   0   0  0
+  28115-16-035    0  1   0   0  0
+  28115-17-001    0  0   0   0  0
+  28115-17-002    0  9   0   0  0
+  28115-17-006    0  1   0   0  0
+  28115-17-008    0  9   0   0  0
+  28115-17-009    0  1   0   0  0
+  28115-17-010    0  5   0   0  0
+  28115-17-011    0  9   0   0  0
+  28115-17-012    0 10   0   0  0
+  28115-17-016    0  4   0   0  0
+  28115-17-017    0  5   0   0  0
+  28115-17-019    0  9   0   0  0
+  28115-17-021    0  1   0   0  0
+  28115-17-022    0  1   0   0  0
+  28115-17-023    0  0   0   0  0
+  28115-17-024    0  0   0   4  0
+  28115-17-025    0  2   0   0  0
+  28115-17-027    0  8   0   0  0
+  28115-17-030    0  0   0   0  0
+  28115-17-031    0  0   0   0  0
+  28115-17-032    0  0   0   0  0
+  28115-17-036    0  7   0   0  0
+  28115-17-039    0  2   0   0  0
+  28115-17-040    0  0   0   0  0
+  28115-17-045    0  0   1   0  0
+  28115-17-046    0  0   0   0  0
+  28115-17-047    0  3   0   0  0
+  28115-17-048    0  2   0   0  0
+  28115-17-050    0  3   0   0  0
+  28115-17-051    0  9   0   0  0
+  28115-17-052    0  0   0   0  3
+  28115-18-001    0  0   0   0  0
+  28115-18-002    0  0   0   0  0
+  28115-18-004    0  2   0   0  0
+  28115-18-006    0  0   0   0  0
+  28115-18-009    0  0   0   0  0
+  28115-18-011    0  5   0   0  0
+  28115-18-014    0  2   0   0  0
+  28115-18-015    0  5   0   0  0
+  28115-18-017    0  0   0   0  0
+  28115-18-020    0  8   0   0  0
+  28115-18-021    0  0   0   0  0
+  28115-18-022    0  0   0  12  0
+  28115-18-023    0  3   0   0  0
+  28115-18-024    0  0   0   2  0
+  28115-18-027    0  1   0   0  0
+  28115-18-028    0  0   0   0  0
+  28115-18-029    0  0   0   0  0
+  28115-18-030    0  2   0   0  0
+  28115-18-031    0  3   0   0  0
+  28115-18-032    0  6   0   0  0
+  28115-18-034    0  1   0   0  0
+  28115-19-001    0  0   0   0  0
+  28115-19-002    0  2   0   0  0
+  28115-19-003    0  5   0   0  0
+  28115-19-004    0  1   0   0  0
+  28115-19-005    0  3   0   0  0
+  28115-19-006    0  0   0   0  0
+  28115-19-007    0  0   0   0  0
+  28115-19-009    0  6   0   0  0
+  28115-19-011    0  1   0   0  0
+  28115-19-012    0  0   0   0  0
+  28115-19-014    0  0   0   0  0
+  28115-19-016    0  2   0   0  0
+  28115-19-017    0  2   0   0  0
+  28115-19-019    0  0   0   0  0
+  28115-19-020    0  2   0   0  0
+  28115-19-021    0  4   0   0  0
+  28115-19-022    0  2   0   0  0
+  28115-19-025    0  6   0   0  0
+  28115-19-028    0  2   0   0  0
+  28115-20-004    0  2   0   0  0
+  28115-20-007    0  2   0   0  0
+  28115-20-009    0  4   0   0  0
+  28115-20-010    0  1   0   0  0
+  28115-21-001    0  1   0   0  0
+  28115-21-002    0  0   0   0  0
+  28115-21-003    0  0   0   0  0
+  28115-21-006    0  0   0   0  0
+  28115-21-007    0  0   0   0  0
+  28115-21-009    0  0   0   0  0
+  28115-21-011    0  1   0   0  0
+  28115-21-013    0  0   0   0  0
+  28115-21-014    0  2   0   0  0
+  28115-21-015    0  0   0   0  0
+  28115-21-016    0  8   0   0  0
+  28115-21-019    0  0   0   0  0
+  28115-21-020    0  0   0   0  0
+  28115-21-021    0  0   0   0  0
+  28115-21-022    0  0   0   0  0
+  28115-21-024    0  2   0   0  0
+  28115-21-025    0  2   0   0  0
+  28115-21-026    2  0   0   0  0
+  28115-21-027    0  2   0   0  0
+  28115-21-028    0  0   0   0  0
+
+
histology_summary <- subset_data |>
+  distinct(participant_id, final_histology) |>  # Get unique participant-histology combinations
+  group_by(final_histology) |>  # Group by histology type
+  summarise(count = n())  # Count the number of participants per histology type
+
+# View the summary table
+print(histology_summary) #the different combinations add up to 109. There are a decent number who had both ductal and lobular histology 
+
+
# A tibble: 16 × 2
+   final_histology count
+   <chr>           <int>
+ 1 1                   1
+ 2 1,13,14,3           1
+ 3 1,3                 6
+ 4 11,3                1
+ 5 12,3                1
+ 6 13,3                4
+ 7 13,3,5              1
+ 8 14                 13
+ 9 14,15               1
+10 14,15,3             1
+11 14,3                7
+12 16,3                1
+13 3                  65
+14 3,5                 2
+15 3,7                 3
+16 5                   1
+
+
# Create a histology category: ductal (code 3), lobular (code 14), both, or other.
+# Match whole comma-separated codes so that code 13 is not mistaken for ductal code 3.
+subset_data <- subset_data |>
+  mutate(histology_category = case_when(
+    grepl("(^|,)3(,|$)", final_histology) & grepl("(^|,)14(,|$)", final_histology) ~ "Both Ductal and Lobular",
+    grepl("(^|,)3(,|$)", final_histology) ~ "Ductal",    # code 3
+    grepl("(^|,)14(,|$)", final_histology) ~ "Lobular",  # code 14
+    TRUE ~ "Other"  # Any other combination
+  ))
+
+# Count the number of participants in each histology category
+histology_counts <- subset_data |>
+  group_by(histology_category) |>
+  summarise(count = n_distinct(participant_id))  # Count distinct participants
+
+# View the counts -- adds up to 109! 
+print(histology_counts)
+
+
# A tibble: 4 × 2
+  histology_category      count
+  <chr>                   <int>
+1 Both Ductal and Lobular     9
+2 Ductal                     84
+3 Lobular                    14
+4 Other                       2
+
+
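A bare substring pattern such as `grepl("3", x)` also matches the 3 inside code 13. In this cohort, every histology string containing 13 also contains a standalone 3, so the category counts are unaffected. A value such as "13" or "1,13" would still be misclassified as ductal. The hypothetical helper `has_code()` below matches whole comma-separated codes instead.

```r
codes <- c("3", "13", "1,13", "14,3", "13,3", "14")

# Substring match: "3" also matches inside "13"
grepl("3", codes)

# Whole-code match: the code must be bounded by the string start/end or a comma
has_code <- function(x, code) grepl(paste0("(^|,)", code, "(,|$)"), x)
has_code(codes, "3")
has_code(codes, "14")
```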
#contingency table 
+library(tidyr)
+contingency_table <- subset_data |>
+  distinct(participant_id, histology_category, dtc_ever) |>  # Ensure each patient is counted once
+  count(histology_category, dtc_ever) |>
+  pivot_wider(names_from = dtc_ever, values_from = n, values_fill = list(n = 0))  # Pivot the table to get dtc_ever as columns
+
+# 3. Perform the Chi-squared test of independence
+chisq_test <- chisq.test(contingency_table[,-1])  # Remove the histology_category column for the test
+
+
Warning in chisq.test(contingency_table[, -1]): Chi-squared approximation may
+be incorrect
+
+
# 4. Print the contingency table
+print(contingency_table) 
+
+
# A tibble: 4 × 3
+  histology_category        `0`   `1`
+  <chr>                   <int> <int>
+1 Both Ductal and Lobular     9     0
+2 Ductal                     48    36
+3 Lobular                    11     3
+4 Other                       2     0
+
+
# 5. Print the result of the Chi-squared test (p = 0.027): DTC positivity is more common in ductal tumors than in the other histology groups
+print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table[, -1]
+X-squared = 9.2145, df = 3, p-value = 0.02657
+
+
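The histology result (p = 0.027) carries the same small-cell warning, and the "Both" and "Other" rows contain zeros. As a sketch, the counts copied from the table above can be checked for small expected counts. They can then be re-tested with Fisher's exact test, which does not rely on the large-sample chi-squared approximation. For larger tables, `fisher.test()` may need a bigger `workspace` or `simulate.p.value = TRUE`.

```r
# Histology x DTC counts copied from the table above (participant level)
hist_tab <- matrix(c(9, 48, 11, 2, 0, 36, 3, 0), ncol = 2,
                   dimnames = list(histology = c("Both Ductal and Lobular", "Ductal",
                                                 "Lobular", "Other"),
                                   dtc_ever = c("0", "1")))

# Expected counts under independence; values below 5 trigger the chi-squared warning
round(suppressWarnings(chisq.test(hist_tab))$expected, 2)

# Fisher's exact test does not depend on the large-sample approximation
fisher.test(hist_tab)$p.value
```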
#### Nodal (N) stage -- come back to this N stage analysis
+
+table(subset_data$participant_id, subset_data$final_n_stage) # values are 0, 1, 2, 3 (N0, N1, N2, N3)
+
+
              
+                0  1  2  3
+  28115-16-001  0  0  0  5
+  28115-16-004  1  0  0  0
+  28115-16-010  0  0  0  1
+  28115-16-014  1  0  0  0
+  28115-16-015 12  0  0  0
+  28115-16-017  0  0  3  0
+  28115-16-020  0  0  1  0
+  28115-16-021  0  0  9  0
+  28115-16-023  1  0  0  0
+  28115-16-025  1  0  0  0
+  28115-16-026 10  0  0  0
+  28115-16-027  0  3  0  0
+  28115-16-029  2  0  0  0
+  28115-16-033  0  2  0  0
+  28115-16-035  1  0  0  0
+  28115-17-001  0  8  0  0
+  28115-17-002  9  0  0  0
+  28115-17-006  0  1  0  0
+  28115-17-008  9  0  0  0
+  28115-17-009  1  0  0  0
+  28115-17-010  5  0  0  0
+  28115-17-011  0  0  0  9
+  28115-17-012  0  0  0 10
+  28115-17-016  0  4  0  0
+  28115-17-017  0  5  0  0
+  28115-17-019  9  0  0  0
+  28115-17-021  1  0  0  0
+  28115-17-022  1  0  0  0
+  28115-17-023  0  0  2  0
+  28115-17-024  4  0  0  0
+  28115-17-025  2  0  0  0
+  28115-17-027  0  8  0  0
+  28115-17-030  3  0  0  0
+  28115-17-031  5  0  0  0
+  28115-17-032  0  0 10  0
+  28115-17-036  7  0  0  0
+  28115-17-039  2  0  0  0
+  28115-17-040  0  0  4  0
+  28115-17-045  1  0  0  0
+  28115-17-046 10  0  0  0
+  28115-17-047  0  3  0  0
+  28115-17-048  0  0  2  0
+  28115-17-050  3  0  0  0
+  28115-17-051  9  0  0  0
+  28115-17-052  3  0  0  0
+  28115-18-001  0  0  7  0
+  28115-18-002  0  2  0  0
+  28115-18-004  0  0  2  0
+  28115-18-006  0  1  0  0
+  28115-18-009  1  0  0  0
+  28115-18-011  0  5  0  0
+  28115-18-014  0  2  0  0
+  28115-18-015  5  0  0  0
+  28115-18-017  0  1  0  0
+  28115-18-020  8  0  0  0
+  28115-18-021  0  8  0  0
+  28115-18-022 12  0  0  0
+  28115-18-023  0  3  0  0
+  28115-18-024  0  2  0  0
+  28115-18-027  0  1  0  0
+  28115-18-028  1  0  0  0
+  28115-18-029  0  4  0  0
+  28115-18-030  2  0  0  0
+  28115-18-031  0  3  0  0
+  28115-18-032  0  6  0  0
+  28115-18-034  1  0  0  0
+  28115-19-001  0  0  0  1
+  28115-19-002  0  2  0  0
+  28115-19-003  0  5  0  0
+  28115-19-004  0  1  0  0
+  28115-19-005  3  0  0  0
+  28115-19-006  0  8  0  0
+  28115-19-007  0  5  0  0
+  28115-19-009  0  0  0  6
+  28115-19-011  0  1  0  0
+  28115-19-012  0  3  0  0
+  28115-19-014  0  0  0  2
+  28115-19-016  2  0  0  0
+  28115-19-017  2  0  0  0
+  28115-19-019  0  3  0  0
+  28115-19-020  2  0  0  0
+  28115-19-021  0  4  0  0
+  28115-19-022  0  2  0  0
+  28115-19-025  0  6  0  0
+  28115-19-028  2  0  0  0
+  28115-20-004  2  0  0  0
+  28115-20-007  0  0  2  0
+  28115-20-009  4  0  0  0
+  28115-20-010  0  1  0  0
+  28115-21-001  0  1  0  0
+  28115-21-002  0  4  0  0
+  28115-21-003  0  0  2  0
+  28115-21-006  0  2  0  0
+  28115-21-007  0  0  3  0
+  28115-21-009  0  0  3  0
+  28115-21-011  1  0  0  0
+  28115-21-013  0  4  0  0
+  28115-21-014  0  2  0  0
+  28115-21-015  0  2  0  0
+  28115-21-016  8  0  0  0
+  28115-21-019  0  1  0  0
+  28115-21-020  0  3  0  0
+  28115-21-021  0  3  0  0
+  28115-21-022  1  0  0  0
+  28115-21-024  2  0  0  0
+  28115-21-025  0  2  0  0
+  28115-21-026  0  2  0  0
+  28115-21-027  2  0  0  0
+  28115-21-028  1  0  0  0
+
+
nodal_summary <- subset_data |>
+  distinct(participant_id, final_n_stage) |>  # Get unique participant-stage combinations
+  group_by(final_n_stage) |>  # Group by stage
+  summarise(count = n())  # Count the number of participants per N stage
+
+# View the summary table: adds up to 109; 46 are N0 and 63 are node-positive (N1-N3)
+print(nodal_summary)
+
+
# A tibble: 4 × 2
+  final_n_stage count
+          <int> <int>
+1             0    46
+2             1    43
+3             2    13
+4             3     7
+
+
subset_data_by_id <- subset_data |>
+  filter(final_n_stage %in% c(0, 1, 2, 3)) |>  # Include only relevant nodal stages
+  group_by(participant_id) |>
+  summarise(
+    nodal_status = first(final_n_stage),  # Use final_n_stage as nodal_status for each participant
+    dtc_ever = first(dtc_ever),       # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 3: Create a contingency table of nodal_status vs dtc_ever
+contingency_table <- table(subset_data_by_id$nodal_status, subset_data_by_id$dtc_ever)
+
+# Step 4: Check if any cells in the contingency table have zero counts, which could affect test validity
+print(contingency_table)
+
+
   
+     0  1
+  0 24 22
+  1 32 11
+  2 10  3
+  3  4  3
+
+
# Step 5: Perform Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 6: Print the Chi-squared test result (p = 0.12, not significant)
+print(chisq_test) 
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 5.9169, df = 3, p-value = 0.1157
+
+
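The 4×2 test above treats N0-N3 as unordered categories, and the N2 and N3 rows are sparse. Because N stage is ordered, one option is the chi-squared test for trend in proportions (`prop.trend.test()` in base R's stats package). The sketch below uses the counts copied from the table above. The equally spaced scores 0-3 are an assumption, not part of the original analysis.

```r
# DTC-positive participants and totals by N stage, copied from the table above
dtc_pos <- c(N0 = 22, N1 = 11, N2 = 3, N3 = 3)
n_total <- c(N0 = 46, N1 = 43, N2 = 13, N3 = 7)

round(dtc_pos / n_total, 3)   # proportion DTC-positive at each N stage

# Chi-squared test for a linear trend in proportions across ordered N stage
trend <- prop.trend.test(dtc_pos, n_total, score = 0:3)
trend
```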
#### Create a node-negative vs node-positive variable from the summary N stage variable
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    node_status = ifelse(first(final_n_stage) == 0, "Node Negative", "Node Positive"),  # Node negative if 0, positive otherwise
+    dtc_ever = first(dtc_ever),  # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of node_status vs dtc_ever
+contingency_table <- table(subset_data_by_id$node_status, subset_data_by_id$dtc_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Step 4: Print the contingency table and Chi-squared test results
+print(contingency_table)
+
+
               
+                 0  1
+  Node Negative 24 22
+  Node Positive 46 17
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 4.1601, df = 1, p-value = 0.04139
+
+
####### EXTRA CODE / CONFIRMATION: results differ slightly; not used in our analysis
+# Cross-check with the nodal-positivity indicator (inc_dx_crit_list___2).
+# One participant is node-negative by the summary variable but node-positive by the indicator.
+## Should double-check this at some point
+node_pos <- subset_data |>
+  distinct(participant_id, inc_dx_crit_list___2) |>  # Get unique participant-indicator combinations
+  group_by(inc_dx_crit_list___2) |>  # Group by indicator value
+  summarise(count = n())  # Count the number of participants per indicator value
+
+print(node_pos)
+
+
# A tibble: 2 × 2
+  inc_dx_crit_list___2 count
+                 <int> <int>
+1                    0    45
+2                    1    64
+
+
contingency_table <- subset_data |>
+  distinct(participant_id, inc_dx_crit_list___2, dtc_ever) |>  # Ensure unique participants
+  count(inc_dx_crit_list___2, dtc_ever) |>  # Count occurrences
+  spread(key = dtc_ever, value = n, fill = 0)  # Spread data into a matrix
+
+# View the contingency table
+print(contingency_table)
+
+
# A tibble: 2 × 3
+  inc_dx_crit_list___2   `0`   `1`
+                 <int> <dbl> <dbl>
+1                    0    25    20
+2                    1    45    19
+
+
# Perform the Chi-square test (p = 0.17)
+chi_square_result <- chisq.test(contingency_table[, -1])  # Exclude the first column with the levels
+print(chi_square_result)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table[, -1]
+X-squared = 1.903, df = 1, p-value = 0.1677
+
+
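This table's counts match the TNBC table earlier in this section with the rows swapped, which is why X-squared and the p-value are identical. That may mean `inc_dx_crit_list___2` is the complement of the TNBC indicator rather than a nodal-status flag, which is worth confirming against the data dictionary. The quick base R check below uses the copied counts.

```r
# Counts copied from the two tables above (rows: indicator 0/1; columns: dtc_ever 0/1)
tnbc_tab <- matrix(c(45, 25, 19, 20), nrow = 2)   # inc_dx_crit_list___1
node_tab <- matrix(c(25, 45, 20, 19), nrow = 2)   # inc_dx_crit_list___2

# Is the second table the first one with its rows swapped?
identical(node_tab, tnbc_tab[2:1, ])

# Swapping rows leaves Pearson's statistic (with Yates' correction) unchanged
chisq.test(tnbc_tab)$statistic
chisq.test(node_tab)$statistic
```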
#######t stage final_t_stage 
+
+table(subset_data$final_t_stage) # values are 1, 2, 3, 4, and 99 (99 = pTx), so we can proceed
+
+

+  1   2   3   4  99 
+173 168  46  10   1 
+
+
t_summary <- subset_data |>
+  distinct(participant_id, final_t_stage) |>  # Get unique participant-stage combinations
+  group_by(final_t_stage) |>  # Group by stage
+  summarise(count = n())  # Count the number of participants per T stage
+
+# View the summary table: adds up to 109 (51 T1, 44 T2, 12 T3, 1 T4, 1 pTx coded 99)
+print(t_summary)
+
+
# A tibble: 5 × 2
+  final_t_stage count
+          <int> <int>
+1             1    51
+2             2    44
+3             3    12
+4             4     1
+5            99     1
+
+
#### T stage, for our T stage table, will use T1 vs T2 or greater to simplify 
+#exclude 99 (the pTx) 
+subset_data_clean <- subset_data |>
+  filter(final_t_stage != 99, dtc_ever != 99)
+
+# Combine final_t_stage into T1 vs. T2 or greater
+subset_data_clean <- subset_data_clean |>
+  mutate(final_t_stage_combined = ifelse(final_t_stage == 1, "T1", "T2 or greater"))
+
+# Summarize the data by participant_id after creating the new combined t_stage
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    final_t_stage_combined = first(final_t_stage_combined),  # Get the combined t status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of final_t_stage_combined vs dtc_ever
+contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results
+print(contingency_table)
+
+
               
+                 0  1
+  T1            34 17
+  T2 or greater 35 22
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.13531, df = 1, p-value = 0.713
+
+
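Without the filter above, `ifelse(final_t_stage == 1, ...)` would label the pTx code 99 as "T2 or greater". The filter avoids this by removing that participant's rows altogether. The sketch below is an alternative that keeps every participant but marks 99 as missing before collapsing. It uses toy values, and `t_stage`, `t_clean`, and `t_group` are hypothetical names.

```r
# Toy final_t_stage values (99 = pTx, T stage not assessable)
t_stage <- c(1, 2, 3, 4, 99, 1, NA)

# Treat 99 as missing first, then collapse; missing values stay missing
# instead of falling into either group
t_clean <- ifelse(t_stage == 99, NA, t_stage)
t_group <- ifelse(t_clean == 1, "T1", "T2 or greater")
t_group
table(t_group, useNA = "ifany")
```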
#### Try T stage stats using T3 or greater as the cutoff -- not very informative, so DO NOT USE THIS FOR THE TABLE
+
+#exclude 99 (the pTx) 
+subset_data_clean <- subset_data |>
+  filter(final_t_stage != 99, dtc_ever != 99)
+
+# Combine final_t_stage into T1/T2 or T3 or greater
+subset_data_clean <- subset_data_clean |>
+  mutate(final_t_stage_combined = case_when(
+    final_t_stage == 1 | final_t_stage == 2 ~ "T1 or T2",  # Group T1 and T2 together
+    final_t_stage >= 3 ~ "T3 or greater",  # Group T3 and higher as a separate category
+    TRUE ~ NA_character_  # Handle any unexpected values
+  ))
+
+
+# Summarize the data by participant_id after creating the new combined t_stage
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    final_t_stage_combined = first(final_t_stage_combined),  # Get the combined t status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of final_t_stage_combined vs dtc_ever
+contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> not significant (and one expected count is below 5, hence the warning), so ignore this
+print(contingency_table)
+
+
               
+                 0  1
+  T1 or T2      61 34
+  T3 or greater  8  5
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 1.4397e-31, df = 1, p-value = 1
+
+
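The "Chi-squared approximation may be incorrect" warning appears when any expected count is below 5. In this table, only 13 participants are T3 or greater. The sketch below recomputes the expected counts from the printed table to show which cell triggers the warning. It then runs Fisher's exact test, which does not rely on the large-sample approximation. The choice of Fisher's test here is a suggestion, not part of the original analysis.

```r
# 2x2 table copied from the T1/T2 vs. T3-or-greater output above
tab <- matrix(c(61, 8, 34, 5), nrow = 2,
              dimnames = list(c("T1 or T2", "T3 or greater"), c("0", "1")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
print(round(expected, 2))

# Cells with an expected count below 5 (these trigger the warning)
small_cells <- which(expected < 5, arr.ind = TRUE)
print(small_cells)

# Confirm that chisq.test() warns on this table
warned <- tryCatch({
  chisq.test(tab)
  FALSE
}, warning = function(w) TRUE)

# Fisher's exact test as an alternative for small expected counts
fisher_p <- fisher.test(tab)$p.value
print(fisher_p)
```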
########stage of disease -- final_overall_stage 
+
+table(subset_data$final_overall_stage) # values are 1, 2, 3, and 99 (no stage 4), so we can proceed with this
+
+

+  1   2   3  99 
+124 167 105   2 
+
+
stage_summary <- subset_data |>
+  distinct(participant_id, final_overall_stage) |>  # Get unique participant-stage combinations
+  group_by(final_overall_stage) |>  # Group by stage
+  summarise(count = n())  # Count the number of participants per overall stage
+
+# View the summary table -- adds up to 109 (one coded 99); most are stage II, some are stage I or stage III, and none are stage IV
+print(stage_summary)
+
+
# A tibble: 4 × 2
+  final_overall_stage count
+                <int> <int>
+1                   1    35
+2                   2    47
+3                   3    26
+4                  99     1
+
+
#exclude the 99 
+subset_data_clean <- subset_data |>
+  filter(final_overall_stage != 99, dtc_ever != 99)
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    final_overall_stage = first(final_overall_stage),  # Get the final_overall_stage for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of final_overall_stage vs dtc_ever
+contingency_table <- table(subset_data_by_id$final_overall_stage, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> interestingly, stage does not seem to predict DTC positivity (p = 0.80)
+print(contingency_table)
+
+
   
+     0  1
+  1 22 13
+  2 29 18
+  3 18  8
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 0.43515, df = 2, p-value = 0.8045
+
+
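The 3x2 Pearson test above treats stage as unordered. Stage is ordinal, so a chi-squared test for trend in proportions (Cochran-Armitage type) is one alternative. It asks whether the DTC-positive proportion rises or falls across stages I to III. This is a swapped-in technique, not part of the original analysis. The scores 1, 2, 3 are an assumption. The counts are copied from the table above.

```r
# DTC-positive counts and stage totals copied from the stage-by-dtc_ever output above
dtc_pos <- c(stage1 = 13, stage2 = 18, stage3 = 8)
n_stage <- c(stage1 = 35, stage2 = 47, stage3 = 26)

# Chi-squared test for trend in proportions, scoring stage as 1, 2, 3
trend <- prop.trend.test(dtc_pos, n_stage, score = 1:3)
print(trend)

# DTC-positive proportion by stage
print(round(dtc_pos / n_stage, 3))
```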
###########surgical treatment diag_surgery_type_1 (or diag_surgery_type_2) 
+
+
+table(subset_data$diag_surgery_type_1) # 1 = partial mastectomy/lumpectomy, 2 = mastectomy; no missing values
+
+

+  1   2 
+158 240 
+
+
surgery <- subset_data |>
+  distinct(participant_id, diag_surgery_type_1) |>  # Get unique participant-surgery combinations
+  group_by(diag_surgery_type_1) |>  # Group by surgery type
+  summarise(count = n())  # Count the number of participants per surgery type
+
+# View the summary table
+print(surgery)
+
+
# A tibble: 2 × 2
+  diag_surgery_type_1 count
+                <int> <int>
+1                   1    45
+2                   2    64
+
+
# Summarize the data by participant_id
+# (subset_data_clean is still the stage-filtered data from above, so the participant with overall stage 99 is excluded: 108 participants here vs. 109 in the summary)
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    surgery = first(diag_surgery_type_1),  # Get the surgery type for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of surgery type vs dtc_ever
+contingency_table <- table(subset_data_by_id$surgery, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.48....
+print(contingency_table)
+
+
   
+     0  1
+  1 31 14
+  2 38 25
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.50569, df = 1, p-value = 0.477
+
+
######## axillary management. Really just care about whether or not someone got an axillary dissection. diag_axillary_type___2_1 (or diag_axillary_type___2_2 IF 2 biopsy forms)
+
+table(subset_data$diag_axillary_type___2_1) 
+
+

+  0   1 
+215 183 
+
+
table(subset_data$diag_axillary_type___2_2) # a handful of axillary dissections are recorded on the second biopsy form, so we want to combine these two variables
+
+

+ 0  1 
+16  4 
+
+
# Create a binary variable for axillary dissection: 1 = dissection recorded on either biopsy form, 0 = none
+# (missing values count as 0). Note: this overwrites the stage-filtered subset_data_clean,
+# so the sections below use all 109 participants.
+subset_data_clean <- subset_data |>
+  mutate(axillary_dissection = case_when(
+    diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1 ~ 1,  # Had axillary dissection
+    TRUE ~ 0  # No axillary dissection (includes missing values)
+  ))
+
+# Summarize the data by participant_id, including the axillary_dissection and dtc_ever variables
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    axillary_dissection = first(axillary_dissection),  # Get the axillary dissection status for each participant
+    dtc_ever = first(dtc_ever)  # Get the dtc_ever status for each participant
+  )
+
+contingency_table <- table(subset_data_by_id$axillary_dissection, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+fishers <- fisher.test(contingency_table)
+print(fishers)
+
+

+    Fisher's Exact Test for Count Data
+
+data:  contingency_table
+p-value = 0.1649
+alternative hypothesis: true odds ratio is not equal to 1
+95 percent confidence interval:
+ 0.2309022 1.3129062
+sample estimates:
+odds ratio 
+ 0.5559943 
+
+
# Print the contingency table and Chi-squared test results --> chi-squared p = 0.20 (Fisher's exact p = 0.16); use chi-squared for consistency
+print(contingency_table)
+
+
   
+     0  1
+  0 31 23
+  1 39 16
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 1.614, df = 1, p-value = 0.2039
+
+
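Combining the two biopsy-form indicators depends on how R treats `NA`. `==` propagates `NA`, and `FALSE | NA` is `NA`, so an `ifelse()` version returns `NA` for participants without a second form. The `case_when(..., TRUE ~ 0)` version above avoids this by sending every non-TRUE result to 0. The base-R sketch below shows the difference on hypothetical values. It assumes, as the original code does, that a missing form means no dissection.

```r
# Hypothetical values for the two biopsy-form indicators
# (1 = axillary dissection, 0 = none, NA = second form not completed)
form1 <- c(1, 0, NA, 0)
form2 <- c(NA, NA, NA, 1)

# `==` propagates NA and FALSE | NA is NA, so ifelse() returns NA for rows 2 and 3
naive <- ifelse(form1 == 1 | form2 == 1, 1, 0)
print(naive)

# %in% never returns NA, so missing forms count as "no dissection",
# matching the case_when(..., TRUE ~ 0) coding above
robust <- as.integer(form1 %in% 1 | form2 %in% 1)
print(robust)
```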
####inflammatory inflamm_yn -- IGNORE THIS for Table 1 
+table(d$inflamm_yn_1) ### the inflammatory cases seem not to be in this subset of patients (6 yes's in the overall cohort, but none in the subset data for either inflammatory variable)
+
+

+  0   1 
+568  11 
+
+
table(d$inflamm_yn_2)  ### inflammatory cases seem not to be in the DTC cohort subset
+
+

+ 0 
+24 
+
+
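table(subset_data$inflamm_yn) # note: subset_data has no column named inflamm_yn (see the warning), so this returns an empty table rather than checking the subset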
+
+
Warning: Unknown or uninitialised column: `inflamm_yn`.
+
+
+
< table of extent 0 >
+
+
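The empty table above comes from asking for a column that does not exist. A tibble only warns in that case, so the check fails quietly. The sketch below uses a hypothetical `safe_table()` helper that stops with an error and lists similarly named columns. The toy data frame is made up. The column names `inflamm_yn_1` and `inflamm_yn_2` come from the `d` calls above.

```r
# Hypothetical helper: tabulate a column only if it exists, otherwise stop
# with an error that lists similarly named columns
safe_table <- function(df, col) {
  if (!col %in% names(df)) {
    stop(sprintf("column '%s' not found; candidates: %s", col,
                 paste(grep(col, names(df), value = TRUE), collapse = ", ")))
  }
  table(df[[col]], useNA = "ifany")
}

# Made-up toy data with the two inflammatory-cancer indicators
toy <- data.frame(inflamm_yn_1 = c(0, 0, 1), inflamm_yn_2 = c(0, NA, NA))
print(safe_table(toy, "inflamm_yn_1"))

# Asking for the misspelled column now fails loudly
res <- tryCatch(safe_table(toy, "inflamm_yn"), error = conditionMessage)
print(res)
```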
#### radiation prtx_radiation 
+table(subset_data$prtx_radiation) 
+
+

+  0   1 
+116 282 
+
+
radiation <- subset_data |> 
+  distinct(participant_id,prtx_radiation) |> 
+  group_by(prtx_radiation) |>  # Group by radiation status
+  summarise(count = n())  # Count the number of participants per radiation status
+
+# View the summary table
+print(radiation)
+
+
# A tibble: 2 × 2
+  prtx_radiation count
+           <int> <int>
+1              0    34
+2              1    75
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    radiation = first(prtx_radiation),  # Get the radiation status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of radiation vs dtc_ever
+contingency_table <- table(subset_data_by_id$radiation, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+fishers <- fisher.test(contingency_table)
+print(fishers)
+
+

+    Fisher's Exact Test for Count Data
+
+data:  contingency_table
+p-value = 0.6709
+alternative hypothesis: true odds ratio is not equal to 1
+95 percent confidence interval:
+ 0.4916844 3.2745694
+sample estimates:
+odds ratio 
+  1.243166 
+
+
# Print the contingency table and Chi-squared test results --> p-val  = 0.77 
+print(contingency_table)
+
+
   
+     0  1
+  0 23 11
+  1 47 28
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.0823, df = 1, p-value = 0.7742
+
+
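The odds ratio from `fisher.test()` (1.243166 above) is a conditional maximum-likelihood estimate. It is close to, but not the same as, the sample odds ratio (a·d)/(b·c). The sketch below computes the sample odds ratio and a Wald 95% confidence interval on the log scale from the radiation table printed above, and compares the two. The Wald interval is a swapped-in method, shown only for comparison.

```r
# 2x2 table copied from the radiation-by-dtc_ever output above
# (rows: prtx_radiation 0 / 1; columns: dtc_ever 0 / 1)
tab <- matrix(c(23, 47, 11, 28), nrow = 2,
              dimnames = list(radiation = c("0", "1"), dtc_ever = c("0", "1")))

# Sample (unconditional) odds ratio: (a * d) / (b * c)
sample_or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Wald 95% CI on the log-odds scale: log(OR) +/- 1.96 * sqrt(sum(1 / cells))
se_log_or <- sqrt(sum(1 / tab))
wald_ci <- exp(log(sample_or) + c(-1, 1) * qnorm(0.975) * se_log_or)

# fisher.test() reports the conditional MLE instead
fisher_or <- unname(fisher.test(tab)$estimate)
print(c(sample_or = sample_or, fisher_or = fisher_or))
print(wald_ci)
```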
#### chemotherapy prtx_chemo 
+table(subset_data$prtx_chemo) 
+
+

+  0   1 
+ 18 380 
+
+
chemo <- subset_data |> 
+  distinct(participant_id,prtx_chemo) |> 
+  group_by(prtx_chemo) |>  # Group by chemotherapy status
+  summarise(count = n())  # Count the number of participants per chemotherapy status
+
+# View the summary table
+print(chemo) # 3 people did not get chemo in this cohort
+
+
# A tibble: 2 × 2
+  prtx_chemo count
+       <int> <int>
+1          0     3
+2          1   106
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    chemo = first(prtx_chemo),  # Get the chemotherapy status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of chemotherapy vs dtc_ever
+contingency_table <- table(subset_data_by_id$chemo, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
fishers <- fisher.test(contingency_table)
+print(fishers)
+
+

+    Fisher's Exact Test for Count Data
+
+data:  contingency_table
+p-value = 0.2906
+alternative hypothesis: true odds ratio is not equal to 1
+95 percent confidence interval:
+ 0.00448755 5.37725419
+sample estimates:
+odds ratio 
+ 0.2715663 
+
+
# Print the contingency table and Chi-squared test results --> p-val = 0.60 (only 3 participants did not get chemo, so expected counts are below 5; Fisher's exact p = 0.29)
+print(contingency_table)
+
+
   
+     0  1
+  0  1  2
+  1 69 37
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.27148, df = 1, p-value = 0.6023
+
+
####neoadjuvant chemo diag_neoadj_chemo_1 or diag_neoadj_chemo_2 
+
+table(subset_data$diag_neoadj_chemo_1) 
+
+

+  0   1 
+327  71 
+
+
table(subset_data$diag_neoadj_chemo_2) #there were no ppl who were recorded on the second biopsy form as having neoadjuvant therapy so can use only the first variable 
+
+

+ 0 
+20 
+
+
nact <- subset_data |> 
+  distinct(participant_id,diag_neoadj_chemo_1) |> 
+  group_by(diag_neoadj_chemo_1) |>  # Group by neoadjuvant chemotherapy status
+  summarise(count = n())  # Count the number of participants per neoadjuvant chemotherapy status
+
+# View the summary table
+print(nact) # 19 of the 109 received neoadjuvant chemotherapy
+
+
# A tibble: 2 × 2
+  diag_neoadj_chemo_1 count
+                <int> <int>
+1                   0    90
+2                   1    19
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    nact = first(diag_neoadj_chemo_1),  # Get the neoadjuvant chemotherapy status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of neoadjuvant chemotherapy vs dtc_ever
+contingency_table <- table(subset_data_by_id$nact, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val  = 0.37 slightly greater trend than with ctDNA  
+print(contingency_table)
+
+
   
+     0  1
+  0 60 30
+  1 10  9
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.80344, df = 1, p-value = 0.3701
+
+
####hormone therapy prtx_endo 
+
+table(subset_data$prtx_endo) 
+
+

+  0   1 
+156 242 
+
+
endo <- subset_data |> 
+  distinct(participant_id,prtx_endo) |> 
+  group_by(prtx_endo) |>  # Group by endocrine therapy status
+  summarise(count = n())  # Count the number of participants per endocrine therapy status
+
+# View the summary table
+print(endo) # most people did get endocrine therapy (62 of the 109)
+
+
# A tibble: 2 × 2
+  prtx_endo count
+      <int> <int>
+1         0    47
+2         1    62
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    endo = first(prtx_endo),  # Get the endocrine therapy status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of endocrine therapy vs dtc_ever
+contingency_table <- table(subset_data_by_id$endo, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val  = 0.50 
+print(contingency_table)
+
+
   
+     0  1
+  0 28 19
+  1 42 20
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.46137, df = 1, p-value = 0.497
+
+
####bone modifying agents prtx_bonemod 
+
+table(subset_data$prtx_bonemod) 
+
+

+  0   1 
+238 160 
+
+
bonemod <- subset_data |> 
+  distinct(participant_id,prtx_bonemod) |> 
+  group_by(prtx_bonemod) |>  # Group by bone-modifying agent status
+  summarise(count = n())  # Count the number of participants per bone-modifying agent status
+
+# View the summary table
+print(bonemod) # 39 of the 109 received a bone-modifying agent
+
+
# A tibble: 2 × 2
+  prtx_bonemod count
+         <int> <int>
+1            0    70
+2            1    39
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    bonemod = first(prtx_bonemod),  # Get the bone-modifying agent status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of bonemod vs dtc_ever
+contingency_table <- table(subset_data_by_id$bonemod, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val  = 1 
+print(contingency_table)
+
+
   
+     0  1
+  0 45 25
+  1 25 14
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0, df = 1, p-value = 1
+
+
+
+
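The same block (summarise by participant, build the table, test, print) is repeated for every treatment variable. Copy-pasting it is how several comments ended up naming `final_overall_stage`. Below is a minimal sketch of a hypothetical `test_against()` helper that runs the comparison for a list of variables and collects the results in one data frame. It uses Fisher's exact test when any expected count is below 5, a common rule of thumb rather than the original analysis plan. The toy per-participant data are simulated, not study data.

```r
# Hypothetical per-participant toy data (one row per participant);
# variable names mirror those above, but the values are simulated
set.seed(1)
per_id <- data.frame(
  dtc_ever       = rbinom(60, 1, 0.35),
  prtx_radiation = rbinom(60, 1, 0.7),
  prtx_endo      = rbinom(60, 1, 0.6),
  prtx_bonemod   = rbinom(60, 1, 0.35)
)

# Run the same contingency-table test for each variable and collect results;
# Fisher's exact test is used when any expected count is below 5
test_against <- function(df, outcome, vars) {
  rows <- lapply(vars, function(v) {
    tab <- table(df[[v]], df[[outcome]])
    expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
    use_fisher <- any(expected < 5)
    p <- if (use_fisher) fisher.test(tab)$p.value else chisq.test(tab)$p.value
    data.frame(variable = v, n = sum(tab),
               test = if (use_fisher) "Fisher" else "Chi-squared",
               p_value = p)
  })
  do.call(rbind, rows)
}

results <- test_against(per_id, "dtc_ever",
                        c("prtx_radiation", "prtx_endo", "prtx_bonemod"))
print(results)
```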
#pCR 
+#2 = non-pcr, 1 = pcr 
+#path cr diag_pcr_1 or diag_pcr_2 (as this could be on either of the two diagnosis and staging forms, there are 2 variables for this)
+table(subset_data$diag_pcr_1) 
+
+

+  .   1   2 
+327   8  63 
+
+
table(subset_data$diag_pcr_2) #none recorded here so can just use pcr_1 
+
+

+      . 
+378  20 
+
+
pcr <- subset_data |>
+  mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) |>  # Convert "." to NA
+  filter(!is.na(diag_pcr_1)) |>  # Exclude rows where diag_pcr_1 is NA
+  distinct(participant_id, diag_pcr_1) |>
+  group_by(diag_pcr_1) |>
+  summarise(count = n()) # Count the number of participants per pCR status
+
+# View the summary table
+print(pcr) #this is correct bc 19 ppl got NACT, and we have 19 ppl recorded for pcr data 
+
+
# A tibble: 2 × 2
+  diag_pcr_1 count
+  <chr>      <int>
+1 1              1
+2 2             18
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    pcr = first(diag_pcr_1),  # Get the pCR status for each participant ("." if not assessed)
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of pcr vs dtc_ever
+contingency_table <- table(subset_data_by_id$pcr, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> p-val = 0.27, but the "." row (the 90 participants without neoadjuvant chemotherapy, so no pCR assessment) is included as its own category, so this is not a test among pCR-assessed participants alone; that group is also very small (19 individuals, only 1 with pCR)
+print(contingency_table)
+
+
   
+     0  1
+  . 60 30
+  1  0  1
+  2 10  8
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 2.6174, df = 2, p-value = 0.2702
+
+
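In the table above, the "." placeholder becomes a third category. The p = 0.27 therefore mostly reflects the 90 participants who were never assessed for pCR. The sketch below converts "." to `NA` first and restricts the table to assessed participants before testing. It uses Fisher's exact test because the assessed group is tiny. The toy values are hypothetical.

```r
# Hypothetical per-participant values: "." marks participants who did not
# receive neoadjuvant chemotherapy (1 = pCR, 2 = non-pCR)
toy <- data.frame(
  diag_pcr_1 = c(".", ".", ".", "1", "2", "2", "2", "."),
  dtc_ever   = c(0, 1, 0, 1, 0, 1, 0, 0)
)

# Including "." makes it a third category in the test
print(table(toy$diag_pcr_1, toy$dtc_ever))

# Convert the placeholder to NA, then keep only participants assessed for pCR
toy$diag_pcr_1[toy$diag_pcr_1 == "."] <- NA
assessed <- toy[!is.na(toy$diag_pcr_1), ]
pcr_tab <- table(factor(assessed$diag_pcr_1, levels = c("1", "2"),
                        labels = c("pCR", "non-pCR")),
                 assessed$dtc_ever)
print(pcr_tab)

# With so few assessed participants, Fisher's exact test avoids the
# chi-squared small-sample warning
pcr_p <- fisher.test(pcr_tab)$p.value
print(pcr_p)
```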
########recurrence
+#local first, then distant.then create summary variable of either locreg or distant 
+#local fu_locreg_prog 
+
+# Step 1: Summarize data by unique participant_id
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    fu_locreg_prog = first(fu_locreg_prog),  # Get fu_locreg_prog status for each participant
+    dtc_ever = first(dtc_ever),          # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of fu_locreg_prog vs dtc_ever
+contingency_table <- table(subset_data_by_id$fu_locreg_prog, subset_data_by_id$dtc_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 4: Print the contingency table and Chi-squared test results -- p-val of 0.75, less of an association (but pts on trial)
+print(contingency_table)
+
+
   
+     0  1
+  0 66 35
+  1  3  3
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.10507, df = 1, p-value = 0.7458
+
+
####looking at sites of locoregional progression: local site fu_locreg_site_num or fu_locreg_site_char 
+### Just want to look at site distribution here 
+
+# Summarize the distribution of fu_locreg_site_char by unique participant_id
+site_distribution <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    site = first(fu_locreg_site_char),  # Get the site for each unique participant
+    .groups = "drop"
+  ) |>
+  count(site)  # Count the occurrences of each site
+
+# View the distribution of sites -- mostly axillary nodes. 4 of 6 axillary. 1 internal mammary. 2 supraclav. 2 ipsilateral breast 
+print(site_distribution)
+
+
# A tibble: 6 × 2
+  site                                                              n
+  <chr>                                                         <int>
+1 ""                                                              103
+2 "Axillary Nodes"                                                  2
+3 "Axillary Nodes,Internal Mammary Nodes,Supraclavicular Nodes"     1
+4 "Ipsilateral Breast"                                              1
+5 "Ipsilateral Breast,Axillary Nodes"                               1
+6 "Supraclavicular Nodes"                                           1
+
+
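The site counts in the comment above (4 of 6 axillary, 2 supraclavicular, and so on) require splitting the comma-separated `fu_locreg_site_char` entries by hand. The sketch below does this with `strsplit()`. It uses the site strings and participant counts from the printed `site_distribution`, with the blank category dropped. The same approach applies to `fu_dist_site_char`. A participant with several sites contributes to several counts, so the totals exceed the number of participants.

```r
# Site strings and participant counts copied from the locoregional
# site_distribution output above (the blank "" category is omitted)
sites <- c(rep("Axillary Nodes", 2),
           "Axillary Nodes,Internal Mammary Nodes,Supraclavicular Nodes",
           "Ipsilateral Breast",
           "Ipsilateral Breast,Axillary Nodes",
           "Supraclavicular Nodes")

# Split each comma-separated entry into individual sites and count
# how many participants had each site
site_counts <- table(trimws(unlist(strsplit(sites, ","))))
print(site_counts)
```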
#####distant recurrence: distant fu_dist_prog 
+
+# Step 1: Summarize data by unique participant_id
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    fu_dist_prog = first(fu_dist_prog),  # Get fu_dist_prog status for each participant
+    dtc_ever = first(dtc_ever),          # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of dist prog vs dtc_ever --> 12 who had distant progression 
+contingency_table <- table(subset_data_by_id$fu_dist_prog, subset_data_by_id$dtc_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 4: Print the contingency table and Chi-squared test results -- p-val 0.63
+print(contingency_table)
+
+
   
+     0  1
+  0 60 35
+  1  9  3
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.23777, df = 1, p-value = 0.6258
+
+
### Distant sites 
+# distant site: fu_dist_site_num / fu_dist_site_char -- start by just looking at the locations
+
+# Summarize the distribution of fu_dist_site_char by unique participant_id
+dist_site_distribution <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    site = first(fu_dist_site_char),  # Get the site for each unique participant
+    .groups = "drop"
+  ) |>
+  count(site)  # Count the occurrences of each site
+
+# View the distribution of sites -- Bone in 7. Liver in 3. Lung in 2. 1 intra-abdominal, 1 pleura, 1 other
+print(dist_site_distribution)
+
+
# A tibble: 8 × 2
+  site                  n
+  <chr>             <int>
+1 ""                   97
+2 "Bone"                5
+3 "Bone,Other"          1
+4 "Intra-abdominal"     1
+5 "Liver"               2
+6 "Liver,Bone"          1
+7 "Lung"                1
+8 "Pleura,Lung"         1
+
+
#any recurrence 
+#either fu_locreg_prog or fu_dist_prog 
+
+subset_data <- subset_data |>
+  mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No"))
+
+# link by participant id 
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    dtc_ever = first(dtc_ever),        # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs dtc_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results -- 14 relapses in total: 10 DTC-negative, 4 DTC-positive
+print(contingency_table)
+
+
     
+       0  1
+  No  59 34
+  Yes 10  4
+
+
print(chisq_test) 
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.079932, df = 1, p-value = 0.7774
+
+
#### Relapse and DTC 
+# using ever_relapsed (same test as above; dtc_ever is renamed to dtc for the missing-data check below)
+
+# link by participant id 
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    dtc = first(dtc_ever),        # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs dtc_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results
+print(contingency_table)
+
+
     
+       0  1
+  No  59 34
+  Yes 10  4
+
+
print(chisq_test) 
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.079932, df = 1, p-value = 0.7774
+
+
# Identify participants missing data in either `ever_relapsed` or `dtc_ever`
+missing_data <- subset_data_by_id |>
+  filter(is.na(ever_relapsed) | is.na(dtc))
+
+# Print the IDs of participants with missing data
+print(missing_data$participant_id) #two individuals do not have relapse data (17-021, 18-032, as we had seen above)
+
+
[1] "28115-17-021" "28115-18-032"
+
+
#### survival analysis: fu_surv (0 = dead, 1 = alive)
+
+table(subset_data$fu_surv)
+
+

+  0   1 
+  8 389 
+
+
surv <- subset_data |>
+  distinct(participant_id, fu_surv) |>
+  group_by(fu_surv) |>
+  summarise(count = n()) # Count the number of participants per survival status
+
+# View the summary table
+print(surv) # dead = 5, alive = 103, and 1 NA (identified below)
+
+
# A tibble: 3 × 2
+  fu_surv count
+    <int> <int>
+1       0     5
+2       1   103
+3      NA     1
+
+
na_participant <- subset_data |>
+  filter(is.na(fu_surv)) |>
+  select(participant_id, fu_surv)
+
+# Print the result -- 28115-17-021 has no follow-up data in REDCap; everyone else in the DTC cohort has some survival data
+print(na_participant)
+
+
# A tibble: 1 × 2
+  participant_id fu_surv
+  <chr>            <int>
+1 28115-17-021        NA
+
+
# Summarize data by unique participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    surv = first(fu_surv),          # Get survival status for each participant
+    dtc_ever = first(dtc_ever),  # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of surv vs dtc_ever
+contingency_table <- table(subset_data_by_id$surv, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results
+print(contingency_table)
+
+
   
+     0  1
+  0  4  1
+  1 65 38
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.084865, df = 1, p-value = 0.7708
+
+
+
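The chi-squared test above compares alive vs. dead status and ignores how long each participant was followed. A time-to-event analysis (Kaplan-Meier curves with a log-rank test, from the `survival` package) is a common alternative. The sketch below runs one on hypothetical toy data. Note that `fu_surv` codes 1 = alive and 0 = dead, so the event indicator for `Surv()` is `1 - fu_surv`. Deriving follow-up time from the date columns (e.g. `fu_date_to`, `fu_date_death`, `censor_date`) is an assumption not worked out here.

```r
library(survival)

# Hypothetical toy data: follow-up time in months, vital status coded like
# fu_surv (1 = alive, 0 = dead), and DTC status
toy <- data.frame(
  time_months = c(12, 30, 45, 60, 8, 22, 50, 70, 15, 40),
  fu_surv     = c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1),
  dtc_ever    = c(0, 0, 0, 0, 1, 1, 1, 1, 0, 1)
)
toy$death <- 1 - toy$fu_surv  # Surv() expects 1 = event (death)

# Kaplan-Meier estimates by DTC status
km <- survfit(Surv(time_months, death) ~ dtc_ever, data = toy)
print(km)

# Log-rank test; p-value from the chi-squared statistic with (groups - 1) df
lr <- survdiff(Surv(time_months, death) ~ dtc_ever, data = toy)
lr_p <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
print(lr_p)
```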

Now that we have run the univariate associations for all the important demographic and clinical factors for both ctDNA and DTC, we will build Table 1: first by ctDNA status, then a second table by DTC status.

+
+

4.1 Making our Table 1

+
+

4.1.1 Demographics and Clinical Factors by ctDNA Status

+
+
####### Making Table 1--first for ctDNA ######### 
+
+## Resources to try for both making Table 1 and LASSO 
+## https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html
+## lasso https://stats.stackexchange.com/questions/72251/an-example-lasso-regression-using-glmnet-for-binary-outcome 
+## https://www.r-bloggers.com/2020/05/quick-tutorial-on-lasso-regression-with-example/#google_vignette 
+
+#Table 1 Code 
+library(table1)
+
+

+Attaching package: 'table1'
+
+
+
The following objects are masked from 'package:base':
+
+    units, units<-
+
+
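`table1` works best with one row per participant and with numeric codes recoded as labeled factors, so categories print as text rather than as means of 0/1 codes. The sketch below uses hypothetical toy data with DTC status as the grouping variable. The same pattern applies to ctDNA status. Excluding 99 codes and building the per-participant data frame (as in the sections above) is assumed to happen first.

```r
library(table1)

# Hypothetical per-participant toy data (one row per participant)
toy <- data.frame(
  dtc_ever      = c(0, 1, 0, 1, 0, 0, 1, 0),
  final_t_stage = c(1, 2, 1, 3, 2, 1, 1, 2),
  prtx_endo     = c(1, 0, 1, 1, 0, 1, 0, 1)
)

# Recode numeric codes as labeled factors so table1 shows categories
toy$dtc_status <- factor(toy$dtc_ever, levels = c(0, 1),
                         labels = c("DTC negative", "DTC positive"))
toy$t_group <- factor(ifelse(toy$final_t_stage == 1, "T1", "T2 or greater"))
toy$endo <- factor(toy$prtx_endo, levels = c(0, 1), labels = c("No", "Yes"))

# Row labels shown in the table
label(toy$t_group) <- "T stage"
label(toy$endo) <- "Endocrine therapy"

# Columns are stratified by DTC status; in a Quarto chunk, evaluating
# tab1 renders the HTML table
tab1 <- table1(~ t_group + endo | dtc_status, data = toy)
```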
names(subset_data) #to choose variables 
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                     "ctDNA_ever"                      
+[389] "dtc_ever"                         "ever_relapsed"                   
+[391] "age_at_diag"                      "HR_status"                       
+[393] "histology_category"               "node_status"                     
+[395] "axillary_dissection"             
+
+
library(dplyr)
+library(tidyr)
+library(stringr)
+library(table1)  # provides table1(), label<-, and units<-
+
+# Prepare the dataset
+unique_subset_data <- subset_data |>
+  mutate(
+    # Convert "Missing" and 99 to NA in relevant columns
+    final_t_stage = na_if(as.character(final_t_stage), "Missing"),
+    final_t_stage = na_if(final_t_stage, "99"),
+    final_overall_stage = na_if(as.character(final_overall_stage), "Missing"),
+    final_overall_stage = na_if(final_overall_stage, "99"),
+    final_tumor_grade = na_if(final_tumor_grade, 3),  # 3 = Not Reported
+    diag_pcr_1 = na_if(diag_pcr_1, "."),
+    # Replace 99 with NA in all numeric columns
+    across(where(is.numeric), ~ na_if(.x, 99))
+  )  |>
+  group_by(participant_id) |>
+  summarize(
+    age_at_diag = first(na.omit(age_at_diag)),
+    final_receptor_group = first(na.omit(final_receptor_group)),
+    demo_race_final = first(na.omit(demo_race_final)),
+    final_tumor_grade = first(na.omit(final_tumor_grade)),
+    final_overall_stage = first(na.omit(final_overall_stage)),
+    final_t_stage = first(na.omit(final_t_stage)),
+    final_n_stage = first(na.omit(final_n_stage)),
+    histology_category = first(na.omit(histology_category)),
+    prtx_radiation = first(na.omit(prtx_radiation)),
+    prtx_chemo = first(na.omit(prtx_chemo)),
+    prtx_endo = first(na.omit(prtx_endo)),
+    prtx_bonemod = first(na.omit(prtx_bonemod)),
+    node_status = first(na.omit(node_status)),
+    axillary_dissection = first(na.omit(axillary_dissection)),
+    diag_surgery_type_1 = first(na.omit(diag_surgery_type_1)),
+    diag_neoadj_chemo_1 = first(na.omit(diag_neoadj_chemo_1)),
+    diag_pcr_1 = first(na.omit(diag_pcr_1)),
+    ctDNA_ever = first(na.omit(ctDNA_ever))
+  )
+
+#######
+# Add labels for:
+# final_receptor_group
+# demo_race_final
+# final_tumor_grade
+# final_overall_stage
+# final_t_stage
+# final_n_stage
+# histology_category
+# prtx_radiation
+# prtx_chemo
+# prtx_endo
+# prtx_bonemod
+# node_status
+# axillary_dissection
+# diag_surgery_type_1
+# diag_neoadj_chemo_1
+# ctDNA_ever
+# diag_pcr_1
+
+
+label(unique_subset_data$age_at_diag) <- "Age at Diagnosis"
+units(unique_subset_data$age_at_diag)       <- "years"
+
+#Final receptor group: 1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+'
+
+
+# assign `final_receptor_group` factor levels and labels to `unique_subset_data`
+unique_subset_data <- unique_subset_data |>
+  mutate(
+    final_receptor_group = factor(final_receptor_group, levels = c(1, 2, 3, 4),
+                                  labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+"))
+  )
+
+label(unique_subset_data$final_receptor_group)       <- "Final Receptor Group"
+
+table(unique_subset_data$final_receptor_group)
+
+

+     TNBC HR+ HER2- HR+ HER2+ HR- HER2+ 
+       45        52         8         4 
+
+
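The `summarize()` step above collapses `subset_data`, which has several rows per participant, to one row per `participant_id` by keeping the first non-missing value of each variable. A minimal base-R sketch of the same idea on a hypothetical toy data frame (the columns `id` and `grade` and the helper `first_non_missing()` are illustrative only, not part of the study data):

```r
# Hypothetical long-format data: several rows per participant, some values missing
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  grade = c(NA, 2, 3, NA, 1)
)

# First non-missing value of a vector, or NA if every value is missing
first_non_missing <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) NA else v[1]
}

# One row per participant, mirroring group_by(participant_id) + summarize(first(na.omit(.)))
# na.pass keeps NA rows so the helper (not aggregate) decides what to skip
collapsed <- aggregate(grade ~ id, data = toy, FUN = first_non_missing,
                       na.action = na.pass)
collapsed
```

Because the first non-missing value wins, row order matters: recoding placeholder codes such as 99 or "Missing" to `NA` before collapsing (as the `mutate()` step does) keeps a placeholder in an earlier row from masking a real value in a later one.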
##demo_race_final 
+
+table(unique_subset_data$demo_race_final) # codes: 1 = Black, 3 = Asian, 5 = White
+
+

+ 1  3  5 
+ 9  1 99 
+
+
unique_subset_data$demo_race_final <- 
+  factor(unique_subset_data$demo_race_final, levels=c(1,3,5),
+         labels=c("Black", 
+                  "Asian", "White"))
+label(unique_subset_data$demo_race_final)  <- "Race"
+table(unique_subset_data$demo_race_final) 
+
+

+Black Asian White 
+    9     1    99 
+
+
#final_tumor_grade 
+table(unique_subset_data$final_tumor_grade) # codes: 0 = Grade 3, 1 = Grade 1, 2 = Grade 2, 3 = Not Reported (recoded to NA above so Table 1 reports it as Missing)
+
+

+ 0  1  2 
+79 22  6 
+
+
unique_subset_data$final_tumor_grade <- 
+  factor(unique_subset_data$final_tumor_grade, levels=c(0,1,2),
+         labels=c("Grade 3", 
+                  "Grade 1", "Grade 2"))
+label(unique_subset_data$final_tumor_grade)  <- "Tumor Grade"
+table(unique_subset_data$final_tumor_grade) 
+
+

+Grade 3 Grade 1 Grade 2 
+     79      22       6 
+
+
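`factor()` also turns any code that is not listed in `levels` into `NA`, so an unrecoded Not Reported value (code 3) would still land in Table 1's Missing row rather than becoming its own category. A small check on hypothetical toy codes:

```r
# Hypothetical grade codes: 0 = Grade 3, 1 = Grade 1, 2 = Grade 2, 3 = Not Reported
codes <- c(0, 1, 2, 3, 0)

grade <- factor(codes, levels = c(0, 1, 2),
                labels = c("Grade 3", "Grade 1", "Grade 2"))

# Code 3 is not among the levels, so it becomes NA
table(grade, useNA = "ifany")
```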
#final_overall_stage
+
+table(unique_subset_data$final_overall_stage) # 1 = stage I 2 = stage II 3 = stage III  
+
+

+ 1  2  3 
+35 47 26 
+
+
unique_subset_data$final_overall_stage <- 
+  factor(unique_subset_data$final_overall_stage, levels=c(1,2,3),
+         labels=c("Stage I", 
+                  "Stage II", "Stage III"))
+label(unique_subset_data$final_overall_stage)  <- "Overall Stage"
+table(unique_subset_data$final_overall_stage) 
+
+

+  Stage I  Stage II Stage III 
+       35        47        26 
+
+
#final_t_stage
+table(unique_subset_data$final_t_stage) # 1 = T1 2 = T2 3 = T3 4 = T4  
+
+

+ 1  2  3  4 
+51 44 12  1 
+
+
unique_subset_data$final_t_stage <- 
+  factor(unique_subset_data$final_t_stage, levels=c(1,2,3,4),
+         labels=c("T1", 
+                  "T2", "T3", "T4"))
+label(unique_subset_data$final_t_stage)  <- "T Stage"
+table(unique_subset_data$final_t_stage) 
+
+

+T1 T2 T3 T4 
+51 44 12  1 
+
+
#final_n_stage 
+
+table(unique_subset_data$final_n_stage) # 0 = N0 1 = N1 2 = N2 3 = N3 
+
+

+ 0  1  2  3 
+46 43 13  7 
+
+
unique_subset_data$final_n_stage <- 
+  factor(unique_subset_data$final_n_stage, levels=c(0,1,2,3),
+         labels=c("N0", 
+                  "N1", "N2", "N3"))
+label(unique_subset_data$final_n_stage)  <- "N Stage"
+table(unique_subset_data$final_n_stage) 
+
+

+N0 N1 N2 N3 
+46 43 13  7 
+
+
#histology_category
+
+table(unique_subset_data$histology_category) # already labeled: Both Ductal and Lobular, Ductal, Lobular, Other
+
+

+Both Ductal and Lobular                  Ductal                 Lobular 
+                      9                      84                      14 
+                  Other 
+                      2 
+
+
label(unique_subset_data$histology_category)  <- "Histology Category"
+
+
+#prtx_radiation 
+
+table(unique_subset_data$prtx_radiation) #1 = radiation 0 = no 
+
+

+ 0  1 
+34 75 
+
+
unique_subset_data$prtx_radiation <- 
+  factor(unique_subset_data$prtx_radiation, levels=c(0,1),
+         labels=c("No Radiation", "Radiation"))
+label(unique_subset_data$prtx_radiation)  <- "Radiation"
+table(unique_subset_data$prtx_radiation)
+
+

+No Radiation    Radiation 
+          34           75 
+
+
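The same steps (`factor()` with codes and labels, `label()`, then `table()`) repeat for every variable in this section. One way to cut the repetition is a small wrapper. The sketch below is hypothetical: `as_labelled_factor()` is not part of table1 or this project, and it assumes table1's `label()` reads the `"label"` attribute, which is where its `label<-` stores the variable label.

```r
# Hypothetical helper: recode numeric codes to a labelled factor in one call
as_labelled_factor <- function(x, levels, labels, var_label) {
  f <- factor(x, levels = levels, labels = labels)
  attr(f, "label") <- var_label  # attribute assumed to be read by table1's label()
  f
}

# Toy 0/1 radiation codes (illustrative values only)
rad <- as_labelled_factor(c(0, 1, 1, NA),
                          levels = c(0, 1),
                          labels = c("No Radiation", "Radiation"),
                          var_label = "Radiation")
table(rad)
attr(rad, "label")
```

With such a helper, the radiation block would reduce to `unique_subset_data$prtx_radiation <- as_labelled_factor(unique_subset_data$prtx_radiation, c(0, 1), c("No Radiation", "Radiation"), "Radiation")`.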
#prtx_chemo
+
+table(unique_subset_data$prtx_chemo) #1 = chemo 0 = no 
+
+

+  0   1 
+  3 106 
+
+
table(subset_data$prtx_chemo)  # row-level counts: subset_data has several rows per participant
+
+

+  0   1 
+ 18 380 
+
+
unique_subset_data$prtx_chemo <- 
+factor(unique_subset_data$prtx_chemo, levels=c(0,1),
+         labels=c("No Chemo", "Chemo"))
+label(unique_subset_data$prtx_chemo)  <- "Chemo"
+table(unique_subset_data$prtx_chemo)
+
+

+No Chemo    Chemo 
+       3      106 
+
+
#prtx_endo
+
+
+table(unique_subset_data$prtx_endo) #1 = hormone therapy 0 = no 
+
+

+ 0  1 
+47 62 
+
+
table(subset_data$prtx_endo)  # row-level counts: subset_data has several rows per participant
+
+

+  0   1 
+156 242 
+
+
unique_subset_data$prtx_endo <- 
+factor(unique_subset_data$prtx_endo, levels=c(0,1),
+         labels=c("No Endocrine Therapy", "Endocrine Therapy"))
+label(unique_subset_data$prtx_endo)  <- "Endocrine Therapy"
+table(unique_subset_data$prtx_endo)
+
+

+No Endocrine Therapy    Endocrine Therapy 
+                  47                   62 
+
+
#prtx_bonemod 
+
+table(unique_subset_data$prtx_bonemod) #1 = bonemod 0 = no 
+
+

+ 0  1 
+70 39 
+
+
unique_subset_data$prtx_bonemod <- 
+factor(unique_subset_data$prtx_bonemod, levels=c(0,1),
+         labels=c("No Bone Modifying Treatment", "Bone Modifying Treatment"))
+label(unique_subset_data$prtx_bonemod)  <- "Bone Modifying Treatment"
+table(unique_subset_data$prtx_bonemod)
+
+

+No Bone Modifying Treatment    Bone Modifying Treatment 
+                         70                          39 
+
+
#node_status 
+table(unique_subset_data$node_status) # already labeled Node Negative / Node Positive
+
+

+Node Negative Node Positive 
+           46            63 
+
+
label(unique_subset_data$node_status)  <- "Node Status"
+
+#axillary_dissection 
+
+table(unique_subset_data$axillary_dissection) #1 = axillary dissection 0 = no dissection
+
+

+ 0  1 
+54 55 
+
+
unique_subset_data$axillary_dissection <- 
+factor(unique_subset_data$axillary_dissection, levels=c(0,1),
+         labels=c("No Axillary Dissection", "Axillary Dissection"))
+label(unique_subset_data$axillary_dissection)  <- "Axillary Dissection"
+table(unique_subset_data$axillary_dissection)
+
+

+No Axillary Dissection    Axillary Dissection 
+                    54                     55 
+
+
#diag_surgery_type_1
+table(unique_subset_data$diag_surgery_type_1) #1 = Lumpectomy 2 = Mastectomy
+
+

+ 1  2 
+45 64 
+
+
unique_subset_data$diag_surgery_type_1 <- 
+factor(unique_subset_data$diag_surgery_type_1, levels=c(1,2),
+         labels=c("Lumpectomy", "Mastectomy"))
+label(unique_subset_data$diag_surgery_type_1)  <- "Surgery Type"
+table(unique_subset_data$diag_surgery_type_1)
+
+

+Lumpectomy Mastectomy 
+        45         64 
+
+
#diag_neoadj_chemo_1 
+
+table(unique_subset_data$diag_neoadj_chemo_1) # 1 = neoadjuvant chemo, 0 = no neoadjuvant chemo
+
+

+ 0  1 
+90 19 
+
+
unique_subset_data$diag_neoadj_chemo_1 <- 
+factor(unique_subset_data$diag_neoadj_chemo_1, levels=c(0,1),
+         labels=c("No Neoadjuvant Chemo", "Neoadjuvant Chemo"))
+label(unique_subset_data$diag_neoadj_chemo_1)  <- "Neoadjuvant Chemo"
+table(unique_subset_data$diag_neoadj_chemo_1)
+
+

+No Neoadjuvant Chemo    Neoadjuvant Chemo 
+                  90                   19 
+
+
#pCR 
+table(unique_subset_data$diag_pcr_1) # 1 = pCR, 2 = non-pCR; only assessed after neoadjuvant chemo
+
+

+ 1  2 
+ 1 18 
+
+
unique_subset_data$diag_pcr_1<- 
+factor(unique_subset_data$diag_pcr_1, levels=c(1,2),
+         labels=c("pCR", "Non-pCR"))
+label(unique_subset_data$diag_pcr_1)  <- "Pathologic Complete Response"
+table(unique_subset_data$diag_pcr_1)
+
+

+    pCR Non-pCR 
+      1      18 
+
+
#ctDNA_ever 
+table(unique_subset_data$ctDNA_ever) #FALSE = ctDNA Negative TRUE = ctDNA Positive
+
+

+FALSE  TRUE 
+  100     9 
+
+
unique_subset_data$ctDNA_ever <- 
+factor(unique_subset_data$ctDNA_ever, levels=c("FALSE", "TRUE"),
+         labels=c("ctDNA Negative", "ctDNA Positive"))
+label(unique_subset_data$ctDNA_ever)  <- "ctDNA Status"
+table(unique_subset_data$ctDNA_ever)
+
+

+ctDNA Negative ctDNA Positive 
+           100              9 
+
+
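`ctDNA_ever` is a logical column. `factor()` converts its input to character before matching it against `levels`, so `levels = c("FALSE", "TRUE")` lines up with the `FALSE`/`TRUE` values, and a missing value stays `NA`. A toy check (values are illustrative):

```r
# Illustrative logical indicator, including one missing value
ctdna <- c(FALSE, TRUE, FALSE, NA)

status <- factor(ctdna, levels = c("FALSE", "TRUE"),
                 labels = c("ctDNA Negative", "ctDNA Positive"))

table(status, useNA = "ifany")
```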
caption  <- "Table 1 by ctDNA Status"
+
+# Generate the table1 summary
+table1(
+  ~ age_at_diag + final_receptor_group + demo_race_final + 
+    final_tumor_grade + final_overall_stage + 
+    final_t_stage + final_n_stage + 
+    histology_category + prtx_radiation + 
+    prtx_chemo + prtx_endo + prtx_bonemod + 
+    node_status + axillary_dissection + 
+    diag_surgery_type_1 + diag_neoadj_chemo_1 + diag_pcr_1 | 
+    ctDNA_ever,
+  data = unique_subset_data, overall=c(left="Total"), caption=caption)
+
+
**Table 1 by ctDNA Status**

| | Total (N=109) | ctDNA Negative (N=100) | ctDNA Positive (N=9) |
|---|---|---|---|
| **Age at Diagnosis (years)** | | | |
| Mean (SD) | 49.7 (9.66) | 49.2 (9.64) | 54.6 (8.94) |
| Median [Min, Max] | 49.3 [27.3, 68.9] | 49.0 [27.3, 68.9] | 55.6 [38.6, 64.4] |
| **Final Receptor Group** | | | |
| TNBC | 45 (41.3%) | 44 (44.0%) | 1 (11.1%) |
| HR+ HER2- | 52 (47.7%) | 45 (45.0%) | 7 (77.8%) |
| HR+ HER2+ | 8 (7.3%) | 8 (8.0%) | 0 (0%) |
| HR- HER2+ | 4 (3.7%) | 3 (3.0%) | 1 (11.1%) |
| **Race** | | | |
| Black | 9 (8.3%) | 8 (8.0%) | 1 (11.1%) |
| Asian | 1 (0.9%) | 1 (1.0%) | 0 (0%) |
| White | 99 (90.8%) | 91 (91.0%) | 8 (88.9%) |
| **Tumor Grade** | | | |
| Grade 3 | 79 (72.5%) | 75 (75.0%) | 4 (44.4%) |
| Grade 1 | 22 (20.2%) | 17 (17.0%) | 5 (55.6%) |
| Grade 2 | 6 (5.5%) | 6 (6.0%) | 0 (0%) |
| Missing | 2 (1.8%) | 2 (2.0%) | 0 (0%) |
| **Overall Stage** | | | |
| Stage I | 35 (32.1%) | 33 (33.0%) | 2 (22.2%) |
| Stage II | 47 (43.1%) | 46 (46.0%) | 1 (11.1%) |
| Stage III | 26 (23.9%) | 20 (20.0%) | 6 (66.7%) |
| Missing | 1 (0.9%) | 1 (1.0%) | 0 (0%) |
| **T Stage** | | | |
| T1 | 51 (46.8%) | 48 (48.0%) | 3 (33.3%) |
| T2 | 44 (40.4%) | 40 (40.0%) | 4 (44.4%) |
| T3 | 12 (11.0%) | 11 (11.0%) | 1 (11.1%) |
| T4 | 1 (0.9%) | 0 (0%) | 1 (11.1%) |
| Missing | 1 (0.9%) | 1 (1.0%) | 0 (0%) |
| **N Stage** | | | |
| N0 | 46 (42.2%) | 43 (43.0%) | 3 (33.3%) |
| N1 | 43 (39.4%) | 43 (43.0%) | 0 (0%) |
| N2 | 13 (11.9%) | 8 (8.0%) | 5 (55.6%) |
| N3 | 7 (6.4%) | 6 (6.0%) | 1 (11.1%) |
| **Histology Category** | | | |
| Both Ductal and Lobular | 9 (8.3%) | 9 (9.0%) | 0 (0%) |
| Ductal | 84 (77.1%) | 78 (78.0%) | 6 (66.7%) |
| Lobular | 14 (12.8%) | 11 (11.0%) | 3 (33.3%) |
| Other | 2 (1.8%) | 2 (2.0%) | 0 (0%) |
| **Radiation** | | | |
| No Radiation | 34 (31.2%) | 33 (33.0%) | 1 (11.1%) |
| Radiation | 75 (68.8%) | 67 (67.0%) | 8 (88.9%) |
| **Chemo** | | | |
| No Chemo | 3 (2.8%) | 2 (2.0%) | 1 (11.1%) |
| Chemo | 106 (97.2%) | 98 (98.0%) | 8 (88.9%) |
| **Endocrine Therapy** | | | |
| No Endocrine Therapy | 47 (43.1%) | 45 (45.0%) | 2 (22.2%) |
| Endocrine Therapy | 62 (56.9%) | 55 (55.0%) | 7 (77.8%) |
| **Bone Modifying Treatment** | | | |
| No Bone Modifying Treatment | 70 (64.2%) | 65 (65.0%) | 5 (55.6%) |
| Bone Modifying Treatment | 39 (35.8%) | 35 (35.0%) | 4 (44.4%) |
| **Node Status** | | | |
| Node Negative | 46 (42.2%) | 43 (43.0%) | 3 (33.3%) |
| Node Positive | 63 (57.8%) | 57 (57.0%) | 6 (66.7%) |
| **Axillary Dissection** | | | |
| No Axillary Dissection | 54 (49.5%) | 52 (52.0%) | 2 (22.2%) |
| Axillary Dissection | 55 (50.5%) | 48 (48.0%) | 7 (77.8%) |
| **Surgery Type** | | | |
| Lumpectomy | 45 (41.3%) | 41 (41.0%) | 4 (44.4%) |
| Mastectomy | 64 (58.7%) | 59 (59.0%) | 5 (55.6%) |
| **Neoadjuvant Chemo** | | | |
| No Neoadjuvant Chemo | 90 (82.6%) | 82 (82.0%) | 8 (88.9%) |
| Neoadjuvant Chemo | 19 (17.4%) | 18 (18.0%) | 1 (11.1%) |
| **Pathologic Complete Response** | | | |
| pCR | 1 (0.9%) | 1 (1.0%) | 0 (0%) |
| Non-pCR | 18 (16.5%) | 17 (17.0%) | 1 (11.1%) |
| Missing | 90 (82.6%) | 82 (82.0%) | 8 (88.9%) |
+
+
+

We have our basic Table 1 by ctDNA status.
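Each categorical cell in Table 1 is a count followed by its percentage of the column N. The Tumor Grade rows indicate that the denominator includes participants with missing values: 79 of 109 is 72.5%, the value shown for Grade 3 in the Total column. A quick arithmetic check using the Total-column counts copied from the table:

```r
# Tumor Grade counts from the Total column of Table 1, including the Missing row
grade_counts <- c("Grade 3" = 79, "Grade 1" = 22, "Grade 2" = 6, "Missing" = 2)
n_total <- 109

# Every participant falls in exactly one row
stopifnot(sum(grade_counts) == n_total)

# Percentage of the full column N, rounded to one decimal place
pct <- round(100 * grade_counts / n_total, 1)
pct
```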

+
+
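The p-value function below chooses between the chi-squared test and Fisher's exact test. The usual rule of thumb looks at *expected* counts under independence (row total × column total / N), not observed counts. Using the N Stage counts from Table 1, every expected count in the ctDNA Positive column falls below 5, so Fisher's exact test is the appropriate choice for that variable. A sketch of the calculation:

```r
# N stage by ctDNA status, counts copied from Table 1
n_stage <- matrix(c(43, 43, 8, 6,   # ctDNA Negative: N0, N1, N2, N3
                    3,  0, 5, 1),   # ctDNA Positive: N0, N1, N2, N3
                  ncol = 2,
                  dimnames = list(N_stage = c("N0", "N1", "N2", "N3"),
                                  ctDNA = c("Negative", "Positive")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(n_stage), colSums(n_stage)) / sum(n_stage)
round(expected, 2)

# Rule of thumb: use Fisher's exact test if any expected count is below 5
use_fisher <- any(expected < 5)
p <- if (use_fisher) fisher.test(n_stage)$p.value else chisq.test(n_stage)$p.value
format.pval(p, digits = 3, eps = 0.001)
```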
# Add p-values (tests of significance) comparing ctDNA Negative vs. ctDNA Positive
+
+# Create the base table1 output
+table1_output <- table1(
+  ~ age_at_diag + final_receptor_group + demo_race_final + 
+    final_tumor_grade + final_overall_stage + 
+    final_t_stage + final_n_stage + 
+    histology_category + prtx_radiation + 
+    prtx_chemo + prtx_endo + prtx_bonemod + 
+    node_status + axillary_dissection + 
+    diag_surgery_type_1 + diag_neoadj_chemo_1 +diag_pcr_1 | 
+    ctDNA_ever,
+  data = unique_subset_data,
+  overall = c(left = "Total"),
+  caption = "Table 1: Summary of demographic and clinical variables by ctDNA status"
+)
+
+
+####
+pvalue_function <- function(x, ...) {
+  # table1 passes one vector per column; drop the "overall" element so that
+  # only ctDNA Negative and ctDNA Positive are compared
+  x <- x[!names(x) %in% "overall"]
+  y <- unlist(x)
+  g <- factor(rep(seq_along(x), times = sapply(x, length)))
+
+  # Only compute a p-value when exactly two groups are being compared
+  if (nlevels(g) != 2) {
+    return(NA)
+  }
+
+  if (is.numeric(y)) {
+    # Continuous variables: two-sample (Welch) t-test
+    p <- t.test(y ~ g)$p.value
+  } else {
+    # Categorical variables: drop unused levels so empty rows do not enter the test
+    table_result <- table(droplevels(factor(y)), g)
+
+    # Use Fisher's exact test when any *expected* cell count is below 5,
+    # otherwise the chi-squared test
+    expected <- suppressWarnings(chisq.test(table_result))$expected
+    if (any(expected < 5)) {
+      p <- fisher.test(table_result)$p.value
+    } else {
+      p <- chisq.test(table_result)$p.value
+    }
+  }
+
+  # Format the p-value for output
+  formatted_p <- format.pval(p, digits = 3, eps = 0.001)
+  # Escape "<" (e.g. "<0.001") so it is not read as the start of an HTML tag
+  sub("<", "&lt;", formatted_p)
+}
+
+
+# Generate table1 with the p-value column
+table1_p <- table1(
+  ~ age_at_diag + final_receptor_group + demo_race_final + 
+    final_tumor_grade + final_overall_stage + 
+    final_t_stage + final_n_stage + 
+    histology_category + prtx_radiation + 
+    prtx_chemo + prtx_endo + prtx_bonemod + 
+    node_status + axillary_dissection + 
+    diag_surgery_type_1 + diag_neoadj_chemo_1 + diag_pcr_1 | 
+    ctDNA_ever,
+  data = unique_subset_data,
+  overall = c(left = "Total"),
+  extra.col = list("P-value" = pvalue_function),  # Add p-value function
+  extra.col.pos = 4  # Position of the extra column
+)
+ [28] Grade 3 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3
+ [37] Grade 3 Grade 3 Grade 2 Grade 2 Grade 1 Grade 3 Grade 1 Grade 1 Grade 1
+ [46] Grade 3 Grade 3 Grade 3 <NA>    Grade 3 Grade 1 Grade 3 Grade 3 Grade 3
+ [55] Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 1 Grade 3
+ [64] Grade 3 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+ [73] Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 2 Grade 3 Grade 3 Grade 1
+ [82] Grade 3 Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3
+ [91] Grade 1 Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3
+[100] Grade 3
+Levels: Grade 3 Grade 1 Grade 2
+
+$`ctDNA Positive`
+[1] Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 1 Grade 1 Grade 1 Grade 3
+Levels: Grade 3 Grade 1 Grade 2
+
+$overall
+  [1] Stage III Stage I   Stage III Stage II  Stage II  Stage III Stage III
+  [8] Stage III Stage II  Stage I   Stage II  Stage II  Stage II  Stage II 
+ [15] Stage I   Stage II  Stage I   Stage II  Stage II  Stage I   Stage II 
+ [22] Stage III Stage III Stage II  Stage II  Stage II  Stage I   Stage II 
+ [29] Stage III Stage II  Stage I   Stage II  Stage I   Stage II  Stage III
+ [36] Stage II  Stage II  Stage III Stage I   Stage I   Stage II  Stage III
+ [43] Stage II  Stage I   Stage I   Stage III Stage II  Stage III Stage II 
+ [50] Stage I   Stage II  Stage II  Stage I   Stage II  Stage I   Stage III
+ [57] Stage I   Stage II  Stage I   Stage II  Stage II  Stage III Stage I  
+ [64] Stage II  Stage II  Stage II  Stage III Stage II  Stage I   Stage II 
+ [71] Stage I   Stage II  Stage I   Stage III Stage III Stage I   Stage III
+ [78] Stage I   Stage I   Stage I   Stage I   Stage II  Stage III Stage I  
+ [85] Stage II  Stage I   Stage III Stage I   Stage II  Stage II  Stage II 
+ [92] Stage III Stage II  Stage III Stage III Stage I   Stage II  Stage II 
+ [99] <NA>      Stage I   Stage I   Stage III Stage III Stage I   Stage I  
+[106] Stage II  Stage II  Stage I   Stage II 
+attr(,"label")
+[1] Overall Stage
+Levels: Stage I Stage II Stage III
+
+$`ctDNA Negative`
+  [1] Stage III Stage I   Stage III Stage II  Stage II  Stage III Stage III
+  [8] Stage II  Stage I   Stage II  Stage II  Stage II  Stage II  Stage I  
+ [15] Stage II  Stage I   Stage II  Stage II  Stage I   Stage II  Stage III
+ [22] Stage III Stage II  Stage II  Stage II  Stage I   Stage II  Stage II 
+ [29] Stage II  Stage I   Stage II  Stage II  Stage II  Stage III Stage I  
+ [36] Stage I   Stage II  Stage III Stage I   Stage I   Stage III Stage II 
+ [43] Stage III Stage II  Stage I   Stage II  Stage II  Stage I   Stage II 
+ [50] Stage I   Stage III Stage I   Stage II  Stage I   Stage II  Stage II 
+ [57] Stage III Stage I   Stage II  Stage II  Stage II  Stage II  Stage I  
+ [64] Stage II  Stage I   Stage II  Stage I   Stage III Stage III Stage I  
+ [71] Stage III Stage I   Stage I   Stage I   Stage I   Stage II  Stage III
+ [78] Stage I   Stage II  Stage I   Stage III Stage I   Stage II  Stage II 
+ [85] Stage II  Stage II  Stage III Stage I   Stage II  Stage II  <NA>     
+ [92] Stage I   Stage I   Stage III Stage III Stage I   Stage II  Stage II 
+ [99] Stage I   Stage II 
+Levels: Stage I Stage II Stage III
+
+$`ctDNA Positive`
+[1] Stage III Stage III Stage I   Stage III Stage II  Stage III Stage III
+[8] Stage III Stage I  
+Levels: Stage I Stage II Stage III
+
+$overall
+  [1] T2   T1   T2   T2   T2   T3   T2   T1   T2   T1   T2   T2   T3   T1   T1  
+ [16] T2   T1   T1   T2   T1   T2   T2   T2   T2   T1   T2   T1   <NA> T2   T2  
+ [31] T1   T1   T1   T3   T4   T2   T2   T1   T1   T1   T2   T3   T2   T1   T1  
+ [46] T3   T1   T1   T2   T1   T2   T2   T1   T1   T1   T3   T1   T1   T1   T2  
+ [61] T2   T3   T1   T2   T2   T2   T1   T1   T1   T2   T1   T2   T1   T3   T3  
+ [76] T1   T2   T1   T1   T1   T1   T2   T2   T1   T2   T1   T1   T1   T2   T2  
+ [91] T2   T3   T2   T2   T1   T1   T2   T1   T1   T1   T1   T3   T3   T1   T1  
+[106] T2   T2   T1   T2  
+attr(,"label")
+[1] T Stage
+Levels: T1 T2 T3 T4
+
+$`ctDNA Negative`
+  [1] T2   T1   T2   T2   T2   T3   T1   T2   T1   T2   T2   T3   T1   T1   T2  
+ [16] T1   T1   T2   T1   T2   T2   T2   T2   T1   T2   T1   <NA> T2   T1   T1  
+ [31] T3   T2   T2   T1   T1   T1   T2   T3   T1   T1   T3   T1   T1   T2   T1  
+ [46] T2   T2   T1   T1   T1   T3   T1   T1   T1   T2   T2   T3   T1   T2   T2  
+ [61] T2   T1   T1   T2   T1   T2   T1   T3   T3   T1   T2   T1   T1   T1   T1  
+ [76] T2   T2   T1   T2   T1   T1   T1   T2   T2   T2   T2   T1   T1   T2   T1  
+ [91] T1   T1   T1   T3   T3   T1   T2   T2   T1   T2  
+Levels: T1 T2 T3 T4
+
+$`ctDNA Positive`
+[1] T2 T2 T1 T4 T2 T1 T3 T2 T1
+Levels: T1 T2 T3 T4
+
+$overall
+  [1] N3 N0 N3 N0 N0 N2 N2 N2 N0 N0 N0 N1 N0 N1 N0 N1 N0 N1 N0 N0 N0 N3 N3 N1 N1
+ [26] N0 N0 N0 N2 N0 N0 N1 N0 N0 N2 N0 N0 N2 N0 N0 N1 N2 N0 N0 N0 N2 N1 N2 N1 N0
+ [51] N1 N1 N0 N1 N0 N1 N0 N1 N1 N1 N0 N1 N0 N1 N1 N0 N3 N1 N1 N1 N0 N1 N1 N3 N1
+ [76] N1 N3 N0 N0 N1 N0 N1 N1 N1 N0 N0 N2 N0 N1 N1 N1 N2 N1 N2 N2 N0 N1 N1 N1 N0
+[101] N1 N1 N1 N0 N0 N1 N1 N0 N0
+attr(,"label")
+[1] N Stage
+Levels: N0 N1 N2 N3
+
+$`ctDNA Negative`
+  [1] N3 N0 N3 N0 N0 N2 N2 N0 N0 N0 N1 N0 N1 N0 N1 N0 N1 N0 N0 N0 N3 N3 N1 N1 N0
+ [26] N0 N0 N0 N1 N0 N0 N0 N0 N2 N0 N0 N1 N2 N0 N0 N2 N1 N2 N1 N0 N1 N1 N0 N1 N0
+ [51] N1 N0 N1 N1 N1 N0 N1 N0 N1 N1 N0 N1 N1 N1 N0 N1 N1 N3 N1 N1 N3 N0 N0 N1 N0
+ [76] N1 N1 N1 N0 N0 N2 N0 N1 N1 N1 N1 N2 N0 N1 N1 N1 N0 N1 N1 N1 N0 N1 N1 N0 N0
+Levels: N0 N1 N2 N3
+
+$`ctDNA Positive`
+[1] N2 N2 N0 N2 N0 N3 N2 N2 N0
+Levels: N0 N1 N2 N3
+
+$overall
+  [1] Both Ductal and Lobular Ductal                  Ductal                 
+  [4] Ductal                  Ductal                  Lobular                
+  [7] Ductal                  Ductal                  Ductal                 
+ [10] Ductal                  Ductal                  Ductal                 
+ [13] Lobular                 Ductal                  Ductal                 
+ [16] Ductal                  Ductal                  Ductal                 
+ [19] Ductal                  Ductal                  Ductal                 
+ [22] Ductal                  Ductal                  Ductal                 
+ [25] Ductal                  Ductal                  Ductal                 
+ [28] Ductal                  Ductal                  Ductal                 
+ [31] Ductal                  Ductal                  Ductal                 
+ [34] Lobular                 Ductal                  Ductal                 
+ [37] Ductal                  Ductal                  Ductal                 
+ [40] Ductal                  Ductal                  Ductal                 
+ [43] Ductal                  Ductal                  Other                  
+ [46] Lobular                 Ductal                  Ductal                 
+ [49] Lobular                 Lobular                 Ductal                 
+ [52] Ductal                  Ductal                  Lobular                
+ [55] Ductal                  Lobular                 Ductal                 
+ [58] Ductal                  Ductal                  Ductal                 
+ [61] Other                   Lobular                 Ductal                 
+ [64] Ductal                  Ductal                  Ductal                 
+ [67] Lobular                 Ductal                  Ductal                 
+ [70] Ductal                  Ductal                  Ductal                 
+ [73] Both Ductal and Lobular Ductal                  Ductal                 
+ [76] Both Ductal and Lobular Both Ductal and Lobular Ductal                 
+ [79] Ductal                  Both Ductal and Lobular Ductal                 
+ [82] Ductal                  Ductal                  Ductal                 
+ [85] Ductal                  Ductal                  Ductal                 
+ [88] Ductal                  Ductal                  Ductal                 
+ [91] Both Ductal and Lobular Lobular                 Ductal                 
+ [94] Lobular                 Ductal                  Ductal                 
+ [97] Both Ductal and Lobular Ductal                  Both Ductal and Lobular
+[100] Ductal                  Lobular                 Ductal                 
+[103] Lobular                 Ductal                  Ductal                 
+[106] Ductal                  Ductal                  Ductal                 
+[109] Both Ductal and Lobular
+Levels: Both Ductal and Lobular Ductal Lobular Other
+
+$`ctDNA Negative`
+  [1] Both Ductal and Lobular Ductal                  Ductal                 
+  [4] Ductal                  Ductal                  Lobular                
+  [7] Ductal                  Ductal                  Ductal                 
+ [10] Ductal                  Ductal                  Lobular                
+ [13] Ductal                  Ductal                  Ductal                 
+ [16] Ductal                  Ductal                  Ductal                 
+ [19] Ductal                  Ductal                  Ductal                 
+ [22] Ductal                  Ductal                  Ductal                 
+ [25] Ductal                  Ductal                  Ductal                 
+ [28] Ductal                  Ductal                  Ductal                 
+ [31] Lobular                 Ductal                  Ductal                 
+ [34] Ductal                  Ductal                  Ductal                 
+ [37] Ductal                  Ductal                  Ductal                 
+ [40] Other                   Lobular                 Ductal                 
+ [43] Ductal                  Lobular                 Lobular                
+ [46] Ductal                  Ductal                  Ductal                 
+ [49] Lobular                 Ductal                  Lobular                
+ [52] Ductal                  Ductal                  Ductal                 
+ [55] Ductal                  Other                   Lobular                
+ [58] Ductal                  Ductal                  Ductal                 
+ [61] Ductal                  Ductal                  Ductal                 
+ [64] Ductal                  Ductal                  Ductal                 
+ [67] Both Ductal and Lobular Ductal                  Ductal                 
+ [70] Both Ductal and Lobular Both Ductal and Lobular Ductal                 
+ [73] Ductal                  Both Ductal and Lobular Ductal                 
+ [76] Ductal                  Ductal                  Ductal                 
+ [79] Ductal                  Ductal                  Ductal                 
+ [82] Ductal                  Ductal                  Ductal                 
+ [85] Both Ductal and Lobular Ductal                  Ductal                 
+ [88] Ductal                  Both Ductal and Lobular Ductal                 
+ [91] Both Ductal and Lobular Ductal                  Lobular                
+ [94] Ductal                  Lobular                 Ductal                 
+ [97] Ductal                  Ductal                  Ductal                 
+[100] Both Ductal and Lobular
+Levels: Both Ductal and Lobular Ductal Lobular Other
+
+$`ctDNA Positive`
+[1] Ductal  Ductal  Ductal  Ductal  Ductal  Lobular Lobular Lobular Ductal 
+Levels: Ductal Lobular
+
+$overall
+  [1] Radiation    No Radiation No Radiation No Radiation No Radiation
+  [6] Radiation    Radiation    Radiation    Radiation    No Radiation
+ [11] Radiation    Radiation    Radiation    Radiation    No Radiation
+ [16] No Radiation Radiation    Radiation    No Radiation Radiation   
+ [21] Radiation    Radiation    Radiation    Radiation    No Radiation
+ [26] Radiation    No Radiation No Radiation Radiation    No Radiation
+ [31] Radiation    Radiation    No Radiation Radiation    Radiation   
+ [36] No Radiation No Radiation Radiation    No Radiation No Radiation
+ [41] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [46] Radiation    No Radiation Radiation    Radiation    No Radiation
+ [51] Radiation    Radiation    Radiation    No Radiation Radiation   
+ [56] Radiation    No Radiation Radiation    Radiation    No Radiation
+ [61] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [66] Radiation    Radiation    Radiation    No Radiation Radiation   
+ [71] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [76] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [81] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [86] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [91] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [96] No Radiation Radiation    Radiation    No Radiation No Radiation
+[101] No Radiation Radiation    Radiation    Radiation    No Radiation
+[106] No Radiation Radiation    No Radiation No Radiation
+attr(,"label")
+[1] Radiation
+Levels: No Radiation Radiation
+
+$`ctDNA Negative`
+  [1] Radiation    No Radiation No Radiation No Radiation No Radiation
+  [6] Radiation    Radiation    Radiation    No Radiation Radiation   
+ [11] Radiation    Radiation    Radiation    No Radiation No Radiation
+ [16] Radiation    Radiation    No Radiation Radiation    Radiation   
+ [21] Radiation    Radiation    Radiation    No Radiation Radiation   
+ [26] No Radiation No Radiation No Radiation Radiation    No Radiation
+ [31] Radiation    No Radiation No Radiation Radiation    No Radiation
+ [36] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [41] Radiation    No Radiation Radiation    Radiation    No Radiation
+ [46] Radiation    Radiation    Radiation    No Radiation Radiation   
+ [51] Radiation    No Radiation Radiation    Radiation    No Radiation
+ [56] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [61] Radiation    Radiation    No Radiation Radiation    No Radiation
+ [66] Radiation    Radiation    Radiation    Radiation    No Radiation
+ [71] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [76] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [81] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [86] Radiation    Radiation    No Radiation Radiation    Radiation   
+ [91] No Radiation No Radiation No Radiation Radiation    Radiation   
+ [96] Radiation    No Radiation Radiation    No Radiation No Radiation
+Levels: No Radiation Radiation
+
+$`ctDNA Positive`
+[1] Radiation    Radiation    Radiation    Radiation    Radiation   
+[6] Radiation    Radiation    Radiation    No Radiation
+Levels: No Radiation Radiation
+
+$overall
+  [1] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+  [9] Chemo    Chemo    Chemo    Chemo    Chemo    No Chemo Chemo    Chemo   
+ [17] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [25] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [33] Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo    Chemo   
+ [41] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [49] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [57] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [65] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [73] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [81] Chemo    Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo   
+ [89] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [97] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+[105] Chemo    Chemo    Chemo    Chemo    Chemo   
+attr(,"label")
+[1] Chemo
+Levels: No Chemo Chemo
+
+$`ctDNA Negative`
+  [1] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+  [9] Chemo    Chemo    Chemo    Chemo    No Chemo Chemo    Chemo    Chemo   
+ [17] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [25] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [33] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [41] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [49] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [57] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [65] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [73] Chemo    Chemo    Chemo    Chemo    Chemo    No Chemo Chemo    Chemo   
+ [81] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [89] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [97] Chemo    Chemo    Chemo    Chemo   
+Levels: No Chemo Chemo
+
+$`ctDNA Positive`
+[1] Chemo    Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo   
+[9] Chemo   
+Levels: No Chemo Chemo
+
+$overall
+  [1] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+  [4] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+  [7] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [10] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [13] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [16] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [19] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [22] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [25] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [28] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [31] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [34] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [37] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [40] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [43] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [46] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [49] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [52] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [55] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [58] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [61] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+ [64] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [67] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [70] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+ [73] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [76] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [79] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [82] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [85] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [88] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [91] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [94] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [97] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+[100] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+[103] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+[106] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+[109] Endocrine Therapy   
+attr(,"label")
+[1] Endocrine Therapy
+Levels: No Endocrine Therapy Endocrine Therapy
+
+$`ctDNA Negative`
+  [1] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+  [4] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+  [7] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [10] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+ [13] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [16] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [19] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [22] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [25] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [28] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [31] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [34] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+ [37] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [40] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [43] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [46] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [49] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+ [52] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [55] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [58] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [61] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+ [64] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+ [67] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [70] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [73] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [76] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [79] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [82] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [85] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [88] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [91] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+ [94] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+ [97] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+[100] Endocrine Therapy   
+Levels: No Endocrine Therapy Endocrine Therapy
+
+$`ctDNA Positive`
+[1] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+[4] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+[7] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+Levels: No Endocrine Therapy Endocrine Therapy
+
+$overall
+  [1] No Bone Modifying Treatment No Bone Modifying Treatment
+  [3] No Bone Modifying Treatment No Bone Modifying Treatment
+  [5] No Bone Modifying Treatment Bone Modifying Treatment   
+  [7] No Bone Modifying Treatment No Bone Modifying Treatment
+  [9] No Bone Modifying Treatment No Bone Modifying Treatment
+ [11] No Bone Modifying Treatment No Bone Modifying Treatment
+ [13] No Bone Modifying Treatment No Bone Modifying Treatment
+ [15] No Bone Modifying Treatment No Bone Modifying Treatment
+ [17] No Bone Modifying Treatment No Bone Modifying Treatment
+ [19] No Bone Modifying Treatment No Bone Modifying Treatment
+ [21] No Bone Modifying Treatment Bone Modifying Treatment   
+ [23] Bone Modifying Treatment    No Bone Modifying Treatment
+ [25] No Bone Modifying Treatment No Bone Modifying Treatment
+ [27] No Bone Modifying Treatment No Bone Modifying Treatment
+ [29] No Bone Modifying Treatment No Bone Modifying Treatment
+ [31] No Bone Modifying Treatment No Bone Modifying Treatment
+ [33] No Bone Modifying Treatment Bone Modifying Treatment   
+ [35] No Bone Modifying Treatment Bone Modifying Treatment   
+ [37] No Bone Modifying Treatment Bone Modifying Treatment   
+ [39] No Bone Modifying Treatment Bone Modifying Treatment   
+ [41] No Bone Modifying Treatment No Bone Modifying Treatment
+ [43] No Bone Modifying Treatment Bone Modifying Treatment   
+ [45] Bone Modifying Treatment    Bone Modifying Treatment   
+ [47] No Bone Modifying Treatment Bone Modifying Treatment   
+ [49] No Bone Modifying Treatment Bone Modifying Treatment   
+ [51] Bone Modifying Treatment    No Bone Modifying Treatment
+ [53] No Bone Modifying Treatment No Bone Modifying Treatment
+ [55] No Bone Modifying Treatment Bone Modifying Treatment   
+ [57] Bone Modifying Treatment    No Bone Modifying Treatment
+ [59] No Bone Modifying Treatment No Bone Modifying Treatment
+ [61] No Bone Modifying Treatment Bone Modifying Treatment   
+ [63] No Bone Modifying Treatment No Bone Modifying Treatment
+ [65] No Bone Modifying Treatment No Bone Modifying Treatment
+ [67] Bone Modifying Treatment    No Bone Modifying Treatment
+ [69] Bone Modifying Treatment    Bone Modifying Treatment   
+ [71] No Bone Modifying Treatment No Bone Modifying Treatment
+ [73] No Bone Modifying Treatment Bone Modifying Treatment   
+ [75] Bone Modifying Treatment    Bone Modifying Treatment   
+ [77] Bone Modifying Treatment    No Bone Modifying Treatment
+ [79] Bone Modifying Treatment    Bone Modifying Treatment   
+ [81] No Bone Modifying Treatment No Bone Modifying Treatment
+ [83] No Bone Modifying Treatment Bone Modifying Treatment   
+ [85] Bone Modifying Treatment    Bone Modifying Treatment   
+ [87] Bone Modifying Treatment    No Bone Modifying Treatment
+ [89] No Bone Modifying Treatment Bone Modifying Treatment   
+ [91] Bone Modifying Treatment    Bone Modifying Treatment   
+ [93] Bone Modifying Treatment    Bone Modifying Treatment   
+ [95] Bone Modifying Treatment    No Bone Modifying Treatment
+ [97] Bone Modifying Treatment    No Bone Modifying Treatment
+ [99] No Bone Modifying Treatment No Bone Modifying Treatment
+[101] Bone Modifying Treatment    No Bone Modifying Treatment
+[103] Bone Modifying Treatment    No Bone Modifying Treatment
+[105] Bone Modifying Treatment    No Bone Modifying Treatment
+[107] No Bone Modifying Treatment No Bone Modifying Treatment
+[109] No Bone Modifying Treatment
+attr(,"label")
+[1] Bone Modifying Treatment
+Levels: No Bone Modifying Treatment Bone Modifying Treatment
+
+$`ctDNA Negative`
+  [1] No Bone Modifying Treatment No Bone Modifying Treatment
+  [3] No Bone Modifying Treatment No Bone Modifying Treatment
+  [5] No Bone Modifying Treatment Bone Modifying Treatment   
+  [7] No Bone Modifying Treatment No Bone Modifying Treatment
+  [9] No Bone Modifying Treatment No Bone Modifying Treatment
+ [11] No Bone Modifying Treatment No Bone Modifying Treatment
+ [13] No Bone Modifying Treatment No Bone Modifying Treatment
+ [15] No Bone Modifying Treatment No Bone Modifying Treatment
+ [17] No Bone Modifying Treatment No Bone Modifying Treatment
+ [19] No Bone Modifying Treatment No Bone Modifying Treatment
+ [21] Bone Modifying Treatment    Bone Modifying Treatment   
+ [23] No Bone Modifying Treatment No Bone Modifying Treatment
+ [25] No Bone Modifying Treatment No Bone Modifying Treatment
+ [27] No Bone Modifying Treatment No Bone Modifying Treatment
+ [29] No Bone Modifying Treatment No Bone Modifying Treatment
+ [31] Bone Modifying Treatment    Bone Modifying Treatment   
+ [33] No Bone Modifying Treatment Bone Modifying Treatment   
+ [35] No Bone Modifying Treatment Bone Modifying Treatment   
+ [37] No Bone Modifying Treatment No Bone Modifying Treatment
+ [39] Bone Modifying Treatment    Bone Modifying Treatment   
+ [41] Bone Modifying Treatment    No Bone Modifying Treatment
+ [43] Bone Modifying Treatment    No Bone Modifying Treatment
+ [45] Bone Modifying Treatment    Bone Modifying Treatment   
+ [47] No Bone Modifying Treatment No Bone Modifying Treatment
+ [49] No Bone Modifying Treatment No Bone Modifying Treatment
+ [51] Bone Modifying Treatment    Bone Modifying Treatment   
+ [53] No Bone Modifying Treatment No Bone Modifying Treatment
+ [55] No Bone Modifying Treatment No Bone Modifying Treatment
+ [57] Bone Modifying Treatment    No Bone Modifying Treatment
+ [59] No Bone Modifying Treatment No Bone Modifying Treatment
+ [61] No Bone Modifying Treatment No Bone Modifying Treatment
+ [63] Bone Modifying Treatment    Bone Modifying Treatment   
+ [65] No Bone Modifying Treatment No Bone Modifying Treatment
+
table1_p  # Table 1 by ctDNA status, with p-values
+
+
| | Total (N=109) | ctDNA Negative (N=100) | ctDNA Positive (N=9) | P-value |
|---|---|---|---|---|
| **Age at Diagnosis (years)** | | | | 0.118 |
| Mean (SD) | 49.7 (9.66) | 49.2 (9.64) | 54.6 (8.94) | |
| Median [Min, Max] | 49.3 [27.3, 68.9] | 49.0 [27.3, 68.9] | 55.6 [38.6, 64.4] | |
| **Final Receptor Group** | | | | 0.0891 |
| TNBC | 45 (41.3%) | 44 (44.0%) | 1 (11.1%) | |
| HR+ HER2- | 52 (47.7%) | 45 (45.0%) | 7 (77.8%) | |
| HR+ HER2+ | 8 (7.3%) | 8 (8.0%) | 0 (0%) | |
| HR- HER2+ | 4 (3.7%) | 3 (3.0%) | 1 (11.1%) | |
| **Race** | | | | 0.594 |
| Black | 9 (8.3%) | 8 (8.0%) | 1 (11.1%) | |
| Asian | 1 (0.9%) | 1 (1.0%) | 0 (0%) | |
| White | 99 (90.8%) | 91 (91.0%) | 8 (88.9%) | |
| **Tumor Grade** | | | | 0.0366 |
| Grade 3 | 79 (72.5%) | 75 (75.0%) | 4 (44.4%) | |
| Grade 1 | 22 (20.2%) | 17 (17.0%) | 5 (55.6%) | |
| Grade 2 | 6 (5.5%) | 6 (6.0%) | 0 (0%) | |
| Missing | 2 (1.8%) | 2 (2.0%) | 0 (0%) | |
| **Overall Stage** | | | | 0.00814 |
| Stage I | 35 (32.1%) | 33 (33.0%) | 2 (22.2%) | |
| Stage II | 47 (43.1%) | 46 (46.0%) | 1 (11.1%) | |
| Stage III | 26 (23.9%) | 20 (20.0%) | 6 (66.7%) | |
| Missing | 1 (0.9%) | 1 (1.0%) | 0 (0%) | |
| **T Stage** | | | | 0.119 |
| T1 | 51 (46.8%) | 48 (48.0%) | 3 (33.3%) | |
| T2 | 44 (40.4%) | 40 (40.0%) | 4 (44.4%) | |
| T3 | 12 (11.0%) | 11 (11.0%) | 1 (11.1%) | |
| T4 | 1 (0.9%) | 0 (0%) | 1 (11.1%) | |
| Missing | 1 (0.9%) | 1 (1.0%) | 0 (0%) | |
| **N Stage** | | | | <0.001 |
| N0 | 46 (42.2%) | 43 (43.0%) | 3 (33.3%) | |
| N1 | 43 (39.4%) | 43 (43.0%) | 0 (0%) | |
| N2 | 13 (11.9%) | 8 (8.0%) | 5 (55.6%) | |
| N3 | 7 (6.4%) | 6 (6.0%) | 1 (11.1%) | |
| **Histology Category** | | | | 0.284 |
| Both Ductal and Lobular | 9 (8.3%) | 9 (9.0%) | 0 (0%) | |
| Ductal | 84 (77.1%) | 78 (78.0%) | 6 (66.7%) | |
| Lobular | 14 (12.8%) | 11 (11.0%) | 3 (33.3%) | |
| Other | 2 (1.8%) | 2 (2.0%) | 0 (0%) | |
| **Radiation** | | | | 0.268 |
| No Radiation | 34 (31.2%) | 33 (33.0%) | 1 (11.1%) | |
| Radiation | 75 (68.8%) | 67 (67.0%) | 8 (88.9%) | |
| **Chemo** | | | | 0.23 |
| No Chemo | 3 (2.8%) | 2 (2.0%) | 1 (11.1%) | |
| Chemo | 106 (97.2%) | 98 (98.0%) | 8 (88.9%) | |
| **Endocrine Therapy** | | | | 0.295 |
| No Endocrine Therapy | 47 (43.1%) | 45 (45.0%) | 2 (22.2%) | |
| Endocrine Therapy | 62 (56.9%) | 55 (55.0%) | 7 (77.8%) | |
| **Bone Modifying Treatment** | | | | 0.719 |
| No Bone Modifying Treatment | 70 (64.2%) | 65 (65.0%) | 5 (55.6%) | |
| Bone Modifying Treatment | 39 (35.8%) | 35 (35.0%) | 4 (44.4%) | |
| **Node Status** | | | | 0.731 |
| Node Negative | 46 (42.2%) | 43 (43.0%) | 3 (33.3%) | |
| Node Positive | 63 (57.8%) | 57 (57.0%) | 6 (66.7%) | |
| **Axillary Dissection** | | | | 0.161 |
| No Axillary Dissection | 54 (49.5%) | 52 (52.0%) | 2 (22.2%) | |
| Axillary Dissection | 55 (50.5%) | 48 (48.0%) | 7 (77.8%) | |
| **Surgery Type** | | | | 1 |
| Lumpectomy | 45 (41.3%) | 41 (41.0%) | 4 (44.4%) | |
| Mastectomy | 64 (58.7%) | 59 (59.0%) | 5 (55.6%) | |
| **Neoadjuvant Chemo** | | | | 1 |
| No Neoadjuvant Chemo | 90 (82.6%) | 82 (82.0%) | 8 (88.9%) | |
| Neoadjuvant Chemo | 19 (17.4%) | 18 (18.0%) | 1 (11.1%) | |
| **Pathologic Complete Response** | | | | 1 |
| pCR | 1 (0.9%) | 1 (1.0%) | 0 (0%) | |
| Non-pCR | 18 (16.5%) | 17 (17.0%) | 1 (11.1%) | |
| Missing | 90 (82.6%) | 82 (82.0%) | 8 (88.9%) | |
+
+
+

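In this Table 1 by ctDNA status, three variables show significant associations (p < 0.05): tumor grade (p = 0.037), overall stage (p = 0.008), and N stage (p < 0.001). Patients who were ctDNA positive were more often Stage III (66.7% vs 20.0% of ctDNA-negative patients) and N2 (55.6% vs 8.0%). As tabulated, the tumor grade association runs the other way: ctDNA-positive patients had more Grade 1 tumors (55.6% vs 17.0%) and fewer Grade 3 tumors (44.4% vs 75.0%). Receptor group (p = 0.089) and age at diagnosis (p = 0.118) show non-significant trends. With only 9 ctDNA-positive participants, these comparisons have limited power and should be treated as exploratory.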
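The Overall Stage comparison can be re-run directly from the counts in the table above. This shows where a value in the P-value column comes from. The sketch below assumes that the helper used for this table applied Fisher's exact test to the 3 × 2 stage-by-ctDNA table after dropping the one participant with missing stage. The ctDNA helper itself is not shown in this section, so the result may differ slightly from the reported 0.00814 if that helper used a different test.

```r
# Counts of Overall Stage by ctDNA status, copied from the table above.
# The single "Missing" participant is excluded, as table() drops NA by default.
stage_by_ctdna <- matrix(
  c(33, 46, 20,   # ctDNA Negative: Stage I, II, III
     2,  1,  6),  # ctDNA Positive: Stage I, II, III
  nrow = 3,
  dimnames = list(
    stage = c("Stage I", "Stage II", "Stage III"),
    ctdna = c("ctDNA Negative", "ctDNA Positive")
  )
)

# Expected counts under independence. All three ctDNA-positive cells fall
# below 5, which is why an exact test is preferred over the chi-squared test.
expected <- outer(rowSums(stage_by_ctdna), colSums(stage_by_ctdna)) / sum(stage_by_ctdna)
print(round(expected, 2))

# Fisher's exact test on the 3 x 2 table, formatted like the table1 column
p_stage <- fisher.test(stage_by_ctdna)$p.value
print(format.pval(p_stage, digits = 3, eps = 0.001))
```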

+
+
+
+

4.2 Table of demographics and clinical factors by DTC status

+

Next, we build a table of demographic and clinical factors by DTC status, first without and then with tests of association.
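The `summarize()` step in the chunk below collapses the long-format data, which has several rows per participant, to one row per participant. It does this by taking the first non-missing value of each field with `first(na.omit(...))`. Here is a minimal base-R sketch of the same idea on a made-up toy data frame. The IDs, values, and the helper name `first_non_missing()` are invented for illustration and are not SURMOUNT data.

```r
# Toy long-format data (invented values): several rows per participant, with
# baseline fields filled in on only some rows.
visits <- data.frame(
  participant_id = c("P01", "P01", "P02", "P02", "P02"),
  age_at_diag    = c(NA, 52.1, 47.3, NA, NA),
  dtc_ever       = c(0, NA, NA, 1, NA)
)

# First non-missing value of a vector, or NA if every value is missing;
# the base-R counterpart of dplyr's first(na.omit(x)).
first_non_missing <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA else x[[1]]
}

# One row per participant: apply first_non_missing() to every column.
per_participant <- do.call(rbind, lapply(
  split(visits, visits$participant_id),
  function(d) as.data.frame(lapply(d, first_non_missing))
))

print(per_participant)
```

If a participant has conflicting non-missing values across rows, the first one wins silently. This approach therefore assumes that baseline fields are constant within a participant.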

+
+
####### Table of clinical and demographic factors by DTC status ######### 
+
+# Prepare the dataset
+dtc_unique_subset_data <- subset_data |>
+  mutate(
+    # Replace "Missing" and 99 with NA in relevant columns
+    final_t_stage = na_if(as.character(final_t_stage), "Missing"),
+    final_t_stage = na_if(final_t_stage, "99"),
+    final_overall_stage = na_if(as.character(final_overall_stage), "Missing"),
+    final_overall_stage = na_if(final_overall_stage, "99"),
+    final_tumor_grade = na_if(final_tumor_grade, 3), # Assumes 3 means "Not Reported"
+    diag_pcr_1 = na_if(diag_pcr_1, "."),
+    # Replace 99 with NA in all numeric columns
+    across(where(is.numeric), ~ na_if(.x, 99))
+  ) |>
+  group_by(participant_id) |>
+  summarize(
+    # Summarize unique participant-level data
+    age_at_diag = first(na.omit(age_at_diag)),
+    final_receptor_group = first(na.omit(final_receptor_group)),
+    demo_race_final = first(na.omit(demo_race_final)),
+    final_tumor_grade = first(na.omit(final_tumor_grade)),
+    final_overall_stage = first(na.omit(final_overall_stage)),
+    final_t_stage = first(na.omit(final_t_stage)),
+    final_n_stage = first(na.omit(final_n_stage)),
+    histology_category = first(na.omit(histology_category)),
+    prtx_radiation = first(na.omit(prtx_radiation)),
+    prtx_chemo = first(na.omit(prtx_chemo)),
+    prtx_endo = first(na.omit(prtx_endo)),
+    prtx_bonemod = first(na.omit(prtx_bonemod)),
+    node_status = first(na.omit(node_status)),
+    axillary_dissection = first(na.omit(axillary_dissection)),
+    diag_surgery_type_1 = first(na.omit(diag_surgery_type_1)),
+    diag_neoadj_chemo_1 = first(na.omit(diag_neoadj_chemo_1)),
+    diag_pcr_1 = first(na.omit(diag_pcr_1)),
+    ctDNA_ever = first(na.omit(ctDNA_ever)),
+    dtc_ever = first(na.omit(dtc_ever))
+  )
+
+# Convert variables to labeled factors for table output
+dtc_unique_subset_data <- dtc_unique_subset_data |>
+  mutate(
+    final_receptor_group = factor(final_receptor_group, levels = c(1, 2, 3, 4),
+                                  labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+")),
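+    # Overwrite demo_race_final itself (not a new `race` column) so that
+    # table1 summarizes Race as categorical rather than as a numeric code
+    demo_race_final = factor(demo_race_final, levels = c(1, 3, 5),
+                             labels = c("Black", "Asian", "White")),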
+    final_tumor_grade = factor(final_tumor_grade, levels = c(0, 1, 2),
+                               labels = c("Grade 3", "Grade 1", "Grade 2")),
+    final_overall_stage = factor(final_overall_stage, levels = c(1, 2, 3),
+                                 labels = c("Stage I", "Stage II", "Stage III")),
+    final_t_stage = factor(final_t_stage, levels = c(1, 2, 3, 4),
+                           labels = c("T1", "T2", "T3", "T4")),
+    final_n_stage = factor(final_n_stage, levels = c(0, 1, 2, 3),
+                           labels = c("N0", "N1", "N2", "N3")),
+    prtx_radiation = factor(prtx_radiation, levels = c(0, 1),
+                            labels = c("No Radiation", "Radiation")),
+    prtx_chemo = factor(prtx_chemo, levels = c(0, 1),
+                        labels = c("No Chemo", "Chemo")),
+    prtx_endo = factor(prtx_endo, levels = c(0, 1),
+                       labels = c("No Endocrine Therapy", "Endocrine Therapy")),
+    prtx_bonemod = factor(prtx_bonemod, levels = c(0, 1),
+                          labels = c("No Bone Modifying Treatment", "Bone Modifying Treatment")),
+    axillary_dissection = factor(axillary_dissection, levels = c(0, 1),
+                                 labels = c("No Axillary Dissection", "Axillary Dissection")),
+    diag_surgery_type_1 = factor(diag_surgery_type_1, levels = c(1, 2),
+                                 labels = c("Lumpectomy", "Mastectomy")),
+    diag_neoadj_chemo_1 = factor(diag_neoadj_chemo_1, levels = c(0, 1),
+                                 labels = c("No Neoadjuvant Chemo", "Neoadjuvant Chemo")),
+    diag_pcr_1 = factor(diag_pcr_1, levels = c(1, 2),
+                        labels = c("pCR", "Non-pCR")),
+    ctDNA_ever = factor(ctDNA_ever, levels = c("FALSE", "TRUE"),
+                        labels = c("ctDNA Negative", "ctDNA Positive")),
+    dtc_ever = factor(dtc_ever, levels = c(0, 1),
+                      labels = c("DTC Negative", "DTC Positive"))
+  )
+
+#### Labels 
+
+label(dtc_unique_subset_data$age_at_diag) <- "Age at Diagnosis"
+units(dtc_unique_subset_data$age_at_diag)       <- "years"
+
+# assign `final_receptor_group` labels to `dtc_unique_subset_data`
+label(dtc_unique_subset_data$final_receptor_group)       <- "Final Receptor Group"
+
+##demo_race_final 
+label(dtc_unique_subset_data$demo_race_final)  <- "Race"
+
+
+#final_tumor_grade 
+
+label(dtc_unique_subset_data$final_tumor_grade)  <- "Tumor Grade"
+
+
+#final_overall_stage
+
+label(dtc_unique_subset_data$final_overall_stage)  <- "Overall Stage"
+
+#final_t_stage
+label(dtc_unique_subset_data$final_t_stage)  <- "T Stage"
+
+
+#final_n_stage 
+label(dtc_unique_subset_data$final_n_stage)  <- "N Stage"
+
+#histology_category
+
+
+label(dtc_unique_subset_data$histology_category)  <- "Histology Category"
+
+
+#prtx_radiation 
+
+label(dtc_unique_subset_data$prtx_radiation)  <- "Radiation"
+
+#prtx_chemo
+label(dtc_unique_subset_data$prtx_chemo)  <- "Chemo"
+
+#prtx_endo
+label(dtc_unique_subset_data$prtx_endo)  <- "Endocrine Therapy"
+
+#prtx_bonemod 
+label(dtc_unique_subset_data$prtx_bonemod)  <- "Bone Modifying Treatment"
+
+#node_status 
+label(dtc_unique_subset_data$node_status)  <- "Node Status"
+
+#axillary_dissection 
+label(dtc_unique_subset_data$axillary_dissection)  <- "Axillary Dissection"
+
+#diag_surgery_type_1
+label(dtc_unique_subset_data$diag_surgery_type_1)  <- "Surgery Type"
+
+#diag_neoadj_chemo_1 
+
+label(dtc_unique_subset_data$diag_neoadj_chemo_1)  <- "Neoadjuvant Chemo"
+
+#pCR 
+
+label(dtc_unique_subset_data$diag_pcr_1)  <- "Pathologic Complete Response"
+
+
+#dtc_ever 
+label(dtc_unique_subset_data$dtc_ever)  <- "DTC Status"
+
+
+####
+
+# Step 1: Create table1 output
+table1_output <- table1(
+  ~ age_at_diag + final_receptor_group + demo_race_final + 
+    final_tumor_grade + final_overall_stage + 
+    final_t_stage + final_n_stage + 
+    histology_category + prtx_radiation + 
+    prtx_chemo + prtx_endo + prtx_bonemod + 
+    node_status + axillary_dissection + 
+    diag_surgery_type_1 + diag_neoadj_chemo_1 | 
+    dtc_ever,
+  data = dtc_unique_subset_data
+)
+
+table1_output
+
+
| | DTC Negative (N=70) | DTC Positive (N=39) | Overall (N=109) |
|---|---|---|---|
| **Age at Diagnosis (years)** | | | |
| Mean (SD) | 49.9 (9.74) | 49.2 (9.63) | 49.7 (9.66) |
| Median [Min, Max] | 51.6 [27.3, 68.9] | 48.8 [30.7, 67.7] | 49.3 [27.3, 68.9] |
| **Final Receptor Group** | | | |
| TNBC | 25 (35.7%) | 20 (51.3%) | 45 (41.3%) |
| HR+ HER2- | 37 (52.9%) | 15 (38.5%) | 52 (47.7%) |
| HR+ HER2+ | 4 (5.7%) | 4 (10.3%) | 8 (7.3%) |
| HR- HER2+ | 4 (5.7%) | 0 (0%) | 4 (3.7%) |
| **Race** | | | |
| Mean (SD) | 4.69 (1.06) | 4.59 (1.23) | 4.65 (1.12) |
| Median [Min, Max] | 5.00 [1.00, 5.00] | 5.00 [1.00, 5.00] | 5.00 [1.00, 5.00] |
| **Tumor Grade** | | | |
| Grade 3 | 46 (65.7%) | 33 (84.6%) | 79 (72.5%) |
| Grade 1 | 18 (25.7%) | 4 (10.3%) | 22 (20.2%) |
| Grade 2 | 4 (5.7%) | 2 (5.1%) | 6 (5.5%) |
| Missing | 2 (2.9%) | 0 (0%) | 2 (1.8%) |
| **Overall Stage** | | | |
| Stage I | 22 (31.4%) | 13 (33.3%) | 35 (32.1%) |
| Stage II | 29 (41.4%) | 18 (46.2%) | 47 (43.1%) |
| Stage III | 18 (25.7%) | 8 (20.5%) | 26 (23.9%) |
| Missing | 1 (1.4%) | 0 (0%) | 1 (0.9%) |
| **T Stage** | | | |
| T1 | 34 (48.6%) | 17 (43.6%) | 51 (46.8%) |
| T2 | 27 (38.6%) | 17 (43.6%) | 44 (40.4%) |
| T3 | 8 (11.4%) | 4 (10.3%) | 12 (11.0%) |
| T4 | 0 (0%) | 1 (2.6%) | 1 (0.9%) |
| Missing | 1 (1.4%) | 0 (0%) | 1 (0.9%) |
| **N Stage** | | | |
| N0 | 24 (34.3%) | 22 (56.4%) | 46 (42.2%) |
| N1 | 32 (45.7%) | 11 (28.2%) | 43 (39.4%) |
| N2 | 10 (14.3%) | 3 (7.7%) | 13 (11.9%) |
| N3 | 4 (5.7%) | 3 (7.7%) | 7 (6.4%) |
| **Histology Category** | | | |
| Both Ductal and Lobular | 9 (12.9%) | 0 (0%) | 9 (8.3%) |
| Ductal | 48 (68.6%) | 36 (92.3%) | 84 (77.1%) |
| Lobular | 11 (15.7%) | 3 (7.7%) | 14 (12.8%) |
| Other | 2 (2.9%) | 0 (0%) | 2 (1.8%) |
| **Radiation** | | | |
| No Radiation | 23 (32.9%) | 11 (28.2%) | 34 (31.2%) |
| Radiation | 47 (67.1%) | 28 (71.8%) | 75 (68.8%) |
| **Chemo** | | | |
| No Chemo | 1 (1.4%) | 2 (5.1%) | 3 (2.8%) |
| Chemo | 69 (98.6%) | 37 (94.9%) | 106 (97.2%) |
| **Endocrine Therapy** | | | |
| No Endocrine Therapy | 28 (40.0%) | 19 (48.7%) | 47 (43.1%) |
| Endocrine Therapy | 42 (60.0%) | 20 (51.3%) | 62 (56.9%) |
| **Bone Modifying Treatment** | | | |
| No Bone Modifying Treatment | 45 (64.3%) | 25 (64.1%) | 70 (64.2%) |
| Bone Modifying Treatment | 25 (35.7%) | 14 (35.9%) | 39 (35.8%) |
| **Node Status** | | | |
| Node Negative | 24 (34.3%) | 22 (56.4%) | 46 (42.2%) |
| Node Positive | 46 (65.7%) | 17 (43.6%) | 63 (57.8%) |
| **Axillary Dissection** | | | |
| No Axillary Dissection | 31 (44.3%) | 23 (59.0%) | 54 (49.5%) |
| Axillary Dissection | 39 (55.7%) | 16 (41.0%) | 55 (50.5%) |
| **Surgery Type** | | | |
| Lumpectomy | 31 (44.3%) | 14 (35.9%) | 45 (41.3%) |
| Mastectomy | 39 (55.7%) | 25 (64.1%) | 64 (58.7%) |
| **Neoadjuvant Chemo** | | | |
| No Neoadjuvant Chemo | 60 (85.7%) | 30 (76.9%) | 90 (82.6%) |
| Neoadjuvant Chemo | 10 (14.3%) | 9 (23.1%) | 19 (17.4%) |
+
+
+pvalue_function <- function(x, ...) {
+  # table1 passes a named list with one element per stratum plus "overall";
+  # drop "overall" so only DTC Negative vs DTC Positive is compared
+  x <- x[!names(x) %in% "overall"]
+  y <- unlist(x)
+  g <- factor(rep(seq_along(x), times = lengths(x)))
+
+  # Only a two-group comparison is supported
+  if (nlevels(g) != 2) {
+    return(NA)
+  }
+
+  # Choose the test based on the variable type
+  if (is.numeric(y)) {
+    # Continuous variables: Welch two-sample t-test
+    p <- t.test(y ~ g)$p.value
+  } else {
+    # Categorical variables: chi-squared test, or Fisher's exact test when
+    # any *expected* cell count is below 5
+    table_result <- table(y, g)
+    expected <- suppressWarnings(chisq.test(table_result))$expected
+    if (any(expected < 5)) {
+      p <- fisher.test(table_result)$p.value
+    } else {
+      p <- chisq.test(table_result)$p.value
+    }
+  }
+
+  # Format the p-value; escape "<" so values such as "<0.001" render in HTML
+  sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001))
+}
+  
+
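+# Generate table1 with a p-value column comparing DTC Negative vs DTC Positive.
+# Note: demo_race_final is stored as numeric codes, so table1 summarizes it as
+# Mean (SD) and pvalue_function applies a t-test to it; converting it to a
+# factor would report counts per category instead.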
+table1_dtc <- table1(
+  ~ age_at_diag + final_receptor_group + demo_race_final + 
+    final_tumor_grade + final_overall_stage + 
+    final_t_stage + final_n_stage + 
+    histology_category + prtx_radiation + 
+    prtx_chemo + prtx_endo + prtx_bonemod + 
+    node_status + axillary_dissection + 
+    diag_surgery_type_1 + diag_neoadj_chemo_1 | 
+    dtc_ever,
+  data = dtc_unique_subset_data,
+  overall = c(left = "Total"),
+  extra.col = list("P-value" = pvalue_function),  # Add p-value function
+  extra.col.pos = 4  # Position of the extra column
+)
+
+
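A minimal standalone sketch of the same test-selection logic, so it can be checked outside of table1. The function name `pvalue_sketch` and all toy inputs are hypothetical and illustrative only; the sketch assumes, as the code above does, that table1 hands the extra-column function a named list of per-group vectors that includes an `overall` element. It picks Fisher's exact test when any *expected* cell count is below 5 (the usual rule of thumb for the chi-squared approximation), rather than checking observed counts.

```r
# Hypothetical standalone version of the p-value logic used with table1's
# extra.col; the function name and toy data are illustrative only.
pvalue_sketch <- function(x, ...) {
  # Assumed input: named list of per-group vectors, plus an "overall" element
  x <- x[names(x) != "overall"]
  if (length(x) != 2) return(NA_character_)

  y <- unlist(x)                                   # a list of factors unlists to a factor
  g <- factor(rep(seq_along(x), times = lengths(x)))

  if (is.numeric(y)) {
    p <- t.test(y ~ g)$p.value                     # Welch two-sample t-test
  } else {
    tab <- table(y, g)
    tab <- tab[rowSums(tab) > 0, , drop = FALSE]   # drop levels unused in both groups
    if (nrow(tab) < 2) return(NA_character_)
    expected <- suppressWarnings(chisq.test(tab))$expected
    use_fisher <- any(expected < 5)
    p <- if (use_fisher) fisher.test(tab)$p.value else chisq.test(tab)$p.value
  }
  format.pval(p, digits = 3, eps = 0.001)
}

# Toy inputs shaped like table1's per-variable list (made-up values)
set.seed(42)
age <- list(`DTC Negative` = rnorm(20, 50, 10),
            `DTC Positive` = rnorm(12, 49, 10))
age$overall <- unlist(age, use.names = FALSE)
pvalue_sketch(age)                     # numeric -> t-test

race <- list(`DTC Negative` = c(5, 5, 1, 5, 3), `DTC Positive` = c(5, 1, 5, 5))
pvalue_sketch(race)                    # numeric codes -> t-test
race_f <- lapply(race, factor, levels = c(1, 3, 5))
pvalue_sketch(race_f)                  # factor -> Fisher (small expected counts)
```

Passing race as numeric codes sends it down the t-test branch, while converting the codes to a factor routes it to a contingency-table test, which is the more appropriate comparison for a categorical variable.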
+ [9] No Bone Modifying Treatment No Bone Modifying Treatment
+[11] No Bone Modifying Treatment No Bone Modifying Treatment
+[13] No Bone Modifying Treatment No Bone Modifying Treatment
+[15] No Bone Modifying Treatment No Bone Modifying Treatment
+[17] No Bone Modifying Treatment No Bone Modifying Treatment
+[19] No Bone Modifying Treatment Bone Modifying Treatment   
+[21] No Bone Modifying Treatment No Bone Modifying Treatment
+[23] No Bone Modifying Treatment Bone Modifying Treatment   
+[25] No Bone Modifying Treatment Bone Modifying Treatment   
+[27] No Bone Modifying Treatment Bone Modifying Treatment   
+[29] Bone Modifying Treatment    No Bone Modifying Treatment
+[31] No Bone Modifying Treatment No Bone Modifying Treatment
+[33] No Bone Modifying Treatment No Bone Modifying Treatment
+[35] Bone Modifying Treatment    No Bone Modifying Treatment
+[37] No Bone Modifying Treatment Bone Modifying Treatment   
+[39] No Bone Modifying Treatment Bone Modifying Treatment   
+[41] Bone Modifying Treatment    No Bone Modifying Treatment
+[43] No Bone Modifying Treatment Bone Modifying Treatment   
+[45] Bone Modifying Treatment    Bone Modifying Treatment   
+[47] Bone Modifying Treatment    No Bone Modifying Treatment
+[49] No Bone Modifying Treatment Bone Modifying Treatment   
+[51] Bone Modifying Treatment    No Bone Modifying Treatment
+[53] No Bone Modifying Treatment Bone Modifying Treatment   
+[55] Bone Modifying Treatment    Bone Modifying Treatment   
+[57] Bone Modifying Treatment    Bone Modifying Treatment   
+[59] Bone Modifying Treatment    No Bone Modifying Treatment
+[61] Bone Modifying Treatment    No Bone Modifying Treatment
+[63] No Bone Modifying Treatment Bone Modifying Treatment   
+[65] No Bone Modifying Treatment Bone Modifying Treatment   
+[67] No Bone Modifying Treatment No Bone Modifying Treatment
+[69] No Bone Modifying Treatment No Bone Modifying Treatment
+Levels: No Bone Modifying Treatment Bone Modifying Treatment
+
+$`DTC Positive`
+ [1] No Bone Modifying Treatment No Bone Modifying Treatment
+ [3] No Bone Modifying Treatment No Bone Modifying Treatment
+ [5] No Bone Modifying Treatment No Bone Modifying Treatment
+ [7] No Bone Modifying Treatment Bone Modifying Treatment   
+ [9] Bone Modifying Treatment    No Bone Modifying Treatment
+[11] No Bone Modifying Treatment No Bone Modifying Treatment
+[13] No Bone Modifying Treatment No Bone Modifying Treatment
+[15] No Bone Modifying Treatment Bone Modifying Treatment   
+[17] No Bone Modifying Treatment Bone Modifying Treatment   
+[19] Bone Modifying Treatment    No Bone Modifying Treatment
+[21] Bone Modifying Treatment    Bone Modifying Treatment   
+[23] No Bone Modifying Treatment No Bone Modifying Treatment
+[25] Bone Modifying Treatment    Bone Modifying Treatment   
+[27] No Bone Modifying Treatment No Bone Modifying Treatment
+[29] No Bone Modifying Treatment No Bone Modifying Treatment
+[31] Bone Modifying Treatment    No Bone Modifying Treatment
+[33] Bone Modifying Treatment    No Bone Modifying Treatment
+[35] Bone Modifying Treatment    Bone Modifying Treatment   
+[37] No Bone Modifying Treatment Bone Modifying Treatment   
+[39] No Bone Modifying Treatment
+Levels: No Bone Modifying Treatment Bone Modifying Treatment
+
+$overall
+  [1] Node Positive Node Negative Node Positive Node Negative Node Negative
+  [6] Node Positive Node Positive Node Positive Node Negative Node Negative
+ [11] Node Negative Node Positive Node Negative Node Positive Node Negative
+ [16] Node Positive Node Negative Node Positive Node Negative Node Negative
+ [21] Node Negative Node Positive Node Positive Node Positive Node Positive
+ [26] Node Negative Node Negative Node Negative Node Positive Node Negative
+ [31] Node Negative Node Positive Node Negative Node Negative Node Positive
+ [36] Node Negative Node Negative Node Positive Node Negative Node Negative
+ [41] Node Positive Node Positive Node Negative Node Negative Node Negative
+ [46] Node Positive Node Positive Node Positive Node Positive Node Negative
+ [51] Node Positive Node Positive Node Negative Node Positive Node Negative
+ [56] Node Positive Node Negative Node Positive Node Positive Node Positive
+ [61] Node Negative Node Positive Node Negative Node Positive Node Positive
+ [66] Node Negative Node Positive Node Positive Node Positive Node Positive
+ [71] Node Negative Node Positive Node Positive Node Positive Node Positive
+ [76] Node Positive Node Positive Node Negative Node Negative Node Positive
+ [81] Node Negative Node Positive Node Positive Node Positive Node Negative
+ [86] Node Negative Node Positive Node Negative Node Positive Node Positive
+ [91] Node Positive Node Positive Node Positive Node Positive Node Positive
+ [96] Node Negative Node Positive Node Positive Node Positive Node Negative
+[101] Node Positive Node Positive Node Positive Node Negative Node Negative
+[106] Node Positive Node Positive Node Negative Node Negative
+Levels: Node Negative Node Positive
+
+$`DTC Negative`
+ [1] Node Positive Node Negative Node Positive Node Negative Node Positive
+ [6] Node Positive Node Negative Node Negative Node Positive Node Negative
+[11] Node Positive Node Negative Node Positive Node Negative Node Negative
+[16] Node Negative Node Positive Node Negative Node Negative Node Positive
+[21] Node Negative Node Positive Node Positive Node Negative Node Positive
+[26] Node Positive Node Positive Node Negative Node Positive Node Positive
+[31] Node Positive Node Positive Node Positive Node Negative Node Positive
+[36] Node Negative Node Negative Node Positive Node Positive Node Positive
+[41] Node Positive Node Negative Node Positive Node Positive Node Positive
+[46] Node Positive Node Positive Node Negative Node Positive Node Negative
+[51] Node Positive Node Negative Node Positive Node Positive Node Positive
+[56] Node Positive Node Positive Node Positive Node Positive Node Negative
+[61] Node Positive Node Positive Node Positive Node Positive Node Positive
+[66] Node Positive Node Negative Node Positive Node Positive Node Negative
+Levels: Node Negative Node Positive
+
+$`DTC Positive`
+ [1] Node Negative Node Positive Node Negative Node Positive Node Negative
+ [6] Node Negative Node Negative Node Positive Node Positive Node Positive
+[11] Node Positive Node Negative Node Negative Node Negative Node Positive
+[16] Node Negative Node Positive Node Negative Node Negative Node Negative
+[21] Node Negative Node Positive Node Negative Node Negative Node Positive
+[26] Node Negative Node Positive Node Positive Node Positive Node Positive
+[31] Node Positive Node Negative Node Negative Node Positive Node Positive
+[36] Node Negative Node Negative Node Negative Node Negative
+Levels: Node Negative Node Positive
+
+$overall
+  [1] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+  [4] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+  [7] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [10] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+ [13] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+ [16] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+ [19] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+ [22] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+ [25] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+ [28] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+ [31] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+ [34] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [37] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+ [40] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+ [43] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+ [46] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+ [49] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [52] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+ [55] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+ [58] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+ [61] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [64] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+ [67] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [70] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+ [73] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+ [76] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [79] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+ [82] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [85] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+ [88] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+ [91] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+ [94] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [97] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+[100] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+[103] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[106] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[109] No Axillary Dissection
+attr(,"label")
+[1] Axillary Dissection
+Levels: No Axillary Dissection Axillary Dissection
+
+$`DTC Negative`
+ [1] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+ [4] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+ [7] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+[10] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+[13] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+[16] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+[19] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+[22] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+[25] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+[28] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[31] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+[34] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+[37] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+[40] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+[43] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+[46] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[49] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+[52] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+[55] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+[58] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+[61] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+[64] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+[67] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+[70] No Axillary Dissection
+Levels: No Axillary Dissection Axillary Dissection
+
+$`DTC Positive`
+ [1] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+ [4] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+ [7] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+[10] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+[13] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+[16] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+[19] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+[22] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[25] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[28] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+[31] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[34] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[37] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+Levels: No Axillary Dissection Axillary Dissection
+
+$overall
+  [1] Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy Mastectomy
+  [7] Lumpectomy Mastectomy Lumpectomy Lumpectomy Lumpectomy Lumpectomy
+ [13] Lumpectomy Lumpectomy Mastectomy Mastectomy Lumpectomy Lumpectomy
+ [19] Mastectomy Lumpectomy Lumpectomy Mastectomy Lumpectomy Mastectomy
+ [25] Mastectomy Lumpectomy Lumpectomy Mastectomy Lumpectomy Mastectomy
+ [31] Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy
+ [37] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy
+ [43] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
+ [49] Mastectomy Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy
+ [55] Lumpectomy Mastectomy Mastectomy Lumpectomy Lumpectomy Mastectomy
+ [61] Mastectomy Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy
+ [67] Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy
+ [73] Lumpectomy Mastectomy Mastectomy Mastectomy Lumpectomy Lumpectomy
+ [79] Lumpectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy
+ [85] Lumpectomy Lumpectomy Lumpectomy Lumpectomy Mastectomy Mastectomy
+ [91] Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy Mastectomy
+ [97] Mastectomy Lumpectomy Mastectomy Lumpectomy Mastectomy Mastectomy
+[103] Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy
+[109] Mastectomy
+attr(,"label")
+[1] Surgery Type
+Levels: Lumpectomy Mastectomy
+
+$`DTC Negative`
+ [1] Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy Lumpectomy
+ [7] Lumpectomy Lumpectomy Lumpectomy Lumpectomy Lumpectomy Mastectomy
+[13] Lumpectomy Lumpectomy Lumpectomy Mastectomy Lumpectomy Mastectomy
+[19] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
+[25] Mastectomy Lumpectomy Mastectomy Mastectomy Lumpectomy Lumpectomy
+[31] Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy Lumpectomy
+[37] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy
+[43] Lumpectomy Mastectomy Mastectomy Lumpectomy Lumpectomy Lumpectomy
+[49] Mastectomy Lumpectomy Lumpectomy Lumpectomy Mastectomy Mastectomy
+[55] Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy Mastectomy
+[61] Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy
+[67] Lumpectomy Mastectomy Mastectomy Mastectomy
+Levels: Lumpectomy Mastectomy
+
+$`DTC Positive`
+ [1] Mastectomy Mastectomy Lumpectomy Mastectomy Lumpectomy Mastectomy
+ [7] Lumpectomy Mastectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
+[13] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy
+[19] Mastectomy Mastectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
+[25] Mastectomy Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy
+[31] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
+[37] Lumpectomy Mastectomy Mastectomy
+Levels: Lumpectomy Mastectomy
+
+$overall
+  [1] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+  [4] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
+  [7] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [10] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [13] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [16] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [19] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [22] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [25] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [28] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [31] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [34] Neoadjuvant Chemo    No Neoadjuvant Chemo Neoadjuvant Chemo   
+ [37] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [40] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [43] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [46] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [49] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [52] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [55] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [58] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [61] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [64] Neoadjuvant Chemo    Neoadjuvant Chemo    Neoadjuvant Chemo   
+ [67] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [70] Neoadjuvant Chemo    No Neoadjuvant Chemo Neoadjuvant Chemo   
+ [73] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+ [76] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [79] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [82] Neoadjuvant Chemo    Neoadjuvant Chemo    No Neoadjuvant Chemo
+ [85] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [88] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+ [91] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+ [94] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [97] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[100] Neoadjuvant Chemo    No Neoadjuvant Chemo Neoadjuvant Chemo   
+[103] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[106] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
+[109] No Neoadjuvant Chemo
+attr(,"label")
+[1] Neoadjuvant Chemo
+Levels: No Neoadjuvant Chemo Neoadjuvant Chemo
+
+$`DTC Negative`
+ [1] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [4] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [7] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[10] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[13] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[16] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+[19] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[22] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[25] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+[28] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[31] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[34] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[37] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+[40] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
+[43] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
+[46] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[49] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+[52] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+[55] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+[58] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[61] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[64] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
+[67] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+[70] No Neoadjuvant Chemo
+Levels: No Neoadjuvant Chemo Neoadjuvant Chemo
+
+$`DTC Positive`
+ [1] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [4] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [7] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[10] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[13] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[16] Neoadjuvant Chemo    No Neoadjuvant Chemo Neoadjuvant Chemo   
+[19] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
+[22] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[25] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[28] Neoadjuvant Chemo    Neoadjuvant Chemo    Neoadjuvant Chemo   
+[31] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[34] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+[37] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+Levels: No Neoadjuvant Chemo Neoadjuvant Chemo
+
+
table1_dtc  # print Table 1 stratified by DTC status (includes p-values)
+
+
+| Characteristic | Total (N=109) | DTC Negative (N=70) | DTC Positive (N=39) | P-value |
+|---|---|---|---|---|
+| **Age at Diagnosis (years)** | | | | 0.722 |
+| Mean (SD) | 49.7 (9.66) | 49.9 (9.74) | 49.2 (9.63) | |
+| Median [Min, Max] | 49.3 [27.3, 68.9] | 51.6 [27.3, 68.9] | 48.8 [30.7, 67.7] | |
+| **Final Receptor Group** | | | | 0.145 |
+| TNBC | 45 (41.3%) | 25 (35.7%) | 20 (51.3%) | |
+| HR+ HER2- | 52 (47.7%) | 37 (52.9%) | 15 (38.5%) | |
+| HR+ HER2+ | 8 (7.3%) | 4 (5.7%) | 4 (10.3%) | |
+| HR- HER2+ | 4 (3.7%) | 4 (5.7%) | 0 (0%) | |
+| **Race** | | | | 0.683 |
+| Mean (SD) | 4.65 (1.12) | 4.69 (1.06) | 4.59 (1.23) | |
+| Median [Min, Max] | 5.00 [1.00, 5.00] | 5.00 [1.00, 5.00] | 5.00 [1.00, 5.00] | |
+| **Tumor Grade** | | | | 0.11 |
+| Grade 3 | 79 (72.5%) | 46 (65.7%) | 33 (84.6%) | |
+| Grade 1 | 22 (20.2%) | 18 (25.7%) | 4 (10.3%) | |
+| Grade 2 | 6 (5.5%) | 4 (5.7%) | 2 (5.1%) | |
+| Missing | 2 (1.8%) | 2 (2.9%) | 0 (0%) | |
+| **Overall Stage** | | | | 0.804 |
+| Stage I | 35 (32.1%) | 22 (31.4%) | 13 (33.3%) | |
+| Stage II | 47 (43.1%) | 29 (41.4%) | 18 (46.2%) | |
+| Stage III | 26 (23.9%) | 18 (25.7%) | 8 (20.5%) | |
+| Missing | 1 (0.9%) | 1 (1.4%) | 0 (0%) | |
+| **T Stage** | | | | 0.629 |
+| T1 | 51 (46.8%) | 34 (48.6%) | 17 (43.6%) | |
+| T2 | 44 (40.4%) | 27 (38.6%) | 17 (43.6%) | |
+| T3 | 12 (11.0%) | 8 (11.4%) | 4 (10.3%) | |
+| T4 | 1 (0.9%) | 0 (0%) | 1 (2.6%) | |
+| Missing | 1 (0.9%) | 1 (1.4%) | 0 (0%) | |
+| **N Stage** | | | | 0.114 |
+| N0 | 46 (42.2%) | 24 (34.3%) | 22 (56.4%) | |
+| N1 | 43 (39.4%) | 32 (45.7%) | 11 (28.2%) | |
+| N2 | 13 (11.9%) | 10 (14.3%) | 3 (7.7%) | |
+| N3 | 7 (6.4%) | 4 (5.7%) | 3 (7.7%) | |
+| **Histology Category** | | | | 0.0157 |
+| Both Ductal and Lobular | 9 (8.3%) | 9 (12.9%) | 0 (0%) | |
+| Ductal | 84 (77.1%) | 48 (68.6%) | 36 (92.3%) | |
+| Lobular | 14 (12.8%) | 11 (15.7%) | 3 (7.7%) | |
+| Other | 2 (1.8%) | 2 (2.9%) | 0 (0%) | |
+| **Radiation** | | | | 0.774 |
+| No Radiation | 34 (31.2%) | 23 (32.9%) | 11 (28.2%) | |
+| Radiation | 75 (68.8%) | 47 (67.1%) | 28 (71.8%) | |
+| **Chemo** | | | | 0.291 |
+| No Chemo | 3 (2.8%) | 1 (1.4%) | 2 (5.1%) | |
+| Chemo | 106 (97.2%) | 69 (98.6%) | 37 (94.9%) | |
+| **Endocrine Therapy** | | | | 0.497 |
+| No Endocrine Therapy | 47 (43.1%) | 28 (40.0%) | 19 (48.7%) | |
+| Endocrine Therapy | 62 (56.9%) | 42 (60.0%) | 20 (51.3%) | |
+| **Bone Modifying Treatment** | | | | 1 |
+| No Bone Modifying Treatment | 70 (64.2%) | 45 (64.3%) | 25 (64.1%) | |
+| Bone Modifying Treatment | 39 (35.8%) | 25 (35.7%) | 14 (35.9%) | |
+| **Node Status** | | | | 0.0414 |
+| Node Negative | 46 (42.2%) | 24 (34.3%) | 22 (56.4%) | |
+| Node Positive | 63 (57.8%) | 46 (65.7%) | 17 (43.6%) | |
+| **Axillary Dissection** | | | | 0.204 |
+| No Axillary Dissection | 54 (49.5%) | 31 (44.3%) | 23 (59.0%) | |
+| Axillary Dissection | 55 (50.5%) | 39 (55.7%) | 16 (41.0%) | |
+| **Surgery Type** | | | | 0.516 |
+| Lumpectomy | 45 (41.3%) | 31 (44.3%) | 14 (35.9%) | |
+| Mastectomy | 64 (58.7%) | 39 (55.7%) | 25 (64.1%) | |
+| **Neoadjuvant Chemo** | | | | 0.37 |
+| No Neoadjuvant Chemo | 90 (82.6%) | 60 (85.7%) | 30 (76.9%) | |
+| Neoadjuvant Chemo | 19 (17.4%) | 10 (14.3%) | 9 (23.1%) | |
+
+
+

We can see in this series of tests that a similar, but not identical, set of variables appears to be associated with DTC status. These include histology category, where ductal histology is more strongly associated with DTC positivity (36 of 84 patients with ductal tumors were DTC positive). They also include nodal status, where node-negative patients were more likely than node-positive patients to be DTC positive (22 of 46 versus 17 of 63). There are trends towards significance for N stage (p = 0.114), receptor group (p = 0.145), and tumor grade (p = 0.11). We have decided not to include pCR in this table or in further analyses. Only 19 patients received neoadjuvant therapy, so any test of association would have very low power, and pCR has substantial missingness in the overall cohort.
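As a sanity check on the table, the node-status p-value can be recomputed from the printed counts. The sketch below assumes that the table's p-values for two-level variables come from `chisq.test()` with its default settings (the table-generating code is not shown in this section). Under that assumption it reproduces the reported p = 0.0414. Several rows have small cells, for example histology with zero DTC-positive patients in two categories. For that reason, Fisher's exact test is shown alongside as a common alternative for sparse tables.

```r
# Counts copied from the Node Status rows of the DTC table above
# (rows = node status, columns = DTC status)
node_tab <- matrix(
  c(24, 46,   # DTC Negative: node negative, node positive
    22, 17),  # DTC Positive: node negative, node positive
  nrow = 2,
  dimnames = list(
    node = c("Node Negative", "Node Positive"),
    dtc  = c("DTC Negative", "DTC Positive")
  )
)

# Share of each nodal group that is DTC positive
dtc_pos_rate <- prop.table(node_tab, margin = 1)[, "DTC Positive"]
round(dtc_pos_rate, 3)

# Pearson chi-squared test; for a 2x2 table R applies Yates' continuity
# correction by default (correct = TRUE)
chisq_p <- chisq.test(node_tab)$p.value
round(chisq_p, 4)

# Fisher's exact test, often preferred when some expected counts are small
fisher_p <- fisher.test(node_tab)$p.value
fisher_p
```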

+
+
+

4.3 Multivariable Analysis

+

Variable Selection and Planning: I have chosen to perform a multivariable logistic regression to identify predictors of ctDNA (and DTC) positivity. We suspect these are biomarkers of relapse, and even in our data set ctDNA is strongly associated with relapse and worse overall survival. Identifying predictors of positivity would help us understand whom we might consider screening for ctDNA. This testing is expensive, takes time and resources, and may not benefit everyone. ctDNA positivity is a binary outcome, and we have already performed the univariable analyses above to look for potentially significant relationships.

Multiple types of variables are worth considering:
- demographic and clinical factors;
- disease factors, such as tumor aggressiveness (measured by histology and grade), the hormone receptor status of the tumor, and the stage of disease at diagnosis;
- treatment factors: surgery type, radiation or no radiation, neoadjuvant chemotherapy or none, and endocrine (anti-hormone) therapy.

The only variable we have removed from our model is pathologic complete response (pCR). pCR records whether patients have no residual tumor at the time of surgery if they received neoadjuvant chemotherapy/immunotherapy before surgery. Relatively few patients received neoadjuvant therapy, so pCR has substantial missingness. Imputing it would not make sense, because it is only defined for patients who received neoadjuvant therapy. All of these factors are time-invariant and were present at the beginning of enrollment on study, before ctDNA testing.

We will use all of the variables assessed in our initial univariable tests of association, both those with significant associations and those without. We suspect some of these variables are related to one another or collinear, so we cannot rely on simple univariable tests to determine which will be most predictive of positivity.

+

Several variables were significant in our univariable analyses (chi-squared): median age at diagnosis, a longer time from diagnosis to enrollment, higher tumor grade, and higher initial stage at diagnosis. Variables that were not significant but could still be considered include histology, nodal positivity, higher T stage, and receptor status. Recurrence (both distant and local) and worse survival are significantly associated with ctDNA positivity. However, we think of these outcomes as temporally following ctDNA positivity, so they should not be included as predictors in our model.

+

LASSO: LASSO (least absolute shrinkage and selection operator) penalizes the absolute size of the coefficients and shrinks some of them exactly to zero. It therefore selects variables automatically and yields a sparser, more parsimonious model without relying on p-value cutoffs. There is no single “right” method to choose variables, but purposeful selection generally begins with univariable analysis, which we have already performed. We considered stepwise model building based on p-values. That approach has fallen out of favor because it uses somewhat arbitrary p-value cutoffs and can exclude relevant and important variables. Because we have already performed univariable tests of association with chi-squared tests above, we do not need to perform univariable logistic regression. We removed one variable (pCR) that we assessed for univariable association with ctDNA. It has substantial missingness in the overall cohort and applies only to the small subset of patients who received neoadjuvant therapy. We will perform LASSO with the remaining variables to select those that are most predictive.
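For reference, the code below uses `family = "binomial"` and `alpha = 1`. With those settings, glmnet fits a logistic regression by minimizing the L1-penalized negative log-likelihood, following the formulation in the glmnet documentation:

$$
\min_{\beta_0,\,\beta}\; -\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i\,(\beta_0 + x_i^\top\beta) - \log\big(1 + e^{\beta_0 + x_i^\top\beta}\big)\Big] \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
$$

Here $y_i \in \{0,1\}$ is ctDNA status and $x_i$ is patient $i$'s row of the design matrix. The intercept $\beta_0$ is not penalized. Larger values of $\lambda$ force more coefficients to exactly zero, and cross-validation is used to choose $\lambda$. By default glmnet also standardizes the predictor columns internally before applying the penalty.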

+
+
library(glmnet)
+
+
Loading required package: Matrix
+
+
+

+Attaching package: 'Matrix'
+
+
+
The following objects are masked from 'package:tidyr':
+
+    expand, pack, unpack
+
+
+
Loaded glmnet 4.1-8
+
+
set.seed(123) 
+
+# Prepare the response variable
+y <- unique_subset_data$ctDNA_ever
+
+
+# Building y and X2 initially raised an error because 4 values were missing among the predictors; since missingness is low (<10%), impute them rather than drop rows
+library(mice)
+
+

+Attaching package: 'mice'
+
+
+
The following object is masked from 'package:stats':
+
+    filter
+
+
+
The following objects are masked from 'package:base':
+
+    cbind, rbind
+
+
# Impute missing values with predictive mean matching (single imputation, m = 1, since overall missingness is low)
+imputed_data <- mice(unique_subset_data, m = 1, method = "pmm", maxit = 5)
+
+

+ iter imp variable
+  1   1  final_tumor_grade  final_overall_stage  final_t_stage
+  2   1  final_tumor_grade  final_overall_stage  final_t_stage
+  3   1  final_tumor_grade  final_overall_stage  final_t_stage
+  4   1  final_tumor_grade  final_overall_stage  final_t_stage
+  5   1  final_tumor_grade  final_overall_stage  final_t_stage
+
+
+
Warning: Number of logged events: 4
+
+
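The error described in the code comment above is a common pitfall. `model.matrix()` silently drops rows with missing predictor values through the default `na.action` (usually `na.omit`). A response vector pulled directly from the data frame keeps every row, so `y` and `X2` end up with different lengths. The toy example below uses hypothetical data, not SURMOUNT data. It shows the mismatch and one alternative fix, restricting to complete rows. The analysis here handles the problem instead by imputing before building either object.

```r
# Hypothetical toy data: one patient has a missing tumor grade
toy <- data.frame(
  outcome = c(0, 1, 0, 1, 0),
  age     = c(44, 57, 39, 62, 50),
  grade   = factor(c("Grade 1", "Grade 3", NA, "Grade 3", "Grade 2"))
)

y_toy <- toy$outcome                                   # keeps all 5 rows
X_toy <- model.matrix(outcome ~ -1 + age + grade, data = toy)
c(n_y = length(y_toy), n_X = nrow(X_toy))              # NA row dropped from X_toy

# Alternative fix: keep only complete rows before building BOTH objects
cc   <- complete.cases(toy)
y_cc <- toy$outcome[cc]
X_cc <- model.matrix(outcome ~ -1 + age + grade, data = toy[cc, ])
stopifnot(nrow(X_cc) == length(y_cc))                  # now aligned
```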
unique_subset_data <- complete(imputed_data)
+
+# "-1" drops the intercept column from the design matrix; glmnet fits its own intercept
+X2 <- model.matrix(ctDNA_ever ~ -1 + age_at_diag + final_receptor_group + demo_race_final +
+                     final_tumor_grade + final_overall_stage +
+                     final_t_stage + final_n_stage +
+                     histology_category + prtx_radiation +
+                     prtx_chemo + prtx_endo + prtx_bonemod +
+                     node_status + axillary_dissection +
+                     diag_surgery_type_1 + diag_neoadj_chemo_1,
+                   data = unique_subset_data)
+
+
+
+# Fit lasso model
+lasso_model <- glmnet(X2, y, family = "binomial", alpha = 1)  # alpha = 1 for lasso
+
+# Cross-validation: cv.glmnet() runs k-fold cross-validation (10 folds by default) over the lambda path to choose lambda. A smaller lambda means a weaker penalty, so more variables are kept.
+cv_lasso_model <- cv.glmnet(X2, y, family = "binomial", alpha = 1)
+
+
Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
+multinomial or binomial class has fewer than 8 observations; dangerous ground
+
+
+
Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
+multinomial or binomial class has fewer than 8 observations; dangerous ground
+
+
# Plot cross-validated performance across lambda values
+plot(cv_lasso_model)
+
+
+
+

+
+
+
+
# Best lambda (lambda.min, the value minimizing cross-validated deviance): 0.048
+best_lambda <- cv_lasso_model$lambda.min
+print(paste("Best lambda:", best_lambda))
+
+
[1] "Best lambda: 0.048114238291791"
+
+
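# Refit at the optimal lambda (the glmnet documentation recommends extracting results from the cv.glmnet object, e.g. coef(cv_lasso_model, s = "lambda.min"), rather than refitting with a single lambda)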
+final_lasso_model <- glmnet(X2, y, family = "binomial", alpha = 1, lambda = best_lambda)
+
+# Coefficients at this lambda; "." marks variables shrunk to exactly zero (excluded)
+coef(final_lasso_model) 
+
+
29 x 1 sparse Matrix of class "dgCMatrix"
+                                              s0
+(Intercept)                            -2.762725
+age_at_diag                             .       
+final_receptor_groupTNBC                .       
+final_receptor_groupHR+ HER2-           .       
+final_receptor_groupHR+ HER2+           .       
+final_receptor_groupHR- HER2+           .       
+demo_race_finalAsian                    .       
+demo_race_finalWhite                    .       
+final_tumor_gradeGrade 1                .       
+final_tumor_gradeGrade 2                .       
+final_overall_stageStage II             .       
+final_overall_stageStage III            .       
+final_t_stageT2                         .       
+final_t_stageT3                         .       
+final_t_stageT4                         1.189380
+final_n_stageN1                         .       
+final_n_stageN2                         1.573283
+final_n_stageN3                         .       
+histology_categoryDuctal                .       
+histology_categoryLobular               .       
+histology_categoryOther                 .       
+prtx_radiationRadiation                 .       
+prtx_chemoChemo                         .       
+prtx_endoEndocrine Therapy              .       
+prtx_bonemodBone Modifying Treatment    .       
+node_statusNode Positive                .       
+axillary_dissectionAxillary Dissection  .       
+diag_surgery_type_1Mastectomy           .       
+diag_neoadj_chemo_1Neoadjuvant Chemo    .       
+
+
+
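There are 28 candidate columns, so it can be easier to list only the selected variables than to scan the full coefficient printout. The sketch below assumes that coef() returns a one-column sparse matrix, as printed above. As a stand-in, it rebuilds a few of the printed rows (not the full model) with the Matrix package, which glmnet already loads.

```r
library(Matrix)

# Stand-in for coef(final_lasso_model): a few rows copied from the printout above
cf <- Matrix(c(-2.762725, 0, 0, 1.189380, 0, 1.573283),
             nrow = 6, ncol = 1, sparse = TRUE,
             dimnames = list(c("(Intercept)", "age_at_diag", "final_t_stageT3",
                               "final_t_stageT4", "final_n_stageN1", "final_n_stageN2"),
                             "s0"))

# Convert to an ordinary matrix and keep only the non-zero (selected) rows
cf_dense <- as.matrix(cf)
selected <- cf_dense[cf_dense[, 1] != 0, , drop = FALSE]
selected
```

On the real model, the same two lines apply to `as.matrix(coef(final_lasso_model))`.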

The only variables retained (with non-zero coefficients) in the LASSO for ctDNA positivity are T stage and N stage. These multi-level variables are somewhat harder to interpret in a LASSO because each level is a separate dummy column. Even so, the higher categories (T4 and N2) are associated with ctDNA positivity. The selected lambda (lambda.min) for this model is small, about 0.048. Several of these variables are related to one another. Final overall stage is built from T stage and N stage, and node status is built from N stage. I'll try a few other LASSOs to see whether removing one variable from each of these collinear groups changes the results.
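The following base-R sketch shows why node status and N stage are redundant. It uses hypothetical toy data (not study data) and assumes that node status is "Node Positive" exactly when N stage is N1 or higher:

```r
# Hypothetical toy data: node status derived from N stage, as in the cohort
n_stage <- factor(c("N0", "N1", "N2", "N3", "N0", "N1", "N0", "N2"),
                  levels = c("N0", "N1", "N2", "N3"))
node_status <- factor(ifelse(n_stage == "N0", "Node Negative", "Node Positive"),
                      levels = c("Node Negative", "Node Positive"))
toy <- data.frame(n_stage, node_status)

# Same convention as X2: no intercept column, so n_stage gets all 4 dummies
X_toy <- model.matrix(~ -1 + n_stage + node_status, data = toy)

# "node_statusNode Positive" equals n_stageN1 + n_stageN2 + n_stageN3,
# so one of the 5 columns carries no new information
ncol(X_toy)      # 5 columns
qr(X_toy)$rank   # rank 4: the columns are linearly dependent
```

LASSO can still be fit when columns are linearly dependent. However, which of the redundant columns receives the weight is then somewhat arbitrary. This is one reason to drop one variable from each collinear group.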

+
+
library(glmnet)
+
+set.seed(123) #to ensure consistency of results 
+
+# Prepare the response variable
+y <- unique_subset_data$ctDNA_ever
+
+# X3 would have the same 4 missing observations, but unique_subset_data was already imputed above, so no further imputation is needed
+
+
+# node_status removed, since it is derived from final_n_stage
+X3 <- model.matrix(ctDNA_ever ~ -1 + age_at_diag + final_receptor_group + demo_race_final +
+                     final_tumor_grade + final_overall_stage +
+                     final_t_stage + final_n_stage +
+                     histology_category + prtx_radiation +
+                     prtx_chemo + prtx_endo + prtx_bonemod +
+                     axillary_dissection +
+                     diag_surgery_type_1 + diag_neoadj_chemo_1,
+                   data = unique_subset_data)
+
+# Fit lasso model
+lasso_model <- glmnet(X3, y, family = "binomial", alpha = 1)  # alpha = 1 for lasso
+
+# Cross-validation: cv.glmnet() runs k-fold cross-validation (10 folds by default) over the lambda path to choose lambda. A smaller lambda means a weaker penalty, so more variables are kept.
+cv_lasso_model <- cv.glmnet(X3, y, family = "binomial", alpha = 1)
+
+
Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
+multinomial or binomial class has fewer than 8 observations; dangerous ground
+Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
+multinomial or binomial class has fewer than 8 observations; dangerous ground
+Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
+multinomial or binomial class has fewer than 8 observations; dangerous ground
+Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
+multinomial or binomial class has fewer than 8 observations; dangerous ground
+
+
# Plot cross-validated performance across lambda values
+plot(cv_lasso_model)
+
+
+
+

+
+
+
+
# Best lambda (lambda.min): 0.048, the same as in the first model
+best_lambda <- cv_lasso_model$lambda.min
+print(paste("Best lambda:", best_lambda))
+
+
[1] "Best lambda: 0.048114238291791"
+
+
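# Refit at the optimal lambda (the glmnet documentation recommends extracting results from the cv.glmnet object, e.g. coef(cv_lasso_model, s = "lambda.min"), rather than refitting with a single lambda)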
+paired_down_lasso <- glmnet(X3, y, family = "binomial", alpha = 1, lambda = best_lambda)
+
+# Coefficients at this lambda; "." marks variables shrunk to exactly zero (excluded)
+coef(paired_down_lasso) 
+
+
28 x 1 sparse Matrix of class "dgCMatrix"
+                                              s0
+(Intercept)                            -2.762725
+age_at_diag                             .       
+final_receptor_groupTNBC                .       
+final_receptor_groupHR+ HER2-           .       
+final_receptor_groupHR+ HER2+           .       
+final_receptor_groupHR- HER2+           .       
+demo_race_finalAsian                    .       
+demo_race_finalWhite                    .       
+final_tumor_gradeGrade 1                .       
+final_tumor_gradeGrade 2                .       
+final_overall_stageStage II             .       
+final_overall_stageStage III            .       
+final_t_stageT2                         .       
+final_t_stageT3                         .       
+final_t_stageT4                         1.189380
+final_n_stageN1                         .       
+final_n_stageN2                         1.573283
+final_n_stageN3                         .       
+histology_categoryDuctal                .       
+histology_categoryLobular               .       
+histology_categoryOther                 .       
+prtx_radiationRadiation                 .       
+prtx_chemoChemo                         .       
+prtx_endoEndocrine Therapy              .       
+prtx_bonemodBone Modifying Treatment    .       
+axillary_dissectionAxillary Dissection  .       
+diag_surgery_type_1Mastectomy           .       
+diag_neoadj_chemo_1Neoadjuvant Chemo    .       
+
+
+

In the pared-down LASSO model for ctDNA positivity (node status removed), T stage and N stage remain the only variables with non-zero coefficients. The N2 coefficient (1.57) is larger than the T4 coefficient (1.19). The selected lambda (0.048) and all coefficients are identical to those of the first model. This is consistent with node status having already been shrunk to zero there.

+

It is, however, difficult to model ctDNA positivity with any of these approaches, because only 9 of the 109 individuals in this cohort (about 8%) had a positive result. With so few events, the selected predictors are unstable. The cross-validation warnings above ("one multinomial or binomial class has fewer than 8 observations") likely reflect how few positives some training folds contain.
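The sketch below puts the small number of events in perspective. It computes the proportion of positives and the number of events per candidate predictor column, which is far below the commonly cited rule of thumb of roughly 10 events per candidate predictor. It also shows one way to build stratified fold assignments, which `cv.glmnet()` accepts through its `foldid` argument. `make_stratified_folds()` is a hypothetical helper written for this illustration. `y_demo` is a made-up outcome vector with the same counts as the cohort.

```r
# Counts stated in the text above
n_total <- 109
n_pos   <- 9
n_cols  <- 28   # predictor columns in X2 (29 printed coefficients minus the intercept)

n_pos / n_total   # proportion ctDNA-positive, about 0.08
n_pos / n_cols    # events per candidate column, about 0.32

# Hypothetical helper (not part of glmnet): assign fold IDs separately within
# each outcome class so positives are spread as evenly as possible across folds
make_stratified_folds <- function(y, nfolds = 10) {
  foldid <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    labels <- rep_len(seq_len(nfolds), length(idx))  # 1, 2, ..., nfolds, 1, 2, ...
    foldid[idx] <- labels[sample.int(length(idx))]   # shuffle within the class
  }
  foldid
}

# Illustration with a made-up outcome vector having the same counts (9 of 109 positive)
set.seed(123)
y_demo <- c(rep(1, n_pos), rep(0, n_total - n_pos))
folds  <- make_stratified_folds(y_demo, nfolds = 10)

pos_per_fold <- tapply(y_demo, folds, sum)
pos_per_fold                 # each held-out fold contains at most 1 positive
n_pos - max(pos_per_fold)    # so every training set keeps at least 8 positives
```

You could pass such a vector to the fitting call, e.g. `cv.glmnet(X2, y, family = "binomial", alpha = 1, foldid = make_stratified_folds(y))`. That should keep every training set at 8 or more positives and so avoid the "fewer than 8 observations" warning. It does not add information, though, and 9 events remains very little for 28 candidate columns.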

+

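The intercept (-2.76) is the log-odds of ctDNA positivity when all predictor variables are zero. Every coefficient other than T4 and N2 was shrunk to zero, so the intercept is effectively the log-odds for a patient who is neither T4 nor N2, a predicted probability of about 6%. Each coefficient is the change in log-odds associated with that category, holding all other variables constant. Exponentiating a coefficient gives the corresponding multiplicative change in the odds (the odds ratio).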
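As a worked example, the following sketch converts the printed ctDNA coefficients to the probability and odds-ratio scales. The numbers are copied from the `coef(final_lasso_model)` output above.

```r
# Values copied from the printed coef(final_lasso_model) output above
intercept <- -2.762725
b_T4      <- 1.189380   # final_t_stageT4
b_N2      <- 1.573283   # final_n_stageN2

# Intercept on the probability scale (plogis is the inverse logit)
p_ref <- plogis(intercept)              # about 0.059

# exp(coefficient) is the odds ratio for that category
or_T4 <- exp(b_T4)                      # about 3.3
or_N2 <- exp(b_N2)                      # about 4.8

# Predicted probability for a patient who is both T4 and N2:
# the three terms nearly cancel, giving roughly 0.50
p_T4_N2 <- plogis(intercept + b_T4 + b_N2)

round(c(p_ref = p_ref, or_T4 = or_T4, or_N2 = or_N2, p_T4_N2 = p_T4_N2), 3)
```

Two caveats apply. LASSO shrinks coefficients toward zero, so these odds ratios are likely attenuated (biased toward 1). glmnet also provides no standard errors or p-values for them. With only 9 positive participants, they are best read as descriptive.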

+

As a proof of principle that LASSO can be applied to this dataset, and may produce more stable results when events are less rare, we also model DTC predictors. DTC positivity was more frequent in this cohort, so we expect the modeling approach to work better.

+
+
set.seed(123) 
+
+#### DTC predictions. 
+
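+subset_data <- subset_data[!duplicated(subset_data$participant_id), ]
+dtc_unique_subset_data <- merge(unique_subset_data, subset_data[, c("participant_id", "dtc_ever")], by = "participant_id", all.x = TRUE)
+# merge() sorts its result by participant_id; restore the row order of unique_subset_data
+# so that dtc_ever lines up row-for-row with X2, which was built from unique_subset_data
+dtc_unique_subset_data <- dtc_unique_subset_data[match(unique_subset_data$participant_id, dtc_unique_subset_data$participant_id), ]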
+
+nrow(dtc_unique_subset_data)  # Should still be 109
+
+
[1] 109
+
+
table(dtc_unique_subset_data$dtc_ever, useNA = "ifany")  # Check for NA values
+
+

+ 0  1 
+70 39 
+
+
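By default, `merge()` sorts its result by the `by` column (`sort = TRUE`). A merged table can therefore come back in a different row order than the data used to build X2, while the outcome vector must line up row-for-row with X2. The small sketch below uses hypothetical toy data (not study data). It shows the reordering and how `match()` restores the original order.

```r
# Hypothetical toy tables (not study data); base is deliberately not sorted by ID
base <- data.frame(participant_id = c("P3", "P1", "P2"), age = c(50, 40, 60),
                   stringsAsFactors = FALSE)
dtc  <- data.frame(participant_id = c("P1", "P2", "P3"), dtc_ever = c(0, 1, 1),
                   stringsAsFactors = FALSE)

merged <- merge(base, dtc, by = "participant_id", all.x = TRUE)
merged$participant_id   # "P1" "P2" "P3": sorted by the key, not in base's order

# Reorder so row i of merged matches row i of base (and of any matrix built from base)
merged <- merged[match(base$participant_id, merged$participant_id), ]
merged$dtc_ever         # 1 0 1, now aligned with base
```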
# Run the LASSO for DTC status. This may work better because there are more DTC-positive results (39 vs. 9 ctDNA-positive)
+y1 <- dtc_unique_subset_data$dtc_ever
+X2 # reuse X2, which has the same predictors of interest (printing it shows the full 109 x 28 matrix)
+
+
    age_at_diag final_receptor_groupTNBC final_receptor_groupHR+ HER2-
+1      55.89870                        0                             1
+2      49.25667                        0                             1
+3      52.87611                        0                             1
+4      29.93840                        1                             0
+5      37.00753                        1                             0
+6      48.98563                        0                             1
+7      63.80835                        0                             0
+8      40.89802                        0                             0
+9      43.59754                        1                             0
+10     38.57632                        1                             0
+11     41.77687                        1                             0
+12     45.68925                        0                             1
+13     59.94524                        0                             1
+14     59.43600                        0                             0
+15     52.14511                        1                             0
+16     42.93771                        1                             0
+17     64.69541                        0                             1
+18     55.14031                        1                             0
+19     41.26762                        1                             0
+20     39.52361                        1                             0
+21     57.76044                        1                             0
+22     44.42984                        0                             1
+23     51.34565                        0                             1
+24     42.27789                        1                             0
+25     57.05133                        1                             0
+26     57.62628                        1                             0
+27     54.86927                        1                             0
+28     44.18891                        1                             0
+29     63.62491                        0                             1
+30     36.00548                        1                             0
+31     55.57837                        1                             0
+32     30.71595                        0                             0
+33     41.28953                        1                             0
+34     59.38946                        0                             1
+35     48.79945                        0                             1
+36     59.15400                        0                             0
+37     48.97194                        1                             0
+38     59.39767                        0                             1
+39     39.67967                        1                             0
+40     67.68515                        0                             1
+41     41.84531                        1                             0
+42     48.16975                        1                             0
+43     58.07529                        0                             1
+44     62.49966                        0                             1
+45     46.64476                        0                             0
+46     47.34565                        0                             1
+47     52.09856                        0                             1
+48     36.58042                        0                             0
+49     58.26146                        0                             1
+50     61.76318                        0                             1
+51     61.73580                        1                             0
+52     39.40862                        1                             0
+53     55.30459                        1                             0
+54     53.10335                        0                             1
+55     43.30459                        1                             0
+56     48.46270                        0                             1
+57     44.07666                        1                             0
+58     52.55305                        1                             0
+59     56.45996                        1                             0
+60     67.72621                        1                             0
+61     39.59206                        1                             0
+62     51.82752                        0                             1
+63     58.28611                        0                             1
+64     46.93498                        1                             0
+65     31.17591                        1                             0
+66     55.96441                        1                             0
+67     46.38741                        0                             1
+68     46.33812                        0                             1
+69     40.62971                        0                             1
+70     37.67556                        0                             1
+71     32.35318                        1                             0
+72     48.75291                        0                             1
+73     56.22177                        0                             1
+74     39.41136                        0                             1
+75     49.76591                        1                             0
+76     43.22245                        0                             0
+77     36.01095                        0                             1
+78     41.30322                        1                             0
+79     59.57016                        0                             1
+80     39.65503                        0                             1
+81     54.94593                        1                             0
+82     43.50992                        0                             0
+83     48.80767                        1                             0
+84     62.10541                        0                             1
+85     63.35934                        0                             0
+86     57.31417                        0                             1
+87     59.74264                        0                             1
+88     66.92676                        1                             0
+89     36.30938                        1                             0
+90     34.83641                        0                             1
+91     55.12115                        0                             1
+92     52.07118                        0                             1
+93     27.33744                        0                             1
+94     64.41342                        0                             1
+95     56.09035                        0                             1
+96     47.90691                        0                             1
+97     51.38125                        0                             1
+98     41.71663                        1                             0
+99     48.47639                        0                             1
+100    40.52567                        1                             0
+101    60.39151                        0                             1
+102    52.51198                        0                             0
+103    60.87064                        0                             1
+104    58.61465                        0                             1
+105    38.60370                        0                             1
+106    68.93634                        0                             1
+107    37.84531                        0                             0
+108    51.43874                        1                             0
+109    52.68720                        0                             1
+    final_receptor_groupHR+ HER2+ final_receptor_groupHR- HER2+
+1                               0                             0
+2                               0                             0
+3                               0                             0
+4                               0                             0
+5                               0                             0
+6                               0                             0
+7                               0                             1
+8                               1                             0
+9                               0                             0
+10                              0                             0
+11                              0                             0
+12                              0                             0
+13                              0                             0
+14                              1                             0
+15                              0                             0
+16                              0                             0
+17                              0                             0
+18                              0                             0
+19                              0                             0
+20                              0                             0
+21                              0                             0
+22                              0                             0
+23                              0                             0
+24                              0                             0
+25                              0                             0
+26                              0                             0
+27                              0                             0
+28                              0                             0
+29                              0                             0
+30                              0                             0
+31                              0                             0
+32                              1                             0
+33                              0                             0
+34                              0                             0
+35                              0                             0
+36                              1                             0
+37                              0                             0
+38                              0                             0
+39                              0                             0
+40                              0                             0
+41                              0                             0
+42                              0                             0
+43                              0                             0
+44                              0                             0
+45                              1                             0
+46                              0                             0
+47                              0                             0
+48                              1                             0
+49                              0                             0
+50                              0                             0
+51                              0                             0
+52                              0                             0
+53                              0                             0
+54                              0                             0
+55                              0                             0
+56                              0                             0
+57                              0                             0
+58                              0                             0
+59                              0                             0
+60                              0                             0
+61                              0                             0
+62                              0                             0
+63                              0                             0
+64                              0                             0
+65                              0                             0
+66                              0                             0
+67                              0                             0
+68                              0                             0
+69                              0                             0
+70                              0                             0
+71                              0                             0
+72                              0                             0
+73                              0                             0
+74                              0                             0
+75                              0                             0
+76                              1                             0
+77                              0                             0
+78                              0                             0
+79                              0                             0
+80                              0                             0
+81                              0                             0
+82                              0                             1
+83                              0                             0
+84                              0                             0
+85                              1                             0
+86                              0                             0
+87                              0                             0
+88                              0                             0
+89                              0                             0
+90                              0                             0
+91                              0                             0
+92                              0                             0
+93                              0                             0
+94                              0                             0
+95                              0                             0
+96                              0                             0
+97                              0                             0
+98                              0                             0
+99                              0                             0
+100                             0                             0
+101                             0                             0
+102                             0                             1
+103                             0                             0
+104                             0                             0
+105                             0                             0
+106                             0                             0
+107                             0                             1
+108                             0                             0
+109                             0                             0
+    demo_race_finalAsian demo_race_finalWhite final_tumor_gradeGrade 1
+1                      0                    1                        0
+2                      0                    1                        0
+3                      0                    1                        0
+4                      0                    1                        0
+5                      0                    1                        0
+6                      0                    1                        0
+7                      0                    0                        0
+8                      0                    1                        0
+9                      0                    1                        0
+10                     0                    1                        0
+11                     0                    1                        0
+12                     0                    1                        0
+13                     0                    1                        0
+14                     0                    1                        0
+15                     0                    1                        0
+16                     0                    1                        0
+17                     0                    1                        0
+18                     0                    1                        0
+19                     0                    0                        0
+20                     0                    1                        0
+21                     0                    1                        0
+22                     0                    1                        0
+23                     0                    1                        0
+24                     0                    1                        0
+25                     0                    1                        0
+26                     0                    0                        0
+27                     0                    1                        0
+28                     0                    1                        0
+29                     0                    1                        1
+30                     0                    1                        0
+31                     0                    1                        0
+32                     0                    1                        0
+33                     0                    0                        0
+34                     0                    1                        1
+35                     0                    1                        1
+36                     0                    1                        0
+37                     0                    1                        0
+38                     0                    1                        1
+39                     0                    1                        0
+40                     0                    1                        0
+41                     0                    1                        0
+42                     0                    1                        0
+43                     0                    1                        0
+44                     0                    1                        0
+45                     0                    1                        0
+46                     0                    1                        1
+47                     0                    1                        0
+48                     0                    0                        1
+49                     0                    1                        1
+50                     0                    1                        1
+51                     0                    1                        0
+52                     0                    1                        0
+53                     0                    1                        0
+54                     0                    1                        1
+55                     0                    1                        0
+56                     0                    1                        1
+57                     0                    0                        0
+58                     0                    1                        0
+59                     0                    1                        0
+60                     0                    1                        0
+61                     0                    1                        0
+62                     0                    1                        1
+63                     0                    1                        0
+64                     0                    1                        0
+65                     0                    0                        0
+66                     0                    1                        0
+67                     0                    1                        1
+68                     0                    1                        1
+69                     0                    1                        0
+70                     0                    1                        0
+71                     0                    1                        0
+72                     0                    1                        0
+73                     0                    1                        1
+74                     0                    1                        0
+75                     0                    1                        0
+76                     0                    1                        0
+77                     0                    1                        0
+78                     0                    1                        0
+79                     0                    1                        0
+80                     0                    1                        0
+81                     0                    0                        0
+82                     0                    1                        0
+83                     0                    1                        0
+84                     0                    1                        0
+85                     0                    1                        0
+86                     0                    1                        0
+87                     0                    1                        1
+88                     0                    1                        0
+89                     0                    1                        0
+90                     0                    1                        1
+91                     0                    1                        0
+92                     0                    1                        1
+93                     1                    0                        1
+94                     0                    1                        1
+95                     0                    1                        0
+96                     0                    1                        1
+97                     0                    1                        0
+98                     0                    0                        0
+99                     0                    1                        1
+100                    0                    1                        0
+101                    0                    1                        1
+102                    0                    1                        0
+103                    0                    1                        1
+104                    0                    1                        0
+105                    0                    1                        0
+106                    0                    1                        0
+107                    0                    1                        0
+108                    0                    1                        0
+109                    0                    1                        0
+    final_tumor_gradeGrade 2 final_overall_stageStage II
+1                          1                           0
+2                          1                           0
+3                          0                           0
+4                          0                           1
+5                          0                           1
+6                          0                           0
+7                          0                           0
+8                          0                           0
+9                          0                           1
+10                         0                           0
+11                         0                           1
+12                         0                           1
+13                         0                           1
+14                         1                           1
+15                         0                           0
+16                         0                           1
+17                         0                           0
+18                         0                           1
+19                         0                           1
+20                         0                           0
+21                         0                           1
+22                         0                           0
+23                         0                           0
+24                         0                           1
+25                         0                           1
+26                         0                           1
+27                         0                           0
+28                         0                           1
+29                         0                           0
+30                         0                           1
+31                         0                           0
+32                         0                           1
+33                         0                           0
+34                         0                           1
+35                         0                           0
+36                         0                           1
+37                         0                           1
+38                         0                           0
+39                         0                           0
+40                         0                           0
+41                         0                           1
+42                         0                           0
+43                         0                           1
+44                         1                           0
+45                         1                           0
+46                         0                           0
+47                         0                           1
+48                         0                           0
+49                         0                           1
+50                         0                           0
+51                         0                           1
+52                         0                           1
+53                         0                           0
+54                         0                           1
+55                         0                           0
+56                         0                           0
+57                         0                           0
+58                         0                           1
+59                         0                           0
+60                         0                           1
+61                         0                           1
+62                         0                           0
+63                         0                           0
+64                         0                           1
+65                         0                           1
+66                         0                           1
+67                         0                           0
+68                         0                           1
+69                         0                           0
+70                         0                           1
+71                         0                           0
+72                         0                           1
+73                         0                           0
+74                         0                           0
+75                         0                           0
+76                         0                           0
+77                         0                           0
+78                         0                           0
+79                         0                           0
+80                         0                           0
+81                         0                           0
+82                         0                           1
+83                         0                           0
+84                         1                           0
+85                         0                           1
+86                         0                           0
+87                         0                           0
+88                         0                           0
+89                         0                           1
+90                         0                           1
+91                         0                           1
+92                         0                           0
+93                         0                           1
+94                         0                           0
+95                         0                           0
+96                         0                           0
+97                         0                           1
+98                         0                           1
+99                         0                           1
+100                        0                           0
+101                        0                           0
+102                        0                           0
+103                        0                           0
+104                        0                           0
+105                        0                           0
+106                        0                           1
+107                        0                           1
+108                        0                           0
+109                        0                           1
+    final_overall_stageStage III final_t_stageT2 final_t_stageT3
+1                              1               1               0
+2                              0               0               0
+3                              1               1               0
+4                              0               1               0
+5                              0               1               0
+6                              1               0               1
+7                              1               1               0
+8                              1               0               0
+9                              0               1               0
+10                             0               0               0
+11                             0               1               0
+12                             0               1               0
+13                             0               0               1
+14                             0               0               0
+15                             0               0               0
+16                             0               1               0
+17                             0               0               0
+18                             0               0               0
+19                             0               1               0
+20                             0               0               0
+21                             0               1               0
+22                             1               1               0
+23                             1               1               0
+24                             0               1               0
+25                             0               0               0
+26                             0               1               0
+27                             0               0               0
+28                             0               0               0
+29                             1               1               0
+30                             0               1               0
+31                             0               0               0
+32                             0               0               0
+33                             0               0               0
+34                             0               0               1
+35                             1               0               0
+36                             0               1               0
+37                             0               1               0
+38                             1               0               0
+39                             0               0               0
+40                             0               0               0
+41                             0               1               0
+42                             1               0               1
+43                             0               1               0
+44                             0               0               0
+45                             0               0               0
+46                             1               0               1
+47                             0               0               0
+48                             1               0               0
+49                             0               1               0
+50                             0               0               0
+51                             0               1               0
+52                             0               1               0
+53                             0               0               0
+54                             0               0               0
+55                             0               0               0
+56                             1               0               1
+57                             0               0               0
+58                             0               0               0
+59                             0               0               0
+60                             0               1               0
+61                             0               1               0
+62                             1               0               1
+63                             0               0               0
+64                             0               1               0
+65                             0               1               0
+66                             0               1               0
+67                             1               0               0
+68                             0               0               0
+69                             0               0               0
+70                             0               1               0
+71                             0               0               0
+72                             0               1               0
+73                             0               0               0
+74                             1               0               1
+75                             1               0               1
+76                             0               0               0
+77                             1               1               0
+78                             0               0               0
+79                             0               0               0
+80                             0               0               0
+81                             0               0               0
+82                             0               1               0
+83                             1               1               0
+84                             0               0               0
+85                             0               1               0
+86                             0               0               0
+87                             1               0               0
+88                             0               0               0
+89                             0               1               0
+90                             0               1               0
+91                             0               1               0
+92                             1               0               1
+93                             0               1               0
+94                             1               1               0
+95                             1               0               0
+96                             0               0               0
+97                             0               1               0
+98                             0               0               0
+99                             0               0               0
+100                            0               0               0
+101                            0               0               0
+102                            1               0               1
+103                            1               0               1
+104                            0               0               0
+105                            0               0               0
+106                            0               1               0
+107                            0               1               0
+108                            0               0               0
+109                            0               1               0
+    final_t_stageT4 final_n_stageN1 final_n_stageN2 final_n_stageN3
+1                 0               0               0               1
+2                 0               0               0               0
+3                 0               0               0               1
+4                 0               0               0               0
+5                 0               0               0               0
+6                 0               0               1               0
+7                 0               0               1               0
+8                 0               0               1               0
+9                 0               0               0               0
+10                0               0               0               0
+11                0               0               0               0
+12                0               1               0               0
+13                0               0               0               0
+14                0               1               0               0
+15                0               0               0               0
+16                0               1               0               0
+17                0               0               0               0
+18                0               1               0               0
+19                0               0               0               0
+20                0               0               0               0
+21                0               0               0               0
+22                0               0               0               1
+23                0               0               0               1
+24                0               1               0               0
+25                0               1               0               0
+26                0               0               0               0
+27                0               0               0               0
+28                0               0               0               0
+29                0               0               1               0
+30                0               0               0               0
+31                0               0               0               0
+32                0               1               0               0
+33                0               0               0               0
+34                0               0               0               0
+35                1               0               1               0
+36                0               0               0               0
+37                0               0               0               0
+38                0               0               1               0
+39                0               0               0               0
+40                0               0               0               0
+41                0               1               0               0
+42                0               0               1               0
+43                0               0               0               0
+44                0               0               0               0
+45                0               0               0               0
+46                0               0               1               0
+47                0               1               0               0
+48                0               0               1               0
+49                0               1               0               0
+50                0               0               0               0
+51                0               1               0               0
+52                0               1               0               0
+53                0               0               0               0
+54                0               1               0               0
+55                0               0               0               0
+56                0               1               0               0
+57                0               0               0               0
+58                0               1               0               0
+59                0               1               0               0
+60                0               1               0               0
+61                0               0               0               0
+62                0               1               0               0
+63                0               0               0               0
+64                0               1               0               0
+65                0               1               0               0
+66                0               0               0               0
+67                0               0               0               1
+68                0               1               0               0
+69                0               1               0               0
+70                0               1               0               0
+71                0               0               0               0
+72                0               1               0               0
+73                0               1               0               0
+74                0               0               0               1
+75                0               1               0               0
+76                0               1               0               0
+77                0               0               0               1
+78                0               0               0               0
+79                0               0               0               0
+80                0               1               0               0
+81                0               0               0               0
+82                0               1               0               0
+83                0               1               0               0
+84                0               1               0               0
+85                0               0               0               0
+86                0               0               0               0
+87                0               0               1               0
+88                0               0               0               0
+89                0               1               0               0
+90                0               1               0               0
+91                0               1               0               0
+92                0               0               1               0
+93                0               1               0               0
+94                0               0               1               0
+95                0               0               1               0
+96                0               0               0               0
+97                0               1               0               0
+98                0               1               0               0
+99                0               1               0               0
+100               0               0               0               0
+101               0               1               0               0
+102               0               1               0               0
+103               0               1               0               0
+104               0               0               0               0
+105               0               0               0               0
+106               0               1               0               0
+107               0               1               0               0
+108               0               0               0               0
+109               0               0               0               0
+    histology_categoryDuctal histology_categoryLobular histology_categoryOther
+1                          0                         0                       0
+2                          1                         0                       0
+3                          1                         0                       0
+4                          1                         0                       0
+5                          1                         0                       0
+6                          0                         1                       0
+7                          1                         0                       0
+8                          1                         0                       0
+9                          1                         0                       0
+10                         1                         0                       0
+11                         1                         0                       0
+12                         1                         0                       0
+13                         0                         1                       0
+14                         1                         0                       0
+15                         1                         0                       0
+16                         1                         0                       0
+17                         1                         0                       0
+18                         1                         0                       0
+19                         1                         0                       0
+20                         1                         0                       0
+21                         1                         0                       0
+22                         1                         0                       0
+23                         1                         0                       0
+24                         1                         0                       0
+25                         1                         0                       0
+26                         1                         0                       0
+27                         1                         0                       0
+28                         1                         0                       0
+29                         1                         0                       0
+30                         1                         0                       0
+31                         1                         0                       0
+32                         1                         0                       0
+33                         1                         0                       0
+34                         0                         1                       0
+35                         1                         0                       0
+36                         1                         0                       0
+37                         1                         0                       0
+38                         1                         0                       0
+39                         1                         0                       0
+40                         1                         0                       0
+41                         1                         0                       0
+42                         1                         0                       0
+43                         1                         0                       0
+44                         1                         0                       0
+45                         0                         0                       1
+46                         0                         1                       0
+47                         1                         0                       0
+48                         1                         0                       0
+49                         0                         1                       0
+50                         0                         1                       0
+51                         1                         0                       0
+52                         1                         0                       0
+53                         1                         0                       0
+54                         0                         1                       0
+55                         1                         0                       0
+56                         0                         1                       0
+57                         1                         0                       0
+58                         1                         0                       0
+59                         1                         0                       0
+60                         1                         0                       0
+61                         0                         0                       1
+62                         0                         1                       0
+63                         1                         0                       0
+64                         1                         0                       0
+65                         1                         0                       0
+66                         1                         0                       0
+67                         0                         1                       0
+68                         1                         0                       0
+69                         1                         0                       0
+70                         1                         0                       0
+71                         1                         0                       0
+72                         1                         0                       0
+73                         0                         0                       0
+74                         1                         0                       0
+75                         1                         0                       0
+76                         0                         0                       0
+77                         0                         0                       0
+78                         1                         0                       0
+79                         1                         0                       0
+80                         0                         0                       0
+81                         1                         0                       0
+82                         1                         0                       0
+83                         1                         0                       0
+84                         1                         0                       0
+85                         1                         0                       0
+86                         1                         0                       0
+87                         1                         0                       0
+88                         1                         0                       0
+89                         1                         0                       0
+90                         1                         0                       0
+91                         0                         0                       0
+92                         0                         1                       0
+93                         1                         0                       0
+94                         0                         1                       0
+95                         1                         0                       0
+96                         1                         0                       0
+97                         0                         0                       0
+98                         1                         0                       0
+99                         0                         0                       0
+100                        1                         0                       0
+101                        0                         1                       0
+102                        1                         0                       0
+103                        0                         1                       0
+104                        1                         0                       0
+105                        1                         0                       0
+106                        1                         0                       0
+107                        1                         0                       0
+108                        1                         0                       0
+109                        0                         0                       0
+    prtx_radiationRadiation prtx_chemoChemo prtx_endoEndocrine Therapy
+1                         1               1                          1
+2                         0               1                          1
+3                         0               1                          1
+4                         0               1                          0
+5                         0               1                          1
+6                         1               1                          1
+7                         1               1                          0
+8                         1               1                          1
+9                         1               1                          0
+10                        0               1                          0
+11                        1               1                          0
+12                        1               1                          1
+13                        1               1                          1
+14                        1               0                          1
+15                        0               1                          0
+16                        0               1                          0
+17                        1               1                          1
+18                        1               1                          0
+19                        0               1                          0
+20                        1               1                          0
+21                        1               1                          0
+22                        1               1                          1
+23                        1               1                          1
+24                        1               1                          0
+25                        0               1                          0
+26                        1               1                          0
+27                        0               1                          0
+28                        0               1                          0
+29                        1               1                          1
+30                        0               1                          0
+31                        1               1                          0
+32                        1               1                          1
+33                        0               1                          0
+34                        1               1                          1
+35                        1               0                          1
+36                        0               1                          1
+37                        0               1                          0
+38                        1               1                          1
+39                        0               1                          0
+40                        0               1                          1
+41                        1               1                          0
+42                        1               1                          0
+43                        1               1                          1
+44                        1               1                          1
+45                        1               1                          1
+46                        1               1                          1
+47                        0               1                          1
+48                        1               1                          1
+49                        1               1                          1
+50                        0               1                          1
+51                        1               1                          1
+52                        1               1                          0
+53                        1               1                          0
+54                        0               1                          1
+55                        1               1                          0
+56                        1               1                          1
+57                        0               1                          0
+58                        1               1                          0
+59                        1               1                          0
+60                        0               1                          0
+61                        0               1                          0
+62                        1               1                          1
+63                        1               1                          1
+64                        1               1                          0
+65                        1               1                          0
+66                        1               1                          0
+67                        1               1                          1
+68                        1               1                          1
+69                        0               1                          1
+70                        1               1                          1
+71                        0               1                          0
+72                        1               1                          1
+73                        1               1                          1
+74                        1               1                          1
+75                        1               1                          0
+76                        0               1                          1
+77                        1               1                          1
+78                        1               1                          0
+79                        1               1                          1
+80                        1               1                          1
+81                        1               1                          0
+82                        1               1                          0
+83                        1               1                          0
+84                        1               0                          1
+85                        1               1                          1
+86                        1               1                          1
+87                        1               1                          1
+88                        1               1                          0
+89                        1               1                          0
+90                        1               1                          1
+91                        1               1                          1
+92                        1               1                          1
+93                        1               1                          1
+94                        1               1                          1
+95                        1               1                          1
+96                        0               1                          1
+97                        1               1                          1
+98                        1               1                          0
+99                        0               1                          1
+100                       0               1                          0
+101                       0               1                          1
+102                       1               1                          0
+103                       1               1                          1
+104                       1               1                          1
+105                       0               1                          1
+106                       0               1                          1
+107                       1               1                          0
+108                       0               1                          0
+109                       0               1                          1
+    prtx_bonemodBone Modifying Treatment node_statusNode Positive
+1                                      0                        1
+2                                      0                        0
+3                                      0                        1
+4                                      0                        0
+5                                      0                        0
+6                                      1                        1
+7                                      0                        1
+8                                      0                        1
+9                                      0                        0
+10                                     0                        0
+11                                     0                        0
+12                                     0                        1
+13                                     0                        0
+14                                     0                        1
+15                                     0                        0
+16                                     0                        1
+17                                     0                        0
+18                                     0                        1
+19                                     0                        0
+20                                     0                        0
+21                                     0                        0
+22                                     1                        1
+23                                     1                        1
+24                                     0                        1
+25                                     0                        1
+26                                     0                        0
+27                                     0                        0
+28                                     0                        0
+29                                     0                        1
+30                                     0                        0
+31                                     0                        0
+32                                     0                        1
+33                                     0                        0
+34                                     1                        0
+35                                     0                        1
+36                                     1                        0
+37                                     0                        0
+38                                     1                        1
+39                                     0                        0
+40                                     1                        0
+41                                     0                        1
+42                                     0                        1
+43                                     0                        0
+44                                     1                        0
+45                                     1                        0
+46                                     1                        1
+47                                     0                        1
+48                                     1                        1
+49                                     0                        1
+50                                     1                        0
+51                                     1                        1
+52                                     0                        1
+53                                     0                        0
+54                                     0                        1
+55                                     0                        0
+56                                     1                        1
+57                                     1                        0
+58                                     0                        1
+59                                     0                        1
+60                                     0                        1
+61                                     0                        0
+62                                     1                        1
+63                                     0                        0
+64                                     0                        1
+65                                     0                        1
+66                                     0                        0
+67                                     1                        1
+68                                     0                        1
+69                                     1                        1
+70                                     1                        1
+71                                     0                        0
+72                                     0                        1
+73                                     0                        1
+74                                     1                        1
+75                                     1                        1
+76                                     1                        1
+77                                     1                        1
+78                                     0                        0
+79                                     1                        0
+80                                     1                        1
+81                                     0                        0
+82                                     0                        1
+83                                     0                        1
+84                                     1                        1
+85                                     1                        0
+86                                     1                        0
+87                                     1                        1
+88                                     0                        0
+89                                     0                        1
+90                                     1                        1
+91                                     1                        1
+92                                     1                        1
+93                                     1                        1
+94                                     1                        1
+95                                     1                        1
+96                                     0                        0
+97                                     1                        1
+98                                     0                        1
+99                                     0                        1
+100                                    0                        0
+101                                    1                        1
+102                                    0                        1
+103                                    1                        1
+104                                    0                        0
+105                                    1                        0
+106                                    0                        1
+107                                    0                        1
+108                                    0                        0
+109                                    0                        0
+    axillary_dissectionAxillary Dissection diag_surgery_type_1Mastectomy
+1                                        1                             0
+2                                        1                             1
+3                                        1                             1
+4                                        0                             1
+5                                        0                             1
+6                                        1                             1
+7                                        1                             0
+8                                        1                             1
+9                                        0                             0
+10                                       0                             0
+11                                       0                             0
+12                                       0                             0
+13                                       0                             0
+14                                       1                             0
+15                                       0                             1
+16                                       1                             1
+17                                       0                             0
+18                                       0                             0
+19                                       0                             1
+20                                       0                             0
+21                                       0                             0
+22                                       1                             1
+23                                       1                             0
+24                                       1                             1
+25                                       1                             1
+26                                       0                             0
+27                                       1                             0
+28                                       0                             1
+29                                       1                             0
+30                                       0                             1
+31                                       0                             0
+32                                       0                             0
+33                                       0                             1
+34                                       1                             1
+35                                       1                             1
+36                                       0                             1
+37                                       0                             1
+38                                       0                             0
+39                                       0                             0
+40                                       0                             1
+41                                       1                             1
+42                                       1                             1
+43                                       1                             1
+44                                       0                             0
+45                                       0                             0
+46                                       1                             1
+47                                       1                             1
+48                                       1                             0
+49                                       1                             1
+50                                       1                             1
+51                                       0                             0
+52                                       0                             0
+53                                       0                             1
+54                                       1                             1
+55                                       0                             0
+56                                       1                             1
+57                                       0                             1
+58                                       0                             0
+59                                       0                             0
+60                                       1                             1
+61                                       1                             1
+62                                       1                             1
+63                                       0                             0
+64                                       1                             1
+65                                       1                             1
+66                                       1                             1
+67                                       1                             0
+68                                       1                             0
+69                                       0                             1
+70                                       1                             1
+71                                       0                             1
+72                                       1                             1
+73                                       0                             0
+74                                       1                             1
+75                                       1                             1
+76                                       1                             1
+77                                       1                             0
+78                                       0                             0
+79                                       0                             0
+80                                       0                             0
+81                                       0                             0
+82                                       1                             1
+83                                       1                             1
+84                                       0                             1
+85                                       0                             0
+86                                       0                             0
+87                                       1                             0
+88                                       0                             0
+89                                       1                             1
+90                                       1                             1
+91                                       1                             0
+92                                       1                             1
+93                                       1                             1
+94                                       1                             1
+95                                       1                             1
+96                                       0                             1
+97                                       1                             1
+98                                       0                             0
+99                                       1                             1
+100                                      0                             0
+101                                      0                             1
+102                                      1                             1
+103                                      1                             1
+104                                      0                             0
+105                                      0                             1
+106                                      1                             1
+107                                      0                             1
+108                                      0                             1
+109                                      0                             1
+    diag_neoadj_chemo_1Neoadjuvant Chemo
+1                                      0
+2                                      0
+3                                      0
+4                                      0
+5                                      1
+6                                      0
+7                                      0
+8                                      0
+9                                      0
+10                                     0
+11                                     0
+12                                     0
+13                                     0
+14                                     0
+15                                     0
+16                                     0
+17                                     0
+18                                     0
+19                                     0
+20                                     0
+21                                     0
+22                                     0
+23                                     0
+24                                     0
+25                                     0
+26                                     0
+27                                     0
+28                                     1
+29                                     0
+30                                     0
+31                                     0
+32                                     0
+33                                     0
+34                                     1
+35                                     0
+36                                     1
+37                                     0
+38                                     0
+39                                     0
+40                                     0
+41                                     0
+42                                     0
+43                                     1
+44                                     0
+45                                     0
+46                                     0
+47                                     0
+48                                     0
+49                                     1
+50                                     0
+51                                     0
+52                                     0
+53                                     0
+54                                     0
+55                                     0
+56                                     0
+57                                     0
+58                                     0
+59                                     0
+60                                     0
+61                                     0
+62                                     0
+63                                     0
+64                                     1
+65                                     1
+66                                     1
+67                                     0
+68                                     0
+69                                     0
+70                                     1
+71                                     0
+72                                     1
+73                                     0
+74                                     0
+75                                     1
+76                                     0
+77                                     0
+78                                     0
+79                                     0
+80                                     0
+81                                     0
+82                                     1
+83                                     1
+84                                     0
+85                                     0
+86                                     0
+87                                     0
+88                                     0
+89                                     0
+90                                     1
+91                                     0
+92                                     0
+93                                     1
+94                                     0
+95                                     0
+96                                     0
+97                                     0
+98                                     0
+99                                     0
+100                                    1
+101                                    0
+102                                    1
+103                                    0
+104                                    0
+105                                    0
+106                                    0
+107                                    1
+108                                    0
+109                                    0
+attr(,"assign")
+ [1]  1  2  2  2  2  3  3  4  4  5  5  6  6  6  7  7  7  8  8  8  9 10 11 12 13
+[26] 14 15 16
+attr(,"contrasts")
+attr(,"contrasts")$final_receptor_group
+[1] "contr.treatment"
+
+attr(,"contrasts")$demo_race_final
+[1] "contr.treatment"
+
+attr(,"contrasts")$final_tumor_grade
+[1] "contr.treatment"
+
+attr(,"contrasts")$final_overall_stage
+[1] "contr.treatment"
+
+attr(,"contrasts")$final_t_stage
+[1] "contr.treatment"
+
+attr(,"contrasts")$final_n_stage
+[1] "contr.treatment"
+
+attr(,"contrasts")$histology_category
+[1] "contr.treatment"
+
+attr(,"contrasts")$prtx_radiation
+[1] "contr.treatment"
+
+attr(,"contrasts")$prtx_chemo
+[1] "contr.treatment"
+
+attr(,"contrasts")$prtx_endo
+[1] "contr.treatment"
+
+attr(,"contrasts")$prtx_bonemod
+[1] "contr.treatment"
+
+attr(,"contrasts")$node_status
+[1] "contr.treatment"
+
+attr(,"contrasts")$axillary_dissection
+[1] "contr.treatment"
+
+attr(,"contrasts")$diag_surgery_type_1
+[1] "contr.treatment"
+
+attr(,"contrasts")$diag_neoadj_chemo_1
+[1] "contr.treatment"
+
+
dim(X2)  # Rows should match nrow(dtc_unique_subset_data)
+
+
[1] 109  28
+
+
length(y1)  # Should also match nrow(dtc_unique_subset_data). We have the same # (109)!
+
+
[1] 109
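Equal row counts are a useful sanity check, but they do not prove that the design matrix and the outcome vector describe the same patients in the same order. `model.matrix()` builds its matrix through `model.frame()`. By default, that drops rows with a missing value in any variable of the formula, including the response on the left-hand side. So a matrix and an outcome vector taken from different data frames can disagree silently.

The minimal base-R sketch below illustrates this with a hypothetical toy data frame, not the study data.

```r
# Hypothetical toy data: the outcome is missing for the third patient.
toy <- data.frame(
  dtc_ever    = c(1, 0, NA, 1, 0),
  age_at_diag = c(50, 61, 45, 70, 58),
  grade       = factor(c("Grade 1", "Grade 2", "Grade 1", "Grade 3", "Grade 2"))
)

# model.matrix() goes through model.frame(), whose default na.action
# (getOption("na.action"), normally na.omit) drops incomplete rows.
X_toy <- model.matrix(dtc_ever ~ -1 + age_at_diag + grade, data = toy)
nrow(X_toy)      # 4 rows, not 5
rownames(X_toy)  # the surviving rows keep their original row names

# Index the outcome by the matrix's row names so X and y stay aligned.
y_toy <- toy[rownames(X_toy), "dtc_ever"]
stopifnot(nrow(X_toy) == length(y_toy), !anyNA(y_toy))
```

Building `X` and `y` from the same data frame, and subsetting `y` by `rownames(X)`, keeps the two aligned even when rows are dropped.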
+
+
lasso_model <- glmnet(X2, y1, family = "binomial", alpha = 1)  # alpha = 1 for lasso. 0 for ridge. 
+
+#Cross-validation: cv.glmnet() runs k-fold cross-validation (10 folds by default) to choose lambda. The smaller the lambda, the weaker the penalty and the more variables tend to be retained.
+cv_lasso_model <- cv.glmnet(X2, y1, family = "binomial", alpha = 1)
+
+#Plot the cross-validated deviance against log(lambda) to compare performance across lambda values
+plot(cv_lasso_model)
+
+
+
+

+
+
+
+
#getting the best lambda  -- best lambda is 0.024, even lower! 
+best_lambda <- cv_lasso_model$lambda.min
+print(paste("Best lambda:", best_lambda)) 
+
+
[1] "Best lambda: 0.0243089462466253"
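`cv.glmnet()` assigns observations to folds at random, so the selected lambda can shift from run to run. This matters with only 109 patients. The sketch below uses simulated data (hypothetical, not the SURMOUNT cohort) and assumes the glmnet package is installed. It shows two standard glmnet options worth considering:

- Fixing the fold assignment with `foldid`, so that different design matrices (such as X2 and X4) are compared on the same splits.
- Reporting the more conservative `lambda.1se` alongside `lambda.min`.

```r
library(glmnet)

# Hypothetical simulated data standing in for a design matrix and a 0/1 outcome.
set.seed(2024)
n <- 109
p <- 12
x <- matrix(rnorm(n * p), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("x", seq_len(p))))
# Only x1 and x2 affect the outcome in this simulation.
y <- rbinom(n, size = 1, prob = plogis(-1 + 1.2 * x[, "x1"] - 0.8 * x[, "x2"]))

# A fixed fold assignment makes the cross-validation split reproducible
# and reusable when comparing different design matrices.
foldid <- sample(rep(seq_len(10), length.out = n))

cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1, foldid = foldid)

# lambda.min minimises the cross-validated deviance; lambda.1se is the largest
# lambda whose CV error is within one standard error of that minimum.
c(lambda_min = cv_fit$lambda.min, lambda_1se = cv_fit$lambda.1se)

# Number of non-zero coefficients (excluding the intercept) at a given lambda.
n_selected <- function(s) {
  b <- as.matrix(coef(cv_fit, s = s))
  sum(b[-1, 1] != 0)
}
c(at_min = n_selected("lambda.min"), at_1se = n_selected("lambda.1se"))
```

`lambda.1se` usually keeps fewer variables. With a small cohort it can be a more stable choice than `lambda.min`, at the cost of stronger shrinkage.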
+
+
#Finding the final fit model with the optimal lambda 
+final_lasso_model <- glmnet(X2, y1, family = "binomial", alpha = 1, lambda = best_lambda)
+
+#Which coefficients are retained in the model? For the model of DTC positivity, more coefficients are retained than in the previous model.
+coef(final_lasso_model) 
+
+
29 x 1 sparse Matrix of class "dgCMatrix"
+                                                s0
+(Intercept)                            -1.84167985
+age_at_diag                             .         
+final_receptor_groupTNBC                .         
+final_receptor_groupHR+ HER2-           .         
+final_receptor_groupHR+ HER2+           0.45710104
+final_receptor_groupHR- HER2+          -1.69722848
+demo_race_finalAsian                   -0.74138771
+demo_race_finalWhite                    .         
+final_tumor_gradeGrade 1               -0.38615694
+final_tumor_gradeGrade 2                .         
+final_overall_stageStage II             .         
+final_overall_stageStage III            .         
+final_t_stageT2                         .         
+final_t_stageT3                         .         
+final_t_stageT4                         1.81638875
+final_n_stageN1                         .         
+final_n_stageN2                        -0.02487123
+final_n_stageN3                         0.39987630
+histology_categoryDuctal                1.10854122
+histology_categoryLobular               .         
+histology_categoryOther                -0.62014177
+prtx_radiationRadiation                 0.61433722
+prtx_chemoChemo                         .         
+prtx_endoEndocrine Therapy              .         
+prtx_bonemodBone Modifying Treatment    0.12821102
+node_statusNode Positive               -0.80642910
+axillary_dissectionAxillary Dissection  .         
+diag_surgery_type_1Mastectomy           0.60444202
+diag_neoadj_chemo_1Neoadjuvant Chemo    0.23561799
+
+
+

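For the LASSO model of DTC positivity, more coefficients are retained than in the previous model, and the cross-validated lambda is smaller (about 0.024). A smaller lambda means a weaker penalty, so more variables survive. It does not by itself mean the model fits better; model quality is judged from the cross-validated deviance shown in the plot. The largest effects in the DTC LASSO are:

- T4 stage (coefficient 1.82), the largest increase in the log odds of DTC positivity.
- Ductal histology (1.11).
- HR- HER2+ receptor status, which has a strong negative association with DTC positivity (-1.70).

Triple-positive status (HR+ HER2+) has a smaller positive coefficient (0.46), though only a handful of patients in this cohort met that criterion. Other retained factors are node positivity (-0.81, so node-negative patients have higher log odds), Asian race (-0.74), other histology (-0.62), radiation (0.61), mastectomy (0.60), N3 stage (0.40), grade 1 tumors (-0.39), neoadjuvant chemotherapy (0.24), and bone-modifying treatment (0.13). N stage (final_n_stage) is likely collinear with node status (node_status). We therefore refit the DTC model without final_n_stage to see how the selected variables change.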
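The model is fit with `family = "binomial"`, so each coefficient is a change in the log odds of DTC positivity, and exponentiating it gives an odds ratio. The sketch below applies this to the non-zero coefficients printed for the X2 model (values copied from that output).

LASSO coefficients are shrunk toward zero and come without standard errors. These odds ratios are therefore descriptive, not a basis for inference.

```r
# Non-zero coefficients copied from the coef(final_lasso_model) output for X2.
lasso_coefs <- c(
  "final_receptor_groupHR+ HER2+"        =  0.45710104,
  "final_receptor_groupHR- HER2+"        = -1.69722848,
  "demo_race_finalAsian"                 = -0.74138771,
  "final_tumor_gradeGrade 1"             = -0.38615694,
  "final_t_stageT4"                      =  1.81638875,
  "final_n_stageN2"                      = -0.02487123,
  "final_n_stageN3"                      =  0.39987630,
  "histology_categoryDuctal"             =  1.10854122,
  "histology_categoryOther"              = -0.62014177,
  "prtx_radiationRadiation"              =  0.61433722,
  "prtx_bonemodBone Modifying Treatment" =  0.12821102,
  "node_statusNode Positive"             = -0.80642910,
  "diag_surgery_type_1Mastectomy"        =  0.60444202,
  "diag_neoadj_chemo_1Neoadjuvant Chemo" =  0.23561799
)

# Each coefficient is a change in log odds, so exp() gives the multiplicative
# change in the odds of DTC positivity, holding the other columns fixed.
odds_ratios <- exp(lasso_coefs)
round(sort(odds_ratios, decreasing = TRUE), 2)
```

Odds ratios above 1 correspond to the positive coefficients and those below 1 to the negative ones. T4 stage, for example, multiplies the odds by roughly six.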

+
+
#### DTC LASSO without final_n_stage
+
+set.seed(123) 
+
+#Run the LASSO for DTC status. This may work better because there are more DTC-positive results.
+y1 <- dtc_unique_subset_data$dtc_ever
+
+
+### Removed final_n_stage as a predictor (likely collinear with node_status).
+### Build X4 from the same data frame as y1 (dtc_unique_subset_data), with dtc_ever
+### as the response, so that the rows of X4 and y1 line up.
+X4 <- model.matrix(dtc_ever ~ -1 + age_at_diag + final_receptor_group + demo_race_final +
+                     final_tumor_grade + final_overall_stage + final_t_stage +
+                     histology_category + prtx_radiation + prtx_chemo + prtx_endo +
+                     prtx_bonemod + axillary_dissection + node_status +
+                     diag_surgery_type_1 + diag_neoadj_chemo_1,
+                   data = dtc_unique_subset_data)
+
+ 
+
+dim(X4)  # Rows should match nrow(dtc_unique_subset_data)
+
+
[1] 109  25
+
+
length(y1)  # Should also match nrow(dtc_unique_subset_data). We have the same # (109)!
+
+
[1] 109
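Dropping final_n_stage because it is likely collinear with node_status is a testable assumption. In the hypothetical toy data below, node status is fully determined by N stage (N0 is node negative; N1 to N3 are node positive). The node-positive dummy then equals the sum of the N1, N2 and N3 dummies, and the design matrix loses rank.

With exactly or highly collinear columns, the LASSO tends to split or arbitrarily assign the effect among them, which makes individual coefficients hard to interpret. Running the same `table()` and `qr()` checks on the real columns would show whether the dependence there is exact or only partial.

```r
# Hypothetical toy data: node status is fully determined by N stage.
stage_toy <- data.frame(
  n_stage = factor(c("N0", "N1", "N0", "N2", "N3", "N1", "N0", "N2")),
  node_status = factor(c("Node Negative", "Node Positive", "Node Negative",
                         "Node Positive", "Node Positive", "Node Positive",
                         "Node Negative", "Node Positive"))
)

# Cross-tabulate: if each N-stage row has a single non-zero cell,
# node_status is nested in n_stage.
tab <- table(stage_toy$n_stage, stage_toy$node_status)
print(tab)
all(rowSums(tab > 0) == 1)

# The "Node Positive" dummy equals N1 + N2 + N3, so the design matrix
# has fewer linearly independent columns than columns.
X_stage <- model.matrix(~ n_stage + node_status, data = stage_toy)
c(columns = ncol(X_stage), rank = qr(X_stage)$rank)
```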
+
+
lasso_model <- glmnet(X4, y1, family = "binomial", alpha = 1)  # alpha = 1 for lasso. 0 for ridge. 
+
+#Cross-validation: cv.glmnet() runs k-fold cross-validation (10 folds by default) to choose lambda. The smaller the lambda, the weaker the penalty and the more variables tend to be retained.
+cv_lasso_model <- cv.glmnet(X4, y1, family = "binomial", alpha = 1)
+
+#plotting the results to look at the performance of different lamda 
+plot(cv_lasso_model)
+
+
+
+

+
+
+
+
#getting the best lambda  -- best lambda is 0.027, same as above
+best_lambda <- cv_lasso_model$lambda.min
+print(paste("Best lambda:", best_lambda)) 
+
+
[1] "Best lambda: 0.0266790384961084"
+
+
#Finding the final fit model with the optimal lambda 
+final_lasso_model <- glmnet(X3, y1, family = "binomial", alpha = 1, lambda = best_lambda)
+
+# Print the coefficients ("." = shrunk to exactly zero). For DTC positivity, more coefficients are retained than in the ctDNA model. Among the treatment variables, mastectomy (vs lumpectomy) and neoadjuvant chemotherapy increase the log-odds of DTC positivity, while axillary dissection has only a small negative coefficient.
+coef(final_lasso_model) 
+
+
28 x 1 sparse Matrix of class "dgCMatrix"
+                                                s0
+(Intercept)                            -1.74827344
+age_at_diag                             .         
+final_receptor_groupTNBC                .         
+final_receptor_groupHR+ HER2-           .         
+final_receptor_groupHR+ HER2+           0.38484232
+final_receptor_groupHR- HER2+          -1.63882020
+demo_race_finalAsian                   -0.57458506
+demo_race_finalWhite                    .         
+final_tumor_gradeGrade 1               -0.35364514
+final_tumor_gradeGrade 2                .         
+final_overall_stageStage II             .         
+final_overall_stageStage III            .         
+final_t_stageT2                         .         
+final_t_stageT3                         .         
+final_t_stageT4                         1.60923057
+final_n_stageN1                        -0.60299273
+final_n_stageN2                        -0.53436611
+final_n_stageN3                         .         
+histology_categoryDuctal                1.07913426
+histology_categoryLobular               .         
+histology_categoryOther                -0.35868770
+prtx_radiationRadiation                 0.48044674
+prtx_chemoChemo                         .         
+prtx_endoEndocrine Therapy              .         
+prtx_bonemodBone Modifying Treatment    0.05228216
+axillary_dissectionAxillary Dissection -0.09409813
+diag_surgery_type_1Mastectomy           0.51967688
+diag_neoadj_chemo_1Neoadjuvant Chemo    0.28396596
+
+
+
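`cv.glmnet()` assigns observations to folds at random. With few positive cases, some folds can therefore contain very few (or no) events, and the selected lambda can shift from run to run. One option, sketched below, is to build an outcome-stratified fold assignment once and pass it through `cv.glmnet`'s `foldid` argument. This assumes a fixed, stratified split is what is wanted. The helper `make_stratified_folds()` is hypothetical and is not part of glmnet or of the original analysis.

```r
# Hypothetical helper (not part of glmnet): assign each observation to one of
# `nfolds` folds, dealing each outcome class out separately so that every fold
# receives a similar share of positives and negatives.
make_stratified_folds <- function(y, nfolds = 10, seed = 123) {
  set.seed(seed)  # note: this resets the global random number generator
  foldid <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Repeat the labels 1..nfolds to cover this class, then shuffle them
    foldid[idx] <- sample(rep_len(seq_len(nfolds), length(idx)))
  }
  foldid
}

# Toy outcome with the DTC cohort's class balance (39 positive, 70 negative)
y_toy <- c(rep(1, 39), rep(0, 70))
folds <- make_stratified_folds(y_toy, nfolds = 10)
table(fold = folds, dtc_ever = y_toy)

# With glmnet loaded, the fixed folds would be passed as (not run here):
# cv_lasso_model <- cv.glmnet(X4, y1, family = "binomial", alpha = 1,
#                             foldid = make_stratified_folds(y1))
```

With only nine ctDNA-positive patients, ten folds cannot all contain a positive case. Fewer folds (for example `nfolds = 5`, which gives one or two positives per fold) may be more sensible for the ctDNA model.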

In this last LASSO, in which we removed node_status to assess only the more granular final_n_stage (N1 vs N2 vs N3, etc.), a few more variables became influential. T4 stage, ductal histology, and receptor status maintained their strong relationships with DTC positivity. Several other variables maintained weaker relationships: bone-modifying treatment, radiation, mastectomy, and neoadjuvant therapy were associated with higher log-odds of DTC positivity, whereas grade 1 tumors, Asian race, and N1/N2 stage were associated with lower log-odds. Axillary dissection was negatively associated with DTC positivity, but only barely. We will choose the models without the node_status variable for both the ctDNA and DTC outcomes. Their optimal lambdas are about the same as or lower than those of the models that include node_status, and they make more intuitive sense than including two very similar variables that represent the same information in different ways.
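Reading a sparse coefficient printout by eye is error-prone. The base-R sketch below assumes the coefficients have first been converted to a named numeric vector; for a glmnet fit, `as.matrix(coef(final_lasso_model))[, 1]` does this. It then drops the intercept and the zeroed terms and sorts the remaining terms by absolute size. The values are a subset copied from the printed DTC model output, and `summarise_lasso()` is a hypothetical helper. Exponentiated LASSO coefficients are shrunk toward an odds ratio of 1, so treat them as descriptive rather than as unbiased effect estimates.

```r
# Hypothetical helper: keep non-zero, non-intercept LASSO coefficients,
# ordered from largest to smallest absolute log-odds effect.
summarise_lasso <- function(b) {
  keep <- names(b) != "(Intercept)" & b != 0
  b <- b[keep]
  b <- b[order(abs(b), decreasing = TRUE)]
  data.frame(
    term       = names(b),
    log_odds   = unname(b),
    odds_ratio = exp(unname(b)),  # shrunken toward 1 by the penalty
    row.names  = NULL
  )
}

# Subset of coefficients copied from the printed DTC model output
dtc_coefs <- c(
  "(Intercept)"                            = -1.74827344,
  "age_at_diag"                            =  0,
  "final_receptor_groupHR- HER2+"          = -1.63882020,
  "final_t_stageT4"                        =  1.60923057,
  "histology_categoryDuctal"               =  1.07913426,
  "diag_surgery_type_1Mastectomy"          =  0.51967688,
  "axillary_dissectionAxillary Dissection" = -0.09409813
)

summarise_lasso(dtc_coefs)
```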

+
+
+

5 Conclusion

+

In this cohort of 109 individuals on the SURMOUNT study, DTC positivity occurred more frequently (in about 36% of individuals, 39/109) than ctDNA positivity, which occurred in fewer than 10% of patients (9/109) either at baseline or during surveillance. Despite the low numbers, concordance between ctDNA and DTC positivity was good, particularly when accounting for timepoint (concordance of 0.8).
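One reading of the concordance figure is simple percent agreement between the two assays on samples where both were measured at the same timepoint. That definition is an assumption here, because the code behind the reported value is not shown in this section. The data frame below is toy data, not SURMOUNT results. Percent agreement is inflated when most results are negative in both assays, so the sketch also computes Cohen's kappa, which subtracts the agreement expected by chance.

```r
# Toy per-sample data: one row per participant and timepoint.
# NA marks an assay that was not run at that timepoint.
samples <- data.frame(
  participant_id = c(1, 1, 2, 2, 3, 3),
  timepoint      = c("baseline", "year1", "baseline", "year1", "baseline", "year1"),
  ctdna_pos      = c(FALSE, TRUE, FALSE, FALSE, FALSE, NA),
  dtc_pos        = c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
)

# Keep only timepoints where both assays have a result
paired <- samples[!is.na(samples$ctdna_pos) & !is.na(samples$dtc_pos), ]

# Percent agreement: both positive or both negative at the same timepoint
p_obs <- mean(paired$ctdna_pos == paired$dtc_pos)

# Cohen's kappa: agreement beyond what the marginal positivity rates imply
p_ctdna <- mean(paired$ctdna_pos)
p_dtc   <- mean(paired$dtc_pos)
p_exp   <- p_ctdna * p_dtc + (1 - p_ctdna) * (1 - p_dtc)
kappa   <- (p_obs - p_exp) / (1 - p_exp)

c(percent_agreement = p_obs, kappa = kappa)
```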

+

In assessing predictors of ctDNA positivity, we found that higher T stage and N stage remain the most influential predictors of ctDNA positivity, with age at diagnosis, HR+ HER2+ status, lobular histology, and lower grade also retained as predictors. The optimal lambda for this model is 0.048.

+

In assessing predictors of DTC positivity using LASSO, we identified several factors, including ductal histology, higher T stage (larger tumor size), and HER2-negative receptor status, as most strongly associated with DTC positivity. Other factors associated with DTC positivity in the multivariable models represented more intensive treatment (mastectomy, radiation, and neoadjuvant therapy). Interestingly, nodal positivity seemed to be negatively associated with DTC positivity. The optimal lambda for this model is 0.027.

+

It is worth noting that the ctDNA model in particular is challenging to interpret in the setting of the low number of ctDNA positive individuals (n=9).
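Two quick numbers show why the ctDNA model is hard to interpret. First, an exact (Clopper-Pearson) interval from base R's `binom.test()` shows how imprecise a 9/109 positivity rate is. Second, dividing the number of events by the number of candidate predictor columns gives the events-per-variable ratio. The 25-column width is borrowed from the DTC design matrix (`dim(X4)`) as an assumption; the ctDNA design matrix may differ slightly.

```r
n_total     <- 109  # participants in the ctDNA cohort
n_ctdna_pos <- 9    # ever ctDNA positive
n_dtc_pos   <- 39   # ever DTC positive
n_columns   <- 25   # assumed number of candidate predictor columns

# Exact binomial 95% confidence intervals for each positivity rate
ctdna_ci <- binom.test(n_ctdna_pos, n_total)$conf.int
dtc_ci   <- binom.test(n_dtc_pos, n_total)$conf.int

# Events per candidate variable
epv <- c(ctdna = n_ctdna_pos / n_columns, dtc = n_dtc_pos / n_columns)

round(c(ctdna_rate = n_ctdna_pos / n_total, ctdna_ci), 3)
round(c(dtc_rate = n_dtc_pos / n_total, dtc_ci), 3)
epv
```

Both ratios (0.36 for ctDNA, 1.56 for DTC) fall well below the commonly cited rule of thumb of about 10 events per variable for unpenalized logistic regression. Penalization such as LASSO mitigates this problem but does not eliminate it.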

+

Overall, ctDNA status was significantly associated with relapse (p<0.01), with a PPV of 89%, an NPV of 94%, and a specificity for relapse of 0.99. DTC positivity was NOT significantly associated with relapse. The sensitivity and specificity of DTC testing for relapse were also challenging to interpret, because all DTC-positive patients in this cohort went on to interventional trials aimed at eliminating dormant cancer cells. The negative predictive value of DTC assessment was high (0.86), suggesting that this test may be useful in identifying individuals at lower risk of relapse.
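The test characteristics quoted above all come from a single 2x2 table of biomarker result against relapse. A minimal sketch follows. The counts are hypothetical, chosen only so that the outputs fall near the reported ctDNA values; they are not the study's actual table.

```r
# Test characteristics from a 2x2 table of biomarker result vs. relapse.
# tp/fp/fn/tn are hypothetical counts, not read from the SURMOUNT data.
test_characteristics <- function(tp, fp, fn, tn) {
  c(
    sensitivity = tp / (tp + fn),  # relapsers who tested positive
    specificity = tn / (tn + fp),  # non-relapsers who tested negative
    ppv         = tp / (tp + fp),  # positives who went on to relapse
    npv         = tn / (tn + fn)   # negatives who stayed relapse-free
  )
}

ctdna_stats <- test_characteristics(tp = 8, fp = 1, fn = 6, tn = 94)
round(ctdna_stats, 2)
```

Unlike sensitivity and specificity, PPV and NPV depend on how common relapse is in the cohort. Values from this high-risk population will not transfer directly to populations with a different baseline risk.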

+

Future directions include: (1) assessing the test characteristics of DTC assessment in the full cohort of patients enrolled on SURMOUNT to date (n=220), including the incremental value of repeated testing; (2) obtaining ctDNA assessment for this full cohort; (3) performing survival analyses to assess the lead time from DTC and ctDNA positivity to clinical events (relapse, death); and (4) examining fluctuations in ctDNA positivity among patients on clinical trials who had frequent testing during and after therapy.
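Lead time can be computed per patient as the interval between the first positive biomarker result and the clinical event. The sketch below uses hypothetical dates and hypothetical column names (`first_pos_date`, `relapse_date`); neither comes from the SURMOUNT data dictionary.

```r
# Hypothetical per-patient dates; NA = never positive / no relapse observed
pts <- data.frame(
  participant_id = c("A", "B", "C"),
  first_pos_date = as.Date(c("2019-03-01", NA, "2020-06-15")),
  relapse_date   = as.Date(c("2019-11-20", "2021-01-10", NA))
)

# Lead time can only be computed when both dates are observed;
# a negative value would mean relapse was detected before positivity
has_both <- !is.na(pts$first_pos_date) & !is.na(pts$relapse_date)
pts$lead_time_days <- NA_real_
pts$lead_time_days[has_both] <- as.numeric(
  difftime(pts$relapse_date[has_both], pts$first_pos_date[has_both], units = "days")
)
pts$lead_time_months <- pts$lead_time_days / 30.4375  # average month length (365.25 / 12)
pts
```

Patients who are positive but have not relapsed (like "C") are censored rather than missing. A formal analysis of lead time would therefore use survival methods rather than averaging the observed intervals.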

+

We had several limitations. First, some data were missing, though levels of missingness were low for the variables of interest in this analysis. Second, our model includes collinear variables, that is, variables that capture tumor or disease aggressiveness in different ways (such as T stage and N stage, which directly feed into overall stage). The standard LASSO does not account for this structure, so we will try group LASSO as our next step. Third, we had limited power to build a predictive model for ctDNA in particular, given the rarity of positivity in our cohort (though this rate matches the positivity rate in other cohort studies).
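Group LASSO penalizes predefined groups of columns together, so all dummy columns for one categorical variable (for example, the T-stage indicators) enter or leave the model as a unit. It does not by itself resolve the overlap between overall stage and its T and N components, which would still compete as separate groups. The grouping can be read from the `assign` attribute that `model.matrix()` attaches to its result. The toy data below are hypothetical. The commented `grpreg` call is an assumption about the package that might be used for the next step; it is not part of this analysis.

```r
# Toy data with two categorical predictors and one numeric predictor
toy <- data.frame(
  dtc_ever = c(0, 1, 0, 1, 1, 0),
  t_stage  = factor(c("T1", "T2", "T3", "T4", "T2", "T1")),
  n_stage  = factor(c("N0", "N1", "N0", "N2", "N1", "N0")),
  age      = c(45, 52, 61, 38, 57, 49)
)

# Same "-1 +" form as the design matrices above
X_toy <- model.matrix(dtc_ever ~ -1 + t_stage + n_stage + age, data = toy)

# "assign" maps each column to the formula term it came from:
# all t_stage columns share 1, n_stage columns share 2, age is 3
group <- attr(X_toy, "assign")
data.frame(column = colnames(X_toy), group = group)

# With the grpreg package installed, a cross-validated group LASSO could be fit as (not run):
# cv_fit <- grpreg::cv.grpreg(X_toy, toy$dtc_ever, group = group,
#                             penalty = "grLasso", family = "binomial")
```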

From db3e451436b1072d951e2fcb110eeb8e21704a84 Mon Sep 17 00:00:00 2001 From: ntaranto Date: Mon, 9 Dec 2024 14:46:34 -0500 Subject: [PATCH 08/14] Update FinalProjectTaranto.qmd --- FinalProjectTaranto.qmd | 1441 +++++++++++++++++++-------------------- 1 file changed, 691 insertions(+), 750 deletions(-) diff --git a/FinalProjectTaranto.qmd b/FinalProjectTaranto.qmd index b51dce736..8ab32f5ad 100644 --- a/FinalProjectTaranto.qmd +++ b/FinalProjectTaranto.qmd @@ -14,42 +14,42 @@ embed-resources: true Link to my final project: https://github.com/ntaranto/BMIN503_Final_Project -After treatment for early stage breast cancer, recurrence remains a problem in up to 30% of individuals, in spite of these patients completing definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy) for their early stage breast cancer. It remains unclear who will experience disease recurrence and who will not, and whether we can identify biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA (ctDNA) as risk factors for recurrence.
Historically, and in retrospective, observational studies, these markers of minimal residual disease (MRD) have demonstrated strong associations with recurrence and disease relapse. But it remains unclear how these associations persist with modern treatment paradigms what the time course of positivity and what clinical risk factors put patients at greatest risk for having DTC or ctDNA positivity and subsequent relapse--and which most strongly predict biomarker positivity. +After treatment for early stage breast cancer, recurrence remains a problem in up to 30% of individuals, in spite of these patients completing definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy) for their early stage breast cancer. It remains unclear who will experience disease recurrence and who will not, and whether we can identify biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA (ctDNA) as risk factors for recurrence. Historically, in retrospective, observational studies, these markers of minimal residual disease (MRD) have demonstrated strong associations with recurrence and disease relapse. But it remains unclear how these associations persist with modern treatment paradigms, what the time course of positivity, and what clinical risk factors put patients at greatest risk for having DTC or ctDNA positivity and subsequent relapse--as well as which most strongly predict biomarker positivity. -Optimizing detection, intervention and surveillance for MRD after breast cancer treatment utilizing highly sensitive, real-time assays to identify patients with dormant tumor cells and those undergoing tumor reactivation in-transit to recurrence would fundamentally change the paradigm of breast cancer follow up from one of “watchful waiting” to a proactive approach that could enable ongoing monitoring for subclinical disease and suppression or eradication of cells that could potentially metastasize. 
Ultimately, our approach has the potential to provide reassurance to patients with definitively negative MRD testing that they are unlikely to ever experience a relapse, enable effective MRD-based surveillance, detection and treatment strategies for those in whom it is detectable and, ultimately, prevent recurrence and subsequent death from metastatic breast cancer while preserving quality of life for millions of breast cancer survivors around the world. +Optimizing detection, intervention and surveillance for MRD after breast cancer treatment utilizing highly sensitive, real-time assays to identify patients with dormant tumor cells and those undergoing tumor reactivation in-transit to recurrence would fundamentally change the paradigm of breast cancer follow up from one of “watchful waiting” to a proactive approach that could enable ongoing monitoring for subclinical disease and suppression or eradication of cells that could potentially metastasize. Ultimately, an approach using MRD biomarkers could provide reassurance to patients with definitively negative MRD testing that they are unlikely to experience a relapse, could enable detection and treatment strategies for those in whom MRD is detectable and, ultimately, prevent recurrence and subsequent death from metastatic breast cancer while preserving quality of life for millions of breast cancer survivors around the world. -In the SURMOUNT study, we have followed patients who have undergone definitive treatment for their early stage breast cancer for recurrence, and also obtained bone marrow and peripheral blood assessment, looking for disseminated tumor cells (DTCs) in the bone marrow by immunohistochemistry (DTC-IHC) and circulating tumor DNA (ctDNA) in the blood. 
In this overall study, we are assessing the relationship between ongoing surveillance for MRD (utilizing standard DTC-IHC and ctDNA by a tumor-informed assay) and recurrence, assess the trajectory of MRD over time, optimizing the type and number of tests needed to predict recurrence, outcomes and lead time, and further evaluating the long-term impact of our prior therapeutic interventions. In this specific analysis, we will look at clinical predictors of ctDNA and DTC positivity to try to understand the strongest drivers of MRD in the population of patients with high-risk early breast breast cancer who have finished definitive treatment. We will do so by looking at the clinical characteristics, DTC and ctDNA assessment, and clinical follow-up for the cohort of patients enrolled onto the SURMOUNT (Surveillance Markers of Utility for Recurrence after (Neo)adjuvant Therapy for Breast Cancer) study between 2016 and 2021 who had ctDNA assessment performed. +In the SURMOUNT study, we have followed patients who have undergone definitive treatment for their early stage breast cancer for recurrence, and also obtained bone marrow and peripheral blood assessment, looking for disseminated tumor cells (DTCs) in the bone marrow by immunohistochemistry (DTC-IHC) and circulating tumor DNA (ctDNA) in the blood. In this study, we are following patients over multiple years after their breast cancer for these markers and clinically following them for relapse and survival events. The goal of this study is to assess the relationship between ongoing surveillance for MRD (utilizing standard DTC-IHC and ctDNA by a tumor-informed assay) and recurrence, assess the trajectory of MRD over time and risk factors for DTC and ctDNA positivity, optimize the type and number of tests needed to predict recurrence, and further evaluate the long-term impact of therapeutic interventions aimed at eliminating DTCs. -For this analysis, I have spoken with Dr. 
Nicholas Seewald, my biostatistical mentor, and Dr. Angela DeMichele, my epidemiology mentor and a breast oncologist with expertise in the biomarkers of recurrence (and the PI on the SURMOUNT study). Dr. Seewald has been critical to thinking about the approach to multivariable analysis to assess the clinical predictors of ctDNA and DTC positivity as well as survival analyses to understand the associations between biomarker positivity and relapse. Dr. Demichele has been instrumental in the development of the SURMOUNT study design, designing the analysis of this initial surveillance cohort, and thinking about the biomarkers of breast cancer recurrence and dormance more broadly. +In this specific analysis, we will look at the clinical predictors of ctDNA and DTC positivity to try to understand the strongest drivers of MRD in the population of patients with high-risk early breast breast cancer who have finished definitive treatment. We will do so by looking at the clinical characteristics, DTC and ctDNA assessment, and clinical follow-up for the cohort of patients enrolled onto the SURMOUNT (Surveillance Markers of Utility for Recurrence after (Neo)adjuvant Therapy for Breast Cancer) study between 2016 and 2021 who had ctDNA assessment performed. -## Introduction {#sec-introduction} +For this analysis, I have spoken with Dr. Nicholas Seewald, my biostatistical mentor, and Dr. Angela DeMichele, my epidemiology mentor and a breast oncologist with expertise in the biomarkers of recurrence (and the PI on the SURMOUNT study). Dr. Seewald has been critical to thinking about the approach to multivariable analysis to assess the clinical predictors of ctDNA and DTC positivity as well as survival analyses to understand the associations between biomarker positivity and relapse. Dr. 
Demichele has been instrumental in the development of the SURMOUNT study design, designing the analysis of this initial surveillance cohort, and thinking about clinical predictors of positivity and the biomarkers of breast cancer recurrence and dormance more broadly. -Breast cancer is the most prevalent cancer since it is both common and treatable, with 5-year survival rates approaching 90%. Of the 14.5 million cancer survivors alive today in the U.S., nearly 1 in 4 are breast cancer survivors. Unfortunately, up to 30% of patients who receive adjuvant therapy for breast cancer will ultimately recur and die from their disease, typically as a consequence of metastatic recurrence. Since recurrent breast cancer is incurable, the propensity of cancers to recur following treatment is arguably the most important determinant of clinical outcome. +## Introduction {#sec-introduction} -Recurrent breast cancers arise from the pool of breast cancer cells that remain after initial treatment, likely in the form of minimal residual disease (MRD): local and disseminated residual tumor cells (RTCs) that survive in their host in a presumed dormant state following treatment of the primary breast cancer.The development of incurable metastatic disease is due to this persistent pool of residual disease resulting from escape of cells from the primary tumor, intravasation and survival in the circulation, and eventual extravasation and metastatic seeding. Many breast cancers pass through a latent phase in which disseminated tumor cells (DTCs) persist in niches where they may reside in a dormant state for months to decades. These DTCs exist in a temporary quiescent state of reversible cell-cycle arrest, from which some cells may eventually reactivate, resume proliferation and recirculate, at which point they can be detected as circulating tumor DNA (ctDNA) in the blood. 
Longitudinal studies demonstrate that the detection of DTCs in the bone marrow in such patients is associated with poorer disease-free, breast cancer recurrence-free, and overall survival compared to patients without DTCs. Several mechanisms implicated in this process by preclinical studies are therapeutically targetable, and we have developed several interventional trials aimed at targeting these DTCs that are fed by the SURMOUNT surveillance study. +Breast cancer is the most prevalent cancer since it is both common and treatable, with 5-year survival rates approaching 90%. Of the 14.5 million cancer survivors alive today in the U.S., nearly 1 in 4 are breast cancer survivors. Unfortunately, up to 30% of patients who receive adjuvant therapy for breast cancer will ultimately recur and die from their disease, typically as a consequence of metastatic recurrence. Since recurrent breast cancer is incurable, the propensity of cancers to recur following treatment is arguably the most important determinant of clinical outcome. Recurrent breast cancers arise from the pool of breast cancer cells that remain after initial treatment, likely in the form of minimal residual disease (MRD): local and disseminated residual tumor cells that survive in their host in a presumed dormant state following treatment of the primary breast cancer. The development of incurable metastatic disease is thought to be due to this persistent pool of residual disease resulting from escape of cells from the primary tumor, intravasation and survival in the circulation, and eventual extravasation and metastatic seeding. Many breast cancers pass through a latent phase in which disseminated tumor cells (DTCs) may persist in niches where they may reside in a dormant state for months to decades. 
These DTCs are thought to exist in a temporary quiescent state of reversible cell-cycle arrest, from which some cells may eventually reactivate, resume proliferation and recirculate, at which point they can be detected as circulating tumor DNA (ctDNA) in the blood. Longitudinal studies demonstrate that the detection of DTCs in the bone marrow in such patients is associated with poorer disease-free, breast cancer recurrence-free, and overall survival compared to patients without DTCs. Several mechanisms implicated in this process by preclinical studies are therapeutically targetable, and the research group in the 2-PREVENT Breast Cancer Translational Center of Excellence (TCE) have developed several interventional trials aimed at targeting these DTCs. -In the SURMOUNT surveillance study, patients with early stage (i.e. curable) but high-risk breast cancer are enrolled and undergo initial baseline bone marrow assessment (BMA) for evaluation of DTCs, as well as peripheral blood assessment for retrospective ctDNA assesmsent. Patients who screen DTC positive--either at baseline or on yearly surveillance BMA--are referred for interventional trials. Patients who screen DTC negative remain in the SURMOUNT surveillance cohort and undergo yearly DTC assessment and peripheral blood collection for ctDNA assessment. The first intervention trial, CLEVER, completed enrollment in 2021, and so this initial analysis is focused on the patients who were enrolled for the purposes of accruing this first trial. +However, it still remains unclear how exactly the presence of DTCs and/or ctDNA predicts relapse in the era of modern treatment for breast cancers, including chemotherapy, immunotherapy, surgery, targeted treatments, and radiation. 
Questions remain about who will develop DTC/ctDNA positivity, which patients with DTC positivity will have these cells reactivate, whether or not and when DTC positivity leads to ctDNA positivity, and which patients with these markers will develop relapse and subsequent metastatic disease. In the SURMOUNT surveillance study, patients with early stage (i.e. curable) but high-risk breast cancer are enrolled and undergo initial baseline bone marrow assessment (BMA) for evaluation of DTCs by immunohistochemistry (IHC), as well as peripheral blood assessment for retrospective ctDNA assessment. Patients who screen DTC positive--either at baseline or on yearly surveillance BMA--are referred for interventional trials aimed at eliminating dormant cells prior to clinical relapse. Patients who screen DTC negative remain in the SURMOUNT surveillance cohort and undergo yearly DTC assessment and peripheral blood collection for ctDNA assessment. All patients are followed for recurrence events and survival. The first intervention trial, CLEVER, completed enrollment in 2021, so this initial analysis is focused on the patients who were enrolled on SURMOUNT for the purposes of accruing this first trial. -Despite years of progress in breast cancer diagnostics and therapeutics, identifying the individuals at risk of recurrence -- and figuring out how to manage and minimize their elevated risk--remains a challenge. In this study, we seek to assess the clinical validity of DTC and ctDNA assessment and to better understand the population in which they may be useful. Specifically, in this analysis, we looked at overall rates of ctDNA and DTC positivity in this cohort and clinical factors that were associated with each. +Despite years of progress in breast cancer diagnostics and therapeutics, identifying the individuals at risk of recurrence -- and figuring out how to manage and minimize their elevated risk--remains a challenge.
In this study, we sought to assess the clinical validity of DTC and ctDNA assessment and to better understand the population in which they may be useful. Specifically, in this analysis, we looked at overall rates of ctDNA and DTC positivity in this cohort and clinical factors that were associated with each. ## Methods {#sec-methods} -“PENN SURMOUNT” is a single center, prospective, longitudinal cohort study examining MRD biomarkers among pts within 5 years (y) of BC diagnosis who completed all curative treatment except endocrine therapy. Eligible pts must have had: 1) TNBC, or 2) HER2+ or HR+ BC with positive LN and/or residual disease after neoadjuvant therapy, or 3) HR+ BC with a 21-gene Recurrence score \>25 and/or high risk Mammaprint. Pts had annual bone marrow aspirate (BMA) for DTCs by immunohistochemistry (using methods of Naume et al.). DTC+ pts went on therapeutic trial; DTC- pts had up to 5y of annual BMA and blood testing. ctDNA was retrospectively assessed using the RaDaR assay, which targets pt-specific somatic mutations identified by whole-exome sequencing (WES) of primary tumor tissue. +**“PENN SURMOUNT”**: SURMOUNT is a single center, prospective, longitudinal cohort study examining MRD biomarkers among pts within 5 years (y) of BC diagnosis who completed all curative treatment except endocrine therapy. Eligible pts must have had: 1) TNBC, or 2) HER2+ or HR+ BC with positive LN and/or residual disease after neoadjuvant therapy, or 3) HR+ BC with a 21-gene Recurrence score \>25 and/or high risk Mammaprint. Pts had annual bone marrow aspirate (BMA) for DTCs by immunohistochemistry (using methods of Naume et al.). DTC+ pts went on therapeutic trial; DTC- pts had up to 5y of annual BMA and blood testing. ctDNA was retrospectively assessed using the RaDaR assay (NeoGenomics), which targets pt-specific somatic mutations identified by whole-exome sequencing (WES) of primary tumor tissue. 
-The ctDNA assessment was performed after bespoke panel development on tissue on peripheral blood from 109 patients by Neogenomics, inc. and provided back to the research team in .csv format, with the last data drop occurring July 30, 2024. DTC assessment was performed based on bone marrow assessment and ultimately entered into REDCap database by the research team through this same follow-up date. Clinical and demographic factors--and follow-up data--were abstracted by the research team through October 2024 and entered into the same REDCap database. Data were exported in mid-October 2024 by the TCE data manager, and merged with the ctDNA data prior to hand-off for this analysis. The final locked and merged dataset, labeled "surmount184_merged_20241108.csv" is maintained in the TCE box for the ctDNA analysis, and a copy is being stored in the FinalProject_files. +**Data Collection and Merge:** The ctDNA assessment was performed after bespoke panel development on tissue on peripheral blood from 109 patients by Neogenomics, inc. and provided back to the research team in .csv format, with the last data drop occurring July 30, 2024. DTC assessment was performed based on bone marrow assessment and ultimately entered into a REDCap database by the research team through this same follow-up date. Clinical and demographic factors--and follow-up data--were abstracted by the research team through October 2024 and entered into the same REDCap database. Data were exported in mid-October 2024 by the TCE data manager, and merged with the ctDNA data prior to hand-off for this analysis. The final locked and merged dataset, labeled "surmount184_merged_20241108.csv" is maintained in the TCE box for the ctDNA analysis, and a copy is being stored in the FinalProject_files. -**First,** we will import csv of final data, which is entitled "surmount184_merged_20241108.csv" +**First,** we will import csv of final data, which is entitled "surmount184_merged_20241108.csv." 
+ +``` {r} -```{r} library(here) library(dplyr) -d <- read.csv(file = here("FinalProject_files", +d <- read.csv(file = here("data", "surmount184_merged_20241108.csv")) - ``` -**Next,** we will limit the data to the 109 patients who had ctDNA tested, of the 184 individuals who were included in the initial CLEVER trial screening group. We will look at the names and structures of the variables in the dataset "d", of which there are 387, the majority of which are clinical variables (often categorical dummy variables), but some of which are outcome variables related to local relapse, distant relapse, and survival as well as to the pathology report accounting for DTC. +**Next,** we will limit the data to the 109 patients who had ctDNA tested, of the 184 individuals who were included in the initial CLEVER trial screening group. We will look at the names and structures of the variables in the dataset "d", of which there are 387, the majority of which are clinical variables (often categorical dummy variables), but some of which are outcome variables related to local relapse, distant relapse, and survival as well as to the pathology report accounting for DTC. This list will help us to identify the important factors to include ultimately in our multivariable model to predict positivity of these markers. We will also look at the structure of the variables as we may need to reformat some of them for analyses. ```{r} @@ -61,7 +61,7 @@ str(d) **Summary variables:** We have a few different important summary variables which we've identified. 
Summary variables: final_overall_stage final_t_stage final_n_stage, final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') final_tumor_grade final_histology, demo_race_final fu_locreg_site_num (numeric values for local regional site), fu_locreg_site_char (character values for local regional site), fu_dist_site_num (numeric values for distant site), fu_dist_site_char (character values for distant site), censor_date (most recent fu_date_to among patients who are alive without local or distant progression). -**Limiting from the overall cohort (184) to the ctDNA cohort**: We know that this data merge contains 184 individuals (as this was the overall cohort of individuals that were screened for the CLEVER interventional study on SURMOUNT), but also know, from the separate ctDNA csv and the information from the Neogenomics summary data, that there were 109 individuals on whom ctDNA was assessed. We need to limit the data set "d" to this "ctDNA cohort"--we will call the ctDNA cohort "subset_data." We have an indicator variable "ctDNA_cohort" with which we can limit this subset. +**Limiting from the overall cohort (184) to the ctDNA cohort**: We know that this dataset contains 184 individuals (as this was the overall cohort of individuals that were screened for the CLEVER interventional study on SURMOUNT). But we also know, from the separate ctDNA csv and the information from the Neogenomics summary data, that there were 109 individuals on whom ctDNA was assessed. We need to limit the data set "d" to this "ctDNA cohort"--we will call the ctDNA cohort "subset_data." We have an indicator variable "ctDNA_cohort" with which we can limit this subset. ```{r} @@ -90,7 +90,7 @@ unique_count ``` -Now that we have the subset_data = the ctDNA cohort (n=109), we can start looking at demographics. First we will do overall, then we will do divided by ctDNA_detected. 
We can see, looking at the table by sample count using the ctDNA_detected variable (false = negative/ctDNA was NOT detected, true = positive/ctDNA was detected), that there were 385 negative samples, and 11 positive samples within the ctDNA cohort. Next, we will create the ctDNA_ever variable that will, by participant_id (which is the unique study ID), tell us if that participant ever had ctDNA detected. +**Creating the ctDNA_ever positive indicator**: Now that we have the subset_data = the ctDNA cohort (n=109), we can start looking at demographics. First we will do overall, then we will do divided by ctDNA_detected. We can see, looking at the table by sample count using the ctDNA_detected variable (false = negative/ctDNA was NOT detected, true = positive/ctDNA was detected), that there were 385 negative samples, and 11 positive samples within the ctDNA cohort. Next, we will create the ctDNA_ever variable that will, by participant_id (which is the unique study ID), tell us if that participant ever had ctDNA detected. ```{r} #ctDNA_detected = character, ok @@ -120,7 +120,7 @@ subset_data |> We can see here using the summary variable ctDNA_ever that there are 100 individuals with always negative results, and 9 individuals with "ever positive" ctDNA results, which matches our original ctDNA source data. -**Ever DTC Positive** Next, we will create a variable to represent whether someone ever had a DTC positive test. To do this, we will use the final result variable "dtc_ihc_result_final" which tells us, for a given sample/date, whether that DTC result was positive ("1") or negative ("0"). We see in this data set, by sample, that there are 221 negatives, and 49 positives, which aligns with our prior data and consorts. +**Creating the Ever DTC Positive Variable** Next, we will create a variable to represent whether someone ever had a DTC positive test. 
To do this, we will use the final result variable "dtc_ihc_result_final", which indicates, for a given sample/date, whether that DTC result was positive ("1") or negative ("0"). By sample, this dataset contains 221 negative and 49 positive DTC results (across 109 patients, 39 of whom were DTC positive), which aligns with our prior data and CONSORT diagrams. ```{r} @@ -154,10 +154,9 @@ Looking at the number of DTC positives by unique participant, we see 70 DTC ever ## Results {#sec-results} -**Sample and Testing Information:** -In this cohort of 109 individuals who had ctDNA and DTC testing, 100 remained persistently ctDNA negative, and 70 remained persistently DTC negative--with 9 respective ctDNA positive individuals and 39 DTC positive individuals. Of 184 pts enrolled from 2016 – 2021, 121 had tissue available; 114/121 (94%) had successful WES (prior data/NeoGenomics data). +**Sample and Testing Information:** In this cohort of 109 individuals who had ctDNA and DTC testing on SURMOUNT (either at baseline or in follow-up), 100 remained persistently ctDNA negative and 70 remained persistently DTC negative, leaving 9 ctDNA-positive and 39 DTC-positive individuals. Of 184 pts enrolled from 2016 – 2021, 121 had tissue available; 114/121 (94%) had successful WES (prior data/NeoGenomics data).
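The ctDNA_ever and dtc_ever variables described above collapse sample-level results into a single "ever positive" flag per participant. Below is a minimal base-R sketch of that step on a small invented data frame. The `toy_*` names, the helper `ever_positive()`, and all values are hypothetical, not study data. One design choice is made explicit here: a participant with no non-missing result is returned as NA rather than counted as persistently negative.

```r
# Hypothetical toy data in long format: one row per sample/timepoint.
# Column names mirror the document; all values are invented for illustration.
toy_samples <- data.frame(
  participant_id       = c("P01", "P01", "P02", "P02", "P03", "P04"),
  timepoint            = c("SURMOUNT-Baseline", "Year 1 Follow Up",
                           "SURMOUNT-Baseline", "Year 1 Follow Up",
                           "SURMOUNT-Baseline", "SURMOUNT-Baseline"),
  dtc_ihc_result_final = c(0, 1, 0, NA, 0, NA)
)

# Hypothetical helper: TRUE if any non-missing result is positive (1),
# NA if every result is missing, so participants never assessed are not
# silently counted as negative
ever_positive <- function(x) {
  if (all(is.na(x))) NA else any(x == 1, na.rm = TRUE)
}

# One logical flag per participant, named by participant_id
toy_dtc_ever <- vapply(
  split(toy_samples$dtc_ihc_result_final, toy_samples$participant_id),
  ever_positive,
  logical(1)
)

print(toy_dtc_ever)
print(table(toy_dtc_ever, useNA = "ifany"))
```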
-``` {r} +```{r} #counts for ctDNA positivity subset_data |> @@ -222,12 +221,12 @@ subset_data |> ### Timepoint Data (# timepoints per patient) # Timepoints per patient (median, range), overall -timepoints_per_patient <- subset_data %>% - group_by(participant_id) %>% +timepoints_per_patient <- subset_data |> + group_by(participant_id) |> summarise( total_timepoints = n_distinct(timepoint), # Count distinct timepoints for each patient .groups = "drop" - ) %>% + ) |> summarise( median_timepoints = median(total_timepoints, na.rm = TRUE), # Calculate median min_timepoints = min(total_timepoints, na.rm = TRUE), # Calculate minimum @@ -236,13 +235,13 @@ timepoints_per_patient <- subset_data %>% timepoints_per_patient # Timepoints of ctDNA assessment (`ctDNA_detected`) -ctDNA_timepoints <- subset_data %>% - filter(!is.na(ctDNA_detected)) %>% # Filter out NA values for ctDNA_detected - group_by(participant_id) %>% +ctDNA_timepoints <- subset_data |> + filter(!is.na(ctDNA_detected)) |> # Filter out NA values for ctDNA_detected + group_by(participant_id) |> summarise( ctDNA_timepoints = n_distinct(timepoint), # Count distinct timepoints of ctDNA assessment .groups = "drop" - ) %>% + ) |> summarise( median_ctDNA_timepoints = median(ctDNA_timepoints, na.rm = TRUE), # Calculate median min_ctDNA_timepoints = min(ctDNA_timepoints, na.rm = TRUE), # Calculate minimum @@ -251,13 +250,13 @@ ctDNA_timepoints <- subset_data %>% ctDNA_timepoints # Timepoints of DTC assessment (`dtc_ihc_results_final`) -dtc_timepoints <- subset_data %>% - filter(!is.na(dtc_ihc_result_final)) %>% # Filter out NA values for dtc_ihc_result_final - group_by(participant_id) %>% +dtc_timepoints <- subset_data |> + filter(!is.na(dtc_ihc_result_final)) |> # Filter out NA values for dtc_ihc_result_final + group_by(participant_id) |> summarise( dtc_timepoints = n_distinct(timepoint), # Count distinct timepoints of DTC assessment .groups = "drop" - ) %>% + ) |> summarise( median_dtc_timepoints = 
median(dtc_timepoints, na.rm = TRUE), # Calculate median min_dtc_timepoints = min(dtc_timepoints, na.rm = TRUE), # Calculate minimum @@ -277,13 +276,12 @@ print(dtc_timepoints) ``` -A total of 396 plasma samples from 109 pts (median 2 timepoints each, range 1-12) have been successfully tested by RaDaR to date, with 8 failed samples across 4 unique individuals. These 4 individuals were excluded from the ctDNA cohort as they did not ultimately have succcesful ctDNA assessment. These 109 individuals had a median of 2 DTC assessment timepoints (range 1-6). -Overall, ctDNA was detected in 11 samples from 9/109 pts with a mean eVAF of 0.009% (range 0.002-0.084%). -Two pts were ctDNA+ at baseline (BL), and 7 became positive on surveillance. 100/109 were ctDNA- across all timepoints. 39/109 pts were DTC+, either at BL (n=26) or after (n=13). +**Timepoints of samples**: A total of 396 plasma samples from 109 pts (median 2 timepoints each, range 1-12) have been successfully tested by RaDaR to date, with 8 failed samples across 4 unique individuals. These 4 individuals were excluded from the ctDNA cohort because they ultimately had no successful ctDNA assessment. These 109 individuals had a median of 2 DTC assessment timepoints (range 1-6). +Overall, ctDNA was detected in 11 samples from 9/109 pts with a mean eVAF of 0.009% (range 0.002-0.084%). Two pts were ctDNA+ at baseline (BL), and 7 became positive on surveillance. 100/109 were ctDNA- across all timepoints. 39/109 pts were DTC+, either at BL (n=26) or after (n=13).
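The per-participant timepoint summaries reported above (for plasma, median 2 timepoints, range 1-12) come from two steps. First, count the distinct timepoints within each participant. Second, take the median and range of those counts across participants. The base-R sketch below reproduces those two steps on invented toy data; the `toy_*` names and values are hypothetical. Filtering first to rows with non-missing ctDNA_detected or dtc_ihc_result_final would give the assay-specific counts, as the document's dplyr code does.

```r
# Hypothetical toy data: one row per sample; names mirror the document,
# values are invented for illustration.
toy_samples <- data.frame(
  participant_id = c("P01", "P01", "P01", "P02", "P03", "P03"),
  timepoint      = c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up",
                     "SURMOUNT-Baseline",
                     "SURMOUNT-Baseline", "Year 1 Follow Up")
)

# Step 1: number of distinct timepoints per participant
# (repeated timepoints are counted once, like n_distinct(timepoint))
toy_n_timepoints <- vapply(
  split(toy_samples$timepoint, toy_samples$participant_id),
  function(tp) length(unique(tp)),
  integer(1)
)

# Step 2: median and range across participants
toy_summary_tp <- c(median = median(toy_n_timepoints),
                    min    = min(toy_n_timepoints),
                    max    = max(toy_n_timepoints))
print(toy_summary_tp)
```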
-``` {r} +```{r} # Filter and get unique participants by participant_id concordance_overall_unique <- subset_data |> distinct(participant_id, .keep_all = TRUE) |> @@ -313,7 +311,7 @@ table_ctDNA_dtc <- table(unique$ctDNA_ever, unique$dtc_ever) print(table_ctDNA_dtc) ``` -``` {r} +```{r} #Concordance by timepoint # Ensure that dtc_ihc_result_final is converted to TRUE/FALSE for consistency with ctDNA_detected @@ -324,8 +322,8 @@ concordance_by_timepoint <- subset_data |> dtc_ihc_result_final_bool = ifelse(dtc_ihc_result_final == 1, TRUE, FALSE), # Create a column to determine concordance (both DTC and ctDNA being TRUE or both being FALSE) concordance = ifelse(dtc_ihc_result_final_bool == ctDNA_detected, "Concordant", "Discordant") - ) %>% - group_by(timepoint) %>% + ) |> + group_by(timepoint) |> summarise( total_concordant = sum(concordance == "Concordant"), total_discordant = sum(concordance == "Discordant"), @@ -343,11 +341,9 @@ overall_concordance <- sum(concordance_by_timepoint$total_concordant) / cat("Overall Concordance Rate across all timepoints:", overall_concordance, "\n") #concordance, considering testing by timepoint, is 80% -``` - -Considering all timepoints, concordance was 63%, with higher concordance (80%) taking into account result concordance at each timepoint. Of 39 ever-DTC+ pts, 4 were ctDNA+ (of whom 3/4 recurred) and 35 remained ctDNA- (with 1/30 who recurred). - +``` +**Concordance of DTC and ctDNA testing**: Considering all timepoints, concordance was 63%, with higher concordance (80%) taking into account result concordance at each timepoint. Of 39 ever-DTC+ pts, 4 were ctDNA+ (of whom 3/4 recurred) and 35 remained ctDNA- (with 1/30 who recurred). **Test Characteristics** @@ -360,8 +356,8 @@ Next, we will look at ctDNA and DTC test characteristics. First we will look at ### DTC by ctDNA (ever positive), association between test positivity. 
# link by participant id -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data |> + group_by(participant_id) |> summarise( dtc = first(dtc_ever), # Get the ever dtc for each participant ctDNA_ever = first(ctDNA_ever), # Get the ctDNA_ever status for each participant @@ -380,30 +376,30 @@ print(chisq_test) -##### Test stuff (#s and such of tests) +##### Tests (#s and such of tests) #number of tests (ctDNA) library(dplyr) # Assuming the status variable is named `ctDNA_detected` in d, and then in subset -status_summary_d <- d %>% - group_by(ctDNA_detected) %>% +status_summary_d <- d |> + group_by(ctDNA_detected) |> summarise(total_samples = n(), .groups = "drop") # Print the summary -- we've got 385 FALSE, 8 FAILS, 11 TRUES print(status_summary_d) #looking at the number of Fails by unique participant_id -fail_count <- d %>% - filter(ctDNA_detected == "Fail") %>% # Filter for rows where status is "FAIL" - distinct(participant_id) %>% # Get unique participant IDs +fail_count <- d |> + filter(ctDNA_detected == "Fail") |> # Filter for rows where status is "FAIL" + distinct(participant_id) |> # Get unique participant IDs summarise(total_fails = n()) # Count unique participant IDs # Print the result -- 4 individuals with FAIL results, which is what we got in the consort print(fail_count) -fail_count <- subset_data %>% - filter(ctDNA_detected == "Fail") %>% # Filter for rows where status is "FAIL" - distinct(participant_id) %>% # Get unique participant IDs +fail_count <- subset_data |> + filter(ctDNA_detected == "Fail") |> # Filter for rows where status is "FAIL" + distinct(participant_id) |> # Get unique participant IDs summarise(total_fails = n()) # Count unique participant IDs # Print the result -- none of the fails were pulled into the ctDNA cohort @@ -413,8 +409,8 @@ print(fail_count) unique(subset_data$dtc_ihc_result_final) #NA, 0, and 1 -status_summary_subset <- subset_data %>% - group_by(dtc_ihc_result_final) %>% 
+status_summary_subset <- subset_data |> + group_by(dtc_ihc_result_final) |> summarise(total_samples = n(), .groups = "drop") # Print the summary -- we've got 221 negatives, 49 positives, 128 NAs, across 39 patients (positive) and 70 patients (negative) @@ -422,8 +418,8 @@ status_summary_subset <- subset_data %>% print(status_summary_subset) ### looking at NAs -- all of them have FALSE (so i think these are all the ones that had ctDNA timepoints ) -na_participants_dtc <- subset_data %>% - filter(is.na(dtc_ihc_result_final)) %>% +na_participants_dtc <- subset_data |> + filter(is.na(dtc_ihc_result_final)) |> select(participant_id, dtc_ihc_result_final, FINAL_RESULT, ORIG_RSLT_DTC, ctDNA_detected, timepoint) # Print the list of participant IDs with NA in `dtc_ihc_result_final`-- they all have FALSE ctDNA results, so these are the ctDNA timepoints @@ -440,8 +436,8 @@ print(unique_timepoints) names(subset_data) #use eVAF # Calculate the median and range (min and max) for `eVAF` as percentages for participants with `ctDNA_detected == TRUE` -eVAF_range_ctDNA_detected_percent <- subset_data %>% - filter(ctDNA_detected == TRUE) %>% # Filter for those with ctDNA detected +eVAF_range_ctDNA_detected_percent <- subset_data |> + filter(ctDNA_detected == TRUE) |> # Filter for those with ctDNA detected summarise( median_eVAF_percent = median(eVAF, na.rm = TRUE) * 100, # Convert median to percentage min_eVAF_percent = min(eVAF, na.rm = TRUE) * 100, # Convert minimum to percentage @@ -455,8 +451,8 @@ print(eVAF_range_ctDNA_detected_percent) names(subset_data) #use dtc_ihc_summary_count_final # Calculate the median and range (min and max) for `eVAF` as percentages for participants with `ctDNA_detected == TRUE` -dtc_count <- subset_data %>% - filter(dtc_ihc_result_final == 1) %>% # Filter for those with dtcs detected +dtc_count <- subset_data |> + filter(dtc_ihc_result_final == 1) |> # Filter for those with dtcs detected summarise( median_dtc_count = 
median(dtc_ihc_summary_count_final, na.rm = TRUE), min_dtc_count = min(dtc_ihc_summary_count_final, na.rm = TRUE), @@ -470,12 +466,12 @@ print(dtc_count) #### Number of timepoints we see # Timepoints per patient (median, range) -timepoints_per_patient <- subset_data %>% - group_by(participant_id) %>% +timepoints_per_patient <- subset_data |> + group_by(participant_id) |> summarise( total_timepoints = n_distinct(timepoint), # Count distinct timepoints for each patient .groups = "drop" - ) %>% + ) |> summarise( median_timepoints = median(total_timepoints, na.rm = TRUE), # Calculate median min_timepoints = min(total_timepoints, na.rm = TRUE), # Calculate minimum @@ -483,13 +479,13 @@ timepoints_per_patient <- subset_data %>% ) # Timepoints of ctDNA assessment (`ctDNA_detected`) -ctDNA_timepoints <- subset_data %>% - filter(!is.na(ctDNA_detected)) %>% # Filter out NA values for ctDNA_detected - group_by(participant_id) %>% +ctDNA_timepoints <- subset_data |> + filter(!is.na(ctDNA_detected)) |> # Filter out NA values for ctDNA_detected + group_by(participant_id) |> summarise( ctDNA_timepoints = n_distinct(timepoint), # Count distinct timepoints of ctDNA assessment .groups = "drop" - ) %>% + ) |> summarise( median_ctDNA_timepoints = median(ctDNA_timepoints, na.rm = TRUE), # Calculate median min_ctDNA_timepoints = min(ctDNA_timepoints, na.rm = TRUE), # Calculate minimum @@ -497,13 +493,13 @@ ctDNA_timepoints <- subset_data %>% ) # Timepoints of DTC assessment (`dtc_ihc_results_final`) -dtc_timepoints <- subset_data %>% - filter(!is.na(dtc_ihc_result_final)) %>% # Filter out NA values for dtc_ihc_result_final - group_by(participant_id) %>% +dtc_timepoints <- subset_data |> + filter(!is.na(dtc_ihc_result_final)) |> # Filter out NA values for dtc_ihc_result_final + group_by(participant_id) |> summarise( dtc_timepoints = n_distinct(timepoint), # Count distinct timepoints of DTC assessment .groups = "drop" - ) %>% + ) |> summarise( median_dtc_timepoints = 
median(dtc_timepoints, na.rm = TRUE), # Calculate median min_dtc_timepoints = min(dtc_timepoints, na.rm = TRUE), # Calculate minimum @@ -530,9 +526,9 @@ print(unique_timepoints) trial_timepoints <- c("CLEVER-Baseline", "EOO", "C3", "C6", "C12", "6M F/U", "12M F/U", "18M F/U", "24M F/U", "30M F/U", "36M F/U") # Count the number of samples by timepoint (for specific clinical trial timepoints) -samples_by_trial_timepoint <- subset_data %>% - filter(timepoint %in% trial_timepoints) %>% # Filter for relevant timepoints - group_by(timepoint) %>% # Group by timepoint +samples_by_trial_timepoint <- subset_data |> + filter(timepoint %in% trial_timepoints) |> # Filter for relevant timepoints + group_by(timepoint) |> # Group by timepoint summarise( total_samples = n_distinct(participant_id), # Count distinct participant_ids (samples) .groups = "drop" # Remove grouping after summarizing @@ -543,9 +539,9 @@ print(samples_by_trial_timepoint) #total samples on trial (ctDNA and dtC) #### ctDNA on trial -ctDNA_samples_by_timepoint <- subset_data %>% - filter(timepoint %in% trial_timepoints, !is.na(ctDNA_detected)) %>% # Filter for relevant timepoints and ctDNA detected - group_by(timepoint) %>% # Group by timepoint +ctDNA_samples_by_timepoint <- subset_data |> + filter(timepoint %in% trial_timepoints, !is.na(ctDNA_detected)) |> # Filter for relevant timepoints and ctDNA detected + group_by(timepoint) |> # Group by timepoint summarise( total_samples_ctDNA = n_distinct(participant_id), # Count distinct participant_ids (ctDNA samples) .groups = "drop" # Remove grouping after summarizing @@ -557,9 +553,9 @@ print(ctDNA_samples_by_timepoint) ##### DTC by trial timepoint # Count the number of DTC samples by timepoint (for specific clinical trial timepoints) -dtc_samples_by_timepoint <- subset_data %>% - filter(timepoint %in% trial_timepoints, !is.na(dtc_ihc_result_final)) %>% # Filter for relevant timepoints and DTC results - group_by(timepoint) %>% # Group by timepoint 
+dtc_samples_by_timepoint <- subset_data |> + filter(timepoint %in% trial_timepoints, !is.na(dtc_ihc_result_final)) |> # Filter for relevant timepoints and DTC results + group_by(timepoint) |> # Group by timepoint summarise( total_samples_dtc = n_distinct(participant_id), # Count distinct participant_ids (DTC samples) .groups = "drop" # Remove grouping after summarizing @@ -572,9 +568,9 @@ print(dtc_samples_by_timepoint) print(unique_timepoints) surmount_timepoints <- c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up", "Year 3 Follow Up", "Year 4 Follow Up", "Long Term FU 1", "Long Term FU 2") -ctDNA_surmount <- subset_data %>% - filter(timepoint %in% surmount_timepoints, !is.na(ctDNA_detected)) %>% # Filter for relevant timepoints and ctDNA detected - group_by(timepoint) %>% # Group by timepoint +ctDNA_surmount <- subset_data |> + filter(timepoint %in% surmount_timepoints, !is.na(ctDNA_detected)) |> # Filter for relevant timepoints and ctDNA detected + group_by(timepoint) |> # Group by timepoint summarise( total_samples_ctDNA = n_distinct(participant_id), # Count distinct participant_ids (ctDNA samples) .groups = "drop" # Remove grouping after summarizing @@ -586,9 +582,9 @@ print(ctDNA_surmount) ### number of DTC timepoints on surmount # Count the number of DTC samples by timepoint -dtc_timepoint_surmount <- subset_data %>% - filter(timepoint %in% surmount_timepoints, !is.na(dtc_ihc_result_final)) %>% # Filter for relevant timepoints and DTC results - group_by(timepoint) %>% # Group by timepoint +dtc_timepoint_surmount <- subset_data |> + filter(timepoint %in% surmount_timepoints, !is.na(dtc_ihc_result_final)) |> # Filter for relevant timepoints and DTC results + group_by(timepoint) |> # Group by timepoint summarise( total_samples_dtc = n_distinct(participant_id), # Count distinct participant_ids (DTC samples) .groups = "drop" # Remove grouping after summarizing @@ -600,14 +596,14 @@ print(dtc_timepoint_surmount) #### positivity by timepoint -- ctDNA 
-ctDNA_pos_rate_by_timepoint <- subset_data %>% - filter(!is.na(ctDNA_detected)) %>% # Ensure we are considering only non-missing ctDNA_detected values - group_by(timepoint, participant_id) %>% # Group by timepoint and participant +ctDNA_pos_rate_by_timepoint <- subset_data |> + filter(!is.na(ctDNA_detected)) |> # Ensure we are considering only non-missing ctDNA_detected values + group_by(timepoint, participant_id) |> # Group by timepoint and participant summarise( ctDNA_pos = max(ctDNA_detected == TRUE), # If any value is TRUE, participant is ctDNA positive at that timepoint .groups = "drop" - ) %>% - group_by(timepoint) %>% # Group again by timepoint to calculate the positivity rate + ) |> + group_by(timepoint) |> # Group again by timepoint to calculate the positivity rate summarise( positivity_rate = mean(ctDNA_pos), # Calculate the positivity rate for each timepoint total_samples = n_distinct(participant_id), # Count the number of distinct participants @@ -618,8 +614,8 @@ ctDNA_pos_rate_by_timepoint <- subset_data %>% print(ctDNA_pos_rate_by_timepoint) # Calculate cumulative ctDNA positivity rate by timepoint -ctDNA_pos_rate_cumulative <- ctDNA_pos_rate_by_timepoint %>% - arrange(timepoint) %>% # Ensure the data is sorted by timepoint +ctDNA_pos_rate_cumulative <- ctDNA_pos_rate_by_timepoint |> + arrange(timepoint) |> # Ensure the data is sorted by timepoint mutate( cumulative_pos_rate = cumsum(positivity_rate * total_samples) / cumsum(total_samples) # Cumulative positivity rate ) @@ -631,16 +627,16 @@ print(ctDNA_pos_rate_cumulative) library(dplyr) # Calculate ctDNA positivity rate by participant -ctDNA_pos_rate <- subset_data %>% - filter(!is.na(ctDNA_detected)) %>% # Exclude missing ctDNA results - group_by(participant_id) %>% # Group by participant +ctDNA_pos_rate <- subset_data |> + filter(!is.na(ctDNA_detected)) |> # Exclude missing ctDNA results + group_by(participant_id) |> # Group by participant summarise( ctDNA_pos = max(ctDNA_detected == TRUE), # If 
any value is TRUE, participant is ctDNA positive .groups = "drop" ) # Calculate cumulative positivity rate -ctDNA_pos_rate_cumulative <- ctDNA_pos_rate %>% +ctDNA_pos_rate_cumulative <- ctDNA_pos_rate |> summarise( total_pos = sum(ctDNA_pos), # Total number of ctDNA positive participants total_samples = n(), # Total number of participants @@ -652,13 +648,13 @@ print(ctDNA_pos_rate_cumulative) # Count the number of positive ctDNA samples and total samples -ctDNA_pos_vs_total <- subset_data %>% - filter(!is.na(ctDNA_detected)) %>% # Exclude missing ctDNA results +ctDNA_pos_vs_total <- subset_data |> + filter(!is.na(ctDNA_detected)) |> # Exclude missing ctDNA results summarise( total_samples = n(), # Total number of ctDNA samples positive_samples = sum(ctDNA_detected == TRUE), # Count of positive ctDNA samples .groups = "drop" - ) %>% + ) |> mutate( positivity_rate = positive_samples / total_samples # Proportion of positive ctDNA samples ) @@ -670,16 +666,16 @@ print(ctDNA_pos_vs_total) #### cumulative positivity DTC # Calculate ctDNA positivity rate by participant -DTC_pos_rate <- subset_data %>% - filter(!is.na(dtc_ihc_result_final)) %>% # Exclude missing ctDNA results - group_by(participant_id) %>% # Group by participant +DTC_pos_rate <- subset_data |> + filter(!is.na(dtc_ihc_result_final)) |> # Exclude missing ctDNA results + group_by(participant_id) |> # Group by participant summarise( dtc = max(dtc_ihc_result_final == 1), # If any value is TRUE, participant is ctDNA positive .groups = "drop" ) # Calculate cumulative positivity rate -DTC_pos_rate_cumulative <- DTC_pos_rate %>% +DTC_pos_rate_cumulative <- DTC_pos_rate |> summarise( total_pos = sum(dtc), # Total number of ctDNA positive participants total_samples = n(), # Total number of participants @@ -691,13 +687,13 @@ print(DTC_pos_rate_cumulative) # Count the number of positive ctDNA samples and total samples -dtc_pos_vs_total <- subset_data %>% - filter(!is.na(dtc_ihc_result_final)) %>% # Exclude missing 
ctDNA results +dtc_pos_vs_total <- subset_data |> + filter(!is.na(dtc_ihc_result_final)) |> # Exclude missing DTC results summarise( total_samples = n(), # Total number of ctDNA samples positive_samples = sum(dtc_ihc_result_final == 1), # Count of positive ctDNA samples .groups = "drop" - ) %>% + ) |> mutate( positivity_rate = positive_samples / total_samples # Proportion of positive ctDNA samples ) @@ -708,6 +704,8 @@ print(dtc_pos_vs_total) ``` +The distribution of samples by timepoint shows that the most samples, and the highest positivity rate, occurred at SURMOUNT-Baseline. However, additional positive samples emerged with subsequent testing, and the cumulative positivity rate rose with additional timepoints for both DTC and ctDNA assessment. + **Test Characteristics of ctDNA assay**: Next we will look at the sensitivity and specificity of the ctDNA assay. ```{r} @@ -715,16 +713,17 @@ print(dtc_pos_vs_total) #trying to do ctDNA 2x2 with ever relapsed on a patient level library(dplyr) +library(knitr) #create ever_relapsed variable -subset_data <- subset_data %>% +subset_data <- subset_data |> mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No")) # Exclude participants with all NA for `ctDNA_ever` or `ever_relapsed` -summarized_data <- subset_data %>% - filter(!is.na(ctDNA_ever) | !is.na(ever_relapsed)) %>% # Keep rows with at least one non-NA value - group_by(participant_id) %>% +summarized_data <- subset_data |> + filter(!is.na(ctDNA_ever) | !is.na(ever_relapsed)) |> # Keep rows with at least one non-NA value + group_by(participant_id) |> summarize( ctDNA_ever = max(ctDNA_ever, na.rm = TRUE), ever_relapsed = max(ever_relapsed, na.rm = TRUE) @@ -755,7 +754,6 @@ performance_table <- data.frame( print(performance_table) #Format the table for better readability -library(knitr) kable(performance_table, digits = 2, col.names = c("Metric", "Value")) ``` @@ -768,15 +766,15 @@ This ctDNA assay has high specificity
(99%), with a high positive predictive val library(dplyr) # Total unique DTC+ patients -total_dtc_plus <- subset_data %>% - filter(dtc_ihc_result_final == 1) %>% - distinct(participant_id) %>% +total_dtc_plus <- subset_data |> + filter(dtc_ihc_result_final == 1) |> + distinct(participant_id) |> nrow() # Unique DTC+ patients who went on trial (those who have a trial ID fu_trial_pid) -dtc_plus_trial <- subset_data %>% - filter(dtc_ihc_result_final == 1 & !is.na(fu_trial_pid)) %>% - distinct(participant_id) %>% +dtc_plus_trial <- subset_data |> + filter(dtc_ihc_result_final == 1 & !is.na(fu_trial_pid)) |> + distinct(participant_id) |> nrow() # Proportion of DTC+ patients who went on trial @@ -791,9 +789,9 @@ cat("Proportion of DTC+ patients who went on trial:", proportion_trial, "\n") # Exclude participants with all NA for `dtc_ever` or `ever_relapsed` -summarized_data <- subset_data %>% - filter(!is.na(dtc_ever) | !is.na(ever_relapsed)) %>% # Keep rows with at least one non-NA value - group_by(participant_id) %>% +summarized_data <- subset_data |> + filter(!is.na(dtc_ever) | !is.na(ever_relapsed)) |> # Keep rows with at least one non-NA value + group_by(participant_id) |> summarize( dtc_ever = max(dtc_ever, na.rm = TRUE), ever_relapsed = max(ever_relapsed, na.rm = TRUE) @@ -831,9 +829,61 @@ kable(performance_table, digits = 2, col.names = c("Metric", "Value")) All of the 39 individuals who were DTC positive went onto an interventional treatment trial aimed at eliminating the presence of the DTCs. This is different from the workflow for ctDNA assessment, which occurred retrospectively--sometimes several years after testing--and was not the basis for any trial/intervention decision-making. It is therefore somewhat challenging to interpret the sensitivity and specificity of the DTC test, as relapse is the outcome and all of these patients are receiving an intervention aimed at eliminating the presence of the DTCs and thereby preventing relapse. 
The intervention after DTC assessment explains in part the low positive predictive value and the low sensitivity of the test. However, the high negative predictive value of 0.86 in the cohort--which is looking only at those who remained DTC negative and their outcomes (ie. those who did NOT get an intervention) suggests that repeat negative DTC testing (ie always remaining DTC negative on all testing) is valuable in predicting a good outcome (ie. NO relapse during follow-up). - **Table 1** - - Next we will build our Table 1, looking at important clinical and demographic variables in this ctDNA cohort. +**Associations with Relapse** + +```{r} + +## ctDNA association with relapse ## +# link by participant id +subset_data_by_id <- subset_data |> + group_by(participant_id) |> + summarise( + ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant + ctDNA = first(ctDNA_ever), # Get the ctDNA_ever status for each participant + .groups = "drop" + ) + +# Create a contingency table of ever_relapsed vs ctDNA_ever +contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA) + +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) + +# Print the contingency table and Chi-squared test results. ctDNA has a strong association with relapse (p<0.0001). +print(contingency_table) +print(chisq_test) + + +## DTC association with relapse ## + +# link by participant id +subset_data_by_id <- subset_data |> + group_by(participant_id) |> + summarise( + ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant + dtc = first(dtc_ever), # Get the dtc_ever status for each participant + .groups = "drop" + ) + +# Create a contingency table of ever_relapsed vs dtc_ever +contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc) + +# Perform the Chi-squared test +chisq_test <- chisq.test(contingency_table) + +# Print the contingency table and Chi-squared test results. DTC positivity is not significantly associated with relapse (p = 0.774). +print(contingency_table) +print(chisq_test) + + + +``` + +Univariable tests of association show that ctDNA positivity is strongly associated with relapse (p < 0.0001), whereas DTC positivity is not significantly associated with relapse (p = 0.774). It is important to keep in mind that DTC positivity was the basis for enrollment onto interventional clinical trials aimed at eliminating DTCs and preventing relapse, and all DTC-positive individuals in this cohort enrolled on these trials. This likely confounds our ability to measure the association of DTC positivity with relapse. ctDNA assessment, meanwhile, was performed retrospectively and was not used for clinical decision-making. + +**Demographics and Clinical Factor Assessment: Univariable associations by ctDNA status** + +Next we will start to build our Table 1 of key clinical and demographic variables in this ctDNA cohort. For each variable, we first run a univariable test of association with ctDNA status (chi-squared tests for categorical variables and t-tests for continuous variables).
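The univariable comparisons described above follow a standard pattern. Each categorical variable gets a contingency table plus a chi-squared test, and each continuous variable gets a two-sample t-test. The base-R sketch below uses an invented participant-level data frame; the `toy_*` names and values are hypothetical.

Only 9 participants in this cohort were ever ctDNA positive, so some expected cell counts will be small. `chisq.test()` warns when any expected count falls below 5. Fisher's exact test is one common alternative and is shown here for comparison only, as an assumption rather than the document's method.

```r
# Hypothetical participant-level toy data (one row per participant).
# Variable names echo the document; all values are invented.
set.seed(42)
toy_participants <- data.frame(
  ctDNA_ever  = rep(c(FALSE, TRUE), times = c(20, 6)),
  node_status = c(rep(c("Node Negative", "Node Positive"), times = c(12, 8)),
                  rep(c("Node Negative", "Node Positive"), times = c(1, 5))),
  age         = round(c(rnorm(20, mean = 52, sd = 9), rnorm(6, mean = 48, sd = 9)))
)

# Categorical variable: contingency table of node status by ctDNA status
toy_tab <- table(toy_participants$node_status, toy_participants$ctDNA_ever)

# Chi-squared test; suppressWarnings() hides the small-expected-count warning
# so the expected counts can be inspected explicitly instead
toy_chisq <- suppressWarnings(chisq.test(toy_tab))
min_expected <- min(toy_chisq$expected)

# Fisher's exact test does not rely on the large-sample approximation
toy_fisher <- fisher.test(toy_tab)

# Continuous variable: Welch two-sample t-test (R's default, var.equal = FALSE)
toy_t <- t.test(age ~ ctDNA_ever, data = toy_participants)

print(toy_tab)
print(c(min_expected = min_expected,
        chisq_p      = toy_chisq$p.value,
        fisher_p     = toy_fisher$p.value,
        t_p          = toy_t$p.value))
```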
```{r} @@ -892,15 +942,14 @@ age_summary <- subset_data |> print(age_summary) ``` - -``` {r} +```{r} ##### Race: demo_race_final # Get the count of unique participant_ids for each category in demo_race_final -race_counts_unique_percent <- subset_data %>% - group_by(demo_race_final) %>% - summarise(unique_participants = n_distinct(participant_id)) %>% +race_counts_unique_percent <- subset_data |> + group_by(demo_race_final) |> + summarise(unique_participants = n_distinct(participant_id)) |> mutate(percent = unique_participants / sum(unique_participants) * 100) # View the result @@ -909,8 +958,8 @@ print(race_counts_unique_percent) # Count distinct participant_ids by ctDNA_ever and demo_race_final -count_distinct_participants <- subset_data %>% - group_by(demo_race_final, ctDNA_ever) %>% +count_distinct_participants <- subset_data |> + group_by(demo_race_final, ctDNA_ever) |> summarise(distinct_participant_count = n_distinct(participant_id), .groups = "drop") # Print the result @@ -918,8 +967,8 @@ count_distinct_participants # Step 1: Summarize by unique participant_id -summarized_data <- subset_data %>% - group_by(participant_id) %>% +summarized_data <- subset_data |> + group_by(participant_id) |> summarise( ctDNA_ever = first(ctDNA_ever), # Taking the first observed value of ctDNA_ever for each participant demo_race_final = first(demo_race_final), # Taking the first observed value of demo_race_final for each participant @@ -939,8 +988,8 @@ chisq_test #####receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') # Breakdown of final_receptor_group by unique participant_id -receptor_status_by_participant <- subset_data %>% - group_by(participant_id) %>% +receptor_status_by_participant <- subset_data |> + group_by(participant_id) |> summarise(final_receptor_group = first(final_receptor_group), # Or choose the most frequent group if needed .groups = "drop") @@ -948,8 +997,8 @@ receptor_status_by_participant <- subset_data %>% 
table(receptor_status_by_participant$final_receptor_group) # Summarizing data by participant_id, final_receptor_group, and ctDNA_ever -receptor_ctDNA_status <- subset_data %>% - group_by(participant_id) %>% +receptor_ctDNA_status <- subset_data |> + group_by(participant_id) |> summarise( final_receptor_group = first(final_receptor_group), # Or the most frequent if needed ctDNA_ever = first(ctDNA_ever), # Taking the first observed value for ctDNA_ever @@ -971,8 +1020,8 @@ chisq_test #inclusion criteria inc_dx_crit___1 = TNBC (This has been confirmed with the study team) #inc_dx_crit_list___1 -TNBC_ctDNA_status <- subset_data %>% - group_by(participant_id) %>% +TNBC_ctDNA_status <- subset_data |> + group_by(participant_id) |> summarise( inc_dx_crit_list___1 = first(inc_dx_crit_list___1), # Or the most frequent if needed ctDNA_ever = first(ctDNA_ever), # Taking the first observed value for ctDNA_ever @@ -1002,8 +1051,8 @@ subset_data <- subset_data |> # View the new HR_status variable table(subset_data$HR_status) -HR_status_by_participant <- subset_data %>% - group_by(participant_id) %>% +HR_status_by_participant <- subset_data |> + group_by(participant_id) |> summarise(HR_status = first(HR_status), # Or use mode() if you have multiple rows per participant .groups = "drop") @@ -1011,8 +1060,8 @@ HR_status_by_participant <- subset_data %>% table(HR_status_by_participant$HR_status) #aligns with final receptor status (ultimately 60 HR+, 49 HR-) # Summarize ctDNA_detected status by HR_status, for each unique participant_id -summary_data <- subset_data %>% - group_by(participant_id) %>% +summary_data <- subset_data |> + group_by(participant_id) |> summarise( HR_status = first(HR_status), # Get the HR_status for the participant ctDNA_status = first(ctDNA_ever), # Get the ctDNA_detected status for the participant @@ -1029,9 +1078,9 @@ chisq_test ###tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not reported # Exclude rows where 
final_tumor_grade is 3 (not reported), want to exclude the 2 pts who had grade not reported -summary_data <- subset_data %>% - filter(final_tumor_grade != 3) %>% # Exclude grade == 3 - group_by(participant_id) %>% +summary_data <- subset_data |> + filter(final_tumor_grade != 3) |> # Exclude grade == 3 + group_by(participant_id) |> summarise( grade = first(final_tumor_grade), # Get the final_tumor_grade for each participant ctDNA_ever = first(ctDNA_ever), # Get the ctDNA_ever status for each participant @@ -1054,16 +1103,16 @@ print(chisq_test) #people have different combinations of histology (1-15) table(subset_data$participant_id, subset_data$final_histology) - histology_summary <- subset_data %>% - distinct(participant_id, final_histology) %>% # Get unique participant-histology combinations - group_by(final_histology) %>% # Group by histology type + histology_summary <- subset_data |> + distinct(participant_id, final_histology) |> # Get unique participant-histology combinations + group_by(final_histology) |> # Group by histology type summarise(count = n()) # Count the number of participants per histology type # View the summary table print(histology_summary) #the different combinations add up to 109. 
There are a decent number who had both ductal and lobular histology #trying to create Ductal, lobular, both, or other variables --> histology_category - subset_data <- subset_data %>% + subset_data <- subset_data |> mutate(histology_category = case_when( grepl("3", as.character(final_histology)) & grepl("14", as.character(final_histology)) ~ "Both Ductal and Lobular", # Both Ductal and Lobular grepl("3", as.character(final_histology)) ~ "Ductal", # Ductal @@ -1072,8 +1121,8 @@ table(subset_data$participant_id, subset_data$final_histology) )) # Count the number of participants in each histology category - histology_counts <- subset_data %>% - group_by(histology_category) %>% + histology_counts <- subset_data |> + group_by(histology_category) |> summarise(count = n_distinct(participant_id)) # Count distinct participants # View the counts -- adds up to 109! @@ -1081,9 +1130,9 @@ table(subset_data$participant_id, subset_data$final_histology) #contingency table library(tidyr) - contingency_table <- subset_data %>% - distinct(participant_id, histology_category, ctDNA_ever) %>% # Ensure each patient is counted once - count(histology_category, ctDNA_ever) %>% + contingency_table <- subset_data |> + distinct(participant_id, histology_category, ctDNA_ever) |> # Ensure each patient is counted once + count(histology_category, ctDNA_ever) |> pivot_wider(names_from = ctDNA_ever, values_from = n, values_fill = list(n = 0)) # Pivot the table to get ctDNA_ever as columns # 3. 
Perform the Chi-squared test of independence @@ -1101,17 +1150,17 @@ table(subset_data$participant_id, subset_data$final_histology) table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 (No, N1, N2, N3) -nodal_summary <- subset_data %>% - distinct(participant_id, final_n_stage) %>% # Get unique participant-stage combinations - group_by(final_n_stage) %>% # Group by stage +nodal_summary <- subset_data |> + distinct(participant_id, final_n_stage) |> # Get unique participant-stage combinations + group_by(final_n_stage) |> # Group by stage summarise(count = n()) # Count the number of participants per histology type #View the summary table --adds up to 109, 46 = pN0 63 = pN1 print(nodal_summary) - subset_data_by_id <- subset_data %>% - filter(final_n_stage %in% c(0, 1, 2, 3)) %>% # Include only relevant nodal stages - group_by(participant_id) %>% + subset_data_by_id <- subset_data |> + filter(final_n_stage %in% c(0, 1, 2, 3)) |> # Include only relevant nodal stages + group_by(participant_id) |> summarise( nodal_status = first(final_n_stage), # Use final_n_stage as nodal_status for each participant ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant @@ -1132,8 +1181,8 @@ nodal_summary <- subset_data %>% #### Node positive versus node negative: Using the final n stage to create a Node - vs node + variable from this summary indicator variable - subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% + subset_data_by_id <- subset_data |> + group_by(participant_id) |> summarise( node_status = ifelse(first(final_n_stage) == 0, "Node Negative", "Node Positive"), # Node negative if 0, positive otherwise ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant @@ -1141,8 +1190,8 @@ nodal_summary <- subset_data %>% ) #adding node_status to subset_data - subset_data <- subset_data %>% - left_join(subset_data_by_id %>% select(participant_id, node_status), by = "participant_id") + subset_data <- 
subset_data |> + left_join(subset_data_by_id |> select(participant_id, node_status), by = "participant_id") #Create a contingency table of node_status vs ctDNA_ever @@ -1160,9 +1209,9 @@ nodal_summary <- subset_data %>% table(subset_data$final_t_stage) #we have values in 1,2,3,4, and 99 (99 = pTx, cannot evaluate) so can proceed with this - t_summary <- subset_data %>% - distinct(participant_id, final_t_stage) %>% # Get unique participant-stage combinations - group_by(final_t_stage) %>% # Group by stage + t_summary <- subset_data |> + distinct(participant_id, final_t_stage) |> # Get unique participant-stage combinations + group_by(final_t_stage) |> # Group by stage summarise(count = n()) # Count the number of participants per histology type # View the summary table --adds up to 109, most are pN0 or cN0, a good number are pN1 or cN1, or pN2 @@ -1170,16 +1219,16 @@ nodal_summary <- subset_data %>% #for our T stage table, will use T1 vs T2 or greater to simplify, and we want to exclude 99 (the pTx). We will create "final_t_stage_combined" to represent this. - subset_data_clean <- subset_data %>% + subset_data_clean <- subset_data |> filter(final_t_stage != 99, ctDNA_ever != 99) # Combine final_t_stage into T1 vs. 
T2 or greater - subset_data_clean <- subset_data_clean %>% + subset_data_clean <- subset_data_clean |> mutate(final_t_stage_combined = ifelse(final_t_stage == 1, "T1", "T2 or greater")) # Summarize the data by participant_id after creating the new combined t_stage - subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% + subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( final_t_stage_combined = first(final_t_stage_combined), # Get the combined t status for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant @@ -1198,11 +1247,11 @@ nodal_summary <- subset_data %>% #### I looked at a different cut-off for T stage stats using T3 or greater as cutoff and didn't see any significant difference so am not using this for the table. #exclude 99 (the pTx) - subset_data_clean <- subset_data %>% + subset_data_clean <- subset_data |> filter(final_t_stage != 99, ctDNA_ever != 99) # Combine final_t_stage into T1/T2 or T3 or greater - subset_data_clean <- subset_data_clean %>% + subset_data_clean <- subset_data_clean |> mutate(final_t_stage_combined = case_when( final_t_stage == 1 | final_t_stage == 2 ~ "T1 or T2", # Group T1 and T2 together final_t_stage >= 3 ~ "T3 or greater", # Group T3 and higher as a separate category @@ -1211,8 +1260,8 @@ nodal_summary <- subset_data %>% # Summarize the data by participant_id after creating the new combined t_stage - subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% + subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( final_t_stage_combined = first(final_t_stage_combined), # Get the combined t status for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant @@ -1234,21 +1283,21 @@ nodal_summary <- subset_data %>% table(subset_data$final_overall_stage) #we have values in 1,2,3,4, and 99 so can proceed with this - stage_summary <- subset_data %>% - 
distinct(participant_id, final_overall_stage) %>% # Get unique participant-stage combinations - group_by(final_overall_stage) %>% # Group by stage + stage_summary <- subset_data |> + distinct(participant_id, final_overall_stage) |> # Get unique participant-stage combinations + group_by(final_overall_stage) |> # Group by stage summarise(count = n()) # Count the number of participants per histology type # View the summary table --adds up to 109 (1 99), most are stage 2, some are stage I some are stage III, no stage IV (yay) print(stage_summary) #exclude the 99 - subset_data_clean <- subset_data %>% + subset_data_clean <- subset_data |> filter(final_overall_stage != 99, ctDNA_ever != 99) # Summarize the data by participant_id - subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% + subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( final_overall_stage = first(final_overall_stage), # Get the final_overall_stage for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant @@ -1271,9 +1320,9 @@ nodal_summary <- subset_data %>% table(subset_data$diag_surgery_type_1) #1= partial mastectomy/lumpectomy, 2 = mastectomy. 
no missingness - surgery <- subset_data %>% - distinct(participant_id, diag_surgery_type_1) %>% # Get unique participant-stage combinations - group_by(diag_surgery_type_1) %>% # Group by stage + surgery <- subset_data |> + distinct(participant_id, diag_surgery_type_1) |> # Get unique participant-stage combinations + group_by(diag_surgery_type_1) |> # Group by stage summarise(count = n()) # Count the number of participants per histology type # View the summary table @@ -1281,8 +1330,8 @@ nodal_summary <- subset_data %>% # Summarize the data by participant_id - subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% + subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( surgery = first(diag_surgery_type_1), # Get the final_overall_stage for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant @@ -1307,23 +1356,23 @@ nodal_summary <- subset_data %>% table(subset_data$diag_axillary_type___2_2) #a handful of axillary dissection on the second biopsy form so i think we want to combine these two # Create a binary variable to identify participants who had axillary dissection - subset_data_clean <- subset_data %>% + subset_data_clean <- subset_data |> mutate(axillary_dissection = ifelse(diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1, 1, 0)) - subset_data <- subset_data %>% + subset_data <- subset_data |> mutate(axillary_dissection = ifelse(diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1, 1, 0)) # Ensure every participant has a ctDNA_ever and axillary_dissection value # Ensure every patient has an axillary dissection category, where 0 means no axillary dissection, and 1 means they had one - subset_data_clean <- subset_data %>% + subset_data_clean <- subset_data |> mutate(axillary_dissection = case_when( diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1 ~ 1, # Had axillary dissection TRUE ~ 0 # No axillary dissection (includes missing values) )) # 
Summarize the data by participant_id, including the axillary_dissection and ctDNA_ever variables - subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% + subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( axillary_dissection = first(axillary_dissection), # Get the axillary dissection status for each participant ctDNA_ever = first(ctDNA_ever) # Get the ctDNA_ever status for each participant @@ -1331,7 +1380,7 @@ nodal_summary <- subset_data %>% contingency_table <- table(subset_data_by_id$axillary_dissection, subset_data_by_id$ctDNA_ever) - subset_data <- subset_data %>% + subset_data <- subset_data |> mutate(axillary_dissection = ifelse(is.na(axillary_dissection), 0, axillary_dissection)) table(subset_data$axillary_dissection) @@ -1360,8 +1409,8 @@ radiation <- subset_data |> print(radiation) # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( radiation = first(prtx_radiation), # xrt for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant @@ -1390,8 +1439,8 @@ chemo <- subset_data |> print(chemo) #3 people did not get chemo in this cohort # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( chemo = first(prtx_chemo), # chemo for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant @@ -1423,8 +1472,8 @@ nact <- subset_data |> print(nact) #3 people did not get chemo in this cohort # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( nact = first(diag_neoadj_chemo_1), # NACT for each participant ctDNA_ever = 
first(ctDNA_ever) # Get ctDNA_ever status for each participant @@ -1454,8 +1503,8 @@ endo <- subset_data |> print(endo) #most ppl did get endo (62 of the 109) # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( endo = first(prtx_endo), # Get the final_overall_stage for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant @@ -1486,8 +1535,8 @@ bonemod <- subset_data |> print(bonemod) #most ppl did get endo (39 got bonemod) # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( bonemod = first(prtx_bonemod), # Get bone mod status for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant @@ -1509,19 +1558,19 @@ print(chisq_test) table(subset_data$diag_pcr_1) table(subset_data$diag_pcr_2) #none recorded here so can just use diag_pcr_1 -pcr <- subset_data %>% - mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) %>% # Convert "." to NA - filter(!is.na(diag_pcr_1)) %>% # Exclude rows where diag_pcr_1 is NA - distinct(participant_id, diag_pcr_1) %>% - group_by(diag_pcr_1) %>% +pcr <- subset_data |> + mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) |> # Convert "." 
to NA + filter(!is.na(diag_pcr_1)) |> # Exclude rows where diag_pcr_1 is NA + distinct(participant_id, diag_pcr_1) |> + group_by(diag_pcr_1) |> summarise(count = n()) # Count the number of participants per histology type # View the summary table print(pcr) #this is correct bc 19 ppl got NACT, and we have 19 ppl recorded for pcr data # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( pcr = first(diag_pcr_1), # Get pcr for each participant ctDNA_ever = first(ctDNA_ever) # Get ctDNA_ever status for each participant @@ -1544,8 +1593,8 @@ print(chisq_test) #local fu_locreg_prog # Step 1: Summarize data by unique participant_id -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data |> + group_by(participant_id) |> summarise( fu_locreg_prog = first(fu_locreg_prog), # Get fu_locreg_prog status for each participant ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant @@ -1566,12 +1615,12 @@ print(chisq_test) ### Just want to look at site distribution here # Summarize the distribution of fu_locreg_site_char by unique participant_id -site_distribution <- subset_data %>% - group_by(participant_id) %>% +site_distribution <- subset_data |> + group_by(participant_id) |> summarise( site = first(fu_locreg_site_char), # Get the site for each unique participant .groups = "drop" - ) %>% + ) |> count(site) # Count the occurrences of each site # View the distribution of sites -- mostly axillary nodes. 4 of 6 axillary. 1 internal mammary. 2 supraclav. 
2 ipsilateral breast @@ -1580,8 +1629,8 @@ print(site_distribution) #####distant recurrence: distant fu_dist_prog # Step 1: Summarize data by unique participant_id -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data |> + group_by(participant_id) |> summarise( fu_dist_prog = first(fu_dist_prog), # Get fu_dist_prog status for each participant ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant @@ -1603,12 +1652,12 @@ print(chisq_test) #distant site fu_dist_site_num #fu_dist_site_char -- start just looking at the locations # Summarize the distribution of fu_dist_site_char by unique participant_id -dist_site_distribution <- subset_data %>% - group_by(participant_id) %>% +dist_site_distribution <- subset_data |> + group_by(participant_id) |> summarise( site = first(fu_dist_site_char), # Get the site for each unique participant .groups = "drop" - ) %>% + ) |> count(site) # Count the occurrences of each site # View the distribution of sites -- Bone in 7. Liver in 3. Lung in 2. 
1 intra-abdominal @@ -1617,8 +1666,8 @@ print(dist_site_distribution) ##### ANY Recurrence -- this includes either fu_locreg_prog or fu_dist_prog # link by participant id -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data |> + group_by(participant_id) |> summarise( ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant ctDNA_ever = first(ctDNA_ever), # Get the ctDNA_ever status for each participant @@ -1639,8 +1688,8 @@ print(chisq_test) #using ever_relapsed and dtc_ever # link by participant id -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data |> + group_by(participant_id) |> summarise( ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant dtc = first(dtc_ever), # Get the ctDNA_ever status for each participant @@ -1658,7 +1707,7 @@ print(contingency_table) print(chisq_test) # Identify participants missing data in either `ever_relapsed` or `dtc_ever` -missing_data <- subset_data_by_id %>% +missing_data <- subset_data_by_id |> filter(is.na(ever_relapsed) | is.na(dtc)) # Print the IDs of participants with missing data @@ -1666,8 +1715,8 @@ print(missing_data$participant_id) #two individuals do not have relapse data (17 ### look at ever_relapsed by ctDNA -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data |> + group_by(participant_id) |> summarise( ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant ctDNA = first(ctDNA_ever), # Get the ctDNA_ever status for each participant @@ -1688,24 +1737,24 @@ print(chisq_test) table(subset_data$fu_surv) -surv <- subset_data %>% - distinct(participant_id, fu_surv) %>% - group_by(fu_surv) %>% +surv <- subset_data |> + distinct(participant_id, fu_surv) |> + group_by(fu_surv) |> summarise(count = n()) # Count the number of participants per histology type # View the summary 
table print(surv) #1 NA patient --> identify the NA patient below dead = 5, alive 103. There is 1 that's an NA. -na_participant <- subset_data %>% - filter(is.na(fu_surv)) %>% +na_participant <- subset_data |> + filter(is.na(fu_surv)) |> select(participant_id, fu_surv) # Print the result -- 28115-17-021 -- no follow up data for this pt looking in redcap, everyone else has some survival data in the ctDNA cohort. print(na_participant) # Summarize data by unique participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( surv = first(fu_surv), # Get survival status for each participant ctDNA_ever = first(ctDNA_ever), # Get ctDNA_ever status for each participant @@ -1725,7 +1774,9 @@ print(chisq_test) ``` -``` {r} +**DTC Demographics and Univariable tests of association:** Next we will look at the univariable tests of association by DTC status. + +```{r} ############### DTC Demographics ########## @@ -1753,8 +1804,8 @@ head(subset_data$age_at_diag) summary(subset_data$age_at_diag) #median 48.75 -age_summary <- subset_data %>% - group_by(dtc_ever) %>% +age_summary <- subset_data |> + group_by(dtc_ever) |> summarise( mean_age = mean(age_at_diag, na.rm = TRUE), # Calculate mean age median_age = median(age_at_diag, na.rm = TRUE), # Calculate median age @@ -1771,8 +1822,8 @@ wilcox_test_result <- wilcox.test(age_at_diag ~ dtc_ever, data = subset_data) print(wilcox_test_result) #looking at range of age for the dtc pos -age_summary <- subset_data %>% - group_by(dtc_ever) %>% +age_summary <- subset_data |> + group_by(dtc_ever) |> summarise( min_age = min(age_at_diag, na.rm = TRUE), # Minimum age max_age = max(age_at_diag, na.rm = TRUE), # Maximum age @@ -1783,15 +1834,15 @@ age_summary <- subset_data %>% print(age_summary) ``` -```{r} +```{r} ##### Race: demo_race_final # Get the count of unique participant_ids for each category in demo_race_final 
-race_counts_unique_percent <- subset_data %>% - group_by(demo_race_final) %>% - summarise(unique_participants = n_distinct(participant_id)) %>% +race_counts_unique_percent <- subset_data |> + group_by(demo_race_final) |> + summarise(unique_participants = n_distinct(participant_id)) |> mutate(percent = unique_participants / sum(unique_participants) * 100) # View the result @@ -1800,8 +1851,8 @@ print(race_counts_unique_percent) # Count distinct participant_ids by dtc_ever and demo_race_final -count_distinct_participants <- subset_data %>% - group_by(demo_race_final, dtc_ever) %>% +count_distinct_participants <- subset_data |> + group_by(demo_race_final, dtc_ever) |> summarise(distinct_participant_count = n_distinct(participant_id), .groups = "drop") # Print the result @@ -1812,8 +1863,8 @@ count_distinct_participants library(dplyr) # Step 1: Summarize by unique participant_id -summarized_data <- subset_data %>% - group_by(participant_id) %>% +summarized_data <- subset_data |> + group_by(participant_id) |> summarise( dtc_ever = first(dtc_ever), # Taking the first observed value of dtc_ever for each participant demo_race_final = first(demo_race_final), # Taking the first observed value of demo_race_final for each participant @@ -1834,8 +1885,8 @@ chisq_test #receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') # Breakdown of final_receptor_group by unique participant_id -receptor_status_by_participant <- subset_data %>% - group_by(participant_id) %>% +receptor_status_by_participant <- subset_data |> + group_by(participant_id) |> summarise(final_receptor_group = first(final_receptor_group), # Or choose the most frequent group if needed .groups = "drop") @@ -1843,8 +1894,8 @@ receptor_status_by_participant <- subset_data %>% table(receptor_status_by_participant$final_receptor_group) # Summarizing data by participant_id, final_receptor_group, and dtc_ever -receptor_dtc_status <- subset_data %>% - group_by(participant_id) %>% 
+receptor_dtc_status <- subset_data |> + group_by(participant_id) |> summarise( final_receptor_group = first(final_receptor_group), # Or the most frequent if needed dtc_ever = first(dtc_ever), # Taking the first observed value for dtc_ever @@ -1869,8 +1920,8 @@ chisq_test #inc_dx_crit_list___1 -TNBC_dtc_status <- subset_data %>% - group_by(participant_id) %>% +TNBC_dtc_status <- subset_data |> + group_by(participant_id) |> summarise( inc_dx_crit_list___1 = first(inc_dx_crit_list___1), # Or the most frequent if needed dtc_ever = first(dtc_ever), # Taking the first observed value for dtc_ever @@ -1900,8 +1951,8 @@ subset_data <- subset_data |> # View the new HR_status variable table(subset_data$HR_status) -HR_status_by_participant <- subset_data %>% - group_by(participant_id) %>% +HR_status_by_participant <- subset_data |> + group_by(participant_id) |> summarise(HR_status = first(HR_status), # Or use mode() if you have multiple rows per participant .groups = "drop") @@ -1909,8 +1960,8 @@ HR_status_by_participant <- subset_data %>% table(HR_status_by_participant$HR_status) #aligns with final receptor status (ultimately 60 HR+, 49 HR-) # Summarize dtc_detected status by HR_status, for each unique participant_id -summary_data <- subset_data %>% - group_by(participant_id) %>% +summary_data <- subset_data |> + group_by(participant_id) |> summarise( HR_status = first(HR_status), # Get the HR_status for the participant dtc_status = first(dtc_ever), # Get the dtc_detected status for the participant @@ -1930,9 +1981,9 @@ chisq_test ###tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not reported # Exclude rows where final_tumor_grade is 3 (not reported), want to exclude the 2 pts who had grade not reported -summary_data <- subset_data %>% - filter(final_tumor_grade != 3) %>% # Exclude grade == 3 - group_by(participant_id) %>% +summary_data <- subset_data |> + filter(final_tumor_grade != 3) |> # Exclude grade == 3 + group_by(participant_id) 
|> summarise( grade = first(final_tumor_grade), # Get the final_tumor_grade for each participant dtc_ever = first(dtc_ever), # Get the dtc_ever status for each participant @@ -1954,16 +2005,16 @@ print(chisq_test) ######histology #people have different combinations of histology (1-15) table(subset_data$participant_id, subset_data$final_histology) -histology_summary <- subset_data %>% - distinct(participant_id, final_histology) %>% # Get unique participant-histology combinations - group_by(final_histology) %>% # Group by histology type +histology_summary <- subset_data |> + distinct(participant_id, final_histology) |> # Get unique participant-histology combinations + group_by(final_histology) |> # Group by histology type summarise(count = n()) # Count the number of participants per histology type # View the summary table print(histology_summary) #the different combinations add up to 109. There are a decent number who had both ductal and lobular histology #trying to create Ductal, lobular, both, or other variables -subset_data <- subset_data %>% +subset_data <- subset_data |> mutate(histology_category = case_when( grepl("3", as.character(final_histology)) & grepl("14", as.character(final_histology)) ~ "Both Ductal and Lobular", # Both Ductal and Lobular grepl("3", as.character(final_histology)) ~ "Ductal", # Ductal @@ -1972,8 +2023,8 @@ subset_data <- subset_data %>% )) # Count the number of participants in each histology category -histology_counts <- subset_data %>% - group_by(histology_category) %>% +histology_counts <- subset_data |> + group_by(histology_category) |> summarise(count = n_distinct(participant_id)) # Count distinct participants # View the counts -- adds up to 109! 
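The histology rule in the hunk above (the same rule appears in the ctDNA section) tests codes with `grepl("3", ...)`. That is a substring match, so any value containing the character 3, such as a code of 13, would also be labeled Ductal. The sketch below rests on two assumptions that the diff does not confirm. First, it assumes `final_histology` stores checkbox codes as a comma-separated string such as "3,14". Second, it assumes 14 is the lobular code, as the "Both Ductal and Lobular" rule implies. Under those assumptions, the sketch compares the substring match with a whole-code match that uses `\\b` word boundaries. The example values are illustrative only, not study data.

```r
# Sketch only: assumes final_histology holds comma-separated checkbox codes
# (e.g. "3,14"), with 3 = ductal and 14 = lobular as implied by the
# "Both Ductal and Lobular" rule. Values below are illustrative, not study data.
final_histology <- c("3", "14", "3,14", "13", NA)

# Substring matching: "13" contains the character "3", so it is flagged too
substring_match <- grepl("3", as.character(final_histology))
print(substring_match)  # TRUE FALSE TRUE TRUE FALSE

# Whole-code matching: \\b requires a word boundary on both sides of the code,
# and the comma separator counts as a boundary
has_ductal  <- grepl("\\b3\\b",  as.character(final_histology), perl = TRUE)
has_lobular <- grepl("\\b14\\b", as.character(final_histology), perl = TRUE)

# Same precedence as the case_when() rule: both first, then each alone
histology_category <- ifelse(has_ductal & has_lobular, "Both Ductal and Lobular",
                      ifelse(has_ductal, "Ductal",
                      ifelse(has_lobular, "Lobular", "Other")))
print(histology_category)
```

Inside the existing `mutate(case_when(...))`, the equivalent change would be to replace the pattern `"3"` with `"\\b3\\b"` and `"14"` with `"\\b14\\b"`, and to pass `perl = TRUE` to each `grepl()`. Whether any participant's histology actually includes a code that contains 3 as a substring would need to be checked against the data before the category counts are reused.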
@@ -1981,9 +2032,9 @@ print(histology_counts) #contingency table library(tidyr) -contingency_table <- subset_data %>% - distinct(participant_id, histology_category, dtc_ever) %>% # Ensure each patient is counted once - count(histology_category, dtc_ever) %>% +contingency_table <- subset_data |> + distinct(participant_id, histology_category, dtc_ever) |> # Ensure each patient is counted once + count(histology_category, dtc_ever) |> pivot_wider(names_from = dtc_ever, values_from = n, values_fill = list(n = 0)) # Pivot the table to get dtc_ever as columns # 3. Perform the Chi-squared test of independence @@ -2001,17 +2052,17 @@ print(chisq_test) table(subset_data$participant_id, subset_data$final_n_stage) #we have 0,1, 2, 3 (No, N1, N2, N3) -nodal_summary <- subset_data %>% - distinct(participant_id, final_n_stage) %>% # Get unique participant-stage combinations - group_by(final_n_stage) %>% # Group by stage +nodal_summary <- subset_data |> + distinct(participant_id, final_n_stage) |> # Get unique participant-stage combinations + group_by(final_n_stage) |> # Group by stage summarise(count = n()) # Count the number of participants per histology type #View the summary table --adds up to 109, 46 = pN0 63 = pN1 print(nodal_summary) -subset_data_by_id <- subset_data %>% - filter(final_n_stage %in% c(0, 1, 2, 3)) %>% # Include only relevant nodal stages - group_by(participant_id) %>% +subset_data_by_id <- subset_data |> + filter(final_n_stage %in% c(0, 1, 2, 3)) |> # Include only relevant nodal stages + group_by(participant_id) |> summarise( nodal_status = first(final_n_stage), # Use final_n_stage as nodal_status for each participant dtc_ever = first(dtc_ever), # Get dtc_ever status for each participant @@ -2032,8 +2083,8 @@ print(chisq_test) #### Creating Node - vs node + variable from summary variable -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data |> + group_by(participant_id) |> summarise( node_status = 
ifelse(first(final_n_stage) == 0, "Node Negative", "Node Positive"), # Node negative if 0, positive otherwise dtc_ever = first(dtc_ever), # Get dtc_ever status for each participant @@ -2054,16 +2105,16 @@ print(chisq_test) ####### EXTRA CODE/CONFIRMATION / slightly different but ignore for our analysis #cross-check with indicator pN0 in our data that reflects nodal positivity.... there is 1 patient that is node - by summary variable but node + by indicator variable ## should double check this at some point -node_pos <- subset_data %>% - distinct(participant_id, inc_dx_crit_list___2) %>% # Get unique participant-stage combinations - group_by(inc_dx_crit_list___2) %>% # Group by stage +node_pos <- subset_data |> + distinct(participant_id, inc_dx_crit_list___2) |> # Get unique participant-stage combinations + group_by(inc_dx_crit_list___2) |> # Group by stage summarise(count = n()) # Count the number of participants per histology type print(node_pos) -contingency_table <- subset_data %>% - distinct(participant_id, inc_dx_crit_list___2, dtc_ever) %>% # Ensure unique participants - count(inc_dx_crit_list___2, dtc_ever) %>% # Count occurrences +contingency_table <- subset_data |> + distinct(participant_id, inc_dx_crit_list___2, dtc_ever) |> # Ensure unique participants + count(inc_dx_crit_list___2, dtc_ever) |> # Count occurrences spread(key = dtc_ever, value = n, fill = 0) # Spread data into a matrix # View the contingency table @@ -2081,9 +2132,9 @@ print(chi_square_result) table(subset_data$final_t_stage) #we have values in 1,2,3,4, and 99 (90 = pTx) so can proceed with this -t_summary <- subset_data %>% - distinct(participant_id, final_t_stage) %>% # Get unique participant-stage combinations - group_by(final_t_stage) %>% # Group by stage +t_summary <- subset_data |> + distinct(participant_id, final_t_stage) |> # Get unique participant-stage combinations + group_by(final_t_stage) |> # Group by stage summarise(count = n()) # Count the number of participants per 
histology type # View the summary table --adds up to 109, most are pN0 or cN0, a good number are pN1 or cN1, or pN2 @@ -2092,16 +2143,16 @@ print(t_summary) #### T stage, for our T stage table, will use T1 vs T2 or greater to simplify #exclude 99 (the pTx) -subset_data_clean <- subset_data %>% +subset_data_clean <- subset_data |> filter(final_t_stage != 99, dtc_ever != 99) # Combine final_t_stage into T1 vs. T2 or greater -subset_data_clean <- subset_data_clean %>% +subset_data_clean <- subset_data_clean |> mutate(final_t_stage_combined = ifelse(final_t_stage == 1, "T1", "T2 or greater")) # Summarize the data by participant_id after creating the new combined t_stage -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( final_t_stage_combined = first(final_t_stage_combined), # Get the combined t status for each participant dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant @@ -2120,11 +2171,11 @@ print(chisq_test) #### TRy for T stage stats using T3 or greater as cutoff -- not super useful, So DONOT USE THIS FOR TABLE #exclude 99 (the pTx) -subset_data_clean <- subset_data %>% +subset_data_clean <- subset_data |> filter(final_t_stage != 99, dtc_ever != 99) # Combine final_t_stage into T1/T2 or T3 or greater -subset_data_clean <- subset_data_clean %>% +subset_data_clean <- subset_data_clean |> mutate(final_t_stage_combined = case_when( final_t_stage == 1 | final_t_stage == 2 ~ "T1 or T2", # Group T1 and T2 together final_t_stage >= 3 ~ "T3 or greater", # Group T3 and higher as a separate category @@ -2133,8 +2184,8 @@ subset_data_clean <- subset_data_clean %>% # Summarize the data by participant_id after creating the new combined t_stage -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( final_t_stage_combined = 
first(final_t_stage_combined), # Get the combined t status for each participant dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant @@ -2156,21 +2207,21 @@ print(chisq_test) table(subset_data$final_overall_stage) #we have values in 1,2,3,4, and 99 so can proceed with this -stage_summary <- subset_data %>% - distinct(participant_id, final_overall_stage) %>% # Get unique participant-stage combinations - group_by(final_overall_stage) %>% # Group by stage +stage_summary <- subset_data |> + distinct(participant_id, final_overall_stage) |> # Get unique participant-stage combinations + group_by(final_overall_stage) |> # Group by stage summarise(count = n()) # Count the number of participants per histology type # View the summary table --adds up to 109 (1 99), most are stage 2, some are stage I some are stage III, no stage IV (yay) print(stage_summary) #exclude the 99 -subset_data_clean <- subset_data %>% +subset_data_clean <- subset_data |> filter(final_overall_stage != 99, dtc_ever != 99) # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( final_overall_stage = first(final_overall_stage), # Get the final_overall_stage for each participant dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant @@ -2194,9 +2245,9 @@ print(chisq_test) table(subset_data$diag_surgery_type_1) #1= partial mastectomy/lumpectomy, 2 = mastectomy. 
no missingness -surgery <- subset_data %>% - distinct(participant_id, diag_surgery_type_1) %>% # Get unique participant-stage combinations - group_by(diag_surgery_type_1) %>% # Group by stage +surgery <- subset_data |> + distinct(participant_id, diag_surgery_type_1) |> # Get unique participant-stage combinations + group_by(diag_surgery_type_1) |> # Group by stage summarise(count = n()) # Count the number of participants per histology type # View the summary table @@ -2204,8 +2255,8 @@ print(surgery) # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( surgery = first(diag_surgery_type_1), # Get the final_overall_stage for each participant dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant @@ -2229,20 +2280,20 @@ table(subset_data$diag_axillary_type___2_1) table(subset_data$diag_axillary_type___2_2) #a handful of axillary dissection on the second biopsy form so i think we want to combine these two # Create a binary variable to identify participants who had axillary dissection -subset_data_clean <- subset_data %>% +subset_data_clean <- subset_data |> mutate(axillary_dissection = ifelse(diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1, 1, 0)) # Ensure every participant has a dtc_ever and axillary_dissection value # Ensure every patient has an axillary dissection category, where 0 means no axillary dissection, and 1 means they had one -subset_data_clean <- subset_data %>% +subset_data_clean <- subset_data |> mutate(axillary_dissection = case_when( diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1 ~ 1, # Had axillary dissection TRUE ~ 0 # No axillary dissection (includes missing values) )) # Summarize the data by participant_id, including the axillary_dissection and dtc_ever variables -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- 
subset_data_clean |> + group_by(participant_id) |> summarise( axillary_dissection = first(axillary_dissection), # Get the axillary dissection status for each participant dtc_ever = first(dtc_ever) # Get the dtc_ever status for each participant @@ -2276,8 +2327,8 @@ radiation <- subset_data |> print(radiation) # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( radiation = first(prtx_radiation), # Get the final_overall_stage for each participant dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant @@ -2308,8 +2359,8 @@ chemo <- subset_data |> print(chemo) #3 people didn not get chemo in this cohort # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( chemo = first(prtx_chemo), # Get the final_overall_stage for each participant dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant @@ -2343,8 +2394,8 @@ nact <- subset_data |> print(nact) #3 people didn not get chemo in this cohort # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( nact = first(diag_neoadj_chemo_1), # Get the final_overall_stage for each participant dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant @@ -2374,8 +2425,8 @@ endo <- subset_data |> print(endo) #most ppl did get endo (62 of the 109) # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( endo = first(prtx_endo), # Get the final_overall_stage for each participant dtc_ever = first(dtc_ever) # Get dtc_ever status for each 
participant @@ -2407,8 +2458,8 @@ bonemod <- subset_data |> print(bonemod) #most ppl did get endo (39 got bonemod) # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( bonemod = first(prtx_bonemod), # Get the final_overall_stage for each participant dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant @@ -2428,25 +2479,25 @@ print(chisq_test) ``` ```{r} -#Later +#pCR #2 = non-pcr, 1 = pcr #path cr diag_pcr_1 or diag_pcr_2 (as this could be on either of the two diagnosis and staging forms, there are 2 variables for this) table(subset_data$diag_pcr_1) table(subset_data$diag_pcr_2) #none recorded here so can just use pcr_1 -pcr <- subset_data %>% - mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) %>% # Convert "." to NA - filter(!is.na(diag_pcr_1)) %>% # Exclude rows where diag_pcr_1 is NA - distinct(participant_id, diag_pcr_1) %>% - group_by(diag_pcr_1) %>% +pcr <- subset_data |> + mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) |> # Convert "." 
to NA + filter(!is.na(diag_pcr_1)) |> # Exclude rows where diag_pcr_1 is NA + distinct(participant_id, diag_pcr_1) |> + group_by(diag_pcr_1) |> summarise(count = n()) # Count the number of participants per histology type # View the summary table print(pcr) #this is correct bc 19 ppl got NACT, and we have 19 ppl recorded for pcr data # Summarize the data by participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( pcr = first(diag_pcr_1), # Get the final_overall_stage for each participant dtc_ever = first(dtc_ever) # Get dtc_ever status for each participant @@ -2458,7 +2509,7 @@ contingency_table <- table(subset_data_by_id$pcr, subset_data_by_id$dtc_ever) # Perform the Chi-squared test chisq_test <- chisq.test(contingency_table) -# Print the contingency table and Chi-squared test results --> p-val = 0.27-- does not seem to be association among those who got pcr (but also we have a group with 1 in it...). 
+# Print the contingency table and Chi-squared test results --> p-val = 0.27; there does not seem to be an association with pCR (but one group has only 1 participant, and only 18 individuals were evaluated for pCR) print(contingency_table) print(chisq_test) @@ -2469,8 +2520,8 @@ print(chisq_test) #local fu_locreg_prog # Step 1: Summarize data by unique participant_id -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data |> + group_by(participant_id) |> summarise( fu_locreg_prog = first(fu_locreg_prog), # Get fu_locreg_prog status for each participant dtc_ever = first(dtc_ever), # Get dtc_ever status for each participant @@ -2491,12 +2542,12 @@ print(chisq_test) ### Just want to look at site distribution here # Summarize the distribution of fu_locreg_site_char by unique participant_id -site_distribution <- subset_data %>% - group_by(participant_id) %>% +site_distribution <- subset_data |> + group_by(participant_id) |> summarise( site = first(fu_locreg_site_char), # Get the site for each unique participant .groups = "drop" - ) %>% + ) |> count(site) # Count the occurrences of each site # View the distribution of sites -- mostly axillary nodes. 4 of 6 axillary. 1 internal mammary. 2 supraclav.
2 ipsilateral breast @@ -2505,8 +2556,8 @@ print(site_distribution) #####distant recurrence: distant fu_dist_prog # Step 1: Summarize data by unique participant_id -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data |> + group_by(participant_id) |> summarise( fu_dist_prog = first(fu_dist_prog), # Get fu_dist_prog status for each participant dtc_ever = first(dtc_ever), # Get dtc_ever status for each participant @@ -2528,12 +2579,12 @@ print(chisq_test) #distant site fu_dist_site_num #fu_dist_site_char -- start justl ooking at the locations # Summarize the distribution of fu_dist_site_char by unique participant_id -dist_site_distribution <- subset_data %>% - group_by(participant_id) %>% +dist_site_distribution <- subset_data |> + group_by(participant_id) |> summarise( site = first(fu_dist_site_char), # Get the site for each unique participant .groups = "drop" - ) %>% + ) |> count(site) # Count the occurrences of each site # View the distribution of sites -- Bone in 7. Liver in 3. Lung in 2. 
1 intra-abdominal @@ -2542,12 +2593,12 @@ print(dist_site_distribution) #any recurrence #either fu_locreg_prog or fu_dist_prog -subset_data <- subset_data %>% +subset_data <- subset_data |> mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No")) # link by participant id -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data |> + group_by(participant_id) |> summarise( ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant dtc_ever = first(dtc_ever), # Get the dtc_ever status for each participant @@ -2568,8 +2619,8 @@ print(chisq_test) #using ever_relapsed # link by participant id -subset_data_by_id <- subset_data %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data |> + group_by(participant_id) |> summarise( ever_relapsed = first(ever_relapsed), # Get the ever_relapsed status for each participant dtc = first(dtc_ever), # Get the dtc_ever status for each participant @@ -2587,7 +2638,7 @@ print(contingency_table) print(chisq_test) # Identify participants missing data in either `ever_relapsed` or `dtc_ever` -missing_data <- subset_data_by_id %>% +missing_data <- subset_data_by_id |> filter(is.na(ever_relapsed) | is.na(dtc)) # Print the IDs of participants with missing data @@ -2599,24 +2650,24 @@ print(missing_data$participant_id) #two individuals do not have relapse data (17 table(subset_data$fu_surv) -surv <- subset_data %>% - distinct(participant_id, fu_surv) %>% - group_by(fu_surv) %>% +surv <- subset_data |> + distinct(participant_id, fu_surv) |> + group_by(fu_surv) |> summarise(count = n()) # Count the number of participants per histology type # View the summary table print(surv) #1 NA patient --> identify the NA patient below dead = 5, alive 103. There is 1 that's an NA. 
-na_participant <- subset_data %>% - filter(is.na(fu_surv)) %>% +na_participant <- subset_data |> + filter(is.na(fu_surv)) |> select(participant_id, fu_surv) # Print the result -- 28115-17-021 -- no follow up data for this pt looking in redcap, everyone else has some survival data in the dtc cohort. print(na_participant) # Summarize data by unique participant_id -subset_data_by_id <- subset_data_clean %>% - group_by(participant_id) %>% +subset_data_by_id <- subset_data_clean |> + group_by(participant_id) |> summarise( surv = first(fu_surv), # Get survival status for each participant dtc_ever = first(dtc_ever), # Get dtc_ever status for each participant @@ -2635,14 +2686,20 @@ print(chisq_test) ``` -Now that we have run the univariate associations for all the important demographic and clinical factors for both ctDNA and DTC, we will work on actually making our table 1, first by ctDNA status and a second table by DTC status. +Now that we have run the univariate associations for all the important demographic and clinical factors for both ctDNA and DTC, we will build our Table 1: first by ctDNA status, then a second table by DTC status.
+ +### **Making our Table 1** + +#### **Demographics and Clinical Factors by ctDNA Status** ```{r} ####### Making Table 1--first for ctDNA ######### -## https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html TRY LATER +## Resources to try for both making Table 1 and LASSO +## https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html ## lasso https://stats.stackexchange.com/questions/72251/an-example-lasso-regression-using-glmnet-for-binary-outcome ## https://www.r-bloggers.com/2020/05/quick-tutorial-on-lasso-regression-with-example/#google_vignette + #Table 1 Code library(table1) names(subset_data) #to choose variables @@ -2653,7 +2710,7 @@ library(tidyr) library(stringr) # Prepare the dataset -unique_subset_data <- subset_data %>% +unique_subset_data <- subset_data |> mutate( # Convert "Missing" and 99 to NA in relevant columns final_t_stage = na_if(as.character(final_t_stage), "Missing"), @@ -2664,23 +2721,8 @@ unique_subset_data <- subset_data %>% diag_pcr_1 = na_if(diag_pcr_1, "."), # Replace 99 with NA in all numeric columns across(where(is.numeric), ~ na_if(.x, 99)) - ) %>% - filter( - # Remove rows with NA values for specific columns before summarizing - !is.na(final_t_stage), - !is.na(final_overall_stage), - !is.na(final_receptor_group), - !is.na(demo_race_final), - !is.na(final_tumor_grade), - !is.na(final_n_stage), - !is.na(histology_category), - !is.na(axillary_dissection), - !is.na(diag_surgery_type_1), - !is.na(diag_neoadj_chemo_1), - !is.na(diag_pcr_1), - !is.na(ctDNA_ever) - ) %>% - group_by(participant_id) %>% + ) |> + group_by(participant_id) |> summarize( age_at_diag = first(na.omit(age_at_diag)), final_receptor_group = first(na.omit(final_receptor_group)), @@ -2701,10 +2743,6 @@ unique_subset_data <- subset_data %>% ctDNA_ever = first(na.omit(ctDNA_ever)) ) -# trying to get rid of the missings for diag_pcr_1 - -unique(subset_data$diag_pcr_1) #. 
- ####### #add labels for #final_receptor_group @@ -2733,7 +2771,7 @@ units(unique_subset_data$age_at_diag) <- "years" # assign `final_receptor_group` factor levels and labels to `unique_subset_data` -unique_subset_data <- unique_subset_data %>% +unique_subset_data <- unique_subset_data |> mutate( final_receptor_group = factor(final_receptor_group, levels = c(1, 2, 3, 4), labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+")) @@ -2904,6 +2942,9 @@ factor(unique_subset_data$ctDNA_ever, levels=c("FALSE", "TRUE"), label(unique_subset_data$ctDNA_ever) <- "ctDNA Status" table(unique_subset_data$ctDNA_ever) + + + caption <- "Table 1 by ctDNA Status" # Generate the table1 summary @@ -2921,6 +2962,8 @@ table1( ``` +We have our basic Table 1 by ctDNA status. + ```{r} #Adding P-values and tests of significance to the code. @@ -2932,7 +2975,7 @@ table1_output <- table1( histology_category + prtx_radiation + prtx_chemo + prtx_endo + prtx_bonemod + node_status + axillary_dissection + - diag_surgery_type_1 + diag_neoadj_chemo_1 | + diag_surgery_type_1 + diag_neoadj_chemo_1 +diag_pcr_1 | ctDNA_ever, data = unique_subset_data, overall = c(left = "Total"), @@ -2942,8 +2985,9 @@ table1_output <- table1( #### pvalue_function <- function(x, ...) 
{ - # Remove any "Overall" group if present and focus only on ctDNA+ and ctDNA- comparisons - x <- x[!names(x) %in% "Overall"] # Filter out the "Overall" column + # print(x) # uncomment to inspect the groups passed in by table1 + # Remove any "overall" group if present and focus only on ctDNA+ and ctDNA- comparisons + x <- x[!names(x) %in% "overall"] # Filter out the "overall" column y <- unlist(x) g <- factor(rep(1:length(x), times = sapply(x, length))) @@ -2982,7 +3026,7 @@ table1_p <- table1( histology_category + prtx_radiation + prtx_chemo + prtx_endo + prtx_bonemod + node_status + axillary_dissection + - diag_surgery_type_1 + diag_neoadj_chemo_1 | + diag_surgery_type_1 + diag_neoadj_chemo_1 + diag_pcr_1 | ctDNA_ever, data = unique_subset_data, overall = c(left = "Total"), @@ -2990,53 +3034,36 @@ table1_p <- table1( extra.col.pos = 4 # Position of the extra column ) -table1_p #Still not adding p-values....grrr +table1_p #we have p-values! ``` -** Table of demographics and clinical factors by DTC status ** +We can see in this Table 1 by ctDNA status, including tests of association, that the following variables are significantly (p\<0.05) associated with ctDNA status: tumor grade (higher grade associated with positivity), overall stage (higher stage associated with positivity), and N stage (higher N stage seemingly associated with positivity), with trends toward significance for receptor status and age at diagnosis. -```{r} +### **Table of demographics and clinical factors by DTC status** -####### Table of clinical and demographic factors by DTC status ######### +Next we will create a table of demographic and clinical factors by DTC status, including tests of association.
-## https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html TRY LATER -## lasso https://stats.stackexchange.com/questions/72251/an-example-lasso-regression-using-glmnet-for-binary-outcome -## https://www.r-bloggers.com/2020/05/quick-tutorial-on-lasso-regression-with-example/#google_vignette -#Table 1 Code -library(table1) -library(dplyr) -library(tidyr) -library(stringr) +```{r} + +####### Table of clinical and demographic factors by DTC status ######### # Prepare the dataset -unique_subset_data <- subset_data %>% +dtc_unique_subset_data <- subset_data |> mutate( - # Convert "Missing" and 99 to NA in relevant columns + # Replace "Missing" and 99 with NA in relevant columns final_t_stage = na_if(as.character(final_t_stage), "Missing"), final_t_stage = na_if(final_t_stage, "99"), final_overall_stage = na_if(as.character(final_overall_stage), "Missing"), final_overall_stage = na_if(final_overall_stage, "99"), - final_tumor_grade = na_if(final_tumor_grade, 3), + final_tumor_grade = na_if(final_tumor_grade, 3), # Assumes 3 means "Not Reported" + diag_pcr_1 = na_if(diag_pcr_1, "."), # Replace 99 with NA in all numeric columns across(where(is.numeric), ~ na_if(.x, 99)) - ) %>% - filter( - # Remove rows with NA values for specific columns before summarizing - !is.na(final_t_stage), - !is.na(final_overall_stage), - !is.na(final_receptor_group), - !is.na(demo_race_final), - !is.na(final_tumor_grade), - !is.na(final_n_stage), - !is.na(histology_category), - !is.na(axillary_dissection), - !is.na(diag_surgery_type_1), - !is.na(diag_neoadj_chemo_1), - !is.na(ctDNA_ever) - ) %>% - group_by(participant_id) %>% + ) |> + group_by(participant_id) |> summarize( + # Summarize unique participant-level data age_at_diag = first(na.omit(age_at_diag)), final_receptor_group = first(na.omit(final_receptor_group)), demo_race_final = first(na.omit(demo_race_final)), @@ -3053,214 +3080,121 @@ unique_subset_data <- subset_data %>% axillary_dissection = 
first(na.omit(axillary_dissection)), diag_surgery_type_1 = first(na.omit(diag_surgery_type_1)), diag_neoadj_chemo_1 = first(na.omit(diag_neoadj_chemo_1)), - ctDNA_ever = first(na.omit(ctDNA_ever)) + diag_pcr_1 = first(na.omit(diag_pcr_1)), + ctDNA_ever = first(na.omit(ctDNA_ever)), + dtc_ever = first(na.omit(dtc_ever)) ) -# Generate the table1 summary -table1( - ~ age_at_diag + factor(final_receptor_group) + factor(demo_race_final) + - factor(final_tumor_grade) + factor(final_overall_stage) + - factor(final_t_stage) + factor(final_n_stage) + - factor(histology_category) + factor(prtx_radiation) + - factor(prtx_chemo) + factor(prtx_endo) + factor(prtx_bonemod) + - factor(node_status) + factor(axillary_dissection) + - factor(diag_surgery_type_1) + factor(diag_neoadj_chemo_1) | - ctDNA_ever, - data = unique_subset_data -) - -####### -#add labels for -#final_receptor_group -#demo_race_final -#final_tumor_grade -#final_overall_tage -#final_t_stage) -#final_n_stage -#histology_category -#prtx_radiation -#prtx_chemo) -#prtx_endo -#prtx_bonemod -#node_status) -#axillary_dissection -#diag_surgery_type_1 -#diag_neoadj_chemo_1 -#ctDNA_ever - - -label(unique_subset_data$age_at_diag) <- "Age at Diagnosis" -units(unique_subset_data$age_at_diag) <- "years" - -#Final receptor group: 1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+' - - -# assign `final_receptor_group` factor levels and labels to `unique_subset_data` -unique_subset_data <- unique_subset_data %>% +# Convert variables to labeled factors for table output +dtc_unique_subset_data <- dtc_unique_subset_data |> mutate( final_receptor_group = factor(final_receptor_group, levels = c(1, 2, 3, 4), - labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+")) - ) - -label(unique_subset_data$final_receptor_group) <- "Final Receptor Group" - -table(unique_subset_data$final_receptor_group) + labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+")), + demo_race_final = factor(demo_race_final, levels = c(1, 3, 5), + labels = c("Black",
"Asian", "White")), + final_tumor_grade = factor(final_tumor_grade, levels = c(0, 1, 2), + labels = c("Grade 3", "Grade 1", "Grade 2")), + final_overall_stage = factor(final_overall_stage, levels = c(1, 2, 3), + labels = c("Stage I", "Stage II", "Stage III")), + final_t_stage = factor(final_t_stage, levels = c(1, 2, 3, 4), + labels = c("T1", "T2", "T3", "T4")), + final_n_stage = factor(final_n_stage, levels = c(0, 1, 2, 3), + labels = c("N0", "N1", "N2", "N3")), + prtx_radiation = factor(prtx_radiation, levels = c(0, 1), + labels = c("No Radiation", "Radiation")), + prtx_chemo = factor(prtx_chemo, levels = c(0, 1), + labels = c("No Chemo", "Chemo")), + prtx_endo = factor(prtx_endo, levels = c(0, 1), + labels = c("No Endocrine Therapy", "Endocrine Therapy")), + prtx_bonemod = factor(prtx_bonemod, levels = c(0, 1), + labels = c("No Bone Modifying Treatment", "Bone Modifying Treatment")), + axillary_dissection = factor(axillary_dissection, levels = c(0, 1), + labels = c("No Axillary Dissection", "Axillary Dissection")), + diag_surgery_type_1 = factor(diag_surgery_type_1, levels = c(1, 2), + labels = c("Lumpectomy", "Mastectomy")), + diag_neoadj_chemo_1 = factor(diag_neoadj_chemo_1, levels = c(0, 1), + labels = c("No Neoadjuvant Chemo", "Neoadjuvant Chemo")), + diag_pcr_1 = factor(diag_pcr_1, levels = c(1, 2), + labels = c("pCR", "Non-pCR")), + ctDNA_ever = factor(ctDNA_ever, levels = c("FALSE", "TRUE"), + labels = c("ctDNA Negative", "ctDNA Positive")), + dtc_ever = factor(dtc_ever, levels = c(0, 1), + labels = c("DTC Negative", "DTC Positive")) + ) + +#### Labels + +label(dtc_unique_subset_data$age_at_diag) <- "Age at Diagnosis" +units(dtc_unique_subset_data$age_at_diag) <- "years" + +# assign `final_receptor_group` labels to `dtc_unique_subset_data` +label(dtc_unique_subset_data$final_receptor_group) <- "Final Receptor Group" ##demo_race_final - -table(unique_subset_data$demo_race_final) #1, 3, 5 -- 5 = white, 1 = black, 3 = asian -
-unique_subset_data$demo_race_final <- - factor(unique_subset_data$demo_race_final, levels=c(1,3,5), - labels=c("Black", - "Asian", "White")) -label(unique_subset_data$demo_race_final) <- "Race" -table(unique_subset_data$demo_race_final) +label(dtc_unique_subset_data$demo_race_final) <- "Race" #final_tumor_grade -table(unique_subset_data$final_tumor_grade) # 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not Reported. Added 3 to NA up above for table 1 code so it will be considered N/A. -unique_subset_data$final_tumor_grade <- - factor(unique_subset_data$final_tumor_grade, levels=c(0,1,2), - labels=c("Grade 3", - "Grade 1", "Grade 2")) -label(unique_subset_data$final_tumor_grade) <- "Tumor Grade" -table(unique_subset_data$final_tumor_grade) +label(dtc_unique_subset_data$final_tumor_grade) <- "Tumor Grade" #final_overall_stage -table(unique_subset_data$final_overall_stage) # 1 = stage I 2 = stage II 3 = stage III - -unique_subset_data$final_overall_stage <- - factor(unique_subset_data$final_overall_stage, levels=c(1,2,3), - labels=c("Stage I", - "Stage II", "Stage III")) -label(unique_subset_data$final_overall_stage) <- "Overall Stage" -table(unique_subset_data$final_overall_stage) +label(dtc_unique_subset_data$final_overall_stage) <- "Overall Stage" #final_t_stage -table(unique_subset_data$final_t_stage) # 1 = T1 2 = T2 3 = T3 4 = T4 - -unique_subset_data$final_t_stage <- - factor(unique_subset_data$final_t_stage, levels=c(1,2,3,4), - labels=c("T1", - "T2", "T3", "T4")) -label(unique_subset_data$final_t_stage) <- "T Stage" -table(unique_subset_data$final_t_stage) +label(dtc_unique_subset_data$final_t_stage) <- "T Stage" #final_n_stage - -table(unique_subset_data$final_n_stage) # 0 = N0 1 = N1 2 = N2 3 = N3 - -unique_subset_data$final_n_stage <- - factor(unique_subset_data$final_n_stage, levels=c(0,1,2,3), - labels=c("N0", - "N1", "N2", "N3")) -label(unique_subset_data$final_n_stage) <- "N Stage" -table(unique_subset_data$final_n_stage) 
+label(dtc_unique_subset_data$final_n_stage) <- "N Stage" #histology_category -table(unique_subset_data$histology_category) #These are labeled already correctly as both ductal and lobular, ductal, lobular, and other -label(unique_subset_data$histology_category) <- "Histology Category" +label(dtc_unique_subset_data$histology_category) <- "Histology Category" -#prtx_radiation -table(unique_subset_data$prtx_radiation) #1 = radiation 0 = no +#prtx_radiation -unique_subset_data$prtx_radiation <- - factor(unique_subset_data$prtx_radiation, levels=c(0,1), - labels=c("No Radiation", "Radiation")) -label(unique_subset_data$prtx_radiation) <- "Radiation" -table(unique_subset_data$prtx_radiation) +label(dtc_unique_subset_data$prtx_radiation) <- "Radiation" #prtx_chemo - -table(unique_subset_data$prtx_chemo) #1 = chemo 0 = no -table(subset_data$prtx_chemo) - -unique_subset_data$prtx_chemo <- -factor(unique_subset_data$prtx_chemo, levels=c(0,1), - labels=c("No Chemo", "Chemo")) -label(unique_subset_data$prtx_chemo) <- "Chemo" -table(unique_subset_data$prtx_chemo) +label(dtc_unique_subset_data$prtx_chemo) <- "Chemo" #prtx_endo - - -table(unique_subset_data$prtx_endo) #1 = hormone therapy 0 = no -table(subset_data$prtx_endo) - -unique_subset_data$prtx_endo <- -factor(unique_subset_data$prtx_endo, levels=c(0,1), - labels=c("No Endocrine Therapy", "Endocrine Therapy")) -label(unique_subset_data$prtx_endo) <- "Endocrine Therapy" -table(unique_subset_data$prtx_endo) +label(dtc_unique_subset_data$prtx_endo) <- "Endocrine Therapy" #prtx_bonemod - -table(unique_subset_data$prtx_bonemod) #1 = bonemod 0 = no -table(unique_subset_data$prtx_bonemod) - -unique_subset_data$prtx_bonemod <- -factor(unique_subset_data$prtx_bonemod, levels=c(0,1), - labels=c("No Bone Modifying Treatment", "Bone Modifying Treatment")) -label(unique_subset_data$prtx_bonemod) <- "Bone Modifying Treatment" -table(unique_subset_data$prtx_bonemod) - - +label(dtc_unique_subset_data$prtx_bonemod) <- "Bone Modifying 
Treatment" #node_status -table(unique_subset_data$node_status) #already positive and negative -label(unique_subset_data$node_status) <- "Node Status" +label(dtc_unique_subset_data$node_status) <- "Node Status" #axillary_dissection - -table(unique_subset_data$axillary_dissection) #1 = axillary dissection 0 = no dissection - -unique_subset_data$axillary_dissection <- -factor(unique_subset_data$axillary_dissection, levels=c(0,1), - labels=c("No Axillary Dissection", "Axillary Dissection")) -label(unique_subset_data$axillary_dissection) <- "Axillary Dissection" -table(unique_subset_data$axillary_dissection) +label(dtc_unique_subset_data$axillary_dissection) <- "Axillary Dissection" #diag_surgery_type_1 -table(unique_subset_data$diag_surgery_type_1) #1 = Lumpectomy 2 = Mastectomy - -unique_subset_data$diag_surgery_type_1 <- -factor(unique_subset_data$diag_surgery_type_1, levels=c(1,2), - labels=c("Lumpectomy", "Mastectomy")) -label(unique_subset_data$diag_surgery_type_1) <- "Surgery Type" -table(unique_subset_data$diag_surgery_type_1) +label(dtc_unique_subset_data$diag_surgery_type_1) <- "Surgery Type" #diag_neoadj_chemo_1 -table(unique_subset_data$diag_neoadj_chemo_1) #1 = Neoadj Chemo 0 = No Neoadjuv +label(dtc_unique_subset_data$diag_neoadj_chemo_1) <- "Neoadjuvant Chemo" -unique_subset_data$diag_neoadj_chemo_1 <- -factor(unique_subset_data$diag_neoadj_chemo_1, levels=c(0,1), - labels=c("No Neoadjuvant Chemo", "Neoadjuvant Chemo")) -label(unique_subset_data$diag_neoadj_chemo_1) <- "Neoadjuvant Chemo" -table(unique_subset_data$diag_neoadj_chemo_1) +#pCR +label(dtc_unique_subset_data$diag_pcr_1) <- "Pathologic Complete Response" -#ctDNA_ever -table(unique_subset_data$ctDNA_ever) #FALSE = ctDNA Negative TRUE = ctDNA Positive -unique_subset_data$ctDNA_ever <- -factor(unique_subset_data$ctDNA_ever, levels=c("FALSE", "TRUE"), - labels=c("ctDNA Negative", "ctDNA Positive")) -label(unique_subset_data$ctDNA_ever) <- "ctDNA Status" -table(unique_subset_data$ctDNA_ever) 
+#DTC_ever +label(dtc_unique_subset_data$dtc_ever) <- "DTC Status" -caption <- "Table 1 by ctDNA Status" -# Generate the table1 summary -table1( +#### + +# Step 1: Create table1 output +table1_output <- table1( ~ age_at_diag + final_receptor_group + demo_race_final + final_tumor_grade + final_overall_stage + final_t_stage + final_n_stage + @@ -3268,255 +3202,253 @@ table1( prtx_chemo + prtx_endo + prtx_bonemod + node_status + axillary_dissection + diag_surgery_type_1 + diag_neoadj_chemo_1 | - ctDNA_ever, - data = unique_subset_data, overall=c(left="Total"), caption=caption) - - - -``` - - -I have chosen to perform a multivariable logistic regression to identify predictors of ctDNA positivity as we suspect this is a biomarker of relapse and can see even in our data-set that it is strongly associated with relapse and worse overall survival. Identifying predictors of positivity would help us to understand who we might consider screening for ctDNA positivity, as this testing is expensive and takes time and resources--and may not benefit everyone. ctDNA positivity is a binary outcome, and we have performed univariable analyses as above already to look at potentially significant relationships. There are multiple types of variables worth considering, including demographic and clinical factors, disease factors (such as aggressiveness of the tumor measured by histology and grade, the hormone receptor status of the tumor, the stage of disease at diagnosis), and treatment factors (surgery type, radiation or no radiation, neoadjuvant chemotherapy or no, endocrine aka anti-hormone therapy). These are all time-invarying factors, and all were present at the beginning of enrollment on study, prior to ctDNA positivity. - -Lasso will give us the most parsimonious model and is an automatic approach, without consideration of absolute p-value cutoffs.
We considered stepwise model building based on p-values, but this approach has gone out of favor as this approach uses somewhat arbitrary p-value cutoffs and can ignore actually relevant and important variables. As we have already performed univariate tests of association above with chi-squared tests, we do not need to perform univariable logistic regression. + dtc_ever, + data = dtc_unique_subset_data +) -```{r} +table1_output -### DELETE +#### +pvalue_function <- function(x, ...) { + # print(x) # uncomment to inspect the groups passed in by table1 + # Remove any "overall" group if present and focus only on DTC+ and DTC- comparisons + x <- x[!names(x) %in% "overall"] # Filter out the "overall" column + y <- unlist(x) + g <- factor(rep(1:length(x), times = sapply(x, length))) + + # Only compute a p-value when exactly two groups are being compared + if (length(unique(g)) != 2) { + return(NA) # Return NA if not comparing exactly two groups + } -library(dplyr) + # Perform the appropriate test based on the type of variable + if (is.numeric(y)) { + # For continuous variables, perform a t-test + p <- t.test(y ~ g)$p.value + } else { + # For categorical variables, perform a chi-squared test or Fisher's test + table_result <- table(y, g) + + # Choose the correct test based on cell counts + if (any(table_result < 5)) { + p <- fisher.test(table_result)$p.value # Use Fisher's test for low counts + } else { + p <- chisq.test(table_result)$p.value # Use chi-squared test otherwise + } + } + + # Format the p-value for output + formatted_p <- format.pval(p, digits = 3, eps = 0.001) + return(formatted_p) +} + -# Univariable logistic regression -- do not need to do this as we did the chisquared tests already -# Define the outcome variable -outcome <- "ctDNA_ever" - -# Continuous predictors -continuous_vars <- c("age_at_diag", "dtc_ihc_summary_count_final") - -# Categorical predictors -categorical_vars <- c("demo_race_final", "final_receptor_group", "HR_status", - "final_tumor_grade", "histology_category", "final_n_stage", - "final_t_stage",
"final_overall_stage", - "diag_surgery_type_1", "axillary_dissection", "prtx_radiation", "prtx_chemo", - "diag_neoadj_chemo_1", "prtx_endo", "prtx_bonemod", "diag_pcr_1", "fu_locreg_prog", - "fu_locreg_site_char", "fu_dist_prog", "fu_dist_site_char", "ever_relapsed", - "fu_surv") - -# Univariable regression for continuous variables -univariable_results_continuous <- sapply(continuous_vars, function(var) { - formula <- as.formula(paste(outcome, "~", var)) - model <- glm(formula, data = subset_data, family = "binomial") - summary(model)$coefficients[2, c(1, 4)] # Extract coefficient and p-value -}) - -# Univariable regression for categorical variables (assuming factors are properly encoded) -univariable_results_categorical <- sapply(categorical_vars, function(var) { - formula <- as.formula(paste(outcome, "~", var)) - model <- glm(formula, data = subset_data, family = "binomial") - summary(model)$coefficients[2, c(1, 4)] # Extract coefficient and p-value -}) - -# Combine continuous and categorical results -univariable_results <- data.frame( - Variable = c(continuous_vars, categorical_vars), - Estimate = c(univariable_results_continuous[1,], univariable_results_categorical[1,]), - p_value = c(univariable_results_continuous[2,], univariable_results_categorical[2,]) +# Generate table1 with the p-value column +table1_dtc <- table1( + ~ age_at_diag + final_receptor_group + demo_race_final + + final_tumor_grade + final_overall_stage + + final_t_stage + final_n_stage + + histology_category + prtx_radiation + + prtx_chemo + prtx_endo + prtx_bonemod + + node_status + axillary_dissection + + diag_surgery_type_1 + diag_neoadj_chemo_1 | + dtc_ever, + data = dtc_unique_subset_data, + overall = c(left = "Total"), + extra.col = list("P-value" = pvalue_function), # Add p-value function + extra.col.pos = 4 # Position of the extra column ) -# Print univariable results -print(univariable_results) +table1_dtc #we have p-values! 
``` -We will next think about our multivariable regression model. We have several variables that were significant in our univariable analyses (chi-squared). These include median age-at-diagnosis, longer time from diagnosis to enrollment, higher tumor grade, and higher initial stage at diagnosis. Variables that were not significant but that could be considered include histology, nodal positivity, higher t-stage, and receptor status. While recurrence (both distant and local) as well as worse survival are significantly associated with ctDNA positivity, these are outcomes that we think of as following ctDNA positivity temporally and therefore should not be included in our predictive model as predictors. +We can see in this series of tests that similar, but not identical, sets of variables appear to be significantly associated with DTC status, including histology category (with ductal histology more strongly correlated with positivity) and nodal status (with node-positive patients more likely to have DTC positivity), with trends towards significance for N stage, receptor group, and tumor grade. We have decided not to include pCR in this table or in further analyses because only 19 patients in the cohort received neoadjuvant therapy, so the n is very low for any tests of association, and there is substantial missingness in the overall cohort. + +### **Multivariable Analysis** + +**Variable Selection and Planning:** I have chosen to perform a multivariable logistic regression to identify predictors of ctDNA (and DTC) positivity, as we suspect these are biomarkers of relapse and can see even in our dataset that ctDNA is strongly associated with relapse and worse overall survival. Identifying predictors of positivity would help us understand whom we might consider screening for ctDNA positivity, as this testing is expensive, takes time and resources, and may not benefit everyone.
ctDNA positivity is a binary outcome, and we have already performed univariable analyses (above) to look for potentially significant relationships. There are multiple types of variables worth considering, including demographic and clinical factors, disease factors (such as aggressiveness of the tumor as measured by histology and grade, the hormone receptor status of the tumor, and the stage of disease at diagnosis), and treatment factors (surgery type, radiation or no radiation, neoadjuvant chemotherapy or no neoadjuvant chemotherapy, and endocrine, i.e., anti-hormone, therapy). The only variable we have removed from our model is pathologic complete response (whether or not patients have NO tumor at the time of surgery IF they received neoadjuvant chemo/immunotherapy before surgery): relatively few patients received neoadjuvant therapy, so this variable has substantial missingness, and it would not make sense to impute it because it is only relevant for patients who received neoadjuvant therapy. These are all time-invariant factors, and all were present at enrollment on study, prior to ctDNA testing. We will use all of the variables that we assessed in our initial univariable tests of association (including those that had significant associations and those that did not), as we suspect some of these variables are related to one another or collinear, and therefore we cannot rely on simple univariable tests of association to determine what will be most predictive of positivity. -In thinking about what these variables represent, we think about the extent of treatment that patients have received as one major category. We also think about intrinsic tumor risk factors as another. +We have several variables that were significant in our univariable analyses (chi-squared). These include median age at diagnosis, longer time from diagnosis to enrollment, higher tumor grade, and higher initial stage at diagnosis.
Variables that were not significant but that could be considered include histology, nodal positivity, higher T stage, and receptor status. While recurrence (both distant and local) and worse survival are significantly associated with ctDNA positivity, we think of these outcomes as temporally following ctDNA positivity, so they should not be included as predictors in our model. -There is no specific method to choose variables, but generally purposeful selection begins with univariate analysis which we have already performed. Next we will perform LASSO to identify and select variables. +**LASSO:** LASSO gives a parsimonious model through an automatic approach that does not rely on p-value cutoffs: its L1 penalty shrinks the coefficients of weak predictors exactly to zero. There is no single "right" method for choosing variables, but purposeful selection generally begins with univariable analysis, which we have already performed. We considered stepwise model building based on p-values, but this approach has fallen out of favor because it relies on somewhat arbitrary p-value cutoffs and can exclude relevant and important variables. Because we have already performed univariable tests of association above with chi-squared tests, we do not need to perform univariable logistic regression. We have removed one variable (pCR) that we assessed for univariable association with ctDNA, because of its substantial missingness in the overall cohort (it applies only to the small subset of patients who received neoadjuvant therapy). We will perform LASSO with the remaining variables to identify and select those that are most predictive.
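For reference, the penalized logistic model that `glmnet(..., family = "binomial", alpha = 1)` fits can be written as follows (a sketch in the parameterization used by the glmnet documentation; here $\mathbf{x}_i$ is row $i$ of the model matrix, $p$ is its number of columns, and $\pi_i$ is the probability of positivity for patient $i$). The intercept $\beta_0$ is not penalized, and by default glmnet standardizes the columns before applying the penalty and reports coefficients on the original scale.

$$
\log\frac{\pi_i}{1-\pi_i} = \beta_0 + \mathbf{x}_i^\top\boldsymbol{\beta}
$$

$$
(\hat\beta_0, \hat{\boldsymbol{\beta}}) = \underset{\beta_0,\,\boldsymbol{\beta}}{\arg\min}\left\{-\frac{1}{n}\sum_{i=1}^{n}\Big[y_i\big(\beta_0 + \mathbf{x}_i^\top\boldsymbol{\beta}\big) - \log\big(1 + e^{\beta_0 + \mathbf{x}_i^\top\boldsymbol{\beta}}\big)\Big] + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert\right\}
$$

Larger values of $\lambda$ push more coefficients exactly to zero, and `cv.glmnet()` chooses $\lambda$ by minimizing the cross-validated binomial deviance by default. A retained coefficient $\hat\beta_j$ is a change in log-odds, so $e^{\hat\beta_j}$ is a shrunken odds ratio, and a fitted probability is $\hat\pi = 1/\big(1 + e^{-(\hat\beta_0 + \mathbf{x}^\top\hat{\boldsymbol{\beta}})}\big)$.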
```{r} library(glmnet) +set.seed(123) + # Prepare the response variable y <- unique_subset_data$ctDNA_ever -# Predictor matrix, excluding the outcome variable -X <- as.matrix(unique_subset_data[, c("age_at_diag", "final_receptor_group", "demo_race_final", - "final_tumor_grade", "final_overall_stage", - "final_t_stage", "final_n_stage", - "histology_category", "prtx_radiation", - "prtx_chemo", "prtx_endo", "prtx_bonemod", - "node_status", "axillary_dissection", - "diag_surgery_type_1", "diag_neoadj_chemo_1")]) + +#was getting an error message when I ran y and X2 because there were 4 missing observations; since there are only 4 (missingness is low, <10%), we impute them +library(mice) + +# Impute missing values with predictive mean matching (general missingness is low, as above) +imputed_data <- mice(unique_subset_data, m = 1, method = "pmm", maxit = 5) +unique_subset_data <- complete(imputed_data) + +#-1 to not include intercept in this matrix as a predictor variable +X2 <- model.matrix(ctDNA_ever ~ -1 + age_at_diag + final_receptor_group + demo_race_final+ + final_tumor_grade + final_overall_stage + + + final_t_stage + final_n_stage + + histology_category + prtx_radiation + + + prtx_chemo + prtx_endo + prtx_bonemod + + node_status + axillary_dissection + + diag_surgery_type_1 + diag_neoadj_chemo_1, data = unique_subset_data) + # Fit lasso model -lasso_model <- glmnet(X, y, family = "binomial", alpha = 1) # alpha = 1 for lasso +lasso_model <- glmnet(X2, y, family = "binomial", alpha = 1) # alpha = 1 for lasso #Cross-validation. This approach performs k-fold cross validation to find the best lambda. The smaller the lambda, the more variables are included.
-cv_lasso_model <- cv.glmnet(X, y, family = "binomial", alpha = 1) +cv_lasso_model <- cv.glmnet(X2, y, family = "binomial", alpha = 1) #plotting the results to look at the performance of different lambda values plot(cv_lasso_model) -#getting the best lambda +#getting the best lambda -- 0.052 best_lambda <- cv_lasso_model$lambda.min print(paste("Best lambda:", best_lambda)) #Finding the final fit model with the optimal lambda -final_lasso_model <- glmnet(X, y, family = "binomial", alpha = 1, lambda = best_lambda) +final_lasso_model <- glmnet(X2, y, family = "binomial", alpha = 1, lambda = best_lambda) -#Which coefficents are included in the model. age at diag coeff is 0.057 (so small influence) +#Which coefficients are included in the model. coef(final_lasso_model) -### Trying with fewer variables that are not as colinear (not including node status or chemo as a majority of ppl got chemo) +``` + +The variables retained (with nonzero coefficients) in the LASSO for ctDNA positivity are T stage and N stage. Multi-level variables such as T stage and N stage are somewhat challenging to interpret in the LASSO, because each level enters as a separate dummy variable, but the higher categories (T4, N2) are associated with positivity. The selected lambda for this model is about 0.05. It is important to remember that a number of these variables are related to one another: T stage and N stage feed into final overall stage, which is built from them, and node status is derived from N stage. I'll try a few other LASSOs to see whether eliminating one variable from each of these collinear pairs changes the results.
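Before dropping variables, one quick way to see which dummy columns are exact linear combinations of others is to check the rank of the model matrix with base R's `qr()`. The sketch below uses a small hypothetical data frame (`toy`, `n_stage`, `t_stage`, and `node_status` are made-up stand-ins, not the study data) in which node status is derived from N stage, mirroring the relationship described above, and builds the matrix with the same `-1` (no intercept) formula style used for `X2`.

```r
# Hypothetical toy data (NOT study data): node_status is derived from n_stage
toy <- data.frame(
  n_stage = factor(c("N0", "N1", "N2", "N0", "N3", "N1", "N0", "N2")),
  t_stage = factor(c("T1", "T2", "T2", "T1", "T4", "T3", "T2", "T4"))
)
toy$node_status <- factor(ifelse(toy$n_stage == "N0", "Negative", "Positive"))

# -1 drops the intercept, so the first factor (n_stage) gets one column per
# level; later factors use treatment contrasts against their first level
X_toy <- model.matrix(~ -1 + n_stage + t_stage + node_status, data = toy)

# A rank below the column count means some columns are exact linear
# combinations of others (here node_statusPositive = N1 + N2 + N3)
qr_X <- qr(X_toy)
rank_deficit <- ncol(X_toy) - qr_X$rank

# qr() moves (near-)dependent columns to the end of its pivot order,
# so the pivots past the rank identify the aliased columns
aliased_cols <- colnames(X_toy)[qr_X$pivot[seq_len(rank_deficit) + qr_X$rank]]
print(aliased_cols)
```

On the real data, the same check could be run on `X2` (for example `qr(X2)$rank < ncol(X2)`). It only flags exact linear dependence; variables that are strongly but not perfectly related, such as T stage and overall stage, would need a correlation-based check instead.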
+ +```{r} + +library(glmnet) + +set.seed(123) #to ensure consistency of results -#making sure that the variables are represented in our dataset we are going to use for the lasso -table(unique_subset_data$ctDNA_ever) -table(unique_subset_data$demo_race_final) - -#most of these are factors or characters -sapply(unique_subset_data[, c("age_at_diag", "final_receptor_group", "demo_race_final", - "final_tumor_grade", "final_overall_stage", - "final_t_stage", "final_n_stage", - "histology_category", "prtx_radiation", - "prtx_chemo", "prtx_endo", "prtx_bonemod", - "node_status", "axillary_dissection", - "diag_surgery_type_1", "diag_neoadj_chemo_1")], class) - -#converting factors/ characters to numeric -unique_subset_data$final_receptor_group <- as.numeric(as.factor(unique_subset_data$final_receptor_group)) -unique_subset_data$demo_race_final <- as.numeric(as.factor(unique_subset_data$demo_race_final)) -unique_subset_data$final_tumor_grade <- as.numeric(as.factor(unique_subset_data$final_tumor_grade)) -unique_subset_data$final_overall_stage <- as.numeric(as.factor(unique_subset_data$final_overall_stage)) -unique_subset_data$final_t_stage<- as.numeric(as.factor(unique_subset_data$final_t_stage)) -unique_subset_data$final_n_stage <- as.numeric(as.factor(unique_subset_data$final_n_stage)) -unique_subset_data$histology_category <- as.numeric(as.factor(unique_subset_data$histology_category)) -unique_subset_data$prtx_radiation <- as.numeric(as.factor(unique_subset_data$prtx_radiation)) -unique_subset_data$prtx_chemo <- as.numeric(as.factor(unique_subset_data$prtx_chemo)) -unique_subset_data$prtx_endo <- as.numeric(as.factor(unique_subset_data$prtx_endo)) -unique_subset_data$prtx_bonemod <- as.numeric(as.factor(unique_subset_data$prtx_bonemod)) -unique_subset_data$node_status <- as.numeric(as.factor(unique_subset_data$node_status)) -unique_subset_data$axillary_dissection <- as.numeric(as.factor(unique_subset_data$axillary_dissection)) -unique_subset_data$diag_surgery_type_1 <- 
as.numeric(as.factor(unique_subset_data$diag_surgery_type_1)) -unique_subset_data$diag_neoadj_chemo_1 <- as.numeric(as.factor(unique_subset_data$diag_neoadj_chemo_1)) - - -#checking NAs - -# Check for NAs in the dataset -- none! -sum(is.na(unique_subset_data[, c("age_at_diag", "final_receptor_group", "demo_race_final", - "final_tumor_grade", "final_overall_stage", - "final_t_stage", "final_n_stage", - "histology_category", "prtx_radiation", - "prtx_chemo", "prtx_endo", "prtx_bonemod", - "node_status", "axillary_dissection", - "diag_surgery_type_1", "diag_neoadj_chemo_1")])) - -### let's try LASSO again # Prepare the response variable y <- unique_subset_data$ctDNA_ever -#making matrix / in the right form -- appears to be all numeric data -X1 <- as.matrix(unique_subset_data[, c("age_at_diag", "final_receptor_group", "demo_race_final", - "final_tumor_grade", "final_overall_stage", - "final_t_stage", "final_n_stage", - "histology_category", "prtx_radiation", - "prtx_chemo", "prtx_endo", "prtx_bonemod", - "node_status", "axillary_dissection", - "diag_surgery_type_1", "diag_neoadj_chemo_1")]) +#the same 4 missing observations would affect X3, but they were already imputed in unique_subset_data above, so no further imputation is needed -lasso_model <- glmnet(X1, y, family = "binomial", alpha = 1) # alpha = 1 for lasso -#when I initially fit this model, I got a NAs induced by coercion error message and realized that a number of these are non-numeric --so above converted a bunch of variables to numeric that were initially factors - -#Cross-validation. This approach performs k-fold cross validation to find the best lambda. The smaller the lambda, the more variables are included. Do get warning that one of the groups in the outcome has fewer than 8 observations, so this modeling may not go super well.
-cv_lasso_model <- cv.glmnet(X1, y, family = "binomial", alpha = 1) +### removed node_status as a variable +X3 <- model.matrix(ctDNA_ever ~ -1 + age_at_diag + final_receptor_group + demo_race_final+ + final_tumor_grade + final_overall_stage + + + final_t_stage + final_n_stage + + histology_category + prtx_radiation + + + prtx_chemo + prtx_endo + prtx_bonemod + + axillary_dissection + + diag_surgery_type_1 + diag_neoadj_chemo_1, data = unique_subset_data) + +# Fit lasso model +lasso_model <- glmnet(X3, y, family = "binomial", alpha = 1) # alpha = 1 for lasso + +#Cross-validation. This approach performs k-fold cross validation to find the best lambda. The smaller the lambda, the more variables are included. +cv_lasso_model <- cv.glmnet(X3, y, family = "binomial", alpha = 1) #plotting the results to look at the performance of different lambda values plot(cv_lasso_model) -#getting the best lambda --best lambda 0.0041359 +#getting the best lambda -- 0.048, slightly lower than above best_lambda <- cv_lasso_model$lambda.min print(paste("Best lambda:", best_lambda)) #Finding the final fit model with the optimal lambda -final_lasso_model <- glmnet(X1, y, family = "binomial", alpha = 1, lambda = best_lambda) +paired_down_lasso <- glmnet(X3, y, family = "binomial", alpha = 1, lambda = best_lambda) -#Which coefficents are included in the model. Only age_at_diag appears to be significant currently, but the coefficient is 0 suggesting it has very little effect. -coef(final_lasso_model) +#Which coefficients are included in the model. +coef(paired_down_lasso) +``` -#### Repeating Lasso without all of the variables (excluding some that we would expect to be colinear such as NACT and chemo. we will exclude chemo as a majority of individuals had chemo. We will also exclude node_status as this is a derivative/summary variable from final_n_stage) --> X2
We will also exclude node_status as this is a derivative/summary variable from final_n_stage) --> X2 +When we use the paired down lasso model for ctDNA positivity (removed nodal positivity), we see that T stage and N stage remain the only significant factors, and that higher nodal status is the most influential on ctDNA positivity. The lambda for this model is 0.048 which is lower than the prior model. -y <- unique_subset_data$ctDNA_ever -X2<- as.matrix(unique_subset_data[, c("age_at_diag", "final_receptor_group", "demo_race_final", - "final_tumor_grade", "final_overall_stage", - "final_t_stage", "final_n_stage", - "histology_category", "prtx_radiation", - "prtx_endo", "prtx_bonemod", - "axillary_dissection", - "diag_surgery_type_1", "diag_neoadj_chemo_1")]) +It is, however, somewhat challenging to model ctDNA positivity using any of these approaches because there were only 9 individuals in this cohort of 109 individuals with positive results. Because of this low N, it is hard to know exactly what to do with these predictors. -lasso_model <- glmnet(X2, y, family = "binomial", alpha = 1) # alpha = 1 for lasso -#when I initially fit this model, I got a NAs induced by coercion error message and realized that a number of these are non-numeric --so above converted a bunch of variables to numeric that were initially factors +The intercept (-2.76) is the log-odds of the outcome (ctDNA positivity or DTC positivity) when all the predictor variables are zero. The coefficients can be interpreted as the amount/times the log odds increases (or decreases) for that cohort, holding all other variables equal. + +To test our proof of principle approach that lasso can be applied to this dataset and perhaps generate more robust results, we will also look at DTC predictors, as DTC positivity was more frequent in this cohort and we therefore suspect the modeling approach may work better. + +```{r} + + +set.seed(123) + +#### DTC predictions. 
+ +subset_data <- subset_data[!duplicated(subset_data$participant_id), ] +dtc_unique_subset_data <- merge(unique_subset_data, subset_data[, c("participant_id", "dtc_ever")], by = "participant_id", all.x = TRUE) + +nrow(dtc_unique_subset_data) # Should still be 109 +table(dtc_unique_subset_data$dtc_ever, useNA = "ifany") # Check for NA values + +#run the lasso for DTC status. This might work better as there are more DTC + results +# merge() sorts rows by participant_id, so re-align dtc_ever to the row order of unique_subset_data (and therefore of X2) +y1 <- dtc_unique_subset_data$dtc_ever[match(unique_subset_data$participant_id, dtc_unique_subset_data$participant_id)] +#use the same X2 as it has the same predictors we are interested in + +dim(X2) # Rows should match nrow(dtc_unique_subset_data) +length(y1) # Should also match nrow(dtc_unique_subset_data). We have the same # (109)! + +lasso_model <- glmnet(X2, y1, family = "binomial", alpha = 1) # alpha = 1 for lasso. 0 for ridge. - #Cross-validation. This approach performs k-fold cross validation to find the best lambda. The smaller the lambda, the more variables are included. -cv_lasso_model <- cv.glmnet(X2, y, family = "binomial", alpha = 1) +cv_lasso_model <- cv.glmnet(X2, y1, family = "binomial", alpha = 1) #plotting the results to look at the performance of different lambda values plot(cv_lasso_model) -#getting the best lambda -- lambda is 0.06 +#getting the best lambda -- best lambda is 0.024, even lower! best_lambda <- cv_lasso_model$lambda.min -print(paste("Best lambda:", best_lambda)) +print(paste("Best lambda:", best_lambda)) #Finding the final fit model with the optimal lambda -final_lasso_model <- glmnet(X2, y, family = "binomial", alpha = 1, lambda = best_lambda) +final_lasso_model <- glmnet(X2, y1, family = "binomial", alpha = 1, lambda = best_lambda) -#Which coefficents are included in the model. Only age_at_diag appears to be significant currently, but the coefficient is 0 suggesting it has very little effect. +#Which coefficients are included in the model.
For the model with DTC positivity, we get more coefficients retained in the model. coef(final_lasso_model) -### tutorial to use -- https://www.statology.org/lasso-regression-in-r/ + ``` -It is somewhat challenging to model ctDNA positivity using any of these approaches because there were only 9 individuals in this cohort with positive results. Because of this low N, it is hard to identify predictors. +For the LASSO model with DTC positivity, many more coefficients are retained and the selected lambda is lower, meaning that cross-validation favored less shrinkage; lambda itself is not a measure of model fit, so the cross-validated deviance is a better basis for comparing models. The most notable factors in the LASSO for DTC positivity are higher T stage (with T4 associated with the largest increase in the log-odds of DTC positivity), triple-positive status (HR+ HER2+), which has a strong negative association with DTC positivity (though only a handful of people in this cohort met these criteria), and ductal histology. Other influential factors in the LASSO are node negativity, radiation history, bone-modifying treatment, mastectomy, and neoadjuvant therapy. We will try this modeling for DTC positivity without our nodal status variable, as it is likely collinear with node positivity, to see how our model changes. -The intercept (-2) is the log-odds of the outcome (dtc_ever, aka dtc positivity) when all the predictor variables are zero. The coefficients can be interpreted as such -- for age at diagnosis, for every 1 unit increase in age, the log-odds of testing DTC positive decreases by 0.0086, holding all other variables constant. This is a relatively small decrease though. +```{r} +#### DTC LASSO without final_n_stage in it -To test our proof of principle approach that lasso can be applied to this dataset, we will also look at DTC predictors, as DTC positivity was more frequent in this cohort and we therefore suspect the modeling approach may work better. +set.seed(123) -``` {r} +#run the lasso for DTC status.
This might work better as there are more DTC + results +# merge() sorts rows by participant_id, so re-align dtc_ever to the row order of unique_subset_data (and therefore of X4) +y1 <- dtc_unique_subset_data$dtc_ever[match(unique_subset_data$participant_id, dtc_unique_subset_data$participant_id)] -#### DTC predictions. -table(subset_data$dtc_ever) #dtc_ever is in subset_data (but not yet in the unique_subset_data) -# Merge the datasets to include dtc_ever by participant_id -unique_subset_data <- merge(unique_subset_data, subset_data[, c("participant_id", "dtc_ever")], by = "participant_id", all.x = TRUE) -table(unique_subset_data$dtc_ever) #dtc_ever is now in unique_subset data +### removed final_n_stage as a variable +# model.matrix() does not include the response (ctDNA_ever) as a column, so only the predictors are used +X4 <- model.matrix(ctDNA_ever ~ -1 + age_at_diag + final_receptor_group + demo_race_final+ + final_tumor_grade + final_overall_stage + + + final_t_stage + + histology_category + prtx_radiation + + + prtx_chemo + prtx_endo + prtx_bonemod + + axillary_dissection + node_status + + diag_surgery_type_1 + diag_neoadj_chemo_1, data = unique_subset_data) + + + +dim(X4) # Rows should match nrow(dtc_unique_subset_data) +length(y1) # Should also match nrow(dtc_unique_subset_data). We have the same # (109)! + +lasso_model <- glmnet(X4, y1, family = "binomial", alpha = 1) # alpha = 1 for lasso. 0 for ridge. -#run the lasso for DTC status. This might work better as there are more DTC + results -y1 <- unique_subset_data$dtc_ever -X3 <- as.matrix(unique_subset_data[, c("age_at_diag", "final_receptor_group", "demo_race_final", - "final_tumor_grade", "final_overall_stage", - "final_t_stage", "final_n_stage", - "histology_category", "prtx_radiation", - "prtx_endo", "prtx_bonemod", - "axillary_dissection", - "diag_surgery_type_1", "diag_neoadj_chemo_1")]) - -lasso_model <- glmnet(X3, y1, family = "binomial", alpha = 1) # alpha = 1 for lasso. 0 for ridge. - - #Cross-validation. This approach performs k-fold cross validation to find the best lambda. The smaller the lambda, the more variables are included.
-cv_lasso_model <- cv.glmnet(X3, y1, family = "binomial", alpha = 1) +cv_lasso_model <- cv.glmnet(X4, y1, family = "binomial", alpha = 1) #plotting the results to look at the performance of different lambda values plot(cv_lasso_model) -#getting the best lambda -- best lambda is 0.00345 (better than with ctDNA) +#getting the best lambda -- best lambda is 0.027, similar to the model above best_lambda <- cv_lasso_model$lambda.min print(paste("Best lambda:", best_lambda)) @@ -3527,15 +3459,24 @@ final_lasso_model <- glmnet(X3, y1, family = "binomial", alpha = 1, lambda = bes coef(final_lasso_model) + ``` -For the LASSO model with DTC positivity, we get more coefficients retained in the model and a lower lambda, which suggests a better overall model. Most notable factors in the LASSO for DTC positivity are the influence of axillary dissection (or none) on the log-odds of dtc positivity. Other influential factors using LASSO are surgery type (with mastectomy vs lumpectomy increasing the log odds of DTC positivity) and neoadjuvant chemotherapy (with the presence of NACT increasing the log odds of DTC positivity). +In this last LASSO, in which we removed node_status to assess only the more granular final_n_stage (N1 vs N2 vs N3, etc.), a few more variables became more influential. T4 stage, ductal histology, and receptor status maintained their strong relationships with DTC positivity, and several other variables maintained weaker relationships (including grade, nodal status, race, bone-modifying treatment, mastectomy, and neoadjuvant therapy, all of which increased the log-odds of DTC positivity). Axillary dissection was negatively associated with DTC positivity, but only weakly.
We will choose these models without the node_status variable: for both the ctDNA and DTC outcomes, their selected lambdas are about the same as or lower than those of the models including node_status, and they make more intuitive sense than models that include two very similar variables representing the same information in different ways. + +## Conclusion {#sec-conclusion} +In this cohort of 109 individuals on the SURMOUNT study, DTC positivity occurred more frequently (in around 30% of individuals) than ctDNA positivity, which occurred in \< 10% of patients either at baseline or during surveillance. Despite low numbers, there was good concordance between ctDNA and DTC positivity (a concordance of 0.8 when accounting for timepoint). -Among the 62 pts who remained DTC-, 5 (8%) were ctDNA+ (with 5/5 who recurred), and 57 remained ctDNA- (of whom 5/57 recurred). All ctDNA positivity in DTC+ pts occurred at the time of or after DTC positivity. Over median follow-up (f/u) of 65 months (m), BC recurrence occurred in 14/96 pts (15%), with 2 locoregional-only and 12 distant +/- locoregional recurrences (involving the bone, liver, lung/pleura, and brain); 8/14 pts (57%) were ctDNA+ prior to relapse. 7/12 (58%) with distant recurrences were ctDNA+ prior to metastatic diagnosis, at a median lead time of 15 m (range 0 – 25). Overall, ctDNA+ pts experienced a median lead time from ctDNA positivity to recurrence of 13 m (range 0 – 25). Only 1 of 9 ctDNA+ pts has not recurred; this pt was DTC+ and went on therapeutic trial, without evidence of recurrence over 20 m f/u. 30/34 DTC+ pts (89%) who went on therapeutic trial have not had ctDNA detected during f/u and have not recurred. Overall, ctDNA status was significantly associated with relapse (p\<0.01), with a PPV of 89% and NPV of 93%.
Of the 24 BL DTC+ pts, 2 became ctDNA+ at subsequent timepoints, an average of 18 m after DTC assessment, and both relapsed (3 and 5 m from ctDNA detection, respectively). +In assessing predictors of ctDNA positivity, we found that higher T stage and N stage were the strongest predictors of ctDNA positivity (with age at diagnosis, HR+ and HER2+ status, lobular histology, and lower grade also retained as predictors). The selected lambda for this model is 0.048. -Describe your results and include relevant tables, plots, and code/comments used to obtain them. You may refer to the @sec-methods as needed. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you'd like, but this is not required. +In assessing predictors of DTC positivity using LASSO, we identified several factors, with ductal histology, higher T stage (larger tumor size), and HER2-negative status most strongly associated with DTC positivity. Other factors associated with DTC positivity in the multivariable models reflected more extensive treatment (mastectomy, radiation, and neoadjuvant therapy). Interestingly, nodal positivity appeared to be negatively associated with DTC positivity. The selected lambda for this model is 0.027. -## Conclusion {#sec-conclusion} +It is worth noting that the ctDNA model in particular is challenging to interpret given the low number of ctDNA-positive individuals (n=9). + +Overall, ctDNA status was significantly associated with relapse (p<0.01), with a PPV of 89% and NPV of 94% (and a specificity for relapse of 0.99). DTC positivity was NOT significantly associated with relapse, and the sensitivity and specificity of DTC testing for relapse were challenging to interpret because all DTC-positive patients in this cohort went on to interventional trials aimed at eliminating dormant cancer cells.
The negative predictive value of DTC assessment was high (0.86), suggesting that this test may be useful in identifying individuals at lower risk of relapse. + +Future directions include assessing the test characteristics of DTC assessment in the full cohort of patients on SURMOUNT to date (n=220), including the incremental value of repeated testing; obtaining ctDNA assessment for this full cohort; performing survival analyses to assess the lead time from DTC and ctDNA assessment to clinical events (relapse, death); and examining the fluctuation of ctDNA positivity among patients on clinical trials who had frequent testing during (and after) therapy. + +We had several limitations. First, some data were missing (though missingness was low for the variables of interest in this analysis). Second, our LASSO models include collinear variables, i.e., variables that represent different ways of capturing tumor or disease aggressiveness (such as T stage and N stage, which directly feed into overall stage). The LASSO does not account for this structure: among highly correlated predictors it tends to retain one somewhat arbitrarily, and it treats each level of a multi-level factor as a separate variable. As a next step, we will try group lasso, which selects or drops all levels of a factor together. Third, we had limited power to build a predictive model for ctDNA in particular, given the rarity of positivity in our cohort (though this rate matches the positivity rate in other cohort studies). -In this study, X was associated with Y.
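The PPV, NPV, and specificity figures quoted in the conclusion all come from the same 2x2 table of biomarker result versus relapse. A minimal sketch of those definitions in R follows; `diagnostic_metrics()` is a hypothetical helper, and the counts are made up for illustration, not taken from the SURMOUNT data.

```r
# Hypothetical helper (not from the original analysis): test characteristics
# from the four cells of a 2x2 table of biomarker result versus relapse
diagnostic_metrics <- function(tp, fp, fn, tn) {
  c(
    sensitivity = tp / (tp + fn),  # relapsed patients who tested positive
    specificity = tn / (tn + fp),  # relapse-free patients who tested negative
    ppv         = tp / (tp + fp),  # positive tests followed by relapse
    npv         = tn / (tn + fn)   # negative tests with no relapse
  )
}

# Made-up counts for illustration only, NOT the SURMOUNT results
metrics <- diagnostic_metrics(tp = 10, fp = 5, fn = 10, tn = 75)
round(metrics, 3)
```

Sensitivity and specificity describe the test itself, whereas PPV and NPV also depend on how common relapse is in the cohort being tested.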
From b5ca287838944ec44e42e91da4ef501e53a7a0df Mon Sep 17 00:00:00 2001 From: ntaranto Date: Mon, 9 Dec 2024 14:47:04 -0500 Subject: [PATCH 09/14] Rename FinalProject.html to FinalProjectTaranto.html --- FinalProject.html | 16937 ------------------------------------- FinalProjectTaranto.html | 1 + 2 files changed, 1 insertion(+), 16937 deletions(-) delete mode 100644 FinalProject.html create mode 100644 FinalProjectTaranto.html diff --git a/FinalProject.html b/FinalProject.html deleted file mode 100644 index dfc896185..000000000 --- a/FinalProject.html +++ /dev/null @@ -1,16937 +0,0 @@ -Predictors of ctDNA positivity

Predictors of ctDNA positivity


BMIN503/EPID600 Final Project

Author

Eleanor Taranto


1 Overview


Link to my final project: https://github.com/ntaranto/BMIN503_Final_Project


After treatment for early-stage breast cancer, recurrence remains a problem in up to 30% of individuals, even though these patients complete definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy). It remains unclear who will experience disease recurrence and who will not, and whether biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA (ctDNA) can serve as risk factors for recurrence. Historically, in retrospective, observational studies, these markers of minimal residual disease (MRD) have demonstrated strong associations with recurrence and disease relapse. However, it remains unclear whether these associations persist under modern treatment paradigms, what the time course of positivity is, and which clinical risk factors put patients at greatest risk of DTC or ctDNA positivity and subsequent relapse, including which factors most strongly predict biomarker positivity.

-

Optimizing detection, intervention, and surveillance for MRD after breast cancer treatment would fundamentally change breast cancer follow-up from "watchful waiting" to a proactive approach. Highly sensitive, real-time assays could identify patients with dormant tumor cells and patients whose tumor cells are reactivating on the way to recurrence. This would enable ongoing monitoring for subclinical disease and suppression or eradication of cells that could potentially metastasize. Ultimately, an approach using MRD biomarkers could reassure patients with definitively negative MRD testing that they are unlikely to relapse. It could also enable detection and treatment strategies for patients in whom MRD is detectable, and thereby prevent recurrence and subsequent death from metastatic breast cancer while preserving quality of life for millions of breast cancer survivors around the world.

-

In the SURMOUNT study, we have followed patients who completed definitive treatment for early stage breast cancer for recurrence. We have also obtained bone marrow and peripheral blood samples to look for disseminated tumor cells (DTCs) in the bone marrow by immunohistochemistry (DTC-IHC) and for circulating tumor DNA (ctDNA) in the blood. Patients are tested for these markers over multiple years after their breast cancer diagnosis and are followed clinically for relapse and survival events. The goals of this study are:

1. to assess the relationship between ongoing MRD surveillance (using standard DTC-IHC and a tumor-informed ctDNA assay) and recurrence;
2. to assess the trajectory of MRD over time and the risk factors for DTC and ctDNA positivity;
3. to optimize the type and number of tests needed to predict recurrence; and
4. to further evaluate the long-term impact of therapeutic interventions aimed at eliminating DTCs.

-

In this specific analysis, we will look at the clinical predictors of ctDNA and DTC positivity to understand the strongest drivers of MRD in patients with high-risk early breast cancer who have finished definitive treatment. To do so, we will examine the clinical characteristics, DTC and ctDNA results, and clinical follow-up of the patients enrolled in the SURMOUNT (Surveillance Markers of Utility for Recurrence after (Neo)adjuvant Therapy for Breast Cancer) study between 2016 and 2021 who had ctDNA assessment performed.
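One common way to relate clinical characteristics to a binary biomarker outcome is a multivariable logistic regression. The sketch below fits one with base R's `glm()` on simulated data. The variable names (`stage`, `receptor`, `grade3`, `ctdna_pos`) are hypothetical stand-ins, and the effect sizes are invented for illustration. This is a plain unpenalized model meant only to show the structure. The analysis itself used a LASSO for variable selection, so its results will differ.

```r
# Minimal sketch: multivariable logistic regression for biomarker positivity.
# All data are simulated; variable names are hypothetical stand-ins.
set.seed(2024)
n <- 200
sim <- data.frame(
  stage    = factor(sample(1:3, n, replace = TRUE)),
  receptor = factor(sample(c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+"),
                           n, replace = TRUE)),
  grade3   = rbinom(n, 1, 0.5)
)
# Simulated outcome whose log-odds rise with stage and grade (illustration only)
lin <- -2 + 0.7 * (as.integer(sim$stage) - 1) + 0.5 * sim$grade3
sim$ctdna_pos <- rbinom(n, 1, plogis(lin))

fit <- glm(ctdna_pos ~ stage + receptor + grade3, family = binomial, data = sim)

# Odds ratios with Wald 95% confidence intervals
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
round(or_table, 2)
```

`confint.default()` returns Wald intervals. Profile-likelihood intervals from `confint()` are another option, but they are slower to compute.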

-

For this analysis, I have consulted two mentors. Dr. Nicholas Seewald is my biostatistics mentor. Dr. Angela DeMichele is my epidemiology mentor, a breast oncologist with expertise in biomarkers of recurrence, and the PI of the SURMOUNT study. Dr. Seewald has been critical in shaping the approach to multivariable analysis of the clinical predictors of ctDNA and DTC positivity. He has also guided the survival analyses of the association between biomarker positivity and relapse. Dr. DeMichele has been instrumental in developing the SURMOUNT study design and in designing the analysis of this initial surveillance cohort. She has also shaped our thinking about clinical predictors of positivity and, more broadly, about biomarkers of breast cancer recurrence and dormancy.

-
-
-

2 Introduction

-

Breast cancer is the most prevalent cancer since it is both common and treatable, with 5-year survival rates approaching 90%. Of the 14.5 million cancer survivors alive today in the U.S., nearly 1 in 4 are breast cancer survivors. Unfortunately, up to 30% of patients who receive adjuvant therapy for breast cancer will ultimately recur and die from their disease, typically as a consequence of metastatic recurrence. Since recurrent breast cancer is incurable, the propensity of cancers to recur following treatment is arguably the most important determinant of clinical outcome. Recurrent breast cancers arise from the pool of breast cancer cells that remain after initial treatment, likely in the form of minimal residual disease (MRD): local and disseminated residual tumor cells that survive in their host in a presumed dormant state following treatment of the primary breast cancer. The development of incurable metastatic disease is thought to be due to this persistent pool of residual disease resulting from escape of cells from the primary tumor, intravasation and survival in the circulation, and eventual extravasation and metastatic seeding. Many breast cancers pass through a latent phase in which disseminated tumor cells (DTCs) may persist in niches where they may reside in a dormant state for months to decades. These DTCs are thought to exist in a temporary quiescent state of reversible cell-cycle arrest, from which some cells may eventually reactivate, resume proliferation and recirculate, at which point they can be detected as circulating tumor DNA (ctDNA) in the blood. Longitudinal studies demonstrate that the detection of DTCs in the bone marrow in such patients is associated with poorer disease-free, breast cancer recurrence-free, and overall survival compared to patients without DTCs. 
Preclinical studies have implicated several therapeutically targetable mechanisms in this process. The research group in the 2-PREVENT Breast Cancer Translational Center of Excellence (TCE) has developed several interventional trials aimed at targeting these DTCs.

-

However, it remains unclear exactly how the presence of DTCs and/or ctDNA predicts relapse in the era of modern breast cancer treatment, including chemotherapy, immunotherapy, surgery, targeted therapy, and radiation. Open questions include:

- who will develop DTC/ctDNA positivity;
- in which DTC-positive patients these cells will reactivate;
- whether and when DTC positivity leads to ctDNA positivity; and
- which patients with these markers will relapse and develop metastatic disease.

In the SURMOUNT surveillance study, patients with early stage (i.e., curable) but high-risk breast cancer are enrolled. They undergo a baseline bone marrow aspirate (BMA) for evaluation of DTCs by immunohistochemistry (IHC), and they also give peripheral blood samples for retrospective ctDNA assessment. Patients who screen DTC positive, either at baseline or on yearly surveillance BMA, are referred to interventional trials aimed at eliminating dormant cells before clinical relapse. Patients who screen DTC negative remain in the SURMOUNT surveillance cohort and undergo yearly DTC assessment and peripheral blood collection for ctDNA assessment. All patients are followed for recurrence events and survival. The first intervention trial, CLEVER, completed enrollment in 2021. This initial analysis therefore focuses on the patients enrolled on SURMOUNT to accrue that first trial.

-

Despite years of progress in breast cancer diagnostics and therapeutics, two problems remain: identifying the individuals at risk of recurrence, and figuring out how to manage and minimize their elevated risk. In this study, we sought to assess the clinical validity of DTC and ctDNA assessment and to better understand the population in which these tests may be useful. Specifically, in this analysis, we examined overall rates of ctDNA and DTC positivity in this cohort and the clinical factors associated with each.
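The data are in long format, with one row per patient per timepoint. Because of this, a patient-level positivity rate first needs an "ever positive" flag for each patient. The sketch below shows one way to compute that flag in base R. It uses a toy data frame with the column names `ctDNA_detected` and `dtc_ihc_result_final` from the dataset. Two coding choices are assumptions made for this example: that `dtc_ihc_result_final` uses 1 for positive, and that a blank `ctDNA_detected` means no result.

```r
# Toy long-format data: one row per patient per timepoint (hypothetical values).
# ctDNA_detected is stored as text, as in str(d); "" marks a timepoint without
# a ctDNA result. dtc_ihc_result_final is assumed to be 1 = positive, 0 = negative.
long <- data.frame(
  ID                   = c(1, 1, 1, 2, 2, 3, 4, 4),
  ctDNA_detected       = c("FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "",
                           "FALSE", "FALSE"),
  dtc_ihc_result_final = c(0, 0, 0, 1, 0, 0, 0, NA)
)
long$ctdna_pos <- as.logical(long$ctDNA_detected)  # "" becomes NA
long$dtc_pos   <- long$dtc_ihc_result_final == 1   # NA stays NA

# Ever positive = positive at any timepoint with a result; NA if no results
ever_pos <- function(x) if (all(is.na(x))) NA else any(x, na.rm = TRUE)

per_patient <- data.frame(
  ID        = sort(unique(long$ID)),
  ctdna_pos = as.vector(tapply(long$ctdna_pos, long$ID, ever_pos)),
  dtc_pos   = as.vector(tapply(long$dtc_pos, long$ID, ever_pos))
)

# Patient-level positivity rates among patients with at least one result
rates <- c(ctdna = mean(per_patient$ctdna_pos, na.rm = TRUE),
           dtc   = mean(per_patient$dtc_pos, na.rm = TRUE))
rates
```

Patients with no result at any timepoint get `NA`, so they drop out of the denominator rather than being counted as negative.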

-
-
-

3 Methods

-

“PENN SURMOUNT”: SURMOUNT is a single-center, prospective, longitudinal cohort study. It examines MRD biomarkers among patients within 5 years of breast cancer (BC) diagnosis who have completed all curative treatment except endocrine therapy. Eligible patients must have had one of the following:

1. TNBC;
2. HER2+ or HR+ BC with positive lymph nodes and/or residual disease after neoadjuvant therapy; or
3. HR+ BC with a 21-gene Recurrence Score >25 and/or a high-risk MammaPrint result.

Patients had annual bone marrow aspirates (BMA) for DTCs by immunohistochemistry (using the methods of Naume et al.). DTC+ patients went on a therapeutic trial, and DTC- patients had up to 5 years of annual BMA and blood testing. ctDNA was retrospectively assessed using the RaDaR assay (NeoGenomics). This assay targets patient-specific somatic mutations identified by whole-exome sequencing (WES) of primary tumor tissue.
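The eligibility rules above reduce to a small boolean expression. The sketch below encodes them as an R function to make the logic explicit. The function and all argument names are hypothetical and do not correspond to REDCap field names. `curative_tx_done` stands for "all curative treatment except endocrine therapy completed". A missing 21-gene score is treated as not meeting the >25 threshold.

```r
# Hypothetical encoding of the SURMOUNT eligibility criteria described above.
# Argument names are illustrative only; inputs are logical except rs21 (numeric).
is_eligible <- function(within_5y, curative_tx_done, tnbc, her2_pos, hr_pos,
                        ln_pos, residual_after_neoadj, rs21 = NA,
                        mammaprint_high = FALSE) {
  # Criterion 1: triple-negative breast cancer
  crit1 <- tnbc
  # Criterion 2: HER2+ or HR+ with positive nodes and/or residual disease
  crit2 <- (her2_pos | hr_pos) & (ln_pos | residual_after_neoadj)
  # Criterion 3: HR+ with 21-gene Recurrence Score > 25 and/or high-risk
  # MammaPrint; a missing score counts as not meeting the threshold
  crit3 <- hr_pos & ((!is.na(rs21) & rs21 > 25) | mammaprint_high)
  # Must also be within 5 years of diagnosis with curative treatment complete
  within_5y & curative_tx_done & (crit1 | crit2 | crit3)
}

# Example: HR+, node-negative, no residual disease, Recurrence Score 31
is_eligible(TRUE, TRUE, tnbc = FALSE, her2_pos = FALSE, hr_pos = TRUE,
            ln_pos = FALSE, residual_after_neoadj = FALSE, rs21 = 31)
```

Because the function uses the vectorized `&` and `|` operators, it can be applied to whole columns at once.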

-

Data Collection and Merge: NeoGenomics, Inc. first developed a bespoke panel from each patient's tumor tissue. It then performed ctDNA assessment on peripheral blood from 109 patients and returned the results to the research team in .csv format; the last data drop occurred on July 30, 2024. DTC assessment was performed on bone marrow aspirates, and the research team entered the results into a REDCap database through the same follow-up date. The research team abstracted clinical, demographic, and follow-up data through October 2024 and entered them into the same REDCap database. The TCE data manager exported the data in mid-October 2024 and merged them with the ctDNA data before hand-off for this analysis. The final locked and merged dataset, “surmount184_merged_20241108.csv”, is maintained in the TCE box for the ctDNA analysis, and a copy is stored in FinalProject_files.
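The merge step keeps every REDCap row and attaches ctDNA results where they exist. The sketch below does this as a left join with base R's `merge()` on toy data frames. The key columns `surmount_id` and `timepoint` both appear in the merged file, but using them as the merge keys is an assumption. The data manager's actual merge procedure is not described here.

```r
# Toy REDCap export: one row per patient per timepoint (hypothetical values)
redcap <- data.frame(
  surmount_id          = c("A-01", "A-01", "A-02"),
  timepoint            = c("SURMOUNT-Baseline", "Year 1 Follow Up",
                           "SURMOUNT-Baseline"),
  dtc_ihc_result_final = c(0, 0, 1)
)
# Toy ctDNA results from the vendor .csv (hypothetical values)
ctdna <- data.frame(
  surmount_id    = c("A-01", "A-01"),
  timepoint      = c("SURMOUNT-Baseline", "Year 1 Follow Up"),
  ctDNA_detected = c("FALSE", "TRUE")
)

# Left join: keep every REDCap row; rows without a ctDNA result get NA
merged <- merge(redcap, ctdna, by = c("surmount_id", "timepoint"), all.x = TRUE)
merged
```

A left join (`all.x = TRUE`) keeps patients who were never ctDNA-tested. This matches the merged file, which still contains all 184 screened patients.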

-

First, we import the CSV file of final data, entitled “surmount184_merged_20241108.csv”.

-
-
library(here)
-
-
here() starts at /Users/NoraTaranto/BMIN503_Final_Project
-
-
library(dplyr) 
-
-

-Attaching package: 'dplyr'
-
-
-
The following objects are masked from 'package:stats':
-
-    filter, lag
-
-
-
The following objects are masked from 'package:base':
-
-    intersect, setdiff, setequal, union
-
-
d <- read.csv(file = here("data",
-                          "surmount184_merged_20241108.csv"))
-
-

Next, before limiting the data to the 109 patients who had ctDNA tested (out of the 184 individuals in the initial CLEVER trial screening group), we look at the names and structure of the 387 variables in the dataset “d”. Most of these are clinical variables, often categorical dummy variables. Some are outcome variables related to local relapse, distant relapse, and survival, and others come from the pathology report of the DTC assessment. This list will help us identify the important factors to include in our multivariable model predicting positivity of these markers. We also look at the structure of the variables, since we may need to reformat some of them for analysis. Note that “d” has one row per patient per timepoint, so each patient can contribute several rows.
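Two quick checks help when reading the variable list below. The first compares rows with unique patients, since patients appear once per timepoint. The second groups REDCap checkbox exports, which come out as one 0/1 column per option named `field___code` (for example, `demo_race___1`). The sketch below runs both checks on a small toy data frame shaped like “d”. On “d” itself, the same lines would show how the rows map onto the 184 patients and which dummy-variable families exist.

```r
# Toy data frame shaped like `d` (hypothetical values): one row per patient
# per timepoint, with REDCap checkbox columns named field___code
toy <- data.frame(
  ID            = c(16001, 16001, 16002),
  demo_race___1 = c(0, 0, 1),
  demo_race___5 = c(1, 1, 0),
  final_t_stage = c(2, 2, 1)
)

# Rows versus unique patients (long format means these differ)
n_rows     <- nrow(toy)
n_patients <- length(unique(toy$ID))

# Checkbox (dummy) columns, and the field each one belongs to
checkbox_cols     <- grep("___", names(toy), value = TRUE)
checkbox_families <- unique(sub("___.*$", "", checkbox_cols))
checkbox_families
```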

-
-
#looking at the names of the variables, and the structure of the variables. 
-names(d) 
-
-
  [1] "ID"                               "trialID"                         
-  [3] "participant_id"                   "patient_id"                      
-  [5] "fu_trial_pid"                     "timepoint"                       
-  [7] "project"                          "surmount_id"                     
-  [9] "panel_id"                         "accession"                       
- [11] "sample_id"                        "collection_date"                 
- [13] "extracted_plasma_volume_ml"       "input"                           
- [15] "input_sample"                     "physical_run_name"               
- [17] "workflow_name"                    "eVAF"                            
- [19] "mutant_molecules"                 "mean_VAF"                        
- [21] "Score"                            "all_pass_variants"               
- [23] "total_variants"                   "n_positive_variants"             
- [25] "ctDNA_detected"                   "ctdna_cohort"                    
- [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
- [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
- [31] "dtc_final_result_date"            "pt"                              
- [33] "bma_date"                         "ORIG_RSLT_DTC"                   
- [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
- [37] "FINAL_COUNT"                      "org_consent_date"                
- [39] "demo_initials"                    "demo_dob"                        
- [41] "demo_sex"                         "demo_ethnicity"                  
- [43] "demo_race___1"                    "demo_race___2"                   
- [45] "demo_race___3"                    "demo_race___4"                   
- [47] "demo_race___5"                    "demo_race___88"                  
- [49] "demo_race___99"                   "demo_race_other"                 
- [51] "prtx_radiation"                   "prtx_rad_start"                  
- [53] "prtx_rad_end"                     "prtx_chemo"                      
- [55] "prtx_endo"                        "prtx_bonemod"                    
- [57] "prior_therapy_complete"           "inc_dx_crit"                     
- [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
- [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
- [63] "final_receptor_group"             "demo_race_final"                 
- [65] "final_histology"                  "final_tumor_grade"               
- [67] "final_overall_stage"              "final_t_stage"                   
- [69] "final_n_stage"                    "fu_date_to"                      
- [71] "fu_surv"                          "fu_date_death"                   
- [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
- [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
- [77] "fu_locreg_prog"                   "fu_locreg_date"                  
- [79] "fu_dist_site_num"                 "fu_dist_site_char"               
- [81] "fu_dist_prog"                     "fu_dist_date"                    
- [83] "censor_date"                      "chemo_indication_1"              
- [85] "chemo_name_1"                     "chemo_name_other_1"              
- [87] "chemo_start_date_1"               "start_date_exact_1"              
- [89] "chemo_end_date_1"                 "end_date_exact_1"                
- [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
- [93] "chemo_indication_2"               "chemo_name_2"                    
- [95] "chemo_name_other_2"               "chemo_start_date_2"              
- [97] "start_date_exact_2"               "chemo_end_date_2"                
- [99] "end_date_exact_2"                 "chemo_notes_2"                   
-[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
-[103] "chemo_name_3"                     "chemo_name_other_3"              
-[105] "chemo_start_date_3"               "start_date_exact_3"              
-[107] "chemo_end_date_3"                 "end_date_exact_3"                
-[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
-[111] "chemo_indication_4"               "chemo_name_4"                    
-[113] "chemo_name_other_4"               "chemo_start_date_4"              
-[115] "start_date_exact_4"               "chemo_end_date_4"                
-[117] "end_date_exact_4"                 "chemo_notes_4"                   
-[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
-[121] "hormone_name___1_1"               "hormone_name___2_1"              
-[123] "hormone_name___3_1"               "hormone_name___4_1"              
-[125] "hormone_name___5_1"               "hormone_name___6_1"              
-[127] "hormone_name___7_1"               "hormone_other_1"                 
-[129] "hormone_start_date_1"             "hormone_ongoing_1"               
-[131] "hormone_end_date_1"               "hormone_notes_1"                 
-[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
-[135] "hormone_name___1_2"               "hormone_name___2_2"              
-[137] "hormone_name___3_2"               "hormone_name___4_2"              
-[139] "hormone_name___5_2"               "hormone_name___6_2"              
-[141] "hormone_name___7_2"               "hormone_other_2"                 
-[143] "hormone_start_date_2"             "hormone_ongoing_2"               
-[145] "hormone_end_date_2"               "hormone_notes_2"                 
-[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
-[149] "hormone_name___1_3"               "hormone_name___2_3"              
-[151] "hormone_name___3_3"               "hormone_name___4_3"              
-[153] "hormone_name___5_3"               "hormone_name___6_3"              
-[155] "hormone_name___7_3"               "hormone_other_3"                 
-[157] "hormone_start_date_3"             "hormone_ongoing_3"               
-[159] "hormone_end_date_3"               "hormone_notes_3"                 
-[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
-[163] "hormone_name___1_4"               "hormone_name___2_4"              
-[165] "hormone_name___3_4"               "hormone_name___4_4"              
-[167] "hormone_name___5_4"               "hormone_name___6_4"              
-[169] "hormone_name___7_4"               "hormone_other_4"                 
-[171] "hormone_start_date_4"             "hormone_ongoing_4"               
-[173] "hormone_end_date_4"               "hormone_notes_4"                 
-[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
-[177] "hormone_name___1_5"               "hormone_name___2_5"              
-[179] "hormone_name___3_5"               "hormone_name___4_5"              
-[181] "hormone_name___5_5"               "hormone_name___6_5"              
-[183] "hormone_name___7_5"               "hormone_other_5"                 
-[185] "hormone_start_date_5"             "hormone_ongoing_5"               
-[187] "hormone_end_date_5"               "hormone_notes_5"                 
-[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
-[191] "hormone_name___1_6"               "hormone_name___2_6"              
-[193] "hormone_name___3_6"               "hormone_name___4_6"              
-[195] "hormone_name___5_6"               "hormone_name___6_6"              
-[197] "hormone_name___7_6"               "hormone_other_6"                 
-[199] "hormone_start_date_6"             "hormone_ongoing_6"               
-[201] "hormone_end_date_6"               "hormone_notes_6"                 
-[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
-[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
-[207] "bonemod_name_1"                   "bonemod_start_date_1"            
-[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
-[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
-[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
-[215] "bonemod_name_2"                   "bonemod_start_date_2"            
-[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
-[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
-[221] "diag_lateral_1"                   "diag_menopause_1"                
-[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
-[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
-[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
-[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
-[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
-[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
-[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
-[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
-[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
-[241] "diag_er_status_1"                 "diag_er_percent_1"               
-[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
-[245] "diag_her2_status_1"               "diag_her2_method_1"              
-[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
-[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
-[251] "diag_pcr_1"                       "diag_surgery_date_1"             
-[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
-[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
-[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
-[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
-[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
-[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
-[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
-[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
-[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
-[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
-[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
-[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
-[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
-[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
-[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
-[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
-[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
-[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
-[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
-[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
-[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
-[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
-[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
-[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
-[301] "mammoprint_1"                     "mammoprint_result_1"             
-[303] "diag_notes_1"                     "diag_date_2"                     
-[305] "diag_lateral_2"                   "diag_menopause_2"                
-[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
-[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
-[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
-[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
-[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
-[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
-[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
-[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
-[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
-[325] "diag_er_status_2"                 "diag_er_percent_2"               
-[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
-[329] "diag_her2_status_2"               "diag_her2_method_2"              
-[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
-[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
-[335] "diag_pcr_2"                       "diag_surgery_date_2"             
-[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
-[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
-[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
-[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
-[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
-[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
-[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
-[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
-[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
-[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
-[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
-[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
-[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
-[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
-[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
-[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
-[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
-[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
-[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
-[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
-[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
-[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
-[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
-[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
-[385] "mammoprint_2"                     "mammoprint_result_2"             
-[387] "diag_notes_2"                    
-
-
str(d)
-
-
'data.frame':   579 obs. of  387 variables:
- $ ID                              : int  16001 16001 16001 16001 16001 16002 16003 16004 16005 16005 ...
- $ trialID                         : int  16001 16001 16001 16001 16001 NA NA 16004 1813 1813 ...
- $ participant_id                  : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
- $ patient_id                      : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
- $ fu_trial_pid                    : chr  "" "" "" "" ...
- $ timepoint                       : chr  "SURMOUNT-Baseline" "Year 1 Follow Up" "Year 2 Follow Up" "Long Term FU 1" ...
- $ project                         : chr  "Penn-SURMOUNT" "Penn-SURMOUNT" "Penn-SURMOUNT" "Penn-SURMOUNT" ...
- $ surmount_id                     : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
- $ panel_id                        : chr  "23A05190P01" "23A05190P01" "23A05190P01" "23A05190P01" ...
- $ accession                       : chr  "23A07639" "23A07640" "23A07641" "23A07642" ...
- $ sample_id                       : chr  "23A0763907_pl" "23A0764006_pl" "23A0764105_pl" "23A0764206_pl" ...
- $ collection_date                 : chr  "03AUG2016" "18SEP2017" "06AUG2018" "11NOV2021" ...
- $ extracted_plasma_volume_ml      : num  5.61 5.15 3.68 4.87 5.05 ...
- $ input                           : int  4999 4999 2775 1550 2725 NA NA 4999 NA NA ...
- $ input_sample                    : int  19996 19996 11100 6200 10900 NA NA 19996 NA NA ...
- $ physical_run_name               : chr  "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" ...
- $ workflow_name                   : chr  "RaDaR" "RaDaR" "RaDaR" "RaDaR" ...
- $ eVAF                            : num  2.07e-10 2.86e-10 5.64e-12 1.08e-09 2.23e-13 ...
- $ mutant_molecules                : int  0 0 0 0 0 NA NA 0 NA NA ...
- $ mean_VAF                        : num  1.48e-05 1.21e-05 7.21e-06 1.82e-05 1.10e-05 NA NA 2.85e-05 NA NA ...
- $ Score                           : num  -1.013 -0.882 -0.74 -0.495 -0.724 ...
- $ all_pass_variants               : int  16 16 16 16 16 NA NA 47 NA NA ...
- $ total_variants                  : int  48 48 48 48 48 NA NA 48 NA NA ...
- $ n_positive_variants             : int  0 0 0 0 0 NA NA 0 NA NA ...
- $ ctDNA_detected                  : chr  "FALSE" "FALSE" "FALSE" "FALSE" ...
- $ ctdna_cohort                    : int  1 1 1 NA NA 0 0 1 0 0 ...
- $ dtc_ihc_date_final              : chr  "03AUG16:00:00:00" "18SEP17:00:00:00" "06AUG18:10:30:00" "" ...
- $ dtc_ihc_cytospinnum_final       : int  10 5 5 NA NA 10 10 10 10 5 ...
- $ dtc_ihc_result_final            : int  0 0 0 NA NA 0 0 0 0 0 ...
- $ dtc_ihc_summary_count_final     : int  0 0 0 NA NA 0 0 0 0 0 ...
- $ dtc_final_result_date           : chr  "08/03/2016" "09/18/2017" "08/06/2018" "" ...
- $ pt                              : int  NA NA NA NA NA NA NA NA NA NA ...
- $ bma_date                        : chr  "" "" "" "" ...
- $ ORIG_RSLT_DTC                   : int  NA NA NA NA NA NA NA NA NA NA ...
- $ ORIG_RSLT_DTC_COUNT             : int  NA NA NA NA NA NA NA NA NA NA ...
- $ FINAL_RESULT                    : int  NA NA NA NA NA NA NA NA NA NA ...
- $ FINAL_COUNT                     : int  NA NA NA NA NA NA NA NA NA NA ...
- $ org_consent_date                : chr  "06/09/2016" "06/09/2016" "06/09/2016" "06/09/2016" ...
- $ demo_initials                   : chr  "LB" "LB" "LB" "LB" ...
- $ demo_dob                        : chr  "09/21/1957" "09/21/1957" "09/21/1957" "09/21/1957" ...
- $ demo_sex                        : int  2 2 2 2 2 2 2 2 2 2 ...
- $ demo_ethnicity                  : int  2 2 2 2 2 2 2 2 2 2 ...
- $ demo_race___1                   : int  0 0 0 0 0 0 0 0 0 0 ...
- $ demo_race___2                   : int  0 0 0 0 0 0 0 0 0 0 ...
- $ demo_race___3                   : int  0 0 0 0 0 0 0 0 0 0 ...
- $ demo_race___4                   : int  0 0 0 0 0 0 0 0 0 0 ...
- $ demo_race___5                   : int  1 1 1 1 1 1 1 1 1 1 ...
- $ demo_race___88                  : int  0 0 0 0 0 0 0 0 0 0 ...
- $ demo_race___99                  : int  0 0 0 0 0 0 0 0 0 0 ...
- $ demo_race_other                 : logi  NA NA NA NA NA NA ...
- $ prtx_radiation                  : int  1 1 1 1 1 1 1 0 1 1 ...
- $ prtx_rad_start                  : chr  "04/29/2014" "04/29/2014" "04/29/2014" "04/29/2014" ...
- $ prtx_rad_end                    : chr  "06/13/2014" "06/13/2014" "06/13/2014" "06/13/2014" ...
- $ prtx_chemo                      : int  1 1 1 1 1 1 1 1 1 1 ...
- $ prtx_endo                       : int  1 1 1 1 1 0 0 1 0 0 ...
- $ prtx_bonemod                    : int  0 0 0 0 0 0 0 0 1 1 ...
- $ prior_therapy_complete          : int  2 2 2 2 2 2 2 2 2 2 ...
- $ inc_dx_crit                     : int  1 1 1 1 1 1 1 1 1 1 ...
- $ inc_dx_crit_list___1            : int  0 0 0 0 0 0 1 0 0 0 ...
- $ inc_dx_crit_list___2            : int  1 1 1 1 1 0 1 0 1 1 ...
- $ inc_dx_crit_list___3            : int  0 0 0 0 0 1 0 0 0 0 ...
- $ inc_dx_crit_list___4            : int  0 0 0 0 0 0 0 0 0 0 ...
- $ final_receptor_group            : int  2 2 2 2 2 4 1 2 4 4 ...
- $ demo_race_final                 : int  5 5 5 5 5 5 5 5 5 5 ...
- $ final_histology                 : chr  "14,3" "14,3" "14,3" "14,3" ...
- $ final_tumor_grade               : int  2 2 2 2 2 2 3 2 0 0 ...
- $ final_overall_stage             : int  3 3 3 3 3 2 2 1 3 3 ...
- $ final_t_stage                   : int  2 2 2 2 2 2 2 1 3 3 ...
- $ final_n_stage                   : int  3 3 3 3 3 0 1 0 3 3 ...
- $ fu_date_to                      : chr  "03/28/2024" "03/28/2024" "03/28/2024" "03/28/2024" ...
- $ fu_surv                         : int  1 1 1 1 1 1 1 1 1 1 ...
- $ fu_date_death                   : chr  "" "" "" "" ...
- $ fu_dec_bc_pres                  : int  NA NA NA NA NA NA NA NA NA NA ...
- $ fu_dec_bc_cause                 : int  NA NA NA NA NA NA NA NA NA NA ...
- $ fu_locreg_site_num              : chr  "" "" "" "" ...
- $ fu_locreg_site_char             : chr  "" "" "" "" ...
- $ fu_locreg_prog                  : int  0 0 0 0 0 0 0 0 0 0 ...
- $ fu_locreg_date                  : chr  "" "" "" "" ...
- $ fu_dist_site_num                : chr  "" "" "" "" ...
- $ fu_dist_site_char               : chr  "" "" "" "" ...
- $ fu_dist_prog                    : int  0 0 0 0 0 0 0 0 0 0 ...
- $ fu_dist_date                    : chr  "" "" "" "" ...
- $ censor_date                     : chr  "03/28/2024" "03/28/2024" "03/28/2024" "03/28/2024" ...
- $ chemo_indication_1              : int  1 1 1 1 1 2 2 1 2 2 ...
- $ chemo_name_1                    : int  2 2 2 2 2 5 2 1 7 7 ...
- $ chemo_name_other_1              : chr  "" "" "" "" ...
- $ chemo_start_date_1              : chr  "10/03/2013" "10/03/2013" "10/03/2013" "10/03/2013" ...
- $ start_date_exact_1              : int  1 1 1 1 1 1 1 1 1 1 ...
- $ chemo_end_date_1                : chr  "01/16/2014" "01/16/2014" "01/16/2014" "01/16/2014" ...
- $ end_date_exact_1                : int  1 1 1 1 1 1 1 1 1 1 ...
- $ chemo_notes_1                   : chr  "" "" "" "" ...
- $ prior_chemotherapy_complete_1   : int  2 2 2 2 2 2 2 2 2 2 ...
- $ chemo_indication_2              : int  NA NA NA NA NA 2 NA NA 3 3 ...
- $ chemo_name_2                    : int  NA NA NA NA NA 3 NA NA 25 25 ...
- $ chemo_name_other_2              : chr  "" "" "" "" ...
- $ chemo_start_date_2              : chr  "" "" "" "" ...
- $ start_date_exact_2              : int  NA NA NA NA NA 1 NA NA 1 1 ...
- $ chemo_end_date_2                : chr  "" "" "" "" ...
- $ end_date_exact_2                : chr  "" "" "" "" ...
-  [list output truncated]
-
-
-

Summary variables: We have identified several important summary variables: final_overall_stage, final_t_stage, final_n_stage, final_receptor_group (1 = 'TNBC', 2 = 'HR+ HER2-', 3 = 'HR+ HER2+', 4 = 'HR- HER2+'), final_tumor_grade, final_histology, demo_race_final, fu_locreg_site_num (numeric codes for the locoregional recurrence site), fu_locreg_site_char (character labels for the locoregional site), fu_dist_site_num (numeric codes for the distant recurrence site), fu_dist_site_char (character labels for the distant site), and censor_date (the most recent fu_date_to among patients who are alive without locoregional or distant progression).
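The coded summary variables are easier to tabulate once they carry labels. The sketch below is a minimal base-R illustration, assuming the receptor-group coding listed above and the "MM/DD/YYYY" format that censor_date shows in str(d). The data frame `toy` and the new columns `receptor_label` and `censor_date_parsed` are hypothetical names, not part of the dataset.

```r
# Minimal sketch: label final_receptor_group and parse censor_date.
# `toy` is a HYPOTHETICAL stand-in for `d`; only the column names and the
# receptor-group coding come from the summary-variable description.
toy <- data.frame(
  participant_id       = c("P1", "P2", "P3", "P4"),
  final_receptor_group = c(2L, 4L, 1L, 3L),
  censor_date          = c("03/28/2024", "03/28/2024", "", "01/15/2023"),
  stringsAsFactors     = FALSE
)

# Receptor group codes: 1 = TNBC, 2 = HR+ HER2-, 3 = HR+ HER2+, 4 = HR- HER2+
toy$receptor_label <- factor(
  toy$final_receptor_group,
  levels = 1:4,
  labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+")
)

# censor_date is stored as a "MM/DD/YYYY" string; with an explicit format,
# empty strings parse to NA instead of raising an error
toy$censor_date_parsed <- as.Date(toy$censor_date, format = "%m/%d/%Y")

# Count participants per receptor group
table(toy$receptor_label)
```

Keeping the numeric code and adding a separate labeled factor leaves the original REDCap-style coding intact for any later merges.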

-

Limiting from the overall cohort (n = 184) to the ctDNA cohort: This dataset contains 184 individuals (579 rows, with multiple timepoint rows for many participants), which is the overall cohort screened for the CLEVER interventional study within SURMOUNT. From the separate ctDNA CSV and the NeoGenomics summary data, however, we know that ctDNA was assessed in only 109 individuals. We therefore need to limit the data set "d" to this "ctDNA cohort", which we will call "subset_data", using the indicator variable "ctdna_cohort".
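Because the data are in long format and some follow-up rows carry NA in ctdna_cohort (the first five rows of str(d) show participant 16001 with values 1 1 1 NA NA), a row-level filter on ctdna_cohort == 1 would silently drop those rows. The minimal base-R sketch below, on a hypothetical toy data frame `toy_d`, contrasts a row-level filter with a participant-level one; the participant-level version follows the same logic as the dplyr code used to build subset_data.

```r
# Minimal sketch (base R): row-level vs participant-level cohort filtering.
# `toy_d` is HYPOTHETICAL; P1 mimics a participant whose later rows have
# ctdna_cohort = NA.
toy_d <- data.frame(
  participant_id = c(rep("P1", 5), "P2", "P3"),
  ctdna_cohort   = c(1, 1, 1, NA, NA, 0, 1),
  stringsAsFactors = FALSE
)

# Row-level filter: rows with NA in ctdna_cohort are dropped
row_level <- toy_d[!is.na(toy_d$ctdna_cohort) & toy_d$ctdna_cohort == 1, ]

# Participant-level filter: keep every row of any participant who was ever
# flagged ctdna_cohort == 1 (%in% treats NA as "no match")
ever_in_cohort    <- unique(toy_d$participant_id[toy_d$ctdna_cohort %in% 1])
participant_level <- toy_d[toy_d$participant_id %in% ever_in_cohort, ]

nrow(row_level)                                   # 4 rows
nrow(participant_level)                           # 6 rows
length(unique(participant_level$participant_id))  # 2 participants
```

Both approaches keep the same participants here; they differ only in whether the NA-flagged follow-up rows of those participants survive.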

-
-
# Look at the variable names and the structure of the data
-names(d) 
-
-
  [1] "ID"                               "trialID"                         
-  [3] "participant_id"                   "patient_id"                      
-  [5] "fu_trial_pid"                     "timepoint"                       
-  [7] "project"                          "surmount_id"                     
-  [9] "panel_id"                         "accession"                       
- [11] "sample_id"                        "collection_date"                 
- [13] "extracted_plasma_volume_ml"       "input"                           
- [15] "input_sample"                     "physical_run_name"               
- [17] "workflow_name"                    "eVAF"                            
- [19] "mutant_molecules"                 "mean_VAF"                        
- [21] "Score"                            "all_pass_variants"               
- [23] "total_variants"                   "n_positive_variants"             
- [25] "ctDNA_detected"                   "ctdna_cohort"                    
- [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
- [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
- [31] "dtc_final_result_date"            "pt"                              
- [33] "bma_date"                         "ORIG_RSLT_DTC"                   
- [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
- [37] "FINAL_COUNT"                      "org_consent_date"                
- [39] "demo_initials"                    "demo_dob"                        
- [41] "demo_sex"                         "demo_ethnicity"                  
- [43] "demo_race___1"                    "demo_race___2"                   
- [45] "demo_race___3"                    "demo_race___4"                   
- [47] "demo_race___5"                    "demo_race___88"                  
- [49] "demo_race___99"                   "demo_race_other"                 
- [51] "prtx_radiation"                   "prtx_rad_start"                  
- [53] "prtx_rad_end"                     "prtx_chemo"                      
- [55] "prtx_endo"                        "prtx_bonemod"                    
- [57] "prior_therapy_complete"           "inc_dx_crit"                     
- [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
- [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
- [63] "final_receptor_group"             "demo_race_final"                 
- [65] "final_histology"                  "final_tumor_grade"               
- [67] "final_overall_stage"              "final_t_stage"                   
- [69] "final_n_stage"                    "fu_date_to"                      
- [71] "fu_surv"                          "fu_date_death"                   
- [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
- [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
- [77] "fu_locreg_prog"                   "fu_locreg_date"                  
- [79] "fu_dist_site_num"                 "fu_dist_site_char"               
- [81] "fu_dist_prog"                     "fu_dist_date"                    
- [83] "censor_date"                      "chemo_indication_1"              
- [85] "chemo_name_1"                     "chemo_name_other_1"              
- [87] "chemo_start_date_1"               "start_date_exact_1"              
- [89] "chemo_end_date_1"                 "end_date_exact_1"                
- [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
- [93] "chemo_indication_2"               "chemo_name_2"                    
- [95] "chemo_name_other_2"               "chemo_start_date_2"              
- [97] "start_date_exact_2"               "chemo_end_date_2"                
- [99] "end_date_exact_2"                 "chemo_notes_2"                   
-[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
-[103] "chemo_name_3"                     "chemo_name_other_3"              
-[105] "chemo_start_date_3"               "start_date_exact_3"              
-[107] "chemo_end_date_3"                 "end_date_exact_3"                
-[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
-[111] "chemo_indication_4"               "chemo_name_4"                    
-[113] "chemo_name_other_4"               "chemo_start_date_4"              
-[115] "start_date_exact_4"               "chemo_end_date_4"                
-[117] "end_date_exact_4"                 "chemo_notes_4"                   
-[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
-[121] "hormone_name___1_1"               "hormone_name___2_1"              
-[123] "hormone_name___3_1"               "hormone_name___4_1"              
-[125] "hormone_name___5_1"               "hormone_name___6_1"              
-[127] "hormone_name___7_1"               "hormone_other_1"                 
-[129] "hormone_start_date_1"             "hormone_ongoing_1"               
-[131] "hormone_end_date_1"               "hormone_notes_1"                 
-[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
-[135] "hormone_name___1_2"               "hormone_name___2_2"              
-[137] "hormone_name___3_2"               "hormone_name___4_2"              
-[139] "hormone_name___5_2"               "hormone_name___6_2"              
-[141] "hormone_name___7_2"               "hormone_other_2"                 
-[143] "hormone_start_date_2"             "hormone_ongoing_2"               
-[145] "hormone_end_date_2"               "hormone_notes_2"                 
-[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
-[149] "hormone_name___1_3"               "hormone_name___2_3"              
-[151] "hormone_name___3_3"               "hormone_name___4_3"              
-[153] "hormone_name___5_3"               "hormone_name___6_3"              
-[155] "hormone_name___7_3"               "hormone_other_3"                 
-[157] "hormone_start_date_3"             "hormone_ongoing_3"               
-[159] "hormone_end_date_3"               "hormone_notes_3"                 
-[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
-[163] "hormone_name___1_4"               "hormone_name___2_4"              
-[165] "hormone_name___3_4"               "hormone_name___4_4"              
-[167] "hormone_name___5_4"               "hormone_name___6_4"              
-[169] "hormone_name___7_4"               "hormone_other_4"                 
-[171] "hormone_start_date_4"             "hormone_ongoing_4"               
-[173] "hormone_end_date_4"               "hormone_notes_4"                 
-[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
-[177] "hormone_name___1_5"               "hormone_name___2_5"              
-[179] "hormone_name___3_5"               "hormone_name___4_5"              
-[181] "hormone_name___5_5"               "hormone_name___6_5"              
-[183] "hormone_name___7_5"               "hormone_other_5"                 
-[185] "hormone_start_date_5"             "hormone_ongoing_5"               
-[187] "hormone_end_date_5"               "hormone_notes_5"                 
-[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
-[191] "hormone_name___1_6"               "hormone_name___2_6"              
-[193] "hormone_name___3_6"               "hormone_name___4_6"              
-[195] "hormone_name___5_6"               "hormone_name___6_6"              
-[197] "hormone_name___7_6"               "hormone_other_6"                 
-[199] "hormone_start_date_6"             "hormone_ongoing_6"               
-[201] "hormone_end_date_6"               "hormone_notes_6"                 
-[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
-[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
-[207] "bonemod_name_1"                   "bonemod_start_date_1"            
-[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
-[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
-[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
-[215] "bonemod_name_2"                   "bonemod_start_date_2"            
-[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
-[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
-[221] "diag_lateral_1"                   "diag_menopause_1"                
-[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
-[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
-[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
-[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
-[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
-[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
-[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
-[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
-[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
-[241] "diag_er_status_1"                 "diag_er_percent_1"               
-[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
-[245] "diag_her2_status_1"               "diag_her2_method_1"              
-[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
-[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
-[251] "diag_pcr_1"                       "diag_surgery_date_1"             
-[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
-[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
-[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
-[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
-[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
-[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
-[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
-[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
-[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
-[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
-[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
-[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
-[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
-[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
-[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
-[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
-[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
-[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
-[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
-[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
-[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
-[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
-[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
-[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
-[301] "mammoprint_1"                     "mammoprint_result_1"             
-[303] "diag_notes_1"                     "diag_date_2"                     
-[305] "diag_lateral_2"                   "diag_menopause_2"                
-[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
-[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
-[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
-[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
-[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
-[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
-[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
-[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
-[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
-[325] "diag_er_status_2"                 "diag_er_percent_2"               
-[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
-[329] "diag_her2_status_2"               "diag_her2_method_2"              
-[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
-[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
-[335] "diag_pcr_2"                       "diag_surgery_date_2"             
-[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
-[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
-[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
-[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
-[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
-[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
-[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
-[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
-[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
-[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
-[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
-[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
-[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
-[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
-[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
-[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
-[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
-[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
-[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
-[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
-[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
-[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
-[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
-[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
-[385] "mammoprint_2"                     "mammoprint_result_2"             
-[387] "diag_notes_2"                    
-
-
str(d)
-
-
'data.frame':   579 obs. of  387 variables:
- $ ID                              : int  16001 16001 16001 16001 16001 16002 16003 16004 16005 16005 ...
- $ trialID                         : int  16001 16001 16001 16001 16001 NA NA 16004 1813 1813 ...
- $ participant_id                  : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
- $ patient_id                      : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
- $ fu_trial_pid                    : chr  "" "" "" "" ...
- $ timepoint                       : chr  "SURMOUNT-Baseline" "Year 1 Follow Up" "Year 2 Follow Up" "Long Term FU 1" ...
- $ project                         : chr  "Penn-SURMOUNT" "Penn-SURMOUNT" "Penn-SURMOUNT" "Penn-SURMOUNT" ...
- $ surmount_id                     : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
- $ panel_id                        : chr  "23A05190P01" "23A05190P01" "23A05190P01" "23A05190P01" ...
- $ accession                       : chr  "23A07639" "23A07640" "23A07641" "23A07642" ...
- $ sample_id                       : chr  "23A0763907_pl" "23A0764006_pl" "23A0764105_pl" "23A0764206_pl" ...
- $ collection_date                 : chr  "03AUG2016" "18SEP2017" "06AUG2018" "11NOV2021" ...
- $ extracted_plasma_volume_ml      : num  5.61 5.15 3.68 4.87 5.05 ...
- $ input                           : int  4999 4999 2775 1550 2725 NA NA 4999 NA NA ...
- $ input_sample                    : int  19996 19996 11100 6200 10900 NA NA 19996 NA NA ...
- $ physical_run_name               : chr  "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" ...
- $ workflow_name                   : chr  "RaDaR" "RaDaR" "RaDaR" "RaDaR" ...
- $ eVAF                            : num  2.07e-10 2.86e-10 5.64e-12 1.08e-09 2.23e-13 ...
- $ mutant_molecules                : int  0 0 0 0 0 NA NA 0 NA NA ...
- $ mean_VAF                        : num  1.48e-05 1.21e-05 7.21e-06 1.82e-05 1.10e-05 NA NA 2.85e-05 NA NA ...
- $ Score                           : num  -1.013 -0.882 -0.74 -0.495 -0.724 ...
- $ all_pass_variants               : int  16 16 16 16 16 NA NA 47 NA NA ...
- $ total_variants                  : int  48 48 48 48 48 NA NA 48 NA NA ...
- $ n_positive_variants             : int  0 0 0 0 0 NA NA 0 NA NA ...
- $ ctDNA_detected                  : chr  "FALSE" "FALSE" "FALSE" "FALSE" ...
- $ ctdna_cohort                    : int  1 1 1 NA NA 0 0 1 0 0 ...
- $ dtc_ihc_date_final              : chr  "03AUG16:00:00:00" "18SEP17:00:00:00" "06AUG18:10:30:00" "" ...
- $ dtc_ihc_cytospinnum_final       : int  10 5 5 NA NA 10 10 10 10 5 ...
- $ dtc_ihc_result_final            : int  0 0 0 NA NA 0 0 0 0 0 ...
- $ dtc_ihc_summary_count_final     : int  0 0 0 NA NA 0 0 0 0 0 ...
- $ dtc_final_result_date           : chr  "08/03/2016" "09/18/2017" "08/06/2018" "" ...
- $ pt                              : int  NA NA NA NA NA NA NA NA NA NA ...
- $ bma_date                        : chr  "" "" "" "" ...
- $ ORIG_RSLT_DTC                   : int  NA NA NA NA NA NA NA NA NA NA ...
- $ ORIG_RSLT_DTC_COUNT             : int  NA NA NA NA NA NA NA NA NA NA ...
- $ FINAL_RESULT                    : int  NA NA NA NA NA NA NA NA NA NA ...
- $ FINAL_COUNT                     : int  NA NA NA NA NA NA NA NA NA NA ...
- $ org_consent_date                : chr  "06/09/2016" "06/09/2016" "06/09/2016" "06/09/2016" ...
- $ demo_initials                   : chr  "LB" "LB" "LB" "LB" ...
- $ demo_dob                        : chr  "09/21/1957" "09/21/1957" "09/21/1957" "09/21/1957" ...
- $ demo_sex                        : int  2 2 2 2 2 2 2 2 2 2 ...
- $ demo_ethnicity                  : int  2 2 2 2 2 2 2 2 2 2 ...
- $ demo_race___1                   : int  0 0 0 0 0 0 0 0 0 0 ...
- $ demo_race___2                   : int  0 0 0 0 0 0 0 0 0 0 ...
- $ demo_race___3                   : int  0 0 0 0 0 0 0 0 0 0 ...
- $ demo_race___4                   : int  0 0 0 0 0 0 0 0 0 0 ...
- $ demo_race___5                   : int  1 1 1 1 1 1 1 1 1 1 ...
- $ demo_race___88                  : int  0 0 0 0 0 0 0 0 0 0 ...
- $ demo_race___99                  : int  0 0 0 0 0 0 0 0 0 0 ...
- $ demo_race_other                 : logi  NA NA NA NA NA NA ...
- $ prtx_radiation                  : int  1 1 1 1 1 1 1 0 1 1 ...
- $ prtx_rad_start                  : chr  "04/29/2014" "04/29/2014" "04/29/2014" "04/29/2014" ...
- $ prtx_rad_end                    : chr  "06/13/2014" "06/13/2014" "06/13/2014" "06/13/2014" ...
- $ prtx_chemo                      : int  1 1 1 1 1 1 1 1 1 1 ...
- $ prtx_endo                       : int  1 1 1 1 1 0 0 1 0 0 ...
- $ prtx_bonemod                    : int  0 0 0 0 0 0 0 0 1 1 ...
- $ prior_therapy_complete          : int  2 2 2 2 2 2 2 2 2 2 ...
- $ inc_dx_crit                     : int  1 1 1 1 1 1 1 1 1 1 ...
- $ inc_dx_crit_list___1            : int  0 0 0 0 0 0 1 0 0 0 ...
- $ inc_dx_crit_list___2            : int  1 1 1 1 1 0 1 0 1 1 ...
- $ inc_dx_crit_list___3            : int  0 0 0 0 0 1 0 0 0 0 ...
- $ inc_dx_crit_list___4            : int  0 0 0 0 0 0 0 0 0 0 ...
- $ final_receptor_group            : int  2 2 2 2 2 4 1 2 4 4 ...
- $ demo_race_final                 : int  5 5 5 5 5 5 5 5 5 5 ...
- $ final_histology                 : chr  "14,3" "14,3" "14,3" "14,3" ...
- $ final_tumor_grade               : int  2 2 2 2 2 2 3 2 0 0 ...
- $ final_overall_stage             : int  3 3 3 3 3 2 2 1 3 3 ...
- $ final_t_stage                   : int  2 2 2 2 2 2 2 1 3 3 ...
- $ final_n_stage                   : int  3 3 3 3 3 0 1 0 3 3 ...
- $ fu_date_to                      : chr  "03/28/2024" "03/28/2024" "03/28/2024" "03/28/2024" ...
- $ fu_surv                         : int  1 1 1 1 1 1 1 1 1 1 ...
- $ fu_date_death                   : chr  "" "" "" "" ...
- $ fu_dec_bc_pres                  : int  NA NA NA NA NA NA NA NA NA NA ...
- $ fu_dec_bc_cause                 : int  NA NA NA NA NA NA NA NA NA NA ...
- $ fu_locreg_site_num              : chr  "" "" "" "" ...
- $ fu_locreg_site_char             : chr  "" "" "" "" ...
- $ fu_locreg_prog                  : int  0 0 0 0 0 0 0 0 0 0 ...
- $ fu_locreg_date                  : chr  "" "" "" "" ...
- $ fu_dist_site_num                : chr  "" "" "" "" ...
- $ fu_dist_site_char               : chr  "" "" "" "" ...
- $ fu_dist_prog                    : int  0 0 0 0 0 0 0 0 0 0 ...
- $ fu_dist_date                    : chr  "" "" "" "" ...
- $ censor_date                     : chr  "03/28/2024" "03/28/2024" "03/28/2024" "03/28/2024" ...
- $ chemo_indication_1              : int  1 1 1 1 1 2 2 1 2 2 ...
- $ chemo_name_1                    : int  2 2 2 2 2 5 2 1 7 7 ...
- $ chemo_name_other_1              : chr  "" "" "" "" ...
- $ chemo_start_date_1              : chr  "10/03/2013" "10/03/2013" "10/03/2013" "10/03/2013" ...
- $ start_date_exact_1              : int  1 1 1 1 1 1 1 1 1 1 ...
- $ chemo_end_date_1                : chr  "01/16/2014" "01/16/2014" "01/16/2014" "01/16/2014" ...
- $ end_date_exact_1                : int  1 1 1 1 1 1 1 1 1 1 ...
- $ chemo_notes_1                   : chr  "" "" "" "" ...
- $ prior_chemotherapy_complete_1   : int  2 2 2 2 2 2 2 2 2 2 ...
- $ chemo_indication_2              : int  NA NA NA NA NA 2 NA NA 3 3 ...
- $ chemo_name_2                    : int  NA NA NA NA NA 3 NA NA 25 25 ...
- $ chemo_name_other_2              : chr  "" "" "" "" ...
- $ chemo_start_date_2              : chr  "" "" "" "" ...
- $ start_date_exact_2              : int  NA NA NA NA NA 1 NA NA 1 1 ...
- $ chemo_end_date_2                : chr  "" "" "" "" ...
- $ end_date_exact_2                : chr  "" "" "" "" ...
-  [list output truncated]
-
-
###### Limit to the ctDNA cohort, keeping rows with NA in ctdna_cohort as long as that participant ever had ctdna_cohort == 1; the result is called subset_data
-
-# Identify all participant_ids where ctdna_cohort == 1
-valid_participants <- d |> 
-  filter(ctdna_cohort == 1) |> 
-  pull(participant_id) |> 
-  unique()
-
-# Subset the data to include all rows where participant_id is in the valid list
-subset_data <- d |> 
-  filter(participant_id %in% valid_participants)
-
-# Count the number of unique participant_ids in the subset_data
-unique_count <- subset_data |> 
-  summarise(unique_participants = n_distinct(participant_id))
-
-# View the result: 109 unique participants, matching the expected size of the ctDNA cohort
-unique_count
-
-
  unique_participants
-1                 109
-
-
-
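The participant-level filter assumes that the ctdna_cohort flag is consistent within each participant. A possible sanity check, sketched below in base R on hypothetical toy data, lists participants whose non-missing flag takes more than one distinct value (for example both 0 and 1). We have not checked whether any such participants exist in d; the names `toy_d`, `n_flags`, and `conflicting` are illustrative only.

```r
# Minimal sketch: flag participants with inconsistent ctdna_cohort values.
# `toy_d` is HYPOTHETICAL; with the real data it would be replaced by `d`.
toy_d <- data.frame(
  participant_id = c("P1", "P1", "P1", "P2", "P3", "P3"),
  ctdna_cohort   = c(1, NA, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Number of distinct non-missing flag values per participant
n_flags <- tapply(
  toy_d$ctdna_cohort,
  toy_d$participant_id,
  function(x) length(unique(na.omit(x)))
)

# Participants whose flag is not a single value (here P3 has both 0 and 1)
conflicting <- names(n_flags)[n_flags > 1]
conflicting
```

An empty result would support treating "ever ctdna_cohort == 1" as an unambiguous definition of cohort membership.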

Creating the ctDNA_ever positivity indicator: Now that subset_data contains the ctDNA cohort (n = 109), we can start looking at demographics, first overall and then stratified by ctDNA_detected. Tabulating samples by the ctDNA_detected variable ("FALSE" = ctDNA not detected/negative, "TRUE" = ctDNA detected/positive) shows 385 negative samples and 11 positive samples within the ctDNA cohort. Next, we will create the ctDNA_ever variable, which indicates for each participant_id (the unique study ID) whether that participant ever had ctDNA detected.
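The two steps described above can be sketched in base R as follows, assuming ctDNA_detected holds the strings "TRUE"/"FALSE" (as str(d) shows) and treating a missing result as "not detected" for the ever-positive flag. `toy_subset` is a hypothetical stand-in for subset_data. With dplyr, the same flag could be written as `group_by(participant_id)` followed by `mutate(ctDNA_ever = as.integer(any(ctDNA_detected == "TRUE", na.rm = TRUE)))`.

```r
# Minimal sketch: sample-level counts and a per-participant ctDNA_ever flag.
# `toy_subset` is HYPOTHETICAL; ctDNA_detected is character, as in the data.
toy_subset <- data.frame(
  participant_id = c("P1", "P1", "P1", "P2", "P2", "P3"),
  ctDNA_detected = c("FALSE", "FALSE", "TRUE", "FALSE", NA, "FALSE"),
  stringsAsFactors = FALSE
)

# Sample-level counts (one row per sample); useNA shows unassessed rows
sample_counts <- table(toy_subset$ctDNA_detected, useNA = "ifany")
sample_counts

# Participant-level flag: 1 if any sample was "TRUE", else 0.
# %in% maps NA to FALSE, so missing results count as "not detected".
toy_subset$ctDNA_ever <- ave(
  as.integer(toy_subset$ctDNA_detected %in% "TRUE"),
  toy_subset$participant_id,
  FUN = max
)

# Number of participants who were ever ctDNA positive
n_ever_pos <- length(unique(toy_subset$participant_id[toy_subset$ctDNA_ever == 1]))
n_ever_pos
```

Using `ave()` keeps the flag on every row of a participant, so later per-sample tables and per-participant summaries can share the same column.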

-
-
# Note: ctDNA_detected is stored as character ("TRUE"/"FALSE"), not logical
-
-names(subset_data)
-
-
  [1] "ID"                               "trialID"                         
-  [3] "participant_id"                   "patient_id"                      
-  [5] "fu_trial_pid"                     "timepoint"                       
-  [7] "project"                          "surmount_id"                     
-  [9] "panel_id"                         "accession"                       
- [11] "sample_id"                        "collection_date"                 
- [13] "extracted_plasma_volume_ml"       "input"                           
- [15] "input_sample"                     "physical_run_name"               
- [17] "workflow_name"                    "eVAF"                            
- [19] "mutant_molecules"                 "mean_VAF"                        
- [21] "Score"                            "all_pass_variants"               
- [23] "total_variants"                   "n_positive_variants"             
- [25] "ctDNA_detected"                   "ctdna_cohort"                    
- [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
- [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
- [31] "dtc_final_result_date"            "pt"                              
- [33] "bma_date"                         "ORIG_RSLT_DTC"                   
- [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
- [37] "FINAL_COUNT"                      "org_consent_date"                
- [39] "demo_initials"                    "demo_dob"                        
- [41] "demo_sex"                         "demo_ethnicity"                  
- [43] "demo_race___1"                    "demo_race___2"                   
- [45] "demo_race___3"                    "demo_race___4"                   
- [47] "demo_race___5"                    "demo_race___88"                  
- [49] "demo_race___99"                   "demo_race_other"                 
- [51] "prtx_radiation"                   "prtx_rad_start"                  
- [53] "prtx_rad_end"                     "prtx_chemo"                      
- [55] "prtx_endo"                        "prtx_bonemod"                    
- [57] "prior_therapy_complete"           "inc_dx_crit"                     
- [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
- [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
- [63] "final_receptor_group"             "demo_race_final"                 
- [65] "final_histology"                  "final_tumor_grade"               
- [67] "final_overall_stage"              "final_t_stage"                   
- [69] "final_n_stage"                    "fu_date_to"                      
- [71] "fu_surv"                          "fu_date_death"                   
- [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
- [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
- [77] "fu_locreg_prog"                   "fu_locreg_date"                  
- [79] "fu_dist_site_num"                 "fu_dist_site_char"               
- [81] "fu_dist_prog"                     "fu_dist_date"                    
- [83] "censor_date"                      "chemo_indication_1"              
- [85] "chemo_name_1"                     "chemo_name_other_1"              
- [87] "chemo_start_date_1"               "start_date_exact_1"              
- [89] "chemo_end_date_1"                 "end_date_exact_1"                
- [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
- [93] "chemo_indication_2"               "chemo_name_2"                    
- [95] "chemo_name_other_2"               "chemo_start_date_2"              
- [97] "start_date_exact_2"               "chemo_end_date_2"                
- [99] "end_date_exact_2"                 "chemo_notes_2"                   
-[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
-[103] "chemo_name_3"                     "chemo_name_other_3"              
-[105] "chemo_start_date_3"               "start_date_exact_3"              
-[107] "chemo_end_date_3"                 "end_date_exact_3"                
-[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
-[111] "chemo_indication_4"               "chemo_name_4"                    
-[113] "chemo_name_other_4"               "chemo_start_date_4"              
-[115] "start_date_exact_4"               "chemo_end_date_4"                
-[117] "end_date_exact_4"                 "chemo_notes_4"                   
-[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
-[121] "hormone_name___1_1"               "hormone_name___2_1"              
-[123] "hormone_name___3_1"               "hormone_name___4_1"              
-[125] "hormone_name___5_1"               "hormone_name___6_1"              
-[127] "hormone_name___7_1"               "hormone_other_1"                 
-[129] "hormone_start_date_1"             "hormone_ongoing_1"               
-[131] "hormone_end_date_1"               "hormone_notes_1"                 
-[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
-[135] "hormone_name___1_2"               "hormone_name___2_2"              
-[137] "hormone_name___3_2"               "hormone_name___4_2"              
-[139] "hormone_name___5_2"               "hormone_name___6_2"              
-[141] "hormone_name___7_2"               "hormone_other_2"                 
-[143] "hormone_start_date_2"             "hormone_ongoing_2"               
-[145] "hormone_end_date_2"               "hormone_notes_2"                 
-[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
-[149] "hormone_name___1_3"               "hormone_name___2_3"              
-[151] "hormone_name___3_3"               "hormone_name___4_3"              
-[153] "hormone_name___5_3"               "hormone_name___6_3"              
-[155] "hormone_name___7_3"               "hormone_other_3"                 
-[157] "hormone_start_date_3"             "hormone_ongoing_3"               
-[159] "hormone_end_date_3"               "hormone_notes_3"                 
-[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
-[163] "hormone_name___1_4"               "hormone_name___2_4"              
-[165] "hormone_name___3_4"               "hormone_name___4_4"              
-[167] "hormone_name___5_4"               "hormone_name___6_4"              
-[169] "hormone_name___7_4"               "hormone_other_4"                 
-[171] "hormone_start_date_4"             "hormone_ongoing_4"               
-[173] "hormone_end_date_4"               "hormone_notes_4"                 
-[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
-[177] "hormone_name___1_5"               "hormone_name___2_5"              
-[179] "hormone_name___3_5"               "hormone_name___4_5"              
-[181] "hormone_name___5_5"               "hormone_name___6_5"              
-[183] "hormone_name___7_5"               "hormone_other_5"                 
-[185] "hormone_start_date_5"             "hormone_ongoing_5"               
-[187] "hormone_end_date_5"               "hormone_notes_5"                 
-[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
-[191] "hormone_name___1_6"               "hormone_name___2_6"              
-[193] "hormone_name___3_6"               "hormone_name___4_6"              
-[195] "hormone_name___5_6"               "hormone_name___6_6"              
-[197] "hormone_name___7_6"               "hormone_other_6"                 
-[199] "hormone_start_date_6"             "hormone_ongoing_6"               
-[201] "hormone_end_date_6"               "hormone_notes_6"                 
-[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
-[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
-[207] "bonemod_name_1"                   "bonemod_start_date_1"            
-[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
-[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
-[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
-[215] "bonemod_name_2"                   "bonemod_start_date_2"            
-[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
-[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
-[221] "diag_lateral_1"                   "diag_menopause_1"                
-[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
-[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
-[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
-[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
-[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
-[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
-[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
-[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
-[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
-[241] "diag_er_status_1"                 "diag_er_percent_1"               
-[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
-[245] "diag_her2_status_1"               "diag_her2_method_1"              
-[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
-[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
-[251] "diag_pcr_1"                       "diag_surgery_date_1"             
-[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
-[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
-[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
-[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
-[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
-[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
-[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
-[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
-[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
-[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
-[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
-[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
-[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
-[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
-[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
-[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
-[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
-[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
-[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
-[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
-[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
-[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
-[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
-[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
-[301] "mammoprint_1"                     "mammoprint_result_1"             
-[303] "diag_notes_1"                     "diag_date_2"                     
-[305] "diag_lateral_2"                   "diag_menopause_2"                
-[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
-[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
-[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
-[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
-[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
-[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
-[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
-[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
-[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
-[325] "diag_er_status_2"                 "diag_er_percent_2"               
-[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
-[329] "diag_her2_status_2"               "diag_her2_method_2"              
-[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
-[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
-[335] "diag_pcr_2"                       "diag_surgery_date_2"             
-[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
-[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
-[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
-[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
-[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
-[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
-[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
-[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
-[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
-[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
-[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
-[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
-[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
-[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
-[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
-[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
-[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
-[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
-[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
-[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
-[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
-[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
-[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
-[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
-[385] "mammoprint_2"                     "mammoprint_result_2"             
-[387] "diag_notes_2"                    
-
-
### Excluding the FAILS from this cohort 
-######create the ctDNA Ever positive variable 
-table(subset_data$ctDNA_detected) #385 FALSE, 11 TRUE, 2 blank
-
-

-      FALSE  TRUE 
-    2   385    11 
-
-
table(d$ctDNA_detected)
-
-

-       Fail FALSE  TRUE 
-  175     8   385    11 
-
-
# Create the 'ctDNA_ever' variable: 
-# This will be TRUE if ctDNA_detected was TRUE for any record for the participant, otherwise FALSE.
-subset_data <- subset_data  |> 
-  group_by(participant_id) |>
-  mutate(ctDNA_ever = if_else(any(ctDNA_detected == TRUE), TRUE, FALSE)) |>
-  ungroup()
-
-# View the updated data
-table(subset_data$participant_id, subset_data$ctDNA_ever)
-
-
              
-               FALSE TRUE
-  28115-16-001     5    0
-  28115-16-004     1    0
-  28115-16-010     1    0
-  28115-16-014     1    0
-  28115-16-015    12    0
-  28115-16-017     3    0
-  28115-16-020     0    1
-  28115-16-021     9    0
-  28115-16-023     1    0
-  28115-16-025     1    0
-  28115-16-026    10    0
-  28115-16-027     3    0
-  28115-16-029     2    0
-  28115-16-033     2    0
-  28115-16-035     1    0
-  28115-17-001     8    0
-  28115-17-002     9    0
-  28115-17-006     1    0
-  28115-17-008     9    0
-  28115-17-009     1    0
-  28115-17-010     5    0
-  28115-17-011     9    0
-  28115-17-012    10    0
-  28115-17-016     4    0
-  28115-17-017     5    0
-  28115-17-019     9    0
-  28115-17-021     1    0
-  28115-17-022     1    0
-  28115-17-023     0    2
-  28115-17-024     4    0
-  28115-17-025     0    2
-  28115-17-027     8    0
-  28115-17-030     3    0
-  28115-17-031     5    0
-  28115-17-032     0   10
-  28115-17-036     7    0
-  28115-17-039     2    0
-  28115-17-040     4    0
-  28115-17-045     1    0
-  28115-17-046    10    0
-  28115-17-047     3    0
-  28115-17-048     2    0
-  28115-17-050     0    3
-  28115-17-051     9    0
-  28115-17-052     3    0
-  28115-18-001     7    0
-  28115-18-002     2    0
-  28115-18-004     2    0
-  28115-18-006     1    0
-  28115-18-009     1    0
-  28115-18-011     5    0
-  28115-18-014     2    0
-  28115-18-015     5    0
-  28115-18-017     1    0
-  28115-18-020     8    0
-  28115-18-021     8    0
-  28115-18-022    12    0
-  28115-18-023     3    0
-  28115-18-024     2    0
-  28115-18-027     1    0
-  28115-18-028     1    0
-  28115-18-029     4    0
-  28115-18-030     2    0
-  28115-18-031     3    0
-  28115-18-032     6    0
-  28115-18-034     1    0
-  28115-19-001     0    1
-  28115-19-002     2    0
-  28115-19-003     5    0
-  28115-19-004     1    0
-  28115-19-005     3    0
-  28115-19-006     8    0
-  28115-19-007     5    0
-  28115-19-009     6    0
-  28115-19-011     1    0
-  28115-19-012     3    0
-  28115-19-014     2    0
-  28115-19-016     2    0
-  28115-19-017     2    0
-  28115-19-019     3    0
-  28115-19-020     2    0
-  28115-19-021     4    0
-  28115-19-022     2    0
-  28115-19-025     6    0
-  28115-19-028     2    0
-  28115-20-004     2    0
-  28115-20-007     2    0
-  28115-20-009     4    0
-  28115-20-010     1    0
-  28115-21-001     1    0
-  28115-21-002     4    0
-  28115-21-003     0    2
-  28115-21-006     2    0
-  28115-21-007     0    3
-  28115-21-009     3    0
-  28115-21-011     1    0
-  28115-21-013     4    0
-  28115-21-014     2    0
-  28115-21-015     2    0
-  28115-21-016     8    0
-  28115-21-019     1    0
-  28115-21-020     3    0
-  28115-21-021     3    0
-  28115-21-022     1    0
-  28115-21-024     0    2
-  28115-21-025     2    0
-  28115-21-026     2    0
-  28115-21-027     2    0
-  28115-21-028     1    0
-
-
subset_data |> 
-  group_by(participant_id) |> 
-  summarize(ctDNA_ever = first(ctDNA_ever)) |> 
-  count(ctDNA_ever)
-
-
# A tibble: 2 × 2
-  ctDNA_ever     n
-  <lgl>      <int>
-1 FALSE        100
-2 TRUE           9
-
-
-

Using the participant-level summary variable ctDNA_ever, we see that 100 individuals had consistently negative ctDNA results and 9 had at least one positive (“ever positive”) ctDNA result, which matches our original ctDNA source data.
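The collapse from sample-level calls to one flag per participant can be sketched in base R on a small hypothetical data frame. The tables above suggest ctDNA_detected is stored as text with the values "", "Fail", "FALSE" and "TRUE", so this sketch compares against the string "TRUE" explicitly rather than relying on R coercing TRUE to "TRUE". The participant IDs and values are invented for illustration only.

```r
# Hypothetical sample-level data; IDs and values are illustrative only.
samples <- data.frame(
  participant_id = c("A", "A", "B", "B", "C"),
  ctDNA_detected = c("FALSE", "TRUE", "FALSE", "", "Fail"),
  stringsAsFactors = FALSE
)

# Drop failed assays first, mirroring the exclusion of "Fail" samples from the cohort.
tested <- samples[samples$ctDNA_detected != "Fail", ]

# A participant is "ever positive" if any remaining sample is "TRUE";
# blank results count as not detected.
ctDNA_ever <- tapply(tested$ctDNA_detected == "TRUE", tested$participant_id, any)

ctDNA_ever       # A is TRUE, B is FALSE; C has no usable samples left
sum(ctDNA_ever)  # number of ever-positive participants
```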

-

Creating the Ever DTC Positive Variable

Next, we create a variable indicating whether a participant ever had a positive DTC test. To do this, we use the final result variable “dtc_ihc_result_final”, which records, for a given sample/date, whether that DTC result was positive (“1”) or negative (“0”). At the sample level, this dataset contains 221 negative and 49 positive samples (across 109 patients, 39 of whom were DTC positive), which aligns with our prior data and CONSORT diagrams.
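The na.rm = TRUE argument in the dtc_ever code below matters because table() drops NA values by default: the 270 DTC results tabulated here are fewer than the 398 rows tabulated for ctDNA_detected, so many rows apparently carry no DTC result. A minimal base-R sketch with made-up vectors shows how any() treats missing values:

```r
# Made-up sample-level DTC results for two hypothetical participants:
# 1 = positive, 0 = negative, NA = no DTC result recorded for that sample.
dtc_neg_with_missing <- c(0, NA)
dtc_pos_with_missing <- c(0, NA, 1)

# Without na.rm, a missing value leaves the answer unknown when no 1 is present.
res_default <- any(dtc_neg_with_missing == 1)
# With na.rm = TRUE, missing samples are ignored.
res_narm <- any(dtc_neg_with_missing == 1, na.rm = TRUE)
# A single positive is enough, even alongside NA.
res_pos <- any(dtc_pos_with_missing == 1)

c(default = res_default, na_rm = res_narm, positive = res_pos)
```

Without na.rm = TRUE, dplyr::if_else() would return NA rather than 0 for a DTC-negative participant who also has rows without a DTC result, so that participant would show up as NA instead of 0 in count(dtc_ever).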

-
-
names(subset_data) #looking at the names of variables to find the DTC indicator variable 
-
-
  [1] "ID"                               "trialID"                         
-  [3] "participant_id"                   "patient_id"                      
-  [5] "fu_trial_pid"                     "timepoint"                       
-  [7] "project"                          "surmount_id"                     
-  [9] "panel_id"                         "accession"                       
- [11] "sample_id"                        "collection_date"                 
- [13] "extracted_plasma_volume_ml"       "input"                           
- [15] "input_sample"                     "physical_run_name"               
- [17] "workflow_name"                    "eVAF"                            
- [19] "mutant_molecules"                 "mean_VAF"                        
- [21] "Score"                            "all_pass_variants"               
- [23] "total_variants"                   "n_positive_variants"             
- [25] "ctDNA_detected"                   "ctdna_cohort"                    
- [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
- [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
- [31] "dtc_final_result_date"            "pt"                              
- [33] "bma_date"                         "ORIG_RSLT_DTC"                   
- [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
- [37] "FINAL_COUNT"                      "org_consent_date"                
- [39] "demo_initials"                    "demo_dob"                        
- [41] "demo_sex"                         "demo_ethnicity"                  
- [43] "demo_race___1"                    "demo_race___2"                   
- [45] "demo_race___3"                    "demo_race___4"                   
- [47] "demo_race___5"                    "demo_race___88"                  
- [49] "demo_race___99"                   "demo_race_other"                 
- [51] "prtx_radiation"                   "prtx_rad_start"                  
- [53] "prtx_rad_end"                     "prtx_chemo"                      
- [55] "prtx_endo"                        "prtx_bonemod"                    
- [57] "prior_therapy_complete"           "inc_dx_crit"                     
- [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
- [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
- [63] "final_receptor_group"             "demo_race_final"                 
- [65] "final_histology"                  "final_tumor_grade"               
- [67] "final_overall_stage"              "final_t_stage"                   
- [69] "final_n_stage"                    "fu_date_to"                      
- [71] "fu_surv"                          "fu_date_death"                   
- [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
- [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
- [77] "fu_locreg_prog"                   "fu_locreg_date"                  
- [79] "fu_dist_site_num"                 "fu_dist_site_char"               
- [81] "fu_dist_prog"                     "fu_dist_date"                    
- [83] "censor_date"                      "chemo_indication_1"              
- [85] "chemo_name_1"                     "chemo_name_other_1"              
- [87] "chemo_start_date_1"               "start_date_exact_1"              
- [89] "chemo_end_date_1"                 "end_date_exact_1"                
- [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
- [93] "chemo_indication_2"               "chemo_name_2"                    
- [95] "chemo_name_other_2"               "chemo_start_date_2"              
- [97] "start_date_exact_2"               "chemo_end_date_2"                
- [99] "end_date_exact_2"                 "chemo_notes_2"                   
-[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
-[103] "chemo_name_3"                     "chemo_name_other_3"              
-[105] "chemo_start_date_3"               "start_date_exact_3"              
-[107] "chemo_end_date_3"                 "end_date_exact_3"                
-[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
-[111] "chemo_indication_4"               "chemo_name_4"                    
-[113] "chemo_name_other_4"               "chemo_start_date_4"              
-[115] "start_date_exact_4"               "chemo_end_date_4"                
-[117] "end_date_exact_4"                 "chemo_notes_4"                   
-[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
-[121] "hormone_name___1_1"               "hormone_name___2_1"              
-[123] "hormone_name___3_1"               "hormone_name___4_1"              
-[125] "hormone_name___5_1"               "hormone_name___6_1"              
-[127] "hormone_name___7_1"               "hormone_other_1"                 
-[129] "hormone_start_date_1"             "hormone_ongoing_1"               
-[131] "hormone_end_date_1"               "hormone_notes_1"                 
-[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
-[135] "hormone_name___1_2"               "hormone_name___2_2"              
-[137] "hormone_name___3_2"               "hormone_name___4_2"              
-[139] "hormone_name___5_2"               "hormone_name___6_2"              
-[141] "hormone_name___7_2"               "hormone_other_2"                 
-[143] "hormone_start_date_2"             "hormone_ongoing_2"               
-[145] "hormone_end_date_2"               "hormone_notes_2"                 
-[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
-[149] "hormone_name___1_3"               "hormone_name___2_3"              
-[151] "hormone_name___3_3"               "hormone_name___4_3"              
-[153] "hormone_name___5_3"               "hormone_name___6_3"              
-[155] "hormone_name___7_3"               "hormone_other_3"                 
-[157] "hormone_start_date_3"             "hormone_ongoing_3"               
-[159] "hormone_end_date_3"               "hormone_notes_3"                 
-[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
-[163] "hormone_name___1_4"               "hormone_name___2_4"              
-[165] "hormone_name___3_4"               "hormone_name___4_4"              
-[167] "hormone_name___5_4"               "hormone_name___6_4"              
-[169] "hormone_name___7_4"               "hormone_other_4"                 
-[171] "hormone_start_date_4"             "hormone_ongoing_4"               
-[173] "hormone_end_date_4"               "hormone_notes_4"                 
-[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
-[177] "hormone_name___1_5"               "hormone_name___2_5"              
-[179] "hormone_name___3_5"               "hormone_name___4_5"              
-[181] "hormone_name___5_5"               "hormone_name___6_5"              
-[183] "hormone_name___7_5"               "hormone_other_5"                 
-[185] "hormone_start_date_5"             "hormone_ongoing_5"               
-[187] "hormone_end_date_5"               "hormone_notes_5"                 
-[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
-[191] "hormone_name___1_6"               "hormone_name___2_6"              
-[193] "hormone_name___3_6"               "hormone_name___4_6"              
-[195] "hormone_name___5_6"               "hormone_name___6_6"              
-[197] "hormone_name___7_6"               "hormone_other_6"                 
-[199] "hormone_start_date_6"             "hormone_ongoing_6"               
-[201] "hormone_end_date_6"               "hormone_notes_6"                 
-[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
-[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
-[207] "bonemod_name_1"                   "bonemod_start_date_1"            
-[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
-[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
-[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
-[215] "bonemod_name_2"                   "bonemod_start_date_2"            
-[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
-[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
-[221] "diag_lateral_1"                   "diag_menopause_1"                
-[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
-[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
-[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
-[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
-[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
-[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
-[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
-[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
-[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
-[241] "diag_er_status_1"                 "diag_er_percent_1"               
-[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
-[245] "diag_her2_status_1"               "diag_her2_method_1"              
-[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
-[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
-[251] "diag_pcr_1"                       "diag_surgery_date_1"             
-[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
-[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
-[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
-[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
-[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
-[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
-[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
-[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
-[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
-[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
-[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
-[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
-[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
-[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
-[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
-[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
-[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
-[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
-[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
-[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
-[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
-[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
-[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
-[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
-[301] "mammoprint_1"                     "mammoprint_result_1"             
-[303] "diag_notes_1"                     "diag_date_2"                     
-[305] "diag_lateral_2"                   "diag_menopause_2"                
-[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
-[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
-[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
-[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
-[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
-[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
-[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
-[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
-[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
-[325] "diag_er_status_2"                 "diag_er_percent_2"               
-[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
-[329] "diag_her2_status_2"               "diag_her2_method_2"              
-[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
-[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
-[335] "diag_pcr_2"                       "diag_surgery_date_2"             
-[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
-[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
-[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
-[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
-[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
-[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
-[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
-[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
-[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
-[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
-[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
-[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
-[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
-[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
-[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
-[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
-[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
-[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
-[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
-[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
-[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
-[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
-[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
-[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
-[385] "mammoprint_2"                     "mammoprint_result_2"             
-[387] "diag_notes_2"                     "ctDNA_ever"                      
-
-
library(stringr)
-
-# Final DTC result variable: dtc_ihc_result_final (recorded per sample, not per participant).
-# Final DTC count: dtc_ihc_summary_count_final
-# Final result date: dtc_final_result_date
-
-table(subset_data$dtc_ihc_result_final) #221 negatives, 49 positives 
-
-

-  0   1 
-221  49 
-
-
#making the dtc_ever variable 
-subset_data <- subset_data |> 
-  group_by(participant_id) |> 
-  mutate(dtc_ever = if_else(any(dtc_ihc_result_final == 1, na.rm = TRUE), 1, 0)) |> 
-  ungroup()
-
-table(subset_data$participant_id, subset_data$dtc_ever) 
-
-
              
-                0  1
-  28115-16-001  5  0
-  28115-16-004  1  0
-  28115-16-010  1  0
-  28115-16-014  1  0
-  28115-16-015  0 12
-  28115-16-017  3  0
-  28115-16-020  1  0
-  28115-16-021  0  9
-  28115-16-023  1  0
-  28115-16-025  1  0
-  28115-16-026  0 10
-  28115-16-027  3  0
-  28115-16-029  2  0
-  28115-16-033  2  0
-  28115-16-035  1  0
-  28115-17-001  0  8
-  28115-17-002  0  9
-  28115-17-006  1  0
-  28115-17-008  0  9
-  28115-17-009  1  0
-  28115-17-010  0  5
-  28115-17-011  0  9
-  28115-17-012  0 10
-  28115-17-016  0  4
-  28115-17-017  0  5
-  28115-17-019  0  9
-  28115-17-021  1  0
-  28115-17-022  1  0
-  28115-17-023  2  0
-  28115-17-024  0  4
-  28115-17-025  0  2
-  28115-17-027  0  8
-  28115-17-030  3  0
-  28115-17-031  0  5
-  28115-17-032  0 10
-  28115-17-036  0  7
-  28115-17-039  2  0
-  28115-17-040  4  0
-  28115-17-045  1  0
-  28115-17-046  0 10
-  28115-17-047  3  0
-  28115-17-048  2  0
-  28115-17-050  0  3
-  28115-17-051  0  9
-  28115-17-052  3  0
-  28115-18-001  0  7
-  28115-18-002  2  0
-  28115-18-004  2  0
-  28115-18-006  1  0
-  28115-18-009  1  0
-  28115-18-011  5  0
-  28115-18-014  2  0
-  28115-18-015  0  5
-  28115-18-017  1  0
-  28115-18-020  0  8
-  28115-18-021  0  8
-  28115-18-022  0 12
-  28115-18-023  0  3
-  28115-18-024  2  0
-  28115-18-027  1  0
-  28115-18-028  1  0
-  28115-18-029  4  0
-  28115-18-030  2  0
-  28115-18-031  0  3
-  28115-18-032  0  6
-  28115-18-034  1  0
-  28115-19-001  1  0
-  28115-19-002  2  0
-  28115-19-003  5  0
-  28115-19-004  1  0
-  28115-19-005  3  0
-  28115-19-006  0  8
-  28115-19-007  5  0
-  28115-19-009  0  6
-  28115-19-011  1  0
-  28115-19-012  3  0
-  28115-19-014  2  0
-  28115-19-016  0  2
-  28115-19-017  0  2
-  28115-19-019  3  0
-  28115-19-020  2  0
-  28115-19-021  4  0
-  28115-19-022  0  2
-  28115-19-025  0  6
-  28115-19-028  0  2
-  28115-20-004  2  0
-  28115-20-007  2  0
-  28115-20-009  4  0
-  28115-20-010  1  0
-  28115-21-001  1  0
-  28115-21-002  4  0
-  28115-21-003  2  0
-  28115-21-006  2  0
-  28115-21-007  3  0
-  28115-21-009  3  0
-  28115-21-011  1  0
-  28115-21-013  4  0
-  28115-21-014  2  0
-  28115-21-015  2  0
-  28115-21-016  0  8
-  28115-21-019  1  0
-  28115-21-020  3  0
-  28115-21-021  3  0
-  28115-21-022  1  0
-  28115-21-024  0  2
-  28115-21-025  2  0
-  28115-21-026  2  0
-  28115-21-027  0  2
-  28115-21-028  1  0
-
-
subset_data |> 
-  group_by(participant_id) |> 
-  summarize(dtc_ever = first(dtc_ever)) |> 
-  count(dtc_ever)
-
-
# A tibble: 2 × 2
-  dtc_ever     n
-     <dbl> <int>
-1        0    70
-2        1    39
-
-
-

Counting DTC status by unique participant gives 70 participants who were never DTC-positive and 39 who were ever DTC-positive, which matches our source data on DTC positivity for this ctDNA cohort.

-
-
-
-

4 Results

-

Sample and Testing Information: In this cohort of 109 individuals who had ctDNA and DTC testing on SURMOUNT (either at baseline or in follow-up), 100 remained persistently ctDNA-negative and 70 remained persistently DTC-negative, leaving 9 ctDNA-positive and 39 DTC-positive individuals. Of 184 pts enrolled from 2016 to 2021, 121 had tissue available; 114/121 (94%) had successful WES (prior data/NeoGenomics data).

-
-
#counts for ctDNA positivity 
-subset_data |>
-  filter(ctDNA_ever == "TRUE") |>
-  summarize(unique_participants = n_distinct(participant_id))
-
-
# A tibble: 1 × 1
-  unique_participants
-                <int>
-1                   9
-
-
table(subset_data$ctDNA_detected) # 385 FALSE, 11 TRUE (plus 2 blank entries)
-
-

-      FALSE  TRUE 
-    2   385    11 
-
-
table(d$ctDNA_detected) # 385 FALSE, 11 TRUE, 8 Fail (plus 175 blank entries)
-
-

-       Fail FALSE  TRUE 
-  175     8   385    11 
-
-
# Count unique participants with FAIL in ctDNA_detected (this is in database d, the original database, not in the ctDNA cohort, as these patients were excluded from the cohort)
-num_fail <- d |> 
-  filter(ctDNA_detected == "Fail") |>   # Filter rows where ctDNA_detected is FAIL
-  distinct(participant_id) |>          # Select unique participant_id
-  nrow()                                # Count the number of rows
-
-num_fail #4 individuals with Fails in original d dataset 
-
-
[1] 4
-
-
# Timepoints sampled for each ever-ctDNA-positive participant (includes their negative samples); 2 were positive at baseline, 7 later
-subset_data |>
-  filter(ctDNA_ever == "TRUE") |>
-  group_by(participant_id) |>
-  summarize(positive_timepoints = list(timepoint))
-
-
# A tibble: 9 × 2
-  participant_id positive_timepoints
-  <chr>          <list>             
-1 28115-16-020   <chr [1]>          
-2 28115-17-023   <chr [2]>          
-3 28115-17-025   <chr [2]>          
-4 28115-17-032   <chr [10]>         
-5 28115-17-050   <chr [3]>          
-6 28115-19-001   <chr [1]>          
-7 28115-21-003   <chr [2]>          
-8 28115-21-007   <chr [3]>          
-9 28115-21-024   <chr [2]>          
-
-
subset_data |>
-  filter(ctDNA_detected == "TRUE", timepoint == "SURMOUNT-Baseline") |>
-  summarize(count_SURMOUNT_Baseline = n())
-
-
# A tibble: 1 × 1
-  count_SURMOUNT_Baseline
-                    <int>
-1                       2
-
-
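# eVAF summary across all samples from ever-ctDNA-positive participants (includes their ctDNA-negative samples, which likely explains the near-zero minimum)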
-
-subset_data |>
-  filter(ctDNA_ever == "TRUE") |>
-  summarize(
-    mean_eVAF = mean(eVAF, na.rm = TRUE),
-    median_eVAF = median(eVAF, na.rm = TRUE),
-    sd_eVAF = sd(eVAF, na.rm = TRUE),
-    min_eVAF = min(eVAF, na.rm = TRUE),
-    max_eVAF = max(eVAF, na.rm = TRUE)
-  )
-
-
# A tibble: 1 × 5
-  mean_eVAF median_eVAF  sd_eVAF min_eVAF max_eVAF
-      <dbl>       <dbl>    <dbl>    <dbl>    <dbl>
-1 0.0000893 0.000000413 0.000219 2.14e-18 0.000836
-
-
#### DTC counts 
-
-#counts for DTC positivity --> 39 
-subset_data |>
-  filter(dtc_ever == 1) |>
-  summarize(unique_participants = n_distinct(participant_id))
-
-
# A tibble: 1 × 1
-  unique_participants
-                <int>
-1                  39
-
-
# All timepoints for ever-DTC-positive participants (249 rows, not only the DTC-positive samples)
-subset_data |>
-  filter(dtc_ever == 1) |>
-  select(participant_id, timepoint)
-
-
# A tibble: 249 × 2
-   participant_id timepoint        
-   <chr>          <chr>            
- 1 28115-16-015   SURMOUNT-Baseline
- 2 28115-16-015   Year 1 Follow Up 
- 3 28115-16-015   Year 2 Follow Up 
- 4 28115-16-015   Year 3 Follow Up 
- 5 28115-16-015   CLEVER-Baseline  
- 6 28115-16-015   C6               
- 7 28115-16-015   6M F/U           
- 8 28115-16-015   12M F/U          
- 9 28115-16-015   18M F/U          
-10 28115-16-015   24M F/U          
-# ℹ 239 more rows
-
-
# Number of DTC-positive samples at SURMOUNT baseline
-
-subset_data |>
-  filter(dtc_ihc_result_final == 1, timepoint == "SURMOUNT-Baseline") |>
-  summarize(count_SURMOUNT_Baseline = n())
-
-
# A tibble: 1 × 1
-  count_SURMOUNT_Baseline
-                    <int>
-1                      26
-
-
### Timepoint Data (# timepoints per patient)
-
-# Timepoints per patient (median, range), overall
-timepoints_per_patient <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    total_timepoints = n_distinct(timepoint),  # Count distinct timepoints for each patient
-    .groups = "drop"
-  ) |>
-  summarise(
-    median_timepoints = median(total_timepoints, na.rm = TRUE),  # Calculate median
-    min_timepoints = min(total_timepoints, na.rm = TRUE),        # Calculate minimum
-    max_timepoints = max(total_timepoints, na.rm = TRUE)         # Calculate maximum
-  )
-timepoints_per_patient
-
-
# A tibble: 1 × 3
-  median_timepoints min_timepoints max_timepoints
-              <int>          <int>          <int>
-1                 2              1             12
-
-
#  Timepoints of ctDNA assessment (`ctDNA_detected`)
-ctDNA_timepoints <- subset_data |>
-  filter(!is.na(ctDNA_detected)) |>  # Filter out NA values for ctDNA_detected
-  group_by(participant_id) |>
-  summarise(
-    ctDNA_timepoints = n_distinct(timepoint),  # Count distinct timepoints of ctDNA assessment
-    .groups = "drop"
-  ) |>
-  summarise(
-    median_ctDNA_timepoints = median(ctDNA_timepoints, na.rm = TRUE),  # Calculate median
-    min_ctDNA_timepoints = min(ctDNA_timepoints, na.rm = TRUE),        # Calculate minimum
-    max_ctDNA_timepoints = max(ctDNA_timepoints, na.rm = TRUE)         # Calculate maximum
-  )
-ctDNA_timepoints
-
-
# A tibble: 1 × 3
-  median_ctDNA_timepoints min_ctDNA_timepoints max_ctDNA_timepoints
-                    <int>                <int>                <int>
-1                       2                    1                   12
-
-
#  Timepoints of DTC assessment (`dtc_ihc_result_final`)
-dtc_timepoints <- subset_data |>
-  filter(!is.na(dtc_ihc_result_final)) |>  # Filter out NA values for dtc_ihc_result_final
-  group_by(participant_id) |>
-  summarise(
-    dtc_timepoints = n_distinct(timepoint),  # Count distinct timepoints of DTC assessment
-    .groups = "drop"
-  ) |>
-  summarise(
-    median_dtc_timepoints = median(dtc_timepoints, na.rm = TRUE),  # Calculate median
-    min_dtc_timepoints = min(dtc_timepoints, na.rm = TRUE),        # Calculate minimum
-    max_dtc_timepoints = max(dtc_timepoints, na.rm = TRUE)         # Calculate maximum
-  )
-dtc_timepoints
-
-
# A tibble: 1 × 3
-  median_dtc_timepoints min_dtc_timepoints max_dtc_timepoints
-                  <int>              <int>              <int>
-1                     2                  1                  6
-
-
# Print all summaries
-print("Timepoints per patient:")
-
-
[1] "Timepoints per patient:"
-
-
print(timepoints_per_patient)
-
-
# A tibble: 1 × 3
-  median_timepoints min_timepoints max_timepoints
-              <int>          <int>          <int>
-1                 2              1             12
-
-
print("Timepoints of ctDNA assessment:")
-
-
[1] "Timepoints of ctDNA assessment:"
-
-
print(ctDNA_timepoints)
-
-
# A tibble: 1 × 3
-  median_ctDNA_timepoints min_ctDNA_timepoints max_ctDNA_timepoints
-                    <int>                <int>                <int>
-1                       2                    1                   12
-
-
print("Timepoints of DTC assessment:")
-
-
[1] "Timepoints of DTC assessment:"
-
-
print(dtc_timepoints)
-
-
# A tibble: 1 × 3
-  median_dtc_timepoints min_dtc_timepoints max_dtc_timepoints
-                  <int>              <int>              <int>
-1                     2                  1                  6
-
-
-

Timepoints of samples: A total of 396 plasma samples from 109 pts (median 2 timepoints each, range 1-12) have been successfully tested by RaDaR to date, with 8 failed samples across 4 unique individuals. These 4 individuals were excluded from the ctDNA cohort because they did not ultimately have a successful ctDNA assessment. The 109 individuals in the cohort had a median of 2 DTC assessment timepoints (range 1-6).
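The per-participant median and range reported above can also be reproduced with base R alone. The sketch below is a hypothetical illustration on a small invented data frame that mirrors the `participant_id` and `timepoint` columns of `subset_data`; it counts distinct timepoints per participant (the same idea as `n_distinct()` in the dplyr code above) and then summarizes across participants.

```r
# Hypothetical long-format data: one row per participant per sampled timepoint.
# Column names mirror subset_data; the values are invented for illustration.
samples <- data.frame(
  participant_id = c("A", "A", "A", "B", "C", "C"),
  timepoint = c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up",
                "SURMOUNT-Baseline",
                "SURMOUNT-Baseline", "Year 1 Follow Up"),
  stringsAsFactors = FALSE
)

# Distinct timepoints per participant (base-R analogue of n_distinct()).
tp_per_pt <- vapply(split(samples$timepoint, samples$participant_id),
                    function(x) length(unique(x)), integer(1))
counts <- unname(tp_per_pt)

# Median and range across participants.
tp_summary <- c(median = median(counts),
                min = min(counts),
                max = max(counts))
tp_label <- paste0("median ", tp_summary[["median"]], " timepoints (range ",
                   tp_summary[["min"]], "-", tp_summary[["max"]], ")")
print(tp_label)
```

On the real data, the same two steps applied to rows with a non-missing `ctDNA_detected` or `dtc_ihc_result_final` would give the assay-specific summaries.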

-

Overall, ctDNA was detected in 11 samples from 9/109 pts with a mean eVAF of 0.009% (range 0.002-0.084%). Two pts were ctDNA+ at baseline (BL), and 7 became positive on surveillance. 100/109 were ctDNA- across all timepoints. 39/109 pts were DTC+, either at BL (n=26) or after (n=13).
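The eVAF summary printed earlier filters on `ctDNA_ever`, so it pools every sample from the 9 ever-positive participants, including their ctDNA-negative draws. To describe only the ctDNA-detected samples, and to classify each participant by whether their first positive sample was at baseline, something like the following may help. It is a sketch on invented data; the column names mirror `subset_data`, and ordering by `collection_date` is an assumption about how timepoints should be sequenced.

```r
# Hypothetical sample-level rows; column names mirror subset_data, values invented.
ct <- data.frame(
  participant_id  = c("P1", "P1", "P2", "P2", "P2", "P3"),
  collection_date = as.Date(c("2017-01-10", "2018-01-12", "2017-03-01",
                              "2018-03-05", "2019-03-02", "2018-06-20")),
  timepoint       = c("SURMOUNT-Baseline", "Year 1 Follow Up",
                      "SURMOUNT-Baseline", "Year 1 Follow Up",
                      "Year 2 Follow Up", "SURMOUNT-Baseline"),
  ctDNA_detected  = c("TRUE", "TRUE", "FALSE", "FALSE", "TRUE", "FALSE"),
  eVAF            = c(4e-4, 8e-4, 0, 1e-7, 2e-5, 0),
  stringsAsFactors = FALSE
)

# Keep only ctDNA-detected samples (ctDNA_detected is stored as text, as in subset_data).
pos <- ct[ct$ctDNA_detected == "TRUE", ]

# eVAF is a fraction, so multiply by 100 to report percentages as in the text.
evaf_pct <- 100 * c(mean = mean(pos$eVAF),
                    min  = min(pos$eVAF),
                    max  = max(pos$eVAF))
print(signif(evaf_pct, 2))

# First positive sample per participant, ordered by collection date.
pos <- pos[order(pos$participant_id, pos$collection_date), ]
first_pos <- pos[!duplicated(pos$participant_id), c("participant_id", "timepoint")]
at_baseline <- first_pos$timepoint == "SURMOUNT-Baseline"
print(c(baseline = sum(at_baseline), later = sum(!at_baseline)))
```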

-
-
# Filter and get unique participants by participant_id
-concordance_overall_unique <- subset_data |> 
-  distinct(participant_id, .keep_all = TRUE) |> 
-  mutate(concordance = ifelse(dtc_ever == ctDNA_ever, "Concordant", "Discordant"))
-
-# Count total concordant and discordant pairs for unique participants
-overall_concordant <- sum(concordance_overall_unique$concordance == "Concordant")
-overall_discordant <- sum(concordance_overall_unique$concordance == "Discordant")
-
-# Proportion of concordance
-proportion_concordant <- overall_concordant / (overall_concordant + overall_discordant)
-
-cat("Overall Concordant (unique participants):", overall_concordant, "\n")
-
-
Overall Concordant (unique participants): 69 
-
-
cat("Overall Discordant (unique participants):", overall_discordant, "\n")
-
-
Overall Discordant (unique participants): 40 
-
-
cat("Overall Proportion Concordant (unique participants):", proportion_concordant, "\n")
-
-
Overall Proportion Concordant (unique participants): 0.6330275 
-
-
#Proportion concordance 63% (ever positive)
-unique <- subset_data |>
-  group_by(participant_id) |>
-  summarize(
-    dtc_ever = max(dtc_ever, na.rm = TRUE),    # Ensures 1 if DTC is ever detected
-    ctDNA_ever = max(ctDNA_ever, na.rm = TRUE) # Ensures 1 if ctDNA is ever detected
-  )
-
-# Create the 2x2 table
-table_ctDNA_dtc <- table(unique$ctDNA_ever, unique$dtc_ever)
-print(table_ctDNA_dtc)
-
-
   
-     0  1
-  0 65 35
-  1  5  4
-
-
-
-
#Concordance by timepoint 
-
-# Ensure that dtc_ihc_result_final is converted to TRUE/FALSE for consistency with ctDNA_detected
-concordance_by_timepoint <- subset_data |> 
-  filter(!is.na(dtc_ihc_result_final) & !is.na(ctDNA_detected)) |> 
-  mutate(
-    # Convert dtc_ihc_result_final (1 = TRUE, 0 = FALSE) to match ctDNA format (TRUE/FALSE)
-    dtc_ihc_result_final_bool = ifelse(dtc_ihc_result_final == 1, TRUE, FALSE),
-    # Create a column to determine concordance (both DTC and ctDNA being TRUE or both being FALSE)
-    concordance = ifelse(dtc_ihc_result_final_bool == ctDNA_detected, "Concordant", "Discordant")
-  ) |>
-  group_by(timepoint) |>
-  summarise(
-    total_concordant = sum(concordance == "Concordant"),
-    total_discordant = sum(concordance == "Discordant"),
-    total_samples = n(),  # Total number of samples at this timepoint
-    concordance_rate = total_concordant / total_samples  # Concordance rate per timepoint
-  )
-
-# Print concordance results for each timepoint
-print(concordance_by_timepoint)
-
-
# A tibble: 10 × 5
-   timepoint    total_concordant total_discordant total_samples concordance_rate
-   <chr>                   <int>            <int>         <int>            <dbl>
- 1 6M F/U                     17                2            19            0.895
- 2 C12                         4                0             4            1    
- 3 C3                         17                2            19            0.895
- 4 C6                         26                2            28            0.929
- 5 EOO                         5                4             9            0.556
- 6 SURMOUNT-Ba…               80               29           109            0.734
- 7 Year 1 Foll…               31                9            40            0.775
- 8 Year 2 Foll…               21                3            24            0.875
- 9 Year 3 Foll…               11                3            14            0.786
-10 Year 4 Foll…                3                1             4            0.75 
-
-
# Now calculate overall concordance across all timepoints
-overall_concordance <- sum(concordance_by_timepoint$total_concordant) / 
-  sum(concordance_by_timepoint$total_samples)
-
-cat("Overall Concordance Rate across all timepoints:", overall_concordance, "\n")
-
-
Overall Concordance Rate across all timepoints: 0.7962963 
-
-
#concordance, considering testing by timepoint, is 80% 
-
-

Concordance of DTC and ctDNA testing: Comparing ever-positive status across all timepoints, concordance was 63% (69/109 participants); concordance was higher (80%, 215/270 paired samples) when DTC and ctDNA results were compared at each individual timepoint. Of 39 ever-DTC+ pts, 4 were ctDNA+ (3/4 of whom recurred) and 35 remained ctDNA- (with 1/30 who recurred).
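The two concordance figures use different denominators: 69/109 participants for ever-positive status and 215/270 paired samples when results are matched by timepoint. The sketch below recomputes both from the counts printed above and adds Cohen's kappa, which is not part of the original analysis but corrects raw agreement for the agreement expected by chance given each test's positivity rate. With these counts kappa works out to roughly 0.04, which would suggest agreement only slightly above chance.

```r
# Participant-level 2x2 table of ever-positive status, from the output above
# (rows: ctDNA_ever 0/1; columns: dtc_ever 0/1).
tab <- matrix(c(65, 5, 35, 4), nrow = 2,
              dimnames = list(ctDNA = c("0", "1"), DTC = c("0", "1")))

n <- sum(tab)
p_obs <- sum(diag(tab)) / n                      # observed agreement, (65 + 4) / 109
p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
cohen_kappa <- (p_obs - p_exp) / (1 - p_exp)

# Per-timepoint counts from the concordance_by_timepoint output above.
concordant <- c(17, 4, 17, 26, 5, 80, 31, 21, 11, 3)
paired     <- c(19, 4, 19, 28, 9, 109, 40, 24, 14, 4)
pooled     <- sum(concordant) / sum(paired)   # 215 / 270, larger timepoints weigh more
unweighted <- mean(concordant / paired)       # every timepoint weighted equally

print(round(c(p_obs = p_obs, kappa = cohen_kappa,
              pooled = pooled, unweighted = unweighted), 3))
```

The pooled rate is the one reported in the text; the unweighted mean of per-timepoint rates is somewhat higher, which illustrates that the choice of weighting matters when summarizing across timepoints of very different sizes.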

-

Test Characteristics

-

Next, we examine ctDNA and DTC test characteristics: first the association between ctDNA and DTC positivity, then the number of tests performed.

-
-
############ Test Characteristics and Baseline versus cumulative positivity (ctDNA to start)#######
-
-### DTC by ctDNA (ever positive), association between test positivity. 
-
-# link by participant id 
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    dtc = first(dtc_ever),  # Get the ever dtc for each participant
-    ctDNA_ever = first(ctDNA_ever),        # Get the ctDNA_ever status for each participant
-    .groups = "drop"
-  )
-
-# Create a contingency table of dtc vs ctDNA_ever
-contingency_table <- table(subset_data_by_id$dtc, subset_data_by_id$ctDNA_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and chi-squared test results (p = 0.839); the warning arises because some expected counts are below 5
-print(contingency_table)
-
-
   
-    FALSE TRUE
-  0    65    5
-  1    35    4
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.041269, df = 1, p-value = 0.839
-
-
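Because the ctDNA-positive column has only 9 participants, one expected count falls below 5, which is what triggers the chi-squared warning. Fisher's exact test is a common alternative for sparse 2x2 tables; it is not run in the original analysis, so the sketch below is a suggested check that reuses the counts printed above rather than a reported result.

```r
# Contingency table from the output above (rows: dtc 0/1; columns: ctDNA_ever FALSE/TRUE).
contingency <- matrix(c(65, 35, 5, 4), nrow = 2,
                      dimnames = list(dtc = c("0", "1"),
                                      ctDNA_ever = c("FALSE", "TRUE")))

# Expected counts under independence; any value below 5 triggers the chi-squared warning.
expected <- outer(rowSums(contingency), colSums(contingency)) / sum(contingency)
print(round(expected, 2))

# Fisher's exact test does not rely on the large-sample chi-squared approximation.
fisher_res <- fisher.test(contingency)
print(fisher_res$p.value)
print(fisher_res$estimate)   # conditional maximum-likelihood estimate of the odds ratio
```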
##### Tests (#s and such of tests)
-
-#number of tests (ctDNA)
-library(dplyr)
-
-# Assuming the status variable is named `ctDNA_detected` in d, and then in subset 
-status_summary_d <- d |>
-  group_by(ctDNA_detected) |>
-  summarise(total_samples = n(), .groups = "drop")
-
-# Print the summary: 385 FALSE, 8 Fail, 11 TRUE (plus 175 blank entries)
-print(status_summary_d)
-
-
# A tibble: 4 × 2
-  ctDNA_detected total_samples
-  <chr>                  <int>
-1 ""                       175
-2 "FALSE"                  385
-3 "Fail"                     8
-4 "TRUE"                    11
-
-
#looking at the number of Fails by unique participant_id
-fail_count <- d |>
-  filter(ctDNA_detected == "Fail") |>   # Filter for rows where status is "Fail"
-  distinct(participant_id) |>          # Get unique participant IDs
-  summarise(total_fails = n())          # Count unique participant IDs
-
-# Print the result: 4 individuals with Fail results, matching the CONSORT diagram
-print(fail_count)
-
-
  total_fails
-1           4
-
-
fail_count <- subset_data |>
-  filter(ctDNA_detected == "Fail") |>   # Filter for rows where status is "Fail"
-  distinct(participant_id) |>          # Get unique participant IDs
-  summarise(total_fails = n())          # Count unique participant IDs
-
-# Print the result -- none of the fails were pulled into the ctDNA cohort  
-print(fail_count)
-
-
# A tibble: 1 × 1
-  total_fails
-        <int>
-1           0
-
-
#number of DTC tests in this cohort of 109 patients 
-
-unique(subset_data$dtc_ihc_result_final) #NA, 0, and 1 
-
-
[1]  0 NA  1
-
-
status_summary_subset <- subset_data |>
-  group_by(dtc_ihc_result_final) |>
-  summarise(total_samples = n(), .groups = "drop")
-
-# Print the summary: 221 negative, 49 positive, and 128 NA results, across 39 ever-positive and 70 never-positive patients
-#### To confirm with Nick that the NAs are not missing data; the check below suggests they are ctDNA-only timepoints
-print(status_summary_subset)
-
-
# A tibble: 3 × 2
-  dtc_ihc_result_final total_samples
-                 <int>         <int>
-1                    0           221
-2                    1            49
-3                   NA           128
-
-
### Looking at NAs: all of them have FALSE ctDNA results, so these appear to be ctDNA-only timepoints
-na_participants_dtc <- subset_data |>
-  filter(is.na(dtc_ihc_result_final)) |>
-  select(participant_id, dtc_ihc_result_final, FINAL_RESULT, ORIG_RSLT_DTC, ctDNA_detected, timepoint)
-
-# Print the participant IDs with NA in `dtc_ihc_result_final`; they all have FALSE ctDNA results, so these are the ctDNA-only timepoints
-# All of these timepoints are long-term follow-up except CLEVER baseline.
-print(na_participants_dtc, n=128)
-
-
# A tibble: 128 × 6
-    participant_id dtc_ihc_result_final FINAL_RESULT ORIG_RSLT_DTC
-    <chr>                         <int>        <int>         <int>
-  1 28115-16-001                     NA           NA            NA
-  2 28115-16-001                     NA           NA            NA
-  3 28115-16-015                     NA           NA            NA
-  4 28115-16-015                     NA           NA            NA
-  5 28115-16-015                     NA           NA            NA
-  6 28115-16-015                     NA           NA            NA
-  7 28115-16-015                     NA           NA            NA
-  8 28115-16-015                     NA           NA            NA
-  9 28115-16-021                     NA           NA            NA
- 10 28115-16-021                     NA           NA            NA
- 11 28115-16-021                     NA           NA            NA
- 12 28115-16-021                     NA           NA            NA
- 13 28115-16-026                     NA           NA            NA
- 14 28115-16-026                     NA           NA            NA
- 15 28115-16-026                     NA           NA            NA
- 16 28115-16-026                     NA           NA            NA
- 17 28115-16-026                     NA           NA            NA
- 18 28115-16-026                     NA           NA            NA
- 19 28115-16-033                     NA           NA            NA
- 20 28115-17-001                     NA           NA            NA
- 21 28115-17-001                     NA           NA            NA
- 22 28115-17-001                     NA           NA            NA
- 23 28115-17-001                     NA           NA            NA
- 24 28115-17-001                     NA           NA            NA
- 25 28115-17-002                     NA           NA            NA
- 26 28115-17-002                     NA           NA            NA
- 27 28115-17-002                     NA           NA            NA
- 28 28115-17-002                     NA           NA            NA
- 29 28115-17-002                     NA           NA            NA
- 30 28115-17-008                     NA           NA            NA
- 31 28115-17-008                     NA           NA            NA
- 32 28115-17-008                     NA           NA            NA
- 33 28115-17-008                     NA           NA            NA
- 34 28115-17-008                     NA           NA            NA
- 35 28115-17-008                     NA           NA            NA
- 36 28115-17-010                     NA           NA            NA
- 37 28115-17-011                     NA           NA            NA
- 38 28115-17-011                     NA           NA            NA
- 39 28115-17-011                     NA           NA            NA
- 40 28115-17-011                     NA           NA            NA
- 41 28115-17-012                     NA           NA            NA
- 42 28115-17-012                     NA           NA            NA
- 43 28115-17-012                     NA           NA            NA
- 44 28115-17-012                     NA           NA            NA
- 45 28115-17-012                     NA           NA            NA
- 46 28115-17-012                     NA           NA            NA
- 47 28115-17-016                     NA           NA            NA
- 48 28115-17-017                     NA           NA            NA
- 49 28115-17-017                     NA           NA            NA
- 50 28115-17-019                     NA           NA            NA
- 51 28115-17-019                     NA           NA            NA
- 52 28115-17-019                     NA           NA            NA
- 53 28115-17-019                     NA           NA            NA
- 54 28115-17-019                     NA           NA            NA
- 55 28115-17-024                     NA           NA            NA
- 56 28115-17-027                     NA           NA            NA
- 57 28115-17-027                     NA           NA            NA
- 58 28115-17-027                     NA           NA            NA
- 59 28115-17-027                     NA           NA            NA
- 60 28115-17-031                     NA           NA            NA
- 61 28115-17-032                     NA           NA            NA
- 62 28115-17-032                     NA           NA            NA
- 63 28115-17-032                     NA           NA            NA
- 64 28115-17-032                     NA           NA            NA
- 65 28115-17-032                     NA           NA            NA
- 66 28115-17-032                     NA           NA            NA
- 67 28115-17-036                     NA           NA            NA
- 68 28115-17-036                     NA           NA            NA
- 69 28115-17-046                     NA           NA            NA
- 70 28115-17-046                     NA           NA            NA
- 71 28115-17-046                     NA           NA            NA
- 72 28115-17-046                     NA           NA            NA
- 73 28115-17-046                     NA           NA            NA
- 74 28115-17-046                     NA           NA            NA
- 75 28115-17-050                     NA           NA            NA
- 76 28115-17-050                     NA           NA            NA
- 77 28115-17-051                     NA           NA            NA
- 78 28115-17-051                     NA           NA            NA
- 79 28115-17-051                     NA           NA            NA
- 80 28115-17-051                     NA           NA            NA
- 81 28115-17-051                     NA           NA            NA
- 82 28115-17-052                     NA           NA            NA
- 83 28115-18-001                     NA           NA            NA
- 84 28115-18-001                     NA           NA            NA
- 85 28115-18-001                     NA           NA            NA
- 86 28115-18-001                     NA           NA            NA
- 87 28115-18-004                     NA           NA            NA
- 88 28115-18-015                     NA           NA            NA
- 89 28115-18-020                     NA           NA            NA
- 90 28115-18-020                     NA           NA            NA
- 91 28115-18-020                     NA           NA            NA
- 92 28115-18-020                     NA           NA            NA
- 93 28115-18-021                     NA           NA            NA
- 94 28115-18-021                     NA           NA            NA
- 95 28115-18-021                     NA           NA            NA
- 96 28115-18-021                     NA           NA            NA
- 97 28115-18-021                     NA           NA            NA
- 98 28115-18-021                     NA           NA            NA
- 99 28115-18-022                     NA           NA            NA
-100 28115-18-022                     NA           NA            NA
-101 28115-18-022                     NA           NA            NA
-102 28115-18-022                     NA           NA            NA
-103 28115-18-022                     NA           NA            NA
-104 28115-18-022                     NA           NA            NA
-105 28115-18-023                     NA           NA            NA
-106 28115-18-023                     NA           NA            NA
-107 28115-18-029                     NA           NA            NA
-108 28115-18-031                     NA           NA            NA
-109 28115-18-032                     NA           NA            NA
-110 28115-18-032                     NA           NA            NA
-111 28115-18-032                     NA           NA            NA
-112 28115-18-032                     NA           NA            NA
-113 28115-19-002                     NA           NA            NA
-114 28115-19-005                     NA           NA            NA
-115 28115-19-005                     NA           NA            NA
-116 28115-19-006                     NA           NA            NA
-117 28115-19-006                     NA           NA            NA
-118 28115-19-006                     NA           NA            NA
-119 28115-19-009                     NA           NA            NA
-120 28115-19-025                     NA           NA            NA
-121 28115-19-028                     NA           NA            NA
-122 28115-20-007                     NA           NA            NA
-123 28115-21-006                     NA           NA            NA
-124 28115-21-016                     NA           NA            NA
-125 28115-21-016                     NA           NA            NA
-126 28115-21-016                     NA           NA            NA
-127 28115-21-016                     NA           NA            NA
-128 28115-21-025                     NA           NA            NA
-# ℹ 2 more variables: ctDNA_detected <chr>, timepoint <chr>
-
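The printed list truncates the `ctDNA_detected` and `timepoint` columns, so the claim that every DTC-missing row has a FALSE ctDNA result cannot be read off directly. A cross-tabulation makes the check explicit. The sketch uses invented rows; since the `table()` output earlier shows 2 blank `ctDNA_detected` entries in `subset_data`, the toy data includes one blank to show how such a value would surface.

```r
# Hypothetical rows mirroring three subset_data columns; values invented.
visits <- data.frame(
  dtc_ihc_result_final = c(0L, 1L, NA, NA, 0L, NA),
  ctDNA_detected       = c("FALSE", "TRUE", "FALSE", "FALSE", "TRUE", ""),
  timepoint            = c("SURMOUNT-Baseline", "SURMOUNT-Baseline",
                           "Long Term FU 1", "CLEVER-Baseline",
                           "Year 1 Follow Up", "Long Term FU 2"),
  stringsAsFactors = FALSE
)

# Cross-tabulate missing DTC results against the ctDNA call. useNA = "ifany"
# adds a column for true NA values if any exist; blank strings get their own level.
na_check <- table(dtc_missing = is.na(visits$dtc_ihc_result_final),
                  ctDNA = visits$ctDNA_detected,
                  useNA = "ifany")
print(na_check)

# Which timepoints contribute the DTC-missing rows?
print(table(visits$timepoint[is.na(visits$dtc_ihc_result_final)]))
```

If the `TRUE` row of `na_check` has counts only under `FALSE`, the NA rows are indeed ctDNA-only timepoints; any count under a blank or `TRUE` column would flag rows worth reviewing.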
-
#look at timepoints 
-unique_timepoints <- unique(subset_data$timepoint)
-print(unique_timepoints)
-
-
 [1] "SURMOUNT-Baseline" "Year 1 Follow Up"  "Year 2 Follow Up" 
- [4] "Long Term FU 1"    "Long Term FU 2"    "Year 3 Follow Up" 
- [7] "CLEVER-Baseline"   "C6"                "6M F/U"           
-[10] "12M F/U"           "18M F/U"           "24M F/U"          
-[13] "30M F/U"           "36M F/U"           "C3"               
-[16] "EOO"               "C12"               "Year 4 Follow Up" 
-
-
##### eVAF 
-names(subset_data) #use eVAF
-
-
  [1] "ID"                               "trialID"                         
-  [3] "participant_id"                   "patient_id"                      
-  [5] "fu_trial_pid"                     "timepoint"                       
-  [7] "project"                          "surmount_id"                     
-  [9] "panel_id"                         "accession"                       
- [11] "sample_id"                        "collection_date"                 
- [13] "extracted_plasma_volume_ml"       "input"                           
- [15] "input_sample"                     "physical_run_name"               
- [17] "workflow_name"                    "eVAF"                            
- [19] "mutant_molecules"                 "mean_VAF"                        
- [21] "Score"                            "all_pass_variants"               
- [23] "total_variants"                   "n_positive_variants"             
- [25] "ctDNA_detected"                   "ctdna_cohort"                    
- [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
- [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
- [31] "dtc_final_result_date"            "pt"                              
- [33] "bma_date"                         "ORIG_RSLT_DTC"                   
- [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
- [37] "FINAL_COUNT"                      "org_consent_date"                
- [39] "demo_initials"                    "demo_dob"                        
- [41] "demo_sex"                         "demo_ethnicity"                  
- [43] "demo_race___1"                    "demo_race___2"                   
- [45] "demo_race___3"                    "demo_race___4"                   
- [47] "demo_race___5"                    "demo_race___88"                  
- [49] "demo_race___99"                   "demo_race_other"                 
- [51] "prtx_radiation"                   "prtx_rad_start"                  
- [53] "prtx_rad_end"                     "prtx_chemo"                      
- [55] "prtx_endo"                        "prtx_bonemod"                    
- [57] "prior_therapy_complete"           "inc_dx_crit"                     
- [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
- [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
- [63] "final_receptor_group"             "demo_race_final"                 
- [65] "final_histology"                  "final_tumor_grade"               
- [67] "final_overall_stage"              "final_t_stage"                   
- [69] "final_n_stage"                    "fu_date_to"                      
- [71] "fu_surv"                          "fu_date_death"                   
- [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
- [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
- [77] "fu_locreg_prog"                   "fu_locreg_date"                  
- [79] "fu_dist_site_num"                 "fu_dist_site_char"               
- [81] "fu_dist_prog"                     "fu_dist_date"                    
- [83] "censor_date"                      "chemo_indication_1"              
- [85] "chemo_name_1"                     "chemo_name_other_1"              
- [87] "chemo_start_date_1"               "start_date_exact_1"              
- [89] "chemo_end_date_1"                 "end_date_exact_1"                
- [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
- [93] "chemo_indication_2"               "chemo_name_2"                    
- [95] "chemo_name_other_2"               "chemo_start_date_2"              
- [97] "start_date_exact_2"               "chemo_end_date_2"                
- [99] "end_date_exact_2"                 "chemo_notes_2"                   
-[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
-[103] "chemo_name_3"                     "chemo_name_other_3"              
-[105] "chemo_start_date_3"               "start_date_exact_3"              
-[107] "chemo_end_date_3"                 "end_date_exact_3"                
-[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
-[111] "chemo_indication_4"               "chemo_name_4"                    
-[113] "chemo_name_other_4"               "chemo_start_date_4"              
-[115] "start_date_exact_4"               "chemo_end_date_4"                
-[117] "end_date_exact_4"                 "chemo_notes_4"                   
-[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
-[121] "hormone_name___1_1"               "hormone_name___2_1"              
-[123] "hormone_name___3_1"               "hormone_name___4_1"              
-[125] "hormone_name___5_1"               "hormone_name___6_1"              
-[127] "hormone_name___7_1"               "hormone_other_1"                 
-[129] "hormone_start_date_1"             "hormone_ongoing_1"               
-[131] "hormone_end_date_1"               "hormone_notes_1"                 
-[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
-[135] "hormone_name___1_2"               "hormone_name___2_2"              
-[137] "hormone_name___3_2"               "hormone_name___4_2"              
-[139] "hormone_name___5_2"               "hormone_name___6_2"              
-[141] "hormone_name___7_2"               "hormone_other_2"                 
-[143] "hormone_start_date_2"             "hormone_ongoing_2"               
-[145] "hormone_end_date_2"               "hormone_notes_2"                 
-[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
-[149] "hormone_name___1_3"               "hormone_name___2_3"              
-[151] "hormone_name___3_3"               "hormone_name___4_3"              
-[153] "hormone_name___5_3"               "hormone_name___6_3"              
-[155] "hormone_name___7_3"               "hormone_other_3"                 
-[157] "hormone_start_date_3"             "hormone_ongoing_3"               
-[159] "hormone_end_date_3"               "hormone_notes_3"                 
-[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
-[163] "hormone_name___1_4"               "hormone_name___2_4"              
-[165] "hormone_name___3_4"               "hormone_name___4_4"              
-[167] "hormone_name___5_4"               "hormone_name___6_4"              
-[169] "hormone_name___7_4"               "hormone_other_4"                 
-[171] "hormone_start_date_4"             "hormone_ongoing_4"               
-[173] "hormone_end_date_4"               "hormone_notes_4"                 
-[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
-[177] "hormone_name___1_5"               "hormone_name___2_5"              
-[179] "hormone_name___3_5"               "hormone_name___4_5"              
-[181] "hormone_name___5_5"               "hormone_name___6_5"              
-[183] "hormone_name___7_5"               "hormone_other_5"                 
-[185] "hormone_start_date_5"             "hormone_ongoing_5"               
-[187] "hormone_end_date_5"               "hormone_notes_5"                 
-[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
-[191] "hormone_name___1_6"               "hormone_name___2_6"              
-[193] "hormone_name___3_6"               "hormone_name___4_6"              
-[195] "hormone_name___5_6"               "hormone_name___6_6"              
-[197] "hormone_name___7_6"               "hormone_other_6"                 
-[199] "hormone_start_date_6"             "hormone_ongoing_6"               
-[201] "hormone_end_date_6"               "hormone_notes_6"                 
-[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
-[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
-[207] "bonemod_name_1"                   "bonemod_start_date_1"            
-[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
-[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
-[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
-[215] "bonemod_name_2"                   "bonemod_start_date_2"            
-[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
-[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
-[221] "diag_lateral_1"                   "diag_menopause_1"                
-[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
-[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
-[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
-[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
-[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
-[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
-[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
-[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
-[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
-[241] "diag_er_status_1"                 "diag_er_percent_1"               
-[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
-[245] "diag_her2_status_1"               "diag_her2_method_1"              
-[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
-[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
-[251] "diag_pcr_1"                       "diag_surgery_date_1"             
-[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
-[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
-[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
-[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
-[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
-[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
-[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
-[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
-[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
-[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
-[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
-[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
-[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
-[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
-[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
-[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
-[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
-[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
-[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
-[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
-[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
-[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
-[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
-[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
-[301] "mammoprint_1"                     "mammoprint_result_1"             
-[303] "diag_notes_1"                     "diag_date_2"                     
-[305] "diag_lateral_2"                   "diag_menopause_2"                
-[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
-[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
-[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
-[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
-[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
-[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
-[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
-[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
-[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
-[325] "diag_er_status_2"                 "diag_er_percent_2"               
-[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
-[329] "diag_her2_status_2"               "diag_her2_method_2"              
-[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
-[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
-[335] "diag_pcr_2"                       "diag_surgery_date_2"             
-[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
-[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
-[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
-[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
-[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
-[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
-[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
-[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
-[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
-[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
-[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
-[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
-[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
-[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
-[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
-[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
-[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
-[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
-[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
-[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
-[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
-[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
-[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
-[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
-[385] "mammoprint_2"                     "mammoprint_result_2"             
-[387] "diag_notes_2"                     "ctDNA_ever"                      
-[389] "dtc_ever"                        
-
-
# Calculate the median and range (min and max) for `eVAF` as percentages for participants with `ctDNA_detected == TRUE`
-eVAF_range_ctDNA_detected_percent <- subset_data |>
-  filter(ctDNA_detected == TRUE) |>   # Filter for those with ctDNA detected
-  summarise(
-    median_eVAF_percent = median(eVAF, na.rm = TRUE) * 100,  # Convert median to percentage
-    min_eVAF_percent = min(eVAF, na.rm = TRUE) * 100,        # Convert minimum to percentage
-    max_eVAF_percent = max(eVAF, na.rm = TRUE) * 100         # Convert maximum to percentage
-  )
-
-# Print the result
-print(eVAF_range_ctDNA_detected_percent)
-
-
# A tibble: 1 × 3
-  median_eVAF_percent min_eVAF_percent max_eVAF_percent
-                <dbl>            <dbl>            <dbl>
-1             0.00901          0.00165           0.0836
-
-
#### DTC counts 
-names(subset_data) #use dtc_ihc_summary_count_final  
-
-
-
-
# Calculate the median and range (min and max) of DTC counts for participants with DTCs detected (`dtc_ihc_result_final == 1`)
-dtc_count <- subset_data |>
-  filter(dtc_ihc_result_final == 1) |>   # Filter for those with DTCs detected
-  summarise(
-    median_dtc_count = median(dtc_ihc_summary_count_final, na.rm = TRUE), 
-    min_dtc_count = min(dtc_ihc_summary_count_final, na.rm = TRUE),        
-    max_dtc_count = max(dtc_ihc_summary_count_final, na.rm = TRUE)         
-  )
-
-# Print the result
-print(dtc_count)
-
-
# A tibble: 1 × 3
-  median_dtc_count min_dtc_count max_dtc_count
-             <int>         <int>         <int>
-1                2             1            10
-
-
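The eVAF and DTC-count chunks above repeat the same median/min/max pattern. Below is a minimal sketch of a reusable base-R helper. `summarise_range()` and the `toy` data frame are hypothetical, the values are made up (not SURMOUNT data), and base R is used only so the example runs without dplyr.

```r
# Hypothetical helper (not part of the original analysis): median and range
# of a numeric vector, optionally rescaled (e.g. scale = 100 for percentages).
# Assumes at least one non-missing value; otherwise min()/max() warn and
# return Inf/-Inf.
summarise_range <- function(x, scale = 1) {
  x <- x[!is.na(x)]  # drop missing values, equivalent to na.rm = TRUE
  data.frame(
    median = median(x) * scale,
    min    = min(x) * scale,
    max    = max(x) * scale
  )
}

# Made-up DTC rows (illustrative values only)
toy <- data.frame(
  dtc_ihc_result_final        = c(1, 1, 0, 1, NA),
  dtc_ihc_summary_count_final = c(3, 5, 0, 1, NA)
)

# %in% treats NA as "no match", so rows with a missing result are dropped,
# as dplyr::filter() does
positive <- toy[toy$dtc_ihc_result_final %in% 1, ]
summarise_range(positive$dtc_ihc_summary_count_final)
```

On the real data the eVAF summary would then read, for example, `summarise_range(subset_data$eVAF[subset_data$ctDNA_detected %in% TRUE], scale = 100)`.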
#### Number of timepoints we see 
-
-# Timepoints per patient (median, range)
-timepoints_per_patient <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    total_timepoints = n_distinct(timepoint),  # Count distinct timepoints for each patient
-    .groups = "drop"
-  ) |>
-  summarise(
-    median_timepoints = median(total_timepoints, na.rm = TRUE),  # Calculate median
-    min_timepoints = min(total_timepoints, na.rm = TRUE),        # Calculate minimum
-    max_timepoints = max(total_timepoints, na.rm = TRUE)         # Calculate maximum
-  )
-
-#  Timepoints of ctDNA assessment (`ctDNA_detected`)
-ctDNA_timepoints <- subset_data |>
-  filter(!is.na(ctDNA_detected)) |>  # Filter out NA values for ctDNA_detected
-  group_by(participant_id) |>
-  summarise(
-    ctDNA_timepoints = n_distinct(timepoint),  # Count distinct timepoints of ctDNA assessment
-    .groups = "drop"
-  ) |>
-  summarise(
-    median_ctDNA_timepoints = median(ctDNA_timepoints, na.rm = TRUE),  # Calculate median
-    min_ctDNA_timepoints = min(ctDNA_timepoints, na.rm = TRUE),        # Calculate minimum
-    max_ctDNA_timepoints = max(ctDNA_timepoints, na.rm = TRUE)         # Calculate maximum
-  )
-
-# Timepoints of DTC assessment (`dtc_ihc_result_final`)
-dtc_timepoints <- subset_data |>
-  filter(!is.na(dtc_ihc_result_final)) |>  # Filter out NA values for dtc_ihc_result_final
-  group_by(participant_id) |>
-  summarise(
-    dtc_timepoints = n_distinct(timepoint),  # Count distinct timepoints of DTC assessment
-    .groups = "drop"
-  ) |>
-  summarise(
-    median_dtc_timepoints = median(dtc_timepoints, na.rm = TRUE),  # Calculate median
-    min_dtc_timepoints = min(dtc_timepoints, na.rm = TRUE),        # Calculate minimum
-    max_dtc_timepoints = max(dtc_timepoints, na.rm = TRUE)         # Calculate maximum
-  )
-
-# Print all summaries
-print("Timepoints per patient:")
-
-
[1] "Timepoints per patient:"
-
-
print(timepoints_per_patient)
-
-
# A tibble: 1 × 3
-  median_timepoints min_timepoints max_timepoints
-              <int>          <int>          <int>
-1                 2              1             12
-
-
print("Timepoints of ctDNA assessment:")
-
-
[1] "Timepoints of ctDNA assessment:"
-
-
print(ctDNA_timepoints)
-
-
# A tibble: 1 × 3
-  median_ctDNA_timepoints min_ctDNA_timepoints max_ctDNA_timepoints
-                    <int>                <int>                <int>
-1                       2                    1                   12
-
-
print("Timepoints of DTC assessment:")
-
-
[1] "Timepoints of DTC assessment:"
-
-
print(dtc_timepoints)
-
-
# A tibble: 1 × 3
-  median_dtc_timepoints min_dtc_timepoints max_dtc_timepoints
-                  <int>              <int>              <int>
-1                     2                  1                  6
-
-
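The three timepoint summaries above differ only in which result column must be non-missing. Below is a sketch of a single base-R helper that covers all three. It assumes the long format used here (one row per participant per timepoint), and `timepoint_range()` and the toy rows are hypothetical. `length(unique(...))` plays the role of `n_distinct()`.

```r
# Hypothetical helper (not from the original analysis): for each participant,
# count the distinct timepoints with a non-missing value in `result_col`
# (or all rows if `result_col` is NULL), then return the median and range.
timepoint_range <- function(df, result_col = NULL) {
  if (!is.null(result_col)) {
    df <- df[!is.na(df[[result_col]]), ]  # keep assessed rows only
  }
  per_patient <- tapply(df$timepoint, df$participant_id,
                        function(tp) length(unique(tp)))
  c(median = median(per_patient),
    min    = min(per_patient),
    max    = max(per_patient))
}

# Made-up long-format rows: P1 has a duplicated C6 row, and P3 has a
# missing ctDNA result at C3
toy <- data.frame(
  participant_id = c("P1", "P1", "P1", "P2", "P3", "P3"),
  timepoint      = c("C3", "C6", "C6", "C3", "C3", "EOO"),
  ctDNA_detected = c(TRUE, FALSE, NA, FALSE, NA, TRUE),
  stringsAsFactors = FALSE
)

timepoint_range(toy)                    # every timepoint
timepoint_range(toy, "ctDNA_detected")  # ctDNA-assessed timepoints only
```

With the real data, `timepoint_range(subset_data, "dtc_ihc_result_final")` would correspond to `dtc_timepoints`.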
### Timepoints on clinical trial ### Ask Nick -- should we include all timepoints on the trial
-# (CLEVER-Baseline, EOO, C3, C6, C12, 6M F/U, etc.), only those while on treatment (C3, C6, C12),
-# or only the ones while patients are 
-unique_timepoints <- unique(subset_data$timepoint)
-print(unique_timepoints) 
-
-
 [1] "SURMOUNT-Baseline" "Year 1 Follow Up"  "Year 2 Follow Up" 
- [4] "Long Term FU 1"    "Long Term FU 2"    "Year 3 Follow Up" 
- [7] "CLEVER-Baseline"   "C6"                "6M F/U"           
-[10] "12M F/U"           "18M F/U"           "24M F/U"          
-[13] "30M F/U"           "36M F/U"           "C3"               
-[16] "EOO"               "C12"               "Year 4 Follow Up" 
-
-
trial_timepoints <- c("CLEVER-Baseline", "EOO", "C3", "C6", "C12", "6M F/U", "12M F/U", "18M F/U", "24M F/U", "30M F/U", "36M F/U")
-
-# Count the number of samples by timepoint (for specific clinical trial timepoints)
-samples_by_trial_timepoint <- subset_data |>
-  filter(timepoint %in% trial_timepoints) |>  # Filter for relevant timepoints
-  group_by(timepoint) |>                      # Group by timepoint
-  summarise(
-    total_samples = n_distinct(participant_id),  # Count distinct participant_ids (samples)
-    .groups = "drop"  # Remove grouping after summarizing
-  )
-
-# Print the result
-print(samples_by_trial_timepoint) # total samples on trial (ctDNA and DTC)
-
-
# A tibble: 11 × 2
-   timepoint       total_samples
-   <chr>                   <int>
- 1 12M F/U                    18
- 2 18M F/U                    13
- 3 24M F/U                    13
- 4 30M F/U                    12
- 5 36M F/U                    18
- 6 6M F/U                     27
- 7 C12                         4
- 8 C3                         20
- 9 C6                         28
-10 CLEVER-Baseline            32
-11 EOO                         9
-
-
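The tibbles above list timepoints alphabetically, so "12M F/U" prints before "6M F/U" and "C12" before "C3". One way to print them in the order of `trial_timepoints` is to convert `timepoint` to a factor with those levels. Below is a base-R sketch on made-up rows; adjust the vector if a different display order is wanted. In dplyr the same idea would be `mutate(timepoint = factor(timepoint, levels = trial_timepoints))` before `group_by()`, with `.drop = FALSE` if timepoints with no participants should be kept.

```r
# Study timepoints, in the order used by `trial_timepoints` above
trial_timepoints <- c("CLEVER-Baseline", "EOO", "C3", "C6", "C12", "6M F/U",
                      "12M F/U", "18M F/U", "24M F/U", "30M F/U", "36M F/U")

# Made-up rows: P1 is listed twice at C3, and "Year 1 Follow Up" is not a
# trial timepoint
toy <- data.frame(
  participant_id = c("P1", "P1", "P1", "P2", "P2", "P3"),
  timepoint      = c("C3", "C3", "12M F/U", "CLEVER-Baseline", "C3",
                     "Year 1 Follow Up"),
  stringsAsFactors = FALSE
)

# Keep one row per participant per timepoint, like n_distinct(participant_id)
toy <- unique(toy[, c("participant_id", "timepoint")])

# Explicit factor levels fix the display order. Values outside the levels
# become NA and are dropped by table(), and empty levels are reported as 0.
counts <- table(factor(toy$timepoint, levels = trial_timepoints))
counts
```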
#### ctDNA on trial 
-
-ctDNA_samples_by_timepoint <- subset_data |>
-  filter(timepoint %in% trial_timepoints, !is.na(ctDNA_detected)) |>  # Filter for trial timepoints with a ctDNA result (detected or not)
-  group_by(timepoint) |>                      # Group by timepoint
-  summarise(
-    total_samples_ctDNA = n_distinct(participant_id),  # Count distinct participant_ids (ctDNA samples)
-    .groups = "drop"  # Remove grouping after summarizing
-  )
-
-# Print the result for ctDNA samples -- 32 CLEVER-Baseline, 9 EOO, 20 C3, 28 C6, 4 C12, 27 6M, 18 12M, 13 18M, 13 24M, 12 30M, 18 36M 
-print(ctDNA_samples_by_timepoint)
-
-
# A tibble: 11 × 2
-   timepoint       total_samples_ctDNA
-   <chr>                         <int>
- 1 12M F/U                          18
- 2 18M F/U                          13
- 3 24M F/U                          13
- 4 30M F/U                          12
- 5 36M F/U                          18
- 6 6M F/U                           27
- 7 C12                               4
- 8 C3                               20
- 9 C6                               28
-10 CLEVER-Baseline                  32
-11 EOO                               9
-
-
##### DTC by trial timepoint 
-# Count the number of DTC samples by timepoint (for specific clinical trial timepoints)
-dtc_samples_by_timepoint <- subset_data |>
-  filter(timepoint %in% trial_timepoints, !is.na(dtc_ihc_result_final)) |>  # Filter for relevant timepoints and DTC results
-  group_by(timepoint) |>                      # Group by timepoint
-  summarise(
-    total_samples_dtc = n_distinct(participant_id),  # Count distinct participant_ids (DTC samples)
-    .groups = "drop"  # Remove grouping after summarizing
-  )
-
-# Print the result for DTC samples -- no DTC samples at CLEVER-Baseline or 12M-36M F/U; 9 EOO, 19 C3, 28 C6, 4 C12, 19 6M F/U
-print(dtc_samples_by_timepoint)
-
-
# A tibble: 5 × 2
-  timepoint total_samples_dtc
-  <chr>                 <int>
-1 6M F/U                   19
-2 C12                       4
-3 C3                       19
-4 C6                       28
-5 EOO                       9
-
-
#### Number of ctDNA timepoints on surmount 
-print(unique_timepoints) 
-
-
 [1] "SURMOUNT-Baseline" "Year 1 Follow Up"  "Year 2 Follow Up" 
- [4] "Long Term FU 1"    "Long Term FU 2"    "Year 3 Follow Up" 
- [7] "CLEVER-Baseline"   "C6"                "6M F/U"           
-[10] "12M F/U"           "18M F/U"           "24M F/U"          
-[13] "30M F/U"           "36M F/U"           "C3"               
-[16] "EOO"               "C12"               "Year 4 Follow Up" 
-
-
surmount_timepoints <- c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up", "Year 3 Follow Up", "Year 4 Follow Up", "Long Term FU 1", "Long Term FU 2") 
-
-ctDNA_surmount <- subset_data |>
-  filter(timepoint %in% surmount_timepoints, !is.na(ctDNA_detected)) |>  # Filter for relevant timepoints and ctDNA detected
-  group_by(timepoint) |>                      # Group by timepoint
-  summarise(
-    total_samples_ctDNA = n_distinct(participant_id),  # Count distinct participant_ids (ctDNA samples)
-    .groups = "drop"  # Remove grouping after summarizing
-  )
-
-# Print the result for ctDNA samples -- 109 Baseline, Y1 FU 40, Y2 FU 25, Y3 FU 14, Y4 FU 4, LT FU 1 10, LT FU 2 2
-print(ctDNA_surmount)
-
-
# A tibble: 7 × 2
-  timepoint         total_samples_ctDNA
-  <chr>                           <int>
-1 Long Term FU 1                     10
-2 Long Term FU 2                      2
-3 SURMOUNT-Baseline                 109
-4 Year 1 Follow Up                   40
-5 Year 2 Follow Up                   25
-6 Year 3 Follow Up                   14
-7 Year 4 Follow Up                    4
-
-
### number of DTC timepoints on surmount 
-# Count the number of DTC samples by timepoint 
-dtc_timepoint_surmount <- subset_data |>
-  filter(timepoint %in% surmount_timepoints, !is.na(dtc_ihc_result_final)) |>  # Filter for relevant timepoints and DTC results
-  group_by(timepoint) |>                      # Group by timepoint
-  summarise(
-    total_samples_dtc = n_distinct(participant_id),  # Count distinct participant_ids (DTC samples)
-    .groups = "drop"  # Remove grouping after summarizing
-  )
-
-# Print the result for DTC samples -- 109 Baseline, Y1 FU 40, Y2 FU 24, Y3 FU 14, Y4 FU 4
-print(dtc_timepoint_surmount)
-
-
# A tibble: 5 × 2
-  timepoint         total_samples_dtc
-  <chr>                         <int>
-1 SURMOUNT-Baseline               109
-2 Year 1 Follow Up                 40
-3 Year 2 Follow Up                 24
-4 Year 3 Follow Up                 14
-5 Year 4 Follow Up                  4
-
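The four counting chunks above repeat the same filter/group/summarise pattern. Only the result column and the set of timepoints change between them. Below is a minimal sketch of a reusable helper. `count_by_timepoint()` and the toy data are hypothetical and are not part of the original analysis. The sketch assumes the data have `participant_id` and `timepoint` columns, as in the code above.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): count distinct
# participants with a non-missing value in `result_col` at each timepoint
count_by_timepoint <- function(data, result_col, timepoints) {
  data |>
    filter(timepoint %in% timepoints, !is.na(.data[[result_col]])) |>
    group_by(timepoint) |>
    summarise(n_participants = n_distinct(participant_id), .groups = "drop")
}

# Toy data for illustration only (not study data)
toy <- tibble(
  participant_id = c("A", "A", "B", "C", "D"),
  timepoint      = c("C3", "C3", "C3", "C6", "C6"),
  ctDNA_detected = c(TRUE, FALSE, NA, FALSE, TRUE)
)
count_by_timepoint(toy, "ctDNA_detected", c("C3", "C6"))
```

Applied to the study data, `count_by_timepoint(subset_data, "dtc_ihc_result_final", surmount_timepoints)` applies the same filtering as the DTC-by-SURMOUNT-timepoint chunk.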
-
#### positivity by timepoint -- ctDNA 
-
-ctDNA_pos_rate_by_timepoint <- subset_data |>
-  filter(!is.na(ctDNA_detected)) |>  # Ensure we are considering only non-missing ctDNA_detected values
-  group_by(timepoint, participant_id) |>  # Group by timepoint and participant
-  summarise(
-    ctDNA_pos = max(ctDNA_detected == TRUE),  # If any value is TRUE, participant is ctDNA positive at that timepoint
-    .groups = "drop"
-  ) |>
-  group_by(timepoint) |>  # Group again by timepoint to calculate the positivity rate
-  summarise(
-    positivity_rate = mean(ctDNA_pos),  # Calculate the positivity rate for each timepoint
-    total_samples = n_distinct(participant_id),  # Count the number of distinct participants
-    .groups = "drop"
-  )
-
-# Print the result for ctDNA positivity rate by timepoint
-print(ctDNA_pos_rate_by_timepoint)
-
-
# A tibble: 18 × 3
-   timepoint         positivity_rate total_samples
-   <chr>                       <dbl>         <int>
- 1 12M F/U                    0.0556            18
- 2 18M F/U                    0                 13
- 3 24M F/U                    0                 13
- 4 30M F/U                    0                 12
- 5 36M F/U                    0.0556            18
- 6 6M F/U                     0.0370            27
- 7 C12                        0                  4
- 8 C3                         0                 20
- 9 C6                         0                 28
-10 CLEVER-Baseline            0                 32
-11 EOO                        0                  9
-12 Long Term FU 1             0                 10
-13 Long Term FU 2             0                  2
-14 SURMOUNT-Baseline          0.0183           109
-15 Year 1 Follow Up           0.125             40
-16 Year 2 Follow Up           0.04              25
-17 Year 3 Follow Up           0                 14
-18 Year 4 Follow Up           0                  4
-
-
# Calculate cumulative ctDNA positivity rate by timepoint
-ctDNA_pos_rate_cumulative <- ctDNA_pos_rate_by_timepoint |>
-  arrange(timepoint) |>  # Note: sorts timepoint labels alphabetically, not chronologically
-  mutate(
-    cumulative_pos_rate = cumsum(positivity_rate * total_samples) / cumsum(total_samples)  # Running sample-weighted positivity, in sort order
-  )
-
-print(ctDNA_pos_rate_cumulative)
-
-
# A tibble: 18 × 4
-   timepoint         positivity_rate total_samples cumulative_pos_rate
-   <chr>                       <dbl>         <int>               <dbl>
- 1 12M F/U                    0.0556            18              0.0556
- 2 18M F/U                    0                 13              0.0323
- 3 24M F/U                    0                 13              0.0227
- 4 30M F/U                    0                 12              0.0179
- 5 36M F/U                    0.0556            18              0.0270
- 6 6M F/U                     0.0370            27              0.0297
- 7 C12                        0                  4              0.0286
- 8 C3                         0                 20              0.024 
- 9 C6                         0                 28              0.0196
-10 CLEVER-Baseline            0                 32              0.0162
-11 EOO                        0                  9              0.0155
-12 Long Term FU 1             0                 10              0.0147
-13 Long Term FU 2             0                  2              0.0146
-14 SURMOUNT-Baseline          0.0183           109              0.0159
-15 Year 1 Follow Up           0.125             40              0.0282
-16 Year 2 Follow Up           0.04              25              0.0289
-17 Year 3 Follow Up           0                 14              0.0279
-18 Year 4 Follow Up           0                  4              0.0276
-
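The running cumulative rate above has two problems. First, `arrange(timepoint)` sorts the timepoint labels alphabetically, so "12M F/U" comes before "6M F/U" and trial and SURMOUNT timepoints are interleaved. Second, the value is a sample-weighted average, not a count of participants who have ever tested positive.

Below is a minimal sketch of a participant-level alternative. The helper name `cumulative_ever_positive()` and the toy data are hypothetical. The chronological order passed in `tp_order` is an assumption, because the source does not state the intended order. The sketch also assumes `ctDNA_detected` is logical or coded 0/1.

```r
library(dplyr)

# Hypothetical helper: for each timepoint k in an assumed chronological order,
# count participants tested at or before k and those positive at or before k
cumulative_ever_positive <- function(data, tp_order) {
  per_participant <- data |>
    filter(timepoint %in% tp_order, !is.na(ctDNA_detected)) |>
    mutate(tp_index = match(timepoint, tp_order)) |>
    group_by(participant_id) |>
    summarise(
      first_tested   = min(tp_index),
      # Inf if the participant never tested positive
      first_positive = suppressWarnings(min(tp_index[ctDNA_detected == TRUE])),
      .groups = "drop"
    )
  k <- seq_along(tp_order)
  tibble(
    timepoint       = factor(tp_order, levels = tp_order),
    n_tested        = vapply(k, function(i) sum(per_participant$first_tested <= i), integer(1)),
    n_ever_positive = vapply(k, function(i) sum(per_participant$first_positive <= i), integer(1))
  ) |>
    mutate(cumulative_pos_rate = n_ever_positive / n_tested)
}

# Toy data for illustration only (not study data)
toy <- tibble(
  participant_id = c("A", "A", "B", "B", "C"),
  timepoint      = c("SURMOUNT-Baseline", "Year 1 Follow Up",
                     "SURMOUNT-Baseline", "Year 1 Follow Up", "Year 1 Follow Up"),
  ctDNA_detected = c(FALSE, TRUE, FALSE, FALSE, FALSE)
)
cumulative_ever_positive(toy, c("SURMOUNT-Baseline", "Year 1 Follow Up"))
```

On the study data, `cumulative_ever_positive(subset_data, surmount_timepoints)` would give participant-level cumulative positivity, provided `surmount_timepoints` is listed in chronological order.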
-
#### Cumulative positivity ctDNA 
-
-library(dplyr)
-
-# Calculate ctDNA positivity rate by participant
-ctDNA_pos_rate <- subset_data |>
-  filter(!is.na(ctDNA_detected)) |>  # Exclude missing ctDNA results
-  group_by(participant_id) |>  # Group by participant
-  summarise(
-    ctDNA_pos = max(ctDNA_detected == TRUE),  # If any value is TRUE, participant is ctDNA positive
-    .groups = "drop"
-  )
-
-# Calculate cumulative positivity rate
-ctDNA_pos_rate_cumulative <- ctDNA_pos_rate |>
-  summarise(
-    total_pos = sum(ctDNA_pos),  # Total number of ctDNA positive participants
-    total_samples = n(),  # Total number of participants
-    cumulative_pos_rate = total_pos / total_samples  # Cumulative positivity rate
-  )
-
-# Print the cumulative positivity rate
-print(ctDNA_pos_rate_cumulative)
-
-
# A tibble: 1 × 3
-  total_pos total_samples cumulative_pos_rate
-      <int>         <int>               <dbl>
-1         9           109              0.0826
-
-
# Count the number of positive ctDNA samples and total samples
-ctDNA_pos_vs_total <- subset_data |>
-  filter(!is.na(ctDNA_detected)) |>  # Exclude missing ctDNA results
-  summarise(
-    total_samples = n(),  # Total number of ctDNA samples
-    positive_samples = sum(ctDNA_detected == TRUE),  # Count of positive ctDNA samples
-    .groups = "drop"
-  ) |>
-  mutate(
-    positivity_rate = positive_samples / total_samples  # Proportion of positive ctDNA samples
-  )
-
-# Print the results
-print(ctDNA_pos_vs_total)
-
-
# A tibble: 1 × 3
-  total_samples positive_samples positivity_rate
-          <int>            <int>           <dbl>
-1           398               11          0.0276
-
-
#### cumulative positivity DTC 
-
-# Calculate DTC positivity by participant
-DTC_pos_rate <- subset_data |>
-  filter(!is.na(dtc_ihc_result_final)) |>  # Exclude missing DTC results
-  group_by(participant_id) |>  # Group by participant
-  summarise(
-    dtc = max(dtc_ihc_result_final == 1),  # If any result is 1, participant is DTC positive
-    .groups = "drop"
-  )
-
-# Calculate cumulative positivity rate
-DTC_pos_rate_cumulative <- DTC_pos_rate |>
-  summarise(
-    total_pos = sum(dtc),  # Total number of DTC positive participants
-    total_samples = n(),  # Total number of participants
-    cumulative_pos_rate = total_pos / total_samples  # Cumulative positivity rate
-  )
-
-# Print the cumulative positivity rate
-print(DTC_pos_rate_cumulative)
-
-
# A tibble: 1 × 3
-  total_pos total_samples cumulative_pos_rate
-      <int>         <int>               <dbl>
-1        39           109               0.358
-
-
# Count the number of positive DTC samples and total samples
-dtc_pos_vs_total <- subset_data |>
-  filter(!is.na(dtc_ihc_result_final)) |>  # Exclude missing DTC results
-  summarise(
-    total_samples = n(),  # Total number of DTC samples
-    positive_samples = sum(dtc_ihc_result_final == 1),  # Count of positive DTC samples
-    .groups = "drop"
-  ) |>
-  mutate(
-    positivity_rate = positive_samples / total_samples  # Proportion of positive DTC samples
-  )
-
-# Print the results
-print(dtc_pos_vs_total)
-
-
# A tibble: 1 × 3
-  total_samples positive_samples positivity_rate
-          <int>            <int>           <dbl>
-1           270               49           0.181
-
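With 109 participants, the participant-level rates above carry considerable uncertainty. Below is a minimal sketch that adds exact (Clopper-Pearson) 95% confidence intervals using base R's `binom.test()` and the participant counts printed above (9/109 for ctDNA, 39/109 for DTC). The sample-level rates (11/398 and 49/270) are left out. Repeated samples from the same participant are not independent, so a simple binomial interval would not be valid for them.

```r
# Participant-level counts printed above
rates <- data.frame(
  measure  = c("ctDNA ever positive", "DTC ever positive"),
  positive = c(9, 39),
  total    = c(109, 109)
)

# Exact (Clopper-Pearson) 95% CI for each proportion; one row per measure
ci <- t(mapply(function(x, n) binom.test(x, n)$conf.int,
               rates$positive, rates$total))
rates$estimate <- rates$positive / rates$total
rates$lower    <- ci[, 1]
rates$upper    <- ci[, 2]
rates
```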
-
-

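We see the distribution of test samples by timepoint. Most samples were collected at SURMOUNT-Baseline (109 participants tested for both ctDNA and DTC). However, the highest per-timepoint ctDNA positivity rate occurred at Year 1 Follow Up (5/40, 12.5%), not at SURMOUNT-Baseline (2/109, 1.8%). Repeat testing identified additional positive participants. As a result, participant-level cumulative positivity (positive at any timepoint) reached 9/109 (8.3%) for ctDNA and 39/109 (35.8%) for DTC. The running cumulative rate in the timepoint table is ordered alphabetically by timepoint label rather than chronologically, so it should not be read as a trend over time.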

-

Test characteristics of the ctDNA assay: next, we look at the sensitivity, specificity, and predictive values of the ctDNA assay for relapse, at the participant level.

-
-
######  Test characteristics ctDNA 
-# Participant-level 2x2 table: ctDNA_ever vs. ever_relapsed
-
-library(dplyr)
-library(knitr)
-
-#create ever_relapsed variable 
-subset_data <- subset_data |>
-  mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No"))
-
-
-# Exclude participants with all NA for `ctDNA_ever` or `ever_relapsed`
-summarized_data <- subset_data |>
-  filter(!is.na(ctDNA_ever) | !is.na(ever_relapsed)) |> # Keep rows with at least one non-NA value
-  group_by(participant_id) |>
-  summarize(
-    ctDNA_ever = max(ctDNA_ever, na.rm = TRUE),       
-    ever_relapsed = max(ever_relapsed, na.rm = TRUE)  
-  )
-
-
Warning: There were 2 warnings in `summarize()`.
-The first warning was:
-ℹ In argument: `ever_relapsed = max(ever_relapsed, na.rm = TRUE)`.
-ℹ In group 27: `participant_id = "28115-17-021"`.
-Caused by warning in `max()`:
-! no non-missing arguments, returning NA
-ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
-
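Calling `max()` on the "Yes"/"No" strings works only because "Yes" sorts after "No" alphabetically. It also returns `NA` with a warning when all of a participant's values are missing, as happened for participant 28115-17-021 above. Below is a minimal sketch of a warning-free collapse on logical flags. `any_or_na()` and the toy data are hypothetical. For the study data, the flag would first need converting, for example `ever_relapsed == "Yes"`.

```r
library(dplyr)

# Hypothetical helper: TRUE if any non-missing value is TRUE, FALSE if all
# non-missing values are FALSE, and NA (without a warning) if all are missing
any_or_na <- function(x) {
  if (all(is.na(x))) NA else any(x, na.rm = TRUE)
}

# Toy data for illustration only (not study data)
toy <- tibble(
  participant_id = c("A", "A", "B", "C", "C"),
  relapsed       = c(NA, TRUE, FALSE, NA, NA)
)
per_participant <- toy |>
  group_by(participant_id) |>
  summarise(ever_relapsed = any_or_na(relapsed), .groups = "drop")
per_participant
```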
-
# Create the confusion matrix
-confusion_matrix <- table(summarized_data$ctDNA_ever, summarized_data$ever_relapsed)
-
-# Extract counts from the confusion matrix
-TP <- ifelse(!is.na(confusion_matrix[2, 2]), confusion_matrix[2, 2], 0)  # True Positives
-FP <- ifelse(!is.na(confusion_matrix[2, 1]), confusion_matrix[2, 1], 0)  # False Positives
-TN <- ifelse(!is.na(confusion_matrix[1, 1]), confusion_matrix[1, 1], 0)  # True Negatives
-FN <- ifelse(!is.na(confusion_matrix[1, 2]), confusion_matrix[1, 2], 0)  # False Negatives
-
-# Calculate performance metrics
-sensitivity <- TP / (TP + FN)  # Sensitivity
-specificity <- TN / (TN + FP)  # Specificity
-PPV <- TP / (TP + FP)          # Positive Predictive Value
-NPV <- TN / (TN + FN)          # Negative Predictive Value
-
-# Create a data frame for the table
-performance_table <- data.frame(
-  Metric = c("Sensitivity", "Specificity", "Positive Predictive Value (PPV)", "Negative Predictive Value (NPV)"),
-  Value = c(sensitivity, specificity, PPV, NPV)
-)
-
-# Print the table
-print(performance_table)
-
-
                           Metric     Value
-1                     Sensitivity 0.5714286
-2                     Specificity 0.9892473
-3 Positive Predictive Value (PPV) 0.8888889
-4 Negative Predictive Value (NPV) 0.9387755
-
-
#Format the table for better readability
-kable(performance_table, digits = 2, col.names = c("Metric", "Value"))
-
| Metric                          | Value |
|:--------------------------------|------:|
| Sensitivity                     |  0.57 |
| Specificity                     |  0.99 |
| Positive Predictive Value (PPV) |  0.89 |
| Negative Predictive Value (NPV) |  0.94 |
-
-
-

This ctDNA assay has high specificity (99%) and high positive (89%) and negative (94%) predictive values for relapse, but only moderate sensitivity (57%): 6 of the 14 participants who relapsed never had detectable ctDNA.
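The code above extracts the 2x2 cells by position and wraps them in `ifelse(!is.na(...))`. That guard does not help if a row or column is missing from the table, because indexing such as `confusion_matrix[2, 2]` would fail with "subscript out of bounds". The sketch below is a hypothetical helper, `diagnostic_metrics()`, not the original code. It fixes the factor levels so the table is always 2x2, and it adds exact binomial 95% confidence intervals, which are wide with only 14 relapses. The illustration vectors are built from the counts implied by the metrics above: TP = 8, FP = 1, FN = 6, TN = 92, matching the ctDNA contingency table in the association analysis.

```r
# Hypothetical helper (not the original code): participant-level 2x2
# diagnostic metrics with exact 95% confidence intervals.
# `test` and `outcome` are logical vectors with one entry per participant.
diagnostic_metrics <- function(test, outcome) {
  # Fixed factor levels guarantee a full 2x2 table even if a cell is empty
  tab <- table(test    = factor(test, levels = c(FALSE, TRUE)),
               outcome = factor(outcome, levels = c(FALSE, TRUE)))
  TP <- tab["TRUE", "TRUE"]
  FP <- tab["TRUE", "FALSE"]
  TN <- tab["FALSE", "FALSE"]
  FN <- tab["FALSE", "TRUE"]
  # Numerator and denominator for each metric
  counts <- list(
    Sensitivity = c(TP, TP + FN),
    Specificity = c(TN, TN + FP),
    PPV         = c(TP, TP + FP),
    NPV         = c(TN, TN + FN)
  )
  rows <- lapply(names(counts), function(metric) {
    x <- counts[[metric]][1]
    n <- counts[[metric]][2]
    if (n == 0) {
      return(data.frame(Metric = metric, Estimate = NA_real_,
                        Lower = NA_real_, Upper = NA_real_))
    }
    ci <- binom.test(x, n)$conf.int  # Clopper-Pearson interval
    data.frame(Metric = metric, Estimate = x / n, Lower = ci[1], Upper = ci[2])
  })
  do.call(rbind, rows)
}

# Illustration: TP = 8, FP = 1, FN = 6, TN = 92
test    <- c(rep(TRUE, 9), rep(FALSE, 98))
outcome <- c(rep(TRUE, 8), FALSE, rep(TRUE, 6), rep(FALSE, 92))
diagnostic_metrics(test, outcome)
```

With the study data, the inputs would be `summarized_data$ctDNA_ever` and `summarized_data$ever_relapsed == "Yes"`, or `dtc_ever == 1` for the DTC assay.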

-
-
### Test characteristics for DTC -- and trial #s 
-
-library(dplyr)
-
-# Total unique DTC+ patients
-total_dtc_plus <- subset_data |>
-  filter(dtc_ihc_result_final == 1) |>
-  distinct(participant_id) |>
-  nrow()
-
-# Unique DTC+ patients who went on trial (those who have a trial ID fu_trial_pid)
-dtc_plus_trial <- subset_data |>
-  filter(dtc_ihc_result_final == 1 & !is.na(fu_trial_pid)) |>
-  distinct(participant_id) |>
-  nrow()
-
-# Proportion of DTC+ patients who went on trial
-proportion_trial <- dtc_plus_trial / total_dtc_plus
-
-# Display results
-cat("Total unique DTC+ patients:", total_dtc_plus, "\n")
-
-
Total unique DTC+ patients: 39 
-
-
cat("Unique DTC+ patients who went on trial:", dtc_plus_trial, "\n")
-
-
Unique DTC+ patients who went on trial: 39 
-
-
cat("Proportion of DTC+ patients who went on trial:", proportion_trial, "\n")
-
-
Proportion of DTC+ patients who went on trial: 1 
-
-
# All DTC+ patients went on trial (39/39)
-
-
-# Exclude participants with all NA for `dtc_ever` or `ever_relapsed`
-summarized_data <- subset_data |>
-  filter(!is.na(dtc_ever) | !is.na(ever_relapsed)) |> # Keep rows with at least one non-NA value
-  group_by(participant_id) |>
-  summarize(
-    dtc_ever = max(dtc_ever, na.rm = TRUE),       
-    ever_relapsed = max(ever_relapsed, na.rm = TRUE)  
-  )
-
-
Warning: There were 2 warnings in `summarize()`.
-The first warning was:
-ℹ In argument: `ever_relapsed = max(ever_relapsed, na.rm = TRUE)`.
-ℹ In group 27: `participant_id = "28115-17-021"`.
-Caused by warning in `max()`:
-! no non-missing arguments, returning NA
-ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
-
-
# Create the confusion matrix
-confusion_matrix <- table(summarized_data$dtc_ever, summarized_data$ever_relapsed)
-
-# Extract counts from the confusion matrix
-TP <- ifelse(!is.na(confusion_matrix[2, 2]), confusion_matrix[2, 2], 0)  # True Positives
-FP <- ifelse(!is.na(confusion_matrix[2, 1]), confusion_matrix[2, 1], 0)  # False Positives
-TN <- ifelse(!is.na(confusion_matrix[1, 1]), confusion_matrix[1, 1], 0)  # True Negatives
-FN <- ifelse(!is.na(confusion_matrix[1, 2]), confusion_matrix[1, 2], 0)  # False Negatives
-
-# Calculate performance metrics
-sensitivity <- TP / (TP + FN)  # Sensitivity
-specificity <- TN / (TN + FP)  # Specificity
-PPV <- TP / (TP + FP)          # Positive Predictive Value
-NPV <- TN / (TN + FN)          # Negative Predictive Value
-
-# Create a data frame for the table
-performance_table <- data.frame(
-  Metric = c("Sensitivity", "Specificity", "Positive Predictive Value (PPV)", "Negative Predictive Value (NPV)"),
-  Value = c(sensitivity, specificity, PPV, NPV)
-)
-
-# Print the table
-print(performance_table)
-
-
                           Metric     Value
-1                     Sensitivity 0.2857143
-2                     Specificity 0.6344086
-3 Positive Predictive Value (PPV) 0.1052632
-4 Negative Predictive Value (NPV) 0.8550725
-
-
#Format the table for better readability
-library(knitr)
-kable(performance_table, digits = 2, col.names = c("Metric", "Value"))
-
| Metric                          | Value |
|:--------------------------------|------:|
| Sensitivity                     |  0.29 |
| Specificity                     |  0.63 |
| Positive Predictive Value (PPV) |  0.11 |
| Negative Predictive Value (NPV) |  0.86 |
-
-
-

All 39 individuals who were DTC positive went on to enroll in an interventional treatment trial aimed at eliminating DTCs. This differs from the ctDNA workflow. ctDNA testing was performed retrospectively, sometimes several years after sample collection, and was not used for any trial or treatment decisions.

This makes the sensitivity and specificity of the DTC test difficult to interpret. Relapse is the outcome, yet every DTC-positive patient received an intervention intended to eliminate DTCs and thereby prevent relapse. That intervention likely explains, in part, the low positive predictive value and low sensitivity of the test. By contrast, the negative predictive value of 0.86 reflects outcomes only among participants who remained DTC negative on all testing (i.e., those who did not receive a trial intervention). It suggests that persistently negative DTC testing may be associated with a good outcome (i.e., no relapse during follow-up). However, this value is similar to the overall proportion of participants without relapse (93/107, 0.87), so a larger cohort would be needed to show that a negative DTC result adds predictive value.
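The DTC metrics can be reproduced by hand from the participant-level DTC-by-relapse contingency table in the association analysis. That table has 59 DTC-negative and 34 DTC-positive participants without relapse, and 10 DTC-negative and 4 DTC-positive participants with relapse. TP, FP, TN, and FN follow the same definitions as in the R code. The last line gives the overall relapse-free proportion for comparison with the NPV.

$$
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP+FN} = \frac{4}{4+10} \approx 0.29 \\
\text{Specificity} &= \frac{TN}{TN+FP} = \frac{59}{59+34} \approx 0.63 \\
\text{PPV} &= \frac{TP}{TP+FP} = \frac{4}{4+34} \approx 0.11 \\
\text{NPV} &= \frac{TN}{TN+FN} = \frac{59}{59+10} \approx 0.86 \\
\Pr(\text{no relapse}) &= \frac{59+34}{107} \approx 0.87
\end{aligned}
$$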

-

Associations with Relapse

-
-
## ctDNA association with relapse ## 
-# Collapse to one row per participant
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
-    ctDNA = first(ctDNA_ever),        # Get the ctDNA_ever status for each participant
-    .groups = "drop"
-  )
-
-# Create a contingency table of ever_relapsed vs ctDNA_ever
-contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results. ctDNA has a strong association with relapse (p<0.0001). 
-print(contingency_table)
-
-
     
-      FALSE TRUE
-  No     92    1
-  Yes     6    8
-
-
print(chisq_test)  
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 42.642, df = 1, p-value = 6.573e-11
-
-
#DTC association with relapse## 
-
-# Collapse to one row per participant
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
-    dtc = first(dtc_ever),        # Get the dtc_ever status for each participant
-    .groups = "drop"
-  )
-
-# Create a contingency table of ever_relapsed vs dtc_ever
-contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and chi-squared test results. No significant association with relapse (p = 0.777)
-print(contingency_table)
-
-
     
-       0  1
-  No  59 34
-  Yes 10  4
-
-
print(chisq_test)  
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.079932, df = 1, p-value = 0.7774
-
-
-

Univariable tests of association show that ctDNA positivity is strongly associated with relapse, but DTC positivity is not. It is important to keep in mind that DTC positivity was the basis for enrollment in interventional clinical trials aimed at eliminating DTCs and preventing relapse, and all DTC-positive individuals in this cohort enrolled in such trials. This likely confounds our ability to measure the association of DTC positivity with relapse. ctDNA assessment, by contrast, was performed retrospectively and was not used for clinical decision-making.
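Both chi-squared tests above produced R's warning that the approximation may be incorrect. This happens when some expected cell counts are below 5; for example, only 9 participants were ctDNA positive. Fisher's exact test is the usual alternative for small 2x2 tables. Below is a minimal sketch using the participant-level counts printed above. The resulting p-values are not quoted here. Given these counts, the ctDNA association should remain highly significant and the DTC association should not, but that should be confirmed by running the code.

```r
# Participant-level counts copied from the contingency tables above
# (rows: ever_relapsed No/Yes; columns: biomarker negative/positive)
ctdna_tab <- matrix(c(92, 6, 1, 8), nrow = 2,
                    dimnames = list(relapsed = c("No", "Yes"),
                                    ctDNA = c("FALSE", "TRUE")))
dtc_tab <- matrix(c(59, 10, 34, 4), nrow = 2,
                  dimnames = list(relapsed = c("No", "Yes"),
                                  DTC = c("0", "1")))

# Fisher's exact test does not rely on the large-sample chi-squared approximation
ctdna_fisher <- fisher.test(ctdna_tab)
dtc_fisher   <- fisher.test(dtc_tab)
ctdna_fisher$p.value
dtc_fisher$p.value
```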

-

Demographics and Clinical Factor Assessment: Univariable associations by ctDNA status

-

Next, we begin building Table 1, which summarizes key clinical and demographic variables in the ctDNA cohort. For each variable, we first run a univariable test of association with ctDNA status: chi-squared tests for categorical variables and t-tests for continuous variables.
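Below is a minimal sketch of the test-selection rule described above, written as a hypothetical helper `univariable_p()`. Two choices are assumptions rather than the author's stated method. The first is R's default Welch t-test, which allows unequal variances. The second is a fallback to Fisher's exact test when any expected cell count is below 5. Numeric codes, such as 0/1 indicators or tumor grade, would be treated as continuous, so they should be converted with `factor()` first.

```r
# Hypothetical helper: univariable p-value comparing `x` across a two-level `group`
univariable_p <- function(x, group) {
  keep  <- !is.na(x) & !is.na(group)
  x     <- x[keep]
  group <- factor(group[keep])
  if (is.numeric(x)) {
    # Continuous variable: Welch two-sample t-test (R's default)
    return(t.test(x ~ group)$p.value)
  }
  # Categorical variable: chi-squared test, or Fisher's exact test
  # when any expected cell count is below 5
  tab <- table(x, group)
  expected <- suppressWarnings(chisq.test(tab)$expected)
  if (any(expected < 5)) fisher.test(tab)$p.value else chisq.test(tab)$p.value
}

# Toy data for illustration only (not study data)
grp   <- rep(c("ctDNA-", "ctDNA+"), each = 4)
p_num <- univariable_p(c(1, 2, 3, 4, 10, 11, 12, 13), grp)
p_cat <- univariable_p(c("a", "a", "b", "b", "a", "b", "a", "b"), grp)
```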

-
-
library(dplyr)
-
-########### Variables to look at for Table 1 #########
-names(subset_data) #to identify the variables I want to use 
-
-
  [1] "ID"                               "trialID"                         
-  [3] "participant_id"                   "patient_id"                      
-  [5] "fu_trial_pid"                     "timepoint"                       
-  [7] "project"                          "surmount_id"                     
-  [9] "panel_id"                         "accession"                       
- [11] "sample_id"                        "collection_date"                 
- [13] "extracted_plasma_volume_ml"       "input"                           
- [15] "input_sample"                     "physical_run_name"               
- [17] "workflow_name"                    "eVAF"                            
- [19] "mutant_molecules"                 "mean_VAF"                        
- [21] "Score"                            "all_pass_variants"               
- [23] "total_variants"                   "n_positive_variants"             
- [25] "ctDNA_detected"                   "ctdna_cohort"                    
- [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
- [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
- [31] "dtc_final_result_date"            "pt"                              
- [33] "bma_date"                         "ORIG_RSLT_DTC"                   
- [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
- [37] "FINAL_COUNT"                      "org_consent_date"                
- [39] "demo_initials"                    "demo_dob"                        
- [41] "demo_sex"                         "demo_ethnicity"                  
- [43] "demo_race___1"                    "demo_race___2"                   
- [45] "demo_race___3"                    "demo_race___4"                   
- [47] "demo_race___5"                    "demo_race___88"                  
- [49] "demo_race___99"                   "demo_race_other"                 
- [51] "prtx_radiation"                   "prtx_rad_start"                  
- [53] "prtx_rad_end"                     "prtx_chemo"                      
- [55] "prtx_endo"                        "prtx_bonemod"                    
- [57] "prior_therapy_complete"           "inc_dx_crit"                     
- [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
- [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
- [63] "final_receptor_group"             "demo_race_final"                 
- [65] "final_histology"                  "final_tumor_grade"               
- [67] "final_overall_stage"              "final_t_stage"                   
- [69] "final_n_stage"                    "fu_date_to"                      
- [71] "fu_surv"                          "fu_date_death"                   
- [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
- [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
- [77] "fu_locreg_prog"                   "fu_locreg_date"                  
- [79] "fu_dist_site_num"                 "fu_dist_site_char"               
- [81] "fu_dist_prog"                     "fu_dist_date"                    
- [83] "censor_date"                      "chemo_indication_1"              
- [85] "chemo_name_1"                     "chemo_name_other_1"              
- [87] "chemo_start_date_1"               "start_date_exact_1"              
- [89] "chemo_end_date_1"                 "end_date_exact_1"                
- [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
- [93] "chemo_indication_2"               "chemo_name_2"                    
- [95] "chemo_name_other_2"               "chemo_start_date_2"              
- [97] "start_date_exact_2"               "chemo_end_date_2"                
- [99] "end_date_exact_2"                 "chemo_notes_2"                   
-[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
-[103] "chemo_name_3"                     "chemo_name_other_3"              
-[105] "chemo_start_date_3"               "start_date_exact_3"              
-[107] "chemo_end_date_3"                 "end_date_exact_3"                
-[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
-[111] "chemo_indication_4"               "chemo_name_4"                    
-[113] "chemo_name_other_4"               "chemo_start_date_4"              
-[115] "start_date_exact_4"               "chemo_end_date_4"                
-[117] "end_date_exact_4"                 "chemo_notes_4"                   
-[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
-[121] "hormone_name___1_1"               "hormone_name___2_1"              
-[123] "hormone_name___3_1"               "hormone_name___4_1"              
-[125] "hormone_name___5_1"               "hormone_name___6_1"              
-[127] "hormone_name___7_1"               "hormone_other_1"                 
-[129] "hormone_start_date_1"             "hormone_ongoing_1"               
-[131] "hormone_end_date_1"               "hormone_notes_1"                 
-[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
-[135] "hormone_name___1_2"               "hormone_name___2_2"              
-[137] "hormone_name___3_2"               "hormone_name___4_2"              
-[139] "hormone_name___5_2"               "hormone_name___6_2"              
-[141] "hormone_name___7_2"               "hormone_other_2"                 
-[143] "hormone_start_date_2"             "hormone_ongoing_2"               
-[145] "hormone_end_date_2"               "hormone_notes_2"                 
-[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
-[149] "hormone_name___1_3"               "hormone_name___2_3"              
-[151] "hormone_name___3_3"               "hormone_name___4_3"              
-[153] "hormone_name___5_3"               "hormone_name___6_3"              
-[155] "hormone_name___7_3"               "hormone_other_3"                 
-[157] "hormone_start_date_3"             "hormone_ongoing_3"               
-[159] "hormone_end_date_3"               "hormone_notes_3"                 
-[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
-[163] "hormone_name___1_4"               "hormone_name___2_4"              
-[165] "hormone_name___3_4"               "hormone_name___4_4"              
-[167] "hormone_name___5_4"               "hormone_name___6_4"              
-[169] "hormone_name___7_4"               "hormone_other_4"                 
-[171] "hormone_start_date_4"             "hormone_ongoing_4"               
-[173] "hormone_end_date_4"               "hormone_notes_4"                 
-[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
-[177] "hormone_name___1_5"               "hormone_name___2_5"              
-[179] "hormone_name___3_5"               "hormone_name___4_5"              
-[181] "hormone_name___5_5"               "hormone_name___6_5"              
-[183] "hormone_name___7_5"               "hormone_other_5"                 
-[185] "hormone_start_date_5"             "hormone_ongoing_5"               
-[187] "hormone_end_date_5"               "hormone_notes_5"                 
-[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
-[191] "hormone_name___1_6"               "hormone_name___2_6"              
-[193] "hormone_name___3_6"               "hormone_name___4_6"              
-[195] "hormone_name___5_6"               "hormone_name___6_6"              
-[197] "hormone_name___7_6"               "hormone_other_6"                 
-[199] "hormone_start_date_6"             "hormone_ongoing_6"               
-[201] "hormone_end_date_6"               "hormone_notes_6"                 
-[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
-[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
-[207] "bonemod_name_1"                   "bonemod_start_date_1"            
-[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
-[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
-[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
-[215] "bonemod_name_2"                   "bonemod_start_date_2"            
-[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
-[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
-[221] "diag_lateral_1"                   "diag_menopause_1"                
-[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
-[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
-[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
-[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
-[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
-[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
-[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
-[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
-[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
-[241] "diag_er_status_1"                 "diag_er_percent_1"               
-[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
-[245] "diag_her2_status_1"               "diag_her2_method_1"              
-[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
-[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
-[251] "diag_pcr_1"                       "diag_surgery_date_1"             
-[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
-[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
-[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
-[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
-[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
-[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
-[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
-[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
-[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
-[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
-[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
-[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
-[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
-[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
-[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
-[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
-[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
-[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
-[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
-[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
-[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
-[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
-[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
-[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
-[301] "mammoprint_1"                     "mammoprint_result_1"             
-[303] "diag_notes_1"                     "diag_date_2"                     
-[305] "diag_lateral_2"                   "diag_menopause_2"                
-[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
-[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
-[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
-[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
-[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
-[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
-[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
-[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
-[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
-[325] "diag_er_status_2"                 "diag_er_percent_2"               
-[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
-[329] "diag_her2_status_2"               "diag_her2_method_2"              
-[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
-[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
-[335] "diag_pcr_2"                       "diag_surgery_date_2"             
-[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
-[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
-[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
-[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
-[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
-[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
-[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
-[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
-[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
-[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
-[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
-[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
-[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
-[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
-[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
-[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
-[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
-[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
-[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
-[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
-[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
-[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
-[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
-[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
-[385] "mammoprint_2"                     "mammoprint_result_2"             
-[387] "diag_notes_2"                     "ctDNA_ever"                      
-[389] "dtc_ever"                         "ever_relapsed"                   
-
-
###### Median age at diagnosis: this first requires some variable manipulation, because the date variables are stored as character strings rather than as dates 
-str(subset_data$diag_date_1) #character; needs to be converted to Date 
-
-
 chr [1:398] "08/15/2013" "08/15/2013" "08/15/2013" "08/15/2013" ...
-
-
str(subset_data$demo_dob) #character; needs to be converted to Date 
-
-
 chr [1:398] "09/21/1957" "09/21/1957" "09/21/1957" "09/21/1957" ...
-
-
d$diag_date_1 <- as.Date(d$diag_date_1, format = "%m/%d/%Y")  
-d$demo_dob <- as.Date(d$demo_dob, format = "%m/%d/%Y")  
-
-str(d$diag_date_1) #dates! 
-
-
 Date[1:579], format: "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" ...
-
-
str(d$demo_dob) #dates! 
-
-
 Date[1:579], format: "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" ...
-
-
### Repeat the conversion for subset_data, because converting the columns in d did not carry over to subset_data 
-subset_data$diag_date_1 <- as.Date(subset_data$diag_date_1, format = "%m/%d/%Y")  
-subset_data$demo_dob <- as.Date(subset_data$demo_dob, format = "%m/%d/%Y")  
-
-# Age at diagnosis in years: days from date of birth to diagnosis date, divided by 365.25 
-subset_data$age_at_diag <- as.numeric(difftime(subset_data$diag_date_1, subset_data$demo_dob, units = "days")) / 365.25
-head(subset_data$age_at_diag)
-
-
[1] 55.89870 55.89870 55.89870 55.89870 55.89870 49.25667
-
-
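The age calculation above can be wrapped in a small helper so that the same date format and 365.25-day year are applied everywhere. This is a minimal sketch: `age_in_years` is a hypothetical name, and the example dates are the first participant's values from the `str()` output above.

```r
# Hypothetical helper: age in years between two "%m/%d/%Y" date strings,
# using the same difftime() / 365.25 convention as the analysis above
age_in_years <- function(dob_chr, event_chr, fmt = "%m/%d/%Y") {
  dob   <- as.Date(dob_chr, format = fmt)
  event <- as.Date(event_chr, format = fmt)
  # Strings that do not match fmt become NA rather than raising an error
  as.numeric(difftime(event, dob, units = "days")) / 365.25
}

# First participant: born 09/21/1957, diagnosed 08/15/2013
age_example <- age_in_years("09/21/1957", "08/15/2013")
round(age_example, 4)  # 55.8987, matching head(subset_data$age_at_diag)
```

Because `as.Date()` returns `NA` for strings that do not match the format, checking `sum(is.na(...))` after the conversion is a cheap way to catch dates entered in another format.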
summary(subset_data$age_at_diag) # median 48.75 (row-level: participants with several rows are counted more than once) 
-
-
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-  27.34   41.73   48.75   49.35   57.63   68.94 
-
-
age_summary <- subset_data |> 
-  group_by(ctDNA_ever) |> 
-  summarise(
-    mean_age = mean(age_at_diag, na.rm = TRUE),  # Calculate mean age
-    median_age = median(age_at_diag, na.rm = TRUE),  # Calculate median age
-    sd_age = sd(age_at_diag, na.rm = TRUE),  # Calculate standard deviation of age
-    n = n()  # Number of rows (visits) in each group, not unique participants
-  )
-
-print(age_summary)
-
-
# A tibble: 2 × 5
-  ctDNA_ever mean_age median_age sd_age     n
-  <lgl>         <dbl>      <dbl>  <dbl> <int>
-1 FALSE          49.1       48.5   9.77   372
-2 TRUE           53.3       50.4   7.64    26
-
-
# Wilcoxon rank-sum test comparing age at diagnosis between ctDNA_ever-positive and -negative groups (rows are visits, so participants with repeated rows contribute more than once)
-wilcox_test_result <- wilcox.test(age_at_diag ~ ctDNA_ever, data = subset_data)
-
-# Print the result
-print(wilcox_test_result)
-
-

-    Wilcoxon rank sum test with continuity correction
-
-data:  age_at_diag by ctDNA_ever
-W = 3499, p-value = 0.01842
-alternative hypothesis: true location shift is not equal to 0
-
-
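The `n` column above counts rows rather than participants (398 rows for 109 participants), so the Wilcoxon test treats repeated rows from the same person as independent observations. One option is to keep a single row per participant before testing. The sketch below uses made-up toy data and base R's `duplicated()`, and assumes age and ctDNA status are constant within a participant.

```r
# Toy data (hypothetical): repeated visit rows for three participants
visits <- data.frame(
  participant_id = c("A", "A", "A", "B", "C", "C"),
  age_at_diag    = c(55.9, 55.9, 55.9, 49.3, 62.1, 62.1),
  ctDNA_ever     = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
)

# Keep the first row for each participant_id
per_participant <- visits[!duplicated(visits$participant_id), ]

nrow(visits)                       # 6 rows
nrow(per_participant)              # 3 participants
table(per_participant$ctDNA_ever)  # FALSE: 2, TRUE: 1
```

Applied to the study data, the test would become something like `wilcox.test(age_at_diag ~ ctDNA_ever, data = subset_data[!duplicated(subset_data$participant_id), ])`; that result has not been computed here.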
# Age range in the ctDNA-positive vs ctDNA-negative groups 
-age_summary <- subset_data |> 
-  group_by(ctDNA_ever) |> 
-  summarise(
-    min_age = min(age_at_diag, na.rm = TRUE),  # Minimum age
-    max_age = max(age_at_diag, na.rm = TRUE),  # Maximum age
-    .groups = "drop"
-  )
-
-# View the summary table for age 
-print(age_summary)
-
-
# A tibble: 2 × 3
-  ctDNA_ever min_age max_age
-  <lgl>        <dbl>   <dbl>
-1 FALSE         27.3    68.9
-2 TRUE          38.6    64.4
-
-
-
-
##### Race: demo_race_final
-
-# Get the count of unique participant_ids for each category in demo_race_final
-race_counts_unique_percent <- subset_data |>
-  group_by(demo_race_final) |>
-  summarise(unique_participants = n_distinct(participant_id)) |>
-  mutate(percent = unique_participants / sum(unique_participants) * 100)
-
-# View the result
-print(race_counts_unique_percent)
-
-
# A tibble: 3 × 3
-  demo_race_final unique_participants percent
-            <int>               <int>   <dbl>
-1               1                   9   8.26 
-2               3                   1   0.917
-3               5                  99  90.8  
-
-
# Count distinct participant_ids by ctDNA_ever and demo_race_final
-count_distinct_participants <- subset_data |>
-  group_by(demo_race_final, ctDNA_ever) |>
-  summarise(distinct_participant_count = n_distinct(participant_id), .groups = "drop")
-
-# Print the result
-count_distinct_participants
-
-
# A tibble: 5 × 3
-  demo_race_final ctDNA_ever distinct_participant_count
-            <int> <lgl>                           <int>
-1               1 FALSE                               8
-2               1 TRUE                                1
-3               3 FALSE                               1
-4               5 FALSE                              91
-5               5 TRUE                                8
-
-
# Step 1: Summarize by unique participant_id
-summarized_data <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    ctDNA_ever = first(ctDNA_ever),   # Taking the first observed value of ctDNA_ever for each participant
-    demo_race_final = first(demo_race_final),  # Taking the first observed value of demo_race_final for each participant
-    .groups = "drop"
-  )
-
-# Step 2: Create the contingency table
-contingency_table <- table(summarized_data$ctDNA_ever, summarized_data$demo_race_final)
-contingency_table
-
-
       
-         1  3  5
-  FALSE  8  1 91
-  TRUE   1  0  8
-
-
# Step 3: Perform the chi-squared test of independence
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Step 4: Print the result (p = 0.91) 
-chisq_test
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table
-X-squared = 0.19084, df = 2, p-value = 0.909
-
-
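The "Chi-squared approximation may be incorrect" warning appears because several expected cell counts are below 5 (only 9 participants are ctDNA-positive). Fisher's exact test is a common alternative for sparse tables. This sketch re-enters the participant-level counts from the contingency table above; no p-value is quoted here.

```r
# Race x ctDNA_ever counts re-entered from the participant-level table above
race_tab <- matrix(
  c(8, 1, 91,
    1, 0,  8),
  nrow = 2, byrow = TRUE,
  dimnames = list(ctDNA_ever = c("FALSE", "TRUE"),
                  demo_race_final = c("1", "3", "5"))
)

# Expected counts under independence; values below 5 trigger the warning
expected <- suppressWarnings(chisq.test(race_tab))$expected
round(expected, 2)

# Fisher's exact test does not rely on the large-sample chi-squared approximation
race_fisher <- fisher.test(race_tab)
race_fisher$p.value
```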
#####receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') 
-
-# Breakdown of final_receptor_group by unique participant_id
-receptor_status_by_participant <- subset_data |>
-  group_by(participant_id) |>
-  summarise(final_receptor_group = first(final_receptor_group),  # Or choose the most frequent group if needed
-            .groups = "drop")
-
-# View the result
-table(receptor_status_by_participant$final_receptor_group)
-
-

- 1  2  3  4 
-45 52  8  4 
-
-
# Summarizing data by participant_id, final_receptor_group, and ctDNA_ever
-receptor_ctDNA_status <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    final_receptor_group = first(final_receptor_group),  # Or the most frequent if needed
-    ctDNA_ever = first(ctDNA_ever),  # Taking the first observed value for ctDNA_ever
-    .groups = "drop"
-  )
-
-# Step 2: Create the contingency table
-contingency_table_receptor <- table(receptor_ctDNA_status$final_receptor_group, receptor_ctDNA_status$ctDNA_ever)
-contingency_table_receptor
-
-
   
-    FALSE TRUE
-  1    44    1
-  2    45    7
-  3     8    0
-  4     3    1
-
-
# Step 3: Perform the chi-squared test of independence
-chisq_test <- chisq.test(contingency_table_receptor)
-
-
Warning in chisq.test(contingency_table_receptor): Chi-squared approximation
-may be incorrect
-
-
# Step 4: Print the result # p-value 0.10
-chisq_test
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table_receptor
-X-squared = 6.2231, df = 3, p-value = 0.1012
-
-
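Another option for a sparse 4-by-2 table like this one is a Monte Carlo p-value, which `chisq.test()` computes when `simulate.p.value = TRUE`. A minimal sketch with the counts copied from the table above; the seed and `B = 10000` are arbitrary choices, and the simulated p-value will shift slightly if they change.

```r
# final_receptor_group x ctDNA_ever counts copied from the table above
receptor_tab <- matrix(
  c(44, 1,
    45, 7,
     8, 0,
     3, 1),
  ncol = 2, byrow = TRUE,
  dimnames = list(final_receptor_group = c("1", "2", "3", "4"),
                  ctDNA_ever = c("FALSE", "TRUE"))
)

set.seed(2024)  # arbitrary seed so the simulation is reproducible
sim <- chisq.test(receptor_tab, simulate.p.value = TRUE, B = 10000)
sim$p.value     # (1 + number of simulated statistics >= observed) / (B + 1)
```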
# Is ctDNA status associated with TNBC vs non-TNBC (or with HR+ vs non-HR+)?
-# Inclusion criterion inc_dx_crit_list___1 = TNBC (confirmed with the study team); this is the variable used below
-
-TNBC_ctDNA_status <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    inc_dx_crit_list___1 = first(inc_dx_crit_list___1),  # Or the most frequent if needed
-    ctDNA_ever = first(ctDNA_ever),  # Taking the first observed value for ctDNA_ever
-    .groups = "drop"
-  )
-
-# Step 2: Create the contingency table
-contingency_table_TNBC <- table(TNBC_ctDNA_status$inc_dx_crit_list___1, TNBC_ctDNA_status$ctDNA_ever)
-contingency_table_TNBC
-
-
   
-    FALSE TRUE
-  0    56    8
-  1    44    1
-
-
# Step 3: Perform the chi-squared test of independence
-chisq_test <- chisq.test(contingency_table_TNBC)
-
-
Warning in chisq.test(contingency_table_TNBC): Chi-squared approximation may be
-incorrect
-
-
# Step 4: p-val is 0.12 
-chisq_test
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table_TNBC
-X-squared = 2.4526, df = 1, p-value = 0.1173
-
-
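For a 2-by-2 table, `fisher.test()` also reports a conditional maximum-likelihood odds ratio with a 95% confidence interval, which conveys the direction and size of the association and not only a p-value. A sketch using the TNBC counts above (rows: `inc_dx_crit_list___1` = 0 or 1; columns: `ctDNA_ever`):

```r
# inc_dx_crit_list___1 x ctDNA_ever counts copied from the table above
tnbc_tab <- matrix(
  c(56, 8,
    44, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(TNBC = c("0", "1"), ctDNA_ever = c("FALSE", "TRUE"))
)

tnbc_fisher <- fisher.test(tnbc_tab)
# Odds of ctDNA positivity in TNBC (row "1") relative to non-TNBC (row "0")
tnbc_fisher$estimate
tnbc_fisher$conf.int
tnbc_fisher$p.value
```

The sample odds ratio is (56 × 1) / (8 × 44), about 0.16; the conditional MLE reported by `fisher.test()` is computed differently and need not match it exactly.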
### HR positive vs HR negative (Hormone receptor positive vs hormone receptor negative)
-# First, create an HR status variable (HR_status): receptor groups 2 and 3 are HR+, groups 1 and 4 are non-HR+
-subset_data <- subset_data |> 
-  mutate(HR_status = case_when(
-    final_receptor_group %in% c(2, 3) ~ "HR+",
-    final_receptor_group %in% c(1, 4) ~ "Non-HR+",
-    TRUE ~ NA_character_  # In case there are missing or other unexpected values
-  ))
-
-# View the new HR_status variable
-table(subset_data$HR_status)
-
-

-    HR+ Non-HR+ 
-    225     173 
-
-
HR_status_by_participant <- subset_data |>
-  group_by(participant_id) |>
-  summarise(HR_status = first(HR_status),  # assumes HR_status is constant within a participant (base R mode() returns the storage type, not the most frequent value)
-            .groups = "drop")
-
-# View the result 
-table(HR_status_by_participant$HR_status) # consistent with final_receptor_group: 60 HR+ (52 + 8) and 49 non-HR+ (45 + 4)
-
-

-    HR+ Non-HR+ 
-     60      49 
-
-
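Taking `first()` per participant assumes the value is the same on every row for that participant. A quick check for participants with more than one distinct value could look like the sketch below; `inconsistent_ids()` is a hypothetical helper and the data are toy values.

```r
# Hypothetical helper: IDs whose non-missing values are not all identical
inconsistent_ids <- function(ids, values) {
  n_distinct_vals <- tapply(values, ids, function(v) length(unique(v[!is.na(v)])))
  names(n_distinct_vals)[n_distinct_vals > 1]
}

# Toy data: participant "B" has conflicting HR_status values
toy <- data.frame(
  participant_id = c("A", "A", "B", "B", "C"),
  HR_status      = c("HR+", "HR+", "HR+", "Non-HR+", "Non-HR+")
)
inconsistent_ids(toy$participant_id, toy$HR_status)  # "B"
```

For the study data this would be called as `inconsistent_ids(subset_data$participant_id, subset_data$HR_status)`; an empty result means `first()` picks the same value any other row would.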
# Summarize ctDNA_ever status by HR_status, one row per unique participant_id
-summary_data <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    HR_status = first(HR_status),  # Get the HR_status for the participant
-    ctDNA_status = first(ctDNA_ever),  # Get the ctDNA_ever status for the participant
-    .groups = "drop"
-  )
-
-contingency_table_HR <- table(summary_data$ctDNA_status, summary_data$HR_status)
-contingency_table_HR
-
-
       
-        HR+ Non-HR+
-  FALSE  53      47
-  TRUE    7       2
-
-
chisq_test <- chisq.test(contingency_table_HR)
-
-
Warning in chisq.test(contingency_table_HR): Chi-squared approximation may be
-incorrect
-
-
# Print chi-squared test results #0.28 
-chisq_test
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table_HR
-X-squared = 1.1696, df = 1, p-value = 0.2795
-
-
###tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not reported  
-# Exclude rows where final_tumor_grade is 3 (not reported), i.e. the 2 participants whose grade was not reported 
-summary_data <- subset_data |>
-  filter(final_tumor_grade != 3) |>  # Exclude grade == 3
-  group_by(participant_id) |>
-  summarise(
-    grade = first(final_tumor_grade),  # Get the final_tumor_grade for each participant
-    ctDNA_ever = first(ctDNA_ever),    # Get the ctDNA_ever status for each participant
-    .groups = "drop"
-  )
-
-# Create a contingency table of grade vs ctDNA_ever
-contingency_table <- table(summary_data$grade, summary_data$ctDNA_ever)
-
-# View the contingency table
-print(contingency_table)
-
-
   
-    FALSE TRUE
-  0    75    4
-  1    17    5
-  2     6    0
-
-
# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# View the Chi-squared test result -- p-value 0.0229 
-print(chisq_test)
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table
-X-squared = 7.5533, df = 2, p-value = 0.0229
-
-
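Because the numeric codes do not follow grade order (0 = grade 3), relabelling them as an ordered factor can make tables easier to read. A minimal sketch based on the coding note above, in which code 3 (not reported) becomes `NA`. Note also that `filter(final_tumor_grade != 3)` drops rows where the grade is `NA`, since `filter()` keeps only rows where the condition is `TRUE`.

```r
# Mapping from the note above: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = not reported
label_grade <- function(code) {
  factor(code,
         levels = c(1, 2, 0),  # order levels by clinical grade
         labels = c("Grade 1", "Grade 2", "Grade 3"),
         ordered = TRUE)       # codes not listed in levels (here 3) become NA
}

label_grade(c(0, 1, 2, 3))
```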
######histology (final histology)
-#people have different combinations of histology (1-15)
-table(subset_data$participant_id, subset_data$final_histology)
-
-
              
-                1 1,13,14,3 1,3 11,3 12,3 13,3 13,3,5 14 14,15 14,15,3 14,3
-  28115-16-001  0         0   0    0    0    0      0  0     0       0    5
-  28115-16-004  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-010  0         0   0    0    1    0      0  0     0       0    0
-  28115-16-014  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-015  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-017  0         0   0    0    0    0      0  3     0       0    0
-  28115-16-020  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-021  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-023  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-025  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-026  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-027  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-029  0         0   0    0    0    0      0  2     0       0    0
-  28115-16-033  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-035  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-001  0         0   8    0    0    0      0  0     0       0    0
-  28115-17-002  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-006  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-008  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-009  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-010  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-011  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-012  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-016  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-017  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-019  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-021  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-022  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-023  0         0   2    0    0    0      0  0     0       0    0
-  28115-17-024  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-025  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-027  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-030  0         0   3    0    0    0      0  0     0       0    0
-  28115-17-031  0         0   0    0    0    0      0  5     0       0    0
-  28115-17-032  0         0   0   10    0    0      0  0     0       0    0
-  28115-17-036  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-039  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-040  0         0   4    0    0    0      0  0     0       0    0
-  28115-17-045  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-046  0         0   0    0    0   10      0  0     0       0    0
-  28115-17-047  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-048  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-050  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-051  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-052  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-001  0         0   0    0    0    0      0  0     7       0    0
-  28115-18-002  0         0   2    0    0    0      0  0     0       0    0
-  28115-18-004  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-006  0         0   0    0    0    0      0  1     0       0    0
-  28115-18-009  0         0   0    0    0    0      0  1     0       0    0
-  28115-18-011  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-014  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-015  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-017  0         0   0    0    0    0      0  1     0       0    0
-  28115-18-020  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-021  0         0   0    0    0    0      0  8     0       0    0
-  28115-18-022  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-023  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-024  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-027  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-028  1         0   0    0    0    0      0  0     0       0    0
-  28115-18-029  0         0   0    0    0    0      0  4     0       0    0
-  28115-18-030  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-031  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-032  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-034  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-001  0         0   0    0    0    0      0  1     0       0    0
-  28115-19-002  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-003  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-004  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-005  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-006  0         0   0    0    0    8      0  0     0       0    0
-  28115-19-007  0         0   0    0    0    0      0  0     0       0    5
-  28115-19-009  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-011  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-012  0         3   0    0    0    0      0  0     0       0    0
-  28115-19-014  0         0   0    0    0    0      0  0     0       0    2
-  28115-19-016  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-017  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-019  0         0   0    0    0    0      0  0     0       0    3
-  28115-19-020  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-021  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-022  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-025  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-028  0         0   0    0    0    0      0  0     0       0    0
-  28115-20-004  0         0   0    0    0    0      0  0     0       0    0
-  28115-20-007  0         0   0    0    0    0      0  0     0       0    0
-  28115-20-009  0         0   0    0    0    0      0  0     0       0    0
-  28115-20-010  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-001  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-002  0         0   0    0    0    0      0  0     0       0    4
-  28115-21-003  0         0   0    0    0    0      0  2     0       0    0
-  28115-21-006  0         0   0    0    0    2      0  0     0       0    0
-  28115-21-007  0         0   0    0    0    0      0  3     0       0    0
-  28115-21-009  0         0   0    0    0    0      3  0     0       0    0
-  28115-21-011  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-013  0         0   0    0    0    0      0  0     0       0    4
-  28115-21-014  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-015  0         0   0    0    0    0      0  0     0       2    0
-  28115-21-016  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-019  0         0   0    0    0    0      0  1     0       0    0
-  28115-21-020  0         0   3    0    0    0      0  0     0       0    0
-  28115-21-021  0         0   0    0    0    0      0  3     0       0    0
-  28115-21-022  0         0   0    0    0    1      0  0     0       0    0
-  28115-21-024  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-025  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-026  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-027  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-028  0         0   0    0    0    0      0  0     0       0    1
-              
-               16,3  3 3,5 3,7  5
-  28115-16-001    0  0   0   0  0
-  28115-16-004    0  1   0   0  0
-  28115-16-010    0  0   0   0  0
-  28115-16-014    0  1   0   0  0
-  28115-16-015    0 12   0   0  0
-  28115-16-017    0  0   0   0  0
-  28115-16-020    0  0   1   0  0
-  28115-16-021    0  9   0   0  0
-  28115-16-023    0  1   0   0  0
-  28115-16-025    0  1   0   0  0
-  28115-16-026    0 10   0   0  0
-  28115-16-027    0  3   0   0  0
-  28115-16-029    0  0   0   0  0
-  28115-16-033    0  2   0   0  0
-  28115-16-035    0  1   0   0  0
-  28115-17-001    0  0   0   0  0
-  28115-17-002    0  9   0   0  0
-  28115-17-006    0  1   0   0  0
-  28115-17-008    0  9   0   0  0
-  28115-17-009    0  1   0   0  0
-  28115-17-010    0  5   0   0  0
-  28115-17-011    0  9   0   0  0
-  28115-17-012    0 10   0   0  0
-  28115-17-016    0  4   0   0  0
-  28115-17-017    0  5   0   0  0
-  28115-17-019    0  9   0   0  0
-  28115-17-021    0  1   0   0  0
-  28115-17-022    0  1   0   0  0
-  28115-17-023    0  0   0   0  0
-  28115-17-024    0  0   0   4  0
-  28115-17-025    0  2   0   0  0
-  28115-17-027    0  8   0   0  0
-  28115-17-030    0  0   0   0  0
-  28115-17-031    0  0   0   0  0
-  28115-17-032    0  0   0   0  0
-  28115-17-036    0  7   0   0  0
-  28115-17-039    0  2   0   0  0
-  28115-17-040    0  0   0   0  0
-  28115-17-045    0  0   1   0  0
-  28115-17-046    0  0   0   0  0
-  28115-17-047    0  3   0   0  0
-  28115-17-048    0  2   0   0  0
-  28115-17-050    0  3   0   0  0
-  28115-17-051    0  9   0   0  0
-  28115-17-052    0  0   0   0  3
-  28115-18-001    0  0   0   0  0
-  28115-18-002    0  0   0   0  0
-  28115-18-004    0  2   0   0  0
-  28115-18-006    0  0   0   0  0
-  28115-18-009    0  0   0   0  0
-  28115-18-011    0  5   0   0  0
-  28115-18-014    0  2   0   0  0
-  28115-18-015    0  5   0   0  0
-  28115-18-017    0  0   0   0  0
-  28115-18-020    0  8   0   0  0
-  28115-18-021    0  0   0   0  0
-  28115-18-022    0  0   0  12  0
-  28115-18-023    0  3   0   0  0
-  28115-18-024    0  0   0   2  0
-  28115-18-027    0  1   0   0  0
-  28115-18-028    0  0   0   0  0
-  28115-18-029    0  0   0   0  0
-  28115-18-030    0  2   0   0  0
-  28115-18-031    0  3   0   0  0
-  28115-18-032    0  6   0   0  0
-  28115-18-034    0  1   0   0  0
-  28115-19-001    0  0   0   0  0
-  28115-19-002    0  2   0   0  0
-  28115-19-003    0  5   0   0  0
-  28115-19-004    0  1   0   0  0
-  28115-19-005    0  3   0   0  0
-  28115-19-006    0  0   0   0  0
-  28115-19-007    0  0   0   0  0
-  28115-19-009    0  6   0   0  0
-  28115-19-011    0  1   0   0  0
-  28115-19-012    0  0   0   0  0
-  28115-19-014    0  0   0   0  0
-  28115-19-016    0  2   0   0  0
-  28115-19-017    0  2   0   0  0
-  28115-19-019    0  0   0   0  0
-  28115-19-020    0  2   0   0  0
-  28115-19-021    0  4   0   0  0
-  28115-19-022    0  2   0   0  0
-  28115-19-025    0  6   0   0  0
-  28115-19-028    0  2   0   0  0
-  28115-20-004    0  2   0   0  0
-  28115-20-007    0  2   0   0  0
-  28115-20-009    0  4   0   0  0
-  28115-20-010    0  1   0   0  0
-  28115-21-001    0  1   0   0  0
-  28115-21-002    0  0   0   0  0
-  28115-21-003    0  0   0   0  0
-  28115-21-006    0  0   0   0  0
-  28115-21-007    0  0   0   0  0
-  28115-21-009    0  0   0   0  0
-  28115-21-011    0  1   0   0  0
-  28115-21-013    0  0   0   0  0
-  28115-21-014    0  2   0   0  0
-  28115-21-015    0  0   0   0  0
-  28115-21-016    0  8   0   0  0
-  28115-21-019    0  0   0   0  0
-  28115-21-020    0  0   0   0  0
-  28115-21-021    0  0   0   0  0
-  28115-21-022    0  0   0   0  0
-  28115-21-024    0  2   0   0  0
-  28115-21-025    0  2   0   0  0
-  28115-21-026    2  0   0   0  0
-  28115-21-027    0  2   0   0  0
-  28115-21-028    0  0   0   0  0
-
-
  histology_summary <- subset_data |>
-    distinct(participant_id, final_histology) |>  # Get unique participant-histology combinations
-    group_by(final_histology) |>  # Group by histology type
-    summarise(count = n())  # Count the number of participants per histology type
-  
-  # View the summary table
-  print(histology_summary) # the combinations add up to 109 participants; 9 have both ductal (3) and lobular (14) codes
-
-
# A tibble: 16 × 2
-   final_histology count
-   <chr>           <int>
- 1 1                   1
- 2 1,13,14,3           1
- 3 1,3                 6
- 4 11,3                1
- 5 12,3                1
- 6 13,3                4
- 7 13,3,5              1
- 8 14                 13
- 9 14,15               1
-10 14,15,3             1
-11 14,3                7
-12 16,3                1
-13 3                  65
-14 3,5                 2
-15 3,7                 3
-16 5                   1
-
-
  # Create histology_category: Ductal (code 3), Lobular (code 14), both, or other
-  # (^|,) and (,|$) match whole codes, so that e.g. "13" is not read as ductal code "3"
-  subset_data <- subset_data |>
-    mutate(histology_category = case_when(
-      grepl("(^|,)3(,|$)", final_histology) & grepl("(^|,)14(,|$)", final_histology) ~ "Both Ductal and Lobular",
-      grepl("(^|,)3(,|$)", final_histology) ~ "Ductal",
-      grepl("(^|,)14(,|$)", final_histology) ~ "Lobular",
-      TRUE ~ "Other"  # any other combination
-    ))
-  
-  # Count the number of participants in each histology category
-  histology_counts <- subset_data |>
-    group_by(histology_category) |>
-    summarise(count = n_distinct(participant_id))  # Count distinct participants
-  
-  # View the counts -- adds up to 109! 
-  print(histology_counts)
-
-
# A tibble: 4 × 2
-  histology_category      count
-  <chr>                   <int>
-1 Both Ductal and Lobular     9
-2 Ductal                     84
-3 Lobular                    14
-4 Other                       2
-
-
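Because `final_histology` is a comma-separated list of codes, matching on individual codes rather than substrings avoids cases where "3" would match inside "13". A minimal sketch of a token-based classifier; `classify_histology()` is a hypothetical helper, and it assumes code 3 = ductal and code 14 = lobular, as in the category labels above.

```r
# Hypothetical helper: classify comma-separated histology codes by exact tokens
classify_histology <- function(x) {
  codes <- strsplit(as.character(x), ",", fixed = TRUE)
  vapply(codes, function(cc) {
    cc <- trimws(cc)
    ductal  <- "3"  %in% cc   # exact code match, so "13" does not count
    lobular <- "14" %in% cc
    if (ductal && lobular) "Both Ductal and Lobular"
    else if (ductal) "Ductal"
    else if (lobular) "Lobular"
    else "Other"
  }, character(1))
}

classify_histology(c("1,13,14,3", "13,3", "14,15", "13", "5"))
```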
  #contingency table 
-  library(tidyr)
-  contingency_table <- subset_data |>
-    distinct(participant_id, histology_category, ctDNA_ever) |>  # Ensure each patient is counted once
-    count(histology_category, ctDNA_ever) |>
-    pivot_wider(names_from = ctDNA_ever, values_from = n, values_fill = list(n = 0))  # Pivot the table to get ctDNA_ever as columns
-  
-  # 3. Perform the Chi-squared test of independence
-  chisq_test <- chisq.test(contingency_table[,-1])  # Remove the histology_category column for the test
-
-
Warning in chisq.test(contingency_table[, -1]): Chi-squared approximation may
-be incorrect
-
-
  # 4. Print the contingency table
-  print(contingency_table) 
-
-
# A tibble: 4 × 3
-  histology_category      `FALSE` `TRUE`
-  <chr>                     <int>  <int>
-1 Both Ductal and Lobular       9      0
-2 Ductal                       78      6
-3 Lobular                      11      3
-4 Other                         2      0
-
-
  # 5. Print the result of the Chi-squared test p-value - 0.2276
-  print(chisq_test)
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table[, -1]
-X-squared = 4.334, df = 3, p-value = 0.2276
-
-
#### Staging N stage (Nodal stage) 
-
-table(subset_data$participant_id, subset_data$final_n_stage) # codes 0, 1, 2, 3 = N0, N1, N2, N3
-
-
              
-                0  1  2  3
-  28115-16-001  0  0  0  5
-  28115-16-004  1  0  0  0
-  28115-16-010  0  0  0  1
-  28115-16-014  1  0  0  0
-  28115-16-015 12  0  0  0
-  28115-16-017  0  0  3  0
-  28115-16-020  0  0  1  0
-  28115-16-021  0  0  9  0
-  28115-16-023  1  0  0  0
-  28115-16-025  1  0  0  0
-  28115-16-026 10  0  0  0
-  28115-16-027  0  3  0  0
-  28115-16-029  2  0  0  0
-  28115-16-033  0  2  0  0
-  28115-16-035  1  0  0  0
-  28115-17-001  0  8  0  0
-  28115-17-002  9  0  0  0
-  28115-17-006  0  1  0  0
-  28115-17-008  9  0  0  0
-  28115-17-009  1  0  0  0
-  28115-17-010  5  0  0  0
-  28115-17-011  0  0  0  9
-  28115-17-012  0  0  0 10
-  28115-17-016  0  4  0  0
-  28115-17-017  0  5  0  0
-  28115-17-019  9  0  0  0
-  28115-17-021  1  0  0  0
-  28115-17-022  1  0  0  0
-  28115-17-023  0  0  2  0
-  28115-17-024  4  0  0  0
-  28115-17-025  2  0  0  0
-  28115-17-027  0  8  0  0
-  28115-17-030  3  0  0  0
-  28115-17-031  5  0  0  0
-  28115-17-032  0  0 10  0
-  28115-17-036  7  0  0  0
-  28115-17-039  2  0  0  0
-  28115-17-040  0  0  4  0
-  28115-17-045  1  0  0  0
-  28115-17-046 10  0  0  0
-  28115-17-047  0  3  0  0
-  28115-17-048  0  0  2  0
-  28115-17-050  3  0  0  0
-  28115-17-051  9  0  0  0
-  28115-17-052  3  0  0  0
-  28115-18-001  0  0  7  0
-  28115-18-002  0  2  0  0
-  28115-18-004  0  0  2  0
-  28115-18-006  0  1  0  0
-  28115-18-009  1  0  0  0
-  28115-18-011  0  5  0  0
-  28115-18-014  0  2  0  0
-  28115-18-015  5  0  0  0
-  28115-18-017  0  1  0  0
-  28115-18-020  8  0  0  0
-  28115-18-021  0  8  0  0
-  28115-18-022 12  0  0  0
-  28115-18-023  0  3  0  0
-  28115-18-024  0  2  0  0
-  28115-18-027  0  1  0  0
-  28115-18-028  1  0  0  0
-  28115-18-029  0  4  0  0
-  28115-18-030  2  0  0  0
-  28115-18-031  0  3  0  0
-  28115-18-032  0  6  0  0
-  28115-18-034  1  0  0  0
-  28115-19-001  0  0  0  1
-  28115-19-002  0  2  0  0
-  28115-19-003  0  5  0  0
-  28115-19-004  0  1  0  0
-  28115-19-005  3  0  0  0
-  28115-19-006  0  8  0  0
-  28115-19-007  0  5  0  0
-  28115-19-009  0  0  0  6
-  28115-19-011  0  1  0  0
-  28115-19-012  0  3  0  0
-  28115-19-014  0  0  0  2
-  28115-19-016  2  0  0  0
-  28115-19-017  2  0  0  0
-  28115-19-019  0  3  0  0
-  28115-19-020  2  0  0  0
-  28115-19-021  0  4  0  0
-  28115-19-022  0  2  0  0
-  28115-19-025  0  6  0  0
-  28115-19-028  2  0  0  0
-  28115-20-004  2  0  0  0
-  28115-20-007  0  0  2  0
-  28115-20-009  4  0  0  0
-  28115-20-010  0  1  0  0
-  28115-21-001  0  1  0  0
-  28115-21-002  0  4  0  0
-  28115-21-003  0  0  2  0
-  28115-21-006  0  2  0  0
-  28115-21-007  0  0  3  0
-  28115-21-009  0  0  3  0
-  28115-21-011  1  0  0  0
-  28115-21-013  0  4  0  0
-  28115-21-014  0  2  0  0
-  28115-21-015  0  2  0  0
-  28115-21-016  8  0  0  0
-  28115-21-019  0  1  0  0
-  28115-21-020  0  3  0  0
-  28115-21-021  0  3  0  0
-  28115-21-022  1  0  0  0
-  28115-21-024  2  0  0  0
-  28115-21-025  0  2  0  0
-  28115-21-026  0  2  0  0
-  28115-21-027  2  0  0  0
-  28115-21-028  1  0  0  0
-
-
nodal_summary <- subset_data |>
-    distinct(participant_id, final_n_stage) |>  # Get unique participant-stage combinations
-    group_by(final_n_stage) |>  # Group by stage
-    summarise(count = n())  # Count the number of participants per nodal stage
-  
-#View the summary table -- adds up to 109: 46 are N0 and 63 are node positive (43 N1, 13 N2, 7 N3)
-  print(nodal_summary)
-
-
# A tibble: 4 × 2
-  final_n_stage count
-          <int> <int>
-1             0    46
-2             1    43
-3             2    13
-4             3     7
-
-
  subset_data_by_id <- subset_data |>
-    filter(final_n_stage %in% c(0, 1, 2, 3)) |>  # Include only relevant nodal stages
-    group_by(participant_id) |>
-    summarise(
-      nodal_status = first(final_n_stage),  # Use final_n_stage as nodal_status for each participant
-      ctDNA_ever = first(ctDNA_ever),       # Get ctDNA_ever status for each participant
-      .groups = "drop"
-    )
-  
-  #Create a contingency table of nodal_status vs ctDNA_ever
-  contingency_table <- table(subset_data_by_id$nodal_status, subset_data_by_id$ctDNA_ever)
-  
-  # Check if any cells in the contingency table have zero counts, which could affect test validity
-  print(contingency_table)
-
-
   
-    FALSE TRUE
-  0    43    3
-  1    43    0
-  2     8    5
-  3     6    1
-
-
  # Step 5: Perform Chi-squared test
-  chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
  # Step 6: Print the Chi-squared test result p = 0.0001 
-  print(chisq_test) 
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table
-X-squared = 20.045, df = 3, p-value = 0.0001661
-
-
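The warning above appears because several expected cell counts are small (only 9 participants were ever ctDNA positive). A minimal sketch, rebuilding the printed nodal-stage table by hand, shows how to inspect the expected counts and run Fisher's exact test, which does not rely on the large-sample approximation. Fisher's test is a suggested alternative here, not part of the original analysis.

```r
# Nodal stage (rows N0-N3) by ctDNA_ever (columns FALSE/TRUE),
# copied from the printed contingency table above
nodal_tab <- matrix(c(43, 3,
                      43, 0,
                       8, 5,
                       6, 1),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(n_stage = c("0", "1", "2", "3"),
                                    ctDNA_ever = c("FALSE", "TRUE")))

# Expected counts under independence; cells below 5 trigger the chisq.test() warning
expected <- suppressWarnings(chisq.test(nodal_tab))$expected
print(round(expected, 2))

# Fisher's exact test computes the p-value without the chi-squared approximation
fisher_res <- fisher.test(nodal_tab)
print(fisher_res$p.value)
```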
  #### Node positive versus node negative: use final_n_stage to create a node status indicator (Node Negative if N0, Node Positive otherwise)
-  subset_data_by_id <- subset_data |>
-    group_by(participant_id) |>
-    summarise(
-      node_status = ifelse(first(final_n_stage) == 0, "Node Negative", "Node Positive"),  # Node negative if 0, positive otherwise
-      ctDNA_ever = first(ctDNA_ever),  # Get ctDNA_ever status for each participant
-      .groups = "drop"
-    )
-  
-  #adding node_status to subset_data 
- subset_data <- subset_data |>
-  left_join(subset_data_by_id |> select(participant_id, node_status), by = "participant_id")
-  
-  
-  #Create a contingency table of node_status vs ctDNA_ever
-  contingency_table <- table(subset_data_by_id$node_status, subset_data_by_id$ctDNA_ever)
-  
-  # Perform the Chi-squared test
-  chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
  #Print the contingency table and Chi-squared test results
-  print(contingency_table)
-
-
               
-                FALSE TRUE
-  Node Negative    43    3
-  Node Positive    57    6
-
-
  print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.044142, df = 1, p-value = 0.8336
-
-
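Each variable in this section repeats the same steps: keep one row per participant, cross-tabulate against ctDNA_ever, and run `chisq.test()`. A small helper could cut down the copy-and-paste (and the mislabeled comments it tends to produce). The sketch below uses base R only; `ctdna_association()` is a hypothetical name, and taking the first row per participant with `!duplicated()` is assumed to match what `group_by()` plus `summarise(first())` does in the code above.

```r
# Hypothetical helper: one row per participant, then test `var` against ctDNA_ever
ctdna_association <- function(data, var, id = "participant_id", outcome = "ctDNA_ever") {
  # Keep the first occurrence of each participant, as first() does within group_by()
  one_row <- data[!duplicated(data[[id]]), , drop = FALSE]
  tab <- table(one_row[[var]], one_row[[outcome]])
  # Suppress the small-count warning and report the smallest expected count instead
  chi <- suppressWarnings(chisq.test(tab))
  list(table = tab,
       chisq_p = chi$p.value,
       min_expected = min(chi$expected),
       fisher_p = fisher.test(tab)$p.value)
}

# Usage with a small made-up data set (not study data)
toy <- data.frame(
  participant_id = c("A", "A", "B", "C", "C", "D"),
  node_status = c("Node Negative", "Node Negative", "Node Positive",
                  "Node Positive", "Node Positive", "Node Negative"),
  ctDNA_ever = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
)
res <- ctdna_association(toy, "node_status")
print(res$table)
```

Reporting `min_expected` alongside both p-values makes it explicit when the chi-squared approximation is questionable, rather than relying on the warning.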
#######Looking at T stage or tumor size: the variable is final_t_stage 
-  
-  table(subset_data$final_t_stage) #we have values in 1,2,3,4, and 99 (99 = pTx, cannot evaluate) so can proceed with this 
-
-

-  1   2   3   4  99 
-173 168  46  10   1 
-
-
  t_summary <- subset_data |>
-    distinct(participant_id, final_t_stage) |>  # Get unique participant-stage combinations
-    group_by(final_t_stage) |>  # Group by stage
-    summarise(count = n())  # Count the number of participants per T stage
-  
-  # View the summary table -- adds up to 109: 51 T1, 44 T2, 12 T3, 1 T4, and 1 coded 99 (pTx)
-  print(t_summary)
-
-
# A tibble: 5 × 2
-  final_t_stage count
-          <int> <int>
-1             1    51
-2             2    44
-3             3    12
-4             4     1
-5            99     1
-
-
  #for our T stage table, will use T1 vs T2 or greater to simplify, and we want to exclude 99 (the pTx). We will create "final_t_stage_combined" to represent this.  
-  subset_data_clean <- subset_data |>
-    filter(final_t_stage != 99, ctDNA_ever != 99)
-  
-  # Combine final_t_stage into T1 vs. T2 or greater
-  subset_data_clean <- subset_data_clean |>
-    mutate(final_t_stage_combined = ifelse(final_t_stage == 1, "T1", "T2 or greater"))
-  
-  # Summarize the data by participant_id after creating the new combined t_stage
-  subset_data_by_id <- subset_data_clean |>
-    group_by(participant_id) |>
-    summarise(
-      final_t_stage_combined = first(final_t_stage_combined),  # Get the combined t status for each participant
-      ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
-    )
-  
-  # Create a contingency table of final_t_stage_combined vs ctDNA_ever
-  contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$ctDNA_ever)
-  
-  # Perform the Chi-squared test
-  chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
  # Print the contingency table and Chi-squared test results. P value = 0.6
-  print(contingency_table)
-
-
               
-                FALSE TRUE
-  T1               48    3
-  T2 or greater    51    6
-
-
  print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.27357, df = 1, p-value = 0.6009
-
-
#### Sensitivity check: using T3 or greater as the T-stage cutoff also showed no significant difference (p = 0.66), so this cutoff is not used for the table.
-  
-  #exclude 99 (the pTx) 
-  subset_data_clean <- subset_data |>
-    filter(final_t_stage != 99, ctDNA_ever != 99)
-  
-  # Combine final_t_stage into T1/T2 or T3 or greater
-  subset_data_clean <- subset_data_clean |>
-    mutate(final_t_stage_combined = case_when(
-      final_t_stage == 1 | final_t_stage == 2 ~ "T1 or T2",  # Group T1 and T2 together
-      final_t_stage >= 3 ~ "T3 or greater",  # Group T3 and higher as a separate category
-      TRUE ~ NA_character_  # Handle any unexpected values
-    ))
-  
-  
-  # Summarize the data by participant_id after creating the new combined t_stage
-  subset_data_by_id <- subset_data_clean |>
-    group_by(participant_id) |>
-    summarise(
-      final_t_stage_combined = first(final_t_stage_combined),  # Get the combined t status for each participant
-      ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
-    )
-  
-  # Create a contingency table of final_t_stage_combined vs ctDNA_ever
-  contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$ctDNA_ever)
-  
-  # Perform the Chi-squared test
-  chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
  # Print the contingency table and Chi-squared test results --> not significant so ignore this 
-  print(contingency_table)
-
-
               
-                FALSE TRUE
-  T1 or T2         88    7
-  T3 or greater    11    2
-
-
  print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.19875, df = 1, p-value = 0.6557
-
-
  ########Overall stage of disease -- final_overall_stage 
-
-  table(subset_data$final_overall_stage) # values are 1, 2, 3, and 99 (no stage 4), so can proceed with this
-
-

-  1   2   3  99 
-124 167 105   2 
-
-
  stage_summary <- subset_data |>
-    distinct(participant_id, final_overall_stage) |>  # Get unique participant-stage combinations
-    group_by(final_overall_stage) |>  # Group by stage
-    summarise(count = n())  # Count the number of participants per overall stage
-  
-  # View the summary table -- adds up to 109 (1 coded 99): 35 stage I, 47 stage II, 26 stage III, and no stage IV
-  print(stage_summary)
-
-
# A tibble: 4 × 2
-  final_overall_stage count
-                <int> <int>
-1                   1    35
-2                   2    47
-3                   3    26
-4                  99     1
-
-
  #exclude the 99 
-  subset_data_clean <- subset_data |>
-    filter(final_overall_stage != 99, ctDNA_ever != 99)
-  
-  # Summarize the data by participant_id
-  subset_data_by_id <- subset_data_clean |>
-    group_by(participant_id) |>
-    summarise(
-      final_overall_stage = first(final_overall_stage),  # Get the final_overall_stage for each participant
-      ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
-    )
-  
-  # Create a contingency table of final_overall_stage vs ctDNA_ever
-  contingency_table <- table(subset_data_by_id$final_overall_stage, subset_data_by_id$ctDNA_ever)
-  
-  # Perform the Chi-squared test
-  chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
  # Print the contingency table and Chi-squared test results --> p = 0.006. Higher stage is associated with ctDNA_ever (6 of 26 stage III vs. 2 of 35 stage I and 1 of 47 stage II).
-  print(contingency_table)
-
-
   
-    FALSE TRUE
-  1    33    2
-  2    46    1
-  3    20    6
-
-
  print(chisq_test)
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table
-X-squared = 10.082, df = 2, p-value = 0.006466
-
-
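Overall stage is ordinal, so a test for trend in proportions is one possible complement to the 3 × 2 chi-squared test (an alternative suggested here, not the original analysis). Base R's `prop.trend.test()` takes the number of ctDNA-positive participants and the group sizes. The counts below are copied from the printed table, and the equally spaced scores 1 to 3 are an assumption. Because stage II has a lower ctDNA-positive proportion than stage I, a monotone trend test and the omnibus chi-squared test may not agree.

```r
# Ever-ctDNA-positive counts and participants per stage (I, II, III), from the printed table
ctdna_pos <- c(2, 1, 6)
n_stage <- c(33 + 2, 46 + 1, 20 + 6)

print(round(ctdna_pos / n_stage, 3))  # proportion ever ctDNA positive by stage

# Chi-squared test for trend in proportions, using equally spaced scores for stage
trend <- prop.trend.test(ctdna_pos, n_stage, score = 1:3)
print(trend)
```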
###########surgical treatment diag_surgery_type_1 (or diag_surgery_type_2) 
-  
-  table(subset_data$diag_surgery_type_1) #1= partial mastectomy/lumpectomy, 2 = mastectomy. no missingness 
-
-

-  1   2 
-158 240 
-
-
  surgery <- subset_data |>
-    distinct(participant_id, diag_surgery_type_1) |>  # Get unique participant-surgery combinations
-    group_by(diag_surgery_type_1) |>  # Group by surgery type
-    summarise(count = n())  # Count the number of participants per surgery type
-  
-  # View the summary table
-  print(surgery)
-
-
# A tibble: 2 × 2
-  diag_surgery_type_1 count
-                <int> <int>
-1                   1    45
-2                   2    64
-
-
  # Summarize the data by participant_id
-  subset_data_by_id <- subset_data_clean |>
-    group_by(participant_id) |>
-    summarise(
-      surgery = first(diag_surgery_type_1),  # Get the surgery type for each participant
-      ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
-    )
-  
-  # Create a contingency table of surgery type vs ctDNA_ever
-  contingency_table <- table(subset_data_by_id$surgery, subset_data_by_id$ctDNA_ever)
-  
-  # Perform the Chi-squared test
-  chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
  # Print the contingency table and Chi-squared test results --> p = 1
-  print(contingency_table)
-
-
   
-    FALSE TRUE
-  1    41    4
-  2    58    5
-
-
  print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0, df = 1, p-value = 1
-
-
######## Axillary management: the question is whether a participant had an axillary dissection, recorded in diag_axillary_type___2_1 (or diag_axillary_type___2_2 if there were 2 biopsy forms). These are combined into a new variable, axillary_dissection.
-
-  
-  table(subset_data$diag_axillary_type___2_1) 
-
-

-  0   1 
-215 183 
-
-
  table(subset_data$diag_axillary_type___2_2) # a handful of axillary dissections are recorded on the second biopsy form, so the two variables are combined
-
-

- 0  1 
-16  4 
-
-
  # Create a binary variable identifying participants who had an axillary dissection on either biopsy form
-  # (1 = axillary dissection, 0 = none; missing values are treated as no dissection)
-  subset_data <- subset_data |>
-    mutate(axillary_dissection = case_when(
-      diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1 ~ 1,  # Had axillary dissection
-      TRUE ~ 0  # No axillary dissection (includes missing values)
-    ))
-  
-  # Later sections reuse subset_data_clean, so keep it in sync with subset_data
-  subset_data_clean <- subset_data
-  
-  # Summarize the data by participant_id, including the axillary_dissection and ctDNA_ever variables
-  subset_data_by_id <- subset_data_clean |>
-    group_by(participant_id) |>
-    summarise(
-      axillary_dissection = first(axillary_dissection),  # Get the axillary dissection status for each participant
-      ctDNA_ever = first(ctDNA_ever)  # Get the ctDNA_ever status for each participant
-    )
-  
-  contingency_table <- table(subset_data_by_id$axillary_dissection, subset_data_by_id$ctDNA_ever)
-  
-table(subset_data$axillary_dissection)
-
-

-  0   1 
-214 184 
-
-
  # Perform the Chi-squared test
-  chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
  # Print the contingency table and Chi-squared test results --> p-value 0.173 
-  print(contingency_table)
-
-
   
-    FALSE TRUE
-  0    52    2
-  1    48    7
-
-
  print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 1.8588, df = 1, p-value = 0.1728
-
-
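The axillary code needs explicit NA handling because of how R's `|` treats missing values: `FALSE | NA` is `NA`, so a plain `ifelse()` returns `NA` for a participant with no second biopsy form and no dissection on the first. A minimal sketch with made-up indicator values shows the difference, and shows `%in%` as one way to treat missing values as "no dissection" in a single step.

```r
# Made-up first- and second-form axillary dissection indicators (not study data);
# NA in form2 stands for "no second biopsy form"
form1 <- c(1, 0, 0, NA)
form2 <- c(NA, NA, 1, NA)

# `|` propagates NA unless the other side is TRUE, so rows 2 and 4 become NA
with_ifelse <- ifelse(form1 == 1 | form2 == 1, 1, 0)

# `%in%` returns FALSE for NA, so missing values count as no dissection
with_in <- as.integer(form1 %in% 1 | form2 %in% 1)

print(with_ifelse)  # values: 1, NA, 1, NA
print(with_in)      # values: 1, 0, 1, 0
```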
####inflammatory (variable inflamm_yn)-- I have decided not to include inflammatory variable in table 1 as there were NO inflammatory breast cancers in the ctDNA cohort. 
-table(d$inflamm_yn_1) ### I think the inflammatory patients are just not in the subset of patients (6 yes's in the overall cohort, but none in the subset data for either inflamm variable)
-
-

-  0   1 
-568  11 
-
-
table(d$inflamm_yn_2)  ### I think the inflammatory patients are just not in the ctDNA cohort subset
-
-

- 0 
-24 
-
-
table(subset_data$inflamm_yn) # subset_data has no column named inflamm_yn, hence the warning and empty table below
-
-
Warning: Unknown or uninitialised column: `inflamm_yn`.
-
-
-
< table of extent 0 >
-
-
#### radiation prtx_radiation 
-table(subset_data$prtx_radiation) 
-
-

-  0   1 
-116 282 
-
-
radiation <- subset_data |> 
-  distinct(participant_id,prtx_radiation) |> 
-  group_by(prtx_radiation) |>  # Group by radiation status
-  summarise(count = n())  # Count the number of participants per radiation status
-
-# View the summary table
-print(radiation)
-
-
# A tibble: 2 × 2
-  prtx_radiation count
-           <int> <int>
-1              0    34
-2              1    75
-
-
# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    radiation = first(prtx_radiation),  # xrt for each participant
-    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
-  )
-
-# Create a contingency table of radiation vs ctDNA_ever
-contingency_table <- table(subset_data_by_id$radiation, subset_data_by_id$ctDNA_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results --> p-val  = 0.33 
-print(contingency_table)
-
-
   
-    FALSE TRUE
-  0    33    1
-  1    67    8
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.96444, df = 1, p-value = 0.3261
-
-
#### chemotherapy prtx_chemo 
-table(subset_data$prtx_chemo) 
-
-

-  0   1 
- 18 380 
-
-
chemo <- subset_data |> 
-  distinct(participant_id,prtx_chemo) |> 
-  group_by(prtx_chemo) |>  # Group by chemotherapy status
-  summarise(count = n())  # Count the number of participants per chemotherapy status
-
-# View the summary table
-print(chemo) #3 people did not get chemo in this cohort 
-
-
# A tibble: 2 × 2
-  prtx_chemo count
-       <int> <int>
-1          0     3
-2          1   106
-
-
# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    chemo = first(prtx_chemo),  # chemo for each participant
-    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
-  )
-
-# Create a contingency table of chemotherapy vs ctDNA_ever
-contingency_table <- table(subset_data_by_id$chemo, subset_data_by_id$ctDNA_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results --> p-val  = 0.59 
-print(contingency_table)
-
-
   
-    FALSE TRUE
-  0     2    1
-  1    98    8
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.28802, df = 1, p-value = 0.5915
-
-
####neoadjuvant chemo -- there are two variables for this that could theoretically be included: diag_neoadj_chemo_1 or diag_neoadj_chemo_2 
-
-table(subset_data$diag_neoadj_chemo_1) 
-
-

-  0   1 
-327  71 
-
-
table(subset_data$diag_neoadj_chemo_2) #there were no ppl who were recorded on the second biopsy form as having neoadjuvant therapy so can use only the first variable 
-
-

- 0 
-20 
-
-
nact <- subset_data |> 
-  distinct(participant_id,diag_neoadj_chemo_1) |> 
-  group_by(diag_neoadj_chemo_1) |>  # Group by NACT status
-  summarise(count = n())  # Count the number of participants per NACT status
-
-# View the summary table
-print(nact) # 19 people received neoadjuvant chemotherapy (NACT) in this cohort
-
-
# A tibble: 2 × 2
-  diag_neoadj_chemo_1 count
-                <int> <int>
-1                   0    90
-2                   1    19
-
-
# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    nact = first(diag_neoadj_chemo_1),  # NACT for each participant
-    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
-  )
-
-# Create a contingency table of NACT vs ctDNA_ever
-contingency_table <- table(subset_data_by_id$nact, subset_data_by_id$ctDNA_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results --> p-val  = 0.95 
-print(contingency_table)
-
-
   
-    FALSE TRUE
-  0    82    8
-  1    18    1
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.0039839, df = 1, p-value = 0.9497
-
-
####hormone therapy prtx_endo 
-
-table(subset_data$prtx_endo) 
-
-

-  0   1 
-156 242 
-
-
endo <- subset_data |> 
-  distinct(participant_id,prtx_endo) |> 
-  group_by(prtx_endo) |>  # Group by endocrine therapy status
-  summarise(count = n())  # Count the number of participants per endocrine therapy status
-
-# View the summary table
-print(endo) #most ppl did get endo (62 of the 109)
-
-
# A tibble: 2 × 2
-  prtx_endo count
-      <int> <int>
-1         0    47
-2         1    62
-
-
# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    endo = first(prtx_endo),  # Get the endocrine therapy status for each participant
-    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
-  )
-
-# Create a contingency table of endocrine therapy vs ctDNA_ever
-contingency_table <- table(subset_data_by_id$endo, subset_data_by_id$ctDNA_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results --> p-val  = 0.33 
-print(contingency_table)
-
-
   
-    FALSE TRUE
-  0    45    2
-  1    55    7
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.94139, df = 1, p-value = 0.3319
-
-
####bone modifying agents prtx_bonemod 
-
-table(subset_data$prtx_bonemod) 
-
-

-  0   1 
-238 160 
-
-
bonemod <- subset_data |> 
-  distinct(participant_id,prtx_bonemod) |> 
-  group_by(prtx_bonemod) |>  # Group by bone-modifying agent status
-  summarise(count = n())  # Count the number of participants per bone-modifying agent status
-
-# View the summary table
-print(bonemod) # 39 of 109 received bone-modifying agents
-
-
# A tibble: 2 × 2
-  prtx_bonemod count
-         <int> <int>
-1            0    70
-2            1    39
-
-
# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    bonemod = first(prtx_bonemod),  # Get bone mod status for each participant
-    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
-  )
-
-# Create a contingency table of bonemod vs ctDNA_ever
-contingency_table <- table(subset_data_by_id$bonemod, subset_data_by_id$ctDNA_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results --> p-val  = 0.84 
-print(contingency_table)
-
-
   
-    FALSE TRUE
-  0    65    5
-  1    35    4
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.041269, df = 1, p-value = 0.839
-
-
#### pCR -- not included in Table 1, as it aligns closely with NACT
-# 2 = non-pcr, 1 = pcr 
-#the variables of interest for path cr: diag_pcr_1 or diag_pcr_2  
-table(subset_data$diag_pcr_1) 
-
-

-  .   1   2 
-327   8  63 
-
-
table(subset_data$diag_pcr_2) #none recorded here so can just use diag_pcr_1 
-
-

-      . 
-378  20 
-
-
pcr <- subset_data |>
-  mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) |>  # Convert "." to NA
-  filter(!is.na(diag_pcr_1)) |>  # Exclude rows where diag_pcr_1 is NA
-  distinct(participant_id, diag_pcr_1) |>
-  group_by(diag_pcr_1) |>
-  summarise(count = n()) # Count the number of participants per pCR category
-
-# View the summary table
-print(pcr) #this is correct bc 19 ppl got NACT, and we have 19 ppl recorded for pcr data 
-
-
# A tibble: 2 × 2
-  diag_pcr_1 count
-  <chr>      <int>
-1 1              1
-2 2             18
-
-
# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    pcr = first(diag_pcr_1),  # Get pcr for each participant
-    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
-  )
-
-# Create a contingency table of pcr vs ctDNA_ever
-contingency_table <- table(subset_data_by_id$pcr, subset_data_by_id$ctDNA_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results --> p = 0.86, no apparent association; note that the "." row (no NACT) is included in this test and the pCR row has only 1 participant
-print(contingency_table)
-
-
   
-    FALSE TRUE
-  .    82    8
-  1     1    0
-  2    17    1
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table
-X-squared = 0.31085, df = 2, p-value = 0.8561
-
-
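In the pCR test above, the "." code (no neoadjuvant chemotherapy) is still a row of the contingency table, so the test mixes the 90 participants who did not receive NACT with the 19 who did. A minimal sketch, rebuilding participant-level vectors from the printed counts, recodes "." to `NA` before tabulating (`table()` drops `NA` by default) and uses Fisher's exact test given the single participant with a pCR. This restriction to NACT recipients is a suggested variant, not the original analysis.

```r
# Participant-level codes rebuilt from the printed pcr x ctDNA_ever table:
# "." = no neoadjuvant chemotherapy, "1" = pCR, "2" = no pCR
pcr <- rep(c(".", "1", "2"), times = c(90, 1, 18))
ctdna_ever <- c(rep(c(FALSE, TRUE), times = c(82, 8)),
                rep(c(FALSE, TRUE), times = c(1, 0)),
                rep(c(FALSE, TRUE), times = c(17, 1)))

# Treat "." as missing so it drops out of the table
pcr[pcr == "."] <- NA
pcr_tab <- table(pcr, ctdna_ever)
print(pcr_tab)

# Exact test, since one row contains a single participant
print(fisher.test(pcr_tab)$p.value)
```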
########recurrence
-# local first, then distant, then a summary variable of either locoregional or distant recurrence
-#local fu_locreg_prog 
-
-# Step 1: Summarize data by unique participant_id
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    fu_locreg_prog = first(fu_locreg_prog),  # Get fu_locreg_prog status for each participant
-    ctDNA_ever = first(ctDNA_ever),          # Get ctDNA_ever status for each participant
-    .groups = "drop"
-  )
-
-# Step 2: Create a contingency table of fu_locreg_prog vs ctDNA_ever
-contingency_table <- table(subset_data_by_id$fu_locreg_prog, subset_data_by_id$ctDNA_ever)
-
-# Step 3: Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Step 4: Print the contingency table and Chi-squared test results -- p = 5.8e-06
-print(contingency_table)
-
-
   
-    FALSE TRUE
-  0    96    5
-  1     2    4
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 20.564, df = 1, p-value = 5.768e-06
-
-
####looking at sites of locoregional progression: local site fu_locreg_site_num or fu_locreg_site_char 
-### Just want to look at site distribution here 
-
-# Summarize the distribution of fu_locreg_site_char by unique participant_id
-site_distribution <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    site = first(fu_locreg_site_char),  # Get the site for each unique participant
-    .groups = "drop"
-  ) |>
-  count(site)  # Count the occurrences of each site
-
-# View the distribution of sites -- mostly axillary nodes. 4 of 6 axillary. 1 internal mammary. 2 supraclav. 2 ipsilateral breast 
-print(site_distribution)
-
-
# A tibble: 6 × 2
-  site                                                              n
-  <chr>                                                         <int>
-1 ""                                                              103
-2 "Axillary Nodes"                                                  2
-3 "Axillary Nodes,Internal Mammary Nodes,Supraclavicular Nodes"     1
-4 "Ipsilateral Breast"                                              1
-5 "Ipsilateral Breast,Axillary Nodes"                               1
-6 "Supraclavicular Nodes"                                           1
-
-
#####distant recurrence: distant fu_dist_prog 
-
-# Step 1: Summarize data by unique participant_id
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    fu_dist_prog = first(fu_dist_prog),  # Get fu_dist_prog status for each participant
-    ctDNA_ever = first(ctDNA_ever),          # Get ctDNA_ever status for each participant
-    .groups = "drop"
-  )
-
-# Step 2: Create a contingency table of dist prog vs ctDNA_ever --> 12 who had distant progression 
-contingency_table <- table(subset_data_by_id$fu_dist_prog, subset_data_by_id$ctDNA_ever)
-
-# Step 3: Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Step 4: Print the contingency table and Chi-squared test results -- p-val of <<<< 0.000001 
-print(contingency_table)
-
-
   
-    FALSE TRUE
-  0    93    2
-  1     5    7
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 36.73, df = 1, p-value = 1.356e-09
-
-
### Distant sites 
-#distant site fu_dist_site_num #fu_dist_site_char  -- start just looking at the locations 
-
-# Summarize the distribution of fu_dist_site_char by unique participant_id
-dist_site_distribution <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    site = first(fu_dist_site_char),  # Get the site for each unique participant
-    .groups = "drop"
-  ) |>
-  count(site)  # Count the occurrences of each site
-
-# View the distribution of sites -- Bone in 7. Liver in 3. Lung in 2. 1 intra-abdominal 
-print(dist_site_distribution)
-
-
# A tibble: 8 × 2
-  site                  n
-  <chr>             <int>
-1 ""                   97
-2 "Bone"                5
-3 "Bone,Other"          1
-4 "Intra-abdominal"     1
-5 "Liver"               2
-6 "Liver,Bone"          1
-7 "Lung"                1
-8 "Pleura,Lung"         1
-
-
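The per-site tallies in the comment above (bone 7, liver 3, lung 2, 1 intra-abdominal) were counted by hand from multi-site strings such as "Liver,Bone". A base-R sketch that splits the comma-separated entries reproduces them. The site strings and counts are copied from the printed tibble, and the same approach should apply to `fu_locreg_site_char`.

```r
# Distant-site strings and number of participants with each, copied from the printed tibble
# (the 97 participants with an empty string had no distant site recorded)
sites <- rep(c("Bone", "Bone,Other", "Intra-abdominal", "Liver",
               "Liver,Bone", "Lung", "Pleura,Lung"),
             times = c(5, 1, 1, 2, 1, 1, 1))

# Split multi-site entries; unique() keeps a participant from counting a site twice
split_sites <- lapply(strsplit(sites, ",", fixed = TRUE), unique)
per_site <- table(unlist(split_sites))
print(sort(per_site, decreasing = TRUE))
```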
##### ANY Recurrence -- this includes either fu_locreg_prog or fu_dist_prog 
-
-# link by participant id 
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
-    ctDNA_ever = first(ctDNA_ever),        # Get the ctDNA_ever status for each participant
-    .groups = "drop"
-  )
-
-# Create a contingency table of ever_relapsed vs ctDNA_ever
-contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results -- total 14 relapses, 8 were ctDNA +, 6 were not ever ctDNA positive 
-print(contingency_table)
-
-
     
-      FALSE TRUE
-  No     92    1
-  Yes     6    8
-
-
print(chisq_test) 
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 42.642, df = 1, p-value = 6.573e-11
-
-
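The comment above (14 relapses, 8 ever ctDNA positive) can be turned into the usual descriptive summaries of a 2 × 2 table. A worked sketch using the printed counts follows. These figures treat ctDNA_ever as a single yes/no marker and ignore when ctDNA was detected relative to relapse, so they are descriptive only.

```r
# Counts from the printed ever_relapsed x ctDNA_ever table
relapse_pos <- 8     # relapsed, ever ctDNA positive
relapse_neg <- 6     # relapsed, never ctDNA positive
norelapse_pos <- 1   # no relapse, ever ctDNA positive
norelapse_neg <- 92  # no relapse, never ctDNA positive

sensitivity <- relapse_pos / (relapse_pos + relapse_neg)        # 8 / 14
specificity <- norelapse_neg / (norelapse_neg + norelapse_pos)  # 92 / 93
ppv <- relapse_pos / (relapse_pos + norelapse_pos)              # 8 / 9
npv <- norelapse_neg / (norelapse_neg + relapse_neg)            # 92 / 98

print(round(c(sensitivity = sensitivity, specificity = specificity,
              ppv = ppv, npv = npv), 3))
# sensitivity 0.571, specificity 0.989, ppv 0.889, npv 0.939
```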
#### Relapse and DTCs  
-#using ever_relapsed and dtc_ever
-
-# link by participant id 
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
-    dtc = first(dtc_ever),        # Get the dtc_ever status for each participant
-    .groups = "drop"
-  )
-
-# Create a contingency table of ever_relapsed vs dtc_ever
-contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results
-print(contingency_table)
-
-
     
-       0  1
-  No  59 34
-  Yes 10  4
-
-
print(chisq_test) 
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.079932, df = 1, p-value = 0.7774
-
-
# Identify participants missing data in either `ever_relapsed` or `dtc_ever`
-missing_data <- subset_data_by_id |>
-  filter(is.na(ever_relapsed) | is.na(dtc))
-
-# Print the IDs of participants with missing data
-print(missing_data$participant_id) #two individuals do not have relapse data (17-021, 18-032, as we had seen above)
-
-
[1] "28115-17-021" "28115-18-032"
-
-
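`table()` drops `NA` values by default, which is why the two participants listed above do not appear in the DTC contingency table (59 + 34 + 10 + 4 = 107 of the 109 participants). Passing `useNA = "ifany"` makes the exclusion visible. The toy data below are hypothetical and only illustrate the behavior.

```r
# Hypothetical per-participant data: participant "D" lacks relapse data
toy <- data.frame(
  participant_id = c("A", "B", "C", "D", "E"),
  ever_relapsed  = c("No", "Yes", "No", NA, "No"),
  dtc            = c(0, 1, 1, 0, 0)
)

# Default behavior: the row with NA is silently excluded
default_tab <- table(toy$ever_relapsed, toy$dtc)
print(default_tab)

# useNA = "ifany" adds an <NA> row so excluded participants are still counted
na_tab <- table(toy$ever_relapsed, toy$dtc, useNA = "ifany")
print(na_tab)
```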
### look at ever_relapsed by ctDNA 
-
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
-    ctDNA = first(ctDNA_ever),        # Get the ctDNA_ever status for each participant
-    .groups = "drop"
-  )
-
-# Create a contingency table of ever_relapsed vs ctDNA_ever
-contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results, p < 0.00001 
-print(contingency_table)
-
-
     
-      FALSE TRUE
-  No     92    1
-  Yes     6    8
-
-
print(chisq_test) 
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 42.642, df = 1, p-value = 6.573e-11
-
-
#### Survival: fu_surv
-
-table(subset_data$fu_surv)
-
-

-  0   1 
-  8 389 
-
-
surv <- subset_data |>
-  distinct(participant_id, fu_surv) |>
-  group_by(fu_surv) |>
-  summarise(count = n()) # Count the number of participants per survival status
-
-# View the summary table
print(surv) # dead = 5, alive = 103, and 1 participant with NA survival status (identified below)
-
-
# A tibble: 3 × 2
-  fu_surv count
-    <int> <int>
-1       0     5
-2       1   103
-3      NA     1
-
-
na_participant <- subset_data |>
-  filter(is.na(fu_surv)) |>
-  select(participant_id, fu_surv)
-
-# Print the result: 28115-17-021 has no follow-up data in REDCap; everyone else in the ctDNA cohort has some survival data.
-print(na_participant)
-
-
# A tibble: 1 × 2
-  participant_id fu_surv
-  <chr>            <int>
-1 28115-17-021        NA
-
-
# Summarize data by unique participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    surv = first(fu_surv),          # Get survival status for each participant
-    ctDNA_ever = first(ctDNA_ever),  # Get ctDNA_ever status for each participant
-    .groups = "drop"
-  )
-
-# Create a contingency table of surv vs ctDNA_ever
-contingency_table <- table(subset_data_by_id$surv, subset_data_by_id$ctDNA_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results, p<0.00001
-print(contingency_table)
-
-
   
-    FALSE TRUE
-  0     1    4
-  1    98    5
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 26.099, df = 1, p-value = 3.243e-07
-
-
-

DTC demographics and univariable tests of association: next, we look at univariable tests of association by DTC status.

-
-
############### DTC Demographics ########## 
-
-###### median age at diagnosis 
-
-#### Age at Dx (by DTC)
-
-names(subset_data) #to identify the variables I want to use 
-
-
  [1] "ID"                               "trialID"                         
-  [3] "participant_id"                   "patient_id"                      
-  [5] "fu_trial_pid"                     "timepoint"                       
-  [7] "project"                          "surmount_id"                     
-  [9] "panel_id"                         "accession"                       
- [11] "sample_id"                        "collection_date"                 
- [13] "extracted_plasma_volume_ml"       "input"                           
- [15] "input_sample"                     "physical_run_name"               
- [17] "workflow_name"                    "eVAF"                            
- [19] "mutant_molecules"                 "mean_VAF"                        
- [21] "Score"                            "all_pass_variants"               
- [23] "total_variants"                   "n_positive_variants"             
- [25] "ctDNA_detected"                   "ctdna_cohort"                    
- [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
- [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
- [31] "dtc_final_result_date"            "pt"                              
- [33] "bma_date"                         "ORIG_RSLT_DTC"                   
- [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
- [37] "FINAL_COUNT"                      "org_consent_date"                
- [39] "demo_initials"                    "demo_dob"                        
- [41] "demo_sex"                         "demo_ethnicity"                  
- [43] "demo_race___1"                    "demo_race___2"                   
- [45] "demo_race___3"                    "demo_race___4"                   
- [47] "demo_race___5"                    "demo_race___88"                  
- [49] "demo_race___99"                   "demo_race_other"                 
- [51] "prtx_radiation"                   "prtx_rad_start"                  
- [53] "prtx_rad_end"                     "prtx_chemo"                      
- [55] "prtx_endo"                        "prtx_bonemod"                    
- [57] "prior_therapy_complete"           "inc_dx_crit"                     
- [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
- [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
- [63] "final_receptor_group"             "demo_race_final"                 
- [65] "final_histology"                  "final_tumor_grade"               
- [67] "final_overall_stage"              "final_t_stage"                   
- [69] "final_n_stage"                    "fu_date_to"                      
- [71] "fu_surv"                          "fu_date_death"                   
- [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
- [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
- [77] "fu_locreg_prog"                   "fu_locreg_date"                  
- [79] "fu_dist_site_num"                 "fu_dist_site_char"               
- [81] "fu_dist_prog"                     "fu_dist_date"                    
- [83] "censor_date"                      "chemo_indication_1"              
- [85] "chemo_name_1"                     "chemo_name_other_1"              
- [87] "chemo_start_date_1"               "start_date_exact_1"              
- [89] "chemo_end_date_1"                 "end_date_exact_1"                
- [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
- [93] "chemo_indication_2"               "chemo_name_2"                    
- [95] "chemo_name_other_2"               "chemo_start_date_2"              
- [97] "start_date_exact_2"               "chemo_end_date_2"                
- [99] "end_date_exact_2"                 "chemo_notes_2"                   
-[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
-[103] "chemo_name_3"                     "chemo_name_other_3"              
-[105] "chemo_start_date_3"               "start_date_exact_3"              
-[107] "chemo_end_date_3"                 "end_date_exact_3"                
-[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
-[111] "chemo_indication_4"               "chemo_name_4"                    
-[113] "chemo_name_other_4"               "chemo_start_date_4"              
-[115] "start_date_exact_4"               "chemo_end_date_4"                
-[117] "end_date_exact_4"                 "chemo_notes_4"                   
-[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
-[121] "hormone_name___1_1"               "hormone_name___2_1"              
-[123] "hormone_name___3_1"               "hormone_name___4_1"              
-[125] "hormone_name___5_1"               "hormone_name___6_1"              
-[127] "hormone_name___7_1"               "hormone_other_1"                 
-[129] "hormone_start_date_1"             "hormone_ongoing_1"               
-[131] "hormone_end_date_1"               "hormone_notes_1"                 
-[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
-[135] "hormone_name___1_2"               "hormone_name___2_2"              
-[137] "hormone_name___3_2"               "hormone_name___4_2"              
-[139] "hormone_name___5_2"               "hormone_name___6_2"              
-[141] "hormone_name___7_2"               "hormone_other_2"                 
-[143] "hormone_start_date_2"             "hormone_ongoing_2"               
-[145] "hormone_end_date_2"               "hormone_notes_2"                 
-[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
-[149] "hormone_name___1_3"               "hormone_name___2_3"              
-[151] "hormone_name___3_3"               "hormone_name___4_3"              
-[153] "hormone_name___5_3"               "hormone_name___6_3"              
-[155] "hormone_name___7_3"               "hormone_other_3"                 
-[157] "hormone_start_date_3"             "hormone_ongoing_3"               
-[159] "hormone_end_date_3"               "hormone_notes_3"                 
-[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
-[163] "hormone_name___1_4"               "hormone_name___2_4"              
-[165] "hormone_name___3_4"               "hormone_name___4_4"              
-[167] "hormone_name___5_4"               "hormone_name___6_4"              
-[169] "hormone_name___7_4"               "hormone_other_4"                 
-[171] "hormone_start_date_4"             "hormone_ongoing_4"               
-[173] "hormone_end_date_4"               "hormone_notes_4"                 
-[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
-[177] "hormone_name___1_5"               "hormone_name___2_5"              
-[179] "hormone_name___3_5"               "hormone_name___4_5"              
-[181] "hormone_name___5_5"               "hormone_name___6_5"              
-[183] "hormone_name___7_5"               "hormone_other_5"                 
-[185] "hormone_start_date_5"             "hormone_ongoing_5"               
-[187] "hormone_end_date_5"               "hormone_notes_5"                 
-[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
-[191] "hormone_name___1_6"               "hormone_name___2_6"              
-[193] "hormone_name___3_6"               "hormone_name___4_6"              
-[195] "hormone_name___5_6"               "hormone_name___6_6"              
-[197] "hormone_name___7_6"               "hormone_other_6"                 
-[199] "hormone_start_date_6"             "hormone_ongoing_6"               
-[201] "hormone_end_date_6"               "hormone_notes_6"                 
-[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
-[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
-[207] "bonemod_name_1"                   "bonemod_start_date_1"            
-[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
-[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
-[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
-[215] "bonemod_name_2"                   "bonemod_start_date_2"            
-[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
-[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
-[221] "diag_lateral_1"                   "diag_menopause_1"                
-[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
-[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
-[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
-[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
-[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
-[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
-[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
-[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
-[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
-[241] "diag_er_status_1"                 "diag_er_percent_1"               
-[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
-[245] "diag_her2_status_1"               "diag_her2_method_1"              
-[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
-[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
-[251] "diag_pcr_1"                       "diag_surgery_date_1"             
-[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
-[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
-[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
-[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
-[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
-[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
-[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
-[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
-[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
-[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
-[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
-[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
-[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
-[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
-[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
-[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
-[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
-[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
-[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
-[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
-[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
-[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
-[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
-[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
-[301] "mammoprint_1"                     "mammoprint_result_1"             
-[303] "diag_notes_1"                     "diag_date_2"                     
-[305] "diag_lateral_2"                   "diag_menopause_2"                
-[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
-[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
-[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
-[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
-[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
-[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
-[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
-[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
-[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
-[325] "diag_er_status_2"                 "diag_er_percent_2"               
-[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
-[329] "diag_her2_status_2"               "diag_her2_method_2"              
-[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
-[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
-[335] "diag_pcr_2"                       "diag_surgery_date_2"             
-[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
-[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
-[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
-[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
-[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
-[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
-[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
-[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
-[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
-[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
-[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
-[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
-[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
-[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
-[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
-[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
-[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
-[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
-[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
-[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
-[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
-[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
-[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
-[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
-[385] "mammoprint_2"                     "mammoprint_result_2"             
-[387] "diag_notes_2"                     "ctDNA_ever"                      
-[389] "dtc_ever"                         "ever_relapsed"                   
-[391] "age_at_diag"                      "HR_status"                       
-[393] "histology_category"               "node_status"                     
-[395] "axillary_dissection"             
-
-
str(subset_data$diag_date_1) # already stored as Date (see output)
-
-
 Date[1:398], format: "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" ...
-
-
str(subset_data$demo_dob) # already stored as Date (see output)
-
-
 Date[1:398], format: "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" ...
-
-
d$diag_date_1 <- as.Date(d$diag_date_1, format = "%m/%d/%Y")  
-d$demo_dob <- as.Date(d$demo_dob, format = "%m/%d/%Y")  
-
-str(d$diag_date_1) #dates! 
-
-
 Date[1:579], format: "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" ...
-
-
str(d$demo_dob) #dates! 
-
-
 Date[1:579], format: "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" ...
-
-
### doing the same for subset_data as it didn't carry over into that data set 
-subset_data$diag_date_1 <- as.Date(subset_data$diag_date_1, format = "%m/%d/%Y")  
-subset_data$demo_dob <- as.Date(subset_data$demo_dob, format = "%m/%d/%Y")  
-
-# Age at diagnosis in years: (diagnosis date - date of birth) / 365.25
-subset_data$age_at_diag <- as.numeric(difftime(subset_data$diag_date_1, subset_data$demo_dob, units = "days")) / 365.25
-head(subset_data$age_at_diag)
-
-
[1] 55.89870 55.89870 55.89870 55.89870 55.89870 49.25667
-
-
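As a check on the formula, the first participant's dates printed above (diagnosis 2013-08-15, date of birth 1957-09-21) reproduce the first value of `age_at_diag`. Dividing by 365.25 averages over leap years, so the result approximates age in years rather than counting completed birthdays.

```r
diag_date <- as.Date("2013-08-15")
dob <- as.Date("1957-09-21")

# Days between the two dates, converted to years using an average year length
days <- as.numeric(difftime(diag_date, dob, units = "days"))
age_years <- days / 365.25

print(days)
print(round(age_years, 5))
```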
summary(subset_data$age_at_diag) # median 48.75 (over all sample rows, not unique participants)
-
-
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-  27.34   41.73   48.75   49.35   57.63   68.94 
-
-
age_summary <- subset_data |>
-  group_by(dtc_ever) |>
-  summarise(
-    mean_age = mean(age_at_diag, na.rm = TRUE),  # Calculate mean age
-    median_age = median(age_at_diag, na.rm = TRUE),  # Calculate median age
-    sd_age = sd(age_at_diag, na.rm = TRUE),  # Calculate standard deviation of age
-    n = n()  # Number of participants in each group
-  )
-
-print(age_summary) # DTC-positive rows are slightly younger (median 47.3 vs 51.8); n counts sample rows, not participants
-
-
# A tibble: 2 × 5
-  dtc_ever mean_age median_age sd_age     n
-     <dbl>    <dbl>      <dbl>  <dbl> <int>
-1        0     50.6       51.8   9.48   149
-2        1     48.6       47.3   9.75   249
-
-
# Perform the Wilcoxon rank-sum test to compare the medians of age between dtc_ever positive and negative groups
-wilcox_test_result <- wilcox.test(age_at_diag ~ dtc_ever, data = subset_data)
-
-# Print the result
-print(wilcox_test_result)
-
-

-    Wilcoxon rank sum test with continuity correction
-
-data:  age_at_diag by dtc_ever
-W = 20838, p-value = 0.03946
-alternative hypothesis: true location shift is not equal to 0
-
-
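The `n` column in the age summary above (149 + 249 = 398) counts sample rows rather than participants; the per-participant race table further down has 70 DTC-negative and 39 DTC-positive participants. Participants with more timepoints therefore contribute more than once to the medians and to the Wilcoxon test. A minimal base R sketch of collapsing to one row per participant before testing follows, using hypothetical toy data and assuming age and DTC status are constant within each participant.

```r
# Hypothetical long-format data: several sample rows per participant
toy <- data.frame(
  participant_id = c("A", "A", "A", "B", "C", "C", "D", "E", "E", "F"),
  dtc_ever       = c(1, 1, 1, 0, 1, 1, 0, 0, 0, 1),
  age_at_diag    = c(41.2, 41.2, 41.2, 55.0, 47.3, 47.3, 62.1, 58.4, 58.4, 39.9)
)

# Keep the first row for each participant
per_participant <- toy[!duplicated(toy$participant_id), ]
print(nrow(per_participant))

# Wilcoxon rank-sum test on one observation per participant
wilcox_pp <- wilcox.test(age_at_diag ~ dtc_ever, data = per_participant, exact = FALSE)
print(wilcox_pp)
```

With the real data, the same per-participant frame could be built from `subset_data` before calling `wilcox.test()`.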
# Range of age at diagnosis by DTC status
-age_summary <- subset_data |>
-  group_by(dtc_ever) |>
-  summarise(
-    min_age = min(age_at_diag, na.rm = TRUE),  # Minimum age
-    max_age = max(age_at_diag, na.rm = TRUE),  # Maximum age
-    .groups = "drop"
-  )
-
-# View the summary table
-print(age_summary)
-
-
# A tibble: 2 × 3
-  dtc_ever min_age max_age
-     <dbl>   <dbl>   <dbl>
-1        0    27.3    68.9
-2        1    30.7    67.7
-
-
-
-
##### Race: demo_race_final
-
-# Get the count of unique participant_ids for each category in demo_race_final
-race_counts_unique_percent <- subset_data |>
-  group_by(demo_race_final) |>
-  summarise(unique_participants = n_distinct(participant_id)) |>
-  mutate(percent = unique_participants / sum(unique_participants) * 100)
-
-# View the result
-print(race_counts_unique_percent)
-
-
# A tibble: 3 × 3
-  demo_race_final unique_participants percent
-            <int>               <int>   <dbl>
-1               1                   9   8.26 
-2               3                   1   0.917
-3               5                  99  90.8  
-
-
# Count distinct participant_ids by dtc_ever and demo_race_final
-count_distinct_participants <- subset_data |>
-  group_by(demo_race_final, dtc_ever) |>
-  summarise(distinct_participant_count = n_distinct(participant_id), .groups = "drop")
-
-# Print the result
-count_distinct_participants
-
-
# A tibble: 5 × 3
-  demo_race_final dtc_ever distinct_participant_count
-            <int>    <dbl>                      <int>
-1               1        0                          5
-2               1        1                          4
-3               3        0                          1
-4               5        0                         64
-5               5        1                         35
-
-
library(dplyr)
-
-# Step 1: Summarize by unique participant_id
-summarized_data <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    dtc_ever = first(dtc_ever),   # Taking the first observed value of dtc_ever for each participant
-    demo_race_final = first(demo_race_final),  # Taking the first observed value of demo_race_final for each participant
-    .groups = "drop"
-  )
-
-# Step 2: Create the contingency table
-contingency_table <- table(summarized_data$dtc_ever, summarized_data$demo_race_final)
-contingency_table
-
-
   
-     1  3  5
-  0  5  1 64
-  1  4  0 35
-
-
# Step 3: Perform the chi-squared test of independence
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Step 4: Print the result (p = 0.65)
-chisq_test
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table
-X-squared = 0.85903, df = 2, p-value = 0.6508
-
-
#receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') 
-
-# Breakdown of final_receptor_group by unique participant_id
-receptor_status_by_participant <- subset_data |>
-  group_by(participant_id) |>
-  summarise(final_receptor_group = first(final_receptor_group),  # Or choose the most frequent group if needed
-            .groups = "drop")
-
-# View the result
-table(receptor_status_by_participant$final_receptor_group)
-
-

- 1  2  3  4 
-45 52  8  4 
-
-
# Summarizing data by participant_id, final_receptor_group, and dtc_ever
-receptor_dtc_status <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    final_receptor_group = first(final_receptor_group),  # Or the most frequent if needed
-    dtc_ever = first(dtc_ever),  # Taking the first observed value for dtc_ever
-    .groups = "drop"
-  )
-
-# Step 2: Create the contingency table
-contingency_table_receptor <- table(receptor_dtc_status$final_receptor_group, receptor_dtc_status$dtc_ever)
-contingency_table_receptor
-
-
   
-     0  1
-  1 25 20
-  2 37 15
-  3  4  4
-  4  4  0
-
-
# Step 3: Perform the chi-squared test of independence
-chisq_test <- chisq.test(contingency_table_receptor)
-
-
Warning in chisq.test(contingency_table_receptor): Chi-squared approximation
-may be incorrect
-
-
# Step 4: Print the result (p = 0.14); DTC positivity looks more evenly distributed across receptor groups, including TNBC, than ctDNA positivity was
-chisq_test
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table_receptor
-X-squared = 5.4909, df = 3, p-value = 0.1392
-
-
#curious about whether TNBC vs non-TNBC association exists (or HR + vs non-HR positive)
-#start with TNBC (using QDC)
-# inclusion criterion inc_dx_crit_list___1 = TNBC
-
-
-#inc_dx_crit_list___1  
-
-TNBC_dtc_status <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    inc_dx_crit_list___1 = first(inc_dx_crit_list___1),  # Or the most frequent if needed
-    dtc_ever = first(dtc_ever),  # Taking the first observed value for dtc_ever
-    .groups = "drop"
-  )
-
-# Step 2: Create the contingency table
-contingency_table_TNBC <- table(TNBC_dtc_status$inc_dx_crit_list___1, TNBC_dtc_status$dtc_ever)
-contingency_table_TNBC
-
-
   
-     0  1
-  0 45 19
-  1 25 20
-
-
# Step 3: Perform the chi-squared test of independence
-chisq_test <- chisq.test(contingency_table_TNBC)
-
-# Step 4: p-val is 0.17 
-chisq_test
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table_TNBC
-X-squared = 1.903, df = 1, p-value = 0.1677
-
-
# HR+ vs non-HR+
-#first create HR_status variable 
-subset_data <- subset_data |> 
-  mutate(HR_status = case_when(
-    final_receptor_group %in% c(2, 3) ~ "HR+",
-    final_receptor_group %in% c(1, 4) ~ "Non-HR+",
-    TRUE ~ NA_character_  # In case there are missing or other unexpected values
-  ))
-
-# View the new HR_status variable
-table(subset_data$HR_status)
-
-

-    HR+ Non-HR+ 
-    225     173 
-
-
HR_status_by_participant <- subset_data |>
-  group_by(participant_id) |>
-  summarise(HR_status = first(HR_status),  # assumes HR_status is constant within each participant (base R mode() returns the storage mode, not the most frequent value)
-            .groups = "drop")
-
-# View the result 
-table(HR_status_by_participant$HR_status) #aligns with final receptor status (ultimately 60 HR+, 49 HR-)
-
-

-    HR+ Non-HR+ 
-     60      49 
-
-
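Taking `first()` per participant is only safe if the value does not vary across that participant's rows. A quick base R check counts the distinct non-missing values per participant and lists any conflicts; the toy data below are hypothetical, and the same check could be run on `HR_status`, `dtc_ever`, or any other variable collapsed this way.

```r
# Hypothetical long-format data; participant "B" has conflicting HR_status values
toy <- data.frame(
  participant_id = c("A", "A", "B", "B", "C"),
  HR_status      = c("HR+", "HR+", "HR+", "Non-HR+", "Non-HR+")
)

# Number of distinct non-missing values per participant
n_values <- tapply(toy$HR_status, toy$participant_id,
                   function(v) length(unique(na.omit(v))))
print(n_values)

# Participants for whom taking the first row would hide a conflict
conflicting_ids <- names(n_values)[n_values > 1]
print(conflicting_ids)
```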
# Summarize dtc_ever status by HR_status, for each unique participant_id
-summary_data <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    HR_status = first(HR_status),  # Get the HR_status for the participant
-    dtc_status = first(dtc_ever),  # Get the dtc_ever status for the participant
-    .groups = "drop"
-  )
-
-contingency_table_HR <- table(summary_data$dtc_status, summary_data$HR_status)
-contingency_table_HR
-
-
   
-    HR+ Non-HR+
-  0  41      29
-  1  19      20
-
-
chisq_test <- chisq.test(contingency_table_HR)
-
-# Print chi-squared test results (p = 0.43)
-chisq_test
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table_HR
-X-squared = 0.62484, df = 1, p-value = 0.4293
-
-
###tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not reported  
-
-# Exclude rows where final_tumor_grade is 3 (not reported); this drops the 2 participants whose grade was not reported
-summary_data <- subset_data |>
-  filter(final_tumor_grade != 3) |>  # Exclude grade == 3
-  group_by(participant_id) |>
-  summarise(
-    grade = first(final_tumor_grade),  # Get the final_tumor_grade for each participant
-    dtc_ever = first(dtc_ever),    # Get the dtc_ever status for each participant
-    .groups = "drop"
-  )
-
-# Create a contingency table of grade vs dtc_ever
-contingency_table <- table(summary_data$grade, summary_data$dtc_ever)
-
-# View the contingency table
-print(contingency_table)
-
-
   
-     0  1
-  0 46 33
-  1 18  4
-  2  4  2
-
-
# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# View the Chi-squared test result -- p-value 0.12 NOT SIG for DTCs 
-print(chisq_test)
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table
-X-squared = 4.1608, df = 2, p-value = 0.1249
-
-
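Because `final_tumor_grade` codes grade 3 as 0, the codes are not in clinical order, and the chi-squared test above ignores ordering in any case. If one wanted to use the ordering, a test for trend in proportions such as base R's `prop.trend.test()` could be applied after re-ordering the counts by actual grade. This is a swapped-in alternative rather than part of the original analysis; the counts are copied from the printed table.

```r
# DTC-positive counts and totals copied from the table above, re-ordered by clinical grade
# (code 1 = grade 1, code 2 = grade 2, code 0 = grade 3)
dtc_pos <- c(grade1 = 4, grade2 = 2, grade3 = 33)
n_total <- c(grade1 = 18 + 4, grade2 = 4 + 2, grade3 = 46 + 33)

# Proportion DTC-positive within each grade
print(round(dtc_pos / n_total, 2))

# Chi-squared test for trend in proportions across ordered grade scores 1, 2, 3
trend_result <- prop.trend.test(dtc_pos, n_total, score = c(1, 2, 3))
print(trend_result)
```

With only 6 participants in grade 2, any trend estimate here is driven mostly by the grade 1 and grade 3 groups.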
######histology  #people have different combinations of histology (1-15)
-table(subset_data$participant_id, subset_data$final_histology)
-
-
              
-                1 1,13,14,3 1,3 11,3 12,3 13,3 13,3,5 14 14,15 14,15,3 14,3
-  28115-16-001  0         0   0    0    0    0      0  0     0       0    5
-  28115-16-004  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-010  0         0   0    0    1    0      0  0     0       0    0
-  28115-16-014  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-015  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-017  0         0   0    0    0    0      0  3     0       0    0
-  28115-16-020  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-021  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-023  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-025  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-026  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-027  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-029  0         0   0    0    0    0      0  2     0       0    0
-  28115-16-033  0         0   0    0    0    0      0  0     0       0    0
-  28115-16-035  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-001  0         0   8    0    0    0      0  0     0       0    0
-  28115-17-002  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-006  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-008  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-009  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-010  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-011  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-012  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-016  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-017  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-019  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-021  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-022  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-023  0         0   2    0    0    0      0  0     0       0    0
-  28115-17-024  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-025  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-027  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-030  0         0   3    0    0    0      0  0     0       0    0
-  28115-17-031  0         0   0    0    0    0      0  5     0       0    0
-  28115-17-032  0         0   0   10    0    0      0  0     0       0    0
-  28115-17-036  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-039  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-040  0         0   4    0    0    0      0  0     0       0    0
-  28115-17-045  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-046  0         0   0    0    0   10      0  0     0       0    0
-  28115-17-047  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-048  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-050  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-051  0         0   0    0    0    0      0  0     0       0    0
-  28115-17-052  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-001  0         0   0    0    0    0      0  0     7       0    0
-  28115-18-002  0         0   2    0    0    0      0  0     0       0    0
-  28115-18-004  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-006  0         0   0    0    0    0      0  1     0       0    0
-  28115-18-009  0         0   0    0    0    0      0  1     0       0    0
-  28115-18-011  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-014  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-015  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-017  0         0   0    0    0    0      0  1     0       0    0
-  28115-18-020  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-021  0         0   0    0    0    0      0  8     0       0    0
-  28115-18-022  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-023  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-024  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-027  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-028  1         0   0    0    0    0      0  0     0       0    0
-  28115-18-029  0         0   0    0    0    0      0  4     0       0    0
-  28115-18-030  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-031  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-032  0         0   0    0    0    0      0  0     0       0    0
-  28115-18-034  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-001  0         0   0    0    0    0      0  1     0       0    0
-  28115-19-002  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-003  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-004  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-005  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-006  0         0   0    0    0    8      0  0     0       0    0
-  28115-19-007  0         0   0    0    0    0      0  0     0       0    5
-  28115-19-009  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-011  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-012  0         3   0    0    0    0      0  0     0       0    0
-  28115-19-014  0         0   0    0    0    0      0  0     0       0    2
-  28115-19-016  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-017  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-019  0         0   0    0    0    0      0  0     0       0    3
-  28115-19-020  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-021  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-022  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-025  0         0   0    0    0    0      0  0     0       0    0
-  28115-19-028  0         0   0    0    0    0      0  0     0       0    0
-  28115-20-004  0         0   0    0    0    0      0  0     0       0    0
-  28115-20-007  0         0   0    0    0    0      0  0     0       0    0
-  28115-20-009  0         0   0    0    0    0      0  0     0       0    0
-  28115-20-010  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-001  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-002  0         0   0    0    0    0      0  0     0       0    4
-  28115-21-003  0         0   0    0    0    0      0  2     0       0    0
-  28115-21-006  0         0   0    0    0    2      0  0     0       0    0
-  28115-21-007  0         0   0    0    0    0      0  3     0       0    0
-  28115-21-009  0         0   0    0    0    0      3  0     0       0    0
-  28115-21-011  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-013  0         0   0    0    0    0      0  0     0       0    4
-  28115-21-014  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-015  0         0   0    0    0    0      0  0     0       2    0
-  28115-21-016  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-019  0         0   0    0    0    0      0  1     0       0    0
-  28115-21-020  0         0   3    0    0    0      0  0     0       0    0
-  28115-21-021  0         0   0    0    0    0      0  3     0       0    0
-  28115-21-022  0         0   0    0    0    1      0  0     0       0    0
-  28115-21-024  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-025  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-026  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-027  0         0   0    0    0    0      0  0     0       0    0
-  28115-21-028  0         0   0    0    0    0      0  0     0       0    1
-              
-               16,3  3 3,5 3,7  5
-  28115-16-001    0  0   0   0  0
-  28115-16-004    0  1   0   0  0
-  28115-16-010    0  0   0   0  0
-  28115-16-014    0  1   0   0  0
-  28115-16-015    0 12   0   0  0
-  28115-16-017    0  0   0   0  0
-  28115-16-020    0  0   1   0  0
-  28115-16-021    0  9   0   0  0
-  28115-16-023    0  1   0   0  0
-  28115-16-025    0  1   0   0  0
-  28115-16-026    0 10   0   0  0
-  28115-16-027    0  3   0   0  0
-  28115-16-029    0  0   0   0  0
-  28115-16-033    0  2   0   0  0
-  28115-16-035    0  1   0   0  0
-  28115-17-001    0  0   0   0  0
-  28115-17-002    0  9   0   0  0
-  28115-17-006    0  1   0   0  0
-  28115-17-008    0  9   0   0  0
-  28115-17-009    0  1   0   0  0
-  28115-17-010    0  5   0   0  0
-  28115-17-011    0  9   0   0  0
-  28115-17-012    0 10   0   0  0
-  28115-17-016    0  4   0   0  0
-  28115-17-017    0  5   0   0  0
-  28115-17-019    0  9   0   0  0
-  28115-17-021    0  1   0   0  0
-  28115-17-022    0  1   0   0  0
-  28115-17-023    0  0   0   0  0
-  28115-17-024    0  0   0   4  0
-  28115-17-025    0  2   0   0  0
-  28115-17-027    0  8   0   0  0
-  28115-17-030    0  0   0   0  0
-  28115-17-031    0  0   0   0  0
-  28115-17-032    0  0   0   0  0
-  28115-17-036    0  7   0   0  0
-  28115-17-039    0  2   0   0  0
-  28115-17-040    0  0   0   0  0
-  28115-17-045    0  0   1   0  0
-  28115-17-046    0  0   0   0  0
-  28115-17-047    0  3   0   0  0
-  28115-17-048    0  2   0   0  0
-  28115-17-050    0  3   0   0  0
-  28115-17-051    0  9   0   0  0
-  28115-17-052    0  0   0   0  3
-  28115-18-001    0  0   0   0  0
-  28115-18-002    0  0   0   0  0
-  28115-18-004    0  2   0   0  0
-  28115-18-006    0  0   0   0  0
-  28115-18-009    0  0   0   0  0
-  28115-18-011    0  5   0   0  0
-  28115-18-014    0  2   0   0  0
-  28115-18-015    0  5   0   0  0
-  28115-18-017    0  0   0   0  0
-  28115-18-020    0  8   0   0  0
-  28115-18-021    0  0   0   0  0
-  28115-18-022    0  0   0  12  0
-  28115-18-023    0  3   0   0  0
-  28115-18-024    0  0   0   2  0
-  28115-18-027    0  1   0   0  0
-  28115-18-028    0  0   0   0  0
-  28115-18-029    0  0   0   0  0
-  28115-18-030    0  2   0   0  0
-  28115-18-031    0  3   0   0  0
-  28115-18-032    0  6   0   0  0
-  28115-18-034    0  1   0   0  0
-  28115-19-001    0  0   0   0  0
-  28115-19-002    0  2   0   0  0
-  28115-19-003    0  5   0   0  0
-  28115-19-004    0  1   0   0  0
-  28115-19-005    0  3   0   0  0
-  28115-19-006    0  0   0   0  0
-  28115-19-007    0  0   0   0  0
-  28115-19-009    0  6   0   0  0
-  28115-19-011    0  1   0   0  0
-  28115-19-012    0  0   0   0  0
-  28115-19-014    0  0   0   0  0
-  28115-19-016    0  2   0   0  0
-  28115-19-017    0  2   0   0  0
-  28115-19-019    0  0   0   0  0
-  28115-19-020    0  2   0   0  0
-  28115-19-021    0  4   0   0  0
-  28115-19-022    0  2   0   0  0
-  28115-19-025    0  6   0   0  0
-  28115-19-028    0  2   0   0  0
-  28115-20-004    0  2   0   0  0
-  28115-20-007    0  2   0   0  0
-  28115-20-009    0  4   0   0  0
-  28115-20-010    0  1   0   0  0
-  28115-21-001    0  1   0   0  0
-  28115-21-002    0  0   0   0  0
-  28115-21-003    0  0   0   0  0
-  28115-21-006    0  0   0   0  0
-  28115-21-007    0  0   0   0  0
-  28115-21-009    0  0   0   0  0
-  28115-21-011    0  1   0   0  0
-  28115-21-013    0  0   0   0  0
-  28115-21-014    0  2   0   0  0
-  28115-21-015    0  0   0   0  0
-  28115-21-016    0  8   0   0  0
-  28115-21-019    0  0   0   0  0
-  28115-21-020    0  0   0   0  0
-  28115-21-021    0  0   0   0  0
-  28115-21-022    0  0   0   0  0
-  28115-21-024    0  2   0   0  0
-  28115-21-025    0  2   0   0  0
-  28115-21-026    2  0   0   0  0
-  28115-21-027    0  2   0   0  0
-  28115-21-028    0  0   0   0  0
-
-
histology_summary <- subset_data |>
-  distinct(participant_id, final_histology) |>  # Get unique participant-histology combinations
-  group_by(final_histology) |>  # Group by histology type
-  summarise(count = n())  # Count the number of participants per histology type
-
-# View the summary table
-print(histology_summary) # The combinations add up to 109 participants; 9 have both ductal (code 3) and lobular (code 14) histology
-
-
# A tibble: 16 × 2
-   final_histology count
-   <chr>           <int>
- 1 1                   1
- 2 1,13,14,3           1
- 3 1,3                 6
- 4 11,3                1
- 5 12,3                1
- 6 13,3                4
- 7 13,3,5              1
- 8 14                 13
- 9 14,15               1
-10 14,15,3             1
-11 14,3                7
-12 16,3                1
-13 3                  65
-14 3,5                 2
-15 3,7                 3
-16 5                   1
-
-
#Create Ductal, Lobular, Both, or Other categories (code 3 = ductal, 14 = lobular), matching whole comma-separated codes so that e.g. "13" does not count as ductal
-subset_data <- subset_data |>
-  mutate(histology_category = case_when(
-    grepl("(^|,)3(,|$)", final_histology) & grepl("(^|,)14(,|$)", final_histology) ~ "Both Ductal and Lobular",  # Codes 3 and 14 both present
-    grepl("(^|,)3(,|$)", final_histology) ~ "Ductal",  # Code 3 present
-    grepl("(^|,)14(,|$)", final_histology) ~ "Lobular",  # Code 14 present
-    TRUE ~ "Other"  # Any other combination
-  ))
-
-# Count the number of participants in each histology category
-histology_counts <- subset_data |>
-  group_by(histology_category) |>
-  summarise(count = n_distinct(participant_id))  # Count distinct participants
-
-# View the counts -- adds up to 109! 
-print(histology_counts)
-
-
# A tibble: 4 × 2
-  histology_category      count
-  <chr>                   <int>
-1 Both Ductal and Lobular     9
-2 Ductal                     84
-3 Lobular                    14
-4 Other                       2
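A plain substring match such as `grepl("3", x)` also matches multi-digit codes like `"13"`, so a stand-alone code 13 would be counted as ductal. In this cohort every histology string that contains a "3" also contains code 3 itself (and likewise for 14), so the counts above are unaffected, but matching whole comma-separated codes is safer. A minimal base-R illustration, using made-up example codes:

```r
# Made-up histology strings in the same comma-separated format as final_histology
codes <- c("3", "13", "14,3", "1,13,14,3", "14,15", "5")

# Substring match: "3" is found inside "13", so a lone "13" counts as a hit
substring_hit <- grepl("3", codes)

# Whole-code match: "3" must be bounded by the string start/end or a comma
token_hit <- grepl("(^|,)3(,|$)", codes)

print(data.frame(codes, substring_hit, token_hit))
```

The same pattern with `14` in place of `3` identifies lobular codes.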
-
-
#contingency table 
-library(tidyr)
-contingency_table <- subset_data |>
-  distinct(participant_id, histology_category, dtc_ever) |>  # Ensure each patient is counted once
-  count(histology_category, dtc_ever) |>
-  pivot_wider(names_from = dtc_ever, values_from = n, values_fill = list(n = 0))  # Pivot the table to get dtc_ever as columns
-
-# 3. Perform the Chi-squared test of independence
-chisq_test <- chisq.test(contingency_table[,-1])  # Remove the histology_category column for the test
-
-
Warning in chisq.test(contingency_table[, -1]): Chi-squared approximation may
-be incorrect
-
-
# 4. Print the contingency table
-print(contingency_table) 
-
-
# A tibble: 4 × 3
-  histology_category        `0`   `1`
-  <chr>                   <int> <int>
-1 Both Ductal and Lobular     9     0
-2 Ductal                     48    36
-3 Lobular                    11     3
-4 Other                       2     0
-
-
# 5. Print the Chi-squared test result: p = 0.027; DTC positivity is more common in ductal tumors than in the other histology groups
-print(chisq_test)
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table[, -1]
-X-squared = 9.2145, df = 3, p-value = 0.02657
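The warning appears because three expected counts fall below 5 (the "Other" row has only 2 participants and the "Both" row only 9). One common alternative, not what was run above, is Fisher's exact test, which does not rely on the large-sample approximation. A sketch using the counts printed in the contingency table:

```r
# Counts copied from the histology_category x dtc_ever table above
histology_tab <- matrix(
  c(9, 48, 11, 2,   # dtc_ever = 0
    0, 36,  3, 0),  # dtc_ever = 1
  ncol = 2,
  dimnames = list(
    histology_category = c("Both Ductal and Lobular", "Ductal", "Lobular", "Other"),
    dtc_ever = c("0", "1")
  )
)

# Expected counts under independence: row total x column total / N
expected <- outer(rowSums(histology_tab), colSums(histology_tab)) / sum(histology_tab)
print(round(expected, 2))

# Fisher's exact test on the same r x 2 table
histology_fisher <- fisher.test(histology_tab)
print(histology_fisher)
```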
-
-
#### Stage -- N stage  --> come back to this N stage stuff 
-
-table(subset_data$participant_id, subset_data$final_n_stage) #we have 0, 1, 2, 3 (N0, N1, N2, N3)
-
-
              
-                0  1  2  3
-  28115-16-001  0  0  0  5
-  28115-16-004  1  0  0  0
-  28115-16-010  0  0  0  1
-  28115-16-014  1  0  0  0
-  28115-16-015 12  0  0  0
-  28115-16-017  0  0  3  0
-  28115-16-020  0  0  1  0
-  28115-16-021  0  0  9  0
-  28115-16-023  1  0  0  0
-  28115-16-025  1  0  0  0
-  28115-16-026 10  0  0  0
-  28115-16-027  0  3  0  0
-  28115-16-029  2  0  0  0
-  28115-16-033  0  2  0  0
-  28115-16-035  1  0  0  0
-  28115-17-001  0  8  0  0
-  28115-17-002  9  0  0  0
-  28115-17-006  0  1  0  0
-  28115-17-008  9  0  0  0
-  28115-17-009  1  0  0  0
-  28115-17-010  5  0  0  0
-  28115-17-011  0  0  0  9
-  28115-17-012  0  0  0 10
-  28115-17-016  0  4  0  0
-  28115-17-017  0  5  0  0
-  28115-17-019  9  0  0  0
-  28115-17-021  1  0  0  0
-  28115-17-022  1  0  0  0
-  28115-17-023  0  0  2  0
-  28115-17-024  4  0  0  0
-  28115-17-025  2  0  0  0
-  28115-17-027  0  8  0  0
-  28115-17-030  3  0  0  0
-  28115-17-031  5  0  0  0
-  28115-17-032  0  0 10  0
-  28115-17-036  7  0  0  0
-  28115-17-039  2  0  0  0
-  28115-17-040  0  0  4  0
-  28115-17-045  1  0  0  0
-  28115-17-046 10  0  0  0
-  28115-17-047  0  3  0  0
-  28115-17-048  0  0  2  0
-  28115-17-050  3  0  0  0
-  28115-17-051  9  0  0  0
-  28115-17-052  3  0  0  0
-  28115-18-001  0  0  7  0
-  28115-18-002  0  2  0  0
-  28115-18-004  0  0  2  0
-  28115-18-006  0  1  0  0
-  28115-18-009  1  0  0  0
-  28115-18-011  0  5  0  0
-  28115-18-014  0  2  0  0
-  28115-18-015  5  0  0  0
-  28115-18-017  0  1  0  0
-  28115-18-020  8  0  0  0
-  28115-18-021  0  8  0  0
-  28115-18-022 12  0  0  0
-  28115-18-023  0  3  0  0
-  28115-18-024  0  2  0  0
-  28115-18-027  0  1  0  0
-  28115-18-028  1  0  0  0
-  28115-18-029  0  4  0  0
-  28115-18-030  2  0  0  0
-  28115-18-031  0  3  0  0
-  28115-18-032  0  6  0  0
-  28115-18-034  1  0  0  0
-  28115-19-001  0  0  0  1
-  28115-19-002  0  2  0  0
-  28115-19-003  0  5  0  0
-  28115-19-004  0  1  0  0
-  28115-19-005  3  0  0  0
-  28115-19-006  0  8  0  0
-  28115-19-007  0  5  0  0
-  28115-19-009  0  0  0  6
-  28115-19-011  0  1  0  0
-  28115-19-012  0  3  0  0
-  28115-19-014  0  0  0  2
-  28115-19-016  2  0  0  0
-  28115-19-017  2  0  0  0
-  28115-19-019  0  3  0  0
-  28115-19-020  2  0  0  0
-  28115-19-021  0  4  0  0
-  28115-19-022  0  2  0  0
-  28115-19-025  0  6  0  0
-  28115-19-028  2  0  0  0
-  28115-20-004  2  0  0  0
-  28115-20-007  0  0  2  0
-  28115-20-009  4  0  0  0
-  28115-20-010  0  1  0  0
-  28115-21-001  0  1  0  0
-  28115-21-002  0  4  0  0
-  28115-21-003  0  0  2  0
-  28115-21-006  0  2  0  0
-  28115-21-007  0  0  3  0
-  28115-21-009  0  0  3  0
-  28115-21-011  1  0  0  0
-  28115-21-013  0  4  0  0
-  28115-21-014  0  2  0  0
-  28115-21-015  0  2  0  0
-  28115-21-016  8  0  0  0
-  28115-21-019  0  1  0  0
-  28115-21-020  0  3  0  0
-  28115-21-021  0  3  0  0
-  28115-21-022  1  0  0  0
-  28115-21-024  2  0  0  0
-  28115-21-025  0  2  0  0
-  28115-21-026  0  2  0  0
-  28115-21-027  2  0  0  0
-  28115-21-028  1  0  0  0
-
-
nodal_summary <- subset_data |>
-  distinct(participant_id, final_n_stage) |>  # Get unique participant-stage combinations
-  group_by(final_n_stage) |>  # Group by stage
-  summarise(count = n())  # Count the number of participants per nodal stage
-
-# View the summary table: adds up to 109; 46 are N0 and 63 are node positive (43 N1, 13 N2, 7 N3)
-print(nodal_summary)
-
-
# A tibble: 4 × 2
-  final_n_stage count
-          <int> <int>
-1             0    46
-2             1    43
-3             2    13
-4             3     7
-
-
subset_data_by_id <- subset_data |>
-  filter(final_n_stage %in% c(0, 1, 2, 3)) |>  # Include only relevant nodal stages
-  group_by(participant_id) |>
-  summarise(
-    nodal_status = first(final_n_stage),  # Use final_n_stage as nodal_status for each participant
-    dtc_ever = first(dtc_ever),       # Get dtc_ever status for each participant
-    .groups = "drop"
-  )
-
-# Step 3: Create a contingency table of nodal_status vs dtc_ever
-contingency_table <- table(subset_data_by_id$nodal_status, subset_data_by_id$dtc_ever)
-
-# Step 4: Check if any cells in the contingency table have zero counts, which could affect test validity
-print(contingency_table)
-
-
   
-     0  1
-  0 24 22
-  1 32 11
-  2 10  3
-  3  4  3
-
-
# Step 5: Perform Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Step 6: Print the Chi-squared test result: p = 0.12 (not significant)
-print(chisq_test) 
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table
-X-squared = 5.9169, df = 3, p-value = 0.1157
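Treating N stage as four unordered categories (df = 3) ignores its ordering. One option, offered as a sketch rather than the analysis used here, is a chi-squared test for trend in proportions with base R's `prop.trend.test()`; the equally spaced scores 0 to 3 are an assumption. The counts come from the nodal_status by dtc_ever table above:

```r
# DTC-positive counts and totals per N stage, copied from the table above
dtc_pos <- c(N0 = 22, N1 = 11, N2 = 3, N3 = 3)
n_total <- c(N0 = 46, N1 = 43, N2 = 13, N3 = 7)

# Observed proportion of DTC-positive participants in each stage
print(round(dtc_pos / n_total, 2))

# Chi-squared test for a linear trend in proportions across ordered stages
trend <- prop.trend.test(dtc_pos, n_total, score = 0:3)
print(trend)
```

Note that the observed proportions are not monotone across stages, so a trend test and the 4 x 2 test above answer different questions.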
-
-
#### Create a node-negative vs node-positive variable from the summary N stage variable
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    node_status = ifelse(first(final_n_stage) == 0, "Node Negative", "Node Positive"),  # Node negative if 0, positive otherwise
-    dtc_ever = first(dtc_ever),  # Get dtc_ever status for each participant
-    .groups = "drop"
-  )
-
-# Step 2: Create a contingency table of node_status vs dtc_ever
-contingency_table <- table(subset_data_by_id$node_status, subset_data_by_id$dtc_ever)
-
-# Step 3: Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-# Step 4: Print the contingency table and Chi-squared test results
-print(contingency_table)
-
-
               
-                 0  1
-  Node Negative 24 22
-  Node Positive 46 17
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 4.1601, df = 1, p-value = 0.04139
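The p-value says that DTC positivity differs by node status, but not by how much or in which direction. From the table, 22/46 ≈ 0.48 of node-negative participants were DTC positive versus 17/63 ≈ 0.27 of node-positive participants, a sample odds ratio of (17/46)/(22/24) ≈ 0.40. The sketch below reproduces that arithmetic; the Wald confidence interval on the log-odds scale is an added assumption, not part of the original analysis.

```r
# Counts from the node_status x dtc_ever table above
neg_counts <- c(dtc_neg = 24, dtc_pos = 22)  # Node Negative
pos_counts <- c(dtc_neg = 46, dtc_pos = 17)  # Node Positive

# Proportion of DTC-positive participants in each group
p_neg <- neg_counts[["dtc_pos"]] / sum(neg_counts)  # 22 / 46
p_pos <- pos_counts[["dtc_pos"]] / sum(pos_counts)  # 17 / 63

# Sample odds ratio of DTC positivity, node positive vs node negative
odds_ratio <- (pos_counts[["dtc_pos"]] / pos_counts[["dtc_neg"]]) /
  (neg_counts[["dtc_pos"]] / neg_counts[["dtc_neg"]])

# Wald 95% CI: SE of log(OR) is sqrt of the sum of reciprocal cell counts
se_log_or <- sqrt(sum(1 / c(neg_counts, pos_counts)))
ci_95 <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

print(round(c(p_neg = p_neg, p_pos = p_pos, odds_ratio = odds_ratio), 3))
print(round(ci_95, 2))
```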
-
-
####### EXTRA CODE / CONFIRMATION: results differ slightly, so ignore for our analysis
-# Cross-check with the indicator inc_dx_crit_list___2, which reflects nodal positivity: 1 participant is node negative by the summary variable but node positive by the indicator
-## Should double-check this participant at some point
-node_pos <- subset_data |>
-  distinct(participant_id, inc_dx_crit_list___2) |>  # Get unique participant-indicator combinations
-  group_by(inc_dx_crit_list___2) |>  # Group by indicator value
-  summarise(count = n())  # Count the number of participants per indicator value
-
-print(node_pos)
-
-
# A tibble: 2 × 2
-  inc_dx_crit_list___2 count
-                 <int> <int>
-1                    0    45
-2                    1    64
-
-
contingency_table <- subset_data |>
-  distinct(participant_id, inc_dx_crit_list___2, dtc_ever) |>  # Ensure unique participants
-  count(inc_dx_crit_list___2, dtc_ever) |>  # Count occurrences
-  spread(key = dtc_ever, value = n, fill = 0)  # Spread data into a matrix
-
-# View the contingency table
-print(contingency_table)
-
-
# A tibble: 2 × 3
-  inc_dx_crit_list___2   `0`   `1`
-                 <int> <dbl> <dbl>
-1                    0    25    20
-2                    1    45    19
-
-
# Perform the Chi-squared test: p = 0.17
-chi_square_result <- chisq.test(contingency_table[, -1])  # Exclude the first column with the levels
-print(chi_square_result)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table[, -1]
-X-squared = 1.903, df = 1, p-value = 0.1677
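To follow up the single participant whose node status disagrees between the two sources, a small base-R sketch can list the discordant IDs. The helper name `find_node_discordance()` is hypothetical, and it assumes `inc_dx_crit_list___2 == 1` means node positive (consistent with the 64 indicator-positive vs 63 summary-positive participants above). With the real data it would be called as `find_node_discordance(subset_data)`; here it runs on made-up toy rows:

```r
# Hypothetical helper: participants whose node status differs between the summary
# variable (final_n_stage != 0 means node positive) and the indicator
# inc_dx_crit_list___2 (ASSUMED: 1 = node positive, 0 = node negative)
find_node_discordance <- function(df) {
  per_id <- unique(df[, c("participant_id", "final_n_stage", "inc_dx_crit_list___2")])
  summary_pos   <- per_id$final_n_stage != 0
  indicator_pos <- per_id$inc_dx_crit_list___2 == 1
  # which() drops comparisons that are NA because of missing values
  per_id[which(summary_pos != indicator_pos), , drop = FALSE]
}

# Made-up toy data: "B" is node negative by stage but positive by the indicator
toy <- data.frame(
  participant_id       = c("A", "A", "B", "C"),
  final_n_stage        = c(1, 1, 0, 0),
  inc_dx_crit_list___2 = c(1, 1, 1, 0)
)
print(find_node_discordance(toy))
```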
-
-
#######t stage final_t_stage 
-
-table(subset_data$final_t_stage) #we have values in 1, 2, 3, 4, and 99 (99 = pTx) so can proceed with this 
-
-

-  1   2   3   4  99 
-173 168  46  10   1 
-
-
t_summary <- subset_data |>
-  distinct(participant_id, final_t_stage) |>  # Get unique participant-stage combinations
-  group_by(final_t_stage) |>  # Group by T stage
-  summarise(count = n())  # Count the number of participants per T stage
-
-# View the summary table: adds up to 109; most are T1 (51) or T2 (44), with 12 T3, 1 T4, and 1 Tx (99)
-print(t_summary)
-
-
# A tibble: 5 × 2
-  final_t_stage count
-          <int> <int>
-1             1    51
-2             2    44
-3             3    12
-4             4     1
-5            99     1
-
-
#### T stage, for our T stage table, will use T1 vs T2 or greater to simplify 
-#exclude 99 (the pTx) 
-subset_data_clean <- subset_data |>
-  filter(final_t_stage != 99, dtc_ever != 99)
-
-# Combine final_t_stage into T1 vs. T2 or greater
-subset_data_clean <- subset_data_clean |>
-  mutate(final_t_stage_combined = ifelse(final_t_stage == 1, "T1", "T2 or greater"))
-
-# Summarize the data by participant_id after creating the new combined t_stage
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    final_t_stage_combined = first(final_t_stage_combined),  # Get the combined t status for each participant
-    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
-  )
-
-# Create a contingency table of final_t_stage_combined vs dtc_ever
-contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-# Print the contingency table and Chi-squared test results
-print(contingency_table)
-
-
               
-                 0  1
-  T1            34 17
-  T2 or greater 35 22
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.13531, df = 1, p-value = 0.713
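The same steps (keep one row per participant, cross-tabulate against `dtc_ever`, run a test) repeat for every variable in this section. A hypothetical helper, `dtc_association()`, could wrap them. It keeps each participant's first row, mirroring `first()` in the `summarise()` calls, and, as an assumed decision rule that differs from the mostly chi-squared tests used here, switches to Fisher's exact test when any expected count is below 5. Filtering (for example dropping `99` codes) is left to the caller so the data passed in is explicit, e.g. `dtc_association(subset_data_clean, "final_t_stage_combined")` after the T-stage recoding above.

```r
# Hypothetical helper: one row per participant (first occurrence), a var x outcome
# table, then Fisher's exact test if any expected count < 5, else Pearson's chi-squared
dtc_association <- function(df, var, id = "participant_id", outcome = "dtc_ever") {
  per_id <- df[!duplicated(df[[id]]), c(id, var, outcome)]
  tab <- table(per_id[[var]], per_id[[outcome]], dnn = c(var, outcome))
  expected <- suppressWarnings(chisq.test(tab))$expected
  test <- if (any(expected < 5)) fisher.test(tab) else chisq.test(tab)
  list(table = tab, test = test)
}

# Made-up toy data: participant p1 has two rows, so 6 participants in total
toy <- data.frame(
  participant_id = c("p1", "p1", "p2", "p3", "p4", "p5", "p6"),
  t_group        = c("T1", "T1", "T1", "T2+", "T2+", "T1", "T2+"),
  dtc_ever       = c(1, 1, 0, 1, 0, 0, 1)
)
res <- dtc_association(toy, "t_group")
print(res$table)
print(res$test)
```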
-
-
#### Try T stage stats using T3 or greater as the cutoff: not informative, so DO NOT USE THIS FOR THE TABLE
-
-#exclude 99 (the pTx) 
-subset_data_clean <- subset_data |>
-  filter(final_t_stage != 99, dtc_ever != 99)
-
-# Combine final_t_stage into T1/T2 or T3 or greater
-subset_data_clean <- subset_data_clean |>
-  mutate(final_t_stage_combined = case_when(
-    final_t_stage == 1 | final_t_stage == 2 ~ "T1 or T2",  # Group T1 and T2 together
-    final_t_stage >= 3 ~ "T3 or greater",  # Group T3 and higher as a separate category
-    TRUE ~ NA_character_  # Handle any unexpected values
-  ))
-
-
-# Summarize the data by participant_id after creating the new combined t_stage
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    final_t_stage_combined = first(final_t_stage_combined),  # Get the combined t status for each participant
-    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
-  )
-
-# Create a contingency table of final_t_stage_combined vs dtc_ever
-contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results --> not significant so ignore this 
-print(contingency_table)
-
-
               
-                 0  1
-  T1 or T2      61 34
-  T3 or greater  8  5
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 1.4397e-31, df = 1, p-value = 1
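The near-zero X-squared and p = 1 are not a computational error. In a 2 x 2 table every cell differs from its expected count by the same amount, |ad - bc|/N = |61·5 - 34·8|/108 = 33/108 ≈ 0.31, and R's `chisq.test()` caps the Yates correction at min(0.5, |O - E|), so the corrected statistic collapses to zero (as the printed X-squared of 1.4397e-31 reflects). A sketch reproducing that arithmetic from the printed counts:

```r
# Counts from the T1/T2 vs T3-or-greater table above (rows) by dtc_ever (columns)
obs <- matrix(c(61, 8, 34, 5), nrow = 2,
              dimnames = list(c("T1 or T2", "T3 or greater"), c("0", "1")))

# Expected counts under independence
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# In a 2 x 2 table all cells share the same |O - E| = |ad - bc| / N
deviation <- abs(obs - expected)

# Continuity correction capped at min(0.5, |O - E|), as chisq.test() does
yates <- min(0.5, deviation)
statistic <- sum((deviation - yates)^2 / expected)

print(round(deviation, 3))  # every cell is about 0.306, below 0.5
print(statistic)            # essentially 0; only floating-point residue remains
```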
-
-
########stage of disease -- final_overall_stage 
-
-table(subset_data$final_overall_stage) #we have values 1, 2, 3, and 99 (no stage 4), so can proceed with this 
-
-

-  1   2   3  99 
-124 167 105   2 
-
-
stage_summary <- subset_data |>
-  distinct(participant_id, final_overall_stage) |>  # Get unique participant-stage combinations
-  group_by(final_overall_stage) |>  # Group by overall stage
-  summarise(count = n())  # Count the number of participants per overall stage
-
-# View the summary table: adds up to 109 (1 coded 99); most are stage II (47), with 35 stage I, 26 stage III, and no stage IV (yay)
-print(stage_summary)
-
-
# A tibble: 4 × 2
-  final_overall_stage count
-                <int> <int>
-1                   1    35
-2                   2    47
-3                   3    26
-4                  99     1
-
-
#exclude the 99 
-subset_data_clean <- subset_data |>
-  filter(final_overall_stage != 99, dtc_ever != 99)
-
-# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    final_overall_stage = first(final_overall_stage),  # Get the final_overall_stage for each participant
-    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
-  )
-
-# Create a contingency table of final_overall_stage vs dtc_ever
-contingency_table <- table(subset_data_by_id$final_overall_stage, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-# Print the contingency table and Chi-squared test results: overall stage does not appear to predict DTC positivity (p = 0.80)
-print(contingency_table)
-
-
   
-     0  1
-  1 22 13
-  2 29 18
-  3 18  8
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table
-X-squared = 0.43515, df = 2, p-value = 0.8045
-
-
###########surgical treatment diag_surgery_type_1 (or diag_surgery_type_2) 
-
-
-table(subset_data$diag_surgery_type_1) #1= partial mastectomy/lumpectomy, 2 = mastectomy. no missingness 
-
-

-  1   2 
-158 240 
-
-
surgery <- subset_data |>
-  distinct(participant_id, diag_surgery_type_1) |>  # Get unique participant-surgery combinations
-  group_by(diag_surgery_type_1) |>  # Group by surgery type
-  summarise(count = n())  # Count the number of participants per surgery type
-
-# View the summary table
-print(surgery)
-
-
# A tibble: 2 × 2
-  diag_surgery_type_1 count
-                <int> <int>
-1                   1    45
-2                   2    64
-
-
# Summarize the data by participant_id
-# Note: subset_data_clean still excludes the participant with final_overall_stage == 99, so this table covers 108 participants
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    surgery = first(diag_surgery_type_1),  # Get the surgery type for each participant
-    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
-  )
-
-# Create a contingency table of surgery type vs dtc_ever
-contingency_table <- table(subset_data_by_id$surgery, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-# Print the contingency table and Chi-squared test results --> p-val = 0.48....
-print(contingency_table)
-
-
   
-     0  1
-  1 31 14
-  2 38 25
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.50569, df = 1, p-value = 0.477
-
-
######## axillary management. Really just care about whether or not someone got an axillary dissection. diag_axillary_type___2_1 (or diag_axillary_type___2_2 IF 2 biopsy forms)
-
-table(subset_data$diag_axillary_type___2_1) 
-
-

-  0   1 
-215 183 
-
-
table(subset_data$diag_axillary_type___2_2) #a handful of axillary dissections (4) are recorded on the second biopsy form, so we combine the two variables
-
-

- 0  1 
-16  4 
-
-
# Create a binary axillary_dissection flag: 1 if either biopsy form records an axillary dissection, 0 otherwise
-# case_when() treats NA conditions as not matched, so a missing second biopsy form falls through to 0
-subset_data_clean <- subset_data |>
-  mutate(axillary_dissection = case_when(
-    diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1 ~ 1,  # Had axillary dissection
-    TRUE ~ 0  # No axillary dissection (includes missing values)
-  ))
-
-# Summarize the data by participant_id, including the axillary_dissection and dtc_ever variables
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    axillary_dissection = first(axillary_dissection),  # Get the axillary dissection status for each participant
-    dtc_ever = first(dtc_ever)  # Get the dtc_ever status for each participant
-  )
-
-contingency_table <- table(subset_data_by_id$axillary_dissection, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test and Fisher's exact test
-chisq_test <- chisq.test(contingency_table)
-fishers <- fisher.test(contingency_table)
-print(fishers)
-
-

-    Fisher's Exact Test for Count Data
-
-data:  contingency_table
-p-value = 0.1649
-alternative hypothesis: true odds ratio is not equal to 1
-95 percent confidence interval:
- 0.2309022 1.3129062
-sample estimates:
-odds ratio 
- 0.5559943 
-
-
# Print the contingency table and Chi-squared test results: p = 0.20 (chi-squared, reported for consistency; Fisher's exact p = 0.16)
-print(contingency_table)
-
-
   
-     0  1
-  0 31 23
-  1 39 16
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 1.614, df = 1, p-value = 0.2039
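Combining the two biopsy-form flags with `==` and `|` yields `NA` when the first form is 0 and the second is missing (`FALSE | NA` is `NA`), whereas `case_when()` treats an `NA` condition as unmatched, so those rows fall through to `TRUE ~ 0`. A base-R equivalent uses `%in%`, which never returns `NA`. Illustrated with made-up values:

```r
# Made-up values: the second biopsy form is usually absent (NA)
form1 <- c(1, 0, 0, NA)
form2 <- c(NA, NA, 1, NA)

# `==` propagates NA: TRUE | NA is TRUE, but FALSE | NA and NA | NA are NA
flag_ifelse <- ifelse(form1 == 1 | form2 == 1, 1, 0)

# %in% returns FALSE for NA, so a missing form counts as "no dissection"
flag_in <- ifelse(form1 %in% 1 | form2 %in% 1, 1, 0)

print(flag_ifelse)
print(flag_in)
```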
-
-
####inflammatory inflamm_yn -- IGNORE THIS for Table 1 
-table(d$inflamm_yn_1) ### I think the inflammatory patients are just not in the subset (6 "yes" in the overall cohort, but none in the subset data for either inflamm variable)
-
-

-  0   1 
-568  11 
-
-
table(d$inflamm_yn_2)  ### I think inflammatory patients are just not in the DTC cohort subset
-
-

- 0 
-24 
-
-
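table(subset_data$inflamm_yn) # subset_data has no `inflamm_yn` column (see warning); the cohort-level variables are inflamm_yn_1 and inflamm_yn_2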
-
-
Warning: Unknown or uninitialised column: `inflamm_yn`.
-
-
-
< table of extent 0 >
-
-
#### radiation prtx_radiation 
-table(subset_data$prtx_radiation) 
-
-

-  0   1 
-116 282 
-
-
radiation <- subset_data |> 
-  distinct(participant_id,prtx_radiation) |> 
-  group_by(prtx_radiation) |>  # Group by radiation status
-  summarise(count = n())  # Count the number of participants per radiation status
-
-# View the summary table
-print(radiation)
-
-
# A tibble: 2 × 2
-  prtx_radiation count
-           <int> <int>
-1              0    34
-2              1    75
-
-
# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    radiation = first(prtx_radiation),  # Get the radiation status for each participant
-    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
-  )
-
-# Create a contingency table of radiation vs dtc_ever
-contingency_table <- table(subset_data_by_id$radiation, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-fishers <- fisher.test(contingency_table)
-print(fishers)
-
-

-    Fisher's Exact Test for Count Data
-
-data:  contingency_table
-p-value = 0.6709
-alternative hypothesis: true odds ratio is not equal to 1
-95 percent confidence interval:
- 0.4916844 3.2745694
-sample estimates:
-odds ratio 
-  1.243166 
-
-
# Print the contingency table and Chi-squared test results --> p-val  = 0.77 
-print(contingency_table)
-
-
   
-     0  1
-  0 23 11
-  1 47 28
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.0823, df = 1, p-value = 0.7742
-
-
#### chemotherapy prtx_chemo 
-table(subset_data$prtx_chemo) 
-
-

-  0   1 
- 18 380 
-
-
chemo <- subset_data |> 
-  distinct(participant_id,prtx_chemo) |> 
-  group_by(prtx_chemo) |>  # Group by chemotherapy status
-  summarise(count = n())  # Count the number of participants per chemotherapy status
-
-# View the summary table
-print(chemo) #3 people did not get chemo in this cohort 
-
-
# A tibble: 2 × 2
-  prtx_chemo count
-       <int> <int>
-1          0     3
-2          1   106
-
-
# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    chemo = first(prtx_chemo),  # Get the final_overall_stage for each participant
-    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
-  )
-
-# Create a contingency table of chemo vs dtc_ever
-contingency_table <- table(subset_data_by_id$chemo, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
fishers <- fisher.test(contingency_table)
-print(fishers)
-
-

-    Fisher's Exact Test for Count Data
-
-data:  contingency_table
-p-value = 0.2906
-alternative hypothesis: true odds ratio is not equal to 1
-95 percent confidence interval:
- 0.00448755 5.37725419
-sample estimates:
-odds ratio 
- 0.2715663 
-
-
# Print the contingency table and Chi-squared test results --> p-val  = 0.60 
-print(contingency_table)
-
-
   
-     0  1
-  0  1  2
-  1 69 37
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.27148, df = 1, p-value = 0.6023
-
-
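The warning `Chi-squared approximation may be incorrect` appears because `chisq.test()` warns whenever any expected cell count is below 5; in the chemotherapy table this happens in the row for the 3 participants who did not receive chemotherapy. A minimal sketch that checks the expected counts directly is below; the matrix is typed in from the contingency table printed above. When this warning appears, the Fisher's exact test result is generally the safer one to report.

```r
# Chemotherapy (rows: prtx_chemo 0/1) by DTC status (columns: dtc_ever 0/1),
# typed in from the contingency table printed above; matrix() fills by column
chemo_tab <- matrix(c(1, 69, 2, 37), nrow = 2,
                    dimnames = list(prtx_chemo = c("0", "1"),
                                    dtc_ever   = c("0", "1")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(chemo_tab), colSums(chemo_tab)) / sum(chemo_tab)
round(expected, 2)

# chisq.test() warns when any expected count is below 5
any(expected < 5)
```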
####neoadjuvant chemo diag_neoadj_chemo_1 or diag_neoadj_chemo_2 
-
-table(subset_data$diag_neoadj_chemo_1) 
-
-

-  0   1 
-327  71 
-
-
table(subset_data$diag_neoadj_chemo_2) # no one was recorded on the second biopsy form as receiving neoadjuvant therapy, so only the first variable is needed
-
-

- 0 
-20 
-
-
nact <- subset_data |> 
-  distinct(participant_id,diag_neoadj_chemo_1) |> 
-  group_by(diag_neoadj_chemo_1) |>  # Group by NACT status
-  summarise(count = n())  # Count the number of participants per NACT status
-
-# View the summary table
-print(nact) # 19 people received neoadjuvant chemotherapy in this cohort
-
-
# A tibble: 2 × 2
-  diag_neoadj_chemo_1 count
-                <int> <int>
-1                   0    90
-2                   1    19
-
-
# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    nact = first(diag_neoadj_chemo_1),  # Get NACT status for each participant
-    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
-  )
-
-# Create a contingency table of nact vs dtc_ever
-contingency_table <- table(subset_data_by_id$nact, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-# Print the contingency table and Chi-squared test results --> p-val = 0.37, a slightly stronger trend than with ctDNA
-print(contingency_table)
-
-
   
-     0  1
-  0 60 30
-  1 10  9
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.80344, df = 1, p-value = 0.3701
-
-
####hormone therapy prtx_endo 
-
-table(subset_data$prtx_endo) 
-
-

-  0   1 
-156 242 
-
-
endo <- subset_data |> 
-  distinct(participant_id,prtx_endo) |> 
-  group_by(prtx_endo) |>  # Group by endocrine therapy status
-  summarise(count = n())  # Count the number of participants per endocrine therapy status
-
-# View the summary table
-print(endo) # most people received endocrine therapy (62 of 109)
-
-
# A tibble: 2 × 2
-  prtx_endo count
-      <int> <int>
-1         0    47
-2         1    62
-
-
# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    endo = first(prtx_endo),  # Get endocrine therapy status for each participant
-    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
-  )
-
-# Create a contingency table of endo vs dtc_ever
-contingency_table <- table(subset_data_by_id$endo, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-# Print the contingency table and Chi-squared test results --> p-val  = 0.50 
-print(contingency_table)
-
-
   
-     0  1
-  0 28 19
-  1 42 20
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.46137, df = 1, p-value = 0.497
-
-
####bone modifying agents prtx_bonemod 
-
-table(subset_data$prtx_bonemod) 
-
-

-  0   1 
-238 160 
-
-
bonemod <- subset_data |> 
-  distinct(participant_id,prtx_bonemod) |> 
-  group_by(prtx_bonemod) |>  # Group by bone-modifying agent status
-  summarise(count = n())  # Count the number of participants per bone-modifying agent status
-
-# View the summary table
-print(bonemod) # 39 of 109 people received bone-modifying agents
-
-
# A tibble: 2 × 2
-  prtx_bonemod count
-         <int> <int>
-1            0    70
-2            1    39
-
-
# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    bonemod = first(prtx_bonemod),  # Get bone-modifying agent status for each participant
-    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
-  )
-
-# Create a contingency table of bonemod vs dtc_ever
-contingency_table <- table(subset_data_by_id$bonemod, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-# Print the contingency table and Chi-squared test results --> p-val  = 1 
-print(contingency_table)
-
-
   
-     0  1
-  0 45 25
-  1 25 14
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0, df = 1, p-value = 1
-
-
-
-
#pCR 
-#2 = non-pcr, 1 = pcr 
-#path cr diag_pcr_1 or diag_pcr_2 (as this could be on either of the two diagnosis and staging forms, there are 2 variables for this)
-table(subset_data$diag_pcr_1) 
-
-

-  .   1   2 
-327   8  63 
-
-
table(subset_data$diag_pcr_2) #none recorded here so can just use pcr_1 
-
-

-      . 
-378  20 
-
-
pcr <- subset_data |>
-  mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) |>  # Convert "." to NA
-  filter(!is.na(diag_pcr_1)) |>  # Exclude rows where diag_pcr_1 is NA
-  distinct(participant_id, diag_pcr_1) |>
-  group_by(diag_pcr_1) |>
-  summarise(count = n()) # Count the number of participants per pCR category
-
-# View the summary table
-print(pcr) # consistent with NACT: 19 people received NACT, and 19 have pCR data recorded
-
-
# A tibble: 2 × 2
-  diag_pcr_1 count
-  <chr>      <int>
-1 1              1
-2 2             18
-
-
# Summarize the data by participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    pcr = first(diag_pcr_1),  # Get pCR status for each participant
-    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
-  )
-
-# Create a contingency table of pcr vs dtc_ever
-contingency_table <- table(subset_data_by_id$pcr, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results --> p-val = 0.27; no apparent association with DTCs
-# Caveat: the "." row (no pCR assessment) is still in the table, so the test compares three groups, and only 19 individuals were evaluated for pCR (1 pCR, 18 non-pCR)
-print(contingency_table)
-
-
   
-     0  1
-  . 60 30
-  1  0  1
-  2 10  8
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test
-
-data:  contingency_table
-X-squared = 2.6174, df = 2, p-value = 0.2702
-
-
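As printed, the pCR table still contains the `.` row (participants without a pCR assessment), so the chi-squared test above compares three groups (df = 2) rather than pCR versus non-pCR. A minimal sketch that converts `.` to `NA` before tabulating is below. The counts are typed in from the printed table, and Fisher's exact test is swapped in because the cells are very small.

```r
# pCR coding from the comments above: 1 = pCR, 2 = non-pCR, "." = not assessed
pcr_df <- data.frame(
  diag_pcr_1 = rep(c(".", ".", "1", "2", "2"), times = c(60, 30, 1, 10, 8)),
  dtc_ever   = rep(c(0, 1, 1, 0, 1),           times = c(60, 30, 1, 10, 8))
)

# Treat "." as missing and label the remaining codes
pcr_df$pcr <- factor(ifelse(pcr_df$diag_pcr_1 == ".", NA, pcr_df$diag_pcr_1),
                     levels = c("1", "2"), labels = c("pCR", "non-pCR"))

# table() drops NA by default, leaving only the 19 participants who were assessed
pcr_tab <- table(pcr_df$pcr, pcr_df$dtc_ever, dnn = c("pcr", "dtc_ever"))
print(pcr_tab)

fisher.test(pcr_tab)$p.value
```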
########recurrence
-# local first, then distant; then create a summary variable for either locoregional or distant recurrence
-#local fu_locreg_prog 
-
-# Step 1: Summarize data by unique participant_id
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    fu_locreg_prog = first(fu_locreg_prog),  # Get fu_locreg_prog status for each participant
-    dtc_ever = first(dtc_ever),          # Get dtc_ever status for each participant
-    .groups = "drop"
-  )
-
-# Step 2: Create a contingency table of fu_locreg_prog vs dtc_ever
-contingency_table <- table(subset_data_by_id$fu_locreg_prog, subset_data_by_id$dtc_ever)
-
-# Step 3: Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Step 4: Print the contingency table and Chi-squared test results -- p-val of 0.75, little evidence of an association (but pts on trial)
-print(contingency_table)
-
-
   
-     0  1
-  0 66 35
-  1  3  3
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.10507, df = 1, p-value = 0.7458
-
-
####looking at sites of locoregional progression: local site fu_locreg_site_num or fu_locreg_site_char 
-### Just want to look at site distribution here 
-
-# Summarize the distribution of fu_locreg_site_char by unique participant_id
-site_distribution <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    site = first(fu_locreg_site_char),  # Get the site for each unique participant
-    .groups = "drop"
-  ) |>
-  count(site)  # Count the occurrences of each site
-
-# View the distribution of sites -- mostly axillary nodes: 4 of 6 involved axillary nodes, 1 internal mammary, 2 supraclavicular, 2 ipsilateral breast
-print(site_distribution)
-
-
# A tibble: 6 × 2
-  site                                                              n
-  <chr>                                                         <int>
-1 ""                                                              103
-2 "Axillary Nodes"                                                  2
-3 "Axillary Nodes,Internal Mammary Nodes,Supraclavicular Nodes"     1
-4 "Ipsilateral Breast"                                              1
-5 "Ipsilateral Breast,Axillary Nodes"                               1
-6 "Supraclavicular Nodes"                                           1
-
-
#####distant recurrence: distant fu_dist_prog 
-
-# Step 1: Summarize data by unique participant_id
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    fu_dist_prog = first(fu_dist_prog),  # Get fu_dist_prog status for each participant
-    dtc_ever = first(dtc_ever),          # Get dtc_ever status for each participant
-    .groups = "drop"
-  )
-
-# Step 2: Create a contingency table of dist prog vs dtc_ever --> 12 who had distant progression 
-contingency_table <- table(subset_data_by_id$fu_dist_prog, subset_data_by_id$dtc_ever)
-
-# Step 3: Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Step 4: Print the contingency table and Chi-squared test results -- p-val 0.63
-print(contingency_table)
-
-
   
-     0  1
-  0 60 35
-  1  9  3
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.23777, df = 1, p-value = 0.6258
-
-
### Distant sites 
-#distant site fu_dist_site_num #fu_dist_site_char -- start by just looking at the locations
-
-# Summarize the distribution of fu_dist_site_char by unique participant_id
-dist_site_distribution <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    site = first(fu_dist_site_char),  # Get the site for each unique participant
-    .groups = "drop"
-  ) |>
-  count(site)  # Count the occurrences of each site
-
-# View the distribution of sites -- Bone in 7. Liver in 3. Lung in 2. 1 intra-abdominal 
-print(dist_site_distribution)
-
-
# A tibble: 8 × 2
-  site                  n
-  <chr>             <int>
-1 ""                   97
-2 "Bone"                5
-3 "Bone,Other"          1
-4 "Intra-abdominal"     1
-5 "Liver"               2
-6 "Liver,Bone"          1
-7 "Lung"                1
-8 "Pleura,Lung"         1
-
-
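Participants with more than one site are stored as a single comma-separated string (e.g. `"Liver,Bone"`), so the per-site tallies in the comments above had to be counted by hand. A minimal base-R sketch that splits the strings and counts each site is below. The helper name `count_sites` is hypothetical, and the input vectors are typed in from the two printed distributions (the empty string is the "no site recorded" row).

```r
# Hypothetical helper: expand comma-separated site strings and count each site
count_sites <- function(site, n) {
  per_participant <- rep(site, times = n)                      # one entry per participant
  per_participant <- per_participant[per_participant != ""]    # drop "no site recorded"
  sites <- trimws(unlist(strsplit(per_participant, ",", fixed = TRUE)))
  sort(table(sites), decreasing = TRUE)
}

# Distant recurrence sites (from the dist_site_distribution output)
distant <- count_sites(
  site = c("", "Bone", "Bone,Other", "Intra-abdominal", "Liver",
           "Liver,Bone", "Lung", "Pleura,Lung"),
  n    = c(97, 5, 1, 1, 2, 1, 1, 1)
)

# Locoregional recurrence sites (from the site_distribution output)
locoregional <- count_sites(
  site = c("", "Axillary Nodes",
           "Axillary Nodes,Internal Mammary Nodes,Supraclavicular Nodes",
           "Ipsilateral Breast", "Ipsilateral Breast,Axillary Nodes",
           "Supraclavicular Nodes"),
  n    = c(103, 2, 1, 1, 1, 1)
)

print(distant)
print(locoregional)
```

On real data, the same counts could be produced from `subset_data` directly by passing the per-participant site column with `n = 1` for each participant.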
#any recurrence 
-#either fu_locreg_prog or fu_dist_prog 
-
-subset_data <- subset_data |>
-  mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No"))
-
-# link by participant id 
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
-    dtc_ever = first(dtc_ever),        # Get the dtc_ever status for each participant
-    .groups = "drop"
-  )
-
-# Create a contingency table of ever_relapsed vs dtc_ever
-contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results -- 14 relapses in total: 10 DTC-negative, 4 DTC-positive
-print(contingency_table)
-
-
     
-       0  1
-  No  59 34
-  Yes 10  4
-
-
print(chisq_test) 
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.079932, df = 1, p-value = 0.7774
-
-
#### Relapse and DTC 
-#using ever_relapsed
-
-# link by participant id 
-subset_data_by_id <- subset_data |>
-  group_by(participant_id) |>
-  summarise(
-    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
-    dtc = first(dtc_ever),        # Get the dtc_ever status for each participant
-    .groups = "drop"
-  )
-
-# Create a contingency table of ever_relapsed vs dtc_ever
-contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results
-print(contingency_table)
-
-
     
-       0  1
-  No  59 34
-  Yes 10  4
-
-
print(chisq_test) 
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.079932, df = 1, p-value = 0.7774
-
-
# Identify participants missing data in either `ever_relapsed` or `dtc_ever`
-missing_data <- subset_data_by_id |>
-  filter(is.na(ever_relapsed) | is.na(dtc))
-
-# Print the IDs of participants with missing data
-print(missing_data$participant_id) #two individuals do not have relapse data (17-021, 18-032, as we had seen above)
-
-
[1] "28115-17-021" "28115-18-032"
-
-
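The two missing relapse values follow from R's three-valued logic in `ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No")`: `NA | TRUE` is `TRUE`, but `NA | FALSE` is `NA`. A participant is therefore coded `NA` whenever neither indicator is 1 and at least one is missing. A small worked example with made-up indicator values:

```r
# Made-up indicators for four hypothetical participants
fu_locreg_prog <- c(0, 1, NA, 0)
fu_dist_prog   <- c(0, 0, 1, NA)

ever_relapsed <- ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No")
print(ever_relapsed)
# -> "No" "Yes" "Yes" NA
```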
####survival analysis  fu_survival 
-
-table(subset_data$fu_surv)
-
-

-  0   1 
-  8 389 
-
-
surv <- subset_data |>
-  distinct(participant_id, fu_surv) |>
-  group_by(fu_surv) |>
-  summarise(count = n()) # Count the number of participants per survival status
-
-# View the summary table
-print(surv) # dead = 5, alive = 103, and 1 NA (identified below)
-
-
# A tibble: 3 × 2
-  fu_surv count
-    <int> <int>
-1       0     5
-2       1   103
-3      NA     1
-
-
na_participant <- subset_data |>
-  filter(is.na(fu_surv)) |>
-  select(participant_id, fu_surv)
-
-# Print the result -- 28115-17-021 has no follow-up data in REDCap; everyone else in the DTC cohort has some survival data
-print(na_participant)
-
-
# A tibble: 1 × 2
-  participant_id fu_surv
-  <chr>            <int>
-1 28115-17-021        NA
-
-
# Summarize data by unique participant_id
-subset_data_by_id <- subset_data_clean |>
-  group_by(participant_id) |>
-  summarise(
-    surv = first(fu_surv),          # Get survival status for each participant
-    dtc_ever = first(dtc_ever),  # Get dtc_ever status for each participant
-    .groups = "drop"
-  )
-
-# Create a contingency table of surv vs dtc_ever
-contingency_table <- table(subset_data_by_id$surv, subset_data_by_id$dtc_ever)
-
-# Perform the Chi-squared test
-chisq_test <- chisq.test(contingency_table)
-
-
Warning in chisq.test(contingency_table): Chi-squared approximation may be
-incorrect
-
-
# Print the contingency table and Chi-squared test results
-print(contingency_table)
-
-
   
-     0  1
-  0  4  1
-  1 65 38
-
-
print(chisq_test)
-
-

-    Pearson's Chi-squared test with Yates' continuity correction
-
-data:  contingency_table
-X-squared = 0.084865, df = 1, p-value = 0.7708
-
-
-
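The chi-squared test above treats survival as a yes/no outcome and ignores follow-up time. A time-to-event comparison would typically use Kaplan-Meier estimates and a log-rank test from the `survival` package; a minimal sketch is below. It assumes a follow-up time has already been derived (for example from `fu_date_to` or `fu_date_death` relative to a baseline date, which is not shown here) and uses the coding seen above, where `fu_surv = 0` is dead and `1` is alive. The toy times and statuses are made up.

```r
library(survival)

# Made-up follow-up data: time in months, fu_surv (0 = dead, 1 = alive), DTC status
toy_surv <- data.frame(
  time_months = c(12, 30, 45, 8, 60, 22, 50, 36),
  fu_surv     = c(1, 1, 0, 0, 1, 1, 1, 0),
  dtc_ever    = c(0, 1, 0, 1, 1, 0, 1, 0)
)

# Surv() expects the event indicator as 1 = event (death), 0 = censored
toy_surv$death <- as.integer(toy_surv$fu_surv == 0)

# Kaplan-Meier curves by DTC status
km_fit <- survfit(Surv(time_months, death) ~ dtc_ever, data = toy_surv)
summary(km_fit)

# Log-rank test comparing the two curves
lr <- survdiff(Surv(time_months, death) ~ dtc_ever, data = toy_surv)
lr_p <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
lr_p
```

With only 5 deaths in this cohort, any such comparison would have very limited power.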

Now that we have run the univariate associations between all the important demographic and clinical factors and both ctDNA and DTC status, we will build Table 1: first stratified by ctDNA status, then a second table stratified by DTC status.

-
-

4.1 Making our Table 1

-
-

4.1.1 Demographics and Clinical Factors by ctDNA Status

-
-
####### Making Table 1--first for ctDNA ######### 
-
-## Resources to try for both making Table 1 and LASSO 
-## https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html
-## lasso https://stats.stackexchange.com/questions/72251/an-example-lasso-regression-using-glmnet-for-binary-outcome 
-## https://www.r-bloggers.com/2020/05/quick-tutorial-on-lasso-regression-with-example/#google_vignette 
-
-#Table 1 Code 
-library(table1)
-
-

-Attaching package: 'table1'
-
-
-
The following objects are masked from 'package:base':
-
-    units, units<-
-
-
names(subset_data) #to choose variables 
-
-
  [1] "ID"                               "trialID"                         
-  [3] "participant_id"                   "patient_id"                      
-  [5] "fu_trial_pid"                     "timepoint"                       
-  [7] "project"                          "surmount_id"                     
-  [9] "panel_id"                         "accession"                       
- [11] "sample_id"                        "collection_date"                 
- [13] "extracted_plasma_volume_ml"       "input"                           
- [15] "input_sample"                     "physical_run_name"               
- [17] "workflow_name"                    "eVAF"                            
- [19] "mutant_molecules"                 "mean_VAF"                        
- [21] "Score"                            "all_pass_variants"               
- [23] "total_variants"                   "n_positive_variants"             
- [25] "ctDNA_detected"                   "ctdna_cohort"                    
- [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
- [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
- [31] "dtc_final_result_date"            "pt"                              
- [33] "bma_date"                         "ORIG_RSLT_DTC"                   
- [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
- [37] "FINAL_COUNT"                      "org_consent_date"                
- [39] "demo_initials"                    "demo_dob"                        
- [41] "demo_sex"                         "demo_ethnicity"                  
- [43] "demo_race___1"                    "demo_race___2"                   
- [45] "demo_race___3"                    "demo_race___4"                   
- [47] "demo_race___5"                    "demo_race___88"                  
- [49] "demo_race___99"                   "demo_race_other"                 
- [51] "prtx_radiation"                   "prtx_rad_start"                  
- [53] "prtx_rad_end"                     "prtx_chemo"                      
- [55] "prtx_endo"                        "prtx_bonemod"                    
- [57] "prior_therapy_complete"           "inc_dx_crit"                     
- [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
- [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
- [63] "final_receptor_group"             "demo_race_final"                 
- [65] "final_histology"                  "final_tumor_grade"               
- [67] "final_overall_stage"              "final_t_stage"                   
- [69] "final_n_stage"                    "fu_date_to"                      
- [71] "fu_surv"                          "fu_date_death"                   
- [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
- [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
- [77] "fu_locreg_prog"                   "fu_locreg_date"                  
- [79] "fu_dist_site_num"                 "fu_dist_site_char"               
- [81] "fu_dist_prog"                     "fu_dist_date"                    
- [83] "censor_date"                      "chemo_indication_1"              
- [85] "chemo_name_1"                     "chemo_name_other_1"              
- [87] "chemo_start_date_1"               "start_date_exact_1"              
- [89] "chemo_end_date_1"                 "end_date_exact_1"                
- [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
- [93] "chemo_indication_2"               "chemo_name_2"                    
- [95] "chemo_name_other_2"               "chemo_start_date_2"              
- [97] "start_date_exact_2"               "chemo_end_date_2"                
- [99] "end_date_exact_2"                 "chemo_notes_2"                   
-[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
-[103] "chemo_name_3"                     "chemo_name_other_3"              
-[105] "chemo_start_date_3"               "start_date_exact_3"              
-[107] "chemo_end_date_3"                 "end_date_exact_3"                
-[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
-[111] "chemo_indication_4"               "chemo_name_4"                    
-[113] "chemo_name_other_4"               "chemo_start_date_4"              
-[115] "start_date_exact_4"               "chemo_end_date_4"                
-[117] "end_date_exact_4"                 "chemo_notes_4"                   
-[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
-[121] "hormone_name___1_1"               "hormone_name___2_1"              
-[123] "hormone_name___3_1"               "hormone_name___4_1"              
-[125] "hormone_name___5_1"               "hormone_name___6_1"              
-[127] "hormone_name___7_1"               "hormone_other_1"                 
-[129] "hormone_start_date_1"             "hormone_ongoing_1"               
-[131] "hormone_end_date_1"               "hormone_notes_1"                 
-[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
-[135] "hormone_name___1_2"               "hormone_name___2_2"              
-[137] "hormone_name___3_2"               "hormone_name___4_2"              
-[139] "hormone_name___5_2"               "hormone_name___6_2"              
-[141] "hormone_name___7_2"               "hormone_other_2"                 
-[143] "hormone_start_date_2"             "hormone_ongoing_2"               
-[145] "hormone_end_date_2"               "hormone_notes_2"                 
-[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
-[149] "hormone_name___1_3"               "hormone_name___2_3"              
-[151] "hormone_name___3_3"               "hormone_name___4_3"              
-[153] "hormone_name___5_3"               "hormone_name___6_3"              
-[155] "hormone_name___7_3"               "hormone_other_3"                 
-[157] "hormone_start_date_3"             "hormone_ongoing_3"               
-[159] "hormone_end_date_3"               "hormone_notes_3"                 
-[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
-[163] "hormone_name___1_4"               "hormone_name___2_4"              
-[165] "hormone_name___3_4"               "hormone_name___4_4"              
-[167] "hormone_name___5_4"               "hormone_name___6_4"              
-[169] "hormone_name___7_4"               "hormone_other_4"                 
-[171] "hormone_start_date_4"             "hormone_ongoing_4"               
-[173] "hormone_end_date_4"               "hormone_notes_4"                 
-[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
-[177] "hormone_name___1_5"               "hormone_name___2_5"              
-[179] "hormone_name___3_5"               "hormone_name___4_5"              
-[181] "hormone_name___5_5"               "hormone_name___6_5"              
-[183] "hormone_name___7_5"               "hormone_other_5"                 
-[185] "hormone_start_date_5"             "hormone_ongoing_5"               
-[187] "hormone_end_date_5"               "hormone_notes_5"                 
-[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
-[191] "hormone_name___1_6"               "hormone_name___2_6"              
-[193] "hormone_name___3_6"               "hormone_name___4_6"              
-[195] "hormone_name___5_6"               "hormone_name___6_6"              
-[197] "hormone_name___7_6"               "hormone_other_6"                 
-[199] "hormone_start_date_6"             "hormone_ongoing_6"               
-[201] "hormone_end_date_6"               "hormone_notes_6"                 
-[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
-[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
-[207] "bonemod_name_1"                   "bonemod_start_date_1"            
-[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
-[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
-[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
-[215] "bonemod_name_2"                   "bonemod_start_date_2"            
-[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
-[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
-[221] "diag_lateral_1"                   "diag_menopause_1"                
-[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
-[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
-[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
-[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
-[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
-[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
-[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
-[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
-[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
-[241] "diag_er_status_1"                 "diag_er_percent_1"               
-[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
-[245] "diag_her2_status_1"               "diag_her2_method_1"              
-[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
-[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
-[251] "diag_pcr_1"                       "diag_surgery_date_1"             
-[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
-[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
-[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
-[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
-[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
-[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
-[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
-[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
-[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
-[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
-[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
-[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
-[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
-[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
-[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
-[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
-[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
-[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
-[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
-[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
-[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
-[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
-[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
-[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
-[301] "mammoprint_1"                     "mammoprint_result_1"             
-[303] "diag_notes_1"                     "diag_date_2"                     
-[305] "diag_lateral_2"                   "diag_menopause_2"                
-[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
-[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
-[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
-[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
-[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
-[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
-[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
-[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
-[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
-[325] "diag_er_status_2"                 "diag_er_percent_2"               
-[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
-[329] "diag_her2_status_2"               "diag_her2_method_2"              
-[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
-[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
-[335] "diag_pcr_2"                       "diag_surgery_date_2"             
-[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
-[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
-[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
-[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
-[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
-[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
-[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
-[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
-[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
-[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
-[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
-[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
-[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
-[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
-[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
-[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
-[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
-[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
-[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
-[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
-[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
-[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
-[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
-[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
-[385] "mammoprint_2"                     "mammoprint_result_2"             
-[387] "diag_notes_2"                     "ctDNA_ever"                      
-[389] "dtc_ever"                         "ever_relapsed"                   
-[391] "age_at_diag"                      "HR_status"                       
-[393] "histology_category"               "node_status"                     
-[395] "axillary_dissection"             
-
-
library(dplyr)
-library(tidyr)
-library(stringr)
-
-# Prepare the dataset
-unique_subset_data <- subset_data |>
-  mutate(
-    # Convert "Missing" and 99 to NA in relevant columns
-    final_t_stage = na_if(as.character(final_t_stage), "Missing"),
-    final_t_stage = na_if(final_t_stage, "99"),
-    final_overall_stage = na_if(as.character(final_overall_stage), "Missing"),
-    final_overall_stage = na_if(final_overall_stage, "99"),
-    final_tumor_grade = na_if(final_tumor_grade, 3),
-    diag_pcr_1 = na_if(diag_pcr_1, "."),
-    # Replace 99 with NA in all numeric columns
-    across(where(is.numeric), ~ na_if(.x, 99))
-  )  |>
-  group_by(participant_id) |>
-  summarize(
-    age_at_diag = first(na.omit(age_at_diag)),
-    final_receptor_group = first(na.omit(final_receptor_group)),
-    demo_race_final = first(na.omit(demo_race_final)),
-    final_tumor_grade = first(na.omit(final_tumor_grade)),
-    final_overall_stage = first(na.omit(final_overall_stage)),
-    final_t_stage = first(na.omit(final_t_stage)),
-    final_n_stage = first(na.omit(final_n_stage)),
-    histology_category = first(na.omit(histology_category)),
-    prtx_radiation = first(na.omit(prtx_radiation)),
-    prtx_chemo = first(na.omit(prtx_chemo)),
-    prtx_endo = first(na.omit(prtx_endo)),
-    prtx_bonemod = first(na.omit(prtx_bonemod)),
-    node_status = first(na.omit(node_status)),
-    axillary_dissection = first(na.omit(axillary_dissection)),
-    diag_surgery_type_1 = first(na.omit(diag_surgery_type_1)),
-    diag_neoadj_chemo_1 = first(na.omit(diag_neoadj_chemo_1)),
-    diag_pcr_1 = first(na.omit(diag_pcr_1)),
-    ctDNA_ever = first(na.omit(ctDNA_ever))
-  )
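The cleaning step above recodes sentinel values ("Missing", 99) to `NA` and then collapses the repeated rows of `subset_data` into one row per `participant_id`, keeping the first non-missing value of each column. The base-R sketch below reproduces that logic on a small hypothetical data frame so the behaviour is easy to check. `visits` and `first_non_missing()` are illustrative names, not part of the project.

```r
# Hypothetical long-format data: several rows per participant, with gaps and a "99" code
visits <- data.frame(
  participant_id = c(1, 1, 2, 2, 2),
  age_at_diag    = c(NA, 45.2, 60.1, NA, 60.1),
  final_t_stage  = c("2", NA, "99", NA, "1"),
  stringsAsFactors = FALSE
)

# Step 1: treat the 99 "unknown" code as missing (same idea as na_if() above)
visits$final_t_stage[visits$final_t_stage %in% "99"] <- NA

# Step 2: first non-missing value of a vector; a typed NA if every value is missing
first_non_missing <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) x[NA_integer_] else x[1]
}

# Step 3: collapse to one row per participant
one_row <- do.call(rbind, lapply(split(visits, visits$participant_id), function(d) {
  data.frame(
    participant_id = d$participant_id[1],
    age_at_diag    = first_non_missing(d$age_at_diag),
    final_t_stage  = first_non_missing(d$final_t_stage),
    stringsAsFactors = FALSE
  )
}))
one_row
```

Because the first non-missing value is taken in row order, the result depends on how the rows are sorted; sorting by a visit date before collapsing (if such a column exists) would make that choice explicit.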
-
-#######
-# Add labels for:
-# final_receptor_group
-# demo_race_final
-# final_tumor_grade
-# final_overall_stage
-# final_t_stage
-# final_n_stage
-# histology_category
-# prtx_radiation
-# prtx_chemo
-# prtx_endo
-# prtx_bonemod
-# node_status
-# axillary_dissection
-# diag_surgery_type_1
-# diag_neoadj_chemo_1
-# ctDNA_ever
-# diag_pcr_1
-
-
-label(unique_subset_data$age_at_diag) <- "Age at Diagnosis"
-units(unique_subset_data$age_at_diag)       <- "years"
-
-#Final receptor group: 1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+'
-
-
-# assign `final_receptor_group` factor levels and labels to `unique_subset_data`
-unique_subset_data <- unique_subset_data |>
-  mutate(
-    final_receptor_group = factor(final_receptor_group, levels = c(1, 2, 3, 4),
-                                  labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+"))
-  )
-
-label(unique_subset_data$final_receptor_group)       <- "Final Receptor Group"
-
-table(unique_subset_data$final_receptor_group)
-
-

-     TNBC HR+ HER2- HR+ HER2+ HR- HER2+ 
-       45        52         8         4 
-
-
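`factor(x, levels = ..., labels = ...)` matches each value of `x` against `levels` and replaces it with the label in the same position. Any value not listed in `levels` becomes `NA` without a warning, and `table()` hides `NA` unless `useNA` is set, so an unexpected code can disappear silently. A small check on hypothetical codes:

```r
# Hypothetical receptor-group codes; 99 is a leftover "unknown" code
codes <- c(1, 2, 2, 4, 3, 99)

grp <- factor(codes, levels = c(1, 2, 3, 4),
              labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+"))

# Codes not listed in `levels` silently become NA; useNA = "ifany" makes them visible
table(grp, useNA = "ifany")

# A quick guard before relabelling: list any codes that would be dropped
unexpected <- setdiff(unique(codes[!is.na(codes)]), c(1, 2, 3, 4))
unexpected
```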
##demo_race_final 
-
-table(unique_subset_data$demo_race_final) # 1 = Black, 3 = Asian, 5 = White
-
-

- 1  3  5 
- 9  1 99 
-
-
unique_subset_data$demo_race_final <- 
-  factor(unique_subset_data$demo_race_final, levels=c(1,3,5),
-         labels=c("Black", 
-                  "Asian", "White"))
-label(unique_subset_data$demo_race_final)  <- "Race"
-table(unique_subset_data$demo_race_final) 
-
-

-Black Asian White 
-    9     1    99 
-
-
#final_tumor_grade 
-table(unique_subset_data$final_tumor_grade) # 0 = Grade 3, 1 = Grade 1, 2 = Grade 2, 3 = Not Reported (recoded to NA above so table1 reports it as Missing)
-
-

- 0  1  2 
-79 22  6 
-
-
unique_subset_data$final_tumor_grade <- 
-  factor(unique_subset_data$final_tumor_grade, levels=c(0,1,2),
-         labels=c("Grade 3", 
-                  "Grade 1", "Grade 2"))
-label(unique_subset_data$final_tumor_grade)  <- "Tumor Grade"
-table(unique_subset_data$final_tumor_grade) 
-
-

-Grade 3 Grade 1 Grade 2 
-     79      22       6 
-
-
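The grade mapping above lists code 0 first, so "Grade 3" becomes the first factor level and is printed first in both the frequency table and Table 1. If clinical order (Grade 1, 2, 3) is preferred, one option is to reorder the `levels` vector; each label follows the code in the same position. A sketch on hypothetical codes, using the coding documented in the comment above:

```r
# Grade codes as documented above: 0 = Grade 3, 1 = Grade 1, 2 = Grade 2
grade_code <- c(0, 1, 2, 0, NA)

# Listing the codes in clinical order gives levels Grade 1, Grade 2, Grade 3
grade <- factor(grade_code, levels = c(1, 2, 0),
                labels = c("Grade 1", "Grade 2", "Grade 3"))

levels(grade)
table(grade, useNA = "ifany")
```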
#final_overall_stage
-
-table(unique_subset_data$final_overall_stage) # 1 = stage I 2 = stage II 3 = stage III  
-
-

- 1  2  3 
-35 47 26 
-
-
unique_subset_data$final_overall_stage <- 
-  factor(unique_subset_data$final_overall_stage, levels=c(1,2,3),
-         labels=c("Stage I", 
-                  "Stage II", "Stage III"))
-label(unique_subset_data$final_overall_stage)  <- "Overall Stage"
-table(unique_subset_data$final_overall_stage) 
-
-

-  Stage I  Stage II Stage III 
-       35        47        26 
-
-
#final_t_stage
-table(unique_subset_data$final_t_stage) # 1 = T1 2 = T2 3 = T3 4 = T4  
-
-

- 1  2  3  4 
-51 44 12  1 
-
-
unique_subset_data$final_t_stage <- 
-  factor(unique_subset_data$final_t_stage, levels=c(1,2,3,4),
-         labels=c("T1", 
-                  "T2", "T3", "T4"))
-label(unique_subset_data$final_t_stage)  <- "T Stage"
-table(unique_subset_data$final_t_stage) 
-
-

-T1 T2 T3 T4 
-51 44 12  1 
-
-
#final_n_stage 
-
-table(unique_subset_data$final_n_stage) # 0 = N0 1 = N1 2 = N2 3 = N3 
-
-

- 0  1  2  3 
-46 43 13  7 
-
-
unique_subset_data$final_n_stage <- 
-  factor(unique_subset_data$final_n_stage, levels=c(0,1,2,3),
-         labels=c("N0", 
-                  "N1", "N2", "N3"))
-label(unique_subset_data$final_n_stage)  <- "N Stage"
-table(unique_subset_data$final_n_stage) 
-
-

-N0 N1 N2 N3 
-46 43 13  7 
-
-
#histology_category
-
-table(unique_subset_data$histology_category) # Already labeled: Both Ductal and Lobular, Ductal, Lobular, Other
-
-

-Both Ductal and Lobular                  Ductal                 Lobular 
-                      9                      84                      14 
-                  Other 
-                      2 
-
-
label(unique_subset_data$histology_category)  <- "Histology Category"
-
-
-#prtx_radiation 
-
-table(unique_subset_data$prtx_radiation) #1 = radiation 0 = no 
-
-

- 0  1 
-34 75 
-
-
unique_subset_data$prtx_radiation <- 
-  factor(unique_subset_data$prtx_radiation, levels=c(0,1),
-         labels=c("No Radiation", "Radiation"))
-label(unique_subset_data$prtx_radiation)  <- "Radiation"
-table(unique_subset_data$prtx_radiation)
-
-

-No Radiation    Radiation 
-          34           75 
-
-
#prtx_chemo
-
-table(unique_subset_data$prtx_chemo) #1 = chemo 0 = no 
-
-

-  0   1 
-  3 106 
-
-
table(subset_data$prtx_chemo)
-
-

-  0   1 
- 18 380 
-
-
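The row-level counts from `subset_data` (18 + 380 = 398 rows) are much larger than the participant-level counts (3 + 106 = 109) because `subset_data` holds several rows per participant. The toy example below, with hypothetical values, shows the same effect and one base-R way to keep a single row per participant before tabulating:

```r
# Hypothetical long data: participant 1 has three rows, participant 2 has two
rows <- data.frame(
  participant_id = c(1, 1, 1, 2, 2, 3),
  prtx_chemo     = c(1, 1, 1, 0, 0, 1)
)

# Row-level table: counts rows, so participants with many rows dominate
table(rows$prtx_chemo)

# Keep the first row per participant, then tabulate participants
per_person <- rows[!duplicated(rows$participant_id), ]
table(per_person$prtx_chemo)
```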
unique_subset_data$prtx_chemo <- 
-factor(unique_subset_data$prtx_chemo, levels=c(0,1),
-         labels=c("No Chemo", "Chemo"))
-label(unique_subset_data$prtx_chemo)  <- "Chemo"
-table(unique_subset_data$prtx_chemo)
-
-

-No Chemo    Chemo 
-       3      106 
-
-
#prtx_endo
-
-
-table(unique_subset_data$prtx_endo) #1 = hormone therapy 0 = no 
-
-

- 0  1 
-47 62 
-
-
table(subset_data$prtx_endo)
-
-

-  0   1 
-156 242 
-
-
unique_subset_data$prtx_endo <- 
-factor(unique_subset_data$prtx_endo, levels=c(0,1),
-         labels=c("No Endocrine Therapy", "Endocrine Therapy"))
-label(unique_subset_data$prtx_endo)  <- "Endocrine Therapy"
-table(unique_subset_data$prtx_endo)
-
-

-No Endocrine Therapy    Endocrine Therapy 
-                  47                   62 
-
-
#prtx_bonemod 
-
-table(unique_subset_data$prtx_bonemod) #1 = bonemod 0 = no 
-
-

- 0  1 
-70 39 
-
-
unique_subset_data$prtx_bonemod <- 
-factor(unique_subset_data$prtx_bonemod, levels=c(0,1),
-         labels=c("No Bone Modifying Treatment", "Bone Modifying Treatment"))
-label(unique_subset_data$prtx_bonemod)  <- "Bone Modifying Treatment"
-table(unique_subset_data$prtx_bonemod)
-
-

-No Bone Modifying Treatment    Bone Modifying Treatment 
-                         70                          39 
-
-
#node_status 
-table(unique_subset_data$node_status) # Already labeled: Node Negative, Node Positive
-
-

-Node Negative Node Positive 
-           46            63 
-
-
label(unique_subset_data$node_status)  <- "Node Status"
-
-#axillary_dissection 
-
-table(unique_subset_data$axillary_dissection) #1 = axillary dissection 0 = no dissection
-
-

- 0  1 
-54 55 
-
-
unique_subset_data$axillary_dissection <- 
-factor(unique_subset_data$axillary_dissection, levels=c(0,1),
-         labels=c("No Axillary Dissection", "Axillary Dissection"))
-label(unique_subset_data$axillary_dissection)  <- "Axillary Dissection"
-table(unique_subset_data$axillary_dissection)
-
-

-No Axillary Dissection    Axillary Dissection 
-                    54                     55 
-
-
#diag_surgery_type_1
-table(unique_subset_data$diag_surgery_type_1) #1 = Lumpectomy 2 = Mastectomy
-
-

- 1  2 
-45 64 
-
-
unique_subset_data$diag_surgery_type_1 <- 
-factor(unique_subset_data$diag_surgery_type_1, levels=c(1,2),
-         labels=c("Lumpectomy", "Mastectomy"))
-label(unique_subset_data$diag_surgery_type_1)  <- "Surgery Type"
-table(unique_subset_data$diag_surgery_type_1)
-
-

-Lumpectomy Mastectomy 
-        45         64 
-
-
#diag_neoadj_chemo_1 
-
-table(unique_subset_data$diag_neoadj_chemo_1) # 1 = Neoadjuvant chemo, 0 = No neoadjuvant chemo
-
-

- 0  1 
-90 19 
-
-
unique_subset_data$diag_neoadj_chemo_1 <- 
-factor(unique_subset_data$diag_neoadj_chemo_1, levels=c(0,1),
-         labels=c("No Neoadjuvant Chemo", "Neoadjuvant Chemo"))
-label(unique_subset_data$diag_neoadj_chemo_1)  <- "Neoadjuvant Chemo"
-table(unique_subset_data$diag_neoadj_chemo_1)
-
-

-No Neoadjuvant Chemo    Neoadjuvant Chemo 
-                  90                   19 
-
-
#pCR 
-table(unique_subset_data$diag_pcr_1) # 1 = pCR, 2 = Non-pCR
-
-

- 1  2 
- 1 18 
-
-
unique_subset_data$diag_pcr_1<- 
-factor(unique_subset_data$diag_pcr_1, levels=c(1,2),
-         labels=c("pCR", "Non-pCR"))
-label(unique_subset_data$diag_pcr_1)  <- "Pathologic Complete Response"
-table(unique_subset_data$diag_pcr_1)
-
-

-    pCR Non-pCR 
-      1      18 
-
-
#ctDNA_ever 
-table(unique_subset_data$ctDNA_ever) #FALSE = ctDNA Negative TRUE = ctDNA Positive
-
-

-FALSE  TRUE 
-  100     9 
-
-
unique_subset_data$ctDNA_ever <- 
-factor(unique_subset_data$ctDNA_ever, levels=c("FALSE", "TRUE"),
-         labels=c("ctDNA Negative", "ctDNA Positive"))
-label(unique_subset_data$ctDNA_ever)  <- "ctDNA Status"
-table(unique_subset_data$ctDNA_ever)
-
-

-ctDNA Negative ctDNA Positive 
-           100              9 
-
-
caption  <- "Table 1 by ctDNA Status"
-
-# Generate the table1 summary
-table1(
-  ~ age_at_diag + final_receptor_group + demo_race_final + 
-    final_tumor_grade + final_overall_stage + 
-    final_t_stage + final_n_stage + 
-    histology_category + prtx_radiation + 
-    prtx_chemo + prtx_endo + prtx_bonemod + 
-    node_status + axillary_dissection + 
-    diag_surgery_type_1 + diag_neoadj_chemo_1 + diag_pcr_1 | 
-    ctDNA_ever,
-  data = unique_subset_data, overall=c(left="Total"), caption=caption)
-
-
Table 1 by ctDNA Status

| | Total (N=109) | ctDNA Negative (N=100) | ctDNA Positive (N=9) |
|---|---|---|---|
| **Age at Diagnosis (years)** | | | |
| Mean (SD) | 49.7 (9.66) | 49.2 (9.64) | 54.6 (8.94) |
| Median [Min, Max] | 49.3 [27.3, 68.9] | 49.0 [27.3, 68.9] | 55.6 [38.6, 64.4] |
| **Final Receptor Group** | | | |
| TNBC | 45 (41.3%) | 44 (44.0%) | 1 (11.1%) |
| HR+ HER2- | 52 (47.7%) | 45 (45.0%) | 7 (77.8%) |
| HR+ HER2+ | 8 (7.3%) | 8 (8.0%) | 0 (0%) |
| HR- HER2+ | 4 (3.7%) | 3 (3.0%) | 1 (11.1%) |
| **Race** | | | |
| Black | 9 (8.3%) | 8 (8.0%) | 1 (11.1%) |
| Asian | 1 (0.9%) | 1 (1.0%) | 0 (0%) |
| White | 99 (90.8%) | 91 (91.0%) | 8 (88.9%) |
| **Tumor Grade** | | | |
| Grade 3 | 79 (72.5%) | 75 (75.0%) | 4 (44.4%) |
| Grade 1 | 22 (20.2%) | 17 (17.0%) | 5 (55.6%) |
| Grade 2 | 6 (5.5%) | 6 (6.0%) | 0 (0%) |
| Missing | 2 (1.8%) | 2 (2.0%) | 0 (0%) |
| **Overall Stage** | | | |
| Stage I | 35 (32.1%) | 33 (33.0%) | 2 (22.2%) |
| Stage II | 47 (43.1%) | 46 (46.0%) | 1 (11.1%) |
| Stage III | 26 (23.9%) | 20 (20.0%) | 6 (66.7%) |
| Missing | 1 (0.9%) | 1 (1.0%) | 0 (0%) |
| **T Stage** | | | |
| T1 | 51 (46.8%) | 48 (48.0%) | 3 (33.3%) |
| T2 | 44 (40.4%) | 40 (40.0%) | 4 (44.4%) |
| T3 | 12 (11.0%) | 11 (11.0%) | 1 (11.1%) |
| T4 | 1 (0.9%) | 0 (0%) | 1 (11.1%) |
| Missing | 1 (0.9%) | 1 (1.0%) | 0 (0%) |
| **N Stage** | | | |
| N0 | 46 (42.2%) | 43 (43.0%) | 3 (33.3%) |
| N1 | 43 (39.4%) | 43 (43.0%) | 0 (0%) |
| N2 | 13 (11.9%) | 8 (8.0%) | 5 (55.6%) |
| N3 | 7 (6.4%) | 6 (6.0%) | 1 (11.1%) |
| **Histology Category** | | | |
| Both Ductal and Lobular | 9 (8.3%) | 9 (9.0%) | 0 (0%) |
| Ductal | 84 (77.1%) | 78 (78.0%) | 6 (66.7%) |
| Lobular | 14 (12.8%) | 11 (11.0%) | 3 (33.3%) |
| Other | 2 (1.8%) | 2 (2.0%) | 0 (0%) |
| **Radiation** | | | |
| No Radiation | 34 (31.2%) | 33 (33.0%) | 1 (11.1%) |
| Radiation | 75 (68.8%) | 67 (67.0%) | 8 (88.9%) |
| **Chemo** | | | |
| No Chemo | 3 (2.8%) | 2 (2.0%) | 1 (11.1%) |
| Chemo | 106 (97.2%) | 98 (98.0%) | 8 (88.9%) |
| **Endocrine Therapy** | | | |
| No Endocrine Therapy | 47 (43.1%) | 45 (45.0%) | 2 (22.2%) |
| Endocrine Therapy | 62 (56.9%) | 55 (55.0%) | 7 (77.8%) |
| **Bone Modifying Treatment** | | | |
| No Bone Modifying Treatment | 70 (64.2%) | 65 (65.0%) | 5 (55.6%) |
| Bone Modifying Treatment | 39 (35.8%) | 35 (35.0%) | 4 (44.4%) |
| **Node Status** | | | |
| Node Negative | 46 (42.2%) | 43 (43.0%) | 3 (33.3%) |
| Node Positive | 63 (57.8%) | 57 (57.0%) | 6 (66.7%) |
| **Axillary Dissection** | | | |
| No Axillary Dissection | 54 (49.5%) | 52 (52.0%) | 2 (22.2%) |
| Axillary Dissection | 55 (50.5%) | 48 (48.0%) | 7 (77.8%) |
| **Surgery Type** | | | |
| Lumpectomy | 45 (41.3%) | 41 (41.0%) | 4 (44.4%) |
| Mastectomy | 64 (58.7%) | 59 (59.0%) | 5 (55.6%) |
| **Neoadjuvant Chemo** | | | |
| No Neoadjuvant Chemo | 90 (82.6%) | 82 (82.0%) | 8 (88.9%) |
| Neoadjuvant Chemo | 19 (17.4%) | 18 (18.0%) | 1 (11.1%) |
| **Pathologic Complete Response** | | | |
| pCR | 1 (0.9%) | 1 (1.0%) | 0 (0%) |
| Non-pCR | 18 (16.5%) | 17 (17.0%) | 1 (11.1%) |
| Missing | 90 (82.6%) | 82 (82.0%) | 8 (88.9%) |
-
-
-

We have our basic Table 1 by ctDNA status.
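In Table 1 the pCR percentages use all 109 participants as the denominator. The 90 "Missing" pCR values appear to correspond to the 90 participants without neoadjuvant chemotherapy (and 1 + 18 = 19 matches the neoadjuvant group), and pCR is only assessed after neoadjuvant therapy, so it may be clearer to also report proportions among the 19 participants with a recorded result. A small base-R calculation from the Table 1 counts:

```r
# Counts copied from the Pathologic Complete Response rows of Table 1
pcr <- c(pCR = 1, `Non-pCR` = 18, Missing = 90)

# Denominator = all 109 participants (the percentages Table 1 shows)
pct_all <- round(100 * pcr / sum(pcr), 1)

# Denominator = only the participants with a recorded pCR result
assessed <- pcr[c("pCR", "Non-pCR")]
pct_assessed <- round(100 * assessed / sum(assessed), 1)

pct_all
pct_assessed
```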

-
-
# Add p-values from tests of significance to Table 1
-
-# Step 1: Create table1 output
-table1_output <- table1(
-  ~ age_at_diag + final_receptor_group + demo_race_final + 
-    final_tumor_grade + final_overall_stage + 
-    final_t_stage + final_n_stage + 
-    histology_category + prtx_radiation + 
-    prtx_chemo + prtx_endo + prtx_bonemod + 
-    node_status + axillary_dissection + 
-    diag_surgery_type_1 + diag_neoadj_chemo_1 +diag_pcr_1 | 
-    ctDNA_ever,
-  data = unique_subset_data,
-  overall = c(left = "Total"),
-  caption = "Table 1: Summary of demographic and clinical variables by ctDNA status"
-)
-
-
-####
-pvalue_function <- function(x, ...) {
-  # Drop the "overall" (Total) stratum so only ctDNA Negative vs. ctDNA Positive are compared
-  x <- x[!names(x) %in% "overall"]
-  y <- unlist(x)
-  g <- factor(rep(seq_along(x), times = lengths(x)))
-  
-  # Return NA unless exactly two groups are being compared
-  if (length(unique(g)) != 2) {
-    return(NA)
-  }
-
-  # Perform the appropriate test based on the type of variable
-  if (is.numeric(y)) {
-    # Continuous variables: Welch two-sample t-test (the t.test() default)
-    p <- t.test(y ~ g)$p.value
-  } else {
-    # Categorical variables: chi-squared test, or Fisher's exact test when
-    # any expected (not observed) cell count is below 5
-    table_result <- table(y, g)
-    expected_counts <- suppressWarnings(chisq.test(table_result))$expected
-    
-    if (any(expected_counts < 5)) {
-      p <- fisher.test(table_result)$p.value
-    } else {
-      p <- chisq.test(table_result)$p.value
-    }
-  }
-  
-  # Format the p-value for output
-  formatted_p <- format.pval(p, digits = 3, eps = 0.001)
-  return(formatted_p)
-}
-  
-
-# Generate table1 with the p-value column
-table1_p <- table1(
-  ~ age_at_diag + final_receptor_group + demo_race_final + 
-    final_tumor_grade + final_overall_stage + 
-    final_t_stage + final_n_stage + 
-    histology_category + prtx_radiation + 
-    prtx_chemo + prtx_endo + prtx_bonemod + 
-    node_status + axillary_dissection + 
-    diag_surgery_type_1 + diag_neoadj_chemo_1 + diag_pcr_1| 
-    ctDNA_ever,
-  data = unique_subset_data,
-  overall = c(left = "Total"),
-  extra.col = list("P-value" = pvalue_function),  # Add p-value function
-  extra.col.pos = 4  # Position of the extra column
-)
-
-
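The usual rule of thumb for choosing between the chi-squared test and Fisher's exact test looks at expected cell counts (below 5), not observed counts. With only 9 ctDNA-positive participants, each expected count in that column is roughly 9/109 of the row total, which is below 5 for any row with fewer than about 61 participants; since every categorical variable here has at least one such row, in practice each categorical comparison ends up using Fisher's test. The sketch below checks this for the Final Receptor Group counts copied from Table 1:

```r
# Final Receptor Group by ctDNA status, counts copied from Table 1
receptor_by_ctdna <- matrix(
  c(44, 45, 8, 3,   # ctDNA Negative
     1,  7, 0, 1),  # ctDNA Positive
  ncol = 2,
  dimnames = list(
    c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+"),
    c("ctDNA Negative", "ctDNA Positive")
  )
)

# Expected counts under independence; suppress the small-count warning
expected <- suppressWarnings(chisq.test(receptor_by_ctdna))$expected
round(expected, 2)

# Fall back to Fisher's exact test when any expected count is below 5
use_fisher <- any(expected < 5)
p <- if (use_fisher) {
  fisher.test(receptor_by_ctdna)$p.value
} else {
  chisq.test(receptor_by_ctdna)$p.value
}
format.pval(p, digits = 3, eps = 0.001)
```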
$overall
-  [1] 55.89870 49.25667 52.87611 29.93840 37.00753 48.98563 63.80835 40.89802
-  [9] 43.59754 38.57632 41.77687 45.68925 59.94524 59.43600 52.14511 42.93771
- [17] 64.69541 55.14031 41.26762 39.52361 57.76044 44.42984 51.34565 42.27789
- [25] 57.05133 57.62628 54.86927 44.18891 63.62491 36.00548 55.57837 30.71595
- [33] 41.28953 59.38946 48.79945 59.15400 48.97194 59.39767 39.67967 67.68515
- [41] 41.84531 48.16975 58.07529 62.49966 46.64476 47.34565 52.09856 36.58042
- [49] 58.26146 61.76318 61.73580 39.40862 55.30459 53.10335 43.30459 48.46270
- [57] 44.07666 52.55305 56.45996 67.72621 39.59206 51.82752 58.28611 46.93498
- [65] 31.17591 55.96441 46.38741 46.33812 40.62971 37.67556 32.35318 48.75291
- [73] 56.22177 39.41136 49.76591 43.22245 36.01095 41.30322 59.57016 39.65503
- [81] 54.94593 43.50992 48.80767 62.10541 63.35934 57.31417 59.74264 66.92676
- [89] 36.30938 34.83641 55.12115 52.07118 27.33744 64.41342 56.09035 47.90691
- [97] 51.38125 41.71663 48.47639 40.52567 60.39151 52.51198 60.87064 58.61465
-[105] 38.60370 68.93634 37.84531 51.43874 52.68720
-attr(,"label")
-[1] "Age at Diagnosis"
-attr(,"units")
-[1] "years"
-
-$`ctDNA Negative`
-  [1] 55.89870 49.25667 52.87611 29.93840 37.00753 48.98563 40.89802 43.59754
-  [9] 38.57632 41.77687 45.68925 59.94524 59.43600 52.14511 42.93771 64.69541
- [17] 55.14031 41.26762 39.52361 57.76044 44.42984 51.34565 42.27789 57.05133
- [25] 57.62628 54.86927 44.18891 36.00548 30.71595 41.28953 59.38946 59.15400
- [33] 48.97194 59.39767 39.67967 67.68515 41.84531 48.16975 62.49966 46.64476
- [41] 47.34565 52.09856 36.58042 58.26146 61.76318 61.73580 39.40862 55.30459
- [49] 53.10335 43.30459 48.46270 44.07666 52.55305 56.45996 67.72621 39.59206
- [57] 51.82752 58.28611 46.93498 31.17591 55.96441 46.33812 40.62971 37.67556
- [65] 32.35318 48.75291 56.22177 39.41136 49.76591 43.22245 36.01095 41.30322
- [73] 59.57016 39.65503 54.94593 43.50992 48.80767 62.10541 63.35934 57.31417
- [81] 59.74264 66.92676 36.30938 34.83641 55.12115 27.33744 56.09035 47.90691
- [89] 51.38125 41.71663 48.47639 40.52567 60.39151 52.51198 60.87064 58.61465
- [97] 68.93634 37.84531 51.43874 52.68720
-
-$`ctDNA Positive`
-[1] 63.80835 63.62491 55.57837 48.79945 58.07529 46.38741 52.07118 64.41342
-[9] 38.60370
-
-$overall
-  [1] HR+ HER2- HR+ HER2- HR+ HER2- TNBC      TNBC      HR+ HER2- HR- HER2+
-  [8] HR+ HER2+ TNBC      TNBC      TNBC      HR+ HER2- HR+ HER2- HR+ HER2+
- [15] TNBC      TNBC      HR+ HER2- TNBC      TNBC      TNBC      TNBC     
- [22] HR+ HER2- HR+ HER2- TNBC      TNBC      TNBC      TNBC      TNBC     
- [29] HR+ HER2- TNBC      TNBC      HR+ HER2+ TNBC      HR+ HER2- HR+ HER2-
- [36] HR+ HER2+ TNBC      HR+ HER2- TNBC      HR+ HER2- TNBC      TNBC     
- [43] HR+ HER2- HR+ HER2- HR+ HER2+ HR+ HER2- HR+ HER2- HR+ HER2+ HR+ HER2-
- [50] HR+ HER2- TNBC      TNBC      TNBC      HR+ HER2- TNBC      HR+ HER2-
- [57] TNBC      TNBC      TNBC      TNBC      TNBC      HR+ HER2- HR+ HER2-
- [64] TNBC      TNBC      TNBC      HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2-
- [71] TNBC      HR+ HER2- HR+ HER2- HR+ HER2- TNBC      HR+ HER2+ HR+ HER2-
- [78] TNBC      HR+ HER2- HR+ HER2- TNBC      HR- HER2+ TNBC      HR+ HER2-
- [85] HR+ HER2+ HR+ HER2- HR+ HER2- TNBC      TNBC      HR+ HER2- HR+ HER2-
- [92] HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2- TNBC     
- [99] HR+ HER2- TNBC      HR+ HER2- HR- HER2+ HR+ HER2- HR+ HER2- HR+ HER2-
-[106] HR+ HER2- HR- HER2+ TNBC      HR+ HER2-
-attr(,"label")
-[1] Final Receptor Group
-Levels: TNBC HR+ HER2- HR+ HER2+ HR- HER2+
-
-$`ctDNA Negative`
-  [1] HR+ HER2- HR+ HER2- HR+ HER2- TNBC      TNBC      HR+ HER2- HR+ HER2+
-  [8] TNBC      TNBC      TNBC      HR+ HER2- HR+ HER2- HR+ HER2+ TNBC     
- [15] TNBC      HR+ HER2- TNBC      TNBC      TNBC      TNBC      HR+ HER2-
- [22] HR+ HER2- TNBC      TNBC      TNBC      TNBC      TNBC      TNBC     
- [29] HR+ HER2+ TNBC      HR+ HER2- HR+ HER2+ TNBC      HR+ HER2- TNBC     
- [36] HR+ HER2- TNBC      TNBC      HR+ HER2- HR+ HER2+ HR+ HER2- HR+ HER2-
- [43] HR+ HER2+ HR+ HER2- HR+ HER2- TNBC      TNBC      TNBC      HR+ HER2-
- [50] TNBC      HR+ HER2- TNBC      TNBC      TNBC      TNBC      TNBC     
- [57] HR+ HER2- HR+ HER2- TNBC      TNBC      TNBC      HR+ HER2- HR+ HER2-
- [64] HR+ HER2- TNBC      HR+ HER2- HR+ HER2- HR+ HER2- TNBC      HR+ HER2+
- [71] HR+ HER2- TNBC      HR+ HER2- HR+ HER2- TNBC      HR- HER2+ TNBC     
- [78] HR+ HER2- HR+ HER2+ HR+ HER2- HR+ HER2- TNBC      TNBC      HR+ HER2-
- [85] HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2- TNBC      HR+ HER2-
- [92] TNBC      HR+ HER2- HR- HER2+ HR+ HER2- HR+ HER2- HR+ HER2- HR- HER2+
- [99] TNBC      HR+ HER2-
-Levels: TNBC HR+ HER2- HR+ HER2+ HR- HER2+
-
-$`ctDNA Positive`
-[1] HR- HER2+ HR+ HER2- TNBC      HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2-
-[8] HR+ HER2- HR+ HER2-
-Levels: TNBC HR+ HER2- HR+ HER2+ HR- HER2+
-
-$overall
-  [1] White White White White White White Black White White White White White
- [13] White White White White White White Black White White White White White
- [25] White Black White White White White White White Black White White White
- [37] White White White White White White White White White White White Black
- [49] White White White White White White White White Black White White White
- [61] White White White White Black White White White White White White White
- [73] White White White White White White White White Black White White White
- [85] White White White White White White White White Asian White White White
- [97] White Black White White White White White White White White White White
-[109] White
-attr(,"label")
-[1] Race
-Levels: Black Asian White
-
-$`ctDNA Negative`
-  [1] White White White White White White White White White White White White
- [13] White White White White White Black White White White White White White
- [25] Black White White White White Black White White White White White White
- [37] White White White White White White Black White White White White White
- [49] White White White Black White White White White White White White Black
- [61] White White White White White White White White White White White White
- [73] White White Black White White White White White White White White White
- [85] White Asian White White White Black White White White White White White
- [97] White White White White
-Levels: Black Asian White
-
-$`ctDNA Positive`
-[1] Black White White White White White White White White
-Levels: Black Asian White
-
-$overall
-  [1] Grade 2 Grade 2 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
- [10] Grade 3 Grade 3 Grade 3 <NA>    Grade 2 Grade 3 Grade 3 Grade 3 Grade 3
- [19] Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
- [28] Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 1 Grade 1 Grade 3
- [37] Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 2 Grade 2
- [46] Grade 1 Grade 3 Grade 1 Grade 1 Grade 1 Grade 3 Grade 3 Grade 3 <NA>   
- [55] Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 1 Grade 3
- [64] Grade 3 Grade 3 Grade 3 Grade 1 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3
- [73] Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
- [82] Grade 3 Grade 3 Grade 2 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 1
- [91] Grade 3 Grade 1 Grade 1 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3 Grade 1
-[100] Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
-[109] Grade 3
-attr(,"label")
-[1] Tumor Grade
-Levels: Grade 3 Grade 1 Grade 2
-
-$`ctDNA Negative`
-  [1] Grade 2 Grade 2 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
- [10] Grade 3 Grade 3 <NA>    Grade 2 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
- [19] Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
- [28] Grade 3 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3
- [37] Grade 3 Grade 3 Grade 2 Grade 2 Grade 1 Grade 3 Grade 1 Grade 1 Grade 1
- [46] Grade 3 Grade 3 Grade 3 <NA>    Grade 3 Grade 1 Grade 3 Grade 3 Grade 3
- [55] Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 1 Grade 3
- [64] Grade 3 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
- [73] Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 2 Grade 3 Grade 3 Grade 1
- [82] Grade 3 Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3
- [91] Grade 1 Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3
-[100] Grade 3
-Levels: Grade 3 Grade 1 Grade 2
-
-$`ctDNA Positive`
-[1] Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 1 Grade 1 Grade 1 Grade 3
-Levels: Grade 3 Grade 1 Grade 2
-
-$overall
-  [1] Stage III Stage I   Stage III Stage II  Stage II  Stage III Stage III
-  [8] Stage III Stage II  Stage I   Stage II  Stage II  Stage II  Stage II 
- [15] Stage I   Stage II  Stage I   Stage II  Stage II  Stage I   Stage II 
- [22] Stage III Stage III Stage II  Stage II  Stage II  Stage I   Stage II 
- [29] Stage III Stage II  Stage I   Stage II  Stage I   Stage II  Stage III
- [36] Stage II  Stage II  Stage III Stage I   Stage I   Stage II  Stage III
- [43] Stage II  Stage I   Stage I   Stage III Stage II  Stage III Stage II 
- [50] Stage I   Stage II  Stage II  Stage I   Stage II  Stage I   Stage III
- [57] Stage I   Stage II  Stage I   Stage II  Stage II  Stage III Stage I  
- [64] Stage II  Stage II  Stage II  Stage III Stage II  Stage I   Stage II 
- [71] Stage I   Stage II  Stage I   Stage III Stage III Stage I   Stage III
- [78] Stage I   Stage I   Stage I   Stage I   Stage II  Stage III Stage I  
- [85] Stage II  Stage I   Stage III Stage I   Stage II  Stage II  Stage II 
- [92] Stage III Stage II  Stage III Stage III Stage I   Stage II  Stage II 
- [99] <NA>      Stage I   Stage I   Stage III Stage III Stage I   Stage I  
-[106] Stage II  Stage II  Stage I   Stage II 
-attr(,"label")
-[1] Overall Stage
-Levels: Stage I Stage II Stage III
-
-$`ctDNA Negative`
-  [1] Stage III Stage I   Stage III Stage II  Stage II  Stage III Stage III
-  [8] Stage II  Stage I   Stage II  Stage II  Stage II  Stage II  Stage I  
- [15] Stage II  Stage I   Stage II  Stage II  Stage I   Stage II  Stage III
- [22] Stage III Stage II  Stage II  Stage II  Stage I   Stage II  Stage II 
- [29] Stage II  Stage I   Stage II  Stage II  Stage II  Stage III Stage I  
- [36] Stage I   Stage II  Stage III Stage I   Stage I   Stage III Stage II 
- [43] Stage III Stage II  Stage I   Stage II  Stage II  Stage I   Stage II 
- [50] Stage I   Stage III Stage I   Stage II  Stage I   Stage II  Stage II 
- [57] Stage III Stage I   Stage II  Stage II  Stage II  Stage II  Stage I  
- [64] Stage II  Stage I   Stage II  Stage I   Stage III Stage III Stage I  
- [71] Stage III Stage I   Stage I   Stage I   Stage I   Stage II  Stage III
- [78] Stage I   Stage II  Stage I   Stage III Stage I   Stage II  Stage II 
- [85] Stage II  Stage II  Stage III Stage I   Stage II  Stage II  <NA>     
- [92] Stage I   Stage I   Stage III Stage III Stage I   Stage II  Stage II 
- [99] Stage I   Stage II 
-Levels: Stage I Stage II Stage III
-
-$`ctDNA Positive`
-[1] Stage III Stage III Stage I   Stage III Stage II  Stage III Stage III
-[8] Stage III Stage I  
-Levels: Stage I Stage II Stage III
-
-$overall
-  [1] T2   T1   T2   T2   T2   T3   T2   T1   T2   T1   T2   T2   T3   T1   T1  
- [16] T2   T1   T1   T2   T1   T2   T2   T2   T2   T1   T2   T1   <NA> T2   T2  
- [31] T1   T1   T1   T3   T4   T2   T2   T1   T1   T1   T2   T3   T2   T1   T1  
- [46] T3   T1   T1   T2   T1   T2   T2   T1   T1   T1   T3   T1   T1   T1   T2  
- [61] T2   T3   T1   T2   T2   T2   T1   T1   T1   T2   T1   T2   T1   T3   T3  
- [76] T1   T2   T1   T1   T1   T1   T2   T2   T1   T2   T1   T1   T1   T2   T2  
- [91] T2   T3   T2   T2   T1   T1   T2   T1   T1   T1   T1   T3   T3   T1   T1  
-[106] T2   T2   T1   T2  
-attr(,"label")
-[1] T Stage
-Levels: T1 T2 T3 T4
-
-$`ctDNA Negative`
-  [1] T2   T1   T2   T2   T2   T3   T1   T2   T1   T2   T2   T3   T1   T1   T2  
- [16] T1   T1   T2   T1   T2   T2   T2   T2   T1   T2   T1   <NA> T2   T1   T1  
- [31] T3   T2   T2   T1   T1   T1   T2   T3   T1   T1   T3   T1   T1   T2   T1  
- [46] T2   T2   T1   T1   T1   T3   T1   T1   T1   T2   T2   T3   T1   T2   T2  
- [61] T2   T1   T1   T2   T1   T2   T1   T3   T3   T1   T2   T1   T1   T1   T1  
- [76] T2   T2   T1   T2   T1   T1   T1   T2   T2   T2   T2   T1   T1   T2   T1  
- [91] T1   T1   T1   T3   T3   T1   T2   T2   T1   T2  
-Levels: T1 T2 T3 T4
-
-$`ctDNA Positive`
-[1] T2 T2 T1 T4 T2 T1 T3 T2 T1
-Levels: T1 T2 T3 T4
-
-$overall
-  [1] N3 N0 N3 N0 N0 N2 N2 N2 N0 N0 N0 N1 N0 N1 N0 N1 N0 N1 N0 N0 N0 N3 N3 N1 N1
- [26] N0 N0 N0 N2 N0 N0 N1 N0 N0 N2 N0 N0 N2 N0 N0 N1 N2 N0 N0 N0 N2 N1 N2 N1 N0
- [51] N1 N1 N0 N1 N0 N1 N0 N1 N1 N1 N0 N1 N0 N1 N1 N0 N3 N1 N1 N1 N0 N1 N1 N3 N1
- [76] N1 N3 N0 N0 N1 N0 N1 N1 N1 N0 N0 N2 N0 N1 N1 N1 N2 N1 N2 N2 N0 N1 N1 N1 N0
-[101] N1 N1 N1 N0 N0 N1 N1 N0 N0
-attr(,"label")
-[1] N Stage
-Levels: N0 N1 N2 N3
-
-$`ctDNA Negative`
-  [1] N3 N0 N3 N0 N0 N2 N2 N0 N0 N0 N1 N0 N1 N0 N1 N0 N1 N0 N0 N0 N3 N3 N1 N1 N0
- [26] N0 N0 N0 N1 N0 N0 N0 N0 N2 N0 N0 N1 N2 N0 N0 N2 N1 N2 N1 N0 N1 N1 N0 N1 N0
- [51] N1 N0 N1 N1 N1 N0 N1 N0 N1 N1 N0 N1 N1 N1 N0 N1 N1 N3 N1 N1 N3 N0 N0 N1 N0
- [76] N1 N1 N1 N0 N0 N2 N0 N1 N1 N1 N1 N2 N0 N1 N1 N1 N0 N1 N1 N1 N0 N1 N1 N0 N0
-Levels: N0 N1 N2 N3
-
-$`ctDNA Positive`
-[1] N2 N2 N0 N2 N0 N3 N2 N2 N0
-Levels: N0 N1 N2 N3
-
-$overall
-  [1] Both Ductal and Lobular Ductal                  Ductal                 
-  [4] Ductal                  Ductal                  Lobular                
-  [7] Ductal                  Ductal                  Ductal                 
- [10] Ductal                  Ductal                  Ductal                 
- [13] Lobular                 Ductal                  Ductal                 
- [16] Ductal                  Ductal                  Ductal                 
- [19] Ductal                  Ductal                  Ductal                 
- [22] Ductal                  Ductal                  Ductal                 
- [25] Ductal                  Ductal                  Ductal                 
- [28] Ductal                  Ductal                  Ductal                 
- [31] Ductal                  Ductal                  Ductal                 
- [34] Lobular                 Ductal                  Ductal                 
- [37] Ductal                  Ductal                  Ductal                 
- [40] Ductal                  Ductal                  Ductal                 
- [43] Ductal                  Ductal                  Other                  
- [46] Lobular                 Ductal                  Ductal                 
- [49] Lobular                 Lobular                 Ductal                 
- [52] Ductal                  Ductal                  Lobular                
- [55] Ductal                  Lobular                 Ductal                 
- [58] Ductal                  Ductal                  Ductal                 
- [61] Other                   Lobular                 Ductal                 
- [64] Ductal                  Ductal                  Ductal                 
- [67] Lobular                 Ductal                  Ductal                 
- [70] Ductal                  Ductal                  Ductal                 
- [73] Both Ductal and Lobular Ductal                  Ductal                 
- [76] Both Ductal and Lobular Both Ductal and Lobular Ductal                 
- [79] Ductal                  Both Ductal and Lobular Ductal                 
- [82] Ductal                  Ductal                  Ductal                 
- [85] Ductal                  Ductal                  Ductal                 
- [88] Ductal                  Ductal                  Ductal                 
- [91] Both Ductal and Lobular Lobular                 Ductal                 
- [94] Lobular                 Ductal                  Ductal                 
- [97] Both Ductal and Lobular Ductal                  Both Ductal and Lobular
-[100] Ductal                  Lobular                 Ductal                 
-[103] Lobular                 Ductal                  Ductal                 
-[106] Ductal                  Ductal                  Ductal                 
-[109] Both Ductal and Lobular
-Levels: Both Ductal and Lobular Ductal Lobular Other
-
-$`ctDNA Negative`
-  [1] Both Ductal and Lobular Ductal                  Ductal                 
-  [4] Ductal                  Ductal                  Lobular                
-  [7] Ductal                  Ductal                  Ductal                 
- [10] Ductal                  Ductal                  Lobular                
- [13] Ductal                  Ductal                  Ductal                 
- [16] Ductal                  Ductal                  Ductal                 
- [19] Ductal                  Ductal                  Ductal                 
- [22] Ductal                  Ductal                  Ductal                 
- [25] Ductal                  Ductal                  Ductal                 
- [28] Ductal                  Ductal                  Ductal                 
- [31] Lobular                 Ductal                  Ductal                 
- [34] Ductal                  Ductal                  Ductal                 
- [37] Ductal                  Ductal                  Ductal                 
- [40] Other                   Lobular                 Ductal                 
- [43] Ductal                  Lobular                 Lobular                
- [46] Ductal                  Ductal                  Ductal                 
- [49] Lobular                 Ductal                  Lobular                
- [52] Ductal                  Ductal                  Ductal                 
- [55] Ductal                  Other                   Lobular                
- [58] Ductal                  Ductal                  Ductal                 
- [61] Ductal                  Ductal                  Ductal                 
- [64] Ductal                  Ductal                  Ductal                 
- [67] Both Ductal and Lobular Ductal                  Ductal                 
- [70] Both Ductal and Lobular Both Ductal and Lobular Ductal                 
- [73] Ductal                  Both Ductal and Lobular Ductal                 
- [76] Ductal                  Ductal                  Ductal                 
- [79] Ductal                  Ductal                  Ductal                 
- [82] Ductal                  Ductal                  Ductal                 
- [85] Both Ductal and Lobular Ductal                  Ductal                 
- [88] Ductal                  Both Ductal and Lobular Ductal                 
- [91] Both Ductal and Lobular Ductal                  Lobular                
- [94] Ductal                  Lobular                 Ductal                 
- [97] Ductal                  Ductal                  Ductal                 
-[100] Both Ductal and Lobular
-Levels: Both Ductal and Lobular Ductal Lobular Other
-
-$`ctDNA Positive`
-[1] Ductal  Ductal  Ductal  Ductal  Ductal  Lobular Lobular Lobular Ductal 
-Levels: Ductal Lobular
-
-$overall
-  [1] Radiation    No Radiation No Radiation No Radiation No Radiation
-  [6] Radiation    Radiation    Radiation    Radiation    No Radiation
- [11] Radiation    Radiation    Radiation    Radiation    No Radiation
- [16] No Radiation Radiation    Radiation    No Radiation Radiation   
- [21] Radiation    Radiation    Radiation    Radiation    No Radiation
- [26] Radiation    No Radiation No Radiation Radiation    No Radiation
- [31] Radiation    Radiation    No Radiation Radiation    Radiation   
- [36] No Radiation No Radiation Radiation    No Radiation No Radiation
- [41] Radiation    Radiation    Radiation    Radiation    Radiation   
- [46] Radiation    No Radiation Radiation    Radiation    No Radiation
- [51] Radiation    Radiation    Radiation    No Radiation Radiation   
- [56] Radiation    No Radiation Radiation    Radiation    No Radiation
- [61] No Radiation Radiation    Radiation    Radiation    Radiation   
- [66] Radiation    Radiation    Radiation    No Radiation Radiation   
- [71] No Radiation Radiation    Radiation    Radiation    Radiation   
- [76] No Radiation Radiation    Radiation    Radiation    Radiation   
- [81] Radiation    Radiation    Radiation    Radiation    Radiation   
- [86] Radiation    Radiation    Radiation    Radiation    Radiation   
- [91] Radiation    Radiation    Radiation    Radiation    Radiation   
- [96] No Radiation Radiation    Radiation    No Radiation No Radiation
-[101] No Radiation Radiation    Radiation    Radiation    No Radiation
-[106] No Radiation Radiation    No Radiation No Radiation
-attr(,"label")
-[1] Radiation
-Levels: No Radiation Radiation
-
-$`ctDNA Negative`
-  [1] Radiation    No Radiation No Radiation No Radiation No Radiation
-  [6] Radiation    Radiation    Radiation    No Radiation Radiation   
- [11] Radiation    Radiation    Radiation    No Radiation No Radiation
- [16] Radiation    Radiation    No Radiation Radiation    Radiation   
- [21] Radiation    Radiation    Radiation    No Radiation Radiation   
- [26] No Radiation No Radiation No Radiation Radiation    No Radiation
- [31] Radiation    No Radiation No Radiation Radiation    No Radiation
- [36] No Radiation Radiation    Radiation    Radiation    Radiation   
- [41] Radiation    No Radiation Radiation    Radiation    No Radiation
- [46] Radiation    Radiation    Radiation    No Radiation Radiation   
- [51] Radiation    No Radiation Radiation    Radiation    No Radiation
- [56] No Radiation Radiation    Radiation    Radiation    Radiation   
- [61] Radiation    Radiation    No Radiation Radiation    No Radiation
- [66] Radiation    Radiation    Radiation    Radiation    No Radiation
- [71] Radiation    Radiation    Radiation    Radiation    Radiation   
- [76] Radiation    Radiation    Radiation    Radiation    Radiation   
- [81] Radiation    Radiation    Radiation    Radiation    Radiation   
- [86] Radiation    Radiation    No Radiation Radiation    Radiation   
- [91] No Radiation No Radiation No Radiation Radiation    Radiation   
- [96] Radiation    No Radiation Radiation    No Radiation No Radiation
-Levels: No Radiation Radiation
-
-$`ctDNA Positive`
-[1] Radiation    Radiation    Radiation    Radiation    Radiation   
-[6] Radiation    Radiation    Radiation    No Radiation
-Levels: No Radiation Radiation
-
-$overall
-  [1] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-  [9] Chemo    Chemo    Chemo    Chemo    Chemo    No Chemo Chemo    Chemo   
- [17] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [25] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [33] Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo    Chemo   
- [41] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [49] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [57] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [65] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [73] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [81] Chemo    Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo   
- [89] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [97] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-[105] Chemo    Chemo    Chemo    Chemo    Chemo   
-attr(,"label")
-[1] Chemo
-Levels: No Chemo Chemo
-
-$`ctDNA Negative`
-  [1] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-  [9] Chemo    Chemo    Chemo    Chemo    No Chemo Chemo    Chemo    Chemo   
- [17] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [25] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [33] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [41] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [49] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [57] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [65] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [73] Chemo    Chemo    Chemo    Chemo    Chemo    No Chemo Chemo    Chemo   
- [81] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [89] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [97] Chemo    Chemo    Chemo    Chemo   
-Levels: No Chemo Chemo
-
-$`ctDNA Positive`
-[1] Chemo    Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo   
-[9] Chemo   
-Levels: No Chemo Chemo
-
-$overall
-  [1] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
-  [4] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
-  [7] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [10] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [13] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [16] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [19] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
- [22] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [25] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
- [28] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [31] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [34] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [37] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [40] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
- [43] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [46] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [49] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [52] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [55] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [58] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
- [61] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
- [64] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
- [67] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [70] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
- [73] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [76] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [79] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [82] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [85] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [88] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [91] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [94] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [97] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
-[100] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
-[103] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
-[106] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
-[109] Endocrine Therapy   
-attr(,"label")
-[1] Endocrine Therapy
-Levels: No Endocrine Therapy Endocrine Therapy
-
-$`ctDNA Negative`
-  [1] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
-  [4] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
-  [7] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
- [10] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
- [13] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
- [16] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
- [19] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [22] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
- [25] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
- [28] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [31] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [34] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
- [37] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [40] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [43] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [46] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
- [49] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
- [52] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
- [55] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [58] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
- [61] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
- [64] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
- [67] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [70] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [73] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [76] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [79] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [82] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [85] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [88] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [91] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
- [94] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
- [97] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
-[100] Endocrine Therapy   
-Levels: No Endocrine Therapy Endocrine Therapy
-
-$`ctDNA Positive`
-[1] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
-[4] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
-[7] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
-Levels: No Endocrine Therapy Endocrine Therapy
-
-$overall
-  [1] No Bone Modifying Treatment No Bone Modifying Treatment
-  [3] No Bone Modifying Treatment No Bone Modifying Treatment
-  [5] No Bone Modifying Treatment Bone Modifying Treatment   
-  [7] No Bone Modifying Treatment No Bone Modifying Treatment
-  [9] No Bone Modifying Treatment No Bone Modifying Treatment
- [11] No Bone Modifying Treatment No Bone Modifying Treatment
- [13] No Bone Modifying Treatment No Bone Modifying Treatment
- [15] No Bone Modifying Treatment No Bone Modifying Treatment
- [17] No Bone Modifying Treatment No Bone Modifying Treatment
- [19] No Bone Modifying Treatment No Bone Modifying Treatment
- [21] No Bone Modifying Treatment Bone Modifying Treatment   
- [23] Bone Modifying Treatment    No Bone Modifying Treatment
- [25] No Bone Modifying Treatment No Bone Modifying Treatment
- [27] No Bone Modifying Treatment No Bone Modifying Treatment
- [29] No Bone Modifying Treatment No Bone Modifying Treatment
- [31] No Bone Modifying Treatment No Bone Modifying Treatment
- [33] No Bone Modifying Treatment Bone Modifying Treatment   
- [35] No Bone Modifying Treatment Bone Modifying Treatment   
- [37] No Bone Modifying Treatment Bone Modifying Treatment   
- [39] No Bone Modifying Treatment Bone Modifying Treatment   
- [41] No Bone Modifying Treatment No Bone Modifying Treatment
- [43] No Bone Modifying Treatment Bone Modifying Treatment   
- [45] Bone Modifying Treatment    Bone Modifying Treatment   
- [47] No Bone Modifying Treatment Bone Modifying Treatment   
- [49] No Bone Modifying Treatment Bone Modifying Treatment   
- [51] Bone Modifying Treatment    No Bone Modifying Treatment
- [53] No Bone Modifying Treatment No Bone Modifying Treatment
- [55] No Bone Modifying Treatment Bone Modifying Treatment   
- [57] Bone Modifying Treatment    No Bone Modifying Treatment
- [59] No Bone Modifying Treatment No Bone Modifying Treatment
- [61] No Bone Modifying Treatment Bone Modifying Treatment   
- [63] No Bone Modifying Treatment No Bone Modifying Treatment
- [65] No Bone Modifying Treatment No Bone Modifying Treatment
- [67] Bone Modifying Treatment    No Bone Modifying Treatment
- [69] Bone Modifying Treatment    Bone Modifying Treatment   
- [71] No Bone Modifying Treatment No Bone Modifying Treatment
- [73] No Bone Modifying Treatment Bone Modifying Treatment   
- [75] Bone Modifying Treatment    Bone Modifying Treatment   
- [77] Bone Modifying Treatment    No Bone Modifying Treatment
- [79] Bone Modifying Treatment    Bone Modifying Treatment   
- [81] No Bone Modifying Treatment No Bone Modifying Treatment
- [83] No Bone Modifying Treatment Bone Modifying Treatment   
- [85] Bone Modifying Treatment    Bone Modifying Treatment   
- [87] Bone Modifying Treatment    No Bone Modifying Treatment
- [89] No Bone Modifying Treatment Bone Modifying Treatment   
- [91] Bone Modifying Treatment    Bone Modifying Treatment   
- [93] Bone Modifying Treatment    Bone Modifying Treatment   
- [95] Bone Modifying Treatment    No Bone Modifying Treatment
- [97] Bone Modifying Treatment    No Bone Modifying Treatment
- [99] No Bone Modifying Treatment No Bone Modifying Treatment
-[101] Bone Modifying Treatment    No Bone Modifying Treatment
-[103] Bone Modifying Treatment    No Bone Modifying Treatment
-[105] Bone Modifying Treatment    No Bone Modifying Treatment
-[107] No Bone Modifying Treatment No Bone Modifying Treatment
-[109] No Bone Modifying Treatment
-attr(,"label")
-[1] Bone Modifying Treatment
-Levels: No Bone Modifying Treatment Bone Modifying Treatment
-
-$`ctDNA Negative`
-  [1] No Bone Modifying Treatment No Bone Modifying Treatment
-  [3] No Bone Modifying Treatment No Bone Modifying Treatment
-  [5] No Bone Modifying Treatment Bone Modifying Treatment   
-  [7] No Bone Modifying Treatment No Bone Modifying Treatment
-  [9] No Bone Modifying Treatment No Bone Modifying Treatment
- [11] No Bone Modifying Treatment No Bone Modifying Treatment
- [13] No Bone Modifying Treatment No Bone Modifying Treatment
- [15] No Bone Modifying Treatment No Bone Modifying Treatment
- [17] No Bone Modifying Treatment No Bone Modifying Treatment
- [19] No Bone Modifying Treatment No Bone Modifying Treatment
- [21] Bone Modifying Treatment    Bone Modifying Treatment   
- [23] No Bone Modifying Treatment No Bone Modifying Treatment
- [25] No Bone Modifying Treatment No Bone Modifying Treatment
- [27] No Bone Modifying Treatment No Bone Modifying Treatment
- [29] No Bone Modifying Treatment No Bone Modifying Treatment
- [31] Bone Modifying Treatment    Bone Modifying Treatment   
- [33] No Bone Modifying Treatment Bone Modifying Treatment   
- [35] No Bone Modifying Treatment Bone Modifying Treatment   
- [37] No Bone Modifying Treatment No Bone Modifying Treatment
- [39] Bone Modifying Treatment    Bone Modifying Treatment   
- [41] Bone Modifying Treatment    No Bone Modifying Treatment
- [43] Bone Modifying Treatment    No Bone Modifying Treatment
- [45] Bone Modifying Treatment    Bone Modifying Treatment   
- [47] No Bone Modifying Treatment No Bone Modifying Treatment
- [49] No Bone Modifying Treatment No Bone Modifying Treatment
- [51] Bone Modifying Treatment    Bone Modifying Treatment   
- [53] No Bone Modifying Treatment No Bone Modifying Treatment
- [55] No Bone Modifying Treatment No Bone Modifying Treatment
- [57] Bone Modifying Treatment    No Bone Modifying Treatment
- [59] No Bone Modifying Treatment No Bone Modifying Treatment
- [61] No Bone Modifying Treatment No Bone Modifying Treatment
- [63] Bone Modifying Treatment    Bone Modifying Treatment   
- [65] No Bone Modifying Treatment No Bone Modifying Treatment
- [67] No Bone Modifying Treatment Bone Modifying Treatment   
- [69] Bone Modifying Treatment    Bone Modifying Treatment   
- [71] Bone Modifying Treatment    No Bone Modifying Treatment
- [73] Bone Modifying Treatment    Bone Modifying Treatment   
- [75] No Bone Modifying Treatment No Bone Modifying Treatment
- [77] No Bone Modifying Treatment Bone Modifying Treatment   
- [79] Bone Modifying Treatment    Bone Modifying Treatment   
- [81] Bone Modifying Treatment    No Bone Modifying Treatment
- [83] No Bone Modifying Treatment Bone Modifying Treatment   
- [85] Bone Modifying Treatment    Bone Modifying Treatment   
- [87] Bone Modifying Treatment    No Bone Modifying Treatment
- [89] Bone Modifying Treatment    No Bone Modifying Treatment
- [91] No Bone Modifying Treatment No Bone Modifying Treatment
- [93] Bone Modifying Treatment    No Bone Modifying Treatment
- [95] Bone Modifying Treatment    No Bone Modifying Treatment
- [97] No Bone Modifying Treatment No Bone Modifying Treatment
- [99] No Bone Modifying Treatment No Bone Modifying Treatment
-Levels: No Bone Modifying Treatment Bone Modifying Treatment
-
-$`ctDNA Positive`
-[1] No Bone Modifying Treatment No Bone Modifying Treatment
-[3] No Bone Modifying Treatment No Bone Modifying Treatment
-[5] No Bone Modifying Treatment Bone Modifying Treatment   
-[7] Bone Modifying Treatment    Bone Modifying Treatment   
-[9] Bone Modifying Treatment   
-Levels: No Bone Modifying Treatment Bone Modifying Treatment
-
-$overall
-  [1] Node Positive Node Negative Node Positive Node Negative Node Negative
-  [6] Node Positive Node Positive Node Positive Node Negative Node Negative
- [11] Node Negative Node Positive Node Negative Node Positive Node Negative
- [16] Node Positive Node Negative Node Positive Node Negative Node Negative
- [21] Node Negative Node Positive Node Positive Node Positive Node Positive
- [26] Node Negative Node Negative Node Negative Node Positive Node Negative
- [31] Node Negative Node Positive Node Negative Node Negative Node Positive
- [36] Node Negative Node Negative Node Positive Node Negative Node Negative
- [41] Node Positive Node Positive Node Negative Node Negative Node Negative
- [46] Node Positive Node Positive Node Positive Node Positive Node Negative
- [51] Node Positive Node Positive Node Negative Node Positive Node Negative
- [56] Node Positive Node Negative Node Positive Node Positive Node Positive
- [61] Node Negative Node Positive Node Negative Node Positive Node Positive
- [66] Node Negative Node Positive Node Positive Node Positive Node Positive
- [71] Node Negative Node Positive Node Positive Node Positive Node Positive
- [76] Node Positive Node Positive Node Negative Node Negative Node Positive
- [81] Node Negative Node Positive Node Positive Node Positive Node Negative
- [86] Node Negative Node Positive Node Negative Node Positive Node Positive
- [91] Node Positive Node Positive Node Positive Node Positive Node Positive
- [96] Node Negative Node Positive Node Positive Node Positive Node Negative
-[101] Node Positive Node Positive Node Positive Node Negative Node Negative
-[106] Node Positive Node Positive Node Negative Node Negative
-Levels: Node Negative Node Positive
-
-$`ctDNA Negative`
-  [1] Node Positive Node Negative Node Positive Node Negative Node Negative
-  [6] Node Positive Node Positive Node Negative Node Negative Node Negative
- [11] Node Positive Node Negative Node Positive Node Negative Node Positive
- [16] Node Negative Node Positive Node Negative Node Negative Node Negative
- [21] Node Positive Node Positive Node Positive Node Positive Node Negative
- [26] Node Negative Node Negative Node Negative Node Positive Node Negative
- [31] Node Negative Node Negative Node Negative Node Positive Node Negative
- [36] Node Negative Node Positive Node Positive Node Negative Node Negative
- [41] Node Positive Node Positive Node Positive Node Positive Node Negative
- [46] Node Positive Node Positive Node Negative Node Positive Node Negative
- [51] Node Positive Node Negative Node Positive Node Positive Node Positive
- [56] Node Negative Node Positive Node Negative Node Positive Node Positive
- [61] Node Negative Node Positive Node Positive Node Positive Node Negative
- [66] Node Positive Node Positive Node Positive Node Positive Node Positive
- [71] Node Positive Node Negative Node Negative Node Positive Node Negative
- [76] Node Positive Node Positive Node Positive Node Negative Node Negative
- [81] Node Positive Node Negative Node Positive Node Positive Node Positive
- [86] Node Positive Node Positive Node Negative Node Positive Node Positive
- [91] Node Positive Node Negative Node Positive Node Positive Node Positive
- [96] Node Negative Node Positive Node Positive Node Negative Node Negative
-Levels: Node Negative Node Positive
-
-$`ctDNA Positive`
-[1] Node Positive Node Positive Node Negative Node Positive Node Negative
-[6] Node Positive Node Positive Node Positive Node Negative
-Levels: Node Negative Node Positive
-
-$overall
-  [1] Axillary Dissection    Axillary Dissection    Axillary Dissection   
-  [4] No Axillary Dissection No Axillary Dissection Axillary Dissection   
-  [7] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [10] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [13] No Axillary Dissection Axillary Dissection    No Axillary Dissection
- [16] Axillary Dissection    No Axillary Dissection No Axillary Dissection
- [19] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [22] Axillary Dissection    Axillary Dissection    Axillary Dissection   
- [25] Axillary Dissection    No Axillary Dissection Axillary Dissection   
- [28] No Axillary Dissection Axillary Dissection    No Axillary Dissection
- [31] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [34] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [37] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [40] No Axillary Dissection Axillary Dissection    Axillary Dissection   
- [43] Axillary Dissection    No Axillary Dissection No Axillary Dissection
- [46] Axillary Dissection    Axillary Dissection    Axillary Dissection   
- [49] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [52] No Axillary Dissection No Axillary Dissection Axillary Dissection   
- [55] No Axillary Dissection Axillary Dissection    No Axillary Dissection
- [58] No Axillary Dissection No Axillary Dissection Axillary Dissection   
- [61] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [64] Axillary Dissection    Axillary Dissection    Axillary Dissection   
- [67] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [70] Axillary Dissection    No Axillary Dissection Axillary Dissection   
- [73] No Axillary Dissection Axillary Dissection    Axillary Dissection   
- [76] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [79] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [82] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [85] No Axillary Dissection No Axillary Dissection Axillary Dissection   
- [88] No Axillary Dissection Axillary Dissection    Axillary Dissection   
- [91] Axillary Dissection    Axillary Dissection    Axillary Dissection   
- [94] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [97] Axillary Dissection    No Axillary Dissection Axillary Dissection   
-[100] No Axillary Dissection No Axillary Dissection Axillary Dissection   
-[103] Axillary Dissection    No Axillary Dissection No Axillary Dissection
-[106] Axillary Dissection    No Axillary Dissection No Axillary Dissection
-[109] No Axillary Dissection
-attr(,"label")
-[1] Axillary Dissection
-Levels: No Axillary Dissection Axillary Dissection
-
-$`ctDNA Negative`
-  [1] Axillary Dissection    Axillary Dissection    Axillary Dissection   
-  [4] No Axillary Dissection No Axillary Dissection Axillary Dissection   
-  [7] Axillary Dissection    No Axillary Dissection No Axillary Dissection
- [10] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [13] Axillary Dissection    No Axillary Dissection Axillary Dissection   
- [16] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [19] No Axillary Dissection No Axillary Dissection Axillary Dissection   
- [22] Axillary Dissection    Axillary Dissection    Axillary Dissection   
- [25] No Axillary Dissection Axillary Dissection    No Axillary Dissection
- [28] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [31] Axillary Dissection    No Axillary Dissection No Axillary Dissection
- [34] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [37] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [40] No Axillary Dissection Axillary Dissection    Axillary Dissection   
- [43] Axillary Dissection    Axillary Dissection    Axillary Dissection   
- [46] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [49] Axillary Dissection    No Axillary Dissection Axillary Dissection   
- [52] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [55] Axillary Dissection    Axillary Dissection    Axillary Dissection   
- [58] No Axillary Dissection Axillary Dissection    Axillary Dissection   
- [61] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [64] Axillary Dissection    No Axillary Dissection Axillary Dissection   
- [67] No Axillary Dissection Axillary Dissection    Axillary Dissection   
- [70] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [73] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [76] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [79] No Axillary Dissection No Axillary Dissection Axillary Dissection   
- [82] No Axillary Dissection Axillary Dissection    Axillary Dissection   
- [85] Axillary Dissection    Axillary Dissection    Axillary Dissection   
-
table1_p  # print Table 1 by ctDNA status, with the p-value column
-
-
|  | Total (N=109) | ctDNA Negative (N=100) | ctDNA Positive (N=9) | P-value |
|---|---|---|---|---|
| **Age at Diagnosis (years)** | | | | 0.118 |
| Mean (SD) | 49.7 (9.66) | 49.2 (9.64) | 54.6 (8.94) | |
| Median [Min, Max] | 49.3 [27.3, 68.9] | 49.0 [27.3, 68.9] | 55.6 [38.6, 64.4] | |
| **Final Receptor Group** | | | | 0.0891 |
| TNBC | 45 (41.3%) | 44 (44.0%) | 1 (11.1%) | |
| HR+ HER2- | 52 (47.7%) | 45 (45.0%) | 7 (77.8%) | |
| HR+ HER2+ | 8 (7.3%) | 8 (8.0%) | 0 (0%) | |
| HR- HER2+ | 4 (3.7%) | 3 (3.0%) | 1 (11.1%) | |
| **Race** | | | | 0.594 |
| Black | 9 (8.3%) | 8 (8.0%) | 1 (11.1%) | |
| Asian | 1 (0.9%) | 1 (1.0%) | 0 (0%) | |
| White | 99 (90.8%) | 91 (91.0%) | 8 (88.9%) | |
| **Tumor Grade** | | | | 0.0366 |
| Grade 3 | 79 (72.5%) | 75 (75.0%) | 4 (44.4%) | |
| Grade 1 | 22 (20.2%) | 17 (17.0%) | 5 (55.6%) | |
| Grade 2 | 6 (5.5%) | 6 (6.0%) | 0 (0%) | |
| Missing | 2 (1.8%) | 2 (2.0%) | 0 (0%) | |
| **Overall Stage** | | | | 0.00814 |
| Stage I | 35 (32.1%) | 33 (33.0%) | 2 (22.2%) | |
| Stage II | 47 (43.1%) | 46 (46.0%) | 1 (11.1%) | |
| Stage III | 26 (23.9%) | 20 (20.0%) | 6 (66.7%) | |
| Missing | 1 (0.9%) | 1 (1.0%) | 0 (0%) | |
| **T Stage** | | | | 0.119 |
| T1 | 51 (46.8%) | 48 (48.0%) | 3 (33.3%) | |
| T2 | 44 (40.4%) | 40 (40.0%) | 4 (44.4%) | |
| T3 | 12 (11.0%) | 11 (11.0%) | 1 (11.1%) | |
| T4 | 1 (0.9%) | 0 (0%) | 1 (11.1%) | |
| Missing | 1 (0.9%) | 1 (1.0%) | 0 (0%) | |
| **N Stage** | | | | <0.001 |
| N0 | 46 (42.2%) | 43 (43.0%) | 3 (33.3%) | |
| N1 | 43 (39.4%) | 43 (43.0%) | 0 (0%) | |
| N2 | 13 (11.9%) | 8 (8.0%) | 5 (55.6%) | |
| N3 | 7 (6.4%) | 6 (6.0%) | 1 (11.1%) | |
| **Histology Category** | | | | 0.284 |
| Both Ductal and Lobular | 9 (8.3%) | 9 (9.0%) | 0 (0%) | |
| Ductal | 84 (77.1%) | 78 (78.0%) | 6 (66.7%) | |
| Lobular | 14 (12.8%) | 11 (11.0%) | 3 (33.3%) | |
| Other | 2 (1.8%) | 2 (2.0%) | 0 (0%) | |
| **Radiation** | | | | 0.268 |
| No Radiation | 34 (31.2%) | 33 (33.0%) | 1 (11.1%) | |
| Radiation | 75 (68.8%) | 67 (67.0%) | 8 (88.9%) | |
| **Chemo** | | | | 0.23 |
| No Chemo | 3 (2.8%) | 2 (2.0%) | 1 (11.1%) | |
| Chemo | 106 (97.2%) | 98 (98.0%) | 8 (88.9%) | |
| **Endocrine Therapy** | | | | 0.295 |
| No Endocrine Therapy | 47 (43.1%) | 45 (45.0%) | 2 (22.2%) | |
| Endocrine Therapy | 62 (56.9%) | 55 (55.0%) | 7 (77.8%) | |
| **Bone Modifying Treatment** | | | | 0.719 |
| No Bone Modifying Treatment | 70 (64.2%) | 65 (65.0%) | 5 (55.6%) | |
| Bone Modifying Treatment | 39 (35.8%) | 35 (35.0%) | 4 (44.4%) | |
| **Node Status** | | | | 0.731 |
| Node Negative | 46 (42.2%) | 43 (43.0%) | 3 (33.3%) | |
| Node Positive | 63 (57.8%) | 57 (57.0%) | 6 (66.7%) | |
| **Axillary Dissection** | | | | 0.161 |
| No Axillary Dissection | 54 (49.5%) | 52 (52.0%) | 2 (22.2%) | |
| Axillary Dissection | 55 (50.5%) | 48 (48.0%) | 7 (77.8%) | |
| **Surgery Type** | | | | 1 |
| Lumpectomy | 45 (41.3%) | 41 (41.0%) | 4 (44.4%) | |
| Mastectomy | 64 (58.7%) | 59 (59.0%) | 5 (55.6%) | |
| **Neoadjuvant Chemo** | | | | 1 |
| No Neoadjuvant Chemo | 90 (82.6%) | 82 (82.0%) | 8 (88.9%) | |
| Neoadjuvant Chemo | 19 (17.4%) | 18 (18.0%) | 1 (11.1%) | |
| **Pathologic Complete Response** | | | | 1 |
| pCR | 1 (0.9%) | 1 (1.0%) | 0 (0%) | |
| Non-pCR | 18 (16.5%) | 17 (17.0%) | 1 (11.1%) | |
| Missing | 90 (82.6%) | 82 (82.0%) | 8 (88.9%) | |
-
-
-

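Table 1 by ctDNA status shows significant associations (p < 0.05) for three variables: tumor grade (p = 0.037), overall stage (p = 0.008), and N stage (p < 0.001). Higher overall stage and higher N stage were associated with ctDNA positivity. Stage III disease was present in 66.7% of ctDNA-positive patients versus 20.0% of ctDNA-negative patients, and N2 disease in 55.6% versus 8.0%. The tumor grade association runs the other way as labeled. ctDNA-positive patients more often had Grade 1 tumors (55.6% vs. 17.0%) and less often Grade 3 tumors (44.4% vs. 75.0%). Receptor group (p = 0.089) and age at diagnosis (p = 0.118; mean 54.6 vs. 49.2 years) showed weaker, non-significant differences. Only 9 patients were ctDNA positive, so all of these comparisons are imprecise.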
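The p-values in this table come from `pvalue_function`. This assumes the ctDNA table used the same function that is redefined for the DTC table in the next section. For categorical variables, the function uses Fisher's exact test when any cell count is below 5 and Pearson's chi-squared test otherwise. The sketch below reapplies that rule to the N Stage counts copied from the table. Missing values are excluded, as `table()` does by default. The object names `n_stage`, `use_fisher`, and `p_n_stage` are illustrative only.

```{r}
# N Stage counts copied from the ctDNA table above
# (rows: N0-N3; columns: ctDNA Negative, ctDNA Positive)
n_stage <- matrix(
  c(43, 43, 8, 6,   # ctDNA Negative
    3,  0, 5, 1),   # ctDNA Positive
  ncol = 2,
  dimnames = list(c("N0", "N1", "N2", "N3"),
                  c("ctDNA Negative", "ctDNA Positive"))
)

# Same selection rule as pvalue_function: Fisher's exact test if any cell < 5
use_fisher <- any(n_stage < 5)
p_n_stage <- if (use_fisher) {
  fisher.test(n_stage)$p.value
} else {
  chisq.test(n_stage)$p.value
}

# Format the p-value the same way the table column does
format.pval(p_n_stage, digits = 3, eps = 0.001)
```

This rule checks observed cell counts. The more common rule of thumb checks expected counts, which are available from `chisq.test(tab)$expected`. With only 9 ctDNA-positive patients, both rules select Fisher's exact test for most variables here.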

-
-
-
-

4.2 Table of demographics and clinical factors by DTC status

-

Next, we create a table of demographic and clinical factors by DTC status, including tests of association.
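The preparation code that follows collapses `subset_data` to one row per participant. It does this with `group_by(participant_id)` and `first(na.omit(...))`, which keeps the first non-missing value of each variable. Below is a toy sketch of that pattern. The `visits` data frame and its values are hypothetical and are not study data.

```{r}
library(dplyr)

# Hypothetical long-format data: two rows per participant, some values missing
visits <- tibble(
  participant_id = c(1, 1, 2, 2),
  age_at_diag    = c(NA, 50.1, 61.2, 61.2),
  dtc_ever       = c(0, NA, 1, 1)
)

# Keep the first non-missing value of each variable within each participant
one_row <- visits |>
  group_by(participant_id) |>
  summarize(
    age_at_diag = first(na.omit(age_at_diag)),
    dtc_ever    = first(na.omit(dtc_ever))
  )

one_row
```

If a participant has no non-missing value for a variable, `first()` returns `NA`, so that participant appears under "Missing" in the table.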

-
-
####### Table of clinical and demographic factors by DTC status ######### 
-
-# Prepare the dataset
-dtc_unique_subset_data <- subset_data |>
-  mutate(
-    # Replace "Missing" and 99 with NA in relevant columns
-    final_t_stage = na_if(as.character(final_t_stage), "Missing"),
-    final_t_stage = na_if(final_t_stage, "99"),
-    final_overall_stage = na_if(as.character(final_overall_stage), "Missing"),
-    final_overall_stage = na_if(final_overall_stage, "99"),
-    final_tumor_grade = na_if(final_tumor_grade, 3), # Assumes 3 means "Not Reported"
-    diag_pcr_1 = na_if(diag_pcr_1, "."),
-    # Replace 99 with NA in all numeric columns
-    across(where(is.numeric), ~ na_if(.x, 99))
-  ) |>
-  group_by(participant_id) |>
-  summarize(
-    # Summarize unique participant-level data
-    age_at_diag = first(na.omit(age_at_diag)),
-    final_receptor_group = first(na.omit(final_receptor_group)),
-    demo_race_final = first(na.omit(demo_race_final)),
-    final_tumor_grade = first(na.omit(final_tumor_grade)),
-    final_overall_stage = first(na.omit(final_overall_stage)),
-    final_t_stage = first(na.omit(final_t_stage)),
-    final_n_stage = first(na.omit(final_n_stage)),
-    histology_category = first(na.omit(histology_category)),
-    prtx_radiation = first(na.omit(prtx_radiation)),
-    prtx_chemo = first(na.omit(prtx_chemo)),
-    prtx_endo = first(na.omit(prtx_endo)),
-    prtx_bonemod = first(na.omit(prtx_bonemod)),
-    node_status = first(na.omit(node_status)),
-    axillary_dissection = first(na.omit(axillary_dissection)),
-    diag_surgery_type_1 = first(na.omit(diag_surgery_type_1)),
-    diag_neoadj_chemo_1 = first(na.omit(diag_neoadj_chemo_1)),
-    diag_pcr_1 = first(na.omit(diag_pcr_1)),
-    ctDNA_ever = first(na.omit(ctDNA_ever)),
-    dtc_ever = first(na.omit(dtc_ever))
-  )
-
-# Convert variables to labeled factors for table output
-dtc_unique_subset_data <- dtc_unique_subset_data |>
-  mutate(
-    final_receptor_group = factor(final_receptor_group, levels = c(1, 2, 3, 4),
-                                  labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+")),
-    # Recode demo_race_final in place (the table1 formula and label use this name)
-    demo_race_final = factor(demo_race_final, levels = c(1, 3, 5),
-                             labels = c("Black", "Asian", "White")),
-    final_tumor_grade = factor(final_tumor_grade, levels = c(0, 1, 2),
-                               labels = c("Grade 3", "Grade 1", "Grade 2")),
-    final_overall_stage = factor(final_overall_stage, levels = c(1, 2, 3),
-                                 labels = c("Stage I", "Stage II", "Stage III")),
-    final_t_stage = factor(final_t_stage, levels = c(1, 2, 3, 4),
-                           labels = c("T1", "T2", "T3", "T4")),
-    final_n_stage = factor(final_n_stage, levels = c(0, 1, 2, 3),
-                           labels = c("N0", "N1", "N2", "N3")),
-    prtx_radiation = factor(prtx_radiation, levels = c(0, 1),
-                            labels = c("No Radiation", "Radiation")),
-    prtx_chemo = factor(prtx_chemo, levels = c(0, 1),
-                        labels = c("No Chemo", "Chemo")),
-    prtx_endo = factor(prtx_endo, levels = c(0, 1),
-                       labels = c("No Endocrine Therapy", "Endocrine Therapy")),
-    prtx_bonemod = factor(prtx_bonemod, levels = c(0, 1),
-                          labels = c("No Bone Modifying Treatment", "Bone Modifying Treatment")),
-    axillary_dissection = factor(axillary_dissection, levels = c(0, 1),
-                                 labels = c("No Axillary Dissection", "Axillary Dissection")),
-    diag_surgery_type_1 = factor(diag_surgery_type_1, levels = c(1, 2),
-                                 labels = c("Lumpectomy", "Mastectomy")),
-    diag_neoadj_chemo_1 = factor(diag_neoadj_chemo_1, levels = c(0, 1),
-                                 labels = c("No Neoadjuvant Chemo", "Neoadjuvant Chemo")),
-    diag_pcr_1 = factor(diag_pcr_1, levels = c(1, 2),
-                        labels = c("pCR", "Non-pCR")),
-    ctDNA_ever = factor(ctDNA_ever, levels = c("FALSE", "TRUE"),
-                        labels = c("ctDNA Negative", "ctDNA Positive")),
-    dtc_ever = factor(dtc_ever, levels = c(0, 1),
-                      labels = c("DTC Negative", "DTC Positive"))
-  )
-
-#### Labels 
-
-label(dtc_unique_subset_data$age_at_diag) <- "Age at Diagnosis"
-units(dtc_unique_subset_data$age_at_diag)       <- "years"
-
-# assign `final_receptor_group` labels to `dtc_unique_subset_data`
-label(dtc_unique_subset_data$final_receptor_group)       <- "Final Receptor Group"
-
-##demo_race_final 
-label(dtc_unique_subset_data$demo_race_final)  <- "Race"
-
-
-#final_tumor_grade 
-
-label(dtc_unique_subset_data$final_tumor_grade)  <- "Tumor Grade"
-
-
-#final_overall_stage
-
-label(dtc_unique_subset_data$final_overall_stage)  <- "Overall Stage"
-
-#final_t_stage
-label(dtc_unique_subset_data$final_t_stage)  <- "T Stage"
-
-
-#final_n_stage 
-label(dtc_unique_subset_data$final_n_stage)  <- "N Stage"
-
-#histology_category
-
-
-label(dtc_unique_subset_data$histology_category)  <- "Histology Category"
-
-
-#prtx_radiation 
-
-label(dtc_unique_subset_data$prtx_radiation)  <- "Radiation"
-
-#prtx_chemo
-label(dtc_unique_subset_data$prtx_chemo)  <- "Chemo"
-
-#prtx_endo
-label(dtc_unique_subset_data$prtx_endo)  <- "Endocrine Therapy"
-
-#prtx_bonemod 
-label(dtc_unique_subset_data$prtx_bonemod)  <- "Bone Modifying Treatment"
-
-#node_status 
-label(dtc_unique_subset_data$node_status)  <- "Node Status"
-
-#axillary_dissection 
-label(dtc_unique_subset_data$axillary_dissection)  <- "Axillary Dissection"
-
-#diag_surgery_type_1
-label(dtc_unique_subset_data$diag_surgery_type_1)  <- "Surgery Type"
-
-#diag_neoadj_chemo_1 
-
-label(dtc_unique_subset_data$diag_neoadj_chemo_1)  <- "Neoadjuvant Chemo"
-
-#pCR 
-
-label(dtc_unique_subset_data$diag_pcr_1)  <- "Pathologic Complete Response"
-
-
-#dtc_ever 
-label(dtc_unique_subset_data$dtc_ever)  <- "DTC Status"
-
-
-####
-
-# Step 1: Create table1 output
-table1_output <- table1(
-  ~ age_at_diag + final_receptor_group + demo_race_final + 
-    final_tumor_grade + final_overall_stage + 
-    final_t_stage + final_n_stage + 
-    histology_category + prtx_radiation + 
-    prtx_chemo + prtx_endo + prtx_bonemod + 
-    node_status + axillary_dissection + 
-    diag_surgery_type_1 + diag_neoadj_chemo_1 | 
-    dtc_ever,
-  data = dtc_unique_subset_data
-)
-
-table1_output
-
-
|  | DTC Negative (N=70) | DTC Positive (N=39) | Overall (N=109) |
|---|---|---|---|
| **Age at Diagnosis (years)** | | | |
| Mean (SD) | 49.9 (9.74) | 49.2 (9.63) | 49.7 (9.66) |
| Median [Min, Max] | 51.6 [27.3, 68.9] | 48.8 [30.7, 67.7] | 49.3 [27.3, 68.9] |
| **Final Receptor Group** | | | |
| TNBC | 25 (35.7%) | 20 (51.3%) | 45 (41.3%) |
| HR+ HER2- | 37 (52.9%) | 15 (38.5%) | 52 (47.7%) |
| HR+ HER2+ | 4 (5.7%) | 4 (10.3%) | 8 (7.3%) |
| HR- HER2+ | 4 (5.7%) | 0 (0%) | 4 (3.7%) |
| **Race** | | | |
| Mean (SD) | 4.69 (1.06) | 4.59 (1.23) | 4.65 (1.12) |
| Median [Min, Max] | 5.00 [1.00, 5.00] | 5.00 [1.00, 5.00] | 5.00 [1.00, 5.00] |
| **Tumor Grade** | | | |
| Grade 3 | 46 (65.7%) | 33 (84.6%) | 79 (72.5%) |
| Grade 1 | 18 (25.7%) | 4 (10.3%) | 22 (20.2%) |
| Grade 2 | 4 (5.7%) | 2 (5.1%) | 6 (5.5%) |
| Missing | 2 (2.9%) | 0 (0%) | 2 (1.8%) |
| **Overall Stage** | | | |
| Stage I | 22 (31.4%) | 13 (33.3%) | 35 (32.1%) |
| Stage II | 29 (41.4%) | 18 (46.2%) | 47 (43.1%) |
| Stage III | 18 (25.7%) | 8 (20.5%) | 26 (23.9%) |
| Missing | 1 (1.4%) | 0 (0%) | 1 (0.9%) |
| **T Stage** | | | |
| T1 | 34 (48.6%) | 17 (43.6%) | 51 (46.8%) |
| T2 | 27 (38.6%) | 17 (43.6%) | 44 (40.4%) |
| T3 | 8 (11.4%) | 4 (10.3%) | 12 (11.0%) |
| T4 | 0 (0%) | 1 (2.6%) | 1 (0.9%) |
| Missing | 1 (1.4%) | 0 (0%) | 1 (0.9%) |
| **N Stage** | | | |
| N0 | 24 (34.3%) | 22 (56.4%) | 46 (42.2%) |
| N1 | 32 (45.7%) | 11 (28.2%) | 43 (39.4%) |
| N2 | 10 (14.3%) | 3 (7.7%) | 13 (11.9%) |
| N3 | 4 (5.7%) | 3 (7.7%) | 7 (6.4%) |
| **Histology Category** | | | |
| Both Ductal and Lobular | 9 (12.9%) | 0 (0%) | 9 (8.3%) |
| Ductal | 48 (68.6%) | 36 (92.3%) | 84 (77.1%) |
| Lobular | 11 (15.7%) | 3 (7.7%) | 14 (12.8%) |
| Other | 2 (2.9%) | 0 (0%) | 2 (1.8%) |
| **Radiation** | | | |
| No Radiation | 23 (32.9%) | 11 (28.2%) | 34 (31.2%) |
| Radiation | 47 (67.1%) | 28 (71.8%) | 75 (68.8%) |
| **Chemo** | | | |
| No Chemo | 1 (1.4%) | 2 (5.1%) | 3 (2.8%) |
| Chemo | 69 (98.6%) | 37 (94.9%) | 106 (97.2%) |
| **Endocrine Therapy** | | | |
| No Endocrine Therapy | 28 (40.0%) | 19 (48.7%) | 47 (43.1%) |
| Endocrine Therapy | 42 (60.0%) | 20 (51.3%) | 62 (56.9%) |
| **Bone Modifying Treatment** | | | |
| No Bone Modifying Treatment | 45 (64.3%) | 25 (64.1%) | 70 (64.2%) |
| Bone Modifying Treatment | 25 (35.7%) | 14 (35.9%) | 39 (35.8%) |
| **Node Status** | | | |
| Node Negative | 24 (34.3%) | 22 (56.4%) | 46 (42.2%) |
| Node Positive | 46 (65.7%) | 17 (43.6%) | 63 (57.8%) |
| **Axillary Dissection** | | | |
| No Axillary Dissection | 31 (44.3%) | 23 (59.0%) | 54 (49.5%) |
| Axillary Dissection | 39 (55.7%) | 16 (41.0%) | 55 (50.5%) |
| **Surgery Type** | | | |
| Lumpectomy | 31 (44.3%) | 14 (35.9%) | 45 (41.3%) |
| Mastectomy | 39 (55.7%) | 25 (64.1%) | 64 (58.7%) |
| **Neoadjuvant Chemo** | | | |
| No Neoadjuvant Chemo | 60 (85.7%) | 30 (76.9%) | 90 (82.6%) |
| Neoadjuvant Chemo | 10 (14.3%) | 9 (23.1%) | 19 (17.4%) |
-
-
####
-pvalue_function <- function(x, ...) {
-  # x is a named list of the variable's values, one element per stratum;
-  # drop the "overall" (Total) element so only DTC- and DTC+ are compared
-  x <- x[!names(x) %in% "overall"]
-  y <- unlist(x)
-  g <- factor(rep(seq_along(x), times = lengths(x)))
-  
-  # Only compute a p-value when exactly two groups are being compared
-  if (length(unique(g)) != 2) {
-    return(NA)
-  }
-
-  # Perform the appropriate test based on the type of variable
-  if (is.numeric(y)) {
-    # For continuous variables, perform a t-test
-    p <- t.test(y ~ g)$p.value
-  } else {
-    # For categorical variables, perform a chi-squared test or Fisher's test
-    table_result <- table(y, g)
-    
-    # Choose the correct test based on cell counts
-    if (any(table_result < 5)) {
-      p <- fisher.test(table_result)$p.value  # Use Fisher's test for low counts
-    } else {
-      p <- chisq.test(table_result)$p.value  # Use chi-squared test otherwise
-    }
-  }
-  
-  # Format the p-value for output
-  formatted_p <- format.pval(p, digits = 3, eps = 0.001)
-  return(formatted_p)
-}
-  
-
-# Generate table1 with the p-value column
-table1_dtc <- table1(
-  ~ age_at_diag + final_receptor_group + demo_race_final + 
-    final_tumor_grade + final_overall_stage + 
-    final_t_stage + final_n_stage + 
-    histology_category + prtx_radiation + 
-    prtx_chemo + prtx_endo + prtx_bonemod + 
-    node_status + axillary_dissection + 
-    diag_surgery_type_1 + diag_neoadj_chemo_1 | 
-    dtc_ever,
-  data = dtc_unique_subset_data,
-  overall = c(left = "Total"),
-  extra.col = list("P-value" = pvalue_function),  # Add p-value function
-  extra.col.pos = 4  # Position of the extra column
-)
-
-
- [16] T2   T1   T1   T2   T1   T2   T2   T2   T2   T1   T2   T1   <NA> T2   T2  
- [31] T1   T1   T1   T3   T4   T2   T2   T1   T1   T1   T2   T3   T2   T1   T1  
- [46] T3   T1   T1   T2   T1   T2   T2   T1   T1   T1   T3   T1   T1   T1   T2  
- [61] T2   T3   T1   T2   T2   T2   T1   T1   T1   T2   T1   T2   T1   T3   T3  
- [76] T1   T2   T1   T1   T1   T1   T2   T2   T1   T2   T1   T1   T1   T2   T2  
- [91] T2   T3   T2   T2   T1   T1   T2   T1   T1   T1   T1   T3   T3   T1   T1  
-[106] T2   T2   T1   T2  
-attr(,"label")
-[1] T Stage
-Levels: T1 T2 T3 T4
-
-$`DTC Negative`
- [1] T2   T1   T2   T2   T3   T2   T2   T1   T2   T3   T1   T1   T1   T1   T1  
-[16] <NA> T2   T1   T2   T1   T1   T2   T3   T1   T1   T1   T2   T1   T2   T2  
-[31] T1   T1   T2   T2   T3   T1   T2   T1   T1   T1   T2   T1   T1   T3   T1  
-[46] T2   T1   T1   T2   T1   T1   T1   T2   T2   T2   T3   T2   T2   T1   T1  
-[61] T2   T1   T1   T1   T3   T3   T1   T2   T2   T2  
-Levels: T1 T2 T3 T4
-
-$`DTC Positive`
- [1] T2 T1 T2 T2 T1 T2 T2 T2 T2 T2 T1 T2 T2 T1 T1 T3 T4 T2 T1 T2 T1 T3 T1 T1 T3
-[26] T1 T1 T2 T2 T2 T3 T1 T1 T2 T1 T2 T1 T1 T1
-Levels: T1 T2 T3 T4
-
-$overall
-  [1] N3 N0 N3 N0 N0 N2 N2 N2 N0 N0 N0 N1 N0 N1 N0 N1 N0 N1 N0 N0 N0 N3 N3 N1 N1
- [26] N0 N0 N0 N2 N0 N0 N1 N0 N0 N2 N0 N0 N2 N0 N0 N1 N2 N0 N0 N0 N2 N1 N2 N1 N0
- [51] N1 N1 N0 N1 N0 N1 N0 N1 N1 N1 N0 N1 N0 N1 N1 N0 N3 N1 N1 N1 N0 N1 N1 N3 N1
- [76] N1 N3 N0 N0 N1 N0 N1 N1 N1 N0 N0 N2 N0 N1 N1 N1 N2 N1 N2 N2 N0 N1 N1 N1 N0
-[101] N1 N1 N1 N0 N0 N1 N1 N0 N0
-attr(,"label")
-[1] N Stage
-Levels: N0 N1 N2 N3
-
-$`DTC Negative`
- [1] N3 N0 N3 N0 N2 N2 N0 N0 N1 N0 N1 N0 N1 N0 N0 N0 N2 N0 N0 N2 N0 N1 N2 N0 N1
-[26] N2 N1 N0 N1 N1 N1 N1 N1 N0 N1 N0 N0 N3 N1 N1 N1 N0 N1 N1 N1 N3 N1 N0 N1 N0
-[51] N2 N0 N1 N1 N1 N2 N1 N2 N2 N0 N1 N1 N1 N1 N1 N1 N0 N1 N1 N0
-Levels: N0 N1 N2 N3
-
-$`DTC Positive`
- [1] N0 N2 N0 N1 N0 N0 N0 N3 N3 N1 N1 N0 N0 N0 N1 N0 N2 N0 N0 N0 N0 N2 N0 N0 N1
-[26] N0 N1 N1 N1 N1 N3 N0 N0 N1 N1 N0 N0 N0 N0
-Levels: N0 N1 N2 N3
-
-$overall
-  [1] Both Ductal and Lobular Ductal                  Ductal                 
-  [4] Ductal                  Ductal                  Lobular                
-  [7] Ductal                  Ductal                  Ductal                 
- [10] Ductal                  Ductal                  Ductal                 
- [13] Lobular                 Ductal                  Ductal                 
- [16] Ductal                  Ductal                  Ductal                 
- [19] Ductal                  Ductal                  Ductal                 
- [22] Ductal                  Ductal                  Ductal                 
- [25] Ductal                  Ductal                  Ductal                 
- [28] Ductal                  Ductal                  Ductal                 
- [31] Ductal                  Ductal                  Ductal                 
- [34] Lobular                 Ductal                  Ductal                 
- [37] Ductal                  Ductal                  Ductal                 
- [40] Ductal                  Ductal                  Ductal                 
- [43] Ductal                  Ductal                  Other                  
- [46] Lobular                 Ductal                  Ductal                 
- [49] Lobular                 Lobular                 Ductal                 
- [52] Ductal                  Ductal                  Lobular                
- [55] Ductal                  Lobular                 Ductal                 
- [58] Ductal                  Ductal                  Ductal                 
- [61] Other                   Lobular                 Ductal                 
- [64] Ductal                  Ductal                  Ductal                 
- [67] Lobular                 Ductal                  Ductal                 
- [70] Ductal                  Ductal                  Ductal                 
- [73] Both Ductal and Lobular Ductal                  Ductal                 
- [76] Both Ductal and Lobular Both Ductal and Lobular Ductal                 
- [79] Ductal                  Both Ductal and Lobular Ductal                 
- [82] Ductal                  Ductal                  Ductal                 
- [85] Ductal                  Ductal                  Ductal                 
- [88] Ductal                  Ductal                  Ductal                 
- [91] Both Ductal and Lobular Lobular                 Ductal                 
- [94] Lobular                 Ductal                  Ductal                 
- [97] Both Ductal and Lobular Ductal                  Both Ductal and Lobular
-[100] Ductal                  Lobular                 Ductal                 
-[103] Lobular                 Ductal                  Ductal                 
-[106] Ductal                  Ductal                  Ductal                 
-[109] Both Ductal and Lobular
-Levels: Both Ductal and Lobular Ductal Lobular Other
-
-$`DTC Negative`
- [1] Both Ductal and Lobular Ductal                  Ductal                 
- [4] Ductal                  Lobular                 Ductal                 
- [7] Ductal                  Ductal                  Ductal                 
-[10] Lobular                 Ductal                  Ductal                 
-[13] Ductal                  Ductal                  Ductal                 
-[16] Ductal                  Ductal                  Ductal                 
-[19] Ductal                  Ductal                  Ductal                 
-[22] Ductal                  Ductal                  Other                  
-[25] Ductal                  Ductal                  Lobular                
-[28] Lobular                 Ductal                  Ductal                 
-[31] Lobular                 Ductal                  Ductal                 
-[34] Other                   Lobular                 Ductal                 
-[37] Ductal                  Lobular                 Ductal                 
-[40] Ductal                  Ductal                  Ductal                 
-[43] Both Ductal and Lobular Ductal                  Both Ductal and Lobular
-[46] Both Ductal and Lobular Both Ductal and Lobular Ductal                 
-[49] Ductal                  Ductal                  Ductal                 
-[52] Ductal                  Ductal                  Ductal                 
-[55] Both Ductal and Lobular Lobular                 Ductal                 
-[58] Lobular                 Ductal                  Ductal                 
-[61] Both Ductal and Lobular Ductal                  Both Ductal and Lobular
-[64] Lobular                 Ductal                  Lobular                
-[67] Ductal                  Ductal                  Ductal                 
-[70] Both Ductal and Lobular
-Levels: Both Ductal and Lobular Ductal Lobular Other
-
-$`DTC Positive`
- [1] Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Ductal 
-[10] Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Lobular Ductal  Ductal 
-[19] Ductal  Ductal  Ductal  Lobular Ductal  Ductal  Lobular Ductal  Ductal 
-[28] Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Ductal 
-[37] Ductal  Ductal  Ductal 
-Levels: Ductal Lobular
-
-$overall
-  [1] Radiation    No Radiation No Radiation No Radiation No Radiation
-  [6] Radiation    Radiation    Radiation    Radiation    No Radiation
- [11] Radiation    Radiation    Radiation    Radiation    No Radiation
- [16] No Radiation Radiation    Radiation    No Radiation Radiation   
- [21] Radiation    Radiation    Radiation    Radiation    No Radiation
- [26] Radiation    No Radiation No Radiation Radiation    No Radiation
- [31] Radiation    Radiation    No Radiation Radiation    Radiation   
- [36] No Radiation No Radiation Radiation    No Radiation No Radiation
- [41] Radiation    Radiation    Radiation    Radiation    Radiation   
- [46] Radiation    No Radiation Radiation    Radiation    No Radiation
- [51] Radiation    Radiation    Radiation    No Radiation Radiation   
- [56] Radiation    No Radiation Radiation    Radiation    No Radiation
- [61] No Radiation Radiation    Radiation    Radiation    Radiation   
- [66] Radiation    Radiation    Radiation    No Radiation Radiation   
- [71] No Radiation Radiation    Radiation    Radiation    Radiation   
- [76] No Radiation Radiation    Radiation    Radiation    Radiation   
- [81] Radiation    Radiation    Radiation    Radiation    Radiation   
- [86] Radiation    Radiation    Radiation    Radiation    Radiation   
- [91] Radiation    Radiation    Radiation    Radiation    Radiation   
- [96] No Radiation Radiation    Radiation    No Radiation No Radiation
-[101] No Radiation Radiation    Radiation    Radiation    No Radiation
-[106] No Radiation Radiation    No Radiation No Radiation
-attr(,"label")
-[1] Radiation
-Levels: No Radiation Radiation
-
-$`DTC Negative`
- [1] Radiation    No Radiation No Radiation No Radiation Radiation   
- [6] Radiation    Radiation    No Radiation Radiation    Radiation   
-[11] Radiation    No Radiation Radiation    Radiation    No Radiation
-[16] No Radiation Radiation    No Radiation No Radiation Radiation   
-[21] No Radiation Radiation    Radiation    Radiation    No Radiation
-[26] Radiation    Radiation    No Radiation Radiation    Radiation   
-[31] No Radiation Radiation    No Radiation No Radiation Radiation   
-[36] Radiation    Radiation    Radiation    Radiation    No Radiation
-[41] Radiation    No Radiation Radiation    Radiation    No Radiation
-[46] Radiation    Radiation    Radiation    Radiation    Radiation   
-[51] Radiation    Radiation    Radiation    Radiation    Radiation   
-[56] Radiation    Radiation    Radiation    Radiation    No Radiation
-[61] Radiation    Radiation    No Radiation No Radiation Radiation   
-[66] Radiation    Radiation    No Radiation Radiation    No Radiation
-Levels: No Radiation Radiation
-
-$`DTC Positive`
- [1] No Radiation Radiation    Radiation    No Radiation Radiation   
- [6] No Radiation Radiation    Radiation    Radiation    Radiation   
-[11] No Radiation Radiation    No Radiation Radiation    Radiation   
-[16] Radiation    Radiation    No Radiation No Radiation Radiation   
-[21] Radiation    Radiation    Radiation    Radiation    Radiation   
-[26] No Radiation Radiation    Radiation    Radiation    Radiation   
-[31] Radiation    Radiation    Radiation    Radiation    Radiation   
-[36] Radiation    No Radiation No Radiation No Radiation
-Levels: No Radiation Radiation
-
-$overall
-  [1] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-  [9] Chemo    Chemo    Chemo    Chemo    Chemo    No Chemo Chemo    Chemo   
- [17] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [25] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [33] Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo    Chemo   
- [41] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [49] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [57] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [65] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [73] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [81] Chemo    Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo   
- [89] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [97] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-[105] Chemo    Chemo    Chemo    Chemo    Chemo   
-attr(,"label")
-[1] Chemo
-Levels: No Chemo Chemo
-
-$`DTC Negative`
- [1] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [9] Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo    Chemo   
-[17] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-[25] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-[33] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-[41] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-[49] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-[57] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-[65] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-Levels: No Chemo Chemo
-
-$`DTC Positive`
- [1] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
- [9] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-[17] No Chemo Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-[25] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
-[33] Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo   
-Levels: No Chemo Chemo
-
-$overall
-  [1] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
-  [4] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
-  [7] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [10] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [13] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [16] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [19] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
- [22] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [25] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
- [28] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [31] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [34] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [37] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [40] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
- [43] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [46] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [49] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [52] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [55] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [58] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
- [61] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
- [64] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
- [67] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [70] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
- [73] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [76] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [79] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [82] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [85] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [88] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
- [91] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [94] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [97] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
-[100] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
-[103] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
-[106] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
-[109] Endocrine Therapy   
-attr(,"label")
-[1] Endocrine Therapy
-Levels: No Endocrine Therapy Endocrine Therapy
-
-$`DTC Negative`
- [1] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
- [4] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [7] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
-[10] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
-[13] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
-[16] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
-[19] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
-[22] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
-[25] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
-[28] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
-[31] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
-[34] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
-[37] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
-[40] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
-[43] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
-[46] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
-[49] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
-[52] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
-[55] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
-[58] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
-[61] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
-[64] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
-[67] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
-[70] Endocrine Therapy   
-Levels: No Endocrine Therapy Endocrine Therapy
-
-$`DTC Positive`
- [1] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
- [4] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
- [7] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
-[10] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
-[13] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
-[16] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
-[19] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
-[22] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
-[25] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
-[28] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
-[31] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
-[34] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
-[37] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
-Levels: No Endocrine Therapy Endocrine Therapy
-
-$overall
-  [1] No Bone Modifying Treatment No Bone Modifying Treatment
-  [3] No Bone Modifying Treatment No Bone Modifying Treatment
-  [5] No Bone Modifying Treatment Bone Modifying Treatment   
-  [7] No Bone Modifying Treatment No Bone Modifying Treatment
-  [9] No Bone Modifying Treatment No Bone Modifying Treatment
- [11] No Bone Modifying Treatment No Bone Modifying Treatment
- [13] No Bone Modifying Treatment No Bone Modifying Treatment
- [15] No Bone Modifying Treatment No Bone Modifying Treatment
- [17] No Bone Modifying Treatment No Bone Modifying Treatment
- [19] No Bone Modifying Treatment No Bone Modifying Treatment
- [21] No Bone Modifying Treatment Bone Modifying Treatment   
- [23] Bone Modifying Treatment    No Bone Modifying Treatment
- [25] No Bone Modifying Treatment No Bone Modifying Treatment
- [27] No Bone Modifying Treatment No Bone Modifying Treatment
- [29] No Bone Modifying Treatment No Bone Modifying Treatment
- [31] No Bone Modifying Treatment No Bone Modifying Treatment
- [33] No Bone Modifying Treatment Bone Modifying Treatment   
- [35] No Bone Modifying Treatment Bone Modifying Treatment   
- [37] No Bone Modifying Treatment Bone Modifying Treatment   
- [39] No Bone Modifying Treatment Bone Modifying Treatment   
- [41] No Bone Modifying Treatment No Bone Modifying Treatment
- [43] No Bone Modifying Treatment Bone Modifying Treatment   
- [45] Bone Modifying Treatment    Bone Modifying Treatment   
- [47] No Bone Modifying Treatment Bone Modifying Treatment   
- [49] No Bone Modifying Treatment Bone Modifying Treatment   
- [51] Bone Modifying Treatment    No Bone Modifying Treatment
- [53] No Bone Modifying Treatment No Bone Modifying Treatment
- [55] No Bone Modifying Treatment Bone Modifying Treatment   
- [57] Bone Modifying Treatment    No Bone Modifying Treatment
- [59] No Bone Modifying Treatment No Bone Modifying Treatment
- [61] No Bone Modifying Treatment Bone Modifying Treatment   
- [63] No Bone Modifying Treatment No Bone Modifying Treatment
- [65] No Bone Modifying Treatment No Bone Modifying Treatment
- [67] Bone Modifying Treatment    No Bone Modifying Treatment
- [69] Bone Modifying Treatment    Bone Modifying Treatment   
- [71] No Bone Modifying Treatment No Bone Modifying Treatment
- [73] No Bone Modifying Treatment Bone Modifying Treatment   
- [75] Bone Modifying Treatment    Bone Modifying Treatment   
- [77] Bone Modifying Treatment    No Bone Modifying Treatment
- [79] Bone Modifying Treatment    Bone Modifying Treatment   
- [81] No Bone Modifying Treatment No Bone Modifying Treatment
- [83] No Bone Modifying Treatment Bone Modifying Treatment   
- [85] Bone Modifying Treatment    Bone Modifying Treatment   
- [87] Bone Modifying Treatment    No Bone Modifying Treatment
- [89] No Bone Modifying Treatment Bone Modifying Treatment   
- [91] Bone Modifying Treatment    Bone Modifying Treatment   
- [93] Bone Modifying Treatment    Bone Modifying Treatment   
- [95] Bone Modifying Treatment    No Bone Modifying Treatment
- [97] Bone Modifying Treatment    No Bone Modifying Treatment
- [99] No Bone Modifying Treatment No Bone Modifying Treatment
-[101] Bone Modifying Treatment    No Bone Modifying Treatment
-[103] Bone Modifying Treatment    No Bone Modifying Treatment
-[105] Bone Modifying Treatment    No Bone Modifying Treatment
-[107] No Bone Modifying Treatment No Bone Modifying Treatment
-[109] No Bone Modifying Treatment
-attr(,"label")
-[1] Bone Modifying Treatment
-Levels: No Bone Modifying Treatment Bone Modifying Treatment
-
-$`DTC Negative`
- [1] No Bone Modifying Treatment No Bone Modifying Treatment
- [3] No Bone Modifying Treatment No Bone Modifying Treatment
- [5] Bone Modifying Treatment    No Bone Modifying Treatment
- [7] No Bone Modifying Treatment No Bone Modifying Treatment
- [9] No Bone Modifying Treatment No Bone Modifying Treatment
-[11] No Bone Modifying Treatment No Bone Modifying Treatment
-[13] No Bone Modifying Treatment No Bone Modifying Treatment
-[15] No Bone Modifying Treatment No Bone Modifying Treatment
-[17] No Bone Modifying Treatment No Bone Modifying Treatment
-[19] No Bone Modifying Treatment Bone Modifying Treatment   
-[21] No Bone Modifying Treatment No Bone Modifying Treatment
-[23] No Bone Modifying Treatment Bone Modifying Treatment   
-[25] No Bone Modifying Treatment Bone Modifying Treatment   
-[27] No Bone Modifying Treatment Bone Modifying Treatment   
-[29] Bone Modifying Treatment    No Bone Modifying Treatment
-[31] No Bone Modifying Treatment No Bone Modifying Treatment
-[33] No Bone Modifying Treatment No Bone Modifying Treatment
-[35] Bone Modifying Treatment    No Bone Modifying Treatment
-[37] No Bone Modifying Treatment Bone Modifying Treatment   
-[39] No Bone Modifying Treatment Bone Modifying Treatment   
-[41] Bone Modifying Treatment    No Bone Modifying Treatment
-[43] No Bone Modifying Treatment Bone Modifying Treatment   
-[45] Bone Modifying Treatment    Bone Modifying Treatment   
-[47] Bone Modifying Treatment    No Bone Modifying Treatment
-[49] No Bone Modifying Treatment Bone Modifying Treatment   
-[51] Bone Modifying Treatment    No Bone Modifying Treatment
-[53] No Bone Modifying Treatment Bone Modifying Treatment   
-[55] Bone Modifying Treatment    Bone Modifying Treatment   
-[57] Bone Modifying Treatment    Bone Modifying Treatment   
-[59] Bone Modifying Treatment    No Bone Modifying Treatment
-[61] Bone Modifying Treatment    No Bone Modifying Treatment
-[63] No Bone Modifying Treatment Bone Modifying Treatment   
-[65] No Bone Modifying Treatment Bone Modifying Treatment   
-[67] No Bone Modifying Treatment No Bone Modifying Treatment
-[69] No Bone Modifying Treatment No Bone Modifying Treatment
-Levels: No Bone Modifying Treatment Bone Modifying Treatment
-
-$`DTC Positive`
- [1] No Bone Modifying Treatment No Bone Modifying Treatment
- [3] No Bone Modifying Treatment No Bone Modifying Treatment
- [5] No Bone Modifying Treatment No Bone Modifying Treatment
- [7] No Bone Modifying Treatment Bone Modifying Treatment   
- [9] Bone Modifying Treatment    No Bone Modifying Treatment
-[11] No Bone Modifying Treatment No Bone Modifying Treatment
-[13] No Bone Modifying Treatment No Bone Modifying Treatment
-[15] No Bone Modifying Treatment Bone Modifying Treatment   
-[17] No Bone Modifying Treatment Bone Modifying Treatment   
-[19] Bone Modifying Treatment    No Bone Modifying Treatment
-[21] Bone Modifying Treatment    Bone Modifying Treatment   
-[23] No Bone Modifying Treatment No Bone Modifying Treatment
-[25] Bone Modifying Treatment    Bone Modifying Treatment   
-[27] No Bone Modifying Treatment No Bone Modifying Treatment
-[29] No Bone Modifying Treatment No Bone Modifying Treatment
-[31] Bone Modifying Treatment    No Bone Modifying Treatment
-[33] Bone Modifying Treatment    No Bone Modifying Treatment
-[35] Bone Modifying Treatment    Bone Modifying Treatment   
-[37] No Bone Modifying Treatment Bone Modifying Treatment   
-[39] No Bone Modifying Treatment
-Levels: No Bone Modifying Treatment Bone Modifying Treatment
-
-$overall
-  [1] Node Positive Node Negative Node Positive Node Negative Node Negative
-  [6] Node Positive Node Positive Node Positive Node Negative Node Negative
- [11] Node Negative Node Positive Node Negative Node Positive Node Negative
- [16] Node Positive Node Negative Node Positive Node Negative Node Negative
- [21] Node Negative Node Positive Node Positive Node Positive Node Positive
- [26] Node Negative Node Negative Node Negative Node Positive Node Negative
- [31] Node Negative Node Positive Node Negative Node Negative Node Positive
- [36] Node Negative Node Negative Node Positive Node Negative Node Negative
- [41] Node Positive Node Positive Node Negative Node Negative Node Negative
- [46] Node Positive Node Positive Node Positive Node Positive Node Negative
- [51] Node Positive Node Positive Node Negative Node Positive Node Negative
- [56] Node Positive Node Negative Node Positive Node Positive Node Positive
- [61] Node Negative Node Positive Node Negative Node Positive Node Positive
- [66] Node Negative Node Positive Node Positive Node Positive Node Positive
- [71] Node Negative Node Positive Node Positive Node Positive Node Positive
- [76] Node Positive Node Positive Node Negative Node Negative Node Positive
- [81] Node Negative Node Positive Node Positive Node Positive Node Negative
- [86] Node Negative Node Positive Node Negative Node Positive Node Positive
- [91] Node Positive Node Positive Node Positive Node Positive Node Positive
- [96] Node Negative Node Positive Node Positive Node Positive Node Negative
-[101] Node Positive Node Positive Node Positive Node Negative Node Negative
-[106] Node Positive Node Positive Node Negative Node Negative
-Levels: Node Negative Node Positive
-
-$`DTC Negative`
- [1] Node Positive Node Negative Node Positive Node Negative Node Positive
- [6] Node Positive Node Negative Node Negative Node Positive Node Negative
-[11] Node Positive Node Negative Node Positive Node Negative Node Negative
-[16] Node Negative Node Positive Node Negative Node Negative Node Positive
-[21] Node Negative Node Positive Node Positive Node Negative Node Positive
-[26] Node Positive Node Positive Node Negative Node Positive Node Positive
-[31] Node Positive Node Positive Node Positive Node Negative Node Positive
-[36] Node Negative Node Negative Node Positive Node Positive Node Positive
-[41] Node Positive Node Negative Node Positive Node Positive Node Positive
-[46] Node Positive Node Positive Node Negative Node Positive Node Negative
-[51] Node Positive Node Negative Node Positive Node Positive Node Positive
-[56] Node Positive Node Positive Node Positive Node Positive Node Negative
-[61] Node Positive Node Positive Node Positive Node Positive Node Positive
-[66] Node Positive Node Negative Node Positive Node Positive Node Negative
-Levels: Node Negative Node Positive
-
-$`DTC Positive`
- [1] Node Negative Node Positive Node Negative Node Positive Node Negative
- [6] Node Negative Node Negative Node Positive Node Positive Node Positive
-[11] Node Positive Node Negative Node Negative Node Negative Node Positive
-[16] Node Negative Node Positive Node Negative Node Negative Node Negative
-[21] Node Negative Node Positive Node Negative Node Negative Node Positive
-[26] Node Negative Node Positive Node Positive Node Positive Node Positive
-[31] Node Positive Node Negative Node Negative Node Positive Node Positive
-[36] Node Negative Node Negative Node Negative Node Negative
-Levels: Node Negative Node Positive
-
-$overall
-  [1] Axillary Dissection    Axillary Dissection    Axillary Dissection   
-  [4] No Axillary Dissection No Axillary Dissection Axillary Dissection   
-  [7] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [10] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [13] No Axillary Dissection Axillary Dissection    No Axillary Dissection
- [16] Axillary Dissection    No Axillary Dissection No Axillary Dissection
- [19] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [22] Axillary Dissection    Axillary Dissection    Axillary Dissection   
- [25] Axillary Dissection    No Axillary Dissection Axillary Dissection   
- [28] No Axillary Dissection Axillary Dissection    No Axillary Dissection
- [31] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [34] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [37] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [40] No Axillary Dissection Axillary Dissection    Axillary Dissection   
- [43] Axillary Dissection    No Axillary Dissection No Axillary Dissection
- [46] Axillary Dissection    Axillary Dissection    Axillary Dissection   
- [49] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [52] No Axillary Dissection No Axillary Dissection Axillary Dissection   
- [55] No Axillary Dissection Axillary Dissection    No Axillary Dissection
- [58] No Axillary Dissection No Axillary Dissection Axillary Dissection   
- [61] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [64] Axillary Dissection    Axillary Dissection    Axillary Dissection   
- [67] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [70] Axillary Dissection    No Axillary Dissection Axillary Dissection   
- [73] No Axillary Dissection Axillary Dissection    Axillary Dissection   
- [76] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [79] No Axillary Dissection No Axillary Dissection No Axillary Dissection
- [82] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [85] No Axillary Dissection No Axillary Dissection Axillary Dissection   
- [88] No Axillary Dissection Axillary Dissection    Axillary Dissection   
- [91] Axillary Dissection    Axillary Dissection    Axillary Dissection   
- [94] Axillary Dissection    Axillary Dissection    No Axillary Dissection
- [97] Axillary Dissection    No Axillary Dissection Axillary Dissection   
-[100] No Axillary Dissection No Axillary Dissection Axillary Dissection   
-[103] Axillary Dissection    No Axillary Dissection No Axillary Dissection
-[106] Axillary Dissection    No Axillary Dissection No Axillary Dissection
-[109] No Axillary Dissection
-attr(,"label")
-[1] Axillary Dissection
-Levels: No Axillary Dissection Axillary Dissection
-
-$`DTC Negative`
- [1] Axillary Dissection    Axillary Dissection    Axillary Dissection   
- [4] No Axillary Dissection Axillary Dissection    Axillary Dissection   
- [7] No Axillary Dissection No Axillary Dissection No Axillary Dissection
-[10] No Axillary Dissection Axillary Dissection    No Axillary Dissection
-[13] No Axillary Dissection No Axillary Dissection Axillary Dissection   
-[16] No Axillary Dissection Axillary Dissection    No Axillary Dissection
-[19] No Axillary Dissection No Axillary Dissection No Axillary Dissection
-[22] Axillary Dissection    Axillary Dissection    No Axillary Dissection
-[25] Axillary Dissection    Axillary Dissection    Axillary Dissection   
-[28] Axillary Dissection    No Axillary Dissection No Axillary Dissection
-[31] Axillary Dissection    No Axillary Dissection Axillary Dissection   
-[34] Axillary Dissection    Axillary Dissection    No Axillary Dissection
-[37] Axillary Dissection    Axillary Dissection    Axillary Dissection   
-[40] No Axillary Dissection Axillary Dissection    No Axillary Dissection
-[43] No Axillary Dissection Axillary Dissection    Axillary Dissection   
-[46] Axillary Dissection    No Axillary Dissection No Axillary Dissection
-[49] Axillary Dissection    No Axillary Dissection Axillary Dissection   
-[52] No Axillary Dissection Axillary Dissection    Axillary Dissection   
-[55] Axillary Dissection    Axillary Dissection    Axillary Dissection   
-[58] Axillary Dissection    Axillary Dissection    No Axillary Dissection
-[61] Axillary Dissection    No Axillary Dissection Axillary Dissection   
-[64] No Axillary Dissection Axillary Dissection    Axillary Dissection   
-[67] No Axillary Dissection Axillary Dissection    No Axillary Dissection
-[70] No Axillary Dissection
-Levels: No Axillary Dissection Axillary Dissection
-
-$`DTC Positive`
- [1] No Axillary Dissection Axillary Dissection    No Axillary Dissection
- [4] Axillary Dissection    No Axillary Dissection No Axillary Dissection
- [7] No Axillary Dissection Axillary Dissection    Axillary Dissection   
-[10] Axillary Dissection    Axillary Dissection    No Axillary Dissection
-[13] No Axillary Dissection No Axillary Dissection No Axillary Dissection
-[16] Axillary Dissection    Axillary Dissection    No Axillary Dissection
-[19] No Axillary Dissection Axillary Dissection    No Axillary Dissection
-[22] Axillary Dissection    No Axillary Dissection No Axillary Dissection
-[25] Axillary Dissection    No Axillary Dissection No Axillary Dissection
-[28] Axillary Dissection    Axillary Dissection    Axillary Dissection   
-[31] Axillary Dissection    No Axillary Dissection No Axillary Dissection
-[34] Axillary Dissection    No Axillary Dissection No Axillary Dissection
-[37] No Axillary Dissection No Axillary Dissection No Axillary Dissection
-Levels: No Axillary Dissection Axillary Dissection
-
-$overall
-  [1] Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy Mastectomy
-  [7] Lumpectomy Mastectomy Lumpectomy Lumpectomy Lumpectomy Lumpectomy
- [13] Lumpectomy Lumpectomy Mastectomy Mastectomy Lumpectomy Lumpectomy
- [19] Mastectomy Lumpectomy Lumpectomy Mastectomy Lumpectomy Mastectomy
- [25] Mastectomy Lumpectomy Lumpectomy Mastectomy Lumpectomy Mastectomy
- [31] Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy
- [37] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy
- [43] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
- [49] Mastectomy Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy
- [55] Lumpectomy Mastectomy Mastectomy Lumpectomy Lumpectomy Mastectomy
- [61] Mastectomy Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy
- [67] Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy
- [73] Lumpectomy Mastectomy Mastectomy Mastectomy Lumpectomy Lumpectomy
- [79] Lumpectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy
- [85] Lumpectomy Lumpectomy Lumpectomy Lumpectomy Mastectomy Mastectomy
- [91] Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy Mastectomy
- [97] Mastectomy Lumpectomy Mastectomy Lumpectomy Mastectomy Mastectomy
-[103] Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy
-[109] Mastectomy
-attr(,"label")
-[1] Surgery Type
-Levels: Lumpectomy Mastectomy
-
-$`DTC Negative`
- [1] Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy Lumpectomy
- [7] Lumpectomy Lumpectomy Lumpectomy Lumpectomy Lumpectomy Mastectomy
-[13] Lumpectomy Lumpectomy Lumpectomy Mastectomy Lumpectomy Mastectomy
-[19] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
-[25] Mastectomy Lumpectomy Mastectomy Mastectomy Lumpectomy Lumpectomy
-[31] Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy Lumpectomy
-[37] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy
-[43] Lumpectomy Mastectomy Mastectomy Lumpectomy Lumpectomy Lumpectomy
-[49] Mastectomy Lumpectomy Lumpectomy Lumpectomy Mastectomy Mastectomy
-[55] Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy Mastectomy
-[61] Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy
-[67] Lumpectomy Mastectomy Mastectomy Mastectomy
-Levels: Lumpectomy Mastectomy
-
-$`DTC Positive`
- [1] Mastectomy Mastectomy Lumpectomy Mastectomy Lumpectomy Mastectomy
- [7] Lumpectomy Mastectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
-[13] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy
-[19] Mastectomy Mastectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
-[25] Mastectomy Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy
-[31] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
-[37] Lumpectomy Mastectomy Mastectomy
-Levels: Lumpectomy Mastectomy
-
-$overall
-  [1] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-  [4] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
-  [7] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [10] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [13] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [16] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [19] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [22] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [25] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [28] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
- [31] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [34] Neoadjuvant Chemo    No Neoadjuvant Chemo Neoadjuvant Chemo   
- [37] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [40] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [43] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
- [46] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [49] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
- [52] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [55] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [58] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [61] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [64] Neoadjuvant Chemo    Neoadjuvant Chemo    Neoadjuvant Chemo   
- [67] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [70] Neoadjuvant Chemo    No Neoadjuvant Chemo Neoadjuvant Chemo   
- [73] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
- [76] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [79] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [82] Neoadjuvant Chemo    Neoadjuvant Chemo    No Neoadjuvant Chemo
- [85] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [88] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
- [91] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
- [94] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [97] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[100] Neoadjuvant Chemo    No Neoadjuvant Chemo Neoadjuvant Chemo   
-[103] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[106] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
-[109] No Neoadjuvant Chemo
-attr(,"label")
-[1] Neoadjuvant Chemo
-Levels: No Neoadjuvant Chemo Neoadjuvant Chemo
-
-$`DTC Negative`
- [1] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [4] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [7] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[10] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[13] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[16] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
-[19] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[22] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[25] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
-[28] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[31] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[34] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[37] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
-[40] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
-[43] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
-[46] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[49] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
-[52] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
-[55] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
-[58] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[61] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[64] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
-[67] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
-[70] No Neoadjuvant Chemo
-Levels: No Neoadjuvant Chemo Neoadjuvant Chemo
-
-$`DTC Positive`
- [1] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
- [4] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
- [7] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[10] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[13] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[16] Neoadjuvant Chemo    No Neoadjuvant Chemo Neoadjuvant Chemo   
-[19] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
-[22] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[25] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[28] Neoadjuvant Chemo    Neoadjuvant Chemo    Neoadjuvant Chemo   
-[31] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
-[34] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
-[37] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
-Levels: No Neoadjuvant Chemo Neoadjuvant Chemo
-
-
table1_dtc #we have p-values!  
-
-
| | Total (N=109) | DTC Negative (N=70) | DTC Positive (N=39) | P-value |
|---|---|---|---|---|
| Age at Diagnosis (years) | | | | 0.722 |
| Mean (SD) | 49.7 (9.66) | 49.9 (9.74) | 49.2 (9.63) | |
| Median [Min, Max] | 49.3 [27.3, 68.9] | 51.6 [27.3, 68.9] | 48.8 [30.7, 67.7] | |
| Final Receptor Group | | | | 0.145 |
| TNBC | 45 (41.3%) | 25 (35.7%) | 20 (51.3%) | |
| HR+ HER2- | 52 (47.7%) | 37 (52.9%) | 15 (38.5%) | |
| HR+ HER2+ | 8 (7.3%) | 4 (5.7%) | 4 (10.3%) | |
| HR- HER2+ | 4 (3.7%) | 4 (5.7%) | 0 (0%) | |
| Race | | | | 0.683 |
| Mean (SD) | 4.65 (1.12) | 4.69 (1.06) | 4.59 (1.23) | |
| Median [Min, Max] | 5.00 [1.00, 5.00] | 5.00 [1.00, 5.00] | 5.00 [1.00, 5.00] | |
| Tumor Grade | | | | 0.11 |
| Grade 3 | 79 (72.5%) | 46 (65.7%) | 33 (84.6%) | |
| Grade 1 | 22 (20.2%) | 18 (25.7%) | 4 (10.3%) | |
| Grade 2 | 6 (5.5%) | 4 (5.7%) | 2 (5.1%) | |
| Missing | 2 (1.8%) | 2 (2.9%) | 0 (0%) | |
| Overall Stage | | | | 0.804 |
| Stage I | 35 (32.1%) | 22 (31.4%) | 13 (33.3%) | |
| Stage II | 47 (43.1%) | 29 (41.4%) | 18 (46.2%) | |
| Stage III | 26 (23.9%) | 18 (25.7%) | 8 (20.5%) | |
| Missing | 1 (0.9%) | 1 (1.4%) | 0 (0%) | |
| T Stage | | | | 0.629 |
| T1 | 51 (46.8%) | 34 (48.6%) | 17 (43.6%) | |
| T2 | 44 (40.4%) | 27 (38.6%) | 17 (43.6%) | |
| T3 | 12 (11.0%) | 8 (11.4%) | 4 (10.3%) | |
| T4 | 1 (0.9%) | 0 (0%) | 1 (2.6%) | |
| Missing | 1 (0.9%) | 1 (1.4%) | 0 (0%) | |
| N Stage | | | | 0.114 |
| N0 | 46 (42.2%) | 24 (34.3%) | 22 (56.4%) | |
| N1 | 43 (39.4%) | 32 (45.7%) | 11 (28.2%) | |
| N2 | 13 (11.9%) | 10 (14.3%) | 3 (7.7%) | |
| N3 | 7 (6.4%) | 4 (5.7%) | 3 (7.7%) | |
| Histology Category | | | | 0.0157 |
| Both Ductal and Lobular | 9 (8.3%) | 9 (12.9%) | 0 (0%) | |
| Ductal | 84 (77.1%) | 48 (68.6%) | 36 (92.3%) | |
| Lobular | 14 (12.8%) | 11 (15.7%) | 3 (7.7%) | |
| Other | 2 (1.8%) | 2 (2.9%) | 0 (0%) | |
| Radiation | | | | 0.774 |
| No Radiation | 34 (31.2%) | 23 (32.9%) | 11 (28.2%) | |
| Radiation | 75 (68.8%) | 47 (67.1%) | 28 (71.8%) | |
| Chemo | | | | 0.291 |
| No Chemo | 3 (2.8%) | 1 (1.4%) | 2 (5.1%) | |
| Chemo | 106 (97.2%) | 69 (98.6%) | 37 (94.9%) | |
| Endocrine Therapy | | | | 0.497 |
| No Endocrine Therapy | 47 (43.1%) | 28 (40.0%) | 19 (48.7%) | |
| Endocrine Therapy | 62 (56.9%) | 42 (60.0%) | 20 (51.3%) | |
| Bone Modifying Treatment | | | | 1 |
| No Bone Modifying Treatment | 70 (64.2%) | 45 (64.3%) | 25 (64.1%) | |
| Bone Modifying Treatment | 39 (35.8%) | 25 (35.7%) | 14 (35.9%) | |
| Node Status | | | | 0.0414 |
| Node Negative | 46 (42.2%) | 24 (34.3%) | 22 (56.4%) | |
| Node Positive | 63 (57.8%) | 46 (65.7%) | 17 (43.6%) | |
| Axillary Dissection | | | | 0.204 |
| No Axillary Dissection | 54 (49.5%) | 31 (44.3%) | 23 (59.0%) | |
| Axillary Dissection | 55 (50.5%) | 39 (55.7%) | 16 (41.0%) | |
| Surgery Type | | | | 0.516 |
| Lumpectomy | 45 (41.3%) | 31 (44.3%) | 14 (35.9%) | |
| Mastectomy | 64 (58.7%) | 39 (55.7%) | 25 (64.1%) | |
| Neoadjuvant Chemo | | | | 0.37 |
| No Neoadjuvant Chemo | 90 (82.6%) | 60 (85.7%) | 30 (76.9%) | |
| Neoadjuvant Chemo | 19 (17.4%) | 10 (14.3%) | 9 (23.1%) | |
-
-
-

This series of tests shows that the variables associated with DTC status are similar, but not identical, to those identified in the earlier tests. Histology category was significant (p = 0.016), with ductal histology more common among DTC-positive patients. Node status was also significant (p = 0.041), but in the opposite direction from what we might expect: node-negative patients were more likely to be DTC positive (22/46, 48%) than node-positive patients (17/63, 27%). N stage, receptor group, and tumor grade showed nonsignificant trends (p = 0.11 to 0.15). We have decided not to include pCR in this table or in further analyses. Only 19 patients received neoadjuvant therapy, so the sample for any test of association is very small, and pCR is missing for the remaining 90 patients in the overall cohort.
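As a check on the direction of the node-status association, the 2×2 counts from the table above can be re-entered by hand and tested in base R. This sketch only reuses the printed counts. `chisq.test()` applies Yates' continuity correction to 2×2 tables by default, and with that correction the result should closely match the reported p-value of 0.0414.

```r
# Counts copied from the Node Status rows of the table above
node_dtc <- matrix(
  c(24, 22,   # Node Negative: DTC Negative, DTC Positive
    46, 17),  # Node Positive: DTC Negative, DTC Positive
  nrow = 2, byrow = TRUE,
  dimnames = list(node = c("Node Negative", "Node Positive"),
                  dtc  = c("DTC Negative", "DTC Positive"))
)

# Proportion DTC positive within each node-status group (row proportions)
dtc_rate <- prop.table(node_dtc, margin = 1)[, "DTC Positive"]
round(dtc_rate, 3)   # Node Negative 0.478, Node Positive 0.270

# Pearson chi-squared test with Yates' continuity correction (R's 2x2 default)
node_test <- chisq.test(node_dtc)
node_test$p.value
```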

-
-
-

4.3 Multivariable Analysis

-

Variable Selection and Planning: I have chosen to perform multivariable logistic regression to identify predictors of ctDNA (and DTC) positivity. We suspect these are biomarkers of relapse, and even in our dataset ctDNA is strongly associated with relapse and worse overall survival. Identifying predictors of positivity would help us decide whom to consider screening for ctDNA. This testing is expensive, takes time and resources, and may not benefit everyone. ctDNA positivity is a binary outcome, and we have already performed the univariable analyses above to look for potentially significant relationships.

Several types of variables are worth considering:

- demographic and clinical factors;
- disease factors, such as tumor aggressiveness (measured by histology and grade), hormone receptor status, and stage at diagnosis;
- treatment factors, such as surgery type, radiation, neoadjuvant chemotherapy, and endocrine (anti-hormone) therapy.

The only variable we have removed from the model is pathologic complete response (pCR). pCR records whether a patient has no residual tumor at surgery after receiving neoadjuvant chemotherapy/immunotherapy. Relatively few patients received neoadjuvant therapy, so pCR is missing for most of the cohort. Imputing it would not make sense, because it is only defined for patients who received neoadjuvant therapy. All remaining variables are time-invariant and were present at enrollment on study, before ctDNA testing.
We will use all of the variables assessed in our initial univariable tests of association, including those with and without significant associations. We suspect some of these variables are related to one another or collinear, so univariable tests alone cannot tell us which will be most predictive of positivity.

-

Several variables were significant in our univariable (chi-squared) analyses of ctDNA positivity: age at diagnosis, longer time from diagnosis to enrollment, higher tumor grade, and higher stage at diagnosis. Variables that were not significant but could still be considered include histology, nodal positivity, higher T stage, and receptor status. Recurrence (both distant and local) and worse survival are significantly associated with ctDNA positivity. However, these outcomes follow ctDNA positivity in time, so they should not be included as predictors in our model.

-

LASSO: LASSO is an automatic approach to variable selection. Its L1 penalty shrinks the coefficients of less informative variables exactly to zero, which produces a sparser model without relying on p-value cutoffs. There is no single “right” way to choose variables, but purposeful selection generally begins with univariable analysis, which we have already performed. We considered stepwise model building based on p-values. That approach has fallen out of favor because it uses somewhat arbitrary p-value cutoffs and can discard relevant and important variables. Because we have already run univariable chi-squared tests of association above, we do not need to fit univariable logistic regressions. We removed one variable (pCR) from the univariable ctDNA analysis, for two reasons: it has substantial missingness in the overall cohort, and it applies only to the small subset of patients who received neoadjuvant therapy. We will perform LASSO with the remaining variables to identify those that are most predictive.
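To illustrate why LASSO can drop a variable entirely rather than just shrinking it, here is the soft-thresholding operator that coordinate-descent LASSO solvers apply to each coefficient update. This is a simplified sketch for standardized predictors in a least-squares setting, not glmnet's full logistic-regression algorithm. `soft_threshold` is a hypothetical helper name.

```r
# Soft-thresholding: S(z, lambda) = sign(z) * max(|z| - lambda, 0)
# Any coefficient whose unpenalized update |z| is at most lambda is set exactly to 0.
soft_threshold <- function(z, lambda) {
  sign(z) * pmax(abs(z) - lambda, 0)
}

z <- c(-2.0, 0.3, 1.5, -0.4)   # hypothetical unpenalized coordinate updates
soft_threshold(z, lambda = 0.5)
# -1.5  0.0  1.0  0.0
```

With `lambda = 2` or larger, every entry above would be set to zero. This is the sense in which a larger penalty yields a sparser model and a smaller one keeps more variables.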

-
-
library(glmnet)
-
-
Loading required package: Matrix
-
-
-

-Attaching package: 'Matrix'
-
-
-
The following objects are masked from 'package:tidyr':
-
-    expand, pack, unpack
-
-
-
Loaded glmnet 4.1-8
-
-
set.seed(123) 
-
-# Prepare the response variable
-y <- unique_subset_data$ctDNA_ever
-
-
-# y and X2 had mismatched lengths (error) because model.matrix() drops the 4 rows with missing predictors; missingness is low (<10%), so impute these values instead
-library(mice)
-
-

-Attaching package: 'mice'
-
-
-
The following object is masked from 'package:stats':
-
-    filter
-
-
-
The following objects are masked from 'package:base':
-
-    cbind, rbind
-
-
# Impute missing values (as general missingness is low as above)
-imputed_data <- mice(unique_subset_data, m = 1, method = "pmm", maxit = 5)
-
-

- iter imp variable
-  1   1  final_tumor_grade  final_overall_stage  final_t_stage
-  2   1  final_tumor_grade  final_overall_stage  final_t_stage
-  3   1  final_tumor_grade  final_overall_stage  final_t_stage
-  4   1  final_tumor_grade  final_overall_stage  final_t_stage
-  5   1  final_tumor_grade  final_overall_stage  final_t_stage
-
-
-
Warning: Number of logged events: 4
-
-
unique_subset_data <- complete(imputed_data)
-
-#-1 to not include intercept in this matrix as a predictor variable 
-X2 <- model.matrix(ctDNA_ever ~ -1 + age_at_diag + final_receptor_group + demo_race_final +
-                                   final_tumor_grade + final_overall_stage +
-                                   final_t_stage + final_n_stage +
-                                   histology_category + prtx_radiation +
-                                   prtx_chemo + prtx_endo + prtx_bonemod +
-                                   node_status + axillary_dissection +
-                                   diag_surgery_type_1 + diag_neoadj_chemo_1,
-                   data = unique_subset_data)
-
-
-
-# Fit lasso model
-lasso_model <- glmnet(X2, y, family = "binomial", alpha = 1)  # alpha = 1 for lasso
-
-#Cross-validation. This approach performs k-fold cross validation to find the best lambda. The smaller the lambda, the more variables are included. 
-cv_lasso_model <- cv.glmnet(X2, y, family = "binomial", alpha = 1)
-
-
Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
-multinomial or binomial class has fewer than 8 observations; dangerous ground
-
-
-
Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
-multinomial or binomial class has fewer than 8 observations; dangerous ground
-
-
#plotting the cross-validation results to compare the performance of different lambda values
-plot(cv_lasso_model)
-
-
-
-

-
-
-
-
#getting the best lambda (printed below: ~0.048)
-best_lambda <- cv_lasso_model$lambda.min
-print(paste("Best lambda:", best_lambda))
-
-
[1] "Best lambda: 0.048114238291791"
-
-
#Finding the final fit model with the optimal lambda 
-final_lasso_model <- glmnet(X2, y, family = "binomial", alpha = 1, lambda = best_lambda)
-
-#Which coefficients are included in the model (nonzero entries)
-coef(final_lasso_model) 
-
-
29 x 1 sparse Matrix of class "dgCMatrix"
-                                              s0
-(Intercept)                            -2.762725
-age_at_diag                             .       
-final_receptor_groupTNBC                .       
-final_receptor_groupHR+ HER2-           .       
-final_receptor_groupHR+ HER2+           .       
-final_receptor_groupHR- HER2+           .       
-demo_race_finalAsian                    .       
-demo_race_finalWhite                    .       
-final_tumor_gradeGrade 1                .       
-final_tumor_gradeGrade 2                .       
-final_overall_stageStage II             .       
-final_overall_stageStage III            .       
-final_t_stageT2                         .       
-final_t_stageT3                         .       
-final_t_stageT4                         1.189380
-final_n_stageN1                         .       
-final_n_stageN2                         1.573283
-final_n_stageN3                         .       
-histology_categoryDuctal                .       
-histology_categoryLobular               .       
-histology_categoryOther                 .       
-prtx_radiationRadiation                 .       
-prtx_chemoChemo                         .       
-prtx_endoEndocrine Therapy              .       
-prtx_bonemodBone Modifying Treatment    .       
-node_statusNode Positive                .       
-axillary_dissectionAxillary Dissection  .       
-diag_surgery_type_1Mastectomy           .       
-diag_neoadj_chemo_1Neoadjuvant Chemo    .       
-
-
-

In the LASSO for ctDNA positivity, the only predictors with nonzero coefficients are T stage and N stage. Multi-level variables are somewhat harder to interpret in a LASSO, because each level is a separate dummy column that is kept or dropped on its own. Here only the higher categories were retained (T4 relative to T1, and N2 relative to N0), and both are associated with positivity. The T4 term should be read with caution, because only one patient in the cohort had a T4 tumor. The optimal lambda (lambda.min) for this model is 0.048. Several of these variables are related to one another: overall stage is derived from T and N stage, and node status is derived from N stage. I’ll try a few other LASSOs to see whether removing one variable from each collinear pair changes the results.
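The redundancy between node status and N stage can be checked directly. Node status is a deterministic function of N stage, so their dummy columns are linearly dependent. The toy data below are invented for illustration and are not SURMOUNT data. `qr()` reports the numerical rank of the design matrix, coded with the same `-1` convention used for X2.

```r
# Hypothetical toy data: three patients at each N stage
n_stage <- factor(rep(c("N0", "N1", "N2", "N3"), times = 3))
# Node status derived from N stage, mirroring how the study variable is built
node_status <- factor(ifelse(n_stage == "N0", "Node Negative", "Node Positive"))

# "-1" coding as in X2: the first factor gets a column for every level
X_toy <- model.matrix(~ -1 + n_stage + node_status)
colnames(X_toy)

# Rank (4) is smaller than the number of columns (5):
# node_statusNode Positive equals n_stageN1 + n_stageN2 + n_stageN3
qr(X_toy)$rank
ncol(X_toy)
```

An unpenalized `glm()` fit on such a matrix would report `NA` for the aliased coefficient. LASSO still runs, but it may retain either of two redundant columns, which is one reason to drop one variable from each pair.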

-
-
library(glmnet)
-
-set.seed(123) #to ensure consistency of results 
-
-# Prepare the response variable
-y <- unique_subset_data$ctDNA_ever
-
-# X3 would hit the same 4 missing values, but unique_subset_data was already imputed above, so no further imputation is needed
-
-
-### removed Nodal status as a variable 
-X3 <- model.matrix(ctDNA_ever ~ -1 + age_at_diag + final_receptor_group + demo_race_final +
-                                   final_tumor_grade + final_overall_stage +
-                                   final_t_stage + final_n_stage +
-                                   histology_category + prtx_radiation +
-                                   prtx_chemo + prtx_endo + prtx_bonemod +
-                                   axillary_dissection +
-                                   diag_surgery_type_1 + diag_neoadj_chemo_1,
-                   data = unique_subset_data)
-
-# Fit lasso model
-lasso_model <- glmnet(X3, y, family = "binomial", alpha = 1)  # alpha = 1 for lasso
-
-#Cross-validation. This approach performs k-fold cross validation to find the best lambda. The smaller the lambda, the more variables are included. 
-cv_lasso_model <- cv.glmnet(X3, y, family = "binomial", alpha = 1)
-
-
Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
-multinomial or binomial class has fewer than 8 observations; dangerous ground
-Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
-multinomial or binomial class has fewer than 8 observations; dangerous ground
-Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
-multinomial or binomial class has fewer than 8 observations; dangerous ground
-Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
-multinomial or binomial class has fewer than 8 observations; dangerous ground
-
-
#plotting the cross-validation results to compare the performance of different lambda values
-plot(cv_lasso_model)
-
-
-
-

-
-
-
-
#getting the best lambda (0.048, the same as in the full model)
-best_lambda <- cv_lasso_model$lambda.min
-print(paste("Best lambda:", best_lambda))
-
-
[1] "Best lambda: 0.048114238291791"
-
-
#Finding the final fit model with the optimal lambda 
-paired_down_lasso <- glmnet(X3, y, family = "binomial", alpha = 1, lambda = best_lambda)
-
-#Which coefficients are included in the model (nonzero entries)
-coef(paired_down_lasso) 
-
-
28 x 1 sparse Matrix of class "dgCMatrix"
-                                              s0
-(Intercept)                            -2.762725
-age_at_diag                             .       
-final_receptor_groupTNBC                .       
-final_receptor_groupHR+ HER2-           .       
-final_receptor_groupHR+ HER2+           .       
-final_receptor_groupHR- HER2+           .       
-demo_race_finalAsian                    .       
-demo_race_finalWhite                    .       
-final_tumor_gradeGrade 1                .       
-final_tumor_gradeGrade 2                .       
-final_overall_stageStage II             .       
-final_overall_stageStage III            .       
-final_t_stageT2                         .       
-final_t_stageT3                         .       
-final_t_stageT4                         1.189380
-final_n_stageN1                         .       
-final_n_stageN2                         1.573283
-final_n_stageN3                         .       
-histology_categoryDuctal                .       
-histology_categoryLobular               .       
-histology_categoryOther                 .       
-prtx_radiationRadiation                 .       
-prtx_chemoChemo                         .       
-prtx_endoEndocrine Therapy              .       
-prtx_bonemodBone Modifying Treatment    .       
-axillary_dissectionAxillary Dissection  .       
-diag_surgery_type_1Mastectomy           .       
-diag_neoadj_chemo_1Neoadjuvant Chemo    .       
-
-
-

In the pared-down LASSO model for ctDNA positivity (node status removed), T stage and N stage remain the only selected predictors. The N2 coefficient (1.57) is still larger than the T4 coefficient (1.19). The selected lambda (0.048) and all coefficients are identical to those of the prior model. This is consistent with node status already having been shrunk to zero in the full model, so removing it does not change the fit.

-

It is, however, difficult to model ctDNA positivity with any of these approaches, because only 9 of the 109 individuals in this cohort had positive results. With so few events, the selected predictors are likely to be unstable and should be interpreted cautiously.
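One practical consequence of having only 9 positive patients is that random cross-validation folds can contain zero or several positives. The "fewer than 8 observations" warnings above likely reflect how few positive cases each training fold contains. A common workaround is to assign folds separately within each outcome class and pass them to `cv.glmnet()` through its `foldid` argument. `make_stratified_folds` is a hypothetical base-R helper, not part of glmnet, and `y_demo` is illustrative rather than the study data.

```r
# Hypothetical helper: assign fold ids separately within each outcome class
# so that positives are spread as evenly as possible across folds.
make_stratified_folds <- function(y, k) {
  y_chr <- as.character(y)
  foldid <- integer(length(y))
  for (cls in unique(y_chr)) {
    idx <- which(y_chr == cls)
    idx <- idx[sample.int(length(idx))]          # shuffle within the class
    foldid[idx] <- rep_len(seq_len(k), length(idx))
  }
  foldid
}

set.seed(123)
y_demo <- c(rep(1, 9), rep(0, 100))   # illustrative: 9 positives out of 109
folds <- make_stratified_folds(y_demo, k = 5)
table(fold = folds, outcome = y_demo)

# With the real data this could then be used as, e.g.:
# cv.glmnet(X2, y, family = "binomial", alpha = 1, foldid = folds)
```

Using fewer folds (5 here rather than the default 10) keeps at least one positive case in every held-out fold.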

-

The intercept (-2.76) is the log-odds of ctDNA positivity when every predictor column is zero. Because all other coefficients were shrunk to zero, this corresponds to a patient with T1 and N0 disease. Each remaining coefficient is the change in log-odds for that category relative to its reference category, holding all other variables constant. Exponentiating a coefficient gives the corresponding odds ratio.
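To make this interpretation concrete, the values printed by `coef(final_lasso_model)` above can be converted in base R. `exp()` turns a log-odds coefficient into an odds ratio, and `plogis()` turns log-odds into a probability. LASSO shrinks coefficients toward zero, so these odds ratios are likely biased toward 1 and are best read as descriptive rather than as effect estimates.

```r
# Coefficients copied from coef(final_lasso_model) above
intercept <- -2.762725
b_t4 <- 1.189380   # T4 vs. reference T1
b_n2 <- 1.573283   # N2 vs. reference N0

# Odds ratios for the two retained terms
exp(c(T4 = b_t4, N2 = b_n2))          # roughly 3.29 and 4.82

# Predicted probability of ctDNA positivity for a reference (T1, N0) patient
# and for an N2 patient; all other coefficients were shrunk to zero
plogis(intercept)                      # about 0.059
plogis(intercept + b_n2)               # about 0.23
```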

-

As a proof of principle that LASSO can be applied to this dataset and may generate more robust results, we will also look at predictors of DTC positivity. DTC positivity was more frequent in this cohort (39 of 109 patients), so we suspect the modeling approach may work better.
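The events-per-variable (EPV) ratio is a rough way to quantify why the DTC outcome may be easier to model. EPV is the size of the smaller outcome class divided by the number of candidate predictor columns. The counts below come from this document: 9 ctDNA-positive patients, 39 DTC-positive patients, and 28 non-intercept columns in X2. The often-cited guideline of about 10 events per variable is a heuristic, not a hard cutoff.

```r
n_predictors <- 28                  # columns in X2 (29 coefficients minus the intercept)
events <- c(ctDNA = 9, DTC = 39)    # size of the smaller outcome class

epv <- events / n_predictors
round(epv, 2)                       # ctDNA 0.32, DTC 1.39

# Events a 10-EPV rule of thumb would call for with this many predictors
10 * n_predictors
```

Both outcomes fall well below 10 EPV, but the DTC outcome has roughly four times as many events per candidate column as ctDNA.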

-
-
set.seed(123) 
-
-#### DTC predictions. 
-
-subset_data <- subset_data[!duplicated(subset_data$participant_id), ]
-dtc_unique_subset_data <- merge(unique_subset_data, subset_data[, c("participant_id", "dtc_ever")], by = "participant_id", all.x = TRUE)
-
-nrow(dtc_unique_subset_data)  # Should still be 109
-
-
[1] 109
-
-
table(dtc_unique_subset_data$dtc_ever, useNA = "ifany")  # Check for NA values
-
-

- 0  1 
-70 39 
-
-
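#run the lasso for DTC status. This might work better because there are more DTC-positive results
-# merge() sorts its result by participant_id, so align y1 to the row order of X2 before fitting
-y1 <- dtc_unique_subset_data$dtc_ever[match(unique_subset_data$participant_id, dtc_unique_subset_data$participant_id)]
-X2 #use the same X2, as it has the same predictors we are interested in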
-
-
    age_at_diag final_receptor_groupTNBC final_receptor_groupHR+ HER2-
-1      55.89870                        0                             1
-2      49.25667                        0                             1
-3      52.87611                        0                             1
-4      29.93840                        1                             0
-5      37.00753                        1                             0
-6      48.98563                        0                             1
-7      63.80835                        0                             0
-8      40.89802                        0                             0
-9      43.59754                        1                             0
-10     38.57632                        1                             0
-11     41.77687                        1                             0
-12     45.68925                        0                             1
-13     59.94524                        0                             1
-14     59.43600                        0                             0
-15     52.14511                        1                             0
-16     42.93771                        1                             0
-17     64.69541                        0                             1
-18     55.14031                        1                             0
-19     41.26762                        1                             0
-20     39.52361                        1                             0
-21     57.76044                        1                             0
-22     44.42984                        0                             1
-23     51.34565                        0                             1
-24     42.27789                        1                             0
-25     57.05133                        1                             0
-26     57.62628                        1                             0
-27     54.86927                        1                             0
-28     44.18891                        1                             0
-29     63.62491                        0                             1
-30     36.00548                        1                             0
-31     55.57837                        1                             0
-32     30.71595                        0                             0
-33     41.28953                        1                             0
-34     59.38946                        0                             1
-35     48.79945                        0                             1
-36     59.15400                        0                             0
-37     48.97194                        1                             0
-38     59.39767                        0                             1
-39     39.67967                        1                             0
-40     67.68515                        0                             1
-41     41.84531                        1                             0
-42     48.16975                        1                             0
-43     58.07529                        0                             1
-44     62.49966                        0                             1
-45     46.64476                        0                             0
-46     47.34565                        0                             1
-47     52.09856                        0                             1
-48     36.58042                        0                             0
-49     58.26146                        0                             1
-50     61.76318                        0                             1
-51     61.73580                        1                             0
-52     39.40862                        1                             0
-53     55.30459                        1                             0
-54     53.10335                        0                             1
-55     43.30459                        1                             0
-56     48.46270                        0                             1
-57     44.07666                        1                             0
-58     52.55305                        1                             0
-59     56.45996                        1                             0
-60     67.72621                        1                             0
-61     39.59206                        1                             0
-62     51.82752                        0                             1
-63     58.28611                        0                             1
-64     46.93498                        1                             0
-65     31.17591                        1                             0
-66     55.96441                        1                             0
-67     46.38741                        0                             1
-68     46.33812                        0                             1
-69     40.62971                        0                             1
-70     37.67556                        0                             1
-71     32.35318                        1                             0
-72     48.75291                        0                             1
-73     56.22177                        0                             1
-74     39.41136                        0                             1
-75     49.76591                        1                             0
-76     43.22245                        0                             0
-77     36.01095                        0                             1
-78     41.30322                        1                             0
-79     59.57016                        0                             1
-80     39.65503                        0                             1
-81     54.94593                        1                             0
-82     43.50992                        0                             0
-83     48.80767                        1                             0
-84     62.10541                        0                             1
-85     63.35934                        0                             0
-86     57.31417                        0                             1
-87     59.74264                        0                             1
-88     66.92676                        1                             0
-89     36.30938                        1                             0
-90     34.83641                        0                             1
-91     55.12115                        0                             1
-92     52.07118                        0                             1
-93     27.33744                        0                             1
-94     64.41342                        0                             1
-95     56.09035                        0                             1
-96     47.90691                        0                             1
-97     51.38125                        0                             1
-98     41.71663                        1                             0
-99     48.47639                        0                             1
-100    40.52567                        1                             0
-101    60.39151                        0                             1
-102    52.51198                        0                             0
-103    60.87064                        0                             1
-104    58.61465                        0                             1
-105    38.60370                        0                             1
-106    68.93634                        0                             1
-107    37.84531                        0                             0
-108    51.43874                        1                             0
-109    52.68720                        0                             1
-    final_receptor_groupHR+ HER2+ final_receptor_groupHR- HER2+
-1                               0                             0
-2                               0                             0
-3                               0                             0
-4                               0                             0
-5                               0                             0
-6                               0                             0
-7                               0                             1
-8                               1                             0
-9                               0                             0
-10                              0                             0
-11                              0                             0
-12                              0                             0
-13                              0                             0
-14                              1                             0
-15                              0                             0
-16                              0                             0
-17                              0                             0
-18                              0                             0
-19                              0                             0
-20                              0                             0
-21                              0                             0
-22                              0                             0
-23                              0                             0
-24                              0                             0
-25                              0                             0
-26                              0                             0
-27                              0                             0
-28                              0                             0
-29                              0                             0
-30                              0                             0
-31                              0                             0
-32                              1                             0
-33                              0                             0
-34                              0                             0
-35                              0                             0
-36                              1                             0
-37                              0                             0
-38                              0                             0
-39                              0                             0
-40                              0                             0
-41                              0                             0
-42                              0                             0
-43                              0                             0
-44                              0                             0
-45                              1                             0
-46                              0                             0
-47                              0                             0
-48                              1                             0
-49                              0                             0
-50                              0                             0
-51                              0                             0
-52                              0                             0
-53                              0                             0
-54                              0                             0
-55                              0                             0
-56                              0                             0
-57                              0                             0
-58                              0                             0
-59                              0                             0
-60                              0                             0
-61                              0                             0
-62                              0                             0
-63                              0                             0
-64                              0                             0
-65                              0                             0
-66                              0                             0
-67                              0                             0
-68                              0                             0
-69                              0                             0
-70                              0                             0
-71                              0                             0
-72                              0                             0
-73                              0                             0
-74                              0                             0
-75                              0                             0
-76                              1                             0
-77                              0                             0
-78                              0                             0
-79                              0                             0
-80                              0                             0
-81                              0                             0
-82                              0                             1
-83                              0                             0
-84                              0                             0
-85                              1                             0
-86                              0                             0
-87                              0                             0
-88                              0                             0
-89                              0                             0
-90                              0                             0
-91                              0                             0
-92                              0                             0
-93                              0                             0
-94                              0                             0
-95                              0                             0
-96                              0                             0
-97                              0                             0
-98                              0                             0
-99                              0                             0
-100                             0                             0
-101                             0                             0
-102                             0                             1
-103                             0                             0
-104                             0                             0
-105                             0                             0
-106                             0                             0
-107                             0                             1
-108                             0                             0
-109                             0                             0
-    demo_race_finalAsian demo_race_finalWhite final_tumor_gradeGrade 1
-1                      0                    1                        0
-2                      0                    1                        0
-3                      0                    1                        0
-4                      0                    1                        0
-5                      0                    1                        0
-6                      0                    1                        0
-7                      0                    0                        0
-8                      0                    1                        0
-9                      0                    1                        0
-10                     0                    1                        0
-11                     0                    1                        0
-12                     0                    1                        0
-13                     0                    1                        0
-14                     0                    1                        0
-15                     0                    1                        0
-16                     0                    1                        0
-17                     0                    1                        0
-18                     0                    1                        0
-19                     0                    0                        0
-20                     0                    1                        0
-21                     0                    1                        0
-22                     0                    1                        0
-23                     0                    1                        0
-24                     0                    1                        0
-25                     0                    1                        0
-26                     0                    0                        0
-27                     0                    1                        0
-28                     0                    1                        0
-29                     0                    1                        1
-30                     0                    1                        0
-31                     0                    1                        0
-32                     0                    1                        0
-33                     0                    0                        0
-34                     0                    1                        1
-35                     0                    1                        1
-36                     0                    1                        0
-37                     0                    1                        0
-38                     0                    1                        1
-39                     0                    1                        0
-40                     0                    1                        0
-41                     0                    1                        0
-42                     0                    1                        0
-43                     0                    1                        0
-44                     0                    1                        0
-45                     0                    1                        0
-46                     0                    1                        1
-47                     0                    1                        0
-48                     0                    0                        1
-49                     0                    1                        1
-50                     0                    1                        1
-51                     0                    1                        0
-52                     0                    1                        0
-53                     0                    1                        0
-54                     0                    1                        1
-55                     0                    1                        0
-56                     0                    1                        1
-57                     0                    0                        0
-58                     0                    1                        0
-59                     0                    1                        0
-60                     0                    1                        0
-61                     0                    1                        0
-62                     0                    1                        1
-63                     0                    1                        0
-64                     0                    1                        0
-65                     0                    0                        0
-66                     0                    1                        0
-67                     0                    1                        1
-68                     0                    1                        1
-69                     0                    1                        0
-70                     0                    1                        0
-71                     0                    1                        0
-72                     0                    1                        0
-73                     0                    1                        1
-74                     0                    1                        0
-75                     0                    1                        0
-76                     0                    1                        0
-77                     0                    1                        0
-78                     0                    1                        0
-79                     0                    1                        0
-80                     0                    1                        0
-81                     0                    0                        0
-82                     0                    1                        0
-83                     0                    1                        0
-84                     0                    1                        0
-85                     0                    1                        0
-86                     0                    1                        0
-87                     0                    1                        1
-88                     0                    1                        0
-89                     0                    1                        0
-90                     0                    1                        1
-91                     0                    1                        0
-92                     0                    1                        1
-93                     1                    0                        1
-94                     0                    1                        1
-95                     0                    1                        0
-96                     0                    1                        1
-97                     0                    1                        0
-98                     0                    0                        0
-99                     0                    1                        1
-100                    0                    1                        0
-101                    0                    1                        1
-102                    0                    1                        0
-103                    0                    1                        1
-104                    0                    1                        0
-105                    0                    1                        0
-106                    0                    1                        0
-107                    0                    1                        0
-108                    0                    1                        0
-109                    0                    1                        0
-    final_tumor_gradeGrade 2 final_overall_stageStage II
-1                          1                           0
-2                          1                           0
-3                          0                           0
-4                          0                           1
-5                          0                           1
-6                          0                           0
-7                          0                           0
-8                          0                           0
-9                          0                           1
-10                         0                           0
-11                         0                           1
-12                         0                           1
-13                         0                           1
-14                         1                           1
-15                         0                           0
-16                         0                           1
-17                         0                           0
-18                         0                           1
-19                         0                           1
-20                         0                           0
-21                         0                           1
-22                         0                           0
-23                         0                           0
-24                         0                           1
-25                         0                           1
-26                         0                           1
-27                         0                           0
-28                         0                           1
-29                         0                           0
-30                         0                           1
-31                         0                           0
-32                         0                           1
-33                         0                           0
-34                         0                           1
-35                         0                           0
-36                         0                           1
-37                         0                           1
-38                         0                           0
-39                         0                           0
-40                         0                           0
-41                         0                           1
-42                         0                           0
-43                         0                           1
-44                         1                           0
-45                         1                           0
-46                         0                           0
-47                         0                           1
-48                         0                           0
-49                         0                           1
-50                         0                           0
-51                         0                           1
-52                         0                           1
-53                         0                           0
-54                         0                           1
-55                         0                           0
-56                         0                           0
-57                         0                           0
-58                         0                           1
-59                         0                           0
-60                         0                           1
-61                         0                           1
-62                         0                           0
-63                         0                           0
-64                         0                           1
-65                         0                           1
-66                         0                           1
-67                         0                           0
-68                         0                           1
-69                         0                           0
-70                         0                           1
-71                         0                           0
-72                         0                           1
-73                         0                           0
-74                         0                           0
-75                         0                           0
-76                         0                           0
-77                         0                           0
-78                         0                           0
-79                         0                           0
-80                         0                           0
-81                         0                           0
-82                         0                           1
-83                         0                           0
-84                         1                           0
-85                         0                           1
-86                         0                           0
-87                         0                           0
-88                         0                           0
-89                         0                           1
-90                         0                           1
-91                         0                           1
-92                         0                           0
-93                         0                           1
-94                         0                           0
-95                         0                           0
-96                         0                           0
-97                         0                           1
-98                         0                           1
-99                         0                           1
-100                        0                           0
-101                        0                           0
-102                        0                           0
-103                        0                           0
-104                        0                           0
-105                        0                           0
-106                        0                           1
-107                        0                           1
-108                        0                           0
-109                        0                           1
-    final_overall_stageStage III final_t_stageT2 final_t_stageT3
-1                              1               1               0
-2                              0               0               0
-3                              1               1               0
-4                              0               1               0
-5                              0               1               0
-6                              1               0               1
-7                              1               1               0
-8                              1               0               0
-9                              0               1               0
-10                             0               0               0
-11                             0               1               0
-12                             0               1               0
-13                             0               0               1
-14                             0               0               0
-15                             0               0               0
-16                             0               1               0
-17                             0               0               0
-18                             0               0               0
-19                             0               1               0
-20                             0               0               0
-21                             0               1               0
-22                             1               1               0
-23                             1               1               0
-24                             0               1               0
-25                             0               0               0
-26                             0               1               0
-27                             0               0               0
-28                             0               0               0
-29                             1               1               0
-30                             0               1               0
-31                             0               0               0
-32                             0               0               0
-33                             0               0               0
-34                             0               0               1
-35                             1               0               0
-36                             0               1               0
-37                             0               1               0
-38                             1               0               0
-39                             0               0               0
-40                             0               0               0
-41                             0               1               0
-42                             1               0               1
-43                             0               1               0
-44                             0               0               0
-45                             0               0               0
-46                             1               0               1
-47                             0               0               0
-48                             1               0               0
-49                             0               1               0
-50                             0               0               0
-51                             0               1               0
-52                             0               1               0
-53                             0               0               0
-54                             0               0               0
-55                             0               0               0
-56                             1               0               1
-57                             0               0               0
-58                             0               0               0
-59                             0               0               0
-60                             0               1               0
-61                             0               1               0
-62                             1               0               1
-63                             0               0               0
-64                             0               1               0
-65                             0               1               0
-66                             0               1               0
-67                             1               0               0
-68                             0               0               0
-69                             0               0               0
-70                             0               1               0
-71                             0               0               0
-72                             0               1               0
-73                             0               0               0
-74                             1               0               1
-75                             1               0               1
-76                             0               0               0
-77                             1               1               0
-78                             0               0               0
-79                             0               0               0
-80                             0               0               0
-81                             0               0               0
-82                             0               1               0
-83                             1               1               0
-84                             0               0               0
-85                             0               1               0
-86                             0               0               0
-87                             1               0               0
-88                             0               0               0
-89                             0               1               0
-90                             0               1               0
-91                             0               1               0
-92                             1               0               1
-93                             0               1               0
-94                             1               1               0
-95                             1               0               0
-96                             0               0               0
-97                             0               1               0
-98                             0               0               0
-99                             0               0               0
-100                            0               0               0
-101                            0               0               0
-102                            1               0               1
-103                            1               0               1
-104                            0               0               0
-105                            0               0               0
-106                            0               1               0
-107                            0               1               0
-108                            0               0               0
-109                            0               1               0
-    final_t_stageT4 final_n_stageN1 final_n_stageN2 final_n_stageN3
-1                 0               0               0               1
-2                 0               0               0               0
-3                 0               0               0               1
-4                 0               0               0               0
-5                 0               0               0               0
-6                 0               0               1               0
-7                 0               0               1               0
-8                 0               0               1               0
-9                 0               0               0               0
-10                0               0               0               0
-11                0               0               0               0
-12                0               1               0               0
-13                0               0               0               0
-14                0               1               0               0
-15                0               0               0               0
-16                0               1               0               0
-17                0               0               0               0
-18                0               1               0               0
-19                0               0               0               0
-20                0               0               0               0
-21                0               0               0               0
-22                0               0               0               1
-23                0               0               0               1
-24                0               1               0               0
-25                0               1               0               0
-26                0               0               0               0
-27                0               0               0               0
-28                0               0               0               0
-29                0               0               1               0
-30                0               0               0               0
-31                0               0               0               0
-32                0               1               0               0
-33                0               0               0               0
-34                0               0               0               0
-35                1               0               1               0
-36                0               0               0               0
-37                0               0               0               0
-38                0               0               1               0
-39                0               0               0               0
-40                0               0               0               0
-41                0               1               0               0
-42                0               0               1               0
-43                0               0               0               0
-44                0               0               0               0
-45                0               0               0               0
-46                0               0               1               0
-47                0               1               0               0
-48                0               0               1               0
-49                0               1               0               0
-50                0               0               0               0
-51                0               1               0               0
-52                0               1               0               0
-53                0               0               0               0
-54                0               1               0               0
-55                0               0               0               0
-56                0               1               0               0
-57                0               0               0               0
-58                0               1               0               0
-59                0               1               0               0
-60                0               1               0               0
-61                0               0               0               0
-62                0               1               0               0
-63                0               0               0               0
-64                0               1               0               0
-65                0               1               0               0
-66                0               0               0               0
-67                0               0               0               1
-68                0               1               0               0
-69                0               1               0               0
-70                0               1               0               0
-71                0               0               0               0
-72                0               1               0               0
-73                0               1               0               0
-74                0               0               0               1
-75                0               1               0               0
-76                0               1               0               0
-77                0               0               0               1
-78                0               0               0               0
-79                0               0               0               0
-80                0               1               0               0
-81                0               0               0               0
-82                0               1               0               0
-83                0               1               0               0
-84                0               1               0               0
-85                0               0               0               0
-86                0               0               0               0
-87                0               0               1               0
-88                0               0               0               0
-89                0               1               0               0
-90                0               1               0               0
-91                0               1               0               0
-92                0               0               1               0
-93                0               1               0               0
-94                0               0               1               0
-95                0               0               1               0
-96                0               0               0               0
-97                0               1               0               0
-98                0               1               0               0
-99                0               1               0               0
-100               0               0               0               0
-101               0               1               0               0
-102               0               1               0               0
-103               0               1               0               0
-104               0               0               0               0
-105               0               0               0               0
-106               0               1               0               0
-107               0               1               0               0
-108               0               0               0               0
-109               0               0               0               0
-    histology_categoryDuctal histology_categoryLobular histology_categoryOther
-1                          0                         0                       0
-2                          1                         0                       0
-3                          1                         0                       0
-4                          1                         0                       0
-5                          1                         0                       0
-6                          0                         1                       0
-7                          1                         0                       0
-8                          1                         0                       0
-9                          1                         0                       0
-10                         1                         0                       0
-11                         1                         0                       0
-12                         1                         0                       0
-13                         0                         1                       0
-14                         1                         0                       0
-15                         1                         0                       0
-16                         1                         0                       0
-17                         1                         0                       0
-18                         1                         0                       0
-19                         1                         0                       0
-20                         1                         0                       0
-21                         1                         0                       0
-22                         1                         0                       0
-23                         1                         0                       0
-24                         1                         0                       0
-25                         1                         0                       0
-26                         1                         0                       0
-27                         1                         0                       0
-28                         1                         0                       0
-29                         1                         0                       0
-30                         1                         0                       0
-31                         1                         0                       0
-32                         1                         0                       0
-33                         1                         0                       0
-34                         0                         1                       0
-35                         1                         0                       0
-36                         1                         0                       0
-37                         1                         0                       0
-38                         1                         0                       0
-39                         1                         0                       0
-40                         1                         0                       0
-41                         1                         0                       0
-42                         1                         0                       0
-43                         1                         0                       0
-44                         1                         0                       0
-45                         0                         0                       1
-46                         0                         1                       0
-47                         1                         0                       0
-48                         1                         0                       0
-49                         0                         1                       0
-50                         0                         1                       0
-51                         1                         0                       0
-52                         1                         0                       0
-53                         1                         0                       0
-54                         0                         1                       0
-55                         1                         0                       0
-56                         0                         1                       0
-57                         1                         0                       0
-58                         1                         0                       0
-59                         1                         0                       0
-60                         1                         0                       0
-61                         0                         0                       1
-62                         0                         1                       0
-63                         1                         0                       0
-64                         1                         0                       0
-65                         1                         0                       0
-66                         1                         0                       0
-67                         0                         1                       0
-68                         1                         0                       0
-69                         1                         0                       0
-70                         1                         0                       0
-71                         1                         0                       0
-72                         1                         0                       0
-73                         0                         0                       0
-74                         1                         0                       0
-75                         1                         0                       0
-76                         0                         0                       0
-77                         0                         0                       0
-78                         1                         0                       0
-79                         1                         0                       0
-80                         0                         0                       0
-81                         1                         0                       0
-82                         1                         0                       0
-83                         1                         0                       0
-84                         1                         0                       0
-85                         1                         0                       0
-86                         1                         0                       0
-87                         1                         0                       0
-88                         1                         0                       0
-89                         1                         0                       0
-90                         1                         0                       0
-91                         0                         0                       0
-92                         0                         1                       0
-93                         1                         0                       0
-94                         0                         1                       0
-95                         1                         0                       0
-96                         1                         0                       0
-97                         0                         0                       0
-98                         1                         0                       0
-99                         0                         0                       0
-100                        1                         0                       0
-101                        0                         1                       0
-102                        1                         0                       0
-103                        0                         1                       0
-104                        1                         0                       0
-105                        1                         0                       0
-106                        1                         0                       0
-107                        1                         0                       0
-108                        1                         0                       0
-109                        0                         0                       0
-    prtx_radiationRadiation prtx_chemoChemo prtx_endoEndocrine Therapy
-1                         1               1                          1
-2                         0               1                          1
-3                         0               1                          1
-4                         0               1                          0
-5                         0               1                          1
-6                         1               1                          1
-7                         1               1                          0
-8                         1               1                          1
-9                         1               1                          0
-10                        0               1                          0
-11                        1               1                          0
-12                        1               1                          1
-13                        1               1                          1
-14                        1               0                          1
-15                        0               1                          0
-16                        0               1                          0
-17                        1               1                          1
-18                        1               1                          0
-19                        0               1                          0
-20                        1               1                          0
-21                        1               1                          0
-22                        1               1                          1
-23                        1               1                          1
-24                        1               1                          0
-25                        0               1                          0
-26                        1               1                          0
-27                        0               1                          0
-28                        0               1                          0
-29                        1               1                          1
-30                        0               1                          0
-31                        1               1                          0
-32                        1               1                          1
-33                        0               1                          0
-34                        1               1                          1
-35                        1               0                          1
-36                        0               1                          1
-37                        0               1                          0
-38                        1               1                          1
-39                        0               1                          0
-40                        0               1                          1
-41                        1               1                          0
-42                        1               1                          0
-43                        1               1                          1
-44                        1               1                          1
-45                        1               1                          1
-46                        1               1                          1
-47                        0               1                          1
-48                        1               1                          1
-49                        1               1                          1
-50                        0               1                          1
-51                        1               1                          1
-52                        1               1                          0
-53                        1               1                          0
-54                        0               1                          1
-55                        1               1                          0
-56                        1               1                          1
-57                        0               1                          0
-58                        1               1                          0
-59                        1               1                          0
-60                        0               1                          0
-61                        0               1                          0
-62                        1               1                          1
-63                        1               1                          1
-64                        1               1                          0
-65                        1               1                          0
-66                        1               1                          0
-67                        1               1                          1
-68                        1               1                          1
-69                        0               1                          1
-70                        1               1                          1
-71                        0               1                          0
-72                        1               1                          1
-73                        1               1                          1
-74                        1               1                          1
-75                        1               1                          0
-76                        0               1                          1
-77                        1               1                          1
-78                        1               1                          0
-79                        1               1                          1
-80                        1               1                          1
-81                        1               1                          0
-82                        1               1                          0
-83                        1               1                          0
-84                        1               0                          1
-85                        1               1                          1
-86                        1               1                          1
-87                        1               1                          1
-88                        1               1                          0
-89                        1               1                          0
-90                        1               1                          1
-91                        1               1                          1
-92                        1               1                          1
-93                        1               1                          1
-94                        1               1                          1
-95                        1               1                          1
-96                        0               1                          1
-97                        1               1                          1
-98                        1               1                          0
-99                        0               1                          1
-100                       0               1                          0
-101                       0               1                          1
-102                       1               1                          0
-103                       1               1                          1
-104                       1               1                          1
-105                       0               1                          1
-106                       0               1                          1
-107                       1               1                          0
-108                       0               1                          0
-109                       0               1                          1
-    prtx_bonemodBone Modifying Treatment node_statusNode Positive
-1                                      0                        1
-2                                      0                        0
-3                                      0                        1
-4                                      0                        0
-5                                      0                        0
-6                                      1                        1
-7                                      0                        1
-8                                      0                        1
-9                                      0                        0
-10                                     0                        0
-11                                     0                        0
-12                                     0                        1
-13                                     0                        0
-14                                     0                        1
-15                                     0                        0
-16                                     0                        1
-17                                     0                        0
-18                                     0                        1
-19                                     0                        0
-20                                     0                        0
-21                                     0                        0
-22                                     1                        1
-23                                     1                        1
-24                                     0                        1
-25                                     0                        1
-26                                     0                        0
-27                                     0                        0
-28                                     0                        0
-29                                     0                        1
-30                                     0                        0
-31                                     0                        0
-32                                     0                        1
-33                                     0                        0
-34                                     1                        0
-35                                     0                        1
-36                                     1                        0
-37                                     0                        0
-38                                     1                        1
-39                                     0                        0
-40                                     1                        0
-41                                     0                        1
-42                                     0                        1
-43                                     0                        0
-44                                     1                        0
-45                                     1                        0
-46                                     1                        1
-47                                     0                        1
-48                                     1                        1
-49                                     0                        1
-50                                     1                        0
-51                                     1                        1
-52                                     0                        1
-53                                     0                        0
-54                                     0                        1
-55                                     0                        0
-56                                     1                        1
-57                                     1                        0
-58                                     0                        1
-59                                     0                        1
-60                                     0                        1
-61                                     0                        0
-62                                     1                        1
-63                                     0                        0
-64                                     0                        1
-65                                     0                        1
-66                                     0                        0
-67                                     1                        1
-68                                     0                        1
-69                                     1                        1
-70                                     1                        1
-71                                     0                        0
-72                                     0                        1
-73                                     0                        1
-74                                     1                        1
-75                                     1                        1
-76                                     1                        1
-77                                     1                        1
-78                                     0                        0
-79                                     1                        0
-80                                     1                        1
-81                                     0                        0
-82                                     0                        1
-83                                     0                        1
-84                                     1                        1
-85                                     1                        0
-86                                     1                        0
-87                                     1                        1
-88                                     0                        0
-89                                     0                        1
-90                                     1                        1
-91                                     1                        1
-92                                     1                        1
-93                                     1                        1
-94                                     1                        1
-95                                     1                        1
-96                                     0                        0
-97                                     1                        1
-98                                     0                        1
-99                                     0                        1
-100                                    0                        0
-101                                    1                        1
-102                                    0                        1
-103                                    1                        1
-104                                    0                        0
-105                                    1                        0
-106                                    0                        1
-107                                    0                        1
-108                                    0                        0
-109                                    0                        0
-    axillary_dissectionAxillary Dissection diag_surgery_type_1Mastectomy
-1                                        1                             0
-2                                        1                             1
-3                                        1                             1
-4                                        0                             1
-5                                        0                             1
-6                                        1                             1
-7                                        1                             0
-8                                        1                             1
-9                                        0                             0
-10                                       0                             0
-11                                       0                             0
-12                                       0                             0
-13                                       0                             0
-14                                       1                             0
-15                                       0                             1
-16                                       1                             1
-17                                       0                             0
-18                                       0                             0
-19                                       0                             1
-20                                       0                             0
-21                                       0                             0
-22                                       1                             1
-23                                       1                             0
-24                                       1                             1
-25                                       1                             1
-26                                       0                             0
-27                                       1                             0
-28                                       0                             1
-29                                       1                             0
-30                                       0                             1
-31                                       0                             0
-32                                       0                             0
-33                                       0                             1
-34                                       1                             1
-35                                       1                             1
-36                                       0                             1
-37                                       0                             1
-38                                       0                             0
-39                                       0                             0
-40                                       0                             1
-41                                       1                             1
-42                                       1                             1
-43                                       1                             1
-44                                       0                             0
-45                                       0                             0
-46                                       1                             1
-47                                       1                             1
-48                                       1                             0
-49                                       1                             1
-50                                       1                             1
-51                                       0                             0
-52                                       0                             0
-53                                       0                             1
-54                                       1                             1
-55                                       0                             0
-56                                       1                             1
-57                                       0                             1
-58                                       0                             0
-59                                       0                             0
-60                                       1                             1
-61                                       1                             1
-62                                       1                             1
-63                                       0                             0
-64                                       1                             1
-65                                       1                             1
-66                                       1                             1
-67                                       1                             0
-68                                       1                             0
-69                                       0                             1
-70                                       1                             1
-71                                       0                             1
-72                                       1                             1
-73                                       0                             0
-74                                       1                             1
-75                                       1                             1
-76                                       1                             1
-77                                       1                             0
-78                                       0                             0
-79                                       0                             0
-80                                       0                             0
-81                                       0                             0
-82                                       1                             1
-83                                       1                             1
-84                                       0                             1
-85                                       0                             0
-86                                       0                             0
-87                                       1                             0
-88                                       0                             0
-89                                       1                             1
-90                                       1                             1
-91                                       1                             0
-92                                       1                             1
-93                                       1                             1
-94                                       1                             1
-95                                       1                             1
-96                                       0                             1
-97                                       1                             1
-98                                       0                             0
-99                                       1                             1
-100                                      0                             0
-101                                      0                             1
-102                                      1                             1
-103                                      1                             1
-104                                      0                             0
-105                                      0                             1
-106                                      1                             1
-107                                      0                             1
-108                                      0                             1
-109                                      0                             1
-    diag_neoadj_chemo_1Neoadjuvant Chemo
-1                                      0
-2                                      0
-3                                      0
-4                                      0
-5                                      1
-6                                      0
-7                                      0
-8                                      0
-9                                      0
-10                                     0
-11                                     0
-12                                     0
-13                                     0
-14                                     0
-15                                     0
-16                                     0
-17                                     0
-18                                     0
-19                                     0
-20                                     0
-21                                     0
-22                                     0
-23                                     0
-24                                     0
-25                                     0
-26                                     0
-27                                     0
-28                                     1
-29                                     0
-30                                     0
-31                                     0
-32                                     0
-33                                     0
-34                                     1
-35                                     0
-36                                     1
-37                                     0
-38                                     0
-39                                     0
-40                                     0
-41                                     0
-42                                     0
-43                                     1
-44                                     0
-45                                     0
-46                                     0
-47                                     0
-48                                     0
-49                                     1
-50                                     0
-51                                     0
-52                                     0
-53                                     0
-54                                     0
-55                                     0
-56                                     0
-57                                     0
-58                                     0
-59                                     0
-60                                     0
-61                                     0
-62                                     0
-63                                     0
-64                                     1
-65                                     1
-66                                     1
-67                                     0
-68                                     0
-69                                     0
-70                                     1
-71                                     0
-72                                     1
-73                                     0
-74                                     0
-75                                     1
-76                                     0
-77                                     0
-78                                     0
-79                                     0
-80                                     0
-81                                     0
-82                                     1
-83                                     1
-84                                     0
-85                                     0
-86                                     0
-87                                     0
-88                                     0
-89                                     0
-90                                     1
-91                                     0
-92                                     0
-93                                     1
-94                                     0
-95                                     0
-96                                     0
-97                                     0
-98                                     0
-99                                     0
-100                                    1
-101                                    0
-102                                    1
-103                                    0
-104                                    0
-105                                    0
-106                                    0
-107                                    1
-108                                    0
-109                                    0
-attr(,"assign")
- [1]  1  2  2  2  2  3  3  4  4  5  5  6  6  6  7  7  7  8  8  8  9 10 11 12 13
-[26] 14 15 16
-attr(,"contrasts")
-attr(,"contrasts")$final_receptor_group
-[1] "contr.treatment"
-
-attr(,"contrasts")$demo_race_final
-[1] "contr.treatment"
-
-attr(,"contrasts")$final_tumor_grade
-[1] "contr.treatment"
-
-attr(,"contrasts")$final_overall_stage
-[1] "contr.treatment"
-
-attr(,"contrasts")$final_t_stage
-[1] "contr.treatment"
-
-attr(,"contrasts")$final_n_stage
-[1] "contr.treatment"
-
-attr(,"contrasts")$histology_category
-[1] "contr.treatment"
-
-attr(,"contrasts")$prtx_radiation
-[1] "contr.treatment"
-
-attr(,"contrasts")$prtx_chemo
-[1] "contr.treatment"
-
-attr(,"contrasts")$prtx_endo
-[1] "contr.treatment"
-
-attr(,"contrasts")$prtx_bonemod
-[1] "contr.treatment"
-
-attr(,"contrasts")$node_status
-[1] "contr.treatment"
-
-attr(,"contrasts")$axillary_dissection
-[1] "contr.treatment"
-
-attr(,"contrasts")$diag_surgery_type_1
-[1] "contr.treatment"
-
-attr(,"contrasts")$diag_neoadj_chemo_1
-[1] "contr.treatment"
-
-
dim(X2)  # Rows should match nrow(dtc_unique_subset_data)
-
-
[1] 109  28
-
-
length(y1)  # Should also match nrow(dtc_unique_subset_data). We have the same # (109)!
-
-
[1] 109
-
-
lasso_model <- glmnet(X2, y1, family = "binomial", alpha = 1)  # alpha = 1 for lasso. 0 for ridge. 
-
-#Cross-validation. This approach performs k-fold cross validation to find the best lambda. The smaller the lambda, the more variables are included. 
-cv_lasso_model <- cv.glmnet(X2, y1, family = "binomial", alpha = 1)
-
-#plotting the results to look at the performance of different lambda values
-plot(cv_lasso_model)
-
-
-
-

-
-
-
-
#getting the best lambda  -- best lambda is 0.024, lower than the ctDNA model's 0.048
-best_lambda <- cv_lasso_model$lambda.min
-print(paste("Best lambda:", best_lambda)) 
-
-
[1] "Best lambda: 0.0243089462466253"
-
-
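cv.glmnet() assigns observations to folds at random, so lambda.min can shift from run to run, and two design matrices cross-validated on different splits are not compared like-for-like. A minimal sketch, assuming simulated stand-in data (the names X_sim, y_sim, cv_full, and cv_reduced are hypothetical, not the SURMOUNT matrices), of passing one fixed foldid to every candidate model and of the more conservative lambda.1se choice:

```r
library(glmnet)

set.seed(123)
# Hypothetical stand-in data: 109 patients, 10 binary predictors
n <- 109
X_sim <- matrix(rbinom(n * 10, 1, 0.4), nrow = n,
                dimnames = list(NULL, paste0("x", 1:10)))
y_sim <- rbinom(n, 1, plogis(-1 + 1.5 * X_sim[, 1] - 1.0 * X_sim[, 2]))

# One fixed fold assignment (10 folds), reused for every candidate design matrix
foldid <- sample(rep(1:10, length.out = n))

cv_full    <- cv.glmnet(X_sim, y_sim, family = "binomial", alpha = 1, foldid = foldid)
cv_reduced <- cv.glmnet(X_sim[, -3], y_sim, family = "binomial", alpha = 1, foldid = foldid)

# lambda.min minimizes cross-validated deviance; lambda.1se is the largest
# lambda whose deviance is within one standard error of that minimum
c(full = cv_full$lambda.min, reduced = cv_reduced$lambda.min)
c(full = cv_full$lambda.1se, reduced = cv_reduced$lambda.1se)
```

In the analysis itself, the same foldid vector would be passed to each cv.glmnet() call on X2, X4, and the ctDNA matrices so that differences in lambda reflect the predictors rather than the split.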
#Finding the final fit model with the optimal lambda 
-final_lasso_model <- glmnet(X2, y1, family = "binomial", alpha = 1, lambda = best_lambda)
-
-#Which coefficients are retained in the model? The DTC model retains more coefficients than the ctDNA model.  
-coef(final_lasso_model) 
-
-
29 x 1 sparse Matrix of class "dgCMatrix"
-                                                s0
-(Intercept)                            -1.84167985
-age_at_diag                             .         
-final_receptor_groupTNBC                .         
-final_receptor_groupHR+ HER2-           .         
-final_receptor_groupHR+ HER2+           0.45710104
-final_receptor_groupHR- HER2+          -1.69722848
-demo_race_finalAsian                   -0.74138771
-demo_race_finalWhite                    .         
-final_tumor_gradeGrade 1               -0.38615694
-final_tumor_gradeGrade 2                .         
-final_overall_stageStage II             .         
-final_overall_stageStage III            .         
-final_t_stageT2                         .         
-final_t_stageT3                         .         
-final_t_stageT4                         1.81638875
-final_n_stageN1                         .         
-final_n_stageN2                        -0.02487123
-final_n_stageN3                         0.39987630
-histology_categoryDuctal                1.10854122
-histology_categoryLobular               .         
-histology_categoryOther                -0.62014177
-prtx_radiationRadiation                 0.61433722
-prtx_chemoChemo                         .         
-prtx_endoEndocrine Therapy              .         
-prtx_bonemodBone Modifying Treatment    0.12821102
-node_statusNode Positive               -0.80642910
-axillary_dissectionAxillary Dissection  .         
-diag_surgery_type_1Mastectomy           0.60444202
-diag_neoadj_chemo_1Neoadjuvant Chemo    0.23561799
-
-
-

For the LASSO model of DTC positivity, many more coefficients are retained than in the ctDNA model, and the optimal lambda is lower (that is, less shrinkage minimizes the cross-validated deviance). The most notable factors in the DTC LASSO are higher T stage (with T4 carrying the largest increase in the log-odds of DTC positivity), HR- HER2+ receptor status, which has a strong negative association with DTC positivity (though only a handful of patients in this cohort had this receptor subtype), and ductal histology. Other influential factors are node negativity, radiation history, bone modifying treatment, mastectomy, and neoadjuvant therapy. Because node_status is likely collinear with N stage (final_n_stage), we will refit the DTC model without node_status to see how the model changes.
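If node_status is derived directly from N stage (node positive meaning N1 to N3), which is an assumption about how the variable was coded, then the two encode the same information exactly. A minimal base-R sketch on hypothetical toy data (the data frame d is illustrative, not SURMOUNT data) showing that the node-positive dummy is then the sum of the N1 to N3 dummies, so the design matrix loses rank:

```r
# Hypothetical toy data: node_status derived directly from N stage
d <- data.frame(
  final_n_stage = factor(c("N0", "N1", "N2", "N3", "N0", "N1", "N0", "N2"))
)
d$node_status <- factor(ifelse(d$final_n_stage == "N0", "Node Negative", "Node Positive"))

X <- model.matrix(~ final_n_stage + node_status, data = d)
colnames(X)

# The node-positive indicator equals the sum of the N1, N2, and N3 indicators
all(X[, "node_statusNode Positive"] ==
      rowSums(X[, c("final_n_stageN1", "final_n_stageN2", "final_n_stageN3")]))

# Fewer linearly independent columns than columns means exact collinearity
c(columns = ncol(X), rank = qr(X)$rank)
```

If the real variables were coded from different sources (for example, clinical versus pathologic staging), the overlap may be partial rather than exact; table(node_status, final_n_stage) on the real data would show how closely they agree.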

-
-
#### DTC LASSO without node_status in it
-
-set.seed(123) 
-
-#run the LASSO for DTC status. This may work better than the ctDNA model because there are more DTC-positive results
-y1 <- dtc_unique_subset_data$dtc_ever
-
-
-### removed node_status as a variable (keeping the more granular final_n_stage)
-X4 <- model.matrix(dtc_ever ~ -1 + age_at_diag + final_receptor_group + demo_race_final +
-                     final_tumor_grade + final_overall_stage +
-                     final_t_stage + final_n_stage +
-                     histology_category + prtx_radiation +
-                     prtx_chemo + prtx_endo + prtx_bonemod +
-                     axillary_dissection +
-                     diag_surgery_type_1 + diag_neoadj_chemo_1,
-                   data = dtc_unique_subset_data)
-
- 
-
-dim(X4)  # Rows should match nrow(dtc_unique_subset_data)
-
-
[1] 109  25
-
-
length(y1)  # Should also match nrow(dtc_unique_subset_data). We have the same # (109)!
-
-
[1] 109
-
-
lasso_model <- glmnet(X4, y1, family = "binomial", alpha = 1)  # alpha = 1 for lasso. 0 for ridge. 
-
-#Cross-validation. This approach performs k-fold cross validation to find the best lambda. The smaller the lambda, the more variables are included. 
-cv_lasso_model <- cv.glmnet(X4, y1, family = "binomial", alpha = 1)
-
-#plotting the results to look at the performance of different lambda values
-plot(cv_lasso_model)
-
-
-
-

-
-
-
-
#getting the best lambda  -- best lambda is 0.027, close to the 0.024 above
-best_lambda <- cv_lasso_model$lambda.min
-print(paste("Best lambda:", best_lambda)) 
-
-
[1] "Best lambda: 0.0266790384961084"
-
-
#Finding the final fit model with the optimal lambda 
-final_lasso_model <- glmnet(X4, y1, family = "binomial", alpha = 1, lambda = best_lambda)
-
-#Which coefficients are retained in the model? T4 stage (+1.61), ductal histology (+1.08), and HR- HER2+ status (-1.64) have the largest coefficients. Mastectomy (vs lumpectomy) and neoadjuvant chemotherapy both increase the log-odds of DTC positivity, while axillary dissection is retained only barely (-0.09). 
-coef(final_lasso_model) 
-
-
28 x 1 sparse Matrix of class "dgCMatrix"
-                                                s0
-(Intercept)                            -1.74827344
-age_at_diag                             .         
-final_receptor_groupTNBC                .         
-final_receptor_groupHR+ HER2-           .         
-final_receptor_groupHR+ HER2+           0.38484232
-final_receptor_groupHR- HER2+          -1.63882020
-demo_race_finalAsian                   -0.57458506
-demo_race_finalWhite                    .         
-final_tumor_gradeGrade 1               -0.35364514
-final_tumor_gradeGrade 2                .         
-final_overall_stageStage II             .         
-final_overall_stageStage III            .         
-final_t_stageT2                         .         
-final_t_stageT3                         .         
-final_t_stageT4                         1.60923057
-final_n_stageN1                        -0.60299273
-final_n_stageN2                        -0.53436611
-final_n_stageN3                         .         
-histology_categoryDuctal                1.07913426
-histology_categoryLobular               .         
-histology_categoryOther                -0.35868770
-prtx_radiationRadiation                 0.48044674
-prtx_chemoChemo                         .         
-prtx_endoEndocrine Therapy              .         
-prtx_bonemodBone Modifying Treatment    0.05228216
-axillary_dissectionAxillary Dissection -0.09409813
-diag_surgery_type_1Mastectomy           0.51967688
-diag_neoadj_chemo_1Neoadjuvant Chemo    0.28396596
-
-
-

In this last LASSO, in which we removed node_status to assess the more granular final_n_stage (N1 vs N2 vs N3, etc.), the N-stage terms took on more weight: N1 and N2 now have negative coefficients (-0.60 and -0.53). T4 stage, ductal histology, and receptor status maintained their strong relationships with DTC positivity, and several other variables maintained weaker relationships: radiation, bone modifying treatment, mastectomy, and neoadjuvant therapy were associated with higher log-odds of DTC positivity, while grade 1 tumors and Asian race were associated with lower log-odds. Axillary dissection was negatively associated with DTC positivity, but only barely. We will carry forward the models without node_status for both ctDNA and DTC: their optimal lambdas are about the same as those of the models that include node_status, and they make more intuitive sense than including two variables that represent the same nodal information in different ways.
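LASSO coefficients are on the log-odds scale; exponentiating them gives odds ratios relative to each variable's omitted reference level, which can be easier to read (for example, exp(1.61) is about 5 for T4). A minimal sketch using a few coefficients copied from the final DTC LASSO output above. Because LASSO shrinks coefficients toward zero and glmnet reports no standard errors, these odds ratios are attenuated descriptive summaries, not inferential estimates.

```r
# Selected log-odds coefficients copied from the final DTC LASSO output above
lasso_coefs <- c(
  "final_t_stageT4"                        =  1.60923057,
  "histology_categoryDuctal"               =  1.07913426,
  "final_receptor_groupHR- HER2+"          = -1.63882020,
  "axillary_dissectionAxillary Dissection" = -0.09409813
)

# exp(log-odds coefficient) = odds ratio versus the reference level
odds_ratios <- exp(lasso_coefs)
round(odds_ratios, 2)
```

For the whole fitted model, the same idea is exp(as.matrix(coef(final_lasso_model))); coefficients shrunk to exactly zero become odds ratios of exactly 1.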

-
-
-
-

5 Conclusion

-

In this cohort of 109 individuals on the SURMOUNT study, DTC positivity occurred more frequently (in around 30% of individuals) than ctDNA positivity, which occurred in fewer than 10% of patients either at baseline or during surveillance. Despite these low numbers, there was good concordance between ctDNA and DTC positivity, particularly when accounting for timepoint (concordance of 0.8).
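The text does not state how the 0.8 concordance was computed; assuming it is raw percent agreement between paired DTC and ctDNA calls, it is worth knowing that with rare positivity most agreement comes from double negatives. Cohen's kappa corrects for agreement expected by chance. A minimal base-R sketch on hypothetical paired calls (not SURMOUNT data):

```r
# Hypothetical paired calls (1 = positive, 0 = negative), one pair per timepoint
dtc   <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)
ctdna <- c(1, 0, 0, 0, 0, 0, 0, 1, 0, 1)

tab <- table(DTC = factor(dtc, levels = 0:1), ctDNA = factor(ctdna, levels = 0:1))

p_obs <- sum(diag(tab)) / sum(tab)                       # raw percent agreement
p_exp <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2   # agreement expected by chance
cohen_kappa <- (p_obs - p_exp) / (1 - p_exp)             # chance-corrected agreement

c(agreement = p_obs, kappa = cohen_kappa)
```

In this toy example raw agreement is 0.8 while kappa is about 0.52, which illustrates why the two measures can tell different stories when positives are uncommon.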

-

In assessing predictors of ctDNA positivity, we found that higher T stage and N stage were the strongest predictors of ctDNA positivity, with age at diagnosis, HR+ HER2+ receptor status, lobular histology, and lower grade also retained as predictors. The optimal lambda for this model was 0.048.

-

In assessing predictors of DTC positivity using LASSO, we identified several factors, with ductal histology, higher T stage (larger tumor size), and HER2-negative receptor status most strongly associated with DTC positivity. Other retained factors reflected more intensive treatment (mastectomy, radiation, and neoadjuvant therapy). Interestingly, nodal positivity appeared to be negatively associated with DTC positivity. The optimal lambda for this model was 0.027.

-

It is worth noting that the ctDNA model in particular is difficult to interpret given the small number of ctDNA-positive individuals (n=9).

-

Overall, ctDNA status was significantly associated with relapse (p<0.01), with a PPV of 89% and an NPV of 94% (and a specificity for relapse of 0.99). DTC positivity was NOT significantly associated with relapse, and the sensitivity and specificity of this test for relapse are difficult to interpret because all DTC-positive patients in this cohort went on to interventional trials aimed at eliminating dormant cancer cells. The negative predictive value of DTC assessment was high (0.86), suggesting that this test may be useful in identifying individuals at lower risk of relapse.
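Sensitivity and specificity condition on relapse status, whereas PPV and NPV condition on the test result and therefore also depend on how common relapse is in the cohort; a high NPV partly reflects a low relapse rate. A minimal base-R sketch of the four test characteristics from a 2x2 table, using hypothetical counts (not SURMOUNT data) and a helper named test_characteristics introduced here for illustration:

```r
# Test characteristics from a 2x2 table of test result vs. relapse
test_characteristics <- function(tp, fp, fn, tn) {
  c(
    sensitivity = tp / (tp + fn),  # relapsers who tested positive
    specificity = tn / (tn + fp),  # non-relapsers who tested negative
    ppv         = tp / (tp + fp),  # test-positives who went on to relapse
    npv         = tn / (tn + fn)   # test-negatives who remained relapse-free
  )
}

# Hypothetical counts for illustration only
round(test_characteristics(tp = 20, fp = 5, fn = 10, tn = 65), 3)
```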

-

Future directions include assessing the test characteristics of DTC assessment, and the incremental value of repeated testing, in the full cohort of patients on SURMOUNT to date (n=220); obtaining ctDNA assessment for this full cohort; performing survival analyses to assess the lead time that DTC and ctDNA assessment provide before clinical events (relapse, death); and examining the fluctuation of ctDNA positivity among patients on clinical trials who had frequent testing during and after therapy.

-

This analysis has several limitations. First, some data were missing, although missingness was low for the variables of interest in this analysis. Second, our LASSO models include collinear variables, that is, variables that represent different ways of capturing tumor or disease aggressiveness (such as T stage and N stage, which directly feed into overall stage). Standard LASSO tends to keep one of a set of correlated variables somewhat arbitrarily and does not account for this structure, so we will try group lasso as our next step. Third, we had limited power to build a predictive model for ctDNA in particular, given the rarity of ctDNA positivity in our cohort (although this rate matches the positivity rate in other cohort studies).
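Group lasso needs a vector stating which design-matrix columns belong to the same variable, and model.matrix() already records this in its "assign" attribute (printed above for X2 as 1 2 2 2 2 3 3 ...). A minimal base-R sketch on hypothetical toy data (the data frame d and its columns are illustrative, not SURMOUNT variables) of extracting those groups before dropping the intercept column:

```r
# Hypothetical toy data with one numeric predictor and two factors
d <- data.frame(
  age   = c(45, 52, 61, 38, 70, 55),
  stage = factor(c("T1", "T2", "T3", "T1", "T4", "T2")),
  hist  = factor(c("Ductal", "Lobular", "Ductal", "Other", "Lobular", "Ductal"))
)
X <- model.matrix(~ age + stage + hist, data = d)

# attr(X, "assign") maps each column to its model term (0 = intercept).
# Save it first: subsetting with [ drops this attribute.
grp <- attr(X, "assign")
X   <- X[, -1]   # drop the intercept column
grp <- grp[-1]

split(colnames(X), grp)   # columns grouped by term
```

The resulting X and grp could then be passed to, for example, grpreg::cv.grpreg(X, y, group = grp, penalty = "grLasso", family = "binomial"), assuming the grpreg package is installed (not run here). Group lasso keeps or drops all levels of a factor together; it does not by itself resolve collinearity between distinct variables such as T stage, N stage, and overall stage, for which dropping the derived variable or an elastic-net penalty (glmnet with 0 < alpha < 1) are common alternatives.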

-
-
-
-
-
-
-
-
-
-
-
diff --git a/FinalProjectTaranto.html b/FinalProjectTaranto.html
new file mode 100644
index 000000000..d3f5a12fa
--- /dev/null
+++ b/FinalProjectTaranto.html
@@ -0,0 +1 @@
+
From 83fd2004cd21b81c62d1493b291d7aa4af55573f Mon Sep 17 00:00:00 2001
From: ntaranto
Date: Mon, 9 Dec 2024 14:47:36 -0500
Subject: [PATCH 10/14] Update and rename README.md to READMETaranto.md
---
 README.md        | 19 -------------------
 READMETaranto.md | 15 +++++++++++++++
 2 files changed, 15 insertions(+), 19 deletions(-)
 delete mode 100644 README.md
 create mode 100644 READMETaranto.md
diff --git a/README.md b/README.md
deleted file mode 100644
index 1d9eda449..000000000
--- a/README.md
+++ /dev/null
@@ -1,19 +0,0 @@
-# BMIN503/EPID600 Final Project
-
-This repository contains templates for the final written report and GitHub repository. Follow the instructions below to clone this repository, and then turn in your final project's code via a pull request to this repository.
-
-1. To start, **fork** this BMIN503_Final_Project repository.
-1. **Clone** the forked repository to your computer.
-1. Modify the files provided, add your own, and **commit** changes to complete your final project.
-1. **Push**/sync the changes up to your GitHub account.
-1. Create a **pull request** on this, the original BMIN503_Final_Project, repository to turn in your final project.
-
-
-Follow the instructions [here][forking] if you are unsure what the above steps mean.
-
-DUE DATE FOR FINAL VERSION: 12/13/24 11:59PM. This is a hard deadline. Turn in whatever you have by this date.
-
-
-
-[forking]: https://guides.github.com/activities/forking/
-
diff --git a/READMETaranto.md b/READMETaranto.md
new file mode 100644
index 000000000..35c893eeb
--- /dev/null
+++ b/READMETaranto.md
@@ -0,0 +1,15 @@
+# BMIN503/EPID600 Final Project
+
+# BMIN503/EPID600 Final Project
+
+Link to my final project: https://github.com/ntaranto/BMIN503_Final_Project
+
+After treatment for early stage breast cancer, recurrence remains a problem in up to 30% of individuals, in spite of these patients completing definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy) for their early stage breast cancer. It remains unclear who will experience disease recurrence and who will not, and whether we can identify biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA (ctDNA) as risk factors for recurrence. Historically, in retrospective, observational studies, these markers of minimal residual disease (MRD) have demonstrated strong associations with recurrence and disease relapse. But it remains unclear how these associations persist with modern treatment paradigms, what the time course of positivity is, and what clinical risk factors put patients at greatest risk for having DTC or ctDNA positivity and subsequent relapse--as well as which most strongly predict biomarker positivity.
+
+Optimizing detection, intervention and surveillance for MRD after breast cancer treatment utilizing highly sensitive, real-time assays to identify patients with dormant tumor cells and those undergoing tumor reactivation in-transit to recurrence would fundamentally change the paradigm of breast cancer follow-up from one of “watchful waiting” to a proactive approach that could enable ongoing monitoring for subclinical disease and suppression or eradication of cells that could potentially metastasize. Ultimately, an approach using MRD biomarkers could provide reassurance to patients with definitively negative MRD testing that they are unlikely to experience a relapse, could enable detection and treatment strategies for those in whom MRD is detectable and, ultimately, prevent recurrence and subsequent death from metastatic breast cancer while preserving quality of life for millions of breast cancer survivors around the world.
+
+In the SURMOUNT study, we have followed patients who have undergone definitive treatment for their early stage breast cancer for recurrence, and also obtained bone marrow and peripheral blood assessment, looking for disseminated tumor cells (DTCs) in the bone marrow by immunohistochemistry (DTC-IHC) and circulating tumor DNA (ctDNA) in the blood. In this study, we are following patients over multiple years after their breast cancer for these markers and clinically following them for relapse and survival events. The goal of this study is to assess the relationship between ongoing surveillance for MRD (utilizing standard DTC-IHC and ctDNA by a tumor-informed assay) and recurrence, assess the trajectory of MRD over time and risk factors for DTC and ctDNA positivity, optimize the type and number of tests needed to predict recurrence, and further evaluate the long-term impact of therapeutic interventions aimed at eliminating DTCs.
+
+In this specific analysis, we will look at the clinical predictors of ctDNA and DTC positivity to try to understand the strongest drivers of MRD in the population of patients with high-risk early breast cancer who have finished definitive treatment. We will do so by looking at the clinical characteristics, DTC and ctDNA assessment, and clinical follow-up for the cohort of patients enrolled onto the SURMOUNT (Surveillance Markers of Utility for Recurrence after (Neo)adjuvant Therapy for Breast Cancer) study between 2016 and 2021 who had ctDNA assessment performed.
+
+For this analysis, I have spoken with Dr.
Nicholas Seewald, my biostatistical mentor, and Dr. Angela DeMichele, my epidemiology mentor and a breast oncologist with expertise in the biomarkers of recurrence (and the PI on the SURMOUNT study). Dr. Seewald has been critical to thinking about the approach to multivariable analysis to assess the clinical predictors of ctDNA and DTC positivity as well as survival analyses to understand the associations between biomarker positivity and relapse. Dr. Demichele has been instrumental in the development of the SURMOUNT study design, designing the analysis of this initial surveillance cohort, and thinking about clinical predictors of positivity and the biomarkers of breast cancer recurrence and dormance more broadly. From 4723177ffc52d3e47abfc63b21fc30566dda17b0 Mon Sep 17 00:00:00 2001 From: ntaranto Date: Mon, 9 Dec 2024 14:51:36 -0500 Subject: [PATCH 11/14] Update FinalProjectTaranto.html --- FinalProjectTaranto.html | 16937 +++++++++++++++++++++++++++++++++++++ 1 file changed, 16937 insertions(+) diff --git a/FinalProjectTaranto.html b/FinalProjectTaranto.html index d3f5a12fa..fc509eacb 100644 --- a/FinalProjectTaranto.html +++ b/FinalProjectTaranto.html @@ -1 +1,16938 @@ + + + + + + + + + + +Predictors of ctDNA positivity + + + + + + + + + + + + + + + + + + + + +

Predictors of ctDNA positivity

+

BMIN503/EPID600 Final Project

Author

Eleanor Taranto

1 Overview

+

Link to my final project: https://github.com/ntaranto/BMIN503_Final_Project

+

After treatment for early-stage breast cancer, recurrence remains a problem in up to 30% of individuals, even though these patients have completed definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy). It remains unclear who will experience disease recurrence and who will not, and whether biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA (ctDNA) can serve as risk factors for recurrence. Historically, in retrospective, observational studies, these markers of minimal residual disease (MRD) have been strongly associated with recurrence. It remains unclear, however, whether these associations persist under modern treatment paradigms, what the time course of positivity is, which clinical risk factors put patients at greatest risk of DTC or ctDNA positivity and subsequent relapse, and which factors most strongly predict biomarker positivity.

+

Optimizing detection, surveillance, and intervention for MRD after breast cancer treatment, using highly sensitive, real-time assays to identify patients with dormant tumor cells and those undergoing tumor reactivation in transit to recurrence, would fundamentally change the paradigm of breast cancer follow-up from “watchful waiting” to a proactive approach that could enable ongoing monitoring for subclinical disease and suppression or eradication of cells that could metastasize. Ultimately, an approach using MRD biomarkers could reassure patients with definitively negative MRD testing that they are unlikely to relapse, enable detection and treatment strategies for those with detectable MRD, and thereby prevent recurrence and death from metastatic breast cancer while preserving quality of life for millions of breast cancer survivors around the world.

+

In the SURMOUNT study, we have followed patients for recurrence after definitive treatment for early-stage breast cancer and have also collected bone marrow and peripheral blood samples to look for disseminated tumor cells (DTCs) in the bone marrow by immunohistochemistry (DTC-IHC) and circulating tumor DNA (ctDNA) in the blood. Patients are followed for multiple years after their breast cancer treatment, with serial testing for these markers and clinical follow-up for relapse and survival events. The goals of this study are to assess the relationship between ongoing MRD surveillance (using standard DTC-IHC and a tumor-informed ctDNA assay) and recurrence; to assess the trajectory of MRD over time and the risk factors for DTC and ctDNA positivity; to optimize the type and number of tests needed to predict recurrence; and to further evaluate the long-term impact of therapeutic interventions aimed at eliminating DTCs.

+

In this analysis, we will examine the clinical predictors of ctDNA and DTC positivity to identify the strongest drivers of MRD among patients with high-risk early breast cancer who have finished definitive treatment. To do so, we will use the clinical characteristics, DTC and ctDNA assessments, and clinical follow-up of the cohort of patients enrolled in the SURMOUNT (Surveillance Markers of Utility for Recurrence after (Neo)adjuvant Therapy for Breast Cancer) study between 2016 and 2021 who underwent ctDNA assessment.
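As a rough illustration of the kind of multivariable model this plan implies, the sketch below fits a logistic regression for ever-positivity of a biomarker and reports odds ratios with Wald 95% confidence intervals. It is a sketch under stated assumptions, not the study's analysis: the data are simulated, and the predictor names (`receptor_group`, `node_positive`, `stage3`) and the outcome name (`ever_pos`) are hypothetical stand-ins rather than columns of the SURMOUNT dataset. With 109 patients, only a small, prespecified set of predictors would likely be estimable with stable results.

```r
# Hedged sketch: multivariable logistic regression for biomarker positivity.
# All data are SIMULATED; variable names are hypothetical stand-ins, not
# columns of the SURMOUNT dataset.
set.seed(2024)
n <- 200
sim <- data.frame(
  receptor_group = factor(sample(c("HR+/HER2-", "HER2+", "TNBC"), n, replace = TRUE),
                          levels = c("HR+/HER2-", "HER2+", "TNBC")),  # first level = reference
  node_positive  = rbinom(n, 1, 0.5),
  stage3         = rbinom(n, 1, 0.3)
)

# Simulated outcome: positivity made more likely by nodal involvement and stage III
lin_pred <- -2 + 0.8 * sim$node_positive + 0.6 * sim$stage3
sim$ever_pos <- rbinom(n, 1, plogis(lin_pred))

fit <- glm(ever_pos ~ receptor_group + node_positive + stage3,
           family = binomial(), data = sim)

# Odds ratios with Wald 95% confidence intervals (confint.default is Wald-based)
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
round(or_table, 2)
```

In a real analysis, `ever_pos` would be the patient-level ctDNA (or DTC) positivity indicator and the predictors would be clinical factors chosen from the dataset's variables.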

+

For this analysis, I have spoken with Dr. Nicholas Seewald, my biostatistical mentor, and Dr. Angela DeMichele, my epidemiology mentor, who is a breast oncologist with expertise in biomarkers of recurrence and the PI of the SURMOUNT study. Dr. Seewald has been critical in shaping the approach to multivariable analysis of the clinical predictors of ctDNA and DTC positivity, as well as the survival analyses of the associations between biomarker positivity and relapse. Dr. DeMichele has been instrumental in developing the SURMOUNT study design, designing the analysis of this initial surveillance cohort, and thinking through the clinical predictors of positivity and the biomarkers of breast cancer recurrence and dormancy more broadly.

+
+
+

2 Introduction

+

Breast cancer is the most prevalent cancer because it is both common and treatable, with 5-year survival rates approaching 90%. Of the 14.5 million cancer survivors alive today in the U.S., nearly 1 in 4 are breast cancer survivors. Unfortunately, up to 30% of patients who receive adjuvant therapy for breast cancer will ultimately experience recurrence and die from their disease, typically as a consequence of metastatic recurrence. Because metastatic recurrence is incurable, the propensity of a cancer to recur after treatment is arguably the most important determinant of clinical outcome. Recurrent breast cancers arise from the pool of breast cancer cells that remain after initial treatment, likely in the form of minimal residual disease (MRD): local and disseminated residual tumor cells that survive in their host in a presumed dormant state after treatment of the primary breast cancer. Incurable metastatic disease is thought to arise from this persistent pool of residual disease, which results from escape of cells from the primary tumor, intravasation and survival in the circulation, and eventual extravasation and metastatic seeding. Many breast cancers pass through a latent phase in which disseminated tumor cells (DTCs) persist in niches where they may remain dormant for months to decades. These DTCs are thought to exist in a temporary quiescent state of reversible cell-cycle arrest, from which some cells may eventually reactivate, resume proliferation, and recirculate, at which point tumor DNA shed into the blood can be detected as circulating tumor DNA (ctDNA). Longitudinal studies demonstrate that detection of DTCs in the bone marrow of such patients is associated with poorer disease-free, breast cancer recurrence-free, and overall survival compared with patients without DTCs. Several mechanisms implicated in this process by preclinical studies are therapeutically targetable, and the research group in the 2-PREVENT Breast Cancer Translational Center of Excellence (TCE) has developed several interventional trials aimed at targeting these DTCs.

+

However, it remains unclear exactly how the presence of DTCs and/or ctDNA predicts relapse in the era of modern breast cancer treatment, including chemotherapy, immunotherapy, surgery, targeted therapies, and radiation. Questions remain about who will develop DTC/ctDNA positivity, which patients with DTC positivity will have these cells reactivate, whether and when DTC positivity leads to ctDNA positivity, and which patients with these markers will relapse and develop metastatic disease. In the SURMOUNT surveillance study, patients with early-stage (i.e., curable) but high-risk breast cancer are enrolled and undergo a baseline bone marrow assessment (BMA) for DTCs by immunohistochemistry (IHC), as well as peripheral blood collection for retrospective ctDNA assessment. Patients who screen DTC positive, either at baseline or on yearly surveillance BMA, are referred for interventional trials aimed at eliminating dormant cells before clinical relapse. Patients who screen DTC negative remain in the SURMOUNT surveillance cohort and undergo yearly DTC assessment and peripheral blood collection for ctDNA assessment. All patients are followed for recurrence and survival. The first intervention trial, CLEVER, completed enrollment in 2021, so this initial analysis focuses on the patients enrolled in SURMOUNT during accrual to this first trial.

+

Despite years of progress in breast cancer diagnostics and therapeutics, identifying the individuals at risk of recurrence, and determining how to manage and minimize their elevated risk, remains a challenge. In this study, we sought to assess the clinical validity of DTC and ctDNA assessment and to better understand the population in which these tests may be useful. Specifically, in this analysis, we examined overall rates of ctDNA and DTC positivity in this cohort and the clinical factors associated with each.
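The report does not spell out how “overall rates of positivity” are computed from repeated visits. One plausible patient-level definition, used here purely as an assumption, counts a patient as positive if any of their tested timepoints was positive. The sketch applies that rule to a small invented long-format table (one row per visit). `ctDNA_detected` and `dtc_ihc_result_final` mirror column names in the dataset, although in the real file `ctDNA_detected` is stored as the text "TRUE"/"FALSE" and would first need converting with `as.logical()`. `ever_positive()` and `per_patient` are hypothetical helpers.

```r
# Sketch: per-patient "ever positive" status from long-format visit data,
# then the cohort-level positivity rate for each marker.
# Assumptions: one row per patient visit; ctDNA_detected is logical and
# dtc_ihc_result_final is 0/1; NA means "not tested at that visit".
visits <- data.frame(
  ID                   = c(1, 1, 1, 2, 2, 3),
  ctDNA_detected       = c(FALSE, FALSE, TRUE, FALSE, NA, FALSE),
  dtc_ihc_result_final = c(0, 0, NA, 1, 0, 0)
)

ever_positive <- function(x) {
  # NA if the patient was never tested; otherwise TRUE if any result was positive
  if (all(is.na(x))) NA else any(x == 1, na.rm = TRUE)
}

per_patient <- data.frame(
  ID    = sort(unique(visits$ID)),  # tapply orders groups by sorted ID
  ctdna = as.vector(tapply(visits$ctDNA_detected, visits$ID, ever_positive)),
  dtc   = as.vector(tapply(visits$dtc_ihc_result_final, visits$ID, ever_positive))
)

# Proportion of tested patients who were ever positive, per marker
colMeans(per_patient[, c("ctdna", "dtc")], na.rm = TRUE)
```

A baseline-only rate would be an equally reasonable definition; the choice should match how positivity is defined in the outcome models.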

+
+
+

3 Methods

+

“PENN SURMOUNT”: SURMOUNT is a single-center, prospective, longitudinal cohort study examining MRD biomarkers among patients within 5 years of breast cancer (BC) diagnosis who had completed all curative treatment except endocrine therapy. Eligible patients must have had: 1) triple-negative BC (TNBC); 2) HER2+ or HR+ BC with positive lymph nodes and/or residual disease after neoadjuvant therapy; or 3) HR+ BC with a 21-gene Recurrence Score >25 and/or a high-risk MammaPrint result. Patients underwent annual bone marrow aspirate (BMA) for DTCs by immunohistochemistry (using the methods of Naume et al.). DTC+ patients went on to a therapeutic trial; DTC- patients had up to 5 years of annual BMA and blood testing. ctDNA was retrospectively assessed using the RaDaR assay (NeoGenomics), which targets patient-specific somatic mutations identified by whole-exome sequencing (WES) of primary tumor tissue.
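The three eligibility pathways can be written as a single logical rule, which makes it easy to check a cohort against the protocol. The sketch below is illustrative only. The function `is_eligible()` and all of its argument names are hypothetical and do not correspond to columns in the SURMOUNT dataset. Treating missing values as “criterion not met” is also an assumption.

```r
# Hypothetical encoding of the SURMOUNT eligibility pathways described above.
# Argument names are illustrative; they are NOT columns in the study dataset.
# Missing (NA) inputs are treated as "criterion not met" (an assumption).
is_eligible <- function(tnbc, her2_pos, hr_pos, node_pos,
                        residual_after_nat, rs_21gene, mammaprint_high) {
  met <- function(x) !is.na(x) & x  # NA -> FALSE; TRUE/FALSE kept as-is

  # Pathway 1: triple-negative breast cancer
  path1 <- met(tnbc)
  # Pathway 2: HER2+ or HR+ with positive nodes and/or residual disease after NAT
  path2 <- (met(her2_pos) | met(hr_pos)) &
    (met(node_pos) | met(residual_after_nat))
  # Pathway 3: HR+ with 21-gene Recurrence Score > 25 and/or high-risk MammaPrint
  path3 <- met(hr_pos) & (met(rs_21gene > 25) | met(mammaprint_high))

  path1 | path2 | path3
}

# Four toy patients (invented for illustration, not study data)
elig <- is_eligible(
  tnbc               = c(TRUE,  FALSE, FALSE, FALSE),
  her2_pos           = c(FALSE, TRUE,  FALSE, FALSE),
  hr_pos             = c(FALSE, FALSE, TRUE,  TRUE),
  node_pos           = c(FALSE, TRUE,  FALSE, FALSE),
  residual_after_nat = c(FALSE, FALSE, FALSE, FALSE),
  rs_21gene          = c(NA,    NA,    30,    20),
  mammaprint_high    = c(NA,    NA,    NA,    FALSE)
)
print(elig)  # expected: TRUE, TRUE, TRUE, FALSE
```

The fourth toy patient is HR+ but node-negative, without residual disease, with a Recurrence Score of 20 and a low-risk MammaPrint result. That patient meets no pathway.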

+

Data Collection and Merge: NeoGenomics, Inc. performed ctDNA assessment on peripheral blood from 109 patients, after developing a bespoke panel from each patient's tumor tissue, and returned the results to the research team in .csv format; the last data drop occurred on July 30, 2024. DTC results from bone marrow assessment were entered into a REDCap database by the research team through this same follow-up date. Clinical, demographic, and follow-up data were abstracted by the research team through October 2024 and entered into the same REDCap database. Data were exported in mid-October 2024 by the TCE data manager and merged with the ctDNA data before hand-off for this analysis. The final locked and merged dataset, “surmount184_merged_20241108.csv”, is maintained in the TCE Box for the ctDNA analysis, and a copy is stored in FinalProject_files.
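The TCE data manager performed the merge itself. The following is a hedged, minimal illustration of joining vendor ctDNA results onto a REDCap export. The key columns `surmount_id` and `timepoint` do appear in the merged dataset, but using them as the merge keys is an assumption. The left-join direction is also an assumption, and the toy rows are invented.

```r
# Toy illustration of merging vendor ctDNA results onto a REDCap export.
# Assumption: (surmount_id, timepoint) uniquely identifies a blood draw.
redcap <- data.frame(
  surmount_id          = c("A-001", "A-001", "A-002"),
  timepoint            = c("SURMOUNT-Baseline", "Year 1 Follow Up", "SURMOUNT-Baseline"),
  dtc_ihc_result_final = c(0L, 0L, 1L)
)
ctdna <- data.frame(
  surmount_id    = c("A-001", "A-001"),
  timepoint      = c("SURMOUNT-Baseline", "Year 1 Follow Up"),
  ctDNA_detected = c("FALSE", "TRUE")
)

# all.x = TRUE keeps every REDCap row; visits without a ctDNA result get NA
merged <- merge(redcap, ctdna, by = c("surmount_id", "timepoint"), all.x = TRUE)

# Guard against accidental row duplication from non-unique keys
stopifnot(nrow(merged) == nrow(redcap))
merged
```

The row-count check is a cheap safeguard: if the keys were not unique in the ctDNA file, `merge()` would silently duplicate clinical rows.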

+

First, we will import the final dataset, “surmount184_merged_20241108.csv”.

+
+
library(here)
+
+
here() starts at /Users/NoraTaranto/BMIN503_Final_Project
+
+
library(dplyr) 
+
+

+Attaching package: 'dplyr'
+
+
+
The following objects are masked from 'package:stats':
+
+    filter, lag
+
+
+
The following objects are masked from 'package:base':
+
+    intersect, setdiff, setequal, union
+
+
d <- read.csv(file = here("data",
+                          "surmount184_merged_20241108.csv"))
+
+

Next, we will limit the data to the 109 patients who had ctDNA testing, out of the 184 individuals included in the initial CLEVER trial screening group. We will also look at the names and structure of the 387 variables in the dataset “d”. Most are clinical variables (often categorical dummy variables), but some are outcome variables for local relapse, distant relapse, and survival, and others come from the pathology report on DTCs. This list will help us identify the important factors to include in our multivariable model predicting positivity of these markers. We will also check the structure of each variable, since some may need to be reformatted for analysis.
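The following is a minimal sketch of the filtering step. It assumes that the `ctdna_cohort` column, which appears in the `str(d)` output, marks ctDNA-tested patients with 1. The data are in long format, with one row per patient per timepoint, and the flag can be NA on some of a tested patient's visits. For that reason the filter is applied at the patient level rather than row by row. The toy data frame stands in for `d`, and `limit_to_ctdna()` is a hypothetical helper.

```r
# Sketch: keep every row for patients flagged as ctDNA-tested.
# Assumption: ctdna_cohort == 1 marks a ctDNA-tested patient; NA/0 do not.
toy <- data.frame(
  ID           = c(1, 1, 1, 2, 3),
  timepoint    = c("SURMOUNT-Baseline", "Year 1 Follow Up", "Long Term FU 1",
                   "SURMOUNT-Baseline", "SURMOUNT-Baseline"),
  ctdna_cohort = c(1, 1, NA, 0, 1)
)

limit_to_ctdna <- function(df) {
  # %in% treats NA as "not 1" instead of propagating NA the way == would
  tested_ids <- unique(df$ID[df$ctdna_cohort %in% 1])
  df[df$ID %in% tested_ids, ]
}

toy_ct <- limit_to_ctdna(toy)
length(unique(toy_ct$ID))  # number of ctDNA-tested patients in the toy data
```

On the real data, `d_ct <- limit_to_ctdna(d)` followed by `length(unique(d_ct$ID))` gives a quick check against the 109 ctDNA-tested patients described above. If only ctDNA-tested visits are wanted, filter rows on `ctdna_cohort %in% 1` directly instead.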

+
+
# Inspect the variable names and the structure (type) of each variable
+names(d) 
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                    
+
+
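Several fields are stored as character strings, as the `str(d)` call below shows. `ctDNA_detected` holds the text "FALSE"/"TRUE". `collection_date` uses a DDMONYYYY layout such as "03AUG2016". `dtc_final_result_date` uses MM/DD/YYYY: "08/03/2016" is the same visit as "03AUG2016". The following minimal base-R sketch converts them, using invented rows. `parse_ddmonyyyy()` is a hypothetical helper that avoids locale-dependent month-name parsing by matching against R's built-in English `month.abb`.

```r
# Sketch: convert character columns to analysis-ready types (base R only).
# Column formats follow the str(d) output; the toy rows are invented.
toy <- data.frame(
  ctDNA_detected        = c("FALSE", "TRUE", ""),
  collection_date       = c("03AUG2016", "18SEP2017", ""),
  dtc_final_result_date = c("08/03/2016", "09/18/2017", "")
)

# Locale-independent parser for DDMONYYYY strings such as "03AUG2016".
# Blank or malformed strings become NA.
parse_ddmonyyyy <- function(x) {
  day   <- as.integer(substr(x, 1, 2))
  month <- match(toupper(substr(x, 3, 5)), toupper(month.abb))
  year  <- as.integer(substr(x, 6, 9))
  iso   <- sprintf("%04d-%02d-%02d", year, month, day)  # NA parts give unparseable text
  as.Date(iso, format = "%Y-%m-%d")                      # unparseable text -> NA
}

toy$ctDNA_detected        <- as.logical(toy$ctDNA_detected)  # "" -> NA
toy$collection_date       <- parse_ddmonyyyy(toy$collection_date)
toy$dtc_final_result_date <- as.Date(toy$dtc_final_result_date, format = "%m/%d/%Y")
str(toy)
```

Supplying an explicit `format` to `as.Date()` makes unparseable or blank strings return NA instead of raising an error.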
str(d)
+
+
'data.frame':   579 obs. of  387 variables:
+ $ ID                              : int  16001 16001 16001 16001 16001 16002 16003 16004 16005 16005 ...
+ $ trialID                         : int  16001 16001 16001 16001 16001 NA NA 16004 1813 1813 ...
+ $ participant_id                  : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
+ $ patient_id                      : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
+ $ fu_trial_pid                    : chr  "" "" "" "" ...
+ $ timepoint                       : chr  "SURMOUNT-Baseline" "Year 1 Follow Up" "Year 2 Follow Up" "Long Term FU 1" ...
+ $ project                         : chr  "Penn-SURMOUNT" "Penn-SURMOUNT" "Penn-SURMOUNT" "Penn-SURMOUNT" ...
+ $ surmount_id                     : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
+ $ panel_id                        : chr  "23A05190P01" "23A05190P01" "23A05190P01" "23A05190P01" ...
+ $ accession                       : chr  "23A07639" "23A07640" "23A07641" "23A07642" ...
+ $ sample_id                       : chr  "23A0763907_pl" "23A0764006_pl" "23A0764105_pl" "23A0764206_pl" ...
+ $ collection_date                 : chr  "03AUG2016" "18SEP2017" "06AUG2018" "11NOV2021" ...
+ $ extracted_plasma_volume_ml      : num  5.61 5.15 3.68 4.87 5.05 ...
+ $ input                           : int  4999 4999 2775 1550 2725 NA NA 4999 NA NA ...
+ $ input_sample                    : int  19996 19996 11100 6200 10900 NA NA 19996 NA NA ...
+ $ physical_run_name               : chr  "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" ...
+ $ workflow_name                   : chr  "RaDaR" "RaDaR" "RaDaR" "RaDaR" ...
+ $ eVAF                            : num  2.07e-10 2.86e-10 5.64e-12 1.08e-09 2.23e-13 ...
+ $ mutant_molecules                : int  0 0 0 0 0 NA NA 0 NA NA ...
+ $ mean_VAF                        : num  1.48e-05 1.21e-05 7.21e-06 1.82e-05 1.10e-05 NA NA 2.85e-05 NA NA ...
+ $ Score                           : num  -1.013 -0.882 -0.74 -0.495 -0.724 ...
+ $ all_pass_variants               : int  16 16 16 16 16 NA NA 47 NA NA ...
+ $ total_variants                  : int  48 48 48 48 48 NA NA 48 NA NA ...
+ $ n_positive_variants             : int  0 0 0 0 0 NA NA 0 NA NA ...
+ $ ctDNA_detected                  : chr  "FALSE" "FALSE" "FALSE" "FALSE" ...
+ $ ctdna_cohort                    : int  1 1 1 NA NA 0 0 1 0 0 ...
+ $ dtc_ihc_date_final              : chr  "03AUG16:00:00:00" "18SEP17:00:00:00" "06AUG18:10:30:00" "" ...
+ $ dtc_ihc_cytospinnum_final       : int  10 5 5 NA NA 10 10 10 10 5 ...
+ $ dtc_ihc_result_final            : int  0 0 0 NA NA 0 0 0 0 0 ...
+ $ dtc_ihc_summary_count_final     : int  0 0 0 NA NA 0 0 0 0 0 ...
+ $ dtc_final_result_date           : chr  "08/03/2016" "09/18/2017" "08/06/2018" "" ...
+ $ pt                              : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ bma_date                        : chr  "" "" "" "" ...
+ $ ORIG_RSLT_DTC                   : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ ORIG_RSLT_DTC_COUNT             : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ FINAL_RESULT                    : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ FINAL_COUNT                     : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ org_consent_date                : chr  "06/09/2016" "06/09/2016" "06/09/2016" "06/09/2016" ...
+ $ demo_initials                   : chr  "LB" "LB" "LB" "LB" ...
+ $ demo_dob                        : chr  "09/21/1957" "09/21/1957" "09/21/1957" "09/21/1957" ...
+ $ demo_sex                        : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ demo_ethnicity                  : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ demo_race___1                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___2                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___3                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___4                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___5                   : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ demo_race___88                  : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___99                  : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race_other                 : logi  NA NA NA NA NA NA ...
+ $ prtx_radiation                  : int  1 1 1 1 1 1 1 0 1 1 ...
+ $ prtx_rad_start                  : chr  "04/29/2014" "04/29/2014" "04/29/2014" "04/29/2014" ...
+ $ prtx_rad_end                    : chr  "06/13/2014" "06/13/2014" "06/13/2014" "06/13/2014" ...
+ $ prtx_chemo                      : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ prtx_endo                       : int  1 1 1 1 1 0 0 1 0 0 ...
+ $ prtx_bonemod                    : int  0 0 0 0 0 0 0 0 1 1 ...
+ $ prior_therapy_complete          : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ inc_dx_crit                     : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ inc_dx_crit_list___1            : int  0 0 0 0 0 0 1 0 0 0 ...
+ $ inc_dx_crit_list___2            : int  1 1 1 1 1 0 1 0 1 1 ...
+ $ inc_dx_crit_list___3            : int  0 0 0 0 0 1 0 0 0 0 ...
+ $ inc_dx_crit_list___4            : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ final_receptor_group            : int  2 2 2 2 2 4 1 2 4 4 ...
+ $ demo_race_final                 : int  5 5 5 5 5 5 5 5 5 5 ...
+ $ final_histology                 : chr  "14,3" "14,3" "14,3" "14,3" ...
+ $ final_tumor_grade               : int  2 2 2 2 2 2 3 2 0 0 ...
+ $ final_overall_stage             : int  3 3 3 3 3 2 2 1 3 3 ...
+ $ final_t_stage                   : int  2 2 2 2 2 2 2 1 3 3 ...
+ $ final_n_stage                   : int  3 3 3 3 3 0 1 0 3 3 ...
+ $ fu_date_to                      : chr  "03/28/2024" "03/28/2024" "03/28/2024" "03/28/2024" ...
+ $ fu_surv                         : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ fu_date_death                   : chr  "" "" "" "" ...
+ $ fu_dec_bc_pres                  : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ fu_dec_bc_cause                 : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ fu_locreg_site_num              : chr  "" "" "" "" ...
+ $ fu_locreg_site_char             : chr  "" "" "" "" ...
+ $ fu_locreg_prog                  : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ fu_locreg_date                  : chr  "" "" "" "" ...
+ $ fu_dist_site_num                : chr  "" "" "" "" ...
+ $ fu_dist_site_char               : chr  "" "" "" "" ...
+ $ fu_dist_prog                    : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ fu_dist_date                    : chr  "" "" "" "" ...
+ $ censor_date                     : chr  "03/28/2024" "03/28/2024" "03/28/2024" "03/28/2024" ...
+ $ chemo_indication_1              : int  1 1 1 1 1 2 2 1 2 2 ...
+ $ chemo_name_1                    : int  2 2 2 2 2 5 2 1 7 7 ...
+ $ chemo_name_other_1              : chr  "" "" "" "" ...
+ $ chemo_start_date_1              : chr  "10/03/2013" "10/03/2013" "10/03/2013" "10/03/2013" ...
+ $ start_date_exact_1              : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ chemo_end_date_1                : chr  "01/16/2014" "01/16/2014" "01/16/2014" "01/16/2014" ...
+ $ end_date_exact_1                : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ chemo_notes_1                   : chr  "" "" "" "" ...
+ $ prior_chemotherapy_complete_1   : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ chemo_indication_2              : int  NA NA NA NA NA 2 NA NA 3 3 ...
+ $ chemo_name_2                    : int  NA NA NA NA NA 3 NA NA 25 25 ...
+ $ chemo_name_other_2              : chr  "" "" "" "" ...
+ $ chemo_start_date_2              : chr  "" "" "" "" ...
+ $ start_date_exact_2              : int  NA NA NA NA NA 1 NA NA 1 1 ...
+ $ chemo_end_date_2                : chr  "" "" "" "" ...
+ $ end_date_exact_2                : chr  "" "" "" "" ...
+  [list output truncated]
+
+
+

Summary variables: We have identified several important summary variables:

- Stage: `final_overall_stage`, `final_t_stage`, `final_n_stage`
- `final_receptor_group` (1 = TNBC, 2 = HR+ HER2-, 3 = HR+ HER2+, 4 = HR- HER2+)
- `final_tumor_grade`, `final_histology`
- `demo_race_final`
- `fu_locreg_site_num` / `fu_locreg_site_char`: numeric / character values for the locoregional site
- `fu_dist_site_num` / `fu_dist_site_char`: numeric / character values for the distant site
- `censor_date`: the most recent `fu_date_to` among patients who are alive without locoregional or distant progression
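
The receptor-group codes above can be attached as factor labels so that tables print readable names instead of 1-4. A minimal sketch on a small made-up vector (the values are hypothetical, not study data); on the real data the same `factor()` call would be applied to `d$final_receptor_group`:

```r
# Map the numeric final_receptor_group codes to the labels listed above
receptor_labels <- c("1" = "TNBC", "2" = "HR+ HER2-", "3" = "HR+ HER2+", "4" = "HR- HER2+")

# Hypothetical example codes (not study data); NA = missing
final_receptor_group <- c(2L, 2L, 4L, 1L, 4L, NA)

# Codes outside 1-4 (or NA) become NA in the factor
receptor_group_f <- factor(final_receptor_group,
                           levels = as.integer(names(receptor_labels)),
                           labels = unname(receptor_labels))

# Counts per labeled group, keeping missing values visible
table(receptor_group_f, useNA = "ifany")
```

Keeping the labels in one named vector means the coding scheme is written down once and reused wherever a table or plot needs it.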

+

Limiting from the overall cohort (184) to the ctDNA cohort: This dataset contains 184 individuals, the overall cohort screened on SURMOUNT for the CLEVER interventional study. Because there are multiple rows per individual (one per timepoint), the data frame has more rows (579) than individuals. From the separate ctDNA CSV and the NeoGenomics summary data, we also know that ctDNA was assessed in 109 individuals. We therefore limit the dataset `d` to this ctDNA cohort, which we call `subset_data`, using the indicator variable `ctdna_cohort`.
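
Because `d` is in long format, the restriction has to happen at the participant level: a row-level `filter(ctdna_cohort == 1)` would also drop a cohort member's rows where `ctdna_cohort` is `NA`. A minimal sketch of the difference on a hypothetical toy data frame (IDs and values are made up), assuming dplyr is installed; the `group_by()` + `any()` form is one equivalent alternative to the `%in%` approach used in the code that follows:

```r
library(dplyr)

# Hypothetical long-format data: one row per participant per timepoint
toy <- data.frame(
  participant_id = c("A", "A", "A", "B", "B", "C"),
  timepoint      = c("Baseline", "Year 1", "Long Term FU 1", "Baseline", "Year 1", "Baseline"),
  ctdna_cohort   = c(1L, 1L, NA, 0L, 0L, 1L)
)

# Row-level filter: rows where ctdna_cohort is NA are dropped,
# so participant A loses the long-term follow-up row
row_level <- toy |> filter(ctdna_cohort == 1)

# Participant-level filter: keep every row for anyone ever flagged as cohort == 1
participant_level <- toy |>
  group_by(participant_id) |>
  filter(any(ctdna_cohort == 1, na.rm = TRUE)) |>
  ungroup()

nrow(row_level)                               # rows kept by the row-level filter
nrow(participant_level)                       # rows kept by the participant-level filter
n_distinct(participant_level$participant_id)  # participants in the cohort
```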

+
+
# Look at the variable names and the structure of each variable
+names(d) 
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                    
+
+
str(d)
+
+
'data.frame':   579 obs. of  387 variables:
+ $ ID                              : int  16001 16001 16001 16001 16001 16002 16003 16004 16005 16005 ...
+ $ trialID                         : int  16001 16001 16001 16001 16001 NA NA 16004 1813 1813 ...
+ $ participant_id                  : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
+ $ patient_id                      : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
+ $ fu_trial_pid                    : chr  "" "" "" "" ...
+ $ timepoint                       : chr  "SURMOUNT-Baseline" "Year 1 Follow Up" "Year 2 Follow Up" "Long Term FU 1" ...
+ $ project                         : chr  "Penn-SURMOUNT" "Penn-SURMOUNT" "Penn-SURMOUNT" "Penn-SURMOUNT" ...
+ $ surmount_id                     : chr  "28115-16-001" "28115-16-001" "28115-16-001" "28115-16-001" ...
+ $ panel_id                        : chr  "23A05190P01" "23A05190P01" "23A05190P01" "23A05190P01" ...
+ $ accession                       : chr  "23A07639" "23A07640" "23A07641" "23A07642" ...
+ $ sample_id                       : chr  "23A0763907_pl" "23A0764006_pl" "23A0764105_pl" "23A0764206_pl" ...
+ $ collection_date                 : chr  "03AUG2016" "18SEP2017" "06AUG2018" "11NOV2021" ...
+ $ extracted_plasma_volume_ml      : num  5.61 5.15 3.68 4.87 5.05 ...
+ $ input                           : int  4999 4999 2775 1550 2725 NA NA 4999 NA NA ...
+ $ input_sample                    : int  19996 19996 11100 6200 10900 NA NA 19996 NA NA ...
+ $ physical_run_name               : chr  "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" "INXR14DEC2301_INXR18DEC2301" ...
+ $ workflow_name                   : chr  "RaDaR" "RaDaR" "RaDaR" "RaDaR" ...
+ $ eVAF                            : num  2.07e-10 2.86e-10 5.64e-12 1.08e-09 2.23e-13 ...
+ $ mutant_molecules                : int  0 0 0 0 0 NA NA 0 NA NA ...
+ $ mean_VAF                        : num  1.48e-05 1.21e-05 7.21e-06 1.82e-05 1.10e-05 NA NA 2.85e-05 NA NA ...
+ $ Score                           : num  -1.013 -0.882 -0.74 -0.495 -0.724 ...
+ $ all_pass_variants               : int  16 16 16 16 16 NA NA 47 NA NA ...
+ $ total_variants                  : int  48 48 48 48 48 NA NA 48 NA NA ...
+ $ n_positive_variants             : int  0 0 0 0 0 NA NA 0 NA NA ...
+ $ ctDNA_detected                  : chr  "FALSE" "FALSE" "FALSE" "FALSE" ...
+ $ ctdna_cohort                    : int  1 1 1 NA NA 0 0 1 0 0 ...
+ $ dtc_ihc_date_final              : chr  "03AUG16:00:00:00" "18SEP17:00:00:00" "06AUG18:10:30:00" "" ...
+ $ dtc_ihc_cytospinnum_final       : int  10 5 5 NA NA 10 10 10 10 5 ...
+ $ dtc_ihc_result_final            : int  0 0 0 NA NA 0 0 0 0 0 ...
+ $ dtc_ihc_summary_count_final     : int  0 0 0 NA NA 0 0 0 0 0 ...
+ $ dtc_final_result_date           : chr  "08/03/2016" "09/18/2017" "08/06/2018" "" ...
+ $ pt                              : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ bma_date                        : chr  "" "" "" "" ...
+ $ ORIG_RSLT_DTC                   : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ ORIG_RSLT_DTC_COUNT             : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ FINAL_RESULT                    : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ FINAL_COUNT                     : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ org_consent_date                : chr  "06/09/2016" "06/09/2016" "06/09/2016" "06/09/2016" ...
+ $ demo_initials                   : chr  "LB" "LB" "LB" "LB" ...
+ $ demo_dob                        : chr  "09/21/1957" "09/21/1957" "09/21/1957" "09/21/1957" ...
+ $ demo_sex                        : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ demo_ethnicity                  : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ demo_race___1                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___2                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___3                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___4                   : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___5                   : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ demo_race___88                  : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race___99                  : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ demo_race_other                 : logi  NA NA NA NA NA NA ...
+ $ prtx_radiation                  : int  1 1 1 1 1 1 1 0 1 1 ...
+ $ prtx_rad_start                  : chr  "04/29/2014" "04/29/2014" "04/29/2014" "04/29/2014" ...
+ $ prtx_rad_end                    : chr  "06/13/2014" "06/13/2014" "06/13/2014" "06/13/2014" ...
+ $ prtx_chemo                      : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ prtx_endo                       : int  1 1 1 1 1 0 0 1 0 0 ...
+ $ prtx_bonemod                    : int  0 0 0 0 0 0 0 0 1 1 ...
+ $ prior_therapy_complete          : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ inc_dx_crit                     : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ inc_dx_crit_list___1            : int  0 0 0 0 0 0 1 0 0 0 ...
+ $ inc_dx_crit_list___2            : int  1 1 1 1 1 0 1 0 1 1 ...
+ $ inc_dx_crit_list___3            : int  0 0 0 0 0 1 0 0 0 0 ...
+ $ inc_dx_crit_list___4            : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ final_receptor_group            : int  2 2 2 2 2 4 1 2 4 4 ...
+ $ demo_race_final                 : int  5 5 5 5 5 5 5 5 5 5 ...
+ $ final_histology                 : chr  "14,3" "14,3" "14,3" "14,3" ...
+ $ final_tumor_grade               : int  2 2 2 2 2 2 3 2 0 0 ...
+ $ final_overall_stage             : int  3 3 3 3 3 2 2 1 3 3 ...
+ $ final_t_stage                   : int  2 2 2 2 2 2 2 1 3 3 ...
+ $ final_n_stage                   : int  3 3 3 3 3 0 1 0 3 3 ...
+ $ fu_date_to                      : chr  "03/28/2024" "03/28/2024" "03/28/2024" "03/28/2024" ...
+ $ fu_surv                         : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ fu_date_death                   : chr  "" "" "" "" ...
+ $ fu_dec_bc_pres                  : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ fu_dec_bc_cause                 : int  NA NA NA NA NA NA NA NA NA NA ...
+ $ fu_locreg_site_num              : chr  "" "" "" "" ...
+ $ fu_locreg_site_char             : chr  "" "" "" "" ...
+ $ fu_locreg_prog                  : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ fu_locreg_date                  : chr  "" "" "" "" ...
+ $ fu_dist_site_num                : chr  "" "" "" "" ...
+ $ fu_dist_site_char               : chr  "" "" "" "" ...
+ $ fu_dist_prog                    : int  0 0 0 0 0 0 0 0 0 0 ...
+ $ fu_dist_date                    : chr  "" "" "" "" ...
+ $ censor_date                     : chr  "03/28/2024" "03/28/2024" "03/28/2024" "03/28/2024" ...
+ $ chemo_indication_1              : int  1 1 1 1 1 2 2 1 2 2 ...
+ $ chemo_name_1                    : int  2 2 2 2 2 5 2 1 7 7 ...
+ $ chemo_name_other_1              : chr  "" "" "" "" ...
+ $ chemo_start_date_1              : chr  "10/03/2013" "10/03/2013" "10/03/2013" "10/03/2013" ...
+ $ start_date_exact_1              : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ chemo_end_date_1                : chr  "01/16/2014" "01/16/2014" "01/16/2014" "01/16/2014" ...
+ $ end_date_exact_1                : int  1 1 1 1 1 1 1 1 1 1 ...
+ $ chemo_notes_1                   : chr  "" "" "" "" ...
+ $ prior_chemotherapy_complete_1   : int  2 2 2 2 2 2 2 2 2 2 ...
+ $ chemo_indication_2              : int  NA NA NA NA NA 2 NA NA 3 3 ...
+ $ chemo_name_2                    : int  NA NA NA NA NA 3 NA NA 25 25 ...
+ $ chemo_name_other_2              : chr  "" "" "" "" ...
+ $ chemo_start_date_2              : chr  "" "" "" "" ...
+ $ start_date_exact_2              : int  NA NA NA NA NA 1 NA NA 1 1 ...
+ $ chemo_end_date_2                : chr  "" "" "" "" ...
+ $ end_date_exact_2                : chr  "" "" "" "" ...
+  [list output truncated]
+
+
# Limit to the ctDNA cohort: keep every row (including rows where ctdna_cohort is NA) for any participant who ever had ctdna_cohort == 1. This subset is called subset_data.
+
+# Identify all participant_ids with at least one row where ctdna_cohort == 1
+valid_participants <- d |> 
+  filter(ctdna_cohort == 1) |> 
+  pull(participant_id) |> 
+  unique()
+
+# Subset the data to include all rows where participant_id is in the valid list
+subset_data <- d |> 
+  filter(participant_id %in% valid_participants)
+
+# Count the number of unique participant_ids in the subset_data
+unique_count <- subset_data |> 
+  summarise(unique_participants = n_distinct(participant_id))
+
+# View the result: 109 unique participants, matching the expected ctDNA cohort size
+unique_count
+
+
  unique_participants
+1                 109
+
+
+

Creating the ctDNA_ever positive indicator: Now that `subset_data` holds the ctDNA cohort (n = 109), we can start looking at demographics, first overall and then stratified by `ctDNA_detected`. Tabulating samples by `ctDNA_detected` ("FALSE" = negative, ctDNA not detected; "TRUE" = positive, ctDNA detected) shows 385 negative and 11 positive samples within the ctDNA cohort. Next, we create `ctDNA_ever`, a participant-level variable that indicates whether each participant (identified by the unique study ID `participant_id`) ever had ctDNA detected.
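
A minimal sketch of the `ctDNA_ever` step on hypothetical toy data (made-up IDs and results), assuming dplyr. Because `ctDNA_detected` is stored as the strings "TRUE"/"FALSE", it is converted to logical first. In this sketch a participant whose samples are all missing gets `NA` rather than `FALSE`; that is a design choice, not necessarily how the study defines the variable:

```r
library(dplyr)

# Hypothetical sample-level data; ctDNA_detected is character, as in subset_data
samples <- data.frame(
  participant_id = c("A", "A", "B", "B", "B", "C"),
  ctDNA_detected = c("FALSE", "TRUE", "FALSE", "FALSE", NA, NA)
)

# Sample-level counts of negative vs positive results
table(samples$ctDNA_detected, useNA = "ifany")

# Participant-level indicator: TRUE if any sample was positive,
# FALSE if all non-missing samples were negative, NA if no results at all
ever <- samples |>
  mutate(detected = as.logical(ctDNA_detected)) |>
  group_by(participant_id) |>
  summarise(
    ctDNA_ever = if (all(is.na(detected))) NA else any(detected, na.rm = TRUE),
    .groups = "drop"
  )

ever

# Attach the participant-level flag back to every sample row
samples_flagged <- samples |> left_join(ever, by = "participant_id")
```

Summarising to one row per participant before joining back keeps the participant-level table available for demographics, while the joined version supports sample-level breakdowns.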

+
+
# ctDNA_detected is stored as character ("TRUE"/"FALSE"), not logical
+
+names(subset_data)
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                    
+
+
### Confirm that failed ("Fail") samples are excluded from this cohort
+###### Sample-level ctDNA results, before creating the ctDNA-ever variable
+table(subset_data$ctDNA_detected) # 2 blank, 385 FALSE, 11 TRUE; no Fail
+
+

+      FALSE  TRUE 
+    2   385    11 
+
+
table(d$ctDNA_detected) # full dataset d for comparison: 175 blank, 8 Fail, 385 FALSE, 11 TRUE
+
+

+       Fail FALSE  TRUE 
+  175     8   385    11 
+
+
# Create the 'ctDNA_ever' variable:
+# TRUE if ctDNA_detected was "TRUE" for any sample from the participant, otherwise FALSE.
+# ctDNA_detected is stored as character, so compare against the string "TRUE".
+subset_data <- subset_data |>
+  group_by(participant_id) |>
+  mutate(ctDNA_ever = any(ctDNA_detected == "TRUE", na.rm = TRUE)) |>
+  ungroup()
+
+# Check the result: number of samples per participant by ctDNA_ever
+table(subset_data$participant_id, subset_data$ctDNA_ever)
+
+
              
+               FALSE TRUE
+  28115-16-001     5    0
+  28115-16-004     1    0
+  28115-16-010     1    0
+  28115-16-014     1    0
+  28115-16-015    12    0
+  28115-16-017     3    0
+  28115-16-020     0    1
+  28115-16-021     9    0
+  28115-16-023     1    0
+  28115-16-025     1    0
+  28115-16-026    10    0
+  28115-16-027     3    0
+  28115-16-029     2    0
+  28115-16-033     2    0
+  28115-16-035     1    0
+  28115-17-001     8    0
+  28115-17-002     9    0
+  28115-17-006     1    0
+  28115-17-008     9    0
+  28115-17-009     1    0
+  28115-17-010     5    0
+  28115-17-011     9    0
+  28115-17-012    10    0
+  28115-17-016     4    0
+  28115-17-017     5    0
+  28115-17-019     9    0
+  28115-17-021     1    0
+  28115-17-022     1    0
+  28115-17-023     0    2
+  28115-17-024     4    0
+  28115-17-025     0    2
+  28115-17-027     8    0
+  28115-17-030     3    0
+  28115-17-031     5    0
+  28115-17-032     0   10
+  28115-17-036     7    0
+  28115-17-039     2    0
+  28115-17-040     4    0
+  28115-17-045     1    0
+  28115-17-046    10    0
+  28115-17-047     3    0
+  28115-17-048     2    0
+  28115-17-050     0    3
+  28115-17-051     9    0
+  28115-17-052     3    0
+  28115-18-001     7    0
+  28115-18-002     2    0
+  28115-18-004     2    0
+  28115-18-006     1    0
+  28115-18-009     1    0
+  28115-18-011     5    0
+  28115-18-014     2    0
+  28115-18-015     5    0
+  28115-18-017     1    0
+  28115-18-020     8    0
+  28115-18-021     8    0
+  28115-18-022    12    0
+  28115-18-023     3    0
+  28115-18-024     2    0
+  28115-18-027     1    0
+  28115-18-028     1    0
+  28115-18-029     4    0
+  28115-18-030     2    0
+  28115-18-031     3    0
+  28115-18-032     6    0
+  28115-18-034     1    0
+  28115-19-001     0    1
+  28115-19-002     2    0
+  28115-19-003     5    0
+  28115-19-004     1    0
+  28115-19-005     3    0
+  28115-19-006     8    0
+  28115-19-007     5    0
+  28115-19-009     6    0
+  28115-19-011     1    0
+  28115-19-012     3    0
+  28115-19-014     2    0
+  28115-19-016     2    0
+  28115-19-017     2    0
+  28115-19-019     3    0
+  28115-19-020     2    0
+  28115-19-021     4    0
+  28115-19-022     2    0
+  28115-19-025     6    0
+  28115-19-028     2    0
+  28115-20-004     2    0
+  28115-20-007     2    0
+  28115-20-009     4    0
+  28115-20-010     1    0
+  28115-21-001     1    0
+  28115-21-002     4    0
+  28115-21-003     0    2
+  28115-21-006     2    0
+  28115-21-007     0    3
+  28115-21-009     3    0
+  28115-21-011     1    0
+  28115-21-013     4    0
+  28115-21-014     2    0
+  28115-21-015     2    0
+  28115-21-016     8    0
+  28115-21-019     1    0
+  28115-21-020     3    0
+  28115-21-021     3    0
+  28115-21-022     1    0
+  28115-21-024     0    2
+  28115-21-025     2    0
+  28115-21-026     2    0
+  28115-21-027     2    0
+  28115-21-028     1    0
+
+
subset_data |> 
+  group_by(participant_id) |> 
+  summarize(ctDNA_ever = first(ctDNA_ever)) |> 
+  count(ctDNA_ever)
+
+
# A tibble: 2 × 2
+  ctDNA_ever     n
+  <lgl>      <int>
+1 FALSE        100
+2 TRUE           9
+
+
+

The participant-level summary of ctDNA_ever shows 100 participants whose ctDNA results were always negative and 9 who were ever positive (100 + 9 = 109), which matches our original ctDNA source data.
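Two quick checks make these participant-level counts easier to trust: ctDNA_ever should take a single value within each participant (which is what makes summarize(ctDNA_ever = first(ctDNA_ever)) safe), and the always-negative and ever-positive counts should add up to the number of unique participants. A sketch on hypothetical toy data, assuming ctDNA_ever has already been added at the sample level with mutate():

```r
library(dplyr)

# Hypothetical sample-level data with ctDNA_ever already attached (not study data)
samples <- tibble(
  participant_id = c("A", "A", "B", "C", "C"),
  ctDNA_ever     = c(TRUE, TRUE, FALSE, FALSE, FALSE)
)

# Check 1: ctDNA_ever has exactly one distinct value per participant,
# so taking first() per participant loses no information.
# n_values is computed before ctDNA_ever is overwritten by first().
per_participant <- samples |>
  group_by(participant_id) |>
  summarise(n_values = n_distinct(ctDNA_ever),
            ctDNA_ever = first(ctDNA_ever))
stopifnot(all(per_participant$n_values == 1))

# Check 2: the counts per ctDNA_ever value add up to the number of participants.
counts <- count(per_participant, ctDNA_ever)
stopifnot(sum(counts$n) == n_distinct(samples$participant_id))
counts
```

On the study data, the second check would correspond to 100 + 9 = 109.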

+

Creating the ever-DTC-positive variable: Next, we create a variable indicating whether a participant ever had a positive DTC test. To do this, we use the final result variable dtc_ihc_result_final, which records for a given sample/date whether the DTC result was positive (1) or negative (0). By sample, this dataset contains 221 negative and 49 positive results (across 109 patients, 39 of whom were DTC positive), which aligns with our prior data and CONSORT diagrams.
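One design choice in dtc_ever is how missing DTC results are handled. With any(..., na.rm = TRUE), a participant whose results are all NA is coded 0 (negative) rather than missing. The toy-data sketch below, with hypothetical IDs and values, compares that coding with an alternative that keeps such participants as NA; which one is appropriate depends on whether every participant in the cohort is expected to have at least one DTC result.

```r
library(dplyr)

# Hypothetical sample-level DTC results: 1 = positive, 0 = negative, NA = no result
dtc <- tibble(
  participant_id       = c("A", "A", "B", "C", "C"),
  dtc_ihc_result_final = c(0, 1, 0, NA, NA)
)

dtc_participant <- dtc |>
  group_by(participant_id) |>
  summarise(
    # NAs are dropped, so a participant with only NA results is coded 0
    dtc_ever_na_rm = if_else(any(dtc_ihc_result_final == 1, na.rm = TRUE), 1, 0),
    # Alternative: NA when the participant has no non-missing DTC result
    dtc_ever_keep_na = if_else(
      all(is.na(dtc_ihc_result_final)),
      NA_real_,
      if_else(any(dtc_ihc_result_final == 1, na.rm = TRUE), 1, 0)
    )
  )

dtc_participant
```

Participant C has no DTC result: the first coding counts C as DTC negative, while the second leaves C missing.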

+
+
names(subset_data) #looking at the names of variables to find the DTC indicator variable 
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                     "ctDNA_ever"                      
+
+
library(stringr)
+
+#final result variable is dtc_ihc_result_final (one value per sample, not per participant)
+#final DTC count is dtc_ihc_summary_count_final
+#final result date is dtc_final_result_date
+
+table(subset_data$dtc_ihc_result_final) #221 negatives, 49 positives 
+
+

+  0   1 
+221  49 
+
+
#making the dtc_ever variable 
+subset_data <- subset_data |> 
+  group_by(participant_id) |> 
+  mutate(dtc_ever = if_else(any(dtc_ihc_result_final == 1, na.rm = TRUE), 1, 0)) |> 
+  ungroup()
+
+table(subset_data$participant_id, subset_data$dtc_ever) 
+
+
              
+                0  1
+  28115-16-001  5  0
+  28115-16-004  1  0
+  28115-16-010  1  0
+  28115-16-014  1  0
+  28115-16-015  0 12
+  28115-16-017  3  0
+  28115-16-020  1  0
+  28115-16-021  0  9
+  28115-16-023  1  0
+  28115-16-025  1  0
+  28115-16-026  0 10
+  28115-16-027  3  0
+  28115-16-029  2  0
+  28115-16-033  2  0
+  28115-16-035  1  0
+  28115-17-001  0  8
+  28115-17-002  0  9
+  28115-17-006  1  0
+  28115-17-008  0  9
+  28115-17-009  1  0
+  28115-17-010  0  5
+  28115-17-011  0  9
+  28115-17-012  0 10
+  28115-17-016  0  4
+  28115-17-017  0  5
+  28115-17-019  0  9
+  28115-17-021  1  0
+  28115-17-022  1  0
+  28115-17-023  2  0
+  28115-17-024  0  4
+  28115-17-025  0  2
+  28115-17-027  0  8
+  28115-17-030  3  0
+  28115-17-031  0  5
+  28115-17-032  0 10
+  28115-17-036  0  7
+  28115-17-039  2  0
+  28115-17-040  4  0
+  28115-17-045  1  0
+  28115-17-046  0 10
+  28115-17-047  3  0
+  28115-17-048  2  0
+  28115-17-050  0  3
+  28115-17-051  0  9
+  28115-17-052  3  0
+  28115-18-001  0  7
+  28115-18-002  2  0
+  28115-18-004  2  0
+  28115-18-006  1  0
+  28115-18-009  1  0
+  28115-18-011  5  0
+  28115-18-014  2  0
+  28115-18-015  0  5
+  28115-18-017  1  0
+  28115-18-020  0  8
+  28115-18-021  0  8
+  28115-18-022  0 12
+  28115-18-023  0  3
+  28115-18-024  2  0
+  28115-18-027  1  0
+  28115-18-028  1  0
+  28115-18-029  4  0
+  28115-18-030  2  0
+  28115-18-031  0  3
+  28115-18-032  0  6
+  28115-18-034  1  0
+  28115-19-001  1  0
+  28115-19-002  2  0
+  28115-19-003  5  0
+  28115-19-004  1  0
+  28115-19-005  3  0
+  28115-19-006  0  8
+  28115-19-007  5  0
+  28115-19-009  0  6
+  28115-19-011  1  0
+  28115-19-012  3  0
+  28115-19-014  2  0
+  28115-19-016  0  2
+  28115-19-017  0  2
+  28115-19-019  3  0
+  28115-19-020  2  0
+  28115-19-021  4  0
+  28115-19-022  0  2
+  28115-19-025  0  6
+  28115-19-028  0  2
+  28115-20-004  2  0
+  28115-20-007  2  0
+  28115-20-009  4  0
+  28115-20-010  1  0
+  28115-21-001  1  0
+  28115-21-002  4  0
+  28115-21-003  2  0
+  28115-21-006  2  0
+  28115-21-007  3  0
+  28115-21-009  3  0
+  28115-21-011  1  0
+  28115-21-013  4  0
+  28115-21-014  2  0
+  28115-21-015  2  0
+  28115-21-016  0  8
+  28115-21-019  1  0
+  28115-21-020  3  0
+  28115-21-021  3  0
+  28115-21-022  1  0
+  28115-21-024  0  2
+  28115-21-025  2  0
+  28115-21-026  2  0
+  28115-21-027  0  2
+  28115-21-028  1  0
+
+
subset_data |> 
+  group_by(participant_id) |> 
+  summarize(dtc_ever = first(dtc_ever)) |> 
+  count(dtc_ever)
+
+
# A tibble: 2 × 2
+  dtc_ever     n
+     <dbl> <int>
+1        0    70
+2        1    39
+
+
+

Counting by unique participant, 70 participants were never DTC-positive and 39 were DTC-positive at least once. This matches the source data on DTC positivity for this ctDNA cohort.
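
The `dtc_ever` step collapses sample-level DTC results into one flag per participant. The base-R sketch below applies the same rule to a small hypothetical data frame (`toy`, with made-up participant IDs and results). A participant is flagged if any non-missing result equals 1, and a participant whose results are all missing counts as never positive. This is the rule that the `if_else(any(...))` call above implements on `subset_data`.

```r
# Hypothetical sample-level results: one row per participant per timepoint.
# NA marks a timepoint without a DTC result (for example a ctDNA-only visit).
toy <- data.frame(
  participant_id = c("A", "A", "B", "B", "C", "D"),
  dtc_ihc_result_final = c(0, 1, 0, NA, NA, 0)
)

# TRUE when any non-missing result for the participant equals 1;
# any(logical(0)) is FALSE, so all-NA participants count as never positive.
dtc_ever <- tapply(toy$dtc_ihc_result_final == 1, toy$participant_id,
                   any, na.rm = TRUE)

cat("Never DTC-positive:", sum(!dtc_ever), "\n")
cat("Ever DTC-positive:", sum(dtc_ever), "\n")
```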

+
+
+
+

4 Results

+

Sample and Testing Information: This cohort includes 109 individuals who had ctDNA and DTC testing on SURMOUNT (at baseline or during follow-up). Of these, 100 were ctDNA-negative and 70 were DTC-negative at every tested timepoint. The remaining 9 were ctDNA-positive and 39 were DTC-positive at least once. Of 184 pts enrolled from 2016 to 2021, 121 had tissue available, and 114/121 (94%) had successful WES (prior data/NeoGenomics data).

+
+
#counts for ctDNA positivity 
+subset_data |>
+  filter(ctDNA_ever == "TRUE") |>
+  summarize(unique_participants = n_distinct(participant_id))
+
+
# A tibble: 1 × 1
+  unique_participants
+                <int>
+1                   9
+
+
table(subset_data$ctDNA_detected) #385 FALSE, 11 TRUE, 2 blank ("")
+
+

+      FALSE  TRUE 
+    2   385    11 
+
+
table(d$ctDNA_detected) #385 FALSE, 11 TRUE, 8 Fail, 175 blank ("")
+
+

+       Fail FALSE  TRUE 
+  175     8   385    11 
+
+
# Count unique participants with FAIL in ctDNA_detected (this is in database d, the original database, not in the ctDNA cohort, as these patients were excluded from the cohort)
+num_fail <- d |> 
+  filter(ctDNA_detected == "Fail") |>   # Filter rows where ctDNA_detected is FAIL
+  distinct(participant_id) |>          # Select unique participant_id
+  nrow()                                # Count the number of rows
+
+num_fail #4 individuals with Fails in original d dataset 
+
+
[1] 4
+
+
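#all sampled timepoints for each ever-ctDNA-positive participant (not only the positive ones). 2 participants were positive at baseline, 7 only afterwards.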
+subset_data |>
+  filter(ctDNA_ever == "TRUE") |>
+  group_by(participant_id) |>
+  summarize(positive_timepoints = list(timepoint))
+
+
# A tibble: 9 × 2
+  participant_id positive_timepoints
+  <chr>          <list>             
+1 28115-16-020   <chr [1]>          
+2 28115-17-023   <chr [2]>          
+3 28115-17-025   <chr [2]>          
+4 28115-17-032   <chr [10]>         
+5 28115-17-050   <chr [3]>          
+6 28115-19-001   <chr [1]>          
+7 28115-21-003   <chr [2]>          
+8 28115-21-007   <chr [3]>          
+9 28115-21-024   <chr [2]>          
+
+
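
The listing above returns every sampled timepoint for each ever-positive participant, not only the positive ones. A minimal base-R sketch on hypothetical data (`toy`) keeps only the positive samples. It then labels each ever-positive participant by whether the SURMOUNT-Baseline sample was positive, which is the split described as "2 at baseline, 7 after". The same approach should work for DTC with `dtc_ihc_result_final == 1` in place of the ctDNA filter.

```r
# Hypothetical sample-level ctDNA calls; ctDNA_detected is stored as text,
# as in the cohort data ("TRUE", "FALSE", or "" when missing).
toy <- data.frame(
  participant_id = c("A", "A", "B", "B", "B", "C", "C"),
  timepoint = c("SURMOUNT-Baseline", "Year 1 Follow Up",
                "SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up",
                "SURMOUNT-Baseline", "Year 1 Follow Up"),
  ctDNA_detected = c("TRUE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "")
)

# Keep only the samples that were actually positive.
positive <- toy[toy$ctDNA_detected == "TRUE", c("participant_id", "timepoint")]

# Positive timepoints per participant (never-positive participants drop out).
positive_timepoints <- split(positive$timepoint, positive$participant_id)

# "baseline" if the SURMOUNT-Baseline sample was positive, else "surveillance".
first_positive <- vapply(
  positive_timepoints,
  function(tp) if ("SURMOUNT-Baseline" %in% tp) "baseline" else "surveillance",
  character(1)
)
print(table(first_positive))
```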
subset_data |>
+  filter(ctDNA_detected == "TRUE", timepoint == "SURMOUNT-Baseline") |>
+  summarize(count_SURMOUNT_Baseline = n())
+
+
# A tibble: 1 × 1
+  count_SURMOUNT_Baseline
+                    <int>
+1                       2
+
+
#eVAF summary over all samples from ever-ctDNA-positive participants (includes their negative samples)
+
+subset_data |>
+  filter(ctDNA_ever == "TRUE") |>
+  summarize(
+    mean_eVAF = mean(eVAF, na.rm = TRUE),
+    median_eVAF = median(eVAF, na.rm = TRUE),
+    sd_eVAF = sd(eVAF, na.rm = TRUE),
+    min_eVAF = min(eVAF, na.rm = TRUE),
+    max_eVAF = max(eVAF, na.rm = TRUE)
+  )
+
+
# A tibble: 1 × 5
+  mean_eVAF median_eVAF  sd_eVAF min_eVAF max_eVAF
+      <dbl>       <dbl>    <dbl>    <dbl>    <dbl>
+1 0.0000893 0.000000413 0.000219 2.14e-18 0.000836
+
+
#### DTC counts 
+
+#counts for DTC positivity --> 39 
+subset_data |>
+  filter(dtc_ever == 1) |>
+  summarize(unique_participants = n_distinct(participant_id))
+
+
# A tibble: 1 × 1
+  unique_participants
+                <int>
+1                  39
+
+
#all sampled timepoints for each ever-DTC-positive participant (not only the positive ones)
+subset_data |>
+  filter(dtc_ever == 1) |>
+  select(participant_id, timepoint)
+
+
# A tibble: 249 × 2
+   participant_id timepoint        
+   <chr>          <chr>            
+ 1 28115-16-015   SURMOUNT-Baseline
+ 2 28115-16-015   Year 1 Follow Up 
+ 3 28115-16-015   Year 2 Follow Up 
+ 4 28115-16-015   Year 3 Follow Up 
+ 5 28115-16-015   CLEVER-Baseline  
+ 6 28115-16-015   C6               
+ 7 28115-16-015   6M F/U           
+ 8 28115-16-015   12M F/U          
+ 9 28115-16-015   18M F/U          
+10 28115-16-015   24M F/U          
+# ℹ 239 more rows
+
+
# number of DTC-positive samples at SURMOUNT baseline
+
+subset_data |>
+  filter(dtc_ihc_result_final == 1, timepoint == "SURMOUNT-Baseline") |>
+  summarize(count_SURMOUNT_Baseline = n())
+
+
# A tibble: 1 × 1
+  count_SURMOUNT_Baseline
+                    <int>
+1                      26
+
+
### Timepoint Data (# timepoints per patient)
+
+# Timepoints per patient (median, range), overall
+timepoints_per_patient <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    total_timepoints = n_distinct(timepoint),  # Count distinct timepoints for each patient
+    .groups = "drop"
+  ) |>
+  summarise(
+    median_timepoints = median(total_timepoints, na.rm = TRUE),  # Calculate median
+    min_timepoints = min(total_timepoints, na.rm = TRUE),        # Calculate minimum
+    max_timepoints = max(total_timepoints, na.rm = TRUE)         # Calculate maximum
+  )
+timepoints_per_patient
+
+
# A tibble: 1 × 3
+  median_timepoints min_timepoints max_timepoints
+              <int>          <int>          <int>
+1                 2              1             12
+
+
#  Timepoints of ctDNA assessment (`ctDNA_detected`)
+ctDNA_timepoints <- subset_data |>
+  filter(ctDNA_detected %in% c("TRUE", "FALSE")) |>  # keep valid calls only; missing results are stored as "", not NA
+  group_by(participant_id) |>
+  summarise(
+    ctDNA_timepoints = n_distinct(timepoint),  # Count distinct timepoints of ctDNA assessment
+    .groups = "drop"
+  ) |>
+  summarise(
+    median_ctDNA_timepoints = median(ctDNA_timepoints, na.rm = TRUE),  # Calculate median
+    min_ctDNA_timepoints = min(ctDNA_timepoints, na.rm = TRUE),        # Calculate minimum
+    max_ctDNA_timepoints = max(ctDNA_timepoints, na.rm = TRUE)         # Calculate maximum
+  )
+ctDNA_timepoints
+
+
# A tibble: 1 × 3
+  median_ctDNA_timepoints min_ctDNA_timepoints max_ctDNA_timepoints
+                    <int>                <int>                <int>
+1                       2                    1                   12
+
+
#  Timepoints of DTC assessment (`dtc_ihc_result_final`)
+dtc_timepoints <- subset_data |>
+  filter(!is.na(dtc_ihc_result_final)) |>  # Filter out NA values for dtc_ihc_result_final
+  group_by(participant_id) |>
+  summarise(
+    dtc_timepoints = n_distinct(timepoint),  # Count distinct timepoints of DTC assessment
+    .groups = "drop"
+  ) |>
+  summarise(
+    median_dtc_timepoints = median(dtc_timepoints, na.rm = TRUE),  # Calculate median
+    min_dtc_timepoints = min(dtc_timepoints, na.rm = TRUE),        # Calculate minimum
+    max_dtc_timepoints = max(dtc_timepoints, na.rm = TRUE)         # Calculate maximum
+  )
+dtc_timepoints
+
+
# A tibble: 1 × 3
+  median_dtc_timepoints min_dtc_timepoints max_dtc_timepoints
+                  <int>              <int>              <int>
+1                     2                  1                  6
+
+
# Print all summaries
+print("Timepoints per patient:")
+
+
[1] "Timepoints per patient:"
+
+
print(timepoints_per_patient)
+
+
# A tibble: 1 × 3
+  median_timepoints min_timepoints max_timepoints
+              <int>          <int>          <int>
+1                 2              1             12
+
+
print("Timepoints of ctDNA assessment:")
+
+
[1] "Timepoints of ctDNA assessment:"
+
+
print(ctDNA_timepoints)
+
+
# A tibble: 1 × 3
+  median_ctDNA_timepoints min_ctDNA_timepoints max_ctDNA_timepoints
+                    <int>                <int>                <int>
+1                       2                    1                   12
+
+
print("Timepoints of DTC assessment:")
+
+
[1] "Timepoints of DTC assessment:"
+
+
print(dtc_timepoints)
+
+
# A tibble: 1 × 3
+  median_dtc_timepoints min_dtc_timepoints max_dtc_timepoints
+                  <int>              <int>              <int>
+1                     2                  1                  6
+
+
+

Timepoints of samples: To date, 396 plasma samples from 109 pts (median 2 timepoints each, range 1-12) have been successfully tested by RaDaR. A further 8 samples from 4 unique individuals failed. These 4 individuals were excluded from the ctDNA cohort because they did not ultimately have a successful ctDNA assessment. The 109 individuals in the cohort had a median of 2 DTC assessment timepoints (range 1-6).
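
The cohort and timepoint bookkeeping described above can be sketched in base R on hypothetical data (`toy`). The sketch assumes, as the text states, that a participant is excluded only when none of their samples produced a successful call (`"TRUE"` or `"FALSE"`). Failed samples from retained participants are simply dropped. It then counts distinct tested timepoints per participant and reports the median and range.

```r
# Hypothetical plasma results; "Fail" marks a sample that could not be called.
toy <- data.frame(
  participant_id = c("A", "A", "A", "B", "B", "C", "D", "D"),
  timepoint = c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up",
                "SURMOUNT-Baseline", "Year 1 Follow Up",
                "SURMOUNT-Baseline",
                "SURMOUNT-Baseline", "Year 1 Follow Up"),
  ctDNA_detected = c("FALSE", "FALSE", "TRUE", "Fail", "Fail",
                     "FALSE", "Fail", "FALSE")
)

# A successful call is "TRUE" or "FALSE"; "Fail" and "" are not calls.
has_call <- toy$ctDNA_detected %in% c("TRUE", "FALSE")

# Participants with no successful call at all are excluded from the cohort.
kept_ids <- unique(toy$participant_id[has_call])
excluded_ids <- setdiff(unique(toy$participant_id), kept_ids)

# Successfully tested samples (failed samples of retained participants drop out).
tested <- toy[has_call, ]

# Distinct tested timepoints per participant, summarised as median and range.
n_timepoints <- tapply(tested$timepoint, tested$participant_id,
                       function(tp) length(unique(tp)))

failed <- toy$ctDNA_detected == "Fail"
cat("Failed samples:", sum(failed), "from",
    length(unique(toy$participant_id[failed])), "participants\n")
cat("Excluded participants:", excluded_ids, "\n")
cat("Tested samples:", nrow(tested), "\n")
cat("Timepoints per participant: median", median(n_timepoints),
    "range", min(n_timepoints), "-", max(n_timepoints), "\n")
```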

+

Overall, ctDNA was detected in 11 samples from 9/109 pts with a mean eVAF of 0.009% (range 0.002-0.084%). Two pts were ctDNA+ at baseline (BL), and 7 became positive on surveillance. 100/109 were ctDNA- across all timepoints. 39/109 pts were DTC+, either at BL (n=26) or after (n=13).
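
The eVAF summary code filters on `ctDNA_ever`, so it pools every sample from the 9 ever-positive participants, including their negative samples. If the intent is to describe the 11 positive samples only, the filter would be `ctDNA_detected == "TRUE"`. The base-R sketch below uses hypothetical eVAF values to apply that filter and convert the eVAF fractions to percentages. For reference, a fraction of 0.0000893 is about 0.009%.

```r
# Hypothetical eVAF values (fractions, not percentages) for four samples;
# only rows with ctDNA_detected == "TRUE" are positive calls.
toy <- data.frame(
  ctDNA_detected = c("TRUE", "FALSE", "TRUE", "TRUE"),
  eVAF = c(0.00002, 0, 0.0008, 0.0001)
)

# Restrict to positive samples before summarising.
pos_evaf <- toy$eVAF[toy$ctDNA_detected == "TRUE"]

# Multiply by 100 to report eVAF as a percentage.
evaf_pct <- 100 * pos_evaf
summary_pct <- c(mean = mean(evaf_pct), min = min(evaf_pct), max = max(evaf_pct))
print(signif(summary_pct, 3))
```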

+
+
# Filter and get unique participants by participant_id
+concordance_overall_unique <- subset_data |> 
+  distinct(participant_id, .keep_all = TRUE) |> 
+  mutate(concordance = ifelse(dtc_ever == ctDNA_ever, "Concordant", "Discordant"))
+
+# Count total concordant and discordant pairs for unique participants
+overall_concordant <- sum(concordance_overall_unique$concordance == "Concordant")
+overall_discordant <- sum(concordance_overall_unique$concordance == "Discordant")
+
+# Proportion of concordance
+proportion_concordant <- overall_concordant / (overall_concordant + overall_discordant)
+
+cat("Overall Concordant (unique participants):", overall_concordant, "\n")
+
+
Overall Concordant (unique participants): 69 
+
+
cat("Overall Discordant (unique participants):", overall_discordant, "\n")
+
+
Overall Discordant (unique participants): 40 
+
+
cat("Overall Proportion Concordant (unique participants):", proportion_concordant, "\n")
+
+
Overall Proportion Concordant (unique participants): 0.6330275 
+
+
#Proportion concordance 63% (ever positive)
+by_participant <- subset_data |>
+  group_by(participant_id) |>
+  summarize(
+    dtc_ever = max(dtc_ever, na.rm = TRUE),    # 1 if DTC was ever detected
+    ctDNA_ever = max(ctDNA_ever, na.rm = TRUE) # 1 if ctDNA was ever detected (max of a logical returns 0/1)
+  )
+
+# Create the 2x2 table (rows: ctDNA_ever, columns: dtc_ever)
+table_ctDNA_dtc <- table(by_participant$ctDNA_ever, by_participant$dtc_ever)
+print(table_ctDNA_dtc)
+
+
   
+     0  1
+  0 65 35
+  1  5  4
+
+
+
+
#Concordance by timepoint 
+
+# Convert dtc_ihc_result_final (0/1) and ctDNA_detected (stored as text) to logicals before comparing
+concordance_by_timepoint <- subset_data |> 
+  # keep samples with a DTC result and a valid ctDNA call; missing ctDNA results are "" rather than NA
+  filter(!is.na(dtc_ihc_result_final), ctDNA_detected %in% c("TRUE", "FALSE")) |> 
+  mutate(
+    dtc_positive = dtc_ihc_result_final == 1,
+    ctDNA_positive = ctDNA_detected == "TRUE",
+    # Concordant when both tests are positive or both are negative
+    concordance = if_else(dtc_positive == ctDNA_positive, "Concordant", "Discordant")
+  ) |>
+  group_by(timepoint) |>
+  summarise(
+    total_concordant = sum(concordance == "Concordant"),
+    total_discordant = sum(concordance == "Discordant"),
+    total_samples = n(),  # Total number of samples at this timepoint
+    concordance_rate = total_concordant / total_samples  # Concordance rate per timepoint
+  )
+
+# Print concordance results for each timepoint
+print(concordance_by_timepoint)
+
+
# A tibble: 10 × 5
+   timepoint    total_concordant total_discordant total_samples concordance_rate
+   <chr>                   <int>            <int>         <int>            <dbl>
+ 1 6M F/U                     17                2            19            0.895
+ 2 C12                         4                0             4            1    
+ 3 C3                         17                2            19            0.895
+ 4 C6                         26                2            28            0.929
+ 5 EOO                         5                4             9            0.556
+ 6 SURMOUNT-Ba…               80               29           109            0.734
+ 7 Year 1 Foll…               31                9            40            0.775
+ 8 Year 2 Foll…               21                3            24            0.875
+ 9 Year 3 Foll…               11                3            14            0.786
+10 Year 4 Foll…                3                1             4            0.75 
+
+
# Now calculate overall concordance across all timepoints
+overall_concordance <- sum(concordance_by_timepoint$total_concordant) / 
+  sum(concordance_by_timepoint$total_samples)
+
+cat("Overall Concordance Rate across all timepoints:", overall_concordance, "\n")
+
+
Overall Concordance Rate across all timepoints: 0.7962963 
+
+
#concordance, considering testing by timepoint, is 80% 
+
+

Concordance of DTC and ctDNA testing: At the participant level (ever-positive status across all timepoints), concordance was 63% (69/109). Concordance was higher (80%) when DTC and ctDNA results were compared sample by sample at each timepoint. Of 39 ever-DTC+ pts, 4 were ctDNA+ (3/4 of whom recurred) and 35 remained ctDNA- (1/30 of whom recurred).
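
The 63% figure is the share of participants on the diagonal of the 2x2 ever-positive table (69/109). The base-R sketch below reproduces that arithmetic from the printed counts. It also defines a hypothetical helper, `concordance_rate()`, for per-sample agreement. The helper compares only samples where both tests returned a valid call and treats a blank or `"Fail"` ctDNA entry as no call, which is an assumption about how those codes should be handled.

```r
# Participant-level counts from the 2x2 table above
# (rows: ctDNA ever-positive 0/1; columns: DTC ever-positive 0/1).
tab <- matrix(c(65, 5, 35, 4), nrow = 2,
              dimnames = list(ctDNA_ever = c("0", "1"), dtc_ever = c("0", "1")))

# Concordant participants sit on the diagonal (both negative or both positive).
participant_concordance <- sum(diag(tab)) / sum(tab)
print(participant_concordance)

# Hypothetical helper for per-sample agreement.
# dtc: 0/1/NA; ctdna: "TRUE"/"FALSE", with "" or "Fail" treated as no valid call.
concordance_rate <- function(dtc, ctdna) {
  valid <- !is.na(dtc) & ctdna %in% c("TRUE", "FALSE")
  mean((dtc[valid] == 1) == (ctdna[valid] == "TRUE"))
}

# Toy samples: two agree, one disagrees, two lack a valid call and are skipped.
sample_rate <- concordance_rate(
  dtc   = c(1, 0, 0, NA, 0),
  ctdna = c("TRUE", "FALSE", "TRUE", "FALSE", "")
)
print(sample_rate)
```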

+

Test Characteristics

+

Next, we examine ctDNA and DTC test characteristics: first the association between ever-ctDNA and ever-DTC positivity, then the number of tests performed.

+
+
############ Test Characteristics and Baseline versus cumulative positivity (ctDNA to start)#######
+
+### DTC by ctDNA (ever positive), association between test positivity. 
+
+# link by participant id 
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    dtc = first(dtc_ever),  # Get the ever dtc for each participant
+    ctDNA_ever = first(ctDNA_ever),        # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of dtc vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$dtc, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results, p-val 0.839 
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    65    5
+  1    35    4
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.041269, df = 1, p-value = 0.839
+
+
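
The chi-squared test above warns that its approximation may be incorrect. This happens because some expected cell counts are small: only 9 participants were ever ctDNA-positive. Fisher's exact test, `fisher.test()` in base R's stats package, is a common alternative for sparse 2x2 tables. The sketch below uses the counts printed above. It first shows the expected counts under independence and then computes the exact p-value.

```r
# Participant-level counts from the contingency table above
# (rows: DTC ever-positive 0/1; columns: ctDNA ever-positive FALSE/TRUE).
contingency <- matrix(c(65, 35, 5, 4), nrow = 2,
                      dimnames = list(dtc = c("0", "1"),
                                      ctDNA_ever = c("FALSE", "TRUE")))

# Expected counts under independence; values below 5 trigger the chi-squared warning.
expected <- outer(rowSums(contingency), colSums(contingency)) / sum(contingency)
print(round(expected, 2))

# Fisher's exact test does not rely on the large-sample approximation.
fisher_res <- fisher.test(contingency)
print(fisher_res$p.value)
```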
##### Tests (#s and such of tests)
+
+#number of tests (ctDNA)
+library(dplyr)
+
+# Assuming the status variable is named `ctDNA_detected` in d, and then in subset 
+status_summary_d <- d |>
+  group_by(ctDNA_detected) |>
+  summarise(total_samples = n(), .groups = "drop")
+
+# Print the summary -- we've got 385 FALSE, 8 FAILS, 11 TRUES 
+print(status_summary_d)
+
+
# A tibble: 4 × 2
+  ctDNA_detected total_samples
+  <chr>                  <int>
+1 ""                       175
+2 "FALSE"                  385
+3 "Fail"                     8
+4 "TRUE"                    11
+
+
#looking at the number of Fails by unique participant_id
+fail_count <- d |>
+  filter(ctDNA_detected == "Fail") |>   # Filter for rows where status is "FAIL"
+  distinct(participant_id) |>          # Get unique participant IDs
+  summarise(total_fails = n())          # Count unique participant IDs
+
+# Print the result -- 4 individuals with FAIL results, which is what we got in the consort  
+print(fail_count)
+
+
  total_fails
+1           4
+
+
fail_count <- subset_data |>
+  filter(ctDNA_detected == "Fail") |>   # Filter for rows where status is "FAIL"
+  distinct(participant_id) |>          # Get unique participant IDs
+  summarise(total_fails = n())          # Count unique participant IDs
+
+# Print the result -- none of the fails were pulled into the ctDNA cohort  
+print(fail_count)
+
+
# A tibble: 1 × 1
+  total_fails
+        <int>
+1           0
+
+
#number of DTC tests in this cohort of 109 patients 
+
+unique(subset_data$dtc_ihc_result_final) #NA, 0, and 1 
+
+
[1]  0 NA  1
+
+
status_summary_subset <- subset_data |>
+  group_by(dtc_ihc_result_final) |>
+  summarise(total_samples = n(), .groups = "drop")
+
+# Print the summary -- 221 negatives, 49 positives, 128 NAs, across 39 DTC-positive and 70 DTC-negative patients
+#### confirm with Nick that no DTC results are missing; the check below suggests these NAs are ctDNA-only timepoints
+print(status_summary_subset)
+
+
# A tibble: 3 × 2
+  dtc_ihc_result_final total_samples
+                 <int>         <int>
+1                    0           221
+2                    1            49
+3                   NA           128
+
+
### looking at NAs -- all of them have FALSE ctDNA results, so these appear to be ctDNA-only timepoints
+na_participants_dtc <- subset_data |>
+  filter(is.na(dtc_ihc_result_final)) |>
+  select(participant_id, dtc_ihc_result_final, FINAL_RESULT, ORIG_RSLT_DTC, ctDNA_detected, timepoint)
+
+# Print the list of participant IDs with NA in `dtc_ihc_result_final`-- they all have FALSE ctDNA results, so these are the ctDNA timepoints
+#all of the timepoints are long-term except for CLEVER baseline. 
+print(na_participants_dtc, n=128)
+
+
# A tibble: 128 × 6
+    participant_id dtc_ihc_result_final FINAL_RESULT ORIG_RSLT_DTC
+    <chr>                         <int>        <int>         <int>
+  1 28115-16-001                     NA           NA            NA
+  2 28115-16-001                     NA           NA            NA
+  3 28115-16-015                     NA           NA            NA
+  4 28115-16-015                     NA           NA            NA
+  5 28115-16-015                     NA           NA            NA
+  6 28115-16-015                     NA           NA            NA
+  7 28115-16-015                     NA           NA            NA
+  8 28115-16-015                     NA           NA            NA
+  9 28115-16-021                     NA           NA            NA
+ 10 28115-16-021                     NA           NA            NA
+ 11 28115-16-021                     NA           NA            NA
+ 12 28115-16-021                     NA           NA            NA
+ 13 28115-16-026                     NA           NA            NA
+ 14 28115-16-026                     NA           NA            NA
+ 15 28115-16-026                     NA           NA            NA
+ 16 28115-16-026                     NA           NA            NA
+ 17 28115-16-026                     NA           NA            NA
+ 18 28115-16-026                     NA           NA            NA
+ 19 28115-16-033                     NA           NA            NA
+ 20 28115-17-001                     NA           NA            NA
+ 21 28115-17-001                     NA           NA            NA
+ 22 28115-17-001                     NA           NA            NA
+ 23 28115-17-001                     NA           NA            NA
+ 24 28115-17-001                     NA           NA            NA
+ 25 28115-17-002                     NA           NA            NA
+ 26 28115-17-002                     NA           NA            NA
+ 27 28115-17-002                     NA           NA            NA
+ 28 28115-17-002                     NA           NA            NA
+ 29 28115-17-002                     NA           NA            NA
+ 30 28115-17-008                     NA           NA            NA
+ 31 28115-17-008                     NA           NA            NA
+ 32 28115-17-008                     NA           NA            NA
+ 33 28115-17-008                     NA           NA            NA
+ 34 28115-17-008                     NA           NA            NA
+ 35 28115-17-008                     NA           NA            NA
+ 36 28115-17-010                     NA           NA            NA
+ 37 28115-17-011                     NA           NA            NA
+ 38 28115-17-011                     NA           NA            NA
+ 39 28115-17-011                     NA           NA            NA
+ 40 28115-17-011                     NA           NA            NA
+ 41 28115-17-012                     NA           NA            NA
+ 42 28115-17-012                     NA           NA            NA
+ 43 28115-17-012                     NA           NA            NA
+ 44 28115-17-012                     NA           NA            NA
+ 45 28115-17-012                     NA           NA            NA
+ 46 28115-17-012                     NA           NA            NA
+ 47 28115-17-016                     NA           NA            NA
+ 48 28115-17-017                     NA           NA            NA
+ 49 28115-17-017                     NA           NA            NA
+ 50 28115-17-019                     NA           NA            NA
+ 51 28115-17-019                     NA           NA            NA
+ 52 28115-17-019                     NA           NA            NA
+ 53 28115-17-019                     NA           NA            NA
+ 54 28115-17-019                     NA           NA            NA
+ 55 28115-17-024                     NA           NA            NA
+ 56 28115-17-027                     NA           NA            NA
+ 57 28115-17-027                     NA           NA            NA
+ 58 28115-17-027                     NA           NA            NA
+ 59 28115-17-027                     NA           NA            NA
+ 60 28115-17-031                     NA           NA            NA
+ 61 28115-17-032                     NA           NA            NA
+ 62 28115-17-032                     NA           NA            NA
+ 63 28115-17-032                     NA           NA            NA
+ 64 28115-17-032                     NA           NA            NA
+ 65 28115-17-032                     NA           NA            NA
+ 66 28115-17-032                     NA           NA            NA
+ 67 28115-17-036                     NA           NA            NA
+ 68 28115-17-036                     NA           NA            NA
+ 69 28115-17-046                     NA           NA            NA
+ 70 28115-17-046                     NA           NA            NA
+ 71 28115-17-046                     NA           NA            NA
+ 72 28115-17-046                     NA           NA            NA
+ 73 28115-17-046                     NA           NA            NA
+ 74 28115-17-046                     NA           NA            NA
+ 75 28115-17-050                     NA           NA            NA
+ 76 28115-17-050                     NA           NA            NA
+ 77 28115-17-051                     NA           NA            NA
+ 78 28115-17-051                     NA           NA            NA
+ 79 28115-17-051                     NA           NA            NA
+ 80 28115-17-051                     NA           NA            NA
+ 81 28115-17-051                     NA           NA            NA
+ 82 28115-17-052                     NA           NA            NA
+ 83 28115-18-001                     NA           NA            NA
+ 84 28115-18-001                     NA           NA            NA
+ 85 28115-18-001                     NA           NA            NA
+ 86 28115-18-001                     NA           NA            NA
+ 87 28115-18-004                     NA           NA            NA
+ 88 28115-18-015                     NA           NA            NA
+ 89 28115-18-020                     NA           NA            NA
+ 90 28115-18-020                     NA           NA            NA
+ 91 28115-18-020                     NA           NA            NA
+ 92 28115-18-020                     NA           NA            NA
+ 93 28115-18-021                     NA           NA            NA
+ 94 28115-18-021                     NA           NA            NA
+ 95 28115-18-021                     NA           NA            NA
+ 96 28115-18-021                     NA           NA            NA
+ 97 28115-18-021                     NA           NA            NA
+ 98 28115-18-021                     NA           NA            NA
+ 99 28115-18-022                     NA           NA            NA
+100 28115-18-022                     NA           NA            NA
+101 28115-18-022                     NA           NA            NA
+102 28115-18-022                     NA           NA            NA
+103 28115-18-022                     NA           NA            NA
+104 28115-18-022                     NA           NA            NA
+105 28115-18-023                     NA           NA            NA
+106 28115-18-023                     NA           NA            NA
+107 28115-18-029                     NA           NA            NA
+108 28115-18-031                     NA           NA            NA
+109 28115-18-032                     NA           NA            NA
+110 28115-18-032                     NA           NA            NA
+111 28115-18-032                     NA           NA            NA
+112 28115-18-032                     NA           NA            NA
+113 28115-19-002                     NA           NA            NA
+114 28115-19-005                     NA           NA            NA
+115 28115-19-005                     NA           NA            NA
+116 28115-19-006                     NA           NA            NA
+117 28115-19-006                     NA           NA            NA
+118 28115-19-006                     NA           NA            NA
+119 28115-19-009                     NA           NA            NA
+120 28115-19-025                     NA           NA            NA
+121 28115-19-028                     NA           NA            NA
+122 28115-20-007                     NA           NA            NA
+123 28115-21-006                     NA           NA            NA
+124 28115-21-016                     NA           NA            NA
+125 28115-21-016                     NA           NA            NA
+126 28115-21-016                     NA           NA            NA
+127 28115-21-016                     NA           NA            NA
+128 28115-21-025                     NA           NA            NA
+# ℹ 2 more variables: ctDNA_detected <chr>, timepoint <chr>
+
+
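
The comments above infer that rows with a missing DTC result are ctDNA-only visits. However, the printed tibble truncates the `ctDNA_detected` and `timepoint` columns. Tabulating those columns directly makes the check explicit. The base-R sketch below runs this check on hypothetical rows (`toy`).

```r
# Hypothetical rows: NA in dtc_ihc_result_final marks a visit with no DTC result.
toy <- data.frame(
  participant_id = c("A", "A", "B", "B", "C"),
  timepoint = c("SURMOUNT-Baseline", "Long Term FU 1", "SURMOUNT-Baseline",
                "CLEVER-Baseline", "SURMOUNT-Baseline"),
  dtc_ihc_result_final = c(0, NA, 1, NA, 0),
  ctDNA_detected = c("FALSE", "FALSE", "TRUE", "FALSE", "")
)

# Rows without a DTC result.
no_dtc <- toy[is.na(toy$dtc_ihc_result_final), ]

# ctDNA calls on DTC-missing rows; useNA = "ifany" keeps any NA visible.
print(table(no_dtc$ctDNA_detected, useNA = "ifany"))

# Which timepoints those rows come from.
print(table(no_dtc$timepoint))

# TRUE only if every DTC-missing row carries a valid ctDNA call.
all_ctdna_only <- all(no_dtc$ctDNA_detected %in% c("TRUE", "FALSE"))
print(all_ctdna_only)
```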
#look at timepoints 
+unique_timepoints <- unique(subset_data$timepoint)
+print(unique_timepoints)
+
+
 [1] "SURMOUNT-Baseline" "Year 1 Follow Up"  "Year 2 Follow Up" 
+ [4] "Long Term FU 1"    "Long Term FU 2"    "Year 3 Follow Up" 
+ [7] "CLEVER-Baseline"   "C6"                "6M F/U"           
+[10] "12M F/U"           "18M F/U"           "24M F/U"          
+[13] "30M F/U"           "36M F/U"           "C3"               
+[16] "EOO"               "C12"               "Year 4 Follow Up" 
+
+
##### eVAF 
+names(subset_data) #use eVAF
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                     "ctDNA_ever"                      
+[389] "dtc_ever"                        
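All 389 column names are printed here just to find one variable, and the same listing is printed again before the DTC counts. The base-R sketch below shows a more targeted lookup. `find_cols()` is a hypothetical helper, and `toy_data` is a small made-up stand-in that reuses a few of the real column names.

```r
# Hypothetical stand-in for subset_data with a handful of its real column names
toy_data <- data.frame(
  participant_id = c("P1", "P2"),
  eVAF = c(0.0001, NA),
  mean_VAF = c(0.0002, NA),
  dtc_ihc_result_final = c(1L, 0L),
  dtc_ihc_summary_count_final = c(2L, 0L)
)

# Return only the column names that match a regular expression (case-insensitive)
find_cols <- function(data, pattern) {
  grep(pattern, names(data), value = TRUE, ignore.case = TRUE)
}

find_cols(toy_data, "vaf")       # "eVAF" "mean_VAF"
find_cols(toy_data, "^dtc_ihc")  # "dtc_ihc_result_final" "dtc_ihc_summary_count_final"
```

On the real data, a call such as `find_cols(subset_data, "dtc")` would list only the DTC-related columns.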
+
+
# Calculate the median and range (min and max) for `eVAF` as percentages for participants with `ctDNA_detected == TRUE`
+eVAF_range_ctDNA_detected_percent <- subset_data |>
+  filter(ctDNA_detected == TRUE) |>   # Filter for those with ctDNA detected
+  summarise(
+    median_eVAF_percent = median(eVAF, na.rm = TRUE) * 100,  # Convert median to percentage
+    min_eVAF_percent = min(eVAF, na.rm = TRUE) * 100,        # Convert minimum to percentage
+    max_eVAF_percent = max(eVAF, na.rm = TRUE) * 100         # Convert maximum to percentage
+  )
+
+# Print the result
+print(eVAF_range_ctDNA_detected_percent)
+
+
# A tibble: 1 × 3
+  median_eVAF_percent min_eVAF_percent max_eVAF_percent
+                <dbl>            <dbl>            <dbl>
+1             0.00901          0.00165           0.0836
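The tibble printed above lists `ctDNA_detected` as `<chr>`. As a result, `filter(ctDNA_detected == TRUE)` works only because R coerces `TRUE` to the string `"TRUE"`, and any other spelling would be silently dropped. The base-R sketch below uses made-up values to show that coercion, followed by a more explicit alternative using `as.logical()`. The output shown here does not reveal whether the real column contains any spelling other than `"TRUE"`/`"FALSE"`.

```r
# Made-up values; ctDNA_detected is stored as character, as in subset_data
toy <- data.frame(
  ctDNA_detected = c("TRUE", "FALSE", "True", NA, "TRUE"),
  eVAF = c(0.0001, 0.00002, 0.0008, NA, 0.00005)
)

# Comparing a character column with TRUE coerces TRUE to the string "TRUE",
# so "True" does not match; which() drops the NA comparison
kept_by_equality <- toy$eVAF[which(toy$ctDNA_detected == TRUE)]  # rows 1 and 5

# Explicit conversion: as.logical() accepts "TRUE", "true", "True", "T" (and the FALSE forms)
detected <- as.logical(toy$ctDNA_detected)
kept_by_logical <- toy$eVAF[which(detected)]                      # rows 1, 3 and 5

# Median and range of eVAF, expressed as a percentage
evaf_pct <- kept_by_logical * 100
res <- c(median = median(evaf_pct, na.rm = TRUE),
         min = min(evaf_pct, na.rm = TRUE),
         max = max(evaf_pct, na.rm = TRUE))
res
```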
+
+
#### DTC counts 
+names(subset_data) #use dtc_ihc_summary_count_final  
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                     "ctDNA_ever"                      
+[389] "dtc_ever"                        
+
+
# Calculate the median and range (min and max) of DTC counts (`dtc_ihc_summary_count_final`) for assessments with DTCs detected (`dtc_ihc_result_final == 1`)
+dtc_count <- subset_data |>
+  filter(dtc_ihc_result_final == 1) |>   # Filter for assessments with DTCs detected
+  summarise(
+    median_dtc_count = median(dtc_ihc_summary_count_final, na.rm = TRUE), 
+    min_dtc_count = min(dtc_ihc_summary_count_final, na.rm = TRUE),        
+    max_dtc_count = max(dtc_ihc_summary_count_final, na.rm = TRUE)         
+  )
+
+# Print the result
+print(dtc_count)
+
+
# A tibble: 1 × 3
+  median_dtc_count min_dtc_count max_dtc_count
+             <int>         <int>         <int>
+1                2             1            10
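When every value is filtered out, `min()` and `max()` return `Inf`/`-Inf` with a warning. The summary above would therefore report infinite counts if no assessment had `dtc_ihc_result_final == 1`. The base-R sketch below adds a small guard, using a hypothetical `safe_range_summary()` helper and made-up counts.

```r
# Median and range of the non-missing values; returns NA instead of Inf/-Inf
# (and a warning) when nothing is left after filtering
safe_range_summary <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) {
    return(c(median = NA_real_, min = NA_real_, max = NA_real_))
  }
  c(median = median(x), min = min(x), max = max(x))
}

# Made-up assessments: result 1 = DTCs detected, count = number of DTCs found
toy_dtc_result <- c(1, 0, 1, 1, NA)
toy_dtc_count  <- c(2, 0, 10, 1, NA)

safe_range_summary(toy_dtc_count[toy_dtc_result %in% 1])  # median 2, min 1, max 10
safe_range_summary(toy_dtc_count[toy_dtc_result %in% 2])  # no matches: all NA, no warning
```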
+
+
#### Number of timepoints per participant
+
+# Timepoints per patient (median, range)
+timepoints_per_patient <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    total_timepoints = n_distinct(timepoint),  # Count distinct timepoints for each patient
+    .groups = "drop"
+  ) |>
+  summarise(
+    median_timepoints = median(total_timepoints, na.rm = TRUE),  # Calculate median
+    min_timepoints = min(total_timepoints, na.rm = TRUE),        # Calculate minimum
+    max_timepoints = max(total_timepoints, na.rm = TRUE)         # Calculate maximum
+  )
+
+#  Timepoints of ctDNA assessment (`ctDNA_detected`)
+ctDNA_timepoints <- subset_data |>
+  filter(!is.na(ctDNA_detected)) |>  # Filter out NA values for ctDNA_detected
+  group_by(participant_id) |>
+  summarise(
+    ctDNA_timepoints = n_distinct(timepoint),  # Count distinct timepoints of ctDNA assessment
+    .groups = "drop"
+  ) |>
+  summarise(
+    median_ctDNA_timepoints = median(ctDNA_timepoints, na.rm = TRUE),  # Calculate median
+    min_ctDNA_timepoints = min(ctDNA_timepoints, na.rm = TRUE),        # Calculate minimum
+    max_ctDNA_timepoints = max(ctDNA_timepoints, na.rm = TRUE)         # Calculate maximum
+  )
+
+#  Timepoints of DTC assessment (`dtc_ihc_result_final`)
+dtc_timepoints <- subset_data |>
+  filter(!is.na(dtc_ihc_result_final)) |>  # Filter out NA values for dtc_ihc_result_final
+  group_by(participant_id) |>
+  summarise(
+    dtc_timepoints = n_distinct(timepoint),  # Count distinct timepoints of DTC assessment
+    .groups = "drop"
+  ) |>
+  summarise(
+    median_dtc_timepoints = median(dtc_timepoints, na.rm = TRUE),  # Calculate median
+    min_dtc_timepoints = min(dtc_timepoints, na.rm = TRUE),        # Calculate minimum
+    max_dtc_timepoints = max(dtc_timepoints, na.rm = TRUE)         # Calculate maximum
+  )
+
+# Print all summaries
+print("Timepoints per patient:")
+
+
[1] "Timepoints per patient:"
+
+
print(timepoints_per_patient)
+
+
# A tibble: 1 × 3
+  median_timepoints min_timepoints max_timepoints
+              <int>          <int>          <int>
+1                 2              1             12
+
+
print("Timepoints of ctDNA assessment:")
+
+
[1] "Timepoints of ctDNA assessment:"
+
+
print(ctDNA_timepoints)
+
+
# A tibble: 1 × 3
+  median_ctDNA_timepoints min_ctDNA_timepoints max_ctDNA_timepoints
+                    <int>                <int>                <int>
+1                       2                    1                   12
+
+
print("Timepoints of DTC assessment:")
+
+
[1] "Timepoints of DTC assessment:"
+
+
print(dtc_timepoints)
+
+
# A tibble: 1 × 3
+  median_dtc_timepoints min_dtc_timepoints max_dtc_timepoints
+                  <int>              <int>              <int>
+1                     2                  1                  6
+
+
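The three summaries above repeat one pipeline and change only the missing-data filter. As a sketch, the same logic can live in a single helper. `summarise_timepoints()` is a hypothetical name, and the toy data frame is illustrative, not study data. The sketch uses base R so it needs no extra packages, and it assumes the column names used above (`participant_id`, `timepoint`).

```r
# Hypothetical helper: median/min/max number of distinct timepoints per participant,
# optionally counting only rows where `result_col` is non-missing.
summarise_timepoints <- function(data, result_col = NULL) {
  if (!is.null(result_col)) {
    data <- data[!is.na(data[[result_col]]), , drop = FALSE]
  }
  # Distinct timepoints per participant (like n_distinct(), NA counts as a value)
  per_patient <- tapply(data$timepoint, data$participant_id,
                        function(tp) length(unique(tp)))
  data.frame(
    median_timepoints = median(per_patient),
    min_timepoints = min(per_patient),
    max_timepoints = max(per_patient)
  )
}

# Toy data for illustration only (not study data)
toy <- data.frame(
  participant_id = c("A", "A", "A", "B", "C", "C"),
  timepoint = c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up",
                "SURMOUNT-Baseline", "SURMOUNT-Baseline", "Year 1 Follow Up"),
  ctDNA_detected = c(FALSE, NA, TRUE, FALSE, NA, FALSE)
)
summarise_timepoints(toy)                    # all timepoints
summarise_timepoints(toy, "ctDNA_detected")  # ctDNA-assessed timepoints only
```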
### Timepoints on clinical trial ### Ask Nick: should we include all the on-trial timepoints
+# (CLEVER-Baseline, EOO, C3, C6, C12, 6M F/U, etc.), just the on-treatment ones (C3, C6, C12),
+# or only the ones while patients are
+unique_timepoints <- unique(subset_data$timepoint)
+print(unique_timepoints) 
+
+
 [1] "SURMOUNT-Baseline" "Year 1 Follow Up"  "Year 2 Follow Up" 
+ [4] "Long Term FU 1"    "Long Term FU 2"    "Year 3 Follow Up" 
+ [7] "CLEVER-Baseline"   "C6"                "6M F/U"           
+[10] "12M F/U"           "18M F/U"           "24M F/U"          
+[13] "30M F/U"           "36M F/U"           "C3"               
+[16] "EOO"               "C12"               "Year 4 Follow Up" 
+
+
trial_timepoints <- c("CLEVER-Baseline", "EOO", "C3", "C6", "C12", "6M F/U", "12M F/U", "18M F/U", "24M F/U", "30M F/U", "36M F/U")
+
+# Count the number of samples by timepoint (for specific clinical trial timepoints)
+samples_by_trial_timepoint <- subset_data |>
+  filter(timepoint %in% trial_timepoints) |>  # Filter for relevant timepoints
+  group_by(timepoint) |>                      # Group by timepoint
+  summarise(
+    total_samples = n_distinct(participant_id),  # Count distinct participant_ids (samples)
+    .groups = "drop"  # Remove grouping after summarizing
+  )
+
+# Print the result
+print(samples_by_trial_timepoint) #total samples on trial (ctDNA and dtC)
+
+
# A tibble: 11 × 2
+   timepoint       total_samples
+   <chr>                   <int>
+ 1 12M F/U                    18
+ 2 18M F/U                    13
+ 3 24M F/U                    13
+ 4 30M F/U                    12
+ 5 36M F/U                    18
+ 6 6M F/U                     27
+ 7 C12                         4
+ 8 C3                         20
+ 9 C6                         28
+10 CLEVER-Baseline            32
+11 EOO                         9
+
+
#### ctDNA on trial 
+
+ctDNA_samples_by_timepoint <- subset_data |>
+  filter(timepoint %in% trial_timepoints, !is.na(ctDNA_detected)) |>  # Filter for relevant timepoints and ctDNA detected
+  group_by(timepoint) |>                      # Group by timepoint
+  summarise(
+    total_samples_ctDNA = n_distinct(participant_id),  # Count distinct participant_ids (ctDNA samples)
+    .groups = "drop"  # Remove grouping after summarizing
+  )
+
+# Print the result for ctDNA samples -- 32 CLEVER-Baseline, 9 EOO, 20 C3, 28 C6, 4 C12, 27 6M, 18 12M, 13 18M, 13 24M, 12 30M, 18 36M 
+print(ctDNA_samples_by_timepoint)
+
+
# A tibble: 11 × 2
+   timepoint       total_samples_ctDNA
+   <chr>                         <int>
+ 1 12M F/U                          18
+ 2 18M F/U                          13
+ 3 24M F/U                          13
+ 4 30M F/U                          12
+ 5 36M F/U                          18
+ 6 6M F/U                           27
+ 7 C12                               4
+ 8 C3                               20
+ 9 C6                               28
+10 CLEVER-Baseline                  32
+11 EOO                               9
+
+
##### DTC by trial timepoint 
+# Count the number of DTC samples by timepoint (for specific clinical trial timepoints)
+dtc_samples_by_timepoint <- subset_data |>
+  filter(timepoint %in% trial_timepoints, !is.na(dtc_ihc_result_final)) |>  # Filter for relevant timepoints and DTC results
+  group_by(timepoint) |>                      # Group by timepoint
+  summarise(
+    total_samples_dtc = n_distinct(participant_id),  # Count distinct participant_ids (DTC samples)
+    .groups = "drop"  # Remove grouping after summarizing
+  )
+
+# Print the result for DTC samples -- makes sense, no CLEVER baseline timepoints, 9 EOO, 19 C3, 28 C6, 4 C12, 19 6M F/U 
+print(dtc_samples_by_timepoint)
+
+
# A tibble: 5 × 2
+  timepoint total_samples_dtc
+  <chr>                 <int>
+1 6M F/U                   19
+2 C12                       4
+3 C3                       19
+4 C6                       28
+5 EOO                       9
+
+
#### Number of ctDNA timepoints on surmount 
+print(unique_timepoints) 
+
+
 [1] "SURMOUNT-Baseline" "Year 1 Follow Up"  "Year 2 Follow Up" 
+ [4] "Long Term FU 1"    "Long Term FU 2"    "Year 3 Follow Up" 
+ [7] "CLEVER-Baseline"   "C6"                "6M F/U"           
+[10] "12M F/U"           "18M F/U"           "24M F/U"          
+[13] "30M F/U"           "36M F/U"           "C3"               
+[16] "EOO"               "C12"               "Year 4 Follow Up" 
+
+
surmount_timepoints <- c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up", "Year 3 Follow Up", "Year 4 Follow Up", "Long Term FU 1", "Long Term FU 2") 
+
+ctDNA_surmount <- subset_data |>
+  filter(timepoint %in% surmount_timepoints, !is.na(ctDNA_detected)) |>  # Filter for relevant timepoints and ctDNA detected
+  group_by(timepoint) |>                      # Group by timepoint
+  summarise(
+    total_samples_ctDNA = n_distinct(participant_id),  # Count distinct participant_ids (ctDNA samples)
+    .groups = "drop"  # Remove grouping after summarizing
+  )
+
+# Print the result for ctDNA samples -- 109 Baseline, Y1 FU 40, Y2 FU 25, Y3 FU 14, Y4 FU 4, LT FU 1 10, LT FU 2 2
+print(ctDNA_surmount)
+
+
# A tibble: 7 × 2
+  timepoint         total_samples_ctDNA
+  <chr>                           <int>
+1 Long Term FU 1                     10
+2 Long Term FU 2                      2
+3 SURMOUNT-Baseline                 109
+4 Year 1 Follow Up                   40
+5 Year 2 Follow Up                   25
+6 Year 3 Follow Up                   14
+7 Year 4 Follow Up                    4
+
+
### number of DTC timepoints on surmount 
+# Count the number of DTC samples by timepoint 
+dtc_timepoint_surmount <- subset_data |>
+  filter(timepoint %in% surmount_timepoints, !is.na(dtc_ihc_result_final)) |>  # Filter for relevant timepoints and DTC results
+  group_by(timepoint) |>                      # Group by timepoint
+  summarise(
+    total_samples_dtc = n_distinct(participant_id),  # Count distinct participant_ids (DTC samples)
+    .groups = "drop"  # Remove grouping after summarizing
+  )
+
+# Print the result for DTC samples -- 109 Baseline, Y1 FU 40, Y2 FU 24, Y3 FU 14, Y4 FU 4
+print(dtc_timepoint_surmount)
+
+
# A tibble: 5 × 2
+  timepoint         total_samples_dtc
+  <chr>                         <int>
+1 SURMOUNT-Baseline               109
+2 Year 1 Follow Up                 40
+3 Year 2 Follow Up                 24
+4 Year 3 Follow Up                 14
+5 Year 4 Follow Up                  4
+
+
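The four counting blocks above (trial and SURMOUNT timepoints, ctDNA and DTC) differ only in the timepoint list and the result column. Below is a minimal base R sketch of a shared helper. The name `count_by_timepoint()` is hypothetical, and the toy data are for illustration only, not study data.

```r
# Hypothetical helper: distinct participants with a non-missing result in
# `result_col` at each timepoint in `timepoints`. Timepoints with no such
# participants are omitted, as in the dplyr versions above.
count_by_timepoint <- function(data, timepoints, result_col) {
  keep <- data$timepoint %in% timepoints & !is.na(data[[result_col]])
  sub <- data[keep, , drop = FALSE]
  counts <- tapply(sub$participant_id, sub$timepoint,
                   function(ids) length(unique(ids)))
  data.frame(timepoint = names(counts),
             total_samples = as.integer(counts),
             row.names = NULL)
}

# Toy data for illustration only
toy <- data.frame(
  participant_id = c("A", "A", "B", "B", "C"),
  timepoint = c("C3", "C6", "C3", "6M F/U", "C3"),
  dtc_ihc_result_final = c(0, 1, NA, 0, 1)
)
count_by_timepoint(toy, c("C3", "C6", "6M F/U"), "dtc_ihc_result_final")
```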
#### positivity by timepoint -- ctDNA 
+
+ctDNA_pos_rate_by_timepoint <- subset_data |>
+  filter(!is.na(ctDNA_detected)) |>  # Ensure we are considering only non-missing ctDNA_detected values
+  group_by(timepoint, participant_id) |>  # Group by timepoint and participant
+  summarise(
+    ctDNA_pos = max(ctDNA_detected == TRUE),  # If any value is TRUE, participant is ctDNA positive at that timepoint
+    .groups = "drop"
+  ) |>
+  group_by(timepoint) |>  # Group again by timepoint to calculate the positivity rate
+  summarise(
+    positivity_rate = mean(ctDNA_pos),  # Calculate the positivity rate for each timepoint
+    total_samples = n_distinct(participant_id),  # Count the number of distinct participants
+    .groups = "drop"
+  )
+
+# Print the result for ctDNA positivity rate by timepoint
+print(ctDNA_pos_rate_by_timepoint)
+
+
# A tibble: 18 × 3
+   timepoint         positivity_rate total_samples
+   <chr>                       <dbl>         <int>
+ 1 12M F/U                    0.0556            18
+ 2 18M F/U                    0                 13
+ 3 24M F/U                    0                 13
+ 4 30M F/U                    0                 12
+ 5 36M F/U                    0.0556            18
+ 6 6M F/U                     0.0370            27
+ 7 C12                        0                  4
+ 8 C3                         0                 20
+ 9 C6                         0                 28
+10 CLEVER-Baseline            0                 32
+11 EOO                        0                  9
+12 Long Term FU 1             0                 10
+13 Long Term FU 2             0                  2
+14 SURMOUNT-Baseline          0.0183           109
+15 Year 1 Follow Up           0.125             40
+16 Year 2 Follow Up           0.04              25
+17 Year 3 Follow Up           0                 14
+18 Year 4 Follow Up           0                  4
+
+
# Running ctDNA positivity rate across timepoints, weighted by the number of samples
+ctDNA_pos_rate_cumulative <- ctDNA_pos_rate_by_timepoint |>
+  arrange(timepoint) |>  # Note: sorts timepoint labels alphabetically, not chronologically
+  mutate(
+    cumulative_pos_rate = cumsum(positivity_rate * total_samples) / cumsum(total_samples)  # Cumulative positivity rate
+  )
+
+print(ctDNA_pos_rate_cumulative)
+
+
# A tibble: 18 × 4
+   timepoint         positivity_rate total_samples cumulative_pos_rate
+   <chr>                       <dbl>         <int>               <dbl>
+ 1 12M F/U                    0.0556            18              0.0556
+ 2 18M F/U                    0                 13              0.0323
+ 3 24M F/U                    0                 13              0.0227
+ 4 30M F/U                    0                 12              0.0179
+ 5 36M F/U                    0.0556            18              0.0270
+ 6 6M F/U                     0.0370            27              0.0297
+ 7 C12                        0                  4              0.0286
+ 8 C3                         0                 20              0.024 
+ 9 C6                         0                 28              0.0196
+10 CLEVER-Baseline            0                 32              0.0162
+11 EOO                        0                  9              0.0155
+12 Long Term FU 1             0                 10              0.0147
+13 Long Term FU 2             0                  2              0.0146
+14 SURMOUNT-Baseline          0.0183           109              0.0159
+15 Year 1 Follow Up           0.125             40              0.0282
+16 Year 2 Follow Up           0.04              25              0.0289
+17 Year 3 Follow Up           0                 14              0.0279
+18 Year 4 Follow Up           0                  4              0.0276
+
+
#### Cumulative positivity ctDNA 
+
+library(dplyr)
+
+# Calculate ctDNA positivity rate by participant
+ctDNA_pos_rate <- subset_data |>
+  filter(!is.na(ctDNA_detected)) |>  # Exclude missing ctDNA results
+  group_by(participant_id) |>  # Group by participant
+  summarise(
+    ctDNA_pos = max(ctDNA_detected == TRUE),  # If any value is TRUE, participant is ctDNA positive
+    .groups = "drop"
+  )
+
+# Calculate cumulative positivity rate
+ctDNA_pos_rate_cumulative <- ctDNA_pos_rate |>
+  summarise(
+    total_pos = sum(ctDNA_pos),  # Total number of ctDNA positive participants
+    total_samples = n(),  # Total number of participants
+    cumulative_pos_rate = total_pos / total_samples  # Cumulative positivity rate
+  )
+
+# Print the cumulative positivity rate
+print(ctDNA_pos_rate_cumulative)
+
+
# A tibble: 1 × 3
+  total_pos total_samples cumulative_pos_rate
+      <int>         <int>               <dbl>
+1         9           109              0.0826
+
+
# Count the number of positive ctDNA samples and total samples
+ctDNA_pos_vs_total <- subset_data |>
+  filter(!is.na(ctDNA_detected)) |>  # Exclude missing ctDNA results
+  summarise(
+    total_samples = n(),  # Total number of ctDNA samples
+    positive_samples = sum(ctDNA_detected == TRUE),  # Count of positive ctDNA samples
+    .groups = "drop"
+  ) |>
+  mutate(
+    positivity_rate = positive_samples / total_samples  # Proportion of positive ctDNA samples
+  )
+
+# Print the results
+print(ctDNA_pos_vs_total)
+
+
# A tibble: 1 × 3
+  total_samples positive_samples positivity_rate
+          <int>            <int>           <dbl>
+1           398               11          0.0276
+
+
#### cumulative positivity DTC 
+
+# Calculate DTC positivity by participant
+DTC_pos_rate <- subset_data |>
+  filter(!is.na(dtc_ihc_result_final)) |>  # Exclude missing DTC results
+  group_by(participant_id) |>  # Group by participant
+  summarise(
+    dtc = max(dtc_ihc_result_final == 1),  # If any result is 1 (positive), participant is DTC positive
+    .groups = "drop"
+  )
+
+# Calculate cumulative positivity rate
+DTC_pos_rate_cumulative <- DTC_pos_rate |>
+  summarise(
+    total_pos = sum(dtc),  # Total number of DTC-positive participants
+    total_samples = n(),  # Total number of participants
+    cumulative_pos_rate = total_pos / total_samples  # Cumulative positivity rate
+  )
+
+# Print the cumulative positivity rate
+print(DTC_pos_rate_cumulative)
+
+
# A tibble: 1 × 3
+  total_pos total_samples cumulative_pos_rate
+      <int>         <int>               <dbl>
+1        39           109               0.358
+
+
# Count the number of positive DTC samples and total samples
+dtc_pos_vs_total <- subset_data |>
+  filter(!is.na(dtc_ihc_result_final)) |>  # Exclude missing DTC results
+  summarise(
+    total_samples = n(),  # Total number of DTC samples
+    positive_samples = sum(dtc_ihc_result_final == 1),  # Count of positive DTC samples
+    .groups = "drop"
+  ) |>
+  mutate(
+    positivity_rate = positive_samples / total_samples  # Proportion of positive DTC samples
+  )
+
+# Print the results
+print(dtc_pos_vs_total)
+
+
# A tibble: 1 × 3
+  total_samples positive_samples positivity_rate
+          <int>            <int>           <dbl>
+1           270               49           0.181
+
+
+

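These tables show how samples are distributed across timepoints. The most samples were collected at SURMOUNT-Baseline (109 participants for both ctDNA and DTC assessment). Per-timepoint ctDNA positivity was low throughout: 1.8% (2 of 109) at SURMOUNT-Baseline, with the highest rate at Year 1 Follow Up (12.5% of 40). Serial testing identified additional ctDNA-positive participants: only 2 of 109 were positive at SURMOUNT-Baseline, but 9 of 109 (8.3%) were positive at least once. At the participant level, positivity at any timepoint was 8.3% for ctDNA and 35.8% for DTCs (39 of 109), compared with per-sample positivity of 2.8% (11 of 398) and 18.1% (49 of 270), respectively.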
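In the running-rate table above, `arrange(timepoint)` orders the labels alphabetically (for example, "12M F/U" sorts before "6M F/U"), so that running rate does not follow visit order. Below is a minimal base R sketch of a chronological alternative. It assumes the SURMOUNT visit order shown in `surmount_order`; how the CLEVER trial visits interleave with SURMOUNT visits is not stated here, so they are omitted. The toy data are hypothetical. The sketch counts participants rather than samples: the denominator is every participant tested, and the numerator is those positive at or before each visit.

```r
# Assumed chronological order of the SURMOUNT visits (check against the protocol)
surmount_order <- c("SURMOUNT-Baseline", "Year 1 Follow Up", "Year 2 Follow Up",
                    "Year 3 Follow Up", "Year 4 Follow Up",
                    "Long Term FU 1", "Long Term FU 2")

# Toy data for illustration only (not study data)
toy <- data.frame(
  participant_id = c("A", "A", "B", "B", "C", "C"),
  timepoint = c("Year 1 Follow Up", "SURMOUNT-Baseline", "SURMOUNT-Baseline",
                "Year 2 Follow Up", "SURMOUNT-Baseline", "Year 1 Follow Up"),
  ctDNA_detected = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

# An ordered factor makes order() follow visit order instead of alphabetical order
toy$timepoint <- factor(toy$timepoint, levels = surmount_order, ordered = TRUE)
toy <- toy[order(toy$participant_id, toy$timepoint), ]

# First visit at which each participant tested positive (NA if never positive)
first_pos <- tapply(seq_len(nrow(toy)), toy$participant_id, function(i) {
  pos <- i[toy$ctDNA_detected[i] %in% TRUE]
  if (length(pos) == 0) NA_character_ else as.character(toy$timepoint[pos[1]])
})

# Cumulative number and proportion of participants ever positive by each visit
new_pos <- table(factor(first_pos, levels = surmount_order))
cumulative <- data.frame(
  timepoint = surmount_order,
  cumulative_pos = as.integer(cumsum(new_pos)),
  cumulative_prop = as.numeric(cumsum(new_pos)) / length(first_pos)
)
cumulative
```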

+

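Test characteristics of the ctDNA assay: next we look at the sensitivity, specificity, and positive and negative predictive values of the ctDNA assay for relapse, using participant-level status (ever ctDNA positive vs. ever relapsed).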
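The confusion-matrix code that follows indexes the table by position (for example `confusion_matrix[2, 2]`). That fails with a subscript-out-of-bounds error, rather than returning `NA`, if a test or outcome level is absent. Below is a hedged sketch of a more defensive helper, `diag_metrics()` (a hypothetical name), which fixes both factor levels so the table is always 2x2. The test vectors are rebuilt from the participant-level ctDNA counts in this section's contingency table: 8 true positives, 1 false positive, 92 true negatives, and 6 false negatives.

```r
# Hypothetical helper: participant-level diagnostic metrics.
# Fixing the factor levels guarantees a 2x2 table even when a cell is empty;
# table() drops pairs where either value is NA.
diag_metrics <- function(test_pos, outcome_pos) {
  tab <- table(
    test = factor(test_pos, levels = c(FALSE, TRUE)),
    outcome = factor(outcome_pos, levels = c(FALSE, TRUE))
  )
  TP <- tab["TRUE", "TRUE"]
  FP <- tab["TRUE", "FALSE"]
  TN <- tab["FALSE", "FALSE"]
  FN <- tab["FALSE", "TRUE"]
  c(sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP),
    PPV = TP / (TP + FP),
    NPV = TN / (TN + FN))
}

# Rebuild participant-level vectors from the ctDNA counts (relapsed first)
ctDNA_pos <- c(rep(TRUE, 8), rep(FALSE, 6), rep(TRUE, 1), rep(FALSE, 92))
relapsed  <- c(rep(TRUE, 14), rep(FALSE, 93))
round(diag_metrics(ctDNA_pos, relapsed), 2)
```

On the study data, a call might look like `diag_metrics(summarized_data$ctDNA_ever == 1, summarized_data$ever_relapsed == "Yes")`, assuming `ctDNA_ever` is stored as 0/1 or logical.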

+
+
######  Test characteristics ctDNA 
+# Participant-level 2x2 table: ever ctDNA positive vs. ever relapsed
+
+library(dplyr)
+library(knitr)
+
+# Create ever_relapsed: "Yes" if there was any locoregional or distant progression
+subset_data <- subset_data |>
+  mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No"))
+
+
+# Drop rows where both `ctDNA_ever` and `ever_relapsed` are NA, then collapse to one row per participant
+summarized_data <- subset_data |>
+  filter(!is.na(ctDNA_ever) | !is.na(ever_relapsed)) |> # Keep rows with at least one non-NA value
+  group_by(participant_id) |>
+  summarize(
+    ctDNA_ever = max(ctDNA_ever, na.rm = TRUE),       # Positive if positive in any row
+    ever_relapsed = max(ever_relapsed, na.rm = TRUE)  # "Yes" sorts after "No", so any "Yes" wins
+  )
+
+
Warning: There were 2 warnings in `summarize()`.
+The first warning was:
+ℹ In argument: `ever_relapsed = max(ever_relapsed, na.rm = TRUE)`.
+ℹ In group 27: `participant_id = "28115-17-021"`.
+Caused by warning in `max()`:
+! no non-missing arguments, returning NA
+ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
+
+
# Create the confusion matrix
+confusion_matrix <- table(summarized_data$ctDNA_ever, summarized_data$ever_relapsed)
+
+# Extract counts from the confusion matrix
+TP <- ifelse(!is.na(confusion_matrix[2, 2]), confusion_matrix[2, 2], 0)  # True Positives
+FP <- ifelse(!is.na(confusion_matrix[2, 1]), confusion_matrix[2, 1], 0)  # False Positives
+TN <- ifelse(!is.na(confusion_matrix[1, 1]), confusion_matrix[1, 1], 0)  # True Negatives
+FN <- ifelse(!is.na(confusion_matrix[1, 2]), confusion_matrix[1, 2], 0)  # False Negatives
+
+# Calculate performance metrics
+sensitivity <- TP / (TP + FN)  # Sensitivity
+specificity <- TN / (TN + FP)  # Specificity
+PPV <- TP / (TP + FP)          # Positive Predictive Value
+NPV <- TN / (TN + FN)          # Negative Predictive Value
+
+# Create a data frame for the table
+performance_table <- data.frame(
+  Metric = c("Sensitivity", "Specificity", "Positive Predictive Value (PPV)", "Negative Predictive Value (NPV)"),
+  Value = c(sensitivity, specificity, PPV, NPV)
+)
+
+# Print the table
+print(performance_table)
+
+
                           Metric     Value
+1                     Sensitivity 0.5714286
+2                     Specificity 0.9892473
+3 Positive Predictive Value (PPV) 0.8888889
+4 Negative Predictive Value (NPV) 0.9387755
+
+
#Format the table for better readability
+kable(performance_table, digits = 2, col.names = c("Metric", "Value"))
+
|Metric                          | Value|
|:-------------------------------|-----:|
|Sensitivity                     |  0.57|
|Specificity                     |  0.99|
|Positive Predictive Value (PPV) |  0.89|
|Negative Predictive Value (NPV) |  0.94|
+
+
+

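The ctDNA assay is highly specific (99%; 92 of 93 participants who did not relapse were never ctDNA positive), with a high positive predictive value for relapse (89%; 8 of 9 ctDNA-positive participants relapsed) and a high negative predictive value (94%; 92 of 98 ctDNA-negative participants did not relapse). Sensitivity is more modest (57%; 8 of 14 participants who relapsed were ever ctDNA positive).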
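With only 14 relapses in this cohort, these point estimates are imprecise. The short sketch below attaches exact (Clopper-Pearson) 95% confidence intervals using base R's `binom.test()`. The counts come from the participant-level ctDNA vs. relapse contingency table in this section: among participants who relapsed, 8 were ctDNA positive and 6 negative; among those who did not relapse, 1 was positive and 92 negative.

```r
# Counts from the participant-level ctDNA vs. relapse table in this section
TP <- 8; FN <- 6; FP <- 1; TN <- 92

# Exact (Clopper-Pearson) 95% CI for a proportion x / n
ci <- function(x, n) binom.test(x, n)$conf.int

metrics <- rbind(
  sensitivity = c(TP / (TP + FN), ci(TP, TP + FN)),
  specificity = c(TN / (TN + FP), ci(TN, TN + FP)),
  PPV         = c(TP / (TP + FP), ci(TP, TP + FP)),
  NPV         = c(TN / (TN + FN), ci(TN, TN + FN))
)
colnames(metrics) <- c("estimate", "lower", "upper")
round(metrics, 2)
```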

+
+
### Test characteristics for DTC -- and trial #s 
+
+library(dplyr)
+
+# Total unique DTC+ patients
+total_dtc_plus <- subset_data |>
+  filter(dtc_ihc_result_final == 1) |>
+  distinct(participant_id) |>
+  nrow()
+
+# Unique DTC+ patients who went on trial (those who have a trial ID fu_trial_pid)
+dtc_plus_trial <- subset_data |>
+  filter(dtc_ihc_result_final == 1 & !is.na(fu_trial_pid)) |>
+  distinct(participant_id) |>
+  nrow()
+
+# Proportion of DTC+ patients who went on trial
+proportion_trial <- dtc_plus_trial / total_dtc_plus
+
+# Display results
+cat("Total unique DTC+ patients:", total_dtc_plus, "\n")
+
+
Total unique DTC+ patients: 39 
+
+
cat("Unique DTC+ patients who went on trial:", dtc_plus_trial, "\n")
+
+
Unique DTC+ patients who went on trial: 39 
+
+
cat("Proportion of DTC+ patients who went on trial:", proportion_trial, "\n")
+
+
Proportion of DTC+ patients who went on trial: 1 
+
+
# All DTC+ patients went on trial (39/39)
+
+
+# Drop rows where both `dtc_ever` and `ever_relapsed` are NA, then collapse to one row per participant
+summarized_data <- subset_data |>
+  filter(!is.na(dtc_ever) | !is.na(ever_relapsed)) |> # Keep rows with at least one non-NA value
+  group_by(participant_id) |>
+  summarize(
+    dtc_ever = max(dtc_ever, na.rm = TRUE),       # Positive if positive in any row
+    ever_relapsed = max(ever_relapsed, na.rm = TRUE)  # "Yes" sorts after "No", so any "Yes" wins
+  )
+
+
Warning: There were 2 warnings in `summarize()`.
+The first warning was:
+ℹ In argument: `ever_relapsed = max(ever_relapsed, na.rm = TRUE)`.
+ℹ In group 27: `participant_id = "28115-17-021"`.
+Caused by warning in `max()`:
+! no non-missing arguments, returning NA
+ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
+
+
# Create the confusion matrix
+confusion_matrix <- table(summarized_data$dtc_ever, summarized_data$ever_relapsed)
+
+# Extract counts from the confusion matrix
+TP <- ifelse(!is.na(confusion_matrix[2, 2]), confusion_matrix[2, 2], 0)  # True Positives
+FP <- ifelse(!is.na(confusion_matrix[2, 1]), confusion_matrix[2, 1], 0)  # False Positives
+TN <- ifelse(!is.na(confusion_matrix[1, 1]), confusion_matrix[1, 1], 0)  # True Negatives
+FN <- ifelse(!is.na(confusion_matrix[1, 2]), confusion_matrix[1, 2], 0)  # False Negatives
+
+# Calculate performance metrics
+sensitivity <- TP / (TP + FN)  # Sensitivity
+specificity <- TN / (TN + FP)  # Specificity
+PPV <- TP / (TP + FP)          # Positive Predictive Value
+NPV <- TN / (TN + FN)          # Negative Predictive Value
+
+# Create a data frame for the table
+performance_table <- data.frame(
+  Metric = c("Sensitivity", "Specificity", "Positive Predictive Value (PPV)", "Negative Predictive Value (NPV)"),
+  Value = c(sensitivity, specificity, PPV, NPV)
+)
+
+# Print the table
+print(performance_table)
+
+
                           Metric     Value
+1                     Sensitivity 0.2857143
+2                     Specificity 0.6344086
+3 Positive Predictive Value (PPV) 0.1052632
+4 Negative Predictive Value (NPV) 0.8550725
+
+
#Format the table for better readability
+library(knitr)
+kable(performance_table, digits = 2, col.names = c("Metric", "Value"))
+
|Metric                          | Value|
|:-------------------------------|-----:|
|Sensitivity                     |  0.29|
|Specificity                     |  0.63|
|Positive Predictive Value (PPV) |  0.11|
|Negative Predictive Value (NPV) |  0.86|
+
+
+

All 39 DTC-positive participants enrolled on an interventional treatment trial aimed at eliminating DTCs. This differs from the ctDNA workflow: ctDNA was assessed retrospectively, sometimes several years after sample collection, and was never the basis for trial or treatment decisions. Because relapse is the outcome and every DTC-positive patient received an intervention intended to eliminate DTCs and thereby prevent relapse, the sensitivity and specificity of DTC testing are difficult to interpret. The intervention likely explains, in part, the low positive predictive value (0.11) and low sensitivity (0.29). The negative predictive value of 0.86 reflects only participants who remained DTC negative on every test (i.e., those who did not receive an intervention). It suggests that consistently negative DTC results are associated with remaining relapse-free during follow-up. However, this NPV is close to the overall relapse-free proportion in the cohort (93 of 107 participants, about 87%), so on its own it does not show that a negative DTC result adds predictive information beyond the base rate.
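One way to check whether an NPV carries information is to compare it with the proportion of the whole cohort that stayed relapse-free. The minimal sketch below uses the participant-level DTC vs. relapse counts from the contingency table in this section. Among participants who were never DTC positive, 59 did not relapse and 10 relapsed; among those ever DTC positive, 34 did not relapse and 4 relapsed. The NPV gets an exact 95% CI from base R's `binom.test()`.

```r
# Counts from the participant-level DTC vs. relapse table in this section
TN <- 59; FN <- 10   # never DTC positive: no relapse / relapse
FP <- 34; TP <- 4    # ever DTC positive:  no relapse / relapse

npv <- TN / (TN + FN)                         # relapse-free among DTC-negative
base_rate <- (TN + FP) / (TN + FN + FP + TP)  # relapse-free in the whole cohort
npv_ci <- binom.test(TN, TN + FN)$conf.int    # exact 95% CI for the NPV

round(c(NPV = npv, relapse_free_overall = base_rate,
        lower = npv_ci[1], upper = npv_ci[2]), 3)
```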

+

Associations with Relapse

+
+
## ctDNA association with relapse ## 
+# link by participant id 
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    ctDNA = first(ctDNA_ever),        # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results. ctDNA has a strong association with relapse (p<0.0001). 
+print(contingency_table)
+
+
     
+      FALSE TRUE
+  No     92    1
+  Yes     6    8
+
+
print(chisq_test)  
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 42.642, df = 1, p-value = 6.573e-11
+
+
## DTC association with relapse ##
+
+# link by participant id 
+subset_data_by_id <- subset_data %>%
+  group_by(participant_id) %>%
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    dtc = first(dtc_ever),        # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs dtc_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results. No evidence of an association with relapse (p = 0.78)
+print(contingency_table)
+
+
     
+       0  1
+  No  59 34
+  Yes 10  4
+
+
print(chisq_test)  
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.079932, df = 1, p-value = 0.7774
+
+
+

Using univariable tests of association, ctDNA positivity is strongly associated with relapse (p < 0.001), whereas DTC positivity is not (p = 0.78). Keep in mind that DTC positivity was the basis for enrollment on interventional clinical trials aimed at eliminating DTCs and preventing relapse, and all DTC-positive participants in this cohort enrolled on such trials. This likely confounds any measured association between DTC positivity and relapse. ctDNA assessment, by contrast, was performed retrospectively and was not used for clinical decision-making.
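Both chi-squared tests above warned that the approximation may be incorrect. That warning usually reflects small expected cell counts; here there are few relapses and few ctDNA-positive participants. Fisher's exact test is a common alternative for sparse 2x2 tables. It is a swapped-in method, not the analysis reported above. The minimal sketch below applies base R's `fisher.test()` to the counts from the two contingency tables in this section.

```r
# 2x2 counts from the participant-level tables in this section
# rows: relapsed No/Yes; columns: biomarker negative/positive
ctDNA_tab <- matrix(c(92, 6, 1, 8), nrow = 2,
                    dimnames = list(relapsed = c("No", "Yes"), ctDNA = c("neg", "pos")))
dtc_tab <- matrix(c(59, 10, 34, 4), nrow = 2,
                  dimnames = list(relapsed = c("No", "Yes"), DTC = c("neg", "pos")))

# Fisher's exact test does not rely on the large-sample chi-squared approximation
ctDNA_fisher <- fisher.test(ctDNA_tab)
dtc_fisher <- fisher.test(dtc_tab)
c(ctDNA_p = ctDNA_fisher$p.value, DTC_p = dtc_fisher$p.value)
```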

+

Demographics and Clinical Factor Assessment: Univariable associations by ctDNA status

+

Next we build Table 1, summarizing key clinical and demographic variables in this ctDNA cohort. To start, we test each variable's univariable association with ctDNA status, using chi-squared tests for categorical variables and t-tests for continuous variables.
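The minimal sketch below chooses the test by variable type. It uses a hypothetical helper, `univariable_p()`, and toy data that are not study data. It follows the plan above (chi-squared for categorical variables, t-test for continuous ones) with two assumptions beyond it. First, it uses R's default Welch t-test. Second, it falls back to Fisher's exact test when any expected cell count is below 5, a common rule of thumb. Because these tests assume independent observations, the input should have one row per participant rather than one row per sample.

```r
# Hypothetical helper: p-value for the association between one variable and a
# two-level group. Numeric -> Welch t-test; otherwise chi-squared, or Fisher's
# exact test when any expected count is < 5 (assumed rule of thumb).
univariable_p <- function(x, group) {
  ok <- !is.na(x) & !is.na(group)
  x <- x[ok]
  group <- group[ok]
  if (is.numeric(x)) {
    t.test(x ~ group)$p.value
  } else {
    tab <- table(x, group)
    expected <- suppressWarnings(chisq.test(tab)$expected)
    if (any(expected < 5)) fisher.test(tab)$p.value else chisq.test(tab)$p.value
  }
}

# Toy data for illustration only (one row per participant)
ctDNA_status <- c(rep(TRUE, 5), rep(FALSE, 5))
age   <- c(40, 45, 50, 55, 60, 42, 47, 52, 57, 62)
stage <- c("I", "II", "II", "III", "I", "I", "II", "I", "I", "II")
p_age   <- univariable_p(age, ctDNA_status)    # continuous -> t-test
p_stage <- univariable_p(stage, ctDNA_status)  # sparse categorical -> Fisher
c(age = p_age, stage = p_stage)
```

On the study data, a call might look like `univariable_p(by_participant$final_overall_stage, by_participant$ctDNA_ever)`, where `by_participant` is a hypothetical one-row-per-participant data frame.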

+
+
library(dplyr)
+
+########### Variables to look at for Table 1 #########
+names(subset_data) #to identify the variables I want to use 
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                     "ctDNA_ever"                      
+[389] "dtc_ever"                         "ever_relapsed"                   
+
+
###### median age at diagnosis -- the date variables are stored as character strings, not dates, so they need to be converted first
+str(subset_data$diag_date_1) # character -- needs to be converted to Date
+
+
 chr [1:398] "08/15/2013" "08/15/2013" "08/15/2013" "08/15/2013" ...
+
+
str(subset_data$demo_dob) # character -- needs to be converted to Date
+
+
 chr [1:398] "09/21/1957" "09/21/1957" "09/21/1957" "09/21/1957" ...
+
+
d$diag_date_1 <- as.Date(d$diag_date_1, format = "%m/%d/%Y")  
+d$demo_dob <- as.Date(d$demo_dob, format = "%m/%d/%Y")  
+
+str(d$diag_date_1) #dates! 
+
+
 Date[1:579], format: "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" ...
+
+
str(d$demo_dob) #dates! 
+
+
 Date[1:579], format: "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" ...
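A minimal sketch, on hypothetical toy data, of converting all character date columns in one `mutate(across())` call before any subsetting. Done this way, a data set derived afterwards (as `subset_data` presumably is from `d`) inherits the `Date` class without repeating the conversion. The toy dates reuse the first record shown above, so the computed age can be checked against the 55.89870 printed below. `toy_dates` and `toy_subset` are illustrative names. The `\(x)` lambda assumes R 4.1 or later, which the native `|>` pipe already requires.

```r
library(dplyr)

# Hypothetical toy data: dates stored as "MM/DD/YYYY" character strings
toy_dates <- tibble(
  participant_id = c("P1", "P2"),
  demo_dob       = c("09/21/1957", "03/02/1970"),
  diag_date_1    = c("08/15/2013", "11/30/2019")
)

# Parse both date columns in one step, before any subsetting
toy_dates <- toy_dates |>
  mutate(across(c(demo_dob, diag_date_1), \(x) as.Date(x, format = "%m/%d/%Y")))

# A subset created afterwards keeps the Date class
toy_subset <- filter(toy_dates, participant_id == "P1")
str(toy_subset$diag_date_1)

# Age at diagnosis in years, using the same 365.25-day year as the analysis
toy_subset$age_at_diag <- as.numeric(
  difftime(toy_subset$diag_date_1, toy_subset$demo_dob, units = "days")
) / 365.25
toy_subset$age_at_diag
```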
+
+
### repeat the conversion for subset_data: converting the columns in d does not change the separate copy already stored in subset_data
+subset_data$diag_date_1 <- as.Date(subset_data$diag_date_1, format = "%m/%d/%Y")  
+subset_data$demo_dob <- as.Date(subset_data$demo_dob, format = "%m/%d/%Y")  
+
+# age at diagnosis in years: days from date of birth to diagnosis date, divided by 365.25
+subset_data$age_at_diag <- as.numeric(difftime(subset_data$diag_date_1, subset_data$demo_dob, units = "days")) / 365.25
+head(subset_data$age_at_diag)
+
+
[1] 55.89870 55.89870 55.89870 55.89870 55.89870 49.25667
+
+
summary(subset_data$age_at_diag) # median 48.75, computed over all 398 rows rather than one value per participant
+
+
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
+  27.34   41.73   48.75   49.35   57.63   68.94 
+
+
age_summary <- subset_data |> 
+  group_by(ctDNA_ever) |> 
+  summarise(
+    mean_age = mean(age_at_diag, na.rm = TRUE),  # Calculate mean age
+    median_age = median(age_at_diag, na.rm = TRUE),  # Calculate median age
+    sd_age = sd(age_at_diag, na.rm = TRUE),  # Calculate standard deviation of age
+    n = n()  # number of rows in each group, not unique participants (participants contribute multiple rows)
+  )
+
+print(age_summary)
+
+
# A tibble: 2 × 5
+  ctDNA_ever mean_age median_age sd_age     n
+  <lgl>         <dbl>      <dbl>  <dbl> <int>
+1 FALSE          49.1       48.5   9.77   372
+2 TRUE           53.3       50.4   7.64    26
+
+
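# Wilcoxon rank-sum test comparing age between the ctDNA_ever positive and negative groups (it tests for a location shift, not strictly a difference in medians)
+# NOTE: this uses every row, so participants with more rows are counted more than once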
+wilcox_test_result <- wilcox.test(age_at_diag ~ ctDNA_ever, data = subset_data)
+
+# Print the result
+print(wilcox_test_result)
+
+

+    Wilcoxon rank sum test with continuity correction
+
+data:  age_at_diag by ctDNA_ever
+W = 3499, p-value = 0.01842
+alternative hypothesis: true location shift is not equal to 0
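The group sizes above (372 and 26) are rows, not participants. Each participant's age is repeated on every one of their rows, so the Wilcoxon test treats those repeats as independent observations. A minimal sketch, on hypothetical toy data, of collapsing to one row per participant first, using the same `group_by(participant_id) |> summarise(first())` pattern applied below for race and receptor status. `toy_rows`, `per_participant`, and `age_by_group` are illustrative names.

```r
library(dplyr)

# Hypothetical toy data: age and ctDNA_ever repeated on every row of a participant
reps <- c(3, 1, 4, 2, 2, 1)
toy_rows <- tibble(
  participant_id = rep(paste0("P", 1:6), times = reps),
  age_at_diag    = rep(c(41.2, 55.9, 48.7, 62.3, 38.5, 50.1), times = reps),
  ctDNA_ever     = rep(c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE), times = reps)
)

# Collapse to one row per participant before summarising or testing
per_participant <- toy_rows |>
  group_by(participant_id) |>
  summarise(
    age_at_diag = first(age_at_diag),
    ctDNA_ever  = first(ctDNA_ever),
    .groups = "drop"
  )

# n now counts participants, not rows
age_by_group <- per_participant |>
  group_by(ctDNA_ever) |>
  summarise(median_age = median(age_at_diag), n = n(), .groups = "drop")
age_by_group

# Each participant contributes exactly one observation to the test
wilcox.test(age_at_diag ~ ctDNA_ever, data = per_participant)
```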
+
+
#looking at range of age for the ctDNA pos vs neg groups 
+age_summary <- subset_data |> 
+  group_by(ctDNA_ever) |> 
+  summarise(
+    min_age = min(age_at_diag, na.rm = TRUE),  # Minimum age
+    max_age = max(age_at_diag, na.rm = TRUE),  # Maximum age
+    .groups = "drop"
+  )
+
+# View the summary table for age 
+print(age_summary)
+
+
# A tibble: 2 × 3
+  ctDNA_ever min_age max_age
+  <lgl>        <dbl>   <dbl>
+1 FALSE         27.3    68.9
+2 TRUE          38.6    64.4
+
+
+
+
##### Race: demo_race_final
+
+# Get the count of unique participant_ids for each category in demo_race_final
+race_counts_unique_percent <- subset_data |>
+  group_by(demo_race_final) |>
+  summarise(unique_participants = n_distinct(participant_id)) |>
+  mutate(percent = unique_participants / sum(unique_participants) * 100)
+
+# View the result
+print(race_counts_unique_percent)
+
+
# A tibble: 3 × 3
+  demo_race_final unique_participants percent
+            <int>               <int>   <dbl>
+1               1                   9   8.26 
+2               3                   1   0.917
+3               5                  99  90.8  
+
+
# Count distinct participant_ids by ctDNA_ever and demo_race_final
+count_distinct_participants <- subset_data |>
+  group_by(demo_race_final, ctDNA_ever) |>
+  summarise(distinct_participant_count = n_distinct(participant_id), .groups = "drop")
+
+# Print the result
+count_distinct_participants
+
+
# A tibble: 5 × 3
+  demo_race_final ctDNA_ever distinct_participant_count
+            <int> <lgl>                           <int>
+1               1 FALSE                               8
+2               1 TRUE                                1
+3               3 FALSE                               1
+4               5 FALSE                              91
+5               5 TRUE                                8
+
+
# Step 1: Summarize by unique participant_id
+summarized_data <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    ctDNA_ever = first(ctDNA_ever),   # Taking the first observed value of ctDNA_ever for each participant
+    demo_race_final = first(demo_race_final),  # Taking the first observed value of demo_race_final for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table <- table(summarized_data$ctDNA_ever, summarized_data$demo_race_final)
+contingency_table
+
+
       
+         1  3  5
+  FALSE  8  1 91
+  TRUE   1  0  8
+
+
# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 4: print the result (p-value = 0.91)
+chisq_test
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 0.19084, df = 2, p-value = 0.909
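The warning appears because `chisq.test()` finds expected cell counts below 5. A minimal sketch of inspecting those expected counts directly. Under independence, each expected count is its row total times its column total divided by the grand total. The matrix is rebuilt by hand from the race-by-ctDNA table printed above (race codes 1, 3, 5). `race_tab`, `expected`, and `manual` are illustrative names.

```r
# Counts copied from the ctDNA_ever x demo_race_final table above
race_tab <- matrix(c(8, 1, 1, 0, 91, 8), nrow = 2,
                   dimnames = list(ctDNA_ever = c("FALSE", "TRUE"),
                                   demo_race_final = c("1", "3", "5")))

# Expected counts reported by chisq.test (warning suppressed; we inspect it instead)
expected <- suppressWarnings(chisq.test(race_tab))$expected
round(expected, 2)

# Same values from the formula: row total * column total / grand total
manual <- outer(rowSums(race_tab), colSums(race_tab)) / sum(race_tab)
all.equal(unname(expected), unname(manual))

# Number of cells below the usual threshold of 5
sum(expected < 5)
```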
+
+
##### receptor status: final_receptor_group (1 = 'TNBC', 2 = 'HR+ HER2-', 3 = 'HR+ HER2+', 4 = 'HR- HER2+')
+
+# Breakdown of final_receptor_group by unique participant_id
+receptor_status_by_participant <- subset_data |>
+  group_by(participant_id) |>
+  summarise(final_receptor_group = first(final_receptor_group),  # Or choose the most frequent group if needed
+            .groups = "drop")
+
+# View the result
+table(receptor_status_by_participant$final_receptor_group)
+
+

+ 1  2  3  4 
+45 52  8  4 
+
+
# Step 1: one row per participant with final_receptor_group and ctDNA_ever
+receptor_ctDNA_status <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    final_receptor_group = first(final_receptor_group),  # Or the most frequent if needed
+    ctDNA_ever = first(ctDNA_ever),  # Taking the first observed value for ctDNA_ever
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table_receptor <- table(receptor_ctDNA_status$final_receptor_group, receptor_ctDNA_status$ctDNA_ever)
+contingency_table_receptor
+
+
   
+    FALSE TRUE
+  1    44    1
+  2    45    7
+  3     8    0
+  4     3    1
+
+
# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table_receptor)
+
+
Warning in chisq.test(contingency_table_receptor): Chi-squared approximation
+may be incorrect
+
+
# Step 4: print the result (p-value = 0.10)
+chisq_test
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table_receptor
+X-squared = 6.2231, df = 3, p-value = 0.1012
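When expected counts are small, `chisq.test()` can estimate the p-value by Monte Carlo simulation instead of relying on the asymptotic chi-squared distribution. It does this with `simulate.p.value = TRUE` and `B` simulated tables that share the observed margins. A sketch using the receptor-by-ctDNA counts printed above. The seed and `B = 10000` are arbitrary choices, and the simulated p-value will shift slightly with them. `receptor_tab` and `sim_test` are illustrative names.

```r
# Counts copied from contingency_table_receptor above
receptor_tab <- matrix(c(44, 45, 8, 3,   # ctDNA_ever FALSE
                         1, 7, 0, 1),    # ctDNA_ever TRUE
                       ncol = 2,
                       dimnames = list(final_receptor_group = c("1", "2", "3", "4"),
                                       ctDNA_ever = c("FALSE", "TRUE")))

set.seed(2024)  # arbitrary seed so the simulated p-value is reproducible
sim_test <- chisq.test(receptor_tab, simulate.p.value = TRUE, B = 10000)
sim_test$p.value
```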
+
+
# I was curious whether ctDNA status is associated with TNBC vs non-TNBC (or with HR+ vs non-HR+)
+# inclusion criterion inc_dx_crit_list___1 = TNBC (confirmed with the study team)
+
+TNBC_ctDNA_status <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    inc_dx_crit_list___1 = first(inc_dx_crit_list___1),  # Or the most frequent if needed
+    ctDNA_ever = first(ctDNA_ever),  # Taking the first observed value for ctDNA_ever
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table_TNBC <- table(TNBC_ctDNA_status$inc_dx_crit_list___1, TNBC_ctDNA_status$ctDNA_ever)
+contingency_table_TNBC
+
+
   
+    FALSE TRUE
+  0    56    8
+  1    44    1
+
+
# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table_TNBC)
+
+
Warning in chisq.test(contingency_table_TNBC): Chi-squared approximation may be
+incorrect
+
+
# Step 4: print the result (p-value = 0.12)
+chisq_test
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table_TNBC
+X-squared = 2.4526, df = 1, p-value = 0.1173
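For a 2 × 2 table with sparse cells, Fisher's exact test is a common alternative to the Yates-corrected chi-squared test. `fisher.test()` also returns a conditional odds-ratio estimate with a 95% confidence interval. A sketch using the TNBC-by-ctDNA counts printed above (0 = non-TNBC, 1 = TNBC). This swaps in a different test from the one run above and is shown only for comparison. `tnbc_tab` and `fisher_res` are illustrative names.

```r
# Counts copied from contingency_table_TNBC above
tnbc_tab <- matrix(c(56, 44,   # ctDNA_ever FALSE
                     8, 1),    # ctDNA_ever TRUE
                   ncol = 2,
                   dimnames = list(TNBC = c("0", "1"),
                                   ctDNA_ever = c("FALSE", "TRUE")))

fisher_res <- fisher.test(tnbc_tab)
fisher_res$p.value
fisher_res$estimate   # conditional maximum-likelihood odds ratio
fisher_res$conf.int   # 95% confidence interval for the odds ratio
```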
+
+
### HR positive vs HR negative (Hormone receptor positive vs hormone receptor negative)
+# first, create an HR status variable (HR_status): receptor groups 2 and 3 are HR+, groups 1 and 4 are not
+subset_data <- subset_data |> 
+  mutate(HR_status = case_when(
+    final_receptor_group %in% c(2, 3) ~ "HR+",
+    final_receptor_group %in% c(1, 4) ~ "Non-HR+",
+    TRUE ~ NA_character_  # In case there are missing or other unexpected values
+  ))
+
+# View the new HR_status variable
+table(subset_data$HR_status)
+
+

+    HR+ Non-HR+ 
+    225     173 
+
+
HR_status_by_participant <- subset_data |>
+  group_by(participant_id) |>
+  summarise(HR_status = first(HR_status),  # first row per participant (note: base R mode() returns the storage type, not the most frequent value)
+            .groups = "drop")
+
+# View the result 
+table(HR_status_by_participant$HR_status) #aligns with final receptor status (ultimately 60 HR+, 49 HR-)
+
+

+    HR+ Non-HR+ 
+     60      49 
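The `first()` calls assume each participant has a single value of the variable across all of their rows. A minimal sketch, on hypothetical toy data, of checking that assumption. Where the check fails, it takes the most frequent value instead. `most_frequent()` is a hypothetical helper: base R has no statistical-mode function, because `mode()` returns an object's storage mode. `toy_hr`, `inconsistent`, and `hr_per_participant` are illustrative names.

```r
library(dplyr)

# Hypothetical helper: most frequent non-missing value, returned as a character string
# (ties go to the value that sorts first, because table() sorts its names)
most_frequent <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Hypothetical toy data: P2's rows disagree on HR_status
toy_hr <- tibble(
  participant_id = c("P1", "P1", "P2", "P2", "P2"),
  HR_status      = c("HR+", "HR+", "HR+", "Non-HR+", "Non-HR+")
)

# Participants with more than one distinct value (an empty result means first() is consistent)
inconsistent <- toy_hr |>
  group_by(participant_id) |>
  summarise(n_values = n_distinct(HR_status), .groups = "drop") |>
  filter(n_values > 1)
inconsistent

# One value per participant, taking the most frequent one
hr_per_participant <- toy_hr |>
  group_by(participant_id) |>
  summarise(HR_status = most_frequent(HR_status), .groups = "drop")
hr_per_participant
```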
+
+
# one row per participant with HR_status and ctDNA_ever status
+summary_data <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    HR_status = first(HR_status),  # Get the HR_status for the participant
+    ctDNA_status = first(ctDNA_ever),  # ctDNA_ever status for the participant
+    .groups = "drop"
+  )
+
+contingency_table_HR <- table(summary_data$ctDNA_status, summary_data$HR_status)
+contingency_table_HR
+
+
       
+        HR+ Non-HR+
+  FALSE  53      47
+  TRUE    7       2
+
+
chisq_test <- chisq.test(contingency_table_HR)
+
+
Warning in chisq.test(contingency_table_HR): Chi-squared approximation may be
+incorrect
+
+
# print the chi-squared test result (p-value = 0.28)
+chisq_test
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table_HR
+X-squared = 1.1696, df = 1, p-value = 0.2795
+
+
### tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = not reported
+# exclude rows where final_tumor_grade is 3 (not reported), which drops the 2 participants whose grade was not reported
+summary_data <- subset_data |>
+  filter(final_tumor_grade != 3) |>  # Exclude grade == 3
+  group_by(participant_id) |>
+  summarise(
+    grade = first(final_tumor_grade),  # Get the final_tumor_grade for each participant
+    ctDNA_ever = first(ctDNA_ever),    # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of grade vs ctDNA_ever
+contingency_table <- table(summary_data$grade, summary_data$ctDNA_ever)
+
+# View the contingency table
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    75    4
+  1    17    5
+  2     6    0
+
+
# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# View the chi-squared test result: p-value 0.0229 (interpret with caution; two expected counts are below 5, hence the warning above)
+print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 7.5533, df = 2, p-value = 0.0229
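The stored grade codes do not follow grade order (0 = grade 3), so a labelled factor makes tables easier to read. A minimal sketch, assuming the coding in the NOTE above (0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = not reported). Code 3 is left out of the factor levels, so it becomes `NA`. `toy_grade` and `grade_label` are illustrative names.

```r
library(dplyr)

# Hypothetical toy values of final_tumor_grade, using the coding in the NOTE above
toy_grade <- tibble(final_tumor_grade = c(0, 1, 2, 3, 0, 1))

toy_grade <- toy_grade |>
  mutate(grade_label = factor(final_tumor_grade,
                              levels = c(1, 2, 0),   # code 3 ("not reported") becomes NA
                              labels = c("Grade 1", "Grade 2", "Grade 3")))

table(toy_grade$grade_label, useNA = "ifany")
```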
+
+
###### histology (final_histology)
+# participants can have several histology codes (observed codes range from 1 to 16), stored as comma-separated strings
+table(subset_data$participant_id, subset_data$final_histology)
+
+
              
+                1 1,13,14,3 1,3 11,3 12,3 13,3 13,3,5 14 14,15 14,15,3 14,3
+  28115-16-001  0         0   0    0    0    0      0  0     0       0    5
+  28115-16-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-010  0         0   0    0    1    0      0  0     0       0    0
+  28115-16-014  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-015  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-017  0         0   0    0    0    0      0  3     0       0    0
+  28115-16-020  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-021  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-023  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-026  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-029  0         0   0    0    0    0      0  2     0       0    0
+  28115-16-033  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-035  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-001  0         0   8    0    0    0      0  0     0       0    0
+  28115-17-002  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-006  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-008  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-009  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-010  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-012  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-016  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-017  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-019  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-021  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-022  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-023  0         0   2    0    0    0      0  0     0       0    0
+  28115-17-024  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-030  0         0   3    0    0    0      0  0     0       0    0
+  28115-17-031  0         0   0    0    0    0      0  5     0       0    0
+  28115-17-032  0         0   0   10    0    0      0  0     0       0    0
+  28115-17-036  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-039  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-040  0         0   4    0    0    0      0  0     0       0    0
+  28115-17-045  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-046  0         0   0    0    0   10      0  0     0       0    0
+  28115-17-047  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-048  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-050  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-051  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-052  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-001  0         0   0    0    0    0      0  0     7       0    0
+  28115-18-002  0         0   2    0    0    0      0  0     0       0    0
+  28115-18-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-006  0         0   0    0    0    0      0  1     0       0    0
+  28115-18-009  0         0   0    0    0    0      0  1     0       0    0
+  28115-18-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-014  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-015  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-017  0         0   0    0    0    0      0  1     0       0    0
+  28115-18-020  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-021  0         0   0    0    0    0      0  8     0       0    0
+  28115-18-022  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-023  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-024  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-028  1         0   0    0    0    0      0  0     0       0    0
+  28115-18-029  0         0   0    0    0    0      0  4     0       0    0
+  28115-18-030  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-031  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-032  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-034  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-001  0         0   0    0    0    0      0  1     0       0    0
+  28115-19-002  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-003  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-005  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-006  0         0   0    0    0    8      0  0     0       0    0
+  28115-19-007  0         0   0    0    0    0      0  0     0       0    5
+  28115-19-009  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-012  0         3   0    0    0    0      0  0     0       0    0
+  28115-19-014  0         0   0    0    0    0      0  0     0       0    2
+  28115-19-016  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-017  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-019  0         0   0    0    0    0      0  0     0       0    3
+  28115-19-020  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-021  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-022  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-028  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-007  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-009  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-010  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-001  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-002  0         0   0    0    0    0      0  0     0       0    4
+  28115-21-003  0         0   0    0    0    0      0  2     0       0    0
+  28115-21-006  0         0   0    0    0    2      0  0     0       0    0
+  28115-21-007  0         0   0    0    0    0      0  3     0       0    0
+  28115-21-009  0         0   0    0    0    0      3  0     0       0    0
+  28115-21-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-013  0         0   0    0    0    0      0  0     0       0    4
+  28115-21-014  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-015  0         0   0    0    0    0      0  0     0       2    0
+  28115-21-016  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-019  0         0   0    0    0    0      0  1     0       0    0
+  28115-21-020  0         0   3    0    0    0      0  0     0       0    0
+  28115-21-021  0         0   0    0    0    0      0  3     0       0    0
+  28115-21-022  0         0   0    0    0    1      0  0     0       0    0
+  28115-21-024  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-026  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-028  0         0   0    0    0    0      0  0     0       0    1
+              
+               16,3  3 3,5 3,7  5
+  28115-16-001    0  0   0   0  0
+  28115-16-004    0  1   0   0  0
+  28115-16-010    0  0   0   0  0
+  28115-16-014    0  1   0   0  0
+  28115-16-015    0 12   0   0  0
+  28115-16-017    0  0   0   0  0
+  28115-16-020    0  0   1   0  0
+  28115-16-021    0  9   0   0  0
+  28115-16-023    0  1   0   0  0
+  28115-16-025    0  1   0   0  0
+  28115-16-026    0 10   0   0  0
+  28115-16-027    0  3   0   0  0
+  28115-16-029    0  0   0   0  0
+  28115-16-033    0  2   0   0  0
+  28115-16-035    0  1   0   0  0
+  28115-17-001    0  0   0   0  0
+  28115-17-002    0  9   0   0  0
+  28115-17-006    0  1   0   0  0
+  28115-17-008    0  9   0   0  0
+  28115-17-009    0  1   0   0  0
+  28115-17-010    0  5   0   0  0
+  28115-17-011    0  9   0   0  0
+  28115-17-012    0 10   0   0  0
+  28115-17-016    0  4   0   0  0
+  28115-17-017    0  5   0   0  0
+  28115-17-019    0  9   0   0  0
+  28115-17-021    0  1   0   0  0
+  28115-17-022    0  1   0   0  0
+  28115-17-023    0  0   0   0  0
+  28115-17-024    0  0   0   4  0
+  28115-17-025    0  2   0   0  0
+  28115-17-027    0  8   0   0  0
+  28115-17-030    0  0   0   0  0
+  28115-17-031    0  0   0   0  0
+  28115-17-032    0  0   0   0  0
+  28115-17-036    0  7   0   0  0
+  28115-17-039    0  2   0   0  0
+  28115-17-040    0  0   0   0  0
+  28115-17-045    0  0   1   0  0
+  28115-17-046    0  0   0   0  0
+  28115-17-047    0  3   0   0  0
+  28115-17-048    0  2   0   0  0
+  28115-17-050    0  3   0   0  0
+  28115-17-051    0  9   0   0  0
+  28115-17-052    0  0   0   0  3
+  28115-18-001    0  0   0   0  0
+  28115-18-002    0  0   0   0  0
+  28115-18-004    0  2   0   0  0
+  28115-18-006    0  0   0   0  0
+  28115-18-009    0  0   0   0  0
+  28115-18-011    0  5   0   0  0
+  28115-18-014    0  2   0   0  0
+  28115-18-015    0  5   0   0  0
+  28115-18-017    0  0   0   0  0
+  28115-18-020    0  8   0   0  0
+  28115-18-021    0  0   0   0  0
+  28115-18-022    0  0   0  12  0
+  28115-18-023    0  3   0   0  0
+  28115-18-024    0  0   0   2  0
+  28115-18-027    0  1   0   0  0
+  28115-18-028    0  0   0   0  0
+  28115-18-029    0  0   0   0  0
+  28115-18-030    0  2   0   0  0
+  28115-18-031    0  3   0   0  0
+  28115-18-032    0  6   0   0  0
+  28115-18-034    0  1   0   0  0
+  28115-19-001    0  0   0   0  0
+  28115-19-002    0  2   0   0  0
+  28115-19-003    0  5   0   0  0
+  28115-19-004    0  1   0   0  0
+  28115-19-005    0  3   0   0  0
+  28115-19-006    0  0   0   0  0
+  28115-19-007    0  0   0   0  0
+  28115-19-009    0  6   0   0  0
+  28115-19-011    0  1   0   0  0
+  28115-19-012    0  0   0   0  0
+  28115-19-014    0  0   0   0  0
+  28115-19-016    0  2   0   0  0
+  28115-19-017    0  2   0   0  0
+  28115-19-019    0  0   0   0  0
+  28115-19-020    0  2   0   0  0
+  28115-19-021    0  4   0   0  0
+  28115-19-022    0  2   0   0  0
+  28115-19-025    0  6   0   0  0
+  28115-19-028    0  2   0   0  0
+  28115-20-004    0  2   0   0  0
+  28115-20-007    0  2   0   0  0
+  28115-20-009    0  4   0   0  0
+  28115-20-010    0  1   0   0  0
+  28115-21-001    0  1   0   0  0
+  28115-21-002    0  0   0   0  0
+  28115-21-003    0  0   0   0  0
+  28115-21-006    0  0   0   0  0
+  28115-21-007    0  0   0   0  0
+  28115-21-009    0  0   0   0  0
+  28115-21-011    0  1   0   0  0
+  28115-21-013    0  0   0   0  0
+  28115-21-014    0  2   0   0  0
+  28115-21-015    0  0   0   0  0
+  28115-21-016    0  8   0   0  0
+  28115-21-019    0  0   0   0  0
+  28115-21-020    0  0   0   0  0
+  28115-21-021    0  0   0   0  0
+  28115-21-022    0  0   0   0  0
+  28115-21-024    0  2   0   0  0
+  28115-21-025    0  2   0   0  0
+  28115-21-026    2  0   0   0  0
+  28115-21-027    0  2   0   0  0
+  28115-21-028    0  0   0   0  0
+
+
  histology_summary <- subset_data |>
+    distinct(participant_id, final_histology) |>  # Get unique participant-histology combinations
+    group_by(final_histology) |>  # Group by histology type
+    summarise(count = n())  # Count the number of participants per histology type
+  
+  # View the summary table
+  print(histology_summary) # The combinations add up to 109 participants; a fair number have both ductal (3) and lobular (14) codes
+
+
# A tibble: 16 × 2
+   final_histology count
+   <chr>           <int>
+ 1 1                   1
+ 2 1,13,14,3           1
+ 3 1,3                 6
+ 4 11,3                1
+ 5 12,3                1
+ 6 13,3                4
+ 7 13,3,5              1
+ 8 14                 13
+ 9 14,15               1
+10 14,15,3             1
+11 14,3                7
+12 16,3                1
+13 3                  65
+14 3,5                 2
+15 3,7                 3
+16 5                   1
+
+
  # Create histology_category: Ductal, Lobular, Both Ductal and Lobular, or Other
+  subset_data <- subset_data |>
+    mutate(histology_category = case_when(
+      grepl("(^|,)3(,|$)", as.character(final_histology)) & grepl("(^|,)14(,|$)", as.character(final_histology)) ~ "Both Ductal and Lobular",  # Codes 3 and 14 both present
+      grepl("(^|,)3(,|$)", as.character(final_histology)) ~ "Ductal",  # Code 3 as a whole code (so 13 does not count as 3)
+      grepl("(^|,)14(,|$)", as.character(final_histology)) ~ "Lobular",  # Code 14 as a whole code
+      TRUE ~ "Other"  # Any other combination
+    ))
+  
+  # Count the number of participants in each histology category
+  histology_counts <- subset_data |>
+    group_by(histology_category) |>
+    summarise(count = n_distinct(participant_id))  # Count distinct participants
+  
+  # View the counts -- adds up to 109! 
+  print(histology_counts)
+
+
# A tibble: 4 × 2
+  histology_category      count
+  <chr>                   <int>
+1 Both Ductal and Lobular     9
+2 Ductal                     84
+3 Lobular                    14
+4 Other                       2
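A bare `grepl("3", ...)` also matches multi-digit codes that only contain a 3, such as `13`. The ductal/lobular split therefore depends on every code-13 entry also carrying a separate `3`. That holds in this cohort, but it is fragile. The sketch below shows a stricter, comma-delimited match. It assumes `final_histology` stores comma-separated integer codes, as printed above. The helper name `has_code()` and the toy codes are hypothetical.

```r
# Hypothetical helper: TRUE where a comma-separated code string contains `code`
# as a whole element, not as a substring of another code
has_code <- function(x, code) {
  pattern <- paste0("(^|,)", code, "(,|$)")
  grepl(pattern, as.character(x))
}

codes <- c("3", "13", "14,3", "1,13,14,3", "14,15", "5")  # toy values

naive_ductal <- grepl("3", codes)    # substring match: also TRUE for "13"
strict_ductal <- has_code(codes, 3)  # whole-code match

histology_category <- ifelse(has_code(codes, 3) & has_code(codes, 14), "Both Ductal and Lobular",
                      ifelse(has_code(codes, 3), "Ductal",
                      ifelse(has_code(codes, 14), "Lobular", "Other")))
histology_category
```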
+
+
  #contingency table 
+  library(tidyr)
+  contingency_table <- subset_data |>
+    distinct(participant_id, histology_category, ctDNA_ever) |>  # Ensure each patient is counted once
+    count(histology_category, ctDNA_ever) |>
+    pivot_wider(names_from = ctDNA_ever, values_from = n, values_fill = list(n = 0))  # Pivot the table to get ctDNA_ever as columns
+  
+  # 3. Perform the Chi-squared test of independence
+  chisq_test <- chisq.test(contingency_table[,-1])  # Remove the histology_category column for the test
+
+
Warning in chisq.test(contingency_table[, -1]): Chi-squared approximation may
+be incorrect
+
+
  # 4. Print the contingency table
+  print(contingency_table) 
+
+
# A tibble: 4 × 3
+  histology_category      `FALSE` `TRUE`
+  <chr>                     <int>  <int>
+1 Both Ductal and Lobular       9      0
+2 Ductal                       78      6
+3 Lobular                      11      3
+4 Other                         2      0
+
+
  # 5. Print the chi-squared test result (p = 0.2276)
+  print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table[, -1]
+X-squared = 4.334, df = 3, p-value = 0.2276
+
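`chisq.test()` warns when any expected cell count is below 5. That happens in most tables in this section, because only 9 participants are ctDNA-positive. The test object stores the expected counts, so you can check the warning directly. The sketch below re-enters the histology counts printed above by hand; the matrix layout simply mirrors that table.

```r
# Histology x ctDNA_ever counts copied from the printed contingency table
histology_tab <- matrix(
  c(9, 78, 11, 2,   # ctDNA_ever = FALSE
    0,  6,  3, 0),  # ctDNA_ever = TRUE
  ncol = 2,
  dimnames = list(
    histology  = c("Both Ductal and Lobular", "Ductal", "Lobular", "Other"),
    ctDNA_ever = c("FALSE", "TRUE")
  )
)

# suppressWarnings() only hides the message; the expected counts are still returned
chisq_res <- suppressWarnings(chisq.test(histology_tab))
round(chisq_res$expected, 2)

# Number of cells with an expected count below 5
n_small <- sum(chisq_res$expected < 5)
n_small
```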
+
#### Staging N stage (Nodal stage) 
+
+table(subset_data$participant_id, subset_data$final_n_stage) #we have 0, 1, 2, 3 (N0, N1, N2, N3)
+
+
              
+                0  1  2  3
+  28115-16-001  0  0  0  5
+  28115-16-004  1  0  0  0
+  28115-16-010  0  0  0  1
+  28115-16-014  1  0  0  0
+  28115-16-015 12  0  0  0
+  28115-16-017  0  0  3  0
+  28115-16-020  0  0  1  0
+  28115-16-021  0  0  9  0
+  28115-16-023  1  0  0  0
+  28115-16-025  1  0  0  0
+  28115-16-026 10  0  0  0
+  28115-16-027  0  3  0  0
+  28115-16-029  2  0  0  0
+  28115-16-033  0  2  0  0
+  28115-16-035  1  0  0  0
+  28115-17-001  0  8  0  0
+  28115-17-002  9  0  0  0
+  28115-17-006  0  1  0  0
+  28115-17-008  9  0  0  0
+  28115-17-009  1  0  0  0
+  28115-17-010  5  0  0  0
+  28115-17-011  0  0  0  9
+  28115-17-012  0  0  0 10
+  28115-17-016  0  4  0  0
+  28115-17-017  0  5  0  0
+  28115-17-019  9  0  0  0
+  28115-17-021  1  0  0  0
+  28115-17-022  1  0  0  0
+  28115-17-023  0  0  2  0
+  28115-17-024  4  0  0  0
+  28115-17-025  2  0  0  0
+  28115-17-027  0  8  0  0
+  28115-17-030  3  0  0  0
+  28115-17-031  5  0  0  0
+  28115-17-032  0  0 10  0
+  28115-17-036  7  0  0  0
+  28115-17-039  2  0  0  0
+  28115-17-040  0  0  4  0
+  28115-17-045  1  0  0  0
+  28115-17-046 10  0  0  0
+  28115-17-047  0  3  0  0
+  28115-17-048  0  0  2  0
+  28115-17-050  3  0  0  0
+  28115-17-051  9  0  0  0
+  28115-17-052  3  0  0  0
+  28115-18-001  0  0  7  0
+  28115-18-002  0  2  0  0
+  28115-18-004  0  0  2  0
+  28115-18-006  0  1  0  0
+  28115-18-009  1  0  0  0
+  28115-18-011  0  5  0  0
+  28115-18-014  0  2  0  0
+  28115-18-015  5  0  0  0
+  28115-18-017  0  1  0  0
+  28115-18-020  8  0  0  0
+  28115-18-021  0  8  0  0
+  28115-18-022 12  0  0  0
+  28115-18-023  0  3  0  0
+  28115-18-024  0  2  0  0
+  28115-18-027  0  1  0  0
+  28115-18-028  1  0  0  0
+  28115-18-029  0  4  0  0
+  28115-18-030  2  0  0  0
+  28115-18-031  0  3  0  0
+  28115-18-032  0  6  0  0
+  28115-18-034  1  0  0  0
+  28115-19-001  0  0  0  1
+  28115-19-002  0  2  0  0
+  28115-19-003  0  5  0  0
+  28115-19-004  0  1  0  0
+  28115-19-005  3  0  0  0
+  28115-19-006  0  8  0  0
+  28115-19-007  0  5  0  0
+  28115-19-009  0  0  0  6
+  28115-19-011  0  1  0  0
+  28115-19-012  0  3  0  0
+  28115-19-014  0  0  0  2
+  28115-19-016  2  0  0  0
+  28115-19-017  2  0  0  0
+  28115-19-019  0  3  0  0
+  28115-19-020  2  0  0  0
+  28115-19-021  0  4  0  0
+  28115-19-022  0  2  0  0
+  28115-19-025  0  6  0  0
+  28115-19-028  2  0  0  0
+  28115-20-004  2  0  0  0
+  28115-20-007  0  0  2  0
+  28115-20-009  4  0  0  0
+  28115-20-010  0  1  0  0
+  28115-21-001  0  1  0  0
+  28115-21-002  0  4  0  0
+  28115-21-003  0  0  2  0
+  28115-21-006  0  2  0  0
+  28115-21-007  0  0  3  0
+  28115-21-009  0  0  3  0
+  28115-21-011  1  0  0  0
+  28115-21-013  0  4  0  0
+  28115-21-014  0  2  0  0
+  28115-21-015  0  2  0  0
+  28115-21-016  8  0  0  0
+  28115-21-019  0  1  0  0
+  28115-21-020  0  3  0  0
+  28115-21-021  0  3  0  0
+  28115-21-022  1  0  0  0
+  28115-21-024  2  0  0  0
+  28115-21-025  0  2  0  0
+  28115-21-026  0  2  0  0
+  28115-21-027  2  0  0  0
+  28115-21-028  1  0  0  0
+
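The participant-level summaries below take `first(final_n_stage)` for each `participant_id`. That is safe only if baseline variables repeat unchanged on every row for a participant. The printed table suggests this holds, since each row has one non-zero column, and a programmatic check is cheap. Below is a minimal base-R sketch. The helper `n_values_per_id()` and the toy data are hypothetical.

```r
# Hypothetical helper: number of distinct non-missing values of `x` per participant
n_values_per_id <- function(id, x) {
  tapply(x, id, function(v) length(unique(v[!is.na(v)])))
}

# Toy long-format data: one row per visit
toy <- data.frame(
  participant_id = c("A", "A", "A", "B", "B", "C"),
  final_n_stage  = c(0, 0, 0, 2, 1, 3)  # participant B is inconsistent
)

counts <- n_values_per_id(toy$participant_id, toy$final_n_stage)
counts
inconsistent_ids <- names(counts)[counts > 1]  # IDs whose "baseline" value changes across rows
inconsistent_ids
```

Applied to the real data, `n_values_per_id(subset_data$participant_id, subset_data$final_n_stage)` should return 1 for every participant.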
+
nodal_summary <- subset_data |>
+    distinct(participant_id, final_n_stage) |>  # Get unique participant-stage combinations
+    group_by(final_n_stage) |>  # Group by stage
+    summarise(count = n())  # Count the number of participants per nodal stage
+  
+# View the summary table: adds up to 109; 46 are N0 and 63 are node-positive (N1-N3: 43 + 13 + 7)
+  print(nodal_summary)
+
+
# A tibble: 4 × 2
+  final_n_stage count
+          <int> <int>
+1             0    46
+2             1    43
+3             2    13
+4             3     7
+
+
  subset_data_by_id <- subset_data |>
+    filter(final_n_stage %in% c(0, 1, 2, 3)) |>  # Include only relevant nodal stages
+    group_by(participant_id) |>
+    summarise(
+      nodal_status = first(final_n_stage),  # Use final_n_stage as nodal_status for each participant
+      ctDNA_ever = first(ctDNA_ever),       # Get ctDNA_ever status for each participant
+      .groups = "drop"
+    )
+  
+  #Create a contingency table of nodal_status vs ctDNA_ever
+  contingency_table <- table(subset_data_by_id$nodal_status, subset_data_by_id$ctDNA_ever)
+  
+  # Check if any cells in the contingency table have zero counts, which could affect test validity
+  print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    43    3
+  1    43    0
+  2     8    5
+  3     6    1
+
+
  # Step 5: Perform Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  # Step 6: Print the chi-squared test result (p = 0.00017)
+  print(chisq_test) 
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 20.045, df = 3, p-value = 0.0001661
+
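Four of the eight cells in the nodal table have expected counts below 5. Fisher's exact test is a common alternative to the chi-squared approximation in that situation. Base R's `fisher.test()` also handles r x c tables. The sketch below uses the nodal counts printed above, entered by hand.

```r
# Nodal stage x ctDNA_ever counts copied from the printed contingency table
nodal_tab <- matrix(
  c(43, 43, 8, 6,   # ctDNA_ever = FALSE
     3,  0, 5, 1),  # ctDNA_ever = TRUE
  ncol = 2,
  dimnames = list(nodal_stage = c("N0", "N1", "N2", "N3"),
                  ctDNA_ever  = c("FALSE", "TRUE"))
)

# Exact test conditional on the row and column totals
fisher_res <- fisher.test(nodal_tab)
fisher_res$method
fisher_res$p.value
```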
+
  #### Node positive versus node negative: derive node_status from final_n_stage (N0 = negative; N1-N3 = positive)
+  subset_data_by_id <- subset_data |>
+    group_by(participant_id) |>
+    summarise(
+      node_status = ifelse(first(final_n_stage) == 0, "Node Negative", "Node Positive"),  # Node negative if 0, positive otherwise
+      ctDNA_ever = first(ctDNA_ever),  # Get ctDNA_ever status for each participant
+      .groups = "drop"
+    )
+  
+  #adding node_status to subset_data 
+ subset_data <- subset_data |>
+  left_join(subset_data_by_id |> select(participant_id, node_status), by = "participant_id")
+  
+  
+  #Create a contingency table of node_status vs ctDNA_ever
+  contingency_table <- table(subset_data_by_id$node_status, subset_data_by_id$ctDNA_ever)
+  
+  # Perform the Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  #Print the contingency table and Chi-squared test results
+  print(contingency_table)
+
+
               
+                FALSE TRUE
+  Node Negative    43    3
+  Node Positive    57    6
+
+
  print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.044142, df = 1, p-value = 0.8336
+
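A p-value alone does not show the direction or size of an association in a 2 x 2 table. An odds ratio with a confidence interval adds both. In the sketch below, the counts are copied from the node-status table above. `fisher.test()` reports a conditional maximum-likelihood odds ratio, which differs slightly from the simple cross-product ratio, together with an exact 95% interval.

```r
# Node status x ctDNA_ever counts copied from the printed contingency table
node_tab <- matrix(
  c(43, 57,   # ctDNA_ever = FALSE
     3,  6),  # ctDNA_ever = TRUE
  ncol = 2,
  dimnames = list(node_status = c("Node Negative", "Node Positive"),
                  ctDNA_ever  = c("FALSE", "TRUE"))
)

# Cross-product odds ratio of ctDNA positivity, node-positive vs node-negative
sample_or <- (node_tab["Node Positive", "TRUE"] * node_tab["Node Negative", "FALSE"]) /
  (node_tab["Node Positive", "FALSE"] * node_tab["Node Negative", "TRUE"])
sample_or

# Conditional MLE odds ratio and exact 95% confidence interval
fisher_res <- fisher.test(node_tab)
fisher_res$estimate
fisher_res$conf.int
```

The wide interval, which spans 1, reflects how few ctDNA-positive participants there are.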
+
#######Looking at T stage or tumor size: the variable is final_t_stage 
+  
+  table(subset_data$final_t_stage) #we have values in 1,2,3,4, and 99 (99 = pTx, cannot evaluate) so can proceed with this 
+
+

+  1   2   3   4  99 
+173 168  46  10   1 
+
+
  t_summary <- subset_data |>
+    distinct(participant_id, final_t_stage) |>  # Get unique participant-stage combinations
+    group_by(final_t_stage) |>  # Group by stage
+    summarise(count = n())  # Count the number of participants per T stage
+  
+  # View the summary table: adds up to 109; most are T1 or T2, and one is TX (99)
+  print(t_summary)
+
+
# A tibble: 5 × 2
+  final_t_stage count
+          <int> <int>
+1             1    51
+2             2    44
+3             3    12
+4             4     1
+5            99     1
+
+
  #for our T stage table, will use T1 vs T2 or greater to simplify, and we want to exclude 99 (the pTx). We will create "final_t_stage_combined" to represent this.  
+  subset_data_clean <- subset_data |>
+    filter(final_t_stage != 99, ctDNA_ever != 99)
+  
+  # Combine final_t_stage into T1 vs. T2 or greater
+  subset_data_clean <- subset_data_clean |>
+    mutate(final_t_stage_combined = ifelse(final_t_stage == 1, "T1", "T2 or greater"))
+  
+  # Summarize the data by participant_id after creating the new combined t_stage
+  subset_data_by_id <- subset_data_clean |>
+    group_by(participant_id) |>
+    summarise(
+      final_t_stage_combined = first(final_t_stage_combined),  # Get the combined t status for each participant
+      ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+    )
+  
+  # Create a contingency table of final_t_stage_combined vs ctDNA_ever
+  contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$ctDNA_ever)
+  
+  # Perform the Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  # Print the contingency table and Chi-squared test results. P value = 0.6
+  print(contingency_table)
+
+
               
+                FALSE TRUE
+  T1               48    3
+  T2 or greater    51    6
+
+
  print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.27357, df = 1, p-value = 0.6009
+
+
#### I looked at a different cut-off for T stage stats using T3 or greater as cutoff and didn't see any significant difference so am not using this for the table. 
+  
+  #exclude 99 (the pTx) 
+  subset_data_clean <- subset_data |>
+    filter(final_t_stage != 99, ctDNA_ever != 99)
+  
+  # Combine final_t_stage into T1/T2 or T3 or greater
+  subset_data_clean <- subset_data_clean |>
+    mutate(final_t_stage_combined = case_when(
+      final_t_stage == 1 | final_t_stage == 2 ~ "T1 or T2",  # Group T1 and T2 together
+      final_t_stage >= 3 ~ "T3 or greater",  # Group T3 and higher as a separate category
+      TRUE ~ NA_character_  # Handle any unexpected values
+    ))
+  
+  
+  # Summarize the data by participant_id after creating the new combined t_stage
+  subset_data_by_id <- subset_data_clean |>
+    group_by(participant_id) |>
+    summarise(
+      final_t_stage_combined = first(final_t_stage_combined),  # Get the combined t status for each participant
+      ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+    )
+  
+  # Create a contingency table of final_t_stage_combined vs ctDNA_ever
+  contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$ctDNA_ever)
+  
+  # Perform the Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  # Print the contingency table and Chi-squared test results --> not significant so ignore this 
+  print(contingency_table)
+
+
               
+                FALSE TRUE
+  T1 or T2         88    7
+  T3 or greater    11    2
+
+
  print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.19875, df = 1, p-value = 0.6557
+
+
  ########Overall stage of disease -- final_overall_stage 
+
+  table(subset_data$final_overall_stage) #values are 1, 2, 3, and 99 (no stage IV), so we can proceed with this
+
+

+  1   2   3  99 
+124 167 105   2 
+
+
  stage_summary <- subset_data |>
+    distinct(participant_id, final_overall_stage) |>  # Get unique participant-stage combinations
+    group_by(final_overall_stage) |>  # Group by stage
+    summarise(count = n())  # Count the number of participants per overall stage
+  
+  # View the summary table: adds up to 109 (including one 99); most are stage II, some are stage I or III, and none are stage IV
+  print(stage_summary)
+
+
# A tibble: 4 × 2
+  final_overall_stage count
+                <int> <int>
+1                   1    35
+2                   2    47
+3                   3    26
+4                  99     1
+
+
  #exclude the 99 
+  subset_data_clean <- subset_data |>
+    filter(final_overall_stage != 99, ctDNA_ever != 99)
+  
+  # Summarize the data by participant_id
+  subset_data_by_id <- subset_data_clean |>
+    group_by(participant_id) |>
+    summarise(
+      final_overall_stage = first(final_overall_stage),  # Get the final_overall_stage for each participant
+      ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+    )
+  
+  # Create a contingency table of final_overall_stage vs ctDNA_ever
+  contingency_table <- table(subset_data_by_id$final_overall_stage, subset_data_by_id$ctDNA_ever)
+  
+  # Perform the Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  # Print the contingency table and chi-squared test results: p = 0.006. Higher stage is associated with ctDNA_ever.
+  print(contingency_table)
+
+
   
+    FALSE TRUE
+  1    33    2
+  2    46    1
+  3    20    6
+
+
  print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 10.082, df = 2, p-value = 0.006466
+
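Overall stage is ordinal, so a test for trend in proportions can use that ordering. The chi-squared test above treats stages I-III as unordered categories. Base R's `prop.trend.test()` implements the chi-squared test for trend. The sketch below uses the counts printed above and assumes equally spaced scores of 1-3.

```r
# ctDNA-positive participants and stage totals, from the printed table
ctdna_pos <- c(stage_I = 2, stage_II = 1, stage_III = 6)
stage_n   <- c(stage_I = 33 + 2, stage_II = 46 + 1, stage_III = 20 + 6)

round(ctdna_pos / stage_n, 3)  # observed proportion ctDNA-positive per stage

# Chi-squared test for a linear trend across the stage scores
trend_res <- prop.trend.test(ctdna_pos, stage_n, score = 1:3)
trend_res
```

The proportions are not monotone, because stage II is lower than stage I. A trend test and the overall chi-squared test can therefore disagree here.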
+
###########surgical treatment diag_surgery_type_1 (or diag_surgery_type_2) 
+  
+  table(subset_data$diag_surgery_type_1) #1= partial mastectomy/lumpectomy, 2 = mastectomy. no missingness 
+
+

+  1   2 
+158 240 
+
+
  surgery <- subset_data |>
+    distinct(participant_id, diag_surgery_type_1) |>  # Get unique participant-surgery combinations
+    group_by(diag_surgery_type_1) |>  # Group by surgery type
+    summarise(count = n())  # Count the number of participants per surgery type
+  
+  # View the summary table
+  print(surgery)
+
+
# A tibble: 2 × 2
+  diag_surgery_type_1 count
+                <int> <int>
+1                   1    45
+2                   2    64
+
+
  # Summarize the data by participant_id
+  # Note: subset_data_clean still excludes the participant with final_overall_stage == 99, so n = 108 here
+  subset_data_by_id <- subset_data_clean |>
+    group_by(participant_id) |>
+    summarise(
+     surgery = first(diag_surgery_type_1),  # Get the surgery type for each participant
+      ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+    )
+  
+  # Create a contingency table of surgery type vs ctDNA_ever
+  contingency_table <- table(subset_data_by_id$surgery, subset_data_by_id$ctDNA_ever)
+  
+  # Perform the Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  # Print the contingency table and chi-squared test results: p = 1
+  print(contingency_table)
+
+
   
+    FALSE TRUE
+  1    41    4
+  2    58    5
+
+
  print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0, df = 1, p-value = 1
+
+
######## Axillary management: the question of interest is whether a participant had an axillary dissection, recorded in diag_axillary_type___2_1 (or diag_axillary_type___2_2 if there were 2 biopsy forms). A new variable, axillary_dissection, combines the two.
+
+  
+  table(subset_data$diag_axillary_type___2_1) 
+
+

+  0   1 
+215 183 
+
+
  table(subset_data$diag_axillary_type___2_2) #a handful of axillary dissections are recorded on the second biopsy form, so combine the two variables
+
+

+ 0  1 
+16  4 
+
+
  # Create a binary variable for axillary dissection on either biopsy form: 1 = had one, 0 = did not
+  # case_when() treats an NA condition as not matched, so participants with a missing
+  # second-form value fall through to 0 instead of becoming NA
+  subset_data <- subset_data |>
+    mutate(axillary_dissection = case_when(
+      diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1 ~ 1,  # Had axillary dissection
+      TRUE ~ 0  # No axillary dissection (includes missing values)
+    ))
+  subset_data_clean <- subset_data
+  
+  # Summarize the data by participant_id, including the axillary_dissection and ctDNA_ever variables
+  subset_data_by_id <- subset_data_clean |>
+    group_by(participant_id) |>
+    summarise(
+      axillary_dissection = first(axillary_dissection),  # Get the axillary dissection status for each participant
+      ctDNA_ever = first(ctDNA_ever)  # Get the ctDNA_ever status for each participant
+    )
+  
+  contingency_table <- table(subset_data_by_id$axillary_dissection, subset_data_by_id$ctDNA_ever)
+  
+table(subset_data$axillary_dissection)
+
+

+  0   1 
+214 184 
+
+
  # Perform the Chi-squared test
+  chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
  # Print the contingency table and Chi-squared test results --> p-value 0.173 
+  print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    52    2
+  1    48    7
+
+
  print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 1.8588, df = 1, p-value = 0.1728
+
+
#### Inflammatory breast cancer (inflamm_yn_1 / inflamm_yn_2): not included in Table 1 because there were no inflammatory breast cancers in the ctDNA cohort
+table(d$inflamm_yn_1) ### inflammatory cases are present in the overall cohort (6 in the overall cohort) but appear to be absent from the ctDNA subset for either inflamm variable
+
+

+  0   1 
+568  11 
+
+
table(d$inflamm_yn_2)  ### no inflammatory cases are recorded on the second form in the overall cohort
+
+

+ 0 
+24 
+
+
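table(subset_data$inflamm_yn) # note: there is no column named inflamm_yn (hence the warning and empty table); the form-level variables in d are inflamm_yn_1 and inflamm_yn_2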
+
+
Warning: Unknown or uninitialised column: `inflamm_yn`.
+
+
+
< table of extent 0 >
+
+
#### radiation prtx_radiation 
+table(subset_data$prtx_radiation) 
+
+

+  0   1 
+116 282 
+
+
radiation <- subset_data |> 
+  distinct(participant_id,prtx_radiation) |> 
+  group_by(prtx_radiation) |>  # Group by radiation status
+  summarise(count = n())  # Count the number of participants per radiation status
+
+# View the summary table
+print(radiation)
+
+
# A tibble: 2 × 2
+  prtx_radiation count
+           <int> <int>
+1              0    34
+2              1    75
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    radiation = first(prtx_radiation),  # xrt for each participant
+    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of radiation vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$radiation, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> p-val  = 0.33 
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    33    1
+  1    67    8
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.96444, df = 1, p-value = 0.3261
+
+
#### chemotherapy prtx_chemo 
+table(subset_data$prtx_chemo) 
+
+

+  0   1 
+ 18 380 
+
+
chemo <- subset_data |> 
+  distinct(participant_id,prtx_chemo) |> 
+  group_by(prtx_chemo) |>  # Group by chemotherapy status
+  summarise(count = n())  # Count the number of participants per chemotherapy status
+
+# View the summary table
+print(chemo) #3 people did not get chemo in this cohort 
+
+
# A tibble: 2 × 2
+  prtx_chemo count
+       <int> <int>
+1          0     3
+2          1   106
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    chemo = first(prtx_chemo),  # chemo for each participant
+    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of chemotherapy vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$chemo, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> p-val  = 0.59 
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0     2    1
+  1    98    8
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.28802, df = 1, p-value = 0.5915
+
+
####neoadjuvant chemo -- there are two variables for this that could theoretically be included: diag_neoadj_chemo_1 or diag_neoadj_chemo_2 
+
+table(subset_data$diag_neoadj_chemo_1) 
+
+

+  0   1 
+327  71 
+
+
table(subset_data$diag_neoadj_chemo_2) #no one was recorded on the second biopsy form as receiving neoadjuvant therapy, so only the first variable is needed
+
+

+ 0 
+20 
+
+
nact <- subset_data |> 
+  distinct(participant_id,diag_neoadj_chemo_1) |> 
+  group_by(diag_neoadj_chemo_1) |>  # Group by NACT status
+  summarise(count = n())  # Count the number of participants per NACT status
+
+# View the summary table
+print(nact) #19 of the 109 participants received neoadjuvant chemotherapy
+
+
# A tibble: 2 × 2
+  diag_neoadj_chemo_1 count
+                <int> <int>
+1                   0    90
+2                   1    19
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    nact = first(diag_neoadj_chemo_1),  # NACT for each participant
+    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of NACT vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$nact, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> p-val  = 0.95 
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    82    8
+  1    18    1
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.0039839, df = 1, p-value = 0.9497
+
+
####hormone therapy prtx_endo 
+
+table(subset_data$prtx_endo) 
+
+

+  0   1 
+156 242 
+
+
endo <- subset_data |> 
+  distinct(participant_id,prtx_endo) |> 
+  group_by(prtx_endo) |>  # Group by endocrine therapy status
+  summarise(count = n())  # Count the number of participants per endocrine therapy status
+
+# View the summary table
+print(endo) #most participants received endocrine therapy (62 of 109)
+
+
# A tibble: 2 × 2
+  prtx_endo count
+      <int> <int>
+1         0    47
+2         1    62
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    endo = first(prtx_endo),  # Get endocrine therapy status for each participant
+    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of endocrine therapy vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$endo, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> p-val  = 0.33 
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    45    2
+  1    55    7
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.94139, df = 1, p-value = 0.3319
+
+
####bone modifying agents prtx_bonemod 
+
+table(subset_data$prtx_bonemod) 
+
+

+  0   1 
+238 160 
+
+
bonemod <- subset_data |> 
+  distinct(participant_id,prtx_bonemod) |> 
+  group_by(prtx_bonemod) |>  # Group by bone-modifying agent status
+  summarise(count = n())  # Count the number of participants per bone-modifying agent status
+
+# View the summary table
+print(bonemod) #39 of 109 participants received a bone-modifying agent
+
+
# A tibble: 2 × 2
+  prtx_bonemod count
+         <int> <int>
+1            0    70
+2            1    39
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    bonemod = first(prtx_bonemod),  # Get bone mod status for each participant
+    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of bonemod vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$bonemod, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> p-val  = 0.84 
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    65    5
+  1    35    4
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.041269, df = 1, p-value = 0.839
+
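The radiation, chemotherapy, neoadjuvant, endocrine, and bone-modifying-agent blocks all repeat the same steps: collapse, tabulate, test. The sketch below wraps those steps in one base-R helper. Like the code above, it assumes that the first row per `participant_id` carries that participant's value. The function name `test_vs_ctdna()` and the toy data are hypothetical. Switching to Fisher's exact test when an expected count is below 5 is a suggested convention, not what the analysis above does.

```r
# Hypothetical helper: collapse to one row per participant, tabulate, and test
test_vs_ctdna <- function(data, var, id = "participant_id", outcome = "ctDNA_ever") {
  # Keep the first row per participant (mirrors group_by() + first())
  by_id <- data[!duplicated(data[[id]]), , drop = FALSE]
  tab <- table(by_id[[var]], by_id[[outcome]])
  expected <- suppressWarnings(chisq.test(tab))$expected
  test <- if (any(expected < 5)) fisher.test(tab) else chisq.test(tab)
  list(variable = var, table = tab, method = test$method, p.value = test$p.value)
}

# Toy long-format data: 20 participants, two rows each
toy <- data.frame(
  participant_id = rep(sprintf("P%02d", 1:20), each = 2),
  prtx_radiation = rep(rep(c(0, 1), times = 10), each = 2),
  prtx_chemo     = rep(c(rep(1, 18), 0, 0), each = 2),
  ctDNA_ever     = rep(c(rep(FALSE, 16), rep(TRUE, 4)), each = 2)
)

results <- lapply(c("prtx_radiation", "prtx_chemo"), test_vs_ctdna, data = toy)
summary_df <- data.frame(
  variable = vapply(results, `[[`, character(1), "variable"),
  method   = vapply(results, `[[`, character(1), "method"),
  p.value  = vapply(results, `[[`, numeric(1), "p.value")
)
summary_df
```

With the real data, the same `lapply()` call over `c("prtx_radiation", "prtx_chemo", "diag_neoadj_chemo_1", "prtx_endo", "prtx_bonemod")` would give one row per treatment variable.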
+
#### pCR (pathologic complete response): NOT included in Table 1 because it aligns closely with NACT
+# 2 = non-pcr, 1 = pcr 
+#the variables of interest for path cr: diag_pcr_1 or diag_pcr_2  
+table(subset_data$diag_pcr_1) 
+
+

+  .   1   2 
+327   8  63 
+
+
table(subset_data$diag_pcr_2) #none recorded here so can just use diag_pcr_1 
+
+

+      . 
+378  20 
+
+
pcr <- subset_data |>
+  mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) |>  # Convert "." to NA
+  filter(!is.na(diag_pcr_1)) |>  # Exclude rows where diag_pcr_1 is NA
+  distinct(participant_id, diag_pcr_1) |>
+  group_by(diag_pcr_1) |>
+  summarise(count = n()) # Count the number of participants per pCR status
+
+# View the summary table
+print(pcr) #consistent with NACT: 19 participants received NACT and 19 have pCR data
+
+
# A tibble: 2 × 2
+  diag_pcr_1 count
+  <chr>      <int>
+1 1              1
+2 2             18
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    pcr = first(diag_pcr_1),  # Get pcr for each participant
+    ctDNA_ever = first(ctDNA_ever)  # Get ctDNA_ever status for each participant
+  )
+
+# Create a contingency table of pcr vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$pcr, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and chi-squared test results: p = 0.86. The "." row (no NACT, n = 90) is included as its own category and the pCR row has only 1 participant, so this test says little about pCR itself
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  .    82    8
+  1     1    0
+  2    17    1
+
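The table above still contains the `.` placeholder, so the 90 participants without neoadjuvant chemotherapy form their own row. Converting the placeholder to `NA` before tabulating restricts the comparison to the 19 NACT recipients. The sketch below uses a hypothetical toy vector and the coding from the comment above (1 = pCR, 2 = non-pCR).

```r
# Toy participant-level data; "." marks "not applicable" as in diag_pcr_1
pcr        <- c(".", ".", "1", "2", "2", ".", "2")
ctDNA_ever <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)

pcr[pcr == "."] <- NA  # treat the placeholder as missing
pcr_factor <- factor(pcr, levels = c("1", "2"), labels = c("pCR", "non-pCR"))

# table() drops NA by default, so only NACT recipients remain
pcr_tab <- table(pcr_factor, ctDNA_ever)
pcr_tab
```

In the dplyr pipeline, `na_if(diag_pcr_1, ".")` (already used for the pCR summary above) does the same conversion.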
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 0.31085, df = 2, p-value = 0.8561
+
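This section runs about a dozen baseline comparisons against `ctDNA_ever`, so a few small p-values are expected by chance alone. The sketch below applies the Benjamini-Hochberg false discovery rate adjustment (base R's `p.adjust()`) to p-values copied from the printed output. Which tests belong to the family is an assumption. Here it is the comparisons run in this section, excluding the T3 cut-off and pCR checks, which the notes say are not used for Table 1.

```r
# Unadjusted p-values copied from the chi-squared output above
baseline_p <- c(
  histology     = 0.2276,
  nodal_stage   = 0.0001661,
  node_status   = 0.8336,
  t_stage       = 0.6009,
  overall_stage = 0.006466,
  surgery       = 1,
  axillary      = 0.1728,
  radiation     = 0.3261,
  chemo         = 0.5915,
  nact          = 0.9497,
  endocrine     = 0.3319,
  bonemod       = 0.839
)

# Benjamini-Hochberg adjustment; names are carried over from baseline_p
adjusted <- p.adjust(baseline_p, method = "BH")
round(sort(adjusted), 4)
```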
+
########recurrence
+#local first, then distant, then create a summary variable for either locoregional or distant recurrence
+#local fu_locreg_prog 
+
+# Step 1: Summarize data by unique participant_id
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    fu_locreg_prog = first(fu_locreg_prog),  # Get fu_locreg_prog status for each participant
+    ctDNA_ever = first(ctDNA_ever),          # Get ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of fu_locreg_prog vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$fu_locreg_prog, subset_data_by_id$ctDNA_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 4: Print the contingency table and chi-squared test results: p = 5.8e-06
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    96    5
+  1     2    4
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 20.564, df = 1, p-value = 5.768e-06
+
+
####looking at sites of locoregional progression: local site fu_locreg_site_num or fu_locreg_site_char 
+### Just want to look at site distribution here 
+
+# Summarize the distribution of fu_locreg_site_char by unique participant_id
+site_distribution <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    site = first(fu_locreg_site_char),  # Get the site for each unique participant
+    .groups = "drop"
+  ) |>
+  count(site)  # Count the occurrences of each site
+
+# View the distribution of sites -- mostly axillary nodes. 4 of 6 axillary. 1 internal mammary. 2 supraclav. 2 ipsilateral breast 
+print(site_distribution)
+
+
# A tibble: 6 × 2
+  site                                                              n
+  <chr>                                                         <int>
+1 ""                                                              103
+2 "Axillary Nodes"                                                  2
+3 "Axillary Nodes,Internal Mammary Nodes,Supraclavicular Nodes"     1
+4 "Ipsilateral Breast"                                              1
+5 "Ipsilateral Breast,Axillary Nodes"                               1
+6 "Supraclavicular Nodes"                                           1
+
+
#####distant recurrence: distant fu_dist_prog 
+
+# Step 1: Summarize data by unique participant_id
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    fu_dist_prog = first(fu_dist_prog),  # Get fu_dist_prog status for each participant
+    ctDNA_ever = first(ctDNA_ever),          # Get ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of fu_dist_prog vs ctDNA_ever (12 participants had distant progression)
+contingency_table <- table(subset_data_by_id$fu_dist_prog, subset_data_by_id$ctDNA_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 4: Print the contingency table and Chi-squared test results -- p-value = 1.4e-09
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0    93    2
+  1     5    7
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 36.73, df = 1, p-value = 1.356e-09
+
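`chisq.test()` issues the "Chi-squared approximation may be incorrect" warning whenever any expected cell count falls below 5. You can see which cells cause it by inspecting the `expected` component of the result. Here is a sketch that uses the distant-progression counts printed above:

```r
# Counts from the printed table: rows = fu_dist_prog (0/1), cols = ctDNA_ever
dist_tab <- matrix(
  c(93, 5,   # ctDNA_ever = FALSE
    2, 7),   # ctDNA_ever = TRUE
  nrow = 2,
  dimnames = list(fu_dist_prog = c("0", "1"),
                  ctDNA_ever = c("FALSE", "TRUE"))
)

# suppressWarnings() only silences the known small-count warning;
# the expected counts are the same either way
dist_test <- suppressWarnings(chisq.test(dist_tab))
round(dist_test$expected, 2)

# Cells with an expected count below 5 are what trigger the warning
which(dist_test$expected < 5, arr.ind = TRUE)
```

Only one cell falls below 5 here: distant progression with ctDNA ever detected, where the expected count is 12 x 9 / 107, about 1.01.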
+
### Distant sites 
+#distant site fu_dist_site_num #fu_dist_site_char  -- start just looking at the locations 
+
+# Summarize the distribution of fu_dist_site_char by unique participant_id
+dist_site_distribution <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    site = first(fu_dist_site_char),  # Get the site for each unique participant
+    .groups = "drop"
+  ) |>
+  count(site)  # Count the occurrences of each site
+
+# View the distribution of sites -- bone in 7, liver in 3, lung in 2, and 1 each intra-abdominal, pleura, and other (some patients had more than one site)
+print(dist_site_distribution)
+
+
# A tibble: 8 × 2
+  site                  n
+  <chr>             <int>
+1 ""                   97
+2 "Bone"                5
+3 "Bone,Other"          1
+4 "Intra-abdominal"     1
+5 "Liver"               2
+6 "Liver,Bone"          1
+7 "Lung"                1
+8 "Pleura,Lung"         1
+
+
##### ANY Recurrence -- this includes either fu_locreg_prog or fu_dist_prog 
+
+# Summarize by participant_id (one row per participant)
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    ctDNA_ever = first(ctDNA_ever),        # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results -- total 14 relapses, 8 were ctDNA +, 6 were not ever ctDNA positive 
+print(contingency_table)
+
+
     
+      FALSE TRUE
+  No     92    1
+  Yes     6    8
+
+
print(chisq_test) 
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 42.642, df = 1, p-value = 6.573e-11
+
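To express the ever-relapsed table in clinical terms, the same 2x2 counts give the sensitivity, specificity, and predictive values of ever-detected ctDNA for relapse in this cohort. Below is a sketch using the counts printed above (rows = ever_relapsed, columns = ctDNA_ever). These are descriptive, within-cohort figures, not validated test characteristics. With only 14 relapses and 9 ctDNA-positive participants, they are also imprecise.

```r
relapse_tab <- matrix(
  c(92, 6,   # ctDNA_ever = FALSE
    1, 8),   # ctDNA_ever = TRUE
  nrow = 2,
  dimnames = list(ever_relapsed = c("No", "Yes"),
                  ctDNA_ever = c("FALSE", "TRUE"))
)

tp <- relapse_tab["Yes", "TRUE"]   # relapsed, ctDNA ever detected
fn <- relapse_tab["Yes", "FALSE"]  # relapsed, ctDNA never detected
fp <- relapse_tab["No", "TRUE"]    # no relapse, ctDNA ever detected
tn <- relapse_tab["No", "FALSE"]   # no relapse, ctDNA never detected

metrics <- c(
  sensitivity = tp / (tp + fn),  # 8 / 14
  specificity = tn / (tn + fp),  # 92 / 93
  ppv         = tp / (tp + fp),  # 8 / 9
  npv         = tn / (tn + fn)   # 92 / 98
)
round(metrics, 3)
```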
+
#### Relapse and DTCs  
+#using ever_relapsed and dtc_ever
+
+# Summarize by participant_id (one row per participant)
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    dtc = first(dtc_ever),        # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs dtc_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results -- p-value = 0.78, no evidence of association between DTC status and relapse
+print(contingency_table)
+
+
     
+       0  1
+  No  59 34
+  Yes 10  4
+
+
print(chisq_test) 
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.079932, df = 1, p-value = 0.7774
+
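A non-significant p-value does not by itself show that DTC status and relapse are unrelated. An effect estimate with a confidence interval shows how much uncertainty remains. For a 2x2 table, `fisher.test()` reports a conditional maximum-likelihood odds ratio and a 95% CI. Here is a sketch using the counts printed above:

```r
dtc_relapse_tab <- matrix(
  c(59, 10,  # dtc_ever = 0
    34, 4),  # dtc_ever = 1
  nrow = 2,
  dimnames = list(ever_relapsed = c("No", "Yes"),
                  dtc_ever = c("0", "1"))
)

fisher_dtc <- fisher.test(dtc_relapse_tab)
fisher_dtc$estimate   # conditional MLE odds ratio
fisher_dtc$conf.int   # 95% confidence interval
```

For reference, the crude relapse proportions from the same table are 4/38 (10.5%) among DTC-positive and 10/69 (14.5%) among DTC-negative participants.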
+
# Identify participants missing data in either `ever_relapsed` or `dtc_ever`
+missing_data <- subset_data_by_id |>
+  filter(is.na(ever_relapsed) | is.na(dtc))
+
+# Print the IDs of participants with missing data
+print(missing_data$participant_id) #two individuals do not have relapse data (17-021, 18-032, as we had seen above)
+
+
[1] "28115-17-021" "28115-18-032"
+
+
### Look at ever_relapsed by ctDNA (repeats the ANY recurrence analysis above, with ctDNA_ever renamed to ctDNA)
+
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    ctDNA = first(ctDNA_ever),        # Get the ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$ctDNA)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results, p < 0.00001 
+print(contingency_table)
+
+
     
+      FALSE TRUE
+  No     92    1
+  Yes     6    8
+
+
print(chisq_test) 
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 42.642, df = 1, p-value = 6.573e-11
+
+
#### Survival: fu_surv (0 = dead, 1 = alive)
+
+table(subset_data$fu_surv)
+
+

+  0   1 
+  8 389 
+
+
surv <- subset_data |>
+  distinct(participant_id, fu_surv) |>
+  group_by(fu_surv) |>
+  summarise(count = n()) # Count the number of participants per survival status
+
+# View the summary table
+print(surv) # 5 dead, 103 alive, 1 NA (identified below)
+
+
# A tibble: 3 × 2
+  fu_surv count
+    <int> <int>
+1       0     5
+2       1   103
+3      NA     1
+
+
na_participant <- subset_data |>
+  filter(is.na(fu_surv)) |>
+  select(participant_id, fu_surv)
+
+# Print the result -- 28115-17-021 has no follow-up data in REDCap; everyone else in the ctDNA cohort has some survival data.
+print(na_participant)
+
+
# A tibble: 1 × 2
+  participant_id fu_surv
+  <chr>            <int>
+1 28115-17-021        NA
+
+
# Summarize data by unique participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    surv = first(fu_surv),          # Get survival status for each participant
+    ctDNA_ever = first(ctDNA_ever),  # Get ctDNA_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of surv vs ctDNA_ever
+contingency_table <- table(subset_data_by_id$surv, subset_data_by_id$ctDNA_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results, p<0.00001
+print(contingency_table)
+
+
   
+    FALSE TRUE
+  0     1    4
+  1    98    5
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 26.099, df = 1, p-value = 3.243e-07
+
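Every outcome above repeats the same pattern: `group_by(participant_id) |> summarise(first(...))`, then `table()`. A small helper could collapse this into one call. The sketch below is a hypothetical base R function, `participant_table()`. Like `first()`, it keeps each participant's first row in the original order. It is demonstrated on a made-up three-participant data frame, not on study data.

```r
# Hypothetical helper: keep one row per participant (first occurrence, as
# dplyr::first() does in the original row order), then cross-tabulate.
participant_table <- function(data, row_var, col_var, id_var = "participant_id") {
  one_per_id <- data[!duplicated(data[[id_var]]), , drop = FALSE]
  table(one_per_id[[row_var]], one_per_id[[col_var]],
        dnn = c(row_var, col_var))
}

# Made-up example data: participant "A" has two samples, so it appears twice
toy <- data.frame(
  participant_id = c("A", "A", "B", "C"),
  fu_surv        = c(1, 1, 0, 1),
  ctDNA_ever     = c(TRUE, TRUE, TRUE, FALSE)
)

toy_tab <- participant_table(toy, "fu_surv", "ctDNA_ever")
print(toy_tab)
```

On the study data, a call such as `participant_table(subset_data, "ever_relapsed", "dtc_ever")` should reproduce the corresponding table above, since both approaches keep the first row per participant and `table()` drops NAs by default.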
+
+

DTC Demographics and Univariable Tests of Association: next, we look at univariable tests of association between clinical characteristics and DTC status.

+
+
############### DTC Demographics ########## 
+
+###### median age at diagnosis 
+
+#### Age at Dx (by DTC)
+
+names(subset_data) #to identify the variables I want to use 
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                     "ctDNA_ever"                      
+[389] "dtc_ever"                         "ever_relapsed"                   
+[391] "age_at_diag"                      "HR_status"                       
+[393] "histology_category"               "node_status"                     
+[395] "axillary_dissection"             
+
+
str(subset_data$diag_date_1) # already Date class
+
+
 Date[1:398], format: "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" ...
+
+
str(subset_data$demo_dob) # already Date class
+
+
 Date[1:398], format: "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" ...
+
+
d$diag_date_1 <- as.Date(d$diag_date_1, format = "%m/%d/%Y")  
+d$demo_dob <- as.Date(d$demo_dob, format = "%m/%d/%Y")  
+
+str(d$diag_date_1) #dates! 
+
+
 Date[1:579], format: "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" "2013-08-15" ...
+
+
str(d$demo_dob) #dates! 
+
+
 Date[1:579], format: "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" "1957-09-21" ...
+
+
### Apply the same conversion to subset_data (harmless if the columns are already Date)
+subset_data$diag_date_1 <- as.Date(subset_data$diag_date_1, format = "%m/%d/%Y")  
+subset_data$demo_dob <- as.Date(subset_data$demo_dob, format = "%m/%d/%Y")  
+
+# Age at diagnosis in years: days from date of birth to diagnosis date, divided by 365.25
+subset_data$age_at_diag <- as.numeric(difftime(subset_data$diag_date_1, subset_data$demo_dob, units = "days")) / 365.25
+head(subset_data$age_at_diag)
+
+
[1] 55.89870 55.89870 55.89870 55.89870 55.89870 49.25667
+
+
summary(subset_data$age_at_diag) # median 48.75 (row-level: participants with multiple samples are counted more than once)
+
+
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
+  27.34   41.73   48.75   49.35   57.63   68.94 
+
+
age_summary <- subset_data |>
+  group_by(dtc_ever) |>
+  summarise(
+    mean_age = mean(age_at_diag, na.rm = TRUE),  # Calculate mean age
+    median_age = median(age_at_diag, na.rm = TRUE),  # Calculate median age
+    sd_age = sd(age_at_diag, na.rm = TRUE),  # Calculate standard deviation of age
+    n = n()  # Number of participants in each group
+  )
+
+print(age_summary) # DTC-positive rows are slightly younger (median 47.3 vs 51.8); n counts samples, not participants
+
+
# A tibble: 2 × 5
+  dtc_ever mean_age median_age sd_age     n
+     <dbl>    <dbl>      <dbl>  <dbl> <int>
+1        0     50.6       51.8   9.48   149
+2        1     48.6       47.3   9.75   249
+
+
# Wilcoxon rank-sum test comparing age at diagnosis between DTC-positive and DTC-negative groups (run on sample-level rows)
+wilcox_test_result <- wilcox.test(age_at_diag ~ dtc_ever, data = subset_data)
+
+# Print the result
+print(wilcox_test_result)
+
+

+    Wilcoxon rank sum test with continuity correction
+
+data:  age_at_diag by dtc_ever
+W = 20838, p-value = 0.03946
+alternative hypothesis: true location shift is not equal to 0
+
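The age summaries and the Wilcoxon test above run on `subset_data`. That data frame has one row per sample (149 + 249 = 398 rows) rather than one per participant (109). As a result, participants with more samples carry more weight, and the effective sample size, and so the p-value of 0.039, may be overstated. The sketch below collapses the data to one row per participant before testing. It uses made-up data: the column names match the document, but the values do not. It assumes that `age_at_diag` and `dtc_ever` are constant within each participant, as "ever" variables should be.

```r
# Made-up data: participant "P1" contributes three identical rows
toy_age <- data.frame(
  participant_id = c("P1", "P1", "P1", "P2", "P3", "P4", "P5", "P6"),
  age_at_diag    = c(41.2, 41.2, 41.2, 55.0, 47.5, 62.3, 38.9, 51.4),
  dtc_ever       = c(1, 1, 1, 0, 1, 0, 1, 0)
)

# Keep one row per participant (assumes age and dtc_ever do not vary by sample)
per_participant <- toy_age[!duplicated(toy_age$participant_id), ]

# Compare age distributions by DTC status at the participant level
age_test <- wilcox.test(age_at_diag ~ dtc_ever, data = per_participant)
age_test
```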
+
# Age range (min and max) by DTC status
+age_summary <- subset_data |>
+  group_by(dtc_ever) |>
+  summarise(
+    min_age = min(age_at_diag, na.rm = TRUE),  # Minimum age
+    max_age = max(age_at_diag, na.rm = TRUE),  # Maximum age
+    .groups = "drop"
+  )
+
+# View the summary table
+print(age_summary)
+
+
# A tibble: 2 × 3
+  dtc_ever min_age max_age
+     <dbl>   <dbl>   <dbl>
+1        0    27.3    68.9
+2        1    30.7    67.7
+
+
+
+
##### Race: demo_race_final
+
+# Get the count of unique participant_ids for each category in demo_race_final
+race_counts_unique_percent <- subset_data |>
+  group_by(demo_race_final) |>
+  summarise(unique_participants = n_distinct(participant_id)) |>
+  mutate(percent = unique_participants / sum(unique_participants) * 100)
+
+# View the result
+print(race_counts_unique_percent)
+
+
# A tibble: 3 × 3
+  demo_race_final unique_participants percent
+            <int>               <int>   <dbl>
+1               1                   9   8.26 
+2               3                   1   0.917
+3               5                  99  90.8  
+
+
# Count distinct participant_ids by dtc_ever and demo_race_final
+count_distinct_participants <- subset_data |>
+  group_by(demo_race_final, dtc_ever) |>
+  summarise(distinct_participant_count = n_distinct(participant_id), .groups = "drop")
+
+# Print the result
+count_distinct_participants
+
+
# A tibble: 5 × 3
+  demo_race_final dtc_ever distinct_participant_count
+            <int>    <dbl>                      <int>
+1               1        0                          5
+2               1        1                          4
+3               3        0                          1
+4               5        0                         64
+5               5        1                         35
+
+
library(dplyr)
+
+# Step 1: Summarize by unique participant_id
+summarized_data <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    dtc_ever = first(dtc_ever),   # Taking the first observed value of dtc_ever for each participant
+    demo_race_final = first(demo_race_final),  # Taking the first observed value of demo_race_final for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table <- table(summarized_data$dtc_ever, summarized_data$demo_race_final)
+contingency_table
+
+
   
+     1  3  5
+  0  5  1 64
+  1  4  0 35
+
+
# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 4: Print the result -- p-value = 0.65
+chisq_test
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 0.85903, df = 2, p-value = 0.6508
+
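For tables larger than 2x2 with sparse cells, such as race by DTC status, `chisq.test()` can compute a Monte Carlo p-value (`simulate.p.value = TRUE`) instead of relying on the chi-squared approximation. The sketch below uses the counts printed above (rows = dtc_ever, columns = demo_race_final codes). `set.seed()` makes the simulated p-value reproducible; the seed value itself is arbitrary.

```r
race_tab <- matrix(
  c(5, 4,    # demo_race_final = 1
    1, 0,    # demo_race_final = 3
    64, 35), # demo_race_final = 5
  nrow = 2,
  dimnames = list(dtc_ever = c("0", "1"),
                  demo_race_final = c("1", "3", "5"))
)

set.seed(2024)
# Monte Carlo p-value from B tables simulated with the observed margins
race_mc <- chisq.test(race_tab, simulate.p.value = TRUE, B = 10000)
race_mc$p.value
```

`fisher.test(race_tab)` also accepts r x c tables and is another option here.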
+
#receptor status final_receptor_group (1='TNBC', 2='HR+ Her2-', 3='HR+ Her2+', 4='HR- Her2+') 
+
+# Breakdown of final_receptor_group by unique participant_id
+receptor_status_by_participant <- subset_data |>
+  group_by(participant_id) |>
+  summarise(final_receptor_group = first(final_receptor_group),  # Or choose the most frequent group if needed
+            .groups = "drop")
+
+# View the result
+table(receptor_status_by_participant$final_receptor_group)
+
+

+ 1  2  3  4 
+45 52  8  4 
+
+
# Summarizing data by participant_id, final_receptor_group, and dtc_ever
+receptor_dtc_status <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    final_receptor_group = first(final_receptor_group),  # Or the most frequent if needed
+    dtc_ever = first(dtc_ever),  # Taking the first observed value for dtc_ever
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table_receptor <- table(receptor_dtc_status$final_receptor_group, receptor_dtc_status$dtc_ever)
+contingency_table_receptor
+
+
   
+     0  1
+  1 25 20
+  2 37 15
+  3  4  4
+  4  4  0
+
+
# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table_receptor)
+
+
Warning in chisq.test(contingency_table_receptor): Chi-squared approximation
+may be incorrect
+
+
# Step 4: Print the result -- p-value = 0.14; DTC positivity appears more evenly distributed across receptor subtypes (including TNBC) than ctDNA positivity was
+chisq_test
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table_receptor
+X-squared = 5.4909, df = 3, p-value = 0.1392
+
+
#curious about whether TNBC vs non-TNBC association exists (or HR + vs non-HR positive)
+#start with TNBC (using QDC)
+#inclusion criterion inc_dx_crit_list___1 = TNBC
+
+
+#inc_dx_crit_list___1  
+
+TNBC_dtc_status <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    inc_dx_crit_list___1 = first(inc_dx_crit_list___1),  # Or the most frequent if needed
+    dtc_ever = first(dtc_ever),  # Taking the first observed value for dtc_ever
+    .groups = "drop"
+  )
+
+# Step 2: Create the contingency table
+contingency_table_TNBC <- table(TNBC_dtc_status$inc_dx_crit_list___1, TNBC_dtc_status$dtc_ever)
+contingency_table_TNBC
+
+
   
+     0  1
+  0 45 19
+  1 25 20
+
+
# Step 3: Perform the chi-squared test of independence
+chisq_test <- chisq.test(contingency_table_TNBC)
+
+# Step 4: p-val is 0.17 
+chisq_test
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table_TNBC
+X-squared = 1.903, df = 1, p-value = 0.1677
+
+
#HR+ vs non-HR+ 
+#first create HR_status variable 
+subset_data <- subset_data |> 
+  mutate(HR_status = case_when(
+    final_receptor_group %in% c(2, 3) ~ "HR+",
+    final_receptor_group %in% c(1, 4) ~ "Non-HR+",
+    TRUE ~ NA_character_  # In case there are missing or other unexpected values
+  ))
+
+# View the new HR_status variable
+table(subset_data$HR_status)
+
+

+    HR+ Non-HR+ 
+    225     173 
+
+
HR_status_by_participant <- subset_data |>
+  group_by(participant_id) |>
+  summarise(HR_status = first(HR_status),  # first() assumes HR_status is constant within participant; note base R mode() returns the storage mode, not the most frequent value
+            .groups = "drop")
+
+# View the result 
+table(HR_status_by_participant$HR_status) #aligns with final receptor status (ultimately 60 HR+, 49 HR-)
+
+

+    HR+ Non-HR+ 
+     60      49 
+
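The comments above suggest taking the most frequent value per participant if a variable ever varied across rows. Base R's `mode()` returns an object's storage mode (for example "numeric"), not the statistical mode, so a small helper would be needed. `most_frequent()` below is a hypothetical sketch; ties go to the value that appears first.

```r
# Hypothetical helper: most frequent non-missing value; ties are resolved
# by first appearance. Returns NA if every value is missing.
most_frequent <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  values <- unique(x)
  counts <- tabulate(match(x, values))
  values[which.max(counts)]
}

most_frequent(c("HR+", "Non-HR+", "HR+"))
most_frequent(c(2, 3, 3, 2))
```

Inside the pipelines above it would replace `first()`, e.g. `summarise(HR_status = most_frequent(HR_status))`.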
+
# Summarize dtc_ever status by HR_status, for each unique participant_id
+summary_data <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    HR_status = first(HR_status),  # Get the HR_status for the participant
+    dtc_status = first(dtc_ever),  # Get the dtc_ever status for the participant
+    .groups = "drop"
+  )
+
+contingency_table_HR <- table(summary_data$dtc_status, summary_data$HR_status)
+contingency_table_HR
+
+
   
+    HR+ Non-HR+
+  0  41      29
+  1  19      20
+
+
chisq_test <- chisq.test(contingency_table_HR)
+
+# Print chi-squared test results -- p-value = 0.43
+chisq_test
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table_HR
+X-squared = 0.62484, df = 1, p-value = 0.4293
+
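The same collapse-then-test steps are repeated for every characteristic in this section. One way to reduce copy-paste errors is a small helper; `dtc_association()` is a hypothetical name, the sketch uses base R only, and it keeps the first row per participant to mirror `first()` in the `summarise()` calls. It is shown on made-up toy data.

```r
# Hypothetical helper: one row per participant, then a <var> x <outcome> table and test
dtc_association <- function(data, var, id = "participant_id", outcome = "dtc_ever") {
  # Keep each participant's first row, as group_by() + summarise(first()) does
  per_id <- data[!duplicated(data[[id]]), c(id, var, outcome)]
  tab <- table(per_id[[var]], per_id[[outcome]], dnn = c(var, outcome))
  # Small-expected-count warnings are suppressed here; inspect min_expected instead
  test <- suppressWarnings(chisq.test(tab))
  list(table = tab, test = test, min_expected = min(test$expected))
}

# Hypothetical toy data with repeated rows per participant
toy <- data.frame(
  participant_id = c("a", "a", "b", "c", "c", "d"),
  HR_status      = c("HR+", "HR+", "Non-HR+", "HR+", "HR+", "Non-HR+"),
  dtc_ever       = c(1, 1, 0, 0, 0, 1)
)
res <- dtc_association(toy, "HR_status")
res$table
res$min_expected
```

Applied to `subset_data` with `"HR_status"`, this should reproduce `contingency_table_HR` up to transposition (the helper puts the characteristic in rows); filters such as excluding grade code 3 would still be applied beforehand.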
+
###tumor grade: final_tumor_grade --> NOTE: 0 = grade 3, 1 = grade 1, 2 = grade 2, 3 = Not reported  
+
+# Exclude rows where final_tumor_grade is 3 (not reported), want to exclude the 2 pts who had grade not reported 
+summary_data <- subset_data |>
+  filter(final_tumor_grade != 3) |>  # Exclude grade == 3
+  group_by(participant_id) |>
+  summarise(
+    grade = first(final_tumor_grade),  # Get the final_tumor_grade for each participant
+    dtc_ever = first(dtc_ever),    # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of grade vs dtc_ever
+contingency_table <- table(summary_data$grade, summary_data$dtc_ever)
+
+# View the contingency table
+print(contingency_table)
+
+
   
+     0  1
+  0 46 33
+  1 18  4
+  2  4  2
+
+
# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# View the Chi-squared test result -- p-value 0.12 NOT SIG for DTCs 
+print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 4.1608, df = 2, p-value = 0.1249
+
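The warning above appears because some expected cell counts are below 5. The sketch re-enters the grade counts shown above (codes as in the comment: 0 = grade 3, 1 = grade 1, 2 = grade 2) to show which cells are small and, as one common alternative, runs Fisher's exact test. `grade_tab` and `res` are names introduced here.

```r
# Counts copied from the grade vs dtc_ever table above
grade_tab <- matrix(c(46, 18, 4,    # dtc_ever = 0
                      33,  4, 2),   # dtc_ever = 1
                    ncol = 2,
                    dimnames = list(grade_code = c("0", "1", "2"),
                                    dtc_ever = c("0", "1")))

res <- suppressWarnings(chisq.test(grade_tab))  # same test as above, warning silenced
round(res$expected, 2)                          # expected counts under independence
sum(res$expected < 5)                           # number of cells behind the warning

fisher.test(grade_tab)                          # exact test, no large-sample approximation
```

Another option is to merge sparse levels (for example grade 1 and grade 2 versus grade 3) before testing; which approach fits the study is a judgment call.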
+
######histology  #participants have different combinations of histology codes (1-16)
+table(subset_data$participant_id, subset_data$final_histology)
+
+
              
+                1 1,13,14,3 1,3 11,3 12,3 13,3 13,3,5 14 14,15 14,15,3 14,3
+  28115-16-001  0         0   0    0    0    0      0  0     0       0    5
+  28115-16-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-010  0         0   0    0    1    0      0  0     0       0    0
+  28115-16-014  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-015  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-017  0         0   0    0    0    0      0  3     0       0    0
+  28115-16-020  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-021  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-023  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-026  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-029  0         0   0    0    0    0      0  2     0       0    0
+  28115-16-033  0         0   0    0    0    0      0  0     0       0    0
+  28115-16-035  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-001  0         0   8    0    0    0      0  0     0       0    0
+  28115-17-002  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-006  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-008  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-009  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-010  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-012  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-016  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-017  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-019  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-021  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-022  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-023  0         0   2    0    0    0      0  0     0       0    0
+  28115-17-024  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-030  0         0   3    0    0    0      0  0     0       0    0
+  28115-17-031  0         0   0    0    0    0      0  5     0       0    0
+  28115-17-032  0         0   0   10    0    0      0  0     0       0    0
+  28115-17-036  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-039  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-040  0         0   4    0    0    0      0  0     0       0    0
+  28115-17-045  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-046  0         0   0    0    0   10      0  0     0       0    0
+  28115-17-047  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-048  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-050  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-051  0         0   0    0    0    0      0  0     0       0    0
+  28115-17-052  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-001  0         0   0    0    0    0      0  0     7       0    0
+  28115-18-002  0         0   2    0    0    0      0  0     0       0    0
+  28115-18-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-006  0         0   0    0    0    0      0  1     0       0    0
+  28115-18-009  0         0   0    0    0    0      0  1     0       0    0
+  28115-18-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-014  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-015  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-017  0         0   0    0    0    0      0  1     0       0    0
+  28115-18-020  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-021  0         0   0    0    0    0      0  8     0       0    0
+  28115-18-022  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-023  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-024  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-028  1         0   0    0    0    0      0  0     0       0    0
+  28115-18-029  0         0   0    0    0    0      0  4     0       0    0
+  28115-18-030  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-031  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-032  0         0   0    0    0    0      0  0     0       0    0
+  28115-18-034  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-001  0         0   0    0    0    0      0  1     0       0    0
+  28115-19-002  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-003  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-005  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-006  0         0   0    0    0    8      0  0     0       0    0
+  28115-19-007  0         0   0    0    0    0      0  0     0       0    5
+  28115-19-009  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-012  0         3   0    0    0    0      0  0     0       0    0
+  28115-19-014  0         0   0    0    0    0      0  0     0       0    2
+  28115-19-016  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-017  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-019  0         0   0    0    0    0      0  0     0       0    3
+  28115-19-020  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-021  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-022  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-19-028  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-004  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-007  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-009  0         0   0    0    0    0      0  0     0       0    0
+  28115-20-010  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-001  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-002  0         0   0    0    0    0      0  0     0       0    4
+  28115-21-003  0         0   0    0    0    0      0  2     0       0    0
+  28115-21-006  0         0   0    0    0    2      0  0     0       0    0
+  28115-21-007  0         0   0    0    0    0      0  3     0       0    0
+  28115-21-009  0         0   0    0    0    0      3  0     0       0    0
+  28115-21-011  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-013  0         0   0    0    0    0      0  0     0       0    4
+  28115-21-014  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-015  0         0   0    0    0    0      0  0     0       2    0
+  28115-21-016  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-019  0         0   0    0    0    0      0  1     0       0    0
+  28115-21-020  0         0   3    0    0    0      0  0     0       0    0
+  28115-21-021  0         0   0    0    0    0      0  3     0       0    0
+  28115-21-022  0         0   0    0    0    1      0  0     0       0    0
+  28115-21-024  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-025  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-026  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-027  0         0   0    0    0    0      0  0     0       0    0
+  28115-21-028  0         0   0    0    0    0      0  0     0       0    1
+              
+               16,3  3 3,5 3,7  5
+  28115-16-001    0  0   0   0  0
+  28115-16-004    0  1   0   0  0
+  28115-16-010    0  0   0   0  0
+  28115-16-014    0  1   0   0  0
+  28115-16-015    0 12   0   0  0
+  28115-16-017    0  0   0   0  0
+  28115-16-020    0  0   1   0  0
+  28115-16-021    0  9   0   0  0
+  28115-16-023    0  1   0   0  0
+  28115-16-025    0  1   0   0  0
+  28115-16-026    0 10   0   0  0
+  28115-16-027    0  3   0   0  0
+  28115-16-029    0  0   0   0  0
+  28115-16-033    0  2   0   0  0
+  28115-16-035    0  1   0   0  0
+  28115-17-001    0  0   0   0  0
+  28115-17-002    0  9   0   0  0
+  28115-17-006    0  1   0   0  0
+  28115-17-008    0  9   0   0  0
+  28115-17-009    0  1   0   0  0
+  28115-17-010    0  5   0   0  0
+  28115-17-011    0  9   0   0  0
+  28115-17-012    0 10   0   0  0
+  28115-17-016    0  4   0   0  0
+  28115-17-017    0  5   0   0  0
+  28115-17-019    0  9   0   0  0
+  28115-17-021    0  1   0   0  0
+  28115-17-022    0  1   0   0  0
+  28115-17-023    0  0   0   0  0
+  28115-17-024    0  0   0   4  0
+  28115-17-025    0  2   0   0  0
+  28115-17-027    0  8   0   0  0
+  28115-17-030    0  0   0   0  0
+  28115-17-031    0  0   0   0  0
+  28115-17-032    0  0   0   0  0
+  28115-17-036    0  7   0   0  0
+  28115-17-039    0  2   0   0  0
+  28115-17-040    0  0   0   0  0
+  28115-17-045    0  0   1   0  0
+  28115-17-046    0  0   0   0  0
+  28115-17-047    0  3   0   0  0
+  28115-17-048    0  2   0   0  0
+  28115-17-050    0  3   0   0  0
+  28115-17-051    0  9   0   0  0
+  28115-17-052    0  0   0   0  3
+  28115-18-001    0  0   0   0  0
+  28115-18-002    0  0   0   0  0
+  28115-18-004    0  2   0   0  0
+  28115-18-006    0  0   0   0  0
+  28115-18-009    0  0   0   0  0
+  28115-18-011    0  5   0   0  0
+  28115-18-014    0  2   0   0  0
+  28115-18-015    0  5   0   0  0
+  28115-18-017    0  0   0   0  0
+  28115-18-020    0  8   0   0  0
+  28115-18-021    0  0   0   0  0
+  28115-18-022    0  0   0  12  0
+  28115-18-023    0  3   0   0  0
+  28115-18-024    0  0   0   2  0
+  28115-18-027    0  1   0   0  0
+  28115-18-028    0  0   0   0  0
+  28115-18-029    0  0   0   0  0
+  28115-18-030    0  2   0   0  0
+  28115-18-031    0  3   0   0  0
+  28115-18-032    0  6   0   0  0
+  28115-18-034    0  1   0   0  0
+  28115-19-001    0  0   0   0  0
+  28115-19-002    0  2   0   0  0
+  28115-19-003    0  5   0   0  0
+  28115-19-004    0  1   0   0  0
+  28115-19-005    0  3   0   0  0
+  28115-19-006    0  0   0   0  0
+  28115-19-007    0  0   0   0  0
+  28115-19-009    0  6   0   0  0
+  28115-19-011    0  1   0   0  0
+  28115-19-012    0  0   0   0  0
+  28115-19-014    0  0   0   0  0
+  28115-19-016    0  2   0   0  0
+  28115-19-017    0  2   0   0  0
+  28115-19-019    0  0   0   0  0
+  28115-19-020    0  2   0   0  0
+  28115-19-021    0  4   0   0  0
+  28115-19-022    0  2   0   0  0
+  28115-19-025    0  6   0   0  0
+  28115-19-028    0  2   0   0  0
+  28115-20-004    0  2   0   0  0
+  28115-20-007    0  2   0   0  0
+  28115-20-009    0  4   0   0  0
+  28115-20-010    0  1   0   0  0
+  28115-21-001    0  1   0   0  0
+  28115-21-002    0  0   0   0  0
+  28115-21-003    0  0   0   0  0
+  28115-21-006    0  0   0   0  0
+  28115-21-007    0  0   0   0  0
+  28115-21-009    0  0   0   0  0
+  28115-21-011    0  1   0   0  0
+  28115-21-013    0  0   0   0  0
+  28115-21-014    0  2   0   0  0
+  28115-21-015    0  0   0   0  0
+  28115-21-016    0  8   0   0  0
+  28115-21-019    0  0   0   0  0
+  28115-21-020    0  0   0   0  0
+  28115-21-021    0  0   0   0  0
+  28115-21-022    0  0   0   0  0
+  28115-21-024    0  2   0   0  0
+  28115-21-025    0  2   0   0  0
+  28115-21-026    2  0   0   0  0
+  28115-21-027    0  2   0   0  0
+  28115-21-028    0  0   0   0  0
+
+
histology_summary <- subset_data |>
+  distinct(participant_id, final_histology) |>  # Get unique participant-histology combinations
+  group_by(final_histology) |>  # Group by histology type
+  summarise(count = n())  # Count the number of participants per histology type
+
+# View the summary table
+print(histology_summary) #the different combinations add up to 109. There are a decent number who had both ductal and lobular histology 
+
+
# A tibble: 16 × 2
+   final_histology count
+   <chr>           <int>
+ 1 1                   1
+ 2 1,13,14,3           1
+ 3 1,3                 6
+ 4 11,3                1
+ 5 12,3                1
+ 6 13,3                4
+ 7 13,3,5              1
+ 8 14                 13
+ 9 14,15               1
+10 14,15,3             1
+11 14,3                7
+12 16,3                1
+13 3                  65
+14 3,5                 2
+15 3,7                 3
+16 5                   1
+
+
#Categorize histology as Ductal, Lobular, Both, or Other (code 3 = ductal, 14 = lobular)
+# Match whole comma-separated codes so that, e.g., code 13 is not counted as ductal
+subset_data <- subset_data |>
+  mutate(histology_category = case_when(
+    grepl("(^|,)3(,|$)", as.character(final_histology)) & grepl("(^|,)14(,|$)", as.character(final_histology)) ~ "Both Ductal and Lobular",  # Both Ductal and Lobular
+    grepl("(^|,)3(,|$)", as.character(final_histology)) ~ "Ductal",  # Ductal
+    grepl("(^|,)14(,|$)", as.character(final_histology)) ~ "Lobular",  # Lobular
+    TRUE ~ "Other"  # Any other combination
+  ))
+
+# Count the number of participants in each histology category
+histology_counts <- subset_data |>
+  group_by(histology_category) |>
+  summarise(count = n_distinct(participant_id))  # Count distinct participants
+
+# View the counts -- adds up to 109! 
+print(histology_counts)
+
+
# A tibble: 4 × 2
+  histology_category      count
+  <chr>                   <int>
+1 Both Ductal and Lobular     9
+2 Ductal                     84
+3 Lobular                    14
+4 Other                       2
+
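Matching codes with a plain substring test such as `grepl("3", ...)` also matches multi-digit codes that contain the digit, such as 13. With the combinations in this data set every string containing "13" also contains code 3, so the counts above are unaffected, but a token-based check is safer if new codes appear. `has_code()` and `categorize()` are hypothetical helpers, and the code strings are toy examples.

```r
# Hypothetical helper: TRUE where the comma-separated code list contains `code` exactly
has_code <- function(x, code) {
  vapply(strsplit(as.character(x), ",", fixed = TRUE),
         function(tokens) code %in% trimws(tokens),
         logical(1))
}

# Hypothetical helper reproducing the four categories (3 = ductal, 14 = lobular)
categorize <- function(x) {
  ductal  <- has_code(x, "3")
  lobular <- has_code(x, "14")
  ifelse(ductal & lobular, "Both Ductal and Lobular",
         ifelse(ductal, "Ductal", ifelse(lobular, "Lobular", "Other")))
}

codes <- c("3", "13", "1,13,14,3", "14,15", "16,3")
grepl("3", codes)      # substring match: also TRUE for "13"
has_code(codes, "3")   # exact token match: FALSE for "13"
categorize(codes)
```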
+
#contingency table 
+library(tidyr)
+contingency_table <- subset_data |>
+  distinct(participant_id, histology_category, dtc_ever) |>  # Ensure each patient is counted once
+  count(histology_category, dtc_ever) |>
+  pivot_wider(names_from = dtc_ever, values_from = n, values_fill = list(n = 0))  # Pivot the table to get dtc_ever as columns
+
+# 3. Perform the Chi-squared test of independence
+chisq_test <- chisq.test(contingency_table[,-1])  # Remove the histology_category column for the test
+
+
Warning in chisq.test(contingency_table[, -1]): Chi-squared approximation may
+be incorrect
+
+
# 4. Print the contingency table
+print(contingency_table) 
+
+
# A tibble: 4 × 3
+  histology_category        `0`   `1`
+  <chr>                   <int> <int>
+1 Both Ductal and Lobular     9     0
+2 Ductal                     48    36
+3 Lobular                    11     3
+4 Other                       2     0
+
+
# 5. Print the result of the Chi-squared test: p = 0.03 ### DTC positivity is more common in ductal tumors (36/84) than in the other histology groups
+print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table[, -1]
+X-squared = 9.2145, df = 3, p-value = 0.02657
+
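Two histology groups (Both and Other) contain no DTC-positive participants and only 11 participants between them, so the chi-squared p-value of 0.027 rests partly on small expected counts. The sketch re-enters the counts above, prints row proportions, and shows two alternatives that do not rely on the large-sample approximation: a Monte Carlo p-value (`simulate.p.value = TRUE`) and Fisher's exact test. The seed and `B = 10000` are arbitrary choices, and `histology_tab` is a name introduced here.

```r
# Counts copied from the histology contingency table above
histology_tab <- matrix(c(9, 48, 11, 2,   # dtc_ever = 0
                          0, 36,  3, 0),  # dtc_ever = 1
                        ncol = 2,
                        dimnames = list(histology = c("Both Ductal and Lobular", "Ductal",
                                                      "Lobular", "Other"),
                                        dtc_ever = c("0", "1")))

round(prop.table(histology_tab, margin = 1), 2)  # DTC positivity within each group

res <- suppressWarnings(chisq.test(histology_tab))
round(res$expected, 2)                           # three cells are below 5

set.seed(2024)                                   # arbitrary seed for reproducibility
mc <- chisq.test(histology_tab, simulate.p.value = TRUE, B = 10000)
mc$p.value                                       # Monte Carlo p-value

fisher.test(histology_tab)$p.value               # exact test on the 4 x 2 table
```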
+
#### Stage -- N stage  --> come back to this N stage stuff 
+
+table(subset_data$participant_id, subset_data$final_n_stage) #we have 0, 1, 2, 3 (N0, N1, N2, N3)
+
+
              
+                0  1  2  3
+  28115-16-001  0  0  0  5
+  28115-16-004  1  0  0  0
+  28115-16-010  0  0  0  1
+  28115-16-014  1  0  0  0
+  28115-16-015 12  0  0  0
+  28115-16-017  0  0  3  0
+  28115-16-020  0  0  1  0
+  28115-16-021  0  0  9  0
+  28115-16-023  1  0  0  0
+  28115-16-025  1  0  0  0
+  28115-16-026 10  0  0  0
+  28115-16-027  0  3  0  0
+  28115-16-029  2  0  0  0
+  28115-16-033  0  2  0  0
+  28115-16-035  1  0  0  0
+  28115-17-001  0  8  0  0
+  28115-17-002  9  0  0  0
+  28115-17-006  0  1  0  0
+  28115-17-008  9  0  0  0
+  28115-17-009  1  0  0  0
+  28115-17-010  5  0  0  0
+  28115-17-011  0  0  0  9
+  28115-17-012  0  0  0 10
+  28115-17-016  0  4  0  0
+  28115-17-017  0  5  0  0
+  28115-17-019  9  0  0  0
+  28115-17-021  1  0  0  0
+  28115-17-022  1  0  0  0
+  28115-17-023  0  0  2  0
+  28115-17-024  4  0  0  0
+  28115-17-025  2  0  0  0
+  28115-17-027  0  8  0  0
+  28115-17-030  3  0  0  0
+  28115-17-031  5  0  0  0
+  28115-17-032  0  0 10  0
+  28115-17-036  7  0  0  0
+  28115-17-039  2  0  0  0
+  28115-17-040  0  0  4  0
+  28115-17-045  1  0  0  0
+  28115-17-046 10  0  0  0
+  28115-17-047  0  3  0  0
+  28115-17-048  0  0  2  0
+  28115-17-050  3  0  0  0
+  28115-17-051  9  0  0  0
+  28115-17-052  3  0  0  0
+  28115-18-001  0  0  7  0
+  28115-18-002  0  2  0  0
+  28115-18-004  0  0  2  0
+  28115-18-006  0  1  0  0
+  28115-18-009  1  0  0  0
+  28115-18-011  0  5  0  0
+  28115-18-014  0  2  0  0
+  28115-18-015  5  0  0  0
+  28115-18-017  0  1  0  0
+  28115-18-020  8  0  0  0
+  28115-18-021  0  8  0  0
+  28115-18-022 12  0  0  0
+  28115-18-023  0  3  0  0
+  28115-18-024  0  2  0  0
+  28115-18-027  0  1  0  0
+  28115-18-028  1  0  0  0
+  28115-18-029  0  4  0  0
+  28115-18-030  2  0  0  0
+  28115-18-031  0  3  0  0
+  28115-18-032  0  6  0  0
+  28115-18-034  1  0  0  0
+  28115-19-001  0  0  0  1
+  28115-19-002  0  2  0  0
+  28115-19-003  0  5  0  0
+  28115-19-004  0  1  0  0
+  28115-19-005  3  0  0  0
+  28115-19-006  0  8  0  0
+  28115-19-007  0  5  0  0
+  28115-19-009  0  0  0  6
+  28115-19-011  0  1  0  0
+  28115-19-012  0  3  0  0
+  28115-19-014  0  0  0  2
+  28115-19-016  2  0  0  0
+  28115-19-017  2  0  0  0
+  28115-19-019  0  3  0  0
+  28115-19-020  2  0  0  0
+  28115-19-021  0  4  0  0
+  28115-19-022  0  2  0  0
+  28115-19-025  0  6  0  0
+  28115-19-028  2  0  0  0
+  28115-20-004  2  0  0  0
+  28115-20-007  0  0  2  0
+  28115-20-009  4  0  0  0
+  28115-20-010  0  1  0  0
+  28115-21-001  0  1  0  0
+  28115-21-002  0  4  0  0
+  28115-21-003  0  0  2  0
+  28115-21-006  0  2  0  0
+  28115-21-007  0  0  3  0
+  28115-21-009  0  0  3  0
+  28115-21-011  1  0  0  0
+  28115-21-013  0  4  0  0
+  28115-21-014  0  2  0  0
+  28115-21-015  0  2  0  0
+  28115-21-016  8  0  0  0
+  28115-21-019  0  1  0  0
+  28115-21-020  0  3  0  0
+  28115-21-021  0  3  0  0
+  28115-21-022  1  0  0  0
+  28115-21-024  2  0  0  0
+  28115-21-025  0  2  0  0
+  28115-21-026  0  2  0  0
+  28115-21-027  2  0  0  0
+  28115-21-028  1  0  0  0
+
+
nodal_summary <- subset_data |>
+  distinct(participant_id, final_n_stage) |>  # Get unique participant-stage combinations
+  group_by(final_n_stage) |>  # Group by stage
+  summarise(count = n())  # Count the number of participants per nodal stage
+
+# View the summary table: adds up to 109; 46 are N0 and 63 are node-positive (43 N1, 13 N2, 7 N3)
+print(nodal_summary)
+
+
# A tibble: 4 × 2
+  final_n_stage count
+          <int> <int>
+1             0    46
+2             1    43
+3             2    13
+4             3     7
+
+
subset_data_by_id <- subset_data |>
+  filter(final_n_stage %in% c(0, 1, 2, 3)) |>  # Include only relevant nodal stages
+  group_by(participant_id) |>
+  summarise(
+    nodal_status = first(final_n_stage),  # Use final_n_stage as nodal_status for each participant
+    dtc_ever = first(dtc_ever),       # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 3: Create a contingency table of nodal_status vs dtc_ever
+contingency_table <- table(subset_data_by_id$nodal_status, subset_data_by_id$dtc_ever)
+
+# Step 4: Check if any cells in the contingency table have zero counts, which could affect test validity
+print(contingency_table)
+
+
   
+     0  1
+  0 24 22
+  1 32 11
+  2 10  3
+  3  4  3
+
+
# Step 5: Perform Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 6: Print the Chi-squared test result: p = 0.12, not significant
+print(chisq_test) 
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 5.9169, df = 3, p-value = 0.1157
+
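N stage is ordered (N0 < N1 < N2 < N3), and the 4 x 2 chi-squared test above ignores that ordering. A chi-squared test for trend in proportions, `prop.trend.test()` in base R's stats package, is one alternative; the sketch uses the counts from the table above with equally spaced scores 1 to 4, which is an assumption. DTC positivity is not monotone across stages here (highest in N0, second highest in N3), so a trend test is not guaranteed to be more informative.

```r
# DTC-positive counts and totals per N stage, from the table above
dtc_pos <- c(N0 = 22, N1 = 11, N2 = 3, N3 = 3)
n_total <- c(N0 = 46, N1 = 43, N2 = 13, N3 = 7)

round(dtc_pos / n_total, 2)          # proportion DTC-positive per stage

# Chi-squared test for a linear trend in proportions (scores 1, 2, 3, 4 by default)
trend <- prop.trend.test(dtc_pos, n_total)
trend
```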
+
#### Creating Node - vs node + variable from summary variable  
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    node_status = ifelse(first(final_n_stage) == 0, "Node Negative", "Node Positive"),  # Node negative if 0, positive otherwise
+    dtc_ever = first(dtc_ever),  # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of node_status vs dtc_ever
+contingency_table <- table(subset_data_by_id$node_status, subset_data_by_id$dtc_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Step 4: Print the contingency table and Chi-squared test results
+print(contingency_table)
+
+
               
+                 0  1
+  Node Negative 24 22
+  Node Positive 46 17
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 4.1601, df = 1, p-value = 0.04139
+
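The p-value alone does not show the direction of the association. Re-entering the node-status counts above (`node_tab` and `dtc_rate` are names introduced here) shows that DTC positivity was higher in node-negative participants (22/46) than in node-positive participants (17/63); `fisher.test()` adds an odds ratio with a 95% confidence interval. In this row/column orientation, an odds ratio below 1 corresponds to lower odds of DTC positivity in the node-positive group.

```r
# Counts copied from the node status vs dtc_ever table above
node_tab <- matrix(c(24, 46, 22, 17), nrow = 2,
                   dimnames = list(node_status = c("Node Negative", "Node Positive"),
                                   dtc_ever = c("0", "1")))

# Proportion DTC-positive within each node-status group
dtc_rate <- prop.table(node_tab, margin = 1)[, "1"]
round(dtc_rate, 3)

chisq.test(node_tab)   # should match the Yates-corrected result above
fisher.test(node_tab)  # conditional MLE odds ratio and exact 95% CI
```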
+
####### EXTRA CODE/CONFIRMATION / slightly different but ignore for our analysis 
+#cross-check with the indicator inc_dx_crit_list___2 in our data, which reflects nodal positivity.... there is 1 patient that is node - by summary variable but node + by indicator variable 
+## should double check this at some point 
+node_pos <- subset_data |> 
+  distinct(participant_id, inc_dx_crit_list___2) |>  # Get unique participant-indicator combinations
+  group_by(inc_dx_crit_list___2) |>  # Group by indicator value
+  summarise(count = n())  # Count the number of participants per indicator value
+
+print(node_pos)
+
+
# A tibble: 2 × 2
+  inc_dx_crit_list___2 count
+                 <int> <int>
+1                    0    45
+2                    1    64
+
+
contingency_table <- subset_data |>
+  distinct(participant_id, inc_dx_crit_list___2, dtc_ever) |>  # Ensure unique participants
+  count(inc_dx_crit_list___2, dtc_ever) |>  # Count occurrences
+  spread(key = dtc_ever, value = n, fill = 0)  # Spread data into a matrix
+
+# View the contingency table
+print(contingency_table)
+
+
# A tibble: 2 × 3
+  inc_dx_crit_list___2   `0`   `1`
+                 <int> <dbl> <dbl>
+1                    0    25    20
+2                    1    45    19
+
+
# Perform the Chi-square test: p = 0.17
+chi_square_result <- chisq.test(contingency_table[, -1])  # Exclude the first column with the levels
+print(chi_square_result)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table[, -1]
+X-squared = 1.903, df = 1, p-value = 0.1677
+
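The comment above notes one participant who is node-negative by `final_n_stage` but flagged by `inc_dx_crit_list___2`. Assuming `inc_dx_crit_list___2 == 1` marks node-positive disease, as that comment suggests, a per-participant comparison can list the discordant IDs. The sketch uses hypothetical toy data; `check`, the IDs, and the values are made up.

```r
# Hypothetical per-participant data, one row each (not study data)
check <- data.frame(
  participant_id       = c("p1", "p2", "p3", "p4"),
  final_n_stage        = c(0, 1, 0, 2),
  inc_dx_crit_list___2 = c(0, 1, 1, 1),
  stringsAsFactors     = FALSE
)

node_pos_stage     <- check$final_n_stage > 0           # N1-N3 treated as node-positive
node_pos_indicator <- check$inc_dx_crit_list___2 == 1   # assumed node-positive flag

# IDs where the two definitions disagree; %in% TRUE drops NA comparisons
discordant <- check$participant_id[(node_pos_stage != node_pos_indicator) %in% TRUE]
discordant
```

On the real data, `check` would be the per-participant summary built with `distinct(participant_id, final_n_stage, inc_dx_crit_list___2)`.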
+
#######t stage final_t_stage 
+
+table(subset_data$final_t_stage) #we have values 1, 2, 3, 4, and 99 (99 = pTx) so can proceed with this 
+
+

+  1   2   3   4  99 
+173 168  46  10   1 
+
+
t_summary <- subset_data |>
+  distinct(participant_id, final_t_stage) |>  # Get unique participant-stage combinations
+  group_by(final_t_stage) |>  # Group by stage
+  summarise(count = n())  # Count the number of participants per T stage
+
+# View the summary table: adds up to 109; most are T1 (51) or T2 (44), with 12 T3, 1 T4, and 1 pTx (99)
+print(t_summary)
+
+
# A tibble: 5 × 2
+  final_t_stage count
+          <int> <int>
+1             1    51
+2             2    44
+3             3    12
+4             4     1
+5            99     1
+
+
#### T stage, for our T stage table, will use T1 vs T2 or greater to simplify 
+#exclude 99 (the pTx) 
+subset_data_clean <- subset_data |>
+  filter(final_t_stage != 99, dtc_ever != 99)
+
+# Combine final_t_stage into T1 vs. T2 or greater
+subset_data_clean <- subset_data_clean |>
+  mutate(final_t_stage_combined = ifelse(final_t_stage == 1, "T1", "T2 or greater"))
+
+# Summarize the data by participant_id after creating the new combined t_stage
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    final_t_stage_combined = first(final_t_stage_combined),  # Get the combined t status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of final_t_stage_combined vs dtc_ever
+contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results
+print(contingency_table)
+
+
               
+                 0  1
+  T1            34 17
+  T2 or greater 35 22
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.13531, df = 1, p-value = 0.713
+
+
#### Try T stage stats using T3 or greater as the cutoff -- not useful, so DO NOT USE THIS FOR THE TABLE
+
+#exclude 99 (the pTx) 
+subset_data_clean <- subset_data |>
+  filter(final_t_stage != 99, dtc_ever != 99)
+
+# Combine final_t_stage into T1/T2 or T3 or greater
+subset_data_clean <- subset_data_clean |>
+  mutate(final_t_stage_combined = case_when(
+    final_t_stage == 1 | final_t_stage == 2 ~ "T1 or T2",  # Group T1 and T2 together
+    final_t_stage >= 3 ~ "T3 or greater",  # Group T3 and higher as a separate category
+    TRUE ~ NA_character_  # Handle any unexpected values
+  ))
+
+
+# Summarize the data by participant_id after creating the new combined t_stage
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    final_t_stage_combined = first(final_t_stage_combined),  # Get the combined t status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of final_t_stage_combined vs dtc_ever
+contingency_table <- table(subset_data_by_id$final_t_stage_combined, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> not significant so ignore this 
+print(contingency_table)
+
+
               
+                 0  1
+  T1 or T2      61 34
+  T3 or greater  8  5
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 1.4397e-31, df = 1, p-value = 1
+
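The X-squared of 1.4397e-31 and p-value of 1 above are not an error. In R's `chisq.test()`, the Yates correction subtracts `min(0.5, |O - E|)` from each `|O - E|`, and in this table every `|O - E|` is about 0.31, so the corrected statistic collapses to zero up to floating-point error. The sketch re-enters the counts to show this and compares the uncorrected test and Fisher's exact test; `t_tab`, `yates`, and `plain` are names introduced here.

```r
# Counts copied from the T1/T2 vs T3+ table above
t_tab <- matrix(c(61, 8, 34, 5), nrow = 2,
                dimnames = list(t_stage = c("T1 or T2", "T3 or greater"),
                                dtc_ever = c("0", "1")))

yates <- suppressWarnings(chisq.test(t_tab))                   # default Yates correction
plain <- suppressWarnings(chisq.test(t_tab, correct = FALSE))  # no correction

round(abs(t_tab - yates$expected), 3)  # each |O - E| is about 0.306, below 0.5
c(yates = yates$p.value, plain = plain$p.value)
fisher.test(t_tab)$p.value
```

All three approaches agree that there is no evidence of an association at this cutoff.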
+
########stage of disease -- final_overall_stage 
+
+table(subset_data$final_overall_stage) #we have values 1, 2, 3, and 99 (no stage 4), so can proceed with this 
+
+

+  1   2   3  99 
+124 167 105   2 
+
+
stage_summary <- subset_data |>
+  distinct(participant_id, final_overall_stage) |>  # Get unique participant-stage combinations
+  group_by(final_overall_stage) |>  # Group by stage
+  summarise(count = n())  # Count the number of participants per overall stage
+
+# View the summary table: adds up to 109 (1 with stage 99); most are stage II (47), with 35 stage I, 26 stage III, and no stage IV (yay)
+print(stage_summary)
+
+
# A tibble: 4 × 2
+  final_overall_stage count
+                <int> <int>
+1                   1    35
+2                   2    47
+3                   3    26
+4                  99     1
+
+
#exclude the 99 
+subset_data_clean <- subset_data |>
+  filter(final_overall_stage != 99, dtc_ever != 99)
+
+# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    final_overall_stage = first(final_overall_stage),  # Get the final_overall_stage for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of final_overall_stage vs dtc_ever
+contingency_table <- table(subset_data_by_id$final_overall_stage, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> interestingly, stage does not appear to predict DTC positivity (p = 0.80)
+print(contingency_table)
+
+
   
+     0  1
+  1 22 13
+  2 29 18
+  3 18  8
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 0.43515, df = 2, p-value = 0.8045
+
+
###########surgical treatment diag_surgery_type_1 (or diag_surgery_type_2) 
+
+
+table(subset_data$diag_surgery_type_1) #1= partial mastectomy/lumpectomy, 2 = mastectomy. no missingness 
+
+

+  1   2 
+158 240 
+
+
surgery <- subset_data |>
+  distinct(participant_id, diag_surgery_type_1) |>  # Get unique participant-surgery combinations
+  group_by(diag_surgery_type_1) |>  # Group by surgery type
+  summarise(count = n())  # Count the number of participants per surgery type
+
+# View the summary table
+print(surgery)
+
+
# A tibble: 2 × 2
+  diag_surgery_type_1 count
+                <int> <int>
+1                   1    45
+2                   2    64
+
+
# Summarize the data by participant_id
+# NOTE: subset_data_clean still carries the stage-99 exclusion from the previous step,
+# so this table has 108 participants rather than 109; use subset_data to include everyone
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    surgery = first(diag_surgery_type_1),  # Get the surgery type for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of surgery type vs dtc_ever
+contingency_table <- table(subset_data_by_id$surgery, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.48....
+print(contingency_table)
+
+
   
+     0  1
+  1 31 14
+  2 38 25
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.50569, df = 1, p-value = 0.477
+
+
######## axillary management. Really just care about whether or not someone got an axillary dissection. diag_axillary_type___2_1 (or diag_axillary_type___2_2 IF 2 biopsy forms)
+
+table(subset_data$diag_axillary_type___2_1) 
+
+

+  0   1 
+215 183 
+
+
table(subset_data$diag_axillary_type___2_2) #a handful of axillary dissections on the second biopsy form, so I think we want to combine these two 
+
+

+ 0  1 
+16  4 
+
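The second-form table above counts only 20 rows, so `diag_axillary_type___2_2` is missing for most rows. In R, `NA | TRUE` is `TRUE` but `NA | FALSE` is `NA`, so `ifelse(a == 1 | b == 1, 1, 0)` returns `NA` for participants whose first form shows no dissection and whose second form is missing. The toy vectors below (made up for illustration) show this, plus one NA-safe alternative using `%in%` that gives the same 0/1 coding as a `case_when()` with a `TRUE ~ 0` fallback.

```r
first_form  <- c(1, 0, 0, NA)    # diag_axillary_type___2_1 (toy values)
second_form <- c(NA, NA, 1, NA)  # diag_axillary_type___2_2, mostly missing (toy values)

first_form == 1 | second_form == 1                 # TRUE NA TRUE NA
ifelse(first_form == 1 | second_form == 1, 1, 0)   # NA propagates: 1 NA 1 NA

# NA-safe: %in% returns FALSE (not NA) for missing values
axillary_dissection <- as.integer(first_form %in% 1 | second_form %in% 1)
axillary_dissection                                # 1 0 1 0
```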
+
# Create a binary variable flagging participants who had an axillary dissection on either biopsy form
+# 0 = no axillary dissection (including missing values), 1 = had an axillary dissection
+subset_data_clean <- subset_data |>
+  mutate(axillary_dissection = case_when(
+    diag_axillary_type___2_1 == 1 | diag_axillary_type___2_2 == 1 ~ 1,  # Had axillary dissection
+    TRUE ~ 0  # No axillary dissection (includes missing values)
+  ))
+
+# Summarize the data by participant_id, including the axillary_dissection and dtc_ever variables
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    axillary_dissection = first(axillary_dissection),  # Get the axillary dissection status for each participant
+    dtc_ever = first(dtc_ever)  # Get the dtc_ever status for each participant
+  )
+
+contingency_table <- table(subset_data_by_id$axillary_dissection, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+fishers <- fisher.test(contingency_table)
+print(fishers)
+
+

+    Fisher's Exact Test for Count Data
+
+data:  contingency_table
+p-value = 0.1649
+alternative hypothesis: true odds ratio is not equal to 1
+95 percent confidence interval:
+ 0.2309022 1.3129062
+sample estimates:
+odds ratio 
+ 0.5559943 
+
+
# Print the contingency table and Chi-squared test results --> chi-squared p = 0.20 (Fisher's exact p = 0.16; chi-squared reported for consistency)
+print(contingency_table)
+
+
   
+     0  1
+  0 31 23
+  1 39 16
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 1.614, df = 1, p-value = 0.2039
+
+
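The axillary-dissection flag combines two form variables that are often missing. The toy vectors below use invented values to show why the `case_when()` version codes participants with missing data as 0. A plain `ifelse()` returns `NA` whenever the `|` comparison itself evaluates to `NA`.

```r
library(dplyr)

# Invented form values: 1 = axillary dissection, 0 = none, NA = form missing
form1 <- c(1, 0, 0, NA)
form2 <- c(NA, NA, 1, NA)

# ifelse(): FALSE | NA and NA | NA are NA, so rows 2 and 4 become NA
flag_ifelse <- ifelse(form1 == 1 | form2 == 1, 1, 0)

# case_when(): NA conditions are not TRUE, so those rows fall through to 0
flag_case_when <- case_when(
  form1 == 1 | form2 == 1 ~ 1,
  TRUE ~ 0
)

print(flag_ifelse)
print(flag_case_when)
```

Coding missing as 0 assumes that an unrecorded dissection means no dissection. That choice is stated in the analysis's own comment, and it is worth confirming against the source forms.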
####inflammatory inflamm_yn -- IGNORE THIS for Table 1 
+table(d$inflamm_yn_1) ### the inflammatory cases are probably just not in the DTC subset (6 yes's in the overall cohort, but none in the subset data for either inflamm variable)
+
+

+  0   1 
+568  11 
+
+
table(d$inflamm_yn_2)  ### I think inflammatory folks just not in subset of patients in the dtc cohort 
+
+

+ 0 
+24 
+
+
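table(subset_data$inflamm_yn) # no column is named inflamm_yn (only inflamm_yn_1 / inflamm_yn_2), hence the warning and empty table below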
+
+
Warning: Unknown or uninitialised column: `inflamm_yn`.
+
+
+
< table of extent 0 >
+
+
#### radiation prtx_radiation 
+table(subset_data$prtx_radiation) 
+
+

+  0   1 
+116 282 
+
+
radiation <- subset_data |> 
+  distinct(participant_id,prtx_radiation) |> 
+  group_by(prtx_radiation) |>  # Group by radiation status
+  summarise(count = n())  # Count the number of participants per radiation status
+
+# View the summary table
+print(radiation)
+
+
# A tibble: 2 × 2
+  prtx_radiation count
+           <int> <int>
+1              0    34
+2              1    75
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    radiation = first(prtx_radiation),  # Get the radiation status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of radiation vs dtc_ever
+contingency_table <- table(subset_data_by_id$radiation, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+fishers <- fisher.test(contingency_table)
+print(fishers)
+
+

+    Fisher's Exact Test for Count Data
+
+data:  contingency_table
+p-value = 0.6709
+alternative hypothesis: true odds ratio is not equal to 1
+95 percent confidence interval:
+ 0.4916844 3.2745694
+sample estimates:
+odds ratio 
+  1.243166 
+
+
# Print the contingency table and Chi-squared test results --> p-val  = 0.77 
+print(contingency_table)
+
+
   
+     0  1
+  0 23 11
+  1 47 28
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.0823, df = 1, p-value = 0.7742
+
+
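The odds ratio printed by `fisher.test()` is a conditional maximum-likelihood estimate, so it differs slightly from the familiar sample odds ratio ad/bc. The sketch below rebuilds the radiation table from the counts printed above and computes both. With rows = radiation and columns = dtc_ever, both estimates compare the odds of being DTC-positive in irradiated versus non-irradiated participants.

```r
# Radiation (rows: prtx_radiation 0/1) by dtc_ever (columns 0/1),
# counts copied from the contingency table above
rad <- matrix(c(23, 47, 11, 28), nrow = 2,
              dimnames = list(radiation = c("0", "1"),
                              dtc_ever  = c("0", "1")))

# Sample (unconditional) odds ratio: (a * d) / (b * c)
sample_or <- (rad[1, 1] * rad[2, 2]) / (rad[1, 2] * rad[2, 1])

# Conditional MLE reported by fisher.test()
fisher_or <- unname(fisher.test(rad)$estimate)

round(c(sample = sample_or, fisher = fisher_or), 3)
```

Both estimates sit close to 1 here. The 95% confidence interval printed above (about 0.49 to 3.27) is the more informative summary of the uncertainty.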
#### chemotherapy prtx_chemo 
+table(subset_data$prtx_chemo) 
+
+

+  0   1 
+ 18 380 
+
+
chemo <- subset_data |> 
+  distinct(participant_id,prtx_chemo) |> 
+  group_by(prtx_chemo) |>  # Group by chemotherapy status
+  summarise(count = n())  # Count the number of participants per chemotherapy status
+
+# View the summary table
+print(chemo) # 3 participants did not receive chemotherapy in this cohort
+
+
# A tibble: 2 × 2
+  prtx_chemo count
+       <int> <int>
+1          0     3
+2          1   106
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    chemo = first(prtx_chemo),  # Get the chemotherapy status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of chemotherapy vs dtc_ever
+contingency_table <- table(subset_data_by_id$chemo, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
fishers <- fisher.test(contingency_table)
+print(fishers)
+
+

+    Fisher's Exact Test for Count Data
+
+data:  contingency_table
+p-value = 0.2906
+alternative hypothesis: true odds ratio is not equal to 1
+95 percent confidence interval:
+ 0.00448755 5.37725419
+sample estimates:
+odds ratio 
+ 0.2715663 
+
+
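# Print the contingency table and Chi-squared test results --> p-val = 0.60 (Fisher's exact p = 0.29 is more appropriate here given the small expected counts flagged in the warning above)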
+print(contingency_table)
+
+
   
+     0  1
+  0  1  2
+  1 69 37
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.27148, df = 1, p-value = 0.6023
+
+
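R warns that the chi-squared approximation "may be incorrect" when some expected cell counts are small. A common rule of thumb is to switch to Fisher's exact test when any expected count falls below 5. The sketch below applies that rule to the chemotherapy table, with counts copied from the output above. The threshold is a convention, not a hard rule.

```r
# Chemotherapy (rows: prtx_chemo 0/1) by dtc_ever (columns 0/1),
# counts copied from the contingency table above
chemo_tab <- matrix(c(1, 69, 2, 37), nrow = 2,
                    dimnames = list(chemo    = c("0", "1"),
                                    dtc_ever = c("0", "1")))

# Expected counts under independence; suppress the warning we are checking for
expected <- suppressWarnings(chisq.test(chemo_tab)$expected)
print(round(expected, 2))

# Rule of thumb: any expected count < 5 -> prefer Fisher's exact test
use_fisher <- any(expected < 5)
p_value <- if (use_fisher) {
  fisher.test(chemo_tab)$p.value
} else {
  chisq.test(chemo_tab)$p.value
}
p_value
```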
####neoadjuvant chemo diag_neoadj_chemo_1 or diag_neoadj_chemo_2 
+
+table(subset_data$diag_neoadj_chemo_1) 
+
+

+  0   1 
+327  71 
+
+
table(subset_data$diag_neoadj_chemo_2) # no one was recorded on the second biopsy form as having neoadjuvant therapy, so only the first variable is needed
+
+

+ 0 
+20 
+
+
nact <- subset_data |> 
+  distinct(participant_id,diag_neoadj_chemo_1) |> 
+  group_by(diag_neoadj_chemo_1) |>  # Group by neoadjuvant chemotherapy status
+  summarise(count = n())  # Count the number of participants per neoadjuvant chemotherapy status
+
+# View the summary table
+print(nact) # 19 of the 109 participants received neoadjuvant chemotherapy
+
+
# A tibble: 2 × 2
+  diag_neoadj_chemo_1 count
+                <int> <int>
+1                   0    90
+2                   1    19
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    nact = first(diag_neoadj_chemo_1),  # Get the neoadjuvant chemotherapy status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of neoadjuvant chemotherapy vs dtc_ever
+contingency_table <- table(subset_data_by_id$nact, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val = 0.37, a slightly stronger trend than with ctDNA
+print(contingency_table)
+
+
   
+     0  1
+  0 60 30
+  1 10  9
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.80344, df = 1, p-value = 0.3701
+
+
####hormone therapy prtx_endo 
+
+table(subset_data$prtx_endo) 
+
+

+  0   1 
+156 242 
+
+
endo <- subset_data |> 
+  distinct(participant_id,prtx_endo) |> 
+  group_by(prtx_endo) |>  # Group by endocrine therapy status
+  summarise(count = n())  # Count the number of participants per endocrine therapy status
+
+# View the summary table
+print(endo) # most participants received endocrine therapy (62 of the 109)
+
+
# A tibble: 2 × 2
+  prtx_endo count
+      <int> <int>
+1         0    47
+2         1    62
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    endo = first(prtx_endo),  # Get the endocrine therapy status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of endocrine therapy vs dtc_ever
+contingency_table <- table(subset_data_by_id$endo, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val  = 0.50 
+print(contingency_table)
+
+
   
+     0  1
+  0 28 19
+  1 42 20
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.46137, df = 1, p-value = 0.497
+
+
####bone modifying agents prtx_bonemod 
+
+table(subset_data$prtx_bonemod) 
+
+

+  0   1 
+238 160 
+
+
bonemod <- subset_data |> 
+  distinct(participant_id,prtx_bonemod) |> 
+  group_by(prtx_bonemod) |>  # Group by bone-modifying agent status
+  summarise(count = n())  # Count the number of participants per bone-modifying agent status
+
+# View the summary table
+print(bonemod) # 39 of the 109 participants received a bone-modifying agent
+
+
# A tibble: 2 × 2
+  prtx_bonemod count
+         <int> <int>
+1            0    70
+2            1    39
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    bonemod = first(prtx_bonemod),  # Get the bone-modifying agent status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of bonemod vs dtc_ever
+contingency_table <- table(subset_data_by_id$bonemod, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+# Print the contingency table and Chi-squared test results --> p-val  = 1 
+print(contingency_table)
+
+
   
+     0  1
+  0 45 25
+  1 25 14
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0, df = 1, p-value = 1
+
+
+
+
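This section runs several separate tests of treatment variables against DTC status, so some nominal p-values could look interesting by chance alone. The sketch below adjusts the chi-squared p-values reported above with base R's `p.adjust()`, showing Benjamini-Hochberg and Holm side by side. Which family of tests to adjust over is an analysis decision; here it is assumed to be these seven.

```r
# Chi-squared p-values reported above for treatment variables vs dtc_ever
# (for chemotherapy, Fisher's p = 0.2906 could be substituted)
p_raw <- c(
  surgery   = 0.477,
  axillary  = 0.2039,
  radiation = 0.7742,
  chemo     = 0.6023,
  nact      = 0.3701,
  endocrine = 0.497,
  bonemod   = 1
)

p_bh   <- p.adjust(p_raw, method = "BH")    # false discovery rate control
p_holm <- p.adjust(p_raw, method = "holm")  # family-wise error control

round(cbind(raw = p_raw, BH = p_bh, Holm = p_holm), 3)
```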
#pCR 
+#2 = non-pcr, 1 = pcr 
+#path cr diag_pcr_1 or diag_pcr_2 (as this could be on either of the two diagnosis and staging forms, there are 2 variables for this)
+table(subset_data$diag_pcr_1) 
+
+

+  .   1   2 
+327   8  63 
+
+
table(subset_data$diag_pcr_2) #none recorded here so can just use pcr_1 
+
+

+      . 
+378  20 
+
+
pcr <- subset_data |>
+  mutate(diag_pcr_1 = na_if(diag_pcr_1, ".")) |>  # Convert "." to NA
+  filter(!is.na(diag_pcr_1)) |>  # Exclude rows where diag_pcr_1 is NA
+  distinct(participant_id, diag_pcr_1) |>
+  group_by(diag_pcr_1) |>
+  summarise(count = n()) # Count the number of participants per pCR status
+
+# View the summary table
+print(pcr) #this is correct bc 19 ppl got NACT, and we have 19 ppl recorded for pcr data 
+
+
# A tibble: 2 × 2
+  diag_pcr_1 count
+  <chr>      <int>
+1 1              1
+2 2             18
+
+
# Summarize the data by participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    pcr = first(diag_pcr_1),  # Get the pCR status for each participant
+    dtc_ever = first(dtc_ever)  # Get dtc_ever status for each participant
+  )
+
+# Create a contingency table of pcr vs dtc_ever
+contingency_table <- table(subset_data_by_id$pcr, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results --> p-val = 0.27, but interpret with caution: the table includes the "." row (no neoadjuvant chemo, so pCR not assessed), only 19 participants had pCR evaluated (1 pCR, 18 non-pCR), and the pCR group contains a single participant
+print(contingency_table)
+
+
   
+     0  1
+  . 60 30
+  1  0  1
+  2 10  8
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test
+
+data:  contingency_table
+X-squared = 2.6174, df = 2, p-value = 0.2702
+
+
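The pCR table above includes a "." row for participants who did not receive neoadjuvant chemotherapy and so were never assessed for pCR. The 3 × 2 chi-squared test therefore mostly compares "." against the assessed groups. The sketch below uses counts copied from the output above, drops that row, and applies Fisher's exact test to the 19 participants with pCR data. On the data frame itself, `filter(pcr %in% c("1", "2"))` before building the table would do the same.

```r
# pCR (rows: diag_pcr_1; "." = not assessed, 1 = pCR, 2 = no pCR) by dtc_ever
pcr_tab <- matrix(c(60, 0, 10, 30, 1, 8), nrow = 3,
                  dimnames = list(pcr      = c(".", "1", "2"),
                                  dtc_ever = c("0", "1")))

# Keep only participants whose pCR was actually assessed
assessed <- pcr_tab[rownames(pcr_tab) != ".", , drop = FALSE]

# Fisher's exact test on the 2 x 2 table; with a single pCR in total,
# the test has very little power (two-sided p = 9/19, about 0.47)
pcr_fisher <- fisher.test(assessed)
print(assessed)
print(pcr_fisher)
```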
########recurrence
+# local first, then distant; then create a summary variable of either locoregional or distant recurrence
+#local fu_locreg_prog 
+
+# Step 1: Summarize data by unique participant_id
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    fu_locreg_prog = first(fu_locreg_prog),  # Get fu_locreg_prog status for each participant
+    dtc_ever = first(dtc_ever),          # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of fu_locreg_prog vs dtc_ever
+contingency_table <- table(subset_data_by_id$fu_locreg_prog, subset_data_by_id$dtc_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 4: Print the contingency table and Chi-squared test results --> p-val = 0.75, little evidence of an association (but note patients on trial)
+print(contingency_table)
+
+
   
+     0  1
+  0 66 35
+  1  3  3
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.10507, df = 1, p-value = 0.7458
+
+
####looking at sites of locoregional progression: local site fu_locreg_site_num or fu_locreg_site_char 
+### Just want to look at site distribution here 
+
+# Summarize the distribution of fu_locreg_site_char by unique participant_id
+site_distribution <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    site = first(fu_locreg_site_char),  # Get the site for each unique participant
+    .groups = "drop"
+  ) |>
+  count(site)  # Count the occurrences of each site
+
+# View the distribution of sites -- mostly axillary nodes. 4 of 6 axillary. 1 internal mammary. 2 supraclav. 2 ipsilateral breast 
+print(site_distribution)
+
+
# A tibble: 6 × 2
+  site                                                              n
+  <chr>                                                         <int>
+1 ""                                                              103
+2 "Axillary Nodes"                                                  2
+3 "Axillary Nodes,Internal Mammary Nodes,Supraclavicular Nodes"     1
+4 "Ipsilateral Breast"                                              1
+5 "Ipsilateral Breast,Axillary Nodes"                               1
+6 "Supraclavicular Nodes"                                           1
+
+
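Several participants have more than one site recorded in a single comma-separated string, so counting the raw strings undercounts each individual site. The base-R sketch below splits those strings, using the site strings and participant counts copied from the output above. On the full data frame, `tidyr::separate_rows()` would be an alternative.

```r
# Site strings and participant counts copied from the locoregional table above
site <- c("", "Axillary Nodes",
          "Axillary Nodes,Internal Mammary Nodes,Supraclavicular Nodes",
          "Ipsilateral Breast", "Ipsilateral Breast,Axillary Nodes",
          "Supraclavicular Nodes")
n <- c(103, 2, 1, 1, 1, 1)

# Expand to one entry per participant and drop those with no recurrence
per_participant <- rep(site, times = n)
per_participant <- per_participant[per_participant != ""]

# Split multi-site entries on commas and count each site separately
sites <- trimws(unlist(strsplit(per_participant, ",", fixed = TRUE)))
site_counts <- table(sites)
print(site_counts)
# Axillary Nodes 4, Internal Mammary Nodes 1, Ipsilateral Breast 2, Supraclavicular Nodes 2
```

The same approach applies to the distant sites in `fu_dist_site_char`.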
#####distant recurrence: distant fu_dist_prog 
+
+# Step 1: Summarize data by unique participant_id
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    fu_dist_prog = first(fu_dist_prog),  # Get fu_dist_prog status for each participant
+    dtc_ever = first(dtc_ever),          # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Step 2: Create a contingency table of dist prog vs dtc_ever --> 12 who had distant progression 
+contingency_table <- table(subset_data_by_id$fu_dist_prog, subset_data_by_id$dtc_ever)
+
+# Step 3: Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Step 4: Print the contingency table and Chi-squared test results -- p-val 0.63
+print(contingency_table)
+
+
   
+     0  1
+  0 60 35
+  1  9  3
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.23777, df = 1, p-value = 0.6258
+
+
### Distant sites 
+#distant site fu_dist_site_num #fu_dist_site_char -- start just looking at the locations
+
+# Summarize the distribution of fu_dist_site_char by unique participant_id
+dist_site_distribution <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    site = first(fu_dist_site_char),  # Get the site for each unique participant
+    .groups = "drop"
+  ) |>
+  count(site)  # Count the occurrences of each site
+
+# View the distribution of sites -- bone in 7, liver in 3, lung in 2, and 1 each intra-abdominal, pleura, and other (12 participants total)
+print(dist_site_distribution)
+
+
# A tibble: 8 × 2
+  site                  n
+  <chr>             <int>
+1 ""                   97
+2 "Bone"                5
+3 "Bone,Other"          1
+4 "Intra-abdominal"     1
+5 "Liver"               2
+6 "Liver,Bone"          1
+7 "Lung"                1
+8 "Pleura,Lung"         1
+
+
#any recurrence 
+#either fu_locreg_prog or fu_dist_prog 
+
+subset_data <- subset_data |>
+  mutate(ever_relapsed = ifelse(fu_locreg_prog == 1 | fu_dist_prog == 1, "Yes", "No"))
+
+# link by participant id 
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    dtc_ever = first(dtc_ever),        # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs dtc_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results -- 14 relapses in total: 10 were DTC-negative and 4 were DTC-positive
+print(contingency_table)
+
+
     
+       0  1
+  No  59 34
+  Yes 10  4
+
+
print(chisq_test) 
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.079932, df = 1, p-value = 0.7774
+
+
#### Relapse and DTC 
+#using ever_relapsed
+
+# link by participant id 
+subset_data_by_id <- subset_data |>
+  group_by(participant_id) |>
+  summarise(
+    ever_relapsed = first(ever_relapsed),  # Get the ever_relapsed status for each participant
+    dtc = first(dtc_ever),        # Get the dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of ever_relapsed vs dtc_ever
+contingency_table <- table(subset_data_by_id$ever_relapsed, subset_data_by_id$dtc)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results
+print(contingency_table)
+
+
     
+       0  1
+  No  59 34
+  Yes 10  4
+
+
print(chisq_test) 
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.079932, df = 1, p-value = 0.7774
+
+
# Identify participants missing data in either `ever_relapsed` or `dtc_ever`
+missing_data <- subset_data_by_id |>
+  filter(is.na(ever_relapsed) | is.na(dtc))
+
+# Print the IDs of participants with missing data
+print(missing_data$participant_id) #two individuals do not have relapse data (17-021, 18-032, as we had seen above)
+
+
[1] "28115-17-021" "28115-18-032"
+
+
####survival analysis fu_surv
+
+table(subset_data$fu_surv)
+
+

+  0   1 
+  8 389 
+
+
surv <- subset_data |>
+  distinct(participant_id, fu_surv) |>
+  group_by(fu_surv) |>
+  summarise(count = n()) # Count the number of participants per survival status
+
+# View the summary table
+print(surv) # dead = 5, alive = 103, and 1 participant is NA (identified below)
+
+
# A tibble: 3 × 2
+  fu_surv count
+    <int> <int>
+1       0     5
+2       1   103
+3      NA     1
+
+
na_participant <- subset_data |>
+  filter(is.na(fu_surv)) |>
+  select(participant_id, fu_surv)
+
+# Print the result -- 28115-17-021  -- no follow up data for this pt looking in redcap, everyone else has some survival data in the dtc cohort. 
+print(na_participant)
+
+
# A tibble: 1 × 2
+  participant_id fu_surv
+  <chr>            <int>
+1 28115-17-021        NA
+
+
# Summarize data by unique participant_id
+subset_data_by_id <- subset_data_clean |>
+  group_by(participant_id) |>
+  summarise(
+    surv = first(fu_surv),          # Get survival status for each participant
+    dtc_ever = first(dtc_ever),  # Get dtc_ever status for each participant
+    .groups = "drop"
+  )
+
+# Create a contingency table of surv vs dtc_ever
+contingency_table <- table(subset_data_by_id$surv, subset_data_by_id$dtc_ever)
+
+# Perform the Chi-squared test
+chisq_test <- chisq.test(contingency_table)
+
+
Warning in chisq.test(contingency_table): Chi-squared approximation may be
+incorrect
+
+
# Print the contingency table and Chi-squared test results
+print(contingency_table)
+
+
   
+     0  1
+  0  4  1
+  1 65 38
+
+
print(chisq_test)
+
+

+    Pearson's Chi-squared test with Yates' continuity correction
+
+data:  contingency_table
+X-squared = 0.084865, df = 1, p-value = 0.7708
+
+
+
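The chi-squared test above compares survival status only at last follow-up and ignores how long each participant was followed. A Kaplan-Meier estimate with a log-rank test by DTC status is a common alternative, sketched below with the `survival` package on invented toy data. The sketch assumes, as the counts above suggest, that `fu_surv` is coded 0 = died and 1 = alive. It also assumes a follow-up time could be derived from date columns such as `fu_date_death` or `fu_date_to`; how time zero would be defined is not specified here. With only 5 deaths in the cohort, any such estimate would be imprecise.

```r
library(survival)

# Hypothetical toy data: follow-up in months, fu_surv coded 0 = died, 1 = alive
toy <- data.frame(
  time_months = c(12, 30, 45, 60, 22, 50, 38, 70, 15, 64),
  fu_surv     = c(0,  1,  1,  1,  0,  1,  1,  1,  1,  1),
  dtc_ever    = c(0,  0,  0,  0,  1,  1,  1,  1,  0,  1)
)

# Surv() expects event = 1 for death, so invert the alive/dead coding
toy$death <- as.integer(toy$fu_surv == 0)

# Kaplan-Meier curves and log-rank test by DTC status
km <- survfit(Surv(time_months, death) ~ dtc_ever, data = toy)
lr <- survdiff(Surv(time_months, death) ~ dtc_ever, data = toy)

print(km)
print(lr)
```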

Now that we have run the univariate associations between the key demographic and clinical factors and both ctDNA and DTC status, we build Table 1: first by ctDNA status, then a second table by DTC status.

+
+

4.1 Making our Table 1

+
+

4.1.1 Demographics and Clinical Factors by ctDNA Status

+
+
####### Making Table 1--first for ctDNA ######### 
+
+## Resources to try for both making Table 1 and LASSO 
+## https://cran.r-project.org/web/packages/table1/vignettes/table1-examples.html
+## lasso https://stats.stackexchange.com/questions/72251/an-example-lasso-regression-using-glmnet-for-binary-outcome 
+## https://www.r-bloggers.com/2020/05/quick-tutorial-on-lasso-regression-with-example/#google_vignette 
+
+#Table 1 Code 
+library(table1)
+
+

+Attaching package: 'table1'
+
+
+
The following objects are masked from 'package:base':
+
+    units, units<-
+
+
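The `table1` package linked above builds a Table 1 from a formula of the form `~ variables | grouping`. The sketch below runs on invented toy data and assumes one row per participant, with the 0/1 codes converted to labelled factors. The variable names mirror this dataset, but the values are made up.

```r
library(table1)

# Hypothetical one-row-per-participant toy data
toy <- data.frame(
  radiation = c(0, 1, 1, 0, 1, 1),
  chemo     = c(1, 1, 1, 0, 1, 1),
  dtc_ever  = c(0, 0, 1, 1, 1, 0)
)

# Convert 0/1 codes to labelled factors so table1 reports n (%) per level
toy$radiation <- factor(toy$radiation, levels = c(0, 1), labels = c("No", "Yes"))
toy$chemo     <- factor(toy$chemo,     levels = c(0, 1), labels = c("No", "Yes"))
toy$dtc_ever  <- factor(toy$dtc_ever,  levels = c(0, 1),
                        labels = c("DTC negative", "DTC positive"))

# Human-readable row labels
label(toy$radiation) <- "Radiation"
label(toy$chemo)     <- "Chemotherapy"

# Columns by DTC status, plus an Overall column by default
tab1 <- table1(~ radiation + chemo | dtc_ever, data = toy)
tab1
```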
names(subset_data) #to choose variables 
+
+
  [1] "ID"                               "trialID"                         
+  [3] "participant_id"                   "patient_id"                      
+  [5] "fu_trial_pid"                     "timepoint"                       
+  [7] "project"                          "surmount_id"                     
+  [9] "panel_id"                         "accession"                       
+ [11] "sample_id"                        "collection_date"                 
+ [13] "extracted_plasma_volume_ml"       "input"                           
+ [15] "input_sample"                     "physical_run_name"               
+ [17] "workflow_name"                    "eVAF"                            
+ [19] "mutant_molecules"                 "mean_VAF"                        
+ [21] "Score"                            "all_pass_variants"               
+ [23] "total_variants"                   "n_positive_variants"             
+ [25] "ctDNA_detected"                   "ctdna_cohort"                    
+ [27] "dtc_ihc_date_final"               "dtc_ihc_cytospinnum_final"       
+ [29] "dtc_ihc_result_final"             "dtc_ihc_summary_count_final"     
+ [31] "dtc_final_result_date"            "pt"                              
+ [33] "bma_date"                         "ORIG_RSLT_DTC"                   
+ [35] "ORIG_RSLT_DTC_COUNT"              "FINAL_RESULT"                    
+ [37] "FINAL_COUNT"                      "org_consent_date"                
+ [39] "demo_initials"                    "demo_dob"                        
+ [41] "demo_sex"                         "demo_ethnicity"                  
+ [43] "demo_race___1"                    "demo_race___2"                   
+ [45] "demo_race___3"                    "demo_race___4"                   
+ [47] "demo_race___5"                    "demo_race___88"                  
+ [49] "demo_race___99"                   "demo_race_other"                 
+ [51] "prtx_radiation"                   "prtx_rad_start"                  
+ [53] "prtx_rad_end"                     "prtx_chemo"                      
+ [55] "prtx_endo"                        "prtx_bonemod"                    
+ [57] "prior_therapy_complete"           "inc_dx_crit"                     
+ [59] "inc_dx_crit_list___1"             "inc_dx_crit_list___2"            
+ [61] "inc_dx_crit_list___3"             "inc_dx_crit_list___4"            
+ [63] "final_receptor_group"             "demo_race_final"                 
+ [65] "final_histology"                  "final_tumor_grade"               
+ [67] "final_overall_stage"              "final_t_stage"                   
+ [69] "final_n_stage"                    "fu_date_to"                      
+ [71] "fu_surv"                          "fu_date_death"                   
+ [73] "fu_dec_bc_pres"                   "fu_dec_bc_cause"                 
+ [75] "fu_locreg_site_num"               "fu_locreg_site_char"             
+ [77] "fu_locreg_prog"                   "fu_locreg_date"                  
+ [79] "fu_dist_site_num"                 "fu_dist_site_char"               
+ [81] "fu_dist_prog"                     "fu_dist_date"                    
+ [83] "censor_date"                      "chemo_indication_1"              
+ [85] "chemo_name_1"                     "chemo_name_other_1"              
+ [87] "chemo_start_date_1"               "start_date_exact_1"              
+ [89] "chemo_end_date_1"                 "end_date_exact_1"                
+ [91] "chemo_notes_1"                    "prior_chemotherapy_complete_1"   
+ [93] "chemo_indication_2"               "chemo_name_2"                    
+ [95] "chemo_name_other_2"               "chemo_start_date_2"              
+ [97] "start_date_exact_2"               "chemo_end_date_2"                
+ [99] "end_date_exact_2"                 "chemo_notes_2"                   
+[101] "prior_chemotherapy_complete_2"    "chemo_indication_3"              
+[103] "chemo_name_3"                     "chemo_name_other_3"              
+[105] "chemo_start_date_3"               "start_date_exact_3"              
+[107] "chemo_end_date_3"                 "end_date_exact_3"                
+[109] "chemo_notes_3"                    "prior_chemotherapy_complete_3"   
+[111] "chemo_indication_4"               "chemo_name_4"                    
+[113] "chemo_name_other_4"               "chemo_start_date_4"              
+[115] "start_date_exact_4"               "chemo_end_date_4"                
+[117] "end_date_exact_4"                 "chemo_notes_4"                   
+[119] "prior_chemotherapy_complete_4"    "hormone_indication_1"            
+[121] "hormone_name___1_1"               "hormone_name___2_1"              
+[123] "hormone_name___3_1"               "hormone_name___4_1"              
+[125] "hormone_name___5_1"               "hormone_name___6_1"              
+[127] "hormone_name___7_1"               "hormone_other_1"                 
+[129] "hormone_start_date_1"             "hormone_ongoing_1"               
+[131] "hormone_end_date_1"               "hormone_notes_1"                 
+[133] "prior_hormone_therapy_complete_1" "hormone_indication_2"            
+[135] "hormone_name___1_2"               "hormone_name___2_2"              
+[137] "hormone_name___3_2"               "hormone_name___4_2"              
+[139] "hormone_name___5_2"               "hormone_name___6_2"              
+[141] "hormone_name___7_2"               "hormone_other_2"                 
+[143] "hormone_start_date_2"             "hormone_ongoing_2"               
+[145] "hormone_end_date_2"               "hormone_notes_2"                 
+[147] "prior_hormone_therapy_complete_2" "hormone_indication_3"            
+[149] "hormone_name___1_3"               "hormone_name___2_3"              
+[151] "hormone_name___3_3"               "hormone_name___4_3"              
+[153] "hormone_name___5_3"               "hormone_name___6_3"              
+[155] "hormone_name___7_3"               "hormone_other_3"                 
+[157] "hormone_start_date_3"             "hormone_ongoing_3"               
+[159] "hormone_end_date_3"               "hormone_notes_3"                 
+[161] "prior_hormone_therapy_complete_3" "hormone_indication_4"            
+[163] "hormone_name___1_4"               "hormone_name___2_4"              
+[165] "hormone_name___3_4"               "hormone_name___4_4"              
+[167] "hormone_name___5_4"               "hormone_name___6_4"              
+[169] "hormone_name___7_4"               "hormone_other_4"                 
+[171] "hormone_start_date_4"             "hormone_ongoing_4"               
+[173] "hormone_end_date_4"               "hormone_notes_4"                 
+[175] "prior_hormone_therapy_complete_4" "hormone_indication_5"            
+[177] "hormone_name___1_5"               "hormone_name___2_5"              
+[179] "hormone_name___3_5"               "hormone_name___4_5"              
+[181] "hormone_name___5_5"               "hormone_name___6_5"              
+[183] "hormone_name___7_5"               "hormone_other_5"                 
+[185] "hormone_start_date_5"             "hormone_ongoing_5"               
+[187] "hormone_end_date_5"               "hormone_notes_5"                 
+[189] "prior_hormone_therapy_complete_5" "hormone_indication_6"            
+[191] "hormone_name___1_6"               "hormone_name___2_6"              
+[193] "hormone_name___3_6"               "hormone_name___4_6"              
+[195] "hormone_name___5_6"               "hormone_name___6_6"              
+[197] "hormone_name___7_6"               "hormone_other_6"                 
+[199] "hormone_start_date_6"             "hormone_ongoing_6"               
+[201] "hormone_end_date_6"               "hormone_notes_6"                 
+[203] "prior_hormone_therapy_complete_6" "bonemod_indication___1_1"        
+[205] "bonemod_indication___2_1"         "bonemod_indication___3_1"        
+[207] "bonemod_name_1"                   "bonemod_start_date_1"            
+[209] "bonemod_ongoing_1"                "bonemod_end_date_1"              
+[211] "prior_bonemod_agents_complete_1"  "bonemod_indication___1_2"        
+[213] "bonemod_indication___2_2"         "bonemod_indication___3_2"        
+[215] "bonemod_name_2"                   "bonemod_start_date_2"            
+[217] "bonemod_ongoing_2"                "bonemod_end_date_2"              
+[219] "prior_bonemod_agents_complete_2"  "diag_date_1"                     
+[221] "diag_lateral_1"                   "diag_menopause_1"                
+[223] "diag_inv_histology___1_1"         "diag_inv_histology___2_1"        
+[225] "diag_inv_histology___3_1"         "diag_inv_histology___4_1"        
+[227] "diag_inv_histology___5_1"         "diag_inv_histology___6_1"        
+[229] "diag_inv_histology___7_1"         "diag_inv_histology___8_1"        
+[231] "diag_inv_histology___9_1"         "diag_inv_histology___10_1"       
+[233] "diag_inv_histology___11_1"        "diag_inv_histology___12_1"       
+[235] "diag_inv_histology___13_1"        "diag_inv_histology___14_1"       
+[237] "diag_inv_histology___15_1"        "diag_inv_histology___16_1"       
+[239] "inflamm_yn_1"                     "diag_tumor_grade_1"              
+[241] "diag_er_status_1"                 "diag_er_percent_1"               
+[243] "diag_pr_status_1"                 "diag_pr_percent_1"               
+[245] "diag_her2_status_1"               "diag_her2_method_1"              
+[247] "diag_her2_ihc_score_1"            "diag_her2_fish_copy_num_1"       
+[249] "diag_her2_fish_ratio_1"           "diag_neoadj_chemo_1"             
+[251] "diag_pcr_1"                       "diag_surgery_date_1"             
+[253] "diag_timing_tumor_1"              "diag_surgery_type_1"             
+[255] "diag_surgery_type_other_1"        "diag_axillary_type___1_1"        
+[257] "diag_axillary_type___2_1"         "diag_inv_histology_surg_1_1"     
+[259] "diag_inv_histology_surg_2_1"      "diag_inv_histology_surg_3_1"     
+[261] "diag_inv_histology_surg_4_1"      "diag_inv_histology_surg_5_1"     
+[263] "diag_inv_histology_surg_6_1"      "diag_inv_histology_surg_7_1"     
+[265] "diag_inv_histology_surg_8_1"      "diag_inv_histology_surg_9_1"     
+[267] "diag_inv_histology_surg_10_1"     "diag_inv_histology_surg_11_1"    
+[269] "diag_inv_histology_surg_12_1"     "diag_inv_histology_surg_13_1"    
+[271] "diag_inv_histology_surg_14_1"     "diag_inv_histology_surg_15_1"    
+[273] "diag_inv_histology_surg_16_1"     "diag_inv_histology_surg_99_1"    
+[275] "diag_tumor_grade_surg_1"          "diag_tumor_size_1"               
+[277] "diag_lymph_nodes_num_1"           "diag_nodes_total_1"              
+[279] "diag_surg_receptors_1"            "diag_surg_er_status_1"           
+[281] "diag_surg_er_percent_1"           "diag_surg_pr_status_1"           
+[283] "diag_surg_pr_percent_1"           "diag_surg_her2_status_1"         
+[285] "diag_surg_her2_method_1"          "diag_surg_her2_ihc_score_1"      
+[287] "diag_surg_her2_fish_score_1"      "diag_surg_her2_fish_ratio_1"     
+[289] "diag_ajcc_1"                      "diag_path_t_stage_1"             
+[291] "diag_path_n_stage_1"              "diag_path_m_stage_1"             
+[293] "diag_overall_stage_1"             "diag_ajcc_clin_1"                
+[295] "diag_clin_t_stage_1"              "diag_clin_n_stage_1"             
+[297] "diag_clin_m_stage_1"              "diag_overall_stage_clin_1"       
+[299] "diag_oncotype_1"                  "diag_oncotype_score_1"           
+[301] "mammoprint_1"                     "mammoprint_result_1"             
+[303] "diag_notes_1"                     "diag_date_2"                     
+[305] "diag_lateral_2"                   "diag_menopause_2"                
+[307] "diag_inv_histology___1_2"         "diag_inv_histology___2_2"        
+[309] "diag_inv_histology___3_2"         "diag_inv_histology___4_2"        
+[311] "diag_inv_histology___5_2"         "diag_inv_histology___6_2"        
+[313] "diag_inv_histology___7_2"         "diag_inv_histology___8_2"        
+[315] "diag_inv_histology___9_2"         "diag_inv_histology___10_2"       
+[317] "diag_inv_histology___11_2"        "diag_inv_histology___12_2"       
+[319] "diag_inv_histology___13_2"        "diag_inv_histology___14_2"       
+[321] "diag_inv_histology___15_2"        "diag_inv_histology___16_2"       
+[323] "inflamm_yn_2"                     "diag_tumor_grade_2"              
+[325] "diag_er_status_2"                 "diag_er_percent_2"               
+[327] "diag_pr_status_2"                 "diag_pr_percent_2"               
+[329] "diag_her2_status_2"               "diag_her2_method_2"              
+[331] "diag_her2_ihc_score_2"            "diag_her2_fish_copy_num_2"       
+[333] "diag_her2_fish_ratio_2"           "diag_neoadj_chemo_2"             
+[335] "diag_pcr_2"                       "diag_surgery_date_2"             
+[337] "diag_timing_tumor_2"              "diag_surgery_type_2"             
+[339] "diag_surgery_type_other_2"        "diag_axillary_type___1_2"        
+[341] "diag_axillary_type___2_2"         "diag_inv_histology_surg_1_2"     
+[343] "diag_inv_histology_surg_2_2"      "diag_inv_histology_surg_3_2"     
+[345] "diag_inv_histology_surg_4_2"      "diag_inv_histology_surg_5_2"     
+[347] "diag_inv_histology_surg_6_2"      "diag_inv_histology_surg_7_2"     
+[349] "diag_inv_histology_surg_8_2"      "diag_inv_histology_surg_9_2"     
+[351] "diag_inv_histology_surg_10_2"     "diag_inv_histology_surg_11_2"    
+[353] "diag_inv_histology_surg_12_2"     "diag_inv_histology_surg_13_2"    
+[355] "diag_inv_histology_surg_14_2"     "diag_inv_histology_surg_15_2"    
+[357] "diag_inv_histology_surg_16_2"     "diag_inv_histology_surg_99_2"    
+[359] "diag_tumor_grade_surg_2"          "diag_tumor_size_2"               
+[361] "diag_lymph_nodes_num_2"           "diag_nodes_total_2"              
+[363] "diag_surg_receptors_2"            "diag_surg_er_status_2"           
+[365] "diag_surg_er_percent_2"           "diag_surg_pr_status_2"           
+[367] "diag_surg_pr_percent_2"           "diag_surg_her2_status_2"         
+[369] "diag_surg_her2_method_2"          "diag_surg_her2_ihc_score_2"      
+[371] "diag_surg_her2_fish_score_2"      "diag_surg_her2_fish_ratio_2"     
+[373] "diag_ajcc_2"                      "diag_path_t_stage_2"             
+[375] "diag_path_n_stage_2"              "diag_path_m_stage_2"             
+[377] "diag_overall_stage_2"             "diag_ajcc_clin_2"                
+[379] "diag_clin_t_stage_2"              "diag_clin_n_stage_2"             
+[381] "diag_clin_m_stage_2"              "diag_overall_stage_clin_2"       
+[383] "diag_oncotype_2"                  "diag_oncotype_score_2"           
+[385] "mammoprint_2"                     "mammoprint_result_2"             
+[387] "diag_notes_2"                     "ctDNA_ever"                      
+[389] "dtc_ever"                         "ever_relapsed"                   
+[391] "age_at_diag"                      "HR_status"                       
+[393] "histology_category"               "node_status"                     
+[395] "axillary_dissection"             
+
+
library(dplyr)
+library(tidyr)
+library(stringr)
+
+# Prepare the dataset
+unique_subset_data <- subset_data |>
+  mutate(
+    # Convert "Missing" and the 99 (not reported) code to NA in the staging columns
+    final_t_stage = na_if(as.character(final_t_stage), "Missing"),
+    final_t_stage = na_if(final_t_stage, "99"),
+    final_overall_stage = na_if(as.character(final_overall_stage), "Missing"),
+    final_overall_stage = na_if(final_overall_stage, "99"),
+    # Tumor grade code 3 = Not Reported, so treat it as missing
+    final_tumor_grade = na_if(final_tumor_grade, 3),
+    # "." marks a missing pCR result
+    diag_pcr_1 = na_if(diag_pcr_1, "."),
+    # Replace 99 with NA in all numeric columns
+    across(where(is.numeric), ~ na_if(.x, 99))
+  )  |>
+  group_by(participant_id) |>
+  summarize(
+    age_at_diag = first(na.omit(age_at_diag)),
+    final_receptor_group = first(na.omit(final_receptor_group)),
+    demo_race_final = first(na.omit(demo_race_final)),
+    final_tumor_grade = first(na.omit(final_tumor_grade)),
+    final_overall_stage = first(na.omit(final_overall_stage)),
+    final_t_stage = first(na.omit(final_t_stage)),
+    final_n_stage = first(na.omit(final_n_stage)),
+    histology_category = first(na.omit(histology_category)),
+    prtx_radiation = first(na.omit(prtx_radiation)),
+    prtx_chemo = first(na.omit(prtx_chemo)),
+    prtx_endo = first(na.omit(prtx_endo)),
+    prtx_bonemod = first(na.omit(prtx_bonemod)),
+    node_status = first(na.omit(node_status)),
+    axillary_dissection = first(na.omit(axillary_dissection)),
+    diag_surgery_type_1 = first(na.omit(diag_surgery_type_1)),
+    diag_neoadj_chemo_1 = first(na.omit(diag_neoadj_chemo_1)),
+    diag_pcr_1 = first(na.omit(diag_pcr_1)),
+    ctDNA_ever = first(na.omit(ctDNA_ever))
+  )
+
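The cleaning and deduplication step above can be illustrated on a toy data frame. This is a minimal sketch with made-up values (`toy` and its columns are hypothetical): `na_if()` turns the 99 code into `NA`, and `first(na.omit())` keeps the first non-missing value for each participant. One caution, stated as an assumption about the real data: if `participant_id` is numeric, `across(where(is.numeric), ...)` would also recode an ID of 99, so the sketch excludes the ID column explicitly.

```r
library(dplyr)

# Hypothetical toy data: two rows per participant, 99 = "not reported"
toy <- data.frame(
  participant_id = c(1, 1, 2, 2),
  age_at_diag    = c(NA, 49.3, 55.6, 55.6),
  prtx_chemo     = c(99, 1, 0, NA)
)

toy_unique <- toy |>
  # Recode the 99 sentinel to NA in every numeric column except the ID
  mutate(across(where(is.numeric) & !participant_id, ~ na_if(.x, 99))) |>
  group_by(participant_id) |>
  # Keep the first non-missing value per participant
  summarize(
    age_at_diag = first(na.omit(age_at_diag)),
    prtx_chemo  = first(na.omit(prtx_chemo))
  )

toy_unique
```

Participant 1's `prtx_chemo` becomes 1 because its 99 row is set to `NA` and then skipped.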
+#######
+# Add factor labels for the following variables:
+# final_receptor_group
+# demo_race_final
+# final_tumor_grade
+# final_overall_stage
+# final_t_stage
+# final_n_stage
+# histology_category
+# prtx_radiation
+# prtx_chemo
+# prtx_endo
+# prtx_bonemod
+# node_status
+# axillary_dissection
+# diag_surgery_type_1
+# diag_neoadj_chemo_1
+# ctDNA_ever
+# diag_pcr_1
+
+
+label(unique_subset_data$age_at_diag) <- "Age at Diagnosis"
+units(unique_subset_data$age_at_diag)       <- "years"
+
+# final_receptor_group codes: 1 = TNBC, 2 = HR+ HER2-, 3 = HR+ HER2+, 4 = HR- HER2+
+
+
+# assign `final_receptor_group` factor levels and labels to `unique_subset_data`
+unique_subset_data <- unique_subset_data |>
+  mutate(
+    final_receptor_group = factor(final_receptor_group, levels = c(1, 2, 3, 4),
+                                  labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+"))
+  )
+
+label(unique_subset_data$final_receptor_group)       <- "Final Receptor Group"
+
+table(unique_subset_data$final_receptor_group)
+
+

+     TNBC HR+ HER2- HR+ HER2+ HR- HER2+ 
+       45        52         8         4 
+
+
##demo_race_final 
+
+table(unique_subset_data$demo_race_final) # 1 = Black, 3 = Asian, 5 = White
+
+

+ 1  3  5 
+ 9  1 99 
+
+
unique_subset_data$demo_race_final <- 
+  factor(unique_subset_data$demo_race_final, levels=c(1,3,5),
+         labels=c("Black", 
+                  "Asian", "White"))
+label(unique_subset_data$demo_race_final)  <- "Race"
+table(unique_subset_data$demo_race_final) 
+
+

+Black Asian White 
+    9     1    99 
+
+
#final_tumor_grade 
+table(unique_subset_data$final_tumor_grade) # 0 = Grade 3, 1 = Grade 1, 2 = Grade 2, 3 = Not Reported (recoded to NA above so table1 reports it as Missing)
+
+

+ 0  1  2 
+79 22  6 
+
+
unique_subset_data$final_tumor_grade <- 
+  factor(unique_subset_data$final_tumor_grade, levels=c(0,1,2),
+         labels=c("Grade 3", 
+                  "Grade 1", "Grade 2"))
+label(unique_subset_data$final_tumor_grade)  <- "Tumor Grade"
+table(unique_subset_data$final_tumor_grade) 
+
+

+Grade 3 Grade 1 Grade 2 
+     79      22       6 
+
+
#final_overall_stage
+
+table(unique_subset_data$final_overall_stage) # 1 = stage I 2 = stage II 3 = stage III  
+
+

+ 1  2  3 
+35 47 26 
+
+
unique_subset_data$final_overall_stage <- 
+  factor(unique_subset_data$final_overall_stage, levels=c(1,2,3),
+         labels=c("Stage I", 
+                  "Stage II", "Stage III"))
+label(unique_subset_data$final_overall_stage)  <- "Overall Stage"
+table(unique_subset_data$final_overall_stage) 
+
+

+  Stage I  Stage II Stage III 
+       35        47        26 
+
+
#final_t_stage
+table(unique_subset_data$final_t_stage) # 1 = T1 2 = T2 3 = T3 4 = T4  
+
+

+ 1  2  3  4 
+51 44 12  1 
+
+
unique_subset_data$final_t_stage <- 
+  factor(unique_subset_data$final_t_stage, levels=c(1,2,3,4),
+         labels=c("T1", 
+                  "T2", "T3", "T4"))
+label(unique_subset_data$final_t_stage)  <- "T Stage"
+table(unique_subset_data$final_t_stage) 
+
+

+T1 T2 T3 T4 
+51 44 12  1 
+
+
#final_n_stage 
+
+table(unique_subset_data$final_n_stage) # 0 = N0 1 = N1 2 = N2 3 = N3 
+
+

+ 0  1  2  3 
+46 43 13  7 
+
+
unique_subset_data$final_n_stage <- 
+  factor(unique_subset_data$final_n_stage, levels=c(0,1,2,3),
+         labels=c("N0", 
+                  "N1", "N2", "N3"))
+label(unique_subset_data$final_n_stage)  <- "N Stage"
+table(unique_subset_data$final_n_stage) 
+
+

+N0 N1 N2 N3 
+46 43 13  7 
+
+
#histology_category
+
+table(unique_subset_data$histology_category) # Already labeled: Both Ductal and Lobular, Ductal, Lobular, Other
+
+

+Both Ductal and Lobular                  Ductal                 Lobular 
+                      9                      84                      14 
+                  Other 
+                      2 
+
+
label(unique_subset_data$histology_category)  <- "Histology Category"
+
+
+#prtx_radiation 
+
+table(unique_subset_data$prtx_radiation) #1 = radiation 0 = no 
+
+

+ 0  1 
+34 75 
+
+
unique_subset_data$prtx_radiation <- 
+  factor(unique_subset_data$prtx_radiation, levels=c(0,1),
+         labels=c("No Radiation", "Radiation"))
+label(unique_subset_data$prtx_radiation)  <- "Radiation"
+table(unique_subset_data$prtx_radiation)
+
+

+No Radiation    Radiation 
+          34           75 
+
+
#prtx_chemo
+
+table(unique_subset_data$prtx_chemo) #1 = chemo 0 = no 
+
+

+  0   1 
+  3 106 
+
+
table(subset_data$prtx_chemo) # For comparison: counts in subset_data, before collapsing to one row per participant
+
+

+  0   1 
+ 18 380 
+
+
unique_subset_data$prtx_chemo <- 
+factor(unique_subset_data$prtx_chemo, levels=c(0,1),
+         labels=c("No Chemo", "Chemo"))
+label(unique_subset_data$prtx_chemo)  <- "Chemo"
+table(unique_subset_data$prtx_chemo)
+
+

+No Chemo    Chemo 
+       3      106 
+
+
#prtx_endo
+
+
+table(unique_subset_data$prtx_endo) #1 = hormone therapy 0 = no 
+
+

+ 0  1 
+47 62 
+
+
table(subset_data$prtx_endo) # For comparison: counts in subset_data, before collapsing to one row per participant
+
+

+  0   1 
+156 242 
+
+
unique_subset_data$prtx_endo <- 
+factor(unique_subset_data$prtx_endo, levels=c(0,1),
+         labels=c("No Endocrine Therapy", "Endocrine Therapy"))
+label(unique_subset_data$prtx_endo)  <- "Endocrine Therapy"
+table(unique_subset_data$prtx_endo)
+
+

+No Endocrine Therapy    Endocrine Therapy 
+                  47                   62 
+
+
#prtx_bonemod 
+
+table(unique_subset_data$prtx_bonemod) #1 = bonemod 0 = no 
+
+

+ 0  1 
+70 39 
+
+
unique_subset_data$prtx_bonemod <- 
+factor(unique_subset_data$prtx_bonemod, levels=c(0,1),
+         labels=c("No Bone Modifying Treatment", "Bone Modifying Treatment"))
+label(unique_subset_data$prtx_bonemod)  <- "Bone Modifying Treatment"
+table(unique_subset_data$prtx_bonemod)
+
+

+No Bone Modifying Treatment    Bone Modifying Treatment 
+                         70                          39 
+
+
#node_status 
+table(unique_subset_data$node_status) # Already labeled: Node Negative, Node Positive
+
+

+Node Negative Node Positive 
+           46            63 
+
+
label(unique_subset_data$node_status)  <- "Node Status"
+
+#axillary_dissection 
+
+table(unique_subset_data$axillary_dissection) #1 = axillary dissection 0 = no dissection
+
+

+ 0  1 
+54 55 
+
+
unique_subset_data$axillary_dissection <- 
+factor(unique_subset_data$axillary_dissection, levels=c(0,1),
+         labels=c("No Axillary Dissection", "Axillary Dissection"))
+label(unique_subset_data$axillary_dissection)  <- "Axillary Dissection"
+table(unique_subset_data$axillary_dissection)
+
+

+No Axillary Dissection    Axillary Dissection 
+                    54                     55 
+
+
#diag_surgery_type_1
+table(unique_subset_data$diag_surgery_type_1) #1 = Lumpectomy 2 = Mastectomy
+
+

+ 1  2 
+45 64 
+
+
unique_subset_data$diag_surgery_type_1 <- 
+factor(unique_subset_data$diag_surgery_type_1, levels=c(1,2),
+         labels=c("Lumpectomy", "Mastectomy"))
+label(unique_subset_data$diag_surgery_type_1)  <- "Surgery Type"
+table(unique_subset_data$diag_surgery_type_1)
+
+

+Lumpectomy Mastectomy 
+        45         64 
+
+
#diag_neoadj_chemo_1 
+
+table(unique_subset_data$diag_neoadj_chemo_1) # 1 = neoadjuvant chemo, 0 = no neoadjuvant chemo
+
+

+ 0  1 
+90 19 
+
+
unique_subset_data$diag_neoadj_chemo_1 <- 
+factor(unique_subset_data$diag_neoadj_chemo_1, levels=c(0,1),
+         labels=c("No Neoadjuvant Chemo", "Neoadjuvant Chemo"))
+label(unique_subset_data$diag_neoadj_chemo_1)  <- "Neoadjuvant Chemo"
+table(unique_subset_data$diag_neoadj_chemo_1)
+
+

+No Neoadjuvant Chemo    Neoadjuvant Chemo 
+                  90                   19 
+
+
#pCR 
+table(unique_subset_data$diag_pcr_1) # 1 = pCR, 2 = non-pCR
+
+

+ 1  2 
+ 1 18 
+
+
unique_subset_data$diag_pcr_1<- 
+factor(unique_subset_data$diag_pcr_1, levels=c(1,2),
+         labels=c("pCR", "Non-pCR"))
+label(unique_subset_data$diag_pcr_1)  <- "Pathologic Complete Response"
+table(unique_subset_data$diag_pcr_1)
+
+

+    pCR Non-pCR 
+      1      18 
+
+
#ctDNA_ever 
+table(unique_subset_data$ctDNA_ever) #FALSE = ctDNA Negative TRUE = ctDNA Positive
+
+

+FALSE  TRUE 
+  100     9 
+
+
unique_subset_data$ctDNA_ever <- 
+factor(unique_subset_data$ctDNA_ever, levels=c("FALSE", "TRUE"),
+         labels=c("ctDNA Negative", "ctDNA Positive"))
+label(unique_subset_data$ctDNA_ever)  <- "ctDNA Status"
+table(unique_subset_data$ctDNA_ever)
+
+

+ctDNA Negative ctDNA Positive 
+           100              9 
+
+
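The factor conversions above repeat the same `factor(x, levels, labels)` pattern for each variable. One way to cut the repetition is to keep the codes in a lookup list and apply them in a loop. This is a sketch only: `recode_map` and `recode_factors()` are hypothetical names, the toy data stands in for `unique_subset_data`, and the codes are copied from the comments in this document. Variable labels could still be added afterwards with `label()` as above.

```r
# Hypothetical lookup: column name -> numeric codes and display labels
recode_map <- list(
  prtx_radiation      = list(levels = c(0, 1), labels = c("No Radiation", "Radiation")),
  prtx_chemo          = list(levels = c(0, 1), labels = c("No Chemo", "Chemo")),
  diag_surgery_type_1 = list(levels = c(1, 2), labels = c("Lumpectomy", "Mastectomy"))
)

# Convert each listed column to a labeled factor; values not in `levels`
# (including NA) become NA, just as factor() does in the blocks above
recode_factors <- function(df, map) {
  for (col in names(map)) {
    spec <- map[[col]]
    df[[col]] <- factor(df[[col]], levels = spec$levels, labels = spec$labels)
  }
  df
}

# Toy data standing in for unique_subset_data
toy <- data.frame(
  prtx_radiation      = c(0, 1, 1),
  prtx_chemo          = c(1, 1, 0),
  diag_surgery_type_1 = c(2, 1, NA)
)
toy <- recode_factors(toy, recode_map)
table(toy$diag_surgery_type_1)
```

Adding a new variable then only needs one more entry in the lookup list.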
caption  <- "Table 1 by ctDNA Status"
+
+# Generate the table1 summary
+table1(
+  ~ age_at_diag + final_receptor_group + demo_race_final + 
+    final_tumor_grade + final_overall_stage + 
+    final_t_stage + final_n_stage + 
+    histology_category + prtx_radiation + 
+    prtx_chemo + prtx_endo + prtx_bonemod + 
+    node_status + axillary_dissection + 
+    diag_surgery_type_1 + diag_neoadj_chemo_1 + diag_pcr_1 | 
+    ctDNA_ever,
+  data = unique_subset_data, overall=c(left="Total"), caption=caption)
+
+
+Table 1 by ctDNA Status
+
+| Characteristic | Total (N=109) | ctDNA Negative (N=100) | ctDNA Positive (N=9) |
+|---|---|---|---|
+| **Age at Diagnosis (years)** | | | |
+| Mean (SD) | 49.7 (9.66) | 49.2 (9.64) | 54.6 (8.94) |
+| Median [Min, Max] | 49.3 [27.3, 68.9] | 49.0 [27.3, 68.9] | 55.6 [38.6, 64.4] |
+| **Final Receptor Group** | | | |
+| TNBC | 45 (41.3%) | 44 (44.0%) | 1 (11.1%) |
+| HR+ HER2- | 52 (47.7%) | 45 (45.0%) | 7 (77.8%) |
+| HR+ HER2+ | 8 (7.3%) | 8 (8.0%) | 0 (0%) |
+| HR- HER2+ | 4 (3.7%) | 3 (3.0%) | 1 (11.1%) |
+| **Race** | | | |
+| Black | 9 (8.3%) | 8 (8.0%) | 1 (11.1%) |
+| Asian | 1 (0.9%) | 1 (1.0%) | 0 (0%) |
+| White | 99 (90.8%) | 91 (91.0%) | 8 (88.9%) |
+| **Tumor Grade** | | | |
+| Grade 3 | 79 (72.5%) | 75 (75.0%) | 4 (44.4%) |
+| Grade 1 | 22 (20.2%) | 17 (17.0%) | 5 (55.6%) |
+| Grade 2 | 6 (5.5%) | 6 (6.0%) | 0 (0%) |
+| Missing | 2 (1.8%) | 2 (2.0%) | 0 (0%) |
+| **Overall Stage** | | | |
+| Stage I | 35 (32.1%) | 33 (33.0%) | 2 (22.2%) |
+| Stage II | 47 (43.1%) | 46 (46.0%) | 1 (11.1%) |
+| Stage III | 26 (23.9%) | 20 (20.0%) | 6 (66.7%) |
+| Missing | 1 (0.9%) | 1 (1.0%) | 0 (0%) |
+| **T Stage** | | | |
+| T1 | 51 (46.8%) | 48 (48.0%) | 3 (33.3%) |
+| T2 | 44 (40.4%) | 40 (40.0%) | 4 (44.4%) |
+| T3 | 12 (11.0%) | 11 (11.0%) | 1 (11.1%) |
+| T4 | 1 (0.9%) | 0 (0%) | 1 (11.1%) |
+| Missing | 1 (0.9%) | 1 (1.0%) | 0 (0%) |
+| **N Stage** | | | |
+| N0 | 46 (42.2%) | 43 (43.0%) | 3 (33.3%) |
+| N1 | 43 (39.4%) | 43 (43.0%) | 0 (0%) |
+| N2 | 13 (11.9%) | 8 (8.0%) | 5 (55.6%) |
+| N3 | 7 (6.4%) | 6 (6.0%) | 1 (11.1%) |
+| **Histology Category** | | | |
+| Both Ductal and Lobular | 9 (8.3%) | 9 (9.0%) | 0 (0%) |
+| Ductal | 84 (77.1%) | 78 (78.0%) | 6 (66.7%) |
+| Lobular | 14 (12.8%) | 11 (11.0%) | 3 (33.3%) |
+| Other | 2 (1.8%) | 2 (2.0%) | 0 (0%) |
+| **Radiation** | | | |
+| No Radiation | 34 (31.2%) | 33 (33.0%) | 1 (11.1%) |
+| Radiation | 75 (68.8%) | 67 (67.0%) | 8 (88.9%) |
+| **Chemo** | | | |
+| No Chemo | 3 (2.8%) | 2 (2.0%) | 1 (11.1%) |
+| Chemo | 106 (97.2%) | 98 (98.0%) | 8 (88.9%) |
+| **Endocrine Therapy** | | | |
+| No Endocrine Therapy | 47 (43.1%) | 45 (45.0%) | 2 (22.2%) |
+| Endocrine Therapy | 62 (56.9%) | 55 (55.0%) | 7 (77.8%) |
+| **Bone Modifying Treatment** | | | |
+| No Bone Modifying Treatment | 70 (64.2%) | 65 (65.0%) | 5 (55.6%) |
+| Bone Modifying Treatment | 39 (35.8%) | 35 (35.0%) | 4 (44.4%) |
+| **Node Status** | | | |
+| Node Negative | 46 (42.2%) | 43 (43.0%) | 3 (33.3%) |
+| Node Positive | 63 (57.8%) | 57 (57.0%) | 6 (66.7%) |
+| **Axillary Dissection** | | | |
+| No Axillary Dissection | 54 (49.5%) | 52 (52.0%) | 2 (22.2%) |
+| Axillary Dissection | 55 (50.5%) | 48 (48.0%) | 7 (77.8%) |
+| **Surgery Type** | | | |
+| Lumpectomy | 45 (41.3%) | 41 (41.0%) | 4 (44.4%) |
+| Mastectomy | 64 (58.7%) | 59 (59.0%) | 5 (55.6%) |
+| **Neoadjuvant Chemo** | | | |
+| No Neoadjuvant Chemo | 90 (82.6%) | 82 (82.0%) | 8 (88.9%) |
+| Neoadjuvant Chemo | 19 (17.4%) | 18 (18.0%) | 1 (11.1%) |
+| **Pathologic Complete Response** | | | |
+| pCR | 1 (0.9%) | 1 (1.0%) | 0 (0%) |
+| Non-pCR | 18 (16.5%) | 17 (17.0%) | 1 (11.1%) |
+| Missing | 90 (82.6%) | 82 (82.0%) | 8 (88.9%) |
+
+
+

This is the basic Table 1, stratified by ctDNA status (negative vs. positive).

+
+
# Add p-values from significance tests to Table 1
+
+# Step 1: Create table1 output
+table1_output <- table1(
+  ~ age_at_diag + final_receptor_group + demo_race_final + 
+    final_tumor_grade + final_overall_stage + 
+    final_t_stage + final_n_stage + 
+    histology_category + prtx_radiation + 
+    prtx_chemo + prtx_endo + prtx_bonemod + 
+    node_status + axillary_dissection + 
+    diag_surgery_type_1 + diag_neoadj_chemo_1 +diag_pcr_1 | 
+    ctDNA_ever,
+  data = unique_subset_data,
+  overall = c(left = "Total"),
+  caption = "Table 1: Summary of demographic and clinical variables by ctDNA status"
+)
+
+
+####
+pvalue_function <- function(x, ...) {
+  print(x)  # Debugging: show the per-group values that table1 passes in
+  # Drop the "overall" element so only ctDNA Negative vs. ctDNA Positive are compared
+  x <- x[!names(x) %in% "overall"]
+  y <- unlist(x)
+  g <- factor(rep(seq_along(x), times = sapply(x, length)))
+  
+  # Return NA unless exactly two groups are being compared
+  if (length(unique(g)) != 2) {
+    return(NA)
+  }
+
+  # Perform the appropriate test based on the type of variable
+  if (is.numeric(y)) {
+    # For continuous variables, perform a Welch two-sample t-test
+    p <- t.test(y ~ g)$p.value
+  } else {
+    # For categorical variables, perform a chi-squared test or Fisher's exact test
+    table_result <- table(y, g)
+    
+    # Use Fisher's exact test when any expected cell count is below 5
+    expected <- outer(rowSums(table_result), colSums(table_result)) / sum(table_result)
+    if (any(expected < 5)) {
+      p <- fisher.test(table_result)$p.value
+    } else {
+      p <- chisq.test(table_result)$p.value
+    }
+  }
+  
+  # Format the p-value for output
+  formatted_p <- format.pval(p, digits = 3, eps = 0.001)
+  return(formatted_p)
+}
+  
+
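`pvalue_function` switches to Fisher's exact test when cell counts are small. A common rule of thumb bases that choice on the *expected* counts under independence (row total times column total divided by the grand total), not the observed ones. The sketch below applies that rule to the N stage by ctDNA counts copied from Table 1; `n_stage` and `test_used` are illustrative names only, and this is not the project's final analysis.

```r
# N stage by ctDNA status, counts copied from Table 1
n_stage <- matrix(
  c(43, 43, 8, 6,   # ctDNA Negative: N0, N1, N2, N3
    3,  0, 5, 1),   # ctDNA Positive: N0, N1, N2, N3
  ncol = 2,
  dimnames = list(N_stage = c("N0", "N1", "N2", "N3"),
                  ctDNA   = c("Negative", "Positive"))
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(n_stage), colSums(n_stage)) / sum(n_stage)

# Choose the test from the expected (not observed) counts
test_used <- if (any(expected < 5)) "fisher" else "chisq"
p <- if (test_used == "fisher") {
  fisher.test(n_stage)$p.value
} else {
  chisq.test(n_stage)$p.value
}

round(expected, 2)
test_used
format.pval(p, digits = 3, eps = 0.001)
```

With only 9 ctDNA-positive patients, every expected count in that column is below 5, so Fisher's exact test is selected here.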
+# Generate table1 with the p-value column
+table1_p <- table1(
+  ~ age_at_diag + final_receptor_group + demo_race_final + 
+    final_tumor_grade + final_overall_stage + 
+    final_t_stage + final_n_stage + 
+    histology_category + prtx_radiation + 
+    prtx_chemo + prtx_endo + prtx_bonemod + 
+    node_status + axillary_dissection + 
+    diag_surgery_type_1 + diag_neoadj_chemo_1 + diag_pcr_1| 
+    ctDNA_ever,
+  data = unique_subset_data,
+  overall = c(left = "Total"),
+  extra.col = list("P-value" = pvalue_function),  # Add p-value function
+  extra.col.pos = 4  # Position of the extra column
+)
+
+
$overall
+  [1] 55.89870 49.25667 52.87611 29.93840 37.00753 48.98563 63.80835 40.89802
+  [9] 43.59754 38.57632 41.77687 45.68925 59.94524 59.43600 52.14511 42.93771
+ [17] 64.69541 55.14031 41.26762 39.52361 57.76044 44.42984 51.34565 42.27789
+ [25] 57.05133 57.62628 54.86927 44.18891 63.62491 36.00548 55.57837 30.71595
+ [33] 41.28953 59.38946 48.79945 59.15400 48.97194 59.39767 39.67967 67.68515
+ [41] 41.84531 48.16975 58.07529 62.49966 46.64476 47.34565 52.09856 36.58042
+ [49] 58.26146 61.76318 61.73580 39.40862 55.30459 53.10335 43.30459 48.46270
+ [57] 44.07666 52.55305 56.45996 67.72621 39.59206 51.82752 58.28611 46.93498
+ [65] 31.17591 55.96441 46.38741 46.33812 40.62971 37.67556 32.35318 48.75291
+ [73] 56.22177 39.41136 49.76591 43.22245 36.01095 41.30322 59.57016 39.65503
+ [81] 54.94593 43.50992 48.80767 62.10541 63.35934 57.31417 59.74264 66.92676
+ [89] 36.30938 34.83641 55.12115 52.07118 27.33744 64.41342 56.09035 47.90691
+ [97] 51.38125 41.71663 48.47639 40.52567 60.39151 52.51198 60.87064 58.61465
+[105] 38.60370 68.93634 37.84531 51.43874 52.68720
+attr(,"label")
+[1] "Age at Diagnosis"
+attr(,"units")
+[1] "years"
+
+$`ctDNA Negative`
+  [1] 55.89870 49.25667 52.87611 29.93840 37.00753 48.98563 40.89802 43.59754
+  [9] 38.57632 41.77687 45.68925 59.94524 59.43600 52.14511 42.93771 64.69541
+ [17] 55.14031 41.26762 39.52361 57.76044 44.42984 51.34565 42.27789 57.05133
+ [25] 57.62628 54.86927 44.18891 36.00548 30.71595 41.28953 59.38946 59.15400
+ [33] 48.97194 59.39767 39.67967 67.68515 41.84531 48.16975 62.49966 46.64476
+ [41] 47.34565 52.09856 36.58042 58.26146 61.76318 61.73580 39.40862 55.30459
+ [49] 53.10335 43.30459 48.46270 44.07666 52.55305 56.45996 67.72621 39.59206
+ [57] 51.82752 58.28611 46.93498 31.17591 55.96441 46.33812 40.62971 37.67556
+ [65] 32.35318 48.75291 56.22177 39.41136 49.76591 43.22245 36.01095 41.30322
+ [73] 59.57016 39.65503 54.94593 43.50992 48.80767 62.10541 63.35934 57.31417
+ [81] 59.74264 66.92676 36.30938 34.83641 55.12115 27.33744 56.09035 47.90691
+ [89] 51.38125 41.71663 48.47639 40.52567 60.39151 52.51198 60.87064 58.61465
+ [97] 68.93634 37.84531 51.43874 52.68720
+
+$`ctDNA Positive`
+[1] 63.80835 63.62491 55.57837 48.79945 58.07529 46.38741 52.07118 64.41342
+[9] 38.60370
+
+$overall
+  [1] HR+ HER2- HR+ HER2- HR+ HER2- TNBC      TNBC      HR+ HER2- HR- HER2+
+  [8] HR+ HER2+ TNBC      TNBC      TNBC      HR+ HER2- HR+ HER2- HR+ HER2+
+ [15] TNBC      TNBC      HR+ HER2- TNBC      TNBC      TNBC      TNBC     
+ [22] HR+ HER2- HR+ HER2- TNBC      TNBC      TNBC      TNBC      TNBC     
+ [29] HR+ HER2- TNBC      TNBC      HR+ HER2+ TNBC      HR+ HER2- HR+ HER2-
+ [36] HR+ HER2+ TNBC      HR+ HER2- TNBC      HR+ HER2- TNBC      TNBC     
+ [43] HR+ HER2- HR+ HER2- HR+ HER2+ HR+ HER2- HR+ HER2- HR+ HER2+ HR+ HER2-
+ [50] HR+ HER2- TNBC      TNBC      TNBC      HR+ HER2- TNBC      HR+ HER2-
+ [57] TNBC      TNBC      TNBC      TNBC      TNBC      HR+ HER2- HR+ HER2-
+ [64] TNBC      TNBC      TNBC      HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2-
+ [71] TNBC      HR+ HER2- HR+ HER2- HR+ HER2- TNBC      HR+ HER2+ HR+ HER2-
+ [78] TNBC      HR+ HER2- HR+ HER2- TNBC      HR- HER2+ TNBC      HR+ HER2-
+ [85] HR+ HER2+ HR+ HER2- HR+ HER2- TNBC      TNBC      HR+ HER2- HR+ HER2-
+ [92] HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2- TNBC     
+ [99] HR+ HER2- TNBC      HR+ HER2- HR- HER2+ HR+ HER2- HR+ HER2- HR+ HER2-
+[106] HR+ HER2- HR- HER2+ TNBC      HR+ HER2-
+attr(,"label")
+[1] Final Receptor Group
+Levels: TNBC HR+ HER2- HR+ HER2+ HR- HER2+
+
+$`ctDNA Negative`
+  [1] HR+ HER2- HR+ HER2- HR+ HER2- TNBC      TNBC      HR+ HER2- HR+ HER2+
+  [8] TNBC      TNBC      TNBC      HR+ HER2- HR+ HER2- HR+ HER2+ TNBC     
+ [15] TNBC      HR+ HER2- TNBC      TNBC      TNBC      TNBC      HR+ HER2-
+ [22] HR+ HER2- TNBC      TNBC      TNBC      TNBC      TNBC      TNBC     
+ [29] HR+ HER2+ TNBC      HR+ HER2- HR+ HER2+ TNBC      HR+ HER2- TNBC     
+ [36] HR+ HER2- TNBC      TNBC      HR+ HER2- HR+ HER2+ HR+ HER2- HR+ HER2-
+ [43] HR+ HER2+ HR+ HER2- HR+ HER2- TNBC      TNBC      TNBC      HR+ HER2-
+ [50] TNBC      HR+ HER2- TNBC      TNBC      TNBC      TNBC      TNBC     
+ [57] HR+ HER2- HR+ HER2- TNBC      TNBC      TNBC      HR+ HER2- HR+ HER2-
+ [64] HR+ HER2- TNBC      HR+ HER2- HR+ HER2- HR+ HER2- TNBC      HR+ HER2+
+ [71] HR+ HER2- TNBC      HR+ HER2- HR+ HER2- TNBC      HR- HER2+ TNBC     
+ [78] HR+ HER2- HR+ HER2+ HR+ HER2- HR+ HER2- TNBC      TNBC      HR+ HER2-
+ [85] HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2- TNBC      HR+ HER2-
+ [92] TNBC      HR+ HER2- HR- HER2+ HR+ HER2- HR+ HER2- HR+ HER2- HR- HER2+
+ [99] TNBC      HR+ HER2-
+Levels: TNBC HR+ HER2- HR+ HER2+ HR- HER2+
+
+$`ctDNA Positive`
+[1] HR- HER2+ HR+ HER2- TNBC      HR+ HER2- HR+ HER2- HR+ HER2- HR+ HER2-
+[8] HR+ HER2- HR+ HER2-
+Levels: TNBC HR+ HER2- HR+ HER2+ HR- HER2+
+
+$overall
+  [1] White White White White White White Black White White White White White
+ [13] White White White White White White Black White White White White White
+ [25] White Black White White White White White White Black White White White
+ [37] White White White White White White White White White White White Black
+ [49] White White White White White White White White Black White White White
+ [61] White White White White Black White White White White White White White
+ [73] White White White White White White White White Black White White White
+ [85] White White White White White White White White Asian White White White
+ [97] White Black White White White White White White White White White White
+[109] White
+attr(,"label")
+[1] Race
+Levels: Black Asian White
+
+$`ctDNA Negative`
+  [1] White White White White White White White White White White White White
+ [13] White White White White White Black White White White White White White
+ [25] Black White White White White Black White White White White White White
+ [37] White White White White White White Black White White White White White
+ [49] White White White Black White White White White White White White Black
+ [61] White White White White White White White White White White White White
+ [73] White White Black White White White White White White White White White
+ [85] White Asian White White White Black White White White White White White
+ [97] White White White White
+Levels: Black Asian White
+
+$`ctDNA Positive`
+[1] Black White White White White White White White White
+Levels: Black Asian White
+
+$overall
+  [1] Grade 2 Grade 2 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+ [10] Grade 3 Grade 3 Grade 3 <NA>    Grade 2 Grade 3 Grade 3 Grade 3 Grade 3
+ [19] Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+ [28] Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 1 Grade 1 Grade 3
+ [37] Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 2 Grade 2
+ [46] Grade 1 Grade 3 Grade 1 Grade 1 Grade 1 Grade 3 Grade 3 Grade 3 <NA>   
+ [55] Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 1 Grade 3
+ [64] Grade 3 Grade 3 Grade 3 Grade 1 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3
+ [73] Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+ [82] Grade 3 Grade 3 Grade 2 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 1
+ [91] Grade 3 Grade 1 Grade 1 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3 Grade 1
+[100] Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+[109] Grade 3
+attr(,"label")
+[1] Tumor Grade
+Levels: Grade 3 Grade 1 Grade 2
+
+$`ctDNA Negative`
+  [1] Grade 2 Grade 2 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+ [10] Grade 3 Grade 3 <NA>    Grade 2 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+ [19] Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+ [28] Grade 3 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3
+ [37] Grade 3 Grade 3 Grade 2 Grade 2 Grade 1 Grade 3 Grade 1 Grade 1 Grade 1
+ [46] Grade 3 Grade 3 Grade 3 <NA>    Grade 3 Grade 1 Grade 3 Grade 3 Grade 3
+ [55] Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 1 Grade 3
+ [64] Grade 3 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+ [73] Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 2 Grade 3 Grade 3 Grade 1
+ [82] Grade 3 Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3
+ [91] Grade 1 Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3
+[100] Grade 3
+Levels: Grade 3 Grade 1 Grade 2
+
+$`ctDNA Positive`
+[1] Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 1 Grade 1 Grade 1 Grade 3
+Levels: Grade 3 Grade 1 Grade 2
+
+$overall
+  [1] Stage III Stage I   Stage III Stage II  Stage II  Stage III Stage III
+  [8] Stage III Stage II  Stage I   Stage II  Stage II  Stage II  Stage II 
+ [15] Stage I   Stage II  Stage I   Stage II  Stage II  Stage I   Stage II 
+ [22] Stage III Stage III Stage II  Stage II  Stage II  Stage I   Stage II 
+ [29] Stage III Stage II  Stage I   Stage II  Stage I   Stage II  Stage III
+ [36] Stage II  Stage II  Stage III Stage I   Stage I   Stage II  Stage III
+ [43] Stage II  Stage I   Stage I   Stage III Stage II  Stage III Stage II 
+ [50] Stage I   Stage II  Stage II  Stage I   Stage II  Stage I   Stage III
+ [57] Stage I   Stage II  Stage I   Stage II  Stage II  Stage III Stage I  
+ [64] Stage II  Stage II  Stage II  Stage III Stage II  Stage I   Stage II 
+ [71] Stage I   Stage II  Stage I   Stage III Stage III Stage I   Stage III
+ [78] Stage I   Stage I   Stage I   Stage I   Stage II  Stage III Stage I  
+ [85] Stage II  Stage I   Stage III Stage I   Stage II  Stage II  Stage II 
+ [92] Stage III Stage II  Stage III Stage III Stage I   Stage II  Stage II 
+ [99] <NA>      Stage I   Stage I   Stage III Stage III Stage I   Stage I  
+[106] Stage II  Stage II  Stage I   Stage II 
+attr(,"label")
+[1] Overall Stage
+Levels: Stage I Stage II Stage III
+
+$`ctDNA Negative`
+  [1] Stage III Stage I   Stage III Stage II  Stage II  Stage III Stage III
+  [8] Stage II  Stage I   Stage II  Stage II  Stage II  Stage II  Stage I  
+ [15] Stage II  Stage I   Stage II  Stage II  Stage I   Stage II  Stage III
+ [22] Stage III Stage II  Stage II  Stage II  Stage I   Stage II  Stage II 
+ [29] Stage II  Stage I   Stage II  Stage II  Stage II  Stage III Stage I  
+ [36] Stage I   Stage II  Stage III Stage I   Stage I   Stage III Stage II 
+ [43] Stage III Stage II  Stage I   Stage II  Stage II  Stage I   Stage II 
+ [50] Stage I   Stage III Stage I   Stage II  Stage I   Stage II  Stage II 
+ [57] Stage III Stage I   Stage II  Stage II  Stage II  Stage II  Stage I  
+ [64] Stage II  Stage I   Stage II  Stage I   Stage III Stage III Stage I  
+ [71] Stage III Stage I   Stage I   Stage I   Stage I   Stage II  Stage III
+ [78] Stage I   Stage II  Stage I   Stage III Stage I   Stage II  Stage II 
+ [85] Stage II  Stage II  Stage III Stage I   Stage II  Stage II  <NA>     
+ [92] Stage I   Stage I   Stage III Stage III Stage I   Stage II  Stage II 
+ [99] Stage I   Stage II 
+Levels: Stage I Stage II Stage III
+
+$`ctDNA Positive`
+[1] Stage III Stage III Stage I   Stage III Stage II  Stage III Stage III
+[8] Stage III Stage I  
+Levels: Stage I Stage II Stage III
+
+$overall
+  [1] T2   T1   T2   T2   T2   T3   T2   T1   T2   T1   T2   T2   T3   T1   T1  
+ [16] T2   T1   T1   T2   T1   T2   T2   T2   T2   T1   T2   T1   <NA> T2   T2  
+ [31] T1   T1   T1   T3   T4   T2   T2   T1   T1   T1   T2   T3   T2   T1   T1  
+ [46] T3   T1   T1   T2   T1   T2   T2   T1   T1   T1   T3   T1   T1   T1   T2  
+ [61] T2   T3   T1   T2   T2   T2   T1   T1   T1   T2   T1   T2   T1   T3   T3  
+ [76] T1   T2   T1   T1   T1   T1   T2   T2   T1   T2   T1   T1   T1   T2   T2  
+ [91] T2   T3   T2   T2   T1   T1   T2   T1   T1   T1   T1   T3   T3   T1   T1  
+[106] T2   T2   T1   T2  
+attr(,"label")
+[1] T Stage
+Levels: T1 T2 T3 T4
+
+$`ctDNA Negative`
+  [1] T2   T1   T2   T2   T2   T3   T1   T2   T1   T2   T2   T3   T1   T1   T2  
+ [16] T1   T1   T2   T1   T2   T2   T2   T2   T1   T2   T1   <NA> T2   T1   T1  
+ [31] T3   T2   T2   T1   T1   T1   T2   T3   T1   T1   T3   T1   T1   T2   T1  
+ [46] T2   T2   T1   T1   T1   T3   T1   T1   T1   T2   T2   T3   T1   T2   T2  
+ [61] T2   T1   T1   T2   T1   T2   T1   T3   T3   T1   T2   T1   T1   T1   T1  
+ [76] T2   T2   T1   T2   T1   T1   T1   T2   T2   T2   T2   T1   T1   T2   T1  
+ [91] T1   T1   T1   T3   T3   T1   T2   T2   T1   T2  
+Levels: T1 T2 T3 T4
+
+$`ctDNA Positive`
+[1] T2 T2 T1 T4 T2 T1 T3 T2 T1
+Levels: T1 T2 T3 T4
+
+$overall
+  [1] N3 N0 N3 N0 N0 N2 N2 N2 N0 N0 N0 N1 N0 N1 N0 N1 N0 N1 N0 N0 N0 N3 N3 N1 N1
+ [26] N0 N0 N0 N2 N0 N0 N1 N0 N0 N2 N0 N0 N2 N0 N0 N1 N2 N0 N0 N0 N2 N1 N2 N1 N0
+ [51] N1 N1 N0 N1 N0 N1 N0 N1 N1 N1 N0 N1 N0 N1 N1 N0 N3 N1 N1 N1 N0 N1 N1 N3 N1
+ [76] N1 N3 N0 N0 N1 N0 N1 N1 N1 N0 N0 N2 N0 N1 N1 N1 N2 N1 N2 N2 N0 N1 N1 N1 N0
+[101] N1 N1 N1 N0 N0 N1 N1 N0 N0
+attr(,"label")
+[1] N Stage
+Levels: N0 N1 N2 N3
+
+$`ctDNA Negative`
+  [1] N3 N0 N3 N0 N0 N2 N2 N0 N0 N0 N1 N0 N1 N0 N1 N0 N1 N0 N0 N0 N3 N3 N1 N1 N0
+ [26] N0 N0 N0 N1 N0 N0 N0 N0 N2 N0 N0 N1 N2 N0 N0 N2 N1 N2 N1 N0 N1 N1 N0 N1 N0
+ [51] N1 N0 N1 N1 N1 N0 N1 N0 N1 N1 N0 N1 N1 N1 N0 N1 N1 N3 N1 N1 N3 N0 N0 N1 N0
+ [76] N1 N1 N1 N0 N0 N2 N0 N1 N1 N1 N1 N2 N0 N1 N1 N1 N0 N1 N1 N1 N0 N1 N1 N0 N0
+Levels: N0 N1 N2 N3
+
+$`ctDNA Positive`
+[1] N2 N2 N0 N2 N0 N3 N2 N2 N0
+Levels: N0 N1 N2 N3
+
+$overall
+  [1] Both Ductal and Lobular Ductal                  Ductal                 
+  [4] Ductal                  Ductal                  Lobular                
+  [7] Ductal                  Ductal                  Ductal                 
+ [10] Ductal                  Ductal                  Ductal                 
+ [13] Lobular                 Ductal                  Ductal                 
+ [16] Ductal                  Ductal                  Ductal                 
+ [19] Ductal                  Ductal                  Ductal                 
+ [22] Ductal                  Ductal                  Ductal                 
+ [25] Ductal                  Ductal                  Ductal                 
+ [28] Ductal                  Ductal                  Ductal                 
+ [31] Ductal                  Ductal                  Ductal                 
+ [34] Lobular                 Ductal                  Ductal                 
+ [37] Ductal                  Ductal                  Ductal                 
+ [40] Ductal                  Ductal                  Ductal                 
+ [43] Ductal                  Ductal                  Other                  
+ [46] Lobular                 Ductal                  Ductal                 
+ [49] Lobular                 Lobular                 Ductal                 
+ [52] Ductal                  Ductal                  Lobular                
+ [55] Ductal                  Lobular                 Ductal                 
+ [58] Ductal                  Ductal                  Ductal                 
+ [61] Other                   Lobular                 Ductal                 
+ [64] Ductal                  Ductal                  Ductal                 
+ [67] Lobular                 Ductal                  Ductal                 
+ [70] Ductal                  Ductal                  Ductal                 
+ [73] Both Ductal and Lobular Ductal                  Ductal                 
+ [76] Both Ductal and Lobular Both Ductal and Lobular Ductal                 
+ [79] Ductal                  Both Ductal and Lobular Ductal                 
+ [82] Ductal                  Ductal                  Ductal                 
+ [85] Ductal                  Ductal                  Ductal                 
+ [88] Ductal                  Ductal                  Ductal                 
+ [91] Both Ductal and Lobular Lobular                 Ductal                 
+ [94] Lobular                 Ductal                  Ductal                 
+ [97] Both Ductal and Lobular Ductal                  Both Ductal and Lobular
+[100] Ductal                  Lobular                 Ductal                 
+[103] Lobular                 Ductal                  Ductal                 
+[106] Ductal                  Ductal                  Ductal                 
+[109] Both Ductal and Lobular
+Levels: Both Ductal and Lobular Ductal Lobular Other
+
+$`ctDNA Negative`
+  [1] Both Ductal and Lobular Ductal                  Ductal                 
+  [4] Ductal                  Ductal                  Lobular                
+  [7] Ductal                  Ductal                  Ductal                 
+ [10] Ductal                  Ductal                  Lobular                
+ [13] Ductal                  Ductal                  Ductal                 
+ [16] Ductal                  Ductal                  Ductal                 
+ [19] Ductal                  Ductal                  Ductal                 
+ [22] Ductal                  Ductal                  Ductal                 
+ [25] Ductal                  Ductal                  Ductal                 
+ [28] Ductal                  Ductal                  Ductal                 
+ [31] Lobular                 Ductal                  Ductal                 
+ [34] Ductal                  Ductal                  Ductal                 
+ [37] Ductal                  Ductal                  Ductal                 
+ [40] Other                   Lobular                 Ductal                 
+ [43] Ductal                  Lobular                 Lobular                
+ [46] Ductal                  Ductal                  Ductal                 
+ [49] Lobular                 Ductal                  Lobular                
+ [52] Ductal                  Ductal                  Ductal                 
+ [55] Ductal                  Other                   Lobular                
+ [58] Ductal                  Ductal                  Ductal                 
+ [61] Ductal                  Ductal                  Ductal                 
+ [64] Ductal                  Ductal                  Ductal                 
+ [67] Both Ductal and Lobular Ductal                  Ductal                 
+ [70] Both Ductal and Lobular Both Ductal and Lobular Ductal                 
+ [73] Ductal                  Both Ductal and Lobular Ductal                 
+ [76] Ductal                  Ductal                  Ductal                 
+ [79] Ductal                  Ductal                  Ductal                 
+ [82] Ductal                  Ductal                  Ductal                 
+ [85] Both Ductal and Lobular Ductal                  Ductal                 
+ [88] Ductal                  Both Ductal and Lobular Ductal                 
+ [91] Both Ductal and Lobular Ductal                  Lobular                
+ [94] Ductal                  Lobular                 Ductal                 
+ [97] Ductal                  Ductal                  Ductal                 
+[100] Both Ductal and Lobular
+Levels: Both Ductal and Lobular Ductal Lobular Other
+
+$`ctDNA Positive`
+[1] Ductal  Ductal  Ductal  Ductal  Ductal  Lobular Lobular Lobular Ductal 
+Levels: Ductal Lobular
+
+$overall
+  [1] Radiation    No Radiation No Radiation No Radiation No Radiation
+  [6] Radiation    Radiation    Radiation    Radiation    No Radiation
+ [11] Radiation    Radiation    Radiation    Radiation    No Radiation
+ [16] No Radiation Radiation    Radiation    No Radiation Radiation   
+ [21] Radiation    Radiation    Radiation    Radiation    No Radiation
+ [26] Radiation    No Radiation No Radiation Radiation    No Radiation
+ [31] Radiation    Radiation    No Radiation Radiation    Radiation   
+ [36] No Radiation No Radiation Radiation    No Radiation No Radiation
+ [41] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [46] Radiation    No Radiation Radiation    Radiation    No Radiation
+ [51] Radiation    Radiation    Radiation    No Radiation Radiation   
+ [56] Radiation    No Radiation Radiation    Radiation    No Radiation
+ [61] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [66] Radiation    Radiation    Radiation    No Radiation Radiation   
+ [71] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [76] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [81] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [86] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [91] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [96] No Radiation Radiation    Radiation    No Radiation No Radiation
+[101] No Radiation Radiation    Radiation    Radiation    No Radiation
+[106] No Radiation Radiation    No Radiation No Radiation
+attr(,"label")
+[1] Radiation
+Levels: No Radiation Radiation
+
+$`ctDNA Negative`
+  [1] Radiation    No Radiation No Radiation No Radiation No Radiation
+  [6] Radiation    Radiation    Radiation    No Radiation Radiation   
+ [11] Radiation    Radiation    Radiation    No Radiation No Radiation
+ [16] Radiation    Radiation    No Radiation Radiation    Radiation   
+ [21] Radiation    Radiation    Radiation    No Radiation Radiation   
+ [26] No Radiation No Radiation No Radiation Radiation    No Radiation
+ [31] Radiation    No Radiation No Radiation Radiation    No Radiation
+ [36] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [41] Radiation    No Radiation Radiation    Radiation    No Radiation
+ [46] Radiation    Radiation    Radiation    No Radiation Radiation   
+ [51] Radiation    No Radiation Radiation    Radiation    No Radiation
+ [56] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [61] Radiation    Radiation    No Radiation Radiation    No Radiation
+ [66] Radiation    Radiation    Radiation    Radiation    No Radiation
+ [71] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [76] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [81] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [86] Radiation    Radiation    No Radiation Radiation    Radiation   
+ [91] No Radiation No Radiation No Radiation Radiation    Radiation   
+ [96] Radiation    No Radiation Radiation    No Radiation No Radiation
+Levels: No Radiation Radiation
+
+$`ctDNA Positive`
+[1] Radiation    Radiation    Radiation    Radiation    Radiation   
+[6] Radiation    Radiation    Radiation    No Radiation
+Levels: No Radiation Radiation
+
+$overall
+  [1] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+  [9] Chemo    Chemo    Chemo    Chemo    Chemo    No Chemo Chemo    Chemo   
+ [17] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [25] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [33] Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo    Chemo   
+ [41] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [49] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [57] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [65] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [73] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [81] Chemo    Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo   
+ [89] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [97] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+[105] Chemo    Chemo    Chemo    Chemo    Chemo   
+attr(,"label")
+[1] Chemo
+Levels: No Chemo Chemo
+
+$`ctDNA Negative`
+  [1] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+  [9] Chemo    Chemo    Chemo    Chemo    No Chemo Chemo    Chemo    Chemo   
+ [17] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [25] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [33] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [41] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [49] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [57] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [65] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [73] Chemo    Chemo    Chemo    Chemo    Chemo    No Chemo Chemo    Chemo   
+ [81] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [89] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [97] Chemo    Chemo    Chemo    Chemo   
+Levels: No Chemo Chemo
+
+$`ctDNA Positive`
+[1] Chemo    Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo   
+[9] Chemo   
+Levels: No Chemo Chemo
+
+$overall
+  [1] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+  [4] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+  [7] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [10] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [13] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [16] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [19] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [22] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [25] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [28] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [31] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [34] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [37] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [40] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [43] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [46] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [49] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [52] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [55] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [58] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [61] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+ [64] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [67] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [70] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+ [73] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [76] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [79] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [82] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [85] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [88] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [91] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [94] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [97] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+[100] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+[103] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+[106] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+[109] Endocrine Therapy   
+attr(,"label")
+[1] Endocrine Therapy
+Levels: No Endocrine Therapy Endocrine Therapy
+
+$`ctDNA Negative`
+  [1] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+  [4] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+  [7] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [10] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+ [13] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [16] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [19] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [22] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [25] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [28] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [31] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [34] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+ [37] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [40] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [43] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [46] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [49] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+ [52] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [55] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [58] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [61] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+ [64] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+ [67] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [70] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [73] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [76] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [79] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [82] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [85] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [88] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [91] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+ [94] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+ [97] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+[100] Endocrine Therapy   
+Levels: No Endocrine Therapy Endocrine Therapy
+
+$`ctDNA Positive`
+[1] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+[4] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+[7] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+Levels: No Endocrine Therapy Endocrine Therapy
+
+$overall
+  [1] No Bone Modifying Treatment No Bone Modifying Treatment
+  [3] No Bone Modifying Treatment No Bone Modifying Treatment
+  [5] No Bone Modifying Treatment Bone Modifying Treatment   
+  [7] No Bone Modifying Treatment No Bone Modifying Treatment
+  [9] No Bone Modifying Treatment No Bone Modifying Treatment
+ [11] No Bone Modifying Treatment No Bone Modifying Treatment
+ [13] No Bone Modifying Treatment No Bone Modifying Treatment
+ [15] No Bone Modifying Treatment No Bone Modifying Treatment
+ [17] No Bone Modifying Treatment No Bone Modifying Treatment
+ [19] No Bone Modifying Treatment No Bone Modifying Treatment
+ [21] No Bone Modifying Treatment Bone Modifying Treatment   
+ [23] Bone Modifying Treatment    No Bone Modifying Treatment
+ [25] No Bone Modifying Treatment No Bone Modifying Treatment
+ [27] No Bone Modifying Treatment No Bone Modifying Treatment
+ [29] No Bone Modifying Treatment No Bone Modifying Treatment
+ [31] No Bone Modifying Treatment No Bone Modifying Treatment
+ [33] No Bone Modifying Treatment Bone Modifying Treatment   
+ [35] No Bone Modifying Treatment Bone Modifying Treatment   
+ [37] No Bone Modifying Treatment Bone Modifying Treatment   
+ [39] No Bone Modifying Treatment Bone Modifying Treatment   
+ [41] No Bone Modifying Treatment No Bone Modifying Treatment
+ [43] No Bone Modifying Treatment Bone Modifying Treatment   
+ [45] Bone Modifying Treatment    Bone Modifying Treatment   
+ [47] No Bone Modifying Treatment Bone Modifying Treatment   
+ [49] No Bone Modifying Treatment Bone Modifying Treatment   
+ [51] Bone Modifying Treatment    No Bone Modifying Treatment
+ [53] No Bone Modifying Treatment No Bone Modifying Treatment
+ [55] No Bone Modifying Treatment Bone Modifying Treatment   
+ [57] Bone Modifying Treatment    No Bone Modifying Treatment
+ [59] No Bone Modifying Treatment No Bone Modifying Treatment
+ [61] No Bone Modifying Treatment Bone Modifying Treatment   
+ [63] No Bone Modifying Treatment No Bone Modifying Treatment
+ [65] No Bone Modifying Treatment No Bone Modifying Treatment
+ [67] Bone Modifying Treatment    No Bone Modifying Treatment
+ [69] Bone Modifying Treatment    Bone Modifying Treatment   
+ [71] No Bone Modifying Treatment No Bone Modifying Treatment
+ [73] No Bone Modifying Treatment Bone Modifying Treatment   
+ [75] Bone Modifying Treatment    Bone Modifying Treatment   
+ [77] Bone Modifying Treatment    No Bone Modifying Treatment
+ [79] Bone Modifying Treatment    Bone Modifying Treatment   
+ [81] No Bone Modifying Treatment No Bone Modifying Treatment
+ [83] No Bone Modifying Treatment Bone Modifying Treatment   
+ [85] Bone Modifying Treatment    Bone Modifying Treatment   
+ [87] Bone Modifying Treatment    No Bone Modifying Treatment
+ [89] No Bone Modifying Treatment Bone Modifying Treatment   
+ [91] Bone Modifying Treatment    Bone Modifying Treatment   
+ [93] Bone Modifying Treatment    Bone Modifying Treatment   
+ [95] Bone Modifying Treatment    No Bone Modifying Treatment
+ [97] Bone Modifying Treatment    No Bone Modifying Treatment
+ [99] No Bone Modifying Treatment No Bone Modifying Treatment
+[101] Bone Modifying Treatment    No Bone Modifying Treatment
+[103] Bone Modifying Treatment    No Bone Modifying Treatment
+[105] Bone Modifying Treatment    No Bone Modifying Treatment
+[107] No Bone Modifying Treatment No Bone Modifying Treatment
+[109] No Bone Modifying Treatment
+attr(,"label")
+[1] Bone Modifying Treatment
+Levels: No Bone Modifying Treatment Bone Modifying Treatment
+
+$`ctDNA Negative`
+  [1] No Bone Modifying Treatment No Bone Modifying Treatment
+  [3] No Bone Modifying Treatment No Bone Modifying Treatment
+  [5] No Bone Modifying Treatment Bone Modifying Treatment   
+  [7] No Bone Modifying Treatment No Bone Modifying Treatment
+  [9] No Bone Modifying Treatment No Bone Modifying Treatment
+ [11] No Bone Modifying Treatment No Bone Modifying Treatment
+ [13] No Bone Modifying Treatment No Bone Modifying Treatment
+ [15] No Bone Modifying Treatment No Bone Modifying Treatment
+ [17] No Bone Modifying Treatment No Bone Modifying Treatment
+ [19] No Bone Modifying Treatment No Bone Modifying Treatment
+ [21] Bone Modifying Treatment    Bone Modifying Treatment   
+ [23] No Bone Modifying Treatment No Bone Modifying Treatment
+ [25] No Bone Modifying Treatment No Bone Modifying Treatment
+ [27] No Bone Modifying Treatment No Bone Modifying Treatment
+ [29] No Bone Modifying Treatment No Bone Modifying Treatment
+ [31] Bone Modifying Treatment    Bone Modifying Treatment   
+ [33] No Bone Modifying Treatment Bone Modifying Treatment   
+ [35] No Bone Modifying Treatment Bone Modifying Treatment   
+ [37] No Bone Modifying Treatment No Bone Modifying Treatment
+ [39] Bone Modifying Treatment    Bone Modifying Treatment   
+ [41] Bone Modifying Treatment    No Bone Modifying Treatment
+ [43] Bone Modifying Treatment    No Bone Modifying Treatment
+ [45] Bone Modifying Treatment    Bone Modifying Treatment   
+ [47] No Bone Modifying Treatment No Bone Modifying Treatment
+ [49] No Bone Modifying Treatment No Bone Modifying Treatment
+ [51] Bone Modifying Treatment    Bone Modifying Treatment   
+ [53] No Bone Modifying Treatment No Bone Modifying Treatment
+ [55] No Bone Modifying Treatment No Bone Modifying Treatment
+ [57] Bone Modifying Treatment    No Bone Modifying Treatment
+ [59] No Bone Modifying Treatment No Bone Modifying Treatment
+ [61] No Bone Modifying Treatment No Bone Modifying Treatment
+ [63] Bone Modifying Treatment    Bone Modifying Treatment   
+ [65] No Bone Modifying Treatment No Bone Modifying Treatment
+ [67] No Bone Modifying Treatment Bone Modifying Treatment   
+ [69] Bone Modifying Treatment    Bone Modifying Treatment   
+ [71] Bone Modifying Treatment    No Bone Modifying Treatment
+ [73] Bone Modifying Treatment    Bone Modifying Treatment   
+ [75] No Bone Modifying Treatment No Bone Modifying Treatment
+ [77] No Bone Modifying Treatment Bone Modifying Treatment   
+ [79] Bone Modifying Treatment    Bone Modifying Treatment   
+ [81] Bone Modifying Treatment    No Bone Modifying Treatment
+ [83] No Bone Modifying Treatment Bone Modifying Treatment   
+ [85] Bone Modifying Treatment    Bone Modifying Treatment   
+ [87] Bone Modifying Treatment    No Bone Modifying Treatment
+ [89] Bone Modifying Treatment    No Bone Modifying Treatment
+ [91] No Bone Modifying Treatment No Bone Modifying Treatment
+ [93] Bone Modifying Treatment    No Bone Modifying Treatment
+ [95] Bone Modifying Treatment    No Bone Modifying Treatment
+ [97] No Bone Modifying Treatment No Bone Modifying Treatment
+ [99] No Bone Modifying Treatment No Bone Modifying Treatment
+Levels: No Bone Modifying Treatment Bone Modifying Treatment
+
+$`ctDNA Positive`
+[1] No Bone Modifying Treatment No Bone Modifying Treatment
+[3] No Bone Modifying Treatment No Bone Modifying Treatment
+[5] No Bone Modifying Treatment Bone Modifying Treatment   
+[7] Bone Modifying Treatment    Bone Modifying Treatment   
+[9] Bone Modifying Treatment   
+Levels: No Bone Modifying Treatment Bone Modifying Treatment
+
+$overall
+  [1] Node Positive Node Negative Node Positive Node Negative Node Negative
+  [6] Node Positive Node Positive Node Positive Node Negative Node Negative
+ [11] Node Negative Node Positive Node Negative Node Positive Node Negative
+ [16] Node Positive Node Negative Node Positive Node Negative Node Negative
+ [21] Node Negative Node Positive Node Positive Node Positive Node Positive
+ [26] Node Negative Node Negative Node Negative Node Positive Node Negative
+ [31] Node Negative Node Positive Node Negative Node Negative Node Positive
+ [36] Node Negative Node Negative Node Positive Node Negative Node Negative
+ [41] Node Positive Node Positive Node Negative Node Negative Node Negative
+ [46] Node Positive Node Positive Node Positive Node Positive Node Negative
+ [51] Node Positive Node Positive Node Negative Node Positive Node Negative
+ [56] Node Positive Node Negative Node Positive Node Positive Node Positive
+ [61] Node Negative Node Positive Node Negative Node Positive Node Positive
+ [66] Node Negative Node Positive Node Positive Node Positive Node Positive
+ [71] Node Negative Node Positive Node Positive Node Positive Node Positive
+ [76] Node Positive Node Positive Node Negative Node Negative Node Positive
+ [81] Node Negative Node Positive Node Positive Node Positive Node Negative
+ [86] Node Negative Node Positive Node Negative Node Positive Node Positive
+ [91] Node Positive Node Positive Node Positive Node Positive Node Positive
+ [96] Node Negative Node Positive Node Positive Node Positive Node Negative
+[101] Node Positive Node Positive Node Positive Node Negative Node Negative
+[106] Node Positive Node Positive Node Negative Node Negative
+Levels: Node Negative Node Positive
+
+$`ctDNA Negative`
+  [1] Node Positive Node Negative Node Positive Node Negative Node Negative
+  [6] Node Positive Node Positive Node Negative Node Negative Node Negative
+ [11] Node Positive Node Negative Node Positive Node Negative Node Positive
+ [16] Node Negative Node Positive Node Negative Node Negative Node Negative
+ [21] Node Positive Node Positive Node Positive Node Positive Node Negative
+ [26] Node Negative Node Negative Node Negative Node Positive Node Negative
+ [31] Node Negative Node Negative Node Negative Node Positive Node Negative
+ [36] Node Negative Node Positive Node Positive Node Negative Node Negative
+ [41] Node Positive Node Positive Node Positive Node Positive Node Negative
+ [46] Node Positive Node Positive Node Negative Node Positive Node Negative
+ [51] Node Positive Node Negative Node Positive Node Positive Node Positive
+ [56] Node Negative Node Positive Node Negative Node Positive Node Positive
+ [61] Node Negative Node Positive Node Positive Node Positive Node Negative
+ [66] Node Positive Node Positive Node Positive Node Positive Node Positive
+ [71] Node Positive Node Negative Node Negative Node Positive Node Negative
+ [76] Node Positive Node Positive Node Positive Node Negative Node Negative
+ [81] Node Positive Node Negative Node Positive Node Positive Node Positive
+ [86] Node Positive Node Positive Node Negative Node Positive Node Positive
+ [91] Node Positive Node Negative Node Positive Node Positive Node Positive
+ [96] Node Negative Node Positive Node Positive Node Negative Node Negative
+Levels: Node Negative Node Positive
+
+$`ctDNA Positive`
+[1] Node Positive Node Positive Node Negative Node Positive Node Negative
+[6] Node Positive Node Positive Node Positive Node Negative
+Levels: Node Negative Node Positive
+
+$overall
+  [1] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+  [4] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+  [7] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [10] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+ [13] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+ [16] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+ [19] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+ [22] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+ [25] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+
table1_p #we have p-values!  
+
+
| Characteristic | Total (N=109) | ctDNA Negative (N=100) | ctDNA Positive (N=9) | P-value |
|---|---|---|---|---|
| **Age at Diagnosis (years)** | | | | 0.118 |
| Mean (SD) | 49.7 (9.66) | 49.2 (9.64) | 54.6 (8.94) | |
| Median [Min, Max] | 49.3 [27.3, 68.9] | 49.0 [27.3, 68.9] | 55.6 [38.6, 64.4] | |
| **Final Receptor Group** | | | | 0.0891 |
| TNBC | 45 (41.3%) | 44 (44.0%) | 1 (11.1%) | |
| HR+ HER2- | 52 (47.7%) | 45 (45.0%) | 7 (77.8%) | |
| HR+ HER2+ | 8 (7.3%) | 8 (8.0%) | 0 (0%) | |
| HR- HER2+ | 4 (3.7%) | 3 (3.0%) | 1 (11.1%) | |
| **Race** | | | | 0.594 |
| Black | 9 (8.3%) | 8 (8.0%) | 1 (11.1%) | |
| Asian | 1 (0.9%) | 1 (1.0%) | 0 (0%) | |
| White | 99 (90.8%) | 91 (91.0%) | 8 (88.9%) | |
| **Tumor Grade** | | | | 0.0366 |
| Grade 3 | 79 (72.5%) | 75 (75.0%) | 4 (44.4%) | |
| Grade 1 | 22 (20.2%) | 17 (17.0%) | 5 (55.6%) | |
| Grade 2 | 6 (5.5%) | 6 (6.0%) | 0 (0%) | |
| Missing | 2 (1.8%) | 2 (2.0%) | 0 (0%) | |
| **Overall Stage** | | | | 0.00814 |
| Stage I | 35 (32.1%) | 33 (33.0%) | 2 (22.2%) | |
| Stage II | 47 (43.1%) | 46 (46.0%) | 1 (11.1%) | |
| Stage III | 26 (23.9%) | 20 (20.0%) | 6 (66.7%) | |
| Missing | 1 (0.9%) | 1 (1.0%) | 0 (0%) | |
| **T Stage** | | | | 0.119 |
| T1 | 51 (46.8%) | 48 (48.0%) | 3 (33.3%) | |
| T2 | 44 (40.4%) | 40 (40.0%) | 4 (44.4%) | |
| T3 | 12 (11.0%) | 11 (11.0%) | 1 (11.1%) | |
| T4 | 1 (0.9%) | 0 (0%) | 1 (11.1%) | |
| Missing | 1 (0.9%) | 1 (1.0%) | 0 (0%) | |
| **N Stage** | | | | <0.001 |
| N0 | 46 (42.2%) | 43 (43.0%) | 3 (33.3%) | |
| N1 | 43 (39.4%) | 43 (43.0%) | 0 (0%) | |
| N2 | 13 (11.9%) | 8 (8.0%) | 5 (55.6%) | |
| N3 | 7 (6.4%) | 6 (6.0%) | 1 (11.1%) | |
| **Histology Category** | | | | 0.284 |
| Both Ductal and Lobular | 9 (8.3%) | 9 (9.0%) | 0 (0%) | |
| Ductal | 84 (77.1%) | 78 (78.0%) | 6 (66.7%) | |
| Lobular | 14 (12.8%) | 11 (11.0%) | 3 (33.3%) | |
| Other | 2 (1.8%) | 2 (2.0%) | 0 (0%) | |
| **Radiation** | | | | 0.268 |
| No Radiation | 34 (31.2%) | 33 (33.0%) | 1 (11.1%) | |
| Radiation | 75 (68.8%) | 67 (67.0%) | 8 (88.9%) | |
| **Chemo** | | | | 0.23 |
| No Chemo | 3 (2.8%) | 2 (2.0%) | 1 (11.1%) | |
| Chemo | 106 (97.2%) | 98 (98.0%) | 8 (88.9%) | |
| **Endocrine Therapy** | | | | 0.295 |
| No Endocrine Therapy | 47 (43.1%) | 45 (45.0%) | 2 (22.2%) | |
| Endocrine Therapy | 62 (56.9%) | 55 (55.0%) | 7 (77.8%) | |
| **Bone Modifying Treatment** | | | | 0.719 |
| No Bone Modifying Treatment | 70 (64.2%) | 65 (65.0%) | 5 (55.6%) | |
| Bone Modifying Treatment | 39 (35.8%) | 35 (35.0%) | 4 (44.4%) | |
| **Node Status** | | | | 0.731 |
| Node Negative | 46 (42.2%) | 43 (43.0%) | 3 (33.3%) | |
| Node Positive | 63 (57.8%) | 57 (57.0%) | 6 (66.7%) | |
| **Axillary Dissection** | | | | 0.161 |
| No Axillary Dissection | 54 (49.5%) | 52 (52.0%) | 2 (22.2%) | |
| Axillary Dissection | 55 (50.5%) | 48 (48.0%) | 7 (77.8%) | |
| **Surgery Type** | | | | 1 |
| Lumpectomy | 45 (41.3%) | 41 (41.0%) | 4 (44.4%) | |
| Mastectomy | 64 (58.7%) | 59 (59.0%) | 5 (55.6%) | |
| **Neoadjuvant Chemo** | | | | 1 |
| No Neoadjuvant Chemo | 90 (82.6%) | 82 (82.0%) | 8 (88.9%) | |
| Neoadjuvant Chemo | 19 (17.4%) | 18 (18.0%) | 1 (11.1%) | |
| **Pathologic Complete Response** | | | | 1 |
| pCR | 1 (0.9%) | 1 (1.0%) | 0 (0%) | |
| Non-pCR | 18 (16.5%) | 17 (17.0%) | 1 (11.1%) | |
| Missing | 90 (82.6%) | 82 (82.0%) | 8 (88.9%) | |
+
+
+

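In this Table 1 by ctDNA status, the tests of association show significant (p < 0.05) associations for tumor grade (p = 0.037; as labeled, ctDNA-positive patients were more often Grade 1, 55.6% vs 17.0%, and less often Grade 3, 44.4% vs 75.0%), overall stage (p = 0.008; higher stage associated with positivity, with 66.7% of ctDNA-positive vs 20.0% of ctDNA-negative patients at Stage III), and N stage (p < 0.001; higher N stage seemingly associated with positivity, with 55.6% vs 8.0% at N2). Receptor group (p = 0.089) trends toward significance, while age at diagnosis (p = 0.118) does not reach it. Because only 9 patients are ctDNA positive, these estimates are imprecise.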
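As a check on the p-value column, two of these tests can be reproduced from the counts and summary statistics printed in the ctDNA table. This is a minimal sketch, not the document's `pvalue_function`: the N-stage counts are copied from the table, and `welch_from_summary()` is a hypothetical helper that applies Welch's t-test (the default behavior of `t.test()`) to the rounded means, SDs, and group sizes, so its p-value only approximates the 0.118 shown.

```r
# Counts copied from the ctDNA Table 1 (N Stage rows); columns are ctDNA groups
n_stage <- matrix(
  c(43, 43, 8, 6,   # ctDNA Negative: N0, N1, N2, N3
    3,  0, 5, 1),   # ctDNA Positive: N0, N1, N2, N3
  ncol = 2,
  dimnames = list(n_stage = c("N0", "N1", "N2", "N3"),
                  ctdna   = c("Negative", "Positive"))
)

# Several cells are below 5, so Fisher's exact test is used
p_n_stage <- fisher.test(n_stage)$p.value

# Hypothetical helper: Welch's two-sample t-test computed from
# group means, SDs, and sizes instead of participant-level data
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  t_stat <- (m2 - m1) / sqrt(v1 + v2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p = p)
}

# Age at diagnosis: ctDNA Negative 49.2 (9.64), n = 100; ctDNA Positive 54.6 (8.94), n = 9
age_test <- welch_from_summary(49.2, 9.64, 100, 54.6, 8.94, 9)

p_n_stage
age_test
```

The N-stage p-value comes out well below 0.05, consistent with the table, while the age comparison stays above 0.05 despite a 5-year difference in means: with only 9 ctDNA-positive patients, the standard error of that group's mean dominates and the Welch degrees of freedom drop to roughly 10.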

+
+
+
+

4.2 Table of demographics and clinical factors by DTC status

+

Next, we create a table of demographic and clinical factors by DTC status, including tests of association.
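The analysis data holds more than one row per participant, so the preparation code below collapses it to one row per participant with `group_by(participant_id)` and `first(na.omit(...))`, keeping the first non-missing value of each variable. A toy sketch of that pattern, using a small hypothetical data set rather than the study data:

```r
library(dplyr)

# Hypothetical long-format data: several rows per participant, with each
# participant-level variable filled in on only some rows
visits <- tibble(
  participant_id = c(1, 1, 2, 2, 2),
  age_at_diag    = c(NA, 51.6, 48.8, NA, NA),
  dtc_ever       = c(0, 0, NA, 1, 1)
)

# One row per participant: keep the first non-missing value of each variable
participants <- visits |>
  group_by(participant_id) |>
  summarize(
    age_at_diag = first(na.omit(age_at_diag)),
    dtc_ever    = first(na.omit(dtc_ever))
  )

participants
```

If every row for a participant is missing a variable, `na.omit()` returns an empty vector and `first()` falls back to `NA`, so that participant is counted as Missing in the table rather than dropped.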

+
+
####### Table of clinical and demographic factors by DTC status ######### 
+
+# Prepare the dataset
+dtc_unique_subset_data <- subset_data |>
+  mutate(
+    # Replace "Missing" and 99 with NA in relevant columns
+    final_t_stage = na_if(as.character(final_t_stage), "Missing"),
+    final_t_stage = na_if(final_t_stage, "99"),
+    final_overall_stage = na_if(as.character(final_overall_stage), "Missing"),
+    final_overall_stage = na_if(final_overall_stage, "99"),
+    final_tumor_grade = na_if(final_tumor_grade, 3), # Assumes 3 means "Not Reported"
+    diag_pcr_1 = na_if(diag_pcr_1, "."),
+    # Replace 99 with NA in all numeric columns
+    across(where(is.numeric), ~ na_if(.x, 99))
+  ) |>
+  group_by(participant_id) |>
+  summarize(
+    # Summarize unique participant-level data
+    age_at_diag = first(na.omit(age_at_diag)),
+    final_receptor_group = first(na.omit(final_receptor_group)),
+    demo_race_final = first(na.omit(demo_race_final)),
+    final_tumor_grade = first(na.omit(final_tumor_grade)),
+    final_overall_stage = first(na.omit(final_overall_stage)),
+    final_t_stage = first(na.omit(final_t_stage)),
+    final_n_stage = first(na.omit(final_n_stage)),
+    histology_category = first(na.omit(histology_category)),
+    prtx_radiation = first(na.omit(prtx_radiation)),
+    prtx_chemo = first(na.omit(prtx_chemo)),
+    prtx_endo = first(na.omit(prtx_endo)),
+    prtx_bonemod = first(na.omit(prtx_bonemod)),
+    node_status = first(na.omit(node_status)),
+    axillary_dissection = first(na.omit(axillary_dissection)),
+    diag_surgery_type_1 = first(na.omit(diag_surgery_type_1)),
+    diag_neoadj_chemo_1 = first(na.omit(diag_neoadj_chemo_1)),
+    diag_pcr_1 = first(na.omit(diag_pcr_1)),
+    ctDNA_ever = first(na.omit(ctDNA_ever)),
+    dtc_ever = first(na.omit(dtc_ever))
+  )
+
+# Convert variables to labeled factors for table output
+dtc_unique_subset_data <- dtc_unique_subset_data |>
+  mutate(
+    final_receptor_group = factor(final_receptor_group, levels = c(1, 2, 3, 4),
+                                  labels = c("TNBC", "HR+ HER2-", "HR+ HER2+", "HR- HER2+")),
+    # Overwrite demo_race_final itself so table1 summarizes race as categorical
+    demo_race_final = factor(demo_race_final, levels = c(1, 3, 5),
+                             labels = c("Black", "Asian", "White")),
+    final_tumor_grade = factor(final_tumor_grade, levels = c(0, 1, 2),
+                               labels = c("Grade 3", "Grade 1", "Grade 2")),
+    final_overall_stage = factor(final_overall_stage, levels = c(1, 2, 3),
+                                 labels = c("Stage I", "Stage II", "Stage III")),
+    final_t_stage = factor(final_t_stage, levels = c(1, 2, 3, 4),
+                           labels = c("T1", "T2", "T3", "T4")),
+    final_n_stage = factor(final_n_stage, levels = c(0, 1, 2, 3),
+                           labels = c("N0", "N1", "N2", "N3")),
+    prtx_radiation = factor(prtx_radiation, levels = c(0, 1),
+                            labels = c("No Radiation", "Radiation")),
+    prtx_chemo = factor(prtx_chemo, levels = c(0, 1),
+                        labels = c("No Chemo", "Chemo")),
+    prtx_endo = factor(prtx_endo, levels = c(0, 1),
+                       labels = c("No Endocrine Therapy", "Endocrine Therapy")),
+    prtx_bonemod = factor(prtx_bonemod, levels = c(0, 1),
+                          labels = c("No Bone Modifying Treatment", "Bone Modifying Treatment")),
+    axillary_dissection = factor(axillary_dissection, levels = c(0, 1),
+                                 labels = c("No Axillary Dissection", "Axillary Dissection")),
+    diag_surgery_type_1 = factor(diag_surgery_type_1, levels = c(1, 2),
+                                 labels = c("Lumpectomy", "Mastectomy")),
+    diag_neoadj_chemo_1 = factor(diag_neoadj_chemo_1, levels = c(0, 1),
+                                 labels = c("No Neoadjuvant Chemo", "Neoadjuvant Chemo")),
+    diag_pcr_1 = factor(diag_pcr_1, levels = c(1, 2),
+                        labels = c("pCR", "Non-pCR")),
+    ctDNA_ever = factor(ctDNA_ever, levels = c("FALSE", "TRUE"),
+                        labels = c("ctDNA Negative", "ctDNA Positive")),
+    dtc_ever = factor(dtc_ever, levels = c(0, 1),
+                      labels = c("DTC Negative", "DTC Positive"))
+  )
+
+#### Labels 
+
+label(dtc_unique_subset_data$age_at_diag) <- "Age at Diagnosis"
+units(dtc_unique_subset_data$age_at_diag)       <- "years"
+
+# assign `final_receptor_group` labels to `dtc_unique_subset_data`
+label(dtc_unique_subset_data$final_receptor_group)       <- "Final Receptor Group"
+
+##demo_race_final 
+label(dtc_unique_subset_data$demo_race_final)  <- "Race"
+
+
+#final_tumor_grade 
+
+label(dtc_unique_subset_data$final_tumor_grade)  <- "Tumor Grade"
+
+
+#final_overall_stage
+
+label(dtc_unique_subset_data$final_overall_stage)  <- "Overall Stage"
+
+#final_t_stage
+label(dtc_unique_subset_data$final_t_stage)  <- "T Stage"
+
+
+#final_n_stage 
+label(dtc_unique_subset_data$final_n_stage)  <- "N Stage"
+
+#histology_category
+
+
+label(dtc_unique_subset_data$histology_category)  <- "Histology Category"
+
+
+#prtx_radiation 
+
+label(dtc_unique_subset_data$prtx_radiation)  <- "Radiation"
+
+#prtx_chemo
+label(dtc_unique_subset_data$prtx_chemo)  <- "Chemo"
+
+#prtx_endo
+label(dtc_unique_subset_data$prtx_endo)  <- "Endocrine Therapy"
+
+#prtx_bonemod 
+label(dtc_unique_subset_data$prtx_bonemod)  <- "Bone Modifying Treatment"
+
+#node_status 
+label(dtc_unique_subset_data$node_status)  <- "Node Status"
+
+#axillary_dissection 
+label(dtc_unique_subset_data$axillary_dissection)  <- "Axillary Dissection"
+
+#diag_surgery_type_1
+label(dtc_unique_subset_data$diag_surgery_type_1)  <- "Surgery Type"
+
+#diag_neoadj_chemo_1 
+
+label(dtc_unique_subset_data$diag_neoadj_chemo_1)  <- "Neoadjuvant Chemo"
+
+#pCR 
+
+label(dtc_unique_subset_data$diag_pcr_1)  <- "Pathologic Complete Response"
+
+
+#dtc_ever 
+label(dtc_unique_subset_data$dtc_ever)  <- "DTC Status"
+
+
+####
+
+# Step 1: Create table1 output
+table1_output <- table1(
+  ~ age_at_diag + final_receptor_group + demo_race_final + 
+    final_tumor_grade + final_overall_stage + 
+    final_t_stage + final_n_stage + 
+    histology_category + prtx_radiation + 
+    prtx_chemo + prtx_endo + prtx_bonemod + 
+    node_status + axillary_dissection + 
+    diag_surgery_type_1 + diag_neoadj_chemo_1 | 
+    dtc_ever,
+  data = dtc_unique_subset_data
+)
+
+table1_output
+
+
| Characteristic | DTC Negative (N=70) | DTC Positive (N=39) | Overall (N=109) |
|---|---|---|---|
| **Age at Diagnosis (years)** | | | |
| Mean (SD) | 49.9 (9.74) | 49.2 (9.63) | 49.7 (9.66) |
| Median [Min, Max] | 51.6 [27.3, 68.9] | 48.8 [30.7, 67.7] | 49.3 [27.3, 68.9] |
| **Final Receptor Group** | | | |
| TNBC | 25 (35.7%) | 20 (51.3%) | 45 (41.3%) |
| HR+ HER2- | 37 (52.9%) | 15 (38.5%) | 52 (47.7%) |
| HR+ HER2+ | 4 (5.7%) | 4 (10.3%) | 8 (7.3%) |
| HR- HER2+ | 4 (5.7%) | 0 (0%) | 4 (3.7%) |
| **Race** | | | |
| Mean (SD) | 4.69 (1.06) | 4.59 (1.23) | 4.65 (1.12) |
| Median [Min, Max] | 5.00 [1.00, 5.00] | 5.00 [1.00, 5.00] | 5.00 [1.00, 5.00] |
| **Tumor Grade** | | | |
| Grade 3 | 46 (65.7%) | 33 (84.6%) | 79 (72.5%) |
| Grade 1 | 18 (25.7%) | 4 (10.3%) | 22 (20.2%) |
| Grade 2 | 4 (5.7%) | 2 (5.1%) | 6 (5.5%) |
| Missing | 2 (2.9%) | 0 (0%) | 2 (1.8%) |
| **Overall Stage** | | | |
| Stage I | 22 (31.4%) | 13 (33.3%) | 35 (32.1%) |
| Stage II | 29 (41.4%) | 18 (46.2%) | 47 (43.1%) |
| Stage III | 18 (25.7%) | 8 (20.5%) | 26 (23.9%) |
| Missing | 1 (1.4%) | 0 (0%) | 1 (0.9%) |
| **T Stage** | | | |
| T1 | 34 (48.6%) | 17 (43.6%) | 51 (46.8%) |
| T2 | 27 (38.6%) | 17 (43.6%) | 44 (40.4%) |
| T3 | 8 (11.4%) | 4 (10.3%) | 12 (11.0%) |
| T4 | 0 (0%) | 1 (2.6%) | 1 (0.9%) |
| Missing | 1 (1.4%) | 0 (0%) | 1 (0.9%) |
| **N Stage** | | | |
| N0 | 24 (34.3%) | 22 (56.4%) | 46 (42.2%) |
| N1 | 32 (45.7%) | 11 (28.2%) | 43 (39.4%) |
| N2 | 10 (14.3%) | 3 (7.7%) | 13 (11.9%) |
| N3 | 4 (5.7%) | 3 (7.7%) | 7 (6.4%) |
| **Histology Category** | | | |
| Both Ductal and Lobular | 9 (12.9%) | 0 (0%) | 9 (8.3%) |
| Ductal | 48 (68.6%) | 36 (92.3%) | 84 (77.1%) |
| Lobular | 11 (15.7%) | 3 (7.7%) | 14 (12.8%) |
| Other | 2 (2.9%) | 0 (0%) | 2 (1.8%) |
| **Radiation** | | | |
| No Radiation | 23 (32.9%) | 11 (28.2%) | 34 (31.2%) |
| Radiation | 47 (67.1%) | 28 (71.8%) | 75 (68.8%) |
| **Chemo** | | | |
| No Chemo | 1 (1.4%) | 2 (5.1%) | 3 (2.8%) |
| Chemo | 69 (98.6%) | 37 (94.9%) | 106 (97.2%) |
| **Endocrine Therapy** | | | |
| No Endocrine Therapy | 28 (40.0%) | 19 (48.7%) | 47 (43.1%) |
| Endocrine Therapy | 42 (60.0%) | 20 (51.3%) | 62 (56.9%) |
| **Bone Modifying Treatment** | | | |
| No Bone Modifying Treatment | 45 (64.3%) | 25 (64.1%) | 70 (64.2%) |
| Bone Modifying Treatment | 25 (35.7%) | 14 (35.9%) | 39 (35.8%) |
| **Node Status** | | | |
| Node Negative | 24 (34.3%) | 22 (56.4%) | 46 (42.2%) |
| Node Positive | 46 (65.7%) | 17 (43.6%) | 63 (57.8%) |
| **Axillary Dissection** | | | |
| No Axillary Dissection | 31 (44.3%) | 23 (59.0%) | 54 (49.5%) |
| Axillary Dissection | 39 (55.7%) | 16 (41.0%) | 55 (50.5%) |
| **Surgery Type** | | | |
| Lumpectomy | 31 (44.3%) | 14 (35.9%) | 45 (41.3%) |
| Mastectomy | 39 (55.7%) | 25 (64.1%) | 64 (58.7%) |
| **Neoadjuvant Chemo** | | | |
| No Neoadjuvant Chemo | 60 (85.7%) | 30 (76.9%) | 90 (82.6%) |
| Neoadjuvant Chemo | 10 (14.3%) | 9 (23.1%) | 19 (17.4%) |
+
+
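The p-value helper in the next chunk switches to Fisher's exact test when any *observed* cell count is below 5. A common simplification of Cochran's rule of thumb looks at *expected* counts instead. The sketch below shows that variant; `choose_assoc_test()` is a hypothetical helper, not part of the table1 package or the document's code, and the counts are copied from the DTC table above.

```r
# Hypothetical helper: use Fisher's exact test when any *expected* count is < 5,
# otherwise Pearson's chi-squared test
choose_assoc_test <- function(tab) {
  # chisq.test() warns when expected counts are small; only $expected is needed here
  expected <- suppressWarnings(chisq.test(tab)$expected)
  if (any(expected < 5)) {
    list(test = "fisher", p.value = fisher.test(tab)$p.value)
  } else {
    list(test = "chisq", p.value = chisq.test(tab)$p.value)
  }
}

# Counts from the DTC table: rows = category, columns = DTC Negative / DTC Positive
node_status <- matrix(c(24, 46, 22, 17), nrow = 2,
                      dimnames = list(c("Node Negative", "Node Positive"),
                                      c("DTC Negative", "DTC Positive")))
chemo <- matrix(c(1, 69, 2, 37), nrow = 2,
                dimnames = list(c("No Chemo", "Chemo"),
                                c("DTC Negative", "DTC Positive")))

node_res  <- choose_assoc_test(node_status)
chemo_res <- choose_assoc_test(chemo)
node_res$test   # "chisq": every expected count is at least 5
chemo_res$test  # "fisher": the "No Chemo" row has expected counts below 5
```

For these two variables both rules pick the same test; they can disagree when an observed count is below 5 but every expected count is at least 5 (or the reverse), which is where the choice of rule would change a reported p-value.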
####
+pvalue_function <- function(x, ...) {
+  # Drop the "overall" group if present so only the two status groups are compared
+  x <- x[!names(x) %in% "overall"]
+  y <- unlist(x)
+  # Group index for each value (1 = first status group, 2 = second)
+  g <- factor(rep(seq_along(x), times = lengths(x)))
+  
+  # Return NA unless exactly two groups are being compared
+  if (length(unique(g)) != 2) {
+    return(NA)
+  }
+
+  # Perform the appropriate test based on the type of variable
+  if (is.numeric(y)) {
+    # For continuous variables, perform a t-test
+    p <- t.test(y ~ g)$p.value
+  } else {
+    # For categorical variables, perform a chi-squared test or Fisher's test
+    table_result <- table(y, g)
+    
+    # Choose the test based on observed cell counts
+    if (any(table_result < 5)) {
+      p <- fisher.test(table_result)$p.value  # Use Fisher's test for low counts
+    } else {
+      p <- chisq.test(table_result)$p.value  # Use chi-squared test otherwise
+    }
+  }
+  
+  # Format the p-value for output
+  formatted_p <- format.pval(p, digits = 3, eps = 0.001)
+  return(formatted_p)
+}
+  
+
+# Generate table1 with the p-value column
+table1_dtc <- table1(
+  ~ age_at_diag + final_receptor_group + demo_race_final + 
+    final_tumor_grade + final_overall_stage + 
+    final_t_stage + final_n_stage + 
+    histology_category + prtx_radiation + 
+    prtx_chemo + prtx_endo + prtx_bonemod + 
+    node_status + axillary_dissection + 
+    diag_surgery_type_1 + diag_neoadj_chemo_1 | 
+    dtc_ever,
+  data = dtc_unique_subset_data,
+  overall = c(left = "Total"),
+  extra.col = list("P-value" = pvalue_function),  # Add p-value function
+  extra.col.pos = 4  # Position of the extra column
+)
+
+
+ [46] Grade 1 Grade 3 Grade 1 Grade 1 Grade 1 Grade 3 Grade 3 Grade 3 <NA>   
+ [55] Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 1 Grade 3
+ [64] Grade 3 Grade 3 Grade 3 Grade 1 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3
+ [73] Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+ [82] Grade 3 Grade 3 Grade 2 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 1
+ [91] Grade 3 Grade 1 Grade 1 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3 Grade 1
+[100] Grade 3 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+[109] Grade 3
+attr(,"label")
+[1] Tumor Grade
+Levels: Grade 3 Grade 1 Grade 2
+
+$`DTC Negative`
+ [1] Grade 2 Grade 2 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+[10] <NA>    Grade 2 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 1 Grade 3
+[19] Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 2 Grade 3 Grade 1 Grade 1
+[28] Grade 1 Grade 3 Grade 3 <NA>    Grade 3 Grade 3 Grade 3 Grade 1 Grade 3
+[37] Grade 3 Grade 1 Grade 1 Grade 3 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3
+[46] Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3 Grade 1
+[55] Grade 3 Grade 1 Grade 1 Grade 1 Grade 3 Grade 1 Grade 3 Grade 3 Grade 1
+[64] Grade 1 Grade 3 Grade 1 Grade 3 Grade 3 Grade 3 Grade 3
+Levels: Grade 3 Grade 1 Grade 2
+
+$`DTC Positive`
+ [1] Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3
+[10] Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 1 Grade 1 Grade 3
+[19] Grade 3 Grade 3 Grade 2 Grade 1 Grade 3 Grade 3 Grade 1 Grade 3 Grade 3
+[28] Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 3 Grade 2 Grade 3
+[37] Grade 3 Grade 3 Grade 3
+Levels: Grade 3 Grade 1 Grade 2
+
+$overall
+  [1] Stage III Stage I   Stage III Stage II  Stage II  Stage III Stage III
+  [8] Stage III Stage II  Stage I   Stage II  Stage II  Stage II  Stage II 
+ [15] Stage I   Stage II  Stage I   Stage II  Stage II  Stage I   Stage II 
+ [22] Stage III Stage III Stage II  Stage II  Stage II  Stage I   Stage II 
+ [29] Stage III Stage II  Stage I   Stage II  Stage I   Stage II  Stage III
+ [36] Stage II  Stage II  Stage III Stage I   Stage I   Stage II  Stage III
+ [43] Stage II  Stage I   Stage I   Stage III Stage II  Stage III Stage II 
+ [50] Stage I   Stage II  Stage II  Stage I   Stage II  Stage I   Stage III
+ [57] Stage I   Stage II  Stage I   Stage II  Stage II  Stage III Stage I  
+ [64] Stage II  Stage II  Stage II  Stage III Stage II  Stage I   Stage II 
+ [71] Stage I   Stage II  Stage I   Stage III Stage III Stage I   Stage III
+ [78] Stage I   Stage I   Stage I   Stage I   Stage II  Stage III Stage I  
+ [85] Stage II  Stage I   Stage III Stage I   Stage II  Stage II  Stage II 
+ [92] Stage III Stage II  Stage III Stage III Stage I   Stage II  Stage II 
+ [99] <NA>      Stage I   Stage I   Stage III Stage III Stage I   Stage I  
+[106] Stage II  Stage II  Stage I   Stage II 
+attr(,"label")
+[1] Overall Stage
+Levels: Stage I Stage II Stage III
+
+$`DTC Negative`
+ [1] Stage III Stage I   Stage III Stage II  Stage III Stage III Stage II 
+ [8] Stage I   Stage II  Stage II  Stage II  Stage I   Stage II  Stage I  
+[15] Stage I   Stage II  Stage III Stage I   Stage II  Stage III Stage I  
+[22] Stage II  Stage III Stage I   Stage II  Stage III Stage II  Stage I  
+[29] Stage II  Stage II  Stage II  Stage I   Stage II  Stage II  Stage III
+[36] Stage I   Stage II  Stage III Stage II  Stage I   Stage II  Stage I  
+[43] Stage I   Stage III Stage I   Stage III Stage I   Stage I   Stage II 
+[50] Stage I   Stage III Stage I   Stage II  Stage II  Stage II  Stage III
+[57] Stage II  Stage III Stage III Stage I   Stage II  Stage II  <NA>     
+[64] Stage I   Stage III Stage III Stage I   Stage II  Stage II  Stage II 
+Levels: Stage I Stage II Stage III
+
+$`DTC Positive`
+ [1] Stage II  Stage III Stage II  Stage II  Stage I   Stage II  Stage II 
+ [8] Stage III Stage III Stage II  Stage II  Stage II  Stage II  Stage I  
+[15] Stage II  Stage II  Stage III Stage II  Stage I   Stage II  Stage I  
+[22] Stage III Stage I   Stage I   Stage III Stage I   Stage II  Stage II 
+[29] Stage II  Stage II  Stage III Stage I   Stage I   Stage III Stage I  
+[36] Stage II  Stage I   Stage I   Stage I  
+Levels: Stage I Stage II Stage III
+
+$overall
+  [1] T2   T1   T2   T2   T2   T3   T2   T1   T2   T1   T2   T2   T3   T1   T1  
+ [16] T2   T1   T1   T2   T1   T2   T2   T2   T2   T1   T2   T1   <NA> T2   T2  
+ [31] T1   T1   T1   T3   T4   T2   T2   T1   T1   T1   T2   T3   T2   T1   T1  
+ [46] T3   T1   T1   T2   T1   T2   T2   T1   T1   T1   T3   T1   T1   T1   T2  
+ [61] T2   T3   T1   T2   T2   T2   T1   T1   T1   T2   T1   T2   T1   T3   T3  
+ [76] T1   T2   T1   T1   T1   T1   T2   T2   T1   T2   T1   T1   T1   T2   T2  
+ [91] T2   T3   T2   T2   T1   T1   T2   T1   T1   T1   T1   T3   T3   T1   T1  
+[106] T2   T2   T1   T2  
+attr(,"label")
+[1] T Stage
+Levels: T1 T2 T3 T4
+
+$`DTC Negative`
+ [1] T2   T1   T2   T2   T3   T2   T2   T1   T2   T3   T1   T1   T1   T1   T1  
+[16] <NA> T2   T1   T2   T1   T1   T2   T3   T1   T1   T1   T2   T1   T2   T2  
+[31] T1   T1   T2   T2   T3   T1   T2   T1   T1   T1   T2   T1   T1   T3   T1  
+[46] T2   T1   T1   T2   T1   T1   T1   T2   T2   T2   T3   T2   T2   T1   T1  
+[61] T2   T1   T1   T1   T3   T3   T1   T2   T2   T2  
+Levels: T1 T2 T3 T4
+
+$`DTC Positive`
+ [1] T2 T1 T2 T2 T1 T2 T2 T2 T2 T2 T1 T2 T2 T1 T1 T3 T4 T2 T1 T2 T1 T3 T1 T1 T3
+[26] T1 T1 T2 T2 T2 T3 T1 T1 T2 T1 T2 T1 T1 T1
+Levels: T1 T2 T3 T4
+
+$overall
+  [1] N3 N0 N3 N0 N0 N2 N2 N2 N0 N0 N0 N1 N0 N1 N0 N1 N0 N1 N0 N0 N0 N3 N3 N1 N1
+ [26] N0 N0 N0 N2 N0 N0 N1 N0 N0 N2 N0 N0 N2 N0 N0 N1 N2 N0 N0 N0 N2 N1 N2 N1 N0
+ [51] N1 N1 N0 N1 N0 N1 N0 N1 N1 N1 N0 N1 N0 N1 N1 N0 N3 N1 N1 N1 N0 N1 N1 N3 N1
+ [76] N1 N3 N0 N0 N1 N0 N1 N1 N1 N0 N0 N2 N0 N1 N1 N1 N2 N1 N2 N2 N0 N1 N1 N1 N0
+[101] N1 N1 N1 N0 N0 N1 N1 N0 N0
+attr(,"label")
+[1] N Stage
+Levels: N0 N1 N2 N3
+
+$`DTC Negative`
+ [1] N3 N0 N3 N0 N2 N2 N0 N0 N1 N0 N1 N0 N1 N0 N0 N0 N2 N0 N0 N2 N0 N1 N2 N0 N1
+[26] N2 N1 N0 N1 N1 N1 N1 N1 N0 N1 N0 N0 N3 N1 N1 N1 N0 N1 N1 N1 N3 N1 N0 N1 N0
+[51] N2 N0 N1 N1 N1 N2 N1 N2 N2 N0 N1 N1 N1 N1 N1 N1 N0 N1 N1 N0
+Levels: N0 N1 N2 N3
+
+$`DTC Positive`
+ [1] N0 N2 N0 N1 N0 N0 N0 N3 N3 N1 N1 N0 N0 N0 N1 N0 N2 N0 N0 N0 N0 N2 N0 N0 N1
+[26] N0 N1 N1 N1 N1 N3 N0 N0 N1 N1 N0 N0 N0 N0
+Levels: N0 N1 N2 N3
+
+$overall
+  [1] Both Ductal and Lobular Ductal                  Ductal                 
+  [4] Ductal                  Ductal                  Lobular                
+  [7] Ductal                  Ductal                  Ductal                 
+ [10] Ductal                  Ductal                  Ductal                 
+ [13] Lobular                 Ductal                  Ductal                 
+ [16] Ductal                  Ductal                  Ductal                 
+ [19] Ductal                  Ductal                  Ductal                 
+ [22] Ductal                  Ductal                  Ductal                 
+ [25] Ductal                  Ductal                  Ductal                 
+ [28] Ductal                  Ductal                  Ductal                 
+ [31] Ductal                  Ductal                  Ductal                 
+ [34] Lobular                 Ductal                  Ductal                 
+ [37] Ductal                  Ductal                  Ductal                 
+ [40] Ductal                  Ductal                  Ductal                 
+ [43] Ductal                  Ductal                  Other                  
+ [46] Lobular                 Ductal                  Ductal                 
+ [49] Lobular                 Lobular                 Ductal                 
+ [52] Ductal                  Ductal                  Lobular                
+ [55] Ductal                  Lobular                 Ductal                 
+ [58] Ductal                  Ductal                  Ductal                 
+ [61] Other                   Lobular                 Ductal                 
+ [64] Ductal                  Ductal                  Ductal                 
+ [67] Lobular                 Ductal                  Ductal                 
+ [70] Ductal                  Ductal                  Ductal                 
+ [73] Both Ductal and Lobular Ductal                  Ductal                 
+ [76] Both Ductal and Lobular Both Ductal and Lobular Ductal                 
+ [79] Ductal                  Both Ductal and Lobular Ductal                 
+ [82] Ductal                  Ductal                  Ductal                 
+ [85] Ductal                  Ductal                  Ductal                 
+ [88] Ductal                  Ductal                  Ductal                 
+ [91] Both Ductal and Lobular Lobular                 Ductal                 
+ [94] Lobular                 Ductal                  Ductal                 
+ [97] Both Ductal and Lobular Ductal                  Both Ductal and Lobular
+[100] Ductal                  Lobular                 Ductal                 
+[103] Lobular                 Ductal                  Ductal                 
+[106] Ductal                  Ductal                  Ductal                 
+[109] Both Ductal and Lobular
+Levels: Both Ductal and Lobular Ductal Lobular Other
+
+$`DTC Negative`
+ [1] Both Ductal and Lobular Ductal                  Ductal                 
+ [4] Ductal                  Lobular                 Ductal                 
+ [7] Ductal                  Ductal                  Ductal                 
+[10] Lobular                 Ductal                  Ductal                 
+[13] Ductal                  Ductal                  Ductal                 
+[16] Ductal                  Ductal                  Ductal                 
+[19] Ductal                  Ductal                  Ductal                 
+[22] Ductal                  Ductal                  Other                  
+[25] Ductal                  Ductal                  Lobular                
+[28] Lobular                 Ductal                  Ductal                 
+[31] Lobular                 Ductal                  Ductal                 
+[34] Other                   Lobular                 Ductal                 
+[37] Ductal                  Lobular                 Ductal                 
+[40] Ductal                  Ductal                  Ductal                 
+[43] Both Ductal and Lobular Ductal                  Both Ductal and Lobular
+[46] Both Ductal and Lobular Both Ductal and Lobular Ductal                 
+[49] Ductal                  Ductal                  Ductal                 
+[52] Ductal                  Ductal                  Ductal                 
+[55] Both Ductal and Lobular Lobular                 Ductal                 
+[58] Lobular                 Ductal                  Ductal                 
+[61] Both Ductal and Lobular Ductal                  Both Ductal and Lobular
+[64] Lobular                 Ductal                  Lobular                
+[67] Ductal                  Ductal                  Ductal                 
+[70] Both Ductal and Lobular
+Levels: Both Ductal and Lobular Ductal Lobular Other
+
+$`DTC Positive`
+ [1] Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Ductal 
+[10] Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Lobular Ductal  Ductal 
+[19] Ductal  Ductal  Ductal  Lobular Ductal  Ductal  Lobular Ductal  Ductal 
+[28] Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Ductal  Ductal 
+[37] Ductal  Ductal  Ductal 
+Levels: Ductal Lobular
+
+$overall
+  [1] Radiation    No Radiation No Radiation No Radiation No Radiation
+  [6] Radiation    Radiation    Radiation    Radiation    No Radiation
+ [11] Radiation    Radiation    Radiation    Radiation    No Radiation
+ [16] No Radiation Radiation    Radiation    No Radiation Radiation   
+ [21] Radiation    Radiation    Radiation    Radiation    No Radiation
+ [26] Radiation    No Radiation No Radiation Radiation    No Radiation
+ [31] Radiation    Radiation    No Radiation Radiation    Radiation   
+ [36] No Radiation No Radiation Radiation    No Radiation No Radiation
+ [41] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [46] Radiation    No Radiation Radiation    Radiation    No Radiation
+ [51] Radiation    Radiation    Radiation    No Radiation Radiation   
+ [56] Radiation    No Radiation Radiation    Radiation    No Radiation
+ [61] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [66] Radiation    Radiation    Radiation    No Radiation Radiation   
+ [71] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [76] No Radiation Radiation    Radiation    Radiation    Radiation   
+ [81] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [86] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [91] Radiation    Radiation    Radiation    Radiation    Radiation   
+ [96] No Radiation Radiation    Radiation    No Radiation No Radiation
+[101] No Radiation Radiation    Radiation    Radiation    No Radiation
+[106] No Radiation Radiation    No Radiation No Radiation
+attr(,"label")
+[1] Radiation
+Levels: No Radiation Radiation
+
+$`DTC Negative`
+ [1] Radiation    No Radiation No Radiation No Radiation Radiation   
+ [6] Radiation    Radiation    No Radiation Radiation    Radiation   
+[11] Radiation    No Radiation Radiation    Radiation    No Radiation
+[16] No Radiation Radiation    No Radiation No Radiation Radiation   
+[21] No Radiation Radiation    Radiation    Radiation    No Radiation
+[26] Radiation    Radiation    No Radiation Radiation    Radiation   
+[31] No Radiation Radiation    No Radiation No Radiation Radiation   
+[36] Radiation    Radiation    Radiation    Radiation    No Radiation
+[41] Radiation    No Radiation Radiation    Radiation    No Radiation
+[46] Radiation    Radiation    Radiation    Radiation    Radiation   
+[51] Radiation    Radiation    Radiation    Radiation    Radiation   
+[56] Radiation    Radiation    Radiation    Radiation    No Radiation
+[61] Radiation    Radiation    No Radiation No Radiation Radiation   
+[66] Radiation    Radiation    No Radiation Radiation    No Radiation
+Levels: No Radiation Radiation
+
+$`DTC Positive`
+ [1] No Radiation Radiation    Radiation    No Radiation Radiation   
+ [6] No Radiation Radiation    Radiation    Radiation    Radiation   
+[11] No Radiation Radiation    No Radiation Radiation    Radiation   
+[16] Radiation    Radiation    No Radiation No Radiation Radiation   
+[21] Radiation    Radiation    Radiation    Radiation    Radiation   
+[26] No Radiation Radiation    Radiation    Radiation    Radiation   
+[31] Radiation    Radiation    Radiation    Radiation    Radiation   
+[36] Radiation    No Radiation No Radiation No Radiation
+Levels: No Radiation Radiation
+
+$overall
+  [1] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+  [9] Chemo    Chemo    Chemo    Chemo    Chemo    No Chemo Chemo    Chemo   
+ [17] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [25] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [33] Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo    Chemo   
+ [41] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [49] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [57] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [65] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [73] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [81] Chemo    Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo   
+ [89] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [97] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+[105] Chemo    Chemo    Chemo    Chemo    Chemo   
+attr(,"label")
+[1] Chemo
+Levels: No Chemo Chemo
+
+$`DTC Negative`
+ [1] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [9] Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo    Chemo   
+[17] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+[25] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+[33] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+[41] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+[49] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+[57] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+[65] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+Levels: No Chemo Chemo
+
+$`DTC Positive`
+ [1] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+ [9] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+[17] No Chemo Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+[25] Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo    Chemo   
+[33] Chemo    Chemo    No Chemo Chemo    Chemo    Chemo    Chemo   
+Levels: No Chemo Chemo
+
+$overall
+  [1] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+  [4] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+  [7] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [10] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [13] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [16] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [19] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [22] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [25] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [28] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [31] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [34] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [37] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [40] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+ [43] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [46] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [49] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [52] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [55] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [58] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [61] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+ [64] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+ [67] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [70] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+ [73] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [76] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [79] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [82] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [85] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [88] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+ [91] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [94] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [97] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+[100] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+[103] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+[106] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+[109] Endocrine Therapy   
+attr(,"label")
+[1] Endocrine Therapy
+Levels: No Endocrine Therapy Endocrine Therapy
+
+$`DTC Negative`
+ [1] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+ [4] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [7] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+[10] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+[13] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+[16] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+[19] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+[22] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+[25] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+[28] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+[31] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+[34] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+[37] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+[40] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+[43] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+[46] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+[49] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+[52] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+[55] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+[58] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+[61] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+[64] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+[67] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+[70] Endocrine Therapy   
+Levels: No Endocrine Therapy Endocrine Therapy
+
+$`DTC Positive`
+ [1] Endocrine Therapy    Endocrine Therapy    No Endocrine Therapy
+ [4] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+ [7] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+[10] No Endocrine Therapy No Endocrine Therapy No Endocrine Therapy
+[13] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+[16] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+[19] Endocrine Therapy    Endocrine Therapy    Endocrine Therapy   
+[22] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+[25] Endocrine Therapy    No Endocrine Therapy No Endocrine Therapy
+[28] No Endocrine Therapy No Endocrine Therapy Endocrine Therapy   
+[31] Endocrine Therapy    No Endocrine Therapy Endocrine Therapy   
+[34] No Endocrine Therapy Endocrine Therapy    Endocrine Therapy   
+[37] No Endocrine Therapy Endocrine Therapy    No Endocrine Therapy
+Levels: No Endocrine Therapy Endocrine Therapy
+
+$overall
+  [1] No Bone Modifying Treatment No Bone Modifying Treatment
+  [3] No Bone Modifying Treatment No Bone Modifying Treatment
+  [5] No Bone Modifying Treatment Bone Modifying Treatment   
+  [7] No Bone Modifying Treatment No Bone Modifying Treatment
+  [9] No Bone Modifying Treatment No Bone Modifying Treatment
+ [11] No Bone Modifying Treatment No Bone Modifying Treatment
+ [13] No Bone Modifying Treatment No Bone Modifying Treatment
+ [15] No Bone Modifying Treatment No Bone Modifying Treatment
+ [17] No Bone Modifying Treatment No Bone Modifying Treatment
+ [19] No Bone Modifying Treatment No Bone Modifying Treatment
+ [21] No Bone Modifying Treatment Bone Modifying Treatment   
+ [23] Bone Modifying Treatment    No Bone Modifying Treatment
+ [25] No Bone Modifying Treatment No Bone Modifying Treatment
+ [27] No Bone Modifying Treatment No Bone Modifying Treatment
+ [29] No Bone Modifying Treatment No Bone Modifying Treatment
+ [31] No Bone Modifying Treatment No Bone Modifying Treatment
+ [33] No Bone Modifying Treatment Bone Modifying Treatment   
+ [35] No Bone Modifying Treatment Bone Modifying Treatment   
+ [37] No Bone Modifying Treatment Bone Modifying Treatment   
+ [39] No Bone Modifying Treatment Bone Modifying Treatment   
+ [41] No Bone Modifying Treatment No Bone Modifying Treatment
+ [43] No Bone Modifying Treatment Bone Modifying Treatment   
+ [45] Bone Modifying Treatment    Bone Modifying Treatment   
+ [47] No Bone Modifying Treatment Bone Modifying Treatment   
+ [49] No Bone Modifying Treatment Bone Modifying Treatment   
+ [51] Bone Modifying Treatment    No Bone Modifying Treatment
+ [53] No Bone Modifying Treatment No Bone Modifying Treatment
+ [55] No Bone Modifying Treatment Bone Modifying Treatment   
+ [57] Bone Modifying Treatment    No Bone Modifying Treatment
+ [59] No Bone Modifying Treatment No Bone Modifying Treatment
+ [61] No Bone Modifying Treatment Bone Modifying Treatment   
+ [63] No Bone Modifying Treatment No Bone Modifying Treatment
+ [65] No Bone Modifying Treatment No Bone Modifying Treatment
+ [67] Bone Modifying Treatment    No Bone Modifying Treatment
+ [69] Bone Modifying Treatment    Bone Modifying Treatment   
+ [71] No Bone Modifying Treatment No Bone Modifying Treatment
+ [73] No Bone Modifying Treatment Bone Modifying Treatment   
+ [75] Bone Modifying Treatment    Bone Modifying Treatment   
+ [77] Bone Modifying Treatment    No Bone Modifying Treatment
+ [79] Bone Modifying Treatment    Bone Modifying Treatment   
+ [81] No Bone Modifying Treatment No Bone Modifying Treatment
+ [83] No Bone Modifying Treatment Bone Modifying Treatment   
+ [85] Bone Modifying Treatment    Bone Modifying Treatment   
+ [87] Bone Modifying Treatment    No Bone Modifying Treatment
+ [89] No Bone Modifying Treatment Bone Modifying Treatment   
+ [91] Bone Modifying Treatment    Bone Modifying Treatment   
+ [93] Bone Modifying Treatment    Bone Modifying Treatment   
+ [95] Bone Modifying Treatment    No Bone Modifying Treatment
+ [97] Bone Modifying Treatment    No Bone Modifying Treatment
+ [99] No Bone Modifying Treatment No Bone Modifying Treatment
+[101] Bone Modifying Treatment    No Bone Modifying Treatment
+[103] Bone Modifying Treatment    No Bone Modifying Treatment
+[105] Bone Modifying Treatment    No Bone Modifying Treatment
+[107] No Bone Modifying Treatment No Bone Modifying Treatment
+[109] No Bone Modifying Treatment
+attr(,"label")
+[1] Bone Modifying Treatment
+Levels: No Bone Modifying Treatment Bone Modifying Treatment
+
+$`DTC Negative`
+ [1] No Bone Modifying Treatment No Bone Modifying Treatment
+ [3] No Bone Modifying Treatment No Bone Modifying Treatment
+ [5] Bone Modifying Treatment    No Bone Modifying Treatment
+ [7] No Bone Modifying Treatment No Bone Modifying Treatment
+ [9] No Bone Modifying Treatment No Bone Modifying Treatment
+[11] No Bone Modifying Treatment No Bone Modifying Treatment
+[13] No Bone Modifying Treatment No Bone Modifying Treatment
+[15] No Bone Modifying Treatment No Bone Modifying Treatment
+[17] No Bone Modifying Treatment No Bone Modifying Treatment
+[19] No Bone Modifying Treatment Bone Modifying Treatment   
+[21] No Bone Modifying Treatment No Bone Modifying Treatment
+[23] No Bone Modifying Treatment Bone Modifying Treatment   
+[25] No Bone Modifying Treatment Bone Modifying Treatment   
+[27] No Bone Modifying Treatment Bone Modifying Treatment   
+[29] Bone Modifying Treatment    No Bone Modifying Treatment
+[31] No Bone Modifying Treatment No Bone Modifying Treatment
+[33] No Bone Modifying Treatment No Bone Modifying Treatment
+[35] Bone Modifying Treatment    No Bone Modifying Treatment
+[37] No Bone Modifying Treatment Bone Modifying Treatment   
+[39] No Bone Modifying Treatment Bone Modifying Treatment   
+[41] Bone Modifying Treatment    No Bone Modifying Treatment
+[43] No Bone Modifying Treatment Bone Modifying Treatment   
+[45] Bone Modifying Treatment    Bone Modifying Treatment   
+[47] Bone Modifying Treatment    No Bone Modifying Treatment
+[49] No Bone Modifying Treatment Bone Modifying Treatment   
+[51] Bone Modifying Treatment    No Bone Modifying Treatment
+[53] No Bone Modifying Treatment Bone Modifying Treatment   
+[55] Bone Modifying Treatment    Bone Modifying Treatment   
+[57] Bone Modifying Treatment    Bone Modifying Treatment   
+[59] Bone Modifying Treatment    No Bone Modifying Treatment
+[61] Bone Modifying Treatment    No Bone Modifying Treatment
+[63] No Bone Modifying Treatment Bone Modifying Treatment   
+[65] No Bone Modifying Treatment Bone Modifying Treatment   
+[67] No Bone Modifying Treatment No Bone Modifying Treatment
+[69] No Bone Modifying Treatment No Bone Modifying Treatment
+Levels: No Bone Modifying Treatment Bone Modifying Treatment
+
+$`DTC Positive`
+ [1] No Bone Modifying Treatment No Bone Modifying Treatment
+ [3] No Bone Modifying Treatment No Bone Modifying Treatment
+ [5] No Bone Modifying Treatment No Bone Modifying Treatment
+ [7] No Bone Modifying Treatment Bone Modifying Treatment   
+ [9] Bone Modifying Treatment    No Bone Modifying Treatment
+[11] No Bone Modifying Treatment No Bone Modifying Treatment
+[13] No Bone Modifying Treatment No Bone Modifying Treatment
+[15] No Bone Modifying Treatment Bone Modifying Treatment   
+[17] No Bone Modifying Treatment Bone Modifying Treatment   
+[19] Bone Modifying Treatment    No Bone Modifying Treatment
+[21] Bone Modifying Treatment    Bone Modifying Treatment   
+[23] No Bone Modifying Treatment No Bone Modifying Treatment
+[25] Bone Modifying Treatment    Bone Modifying Treatment   
+[27] No Bone Modifying Treatment No Bone Modifying Treatment
+[29] No Bone Modifying Treatment No Bone Modifying Treatment
+[31] Bone Modifying Treatment    No Bone Modifying Treatment
+[33] Bone Modifying Treatment    No Bone Modifying Treatment
+[35] Bone Modifying Treatment    Bone Modifying Treatment   
+[37] No Bone Modifying Treatment Bone Modifying Treatment   
+[39] No Bone Modifying Treatment
+Levels: No Bone Modifying Treatment Bone Modifying Treatment
+
+$overall
+  [1] Node Positive Node Negative Node Positive Node Negative Node Negative
+  [6] Node Positive Node Positive Node Positive Node Negative Node Negative
+ [11] Node Negative Node Positive Node Negative Node Positive Node Negative
+ [16] Node Positive Node Negative Node Positive Node Negative Node Negative
+ [21] Node Negative Node Positive Node Positive Node Positive Node Positive
+ [26] Node Negative Node Negative Node Negative Node Positive Node Negative
+ [31] Node Negative Node Positive Node Negative Node Negative Node Positive
+ [36] Node Negative Node Negative Node Positive Node Negative Node Negative
+ [41] Node Positive Node Positive Node Negative Node Negative Node Negative
+ [46] Node Positive Node Positive Node Positive Node Positive Node Negative
+ [51] Node Positive Node Positive Node Negative Node Positive Node Negative
+ [56] Node Positive Node Negative Node Positive Node Positive Node Positive
+ [61] Node Negative Node Positive Node Negative Node Positive Node Positive
+ [66] Node Negative Node Positive Node Positive Node Positive Node Positive
+ [71] Node Negative Node Positive Node Positive Node Positive Node Positive
+ [76] Node Positive Node Positive Node Negative Node Negative Node Positive
+ [81] Node Negative Node Positive Node Positive Node Positive Node Negative
+ [86] Node Negative Node Positive Node Negative Node Positive Node Positive
+ [91] Node Positive Node Positive Node Positive Node Positive Node Positive
+ [96] Node Negative Node Positive Node Positive Node Positive Node Negative
+[101] Node Positive Node Positive Node Positive Node Negative Node Negative
+[106] Node Positive Node Positive Node Negative Node Negative
+Levels: Node Negative Node Positive
+
+$`DTC Negative`
+ [1] Node Positive Node Negative Node Positive Node Negative Node Positive
+ [6] Node Positive Node Negative Node Negative Node Positive Node Negative
+[11] Node Positive Node Negative Node Positive Node Negative Node Negative
+[16] Node Negative Node Positive Node Negative Node Negative Node Positive
+[21] Node Negative Node Positive Node Positive Node Negative Node Positive
+[26] Node Positive Node Positive Node Negative Node Positive Node Positive
+[31] Node Positive Node Positive Node Positive Node Negative Node Positive
+[36] Node Negative Node Negative Node Positive Node Positive Node Positive
+[41] Node Positive Node Negative Node Positive Node Positive Node Positive
+[46] Node Positive Node Positive Node Negative Node Positive Node Negative
+[51] Node Positive Node Negative Node Positive Node Positive Node Positive
+[56] Node Positive Node Positive Node Positive Node Positive Node Negative
+[61] Node Positive Node Positive Node Positive Node Positive Node Positive
+[66] Node Positive Node Negative Node Positive Node Positive Node Negative
+Levels: Node Negative Node Positive
+
+$`DTC Positive`
+ [1] Node Negative Node Positive Node Negative Node Positive Node Negative
+ [6] Node Negative Node Negative Node Positive Node Positive Node Positive
+[11] Node Positive Node Negative Node Negative Node Negative Node Positive
+[16] Node Negative Node Positive Node Negative Node Negative Node Negative
+[21] Node Negative Node Positive Node Negative Node Negative Node Positive
+[26] Node Negative Node Positive Node Positive Node Positive Node Positive
+[31] Node Positive Node Negative Node Negative Node Positive Node Positive
+[36] Node Negative Node Negative Node Negative Node Negative
+Levels: Node Negative Node Positive
+
+$overall
+  [1] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+  [4] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+  [7] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [10] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+ [13] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+ [16] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+ [19] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+ [22] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+ [25] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+ [28] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+ [31] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+ [34] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [37] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+ [40] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+ [43] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+ [46] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+ [49] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [52] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+ [55] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+ [58] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+ [61] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [64] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+ [67] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [70] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+ [73] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+ [76] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [79] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+ [82] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [85] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+ [88] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+ [91] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+ [94] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+ [97] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+[100] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+[103] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[106] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[109] No Axillary Dissection
+attr(,"label")
+[1] Axillary Dissection
+Levels: No Axillary Dissection Axillary Dissection
+
+$`DTC Negative`
+ [1] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+ [4] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+ [7] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+[10] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+[13] No Axillary Dissection No Axillary Dissection Axillary Dissection   
+[16] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+[19] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+[22] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+[25] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+[28] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[31] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+[34] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+[37] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+[40] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+[43] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+[46] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[49] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+[52] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+[55] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+[58] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+[61] Axillary Dissection    No Axillary Dissection Axillary Dissection   
+[64] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+[67] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+[70] No Axillary Dissection
+Levels: No Axillary Dissection Axillary Dissection
+
+$`DTC Positive`
+ [1] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+ [4] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+ [7] No Axillary Dissection Axillary Dissection    Axillary Dissection   
+[10] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+[13] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+[16] Axillary Dissection    Axillary Dissection    No Axillary Dissection
+[19] No Axillary Dissection Axillary Dissection    No Axillary Dissection
+[22] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[25] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[28] Axillary Dissection    Axillary Dissection    Axillary Dissection   
+[31] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[34] Axillary Dissection    No Axillary Dissection No Axillary Dissection
+[37] No Axillary Dissection No Axillary Dissection No Axillary Dissection
+Levels: No Axillary Dissection Axillary Dissection
+
+$overall
+  [1] Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy Mastectomy
+  [7] Lumpectomy Mastectomy Lumpectomy Lumpectomy Lumpectomy Lumpectomy
+ [13] Lumpectomy Lumpectomy Mastectomy Mastectomy Lumpectomy Lumpectomy
+ [19] Mastectomy Lumpectomy Lumpectomy Mastectomy Lumpectomy Mastectomy
+ [25] Mastectomy Lumpectomy Lumpectomy Mastectomy Lumpectomy Mastectomy
+ [31] Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy
+ [37] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy
+ [43] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
+ [49] Mastectomy Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy
+ [55] Lumpectomy Mastectomy Mastectomy Lumpectomy Lumpectomy Mastectomy
+ [61] Mastectomy Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy
+ [67] Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy
+ [73] Lumpectomy Mastectomy Mastectomy Mastectomy Lumpectomy Lumpectomy
+ [79] Lumpectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy
+ [85] Lumpectomy Lumpectomy Lumpectomy Lumpectomy Mastectomy Mastectomy
+ [91] Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy Mastectomy
+ [97] Mastectomy Lumpectomy Mastectomy Lumpectomy Mastectomy Mastectomy
+[103] Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy
+[109] Mastectomy
+attr(,"label")
+[1] Surgery Type
+Levels: Lumpectomy Mastectomy
+
+$`DTC Negative`
+ [1] Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy Lumpectomy
+ [7] Lumpectomy Lumpectomy Lumpectomy Lumpectomy Lumpectomy Mastectomy
+[13] Lumpectomy Lumpectomy Lumpectomy Mastectomy Lumpectomy Mastectomy
+[19] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
+[25] Mastectomy Lumpectomy Mastectomy Mastectomy Lumpectomy Lumpectomy
+[31] Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy Lumpectomy
+[37] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy
+[43] Lumpectomy Mastectomy Mastectomy Lumpectomy Lumpectomy Lumpectomy
+[49] Mastectomy Lumpectomy Lumpectomy Lumpectomy Mastectomy Mastectomy
+[55] Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy Mastectomy
+[61] Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy Mastectomy
+[67] Lumpectomy Mastectomy Mastectomy Mastectomy
+Levels: Lumpectomy Mastectomy
+
+$`DTC Positive`
+ [1] Mastectomy Mastectomy Lumpectomy Mastectomy Lumpectomy Mastectomy
+ [7] Lumpectomy Mastectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
+[13] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Mastectomy
+[19] Mastectomy Mastectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
+[25] Mastectomy Mastectomy Lumpectomy Mastectomy Mastectomy Mastectomy
+[31] Mastectomy Lumpectomy Lumpectomy Mastectomy Mastectomy Lumpectomy
+[37] Lumpectomy Mastectomy Mastectomy
+Levels: Lumpectomy Mastectomy
+
+$overall
+  [1] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+  [4] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
+  [7] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [10] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [13] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [16] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [19] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [22] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [25] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [28] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [31] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [34] Neoadjuvant Chemo    No Neoadjuvant Chemo Neoadjuvant Chemo   
+ [37] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [40] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [43] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [46] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [49] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [52] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [55] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [58] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [61] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [64] Neoadjuvant Chemo    Neoadjuvant Chemo    Neoadjuvant Chemo   
+ [67] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [70] Neoadjuvant Chemo    No Neoadjuvant Chemo Neoadjuvant Chemo   
+ [73] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+ [76] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [79] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [82] Neoadjuvant Chemo    Neoadjuvant Chemo    No Neoadjuvant Chemo
+ [85] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [88] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+ [91] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+ [94] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [97] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[100] Neoadjuvant Chemo    No Neoadjuvant Chemo Neoadjuvant Chemo   
+[103] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[106] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
+[109] No Neoadjuvant Chemo
+attr(,"label")
+[1] Neoadjuvant Chemo
+Levels: No Neoadjuvant Chemo Neoadjuvant Chemo
+
+$`DTC Negative`
+ [1] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [4] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [7] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[10] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[13] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[16] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+[19] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[22] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[25] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+[28] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[31] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[34] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[37] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+[40] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
+[43] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
+[46] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[49] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+[52] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+[55] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+[58] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[61] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[64] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
+[67] No Neoadjuvant Chemo No Neoadjuvant Chemo Neoadjuvant Chemo   
+[70] No Neoadjuvant Chemo
+Levels: No Neoadjuvant Chemo Neoadjuvant Chemo
+
+$`DTC Positive`
+ [1] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [4] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+ [7] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[10] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[13] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[16] Neoadjuvant Chemo    No Neoadjuvant Chemo Neoadjuvant Chemo   
+[19] No Neoadjuvant Chemo Neoadjuvant Chemo    No Neoadjuvant Chemo
+[22] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[25] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[28] Neoadjuvant Chemo    Neoadjuvant Chemo    Neoadjuvant Chemo   
+[31] No Neoadjuvant Chemo No Neoadjuvant Chemo No Neoadjuvant Chemo
+[34] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+[37] Neoadjuvant Chemo    No Neoadjuvant Chemo No Neoadjuvant Chemo
+Levels: No Neoadjuvant Chemo Neoadjuvant Chemo
+
+
table1_dtc #we have p-values!  
+
+
| | Total (N=109) | DTC Negative (N=70) | DTC Positive (N=39) | P-value |
|---|---|---|---|---|
| Age at Diagnosis (years) | | | | 0.722 |
| Mean (SD) | 49.7 (9.66) | 49.9 (9.74) | 49.2 (9.63) | |
| Median [Min, Max] | 49.3 [27.3, 68.9] | 51.6 [27.3, 68.9] | 48.8 [30.7, 67.7] | |
| Final Receptor Group | | | | 0.145 |
| TNBC | 45 (41.3%) | 25 (35.7%) | 20 (51.3%) | |
| HR+ HER2- | 52 (47.7%) | 37 (52.9%) | 15 (38.5%) | |
| HR+ HER2+ | 8 (7.3%) | 4 (5.7%) | 4 (10.3%) | |
| HR- HER2+ | 4 (3.7%) | 4 (5.7%) | 0 (0%) | |
| Race | | | | 0.683 |
| Mean (SD) | 4.65 (1.12) | 4.69 (1.06) | 4.59 (1.23) | |
| Median [Min, Max] | 5.00 [1.00, 5.00] | 5.00 [1.00, 5.00] | 5.00 [1.00, 5.00] | |
| Tumor Grade | | | | 0.11 |
| Grade 3 | 79 (72.5%) | 46 (65.7%) | 33 (84.6%) | |
| Grade 1 | 22 (20.2%) | 18 (25.7%) | 4 (10.3%) | |
| Grade 2 | 6 (5.5%) | 4 (5.7%) | 2 (5.1%) | |
| Missing | 2 (1.8%) | 2 (2.9%) | 0 (0%) | |
| Overall Stage | | | | 0.804 |
| Stage I | 35 (32.1%) | 22 (31.4%) | 13 (33.3%) | |
| Stage II | 47 (43.1%) | 29 (41.4%) | 18 (46.2%) | |
| Stage III | 26 (23.9%) | 18 (25.7%) | 8 (20.5%) | |
| Missing | 1 (0.9%) | 1 (1.4%) | 0 (0%) | |
| T Stage | | | | 0.629 |
| T1 | 51 (46.8%) | 34 (48.6%) | 17 (43.6%) | |
| T2 | 44 (40.4%) | 27 (38.6%) | 17 (43.6%) | |
| T3 | 12 (11.0%) | 8 (11.4%) | 4 (10.3%) | |
| T4 | 1 (0.9%) | 0 (0%) | 1 (2.6%) | |
| Missing | 1 (0.9%) | 1 (1.4%) | 0 (0%) | |
| N Stage | | | | 0.114 |
| N0 | 46 (42.2%) | 24 (34.3%) | 22 (56.4%) | |
| N1 | 43 (39.4%) | 32 (45.7%) | 11 (28.2%) | |
| N2 | 13 (11.9%) | 10 (14.3%) | 3 (7.7%) | |
| N3 | 7 (6.4%) | 4 (5.7%) | 3 (7.7%) | |
| Histology Category | | | | 0.0157 |
| Both Ductal and Lobular | 9 (8.3%) | 9 (12.9%) | 0 (0%) | |
| Ductal | 84 (77.1%) | 48 (68.6%) | 36 (92.3%) | |
| Lobular | 14 (12.8%) | 11 (15.7%) | 3 (7.7%) | |
| Other | 2 (1.8%) | 2 (2.9%) | 0 (0%) | |
| Radiation | | | | 0.774 |
| No Radiation | 34 (31.2%) | 23 (32.9%) | 11 (28.2%) | |
| Radiation | 75 (68.8%) | 47 (67.1%) | 28 (71.8%) | |
| Chemo | | | | 0.291 |
| No Chemo | 3 (2.8%) | 1 (1.4%) | 2 (5.1%) | |
| Chemo | 106 (97.2%) | 69 (98.6%) | 37 (94.9%) | |
| Endocrine Therapy | | | | 0.497 |
| No Endocrine Therapy | 47 (43.1%) | 28 (40.0%) | 19 (48.7%) | |
| Endocrine Therapy | 62 (56.9%) | 42 (60.0%) | 20 (51.3%) | |
| Bone Modifying Treatment | | | | 1 |
| No Bone Modifying Treatment | 70 (64.2%) | 45 (64.3%) | 25 (64.1%) | |
| Bone Modifying Treatment | 39 (35.8%) | 25 (35.7%) | 14 (35.9%) | |
| Node Status | | | | 0.0414 |
| Node Negative | 46 (42.2%) | 24 (34.3%) | 22 (56.4%) | |
| Node Positive | 63 (57.8%) | 46 (65.7%) | 17 (43.6%) | |
| Axillary Dissection | | | | 0.204 |
| No Axillary Dissection | 54 (49.5%) | 31 (44.3%) | 23 (59.0%) | |
| Axillary Dissection | 55 (50.5%) | 39 (55.7%) | 16 (41.0%) | |
| Surgery Type | | | | 0.516 |
| Lumpectomy | 45 (41.3%) | 31 (44.3%) | 14 (35.9%) | |
| Mastectomy | 64 (58.7%) | 39 (55.7%) | 25 (64.1%) | |
| Neoadjuvant Chemo | | | | 0.37 |
| No Neoadjuvant Chemo | 90 (82.6%) | 60 (85.7%) | 30 (76.9%) | |
| Neoadjuvant Chemo | 19 (17.4%) | 10 (14.3%) | 9 (23.1%) | |
+
+
+

This series of tests identifies a set of variables associated with DTC status that is similar, but not identical, to the set associated with ctDNA status. Histology category (p = 0.0157) and nodal status (p = 0.0414) were significant. Ductal histology was more common among DTC-positive patients (92.3% vs 68.6% of DTC-negative patients). Node-negative patients were more likely to be DTC positive (22/46, 47.8%) than node-positive patients (17/63, 27.0%). N stage (p = 0.114), tumor grade (p = 0.11), and receptor group (p = 0.145) showed non-significant trends. We have not included pathologic complete response (pCR) in this table or in further analyses. Only 19 patients received neoadjuvant therapy, so the sample is too small for meaningful tests of association, and pCR is missing for most of the overall cohort.
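The direction of the nodal association can be checked straight from the table. The sketch below re-enters the node-status and histology counts by hand; it does not use the study data set. For the 2x2 node-status table, R's default `chisq.test()` applies Yates' continuity correction and gives p ≈ 0.0414, matching the table. The histology table has several expected cell counts below 5, so Fisher's exact test is shown as a sensitivity check. The code behind the table's histology p-value (0.0157) is not shown in this section, so the Fisher result is a complementary check, not a reproduction of that value.

```r
# Counts re-entered by hand from the DTC table (columns: DTC Negative, DTC Positive)
node_tab <- matrix(c(24, 22,
                     46, 17),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(node_status = c("Node Negative", "Node Positive"),
                                   dtc = c("DTC Negative", "DTC Positive")))

# Proportion of patients who are DTC positive within each nodal group
dtc_pos_rate <- prop.table(node_tab, margin = 1)[, "DTC Positive"]
print(round(dtc_pos_rate, 3))

# Pearson chi-squared test; R applies Yates' continuity correction to 2x2 tables by default
node_test <- chisq.test(node_tab)
print(node_test$p.value)

histology_tab <- matrix(c( 9,  0,
                          48, 36,
                          11,  3,
                           2,  0),
                        nrow = 4, byrow = TRUE,
                        dimnames = list(histology = c("Both Ductal and Lobular", "Ductal",
                                                      "Lobular", "Other"),
                                        dtc = c("DTC Negative", "DTC Positive")))

# Expected counts under independence; values below 5 make the chi-squared approximation unreliable
hist_expected <- suppressWarnings(chisq.test(histology_tab))$expected
print(round(hist_expected, 2))

# Fisher's exact test as a sensitivity check for the sparse histology table
hist_fisher_p <- fisher.test(histology_tab)$p.value
print(hist_fisher_p)
```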

+
+
+

4.3 Multivariable Analysis

+

Variable Selection and Planning: I have chosen to perform a multivariable logistic regression to identify predictors of ctDNA (and DTC) positivity. We suspect these are biomarkers of relapse, and even in our data set ctDNA is strongly associated with relapse and worse overall survival. Identifying predictors of positivity would help us decide whom to screen for ctDNA, since this testing is expensive, takes time and resources, and may not benefit everyone. ctDNA positivity is a binary outcome, and the univariable analyses above already look for potentially significant relationships. Several types of variables are worth considering: demographic and clinical factors; disease factors, such as tumor aggressiveness (histology and grade), hormone receptor status, and stage at diagnosis; and treatment factors (surgery type, radiation or no radiation, neoadjuvant chemotherapy or not, and endocrine, i.e. anti-hormone, therapy). The only variable we have removed from the model is pathologic complete response (pCR), that is, whether a patient has no residual tumor at surgery after receiving neoadjuvant chemotherapy/immunotherapy. Relatively few patients received neoadjuvant therapy, so pCR is missing for most of the cohort. It would not make sense to impute it, because it is only defined for patients who received neoadjuvant therapy. All remaining variables are time-invariant and were present at study enrollment, before ctDNA testing.
We will use all of the variables assessed in our initial univariable tests of association, both those with significant associations and those without. We suspect that some of these variables are related to one another or collinear, so univariable tests alone cannot tell us which will be most predictive of positivity.

+

Several variables were significant in our univariable (chi-squared) analyses of ctDNA positivity: median age at diagnosis, longer time from diagnosis to enrollment, higher tumor grade, and higher initial stage at diagnosis. Variables that were not significant but could still be considered include histology, nodal positivity, higher T stage, and receptor status. Recurrence (both distant and local) and worse survival are significantly associated with ctDNA positivity, but these outcomes follow ctDNA positivity in time, so they should not be included as predictors in our model.

+

LASSO: LASSO (L1-penalized) logistic regression shrinks coefficients toward zero and sets some exactly to zero. It therefore selects variables automatically and yields a sparse model without relying on p-value cutoffs. The amount of shrinkage is controlled by a penalty parameter, lambda, which we choose by cross-validation. There is no single “right” method for variable selection, but purposeful selection generally begins with univariable analysis, which we have already performed. We considered stepwise model building based on p-values, but this approach has fallen out of favor because it relies on somewhat arbitrary p-value cutoffs and can discard relevant, important variables. We have already performed univariable chi-squared tests of association above, so we do not need to run univariable logistic regressions. We removed one variable that we assessed in the univariable analysis of ctDNA (pCR) because of its substantial missingness in the overall cohort and because it applies only to the small subset of patients who received neoadjuvant therapy. We will perform LASSO with the remaining variables to identify those that are most predictive.
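For reference, `glmnet(..., family = "binomial", alpha = 1)` minimizes the penalized negative log-likelihood below, following the notation of the glmnet documentation. Here $x_i$ is row $i$ of the design matrix (`X2` below), $y_i \in \{0, 1\}$ is ctDNA status, $N$ is the number of patients, and $p$ is the number of design-matrix columns. The intercept $\beta_0$ is not penalized. The $\ell_1$ penalty sets some coefficients exactly to zero, and larger values of $\lambda$ remove more variables. By default, glmnet standardizes the columns before applying the penalty and reports coefficients on the original scale.

$$
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,(\beta_0 + x_i^{\top}\beta) - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
\;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
$$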

+
+
library(glmnet)
+
+
Loading required package: Matrix
+
+
+

+Attaching package: 'Matrix'
+
+
+
The following objects are masked from 'package:tidyr':
+
+    expand, pack, unpack
+
+
+
Loaded glmnet 4.1-8
+
+
set.seed(123) 
+
+# Prepare the response variable
+y <- unique_subset_data$ctDNA_ever
+
+
+# model.matrix() drops rows with missing values, so X2 ended up with 4 fewer rows than y, which caused an error when fitting. Only 4 values are missing (missingness <10%), so impute them instead.
+library(mice)
+
+

+Attaching package: 'mice'
+
+
+
The following object is masked from 'package:stats':
+
+    filter
+
+
+
The following objects are masked from 'package:base':
+
+    cbind, rbind
+
+
# Single imputation (m = 1) by predictive mean matching (pmm); overall missingness is low (see above)
+imputed_data <- mice(unique_subset_data, m = 1, method = "pmm", maxit = 5)
+
+

+ iter imp variable
+  1   1  final_tumor_grade  final_overall_stage  final_t_stage
+  2   1  final_tumor_grade  final_overall_stage  final_t_stage
+  3   1  final_tumor_grade  final_overall_stage  final_t_stage
+  4   1  final_tumor_grade  final_overall_stage  final_t_stage
+  5   1  final_tumor_grade  final_overall_stage  final_t_stage
+
+
+
Warning: Number of logged events: 4
+
+
unique_subset_data <- complete(imputed_data)
+
+# -1 drops the intercept column from the design matrix (glmnet fits its own intercept)
+X2 <- model.matrix(ctDNA_ever ~ -1 + age_at_diag + final_receptor_group + demo_race_final +
+                     final_tumor_grade + final_overall_stage +
+                     final_t_stage + final_n_stage +
+                     histology_category + prtx_radiation +
+                     prtx_chemo + prtx_endo + prtx_bonemod +
+                     node_status + axillary_dissection +
+                     diag_surgery_type_1 + diag_neoadj_chemo_1, data = unique_subset_data)
+
+
+
+# Fit lasso model
+lasso_model <- glmnet(X2, y, family = "binomial", alpha = 1)  # alpha = 1 for lasso
+
+# Cross-validation: cv.glmnet() runs k-fold cross-validation (10 folds by default) to choose lambda. A smaller lambda means less shrinkage, so more variables are retained.
+cv_lasso_model <- cv.glmnet(X2, y, family = "binomial", alpha = 1)
+
+
Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
+multinomial or binomial class has fewer than 8 observations; dangerous ground
+
+
+
Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
+multinomial or binomial class has fewer than 8 observations; dangerous ground
+
+
# Plot the cross-validated binomial deviance against log(lambda)
+plot(cv_lasso_model)
+
+
+
+

+
+
+
+
# lambda.min: the lambda with the lowest cross-validated deviance (about 0.048 here)
+best_lambda <- cv_lasso_model$lambda.min
+print(paste("Best lambda:", best_lambda))
+
+
[1] "Best lambda: 0.048114238291791"
+
+
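# Refit the lasso at the cross-validated lambda (lambda.min)
+# Note: the glmnet documentation advises against fitting with a single lambda value;
+# coef(cv_lasso_model, s = "lambda.min") extracts the coefficients at lambda.min from the path fitted by cv.glmnet() instead.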
+final_lasso_model <- glmnet(X2, y, family = "binomial", alpha = 1, lambda = best_lambda)
+
+# Coefficients at lambda.min; "." marks coefficients shrunk exactly to zero (not selected)
+coef(final_lasso_model) 
+
+
29 x 1 sparse Matrix of class "dgCMatrix"
+                                              s0
+(Intercept)                            -2.762725
+age_at_diag                             .       
+final_receptor_groupTNBC                .       
+final_receptor_groupHR+ HER2-           .       
+final_receptor_groupHR+ HER2+           .       
+final_receptor_groupHR- HER2+           .       
+demo_race_finalAsian                    .       
+demo_race_finalWhite                    .       
+final_tumor_gradeGrade 1                .       
+final_tumor_gradeGrade 2                .       
+final_overall_stageStage II             .       
+final_overall_stageStage III            .       
+final_t_stageT2                         .       
+final_t_stageT3                         .       
+final_t_stageT4                         1.189380
+final_n_stageN1                         .       
+final_n_stageN2                         1.573283
+final_n_stageN3                         .       
+histology_categoryDuctal                .       
+histology_categoryLobular               .       
+histology_categoryOther                 .       
+prtx_radiationRadiation                 .       
+prtx_chemoChemo                         .       
+prtx_endoEndocrine Therapy              .       
+prtx_bonemodBone Modifying Treatment    .       
+node_statusNode Positive                .       
+axillary_dissectionAxillary Dissection  .       
+diag_surgery_type_1Mastectomy           .       
+diag_neoadj_chemo_1Neoadjuvant Chemo    .       
+
+
+

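At lambda.min (about 0.048), the only predictors of ctDNA positivity with nonzero coefficients are T4 (1.19) and N2 (1.57); all other coefficients were shrunk to zero. LASSO does not produce p-values, so these terms are "selected" rather than statistically significant, and their coefficients are shrunk toward zero. LASSO also selects individual dummy levels rather than whole variables, which makes multi-level factors such as T stage and N stage harder to interpret. Here only the higher categories (T4, N2) are associated with positivity, and N3 was not selected. T4 was recorded for only a single patient in the 109-patient DTC table above. If the ctDNA analysis uses the same cohort, that coefficient should be interpreted with great caution. The cv.glmnet() warning that one class has fewer than 8 observations likewise indicates that, at least within some cross-validation folds, one outcome class (presumably ctDNA positive) is very small. Several of these variables are related to one another: T stage and N stage determine final overall stage, and node status is derived from N stage. I'll try a few other LASSOs to see whether eliminating one variable from each collinear set changes the results.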
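Exponentiating the selected coefficients gives odds ratios, and converting them to fitted probabilities puts them on a more familiar scale. The sketch below simply copies the three nonzero values printed above. The penalty shrinks these coefficients toward zero, so they are not the same as unpenalized maximum-likelihood estimates and are best read descriptively. Every other coefficient is zero, so the fitted probability depends only on whether a patient is T4 and/or N2.

```r
# Nonzero coefficients copied from coef(final_lasso_model) above (lambda.min)
intercept <- -2.762725
b_t4 <- 1.189380   # final_t_stageT4
b_n2 <- 1.573283   # final_n_stageN2

# Odds ratios implied by the (shrunken) LASSO coefficients
or_t4 <- exp(b_t4)
or_n2 <- exp(b_n2)
print(round(c(T4 = or_t4, N2 = or_n2), 2))

# Fitted probabilities of ctDNA positivity; plogis() is the inverse logit
p_ref  <- plogis(intercept)                 # neither T4 nor N2
p_t4   <- plogis(intercept + b_t4)          # T4 only
p_n2   <- plogis(intercept + b_n2)          # N2 only
p_both <- plogis(intercept + b_t4 + b_n2)   # both T4 and N2
print(round(c(reference = p_ref, T4 = p_t4, N2 = p_n2, both = p_both), 3))
```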

+
+
library(glmnet)
+
+set.seed(123) #to ensure consistency of results 
+
+# Prepare the response variable
+y <- unique_subset_data$ctDNA_ever
+
+# The same 4 missing values would affect X3, but they were already imputed above, so unique_subset_data is complete and needs no further imputation.
+
+
+### Removed node status (derived from N stage) as a variable
+X3 <- model.matrix(ctDNA_ever ~ -1 + age_at_diag + final_receptor_group + demo_race_final +
+                     final_tumor_grade + final_overall_stage +
+                     final_t_stage + final_n_stage +
+                     histology_category + prtx_radiation +
+                     prtx_chemo + prtx_endo + prtx_bonemod +
+                     axillary_dissection +
+                     diag_surgery_type_1 + diag_neoadj_chemo_1, data = unique_subset_data)
+
+# Fit lasso model
+lasso_model <- glmnet(X3, y, family = "binomial", alpha = 1)  # alpha = 1 for lasso
+
+# Cross-validation: cv.glmnet() runs k-fold cross-validation (10 folds by default) to choose lambda. A smaller lambda means less shrinkage, so more variables are retained.
+cv_lasso_model <- cv.glmnet(X3, y, family = "binomial", alpha = 1)
+
+
Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
+multinomial or binomial class has fewer than 8 observations; dangerous ground
+Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
+multinomial or binomial class has fewer than 8 observations; dangerous ground
+Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
+multinomial or binomial class has fewer than 8 observations; dangerous ground
+Warning in lognet(xd, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : one
+multinomial or binomial class has fewer than 8 observations; dangerous ground
+
+
# Plot the cross-validated binomial deviance against log(lambda)
+plot(cv_lasso_model)
+
+
+
+

+
+
+
+
# lambda.min: about 0.048, identical to the first model
+best_lambda <- cv_lasso_model$lambda.min
+print(paste("Best lambda:", best_lambda))
+
+
[1] "Best lambda: 0.048114238291791"
+
+
# Refit the lasso on all the data at the selected lambda
+paired_down_lasso <- glmnet(X3, y, family = "binomial", alpha = 1, lambda = best_lambda)
+
+# Coefficients retained in the model; a "." means the lasso shrank that coefficient to exactly zero.
+coef(paired_down_lasso) 
+
+
28 x 1 sparse Matrix of class "dgCMatrix"
+                                              s0
+(Intercept)                            -2.762725
+age_at_diag                             .       
+final_receptor_groupTNBC                .       
+final_receptor_groupHR+ HER2-           .       
+final_receptor_groupHR+ HER2+           .       
+final_receptor_groupHR- HER2+           .       
+demo_race_finalAsian                    .       
+demo_race_finalWhite                    .       
+final_tumor_gradeGrade 1                .       
+final_tumor_gradeGrade 2                .       
+final_overall_stageStage II             .       
+final_overall_stageStage III            .       
+final_t_stageT2                         .       
+final_t_stageT3                         .       
+final_t_stageT4                         1.189380
+final_n_stageN1                         .       
+final_n_stageN2                         1.573283
+final_n_stageN3                         .       
+histology_categoryDuctal                .       
+histology_categoryLobular               .       
+histology_categoryOther                 .       
+prtx_radiationRadiation                 .       
+prtx_chemoChemo                         .       
+prtx_endoEndocrine Therapy              .       
+prtx_bonemodBone Modifying Treatment    .       
+axillary_dissectionAxillary Dissection  .       
+diag_surgery_type_1Mastectomy           .       
+diag_neoadj_chemo_1Neoadjuvant Chemo    .       
+
+
+

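In the pared-down lasso model for ctDNA positivity (nodal positivity removed), T stage and N stage remain the only predictors the lasso retains with nonzero coefficients: T4 (1.19) and N2 (1.57). These are variable selections, not tests of statistical significance. N2 has the largest coefficient, so it is the strongest predictor of ctDNA positivity in this model (the N3 coefficient is zero). The selected lambda is 0.048, lower than in the prior model.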

+

Modeling ctDNA positivity is nonetheless difficult with any of these approaches, because only 9 of the 109 individuals in this cohort had a positive result. With so few events, the selected predictors and their coefficients are unstable. glmnet also warns during cross-validation that one class has fewer than 8 observations.
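
The glmnet warnings printed after `cv.glmnet()` are a direct symptom of the small number of positives. Each cross-validation fit uses all folds except one, so its training set keeps 9 positives minus however many the held-out fold contains. Any fold that holds out two or more positives leaves fewer than 8 in training. The four warnings are plausibly four such folds.

The sketch below makes this concrete under stated assumptions. The outcome vector `y_demo` is hypothetical: only the 9/109 counts come from the text, and the patient ordering is made up. Random fold labels stand in for cv.glmnet's default shuffling. The helpers `stratified_folds()` and `train_positives()` are illustrative names, not glmnet functions. A stratified fold vector like this one could be passed to `cv.glmnet()` through its `foldid` argument.

```r
# Hypothetical outcome: 9 positives among 109 patients (ordering assumed).
y_demo <- c(rep(1L, 9), rep(0L, 100))
nfolds <- 10

# Random assignment: a shuffled vector of fold labels 1..10.
set.seed(123)
foldid_random <- sample(rep(seq_len(nfolds), length.out = length(y_demo)))

# Stratified assignment: shuffle fold labels separately within each class,
# so the 9 positives land in 9 different folds.
stratified_folds <- function(y, nfolds) {
  foldid <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    foldid[idx] <- sample(rep(seq_len(nfolds), length.out = length(idx)))
  }
  foldid
}
foldid_strat <- stratified_folds(y_demo, nfolds)

# Positives remaining in each training set (all data except fold k).
train_positives <- function(y, foldid) {
  sapply(seq_len(max(foldid)), function(k) sum(y[foldid != k]))
}

train_positives(y_demo, foldid_random)  # folds holding out 2+ positives drop below 8
train_positives(y_demo, foldid_strat)   # every training set keeps 8 or 9 positives
```

Stratifying removes the warning but does not add information: with 9 events, any model of ctDNA positivity remains fragile.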

+

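The intercept (-2.76) is the log-odds of ctDNA positivity when all predictor variables are zero. Every coefficient except T4 and N2 is exactly zero here, so the intercept is the predicted log-odds for any patient who is neither T4 nor N2. Each coefficient is the change in log-odds for patients in that category, holding the other variables fixed, and exponentiating a coefficient gives an odds ratio. Because lasso coefficients are shrunk toward zero, these effects are attenuated, and glmnet provides no standard errors for them.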
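
As a worked example of this interpretation, the sketch below converts the three nonzero coefficients from the `coef(paired_down_lasso)` output into odds ratios and predicted probabilities using base R's `exp()` and `plogis()` (the inverse logit). The grid of T4/N2 combinations is illustrative; it does not claim that each combination occurs in the cohort.

```r
# Nonzero coefficients copied from the coef(paired_down_lasso) output above.
b0   <- -2.762725  # (Intercept)
b_T4 <-  1.189380  # final_t_stageT4
b_N2 <-  1.573283  # final_n_stageN2

# Odds ratios: multiplicative change in the odds of ctDNA positivity.
exp(c(T4 = b_T4, N2 = b_N2))

# Predicted probability for each T4/N2 combination. All other coefficients are
# exactly zero, so no other predictor changes these predictions.
profiles <- expand.grid(T4 = 0:1, N2 = 0:1)
profiles$prob <- plogis(b0 + b_T4 * profiles$T4 + b_N2 * profiles$N2)
profiles
```

For a patient who is neither T4 nor N2, `plogis(b0)` is roughly 0.06. The odds ratios are roughly 3.3 for T4 and 4.8 for N2, subject to the shrinkage caveat above.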

+

As a proof of principle that the lasso can be applied to this dataset, and may give more robust results when events are less rare, we also model predictors of DTC positivity. DTC positivity was more frequent in this cohort (39 of 109 participants), so we expect the modeling approach to work better.

+
+
set.seed(123) 
+
+#### DTC predictions. 
+
+subset_data <- subset_data[!duplicated(subset_data$participant_id), ]
+dtc_unique_subset_data <- merge(unique_subset_data, subset_data[, c("participant_id", "dtc_ever")], by = "participant_id", all.x = TRUE)
+
+nrow(dtc_unique_subset_data)  # Should still be 109
+
+
[1] 109
+
+
table(dtc_unique_subset_data$dtc_ever, useNA = "ifany")  # Check for NA values
+
+

+ 0  1 
+70 39 
+
+
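
The DTC outcome `y1` is paired with the design matrix `X2` by row position. Base R's `merge()` sorts its result on the `by` column by default (`sort = TRUE`). Rows of `dtc_unique_subset_data` may therefore come out in a different order from the data used to build `X2`, unless that data was already sorted by `participant_id`.

The toy sketch below, with hypothetical participant IDs, shows the reordering. It then shows one way to look up the outcome in the design matrix's own row order using `match()`.

```r
# Hypothetical toy data: participant IDs deliberately not in sorted order.
base_df <- data.frame(participant_id = c("P03", "P01", "P02"),
                      age = c(55, 49, 52))
dtc_df  <- data.frame(participant_id = c("P01", "P02", "P03"),
                      dtc_ever = c(0, 1, 1))

# A design matrix built from base_df keeps base_df's row order.
X_demo <- model.matrix(~ age, data = base_df)

# merge() sorts on the `by` column by default, so its rows no longer
# line up with X_demo.
merged <- merge(base_df, dtc_df, by = "participant_id", all.x = TRUE)
merged$participant_id

# Look up the outcome in base_df's order instead.
y_aligned <- dtc_df$dtc_ever[match(base_df$participant_id, dtc_df$participant_id)]
y_aligned
```

A quick guard in the real analysis is to check that the ID order of the data used for `X2` matches `dtc_unique_subset_data$participant_id` before fitting.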
# Run the lasso for DTC status; this may work better because there are more DTC-positive results
+y1 <- dtc_unique_subset_data$dtc_ever
+X2  # reuse X2, which contains the same predictors of interest
+
+
    age_at_diag final_receptor_groupTNBC final_receptor_groupHR+ HER2-
+1      55.89870                        0                             1
+2      49.25667                        0                             1
+3      52.87611                        0                             1
+4      29.93840                        1                             0
+5      37.00753                        1                             0
+6      48.98563                        0                             1
+7      63.80835                        0                             0
+8      40.89802                        0                             0
+9      43.59754                        1                             0
+10     38.57632                        1                             0
+11     41.77687                        1                             0
+12     45.68925                        0                             1
+13     59.94524                        0                             1
+14     59.43600                        0                             0
+15     52.14511                        1                             0
+16     42.93771                        1                             0
+17     64.69541                        0                             1
+18     55.14031                        1                             0
+19     41.26762                        1                             0
+20     39.52361                        1                             0
+21     57.76044                        1                             0
+22     44.42984                        0                             1
+23     51.34565                        0                             1
+24     42.27789                        1                             0
+25     57.05133                        1                             0
+26     57.62628                        1                             0
+27     54.86927                        1                             0
+28     44.18891                        1                             0
+29     63.62491                        0                             1
+30     36.00548                        1                             0
+31     55.57837                        1                             0
+32     30.71595                        0                             0
+33     41.28953                        1                             0
+34     59.38946                        0                             1
+35     48.79945                        0                             1
+36     59.15400                        0                             0
+37     48.97194                        1                             0
+38     59.39767                        0                             1
+39     39.67967                        1                             0
+40     67.68515                        0                             1
+41     41.84531                        1                             0
+42     48.16975                        1                             0
+43     58.07529                        0                             1
+44     62.49966                        0                             1
+45     46.64476                        0                             0
+46     47.34565                        0                             1
+47     52.09856                        0                             1
+48     36.58042                        0                             0
+49     58.26146                        0                             1
+50     61.76318                        0                             1
+51     61.73580                        1                             0
+52     39.40862                        1                             0
+53     55.30459                        1                             0
+54     53.10335                        0                             1
+55     43.30459                        1                             0
+56     48.46270                        0                             1
+57     44.07666                        1                             0
+58     52.55305                        1                             0
+59     56.45996                        1                             0
+60     67.72621                        1                             0
+61     39.59206                        1                             0
+62     51.82752                        0                             1
+63     58.28611                        0                             1
+64     46.93498                        1                             0
+65     31.17591                        1                             0
+66     55.96441                        1                             0
+67     46.38741                        0                             1
+68     46.33812                        0                             1
+69     40.62971                        0                             1
+70     37.67556                        0                             1
+71     32.35318                        1                             0
+72     48.75291                        0                             1
+73     56.22177                        0                             1
+74     39.41136                        0                             1
+75     49.76591                        1                             0
+76     43.22245                        0                             0
+77     36.01095                        0                             1
+78     41.30322                        1                             0
+79     59.57016                        0                             1
+80     39.65503                        0                             1
+81     54.94593                        1                             0
+82     43.50992                        0                             0
+83     48.80767                        1                             0
+84     62.10541                        0                             1
+85     63.35934                        0                             0
+86     57.31417                        0                             1
+87     59.74264                        0                             1
+88     66.92676                        1                             0
+89     36.30938                        1                             0
+90     34.83641                        0                             1
+91     55.12115                        0                             1
+92     52.07118                        0                             1
+93     27.33744                        0                             1
+94     64.41342                        0                             1
+95     56.09035                        0                             1
+96     47.90691                        0                             1
+97     51.38125                        0                             1
+98     41.71663                        1                             0
+99     48.47639                        0                             1
+100    40.52567                        1                             0
+101    60.39151                        0                             1
+102    52.51198                        0                             0
+103    60.87064                        0                             1
+104    58.61465                        0                             1
+105    38.60370                        0                             1
+106    68.93634                        0                             1
+107    37.84531                        0                             0
+108    51.43874                        1                             0
+109    52.68720                        0                             1
+    final_receptor_groupHR+ HER2+ final_receptor_groupHR- HER2+
+1                               0                             0
+2                               0                             0
+3                               0                             0
+4                               0                             0
+5                               0                             0
+6                               0                             0
+7                               0                             1
+8                               1                             0
+9                               0                             0
+10                              0                             0
+11                              0                             0
+12                              0                             0
+13                              0                             0
+14                              1                             0
+15                              0                             0
+16                              0                             0
+17                              0                             0
+18                              0                             0
+19                              0                             0
+20                              0                             0
+21                              0                             0
+22                              0                             0
+23                              0                             0
+24                              0                             0
+25                              0                             0
+26                              0                             0
+27                              0                             0
+28                              0                             0
+29                              0                             0
+30                              0                             0
+31                              0                             0
+32                              1                             0
+33                              0                             0
+34                              0                             0
+35                              0                             0
+36                              1                             0
+37                              0                             0
+38                              0                             0
+39                              0                             0
+40                              0                             0
+41                              0                             0
+42                              0                             0
+43                              0                             0
+44                              0                             0
+45                              1                             0
+46                              0                             0
+47                              0                             0
+48                              1                             0
+49                              0                             0
+50                              0                             0
+51                              0                             0
+52                              0                             0
+53                              0                             0
+54                              0                             0
+55                              0                             0
+56                              0                             0
+57                              0                             0
+58                              0                             0
+59                              0                             0
+60                              0                             0
+61                              0                             0
+62                              0                             0
+63                              0                             0
+64                              0                             0
+65                              0                             0
+66                              0                             0
+67                              0                             0
+68                              0                             0
+69                              0                             0
+70                              0                             0
+71                              0                             0
+72                              0                             0
+73                              0                             0
+74                              0                             0
+75                              0                             0
+76                              1                             0
+77                              0                             0
+78                              0                             0
+79                              0                             0
+80                              0                             0
+81                              0                             0
+82                              0                             1
+83                              0                             0
+84                              0                             0
+85                              1                             0
+86                              0                             0
+87                              0                             0
+88                              0                             0
+89                              0                             0
+90                              0                             0
+91                              0                             0
+92                              0                             0
+93                              0                             0
+94                              0                             0
+95                              0                             0
+96                              0                             0
+97                              0                             0
+98                              0                             0
+99                              0                             0
+100                             0                             0
+101                             0                             0
+102                             0                             1
+103                             0                             0
+104                             0                             0
+105                             0                             0
+106                             0                             0
+107                             0                             1
+108                             0                             0
+109                             0                             0
+    demo_race_finalAsian demo_race_finalWhite final_tumor_gradeGrade 1
+1                      0                    1                        0
+2                      0                    1                        0
+3                      0                    1                        0
+4                      0                    1                        0
+5                      0                    1                        0
+6                      0                    1                        0
+7                      0                    0                        0
+8                      0                    1                        0
+9                      0                    1                        0
+10                     0                    1                        0
+11                     0                    1                        0
+12                     0                    1                        0
+13                     0                    1                        0
+14                     0                    1                        0
+15                     0                    1                        0
+16                     0                    1                        0
+17                     0                    1                        0
+18                     0                    1                        0
+19                     0                    0                        0
+20                     0                    1                        0
+21                     0                    1                        0
+22                     0                    1                        0
+23                     0                    1                        0
+24                     0                    1                        0
+25                     0                    1                        0
+26                     0                    0                        0
+27                     0                    1                        0
+28                     0                    1                        0
+29                     0                    1                        1
+30                     0                    1                        0
+31                     0                    1                        0
+32                     0                    1                        0
+33                     0                    0                        0
+34                     0                    1                        1
+35                     0                    1                        1
+36                     0                    1                        0
+37                     0                    1                        0
+38                     0                    1                        1
+39                     0                    1                        0
+40                     0                    1                        0
+41                     0                    1                        0
+42                     0                    1                        0
+43                     0                    1                        0
+44                     0                    1                        0
+45                     0                    1                        0
+46                     0                    1                        1
+47                     0                    1                        0
+48                     0                    0                        1
+49                     0                    1                        1
+50                     0                    1                        1
+51                     0                    1                        0
+52                     0                    1                        0
+53                     0                    1                        0
+54                     0                    1                        1
+55                     0                    1                        0
+56                     0                    1                        1
+57                     0                    0                        0
+58                     0                    1                        0
+59                     0                    1                        0
+60                     0                    1                        0
+61                     0                    1                        0
+62                     0                    1                        1
+63                     0                    1                        0
+64                     0                    1                        0
+65                     0                    0                        0
+66                     0                    1                        0
+67                     0                    1                        1
+68                     0                    1                        1
+69                     0                    1                        0
+70                     0                    1                        0
+71                     0                    1                        0
+72                     0                    1                        0
+73                     0                    1                        1
+74                     0                    1                        0
+75                     0                    1                        0
+76                     0                    1                        0
+77                     0                    1                        0
+78                     0                    1                        0
+79                     0                    1                        0
+80                     0                    1                        0
+81                     0                    0                        0
+82                     0                    1                        0
+83                     0                    1                        0
+84                     0                    1                        0
+85                     0                    1                        0
+86                     0                    1                        0
+87                     0                    1                        1
+88                     0                    1                        0
+89                     0                    1                        0
+90                     0                    1                        1
+91                     0                    1                        0
+92                     0                    1                        1
+93                     1                    0                        1
+94                     0                    1                        1
+95                     0                    1                        0
+96                     0                    1                        1
+97                     0                    1                        0
+98                     0                    0                        0
+99                     0                    1                        1
+100                    0                    1                        0
+101                    0                    1                        1
+102                    0                    1                        0
+103                    0                    1                        1
+104                    0                    1                        0
+105                    0                    1                        0
+106                    0                    1                        0
+107                    0                    1                        0
+108                    0                    1                        0
+109                    0                    1                        0
+    final_tumor_gradeGrade 2 final_overall_stageStage II
+1                          1                           0
+2                          1                           0
+3                          0                           0
+4                          0                           1
+5                          0                           1
+6                          0                           0
+7                          0                           0
+8                          0                           0
+9                          0                           1
+10                         0                           0
+11                         0                           1
+12                         0                           1
+13                         0                           1
+14                         1                           1
+15                         0                           0
+16                         0                           1
+17                         0                           0
+18                         0                           1
+19                         0                           1
+20                         0                           0
+21                         0                           1
+22                         0                           0
+23                         0                           0
+24                         0                           1
+25                         0                           1
+26                         0                           1
+27                         0                           0
+28                         0                           1
+29                         0                           0
+30                         0                           1
+31                         0                           0
+32                         0                           1
+33                         0                           0
+34                         0                           1
+35                         0                           0
+36                         0                           1
+37                         0                           1
+38                         0                           0
+39                         0                           0
+40                         0                           0
+41                         0                           1
+42                         0                           0
+43                         0                           1
+44                         1                           0
+45                         1                           0
+46                         0                           0
+47                         0                           1
+48                         0                           0
+49                         0                           1
+50                         0                           0
+51                         0                           1
+52                         0                           1
+53                         0                           0
+54                         0                           1
+55                         0                           0
+56                         0                           0
+57                         0                           0
+58                         0                           1
+59                         0                           0
+60                         0                           1
+61                         0                           1
+62                         0                           0
+63                         0                           0
+64                         0                           1
+65                         0                           1
+66                         0                           1
+67                         0                           0
+68                         0                           1
+69                         0                           0
+70                         0                           1
+71                         0                           0
+72                         0                           1
+73                         0                           0
+74                         0                           0
+75                         0                           0
+76                         0                           0
+77                         0                           0
+78                         0                           0
+79                         0                           0
+80                         0                           0
+81                         0                           0
+82                         0                           1
+83                         0                           0
+84                         1                           0
+85                         0                           1
+86                         0                           0
+87                         0                           0
+88                         0                           0
+89                         0                           1
+90                         0                           1
+91                         0                           1
+92                         0                           0
+93                         0                           1
+94                         0                           0
+95                         0                           0
+96                         0                           0
+97                         0                           1
+98                         0                           1
+99                         0                           1
+100                        0                           0
+101                        0                           0
+102                        0                           0
+103                        0                           0
+104                        0                           0
+105                        0                           0
+106                        0                           1
+107                        0                           1
+108                        0                           0
+109                        0                           1
+    final_overall_stageStage III final_t_stageT2 final_t_stageT3
+1                              1               1               0
+2                              0               0               0
+3                              1               1               0
+4                              0               1               0
+5                              0               1               0
+6                              1               0               1
+7                              1               1               0
+8                              1               0               0
+9                              0               1               0
+10                             0               0               0
+11                             0               1               0
+12                             0               1               0
+13                             0               0               1
+14                             0               0               0
+15                             0               0               0
+16                             0               1               0
+17                             0               0               0
+18                             0               0               0
+19                             0               1               0
+20                             0               0               0
+21                             0               1               0
+22                             1               1               0
+23                             1               1               0
+24                             0               1               0
+25                             0               0               0
+26                             0               1               0
+27                             0               0               0
+28                             0               0               0
+29                             1               1               0
+30                             0               1               0
+31                             0               0               0
+32                             0               0               0
+33                             0               0               0
+34                             0               0               1
+35                             1               0               0
+36                             0               1               0
+37                             0               1               0
+38                             1               0               0
+39                             0               0               0
+40                             0               0               0
+41                             0               1               0
+42                             1               0               1
+43                             0               1               0
+44                             0               0               0
+45                             0               0               0
+46                             1               0               1
+47                             0               0               0
+48                             1               0               0
+49                             0               1               0
+50                             0               0               0
+51                             0               1               0
+52                             0               1               0
+53                             0               0               0
+54                             0               0               0
+55                             0               0               0
+56                             1               0               1
+57                             0               0               0
+58                             0               0               0
+59                             0               0               0
+60                             0               1               0
+61                             0               1               0
+62                             1               0               1
+63                             0               0               0
+64                             0               1               0
+65                             0               1               0
+66                             0               1               0
+67                             1               0               0
+68                             0               0               0
+69                             0               0               0
+70                             0               1               0
+71                             0               0               0
+72                             0               1               0
+73                             0               0               0
+74                             1               0               1
+75                             1               0               1
+76                             0               0               0
+77                             1               1               0
+78                             0               0               0
+79                             0               0               0
+80                             0               0               0
+81                             0               0               0
+82                             0               1               0
+83                             1               1               0
+84                             0               0               0
+85                             0               1               0
+86                             0               0               0
+87                             1               0               0
+88                             0               0               0
+89                             0               1               0
+90                             0               1               0
+91                             0               1               0
+92                             1               0               1
+93                             0               1               0
+94                             1               1               0
+95                             1               0               0
+96                             0               0               0
+97                             0               1               0
+98                             0               0               0
+99                             0               0               0
+100                            0               0               0
+101                            0               0               0
+102                            1               0               1
+103                            1               0               1
+104                            0               0               0
+105                            0               0               0
+106                            0               1               0
+107                            0               1               0
+108                            0               0               0
+109                            0               1               0
+    final_t_stageT4 final_n_stageN1 final_n_stageN2 final_n_stageN3
+1                 0               0               0               1
+2                 0               0               0               0
+3                 0               0               0               1
+4                 0               0               0               0
+5                 0               0               0               0
+6                 0               0               1               0
+7                 0               0               1               0
+8                 0               0               1               0
+9                 0               0               0               0
+10                0               0               0               0
+11                0               0               0               0
+12                0               1               0               0
+13                0               0               0               0
+14                0               1               0               0
+15                0               0               0               0
+16                0               1               0               0
+17                0               0               0               0
+18                0               1               0               0
+19                0               0               0               0
+20                0               0               0               0
+21                0               0               0               0
+22                0               0               0               1
+23                0               0               0               1
+24                0               1               0               0
+25                0               1               0               0
+26                0               0               0               0
+27                0               0               0               0
+28                0               0               0               0
+29                0               0               1               0
+30                0               0               0               0
+31                0               0               0               0
+32                0               1               0               0
+33                0               0               0               0
+34                0               0               0               0
+35                1               0               1               0
+36                0               0               0               0
+37                0               0               0               0
+38                0               0               1               0
+39                0               0               0               0
+40                0               0               0               0
+41                0               1               0               0
+42                0               0               1               0
+43                0               0               0               0
+44                0               0               0               0
+45                0               0               0               0
+46                0               0               1               0
+47                0               1               0               0
+48                0               0               1               0
+49                0               1               0               0
+50                0               0               0               0
+51                0               1               0               0
+52                0               1               0               0
+53                0               0               0               0
+54                0               1               0               0
+55                0               0               0               0
+56                0               1               0               0
+57                0               0               0               0
+58                0               1               0               0
+59                0               1               0               0
+60                0               1               0               0
+61                0               0               0               0
+62                0               1               0               0
+63                0               0               0               0
+64                0               1               0               0
+65                0               1               0               0
+66                0               0               0               0
+67                0               0               0               1
+68                0               1               0               0
+69                0               1               0               0
+70                0               1               0               0
+71                0               0               0               0
+72                0               1               0               0
+73                0               1               0               0
+74                0               0               0               1
+75                0               1               0               0
+76                0               1               0               0
+77                0               0               0               1
+78                0               0               0               0
+79                0               0               0               0
+80                0               1               0               0
+81                0               0               0               0
+82                0               1               0               0
+83                0               1               0               0
+84                0               1               0               0
+85                0               0               0               0
+86                0               0               0               0
+87                0               0               1               0
+88                0               0               0               0
+89                0               1               0               0
+90                0               1               0               0
+91                0               1               0               0
+92                0               0               1               0
+93                0               1               0               0
+94                0               0               1               0
+95                0               0               1               0
+96                0               0               0               0
+97                0               1               0               0
+98                0               1               0               0
+99                0               1               0               0
+100               0               0               0               0
+101               0               1               0               0
+102               0               1               0               0
+103               0               1               0               0
+104               0               0               0               0
+105               0               0               0               0
+106               0               1               0               0
+107               0               1               0               0
+108               0               0               0               0
+109               0               0               0               0
+    histology_categoryDuctal histology_categoryLobular histology_categoryOther
+1                          0                         0                       0
+2                          1                         0                       0
+3                          1                         0                       0
+4                          1                         0                       0
+5                          1                         0                       0
+6                          0                         1                       0
+7                          1                         0                       0
+8                          1                         0                       0
+9                          1                         0                       0
+10                         1                         0                       0
+11                         1                         0                       0
+12                         1                         0                       0
+13                         0                         1                       0
+14                         1                         0                       0
+15                         1                         0                       0
+16                         1                         0                       0
+17                         1                         0                       0
+18                         1                         0                       0
+19                         1                         0                       0
+20                         1                         0                       0
+21                         1                         0                       0
+22                         1                         0                       0
+23                         1                         0                       0
+24                         1                         0                       0
+25                         1                         0                       0
+26                         1                         0                       0
+27                         1                         0                       0
+28                         1                         0                       0
+29                         1                         0                       0
+30                         1                         0                       0
+31                         1                         0                       0
+32                         1                         0                       0
+33                         1                         0                       0
+34                         0                         1                       0
+35                         1                         0                       0
+36                         1                         0                       0
+37                         1                         0                       0
+38                         1                         0                       0
+39                         1                         0                       0
+40                         1                         0                       0
+41                         1                         0                       0
+42                         1                         0                       0
+43                         1                         0                       0
+44                         1                         0                       0
+45                         0                         0                       1
+46                         0                         1                       0
+47                         1                         0                       0
+48                         1                         0                       0
+49                         0                         1                       0
+50                         0                         1                       0
+51                         1                         0                       0
+52                         1                         0                       0
+53                         1                         0                       0
+54                         0                         1                       0
+55                         1                         0                       0
+56                         0                         1                       0
+57                         1                         0                       0
+58                         1                         0                       0
+59                         1                         0                       0
+60                         1                         0                       0
+61                         0                         0                       1
+62                         0                         1                       0
+63                         1                         0                       0
+64                         1                         0                       0
+65                         1                         0                       0
+66                         1                         0                       0
+67                         0                         1                       0
+68                         1                         0                       0
+69                         1                         0                       0
+70                         1                         0                       0
+71                         1                         0                       0
+72                         1                         0                       0
+73                         0                         0                       0
+74                         1                         0                       0
+75                         1                         0                       0
+76                         0                         0                       0
+77                         0                         0                       0
+78                         1                         0                       0
+79                         1                         0                       0
+80                         0                         0                       0
+81                         1                         0                       0
+82                         1                         0                       0
+83                         1                         0                       0
+84                         1                         0                       0
+85                         1                         0                       0
+86                         1                         0                       0
+87                         1                         0                       0
+88                         1                         0                       0
+89                         1                         0                       0
+90                         1                         0                       0
+91                         0                         0                       0
+92                         0                         1                       0
+93                         1                         0                       0
+94                         0                         1                       0
+95                         1                         0                       0
+96                         1                         0                       0
+97                         0                         0                       0
+98                         1                         0                       0
+99                         0                         0                       0
+100                        1                         0                       0
+101                        0                         1                       0
+102                        1                         0                       0
+103                        0                         1                       0
+104                        1                         0                       0
+105                        1                         0                       0
+106                        1                         0                       0
+107                        1                         0                       0
+108                        1                         0                       0
+109                        0                         0                       0
+    prtx_radiationRadiation prtx_chemoChemo prtx_endoEndocrine Therapy
+1                         1               1                          1
+2                         0               1                          1
+3                         0               1                          1
+4                         0               1                          0
+5                         0               1                          1
+6                         1               1                          1
+7                         1               1                          0
+8                         1               1                          1
+9                         1               1                          0
+10                        0               1                          0
+11                        1               1                          0
+12                        1               1                          1
+13                        1               1                          1
+14                        1               0                          1
+15                        0               1                          0
+16                        0               1                          0
+17                        1               1                          1
+18                        1               1                          0
+19                        0               1                          0
+20                        1               1                          0
+21                        1               1                          0
+22                        1               1                          1
+23                        1               1                          1
+24                        1               1                          0
+25                        0               1                          0
+26                        1               1                          0
+27                        0               1                          0
+28                        0               1                          0
+29                        1               1                          1
+30                        0               1                          0
+31                        1               1                          0
+32                        1               1                          1
+33                        0               1                          0
+34                        1               1                          1
+35                        1               0                          1
+36                        0               1                          1
+37                        0               1                          0
+38                        1               1                          1
+39                        0               1                          0
+40                        0               1                          1
+41                        1               1                          0
+42                        1               1                          0
+43                        1               1                          1
+44                        1               1                          1
+45                        1               1                          1
+46                        1               1                          1
+47                        0               1                          1
+48                        1               1                          1
+49                        1               1                          1
+50                        0               1                          1
+51                        1               1                          1
+52                        1               1                          0
+53                        1               1                          0
+54                        0               1                          1
+55                        1               1                          0
+56                        1               1                          1
+57                        0               1                          0
+58                        1               1                          0
+59                        1               1                          0
+60                        0               1                          0
+61                        0               1                          0
+62                        1               1                          1
+63                        1               1                          1
+64                        1               1                          0
+65                        1               1                          0
+66                        1               1                          0
+67                        1               1                          1
+68                        1               1                          1
+69                        0               1                          1
+70                        1               1                          1
+71                        0               1                          0
+72                        1               1                          1
+73                        1               1                          1
+74                        1               1                          1
+75                        1               1                          0
+76                        0               1                          1
+77                        1               1                          1
+78                        1               1                          0
+79                        1               1                          1
+80                        1               1                          1
+81                        1               1                          0
+82                        1               1                          0
+83                        1               1                          0
+84                        1               0                          1
+85                        1               1                          1
+86                        1               1                          1
+87                        1               1                          1
+88                        1               1                          0
+89                        1               1                          0
+90                        1               1                          1
+91                        1               1                          1
+92                        1               1                          1
+93                        1               1                          1
+94                        1               1                          1
+95                        1               1                          1
+96                        0               1                          1
+97                        1               1                          1
+98                        1               1                          0
+99                        0               1                          1
+100                       0               1                          0
+101                       0               1                          1
+102                       1               1                          0
+103                       1               1                          1
+104                       1               1                          1
+105                       0               1                          1
+106                       0               1                          1
+107                       1               1                          0
+108                       0               1                          0
+109                       0               1                          1
+    prtx_bonemodBone Modifying Treatment node_statusNode Positive
+1                                      0                        1
+2                                      0                        0
+3                                      0                        1
+4                                      0                        0
+5                                      0                        0
+6                                      1                        1
+7                                      0                        1
+8                                      0                        1
+9                                      0                        0
+10                                     0                        0
+11                                     0                        0
+12                                     0                        1
+13                                     0                        0
+14                                     0                        1
+15                                     0                        0
+16                                     0                        1
+17                                     0                        0
+18                                     0                        1
+19                                     0                        0
+20                                     0                        0
+21                                     0                        0
+22                                     1                        1
+23                                     1                        1
+24                                     0                        1
+25                                     0                        1
+26                                     0                        0
+27                                     0                        0
+28                                     0                        0
+29                                     0                        1
+30                                     0                        0
+31                                     0                        0
+32                                     0                        1
+33                                     0                        0
+34                                     1                        0
+35                                     0                        1
+36                                     1                        0
+37                                     0                        0
+38                                     1                        1
+39                                     0                        0
+40                                     1                        0
+41                                     0                        1
+42                                     0                        1
+43                                     0                        0
+44                                     1                        0
+45                                     1                        0
+46                                     1                        1
+47                                     0                        1
+48                                     1                        1
+49                                     0                        1
+50                                     1                        0
+51                                     1                        1
+52                                     0                        1
+53                                     0                        0
+54                                     0                        1
+55                                     0                        0
+56                                     1                        1
+57                                     1                        0
+58                                     0                        1
+59                                     0                        1
+60                                     0                        1
+61                                     0                        0
+62                                     1                        1
+63                                     0                        0
+64                                     0                        1
+65                                     0                        1
+66                                     0                        0
+67                                     1                        1
+68                                     0                        1
+69                                     1                        1
+70                                     1                        1
+71                                     0                        0
+72                                     0                        1
+73                                     0                        1
+74                                     1                        1
+75                                     1                        1
+76                                     1                        1
+77                                     1                        1
+78                                     0                        0
+79                                     1                        0
+80                                     1                        1
+81                                     0                        0
+82                                     0                        1
+83                                     0                        1
+84                                     1                        1
+85                                     1                        0
+86                                     1                        0
+87                                     1                        1
+88                                     0                        0
+89                                     0                        1
+90                                     1                        1
+91                                     1                        1
+92                                     1                        1
+93                                     1                        1
+94                                     1                        1
+95                                     1                        1
+96                                     0                        0
+97                                     1                        1
+98                                     0                        1
+99                                     0                        1
+100                                    0                        0
+101                                    1                        1
+102                                    0                        1
+103                                    1                        1
+104                                    0                        0
+105                                    1                        0
+106                                    0                        1
+107                                    0                        1
+108                                    0                        0
+109                                    0                        0
+    axillary_dissectionAxillary Dissection diag_surgery_type_1Mastectomy
+1                                        1                             0
+2                                        1                             1
+3                                        1                             1
+4                                        0                             1
+5                                        0                             1
+6                                        1                             1
+7                                        1                             0
+8                                        1                             1
+9                                        0                             0
+10                                       0                             0
+11                                       0                             0
+12                                       0                             0
+13                                       0                             0
+14                                       1                             0
+15                                       0                             1
+16                                       1                             1
+17                                       0                             0
+18                                       0                             0
+19                                       0                             1
+20                                       0                             0
+21                                       0                             0
+22                                       1                             1
+23                                       1                             0
+24                                       1                             1
+25                                       1                             1
+26                                       0                             0
+27                                       1                             0
+28                                       0                             1
+29                                       1                             0
+30                                       0                             1
+31                                       0                             0
+32                                       0                             0
+33                                       0                             1
+34                                       1                             1
+35                                       1                             1
+36                                       0                             1
+37                                       0                             1
+38                                       0                             0
+39                                       0                             0
+40                                       0                             1
+41                                       1                             1
+42                                       1                             1
+43                                       1                             1
+44                                       0                             0
+45                                       0                             0
+46                                       1                             1
+47                                       1                             1
+48                                       1                             0
+49                                       1                             1
+50                                       1                             1
+51                                       0                             0
+52                                       0                             0
+53                                       0                             1
+54                                       1                             1
+55                                       0                             0
+56                                       1                             1
+57                                       0                             1
+58                                       0                             0
+59                                       0                             0
+60                                       1                             1
+61                                       1                             1
+62                                       1                             1
+63                                       0                             0
+64                                       1                             1
+65                                       1                             1
+66                                       1                             1
+67                                       1                             0
+68                                       1                             0
+69                                       0                             1
+70                                       1                             1
+71                                       0                             1
+72                                       1                             1
+73                                       0                             0
+74                                       1                             1
+75                                       1                             1
+76                                       1                             1
+77                                       1                             0
+78                                       0                             0
+79                                       0                             0
+80                                       0                             0
+81                                       0                             0
+82                                       1                             1
+83                                       1                             1
+84                                       0                             1
+85                                       0                             0
+86                                       0                             0
+87                                       1                             0
+88                                       0                             0
+89                                       1                             1
+90                                       1                             1
+91                                       1                             0
+92                                       1                             1
+93                                       1                             1
+94                                       1                             1
+95                                       1                             1
+96                                       0                             1
+97                                       1                             1
+98                                       0                             0
+99                                       1                             1
+100                                      0                             0
+101                                      0                             1
+102                                      1                             1
+103                                      1                             1
+104                                      0                             0
+105                                      0                             1
+106                                      1                             1
+107                                      0                             1
+108                                      0                             1
+109                                      0                             1
+    diag_neoadj_chemo_1Neoadjuvant Chemo
+1                                      0
+2                                      0
+3                                      0
+4                                      0
+5                                      1
+6                                      0
+7                                      0
+8                                      0
+9                                      0
+10                                     0
+11                                     0
+12                                     0
+13                                     0
+14                                     0
+15                                     0
+16                                     0
+17                                     0
+18                                     0
+19                                     0
+20                                     0
+21                                     0
+22                                     0
+23                                     0
+24                                     0
+25                                     0
+26                                     0
+27                                     0
+28                                     1
+29                                     0
+30                                     0
+31                                     0
+32                                     0
+33                                     0
+34                                     1
+35                                     0
+36                                     1
+37                                     0
+38                                     0
+39                                     0
+40                                     0
+41                                     0
+42                                     0
+43                                     1
+44                                     0
+45                                     0
+46                                     0
+47                                     0
+48                                     0
+49                                     1
+50                                     0
+51                                     0
+52                                     0
+53                                     0
+54                                     0
+55                                     0
+56                                     0
+57                                     0
+58                                     0
+59                                     0
+60                                     0
+61                                     0
+62                                     0
+63                                     0
+64                                     1
+65                                     1
+66                                     1
+67                                     0
+68                                     0
+69                                     0
+70                                     1
+71                                     0
+72                                     1
+73                                     0
+74                                     0
+75                                     1
+76                                     0
+77                                     0
+78                                     0
+79                                     0
+80                                     0
+81                                     0
+82                                     1
+83                                     1
+84                                     0
+85                                     0
+86                                     0
+87                                     0
+88                                     0
+89                                     0
+90                                     1
+91                                     0
+92                                     0
+93                                     1
+94                                     0
+95                                     0
+96                                     0
+97                                     0
+98                                     0
+99                                     0
+100                                    1
+101                                    0
+102                                    1
+103                                    0
+104                                    0
+105                                    0
+106                                    0
+107                                    1
+108                                    0
+109                                    0
+attr(,"assign")
+ [1]  1  2  2  2  2  3  3  4  4  5  5  6  6  6  7  7  7  8  8  8  9 10 11 12 13
+[26] 14 15 16
+attr(,"contrasts")
+attr(,"contrasts")$final_receptor_group
+[1] "contr.treatment"
+
+attr(,"contrasts")$demo_race_final
+[1] "contr.treatment"
+
+attr(,"contrasts")$final_tumor_grade
+[1] "contr.treatment"
+
+attr(,"contrasts")$final_overall_stage
+[1] "contr.treatment"
+
+attr(,"contrasts")$final_t_stage
+[1] "contr.treatment"
+
+attr(,"contrasts")$final_n_stage
+[1] "contr.treatment"
+
+attr(,"contrasts")$histology_category
+[1] "contr.treatment"
+
+attr(,"contrasts")$prtx_radiation
+[1] "contr.treatment"
+
+attr(,"contrasts")$prtx_chemo
+[1] "contr.treatment"
+
+attr(,"contrasts")$prtx_endo
+[1] "contr.treatment"
+
+attr(,"contrasts")$prtx_bonemod
+[1] "contr.treatment"
+
+attr(,"contrasts")$node_status
+[1] "contr.treatment"
+
+attr(,"contrasts")$axillary_dissection
+[1] "contr.treatment"
+
+attr(,"contrasts")$diag_surgery_type_1
+[1] "contr.treatment"
+
+attr(,"contrasts")$diag_neoadj_chemo_1
+[1] "contr.treatment"
+
+
dim(X2)  # Rows should match nrow(dtc_unique_subset_data)
+
+
[1] 109  28
+
+
length(y1)  # Should also match nrow(dtc_unique_subset_data). We have the same # (109)!
+
+
[1] 109
+
+
lasso_model <- glmnet(X2, y1, family = "binomial", alpha = 1)  # alpha = 1 for lasso. 0 for ridge. 
+
+#Cross-validation: cv.glmnet performs k-fold cross-validation (10 folds by default) to find the best lambda. The smaller the lambda, the weaker the penalty and the more variables are included. 
+cv_lasso_model <- cv.glmnet(X2, y1, family = "binomial", alpha = 1)
+
+#plotting the results to look at the performance of different lambda values 
+plot(cv_lasso_model)
+
+
+
+

+
+
+
+
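The comment above notes that a smaller lambda keeps more variables in the model. A minimal sketch of why: for a single standardized predictor under squared-error loss, the LASSO estimate is the unpenalized coefficient passed through a soft-thresholding operator, which shrinks it by lambda and sets it to exactly zero once lambda exceeds its magnitude. glmnet's coordinate-descent updates apply this same operator one coefficient at a time, but the coefficient values below are invented for illustration and are not from the SURMOUNT models.

```r
# Soft-thresholding operator: the closed-form LASSO solution for one
# standardized predictor (orthonormal design, squared-error loss)
soft_threshold <- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)

# Hypothetical unpenalized coefficients (invented values)
b_ols <- c(x1 = 1.2, x2 = -0.5, x3 = 0.08, x4 = -0.02)

# Larger lambda -> more coefficients shrunk exactly to zero
for (lam in c(0.01, 0.1, 0.6)) {
  b_lasso <- soft_threshold(b_ols, lam)
  cat("lambda =", lam, "-> nonzero coefficients:", sum(b_lasso != 0), "\n")
}
```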
#getting the best lambda  -- best lambda is 0.024, even lower! 
+best_lambda <- cv_lasso_model$lambda.min
+print(paste("Best lambda:", best_lambda)) 
+
+
[1] "Best lambda: 0.0243089462466253"
+
+
#Finding the final fit model with the optimal lambda 
+final_lasso_model <- glmnet(X2, y1, family = "binomial", alpha = 1, lambda = best_lambda)
+
+#Which coefficients are included in the model. For the model with DTC positivity, more coefficients are retained in the model.  
+coef(final_lasso_model) 
+
+
29 x 1 sparse Matrix of class "dgCMatrix"
+                                                s0
+(Intercept)                            -1.84167985
+age_at_diag                             .         
+final_receptor_groupTNBC                .         
+final_receptor_groupHR+ HER2-           .         
+final_receptor_groupHR+ HER2+           0.45710104
+final_receptor_groupHR- HER2+          -1.69722848
+demo_race_finalAsian                   -0.74138771
+demo_race_finalWhite                    .         
+final_tumor_gradeGrade 1               -0.38615694
+final_tumor_gradeGrade 2                .         
+final_overall_stageStage II             .         
+final_overall_stageStage III            .         
+final_t_stageT2                         .         
+final_t_stageT3                         .         
+final_t_stageT4                         1.81638875
+final_n_stageN1                         .         
+final_n_stageN2                        -0.02487123
+final_n_stageN3                         0.39987630
+histology_categoryDuctal                1.10854122
+histology_categoryLobular               .         
+histology_categoryOther                -0.62014177
+prtx_radiationRadiation                 0.61433722
+prtx_chemoChemo                         .         
+prtx_endoEndocrine Therapy              .         
+prtx_bonemodBone Modifying Treatment    0.12821102
+node_statusNode Positive               -0.80642910
+axillary_dissectionAxillary Dissection  .         
+diag_surgery_type_1Mastectomy           0.60444202
+diag_neoadj_chemo_1Neoadjuvant Chemo    0.23561799
+
+
+

For the LASSO model of DTC positivity, many more coefficients are retained and the optimal lambda is lower, meaning less shrinkage is applied. The most notable factors for DTC positivity are higher T stage (T4 has the largest positive log-odds coefficient for DTC positivity), HR-/HER2+ receptor status, which has a strong negative association with DTC positivity (though only a handful of people in this cohort met this criterion), and ductal histology. Other influential factors are node negativity, radiation history, bone modifying treatment, mastectomy, and neoadjuvant therapy. Because node_status and final_n_stage both encode nodal involvement and are therefore likely collinear, we next refit the DTC model with only one of these two nodal variables to see how the model changes.
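Because LASSO logistic coefficients are on the log-odds scale, exponentiating them can make them easier to read as odds ratios. A minimal sketch, assuming the nonzero coefficients are copied by hand from the printed `coef(final_lasso_model)` output for the DTC model that includes node_status. Note that LASSO shrinks coefficients toward zero, so these are penalized odds ratios, not unbiased estimates, and they come with no confidence intervals.

```r
# Nonzero (non-intercept) coefficients copied from the printed
# coef(final_lasso_model) output for the DTC model including node_status
dtc_coefs <- c(
  "final_receptor_groupHR+ HER2+"        =  0.45710104,
  "final_receptor_groupHR- HER2+"        = -1.69722848,
  "demo_race_finalAsian"                 = -0.74138771,
  "final_tumor_gradeGrade 1"             = -0.38615694,
  "final_t_stageT4"                      =  1.81638875,
  "final_n_stageN2"                      = -0.02487123,
  "final_n_stageN3"                      =  0.39987630,
  "histology_categoryDuctal"             =  1.10854122,
  "histology_categoryOther"              = -0.62014177,
  "prtx_radiationRadiation"              =  0.61433722,
  "prtx_bonemodBone Modifying Treatment" =  0.12821102,
  "node_statusNode Positive"             = -0.80642910,
  "diag_surgery_type_1Mastectomy"        =  0.60444202,
  "diag_neoadj_chemo_1Neoadjuvant Chemo" =  0.23561799
)

# In a live session (glmnet/Matrix loaded) the vector could be extracted
# from the fit instead of typed in:
#   b <- as.matrix(coef(final_lasso_model))[, 1]
#   b <- b[b != 0 & names(b) != "(Intercept)"]

# Convert log-odds to penalized odds ratios, strongest effects first
odds_ratios <- exp(dtc_coefs)
odds_ratios <- odds_ratios[order(abs(dtc_coefs), decreasing = TRUE)]
print(round(odds_ratios, 2))
```

An odds ratio above 1 (e.g. T4 stage) indicates higher odds of DTC positivity relative to the reference level; below 1 (e.g. HR-/HER2+) indicates lower odds.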

+
+
#### DTC LASSO without final_n_stage in it 
+
+set.seed(123) 
+
+#run the lasso for DTC status. This might work better as there are more DTC + results 
+y1 <- dtc_unique_subset_data$dtc_ever
+
+
+### removed final_n_stage as a variable 
+X4 <- model.matrix(ctDNA_ever ~ -1 + age_at_diag + final_receptor_group + demo_race_final +
+                                   final_tumor_grade + final_overall_stage +
+                                   final_t_stage +
+                                   histology_category + prtx_radiation +
+                                   prtx_chemo + prtx_endo + prtx_bonemod +
+                                   axillary_dissection + node_status +
+                                   diag_surgery_type_1 + diag_neoadj_chemo_1, data = unique_subset_data)
+
+ 
+
+dim(X4)  # Rows should match nrow(dtc_unique_subset_data)
+
+
[1] 109  25
+
+
length(y1)  # Should also match nrow(dtc_unique_subset_data). We have the same # (109)!
+
+
[1] 109
+
+
lasso_model <- glmnet(X4, y1, family = "binomial", alpha = 1)  # alpha = 1 for lasso. 0 for ridge. 
+
+#Cross-validation: cv.glmnet performs k-fold cross-validation (10 folds by default) to find the best lambda. The smaller the lambda, the weaker the penalty and the more variables are included. 
+cv_lasso_model <- cv.glmnet(X4, y1, family = "binomial", alpha = 1)
+
+#plotting the results to look at the performance of different lambda values 
+plot(cv_lasso_model)
+
+
+
+

+
+
+
+
#getting the best lambda -- best lambda is 0.027, similar to the 0.024 above
+best_lambda <- cv_lasso_model$lambda.min
+print(paste("Best lambda:", best_lambda)) 
+
+
[1] "Best lambda: 0.0266790384961084"
+
+
#Finding the final fit model with the optimal lambda 
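+#Note: X3 (not X4) is passed here; the coefficients printed below include final_n_stage and omit node_status, so they come from X3 rather than from the X4 matrix used for cross-validation above
+final_lasso_model <- glmnet(X3, y1, family = "binomial", alpha = 1, lambda = best_lambda)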
+
+#Which coefficients are included in the model. The strongest associations with DTC positivity are T4 stage, HR-/HER2+ receptor status (negative), and ductal histology; axillary dissection has only a very small negative coefficient. Surgery type (mastectomy vs lumpectomy) and neoadjuvant chemotherapy both increase the log-odds of DTC positivity. 
+coef(final_lasso_model) 
+
+
28 x 1 sparse Matrix of class "dgCMatrix"
+                                                s0
+(Intercept)                            -1.74827344
+age_at_diag                             .         
+final_receptor_groupTNBC                .         
+final_receptor_groupHR+ HER2-           .         
+final_receptor_groupHR+ HER2+           0.38484232
+final_receptor_groupHR- HER2+          -1.63882020
+demo_race_finalAsian                   -0.57458506
+demo_race_finalWhite                    .         
+final_tumor_gradeGrade 1               -0.35364514
+final_tumor_gradeGrade 2                .         
+final_overall_stageStage II             .         
+final_overall_stageStage III            .         
+final_t_stageT2                         .         
+final_t_stageT3                         .         
+final_t_stageT4                         1.60923057
+final_n_stageN1                        -0.60299273
+final_n_stageN2                        -0.53436611
+final_n_stageN3                         .         
+histology_categoryDuctal                1.07913426
+histology_categoryLobular               .         
+histology_categoryOther                -0.35868770
+prtx_radiationRadiation                 0.48044674
+prtx_chemoChemo                         .         
+prtx_endoEndocrine Therapy              .         
+prtx_bonemodBone Modifying Treatment    0.05228216
+axillary_dissectionAxillary Dissection -0.09409813
+diag_surgery_type_1Mastectomy           0.51967688
+diag_neoadj_chemo_1Neoadjuvant Chemo    0.28396596
+
+
+

In this last LASSO, in which node_status was removed so that only the more granular final_n_stage (N1 vs N2 vs N3, etc.) represents nodal involvement, the same number of variables (14) was retained, with N1 stage and axillary dissection entering in place of node_status and N3 stage. T4 stage, ductal histology, and receptor status maintained their strong relationships with DTC positivity. Several other variables kept weaker associations: radiation, bone modifying treatment, mastectomy, and neoadjuvant therapy were associated with higher log-odds of DTC positivity, while grade 1 tumors, Asian race, and N1/N2 nodal stage were associated with lower log-odds. Axillary dissection was negatively associated with DTC positivity, but only barely (coefficient -0.09). We will use the models without the node_status variable for both the ctDNA and DTC outcomes: their optimal lambdas are about the same as or lower than those of the models including node_status, and they make more intuitive sense than including two variables that represent the same nodal information in different ways.
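The redundancy between node_status and final_n_stage can be checked directly. A minimal sketch on a small made-up data frame (column names mirror the document's, values are invented), assuming node_status is derived from N stage (N0 = node negative, N1-N3 = node positive): in that case the node_status dummy is an exact linear combination of the N-stage dummies, so the design matrix is rank-deficient.

```r
# Hypothetical toy data: values are invented for illustration only
toy <- data.frame(
  final_n_stage = factor(c("N0", "N1", "N2", "N0", "N3", "N1", "N0", "N2"),
                         levels = c("N0", "N1", "N2", "N3")),
  node_status   = factor(c("Node Negative", "Node Positive", "Node Positive",
                           "Node Negative", "Node Positive", "Node Positive",
                           "Node Negative", "Node Positive"))
)

# Each N stage falls into exactly one node_status column, so node_status
# adds no information beyond final_n_stage
print(table(toy$final_n_stage, toy$node_status))

# The node_status dummy equals the sum of the N1-N3 dummies
X <- model.matrix(~ final_n_stage + node_status, data = toy)
node_col <- X[, "node_statusNode Positive"]
n_sum <- rowSums(X[, c("final_n_stageN1", "final_n_stageN2", "final_n_stageN3")])
print(all(node_col == n_sum))

# Rank one less than the number of columns confirms exact collinearity
print(c(rank = qr(X)$rank, columns = ncol(X)))
```

If the real node_status variable was recorded independently of N stage (for example, clinical vs pathologic staging), the overlap may be strong rather than exact; the same cross-tabulation shows which.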

+
+
+
+

5 Conclusion

+

In this cohort of 109 individuals on the SURMOUNT study, DTC positivity occurred more frequently (in around 30% of individuals) than ctDNA positivity, which occurred in fewer than 10% of patients either at baseline or during surveillance. Despite low numbers, there was good concordance between ctDNA and DTC positivity (in particular, accounting for timepoint, with a concordance of 0.8).
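One way to summarize agreement between two binary tests on the same patients is raw percent agreement alongside Cohen's kappa, which corrects for agreement expected by chance (important here because most patients are negative on both tests). A minimal sketch, assuming "concordance" means simple percent agreement; the function name and example vectors are hypothetical and invented, not SURMOUNT data.

```r
# Percent agreement and Cohen's kappa for two binary tests measured on the
# same patients (1 = positive, 0 = negative)
concordance_stats <- function(test_a, test_b) {
  stopifnot(length(test_a) == length(test_b))
  tab <- table(factor(test_a, levels = c(0, 1)), factor(test_b, levels = c(0, 1)))
  n <- sum(tab)
  p_obs <- sum(diag(tab)) / n                      # observed agreement
  p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  kappa <- (p_obs - p_exp) / (1 - p_exp)
  c(agreement = p_obs, kappa = kappa)
}

# Hypothetical example vectors (invented, not SURMOUNT data)
ctdna <- c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0)
dtc   <- c(1, 0, 1, 0, 1, 1, 0, 1, 0, 0)
res <- concordance_stats(ctdna, dtc)
print(res)
```

With rare positives, high percent agreement can coexist with a modest kappa, so reporting both gives a fuller picture.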

+

In assessing predictors of ctDNA positivity, we found that higher T stage and N stage remained the strongest predictors of ctDNA positivity, with age at diagnosis, HR+/HER2+ status, lobular histology, and lower grade also retained as predictors. The lambda for this model is 0.048.

+

In assessing predictors of DTC positivity using LASSO, we identified ductal histology, higher T stage (larger tumor size), and receptor status (HR-/HER2+ disease was strongly negatively associated) as the factors most strongly associated with DTC positivity. Other associated factors in the multivariable model included markers of more extensive treatment (mastectomy, radiation, and neoadjuvant therapy). Interestingly, nodal positivity appeared to be negatively associated with DTC positivity. The lambda for this model is 0.027.

+

It is worth noting that the ctDNA model in particular is challenging to interpret in the setting of the low number of ctDNA positive individuals (n=9).

+

Overall, ctDNA status was significantly associated with relapse (p<0.01), with a PPV of 89% and NPV of 94% (and a specificity for relapse of 0.99). DTC positivity was NOT significantly associated with relapse, and the sensitivity and specificity of this test for relapse are challenging to interpret because all DTC-positive patients in this cohort went on to interventional trials aimed at eliminating dormant cancer cells. The negative predictive value of DTC assessment was high (0.86), suggesting that this test may be useful in identifying individuals at lower risk of relapse.
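The test characteristics above all come from the 2x2 table of biomarker result against relapse. A minimal sketch of the definitions, using a hypothetical helper and invented counts (not the study's table). Keep in mind that PPV and NPV depend on relapse prevalence, and that treating DTC-positive patients on interventional trials can lower the apparent PPV of DTC testing.

```r
# Test characteristics of a binary biomarker for a binary outcome (relapse)
test_characteristics <- function(tp, fp, fn, tn) {
  c(
    sensitivity = tp / (tp + fn),  # P(test+ | relapse)
    specificity = tn / (tn + fp),  # P(test- | no relapse)
    ppv         = tp / (tp + fp),  # P(relapse | test+)
    npv         = tn / (tn + fn)   # P(no relapse | test-)
  )
}

# Hypothetical counts, invented for illustration
res_tc <- test_characteristics(tp = 6, fp = 2, fn = 4, tn = 88)
print(round(res_tc, 3))
```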

+

Future directions include assessing the test characteristics of DTC assessment in the full cohort of patients enrolled on SURMOUNT to date (n=220), including the incremental value of multiple tests; obtaining ctDNA assessment for this full cohort; performing survival analyses to assess the lead time to clinical events (relapse, death) provided by DTC and ctDNA assessment; and examining fluctuations in ctDNA positivity among patients on clinical trials who had frequent testing during and after therapy.

+

This analysis has several limitations. First, some data were missing, although missingness was low for the variables of interest in this analysis. Second, our model includes collinear variables, that is, variables that capture overlapping aspects of tumor or disease aggressiveness (for example, T stage and N stage, which directly feed into overall stage). The standard LASSO does not account for this structure and tends to select somewhat arbitrarily among highly correlated predictors, so we will try a group LASSO as a next step. Third, we had limited power to build a predictive model for ctDNA in particular, given the rarity of ctDNA positivity in our cohort (although this rate matches the positivity rate reported in other cohort studies).
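As a sketch of the planned next step, the group LASSO (shown here in its standard formulation with square-root group-size weights) replaces the per-coefficient L1 penalty with a penalty on whole groups of coefficients. Here $\ell(\beta_0, \beta)$ is the logistic log-likelihood, the predictors are partitioned into $G$ non-overlapping groups, and $\beta_g$ holds the $p_g$ coefficients in group $g$. Placing T stage, N stage, and overall stage in a single group is a hypothetical grouping used only for illustration:

$$
\hat{\beta}(\lambda) = \arg\min_{\beta_0,\,\beta}\; -\frac{1}{n}\,\ell(\beta_0, \beta) \;+\; \lambda \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta_g \rVert_2
$$

Because the Euclidean norm $\lVert \beta_g \rVert_2$ is zero only when every coefficient in the group is zero, the penalty keeps or drops each group as a unit instead of choosing among its members. When every group has size one, it reduces to the ordinary LASSO penalty $\lambda \sum_j \lvert \beta_j \rvert$.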

+
+ +
+ + +
+ + + + + From c752fb380bc28d07935dd497de5e3bb570dd8cdc Mon Sep 17 00:00:00 2001 From: ntaranto Date: Mon, 9 Dec 2024 14:52:09 -0500 Subject: [PATCH 12/14] Delete final_project_template.qmd --- final_project_template.qmd | 35 ----------------------------------- 1 file changed, 35 deletions(-) delete mode 100644 final_project_template.qmd diff --git a/final_project_template.qmd b/final_project_template.qmd deleted file mode 100644 index 918aa801a..000000000 --- a/final_project_template.qmd +++ /dev/null @@ -1,35 +0,0 @@ ---- -title: "Your Title" -subtitle: "BMIN503/EPID600 Final Project" -author: "FirstName LastName" -format: html -editor: visual -number-sections: true -embed-resources: true ---- - ------------------------------------------------------------------------- - -Use this template to complete your project throughout the course. Your Final Project presentation will be based on the contents of this document. Replace the title/name above and text below with your own, but keep the headers. Feel free to change the theme and other display settings, although this is not required. I added a new sentence - -## Overview {#sec-overview} - -Give a brief a description of your project and its goal(s), what data you are using to complete it, and what two faculty/staff in different fields you have spoken to about your project with a brief summary of what you learned from each person. Include a link to your final project GitHub repository. - -## Introduction {#sec-introduction} - -Describe the problem addressed, its significance, and some background to motivate the problem. This should extend what is in the @sec-overview. - -Explain why your problem is interdisciplinary, what fields can contribute to its understanding, and incorporate background related to what you learned from meeting with faculty/staff. - -## Methods {#sec-methods} - -Describe the data used and general methodological approach used to address the problem described in the @sec-introduction. 
Subsequently, incorporate full R code necessary to retrieve and clean data, and perform analysis. Be sure to include a description of code so that others (including your future self) can understand what you are doing and why. - -## Results {#sec-results} - -Describe your results and include relevant tables, plots, and code/comments used to obtain them. You may refer to the @sec-methods as needed. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you'd like, but this is not required. - -## Conclusion - -This the conclusion. The @sec-results can be invoked here. From 9acedf7d1bcd802a98f497524764043e482a5a02 Mon Sep 17 00:00:00 2001 From: ntaranto Date: Mon, 9 Dec 2024 14:57:17 -0500 Subject: [PATCH 13/14] Delete READMETaranto.md --- READMETaranto.md | 15 --------------- 1 file changed, 15 deletions(-) delete mode 100644 READMETaranto.md diff --git a/READMETaranto.md b/READMETaranto.md deleted file mode 100644 index 35c893eeb..000000000 --- a/READMETaranto.md +++ /dev/null @@ -1,15 +0,0 @@ -# BMIN503/EPID600 Final Project - -# BMIN503/EPID600 Final Project - -Link to my final project: https://github.com/ntaranto/BMIN503_Final_Project - -After treatment for early stage breast cancer, recurrence remains a problem in up to 30% of individuals, in spite of these patients completing definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy) for their early stage breast cancer. It remains unclear who will experience disease recurrence and who will not, and whether we can identify biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA (ctDNA) as risk factors for recurrence. Historically, in retrospective, observational studies, these markers of minimal residual disease (MRD) have demonstrated strong associations with recurrence and disease relapse. 
But it remains unclear how these associations persist with modern treatment paradigms, what the time course of positivity, and what clinical risk factors put patients at greatest risk for having DTC or ctDNA positivity and subsequent relapse--as well as which most strongly predict biomarker positivity. - -Optimizing detection, intervention and surveillance for MRD after breast cancer treatment utilizing highly sensitive, real-time assays to identify patients with dormant tumor cells and those undergoing tumor reactivation in-transit to recurrence would fundamentally change the paradigm of breast cancer follow up from one of “watchful waiting” to a proactive approach that could enable ongoing monitoring for subclinical disease and suppression or eradication of cells that could potentially metastasize. Ultimately, an approach using MRD biomarkers could provide reassurance to patients with definitively negative MRD testing that they are unlikely to experience a relapse, could enable detection and treatment strategies for those in whom MRD is detectable and, ultimately, prevent recurrence and subsequent death from metastatic breast cancer while preserving quality of life for millions of breast cancer survivors around the world. - -In the SURMOUNT study, we have followed patients who have undergone definitive treatment for their early stage breast cancer for recurrence, and also obtained bone marrow and peripheral blood assessment, looking for disseminated tumor cells (DTCs) in the bone marrow by immunohistochemistry (DTC-IHC) and circulating tumor DNA (ctDNA) in the blood. In this study, we are following patients over multiple years after their breast cancer for these markers and clinically following them for relapse and survival events. 
The goal of this study is to assess the relationship between ongoing surveillance for MRD (utilizing standard DTC-IHC and ctDNA by a tumor-informed assay) and recurrence, assess the trajectory of MRD over time and risk factors for DTC and ctDNA positivity, optimize the type and number of tests needed to predict recurrence, and further evaluate the long-term impact of therapeutic interventions aimed at eliminating DTCs. - -In this specific analysis, we will look at the clinical predictors of ctDNA and DTC positivity to try to understand the strongest drivers of MRD in the population of patients with high-risk early breast breast cancer who have finished definitive treatment. We will do so by looking at the clinical characteristics, DTC and ctDNA assessment, and clinical follow-up for the cohort of patients enrolled onto the SURMOUNT (Surveillance Markers of Utility for Recurrence after (Neo)adjuvant Therapy for Breast Cancer) study between 2016 and 2021 who had ctDNA assessment performed. - -For this analysis, I have spoken with Dr. Nicholas Seewald, my biostatistical mentor, and Dr. Angela DeMichele, my epidemiology mentor and a breast oncologist with expertise in the biomarkers of recurrence (and the PI on the SURMOUNT study). Dr. Seewald has been critical to thinking about the approach to multivariable analysis to assess the clinical predictors of ctDNA and DTC positivity as well as survival analyses to understand the associations between biomarker positivity and relapse. Dr. Demichele has been instrumental in the development of the SURMOUNT study design, designing the analysis of this initial surveillance cohort, and thinking about clinical predictors of positivity and the biomarkers of breast cancer recurrence and dormance more broadly. 
From 001d2f63a9cfff3fee2c48aa1b1070b62ce640e2 Mon Sep 17 00:00:00 2001 From: ntaranto Date: Mon, 9 Dec 2024 14:57:49 -0500 Subject: [PATCH 14/14] Create READMETaranto.md --- READMETaranto.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) create mode 100644 READMETaranto.md diff --git a/READMETaranto.md b/READMETaranto.md new file mode 100644 index 000000000..6ae54e663 --- /dev/null +++ b/READMETaranto.md @@ -0,0 +1,13 @@ +# BMIN503/EPID600 Final Project + +Link to my final project: https://github.com/ntaranto/BMIN503_Final_Project + +After treatment for early stage breast cancer, recurrence remains a problem in up to 30% of individuals, in spite of these patients completing definitive treatment with curative intent (including surgery, radiation, and chemotherapy/endocrine therapy) for their early stage breast cancer. It remains unclear who will experience disease recurrence and who will not, and whether we can identify biomarkers such as disseminated/dormant tumor cells (DTCs) or circulating tumor DNA (ctDNA) as risk factors for recurrence. Historically, in retrospective, observational studies, these markers of minimal residual disease (MRD) have demonstrated strong associations with recurrence and disease relapse. But it remains unclear how these associations persist with modern treatment paradigms, what the time course of positivity is, and what clinical risk factors put patients at greatest risk for having DTC or ctDNA positivity and subsequent relapse--as well as which most strongly predict biomarker positivity.
+ +Optimizing detection, intervention and surveillance for MRD after breast cancer treatment utilizing highly sensitive, real-time assays to identify patients with dormant tumor cells and those undergoing tumor reactivation in-transit to recurrence would fundamentally change the paradigm of breast cancer follow up from one of “watchful waiting” to a proactive approach that could enable ongoing monitoring for subclinical disease and suppression or eradication of cells that could potentially metastasize. Ultimately, an approach using MRD biomarkers could provide reassurance to patients with definitively negative MRD testing that they are unlikely to experience a relapse, could enable detection and treatment strategies for those in whom MRD is detectable and, ultimately, prevent recurrence and subsequent death from metastatic breast cancer while preserving quality of life for millions of breast cancer survivors around the world. + +In the SURMOUNT study, we have followed patients who have undergone definitive treatment for their early stage breast cancer for recurrence, and also obtained bone marrow and peripheral blood assessment, looking for disseminated tumor cells (DTCs) in the bone marrow by immunohistochemistry (DTC-IHC) and circulating tumor DNA (ctDNA) in the blood. In this study, we are following patients over multiple years after their breast cancer for these markers and clinically following them for relapse and survival events. The goal of this study is to assess the relationship between ongoing surveillance for MRD (utilizing standard DTC-IHC and ctDNA by a tumor-informed assay) and recurrence, assess the trajectory of MRD over time and risk factors for DTC and ctDNA positivity, optimize the type and number of tests needed to predict recurrence, and further evaluate the long-term impact of therapeutic interventions aimed at eliminating DTCs. 
+ +In this specific analysis, we will look at the clinical predictors of ctDNA and DTC positivity to try to understand the strongest drivers of MRD in the population of patients with high-risk early breast cancer who have finished definitive treatment. We will do so by looking at the clinical characteristics, DTC and ctDNA assessment, and clinical follow-up for the cohort of patients enrolled onto the SURMOUNT (Surveillance Markers of Utility for Recurrence after (Neo)adjuvant Therapy for Breast Cancer) study between 2016 and 2021 who had ctDNA assessment performed. + +For this analysis, I have spoken with Dr. Nicholas Seewald, my biostatistical mentor, and Dr. Angela DeMichele, my epidemiology mentor and a breast oncologist with expertise in the biomarkers of recurrence (and the PI on the SURMOUNT study). Dr. Seewald has been critical to thinking about the approach to multivariable analysis to assess the clinical predictors of ctDNA and DTC positivity as well as survival analyses to understand the associations between biomarker positivity and relapse. Dr. DeMichele has been instrumental in the development of the SURMOUNT study design, designing the analysis of this initial surveillance cohort, and thinking about clinical predictors of positivity and the biomarkers of breast cancer recurrence and dormancy more broadly.