Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Option for isolated and colony profiles #30

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 58 additions & 14 deletions aggregate.R
Original file line number Diff line number Diff line change
Expand Up @@ -3,11 +3,13 @@
'aggregate

Usage:
aggregate.R <sqlite_file> -o <file>
aggregate.R -s <sqlite_file> -o <file> [-t <sc_type>]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after reviewing, I don't think this change is ready for prime time. A couple more conditional statements that only behave when sc_type is specified are required.


Options:
-h --help Show this screen.
-o <file> --output=<file> per-well aggregated data.' -> doc
-h --help Show this screen.
-s <sqlite_file> --sqlite_file=<sqlite_file> Location of the sql_lite file.
-o <file> --output=<file> per-well aggregated data.
-t <sc_type> --sc_type=<sc_type> which sc_type to focus on [default: none]' -> doc

suppressWarnings(suppressMessages(library(docopt)))

Expand All @@ -19,40 +21,82 @@ suppressWarnings(suppressMessages(library(stringr)))

opts <- docopt(doc)

db <- src_sqlite(path = opts[["sqlite_file"]])
sql_file <- opts[["sqlite_file"]]

output_file <- opts[["output"]]

sc_type <- opts[["sc_type"]]

db <- src_sqlite(path = sql_file)

image <- tbl(src = db, "image") %>%
select(TableNumber, ImageNumber, Metadata_Plate, Metadata_Well)

aggregate_objects <- function(compartment) {
# If sc_type is specified, group by different columns
if (sc_type == "none") {
strata_cols <- c("Metadata_Plate", "Metadata_Well")
} else {
strata_cols <- c("TableNumber", "ImageNumber", "ObjectNumber", "Metadata_Plate",
"Metadata_Well")
}

aggregate_objects <- function(compartment,
strata_cols = c("Metadata_Plate", "Metadata_Well"),
sc_type = "none") {
object <- tbl(src = db, compartment)

object %<>% inner_join(image, by = c("TableNumber", "ImageNumber"))

if (sc_type == "isolated") {
object <- object %>% dplyr::filter(Cells_Neighbors_NumberOfNeighbors_Adjacent == 0)
Copy link
Collaborator Author

@shntnu shntnu Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gwaygenomics This wouldn't work when compartment is Cytoplasm or Nuclei, correct?

Copy link
Member

@gwaybio gwaybio Dec 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally wrote this (see below), but then realized I am wrong. This may have been my original plan, but I do not remember 100%

you are correct, this will fail as currently implemented! This is because we're aggregating compartment by compartment. If we merge everything first and then aggregate, it should work. It seems like some version of this must have worked at one point (because the call on L72 would have errored out for other non-cell compartments).

Original thought that is wrong

If I'm remembering correctly, sc_type should be specified only when compartment = "cells" (here).

Since we're doing an inner join on L72, this should take care of filtering for other compartments

} else if (sc_type == "colony") {
object <- object %>% dplyr::filter(Cells_Neighbors_NumberOfNeighbors_Adjacent >= 4)
}

# compartment tag converts nuclei to ^Nuclei_
compartment_tag <-
str_c("^", str_sub(compartment, 1, 1) %>% str_to_upper(), str_sub(compartment, 2), "_")

variables <- colnames(object) %>% stringr::str_subset(compartment_tag)

futile.logger::flog.info(str_c("Started aggregating ", compartment))

cytominer::aggregate(
population = object,
variables = variables,
strata = c("Metadata_Plate", "Metadata_Well"),
strata = strata_cols,
operation = "mean"
) %>% collect()

}

aggregated <-
aggregate_objects("cells") %>%
inner_join(aggregate_objects("cytoplasm"),
by = c("Metadata_Plate", "Metadata_Well")) %>%
inner_join(aggregate_objects("nuclei"),
by = c("Metadata_Plate", "Metadata_Well"))
aggregate_objects(compartment = "cells",
strata_cols = strata_cols,
sc_type = sc_type) %>%
inner_join(
aggregate_objects(compartment = "cytoplasm",
strata_cols = strata_cols,
sc_type = sc_type),
by = strata_cols
) %>%
inner_join(
aggregate_objects(compartment = "nuclei",
strata_cols = strata_cols,
sc_type = sc_type),
by = strata_cols
)

if (sc_type != "none") {
futile.logger::flog.info(
paste0("Now collapsing single cell by well for ", output_file)
)

aggregated <- aggregated %>%
dplyr::group_by_(.dots = c("Metadata_Plate", "Metadata_Well")) %>%
dplyr::summarize_at(.funs = 'mean', .vars = variables) %>%
dplyr::ungroup()
}

futile.logger::flog.info(paste0("Writing aggregated to ", opts[["output"]]))
futile.logger::flog.info(paste0("Writing aggregated to ", output_file))

aggregated %>% readr::write_csv(opts[["output"]])
aggregated %>% readr::write_csv(output_file)
8 changes: 4 additions & 4 deletions annotate.R
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ perturbation_mode <- opts[["perturbation_mode"]]

metadata_dir <- paste("../..", "metadata", batch_id, sep = "/")

backend_dir <- paste("../..", "backend", batch_id, plate_id, sep = "/")
backend_dir <- paste("../..", "backend", batch_id, unlist(strsplit(plate_id, "_"))[1], sep = "/")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gwaygenomics Do you know why the string is being split?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed on the brief chat, indeed, it looks like I am splitting the plate ID because I am inputing the plate ID with the sc_type tagged (here).

This is a result of the output from the modified aggregate.R call above, based on this line

This will definitely break in situations where the plate ID has an underscore.

Will need to address!


# read profiles and rename column names

Expand Down Expand Up @@ -85,13 +85,13 @@ if (format_broad_cmap) {
Metadata_pert_mfc_id = Metadata_broad_sample,
Metadata_pert_well = Metadata_Well,
Metadata_pert_id_vendor = "")

if ('Metadata_cell_id' %in% names(profiles)) {
message('`cell_id` column present in metadata, will not override.')

} else {
profiles %<>% mutate(Metadata_cell_id = cell_id)

}

if (perturbation_mode == "chemical") {
Expand Down
17 changes: 7 additions & 10 deletions collate.R
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,6 @@ suppressWarnings(suppressMessages(library(magrittr)))

suppressWarnings(suppressMessages(library(stringr)))

#str <- "-b 2017_12_05_Batch2 -p BR00092655 -c ingest_config.ini -d -r s3://imaging-platform/projects/2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad/workspace"

#opts <- docopt(doc, str)

opts <- docopt(doc)

batch_id <- opts[["batch_id"]]
Expand All @@ -56,8 +52,6 @@ stopifnot(!download || !is.null(remote_base_dir))

base_dir = "../.."

# str(opts)

input_dir <- file.path(base_dir, "analysis", batch_id, plate_id, pipeline_name)

if (download) {
Expand Down Expand Up @@ -106,8 +100,13 @@ if (file.exists(backend_file) & file.exists(aggregated_file)) {
}

if (!file.exists(cache_backend_file) | overwrite_backend_cache) {
ingest_cmd <- paste("cytominer-database", "ingest", input_dir, paste0("sqlite:///", cache_backend_file),
"-c", config, ifelse(munge, "--munge", "--no-munge"))
ingest_cmd <- paste("cytominer-database",
"ingest",
input_dir,
paste0("sqlite:///", cache_backend_file),
"-c",
config,
ifelse(munge, "--munge", "--no-munge"))

if (file.exists(cache_backend_file)) {
file.remove(cache_backend_file)
Expand Down Expand Up @@ -138,7 +137,6 @@ if (!file.exists(cache_backend_file) | overwrite_backend_cache) {
futile.logger::flog.info("Indexing...")

system(index_cmd)

}

# create aggregated (even if it already exists)
Expand Down Expand Up @@ -201,5 +199,4 @@ if (download) {

move_and_check(cache_aggregated_file, aggregated_file)


}
5 changes: 2 additions & 3 deletions normalize.R
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ operation <- opts[["operation"]]

subset <- opts[["subset"]] # e.g. "Metadata_broad_sample_type == '''control'''"

backend_dir <- paste("../..", "backend", batch_id, plate_id, sep = "/")
backend_dir <- paste("../..", "backend", batch_id, unlist(strsplit(plate_id, "_"))[1], sep = "/")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably go (same as before)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


# load profiles
profiles <- suppressMessages(readr::read_csv(paste(backend_dir, paste0(plate_id, "_augmented.csv"), sep = "/")))
Expand Down Expand Up @@ -103,7 +103,7 @@ normalize_profiles <- function(compartment) {

normalized <-
cytominer::normalize(
population = load_profiles(compartment = compartment),
population = load_profiles(compartment = compartment) %>% mutate_at(variables, as.double),
variables = variables,
strata = c("Metadata_Plate"),
sample = sample,
Expand All @@ -123,4 +123,3 @@ normalized <-
by = metadata)

normalized %>% readr::write_csv(paste(backend_dir, paste0(plate_id, "_normalized.csv"), sep = "/"))

2 changes: 1 addition & 1 deletion preselect.R
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ for (operation in operations) {
# This is handled differently because there is no direct way yet to do filtering in cytominer
# TODO: rewrite this after cytominer has an appropriate filtering function for this
testthat::expect_false(is.null(replicates), info="replicates should be specified when performing replicate_correlation")

Copy link
Collaborator Author

@shntnu shntnu Jun 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The print debugging can go

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@gwaybio gwaybio Jun 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in a3675c9

feature_replicate_correlations <-
df %>%
cytominer::replicate_correlation(
Expand Down
3 changes: 1 addition & 2 deletions select.R
Original file line number Diff line number Diff line change
Expand Up @@ -37,8 +37,7 @@ variables_selected <-
Reduce(function(df1, df2) dplyr::inner_join(df1, df2, by ="variable"), .) %>%
magrittr::extract2("variable")

backend_dir <- paste("../..", "backend", batch_id, plate_id, sep = "/")

backend_dir <- paste("../..", "backend", batch_id, unlist(strsplit(plate_id, "_"))[1], sep = "/")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably go (same as before)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

profiles <- paste(backend_dir, paste0(plate_id, "_normalized.csv"), sep = "/")

profiles_variable_selected <- paste(backend_dir, paste0(plate_id, "_normalized_variable_selected.csv"), sep = "/")
Expand Down